
TALK DETAILS

Reconstruction-guided attention improves the robustness and shape processing of neural networks

Seoyoung Ahn — Hossein Adeli, Gregory Zelinsky

First Author
► Seoyoung Ahn — Stony Brook University

Contributors
► Hossein Adeli — Stony Brook University
► Gregory Zelinsky — Stony Brook University

28 September 2022
Many visual phenomena suggest that humans use top-down generative or reconstructive processes to create visual percepts (e.g., imagery, object completion, pareidolia), but little is known about the role reconstruction plays in robust object recognition. We built an iterative encoder-decoder network that generates an object reconstruction and used it as top-down attentional feedback to route the most relevant spatial and feature information to feed-forward object recognition processes. We tested this model using the challenging out-of-distribution digit recognition dataset, MNIST-C, where 15 different types of transformation and corruption are applied to handwritten digit images. Our model showed strong generalization performance against various image perturbations, on average outperforming all other models including feedforward CNNs and other adversarially trained networks. Our model is particularly robust to corruptions such as blur, noise, and occlusion, where shape perception plays an important role. Lesion studies further reveal two complementary roles of spatial and feature-based attention in robust object recognition, with the former largely consistent with spatial masking benefits in the attention literature (the reconstruction serves as a mask) and the latter mainly contributing to the model's inference speed (i.e., number of time steps to reach a certain confidence threshold) by reducing the space of possible object hypotheses. We also observed that the model's reconstruction engine sometimes hallucinates a non-existing pattern out of noise, leading to highly interpretable human-like errors. Our study shows that modeling reconstruction-based feedback endows AI systems with a powerful attention mechanism, which will in turn help us to understand the role of generated reconstructions in human visual processing.
doi.org/10.57736/nmc-c17b-d9f4
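The iterative loop the abstract describes — encode, classify, reconstruct, then use the reconstruction as top-down spatial feedback on the next pass — can be illustrated with a minimal sketch. The weights, dimensions, and stopping rule below are hypothetical stand-ins (random, untrained), not the authors' architecture; the sketch only shows the control flow: the reconstruction acts as a multiplicative spatial mask, and inference stops once class confidence crosses a threshold, matching the abstract's notion of inference speed as the number of time steps to reach that threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the talk's model is an iterative
# encoder-decoder trained on MNIST-style images).
D, H, K = 64, 32, 10          # input pixels, hidden units, classes

# Random untrained weights -- stand-ins for learned parameters.
W_enc = rng.normal(0, 0.1, (H, D))
W_cls = rng.normal(0, 0.1, (K, H))
W_dec = rng.normal(0, 0.1, (D, H))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def recognize(x, threshold=0.9, max_steps=10):
    """Iterate: encode -> classify -> reconstruct -> reuse the
    reconstruction as a multiplicative spatial mask on the input."""
    mask = np.ones_like(x)                      # step 1: unmodulated input
    for t in range(1, max_steps + 1):
        h = np.tanh(W_enc @ (x * mask))         # feed-forward encoding
        p = softmax(W_cls @ h)                  # class beliefs
        if p.max() >= threshold:                # confident enough: stop early
            return int(p.argmax()), t
        recon = 1 / (1 + np.exp(-(W_dec @ h)))  # top-down reconstruction
        mask = recon / (recon.max() + 1e-8)     # normalize mask to [0, 1]
    return int(p.argmax()), max_steps

digit, steps = recognize(rng.random(D))
print(digit, steps)
```

With trained weights, the mask would route the most object-relevant pixels back into the encoder, which is the spatial-masking role the lesion studies attribute to spatial attention; feature-based feedback (not sketched here) would instead narrow the class hypotheses and shorten `steps`.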
