Resources
Authors & Affiliations
Bahareh Tolooshams, Yuelin Shi, Anima Anandkumar, Doris Tsao
Abstract
An important open question in vision neuroscience is how the brain integrates prior knowledge with sensory data to perceive the world. The Bayesian brain hypothesis suggests that the brain performs posterior inference based on an internal model to construct representations that explain the stimulus. There is indeed support for this hypothesis in vision. Vision is not a passive process; rather, it actively recovers the features of an object from the signal cast on the retina. Even when faced with a cluttered scene of overlapping objects, the visual system quickly parses positional relationships and presents a coherent visual world. The goal of the visual system seems to be to infer the causes underlying the sensory input and to convey the most plausible hypothesis to our consciousness. While the literature offers substantial support for this generative inference hypothesis in the brain, few deep generative learning approaches have been used to study it. We propose studying vision through diffusion-based deep generative networks, which model the recurrent activity of neural circuits to sample from distributions. This approach (a) provides temporal dynamics, (b) uses a generative model to iteratively sample from a learned distribution of face images, and (c) employs a feedback mechanism to correct neural representations so that they better explain the sensory data. We explore how the brain removes degradations from occluded faces and discuss the role of feedback. Additionally, we draw parallels between artificial and biological neural dynamics in the macaque face patch system in the inferotemporal cortex. For instance, degraded images, as opposed to clear face images, cause a delay in face cell responses; similarly, we observe that facial information in the artificial network's feedback is delayed when degradation is present. Our results aim to lay the foundation for studying the brain with generative deep learning.
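To make the sampling-with-feedback idea in the abstract concrete, the sketch below shows one common way such a procedure can be set up: a DDPM-style reverse diffusion loop whose iterations act as recurrent dynamics, interleaved with a feedback step that re-imposes the observed (non-occluded) pixels of a degraded face. This is a minimal illustration under stated assumptions, not the authors' implementation; the names `denoiser`, `sample_with_feedback`, `y_occluded`, and `mask` are hypothetical, and `denoiser` stands in for a noise-prediction network assumed to be trained on clear face images.

```python
import numpy as np


def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    """Standard linear DDPM noise schedule (assumed, not taken from the paper)."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def sample_with_feedback(denoiser, y_occluded, mask, shape, T=1000, seed=0):
    """Reverse diffusion with an inpainting-style feedback step.

    denoiser(x, t) -> predicted noise eps (hypothetical network trained on faces).
    y_occluded    : degraded stimulus (occluded face image).
    mask          : 1 where pixels are observed, 0 where occluded.
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(shape)  # start from pure noise (the learned prior)

    for t in reversed(range(T)):
        # Generative (top-down) step: one iteration of the learned face prior.
        eps = denoiser(x, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise

        # Feedback (bottom-up) step: keep the sample consistent with the
        # sensory evidence by replacing observed pixels with a copy of the
        # observation noised to the current diffusion level.
        if t > 0:
            y_noisy = (np.sqrt(alpha_bars[t - 1]) * y_occluded
                       + np.sqrt(1.0 - alpha_bars[t - 1]) * rng.standard_normal(shape))
        else:
            y_noisy = y_occluded
        x = mask * y_noisy + (1.0 - mask) * x

    return x
```

The pixel-replacement feedback shown here is only one simple way to inject the sensory data; gradient-based guidance toward measurement consistency is a common alternative, and the abstract does not specify which mechanism the authors use.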