Human Vision
Real-world scene perception and search from foveal to peripheral vision
A high-resolution central fovea is a prominent design feature of human vision. But how important is the fovea for information processing and gaze guidance in everyday visual-cognitive tasks? Following on from classic findings for sentence reading, I will present key results from a series of eye-tracking experiments in which observers searched for a target object within static or dynamic images of real-world scenes. Gaze-contingent scotomas were used to selectively block visual information in the fovea, parafovea, or periphery. Overall, the results suggest that foveal vision is less important, and peripheral vision more important, for scene perception and search than previously thought. The importance of foveal vision was found to depend on the specific requirements of the task. Moreover, the data support a central-peripheral dichotomy in which peripheral vision selects and central vision recognizes.
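A gaze-contingent scotoma can be pictured as a mask that moves with the eyes. The sketch below is a minimal pure-Python illustration, not the display software used in these experiments: it assumes a grayscale image stored as a list of rows, a gaze position already expressed in pixels, and a hypothetical `px_per_deg` conversion factor. Pixels whose eccentricity falls in the ring [inner_deg, outer_deg) are replaced by a uniform fill value, so a central (foveal) scotoma, a parafoveal ring, and a peripheral scotoma (only a central window visible) are all special cases of one function. The radii in the usage lines are illustrative, not the values used in the studies.

```python
import math


def apply_scotoma(image, gaze_xy, px_per_deg, inner_deg, outer_deg, fill=128):
    """Return a copy of `image` with a gaze-centred ring of pixels masked out.

    image: grayscale image as a list of rows (image[y][x])
    gaze_xy: current gaze position (x, y) in pixels
    px_per_deg: assumed display resolution in pixels per degree of visual angle
    inner_deg, outer_deg: masked eccentricity range [inner_deg, outer_deg)
    fill: value written into masked pixels (e.g. mean gray)
    """
    gx, gy = gaze_xy
    masked = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            # Eccentricity of this pixel relative to the point of gaze, in degrees.
            ecc = math.hypot(x - gx, y - gy) / px_per_deg
            new_row.append(fill if inner_deg <= ecc < outer_deg else value)
        masked.append(new_row)
    return masked


# Hypothetical radii for the three conditions (degrees of visual angle).
img = [[0] * 11 for _ in range(11)]
central = apply_scotoma(img, (5, 5), 2.0, 0.0, 1.0)               # fovea blocked
parafoveal = apply_scotoma(img, (5, 5), 2.0, 1.0, 2.5)           # ring blocked
peripheral = apply_scotoma(img, (5, 5), 2.0, 1.0, float("inf"))  # only central window visible
```

In a live experiment the mask would have to be recomputed for every display frame from the latest eye-tracker sample, so that the blocked region stays locked to the current point of gaze.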
Coarse-to-fine information integration in human vision
Demystifying the richness of visual perception
Human vision is full of puzzles. Observers can grasp the essence of a scene in an instant, yet when probed for details they are at a loss. People have trouble finding their keys, yet the keys may be quite visible once found. How does one explain this combination of marvelous successes and quirky failures? I will describe our attempts to develop a unifying theory that brings a satisfying order to multiple phenomena. One key is to understand peripheral vision. A visual system cannot process everything with full fidelity, and therefore must lose some information. Peripheral vision must condense a mass of information into a succinct representation that nonetheless carries the information needed for vision at a glance. We have proposed that the visual system deals with limited capacity in part by representing its input in terms of a rich set of local image statistics, where the local regions grow, and the representation becomes less precise, with distance from fixation. This scheme favors the computation of sophisticated image features at the expense of precise spatial localization of those features. What are the implications of such an encoding scheme? Critical to our understanding has been the use of methods for visualizing the model's equivalence classes, that is, sets of images that share the same representation. These visualizations allow one to quickly see that many of the puzzles of human vision may arise from a single encoding mechanism. They have suggested new experiments and predicted unexpected phenomena. Furthermore, visualizing the equivalence classes has facilitated the generation of testable model predictions, allowing us to study the effects of this relatively low-level encoding on a wide range of higher-level tasks. Peripheral vision helps explain many of the puzzles of vision, but some remain. By examining the phenomena that peripheral vision cannot explain, we gain insight into the nature of additional capacity limits in vision. In particular, I will suggest that decision processes face general-purpose limits on the complexity of the tasks they can perform at a given time.
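The core idea of this encoding scheme, pooling regions that grow with eccentricity and summarize their contents with statistics that discard exact positions, can be illustrated with a deliberately simplified toy. The sketch assumes that pooling-region diameter grows linearly with eccentricity with a hypothetical scale factor of 0.5 (in the spirit of Bouma's rule of thumb for crowding), and it uses only mean and variance, whereas the model described above relies on a much richer set of local image statistics. The function names are illustrative. Two different arrangements of the same values inside one pooling region yield identical summaries, i.e. they fall into the same equivalence class.

```python
import statistics


def pooling_radius(ecc_deg, scale=0.5, min_radius_deg=0.1):
    """Radius (deg) of a pooling region whose diameter is `scale` times its eccentricity.

    `scale` and `min_radius_deg` are illustrative assumptions; the floor keeps
    regions at the point of fixation from shrinking to zero.
    """
    return max(min_radius_deg, scale * ecc_deg / 2)


def pooled_summary(samples):
    """Summarize one pooling region by order-free statistics (mean, variance)."""
    return statistics.fmean(samples), statistics.pvariance(samples)


# Pooling regions grow with distance from fixation ...
radii = [pooling_radius(e) for e in (0.0, 1.0, 10.0)]   # [0.1, 0.25, 2.5]
# ... and different spatial arrangements of the same content become indistinguishable.
same = pooled_summary([0, 0, 1, 1]) == pooled_summary([1, 0, 1, 0])  # True
```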
Neural mechanisms of active vision in the marmoset monkey
Human vision relies on rapid eye movements (saccades), made 2-3 times every second, to bring peripheral targets into central foveal vision for high-resolution inspection. This rapid sampling of the world defines the perception-action cycle of natural vision and profoundly shapes our perception. Marmosets have visual processing and eye movements similar to those of humans, including a fovea that supports high-acuity central vision. Here, I present a novel approach developed in my laboratory for investigating the neural mechanisms of visual processing using naturalistic free-viewing and simple target-foraging paradigms. First, we establish that receptive fields in marmoset visual areas V1 and MT can be mapped with high precision without requiring the animal to hold its eyes at fixation. Instead, we combine high-resolution eye tracking with an offline correction for eye position during foraging. This approach allows us to map receptive fields, even at the precision of foveal V1 neurons, while simultaneously assessing the impact of eye movements on the visual information encoded. We find that the visual information encoded by neurons varies dramatically across the saccade-to-fixation cycle, with most information concentrated in brief post-saccadic transients. In a second study, we examined whether target selection before a saccade can predictively influence how foveal visual information is subsequently processed in post-saccadic transients. Because every saccade brings a target to the fovea for detailed inspection, we hypothesized that predictive mechanisms might prime foveal populations to process the target. Using neural decoding from laminar arrays placed in foveal regions of area MT, we find that the direction of motion of a fixated target can be read out predictively from foveal activity even before its post-saccadic arrival. These findings highlight the dynamic and predictive nature of visual processing during eye movements and the utility of the marmoset as a model of active vision.
Funding sources: NIH EY030998 to JM, Life Sciences Fellowship to JY
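Mapping receptive fields without enforced fixation works because the stimulus can be re-expressed in retinal coordinates after the fact, using the recorded eye position. The toy 1-D reverse-correlation sketch below illustrates that principle only; it is not the laboratory's pipeline. It assumes gaze positions are already known in screen pixels, uses a hypothetical model neuron, and computes a spike-triggered average (STA) of stimulus frames shifted so that index 0 always corresponds to the current gaze position. Without the shift, the same spikes produce a receptive-field estimate smeared across every screen location the neuron happened to view.

```python
def to_retinal(frame, gaze_px, width):
    """Re-express a 1-D screen frame in retinal coordinates (index 0 = gaze position).

    Screen pixels outside the display are treated as 0.
    """
    return [frame[gaze_px + i] if 0 <= gaze_px + i < len(frame) else 0.0
            for i in range(width)]


def spike_triggered_average(frames, gazes, spikes, width):
    """Spike-count-weighted average of gaze-corrected frames."""
    total = sum(spikes)
    sta = [0.0] * width
    for frame, gaze, n in zip(frames, gazes, spikes):
        retinal = to_retinal(frame, gaze, width)
        for i in range(width):
            sta[i] += n * retinal[i]
    return [v / total for v in sta] if total else sta


# Synthetic free-viewing session: one bright pixel per frame, gaze wanders.
SCREEN = 10
frames, gazes, spikes = [], [], []
for bright in range(SCREEN):
    for gaze in range(5):
        frames.append([1.0 if j == bright else 0.0 for j in range(SCREEN)])
        gazes.append(gaze)
        # Hypothetical neuron: fires when the bright pixel sits 2 px right of gaze.
        spikes.append(1 if bright - gaze == 2 else 0)

corrected = spike_triggered_average(frames, gazes, spikes, width=5)
uncorrected = spike_triggered_average(frames, [0] * len(frames), spikes, width=SCREEN)
```

Here `corrected` recovers a single peak at retinal position 2, while `uncorrected` (which ignores gaze) spreads the same spikes evenly over screen positions 2 through 6.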
Human color perception and double-opponent cells in V1 cortex
Top-down Modulation in Human Visual Cortex
Human vision has a remarkable ability to recognize objects in the surrounding environment even when the visual input represents those objects only incompletely. This happens almost effortlessly, and it was not until scientists had to tackle the problem in computer vision that its complexity became apparent. Although artificial vision systems have made great strides, even exceeding human-level performance on standard vision tasks, they have yet to achieve comparable robustness. One source of human robustness is the brain's extensive connectivity, which is not limited to a feedforward hierarchical pathway like that of current state-of-the-art deep convolutional neural networks but also includes recurrent and top-down connections. These connections allow the human brain to enhance the neural representations of degraded images in accordance with meaningful representations stored in memory. The mechanisms by which these different pathways interact are still not understood. This seminar presents studies of how recurrent and top-down modulation affect the neural representations evoked by viewing blurred images. These studies aimed to uncover the role of recurrent and top-down connections in human vision. The results challenge the notion of predictive coding as a mechanism for top-down modulation of visual information during natural vision. They show that enhancement (sharpening) of neural representations appears to be the more dominant process at different levels of the visual hierarchy. They also show that inference in visual recognition is achieved through Bayesian integration of incoming visual information with priors from deeper processing regions of the brain.
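The abstract does not specify the form of the Bayesian process, so the following is only a textbook illustration under an assumed Gaussian model: a scalar feature is encoded with a sensory likelihood N(like_mean, like_var), and the prior from higher areas is N(prior_mean, prior_var). The posterior is then a precision-weighted average of the two. Blurring the image is modeled, as an assumption, by inflating the sensory variance, which shifts the estimate toward the prior. The function name and parameter values are hypothetical.

```python
def gaussian_posterior(like_mean, like_var, prior_mean, prior_var):
    """Combine a Gaussian likelihood and a Gaussian prior.

    Returns (posterior_mean, posterior_var). Each source is weighted by its
    precision (inverse variance), so the more reliable source dominates.
    """
    w_like = 1.0 / like_var
    w_prior = 1.0 / prior_var
    post_var = 1.0 / (w_like + w_prior)
    post_mean = post_var * (w_like * like_mean + w_prior * prior_mean)
    return post_mean, post_var


# Same prior, sharp vs. blurred input (blur modeled as larger sensory variance).
sharp = gaussian_posterior(like_mean=0.0, like_var=1.0, prior_mean=2.0, prior_var=1.0)
blurred = gaussian_posterior(like_mean=0.0, like_var=4.0, prior_mean=2.0, prior_var=1.0)
# sharp   -> (1.0, 0.5): evidence and prior weighted equally
# blurred -> mean ≈ 1.6, variance ≈ 0.8: the estimate moves toward the prior
```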
Human reconstruction of local image structure from natural scenes
Retinal projections often represent the structure of the physical world poorly: well-defined boundaries in the retinal image may correspond to irrelevant features of the physical world, while critical features of the physical world may be nearly invisible in the retinal projection. Visual cortex is equipped with specialized mechanisms for sorting these two types of features according to their utility in interpreting the scene; however, we know little or nothing about their perceptual computations. I will present novel paradigms for characterizing these processes in human vision, alongside examples of how the associated empirical results can be combined with targeted models to shape our understanding of the underlying perceptual mechanisms. Although the emerging view is far from complete, it challenges compartmentalized notions of bottom-up/top-down object segmentation, and suggests instead that these two modes are best viewed as an integrated perceptual mechanism.
Blindspots in Computer Vision - How can neuroscience guide AI?
Scientists have worked to recreate human vision in computers for the past 50 years. But how much do we actually know about human vision? And can the brain help advance computer vision? This talk will look at the similarities and differences between (modern) computer vision and human vision, as well as the important crossovers, collaborations, and applications that define the interface between computational neuroscience and computer vision. If you want to know more about how the brain sees (really sees), how computer vision developments are inspired by the brain, or how to apply AI to neuroscience, this talk is for you.
Inferring the order of stable and context dependent perceptual biases in human vision
COSYNE 2023