Gaze Shifts
Development and application of gaze control models for active perception
Gaze shifts in humans serve to direct the high-resolution vision provided by the fovea towards selected areas in the environment. Gaze can therefore be considered a proxy for attention, or an indicator of the relative importance of different parts of the environment. In this talk, we discuss the development of generative models of human gaze in response to visual input. We discuss how such models can be learned, both using supervised learning and using implicit feedback as an agent interacts with the environment, the latter being more plausible in biological agents. We also discuss two ways such models can be used. First, they can be used to improve the performance of artificial autonomous systems, in applications such as autonomous navigation. Second, because these models are contingent on the human’s task, goals, and/or state in the context of the environment, observations of gaze can be used to infer information about user intent. This information can be used to improve human-machine and human-robot interaction by making interfaces more anticipative. We discuss example applications in gaze-typing, robotic teleoperation, and human-robot interaction.
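As a rough illustration of the second use (inferring intent from observed gaze), the sketch below applies Bayes' rule to a handful of fixation samples. It is not the speaker's model: the function name `infer_intent`, the isotropic Gaussian noise model, the `sigma` parameter, and the key layout are all assumptions made for this example.

```python
import math

def infer_intent(fixations, targets, sigma=1.0, prior=None):
    """Posterior over candidate targets given observed fixation points.

    Assumption (not from the talk): each fixation is the intended target's
    position plus isotropic Gaussian noise with standard deviation `sigma`,
    and fixations are independent given the intended target.
    """
    names = list(targets)
    if prior is None:
        prior = {name: 1.0 / len(names) for name in names}  # uniform prior

    # Accumulate log prior + log likelihood for each candidate target.
    log_post = {name: math.log(prior[name]) for name in names}
    for fx, fy in fixations:
        for name in names:
            tx, ty = targets[name]
            dist_sq = (fx - tx) ** 2 + (fy - ty) ** 2
            log_post[name] += -dist_sq / (2.0 * sigma ** 2)

    # Normalize in a numerically stable way (subtract the max before exp).
    peak = max(log_post.values())
    unnorm = {name: math.exp(v - peak) for name, v in log_post.items()}
    total = sum(unnorm.values())
    return {name: u / total for name, u in unnorm.items()}

# Hypothetical gaze-typing layout: two on-screen keys and two fixations near "A".
keys = {"A": (0.0, 0.0), "B": (3.0, 0.0)}
posterior = infer_intent([(0.2, 0.1), (0.4, -0.1)], keys)
likely_key = max(posterior, key=posterior.get)
```

An anticipative interface could act on `likely_key` once its posterior probability exceeds a chosen threshold; a task-dependent prior would replace the uniform one used here.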
Stability of visual processing in passive and active vision
The visual system faces a dual challenge. On the one hand, features of the natural visual environment should be processed stably, irrespective of ongoing wiring changes, representational drift, and behavior. On the other hand, eye, head, and body motion require a robust integration of pose and gaze shifts into visual computations for a stable perception of the world. We address these dimensions of stable visual processing by studying the circuit mechanism of long-term representational stability, focusing on the role of plasticity, network structure, experience, and behavioral state while recording large-scale neuronal activity with miniature two-photon microscopy.
Exploring fine detail: The interplay of attention, oculomotor behavior and visual perception in the fovea
Outside the foveola, visual acuity and other visual functions gradually deteriorate with increasing eccentricity. Humans compensate for these limitations by relying on a tight link between perception and action; rapid gaze shifts (saccades) occur 2-3 times every second, separating brief “fixation” intervals in which visual information is acquired and processed. During fixation, however, the eye is not immobile. Small eye movements incessantly shift the image on the retina even when the attended stimulus is already foveated, suggesting a much deeper coupling between visual functions and oculomotor activity. Thanks to a combination of techniques allowing for high-resolution recordings of eye position, retinal stabilization, and accurate gaze localization, we examined how attention and eye movements are controlled at this scale. We have shown that during fixation, visual exploration of fine spatial detail follows visuomotor strategies similar to those occurring at a larger scale. This behavior compensates for non-homogeneous visual capabilities within the foveola and is finely controlled by attention, which facilitates processing at selected foveal locations. Ultimately, the limits of high-acuity vision are greatly influenced by the spatiotemporal modulations introduced by fixational eye movements. These findings reveal that, contrary to common intuition, placing a stimulus within the foveola is necessary but not sufficient for high visual acuity; fine spatial vision is the outcome of an orchestrated synergy of motor, cognitive, and attentional factors.
A new computational framework for understanding vision in our brain
Visual attention selects only a tiny fraction of visual input information for further processing. Selection starts in the primary visual cortex (V1), which creates a bottom-up saliency map to guide the fovea to selected visual locations via gaze shifts. This motivates a new framework that views vision as consisting of encoding, selection, and decoding stages, placing selection on center stage. It suggests a massive loss of non-selected information from V1 downstream along the visual pathway. Hence, feedback from downstream visual cortical areas to V1 for better decoding (recognition), through analysis-by-synthesis, should query for additional information and be mainly directed at the foveal region. Accordingly, non-foveal vision is not only poorer in spatial resolution, but also more susceptible to many illusions.
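The selection step described above (a bottom-up saliency map that guides the next gaze shift) can be sketched in a few lines. This is a toy illustration, not the framework itself: the abstract does not state how the map is computed, so the rule used here (saliency at a location is the highest response across feature channels) is an assumption, as are the function names and the response values.

```python
def saliency_map(channel_responses):
    """Combine per-channel response grids into one saliency grid.

    Assumption for this sketch: saliency at each location is the maximum
    response across feature channels (e.g. orientation, color).
    """
    channels = list(channel_responses.values())
    rows, cols = len(channels[0]), len(channels[0][0])
    return [
        [max(ch[r][c] for ch in channels) for c in range(cols)]
        for r in range(rows)
    ]

def next_gaze_target(saliency):
    """Return the (row, col) of the most salient location."""
    locations = [
        (r, c) for r in range(len(saliency)) for c in range(len(saliency[0]))
    ]
    return max(locations, key=lambda rc: saliency[rc[0]][rc[1]])

# Hypothetical responses on a 3x4 grid: one odd-oriented item at (1, 2)
# evokes a stronger orientation response than its uniform surround.
responses = {
    "orientation": [
        [0.3, 0.3, 0.3, 0.3],
        [0.3, 0.3, 0.9, 0.3],
        [0.3, 0.3, 0.3, 0.3],
    ],
    "color": [[0.2] * 4 for _ in range(3)],
}
target = next_gaze_target(saliency_map(responses))
```

In this toy case the gaze shift is directed to the odd item, which is the only location whose response stands out from its surround.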
A dynamic sequence of visual processing initiated by gaze shifts
COSYNE 2023
Flexible reconfiguration of visual working memory across gaze shifts
COSYNE 2025