Gaze
Chris Reinke
The internship aims to develop a controller for a social mobile robot to have a conversation with people using large language models (LLMs) such as ChatGPT. The internship is part of the European SPRING project, which aims to develop mobile robots for healthcare environments. The intern will develop a controller (Python, ROS) for ARI, the social robot. The controller will navigate towards a human (or group), have a conversation with them, and leave the conversation. The intern will use existing components from the SPRING project such as mapping and localization of the robot and humans, human-aware navigation, speech recognition, and a simple dialogue system based on ChatGPT. The intern will also investigate how to optimally use LLMs such as ChatGPT for natural and comfortable conversation with the robot, for example, by using prompt engineering. The intern will have the chance to develop and implement their own ideas to improve the conversation with the robot, for example, by investigating gaze, gestures, or emotions.
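The approach, converse, and leave sequence described above could be organised as a small finite-state machine. The following is only a minimal sketch under stated assumptions: the state names, the `step` and `run` functions, and the 1.2 m approach radius are hypothetical and are not taken from the SPRING project or ARI software, and no ROS calls are shown.

```python
from enum import Enum, auto


class State(Enum):
    APPROACH = auto()
    CONVERSE = auto()
    LEAVE = auto()
    DONE = auto()


def step(state, distance_m, user_said_goodbye, approach_radius_m=1.2):
    """Return the next state of the hypothetical conversation controller.

    approach_radius_m is an assumed conversational distance, not a project value.
    """
    if state is State.APPROACH:
        # Keep navigating until the robot is within conversational distance.
        return State.CONVERSE if distance_m <= approach_radius_m else State.APPROACH
    if state is State.CONVERSE:
        # Stay in the dialogue until the person ends it.
        return State.LEAVE if user_said_goodbye else State.CONVERSE
    # LEAVE finishes after one step; DONE is terminal.
    return State.DONE


def run(observations):
    """Feed (distance_m, user_said_goodbye) pairs through the controller."""
    state = State.APPROACH
    trace = [state]
    for distance_m, goodbye in observations:
        state = step(state, distance_m, goodbye)
        trace.append(state)
    return trace
```

In a real system each state would wrap the corresponding existing component (navigation, speech recognition, dialogue), while the transition logic stays this simple and can be tested without a robot.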
Jean-Marc Odobez
We invite applications for key roles in the TUNASBE project, funded by the Swiss SNSF, which aims to improve AI’s ability to understand human states, activities, and complex social behaviors in natural settings. The project focuses on interpreting non-verbal cues (such as gaze, gestures, and attention) through multi-task learning models that capture the subtleties of human interactions. The project will be supported by two PhD students and one postdoctoral researcher, who will lead the design and development of computational models, focusing on gaze, attention, and multi-cue integration for human-centric computer vision analysis of social behaviors at the subject and scene levels in general settings.
Go with the visual flow: circuit mechanisms for gaze control during locomotion
“Development and application of gaze control models for active perception”
Gaze shifts in humans serve to direct high-resolution vision provided by the fovea towards areas in the environment. Gaze can be considered a proxy for attention or an indicator of the relative importance of different parts of the environment. In this talk, we discuss the development of generative models of human gaze in response to visual input. We discuss how such models can be learned, both using supervised learning and using implicit feedback as an agent interacts with the environment, the latter being more plausible in biological agents. We also discuss two ways such models can be used. First, they can be used to improve the performance of artificial autonomous systems, in applications such as autonomous navigation. Second, because these models are contingent on the human’s task, goals, and/or state in the context of the environment, observations of gaze can be used to infer information about user intent. This information can be used to improve human-machine and human-robot interaction, by making interfaces more anticipative. We discuss example applications in gaze-typing, robotic tele-operation and human-robot interaction.
Stability of visual processing in passive and active vision
The visual system faces a dual challenge. On the one hand, features of the natural visual environment should be stably processed - irrespective of ongoing wiring changes, representational drift, and behavior. On the other hand, eye, head, and body motion require a robust integration of pose and gaze shifts in visual computations for a stable perception of the world. We address these dimensions of stable visual processing by studying the circuit mechanism of long-term representational stability, focusing on the role of plasticity, network structure, experience, and behavioral state while recording large-scale neuronal activity with miniature two-photon microscopy.
Sensory Consequences of Visual Actions
We use rapid eye, head, and body movements to extract information from a new part of the visual scene upon each new gaze fixation. But the consequences of such visual actions go beyond their intended sensory outcomes. On the one hand, intrinsic consequences accompany movement preparation as covert internal processes (e.g., predictive changes in the deployment of visual attention). On the other hand, visual actions have incidental consequences, side effects of moving the sensory surface to its intended goal (e.g., global motion of the retinal image during saccades). In this talk, I will present studies in which we investigated intrinsic and incidental sensory consequences of visual actions and their sensorimotor functions. Our results provide insights into continuously interacting top-down and bottom-up sensory processes, and they reify the necessity to study perception in connection to motor behavior that shapes its fundamental processes.
Direction-selective ganglion cells in primate retina: a subcortical substrate for reflexive gaze stabilization?
To maintain a stable and clear image of the world, our eyes reflexively follow the direction in which a visual scene is moving. Such gaze stabilization mechanisms reduce image blur as we move in the environment. In non-primate mammals, this behavior is initiated by ON-type direction-selective ganglion cells (ON-DSGCs), which detect the direction of image motion and transmit signals to brainstem nuclei that drive compensatory eye movements. However, ON-DSGCs have not yet been functionally identified in primates, raising the possibility that the visual inputs that drive this behavior instead arise in the cortex. In this talk, I will present molecular, morphological and functional evidence for identification of an ON-DSGC in macaque retina. The presence of ON-DSGCs highlights the need to examine the contribution of subcortical retinal mechanisms to normal and aberrant gaze stabilization in the developing and mature visual system. More generally, our findings demonstrate the power of a multimodal approach to study sparsely represented primate RGC types.
Real-world scene perception and search from foveal to peripheral vision
A high-resolution central fovea is a prominent design feature of human vision. But how important is the fovea for information processing and gaze guidance in everyday visual-cognitive tasks? Following on from classic findings for sentence reading, I will present key results from a series of eye-tracking experiments in which observers had to search for a target object within static or dynamic images of real-world scenes. Gaze-contingent scotomas were used to selectively deny information processing in the fovea, parafovea, or periphery. Overall, the results suggest that foveal vision is less important and peripheral vision is more important for scene perception and search than previously thought. The importance of foveal vision was found to depend on the specific requirements of the task. Moreover, the data support a central-peripheral dichotomy in which peripheral vision selects and central vision recognizes.
Development and evolution of neuronal connectivity
In most animal species including humans, commissural axons connect neurons on the left and right side of the nervous system. In humans, abnormal axon midline crossing during development causes a whole range of neurological disorders, including congenital mirror movements, horizontal gaze palsy, scoliosis, and binocular vision deficits. The mechanisms which guide axons across the CNS midline were thought to be evolutionarily conserved, but our recent results suggest that they differ across vertebrates. I will discuss the evolution of visual projection laterality during vertebrate evolution. In most vertebrates, camera-style eyes contain retinal ganglion cell (RGC) neurons projecting to visual centers on both sides of the brain. However, in fish, RGCs are thought to only innervate the contralateral side. Using 3D imaging and tissue clearing we found that bilateral visual projections exist in non-teleost fishes. We also found that the developmental program specifying visual system laterality differs between fishes and mammals. We are currently using various strategies to discover genes controlling the development of visual projections. I will also present ongoing work using 3D imaging techniques to study the development of the visual system in human embryos.
The role of top-down mechanisms in gaze perception
Humans, as a social species, have an increased ability to detect and perceive visual elements involved in social exchanges, such as faces and eyes. The gaze, in particular, conveys information crucial for social interactions and social cognition. Researchers have hypothesized that in order to engage in dynamic face-to-face communication in real time, our brains must quickly and automatically process the direction of another person's gaze. There is evidence that direct gaze improves face encoding and attention capture and that direct gaze is perceived and processed more quickly than averted gaze. These results are summarized as the "direct gaze effect". However, in the recent literature, there is evidence to suggest that the mode of visual information processing modulates the direct gaze effect. In this presentation, I argue that top-down processing, and specifically the relevance of eye features to the task, promotes the early preferential processing of direct versus indirect gaze. On the basis of several recent lines of evidence, I propose that low task relevance of eye features will prevent differences in processing between gaze directions because eye direction will be encoded only superficially. Differential processing of direct and indirect gaze will only occur when the eyes are relevant to the task. To assess the implication of task relevance on the temporality of cognitive processing, we will measure event-related potentials (ERPs) in response to facial stimuli. In this project, instead of typical ERP markers such as P1, N170 or P300, we will measure lateralized ERPs (lERPs) such as lateralized N170 and N2pc, which are markers of early face encoding and attentional deployment respectively. I hypothesize that the task relevance of eye features is crucial in the direct gaze effect and propose to revisit previous studies, which had questioned the existence of the direct gaze effect.
This claim will be illustrated with different past studies and recent preliminary data from my lab. Overall, I propose a systematic evaluation of the role of top-down processing in early direct gaze perception in order to understand the impact of context on gaze perception and, at a larger scope, on social cognition.
Visual Decisions in Natural Action
Natural behavior reveals the way that gaze serves the needs of the current task, and the complex cognitive control mechanisms that are involved. It has become increasingly clear that even the simplest actions involve complex decision processes that depend on an interaction of visual information, knowledge of the current environment, and the intrinsic costs and benefits of action choices. I will explore these ideas in the context of walking in natural terrain, where we are able to recover the 3D structure of the visual environment. We show that subjects choose flexible paths that depend on the flatness of the terrain over the next few steps. Subjects trade off flatness with straightness of their paths towards the goal, indicating a nuanced trade-off between stability and energetic costs on both the time scale of the next step and longer-range constraints.
What are you looking at? Adventures in human gaze behaviour
Exploring fine detail: The interplay of attention, oculomotor behavior and visual perception in the fovea
Outside the foveola, visual acuity and other visual functions gradually deteriorate with increasing eccentricity. Humans compensate for these limitations by relying on a tight link between perception and action; rapid gaze shifts (saccades) occur 2-3 times every second, separating brief “fixation” intervals in which visual information is acquired and processed. During fixation, however, the eye is not immobile. Small eye movements incessantly shift the image on the retina even when the attended stimulus is already foveated, suggesting a much deeper coupling between visual functions and oculomotor activity. Thanks to a combination of techniques allowing for high-resolution recordings of eye position, retinal stabilization, and accurate gaze localization, we examined how attention and eye movements are controlled at this scale. We have shown that during fixation, visual exploration of fine spatial detail unfolds following visuomotor strategies similar to those occurring at a larger scale. This behavior compensates for non-homogenous visual capabilities within the foveola and is finely controlled by attention, which facilitates processing at selected foveal locations. Ultimately, the limits of high acuity vision are greatly influenced by the spatiotemporal modulations introduced by fixational eye movements. These findings reveal that, contrary to common intuition, placing a stimulus within the foveola is necessary but not sufficient for high visual acuity; fine spatial vision is the outcome of an orchestrated synergy of motor, cognitive, and attentional factors.
Is it Autism or Alexithymia? Explaining atypical socioemotional processing
Emotion processing is thought to be impaired in autism and linked to atypical visual exploration and arousal modulation to others' faces and gaze, yet evidence is equivocal. We propose that, where observed, atypical socioemotional processing is due to alexithymia, a distinct but frequently co-occurring condition which affects emotional self-awareness and interoception. In Study 1 (N = 80), we tested this hypothesis by studying the spatio-temporal dynamics and entropy of eye-gaze during emotion processing tasks. Evidence from traditional and novel methods revealed that atypical eye-gaze and emotion recognition are best predicted by alexithymia in both autistic and non-autistic individuals. In Study 2 (N = 70), we assessed interoceptive and autonomic signals implicated in socioemotional processing, and found evidence for effects on gaze and arousal modulation to emotions that are driven by alexithymia (not autism). We also conducted two large-scale studies (N = 1300), using confirmatory factor-analytic and network modelling, and found evidence that alexithymia and autism are distinct both at a latent level and in their intercorrelations. We argue that: 1) models of socioemotional processing in autism should conceptualise difficulties as intrinsic to alexithymia, and 2) assessment of alexithymia is crucial for diagnosis and personalised interventions in autism.
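As a rough illustration of what an entropy measure of eye-gaze can look like, the sketch below computes the Shannon entropy of fixation locations binned on a spatial grid. This is an assumption for illustration only: the abstract does not state which entropy measure the studies used, and the function name `gaze_entropy`, the 4 x 4 grid, and the normalised coordinates are hypothetical choices.

```python
import math
from collections import Counter


def gaze_entropy(fixations, grid=4):
    """Shannon entropy (in bits) of fixations binned on a grid x grid lattice.

    fixations: iterable of (x, y) pairs, assumed normalised to [0, 1].
    Higher values mean gaze is spread more evenly over the image.
    """
    cells = Counter(
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        (min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))
        for x, y in fixations
    )
    total = sum(cells.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in cells.values())
```

Fixations concentrated in a single cell give an entropy of zero, while fixations spread evenly over k cells give log2(k) bits, so the value summarises how dispersed visual exploration is.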
A new computational framework for understanding vision in our brain
Visual attention selects only a tiny fraction of visual input information for further processing. Selection starts in the primary visual cortex (V1), which creates a bottom-up saliency map to guide the fovea to selected visual locations via gaze shifts. This motivates a new framework that views vision as consisting of encoding, selection, and decoding stages, placing selection on center stage. It suggests a massive loss of non-selected information from V1 downstream along the visual pathway. Hence, feedback from downstream visual cortical areas to V1 for better decoding (recognition), through analysis-by-synthesis, should query for additional information and be mainly directed at the foveal region. Accordingly, non-foveal vision is not only poorer in spatial resolution, but also more susceptible to many illusions.
Vision in dynamically changing environments
Many visual systems can process information in dynamically changing environments. In general, visual perception scales with changes in the visual stimulus, or contrast, irrespective of background illumination. This is achieved by adaptation. However, visual perception is challenged when adaptation is not fast enough to deal with sudden changes in overall illumination, for example when gaze follows a moving object from bright sunlight into a shaded area. We have recently shown that the visual system of the fly found a solution by propagating a corrective luminance-sensitive signal to higher processing stages. Using in vivo two-photon imaging and behavioural analyses we showed that distinct OFF-pathway inputs encode contrast and luminance. The luminance-sensitive pathway is particularly required when processing visual motion in contextual dim light, when pure contrast sensitivity underestimates the salience of a stimulus. Recent work in the lab has addressed the question of how two visual pathways obtain such fundamentally different sensitivities, given common photoreceptor input. We are furthermore currently working out the network-based strategies by which luminance- and contrast-sensitive signals are combined to guide appropriate visual behaviour. Together, I will discuss the molecular, cellular, and circuit mechanisms that ensure contrast computation, and therefore robust vision, in fast-changing visual scenes.
Mutual gaze with a robot influences social decision-making
COSYNE 2022
A dynamic sequence of visual processing initiated by gaze shifts
COSYNE 2023
Flexible reconfiguration of visual working memory across gaze shifts
COSYNE 2025
Strategic and dynamic use of social gaze for successful cooperation in marmoset dyads
COSYNE 2025
Analysis of gaze control neuronal circuits combining behavioural experiments with a novel virtual reality platform
FENS Forum 2024
Behavioral and electrophysiological characteristics of real-world head movement during gaze shift in humans
FENS Forum 2024
Exploring gaze movements in lampreys: Insights into vertebrate neural mechanisms for stabilizing and goal-oriented eye movements
FENS Forum 2024
Use of high-tech eye gaze augmentative and alternative communication system to enhance communication and quality of life in multiple sclerosis: A single case study
FENS Forum 2024