Visual Input
Development and application of gaze control models for active perception
Gaze shifts in humans serve to direct the high-resolution vision provided by the fovea towards relevant areas of the environment. Gaze can therefore be considered a proxy for attention, or an indicator of the relative importance of different parts of the environment. In this talk, we discuss the development of generative models of human gaze in response to visual input. We discuss how such models can be learned, both using supervised learning and using implicit feedback as an agent interacts with the environment, the latter being more plausible in biological agents. We also discuss two ways such models can be used. First, they can improve the performance of artificial autonomous systems, in applications such as autonomous navigation. Second, because these models are contingent on the human's task, goals, and/or state in the context of the environment, observations of gaze can be used to infer information about user intent. This information can be used to improve human-machine and human-robot interaction by making interfaces more anticipative. We discuss example applications in gaze-typing, robotic tele-operation and human-robot interaction.
Restoring Sight to the Blind: Effects of Structural and Functional Plasticity
Visual restoration after decades of blindness is now becoming possible by means of retinal and cortical prostheses, as well as emerging stem cell and gene therapeutic approaches. After restoring visual perception, however, a key question remains. Are there optimal means and methods for retraining the visual cortex to process visual inputs, and for learning or relearning to “see”? Up to this point, it has been largely assumed that if the sensory loss is visual, then the rehabilitation focus should also be primarily visual. However, the other senses play a key role in visual rehabilitation due to the plastic repurposing of visual cortex during blindness by audition and somatosensation, and also to the reintegration of restored vision with the other senses. I will present multisensory neuroimaging results, cortical thickness changes, as well as behavioral outcomes for patients with Retinitis Pigmentosa (RP), which causes blindness by destroying photoreceptors in the retina. These patients have had their vision partially restored by the implantation of a retinal prosthesis, which electrically stimulates still viable retinal ganglion cells in the eye. Our multisensory and structural neuroimaging and behavioral results suggest a new, holistic concept of visual rehabilitation that leverages rather than neglects audition, somatosensation, and other sensory modalities.
The development of visual experience
Vision and visual cognition are experience-dependent, with likely multiple sensitive periods, but we know very little about the statistics of visual experience at the scale of everyday life and how they might change with development. By traditional assumptions, the world at the massive scale of daily life presents pretty much the same visual statistics to all perceivers. I will present an overview of our work on ego-centric vision showing that this is not the case. The momentary image received at the eye is spatially selective, dependent on the location, posture and behavior of the perceiver. If a perceiver's location, possible postures and/or preferences for looking at some kinds of scenes over others are constrained, then their sampling of images from the world, and thus the visual statistics at the scale of daily life, could be biased. I will present evidence on developmental changes in both low-level and higher-level visual statistics of the visual input over the first 18 months post-birth.
Direction-selective ganglion cells in primate retina: a subcortical substrate for reflexive gaze stabilization?
To maintain a stable and clear image of the world, our eyes reflexively follow the direction in which a visual scene is moving. Such gaze stabilization mechanisms reduce image blur as we move through the environment. In non-primate mammals, this behavior is initiated by ON-type direction-selective ganglion cells (ON-DSGCs), which detect the direction of image motion and transmit signals to brainstem nuclei that drive compensatory eye movements. However, ON-DSGCs have not yet been functionally identified in primates, raising the possibility that the visual inputs that drive this behavior instead arise in the cortex. In this talk, I will present molecular, morphological and functional evidence for the identification of an ON-DSGC in the macaque retina. The presence of ON-DSGCs highlights the need to examine the contribution of subcortical retinal mechanisms to normal and aberrant gaze stabilization in the developing and mature visual system. More generally, our findings demonstrate the power of a multimodal approach for studying sparsely represented primate retinal ganglion cell (RGC) types.
The Effects of Negative Emotions on Mental Representation of Faces
Face detection is an initial step of many social interactions, involving a comparison between a visual input and a mental representation of faces built from previous experience. Whilst emotional state has been found to affect the way humans attend to faces, little research has explored the effects of emotions on the mental representation of faces. Here, we examined how state anxiety and state depression modulate geometric properties of the mental representations underlying face detection, and compared the emotional expression of these representations. To this end, we used an adaptation of the reverse correlation technique inspired by Gosselin and Schyns' (2003) 'Superstitious Approach' to reconstruct observers' mental representations of faces and to relate these to their mental states. In two sessions, on separate days, participants were presented with 'colourful' noise stimuli and asked to detect faces, which they were told were present. Based on the noise fragments that were identified as faces, we reconstructed the pictorial mental representation utilised by each participant in each session. We found a significant correlation between the size of the mental representation of faces and participants' level of depression. Our findings provide preliminary insight into the way emotions affect the expected appearance of faces. To further understand whether the facial expressions of participants' mental representations reflect their emotional state, we are conducting a validation study in which a group of naïve observers classify the reconstructed face images by emotion. Thus, we assess whether the faces communicate participants' emotional states to others.
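As an illustration of the general logic of this reverse-correlation approach (not the authors' actual pipeline: the image size, the simulated observer, and the thresholding rule below are all assumptions), a classification-image style reconstruction can be obtained by contrasting the noise fields on 'face' and 'no face' trials:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (not the parameters of the study).
n_trials, h, w = 5000, 64, 64

# White-noise stimuli shown on each trial.
noise = rng.standard_normal((n_trials, h, w))

# Simulated observer: reports "face" when the noise happens to correlate
# with an internal face-like template, plus some decision noise.
template = np.zeros((h, w))
template[20:28, 20:28] = 1.0   # left "eye" region
template[20:28, 36:44] = 1.0   # right "eye" region
template[44:50, 26:38] = 1.0   # "mouth" region
evidence = (noise * template).sum(axis=(1, 2))
said_face = evidence + 5.0 * rng.standard_normal(n_trials) > 0

# Classification image: mean noise on "face" trials minus mean noise on
# "no face" trials recovers an estimate of the internal template.
classification_image = noise[said_face].mean(0) - noise[~said_face].mean(0)
```

In the study itself the stimuli were 'colourful' noise and participants were told faces were present, so the reconstruction is based on the fragments participants selected rather than on a simulated template.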
Probabilistic computation in natural vision
A central goal of vision science is to understand the principles underlying the perception and neural coding of the complex visual environment of our everyday experience. In the visual cortex, foundational work with artificial stimuli, and more recent work combining natural images and deep convolutional neural networks, have revealed much about the tuning of cortical neurons to specific image features. However, a major limitation of this existing work is its focus on single-neuron response strength to isolated images. First, during natural vision, the inputs to cortical neurons are not isolated but rather embedded in a rich spatial and temporal context. Second, the full structure of population activity—including the substantial trial-to-trial variability that is shared among neurons—determines encoded information and, ultimately, perception. In the first part of this talk, I will argue for a normative approach to study encoding of natural images in primary visual cortex (V1), which combines a detailed understanding of the sensory inputs with a theory of how those inputs should be represented. Specifically, we hypothesize that V1 response structure serves to approximate a probabilistic representation optimized to the statistics of natural visual inputs, and that contextual modulation is an integral aspect of achieving this goal. I will present a concrete computational framework that instantiates this hypothesis, and data recorded using multielectrode arrays in macaque V1 to test its predictions. In the second part, I will discuss how we are leveraging this framework to develop deep probabilistic algorithms for natural image and video segmentation.
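To make the flavor of this hypothesis concrete, here is a toy sketch (my own illustration, not the authors' model) of a sampling-based probabilistic code: population responses on each trial are driven by a sample from a posterior over a latent feature, so the trial-to-trial variability shared across neurons shrinks as the stimulus becomes more informative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sampling-based code: on each trial the population response is driven by
# one sample from the posterior over a latent feature, given a fixed noisy
# observation whose reliability scales with stimulus strength ("contrast").
def sample_population(contrast, n_neurons=20, n_trials=200):
    prior_var, noise_var, observation = 1.0, 0.5, 1.0
    posterior_var = 1.0 / (1.0 / prior_var + contrast**2 / noise_var)
    posterior_mean = posterior_var * contrast * observation / noise_var
    latent = rng.normal(posterior_mean, np.sqrt(posterior_var), n_trials)
    gains = rng.uniform(0.5, 1.5, n_neurons)      # fixed neuron-specific gains
    return np.outer(latent, gains)                # trials x neurons

weak, strong = sample_population(0.2), sample_population(2.0)
# Shared (sampling-driven) variability shrinks as the stimulus gets stronger.
print(weak.var(axis=0).mean(), strong.var(axis=0).mean())
```

The point of the sketch is only that mean responses and shared variability are jointly predicted by the posterior, which is the kind of population-level structure the normative framework is meant to capture.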
Implementing structure mapping as a prior in deep learning models for abstract reasoning
Building conceptual abstractions from sensory information and then reasoning about them is central to human intelligence. Abstract reasoning both relies on, and is facilitated by, our ability to make analogies that carry concepts from known domains over to novel domains. The Structure Mapping Theory of human analogical reasoning posits that analogical mappings rely on (higher-order) relations and not on the sensory content of the domain. This enables humans to reason systematically about novel domains, a problem with which machine learning (ML) models tend to struggle. We introduce a two-stage neural network framework, which we label Neural Structure Mapping (NSM), to learn visual analogies from Raven's Progressive Matrices, an abstract visual reasoning test of fluid intelligence. Our framework uses (1) a multi-task visual relationship encoder to extract constituent concepts from raw visual input in the source domain, and (2) a neural module network analogy inference engine to reason compositionally about the inferred relation in the target domain. Our NSM approach (a) isolates the relational structure from the source domain with high accuracy, and (b) successfully utilizes this structure for analogical reasoning in the target domain.
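A schematic sketch of such a two-stage pipeline is given below (PyTorch; all layer sizes, panel shapes, and the reduction of the analogy engine to a simple relation-comparison scorer are illustrative assumptions, not the NSM implementation):

```python
import torch
import torch.nn as nn

class RelationEncoder(nn.Module):
    """Stage 1: embed the relation expressed by a row of three panels."""
    def __init__(self, dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mlp = nn.Sequential(nn.Linear(3 * 32, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, row):                     # row: (batch, 3, 1, 64, 64)
        panel_feats = [self.cnn(row[:, i]) for i in range(3)]
        return self.mlp(torch.cat(panel_feats, dim=-1))

class AnalogyScorer(nn.Module):
    """Stage 2 (simplified): score how well each candidate completion of the
    target row reproduces the relation inferred from the source row."""
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                   nn.Linear(dim, 1))

    def forward(self, source_rel, candidate_rels):
        # source_rel: (batch, dim); candidate_rels: (batch, n_candidates, dim)
        src = source_rel.unsqueeze(1).expand_as(candidate_rels)
        return self.score(torch.cat([src, candidate_rels], dim=-1)).squeeze(-1)

encoder, scorer = RelationEncoder(), AnalogyScorer()
source_row = torch.randn(2, 3, 1, 64, 64)             # dummy source rows
candidate_rows = torch.randn(2, 8, 3, 1, 64, 64)      # 8 candidate completions
source_rel = encoder(source_row)
cand_rel = torch.stack([encoder(candidate_rows[:, k]) for k in range(8)], dim=1)
print(scorer(source_rel, cand_rel).argmax(dim=-1))    # chosen answer per item
```

In use, the encoder is applied to the source row and to the target row completed with each candidate answer, and the scorer picks the candidate whose relation embedding best matches the source relation.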
Keeping visual cortex in the back of your mind: From visual inputs to behavior and memory
A novel form of retinotopy in area V2 highlights location-dependent feature selectivity in the visual system
Topographic maps are a prominent feature of brain organization, reflecting local and large-scale representation of the sensory surface. Traditionally, such representations in early visual areas are conceived as retinotopic maps that preserve ego-centric retinal spatial location while ensuring that other features of the visual input are uniformly represented at every location in space. I will discuss our recent findings of a striking departure from this simple mapping in the secondary visual area (V2) of the tree shrew, which is best described as a sinusoidal transformation of the visual field. This sinusoidal topography is ideal for achieving uniform coverage in an elongated area like V2, as predicted by mathematical models designed for wiring minimization, and provides a novel explanation for stripe-like patterns of intra-cortical connections and functional response properties in V2. Our findings suggest that cortical circuits flexibly implement solutions to sensory surface representation, with dramatic consequences for large-scale cortical organization. Furthermore, our work challenges the framework of relatively independent encoding of location and features in the visual system, showing instead location-dependent feature sensitivity produced by specialized processing of different features in different spatial locations. In the second part of the talk, I will propose that location-dependent feature sensitivity is a fundamental organizing principle of the visual system that achieves efficient representation of positional regularities in visual input, and reflects the evolutionary selection of sensory and motor circuits to optimally represent behaviorally relevant information. Relevant papers: V2 retinotopy (Sedigh-Sarvestani et al., Neuron, 2021); location-dependent feature sensitivity (Sedigh-Sarvestani et al., under review, 2022).
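As a toy illustration of why a sinusoidal map helps an elongated area achieve coverage (the functional form and parameters below are mine, not the fitted model from the paper), consider a strip in which azimuth drifts slowly along the long axis while elevation oscillates sinusoidally:

```python
import numpy as np

# Toy sinusoidal retinotopy for an elongated cortical area (all numbers are
# illustrative, not the fitted parameters from Sedigh-Sarvestani et al.).
L_mm, W_mm = 10.0, 1.0          # cortical length and width
period_mm, amp_deg = 2.5, 20.0  # oscillation period and amplitude

u = np.linspace(0.0, L_mm, 400)   # position along the long axis
v = np.linspace(0.0, W_mm, 40)    # position across the short axis
U, V = np.meshgrid(u, v)

# Azimuth drifts slowly along the strip; elevation sweeps up and down.
azimuth = 4.0 * U
elevation = amp_deg * np.sin(2.0 * np.pi * U / period_mm) + 10.0 * V

# The oscillation lets a long, thin area tile a roughly two-dimensional
# region of visual field rather than a narrow sliver.
print(azimuth.min(), azimuth.max(), elevation.min(), elevation.max())
```

The oscillation trades a single smooth progression for repeated elevation sweeps, which is the intuition behind uniform coverage in a long, thin area.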
Response of cortical networks to optogenetic stimulation: Experiment vs. theory
Optogenetics is a powerful tool that allows experimentalists to perturb neural circuits. What can we learn about a network from observing its response to perturbations? I will first describe the results of optogenetic activation of inhibitory neurons in mouse cortex, and show that the results are consistent with inhibition stabilization. I will then move to experiments in which excitatory neurons are activated optogenetically, with or without visual inputs, in mice and monkeys. In some conditions, these experiments show the surprising result that the distribution of firing rates is not significantly changed by stimulation, even though the firing rates of individual neurons are strongly modified. I will show under which conditions a network model of excitatory and inhibitory neurons can reproduce this feature.
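A minimal rate-model sketch of the inhibition-stabilized regime mentioned above (weights, time constants, and inputs are illustrative, not fitted to the data) reproduces the paradoxical signature that extra drive to inhibitory cells lowers their steady-state rate:

```python
import numpy as np

# Two-population rate model in the inhibition-stabilized regime
# (parameters are illustrative, not fitted to experiments).
W = np.array([[1.5, -1.3],    # onto E: from E, from I
              [1.2, -0.5]])   # onto I: from E, from I
tau = np.array([10.0, 5.0])   # rate time constants (ms)

def steady_state(I_ext, dt=0.1, T=500.0):
    r = np.zeros(2)
    for _ in range(int(T / dt)):
        drive = W @ r + I_ext
        r += (dt / tau) * (-r + np.maximum(drive, 0.0))  # rectified-linear
    return r

baseline = steady_state(np.array([2.0, 1.0]))
opto     = steady_state(np.array([2.0, 2.0]))  # extra drive to inhibition
# Paradoxical ISN signature: the inhibitory rate *decreases* when inhibitory
# cells receive extra excitatory drive (and the excitatory rate drops too).
print(baseline, opto)
```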
How does seeing help listening? Audiovisual integration in Auditory Cortex
Multisensory responses are ubiquitous in so-called unisensory cortex. However, despite their prevalence, we have very little understanding of what, if anything, they contribute to perception. In this talk I will focus on audio-visual integration in auditory cortex. Anatomical tracing studies highlight visual cortex as one source of visual input to auditory cortex. Using cortical cooling, we test the hypothesis that these inputs support audiovisual integration in ferret auditory cortex. Behavioural studies in humans support the idea that visual stimuli can help listeners to parse an auditory scene. This effect is paralleled in single units in auditory cortex, where responses to a sound mixture can be determined by the timing of a visual stimulus, such that sounds that are temporally coherent with a visual stimulus are preferentially represented. Our recent data therefore support the idea that one role for the early integration of auditory and visual signals in auditory cortex is to support auditory scene analysis, and that visual cortex plays a key role in this process.
Categories, language, and visual working memory: how verbal labels change capacity limitations
The limited capacity of visual working memory constrains the quantity and quality of the information we can store in mind for ongoing processing. Research from our lab has demonstrated that verbal labeling/categorization of visual inputs increases their retention and fidelity in visual working memory. In this talk, I will outline the hypotheses that explain the interaction between visual and verbal inputs in working memory, leading to the boosts we observed. I will further show how manipulations of the categorical distinctiveness of the labels, the timing of their occurrence, the items to which labels are applied, as well as their validity, modulate the benefits one can draw from combining visual and verbal inputs to alleviate capacity limitations. Finally, I will discuss the implications of these results for our understanding of working memory and its interaction with prior knowledge.
Visual working memory representations are distorted by their use in perceptual comparisons
Visual working memory (VWM) allows us to maintain a small amount of task-relevant information in mind so that we can use it to guide our behavior. Although past studies have successfully characterized its capacity limit and representational quality during maintenance, the consequences of its use for task-relevant behaviors have remained largely unknown. In this talk, I will demonstrate that VWM representations become distorted when they are used for perceptual comparisons with new visual inputs, especially when the inputs are subjectively similar to the VWM representations. Furthermore, I will show that this similarity-induced memory bias (SIMB) occurs both for simple stimuli (e.g., color, shape) and for complex stimuli (e.g., real-world objects, faces) that are perceptually encoded and retrieved from long-term memory. Given the observed versatility of the SIMB, its implications for other memory distortion phenomena (e.g., distractor-induced distortion, the misinformation effect) will be discussed.
Networks for multi-sensory attention and working memory
Converging evidence from fMRI and EEG shows that auditory spatial attention engages the same fronto-parietal network associated with visuo-spatial attention. This network is distinct from an auditory-biased processing network that includes other frontal regions; this second network can be recruited when observers extract rhythmic information from visual inputs. We recently used a dual-task paradigm to examine whether this "division of labor" between a visuo-spatial network and an auditory-rhythmic network can be observed in a working memory paradigm. We varied the sensory modality (visual vs. auditory) and information domain (spatial or rhythmic) that observers had to store in working memory, while also performing an intervening task. Behavioral, pupillometry, and EEG results show a complex interaction across the working memory and intervening tasks, consistent with two cognitive control networks managing auditory and visual inputs based on the kind of information being processed.
A Cortical Circuit for Audio-Visual Predictions
Teamwork makes sensory streams work: our senses work together, learn from each other, and stand in for one another, the result of which is perception and understanding. Learned associations between stimuli in different sensory modalities can shape the way we perceive these stimuli (McGurk and MacDonald, 1976). During audio-visual associative learning, auditory cortex is thought to underlie multi-modal plasticity in visual cortex (McIntosh et al., 1998; Mishra et al., 2007; Zangenehpour and Zatorre, 2010). However, it is not well understood how processing in visual cortex is altered by an auditory stimulus that is predictive of a visual stimulus, or what mechanisms mediate such experience-dependent, audio-visual associations in sensory cortex. Here we describe a neural mechanism by which an auditory input can shape visual representations of behaviorally relevant stimuli through direct interactions between auditory and visual cortices. We show that the association of an auditory stimulus with a visual stimulus in a behaviorally relevant context leads to an experience-dependent suppression of visual responses in primary visual cortex (V1). Auditory cortex axons carry a mixture of auditory and retinotopically matched visual input to V1, and optogenetic stimulation of these axons selectively suppresses V1 neurons responsive to the associated visual stimulus after, but not before, learning. Our results suggest that cross-modal associations can be stored in long-range cortical connections and that, with learning, these cross-modal connections function to suppress responses to predictable input.
Arousal modulates retinal output
Neural responses in the visual system are usually not purely visual but depend on behavioural and internal states such as arousal. This dependence is seen both in primary visual cortex (V1) and in subcortical brain structures receiving direct retinal input. In this talk, I will show that modulation by behavioural state arises as early as in the output of the retina. To measure retinal activity in the awake, intact brain, we imaged the synaptic boutons of retinal axons in the superficial superior colliculus (sSC) of mice. The activity of about half of the boutons depended not only on vision but also on running speed and pupil size, regardless of retinal illumination. Arousal typically reduced the boutons' visual responses to their preferred direction and their selectivity for direction and orientation. Arousal may affect activity in retinal boutons by presynaptic neuromodulation. To test whether the effects of arousal occur already in the retina, we recorded from retinal axons in the optic tract. We found that, in darkness, more than one third of the recorded axons were significantly correlated with running speed. Arousal had similar effects postsynaptically, in sSC neurons, independent of activity in V1, the other main source of visual inputs to the colliculus. Optogenetic inactivation of V1 generally decreased activity in collicular neurons but did not diminish the effects of arousal. These results indicate that arousal modulates activity at every stage of the visual system. In the future, we will study the purpose and the underlying mechanisms of behavioural modulation in the early visual system.
Time is of the essence: active sensing in natural vision reveals novel mechanisms of perception
In natural vision, active sensing refers to the changes in visual input that result from self-initiated eye movements. In this talk, I will present studies showing that stimulus-related activity during active vision differs substantially from that occurring during classical flashed-stimulus paradigms. Our results uncover novel and efficient mechanisms that improve visual perception. In a general way, the nervous system appears to engage in sensory modulation mechanisms, precisely timed to self-initiated stimulus changes, thus coordinating neural activity across different cortical areas and serving as a general mechanism for the global coordination of visual perception.
A new computational framework for understanding vision in our brain
Visual attention selects only a tiny fraction of visual input information for further processing. Selection starts in the primary visual cortex (V1), which creates a bottom-up saliency map to guide the fovea to selected visual locations via gaze shifts. This motivates a new framework that views vision as consisting of encoding, selection, and decoding stages, placing selection on center stage. It suggests a massive loss of non-selected information from V1 downstream along the visual pathway. Hence, feedback from downstream visual cortical areas to V1 for better decoding (recognition), through analysis-by-synthesis, should query for additional information and be mainly directed at the foveal region. Accordingly, non-foveal vision is not only poorer in spatial resolution, but also more susceptible to many illusions.
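In the spirit of this framework, a bottom-up saliency map can be sketched as the maximum response over feature-tuned channels at each location; the crude oriented filters below are placeholders for V1 responses, so this is an illustration of the selection principle rather than a model of V1:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

# Crude stand-ins for orientation-tuned V1 responses: smooth the image with
# an elongated Gaussian at several orientations (filter sizes are arbitrary).
def oriented_energy(image, angle_deg, sigma=(1.0, 4.0)):
    rot = rotate(image, angle_deg, reshape=False, mode="nearest")
    smoothed = gaussian_filter(rot, sigma=sigma)
    return rotate(smoothed, -angle_deg, reshape=False, mode="nearest")

def saliency_map(image, angles=(0, 45, 90, 135)):
    channels = np.stack([oriented_energy(image, a) for a in angles])
    # Selection rule: saliency at each location is the maximum response
    # across feature channels, rather than their sum.
    return channels.max(axis=0)

rng = np.random.default_rng(2)
img = rng.random((128, 128))
img[60:68, 60:68] += 2.0                       # a conspicuous patch
S = saliency_map(img)
print(np.unravel_index(S.argmax(), S.shape))   # lands in or near the patch
```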
A discrete model of visual input shows how ocular drift removes ambiguity
COSYNE 2022
The dynamical regime of mouse visual cortex shifts from cooperation to competition with increasing visual input
COSYNE 2022
Joint coding of visual input and eye/head position in V1 of freely moving mice
COSYNE 2022
Coregistration of heading and visual inputs in retrosplenial cortex
COSYNE 2023