Visual Features
Trends in NeuroAI - Meta's MEG-to-image reconstruction
Trends in NeuroAI is a reading group hosted by the MedARC Neuroimaging & AI lab (https://medarc.ai/fmri). This will be an informal journal club presentation; we do not have an author of the paper joining us. Title: Brain decoding: toward real-time reconstruction of visual perception Abstract: In the past five years, the use of generative and foundational AI systems has greatly improved the decoding of brain activity. Visual perception, in particular, can now be decoded from functional Magnetic Resonance Imaging (fMRI) with remarkable fidelity. This neuroimaging technique, however, suffers from a limited temporal resolution (≈0.5 Hz), which fundamentally constrains its real-time usage. Here, we propose an alternative approach based on magnetoencephalography (MEG), a neuroimaging device capable of measuring brain activity with high temporal resolution (≈5,000 Hz). For this, we develop an MEG decoding model trained with both contrastive and regression objectives and consisting of three modules: i) pretrained embeddings obtained from the image, ii) an MEG module trained end-to-end, and iii) a pretrained image generator. Our results are threefold: First, our MEG decoder shows a 7X improvement in image retrieval over classic linear decoders. Second, late brain responses to images are best decoded with DINOv2, a recent foundational image model. Third, image retrievals and generations both suggest that MEG signals primarily contain high-level visual features, whereas the same approach applied to 7T fMRI also recovers low-level features. Overall, these results provide an important step towards the real-time decoding of the visual processes continuously unfolding within the human brain. Speaker: Dr. Paul Scotti (Stability AI, MedARC) Paper link: https://arxiv.org/abs/2310.19812
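As a rough illustration of the training objective described above, the sketch below aligns the output of a small MEG encoder with frozen pretrained image embeddings using a combined contrastive (CLIP-style) and regression loss. This is a minimal sketch in PyTorch, not the authors' code: the MEGEncoder architecture, the channel/time dimensions, and the loss weighting are all hypothetical.

```python
# Minimal sketch (not the authors' code): align MEG-window embeddings with
# pretrained image embeddings using a combined contrastive + regression loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MEGEncoder(nn.Module):
    """Toy stand-in for the end-to-end MEG module (hypothetical architecture)."""
    def __init__(self, n_channels=272, n_times=181, emb_dim=768):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=7, padding=3),
            nn.GELU(),
            nn.Conv1d(128, 128, kernel_size=7, padding=3),
            nn.GELU(),
        )
        self.head = nn.Linear(128 * n_times, emb_dim)

    def forward(self, meg):                      # meg: (batch, channels, times)
        h = self.conv(meg).flatten(1)
        return self.head(h)                      # (batch, emb_dim)

def clip_style_loss(meg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE between MEG and image embeddings within a batch."""
    meg_emb = F.normalize(meg_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    logits = meg_emb @ img_emb.t() / temperature
    targets = torch.arange(len(meg_emb), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def combined_loss(meg_emb, img_emb, lam=1.0):
    """Contrastive retrieval term plus a regression (MSE) term."""
    return clip_style_loss(meg_emb, img_emb) + lam * F.mse_loss(meg_emb, img_emb)

# Usage: img_emb would come from a frozen pretrained image model (e.g. DINOv2),
# and the decoded embedding could then condition a pretrained image generator.
encoder = MEGEncoder()
meg = torch.randn(8, 272, 181)       # fake batch of MEG windows
img_emb = torch.randn(8, 768)        # fake pretrained image embeddings
loss = combined_loss(encoder(meg), img_emb)
loss.backward()
```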
Learning through the eyes and ears of a child
Young children have sophisticated representations of their visual and linguistic environment. Where do these representations come from? How much knowledge arises through generic learning mechanisms applied to sensory data, and how much requires more substantive (possibly innate) inductive biases? We examine these questions by training neural networks solely on longitudinal data collected from a single child (Sullivan et al., 2020), consisting of egocentric video and audio streams. Our principal findings are as follows: 1) Based on visual-only training, neural networks can acquire high-level visual features that are broadly useful across categorization and segmentation tasks. 2) Based on language-only training, networks can acquire meaningful clusters of words and sentence-level syntactic sensitivity. 3) Based on paired visual and language training, networks can acquire word-referent mappings from tens of noisy examples and align their multi-modal conceptual systems. Taken together, our results show how sophisticated visual and linguistic representations can arise through data-driven learning applied to one child’s first-person experience.
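The third finding, word-referent mapping from tens of noisy examples, can be illustrated with a toy cross-situational learner that simply accumulates word-referent co-occurrences across ambiguous episodes. The study itself trained neural networks on the child's egocentric video and audio, so the sketch below is only an illustration with a made-up vocabulary.

```python
# Toy cross-situational learner (illustrative only; the study itself trained
# neural networks on the child's egocentric video and audio).
import numpy as np

words = ["ball", "cat", "cup"]
referents = ["BALL", "CAT", "CUP"]

# Each "episode" pairs a heard word with a set of visible referents,
# only one of which is the true target; the others are noise.
rng = np.random.default_rng(0)
episodes = []
for _ in range(30):                               # "tens of noisy examples"
    i = rng.integers(len(words))
    visible = {referents[i], referents[rng.integers(len(referents))]}
    episodes.append((words[i], visible))

# Accumulate word-referent co-occurrence counts across episodes.
counts = np.zeros((len(words), len(referents)))
for word, visible in episodes:
    for ref in visible:
        counts[words.index(word), referents.index(ref)] += 1

# The most frequently co-occurring referent wins for each word.
for w, row in zip(words, counts):
    print(w, "->", referents[int(row.argmax())])
```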
Synthetic and natural images unlock the power of recurrency in primary visual cortex
During perception, the visual system integrates current sensory evidence with previously acquired knowledge of the visual world. Presumably this computation relies on internal recurrent interactions. We record populations of neurons from the primary visual cortex of cats and macaque monkeys and find evidence for adaptive internal responses to structured stimulation that change on both slow and fast timescales. In the first experiment, we briefly present abstract images, a protocol known to produce strong and persistent recurrent responses in the primary visual cortex. We show that repetitive presentations of a large randomized set of images lead to enhanced stimulus encoding on a timescale of minutes to hours. The enhanced encoding preserves the representational details required for image reconstruction and can be detected in post-exposure spontaneous activity. In a second experiment, we show that the encoding of natural scenes across populations of V1 neurons is improved, over a timescale of hundreds of milliseconds, by the allocation of spatial attention. Given the hierarchical organization of the visual cortex, contextual information from higher levels of the processing hierarchy, reflecting high-level image regularities, can inform the activity in V1 through feedback. We hypothesize that these fast attentional boosts in stimulus encoding rely on recurrent computations that capitalize on the presence of high-level visual features in natural scenes. We design control images dominated by low-level features and show that, in agreement with our hypothesis, the attentional benefits in stimulus encoding vanish. We conclude that, in the visual system, powerful recurrent processes optimize neuronal responses already at the earliest stages of cortical processing.
NMC4 Short Talk: Untangling Contributions of Distinct Features of Images to Object Processing in Inferotemporal Cortex
How do humans perceive everyday objects with varied features, and how do they form these seemingly intuitive and effortless categorical representations? Prior literature focusing on the role of the inferotemporal region (IT) has revealed object-category clustering consistent with a predefined semantic structure (superordinate, ordinate, subordinate). It has, however, been debated whether the neural signals in IT reflect such a categorical hierarchy [Wen et al., 2018; Bracci et al., 2017]. Visual attributes of images that correlate with semantic and category dimensions may have confounded these prior results. Our study aimed to address this debate by building and comparing models based on the DNN AlexNet to explain the variance in the representational dissimilarity matrix (RDM) of neural signals in the IT region. We found that mid- and high-level perceptual attributes of the DNN model contribute the most to neural RDMs in the IT region. Semantic categories, as defined by the predefined structure, were moderately correlated with mid-to-high DNN layers (r = 0.24-0.36). Variance partitioning analysis also showed that the IT neural representations were mostly explained by DNN layers, while semantic categorical RDMs contributed little additional information. In light of these results, we propose that future work should focus on the specific role IT plays in facilitating the extraction and coding of visual features that lead to the emergence of categorical conceptualizations.
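The analysis pipeline described, comparing DNN-based model RDMs against neural RDMs, follows standard representational similarity analysis. The sketch below shows that workflow on synthetic stand-in data; the array shapes, distance metric, and Spearman statistic are conventional choices assumed here rather than taken from the study.

```python
# Minimal RSA sketch (assumed workflow, not the authors' code): correlate an
# RDM built from DNN layer activations with a neural RDM from IT responses.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images = 60

# Stand-ins for real data: DNN layer activations and IT response patterns.
layer_acts = rng.normal(size=(n_images, 4096))   # e.g. an AlexNet fc layer
it_patterns = rng.normal(size=(n_images, 200))   # e.g. voxel/electrode patterns

def rdm(patterns):
    """Pairwise representational dissimilarities (1 - Pearson correlation)."""
    return pdist(patterns, metric="correlation")

# Spearman correlation between model and neural RDMs is the usual RSA statistic.
rho, _ = spearmanr(rdm(layer_acts), rdm(it_patterns))
print(f"model-neural RDM correlation: rho = {rho:.3f}")

# Variance partitioning would instead regress the neural RDM on several model
# RDMs (DNN layers, semantic category models) and compare unique vs shared R^2.
```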
Data spaces: category (sheaf) theory and phenomenology
In this talk, I’ll introduce the formal concept of a (pre)sheaf as data attached to a topological space. Sheaves capture the notion of patching local sources of information into a global whole, e.g., the binding of visual features such as colour and shape. The formal theory appears to be closely related to the foundational properties asserted by the Information Integration Theory (IIT) for phenomenology. The comparison is intended to engender discussion of ways that phenomenology may benefit from a sheaf-theoretic or, more generally, a category-theoretic approach.
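For reference, a standard textbook formulation of a presheaf and of the sheaf gluing condition is given below; this is the general mathematical definition, not anything specific to the talk.

```latex
% Requires amsmath. A presheaf F on a topological space X assigns data to each
% open set, with restriction maps; the sheaf condition says that compatible
% local data glue uniquely into a global section.
\[
  F : \mathrm{Open}(X)^{\mathrm{op}} \to \mathbf{Set}, \qquad
  \rho_{UV} : F(U) \to F(V) \ \text{for } V \subseteq U, \qquad
  \rho_{VW}\circ\rho_{UV} = \rho_{UW}.
\]
\[
  \text{Sheaf condition: for a cover } U=\bigcup_i U_i,\ 
  s_i \in F(U_i) \text{ with } \rho_{U_i,\,U_i\cap U_j}(s_i)=\rho_{U_j,\,U_i\cap U_j}(s_j)
  \ \implies\ \exists!\, s\in F(U):\ \rho_{U,U_i}(s)=s_i .
\]
```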
Aesthetic preference for art can be predicted from a mixture of low- and high-level visual features
It is an open question whether preferences for visual art can be lawfully predicted from the basic constituent elements of a visual image. Here, we developed and tested a computational framework to investigate how aesthetic values are formed. We show that it is possible to explain human preferences for a visual art piece based on a mixture of low- and high-level features of the image. Subjective value ratings could be predicted not only within but also across individuals, using a regression model with a common set of interpretable features. We also show that the features predicting aesthetic preference can emerge hierarchically within a deep convolutional neural network trained only for object recognition. Our findings suggest that human preferences for art can be explained at least in part as a systematic integration over the underlying visual features of an image.
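The kind of model described, a regression from a common set of interpretable low- and high-level image features to subjective value ratings, can be sketched as follows. The feature names, synthetic data, and ridge regularisation are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch of the kind of model described (not the authors' code): predict
# subjective art ratings from a shared set of interpretable image features.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_images, n_low, n_high = 200, 5, 8

# Stand-in feature matrix: low-level features (e.g. hue, contrast, saturation)
# concatenated with high-level features (e.g. "concreteness", "dynamics").
X = rng.normal(size=(n_images, n_low + n_high))
true_w = rng.normal(size=n_low + n_high)
ratings = X @ true_w + rng.normal(scale=0.5, size=n_images)  # synthetic ratings

# A single regularised linear model over the common feature set, evaluated
# out-of-sample; across-individual prediction would use the same feature set
# with ratings from held-out participants.
model = RidgeCV(alphas=np.logspace(-3, 3, 13))
scores = cross_val_score(model, X, ratings, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean().round(3))
```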
Neural circuits that support robust and flexible navigation in dynamic naturalistic environments
Tracking heading within an environment is a fundamental requirement for flexible, goal-directed navigation. In insects, a head-direction representation that guides the animal’s movements is maintained in a conserved brain region called the central complex. Two-photon calcium imaging of genetically targeted neural populations in the central complex of tethered fruit flies behaving in virtual reality (VR) environments has shown that the head-direction representation is updated based on self-motion cues and external sensory information, such as visual features and wind direction. Thus far, the head-direction representation has mainly been studied in VR settings that only give flies control of the angular rotation of simple sensory cues. How the fly’s head-direction circuitry enables the animal to navigate in dynamic, immersive and naturalistic environments is largely unexplored. I have developed a novel setup that permits imaging in complex VR environments that also accommodate flies’ translational movements. I have previously demonstrated that flies perform visually guided navigation in such an immersive VR setting, and also that they learn to associate aversive optogenetically generated heat stimuli with specific visual landmarks. A stable head-direction representation is likely necessary to support such behaviors, but the underlying neural mechanisms are unclear. Based on a connectomic analysis of the central complex, I identified likely circuit mechanisms for prioritizing and combining different sensory cues to generate a stable head-direction representation in complex, multimodal environments. I am now testing these predictions using calcium imaging in genetically targeted cell types in flies performing 2D navigation in immersive VR.
Categories, language, and visual working memory: how verbal labels change capacity limitations
The limited capacity of visual working memory constrains the quantity and quality of the information we can hold in mind for ongoing processing. Research from our lab has demonstrated that verbal labeling/categorization of visual inputs increases their retention and fidelity in visual working memory. In this talk, I will outline the hypotheses that explain the interaction between visual and verbal inputs in working memory and that give rise to the boosts we observed. I will further show how manipulations of the categorical distinctiveness of the labels, the timing of their occurrence, which items the labels are applied to, and their validity modulate the benefits one can draw from combining visual and verbal inputs to alleviate capacity limitations. Finally, I will discuss the implications of these results for our understanding of working memory and its interaction with prior knowledge.
Understanding the role of prediction in sensory encoding
At any given moment the brain receives more sensory information than it can use to guide adaptive behaviour, creating the need for mechanisms that promote efficient processing of incoming sensory signals. One way in which the brain might reduce its sensory processing load is to encode successive presentations of the same stimulus in a more efficient form, a process known as neural adaptation. Conversely, when a stimulus violates an expected pattern, it should evoke an enhanced neural response. Such a scheme for sensory encoding has been formalised in predictive coding theories, which propose that recent experience establishes expectations in the brain that generate prediction errors when violated. In this webinar, Professor Jason Mattingley will discuss whether the encoding of elementary visual features is modulated when otherwise identical stimuli are expected or unexpected based upon the history of stimulus presentation. In humans, EEG was employed to measure neural activity evoked by gratings of different orientations, and multivariate forward modelling was used to determine how orientation selectivity is affected for expected versus unexpected stimuli. In mice, two-photon calcium imaging was used to quantify orientation tuning of individual neurons in the primary visual cortex to expected and unexpected gratings. Results revealed enhanced orientation tuning to unexpected visual stimuli, both at the level of whole-brain responses and for individual visual cortex neurons. Professor Mattingley will discuss the implications of these findings for predictive coding theories of sensory encoding. Professor Jason Mattingley is a Laureate Fellow and Foundation Chair in Cognitive Neuroscience at The University of Queensland. His research is directed toward understanding the brain processes that support perception, selective attention and decision-making, in health and disease.
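A multivariate forward (inverted encoding) model of the sort mentioned can be sketched as below, using the channel/basis-function approach common in the literature: sensor data are modelled as weighted sums of orientation channel responses, the weights are estimated from training trials, and the model is inverted to recover channel (tuning) profiles that can then be compared between expected and unexpected stimuli. All data, dimensions, and basis-function choices here are hypothetical.

```python
# Sketch of a multivariate forward (inverted encoding) model for orientation;
# hypothetical data and dimensions, not the study's actual pipeline.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_sensors, n_chan = 240, 64, 6

# Orientation "channels": half-rectified sinusoid basis functions. Orientation
# is 180-degree periodic, so the angle is doubled before taking the cosine.
orientations = rng.uniform(0, 180, n_trials)
centers = np.arange(0, 180, 180 / n_chan)

def channel_responses(theta):
    d = np.deg2rad(2 * (theta[:, None] - centers[None, :]))
    return np.maximum(np.cos(d), 0) ** 5          # (trials, channels)

C_train = channel_responses(orientations)
W_true = rng.normal(size=(n_chan, n_sensors))
eeg = C_train @ W_true + rng.normal(scale=1.0, size=(n_trials, n_sensors))

# Forward model: estimate sensor weights from training data ...
W_hat, *_ = np.linalg.lstsq(C_train, eeg, rcond=None)
# ... then invert to recover channel (tuning) profiles. In practice the
# inversion is applied to held-out trials, split by expected vs unexpected.
C_hat, *_ = np.linalg.lstsq(W_hat.T, eeg.T, rcond=None)
print(C_hat.T.shape)   # (trials, channels): reconstructed orientation tuning
```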
Memory for Latent Representations: An Account of Working Memory that Builds on Visual Knowledge for Efficient and Detailed Visual Representations
Visual knowledge obtained from our lifelong experience of the world plays a critical role in our ability to build short-term memories. We propose a mechanistic explanation of how working memory (WM) representations are built from the latent representations of visual knowledge and can then be reconstructed. The proposed model, Memory for Latent Representations (MLR), features a variational autoencoder with an architecture that corresponds broadly to the human visual system and an activation-based binding pool of neurons that binds items’ attributes to tokenized representations. The simulation results revealed that shape information for stimuli that the model was trained on can be encoded and retrieved efficiently from latents in higher levels of the visual hierarchy. On the other hand, novel patterns that are completely outside the training set can be stored from a single exposure using only latents from early layers of the visual system. Moreover, the representation of a given stimulus can have multiple codes, representing specific visual features such as shape or color, in addition to categorical information. Finally, we validated our model by testing a series of predictions against behavioral results acquired from WM tasks. The model provides a compelling demonstration of visual knowledge yielding the formation of compact visual representations for efficient memory encoding.
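The binding-pool idea, storing each item's latent code in a shared pool of neurons via token-specific random projections and retrieving it by cueing with the same projection, can be sketched as follows. This is an illustrative simplification; the MLR model's actual implementation and parameters may differ.

```python
# Minimal sketch of a binding-pool-style store (illustrative; the MLR paper's
# actual implementation and parameters may differ).
import numpy as np

rng = np.random.default_rng(0)
pool_size, latent_dim, n_items = 2500, 64, 3

# Each token gets its own random binding matrix into a shared pool.
bindings = rng.normal(scale=1 / np.sqrt(pool_size),
                      size=(n_items, pool_size, latent_dim))

# Encode: latents (e.g. autoencoder codes for shape/colour) are projected into
# the pool and superimposed, so all items share one limited neural resource.
latents = rng.normal(size=(n_items, latent_dim))
pool = sum(bindings[i] @ latents[i] for i in range(n_items))

# Retrieve: cueing with a token's binding matrix approximately recovers its
# latent, which a decoder could then reconstruct; interference grows with load.
for i in range(n_items):
    recovered = bindings[i].T @ pool
    r = np.corrcoef(recovered, latents[i])[0, 1]
    print(f"item {i}: retrieval fidelity r = {r:.2f}")
```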
High precision coding in visual cortex
Individual neurons in visual cortex provide the brain with unreliable estimates of visual features. It is not known if the single-neuron variability is correlated across large neural populations, thus impairing the global encoding of stimuli. We recorded simultaneously from up to 50,000 neurons in mouse primary visual cortex (V1) and in higher-order visual areas and measured stimulus discrimination thresholds of 0.35 and 0.37 degrees, respectively, in an orientation decoding task. These neural thresholds were almost 100 times smaller than the behavioral discrimination thresholds reported in mice. This discrepancy could not be explained by stimulus properties or arousal states. Furthermore, the behavioral variability during a sensory discrimination task could not be explained by neural variability in primary visual cortex. Instead, behavior-related neural activity arose dynamically across a network of non-sensory brain areas. These results imply that sensory perception in mice is limited by downstream decoders, not by neural noise in sensory representations.
High precision coding in visual cortex
Single neurons in visual cortex provide unreliable measurements of visual features due to their high trial-to-trial variability. It is not known if this “noise” extends its effects over large neural populations to impair the global encoding of stimuli. We recorded simultaneously from ∼20,000 neurons in mouse primary visual cortex (V1) and found that the neural populations had discrimination thresholds of ∼0.34° in an orientation decoding task. These thresholds were nearly 100 times smaller than those reported behaviourally in mice. The discrepancy between neural and behavioural discrimination could not be explained by the types of stimuli we used, by behavioural states or by the sequential nature of perceptual learning tasks. Furthermore, higher-order visual areas lateral to V1 could be decoded equally well. These results imply that the limits of sensory perception in mice are not set by neural noise in sensory cortex, but by the limitations of downstream decoders.
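The notion of a neural discrimination threshold from an orientation decoding task can be illustrated with a small simulation: decode two nearby orientations from noisy, tuned population responses and ask how small an orientation difference still supports above-criterion accuracy. The tuning model, neuron counts, and decoder below are assumptions for illustration, not the authors' analysis.

```python
# Illustrative simulation of a neural discrimination threshold (not the
# authors' analysis pipeline): decode nearby orientations from a population.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_neurons, n_trials = 2000, 100
pref = rng.uniform(0, 180, n_neurons)            # preferred orientations

def population_response(theta):
    """Noisy tuned spike counts of the population for orientation theta (deg)."""
    tuning = 10 * np.exp((np.cos(np.deg2rad(2 * (theta - pref))) - 1) / 0.25)
    return rng.poisson(tuning, size=(n_trials, n_neurons))

def decoding_accuracy(dtheta, theta0=45.0):
    """Cross-validated accuracy for discriminating theta0 vs theta0 + dtheta."""
    X = np.vstack([population_response(theta0),
                   population_response(theta0 + dtheta)]).astype(float)
    y = np.repeat([0, 1], n_trials)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5).mean()

# The threshold is the smallest orientation difference reaching ~75% correct.
for dtheta in (0.1, 0.3, 1.0, 3.0):
    print(f"delta = {dtheta:>4} deg -> accuracy = {decoding_accuracy(dtheta):.2f}")
```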
Hippocampal place field formation by sparse, local learning of visual features in virtual reality
FENS Forum 2024