
scene understanding

4 Seminars · 1 ePoster

Latest

Seminar · Neuroscience · Recording

Mouse visual cortex as a limited resource system that self-learns an ecologically-general representation

Aran Nayebi
MIT
Nov 2, 2022

Studies of the mouse visual system have revealed a variety of visual brain areas in a roughly hierarchical arrangement, together with a multitude of behavioral capacities ranging from stimulus-reward associations to goal-directed navigation and object-centric discriminations. However, an overall understanding of how mouse visual cortex is organized, and how this organization supports visual behaviors, is still lacking. Here, we take a computational approach to these questions, providing a high-fidelity quantitative model of mouse visual cortex. By analyzing the factors contributing to model fidelity, we identified key principles underlying the organization of mouse visual cortex. Structurally, we find that comparatively low input resolution and a shallow architecture were both important for model fidelity. Functionally, we find that models trained with task-agnostic, unsupervised objective functions based on contrastive embeddings were substantially better than models trained with supervised objectives. Finally, the unsupervised objective builds a general-purpose visual representation that enables better transfer to out-of-distribution visual scene understanding and reward-based navigation tasks. Our results suggest that mouse visual cortex is a low-resolution, shallow network that makes the best use of the mouse's limited resources to create a lightweight, general-purpose visual system, in contrast to the deep, high-resolution, and more task-specific visual system of primates.
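
The contrastive objective the abstract refers to can be made concrete with a small example. Below is a minimal sketch of a SimCLR-style InfoNCE loss in PyTorch; the specific loss variant, temperature, and embedding dimensions are illustrative assumptions, not details taken from the talk.

```python
# Minimal sketch of a SimCLR-style contrastive (InfoNCE) objective, the family
# of task-agnostic, unsupervised losses the abstract refers to. Temperature
# and dimensions below are illustrative assumptions.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same images."""
    batch = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit norm
    sim = z @ z.T / temperature                          # cosine-similarity logits
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    # The positive for row i is its other augmented view: i+B (or i-B).
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

# Usage: embeddings from some backbone, two augmentations per image.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(nt_xent_loss(z1, z2))
```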

Seminar · Neuroscience · Recording

NMC4 Short Talk: Image embeddings informed by natural language improve predictions and understanding of human higher-level visual cortex

Aria Wang
Carnegie Mellon University
Dec 1, 2021

To probe human scene understanding, we extracted features from images using CLIP, a neural network model of visual concepts trained with natural language supervision. We then constructed voxelwise encoding models to explain whole-brain responses evoked by viewing natural images from the Natural Scenes Dataset (NSD), a large-scale fMRI dataset collected at 7T. Our results reveal that CLIP, compared to convolution-based image classification models such as ResNet or AlexNet, as well as language models such as BERT, gives rise to representations that enable better prediction performance in higher-level human visual cortex (up to a 0.86 correlation with held-out data and an R² of 0.75). Moreover, CLIP representations explain unique variance in these higher-level visual areas compared to models trained on images or text alone. Control experiments show that the improvement in prediction observed with CLIP is not due to architectural differences (transformer vs. convolution) or to the encoding of image captions per se (vs. single object labels). Together, our results indicate that CLIP and, more generally, multimodal models trained jointly on images and text may serve as better candidate models of representation in human higher-level visual cortex. The bridge between language and vision provided by jointly trained models such as CLIP also opens up new and more semantically rich ways of interpreting the visual brain.
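
A voxelwise encoding model of the kind described here is, at its core, a regularized linear map from image embeddings to per-voxel responses, scored by per-voxel correlation on held-out data. Below is a minimal sketch in Python with scikit-learn; the random arrays stand in for CLIP embeddings and NSD voxel responses, and the dimensions and ridge setup are illustrative assumptions rather than the authors' actual pipeline.

```python
# Minimal sketch of a voxelwise encoding model: ridge regression from image
# embeddings to per-voxel fMRI responses, evaluated by per-voxel correlation
# on held-out images. Random arrays below stand in for CLIP embeddings and
# NSD voxel responses; all sizes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))                                  # (n_images, n_features)
Y = X @ rng.standard_normal((512, 200)) + rng.standard_normal((1000, 200))  # (n_images, n_voxels)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# One ridge model predicts all voxels jointly; alpha chosen by cross-validation.
model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# Per-voxel accuracy: correlation between predicted and measured responses.
r = np.array([np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(Y.shape[1])])
print(f"median voxel correlation: {np.median(r):.2f}")
```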

Seminar · Neuroscience · Recording

Uncovering the temporal dynamics of scene understanding using Event-Related Potentials

Assaf Harel
Wright State University
Jul 14, 2020

ePoster · Neuroscience

Probing Motion-Form Interactions in the Macaque Inferior Temporal Cortex and Artificial Neural Networks for Complex Scene Understanding

Jean de Dieu Uwisengeyimana, Kohitij Kar

COSYNE 2025
