Images
FLUXSynID: High-Resolution Synthetic Face Generation for Document and Live Capture Images
Synthetic face datasets are increasingly used to overcome the limitations of real-world biometric data, including privacy concerns, demographic imbalance, and high collection costs. However, many existing methods lack fine-grained control over identity attributes and fail to produce paired, identity-consistent images under structured capture conditions. In this talk, I will present FLUXSynID, a framework for generating high-resolution synthetic face datasets with user-defined identity attribute distributions and paired document-style and trusted live capture images. The dataset generated using FLUXSynID shows improved alignment with real-world identity distributions and greater diversity compared to prior work. I will also discuss how FLUXSynID’s dataset and generation tools can support research in face recognition and morphing attack detection (MAD), enhancing model robustness in both academic and practical applications.
An Ecological and Objective Neural Marker of Implicit Unfamiliar Identity Recognition
We developed a novel paradigm that measures implicit identity recognition using Fast Periodic Visual Stimulation (FPVS) with EEG in 16 students and 12 police officers with normal face processing abilities. Participants' neural responses to an oddball identity tagged at 1 Hz within a 6-Hz image stream revealed implicit recognition with high-quality mugshots but not with CCTV-like images, suggesting that implicit recognition depends on adequate image resolution. Our findings extend previous research by demonstrating that even unfamiliar identities can elicit robust neural recognition signatures through brief, repeated passive exposure. This approach offers potential for objective validation of face processing abilities in forensic applications, including assessment of facial examiners, Super-Recognisers, and eyewitnesses, potentially overcoming limitations of traditional behavioral assessment methods.
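FPVS responses are usually quantified in the frequency domain. The oddball response appears as a spectral peak at the tagging frequency (here 1 Hz) that stands out from neighbouring frequency bins. The sketch below illustrates this on a synthetic single-channel signal. The 100 Hz sampling rate, the amplitudes and the 20-neighbour SNR convention are assumptions for illustration; this is not the authors' analysis pipeline.

```python
import cmath
import math
import random


def bin_amplitude(signal, fs, freq):
    """Amplitude of the DFT bin nearest to `freq` (Hz), computed directly."""
    n = len(signal)
    k = round(freq * n / fs)  # index of the nearest frequency bin
    acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
    return 2 * abs(acc) / n  # single-sided amplitude


def bin_snr(signal, fs, freq, n_neighbors=10, skip=1):
    """Amplitude at `freq` divided by the mean amplitude of surrounding bins.

    Uses `n_neighbors` bins on each side, ignoring the `skip` bins directly
    adjacent to the target to limit spectral leakage.
    """
    df = fs / len(signal)  # frequency resolution in Hz
    target = bin_amplitude(signal, fs, freq)
    noise = [
        bin_amplitude(signal, fs, freq + side * step * df)
        for step in range(skip + 1, skip + 1 + n_neighbors)
        for side in (-1, 1)
    ]
    return target / (sum(noise) / len(noise))


# Synthetic "EEG": 6 Hz base response, weaker 1 Hz oddball response, white noise.
rng = random.Random(0)
fs, duration = 100.0, 40.0  # hypothetical sampling rate (Hz) and duration (s)
n = int(fs * duration)
eeg = [
    math.sin(2 * math.pi * 6.0 * t / fs)
    + 0.3 * math.sin(2 * math.pi * 1.0 * t / fs)
    + rng.gauss(0.0, 1.0)
    for t in range(n)
]

snr_oddball = bin_snr(eeg, fs, 1.0)  # tagged identity frequency
snr_control = bin_snr(eeg, fs, 1.5)  # a frequency with no stimulation
print(f"SNR at 1.0 Hz: {snr_oddball:.1f}, at 1.5 Hz: {snr_control:.1f}")
```

In practice, amplitudes at the oddball harmonics (2 Hz, 3 Hz, ..., skipping multiples of the 6 Hz base) are often combined as well.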
Error Consistency between Humans and Machines as a function of presentation duration
Within the last decade, Deep Artificial Neural Networks (DNNs) have emerged as powerful computer vision systems that match or exceed human performance on many benchmark tasks such as image classification. But whether current DNNs are suitable computational models of the human visual system remains an open question: while DNNs have proven capable of predicting neural activations in primate visual cortex, psychophysical experiments have shown behavioral differences between DNNs and human subjects, as quantified by error consistency. Error consistency is typically measured by briefly presenting natural or corrupted images to human subjects and asking them to perform an n-way classification task under time pressure. But how long should stimuli be presented to ensure a fair comparison with DNNs? Here we investigate the influence of presentation time on error consistency to test the hypothesis that higher-level processing drives behavioral differences. We systematically vary presentation times of backward-masked stimuli from 8.3 ms to 266 ms and measure human performance and reaction times on natural, lowpass-filtered and noisy images. Our experiment constitutes a fine-grained analysis of human image classification under both image corruptions and time pressure. It shows that even drastically time-constrained humans, exposed to the stimuli for only two frames (16.6 ms), can still solve our 8-way classification task with success rates well above chance. We also find that human-to-human error consistency is already stable at 16.6 ms.
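Error consistency is commonly computed as Cohen's kappa over trial-by-trial correctness. It asks how often two observers are right or wrong on the same trials, beyond what their accuracies alone would predict by chance. The sketch below is minimal and assumes that each observer's responses have been reduced to a list of booleans (correct or not) on the same trials, in the same order.

```python
def error_consistency(correct_a, correct_b):
    """Cohen's kappa between two observers' trial-by-trial correctness."""
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(correct_a)
    # Observed consistency: fraction of trials where both are right or both are wrong.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    # Expected consistency for independent observers with these accuracies.
    p_a = sum(correct_a) / n
    p_b = sum(correct_b) / n
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if c_exp == 1:
        raise ValueError("kappa is undefined when expected consistency is 1")
    return (c_obs - c_exp) / (1 - c_exp)


human = [True, True, False, False]
print(error_consistency(human, [True, True, False, False]))  # 1.0: identical errors
print(error_consistency(human, [True, False, True, False]))  # 0.0: chance-level overlap
print(error_consistency(human, [False, False, True, True]))  # -1.0: opposite errors
```

Because kappa corrects for the agreement expected from accuracy alone, it can be compared across presentation durations even though accuracy itself changes with duration.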
Trends in NeuroAI - Brain-optimized inference improves reconstructions of fMRI brain activity
Trends in NeuroAI is a reading group hosted by the MedARC Neuroimaging & AI lab (https://medarc.ai/fmri). Title: Brain-optimized inference improves reconstructions of fMRI brain activity Abstract: The release of large datasets and developments in AI have led to dramatic improvements in decoding methods that reconstruct seen images from human brain activity. We evaluate the prospect of further improving recent decoding methods by optimizing for consistency between reconstructions and brain activity during inference. We sample seed reconstructions from a base decoding method, then iteratively refine these reconstructions using a brain-optimized encoding model that maps images to brain activity. At each iteration, we sample a small library of images from an image distribution (a diffusion model) conditioned on a seed reconstruction from the previous iteration. We select those that best approximate the measured brain activity when passed through our encoding model, and use these images for structural guidance during the generation of the small library in the next iteration. We reduce the stochasticity of the image distribution at each iteration, and stop when a criterion on the "width" of the image distribution is met. We show that when this process is applied to recent decoding methods, it outperforms the base decoding method as measured by human raters, a variety of image feature metrics, and alignment to brain activity. These results demonstrate that reconstruction quality can be significantly improved by explicitly aligning decoding distributions to brain activity distributions, even when the seed reconstruction is output from a state-of-the-art decoding algorithm. Interestingly, the rate of refinement varies systematically across visual cortex, with earlier visual areas generally converging more slowly and preferring narrower image distributions, relative to higher-level brain areas. 
Brain-optimized inference thus offers a succinct and novel method for improving reconstructions and exploring the diversity of representations across visual brain areas. Speaker: Reese Kneeland is a Ph.D. student at the University of Minnesota working in the Naselaris lab. Paper link: https://arxiv.org/abs/2312.07705
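The refinement loop described in this abstract can be sketched in a few lines. This is a toy illustration under strong assumptions:

- Images are short vectors.
- The encoding model is a hypothetical random linear map.
- An isotropic Gaussian around the current reconstruction stands in for the diffusion model conditioned on the seed.

For simplicity, the sketch keeps only the single best-matching image per iteration rather than using a set of images for structural guidance.

```python
import random


def encode(image, weights):
    """Hypothetical linear encoding model: image vector -> predicted voxel responses."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]


def mismatch(predicted, measured):
    """Squared error between predicted and measured brain activity."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured))


def brain_optimized_inference(seed, measured, weights, rng,
                              library_size=16, width=1.0, shrink=0.7, min_width=0.05):
    """Refine `seed` until the sampling distribution is narrower than `min_width`."""
    current = list(seed)
    best = mismatch(encode(current, weights), measured)
    while width > min_width:  # stopping criterion on the distribution's "width"
        # Sample a small library of images conditioned on the current reconstruction.
        library = [[x + rng.gauss(0.0, width) for x in current] for _ in range(library_size)]
        for candidate in library:
            score = mismatch(encode(candidate, weights), measured)
            if score < best:  # keep whichever image best explains the measured activity
                current, best = candidate, score
        width *= shrink  # reduce stochasticity at each iteration
    return current, best


rng = random.Random(0)
n_pixels, n_voxels = 8, 12
weights = [[rng.gauss(0.0, 1.0) for _ in range(n_pixels)] for _ in range(n_voxels)]
true_image = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
measured = encode(true_image, weights)  # noiseless "brain activity" for the demo
seed = [x + rng.gauss(0.0, 1.0) for x in true_image]  # imperfect base-decoder output

seed_error = mismatch(encode(seed, weights), measured)
refined, refined_error = brain_optimized_inference(seed, measured, weights, rng)
print(f"mismatch: seed {seed_error:.2f} -> refined {refined_error:.2f}")
```

Keeping the previous best image in the comparison guarantees that the mismatch never increases from one iteration to the next.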
Trends in NeuroAI - Meta's MEG-to-image reconstruction
Trends in NeuroAI is a reading group hosted by the MedARC Neuroimaging & AI lab (https://medarc.ai/fmri). This will be an informal journal club presentation; no author of the paper will be joining us. Title: Brain decoding: toward real-time reconstruction of visual perception Abstract: In the past five years, the use of generative and foundational AI systems has greatly improved the decoding of brain activity. Visual perception, in particular, can now be decoded from functional Magnetic Resonance Imaging (fMRI) with remarkable fidelity. This neuroimaging technique, however, suffers from a limited temporal resolution (≈0.5 Hz), which fundamentally constrains its real-time usage. Here, we propose an alternative approach based on magnetoencephalography (MEG), a neuroimaging device capable of measuring brain activity with high temporal resolution (≈5,000 Hz). For this, we develop an MEG decoding model trained with both contrastive and regression objectives and consisting of three modules: i) pretrained embeddings obtained from the image, ii) an MEG module trained end-to-end and iii) a pretrained image generator. Our results are threefold. First, our MEG decoder shows a 7X improvement in image retrieval over classic linear decoders. Second, late brain responses to images are best decoded with DINOv2, a recent foundational image model. Third, image retrievals and generations both suggest that MEG signals primarily contain high-level visual features, whereas the same approach applied to 7T fMRI also recovers low-level features. Overall, these results provide an important step towards decoding, in real time, the visual processes continuously unfolding within the human brain. Speaker: Dr. Paul Scotti (Stability AI, MedARC) Paper link: https://arxiv.org/abs/2310.19812
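The abstract mentions training the MEG module with both contrastive and regression objectives. Below is a minimal sketch of one such combination. It assumes a CLIP-style contrastive (InfoNCE) loss plus a mean-squared-error term, mixed by a hypothetical weight `lam`. The temperature, the weighting and the toy embeddings are illustrative, not the paper's settings. A top-1 retrieval score, the kind of metric used to compare decoders, is also included.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cosine_matrix(pred, target):
    p = [_normalize(v) for v in pred]
    t = [_normalize(v) for v in target]
    return [[sum(a * b for a, b in zip(pi, tj)) for tj in t] for pi in p]


def contrastive_loss(pred, target, temperature=0.1):
    """InfoNCE: each MEG embedding should be closest to its own image embedding in the batch."""
    sims = _cosine_matrix(pred, target)
    total = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the matching pair as label
    return total / len(pred)


def mse_loss(pred, target):
    """Regression term: mean squared error between predicted and true embeddings."""
    diffs = [(a - b) ** 2 for p, t in zip(pred, target) for a, b in zip(p, t)]
    return sum(diffs) / len(diffs)


def combined_loss(pred, target, lam=0.5):
    """Weighted sum of the contrastive and regression objectives (`lam` is hypothetical)."""
    return lam * contrastive_loss(pred, target) + (1 - lam) * mse_loss(pred, target)


def top1_retrieval(pred, target):
    """Fraction of MEG embeddings whose most similar image embedding is the correct one."""
    sims = _cosine_matrix(pred, target)
    hits = sum(max(range(len(row)), key=row.__getitem__) == i for i, row in enumerate(sims))
    return hits / len(pred)


images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy image embeddings
good_meg = [[0.9, 0.1, 0.0], [0.0, 1.1, 0.1], [0.1, 0.0, 0.8]]
bad_meg = [images[1], images[2], images[0]]  # every prediction matched to the wrong image
print(combined_loss(good_meg, images), top1_retrieval(good_meg, images))
print(combined_loss(bad_meg, images), top1_retrieval(bad_meg, images))
```

The contrastive term rewards ranking the correct image above the others in a batch, which is what retrieval measures. The regression term pulls predictions toward the actual embedding values a generator would consume.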
Diverse applications of artificial intelligence and mathematical approaches in ophthalmology
Ophthalmology is ideally placed to benefit from recent advances in artificial intelligence. It is a highly image-based specialty and provides unique access to the microvascular circulation and the central nervous system. This talk will demonstrate diverse applications of machine learning and deep learning techniques in ophthalmology, including in age-related macular degeneration (AMD), the leading cause of blindness in industrialized countries, and cataract, the leading cause of blindness worldwide. This will include deep learning approaches to automated diagnosis, quantitative severity classification, and prognostic prediction of disease progression, both from images alone and accompanied by demographic and genetic information. The approaches discussed will include deep feature extraction, label transfer, and multi-modal, multi-task training. Cluster analysis, an unsupervised machine learning approach to data classification, will be demonstrated by its application to geographic atrophy in AMD, including exploration of genotype-phenotype relationships. Finally, mediation analysis will be discussed, with the aim of dissecting complex relationships between AMD disease features, genotype, and progression.
The development of visual experience
Vision and visual cognition are experience-dependent, with likely multiple sensitive periods, but we know very little about the statistics of visual experience at the scale of everyday life and how they might change with development. Traditional assumptions hold that the world, at the massive scale of daily life, presents essentially the same visual statistics to all perceivers. I will present an overview of our work on egocentric vision showing that this is not the case. The momentary image received at the eye is spatially selective, dependent on the location, posture and behavior of the perceiver. If a perceiver’s location, possible postures and/or preferences for looking at some kinds of scenes over others are constrained, then their sampling of images from the world, and thus the visual statistics at the scale of daily life, could be biased. I will present evidence on both low-level and higher-level visual statistics regarding developmental changes in the visual input over the first 18 months after birth.
Computational models and experimental methods for the human cornea
The eye is a multi-component biological system in which mechanics, optics, transport phenomena and chemical reactions are tightly interlaced. It is characterized by the typical biological variability in sizes and material properties. The eye’s response to external action is patient-specific. It can be predicted only by a customized approach that accounts for the multiple physics and for the intrinsic microstructure of the tissues, developed with the aid of state-of-the-art computational biomechanics. In recent years, our work has been devoted to the development of a comprehensive model of the cornea that aims to be entirely patient-specific. The geometrical aspects are fully under control, given the sophisticated diagnostic machinery able to provide fully three-dimensional images of the eye. The major difficulties relate to the characterization of the tissues, which requires in-vivo tests to complement the well-documented results of in-vitro tests. The interpretation of in-vivo tests is very complex, since the entire structure of the eye is involved and characterizing a single tissue is not trivial. Micromechanical models constructed from detailed images of the eye provide important support for the characterization of the corneal tissues, especially in pathologic conditions. In this presentation I will provide an overview of the computational models and experimental approaches our group has developed for the human cornea.
Learning to see stuff
Humans are very good at visually recognizing materials and inferring their properties. Without touching surfaces, we can usually tell what they would feel like, and we enjoy vivid visual intuitions about how they typically behave. This is impressive because the retinal image that the visual system receives as input is the result of complex interactions between many physical processes. Somehow the brain has to disentangle these different factors. I will present some recent work in which we show that an unsupervised neural network trained on images of surfaces spontaneously learns to disentangle reflectance, lighting and shape. However, the disentanglement is not perfect, and we find that as a result the network not only predicts the broad successes of human gloss perception, but also the specific pattern of errors that humans exhibit on an image-by-image basis. I will argue this has important implications for thinking about appearance and vision more broadly.
Multimodal Blending
In this talk, I’ll consider how new ideas emerge from old ones via the process of conceptual blending. I’ll start by considering analogical reasoning in problem solving and the role conceptual blending plays in these problem-solving contexts. Then I’ll consider blending in multimodal contexts, including timelines, memes (specifically image macros), and, if time allows, Zoom meetings. I suggest that mappings analogy researchers have traditionally considered superficial are often important for the development of novel abstractions. Likewise, the analogue portion of multimodal blends anchors their generative capacity. Overall, these observations underscore the extent to which meaning is a socially distributed process whose intermediate products are stored in cognitive artifacts such as text and digital images.
Automated generation of face stimuli: Alignment, features and face spaces
I describe a well-tested Python module that performs automated alignment and warping of face images, and its advantages over existing solutions. An additional tool I’ve developed performs automated extraction of facial features, which can be used in a number of interesting ways. I illustrate the value of wavelet-based features with a brief description of two recent studies: perceptual in-painting, and the robustness of the whole-part advantage across a large stimulus set. Finally, I discuss the suitability of various deep learning models for generating stimuli to study perceptual face spaces. I believe those interested in the forensic aspects of face perception may find this talk useful.
Geometry of concept learning
Understanding the human ability to learn novel concepts from just a few sensory experiences is a fundamental problem in cognitive neuroscience. I will describe recent work with Ben Sorcher and Surya Ganguli (PNAS, October 2022) in which we propose a simple, biologically plausible, and mathematically tractable neural mechanism for few-shot learning of naturalistic concepts. We posit that the concepts that can be learned from few examples are defined by tightly circumscribed manifolds in the neural firing-rate space of higher-order sensory areas. Discrimination between novel concepts is performed by downstream neurons implementing a ‘prototype’ decision rule, in which a test example is classified according to the nearest prototype constructed from the few training examples. We show that prototype learning achieves high few-shot accuracy on natural visual concepts using both macaque inferotemporal cortex representations and deep neural network (DNN) models of these representations. We develop a mathematical theory that links few-shot learning to the geometric properties of the neural concept manifolds. We demonstrate its agreement with our numerical simulations across different DNNs as well as different layers. Intriguingly, we observe striking mismatches between the geometry of manifolds in intermediate stages of the primate visual pathway and in trained DNNs. Finally, we show that linguistic descriptors of visual concepts can be used to discriminate images belonging to novel concepts without any prior visual experience of these concepts (a task known as ‘zero-shot’ learning). This indicates a remarkable alignment of manifold representations of concepts in visual and language modalities. I will discuss ongoing efforts to extend this work to other high-level cognitive tasks.
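The ‘prototype’ decision rule is simple enough to state in code. Average the few training examples of each concept, then assign a test example to the nearest average. The sketch below assumes that concepts are toy Gaussian clouds in a 4-dimensional ‘firing-rate’ space, with hypothetical centres and spread. It illustrates the rule only, not the authors' manifold theory or their neural and DNN data.

```python
import math
import random


def prototype(examples):
    """Class prototype: the mean of the few training examples."""
    n = len(examples)
    return [sum(coords) / n for coords in zip(*examples)]


def classify(x, prototypes):
    """Assign `x` to the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(x, prototypes[label]))


def few_shot_accuracy(centers, spread, shots, n_test, rng):
    """Toy experiment: each concept is a Gaussian cloud around a hypothetical centre."""
    def sample(center):
        return [c + rng.gauss(0.0, spread) for c in center]

    prototypes = {label: prototype([sample(c) for _ in range(shots)])
                  for label, c in centers.items()}
    correct = total = 0
    for label, center in centers.items():
        for _ in range(n_test):
            correct += classify(sample(center), prototypes) == label
            total += 1
    return correct / total


rng = random.Random(1)
centers = {"dog": [0.0, 0.0, 0.0, 0.0], "cat": [3.0, 3.0, 3.0, 3.0]}
print(few_shot_accuracy(centers, spread=0.5, shots=5, n_test=100, rng=rng))
```

Shrinking the distance between centres or enlarging `spread` lowers accuracy. This is a crude, one-number version of the link between manifold geometry and few-shot performance described above.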
Connecting performance benefits on visual tasks to neural mechanisms using convolutional neural networks
Behavioral studies have demonstrated that certain task features reliably enhance classification performance for challenging visual stimuli. These include extended image presentation time and the valid cueing of attention. Here, I will show how convolutional neural networks can be used as a model of the visual system that connects neural activity changes with such performance changes. Specifically, I will discuss how different anatomical forms of recurrence can account for better classification of noisy and degraded images with extended processing time. I will then show how experimentally observed neural activity changes associated with feature attention lead to observed performance changes on detection tasks. I will also discuss the implications these results have for how we identify the neural mechanisms and architectures important for behavior.
The Effects of Negative Emotions on Mental Representation of Faces
Face detection is an initial step of many social interactions. It involves comparing a visual input with a mental representation of faces built from previous experience. Whilst emotional state has been found to affect the way humans attend to faces, little research has explored the effects of emotions on the mental representation of faces. Here, we aimed to examine how state anxiety and state depression modulate the geometric properties of the mental representations of faces used in face detection, and to compare their emotional expressions. To this end, we used an adaptation of the reverse correlation technique inspired by Gosselin and Schyns’ (2003) ‘Superstitious Approach’. This allowed us to construct visual representations of observers’ mental representations of faces and to relate these to their mental states. In two sessions, on separate days, participants were presented with ‘colourful’ noise stimuli and asked to detect faces, which they were told were present. Based on the noise fragments that were identified as faces, we reconstructed the pictorial mental representation used by each participant in each session. We found a significant correlation between the size of the mental representation of faces and participants’ level of depression. Our findings provide preliminary insight into the way emotions affect expectations about the appearance of faces. To further understand whether the facial expressions of participants’ mental representations reflect their emotional state, we are conducting a validation study in which a group of naïve observers classify the reconstructed face images by emotion. This will show whether the faces communicate participants’ emotional states to others.
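In the reverse correlation logic behind the ‘Superstitious Approach’, a mental representation can be estimated by averaging the noise fields an observer classified as containing a face. The sketch below uses one common variant: the mean of 'face' trials minus the mean of 'no face' trials. It runs on a simulated observer whose responses are driven by a hypothetical internal template. The pixel count, trial count and noise levels are illustrative assumptions. The real stimuli were ‘colourful’ images rather than short grey-level vectors.

```python
import math
import random


def classification_image(noise_fields, responses):
    """Mean noise on 'face' trials minus mean noise on 'no face' trials, pixel by pixel."""
    yes = [f for f, r in zip(noise_fields, responses) if r]
    no = [f for f, r in zip(noise_fields, responses) if not r]
    mean_yes = [sum(px) / len(yes) for px in zip(*yes)]
    mean_no = [sum(px) / len(no) for px in zip(*no)]
    return [a - b for a, b in zip(mean_yes, mean_no)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


rng = random.Random(0)
n_pixels, n_trials = 16, 2000
# Hypothetical internal "face" template held by the simulated observer.
template = [1.0 if i % 4 in (1, 2) else -1.0 for i in range(n_pixels)]

noise_fields, responses = [], []
for _ in range(n_trials):
    field = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]  # pure noise: no face is shown
    # Template match plus internal noise.
    evidence = sum(t * f for t, f in zip(template, field)) + rng.gauss(0.0, 2.0)
    noise_fields.append(field)
    responses.append(evidence > 0)  # observer reports "face" when evidence exceeds criterion

ci = classification_image(noise_fields, responses)
print(f"correlation with hidden template: {pearson(ci, template):.2f}")
```

Because no face is ever present, any structure in the classification image comes from the observer's internal template. This is the premise that lets the reconstructed images stand in for mental representations.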
The transformation from seeing to remembering images
Real-world scene perception and search from foveal to peripheral vision
A high-resolution central fovea is a prominent design feature of human vision. But how important is the fovea for information processing and gaze guidance in everyday visual-cognitive tasks? Following on from classic findings for sentence reading, I will present key results from a series of eye-tracking experiments in which observers had to search for a target object within static or dynamic images of real-world scenes. Gaze-contingent scotomas were used to selectively deny information processing in the fovea, parafovea, or periphery. Overall, the results suggest that foveal vision is less important and peripheral vision is more important for scene perception and search than previously thought. The importance of foveal vision was found to depend on the specific requirements of the task. Moreover, the data support a central-peripheral dichotomy in which peripheral vision selects and central vision recognizes.
Learning with less labels for medical image segmentation
Accurate segmentation of medical images is a key step in developing Computer-Aided Diagnosis (CAD) and automating various clinical tasks such as image-guided interventions. The success of state-of-the-art methods for medical image segmentation is heavily reliant upon the availability of a sizable amount of labelled data. If the required quantity of labelled data for learning cannot be reached, the technology turns out to be fragile. The principle of consensus tells us that as humans, when we are uncertain how to act in a situation, we tend to look to others to determine how to respond. In this webinar, Dr Mehrtash Harandi will show how to model the principle of consensus to learn to segment medical data with limited labelled data. In doing so, we design multiple segmentation models that collaborate with each other to learn from labelled and unlabelled data collectively.
A model of colour appearance based on efficient coding of natural images
An object’s colour, brightness and pattern are all influenced by its surroundings, and a number of visual phenomena and “illusions” have been discovered that highlight these often dramatic effects. Explanations for these phenomena range from low-level neural mechanisms to high-level processes that incorporate contextual information or prior knowledge. Importantly, few of these phenomena can currently be accounted for when measuring an object’s perceived colour. Here we ask to what extent colour appearance is predicted by a model based on the principle of coding efficiency. The model assumes that the image is encoded by noisy spatio-chromatic filters spaced at one-octave intervals, which are either circularly symmetric or oriented. Each spatial band’s lower threshold is set by the contrast sensitivity function, and the dynamic range of the band is a fixed multiple of this threshold, above which the response saturates. Filter outputs are then reweighted to give equal power in each channel for natural images. We demonstrate that the model fits human behavioural performance in psychophysics experiments, and also primate retinal ganglion cell responses. Next, we systematically test the model’s ability to qualitatively predict over 35 brightness and colour phenomena, with almost complete success. This implies that, contrary to high-level processing explanations, much of colour appearance is potentially attributable to simple mechanisms evolved for efficient coding of natural images. It also provides a basis for modelling the vision of humans and other animals.
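One reading of the band nonlinearity described here is a response that is zero below a contrast-sensitivity threshold, rises linearly, and saturates once the input reaches a fixed multiple `k` of that threshold. The channels are then rescaled to carry equal power on natural images. The sketch below encodes that reading under stated assumptions. The linear ramp, the value of `k` and the stand-in 'natural image' filter outputs are all hypothetical, not the published model's parameters.

```python
import math


def band_response(contrast, threshold, k=10.0):
    """Signed response of one band: 0 below threshold, saturating at k * threshold.

    The linear ramp between the two points is an assumption for illustration.
    """
    magnitude = abs(contrast)
    if magnitude <= threshold:
        return 0.0
    # Linear ramp across the dynamic range [threshold, k * threshold], clipped at 1.
    level = min((magnitude - threshold) / ((k - 1.0) * threshold), 1.0)
    return math.copysign(level, contrast)


def equal_power_weights(band_outputs):
    """One gain per band so that every band has unit RMS response over the image set."""
    weights = []
    for outputs in band_outputs:
        rms = math.sqrt(sum(r * r for r in outputs) / len(outputs))
        weights.append(1.0 / rms if rms > 0 else 0.0)
    return weights


# Hypothetical filter outputs (contrasts) of two bands on a small "natural image" sample.
coarse = [band_response(c, threshold=0.02) for c in [0.05, -0.12, 0.30, -0.08]]
fine = [band_response(c, threshold=0.05) for c in [0.06, -0.07, 0.09, 0.20]]
gains = equal_power_weights([coarse, fine])
print(gains)
```

In this reading, the threshold per band would come from the contrast sensitivity function. The gains are fixed once from natural-image statistics and then applied to any test stimulus.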
How communication networks promote cross-cultural similarities: The case of category formation
Individuals vary widely in how they categorize novel phenomena. This individual variation has led canonical theories in cognitive and social science to suggest that communication in large social networks leads populations to construct divergent category systems. Yet, anthropological data indicates that large, independent societies consistently arrive at similar categories across a range of topics. How is it possible for diverse populations, consisting of individuals with significant variation in how they view the world, to independently construct similar categories? Through a series of online experiments, I show how large communication networks within cultures can promote the formation of similar categories across cultures. For this investigation, I designed an online “Grouping Game” to observe how people construct categories in both small and large populations when tasked with grouping together the same novel and ambiguous images. I replicated this design for English-speaking subjects in the U.S. and Mandarin-speaking subjects in China. In both cultures, solitary individuals and small social groups produced highly divergent category systems. Yet, large social groups separately and consistently arrived at highly similar categories both within and across cultures. These findings are accurately predicted by a simple mathematical model of critical mass dynamics. Altogether, I show how large communication networks can filter lexical diversity among individuals to produce replicable society-level patterns, yielding unexpected implications for cultural evolution. In particular, I discuss how participants in both cultures readily harnessed analogies when categorizing novel stimuli, and I examine the role of communication networks in promoting cross-cultural similarities in analogy-making as the key engine of category formation.
Synthetic and natural images unlock the power of recurrency in primary visual cortex
During perception the visual system integrates current sensory evidence with previously acquired knowledge of the visual world. Presumably this computation relies on internal recurrent interactions. We record populations of neurons from the primary visual cortex of cats and macaque monkeys and find evidence for adaptive internal responses to structured stimulation that change on both slow and fast timescales. In the first experiment, we briefly present abstract images, a protocol known to produce strong and persistent recurrent responses in the primary visual cortex. We show that repeated presentations of a large randomized set of images lead to enhanced stimulus encoding on a timescale of minutes to hours. The enhanced encoding preserves the representational details required for image reconstruction and can be detected in post-exposure spontaneous activity. In a second experiment, we show that the encoding of natural scenes across populations of V1 neurons is improved, over a timescale of hundreds of milliseconds, with the allocation of spatial attention. Given the hierarchical organization of the visual cortex, contextual information from the higher levels of the processing hierarchy, reflecting high-level image regularities, can inform the activity in V1 through feedback. We hypothesize that these fast attentional boosts in stimulus encoding rely on recurrent computations that capitalize on the presence of high-level visual features in natural scenes. We design control images dominated by low-level features and show that, in agreement with our hypothesis, the attentional benefits in stimulus encoding vanish. We conclude that, in the visual system, powerful recurrent processes optimize neuronal responses already at the earliest stages of cortical processing.
Language Representations in the Human Brain: A naturalistic approach
Natural language is strongly context-dependent and can be perceived through different sensory modalities. For example, humans can easily comprehend the meaning of complex narratives presented through auditory speech, written text, or visual images. To understand how complex language-related information is represented in the human brain, we need to map the different linguistic and non-linguistic information perceived under different modalities across the cerebral cortex. To map this information to the brain, I suggest following a naturalistic approach. This means observing the human brain performing tasks in a naturalistic setting, designing quantitative models that transform real-world stimuli into specific hypothesis-related features, and building predictive models that relate these features to brain responses. In my talk, I will present models of brain responses collected using functional magnetic resonance imaging while human participants listened to or read natural narrative stories. Using natural text and vector representations derived from natural language processing tools, I will show how we can study language processing in the human brain across modalities, at different levels of temporal granularity, and across different languages.
PiSpy: An Affordable, Accessible, and Flexible Imaging Platform for the Automated Observation of Organismal Biology and Behavior
A great deal of understanding can be gleaned from direct observation of organismal growth, development, and behavior. However, direct observation can be time consuming and influence the organism through unintentional stimuli. Additionally, video capturing equipment can often be prohibitively expensive, difficult to modify to one’s specific needs, and may come with unnecessary features. Here, we describe the PiSpy, a low-cost, automated video acquisition platform that uses a Raspberry Pi computer and camera to record video or images at specified time intervals or when externally triggered. All settings and controls, such as programmable light cycling, are accessible to users with no programming experience through an easy-to-use graphical user interface. Importantly, the entire PiSpy system can be assembled for less than $100 using laser-cut and 3D-printed components. We demonstrate the broad applications and flexibility of the PiSpy across a range of model and non-model organisms. Designs, instructions, and code can be accessed through an online repository, where a global community of PiSpy users can also contribute their own unique customizations and help grow the community of open-source research solutions.
Forensic use of face recognition systems for investigation
With the increasing development of automatic systems and artificial intelligence, face recognition is becoming increasingly important in forensic and civil contexts. However, face recognition has yet to be thoroughly studied empirically to provide an adequate scientific and legal framework for investigative and court purposes. This observation sets the foundation for the research. We focus on issues related to face images and the use of automatic systems. Our objective is to validate a likelihood ratio computation methodology for interpreting comparison scores from automatic face recognition systems (score-based likelihood ratio, SLR). We collected three types of traces: portraits (ID), video surveillance footage recorded by an ATM camera, and footage recorded by a wide-angle camera (CCTV). The performance of two automatic face recognition systems is compared: the commercial IDEMIA Morphoface (MFE) system and the open source FaceNet algorithm.
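A score-based likelihood ratio compares how probable an observed comparison score is under two hypotheses: that both images come from the same source, or from different sources. It uses score distributions from reference comparisons. The sketch below estimates both densities with a Gaussian kernel density estimate. The kernel, the bandwidth and the calibration scores are hypothetical choices for illustration. They do not represent either system's actual score scale or the authors' calibration method.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from `samples` with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def score_based_lr(score, same_source_scores, different_source_scores, bandwidth=0.05):
    """SLR = f(score | same source) / f(score | different sources)."""
    numerator = gaussian_kde(same_source_scores, bandwidth)(score)
    denominator = gaussian_kde(different_source_scores, bandwidth)(score)
    return math.inf if denominator == 0 else numerator / denominator


# Hypothetical calibration scores from mated and non-mated face comparisons.
mated = [0.70, 0.75, 0.80, 0.85, 0.90]
non_mated = [0.10, 0.15, 0.20, 0.25, 0.30]
for s in (0.2, 0.5, 0.8):
    print(s, score_based_lr(s, mated, non_mated))
```

An SLR above 1 supports the same-source hypothesis and below 1 the different-source hypothesis. Validating such a methodology typically involves checking how well these ratios are calibrated for each trace type (ID, ATM, CCTV) and each system.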
Probabilistic computation in natural vision
A central goal of vision science is to understand the principles underlying the perception and neural coding of the complex visual environment of our everyday experience. In the visual cortex, foundational work with artificial stimuli, and more recent work combining natural images and deep convolutional neural networks, have revealed much about the tuning of cortical neurons to specific image features. However, a major limitation of this existing work is its focus on single-neuron response strength to isolated images. First, during natural vision, the inputs to cortical neurons are not isolated but rather embedded in a rich spatial and temporal context. Second, the full structure of population activity—including the substantial trial-to-trial variability that is shared among neurons—determines encoded information and, ultimately, perception. In the first part of this talk, I will argue for a normative approach to study encoding of natural images in primary visual cortex (V1), which combines a detailed understanding of the sensory inputs with a theory of how those inputs should be represented. Specifically, we hypothesize that V1 response structure serves to approximate a probabilistic representation optimized to the statistics of natural visual inputs, and that contextual modulation is an integral aspect of achieving this goal. I will present a concrete computational framework that instantiates this hypothesis, and data recorded using multielectrode arrays in macaque V1 to test its predictions. In the second part, I will discuss how we are leveraging this framework to develop deep probabilistic algorithms for natural image and video segmentation.
The true and false memorability of images - how we remember (and make errors to) some images over others
Commonly used face cognition tests yield low reliability and inconsistent performance: Implications for test design, analysis, and interpretation of individual differences data
Unfamiliar face processing (face cognition) ability varies considerably in the general population. However, the means of its assessment are not standardised, and the laboratory tests used vary between studies. It is also unclear whether 1) the most commonly employed tests are reliable, 2) participants show a degree of consistency in their performance, and 3) the face cognition tests broadly measure one underlying ability, akin to general intelligence. In this study, we asked participants to perform eight tests frequently employed in the individual differences literature. We examined the reliability of these tests, the relationships between them and the consistency of participants’ performance, and used data-driven approaches to determine the factors underpinning performance. Overall, our findings suggest that the reliability of these tests is poor to moderate, that the correlations between them are weak, that the consistency of participant performance across tasks is low, and that performance can be broadly split into two factors: telling faces together, and telling faces apart. We recommend that future studies adjust analyses to treat stimuli (face images) and participants as random factors, routinely assess reliability, and examine newly developed tests of face cognition for convergent validity with other commonly used measures of face cognition ability.
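One of the recommendations above, routinely assessing reliability, can be made concrete with a permutation-based split-half estimate and the Spearman-Brown correction, a standard psychometric approach. The sketch below is illustrative rather than the authors' analysis pipeline: it assumes trial-level data arranged as one list of per-item scores (e.g. 0/1 accuracy) per participant, and the function names are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (must not be constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(accuracy, n_splits=500, seed=0):
    """Permutation-based split-half reliability with Spearman-Brown correction.

    accuracy: one list per participant of per-item scores (e.g. 0/1 correct).
    Items are randomly split into two halves, participants' mean scores on the
    two halves are correlated, and each correlation is stepped up to full
    test length. The mean over all random splits is returned.
    """
    rng = random.Random(seed)
    n_items = len(accuracy[0])
    items = list(range(n_items))
    estimates = []
    for _ in range(n_splits):
        rng.shuffle(items)
        half_a, half_b = items[: n_items // 2], items[n_items // 2 :]
        score_a = [statistics.fmean([p[i] for i in half_a]) for p in accuracy]
        score_b = [statistics.fmean([p[i] for i in half_b]) for p in accuracy]
        r = pearson(score_a, score_b)
        estimates.append(2 * r / (1 + r))  # Spearman-Brown prophecy formula
    return statistics.fmean(estimates)
```

Averaging over many random splits avoids depending on one arbitrary split (such as odd versus even items); the mixed-model analyses with stimuli and participants as random factors that the authors recommend are a separate step not shown here.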
NMC4 Keynote:
The brain represents the external world through the bottleneck of sensory organs. The network of hierarchically organized neurons is thought to recover the causes of sensory inputs and so reconstruct reality in the brain, in idiosyncratic ways that depend on individuals and their internal states. How can we understand the world model represented in an individual’s brain, or the neuroverse? My lab has been working on brain decoding of visual perception and of subjective experiences such as imagery and dreaming, using machine learning and deep neural network representations. In this talk, I will outline the progress of brain decoding methods and show how subjective experiences can be externalized as images and how they could be shared across individuals via neural code conversion. The prospects of these approaches in basic science and neurotechnology will also be discussed.
NMC4 Short Talk: Image embeddings informed by natural language improve predictions and understanding of human higher-level visual cortex
To better understand how humans understand scenes, we extracted features from images using CLIP, a neural network model of visual concepts trained with supervision from natural language. We then constructed voxelwise encoding models to explain whole-brain responses to natural images from the Natural Scenes Dataset (NSD), a large-scale fMRI dataset collected at 7T. Our results reveal that CLIP, compared with convolution-based image classification models such as ResNet or AlexNet and with language models such as BERT, gives rise to representations that enable better prediction performance in human higher-level visual cortex (up to a 0.86 correlation with test data and an r-square of 0.75). Moreover, CLIP representations explain unique variance in these higher-level visual areas compared with models trained on only images or only text. Control experiments show that the improvement in prediction observed with CLIP is not due to architectural differences (transformer vs. convolution) or to the encoding of image captions per se (vs. single object labels). Together, our results indicate that CLIP and, more generally, multimodal models trained jointly on images and text may serve as better candidate models of representation in human higher-level visual cortex. The bridge between language and vision provided by jointly trained models such as CLIP also opens up new, more semantically rich ways of interpreting the visual brain.
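A voxelwise encoding model of the kind described above fits a separate regression from stimulus features to each voxel's response and scores it by the correlation between predicted and measured responses on held-out images. The sketch below assumes ridge regression (a common choice for such models, not confirmed by the abstract), z-scored features and responses so that no intercept is needed, and plain Python lists standing in for CLIP embeddings and NSD responses; all function names are hypothetical.

```python
def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: solve (X^T X + alpha*I) w = X^T y."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # back-substitution
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def voxelwise_encoding(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit one ridge model per voxel; return the held-out prediction r per voxel.

    X_*: stimulus features, one row per image (e.g. CLIP embeddings).
    Y_*: responses, one row per image and one column per voxel.
    """
    scores = []
    for v in range(len(Y_train[0])):
        w = ridge_fit(X_train, [row[v] for row in Y_train], alpha)
        pred = [sum(x * wi for x, wi in zip(row, w)) for row in X_test]
        scores.append(pearson(pred, [row[v] for row in Y_test]))
    return scores
```

In practice the ridge penalty `alpha` would be chosen per voxel by cross-validation, and a numerical library would replace the hand-written solver.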
NMC4 Short Talk: Untangling Contributions of Distinct Features of Images to Object Processing in Inferotemporal Cortex
How do humans perceive everyday objects with diverse features and organise them into seemingly intuitive and effortless categories? Prior literature on the inferotemporal region (IT) has revealed object category clustering consistent with a predefined semantic structure (superordinate, ordinate, subordinate). However, it has been debated whether the neural signals in IT reflect such a categorical hierarchy [Wen et al., 2018; Bracci et al., 2017]. Visual attributes of images that correlate with semantic and category dimensions may have confounded these prior results. Our study aimed to address this debate by building and comparing models based on the DNN AlexNet to explain the variance in the representational dissimilarity matrices (RDMs) of neural signals in IT. We found that mid- and high-level perceptual attributes of the DNN model contribute most to the neural RDMs in IT. Semantic categories (the predefined structure) were moderately correlated with mid to high DNN layers (r = 0.24 to 0.36). Variance partitioning analysis also showed that the IT neural representations were mostly explained by DNN layers, while semantic categorical RDMs contributed little additional information. In light of these results, we propose that future work focus more on the specific role IT plays in extracting and coding the visual features that lead to the emergence of categorical conceptualizations.
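The RDM comparison and variance partitioning mentioned above can be illustrated with a small sketch. It assumes each stimulus's response (neural or from a DNN layer) is a flat activation vector, uses 1 minus Pearson r as the dissimilarity and Spearman correlation to compare RDMs, and uses the textbook two-predictor R² formula for unique variance, where y would be the neural RDM, x1 a DNN-layer RDM and x2 a semantic RDM. The study's exact choices (distance metric, number of predictors, regression method) may differ, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def rdm(patterns):
    """Upper triangle (row by row) of the 1 - Pearson r dissimilarity matrix."""
    n = len(patterns)
    return [1 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]


def unique_variance(y, x1, x2):
    """Variance in y explained by x1 beyond x2: R^2(x1, x2) - R^2(x2)."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r2_full = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return r2_full - r2 ** 2
```

Spearman correlation is often preferred for comparing RDMs because it only assumes a monotonic, not linear, relationship between the dissimilarities of two systems.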
Neural network models of binocular depth perception
Our visual experience of living in a three-dimensional world is created from the information contained in the two-dimensional images projected onto our retinas. Because the visual fields of the two eyes overlap, their images are highly correlated, and the small differences between them are an important cue to depth. Binocular neurons encode this information in a way that both maximises efficiency and optimises disparity tuning for the depth structures found in our natural environment. Neural network models provide a clear account of how these binocular neurons encode the local binocular disparity in images. These models can be expanded to multi-layer models that are sensitive to salient features of scenes, such as the orientations of surfaces and the discontinuities between them. These deep neural network models have also shown the importance of binocular disparity for segmenting images into separate objects, in addition to estimating distance. These results demonstrate the usefulness of machine learning approaches as a tool for understanding biological vision.
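The idea of a local binocular disparity can be illustrated with classic correlation-based stereo matching: the shift that best aligns a patch of the left-eye image with the right-eye image is the disparity at that location. This is a textbook computer-vision technique swapped in for illustration, not the neural network models discussed in the talk; it works on single 1-D image rows, and the function names and parameters are hypothetical.

```python
def ncc(a, b):
    """Normalised cross-correlation (Pearson r) of two equal-length patches."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def estimate_disparity(left, right, center, half_width, max_disp):
    """Shift (in pixels) of the right-eye row that best matches the left-eye
    patch centred at `center`; positive means the match lies further right."""
    patch_l = left[center - half_width: center + half_width + 1]
    best_d, best_score = 0, float("-inf")
    for d in range(-max_disp, max_disp + 1):
        start = center - half_width + d
        patch_r = right[start: start + len(patch_l)]
        if start < 0 or len(patch_r) < len(patch_l):
            continue  # shifted window falls outside the image row
        score = ncc(patch_l, patch_r)
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

Repeating this at every position gives a disparity map; binocular-energy and learned network models can be seen as more biologically grounded ways of computing a similar interocular match.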
The wonders and complexities of brain microstructure: Enabling biomedical engineering studies combining imaging and models
Brain microstructure plays a key role in driving the transport of drug molecules administered directly to brain tissue, as in Convection-Enhanced Delivery procedures. This study reports the first systematic attempt to characterize the cytoarchitecture of commissural, long association and projection fibers, namely the corpus callosum, the fornix and the corona radiata. Ovine samples from three different subjects were imaged using scanning electron microscopy combined with focused ion beam milling, with particular focus on the axons. For each tract, relatively large volumes (including a significant number of axons) were reconstructed in 3D, and the outer axonal ellipticity, the outer axonal cross-sectional area and the corresponding perimeter were measured. This study [1] provides useful insight into the fibrous organization of the tissue, which can be described as a composite material of elliptical, tortuous, tubular fibers, and leads to a workflow that enables accurate simulations of drug delivery including well-resolved microstructural features. As a demonstration of these imaging and reconstruction techniques, our research analyses the hydraulic permeability of two white matter (WM) areas (corpus callosum and fornix) whose three-dimensional microstructure was reconstructed from electron microscopy images. Since white matter is mainly composed of elongated, parallel axons, we computed the permeability along the parallel and perpendicular directions using computational fluid dynamics [2]. The results show a statistically significant difference between parallel and perpendicular permeability, with a ratio of about 2 in both white matter structures analysed, demonstrating their anisotropic behaviour. This is in line with experimental results obtained by perfusion of brain matter [3].
Moreover, we find a significant difference between permeability in the corpus callosum and the fornix, which suggests that white matter heterogeneity should also be considered when modelling drug transport in the brain. Our findings, which demonstrate and quantify the anisotropic and heterogeneous character of white matter, are a fundamental contribution not only to drug delivery modelling but also to understanding interstitial transport mechanisms in the extracellular space. These and many other discoveries will be discussed during the talk.
References:
[1] https://www.researchsquare.com/article/rs-686577/v1
[2] https://www.pnas.org/content/118/36/e2105328118
[3] https://ieeexplore.ieee.org/abstract/document/9198110
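Permeability values like those compared in this abstract are typically obtained from Darcy's law: after a flow simulation (or experiment) with a known pressure drop across a sample, the intrinsic permeability follows from the resulting volumetric flow rate. The sketch assumes steady, slow (Darcy-regime) flow through a sample of uniform cross-section; the function names are hypothetical and the numbers used for illustration are not the study's data.

```python
def darcy_permeability(flow_rate, viscosity, length, area, pressure_drop):
    """Intrinsic permeability k (m^2) from Darcy's law, Q = k * A * dP / (mu * L).

    flow_rate: volumetric flow Q (m^3/s); viscosity: mu (Pa*s);
    length: sample length along the flow L (m); area: cross-section A (m^2);
    pressure_drop: dP across the sample (Pa).
    """
    return viscosity * flow_rate * length / (area * pressure_drop)


def anisotropy_ratio(k_parallel, k_perpendicular):
    """Ratio of permeability along the axons to permeability across them."""
    return k_parallel / k_perpendicular
```

For example, with hypothetical values Q = 2e-18 m^3/s, mu = 1e-3 Pa s, L = 1e-5 m, A = 1e-10 m^2 and dP = 1 Pa, `darcy_permeability` gives k = 2e-16 m^2; halving Q for the perpendicular direction would give an anisotropy ratio of 2.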
Target detection in the natural world
Animal sensory systems are optimally adapted to those features typically encountered in natural surrounds, thus allowing neurons that have a limited bandwidth to encode almost impossibly large input ranges. Importantly, natural scenes are not random, and peripheral visual systems have therefore evolved to reduce the predictable redundancy. The vertebrate visual cortex is also optimally tuned to the spatial statistics of natural scenes, but much less is known about how the insect brain responds to these. We are redressing this deficiency using several techniques. Olga Dyakova uses exquisite image manipulation to give natural images unnatural image statistics, or vice versa. Marissa Holden then uses these images as stimuli in electrophysiological recordings of neurons in the fly optic lobes, to see how the brain codes for the statistics typically encountered in natural scenes, and Olga Dyakova measures the behavioral optomotor response on our trackball set-up.
Appearance-based impression formation
Despite the common advice “not to judge a book by its cover”, we form impressions of character within a second of seeing a stranger’s face. These impressions have widespread consequences for society and for the economy, making it vital that we have a clear theoretical understanding of which impressions are important and how they are formed. In my talk, I outline a data-driven approach to answering these questions, starting by building models of the key dimensions underlying impressions of naturalistic face images. Overall, my findings suggest deeper links between the fields of face perception and social stereotyping than have previously been recognised.
Get more from your ISH brain slices with Stalefish
The standard method for staining structures in the brain is to slice the brain into 2D sections. Each slice is treated using a technique such as in-situ hybridization to examine the spatial expression of a particular molecule at a given developmental timepoint. Depending on the brain structures being studied, slices can be made coronally, sagittally, or at any angle thought to be optimal for analysis. However, assimilating the information presented in the 2D slice images into quantitative, informative 3D expression patterns is challenging. Even if expression levels are presented as voxels to give 3D expression clouds, it can be difficult to compare expression across individuals, and analysing such data requires significant expertise and imagination. In this talk, I will describe a new approach to examining histology slices, in which the user defines the brain structure of interest by drawing curves around it on each slice in a set, and specifies the depth of tissue from which to sample expression. The sampled 'curves' are then assembled into a 3D surface, which can be transformed onto a common reference frame for comparative analysis. I will show how other neuroscientists can obtain and use the tool, which is called Stalefish, to analyse their own image data with no (or minimal) changes to their slice preparation workflow.
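The per-slice sampling step described above can be sketched as follows. This is not Stalefish's code or API: the function names, the nearest-pixel sampling, and the choice of the tangent rotated by 90 degrees as the sampling direction (which side of the curve that is depends on the direction in which it was drawn) are assumptions made for illustration.

```python
import math


def sample_along_curve(image, curve, depth):
    """For each point on a user-drawn curve, average the pixel values sampled
    along the curve's normal from 0 to `depth` pixels.

    image: 2D list indexed image[row][col]; curve: list of (x, y) = (col, row).
    """
    samples = []
    for i, (x, y) in enumerate(curve):
        # tangent from neighbouring points (one-sided at the ends)
        x0, y0 = curve[max(i - 1, 0)]
        x1, y1 = curve[min(i + 1, len(curve) - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty) or 1.0
        nx, ny = -ty / norm, tx / norm  # tangent rotated by 90 degrees
        values = []
        for s in range(depth + 1):
            col, row = round(x + s * nx), round(y + s * ny)
            if 0 <= row < len(image) and 0 <= col < len(image[0]):
                values.append(image[row][col])
        samples.append(sum(values) / len(values) if values else float("nan"))
    return samples


def stack_slices(per_slice_samples, slice_thickness):
    """Assemble per-slice curve samples into (position_index, z, value)
    triples, i.e. the vertices of a simple 3D expression surface."""
    return [(j, k * slice_thickness, v)
            for k, row in enumerate(per_slice_samples)
            for j, v in enumerate(row)]
```

A real pipeline would also resample each curve to a common number of points so that rows from different slices line up before the surface is built and registered to a reference frame.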
Measuring relevant features of the social and physical environment with imagery
The use of images to create quantitative measures of urban perception has been explored in psychology, social science, urban planning and architecture over the last 50 years. Scaling up these measurements has become possible only in the last decade, owing to increased urban surveillance in the form of street view and satellite imagery and the accessibility of such data. This talk will present a series of projects that use imagery and CNNs to predict, measure and interpret the social and physical environments of our cities.
Learning the structure and investigating the geometry of complex networks
Networks are widely used as mathematical models of complex systems across many scientific disciplines, in particular neuroscience. In this talk, we introduce two aspects of our collaborative research: (1) machine learning and networks, and (2) graph dimensionality.
Machine learning and networks. Decades of work have produced a vast corpus of research characterising the topological, combinatorial, statistical and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. We have developed hcga, a framework for highly comparative analysis of graph data sets that computes several thousand graph features from any given network. Taking inspiration from hctsa, hcga offers a suite of statistical learning and data analysis tools for the automated identification and selection of important, interpretable features underpinning the characterisation of graph data sets. We show that hcga outperforms other methodologies (including deep learning) on supervised classification tasks on benchmark data sets while retaining the interpretability of network features, which we exemplify on a dataset of neuronal morphology images.
Graph dimensionality. Dimension is a fundamental property of objects and of the space in which they are embedded. Yet ideal notions of dimension, as in Euclidean spaces, do not always translate to physical spaces, which can be constrained by boundaries and distorted by inhomogeneities, or to intrinsically discrete systems such as networks. Departing from fractal-based approaches, we present a new framework that defines intrinsic notions of dimension on networks: the relative, local and global dimension. We showcase our method on various physical systems.
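The idea of treating graph properties as features can be sketched with a handful of classic measures computed from an edge list. This is a hand-written illustration, not the hcga API or its feature set (which runs to thousands of features); it assumes a simple undirected graph with no self-loops and no isolated nodes, since nodes are inferred from the edges, and the function name is hypothetical.

```python
from itertools import combinations


def graph_features(edges):
    """A few classic, interpretable features of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    clustering = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)  # local clustering is taken as 0 for degree < 2
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering.append(links / (k * (k - 1) / 2))
    return {
        "n_nodes": n,
        "n_edges": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "mean_degree": 2 * m / n if n else 0.0,
        "max_degree": max((len(s) for s in adj.values()), default=0),
        "avg_clustering": sum(clustering) / n if n else 0.0,
    }
```

Stacking such dictionaries for many graphs yields a feature matrix that a standard classifier can use, and each column stays interpretable (e.g. "avg_clustering"), which is the property the abstract emphasises.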
Analogical Reasoning Plus: Why Dissimilarities Matter
Analogical reasoning remains foundational to the human ability to forge meaningful patterns within the sea of information that continually inundates the senses. Yet meaningful patterns rely not only on the recognition of attributional similarities but also on dissimilarities. Just as the perception of images rests on the juxtaposition of lightness and darkness, reasoning relationally requires systematic attention to both similarities and dissimilarities. With that awareness, my colleagues and I have expanded the study of relational reasoning beyond analogical reasoning and attributional similarities to highlight forms based on the nature of core dissimilarities: anomalous, antinomous, and antithetical reasoning. In this presentation, I will delineate the character of these relational reasoning forms; summarize the procedures and measures used to assess them; review key research findings; and describe how the forms of relational reasoning work together in complex problem solving. Finally, I will share critical next steps for research that have implications for instructional practice.
Introducing YAPiC: An Open Source tool for biologists to perform complex image segmentation with deep learning
Robust detection of biological structures such as neuronal dendrites in brightfield micrographs, tumor tissue in histological slides, or pathological brain regions in MRI scans is a fundamental task in bio-image analysis. Detecting such structures requires complex decision-making that is often impossible with current image analysis software, so it is typically performed by humans in a tedious, time-consuming manual procedure. Supervised pixel classification based on Deep Convolutional Neural Networks (DNNs) is currently emerging as the most promising technique for such complex region detection tasks. Here, a self-learning artificial neural network is trained with a small set of manually annotated images and then identifies the trained structures in large image data sets in a fully automated way. While supervised pixel classification based on faster machine learning algorithms such as Random Forests is now part of the standard toolbox of bio-image analysts (e.g. Ilastik), the emerging tools based on deep learning are still rarely used. There is also little experience in the community regarding how much training data must be collected to obtain a reasonable prediction result with deep learning-based approaches. Our software YAPiC (Yet Another Pixel Classifier) provides easy-to-use Python and command-line interfaces and is designed specifically for intuitive pixel classification of multidimensional images with DNNs. To integrate well with the current open-source ecosystem, YAPiC uses the Ilastik user interface in combination with a high-performance GPU server for model training and prediction. Numerous research groups at our institute have already successfully applied YAPiC to a variety of tasks. In our experience, a surprisingly small amount of sparsely labelled data is needed to train a well-performing classifier for typical bioimaging applications.
Not least because of this, YAPiC has become the “standard weapon” of our core facility for detecting objects in hard-to-segment images. We will present use cases such as cell classification in high-content screening, tissue detection in histological slides, quantification of neural outgrowth in phase-contrast time series, and actin filament detection in transmission electron microscopy.
Statistical Summary Representations in Identity Learning: Exemplar-Independent Incidental Recognition
The literature suggests that ensemble coding, the ability to represent the gist of sets, may be an underlying mechanism for becoming familiar with newly encountered faces. This was investigated using a new training paradigm that involves incidental learning of target identities interspersed among distractors. The effectiveness of this training paradigm was explored in Study 1, which revealed that unfamiliar observers who learned the faces incidentally performed just as well as observers who were instructed to learn the faces, and that the intervening distractors did not disrupt familiarization. Using the same training paradigm, Study 2 investigated ensemble coding as an underlying mechanism for face familiarization by measuring familiarity with the targets at different time points, using average images created from either seen or unseen encounters with the target. Observers whose familiarity was tested using seen averages outperformed observers tested using unseen averages; however, this discrepancy diminished over time. In other words, successful recognition of the target faces became less reliant on the previously encountered exemplars over time, suggesting an exemplar-independent representation that is likely achieved through ensemble coding. Taken together, the results provide direct evidence for ensemble coding as a viable underlying mechanism for face familiarization, and show that faces interspersed among distractors can be learned incidentally.
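Average images of the kind used in Study 2 are usually built by aligning face photographs and averaging them pixel by pixel. The sketch below assumes the images are already aligned (for example by facial landmarks) and cropped to the same size, and represents each greyscale image as a list of pixel rows; the alignment step itself is not shown, and the function name is hypothetical.

```python
def average_image(images):
    """Pixel-wise mean of equally sized, pre-aligned greyscale images."""
    if not images:
        raise ValueError("need at least one image")
    h, w = len(images[0]), len(images[0][0])
    if any(len(img) != h or any(len(row) != w for row in img) for img in images):
        raise ValueError("all images must have the same shape")
    n = len(images)
    return [[sum(img[r][c] for img in images) / n for c in range(w)]
            for r in range(h)]
```

A "seen" average would be computed from the exemplars shown during training and an "unseen" average from other photographs of the same identity; averaging tends to wash out image-specific variation (lighting, expression) while preserving what is stable across encounters.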
Characterising the brain representations behind variations in real-world visual behaviour
Not all individuals are equally competent at recognizing the faces they interact with. Revealing how the brains of different individuals support variations in this ability is a crucial step toward understanding real-world human visual behaviour. In this talk, I will present findings from a large high-density EEG dataset (>100k trials of participants processing various stimulus categories) and computational approaches that aimed to characterise the brain representations behind the real-world proficiency of “super-recognizers”, individuals at the top of the face recognition ability spectrum. Using decoding analysis of time-resolved EEG patterns, we predicted with high precision the trial-by-trial activity of super-recognizer participants, and showed that evidence for variations in face recognition ability is distributed across early, intermediate and late brain processing steps. Computational modeling of the underlying brain activity uncovered two representational signatures supporting higher face recognition ability: (i) mid-level visual and (ii) semantic computations. The two components were dissociable in brain processing time (the former around the N170, the latter around the P600) and in level of computation (the former emerging from mid-level layers of visual Convolutional Neural Networks, the latter from a semantic model characterising sentence descriptions of images). I will conclude by presenting ongoing analyses of a well-known case of acquired prosopagnosia (PS) using similar computational modeling of high-density EEG activity.
Do leader cells drive collective behavior in Dictyostelium Discoideum amoeba colonies?
Dictyostelium discoideum (DD) is a fascinating single-celled organism. When nutrients are plentiful, DD cells act as autonomous individuals, foraging in their local vicinity. At the onset of starvation, a few (<0.1%) cells begin communicating with others by emitting a spike of the chemoattractant cyclic AMP (cAMP). Nearby cells sense the chemical gradient and respond by moving toward it and emitting a cAMP spike of their own. cAMP activity increases over time, and eventually a spiral wave emerges, attracting hundreds of thousands of cells to an aggregation center. How DD cells go from autonomous individuals to a collective entity has remained an open question for more than 60 years, a question whose answer would shed light on the emergence of multicellular life. Recently, trans-scale imaging has made it possible to measure cAMP activity at both the cell and colony levels. Using these images as well as toy simulation models, this research aims to clarify whether colony-level activity is in fact initiated by a few cells, which may be deemed "leader" or "pacemaker" cells. In this talk, I will demonstrate the use of information-theoretic techniques to classify leaders and followers based on trajectory data, and to infer the domain of interaction of leader cells. We validate the techniques on toy models where leaders and followers are known, and then try to answer the question in real data: do leader cells drive collective behavior in DD colonies?
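The abstract does not say which information-theoretic measures are used; transfer entropy is one common choice for detecting directed influence between time series, so the sketch below uses it as an assumption. It gives a plug-in estimate for discrete series with a history length of 1, where each series might be, for example, a cell's binned cAMP spikes or movement direction; the function names are hypothetical.

```python
import math
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(y_next, y_prev, x_prev)
             * log2[ p(y_next | y_prev, x_prev) / p(y_next | y_prev) ]
    """
    y_next, y_prev, x_prev = target[1:], target[:-1], source[:-1]
    n = len(y_next)
    c_xyz = Counter(zip(y_next, y_prev, x_prev))
    c_yx = Counter(zip(y_prev, x_prev))
    c_yy = Counter(zip(y_next, y_prev))
    c_y = Counter(y_prev)
    te = 0.0
    for (yn, yp, xp), count in c_xyz.items():
        p_full = count / c_yx[(yp, xp)]    # p(y_next | y_prev, x_prev)
        p_self = c_yy[(yn, yp)] / c_y[yp]  # p(y_next | y_prev)
        te += (count / n) * math.log2(p_full / p_self)
    return te


def leader_score(a, b):
    """Net information flow a -> b; positive values suggest a leads b."""
    return transfer_entropy(a, b) - transfer_entropy(b, a)
```

Real trajectory data would need careful discretisation, longer histories and significance testing against shuffled surrogates; in a toy check where a follower simply copies a random leader with a one-step lag, the leader-to-follower transfer entropy is close to 1 bit and the reverse is close to 0.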
Memorability: Prioritizing visual information for memory
There is a surprising consistency in the images we remember and forget – across observers, certain images are intrinsically more memorable than others in spite of our diverse individual experiences. The perception of images at different memorability levels also results in stereotyped patterns in visual and mnemonic regions in the brain, regardless of an individual’s actual memory for that item. In this talk, Dr. Bainbridge will discuss our current neuroscientific understanding of how memorability is represented in patterns in the brain, potentially serving as a signal for how stimulus information is prioritized for eventual memory encoding.
Faces influence saccade programming
Several studies have shown that face stimuli elicit extremely fast and involuntary saccadic responses toward them, relative to other categories of visual stimuli. In this talk, I will mainly focus on recent research by our team that investigated the extent to which face stimuli influence the programming and execution of saccades. Two experiments were performed using a saccadic choice task: two images (one with a face, one with a vehicle) were simultaneously displayed in the left and right visual fields, and participants had to execute a saccade toward the image (Experiment 1), or toward a cross added in the center of the image (Experiment 2), containing a target stimulus (a face or a vehicle). As expected, participants were faster to execute a saccade toward a face than toward a vehicle and made fewer errors. We also observed shorter saccades toward vehicle than toward face targets, even when participants were explicitly asked to direct their saccades to a specific location (Experiment 2). Further analyses, which I will detail in the talk, showed that error saccades might be interrupted mid-flight to initiate a concurrently programmed corrective saccade.
BrainGlobe: a Python ecosystem for computational (neuro)anatomy
Neuroscientists routinely perform experiments aimed at recording or manipulating neural activity, uncovering physiological processes underlying brain function or elucidating aspects of brain anatomy. Understanding how the brain generates behaviour ultimately depends on merging the results of these experiments into a unified picture of brain anatomy and function. We present BrainGlobe, a new initiative aimed at developing common Python tools for computational neuroanatomy. These include cellfinder for fast, accurate cell detection in whole-brain microscopy images, brainreg for aligning images to a reference atlas, and brainrender for visualisation of anatomically registered data. These software packages are developed around the BrainGlobe Atlas API. This API provides a common Python interface to download and interact with reference brain atlases from multiple species (including human, mouse and larval zebrafish). This allows software to be developed agnostic to the atlas and species, increasing adoption and interoperability of software tools in neuroscience.
Application of Airy beam light sheet microscopy to examine early neurodevelopmental structures in 3D hiPSC-derived human cortical spheroids
The inability to observe relevant biological processes in vivo significantly restricts human neurodevelopmental research. Advances in appropriate in vitro model systems, including patient-specific human brain organoids and human cortical spheroids (hCSs), offer a pragmatic solution to this issue. In particular, hCSs are an accessible method for generating homogeneous organoids of dorsal telencephalic fate, which recapitulate key aspects of human corticogenesis, including the formation of neural rosettes, the in vitro correlates of the neural tube. These neurogenic niches give rise to neural progenitors that subsequently differentiate into neurons. Studies differentiating human induced pluripotent stem cells (hiPSCs) in 2D have linked atypical neural rosette formation with neurodevelopmental disorders such as autism spectrum conditions. Thus far, however, conventional methods of tissue preparation in this field limit the ability to image these structures in three dimensions within intact hCSs or other 3D preparations. To overcome this limitation, we have sought to optimise a method for processing hCSs that maximises the utility of a novel Airy-beam light sheet microscope (ALSM) for acquiring high-resolution volumetric images of internal structures within hCSs representative of early developmental time points.
The neuroscience of color and what makes primates special
Among mammals, excellent color vision has evolved only in certain non-human primates. And yet, color is often assumed to be just a low-level stimulus feature with a modest role in encoding and recognizing objects. The rationale for this dogma is compelling: object recognition is excellent in grayscale images (consider black-and-white movies, where faces, places, objects, and story are readily apparent). In my talk I will discuss experiments in which we used color as a tool to uncover an organizational plan in inferior temporal cortex (parallel, multistage processing for places, faces, colors, and objects) and a visual-stimulus functional representation in prefrontal cortex (PFC). The discovery of an extensive network of color-biased domains within IT and PFC, regions implicated in high-level object vision and executive functions, compels a re-evaluation of the role of color in behavior. I will discuss behavioral studies prompted by the neurobiology that uncover a universal principle for color categorization across languages, the first systematic study of the color statistics of objects and a chromatic mechanism by which the brain may compute animacy, and a surprising paradoxical impact of memory on face color. Taken together, my talk will put forward the argument that color is not primarily for object recognition, but rather for the assessment of the likely behavioral relevance, or meaning, of the stuff we see.
Do deep learning latent spaces resemble human brain representations?
In recent years, artificial neural networks have demonstrated human-like or super-human performance in many tasks including image or speech recognition, natural language processing (NLP), playing Go, chess, poker and video-games. One remarkable feature of the resulting models is that they can develop very intuitive latent representations of their inputs. In these latent spaces, simple linear operations tend to give meaningful results, as in the well-known analogy QUEEN-WOMAN+MAN=KING. We postulate that human brain representations share essential properties with these deep learning latent spaces. To verify this, we test whether artificial latent spaces can serve as a good model for decoding brain activity. We report improvements over state-of-the-art performance for reconstructing seen and imagined face images from fMRI brain activation patterns, using the latent space of a GAN (Generative Adversarial Network) model coupled with a Variational AutoEncoder (VAE). With another GAN model (BigBiGAN), we can decode and reconstruct natural scenes of any category from the corresponding brain activity. Our results suggest that deep learning can produce high-level representations approaching those found in the human brain. Finally, I will discuss whether these deep learning latent spaces could be relevant to the study of consciousness.
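The analogy arithmetic mentioned above amounts to vector addition followed by a nearest-neighbour search. The 3-dimensional vectors below are hand-made toys chosen so that the arithmetic works out, not learned embeddings or GAN latents; cosine similarity as the distance measure and the exclusion of the query words from the answers are common conventions assumed here, and the names are hypothetical.

```python
import math

# Toy, hand-made "latent vectors"; dimensions loosely stand for
# (royalty, femaleness, other). Learned latent spaces are far larger.
EMBEDDINGS = {
    "king":  [1.0, 0.0, 0.1],
    "queen": [1.0, 1.0, 0.1],
    "man":   [0.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "apple": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vocab=EMBEDDINGS):
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vocab[w], target))
```

Here `analogy("queen", "woman", "man")` returns "king", mirroring QUEEN-WOMAN+MAN=KING; the claim in the talk is that brain activity can be mapped into latent spaces where such linear operations are meaningful.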
A machine learning way to analyse white matter tractography streamlines / Application of artificial intelligence in correcting motion artifacts and reducing scan time in MRI
1. Embedding is all you need: A machine learning way to analyse white matter tractography streamlines - Dr Shenjun Zhong, Monash Biomedical Imaging
Embedding white matter streamlines of various lengths into fixed-length latent vectors enables users to analyse them with general data mining techniques. However, finding a good embedding scheme is still challenging, as existing methods based on spatial coordinates rely on manually engineered features and/or labelled datasets. In this webinar, Dr Shenjun Zhong will discuss his novel deep learning model, which learns a latent space and solves the problem of streamline clustering without labelled data. Dr Zhong is a Research Fellow and Informatics Officer at Monash Biomedical Imaging. His research interests are sequence modelling, reinforcement learning and federated learning in the general medical imaging domain.
2. Application of artificial intelligence in correcting motion artifacts and reducing scan time in MRI - Dr Kamlesh Pawar, Monash Biomedical Imaging
Magnetic Resonance Imaging (MRI) is a widely used imaging modality in clinics and research. Although useful, MRI has the overhead of longer scan times than other medical imaging modalities. Longer scans also make patients uncomfortable, and even subtle movements during the scan may result in severe motion artifacts in the images. In this seminar, Dr Kamlesh Pawar will discuss how artificial intelligence techniques can reduce scan time and correct motion artifacts. Dr Pawar is a Research Fellow at Monash Biomedical Imaging. His research interests include deep learning, MR physics, MR image reconstruction and computer vision.
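For comparison with the learned embedding described above, a simple non-learned baseline turns a streamline of any length into a fixed-length vector by resampling it to a fixed number of points equally spaced along its arc length. The sketch also computes the minimum average direct-flip (MDF) distance, which accounts for streamlines having no preferred direction. This is not Dr Zhong's model; the 12-point default and the function names are arbitrary, hypothetical choices, and streamlines are assumed to have at least two points.

```python
import math


def resample(streamline, n_points=12):
    """Resample a 3-D polyline to n_points equally spaced by arc length."""
    seg = [math.dist(p, q) for p, q in zip(streamline, streamline[1:])]
    total = sum(seg)
    out, i, acc = [], 0, 0.0
    for k in range(n_points):
        target = total * k / (n_points - 1)
        # advance to the segment containing the target arc length
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else (target - acc) / seg[i]
        t = min(max(t, 0.0), 1.0)
        p, q = streamline[i], streamline[i + 1]
        out.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return out


def embed(streamline, n_points=12):
    """Fixed-length vector: flattened coordinates of the resampled streamline."""
    return [c for p in resample(streamline, n_points) for c in p]


def mdf(s1, s2, n_points=12):
    """Minimum average direct-flip distance between two streamlines."""
    a, b = resample(s1, n_points), resample(s2, n_points)
    direct = sum(math.dist(p, q) for p, q in zip(a, b)) / n_points
    flipped = sum(math.dist(p, q) for p, q in zip(a, reversed(b))) / n_points
    return min(direct, flipped)
```

Such coordinate-based vectors are exactly the kind of hand-engineered representation the talk aims to improve on with a learned latent space.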
Uncertainty in perceptual decision-making
Whether we are deciding about Covid-related restrictions, estimating a ball’s trajectory when playing tennis, or interpreting radiological images, almost any choice we make is based on uncertain evidence. How do we infer whether information is more or less reliable when making these decisions? How does the brain represent knowledge of this uncertainty? In this talk, I will present recent neuroimaging data combined with novel analysis tools to address these questions. Our results indicate that sensory uncertainty can be reliably estimated from the human visual cortex on a trial-by-trial basis, and moreover that observers appear to rely on this uncertainty when making perceptual decisions.
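As a toy illustration of what estimating uncertainty on a single trial can mean, the sketch below decodes a full posterior over a stimulus feature from one noisy population response and summarizes its width. Everything here is an assumption for illustration: orientation as the decoded feature, eight von Mises-tuned units, independent Gaussian noise, and a flat prior. It is not the analysis used in the talk.

```python
import math
from typing import List

GRID = [2.0 * i for i in range(90)]   # candidate orientations in degrees (0, 2, ..., 178)
PREFS = [22.5 * i for i in range(8)]  # hypothetical preferred orientations of 8 units


def tuning(theta: float, pref: float, kappa: float = 2.0) -> float:
    """Von Mises tuning curve on the 180-degree orientation circle (peak = 1)."""
    return math.exp(kappa * (math.cos(math.radians(2 * (theta - pref))) - 1))


def posterior(response: List[float], noise_sd: float) -> List[float]:
    """Posterior over GRID given responses with independent Gaussian noise and a flat prior."""
    loglik = [
        -sum((r - tuning(th, p)) ** 2 for r, p in zip(response, PREFS)) / (2 * noise_sd ** 2)
        for th in GRID
    ]
    peak = max(loglik)  # subtract the maximum for numerical stability
    weights = [math.exp(l - peak) for l in loglik]
    total = sum(weights)
    return [w / total for w in weights]


def circular_sd_deg(post: List[float]) -> float:
    """Circular standard deviation of the posterior in degrees: the trial's uncertainty."""
    c = sum(p * math.cos(math.radians(2 * th)) for p, th in zip(post, GRID))
    s = sum(p * math.sin(math.radians(2 * th)) for p, th in zip(post, GRID))
    r = min(math.hypot(c, s), 1.0)  # guard against rounding slightly above 1
    return math.degrees(math.sqrt(-2 * math.log(r))) / 2
```

Decoding the same response under a larger noise level yields a wider posterior, which this sketch reports as higher uncertainty; with real data the noise model itself has to be estimated.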
Top-down Modulation in Human Visual Cortex
Human vision displays a remarkable ability to recognize objects in the surrounding environment even in the absence of a complete visual representation of these objects. This happens almost effortlessly, and it was not until scientists had to tackle this problem in computer vision that they appreciated its complexity. Although artificial vision systems have made great strides, exceeding human-level performance on standard vision tasks, they have yet to achieve a similar level of robustness. One source of the robustness of human vision is its extensive connectivity, which is not limited to a feedforward hierarchical pathway like that of current state-of-the-art deep convolutional neural networks but also comprises recurrent and top-down connections. These connections allow the human brain to enhance the neural representations of degraded images in accordance with meaningful representations stored in memory. The mechanisms by which these different pathways interact are still not understood. In this seminar, studies concerning the effect of recurrent and top-down modulation on the neural representations evoked by blurred images will be presented. These studies aimed to uncover the role of recurrent and top-down connections in human vision. The results challenge the notion of predictive coding as a mechanism for top-down modulation of visual information during natural vision. They show that enhancement of neural representations (sharpening) appears to be the more dominant process across different levels of the visual hierarchy. They also show that inference in visual recognition is achieved through a Bayesian process combining incoming visual information with priors from deeper processing regions in the brain.
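The Bayesian account in the last sentence can be made concrete with the simplest case: a Gaussian prior (a memory-based expectation) combined with a Gaussian likelihood (the incoming visual evidence). This one-dimensional sketch is an illustrative assumption, not the model used in the studies; blurring is represented only as a larger observation variance, and the numbers are hypothetical.

```python
from typing import Tuple


def gaussian_posterior(prior_mean: float, prior_var: float,
                       obs_mean: float, obs_var: float) -> Tuple[float, float]:
    """Conjugate Gaussian update: a precision-weighted average of prior and evidence."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs_mean)
    return post_mean, post_var


# Hypothetical numbers: a prior centred on 0 and evidence pointing to 10.
clear = gaussian_posterior(0.0, 1.0, 10.0, 1.0)    # clear image: reliable evidence
blurred = gaussian_posterior(0.0, 1.0, 10.0, 9.0)  # blurred image: noisier evidence
```

The blurrier the input, the more the posterior is pulled toward the prior, and the posterior variance is always smaller than either source's variance alone.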
Global visual salience of competing stimuli
Current computational models of visual salience accurately predict the distribution of fixations on isolated visual stimuli. It is not known, however, whether the global salience of a stimulus, that is, its effectiveness in the competition for attention with other stimuli, is a function of its local salience or an independent measure. Further, do task and familiarity with the competing images influence eye movements? In this talk, I will present the analysis of a computational model of the global salience of natural images. We trained a machine learning algorithm to learn the direction of the first saccade of participants who freely observed pairs of images. The pairs balanced the combinations of new and already seen images, as well as task and task-free trials. The coefficients of the model provided a reliable measure of the likelihood of each image to attract the first fixation when seen next to another image, that is, its global salience. For example, images of close-up faces and images containing humans were consistently looked at first and were assigned higher global salience. Interestingly, we found that global salience cannot be explained by the feature-driven local salience of images. The influence of task and familiarity was rather small, and we reproduced the previously reported left-sided bias. This computational model of global salience makes it possible to analyse multiple other aspects of human visual perception of competing stimuli. In the talk, I will also present our latest results from analysing the saccadic reaction time as a function of the global salience of the pair of images.
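A model of this kind can be sketched as a logistic (Bradley-Terry-style) regression on image pairs: each image gets one weight, the probability that the first saccade goes left depends on the difference between the left and right weights, and an intercept absorbs any left-sided bias. This is a hypothetical reconstruction for illustration only; the abstract does not specify the algorithm, and the function name, learning rate, L2 penalty and data below are assumptions.

```python
import math
from typing import List, Tuple

Trial = Tuple[int, int, int]  # (left_image, right_image, looked_left in {0, 1})


def fit_global_salience(trials: List[Trial], n_images: int, lr: float = 0.1,
                        epochs: int = 2000, l2: float = 1e-3) -> Tuple[List[float], float]:
    """Gradient ascent on the penalized log-likelihood of
    P(first saccade left) = sigmoid(bias + w[left] - w[right])."""
    w = [0.0] * n_images
    bias = 0.0
    n = len(trials)
    for _ in range(epochs):
        grad_w = [0.0] * n_images
        grad_b = 0.0
        for left, right, looked_left in trials:
            p_left = 1.0 / (1.0 + math.exp(-(bias + w[left] - w[right])))
            err = looked_left - p_left  # derivative of the log-likelihood w.r.t. the logit
            grad_w[left] += err
            grad_w[right] -= err
            grad_b += err
        # The L2 penalty keeps weights finite and fixes their arbitrary common offset.
        w = [wi + lr * (g / n - l2 * wi) for wi, g in zip(w, grad_w)]
        bias += lr * grad_b / n
    return w, bias


# Hypothetical data: image 0 beats image 1, which beats image 2, in 4 of 5 trials,
# with every pair shown in both left/right arrangements.
trials: List[Trial] = []
for a in range(3):
    for b in range(3):
        if a != b:
            for k in range(5):
                more_salient_left = a < b
                trials.append((a, b, int((k < 4) == more_salient_left)))

weights, left_bias = fit_global_salience(trials, n_images=3)
```

The fitted weights play the role of global salience scores, and a clearly positive intercept would correspond to a left-sided bias; in this balanced toy data set the intercept stays near zero.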
The Gist of False Memory
It has long been known that when viewing a set of images, we misjudge individual elements as being closer to the mean than they are (Hollingworth, 1910) and recall seeing the (absent) set mean (Deese, 1959; Roediger & McDermott, 1995). Recent studies found that viewing sets of images, simultaneously or sequentially, leads to perception of set statistics (mean, range) with poor memory for individual elements. Ensemble perception has been found for sets of simple images (e.g. circles varying in size or brightness; lines of varying orientation), complex objects (e.g. faces of varying emotion), and objects belonging to the same category. Even when the viewed set does not include its mean or prototype, observers report and act as if they have seen this central image or object, a form of false memory. Physiologically, detailed sensory information at cortical input levels is processed hierarchically to form an integrated scene gist at higher levels. However, we are aware of the gist before the details. We propose that images and objects belonging to a set or category are represented as their gist, mean or prototype, plus individual differences from that gist. Under constrained viewing conditions, only the gist is perceived and remembered. This theory also provides a basis for compressed neural representation. Extending it to scenes and episodes supplies a generalized basis for false memories: they seem right and match generalized expectations, so they are believed without critical examination. The theory could be tested by analyzing the typicality of false memories compared with rejected alternatives.
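The proposed representation (gist plus individual differences) can be illustrated with a scalar feature such as circle size. In this toy sketch, a hypothetical `fidelity` parameter scales how much of each item's deviation survives: 1 reproduces the items, values between 0 and 1 pull recall toward the mean, and 0 leaves only the gist, so every item is recalled as the (possibly never seen) set mean. The names and numbers are assumptions for illustration.

```python
from statistics import mean
from typing import List, Tuple


def encode_set(values: List[float]) -> Tuple[float, List[float]]:
    """Represent a set as its gist (mean) plus each item's deviation from it."""
    gist = mean(values)
    return gist, [v - gist for v in values]


def recall(gist: float, residuals: List[float], fidelity: float = 1.0) -> List[float]:
    """Reconstruct items; fidelity 1 = exact, 0 = gist only (constrained viewing)."""
    return [gist + fidelity * r for r in residuals]


sizes = [2.0, 4.0, 8.0, 10.0]  # hypothetical circle sizes; their mean, 6, is not in the set
gist, residuals = encode_set(sizes)
```

With `fidelity=0.0`, recall returns the absent mean for every item, a toy analogue of the false memory described above; intermediate values reproduce the bias toward the mean.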
Learning Neurobiology with electric fish
Electric Gymnotiform fish live in muddy, shallow waters near the shore, hiding in the dense filamentous roots of floating plants such as Eichhornia crassipes (“camalote”). They explore their surroundings using a series of electric pulses that serve as a self-emitted carrier of electrosensory signals. This carrier propagates at the speed of light through this spongiform habitat and is barely sensed by the lateral line of predators and prey. The emitted field polarizes the surroundings according to their impedance relative to water, which in turn modifies the profile of transcutaneous currents, considered as an electrosensory image. Using this system, pulse Gymnotiformes create an electrosensory bubble in which an object’s location, impedance, size and other characteristics are discriminated and probably recognized. Although consciousness has not been convincingly demonstrated, cognitive functions such as volition, attention, and path integration have been shown. Here I will summarize different aspects of the electromotor-electrosensory loop of pulse Gymnotiformes. First, I will address how objects are polarized with a stereotyped but temporospatially complex electric field, consisting of brief pulses emitted at regular intervals. This relies on complex electric organs activated quasi-periodically through an electromotor coordination system driven by a pacemaker in the medulla. Second, I will deal with the imaging mechanisms of pulse gymnotiform fish and the presence of two regions in the electrosensory field: a rostral region, where the field time course is coherent and the field vector direction is constant throughout the electric organ discharge, and a lateral region, where the field time course is site-specific and the field vector direction describes a stereotyped 3D trajectory. Third, I will describe the electrosensory mosaic and its characteristics.
Receptors and primary afferents correspond one to one, with subtypes that respond optimally to the time course of the self-generated pulse with a characteristic train of spikes. While polarized objects in the rostral region project their electric images onto the perioral region, where electrosensory receptor density, subtypes and central projection are maximal, the images of objects on the side recruit a single type of scattered receptor. Therefore, the rostral mosaic has been likened to an electrosensory fovea and its receptive field referred to as the foveal field; the rest of the mosaic and field are referred to as peripheral. Finally, I will describe ongoing work on early processing structures. I will try to generate an integrated view, including anatomical and functional data obtained in vitro, acute experiments, and unitary recordings in freely moving fish. We have recently shown that these fish track allo-generated fields and the virtual fields generated by nearby objects in the presence of self-generated fields to explore the nearby environment. These data, together with the presence of a multimodal receptor mosaic at the cutaneous surface, particularly around the mouth, and an important role of proprioception in early sensory processing, suggest the hypothesis that the active electrosensory system is part of a multimodal haptic sense.
Mechanism(s) of negative feedback from horizontal cells to cones and its consequence for (color) vision
Vision starts in the retina, where images are transformed and coded into neuronal activity relevant for the brain. These coding steps function optimally over a wide range of conditions: from a bright day on the beach to a moonless night. Under these very different conditions, specific retinal mechanisms continue to select relevant aspects of the visual world and send this information to the brain. We are studying the neuronal processing involved in these selection and adaptation processes. This knowledge is essential for understanding how the visual system works and forms the basis for research dedicated to restoring vision in blind people.
Student's Oral Presentation III: Emotional State Classification Using Low-Cost Single-Channel Electroencephalography
Although electroencephalography (EEG) has been used in clinical and research studies for almost a century, recent technological advances have made the equipment and processing tools more accessible outside laboratory settings. These low-cost alternatives can achieve satisfactory results in experiments such as detecting event-related potentials and classifying cognitive states. In our research, we use low-cost single-channel EEG to classify brain activity during the presentation of images of opposite emotional valence from the OASIS database. Emotional classification has already been achieved using research-grade and commercial-grade equipment, but our approach pioneers the use of educational-grade equipment for this task. EEG data are collected with a Backyard Brains SpikerBox, a low-cost, open-source bioamplifier that records a single-channel electrical signal from a pair of electrodes placed on the scalp, and are then used to train machine learning classifiers.
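One common way to turn a single-channel recording into classifier input is to compute power in conventional frequency bands and feed the log band powers to a simple classifier. The sketch below assumes this pipeline purely for illustration, since the abstract does not say which features or classifiers were used: a naive DFT, theta/alpha/beta bands, and a nearest-centroid classifier, exercised on synthetic sinusoids with hypothetical labels rather than real EEG.

```python
import math
from typing import Dict, List, Sequence

# Conventional EEG band limits in Hz (lower bound inclusive, upper bound exclusive).
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_powers(signal: Sequence[float], fs: float) -> Dict[str, float]:
    """Sum the DFT power of one channel within each band (naive O(n^2) DFT)."""
    n = len(signal)
    mu = sum(signal) / n
    x = [s - mu for s in signal]  # remove the DC offset
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += (re * re + im * im) / n
    return powers


def features(signal: Sequence[float], fs: float) -> List[float]:
    """Log band powers; the small constant avoids log(0)."""
    bp = band_powers(signal, fs)
    return [math.log(bp[name] + 1e-12) for name in BANDS]


class NearestCentroid:
    """Assign each feature vector to the class with the closest mean vector."""

    def fit(self, X: List[List[float]], y: List[str]) -> "NearestCentroid":
        self.centroids = {}
        for label in set(y):
            rows = [xi for xi, yi in zip(X, y) if yi == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: List[float]) -> str:
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])))


FS = 128.0  # hypothetical sampling rate in Hz


def sine(freq: float, phase: float) -> List[float]:
    """One-second synthetic 'epoch' (a stand-in for real EEG)."""
    return [math.sin(2 * math.pi * freq * t / FS + phase) for t in range(int(FS))]


X = [features(sine(10.0, ph), FS) for ph in (0.0, 0.5, 1.0)] + \
    [features(sine(20.0, ph), FS) for ph in (0.0, 0.5, 1.0)]
y = ["class_a"] * 3 + ["class_b"] * 3  # hypothetical labels, not real valence classes
clf = NearestCentroid().fit(X, y)
```

Log-transforming band powers keeps very large and very small powers on a comparable scale before computing distances; a real pipeline would also need artifact rejection and cross-validation.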
Agency in the Stream of Consciousness: Perspectives from Cognitive Science and Buddhist Psychology
The stream of consciousness refers to ideas, images, and memories that meander across the mind when we are otherwise unoccupied. The standard view is that these thoughts are associationistic in character and they arise from subpersonal processes—we are for the most part passive observers of them. Drawing on a series of laboratory studies we have conducted as well as Buddhist models of mind, I argue that these views are importantly incorrect. On the alternative view I put forward, these thoughts arise from minimal decision processes, which lie in a grey zone: They are both manifestations of agency as well as obstacles to it.
Minimal Images: Beyond ‘Core Recognition’
Domain Specificity in the Human Brain: What, Whether, and Why?
The last quarter century has provided extensive evidence that some regions of the human cortex are selectively engaged in processing a single specific domain of information, from faces, places, and bodies to language, music, and other people’s thoughts. This work dovetails with earlier theories in cognitive science highlighting domain specificity in human cognition, development, and evolution. But many questions remain unanswered about even the clearest cases of domain specificity in the brain, the selective engagement of the FFA, PPA, and EBA in the perception of faces, places, and bodies, respectively. First, these claims lack precision, saying little about what is computed and how, and relying on human judgements to decide what counts as a face, place, or body. Second, they provide no account of the reliably varying responses of these regions across different “preferred” images, or across different “nonpreferred” images for each category. Third, the category selectivity of each region is vulnerable to refutation if any of the vast set of as-yet-untested nonpreferred images turns out to produce a stronger response than preferred images for that region. Fourth, and most fundamentally, they provide no account of why, from a computational point of view, brains should exhibit this striking degree of functional specificity in the first place, and why we should have the particular visual specializations we do, for faces, places, and bodies, but not (apparently) for food or snakes. The advent of convolutional neural networks (CNNs) to model visual processing in the ventral pathway has opened up many opportunities to address these long-standing questions in new ways. I will describe ongoing efforts in our lab to harness CNNs to do just that.
Natural stimulus encoding in the retina with linear and nonlinear receptive fields
Popular notions of how the retina encodes visual stimuli typically focus on the center-surround receptive fields of retinal ganglion cells, the output neurons of the retina. In this view, the receptive field acts as a linear filter on the visual stimulus, highlighting spatial contrast and providing efficient representations of natural images. Yet, we also know that many ganglion cells respond vigorously to fine spatial gratings that should not activate the linear filter of the receptive field. Thus, ganglion cells may integrate visual signals nonlinearly across space. In this talk, I will discuss how these (and other) nonlinearities relate to the encoding of natural visual stimuli in the retina. Based on electrophysiological recordings of ganglion and bipolar cells from mouse and salamander retina, I will present methods for assessing nonlinear processing in different cell types and examine their importance and potential function under natural stimulation.
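The contrast between a linear receptive field and nonlinear spatial integration can be illustrated with a toy one-dimensional model: a fine grating leaves a Gaussian-weighted linear sum near zero, whereas rectifying small subunits before pooling yields a clear response. The Gaussian centre (without surround), the subunit width, and the grating period are all hypothetical choices for illustration, not fitted to the recordings discussed in the talk.

```python
import math
from typing import Dict, List

XS: List[int] = list(range(-40, 41))  # 1-D positions across the receptive field (arbitrary units)
SIGMA = 8.0                           # hypothetical width of the Gaussian receptive-field centre


def weight(x: float) -> float:
    """Gaussian sensitivity profile of the receptive-field centre."""
    return math.exp(-x * x / (2 * SIGMA ** 2))


def grating(period: float, phase: float = 0.0) -> Dict[int, float]:
    """Sinusoidal contrast grating with zero mean."""
    return {x: math.cos(2 * math.pi * x / period + phase) for x in XS}


def linear_response(stim: Dict[int, float]) -> float:
    """Linear-nonlinear model: weight and sum the whole field, then rectify once."""
    return max(0.0, sum(weight(x) * stim[x] for x in XS))


def subunit_response(stim: Dict[int, float], width: int = 2) -> float:
    """Subunit model: rectify each small subunit's drive before weighted pooling."""
    total = 0.0
    for i in range(0, len(XS) - width + 1, width):
        block = XS[i:i + width]
        drive = sum(stim[x] for x in block)
        centre = sum(block) / width
        total += weight(centre) * max(0.0, drive)
    return total


fine = grating(period=8.0)        # fine relative to the centre, coarse relative to subunits
uniform = {x: 1.0 for x in XS}    # full-field contrast increment
```

Both models respond to the uniform stimulus, but only the subunit model responds to the fine grating, mirroring ganglion cells that fire vigorously to gratings their linear receptive field should not detect. With a coarse grating (for example `grating(80.0)`), both models respond.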
Mind the gradient: context-dependent selectivity to natural images in the retina revealed with a novel perturbative approach
COSYNE 2022
Predictive processing of natural images by V1 firing rates revealed by self-supervised deep neural networks
COSYNE 2022
A Large Dataset of Macaque V1 Responses to Natural Images Revealed Complexity in V1 Neural Codes
COSYNE 2023
Efficient coding of chromatic natural images reveals unique hues
COSYNE 2025
A novel approach to obtain high-resolution images of the electrical activity of the spinal cord.
COSYNE 2025
Selectivity of neurons in macaque V4 for object and texture images
COSYNE 2025
Comparing CNNs and the brain: sensitivity to images altered in the frequency domain
Neuromatch 5
Differential representation of natural and manmade images in the human ventral visual stream
Neuromatch 5