Convolutional Neural Network
Connecting performance benefits on visual tasks to neural mechanisms using convolutional neural networks
Behavioral studies have demonstrated that certain task features reliably enhance classification performance for challenging visual stimuli. These include extended image presentation time and the valid cueing of attention. Here, I will show how convolutional neural networks can be used as a model of the visual system that connects neural activity changes with such performance changes. Specifically, I will discuss how different anatomical forms of recurrence can account for better classification of noisy and degraded images with extended processing time. I will then show how experimentally-observed neural activity changes associated with feature attention lead to observed performance changes on detection tasks. I will also discuss the implications these results have for how we identify the neural mechanisms and architectures important for behavior.
Feedforward and feedback processes in visual recognition
Progress in deep learning has spawned great successes in many engineering applications. As a prime example, convolutional neural networks, a type of feedforward neural network, are now approaching – and sometimes even surpassing – human accuracy on a variety of visual recognition tasks. In this talk, however, I will show that these neural networks and their recent extensions exhibit a limited ability to solve seemingly simple visual reasoning problems involving incremental grouping, similarity, and spatial relation judgments. Our group has developed a recurrent network model of classical and extra-classical receptive field circuits that is constrained by the anatomy and physiology of the visual cortex. The model was shown to account for diverse visual illusions, providing computational evidence for a novel canonical circuit that is shared across visual modalities. I will show that this computational neuroscience model can be turned into a modern end-to-end trainable deep recurrent network architecture that addresses some of the shortcomings exhibited by state-of-the-art feedforward networks for solving complex visual reasoning tasks. This suggests that neuroscience may contribute powerful new ideas and approaches to computer science and artificial intelligence.
Adaptive neural network classifier for decoding finger movements
While non-invasive brain-computer interfaces can accurately classify the lateralization of hand movements, distinguishing the activation of individual fingers on the same hand is limited by their local and overlapping representation in the motor cortex. In particular, the low signal-to-noise ratio limits the opportunity to identify meaningful patterns in a supervised fashion. Here we combined magnetoencephalography (MEG) recordings with advanced decoding strategies to classify finger movements at the single-trial level. We recorded eight subjects performing a serial reaction time task, in which they pressed four buttons with their left and right index and middle fingers. We evaluated the classification performance of hand and finger movements with increasingly complex approaches: supervised common spatial patterns and logistic regression (CSP + LR) and unsupervised linear finite convolutional neural network (LF-CNN). Classification of right versus left fingers was above 90% accurate for all methods. However, classification of single fingers yielded the following accuracies: CSP+SVM: 68 ± 7%, LF-CNN: 71 ± 10%. The CNN methods allowed the inspection of spatial and spectral patterns, which reflected activity in the motor cortex in the theta and alpha ranges. Thus, we have shown that the use of CNNs in decoding MEG single trials with a low signal-to-noise ratio is a promising approach that, in turn, could be extended to a manifold of problems in clinical and cognitive neuroscience.
Time as a continuous dimension in natural and artificial networks
Neural representations of time are central to our understanding of the world around us. I review cognitive, neurophysiological and theoretical work that converges on three simple ideas. First, the time of past events is remembered via populations of neurons with a continuum of functional time constants. Second, these time constants evenly tile the log time axis. This results in a neural Weber-Fechner scale for time which can support behavioral Weber-Fechner laws and characteristic behavioral effects in memory experiments. Third, these populations appear as dual pairs: one type of population contains cells that change firing rate monotonically over time, and a second type contains cells with circumscribed temporal receptive fields. These ideas can be used to build artificial neural networks that have novel properties. Of particular interest, a convolutional neural network built using these principles can generalize to arbitrary rescaling of its inputs. That is, after learning to perform a classification task on a time series presented at one speed, it successfully classifies stimuli presented slowed down or sped up. This result illustrates the point that this confluence of ideas originating in cognitive psychology and measured in the mammalian brain could have wide-reaching impacts on AI research.
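One way to see why log-spaced time constants help with rescaled inputs is the minimal NumPy sketch below. It is an illustration only, not the speaker's model: the parameter values, the exponential-decay cells, and the names `taus` and `memory_pattern` are assumptions made for this example. It shows that slowing an input by a fixed factor translates the population pattern along the cell axis, which is the kind of shift a convolution over that axis can absorb.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
TAU_MIN, TAU_MAX, N_CELLS = 0.1, 100.0, 31

# Time constants evenly spaced on a log axis: constant ratio between neighbours.
taus = np.geomspace(TAU_MIN, TAU_MAX, N_CELLS)
ratio = taus[1] / taus[0]


def memory_pattern(elapsed, taus):
    """Activity of leaky-integrator cells `elapsed` seconds after a unit impulse.

    Each cell decays monotonically with its own time constant (an assumed,
    simplified stand-in for the monotonic population described in the abstract).
    """
    return np.exp(-elapsed / taus)


# The same event seen at two speeds: the second is slower by ratio**shift.
shift = 5
fast = memory_pattern(1.0, taus)
slow = memory_pattern(1.0 * ratio**shift, taus)

# Slowing the input moves the same pattern `shift` cells along the population.
is_translation = np.allclose(slow[shift:], fast[:-shift])
print("rescaling acts as a translation across cells:", is_translation)
```

Because rescaling time becomes a translation across cells, a network that convolves over the cell axis and pools over position would see the slowed and original inputs as shifted copies of each other. That is the intuition this sketch is meant to convey under the stated assumptions.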
Probabilistic computation in natural vision
A central goal of vision science is to understand the principles underlying the perception and neural coding of the complex visual environment of our everyday experience. In the visual cortex, foundational work with artificial stimuli, and more recent work combining natural images and deep convolutional neural networks, have revealed much about the tuning of cortical neurons to specific image features. However, a major limitation of this existing work is its focus on single-neuron response strength to isolated images. First, during natural vision, the inputs to cortical neurons are not isolated but rather embedded in a rich spatial and temporal context. Second, the full structure of population activity—including the substantial trial-to-trial variability that is shared among neurons—determines encoded information and, ultimately, perception. In the first part of this talk, I will argue for a normative approach to study encoding of natural images in primary visual cortex (V1), which combines a detailed understanding of the sensory inputs with a theory of how those inputs should be represented. Specifically, we hypothesize that V1 response structure serves to approximate a probabilistic representation optimized to the statistics of natural visual inputs, and that contextual modulation is an integral aspect of achieving this goal. I will present a concrete computational framework that instantiates this hypothesis, and data recorded using multielectrode arrays in macaque V1 to test its predictions. In the second part, I will discuss how we are leveraging this framework to develop deep probabilistic algorithms for natural image and video segmentation.
NMC4 Short Talk: Directly interfacing brain and deep networks exposes non-hierarchical visual processing
A recent approach to understanding the mammalian visual system is to show correspondence between the sequential stages of processing in the ventral stream with layers in a deep convolutional neural network (DCNN), providing evidence that visual information is processed hierarchically, with successive stages containing ever higher-level information. However, correspondence is usually defined as shared variance between brain region and model layer. We propose that task-relevant variance is a stricter test: If a DCNN layer corresponds to a brain region, then substituting the model’s activity with brain activity should successfully drive the model’s object recognition decision. Using this approach on three datasets (human fMRI and macaque neuron firing rates) we found that in contrast to the hierarchical view, all ventral stream regions corresponded best to later model layers. That is, all regions contain high-level information about object category. We hypothesised that this is due to recurrent connections propagating high-level visual information from later regions back to early regions, in contrast to the exclusively feed-forward connectivity of DCNNs. Using task-relevant correspondence with a late DCNN layer akin to a tracer, we used Granger causal modelling to show late-DCNN correspondence in IT drives correspondence in V4. Our analysis suggests, effectively, that no ventral stream region can be appropriately characterised as ‘early’ beyond 70ms after stimulus presentation, challenging hierarchical models. More broadly, we ask what it means for a model component and brain region to correspond: beyond quantifying shared variance, we must consider the functional role in the computation. We also demonstrate that using a DCNN to decode high-level conceptual information from ventral stream produces a general mapping from brain to model activation space, which generalises to novel classes held-out from training data. 
This suggests future possibilities for brain-machine interface with high-level conceptual information, beyond current designs that interface with the sensorimotor periphery.
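The substitution test described above can be sketched on synthetic data. The NumPy example below is a toy under loud assumptions, not the authors' pipeline: the "layer" activations, the linear `mixing` that generates "brain" activity, the ridge penalty `lam`, and the least-squares `readout` standing in for the rest of a trained DCNN are all hypothetical. It fits a linear map from brain activity to layer activations, substitutes the mapped activity into the model, and checks whether the readout still classifies correctly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_units, n_voxels = 4, 20, 50
n_train, n_test = 400, 200

# Assumed generative model: class prototypes in layer space, linearly mixed
# into noisy "brain" measurements.
prototypes = rng.standard_normal((n_classes, n_units))
mixing = rng.standard_normal((n_units, n_voxels)) / np.sqrt(n_units)


def simulate(n_trials):
    """Return labels, model-layer activations, and synthetic brain activity."""
    labels = rng.integers(0, n_classes, size=n_trials)
    layer = prototypes[labels] + 0.5 * rng.standard_normal((n_trials, n_units))
    brain = layer @ mixing + 0.5 * rng.standard_normal((n_trials, n_voxels))
    return labels, layer, brain


labels_tr, layer_tr, brain_tr = simulate(n_train)
labels_te, layer_te, brain_te = simulate(n_test)

# Stand-in for the downstream part of the model: a linear readout fitted
# on the model's own layer activations.
onehot = np.eye(n_classes)[labels_tr]
readout, *_ = np.linalg.lstsq(layer_tr, onehot, rcond=None)

# Ridge regression from brain activity to layer activations.
lam = 1.0
gram = brain_tr.T @ brain_tr + lam * np.eye(n_voxels)
brain_to_layer = np.linalg.solve(gram, brain_tr.T @ layer_tr)

# Substitute mapped brain activity for the layer and let the readout decide.
substituted = brain_te @ brain_to_layer
predicted = np.argmax(substituted @ readout, axis=1)
accuracy = float(np.mean(predicted == labels_te))
print(f"accuracy with substituted activity: {accuracy:.2f}")
```

In this toy setting the accuracy only reflects how the synthetic data were generated. In a real analysis the comparison of interest would be how this task-relevant score varies across brain regions and model layers.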
Aesthetic preference for art can be predicted from a mixture of low- and high-level visual features
It is an open question whether preferences for visual art can be lawfully predicted from the basic constituent elements of a visual image. Here, we developed and tested a computational framework to investigate how aesthetic values are formed. We show that it is possible to explain human preferences for a visual art piece based on a mixture of low- and high-level features of the image. Subjective value ratings could be predicted not only within but also across individuals, using a regression model with a common set of interpretable features. We also show that the features predicting aesthetic preference can emerge hierarchically within a deep convolutional neural network trained only for object recognition. Our findings suggest that human preferences for art can be explained at least in part as a systematic integration over the underlying visual features of an image.
Introducing YAPiC: An Open Source tool for biologists to perform complex image segmentation with deep learning
Robust detection of biological structures such as neuronal dendrites in brightfield micrographs, tumor tissue in histological slides, or pathological brain regions in MRI scans is a fundamental task in bio-image analysis. Detecting those structures requires complex decision making, which is often impossible with current image analysis software and is therefore typically done by humans in a tedious and time-consuming manual procedure. Supervised pixel classification based on Deep Convolutional Neural Networks (DNNs) is currently emerging as the most promising technique to solve such complex region detection tasks. Here, a self-learning artificial neural network is trained with a small set of manually annotated images to eventually identify the trained structures from large image data sets in a fully automated way. While supervised pixel classification based on faster machine learning algorithms like Random Forests is nowadays part of the standard toolbox of bio-image analysts (e.g. Ilastik), the currently emerging tools based on deep learning are still rarely used. There is also not much experience in the community about how much training data has to be collected to obtain a reasonable prediction result with deep-learning-based approaches. Our software YAPiC (Yet Another Pixel Classifier) provides an easy-to-use Python and command-line interface and is designed purely for intuitive pixel classification of multidimensional images with DNNs. With the aim of integrating well into the current open source ecosystem, YAPiC utilizes the Ilastik user interface in combination with a high-performance GPU server for model training and prediction. Numerous research groups at our institute have already successfully applied YAPiC to a variety of tasks. From our experience, a surprisingly small amount of sparse label data is needed to train a sufficiently working classifier for typical bioimaging applications.
Not least because of this, YAPiC has become the "standard weapon" for our core facility to detect objects in hard-to-segment images. We would like to present some use cases, such as cell classification in high content screening, tissue detection in histological slides, quantification of neural outgrowth in phase contrast time series, or actin filament detection in transmission electron microscopy.
Characterising the brain representations behind variations in real-world visual behaviour
Not all individuals are equally competent at recognizing the faces they interact with. Revealing how the brains of different individuals support variations in this ability is a crucial step to develop an understanding of real-world human visual behaviour. In this talk, I will present findings from a large high-density EEG dataset (>100k trials of participants processing various stimulus categories) and computational approaches that aimed to characterise the brain representations behind the real-world proficiency of “super-recognizers”, individuals at the top of the face recognition ability spectrum. Using decoding analysis of time-resolved EEG patterns, we predicted with high precision the trial-by-trial activity of super-recognizer participants, and showed that evidence for variations in face recognition ability is distributed across early, intermediate and late brain processing steps. Computational modeling of the underlying brain activity uncovered two representational signatures supporting higher face recognition ability: (i) mid-level visual and (ii) semantic computations. Both components were dissociable in brain processing time (the first around the N170, the second around the P600) and in level of computation (the first emerging from mid-level layers of visual Convolutional Neural Networks, the second from a semantic model characterising sentence descriptions of images). I will conclude by presenting ongoing analyses from a well-known case of acquired prosopagnosia (PS) using similar computational modeling of high-density EEG activity.
Top-down Modulation in Human Visual Cortex
Human vision shows a remarkable ability to recognize objects in the surrounding environment even in the absence of a complete visual representation of these objects. This process happens almost intuitively, and it was not until scientists had to tackle this problem in computer vision that they noticed its complexity. While artificial vision systems have made great strides, exceeding human level in normal vision tasks, they have yet to achieve a similar level of robustness. One cause of this robustness is the extensive connectivity of the visual system, which is not limited to a feedforward hierarchical pathway similar to the current state-of-the-art deep convolutional neural networks but also comprises recurrent and top-down connections. These connections allow the human brain to enhance the neural representations of degraded images in concordance with meaningful representations stored in memory. The mechanisms by which these different pathways interact are still not understood. In this seminar, studies concerning the effect of recurrent and top-down modulation on the neural representations resulting from viewing blurred images will be presented. Those studies attempted to uncover the role of recurrent and top-down connections in human vision. The results presented challenge the notion of predictive coding as a mechanism for top-down modulation of visual information during natural vision. They show that neural representation enhancement (sharpening) appears to be a more dominant process at different levels of the visual hierarchy. They also show that inference in visual recognition is achieved through a Bayesian process between incoming visual information and priors from deeper processing regions in the brain.
Crowding and the Architecture of the Visual System
Classically, vision is seen as a cascade of local, feedforward computations. This framework has been tremendously successful, inspiring a wide range of ground-breaking findings in neuroscience and computer vision. Recently, feedforward Convolutional Neural Networks (ffCNNs), inspired by this classic framework, have revolutionized computer vision and been adopted as tools in neuroscience. However, despite these successes, there is much more to vision. I will present our work using visual crowding and related psychophysical effects as probes into visual processes that go beyond the classic framework. In crowding, perception of a target deteriorates in clutter. We focus on global aspects of crowding, in which perception of a small target is strongly modulated by the global configuration of elements across the visual field. We show that models based on the classic framework, including ffCNNs, cannot explain these effects for principled reasons and identify recurrent grouping and segmentation as a key missing ingredient. Then, we show that capsule networks, a recent kind of deep learning architecture combining the power of ffCNNs with recurrent grouping and segmentation, naturally explain these effects. We provide psychophysical evidence that humans indeed use a similar recurrent grouping and segmentation strategy in global crowding effects. In crowding, visual elements interfere across space. To study how elements interfere over time, we use the Sequential Metacontrast psychophysical paradigm, in which perception of visual elements depends on elements presented hundreds of milliseconds later. We psychophysically characterize the temporal structure of this interference and propose a simple computational model. Our results support the idea that perception is a discrete process. Together, the results presented here provide stepping-stones towards a fuller understanding of the visual system by suggesting architectural changes needed for more human-like neural computations.
Domain Specificity in the Human Brain: What, Whether, and Why?
The last quarter century has provided extensive evidence that some regions of the human cortex are selectively engaged in processing a single specific domain of information, from faces, places, and bodies to language, music, and other people’s thoughts. This work dovetails with earlier theories in cognitive science highlighting domain specificity in human cognition, development, and evolution. But many questions remain unanswered about even the clearest cases of domain specificity in the brain, the selective engagement of the FFA, PPA, and EBA in the perception of faces, places, and bodies, respectively. First, these claims lack precision, saying little about what is computed and how, and relying on human judgements to decide what counts as a face, place, or body. Second, they provide no account of the reliably varying responses of these regions across different “preferred” images, or across different “nonpreferred” images for each category. Third, the category selectivity of each region is vulnerable to refutation if any of the vast set of as-yet-untested nonpreferred images turns out to produce a stronger response than preferred images for that region. Fourth, and most fundamentally, they provide no account of why, from a computational point of view, brains should exhibit this striking degree of functional specificity in the first place, and why we should have the particular visual specializations we do, for faces, places, and bodies, but not (apparently) for food or snakes. The advent of convolutional neural networks (CNNs) to model visual processing in the ventral pathway has opened up many opportunities to address these long-standing questions in new ways. I will describe ongoing efforts in our lab to harness CNNs to do just that.
Predicting V1 contextual modulation and neural tuning using a convolutional neural network
Bernstein Conference 2024
Using 1D-convolutional neural networks to detect and interpret sharp-wave ripples
COSYNE 2022
Convolutional neural networks describe encoding subspaces of local circuits in auditory cortex
COSYNE 2025
Integrating macrostructural and microstructural representations of white matter through convolutional neural networks
FENS Forum 2024
Using retinotopic mapping in convolutional neural networks for object categorization leads to saliency-based visual object localization
FENS Forum 2024
Mooney Face Image Processing in Deep Convolutional Neural Networks Compared to Humans
Neuromatch 5
Visualizing surround suppression in deep convolutional neural networks
Neuromatch 5