Convolutional Neural Networks
Connecting performance benefits on visual tasks to neural mechanisms using convolutional neural networks
Behavioral studies have demonstrated that certain task features reliably enhance classification performance for challenging visual stimuli. These include extended image presentation time and the valid cueing of attention. Here, I will show how convolutional neural networks can be used as a model of the visual system that connects neural activity changes with such performance changes. Specifically, I will discuss how different anatomical forms of recurrence can account for better classification of noisy and degraded images with extended processing time. I will then show how experimentally observed neural activity changes associated with feature attention lead to the performance changes observed on detection tasks. I will also discuss the implications these results have for how we identify the neural mechanisms and architectures important for behavior.
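The abstract does not spell out the attention mechanism. A frequently cited experimental finding is feature-similarity gain: neurons tuned toward the attended feature increase their responses, and neurons tuned away from it decrease them. The sketch below applies such a multiplicative gain to each channel of a convolutional layer's output. It is a minimal illustration assuming that rule; the function name, the tuning values, and the strength parameter are hypothetical and not taken from the talk.

```python
# Minimal sketch of multiplicative feature-attention gain on a conv layer.
# Assumption: a feature-similarity gain rule, where each channel's gain
# grows with how strongly that channel is tuned to the attended category.
# All names and values here are hypothetical, for illustration only.

def apply_feature_attention(activations, tuning, strength):
    """Scale each channel of a layer's activations by an attention gain.

    activations: list of channels, each a 2-D list (height x width) of
                 non-negative responses (e.g., after a ReLU).
    tuning:      one value per channel in [-1, 1]; positive means the
                 channel prefers the attended category.
    strength:    overall attention strength (beta), e.g. 0.0 to 1.0.
    """
    if len(activations) != len(tuning):
        raise ValueError("need one tuning value per channel")
    attended = []
    for channel, t in zip(activations, tuning):
        # Gain above 1 for channels tuned to the attended category and
        # below 1 for anti-tuned channels; clipped at zero so responses
        # stay non-negative.
        gain = max(0.0, 1.0 + strength * t)
        attended.append([[gain * x for x in row] for row in channel])
    return attended
```

For instance, with a strength of 0.5, a channel with tuning 0.8 is scaled by 1.4 and a channel with tuning -0.5 is scaled by 0.75, so the attended feature's channels dominate the layer's output passed on to a downstream classifier.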
Feedforward and feedback processes in visual recognition
Progress in deep learning has led to great successes in many engineering applications. As a prime example, convolutional neural networks, a type of feedforward neural network, are now approaching, and sometimes even surpassing, human accuracy on a variety of visual recognition tasks. In this talk, however, I will show that these neural networks and their recent extensions exhibit a limited ability to solve seemingly simple visual reasoning problems involving incremental grouping, similarity, and spatial relation judgments. Our group has developed a recurrent network model of classical and extra-classical receptive field circuits that is constrained by the anatomy and physiology of the visual cortex. The model was shown to account for diverse visual illusions, providing computational evidence for a novel canonical circuit that is shared across visual modalities. I will show that this computational neuroscience model can be turned into a modern end-to-end trainable deep recurrent network architecture that addresses some of the shortcomings exhibited by state-of-the-art feedforward networks for solving complex visual reasoning tasks. This suggests that neuroscience may contribute powerful new ideas and approaches to computer science and artificial intelligence.
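To make the feedforward versus recurrent contrast concrete, here is a generic sketch of a convolutional layer with lateral (within-layer) recurrence, unrolled over a few time steps in one spatial dimension. It is not the group's circuit model; the kernels, the ReLU nonlinearity, and the function names are illustrative assumptions. The feedforward drive is computed once, and the lateral connections let activity spread across positions, so each additional step extends the spatial reach of the layer's activity.

```python
# Generic sketch of lateral recurrence in a 1-D convolutional layer.
# Assumptions: ReLU nonlinearity, fixed hand-picked kernels, zero padding.
# This is an illustration, not the recurrent model described in the talk.

def conv1d_same(signal, kernel):
    """Cross-correlation (as in CNN layers, no kernel flip) with zero
    padding so the output has the same length as the input."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = k // 2
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        for j in range(k):
            idx = i + j - half
            if 0 <= idx < n:  # positions outside the input count as zero
                total += kernel[j] * signal[idx]
        out.append(total)
    return out


def relu(values):
    return [max(0.0, v) for v in values]


def recurrent_layer(x, ff_kernel, lateral_kernel, steps):
    """Unroll h_t = relu(ff * x + lateral * h_{t-1}) for `steps` steps,
    starting from h_0 = 0, and return the state after every step."""
    drive = conv1d_same(x, ff_kernel)  # feedforward input, fixed over time
    h = [0.0] * len(x)
    history = []
    for _ in range(steps):
        lateral = conv1d_same(h, lateral_kernel)
        h = relu([d + l for d, l in zip(drive, lateral)])
        history.append(h)
    return history
```

For example, with `x = [0.0, 0.0, 1.0, 0.0, 0.0]`, a feedforward kernel of `[1.0]`, and a lateral kernel of `[0.5, 0.0, 0.5]`, the state after the first step is nonzero only at the center, and after the third step it is `[0.25, 0.5, 1.5, 0.5, 0.25]`.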
Probabilistic computation in natural vision
A central goal of vision science is to understand the principles underlying the perception and neural coding of the complex visual environment of our everyday experience. In the visual cortex, foundational work with artificial stimuli, and more recent work combining natural images and deep convolutional neural networks, have revealed much about the tuning of cortical neurons to specific image features. However, a major limitation of this existing work is its focus on single-neuron response strength to isolated images. First, during natural vision, the inputs to cortical neurons are not isolated but rather embedded in a rich spatial and temporal context. Second, the full structure of population activity, including the substantial trial-to-trial variability that is shared among neurons, determines encoded information and, ultimately, perception. In the first part of this talk, I will argue for a normative approach to studying the encoding of natural images in primary visual cortex (V1), which combines a detailed understanding of the sensory inputs with a theory of how those inputs should be represented. Specifically, we hypothesize that V1 response structure serves to approximate a probabilistic representation optimized for the statistics of natural visual inputs, and that contextual modulation is an integral aspect of achieving this goal. I will present a concrete computational framework that instantiates this hypothesis, along with data recorded using multielectrode arrays in macaque V1 to test its predictions. In the second part, I will discuss how we are leveraging this framework to develop deep probabilistic algorithms for natural image and video segmentation.
Top-down Modulation in Human Visual Cortex
Human vision exhibits a remarkable ability to recognize objects in the surrounding environment even when the visual information about these objects is incomplete. This process happens almost effortlessly, and it was not until scientists had to tackle this problem in computer vision that they recognized its complexity. Although artificial vision systems have made great strides, exceeding human-level performance on standard vision tasks, they have yet to achieve comparable robustness. One source of human robustness is the brain's extensive connectivity, which is not limited to a feedforward hierarchical pathway like that of current state-of-the-art deep convolutional neural networks but also comprises recurrent and top-down connections. These connections allow the human brain to enhance the neural representations of degraded images in accordance with meaningful representations stored in memory. The mechanisms by which these different pathways interact are still not well understood. In this seminar, studies of the effect of recurrent and top-down modulation on the neural representations evoked by blurred images will be presented. These studies aimed to uncover the role of recurrent and top-down connections in human vision. The results challenge the notion of predictive coding as a mechanism for top-down modulation of visual information during natural vision. They show that enhancement (sharpening) of neural representations appears to be the more dominant process across different levels of the visual hierarchy. They also show that inference in visual recognition is achieved through a Bayesian combination of incoming visual information with priors from deeper processing regions in the brain.
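One standard way to formalize combining incoming evidence with a prior is Bayes' rule with Gaussian distributions, where the posterior mean is a precision-weighted average of the prior mean and the observation. The sketch below assumes a single scalar feature with a Gaussian prior and Gaussian likelihood; these assumptions and the function name are illustrative and are not drawn from the studies described above.

```python
# Gaussian prior x Gaussian likelihood -> Gaussian posterior (scalar case).
# Assumption: one scalar feature; this is a textbook illustration of
# Bayesian cue combination, not the analysis used in the studies.

def combine_gaussian(prior_mean, prior_var, obs, obs_var):
    """Return (posterior_mean, posterior_var).

    Precision = 1 / variance. The posterior precision is the sum of the
    prior and observation precisions, and the posterior mean weights each
    source by its precision, so noisier input relies more on the prior.
    """
    if prior_var <= 0 or obs_var <= 0:
        raise ValueError("variances must be positive")
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs)
    return post_mean, post_var
```

With a prior of mean 0 and variance 1 and an observation of 4, an observation variance of 1 gives a posterior mean of 2.0, while raising the observation variance to 4 (standing in for a more degraded input) pulls the posterior mean toward the prior, to 0.8.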
Crowding and the Architecture of the Visual System
Classically, vision is seen as a cascade of local, feedforward computations. This framework has been tremendously successful, inspiring a wide range of ground-breaking findings in neuroscience and computer vision. Recently, feedforward Convolutional Neural Networks (ffCNNs), inspired by this classic framework, have revolutionized computer vision and been adopted as tools in neuroscience. However, despite these successes, there is much more to vision. I will present our work using visual crowding and related psychophysical effects as probes into visual processes that go beyond the classic framework. In crowding, perception of a target deteriorates in clutter. We focus on global aspects of crowding, in which perception of a small target is strongly modulated by the global configuration of elements across the visual field. We show that models based on the classic framework, including ffCNNs, cannot explain these effects for principled reasons and identify recurrent grouping and segmentation as a key missing ingredient. Then, we show that capsule networks, a recent kind of deep learning architecture combining the power of ffCNNs with recurrent grouping and segmentation, naturally explain these effects. We provide psychophysical evidence that humans indeed use a similar recurrent grouping and segmentation strategy in global crowding effects. In crowding, visual elements interfere across space. To study how elements interfere over time, we use the Sequential Metacontrast psychophysical paradigm, in which perception of visual elements depends on elements presented hundreds of milliseconds later. We psychophysically characterize the temporal structure of this interference and propose a simple computational model. Our results support the idea that perception is a discrete process. Together, the results presented here provide stepping-stones towards a fuller understanding of the visual system by suggesting architectural changes needed for more human-like neural computations.
Domain Specificity in the Human Brain: What, Whether, and Why?
The last quarter century has provided extensive evidence that some regions of the human cortex are selectively engaged in processing a single specific domain of information, from faces, places, and bodies to language, music, and other people’s thoughts. This work dovetails with earlier theories in cognitive science highlighting domain specificity in human cognition, development, and evolution. But many questions remain unanswered about even the clearest cases of domain specificity in the brain, the selective engagement of the FFA, PPA, and EBA in the perception of faces, places, and bodies, respectively. First, these claims lack precision, saying little about what is computed and how, and relying on human judgements to decide what counts as a face, place, or body. Second, they provide no account of the reliably varying responses of these regions across different “preferred” images, or across different “nonpreferred” images for each category. Third, the category selectivity of each region is vulnerable to refutation if any of the vast set of as-yet-untested nonpreferred images turns out to produce a stronger response than preferred images for that region. Fourth, and most fundamentally, they provide no account of why, from a computational point of view, brains should exhibit this striking degree of functional specificity in the first place, and why we should have the particular visual specializations we do, for faces, places, and bodies, but not (apparently) for food or snakes. The advent of convolutional neural networks (CNNs) to model visual processing in the ventral pathway has opened up many opportunities to address these long-standing questions in new ways. I will describe ongoing efforts in our lab to harness CNNs to do just that.
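The third point can be illustrated with a common selectivity index, (P - N) / (P + N), where P and N are the mean responses to preferred and nonpreferred images, together with a check for any individual nonpreferred image whose response exceeds the mean preferred response. This is a hypothetical sketch: the function names, the choice of index, the use of the mean preferred response as the threshold, and the assumption of non-negative responses are illustrative, not the lab's method.

```python
# Hypothetical sketch: category-selectivity index and a counterexample check.
# Assumption: responses are non-negative scalars (e.g., mean response per image).
from statistics import mean


def selectivity_index(preferred, nonpreferred):
    """(P - N) / (P + N) using mean responses; 1 = perfectly selective,
    0 = no preference, negative = prefers the 'nonpreferred' set."""
    p = mean(preferred)
    n = mean(nonpreferred)
    if p + n == 0:
        raise ValueError("mean responses sum to zero")
    return (p - n) / (p + n)


def counterexamples(preferred, nonpreferred_by_image):
    """Names of nonpreferred images whose response exceeds the mean
    response to preferred images (a threshold chosen for illustration)."""
    threshold = mean(preferred)
    return [name for name, r in nonpreferred_by_image.items() if r > threshold]
```

A region can score highly on the index while a single untested image still lands in the counterexample list, which is why a finite stimulus set cannot rule out the refutation the abstract describes.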