From Spiking Predictive Coding to Learning Abstract Object Representation
In the first part of the talk, I will present Predictive Coding Light (PCL), a novel unsupervised learning architecture for spiking neural networks. In contrast to conventional predictive coding approaches, which only transmit prediction errors to higher processing stages, PCL learns inhibitory lateral and top-down connectivity to suppress the most predictable spikes and passes a compressed representation of the input to higher processing stages. We show that PCL reproduces a range of biological findings and exhibits a favorable tradeoff between energy consumption and downstream classification performance on challenging benchmarks. The second part of the talk will feature our lab’s efforts to explain how infants and toddlers might learn abstract object representations without supervision. I will present deep learning models that exploit the temporal and multimodal structure of children's sensory inputs to learn representations of individual objects, object categories, or abstract super-categories such as "kitchen object" in a fully unsupervised fashion. These models offer a parsimonious account of how abstract semantic knowledge may be rooted in children's embodied first-person experiences.
Generative models for video games (rescheduled)
Developing agents capable of modeling complex environments and human behaviors within them is a key goal of artificial intelligence research. Progress towards this goal has exciting potential for applications in video games, from new tools that empower game developers to realize new creative visions, to enabling new kinds of immersive player experiences. This talk focuses on recent advances by my team at Microsoft Research towards scalable machine learning architectures that effectively capture human gameplay data. In the first part of my talk, I will focus on diffusion models as generative models of human behavior. Diffusion models have previously been shown to have impressive image generation capabilities; I present insights that unlock their application to imitation learning for sequential decision making. In the second part of my talk, I discuss a recent project that takes ideas from language modeling to build a generative sequence model of an Xbox game.
Beyond Biologically Plausible Spiking Networks for Neuromorphic Computing
Biologically plausible spiking neural networks (SNNs) are an emerging architecture for deep learning tasks due to their energy efficiency when implemented on neuromorphic hardware. However, many of the biological features are at best irrelevant and at worst counterproductive when evaluated in the context of task performance and suitability for neuromorphic hardware. In this talk, I will present an alternative paradigm to design deep learning architectures with good task performance in real-world benchmarks while maintaining all the advantages of SNNs. We do this by focusing on two main features – event-based computation and activity sparsity. Starting from the performant gated recurrent unit (GRU) deep learning architecture, we modify it to make it event-based and activity-sparse. The resulting event-based GRU (EGRU) is extremely efficient for both training and inference. At the same time, it achieves performance close to conventional deep learning architectures in challenging tasks such as language modelling, gesture recognition and sequential MNIST.
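As an illustration of the two features named in the abstract, event-based computation and activity sparsity, here is a minimal NumPy sketch of a GRU-style cell whose units only transmit a value when their internal state crosses a threshold. This is not the speaker's EGRU implementation: the function names, the threshold value, the subtractive reset and the random weights are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def event_gru_step(x, c, y, params, threshold=0.5):
    """One step of a hypothetical thresholded GRU-style cell.

    x: input vector, c: internal state, y: previously emitted (sparse) output.
    """
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ y)          # update gate
    r = sigmoid(Wr @ x + Ur @ y)          # reset gate
    h = np.tanh(Wh @ x + Uh @ (r * y))    # candidate state
    c_new = z * h + (1.0 - z) * c         # GRU-style interpolation
    events = c_new > threshold            # units that emit an event this step
    y_new = np.where(events, c_new, 0.0)  # silent units transmit exactly zero
    # Assumed reset rule: subtract the threshold from units that fired.
    c_new = np.where(events, c_new - threshold, c_new)
    return c_new, y_new, events

def run(seq, hidden=8, threshold=0.5, seed=0):
    """Run the cell over a sequence with random (untrained) weights."""
    rng = np.random.default_rng(seed)
    n_in = seq.shape[1]
    # Even-indexed matrices act on the input, odd-indexed ones on the output.
    params = tuple(
        rng.normal(0.0, 0.5, size=(hidden, n_in if i % 2 == 0 else hidden))
        for i in range(6)
    )
    c = np.zeros(hidden)
    y = np.zeros(hidden)
    outputs = []
    for x in seq:
        c, y, _ = event_gru_step(x, c, y, params, threshold)
        outputs.append(y)
    return np.array(outputs)

if __name__ == "__main__":
    seq = np.random.default_rng(1).normal(size=(20, 3))
    out = run(seq)
    print("fraction of active units:", np.mean(out != 0.0))
```

Because silent units output exactly zero, downstream matrix products only need the columns belonging to active units. In this sketch, that is where the computational savings of activity sparsity would come from.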
General purpose event-based architectures for deep learning
Biologically plausible spiking neural networks (SNNs) are an emerging architecture for deep learning tasks due to their energy efficiency when implemented on neuromorphic hardware. However, many of the biological features are at best irrelevant and at worst counterproductive when evaluated in the context of task performance and suitability for neuromorphic hardware. In this talk, I will present an alternative paradigm to design deep learning architectures with good task performance in real-world benchmarks while maintaining all the advantages of SNNs. We do this by focusing on two main features: event-based computation and activity sparsity. Starting from the performant gated recurrent unit (GRU) deep learning architecture, we modify it to make it event-based and activity-sparse. The resulting event-based GRU (EGRU) is extremely efficient for both training and inference. At the same time, it achieves performance close to conventional deep learning architectures in challenging tasks such as language modelling, gesture recognition and sequential MNIST.
Computational psychophysics at the intersection of theory, data and models
Behavioural measurements are often overlooked by computational neuroscientists, who prefer to focus on electrophysiological recordings or neuroimaging data. This attitude is largely due to a perceived lack of depth and richness in behavioural datasets. I will show how contemporary psychophysics can deliver extremely rich and highly constraining datasets that naturally interface with computational modelling. More specifically, I will demonstrate how psychophysics can be used to guide, constrain and refine computational models, and how models can be exploited to design, motivate and interpret psychophysical experiments. Examples will span a wide range of topics (from feature detection to natural scene understanding) and methodologies (from cascade models to deep learning architectures).
Crowding and the Architecture of the Visual System
Classically, vision is seen as a cascade of local, feedforward computations. This framework has been tremendously successful, inspiring a wide range of ground-breaking findings in neuroscience and computer vision. Recently, feedforward Convolutional Neural Networks (ffCNNs), inspired by this classic framework, have revolutionized computer vision and been adopted as tools in neuroscience. However, despite these successes, there is much more to vision. I will present our work using visual crowding and related psychophysical effects as probes into visual processes that go beyond the classic framework. In crowding, perception of a target deteriorates in clutter. We focus on global aspects of crowding, in which perception of a small target is strongly modulated by the global configuration of elements across the visual field. We show that models based on the classic framework, including ffCNNs, cannot explain these effects for principled reasons and identify recurrent grouping and segmentation as a key missing ingredient. Then, we show that capsule networks, a recent kind of deep learning architecture combining the power of ffCNNs with recurrent grouping and segmentation, naturally explain these effects. We provide psychophysical evidence that humans indeed use a similar recurrent grouping and segmentation strategy in global crowding effects. In crowding, visual elements interfere across space. To study how elements interfere over time, we use the Sequential Metacontrast psychophysical paradigm, in which perception of visual elements depends on elements presented hundreds of milliseconds later. We psychophysically characterize the temporal structure of this interference and propose a simple computational model. Our results support the idea that perception is a discrete process. Together, the results presented here provide stepping-stones towards a fuller understanding of the visual system by suggesting architectural changes needed for more human-like neural computations.
E-prop: A biologically inspired paradigm for learning in recurrent networks of spiking neurons
Transformative advances in deep learning, such as deep reinforcement learning, usually rely on gradient-based learning methods such as backpropagation through time (BPTT) as a core learning algorithm. However, BPTT is argued not to be biologically plausible, since it requires propagating gradients backwards in time and across neurons. Here, we propose e-prop, a novel gradient-based learning method with local and online weight update rules for recurrent neural networks, and in particular recurrent spiking neural networks (RSNNs). As a result, e-prop has the potential to provide a substantial fraction of the power of deep learning to RSNNs. In this presentation, we will motivate e-prop from the perspective of recent insights in neuroscience and show how these have to be combined to form an algorithm for online gradient descent. The mathematical results will be supported by empirical evidence in supervised and reinforcement learning tasks. We will also discuss how limitations that are inherited from gradient-based learning methods, such as poor sample efficiency, can be addressed by considering an evolution-like optimization that enhances learning on particular task families. The emerging learning architecture can be used to learn tasks from a single demonstration, hence enabling one-shot learning.
Networks thinking themselves
Human learners acquire not only disconnected bits of information, but complex interconnected networks of relational knowledge. The capacity for such learning naturally depends on the architecture of the knowledge network itself, and also on the architecture of the computational unit – the brain – that encodes and processes the information. Here, I will discuss emerging work assessing network constraints on the learnability of relational knowledge, and the neural correlates of that learning.