Architectures
Stefan Mihalas
Biological systems learn differently than current machine learning systems, with generally higher sample efficiency but also strong inductive biases. The scientist will explore the effects that bio-realistic neurons, plasticity rules and architectures have on learning in artificial neural networks. This will be done by combining the construction of artificial neural networks with bio-inspired constraints.
Neural architectures: what are they good for anyway?
The brain has a highly complex structure in terms of cell types and wiring between different regions. What is it for, if anything? I'll start this talk by asking what might an answer to this question even look like given that we can't run an alternative universe where our brains are structured differently. (Preview: we can do this with models!) I'll then talk about some of our work in two areas: (1) does the modular structure of the brain contribute to specialisation of function? (2) how do different cell types and architectures contribute to multimodal sensory processing?
Generative models for video games (rescheduled)
Developing agents capable of modeling complex environments and human behaviors within them is a key goal of artificial intelligence research. Progress towards this goal has exciting potential for applications in video games, from new tools that empower game developers to realize new creative visions, to enabling new kinds of immersive player experiences. This talk focuses on recent advances of my team at Microsoft Research towards scalable machine learning architectures that effectively capture human gameplay data. In the first part of my talk, I will focus on diffusion models as generative models of human behavior. Diffusion models have previously been shown to have impressive image generation capabilities; I present insights that unlock their application to imitation learning for sequential decision making. In the second part of my talk, I discuss a recent project taking ideas from language modeling to build a generative sequence model of an Xbox game.
Improving Language Understanding by Generative Pre-Training
Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
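As a rough illustration of what a task-aware input transformation could look like, the sketch below flattens structured task inputs into single token sequences framed by start, delimiter and extract markers. The token strings and function names are hypothetical, chosen only for this example; they are not taken from the authors' code.

```python
# Hypothetical special tokens; the real vocabulary entries are an assumption.
START, DELIM, EXTRACT = "<s>", "$", "<e>"


def entailment_input(premise_tokens, hypothesis_tokens):
    """Concatenate premise and hypothesis into one ordered sequence."""
    return [START, *premise_tokens, DELIM, *hypothesis_tokens, EXTRACT]


def similarity_inputs(a_tokens, b_tokens):
    """Similarity has no inherent ordering, so build both orderings."""
    return [
        [START, *a_tokens, DELIM, *b_tokens, EXTRACT],
        [START, *b_tokens, DELIM, *a_tokens, EXTRACT],
    ]


seq = entailment_input(["a", "man", "sleeps"], ["a", "person", "rests"])
pair = similarity_inputs(["x"], ["y"])
print(seq)
print(pair)
```

Because only the input layout changes, the same pre-trained sequence model can in principle be reused across tasks, with a small task-specific output layer added during fine-tuning.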
The Neural Race Reduction: Dynamics of nonlinear representation learning in deep architectures
What is the relationship between task, network architecture, and population activity in nonlinear deep networks? I will describe the Gated Deep Linear Network framework, which schematizes how pathways of information flow impact learning dynamics within an architecture. Because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. The reduction takes the form of a neural race with an implicit bias towards shared representations, which then govern the model’s ability to systematically generalize, multi-task, and transfer. We show how appropriate network architectures can help factorize and abstract knowledge. Together, these results begin to shed light on the links between architecture, learning dynamics and network performance.
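To make the role of gating concrete, here is a toy sketch, not the framework from the talk: two depth-2 linear pathways with scalar weights, where a gate chosen from the input decides which pathway is active. The weights, the gating rule and the function names are all assumptions made for illustration.

```python
def gated_linear_net(x, context_gate):
    """Two linear pathways; a gate decides which one carries the signal.

    Each pathway is a product of two scalar weights (a depth-2 linear map).
    """
    pathways = [(2.0, 1.0), (0.5, 1.0)]   # (first-layer, second-layer) weights
    gates = context_gate(x)               # one 0/1 gate per pathway
    return sum(g * w2 * w1 * x for g, (w1, w2) in zip(gates, pathways))


def sign_gate(x):
    """Open pathway 0 for positive inputs and pathway 1 otherwise."""
    return (1, 0) if x > 0 else (0, 1)


a, b = 3.0, -2.0
sum_of_outputs = gated_linear_net(a, sign_gate) + gated_linear_net(b, sign_gate)
output_of_sum = gated_linear_net(a + b, sign_gate)
print(sum_of_outputs, output_of_sum)
```

Each pathway is linear on its own, yet the two printed values differ, so the gated network as a whole is a nonlinear function of its input.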
Analogical Reasoning and Generalization for Interactive Task Learning in Physical Machines
Humans are natural teachers; learning through instruction is one of the most fundamental ways that we learn. Interactive Task Learning (ITL) is an emerging research agenda that studies the design of complex intelligent robots that can acquire new knowledge through natural human teacher-robot learner interactions. ITL methods are particularly useful for designing intelligent robots whose behavior can be adapted by humans collaborating with them. In this talk, I will summarize our recent findings on the structure that human instruction naturally has and motivate an intelligent system design that can exploit this structure. The system, AILEEN, is being developed using the Common Model of Cognition. Architectures that implement the Common Model of Cognition (Soar, ACT-R, and Sigma) have a prominent place in research on cognitive modeling as well as on designing complex intelligent agents. However, they miss a critical piece of intelligent behavior: analogical reasoning and generalization. I will introduce a new memory, concept memory, that integrates with a Common Model of Cognition architecture and supports ITL.
Predictive modeling, cortical hierarchy, and their computational implications
Predictive modeling and dimensionality reduction of functional neuroimaging data have provided rich information about the representations and functional architectures of the human brain. While these approaches have been effective in many cases, we will discuss how neglecting the internal dynamics of the brain (e.g., spontaneous activity, global dynamics, effective connectivity) and its underlying computational principles may hinder our progress in understanding and modeling brain functions. By reexamining evidence from our previous and ongoing work, we will propose new hypotheses and directions for research that consider both internal dynamics and the computational principles that may govern brain processes.
Connecting performance benefits on visual tasks to neural mechanisms using convolutional neural networks
Behavioral studies have demonstrated that certain task features reliably enhance classification performance for challenging visual stimuli. These include extended image presentation time and the valid cueing of attention. Here, I will show how convolutional neural networks can be used as a model of the visual system that connects neural activity changes with such performance changes. Specifically, I will discuss how different anatomical forms of recurrence can account for better classification of noisy and degraded images with extended processing time. I will then show how experimentally-observed neural activity changes associated with feature attention lead to observed performance changes on detection tasks. I will also discuss the implications these results have for how we identify the neural mechanisms and architectures important for behavior.
Bridging the gap between artificial models and cortical circuits
Artificial neural networks simplify complex biological circuits into tractable models for computational exploration and experimentation. However, the simplification of artificial models also undermines their applicability to real brain dynamics. Typical efforts to address this mismatch add complexity to increasingly unwieldy models. Here, we take a different approach; by reducing the complexity of a biological cortical culture, we aim to distil the essential factors of neuronal dynamics and plasticity. We leverage recent advances in growing neurons from human induced pluripotent stem cells (hiPSCs) to analyse ex vivo cortical cultures with only two distinct excitatory and inhibitory neuron populations. Over 6 weeks of development, we record from thousands of neurons using high-density microelectrode arrays (HD-MEAs) that allow access to individual neurons and the broader population dynamics. We compare these dynamics to two-population artificial networks of single-compartment neurons with random sparse connections and show that they produce similar dynamics. Specifically, our model captures the firing and bursting statistics of the cultures. Moreover, tightly integrating models and cultures allows us to evaluate the impact of changing architectures over weeks of development, with and without external stimuli. Broadly, the use of simplified cortical cultures enables us to use the repertoire of theoretical neuroscience techniques established over the past decades on artificial network models. Our approach of deriving neural networks from human cells also allows us, for the first time, to directly compare neural dynamics of disease and control. We found that patient-derived cultures, for example from epilepsy patients, tended to show progressively more avalanches of synchronous activity over weeks of development, in contrast to the control cultures. Next, we will test possible interventions, in silico and in vitro, in a drive for personalised approaches to medical care.
This work starts bridging an important theoretical-experimental neuroscience gap for advancing our understanding of mammalian neuron dynamics.
Training Dynamic Spiking Neural Network via Forward Propagation Through Time
With recent advances in learning algorithms, recurrent networks of spiking neurons are achieving performance competitive with standard recurrent neural networks. Still, these learning algorithms are limited to small networks of simple spiking neurons and modest-length temporal sequences, as they impose high memory requirements, have difficulty training complex neuron models, and are incompatible with online learning. Taking inspiration from the concept of Liquid Time-Constants (LTCs), we introduce a novel class of spiking neurons, the Liquid Time-Constant Spiking Neuron (LTC-SN), resulting in functionality similar to the gating operation in LSTMs. We integrate these neurons in SNNs that are trained with Forward Propagation Through Time (FPTT) and demonstrate that the resulting LTC-SNNs outperform various SNNs trained with backpropagation through time (BPTT) on long sequences while enabling online learning and drastically reducing memory complexity. We show this for several classical benchmarks that can easily be varied in sequence length, like the Add Task and the DVS-gesture benchmark. We also show how FPTT-trained LTC-SNNs can be applied to large convolutional SNNs, where we demonstrate a new state of the art for online learning in SNNs on a number of standard benchmarks (S-MNIST, R-MNIST, DVS-GESTURE) and also show that large feedforward SNNs can be trained successfully in an online manner to near (Fashion-MNIST, DVS-CIFAR10) or exceeding (PS-MNIST, R-MNIST) state-of-the-art performance as obtained with offline BPTT. Finally, the training and memory efficiency of FPTT enables us to directly train SNNs in an end-to-end manner at network sizes and complexity that were previously infeasible: we demonstrate this by training in an end-to-end fashion the first deep and performant spiking neural network for object localization and recognition. Taken together, our contributions enable, for the first time, the training of large-scale complex spiking neural network architectures online and on long temporal sequences.
Behavioral Timescale Synaptic Plasticity (BTSP) for biologically plausible credit assignment across multiple layers via top-down gating of dendritic plasticity
A central problem in biological learning is how information about the outcome of a decision or behavior can be used to reliably guide learning across distributed neural circuits while obeying biological constraints. This “credit assignment” problem is commonly solved in artificial neural networks through supervised gradient descent and the backpropagation algorithm. In contrast, biological learning is typically modelled using unsupervised Hebbian learning rules. While these rules only use local information to update synaptic weights, and are sometimes combined with weight constraints to reflect a diversity of excitatory (only positive weights) and inhibitory (only negative weights) cell types, they do not prescribe a clear mechanism for how to coordinate learning across multiple layers and propagate error information accurately across the network. In recent years, several groups have drawn inspiration from the known dendritic non-linearities of pyramidal neurons to propose new learning rules and network architectures that enable biologically plausible multi-layer learning by processing error information in segregated dendrites. Meanwhile, recent experimental results from the hippocampus have revealed a new form of plasticity—Behavioral Timescale Synaptic Plasticity (BTSP)—in which large dendritic depolarizations rapidly reshape synaptic weights and stimulus selectivity with as little as a single stimulus presentation (“one-shot learning”). Here we explore the implications of this new learning rule through a biologically plausible implementation in a rate neuron network. We demonstrate that regulation of dendritic spiking and BTSP by top-down feedback signals can effectively coordinate plasticity across multiple network layers in a simple pattern recognition task. By analyzing hidden feature representations and weight trajectories during learning, we show the differences between networks trained with standard backpropagation, Hebbian learning rules, and BTSP.
Beyond Biologically Plausible Spiking Networks for Neuromorphic Computing
Biologically plausible spiking neural networks (SNNs) are an emerging architecture for deep learning tasks due to their energy efficiency when implemented on neuromorphic hardware. However, many of the biological features are at best irrelevant and at worst counterproductive when evaluated in the context of task performance and suitability for neuromorphic hardware. In this talk, I will present an alternative paradigm to design deep learning architectures with good task performance in real-world benchmarks while maintaining all the advantages of SNNs. We do this by focusing on two main features – event-based computation and activity sparsity. Starting from the performant gated recurrent unit (GRU) deep learning architecture, we modify it to make it event-based and activity-sparse. The resulting event-based GRU (EGRU) is extremely efficient for both training and inference. At the same time, it achieves performance close to conventional deep learning architectures in challenging tasks such as language modelling, gesture recognition and sequential MNIST.
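The two features named above, event-based computation and activity sparsity, can be illustrated with a toy update rule. This is only a sketch under simple assumptions (leaky scalar states, a fixed threshold, subtraction of the threshold after an event); it is not the EGRU update from the talk, and all names and constants are hypothetical.

```python
def event_step(state, inputs, threshold=1.0, decay=0.9):
    """Update leaky internal states and emit sparse events.

    A unit communicates its value only when its state exceeds the
    threshold; after sending, the threshold is subtracted from the state.
    """
    new_state, events = [], []
    for c, x in zip(state, inputs):
        c = decay * c + x
        if c > threshold:
            events.append(c)      # event: the value is communicated
            c -= threshold        # soft reset after the event
        else:
            events.append(0.0)    # silent: nothing is communicated
        new_state.append(c)
    return new_state, events


# Constant, unit-specific drive (an arbitrary choice for the demo).
inputs = [0.05 * (i + 1) for i in range(8)]
state = [0.0] * 8
active = total = 0
for _ in range(50):
    state, events = event_step(state, inputs)
    active += sum(1 for e in events if e != 0.0)
    total += len(events)

print(f"fraction of unit-steps with an event: {active / total:.2f}")
```

Downstream computation only needs to run for the nonzero entries of `events`, which is where the efficiency of an activity-sparse design would come from.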
General purpose event-based architectures for deep learning
Biologically plausible spiking neural networks (SNNs) are an emerging architecture for deep learning tasks due to their energy efficiency when implemented on neuromorphic hardware. However, many of the biological features are at best irrelevant and at worst counterproductive when evaluated in the context of task performance and suitability for neuromorphic hardware. In this talk, I will present an alternative paradigm to design deep learning architectures with good task performance in real-world benchmarks while maintaining all the advantages of SNNs. We do this by focusing on two main features: event-based computation and activity sparsity. Starting from the performant gated recurrent unit (GRU) deep learning architecture, we modify it to make it event-based and activity-sparse. The resulting event-based GRU (EGRU) is extremely efficient for both training and inference. At the same time, it achieves performance close to conventional deep learning architectures in challenging tasks such as language modelling, gesture recognition and sequential MNIST.
What does the primary visual cortex tell us about object recognition?
Object recognition relies on the complex visual representations in cortical areas at the top of the ventral stream hierarchy. While these are thought to be derived from low-level stages of visual processing, this has not yet been shown. Here, I describe the results of two projects exploring the contributions of primary visual cortex (V1) processing to object recognition using artificial neural networks (ANNs). First, we developed hundreds of ANN-based V1 models and evaluated how their single neurons approximate those in the macaque V1. We found that, for some models, single neurons in intermediate layers are similar to their biological counterparts, and that the distributions of their response properties approximately match those in V1. Furthermore, we observed that models that better matched macaque V1 were also more aligned with human behavior, suggesting that object recognition is derived from low-level visual processing. Motivated by these results, we then studied how an ANN’s robustness to image perturbations relates to its ability to predict V1 responses. Despite their high performance in object recognition tasks, ANNs can be fooled by imperceptibly small, explicitly crafted perturbations. We observed that ANNs that better predicted V1 neuronal activity were also more robust to adversarial attacks. Inspired by this, we developed VOneNets, a new class of hybrid ANN vision models. Each VOneNet contains a fixed neural network front-end that simulates primate V1 followed by a neural network back-end adapted from current computer vision models. After training, VOneNets were substantially more robust, outperforming state-of-the-art methods on a set of perturbations. While current neural network architectures are arguably brain-inspired, these results demonstrate that more precisely mimicking just one stage of the primate visual system leads to new gains in computer vision applications and results in better models of the primate ventral stream and object recognition behavior.
NMC4 Keynote: An all-natural deep recurrent neural network architecture for flexible navigation
A wide variety of animals and some artificial agents can adapt their behavior to changing cues, contexts, and goals. But what neural network architectures support such behavioral flexibility? Agents with loosely structured network architectures and random connections can be trained over millions of trials to display flexibility in specific tasks, but many animals must adapt and learn with much less experience just to survive. Further, it has been challenging to understand how the structure of trained deep neural networks relates to their functional properties, an important objective for neuroscience. In my talk, I will use a combination of behavioral, physiological and connectomic evidence from the fly to make the case that the built-in modularity and structure of its networks incorporate key aspects of the animal’s ecological niche, enabling rapid flexibility by constraining learning to operate on a restricted parameter set. It is not unlikely that this is also a feature of many biological neural networks across other animals, large and small, and with and without vertebrae.
Edge Computing using Spiking Neural Networks
Deep learning has made tremendous progress in the last year, but its high computational and memory requirements impose challenges in using deep learning on edge devices. There has been some progress in lowering the memory requirements of deep neural networks (for instance, the use of half-precision), but there has been minimal effort in developing alternative efficient computational paradigms. Inspired by the brain, Spiking Neural Networks (SNNs) provide an energy-efficient alternative to conventional rate-based neural networks. However, SNN architectures that employ the traditional feedforward and feedback pass do not fully exploit the asynchronous event-based processing paradigm of SNNs. In the first part of my talk, I will present my work on predictive coding, which offers a fundamentally different approach to developing neural networks that are particularly suitable for event-based processing. In the second part of my talk, I will present our work on the development of approaches for SNNs that target specific problems like low response latency and continual learning. References: Dora, S., Bohte, S. M., & Pennartz, C. (2021). Deep Gated Hebbian Predictive Coding Accounts for Emergence of Complex Neural Response Properties Along the Visual Cortical Hierarchy. Frontiers in Computational Neuroscience, 65. Saranirad, V., McGinnity, T. M., Dora, S., & Coyle, D. (2021, July). DoB-SNN: A New Neuron Assembly-Inspired Spiking Neural Network for Pattern Classification. In 2021 International Joint Conference on Neural Networks (IJCNN) (pp. 1-6). IEEE. Machingal, P., Thousif, M., Dora, S., Sundaram, S., Meng, Q. (2021). A Cross Entropy Loss for Spiking Neural Networks. Expert Systems with Applications (under review).
Norse: A library for gradient-based learning in Spiking Neural Networks
We introduce Norse: An open-source library for gradient-based training of spiking neural networks. In contrast to neuron simulators which mainly target computational neuroscientists, our library seamlessly integrates with the existing PyTorch ecosystem using abstractions familiar to the machine learning community. This has immediate benefits in that it provides a familiar interface, hardware accelerator support and, most importantly, the ability to use gradient-based optimization. While many parallel efforts in this direction exist, Norse emphasizes flexibility and usability in three ways. Users can conveniently specify feed-forward (convolutional) architectures, as well as arbitrarily connected recurrent networks. We strictly adhere to a functional and class-based API such that neuron primitives and, for example, plasticity rules compose. Finally, the functional core API ensures compatibility with the PyTorch JIT and ONNX infrastructure. We have made progress to support network execution on the SpiNNaker platform and plan to support other neuromorphic architectures in the future. While the library is useful in its present state, it also has limitations we will address in ongoing work. In particular, we aim to implement event-based gradient computation, using the EventProp algorithm, which will allow us to support sparse event-based data efficiently, as well as work towards support of more complex neuron models. With this library, we hope to contribute to a joint future of computational neuroscience and neuromorphic computing.
Event-based Backpropagation for Exact Gradients in Spiking Neural Networks
Gradient-based optimization powered by the backpropagation algorithm proved to be the pivotal method in the training of non-spiking artificial neural networks. At the same time, spiking neural networks hold the promise for efficient processing of real-world sensory data by communicating using discrete events in continuous time. We derive the backpropagation algorithm for a recurrent network of spiking (leaky integrate-and-fire) neurons with hard thresholds and show that the backward dynamics amount to an event-based backpropagation of errors through time. Our derivation uses the jump conditions for partial derivatives at state discontinuities found by applying the implicit function theorem, allowing us to avoid approximations or substitutions. We find that the gradient exists and is finite almost everywhere in weight space, up to the null set where a membrane potential is precisely tangent to the threshold. Our presented algorithm, EventProp, computes the exact gradient with respect to a general loss function based on spike times and membrane potentials. Crucially, the algorithm allows for an event-based communication scheme in the backward phase, retaining the potential advantages of temporal sparsity afforded by spiking neural networks. We demonstrate the optimization of spiking networks using gradients computed via EventProp on the Yin-Yang and MNIST datasets with either a spike time-based or voltage-based loss function and report competitive performance. Our work supports the rigorous study of gradient-based optimization in spiking neural networks as well as the development of event-based neuromorphic architectures for the efficient training of spiking neural networks. While we consider the leaky integrate-and-fire model in this work, our methodology generalises to any neuron model defined as a hybrid dynamical system.
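For readers unfamiliar with the neuron model, the following is a minimal sketch of the forward dynamics of a single leaky integrate-and-fire neuron with a hard threshold. It uses a fixed-step Euler discretization and arbitrary parameter values, both of which are assumptions for illustration; it does not implement EventProp or its exact event-based gradients.

```python
def simulate_lif(input_current, dt=1e-3, tau_mem=20e-3, threshold=1.0, v_reset=0.0):
    """Euler-integrate a leaky integrate-and-fire neuron; return spike times."""
    v = 0.0
    spike_times = []
    for step, current in enumerate(input_current):
        v += dt / tau_mem * (current - v)   # leaky integration toward the input
        if v >= threshold:                  # hard threshold crossing
            spike_times.append(step * dt)
            v = v_reset                     # state jump (reset) at the spike
    return spike_times


spikes = simulate_lif([1.5] * 200)      # supra-threshold drive: regular spiking
silent = simulate_lif([0.5] * 200)      # sub-threshold drive: no spikes
print(len(spikes), len(silent))
```

The reset is the state discontinuity at which the membrane potential jumps; between spikes the dynamics are smooth, which is the hybrid structure the abstract refers to.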
On the implicit bias of SGD in deep learning
Tali's work emphasized the tradeoff between compression and information preservation. In this talk I will explore this theme in the context of deep learning. Artificial neural networks have recently revolutionized the field of machine learning. However, we still do not have sufficient theoretical understanding of how such models can be successfully learned. Two specific questions in this context are: how can neural nets be learned despite the non-convexity of the learning problem, and how can they generalize well despite often having more parameters than training data. I will describe our recent work showing that gradient-descent optimization indeed leads to 'simpler' models, where simplicity is captured by lower weight norm and in some cases clustering of weight vectors. We demonstrate this for several teacher and student architectures, including learning linear teachers with ReLU networks, learning boolean functions and learning convolutional pattern detection architectures.
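A much simpler setting than those in the talk already shows the flavour of this implicit bias: for an underdetermined linear least-squares problem, plain gradient descent started from zero converges to the interpolating solution with the smallest Euclidean norm. The sketch below uses one equation in two unknowns; the data, learning rate and step count are arbitrary choices for the demonstration.

```python
def gradient_descent(x, y, lr=0.1, steps=200):
    """Minimize 0.5 * (w.x - y)^2 starting from w = 0."""
    w = [0.0] * len(x)
    for _ in range(steps):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
    return w


def norm(v):
    """Euclidean norm of a list of floats."""
    return sum(vi * vi for vi in v) ** 0.5


x, y = [1.0, 2.0], 5.0        # one equation, two unknowns
w = gradient_descent(x, y)
other = [5.0, 0.0]            # another weight vector that also fits the data
print(w, norm(w), norm(other))
```

Both `w` and `other` fit the single data point, but gradient descent never leaves the span of `x`, so it lands on the lower-norm solution without any explicit regularization.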
3D Printing Cellular Communities: Mammalian Cells, Bacteria, And Beyond
While the motion and collective behavior of cells are well-studied on flat surfaces or in unconfined liquid media, in most natural settings, cells thrive in complex 3D environments. Bioprinting processes are capable of structuring cells in 3D, and conventional bioprinting approaches address this challenge by embedding cells in biodegradable polymer networks. However, heterogeneity in network structure and biodegradation often preclude quantitative studies of cell behavior in specified 3D architectures. Here, I will present a new approach to 3D bioprinting of cellular communities that utilizes jammed, granular polyelectrolyte microgels as a support medium. The self-healing nature of this medium allows the creation of highly precise cellular communities and tissue-like structures by direct injection of cells inside the 3D medium. Further, the transparent nature of this medium enables precise characterization of cellular behavior. I will describe two examples of my work using this platform to study the behavior of two different classes of cells in 3D. First, I will describe how we interrogate the growth, viability, and migration of mammalian cells (including epithelial cells, cancer cells, and T cells) in the 3D pore space. Second, I will describe how we interrogate the migration of E. coli bacteria through the 3D pore space. Direct visualization enables us to reveal a new mode of motility exhibited by individual cells, in stark contrast to the paradigm of run-and-tumble motility, in which cells are intermittently and transiently trapped as they navigate the pore space; further, analysis of these dynamics enables prediction of single-cell transport over large length and time scales. Moreover, we show that concentrated populations of E. coli can collectively migrate through a porous medium, despite being strongly confined, by chemotactically “surfing” a self-generated nutrient gradient.
Together, these studies highlight how the jammed microgel medium provides a powerful platform to design and interrogate complex cellular communities in 3D—with implications for tissue engineering, microtissue mechanics, studies of cellular interactions, and biophysical studies of active matter.
Retroviruses and retrotransposons interacting with the 3D genome in mouse and human brain
Repeat-rich sequence blocks are considered major determinants for 3D folding and structural genome organization in the cell nucleus in all higher eukaryotes. Here, we discuss how megabase-scale chromatin domain and chromosomal compartment organization in adult mouse cerebral cortex is linked, in a highly cell type-specific fashion, to multiple retrotransposon superfamilies which comprise the vast majority of mobile DNA elements in the murine genome. We show that neuronal megadomain architectures include an evolutionarily adaptive heterochromatic organization which, upon perturbation, unleashes proviruses from the Long Terminal Repeat (LTR) Endogenous Retrovirus family that exhibit strong tropism in mature neurons. Furthermore, we mapped, in the human brain, cell type-specific genomic integration patterns of the human pathogen and exogenous retrovirus, HIV, together with changes in genome organization and function of the HIV-infected brain. Our work highlights the critical importance of chromosomal conformations and the ‘spatial genome’ for neuron- and glia-specific regulatory mechanisms and defenses aimed at exogenous and endogenous retrotransposons in the brain.
Computational psychophysics at the intersection of theory, data and models
Behavioural measurements are often overlooked by computational neuroscientists, who prefer to focus on electrophysiological recordings or neuroimaging data. This attitude is largely due to a perceived lack of depth/richness in behavioural datasets. I will show how contemporary psychophysics can deliver extremely rich and highly constraining datasets that naturally interface with computational modelling. More specifically, I will demonstrate how psychophysics can be used to guide/constrain/refine computational models, and how models can be exploited to design/motivate/interpret psychophysical experiments. Examples will span a wide range of topics (from feature detection to natural scene understanding) and methodologies (from cascade models to deep learning architectures).
DeepLabStream
DeepLabStream is a Python-based multi-purpose tool that enables the real-time tracking and manipulation of animals during ongoing experiments. Our toolbox was originally adapted from the previously published DeepLabCut (Mathis et al., 2018) and expanded on its core capabilities, but is now able to utilize a variety of different network architectures for online pose estimation (SLEAP, DLC-Live, DeepPoseKit's StackedDenseNet, StackedHourGlass and LEAP). Our aim is to provide an open-source tool that allows researchers to design custom experiments based on real-time behavior-dependent feedback. My personal ideal goal would be a Swiss Army knife-like solution where we could integrate the many brilliant Python interfaces. We are constantly upgrading DLStream with new features and integrating other open-source solutions.
Multistable structures - from deployable structures to robots
Multistable structures can reversibly change between multiple stable configurations when a sufficient energetic input is provided. While originally the field focused on understanding what governs the snapping, more recently it has been shown that these systems also provide a powerful platform to design a wide range of smart structures. In this talk, I will first show that pressure-deployable origami structures characterized by two stable configurations provide opportunities for a new generation of large-scale inflatable structures that lock in place after deployment and provide a robust enclosure through their rigid faces. Then, I will demonstrate that the propagation of transition waves in a bistable one-dimensional linkage can be exploited as a robust mechanism to realize structures that can be quickly deployed. Finally, while in the first two examples multistability is harnessed to realize deployable architectures, I will demonstrate that bistable building blocks can also be exploited to design crawling and jumping robots. Unlike previously proposed robots that require complex input control of multiple actuators, a simple, slow input signal suffices to make our system move, as all features required for locomotion are embedded into the architecture of the building blocks.
Dimensions of variability in circuit models of cortex
Cortical circuits receive multiple inputs from upstream populations with non-overlapping stimulus tuning preferences. Both the feedforward and recurrent architectures of the receiving cortical layer will reflect this diverse input tuning. We study how population-wide neuronal variability propagates through a hierarchical cortical network receiving multiple, independent, tuned inputs. We present new analysis of in vivo neural data from the primate visual system showing that the number of latent variables (dimension) needed to describe population shared variability is smaller in V4 populations compared to those of its downstream visual area PFC. We successfully reproduce this dimensionality expansion from our V4 to PFC neural data using a multi-layer spiking network with structured, feedforward projections and recurrent assemblies of multiple, tuned neuron populations. We show that tuning-structured connectivity generates attractor dynamics within the recurrent PFC current, where attractor competition is reflected in the high dimensional shared variability across the population. Indeed, restricting the dimensionality analysis to activity from one attractor state recovers the low-dimensional structure inherited from each of our tuned inputs. Our model thus introduces a framework where high-dimensional cortical variability is understood as "time-sharing" between distinct low-dimensional, tuning-specific circuit dynamics.
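The "time-sharing" idea can be illustrated with a toy example. This is only a sketch under stated assumptions, not the analysis used in the talk: the abstract does not say how dimension was estimated, so the participation ratio of the covariance eigenvalues is used here as a stand-in, and the two synthetic "attractor states", their variability axes, and all parameter values are hypothetical. Each state has one-dimensional shared variability, while pooling across states yields a higher effective dimension.

```python
import numpy as np

def participation_ratio(samples):
    """Effective dimension (sum of eigenvalues)^2 / (sum of squared eigenvalues)
    of the sample covariance. Assumed stand-in for the talk's dimension estimate."""
    cov = np.cov(samples, rowvar=False)
    eig = np.linalg.eigvalsh(cov)
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(0)
n_neurons, n_trials = 30, 2000  # hypothetical population and sample sizes

# Three orthonormal directions: one variability axis per state, plus a mean offset.
basis, _ = np.linalg.qr(rng.standard_normal((n_neurons, 3)))
axis_a, axis_b, offset = basis.T

def simulate_state(axis, mean, noise=0.05):
    """Activity = state mean + 1-D shared fluctuation along `axis` + small private noise."""
    latent = rng.standard_normal((n_trials, 1))
    private = noise * rng.standard_normal((n_trials, n_neurons))
    return mean + latent * axis + private

state_a = simulate_state(axis_a, +offset)
state_b = simulate_state(axis_b, -offset)
pooled = np.vstack([state_a, state_b])  # the population "time-shares" both states

pr_a = participation_ratio(state_a)
pr_b = participation_ratio(state_b)
pr_pooled = participation_ratio(pooled)
print(f"state A: {pr_a:.2f}, state B: {pr_b:.2f}, pooled: {pr_pooled:.2f}")
```

Restricting the estimate to a single state gives a value close to one, whereas the pooled data span both variability axes and the offset between the state means, so the effective dimension is larger.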
Building a synthetic cell: Understanding the clock design and function
Clock networks containing the same central architectures may vary drastically in their potential to oscillate, raising the question of what controls robustness, one of the essential functions of an oscillator. We computationally generate an atlas of oscillators and find that, while core topologies are critical for oscillations, local structures substantially modulate the degree of robustness. Strikingly, two local structures, incoherent and coherent inputs, can modify a core topology to promote and attenuate its robustness, additively. The findings underscore the importance of local modifications to the performance of the whole network. They may explain why auxiliary structures not required for oscillations are evolutionarily conserved. We also extend this computational framework to search for hidden network motifs for other clock functions, such as tunability, which relates to the capability of a clock to adjust its timing to external cues. Experimentally, we developed an artificial cell system in water-in-oil microemulsions, within which we reconstitute mitotic cell cycles that can perform self-sustained oscillations for 30 to 40 cycles over multiple days. The oscillation profiles, such as period, amplitude, and shape, can be quantitatively varied with the concentrations of clock regulators, energy levels, droplet sizes, and circuit design. Such innate flexibility makes it crucial for studying clock functions of tunability and stochasticity at the single-cell level. Combined with a pressure-driven multi-channel tuning setup and long-term time-lapse fluorescence microscopy, this system enables high-throughput exploration of a multi-dimensional continuous parameter space and single-cell analysis of the clock dynamics and functions. We integrate this experimental platform with mathematical modeling to elucidate the topology-function relation of biological clocks.
With FRET and optogenetics, we also investigate spatiotemporal cell-cycle dynamics in both homogeneous and heterogeneous microenvironments by reconstructing subcellular compartments.
Protecting Machines from Us
The possibilities of machine learning, and neural networks in particular, are ever expanding. With increased opportunities to do good, however, there are just as many opportunities to do harm, and even when good intentions are at the helm, evidence suggests that opportunities for good may eventually prove to be the opposite. The greatest threat to what machine learning is able to achieve, and to us as humans, is machine learning that does not reflect the diversity of the users it is meant to serve. It is important that we are not so preoccupied with advancing technology into the future that we fail to take the time and invest the energy to engineer the security measures this future requires. It is important to investigate now, as thoroughly as we investigate differing deep neural network architectures, the complex questions raised by the fact that humans and the society in which they operate are inherently biased and loaded with prejudice, and that these traits find their way into the machines we create (and increasingly allow to run our lives).
Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits
Synaptic plasticity is believed to be a key physiological mechanism for learning. It is well established that it depends on pre- and postsynaptic activity. However, models that rely solely on pre- and postsynaptic activity for synaptic changes have, to date, not been able to account for learning complex tasks that demand hierarchical networks. Here, we show that if synaptic plasticity is regulated by high-frequency bursts of spikes, then neurons higher in the hierarchy can coordinate the plasticity of lower-level connections. Using simulations and mathematical analyses, we demonstrate that, when paired with short-term synaptic dynamics, regenerative activity in the apical dendrites, and synaptic plasticity in feedback pathways, a burst-dependent learning rule can solve challenging tasks that require deep network architectures. Our results demonstrate that well-known properties of dendrites, synapses, and synaptic plasticity are sufficient to enable sophisticated learning in hierarchical circuits.
On temporal coding in spiking neural networks with alpha synaptic function
The timing of individual neuronal spikes is essential for biological brains to make fast responses to sensory stimuli. However, conventional artificial neural networks lack the intrinsic temporal coding ability present in biological networks. We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes. In classification tasks, the output of the network is indicated by the first neuron to spike in the output layer. This temporal coding scheme allows the supervised training of the network with backpropagation, using locally exact derivatives of the postsynaptic spike times with respect to presynaptic spike times. The network operates using a biologically plausible alpha synaptic transfer function. Additionally, we use trainable synchronisation pulses that provide bias, add flexibility during training and exploit the decay part of the alpha function. We show that such networks can be trained successfully on noisy Boolean logic tasks and on the MNIST dataset encoded in time. The results show that the spiking neural network outperforms comparable spiking models on MNIST and achieves similar quality to fully connected conventional networks with the same architecture. We also find that the spiking network spontaneously discovers two operating regimes, mirroring the accuracy-speed trade-off observed in human decision-making: a slow regime, where a decision is taken after all hidden neurons have spiked and the accuracy is very high, and a fast regime, where a decision is taken very quickly but the accuracy is lower. These results demonstrate the computational power of spiking networks with biological characteristics that encode information in the timing of individual neurons. By studying temporal coding in spiking networks, we aim to create building blocks towards energy-efficient and more complex biologically inspired neural architectures.
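The first-to-spike readout described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the alpha kernel is taken to be t·exp(-decay·t), the threshold crossing is found by a simple grid search rather than an exact solution, and the function names, weights, threshold, and input spike times are all hypothetical.

```python
import numpy as np

def alpha_kernel(t, decay=1.0):
    """Alpha-shaped postsynaptic response: t * exp(-decay * t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0.0, t * np.exp(-decay * t), 0.0)

def first_spike_time(in_times, weights, threshold=1.0, decay=1.0, t_max=10.0, dt=1e-3):
    """Earliest grid time at which the summed potential reaches threshold (inf if never)."""
    in_times = np.asarray(in_times, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.arange(0.0, t_max, dt)
    # potential[k] = sum_i weights[i] * kernel(grid[k] - in_times[i])
    responses = alpha_kernel(grid[:, None] - in_times[None, :], decay)
    potential = (weights[None, :] * responses).sum(axis=1)
    crossed = np.nonzero(potential >= threshold)[0]
    return grid[crossed[0]] if crossed.size else np.inf

def classify(in_times, weight_matrix, **kwargs):
    """Predicted class = index of the output neuron that spikes first.
    Note: if no neuron spikes, argmin falls back to index 0."""
    times = np.array([first_spike_time(in_times, w, **kwargs) for w in weight_matrix])
    return int(np.argmin(times)), times

# Hypothetical example: two input spikes, two output neurons.
input_spike_times = [0.0, 0.5]
output_weights = [[3.0, 3.0],   # strongly driven neuron, crosses threshold early
                  [1.0, 1.0]]   # weakly driven neuron, never reaches threshold
label, spike_times = classify(input_spike_times, output_weights)
print(label, spike_times)
```

Because information is carried by spike times, stronger input weights translate into earlier output spikes, which is what makes the earliest output neuron a natural class readout.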
Analogy in Cognitive Architecture
Cognitive architectures are attempts to build larger-scale models of minds. This talk will explore how structure-mapping models of analogical matching, retrieval, and generalization are used in the Companion cognitive architecture. Examples will include modeling conceptual change, learning by reading, and analogical Q/A training.
Non-feedforward architectures enable diverse multisensory computations
Bernstein Conference 2024
Stochastic Process Model derived indicators of overfitting for deep architectures: Applicability to small sample recalibration of sEMG decoders
Bernstein Conference 2024
Model architectures for choice-selective sequences in a navigation-based, evidence-accumulation task
COSYNE 2022
Parallel functional architectures within a single dendritic tree
COSYNE 2022
Where are the neural architectures? The curse of structural flatness in neural network modelling
Neuromatch 5