Credit Assignment
Dr Jonathan Tang
This position will focus on the neural mechanisms underlying action learning in mice. Scientifically, the project aims to understand the neural circuits, activity, and behavioral dynamics behind how animals learn which actions to take for reward. Dopaminergic systems and their associated circuitry will be the focus of investigation. The lab integrates wireless inertial sensors, closed-loop algorithms, optogenetics, and neural recording to pursue this goal.
Assigning credit through the “other” connectome
Learning in neural networks requires assigning the right values to anywhere from thousands to trillions of individual connections, so that the network as a whole produces the desired behavior. Neuroscientists have gained insight into this “credit assignment” problem through decades of experimental, modeling, and theoretical studies, which have suggested key roles for synaptic eligibility traces and top-down feedback signals, among other factors. Here we study the potential contribution of another type of signaling, one being revealed with ever greater fidelity by ongoing molecular and genomics studies: the set of modulatory pathways local to a given circuit, which form an intriguing second connectome overlaid on top of the synaptic one. We will share ongoing modeling and theoretical work that explores the possible roles of this local modulatory connectome in network learning.
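As a purely illustrative sketch (not the authors' model), a local modulatory connectome can be folded into a three-factor learning rule: a decaying eligibility trace records Hebbian coincidences, and the actual weight change is gated by modulator release from a neuron's neighbours in a second, non-synaptic connectivity matrix. All names and parameters below are assumptions for the toy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 20, 10

W = 0.1 * rng.standard_normal((n_post, n_pre))           # ordinary synaptic weights
M = (rng.random((n_post, n_post)) < 0.2).astype(float)   # hypothetical local modulatory connectome
elig = np.zeros_like(W)                                  # per-synapse eligibility traces

tau_e, lr = 0.9, 1e-2
for t in range(100):
    x = rng.random(n_pre)                  # presynaptic activity
    y = np.tanh(W @ x)                     # postsynaptic activity
    elig = tau_e * elig + np.outer(y, x)   # decaying Hebbian coincidence; no weight change yet
    mod = M @ np.abs(y)                    # modulator input from neighbours in the second connectome
    W += lr * mod[:, None] * elig          # the third factor gates the actual update
```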
Behavioral Timescale Synaptic Plasticity (BTSP) for biologically plausible credit assignment across multiple layers via top-down gating of dendritic plasticity
A central problem in biological learning is how information about the outcome of a decision or behavior can be used to reliably guide learning across distributed neural circuits while obeying biological constraints. This “credit assignment” problem is commonly solved in artificial neural networks through supervised gradient descent and the backpropagation algorithm. In contrast, biological learning is typically modelled using unsupervised Hebbian learning rules. While these rules use only local information to update synaptic weights, and are sometimes combined with sign constraints to reflect a diversity of excitatory (only positive weights) and inhibitory (only negative weights) cell types, they do not prescribe a clear mechanism for coordinating learning across multiple layers and propagating error information accurately across the network. In recent years, several groups have drawn inspiration from the known dendritic non-linearities of pyramidal neurons to propose new learning rules and network architectures that enable biologically plausible multi-layer learning by processing error information in segregated dendrites. Meanwhile, recent experimental results from the hippocampus have revealed a new form of plasticity—Behavioral Timescale Synaptic Plasticity (BTSP)—in which large dendritic depolarizations rapidly reshape synaptic weights and stimulus selectivity with as little as a single stimulus presentation (“one-shot learning”). Here we explore the implications of this new learning rule through a biologically plausible implementation in a rate neuron network. We demonstrate that regulation of dendritic spiking and BTSP by top-down feedback signals can effectively coordinate plasticity across multiple network layers in a simple pattern recognition task. By analyzing hidden feature representations and weight trajectories during learning, we characterize the differences between networks trained with standard backpropagation, Hebbian learning rules, and BTSP.
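A minimal sketch of a BTSP-style update, under the assumption of a single rate neuron with seconds-long presynaptic eligibility traces and a rare, top-down-gated dendritic plateau; the bounded potentiation/depression form is a simplification for illustration, not the exact published rule.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in = 50
w = 0.2 * rng.random(n_in)           # input weights onto one neuron
elig = np.zeros(n_in)                # presynaptic eligibility traces
tau_e = 0.8                          # long-lived trace, the substrate for "one-shot" pairing
w_max, eta = 1.0, 0.5

for t in range(200):
    x = (rng.random(n_in) < 0.1).astype(float)   # presynaptic spikes
    elig = tau_e * elig + x
    plateau = (t == 100)                         # rare dendritic plateau, top-down gated
    if plateau:
        # synapses with recent input are driven toward w_max in one event;
        # inactive synapses are mildly depressed, keeping weights bounded
        w = np.clip(w + eta * (elig * (w_max - w) - 0.05 * w), 0.0, w_max)
```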
Online Training of Spiking Recurrent Neural Networks With Memristive Synapses
Spiking recurrent neural networks (RNNs) are a promising tool for solving a wide variety of complex cognitive and motor tasks, thanks to their rich temporal dynamics and sparse processing. However, training spiking RNNs on dedicated neuromorphic hardware remains an open challenge, due mainly to the lack of local, hardware-friendly learning mechanisms that can solve the temporal credit assignment problem and ensure stable network dynamics, even when weight resolution is limited. These challenges are further accentuated if one resorts to memristive devices for in-memory computing to resolve the von Neumann bottleneck, at the expense of a substantial increase in variability in both the computation and the working memory of the spiking RNNs. In this talk, I will present our recent work introducing a PyTorch simulation framework for memristive crossbar arrays that enables accurate investigation of such challenges. I will show that the recently proposed e-prop learning rule can be used to train spiking RNNs whose weights are emulated in this framework. Although e-prop locally approximates the ideal synaptic updates, implementing those updates on the memristive substrate is difficult due to substantial device non-idealities. I will discuss several widely adopted weight update schemes that aim to cope with these non-idealities, and demonstrate that accumulating gradients enables online and efficient training of spiking RNNs on memristive substrates.
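The gradient-accumulation idea can be sketched as follows: small, high-precision gradient contributions are pooled beside the crossbar, and the device is programmed only in whole conductance steps, with assumed programming noise. The step size, noise level, and gradient source below are placeholders, not measured device parameters or the talk's actual framework.

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (32, 32)
w_step = 0.02                        # coarse memristive weight resolution (assumption)
w = np.round(0.1 * rng.standard_normal(shape) / w_step) * w_step
grad_acc = np.zeros(shape)           # high-precision accumulator kept beside the array

def apply_update(grad, lr=1e-2):
    """Accumulate small gradients; program the device only in full steps."""
    global w, grad_acc
    grad_acc += lr * grad
    n_steps = np.trunc(grad_acc / w_step)              # whole device steps now available
    noise = 1.0 + 0.1 * rng.standard_normal(shape)     # cycle-to-cycle variability
    w -= n_steps * w_step * noise                      # imperfect programming pulses
    grad_acc -= n_steps * w_step                       # keep the residual for later

for _ in range(10):
    apply_update(rng.standard_normal(shape))           # stand-in for e-prop gradients
```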
Credit Assignment in Neural Networks through Deep Feedback Control
The success of deep learning has sparked interest in whether the brain learns by using similar techniques for assigning credit to each synaptic weight for its contribution to the network output. However, the majority of current attempts at biologically plausible learning methods are either non-local in time, require highly specific connectivity motifs, or have no clear link to any known mathematical optimization method. Here, we introduce Deep Feedback Control (DFC), a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target and whose control signal can be used for credit assignment. The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of feedback connectivity patterns. To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing. By combining dynamical systems theory with mathematical optimization theory, we provide a strong theoretical foundation for DFC, which we corroborate with detailed results on toy experiments and standard computer-vision benchmarks.
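A toy, single-hidden-layer rendering of the control idea: an integral controller nudges the network until its output matches the target, and the steady-state control signal then drives a weight update that is local to each synapse. The architecture and constants are illustrative assumptions, not the paper's full formulation.

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_h, n_out = 8, 16, 4
W1 = 0.3 * rng.standard_normal((n_h, n_in))    # feedforward weights
W2 = 0.3 * rng.standard_normal((n_out, n_h))
Q = 0.3 * rng.standard_normal((n_h, n_out))    # feedback (controller) pathway

x, target = rng.random(n_in), rng.random(n_out)

u = np.zeros(n_out)                            # integral-controller state
for _ in range(300):                           # run the controlled dynamics to rest
    h = np.tanh(W1 @ x + Q @ u)                # hidden voltage = feedforward + control
    u += 0.1 * (target - W2 @ h)               # controller integrates the output error

# credit assignment: the settled control signal tells each hidden neuron how
# its voltage had to change, so the update needs only locally available terms
h = np.tanh(W1 @ x + Q @ u)
lr = 0.05
W1 += lr * np.outer(Q @ u, x)                  # hidden update uses the local control nudge
W2 += lr * np.outer(target - W2 @ h, h)        # output update is a delta rule
```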
The role of spatiotemporal waves in coordinating regional dopamine decision signals
The neurotransmitter dopamine is essential for normal reward learning and motivational arousal. Indeed, these core functions are implicated in the major neurological and psychiatric dopamine disorders, such as schizophrenia, substance abuse disorders/addiction, and Parkinson's disease. Over the years, we have made significant strides in understanding the dopamine system across multiple levels of description, and I will focus on our recent advances in the computational description of, and the brain circuit mechanisms that facilitate, the dual role of dopamine in learning and performance. I will specifically describe our recent work imaging the activity of dopamine axons and measuring dopamine release in mice performing various behavioural tasks. We discovered wave-like spatiotemporal activity of dopamine in the striatum, and I will argue that this pattern of activation supports a critical computational operation: spatiotemporal credit assignment to regional striatal sub-experts. Our findings provide a mechanistic description for vectorizing the reward prediction error signals relayed by dopamine.
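As a deliberately cartoonish illustration of what "vectorizing" a prediction error could mean, imagine a travelling wave of gain that hands each regional sub-expert a different share of a single scalar prediction error; every number below is made up for the toy and is not a claim about the experimental findings.

```python
import numpy as np

rng = np.random.default_rng(4)
n_regions = 5
V = np.zeros(n_regions)                  # value estimates of regional striatal "sub-experts"
lr = 0.1

for trial in range(500):
    reward = float(rng.random() < 0.7)   # probabilistic reward
    rpe = reward - V.mean()              # one scalar reward prediction error...
    wave = np.roll(np.array([1.0, 0.5, 0.2, 0.0, 0.0]), trial % n_regions)
    V += lr * wave * rpe                 # ...delivered as a travelling wave of gain,
                                         # so each region receives a different share of credit
```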
Back-propagation in spiking neural networks
Back-propagation is a powerful supervised learning algorithm for artificial neural networks because it solves the credit assignment problem (essentially: what should the hidden layers do?). This algorithm led to the deep learning revolution. Unfortunately, back-propagation cannot be used directly in spiking neural networks (SNNs): it requires differentiable activation functions, whereas spikes are all-or-none events that cause discontinuities. Here we present two strategies to overcome this problem. The first is to use a so-called “surrogate gradient”, that is, to approximate the derivative of the threshold function with the derivative of a sigmoid. We will present applications of this method to time series processing (audio, internet traffic, EEG). The second concerns a specific class of SNNs that process static inputs using latency coding with at most one spike per neuron. Using approximations, we derived a latency-based back-propagation rule for this sort of network, called S4NN, and applied it to image classification.
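A minimal surrogate-gradient sketch in PyTorch: the forward pass emits all-or-none spikes, while the backward pass substitutes the derivative of a sigmoid (the steepness constant is an arbitrary choice, not a value from the talk).

```python
import torch

class SpikeSurrogate(torch.autograd.Function):
    """Heaviside spike in the forward pass; sigmoid derivative in the backward pass."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()                   # all-or-none spike

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        sig = torch.sigmoid(5.0 * v)             # steepness 5.0 is an assumption
        return grad_output * 5.0 * sig * (1 - sig)

# usage: spikes = SpikeSurrogate.apply(membrane_potential - threshold)
v = torch.randn(10, requires_grad=True)
spikes = SpikeSurrogate.apply(v)
spikes.sum().backward()                          # gradients flow despite the discontinuity
```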
Striatal circuits for reward learning and decision-making
How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens (NAc), which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex (PL) and midline regions of the thalamus (mTH). However, little is known about what is represented by the PL and mTH neurons that project to NAc (PL-NAc and mTH-NAc). By comparing these inputs during a reinforcement learning task in mice, we discovered that i) PL-NAc preferentially represents actions and choices, ii) mTH-NAc preferentially represents cues, and iii) choice-selective activity in PL-NAc is organized in sequences that persist beyond the outcome. Through computational modelling, we demonstrate that these sequences can support a neural implementation of temporal difference learning, a powerful algorithm for connecting actions and outcomes across time. Finally, we test and confirm predictions of our circuit model by direct manipulation of PL-NAc neurons. Thus, we integrate experiment and modelling to suggest a neural solution for credit assignment.
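A compact sketch of how a choice-selective sequence can implement temporal difference learning: with one feature active at each delay, the TD(0) error propagates credit from a late reward back to the time of choice. The trial structure and constants are illustrative assumptions, not the study's model.

```python
import numpy as np

T, gamma, lr = 10, 0.9, 0.1          # trial length, discount factor, learning rate
w = np.zeros(T)                       # one weight per sequence element

def features(t):
    """Choice-selective sequence: a different neuron is active at each delay."""
    f = np.zeros(T)
    f[t] = 1.0
    return f

for trial in range(200):
    for t in range(T - 1):
        r = 1.0 if t == T - 2 else 0.0            # reward arrives late in the trial
        v, v_next = w @ features(t), w @ features(t + 1)
        delta = r + gamma * v_next - v            # TD error (the dopamine-like signal)
        w += lr * delta * features(t)             # credit flows back along the sequence
```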
Using Dynamical Systems Theory to Improve Temporal Credit Assignment in Spiking Neural Networks
Bernstein Conference 2024
Principled credit assignment with strong feedback through Deep Feedback Control
COSYNE 2022
Reorganizing cortical learning: a cholinergic adaptive credit assignment model
COSYNE 2022
Excitatory-inhibitory cortical feedback enables efficient hierarchical credit assignment
COSYNE 2023
Biologically plausible credit assignment via neuronal frequency multiplexing
COSYNE 2025
Can BTSP mediate credit assignment in the hippocampus?
COSYNE 2025
Dendritic target propagation: a biology-constrained algorithm for credit assignment in multilayer recurrent E/I networks
COSYNE 2025
The neuronal trace of temporal credit assignment in premotor cortex
FENS Forum 2024