Prediction Error

In a first part of the talk, I will present Predictive Coding Light (PCL), a novel unsupervised learning architecture for spiking neural networks. In contrast to conventional predictive coding approaches, which only transmit prediction errors to higher processing stages, PCL learns inhibitory lateral and top-down connectivity to suppress the most predictable spikes and passes a compressed representation of the input to higher processing stages. We show that PCL reproduces a range of biological findings and exhibits a favorable tradeoff between energy consumption and downstream classification performance on challenging benchmarks. A second part of the talk will feature our lab’s efforts to explain how infants and toddlers might learn abstract object representations without supervision. I will present deep learning models that exploit the temporal and multimodal structure of their sensory inputs to learn representations of individual objects, object categories, or abstract super-categories such as „kitchen object“ in a fully unsupervised fashion. These models offer a parsimonious account of how abstract semantic knowledge may be rooted in children's embodied first-person experiences.

Max Planck Institute for Human Cognitive and Brain Sciences

Understanding reward-guided learning using large-scale datasets

Cognitive maps as expectations learned across episodes – a model of the two dentate gyrus blades

Andrej Bicanski

Mar 11, 2025

How can the hippocampal system transition from episodic one-shot learning to a multi-shot learning regime and what is the utility of the resultant neural representations? This talk will explore the role of the dentate gyrus (DG) anatomy in this context. The canonical DG model suggests it performs pattern separation. More recent experimental results challenge this standard model, suggesting DG function is more complex and also supports the precise binding of objects and events to space and the integration of information across episodes. Very recent studies attribute pattern separation and pattern integration to anatomically distinct parts of the DG (the suprapyramidal blade vs the infrapyramidal blade). We propose a computational model that investigates this distinction. In the model the two processing streams (potentially localized in separate blades) contribute to the storage of distinct episodic memories, and the integration of information across episodes, respectively. The latter forms generalized expectations across episodes, eventually forming a cognitive map. We train the model with two data sets, MNIST and plausible entorhinal cortex inputs. The comparison between the two streams allows for the calculation of a prediction error, which can drive the storage of poorly predicted memories and the forgetting of well-predicted memories. We suggest that differential processing across the DG aids in the iterative construction of spatial cognitive maps to serve the generation of location-dependent expectations, while at the same time preserving episodic memory traces of idiosyncratic events.

Decomposing motivation into value and salience

Philippe Tobler

University of Zurich

Oct 31, 2024

Humans and other animals approach reward and avoid punishment and pay attention to cues predicting these events. Such motivated behavior thus appears to be guided by value, which directs behavior towards or away from positively or negatively valenced outcomes. Moreover, it is facilitated by (top-down) salience, which enhances attention to behaviorally relevant learned cues predicting the occurrence of valenced outcomes. Using human neuroimaging, we recently separated value (ventral striatum, posterior ventromedial prefrontal cortex) from salience (anterior ventromedial cortex, occipital cortex) in the domain of liquid reward and punishment. Moreover, we investigated potential drivers of learned salience: the probability and uncertainty with which valenced and non-valenced outcomes occur. We find that the brain dissociates valenced from non-valenced probability and uncertainty, which indicates that reinforcement matters for the brain, in addition to information provided by probability and uncertainty alone, regardless of valence. Finally, we assessed learning signals (unsigned prediction errors) that may underpin the acquisition of salience. Particularly the insula appears to be central for this function, encoding a subjective salience prediction error, similarly at the time of positively and negatively valenced outcomes. However, it appears to employ domain-specific time constants, leading to stronger salience signals in the aversive than the appetitive domain at the time of cues. These findings explain why previous research associated the insula with both valence-independent salience processing and with preferential encoding of the aversive domain. More generally, the distinction of value and salience appears to provide a useful framework for capturing the neural basis of motivated behavior.

Friedrich Miescher Institute for Biomedical Research, Basel

Predictive processing: a circuit approach to psychosis

Georg Keller

Mar 13, 2024

Predictive processing is a computational framework that aims to explain how the brain processes sensory information by making predictions about the environment and minimizing prediction errors. It can also be used to explain some of the key symptoms of psychotic disorders such as schizophrenia. In my talk, I will provide an overview of our progress in this endeavor.

The University of Texas at Houston

Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time

Harel Shouval

Jun 13, 2023

The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), which means they signal the difference between the expected future rewards and the actual rewards. The prominence of the TD theory arises from the observation that firing properties of dopaminergic neurons in the ventral tegmental area appear similar to those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We propose instead an alternative theoretical framework, coined FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature specific representations of time are learned, allowing for neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX dopamine acts as an instructive signal which helps build temporal models of the environment. FLEX is a general theoretical framework that has many possible biophysical implementations. In order to show that FLEX is a feasible approach, we present a specific biophysically plausible model which implements the principles of FLEX. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.

National Institute of Mental Health at National Institutes of Health (NIH)

Richly structured reward predictions in dopaminergic learning circuits

Angela J. Langdon

May 16, 2023

Theories from reinforcement learning have been highly influential for interpreting neural activity in the biological circuits critical for animal and human learning. Central among these is the identification of phasic activity in dopamine neurons as a reward prediction error signal that drives learning in basal ganglia and prefrontal circuits. However, recent findings suggest that dopaminergic prediction error signals have access to complex, structured reward predictions and are sensitive to more properties of outcomes than learning theories with simple scalar value predictions might suggest. Here, I will present recent work in which we probed the identity-specific structure of reward prediction errors in an odor-guided choice task and found evidence for multiple predictive “threads” that segregate reward predictions, and reward prediction errors, according to the specific sensory features of anticipated outcomes. Our results point to an expanded class of neural reinforcement learning algorithms in which biological agents learn rich associative structure from their environment and leverage it to build reward predictions that include information about the specific, and perhaps idiosyncratic, features of available outcomes, using these to guide behavior in even quite simple reward learning tasks.

Columbia University, New York

Off-policy learning in the basal ganglia

Ashok Litwin-Kumar

May 2, 2023

I will discuss work with Jack Lindsey modeling reinforcement learning for action selection in the basal ganglia. I will argue that the presence of multiple brain regions, in addition to the basal ganglia, that contribute to motor control motivates the need for an off-policy basal ganglia learning algorithm. I will then describe a biological implementation of such an algorithm that predicts tuning of dopamine neurons to a quantity we call "action surprise," in addition to reward prediction error. In the same model, an implementation of learning from a motor efference copy also predicts a novel solution to the problem of multiplexing feedforward and efference-related striatal activity. The solution exploits the difference between D1 and D2-expressing medium spiny neurons and leads to predictions about striatal dynamics.

California Institute of Technology

Learning static and dynamic mappings with local self-supervised plasticity

Pantelis Vafeidis

Sep 6, 2022

Animals exhibit remarkable learning capabilities with little direct supervision. Likewise, self-supervised learning is an emergent paradigm in artificial intelligence, closing the performance gap to supervised learning. In the context of biology, self-supervised learning corresponds to a setting where one sense or specific stimulus may serve as a supervisory signal for another. After learning, the latter can be used to predict the former. On the implementation level, it has been demonstrated that such predictive learning can occur at the single neuron level, in compartmentalized neurons that separate and associate information from different streams. We demonstrate the power such self-supervised learning over unsupervised (Hebb-like) learning rules, which depend heavily on stimulus statistics, in two examples: First, in the context of animal navigation where predictive learning can associate internal self-motion information always available to the animal with external visual landmark information, leading to accurate path-integration in the dark. We focus on the well-characterized fly head direction system and show that our setting learns a connectivity strikingly similar to the one reported in experiments. The mature network is a quasi-continuous attractor and reproduces key experiments in which optogenetic stimulation controls the internal representation of heading, and where the network remaps to integrate with different gains. Second, we show that incorporating global gating by reward prediction errors allows the same setting to learn conditioning at the neuronal level with mixed selectivity. At its core, conditioning entails associating a neural activity pattern induced by an unconditioned stimulus (US) with the pattern arising in response to a conditioned stimulus (CS). Solving the generic problem of pattern-to-pattern associations naturally leads to emergent cognitive phenomena like blocking, overshadowing, saliency effects, extinction, interstimulus interval effects etc. Surprisingly, we find that the same network offers a reductionist mechanism for causal inference by resolving the post hoc, ergo propter hoc fallacy.

An economic decision-making model of anticipated surprise with dynamic expectation

Taro Toyoizumi

RIKEN

Dec 7, 2021

When making decision under risk, people often exhibit behaviours that classical economic theories cannot explain. Newer models that attempt to account for these ‘irrational’ behaviours often lack neuroscience bases and require the introduction of subjective and problem-specific constructs. Here, we present a decision-making model inspired by the prediction error signals and introspective neuronal replay reported in the brain. In the model, decisions are chosen based on ‘anticipated surprise’, defined by a nonlinear average of the differences between individual outcomes and a reference point. The reference point is determined by the expected value of the possible outcomes, which can dynamically change during the mental simulation of decision-making problems involving sequential stages. Our model elucidates the contribution of each stage to the appeal of available options in a decision-making problem. This allows us to explain several economic paradoxes and gambling behaviours. Our work could help bridge the gap between decision-making theories in economics and neurosciences.

Saleem lab, University College London

Feature selectivity can explain mismatch signals in mouse visual cortex

Tomaso Muzzu

Oct 19, 2021

Sensory experience often depends on one’s own actions, including self-motion. Theories of predictive coding postulate that actions are regulated by calculating prediction error, which is the difference between sensory experience and expectation based on self-generated actions. Signals consistent with prediction error have been reported in mouse visual cortex (V1) when visual flow coupled to running was unexpectedly stopped. Here, we show such signals can be elicited by visual stimuli uncoupled to animal’s running. We recorded V1 neurons while presenting drifting gratings that unexpectedly stopped. We found strong responses to visual perturbations, which were enhanced during running. Perturbation responses were strongest in the preferred orientation of individual neurons and perturbation responsive neurons were more likely to prefer slow visual speeds. Our results indicate that prediction error signals can be explained by the convergence of known motor and sensory signals, providing a purely sensory and motor explanation for purported mismatch signals.

Kohn lab, Albert Einstein College of Medicine; Growth Intelligence, UK

- CANCELLED -

Selina Solomon

Oct 19, 2021

A recent formulation of predictive coding theory proposes that a subset of neurons in each cortical area encodes sensory prediction errors, the difference between predictions relayed from higher cortex and the sensory input. Here, we test for evidence of prediction error responses in spiking responses and local field potentials (LFP) recorded in primary visual cortex and area V4 of macaque monkeys, and in complementary electroencephalographic (EEG) scalp recordings in human participants. We presented a fixed sequence of visual stimuli on most trials, and violated the expected ordering on a small subset of trials. Under predictive coding theory, pattern-violating stimuli should trigger robust prediction errors, but we found that spiking, LFP and EEG responses to expected and pattern-violating stimuli were nearly identical. Our results challenge the assertion that a fundamental computational motif in sensory cortex is to signal prediction errors, at least those based on predictions derived from temporal patterns of visual stimulation.

Buschman lab, Princeton University

Rule learning representation in the fronto-parietal network

Caroline Jahn

Sep 7, 2021

We must constantly adapt the rules we use to guide our attention. To understand how the brain learns these rules, we designed a novel task that required monkeys to learn which color is the most rewarded at a given time (the current rule). However, just as in real life, the monkey was never explicitly told the rule. Instead, they had to learn it through trial and error by choosing a color, receiving feedback (amount of reward), and then updating their internal rule. After the monkeys reached a behavioral criterion, the rule changed. This change was not cued but could be inferred based on reward feedback. Behavioral modeling found monkeys used rewards to learn the rules. After the rule changed, animals adopted one of two strategies. If the change was small, reflected in a small reward prediction error, the animals continuously updated their rule. However, for large changes, monkeys ‘reset’ their belief about the rule and re-learned the rule from scratch. To understand the neural correlates of learning new rules, we recorded neurons simultaneously from the prefrontal and parietal cortex. We found that the strength of the rule representation increased with the certainty about the current rule, and that the certainty about the rule was represented both implicitly and explicitly in the population.

Monash Biomedical Imaging

Understanding the role of prediction in sensory encoding

Jason Mattingley

Jul 28, 2021

At any given moment the brain receives more sensory information than it can use to guide adaptive behaviour, creating the need for mechanisms that promote efficient processing of incoming sensory signals. One way in which the brain might reduce its sensory processing load is to encode successive presentations of the same stimulus in a more efficient form, a process known as neural adaptation. Conversely, when a stimulus violates an expected pattern, it should evoke an enhanced neural response. Such a scheme for sensory encoding has been formalised in predictive coding theories, which propose that recent experience establishes expectations in the brain that generate prediction errors when violated. In this webinar, Professor Jason Mattingley will discuss whether the encoding of elementary visual features is modulated when otherwise identical stimuli are expected or unexpected based upon the history of stimulus presentation. In humans, EEG was employed to measure neural activity evoked by gratings of different orientations, and multivariate forward modelling was used to determine how orientation selectivity is affected for expected versus unexpected stimuli. In mice, two-photon calcium imaging was used to quantify orientation tuning of individual neurons in the primary visual cortex to expected and unexpected gratings. Results revealed enhanced orientation tuning to unexpected visual stimuli, both at the level of whole-brain responses and for individual visual cortex neurons. Professor Mattingley will discuss the implications of these findings for predictive coding theories of sensory encoding. Professor Jason Mattingley is a Laureate Fellow and Foundation Chair in Cognitive Neuroscience at The University of Queensland. His research is directed toward understanding the brain processes that support perception, selective attention and decision-making, in health and disease.

Active sleep in flies: the dawn of consciousness

Bruno van Swinderen

University of Queensland

Jul 18, 2021

The brain is a prediction machine. Yet the world is never entirely predictable, for any animal. Unexpected events are surprising and this typically evokes prediction error signatures in animal brains. In humans such mismatched expectations are often associated with an emotional response as well. Appropriate emotional responses are understood to be important for memory consolidation, suggesting that valence cues more generally constitute an ancient mechanism designed to potently refine and generalize internal models of the world and thereby minimize prediction errors. On the other hand, abolishing error detection and surprise entirely is probably also maladaptive, as this might undermine the very mechanism that brains use to become better prediction machines. This paradoxical view of brain functions as an ongoing tug-of-war between prediction and surprise suggests a compelling new way to study and understand the evolution of consciousness in animals. I will present approaches to studying attention and prediction in the tiny brain of the fruit fly, Drosophila melanogaster. I will discuss how an ‘active’ sleep stage (termed rapid eye movement – REM – sleep in mammals) may have evolved in the first animal brains as a mechanism for optimizing prediction in motile creatures confronted with constantly changing environments. A role for REM sleep in emotional regulation could thus be better understood as an ancient sleep function that evolved alongside selective attention to maintain an adaptive balance between prediction and surprise. This view of active sleep has some interesting implications for the evolution of subjective awareness and consciousness.

A role for dopamine in value-free learning

Luke Coddington

Dudman lab, HHMI Janelia

Jul 13, 2021

Recent success in training artificial agents and robots derives from a combination of direct learning of behavioral policies and indirect learning via value functions. Policy learning and value learning employ distinct algorithms that depend upon evaluation of errors in performance and reward prediction errors, respectively. In mammals, behavioral learning and the role of mesolimbic dopamine signaling have been extensively evaluated with respect to reward prediction errors; but there has been little consideration of how direct policy learning might inform our understanding. I’ll discuss our recent work on classical conditioning in naïve mice (https://www.biorxiv.org/content/10.1101/2021.05.31.446464v1) that provides multiple lines of evidence that phasic dopamine signaling regulates policy learning from performance errors in addition to its well-known roles in value learning. This work points towards new opportunities for unraveling the mechanisms of basal ganglia control over behavior under both adaptive and maladaptive learning conditions.

The Medical School, University of Salamanca, Spain

The precision of prediction errors in the auditory cortex

Manolo Malmierca

Jan 24, 2021

Generalization guided exploration

Charley Wu

Max Planck

Dec 15, 2020

How do people learn in real-world environments where the space of possible actions can be vast or even infinite? The study of human learning has made rapid progress in past decades, from discovering the neural substrate of reward prediction errors, to building AI capable of mastering the game of Go. Yet this line of research has primarily focused on learning through repeated interactions with the same stimuli. How are humans able to rapidly adapt to novel situations and learn from such sparse examples? I propose a theory of how generalization guides human learning, by making predictions about which unobserved options are most promising to explore. Inspired by Roger Shepard’s law of generalization, I show how a Bayesian function learning model provides a mechanism for generalizing limited experiences to a wide set of novel possibilities, based on the simple principle that similar actions produce similar outcomes. This model of generalization generates predictions about the expected reward and underlying uncertainty of unexplored options, where both are vital components in how people actively explore the world. This model allows us to explain developmental differences in the explorative behavior of children, and suggests a general principle of learning across spatial, conceptual, and structured domains.

Howard Hughes Medical Institute

The role of spatiotemporal waves in coordinating regional dopamine decision signals

Arif Hamid

Oct 14, 2020

The neurotransmitter dopamine is essential for normal reward learning and motivational arousal processes. Indeed these core functions are implicated in the major neurological and psychiatric dopamine disorders such as schizophrenia, substance abuse disorders/addiction and Parkinson's disease. Over the years, we have made significant strides in understanding the dopamine system across multiple levels of description, and I will focus on our recent advances in the computational description, and brain circuit mechanisms that facilitate the dual role of dopamine in learning and performance. I will specifically describe our recent work with imaging the activity of dopamine axons and measurements of dopamine release in mice performing various behavioural tasks. We discovered wave-like spatiotemporal activity of dopamine in the striatal region, and I will argue that this pattern of activation supports a critical computational operation; spatiotemporal credit assignment to regional striatal subexperts. Our findings provide a mechanistic description for vectorizing reward prediction error signals relayed by dopamine.

Technical University Berlin

Self-organisation in interneuron circuits

Henning Sprekeler

Sep 24, 2020

Inhibitory interneurons come in different classes and form intricate circuits. While our knowledge of these circuits has advanced substantially over the last decades, it is not fully understood how the structure of these circuits relates to their function. I will present some of our recent attempts to “understand” the structure of interneuron circuits by means of computational modeling. Surprisingly (at least for us), we found that prominent features of inhibitory circuitry can be accounted for by an optimisation for excitation-inhibition (E/I) balance. In particular, we find that such an optimisation generates networks that resemble mouse V1 in terms of the structure of synaptic efficacies between principal cells and parvalbumin-positive interneurons. Moreover, an optimisation for E/I balance across neuronal compartments promotes a functional diversification of interneurons into two classes that resemble parvalbumin and somatostatin-positive interneurons. Time permitting, I may briefly touch on recent work in which we link E/I balance to prediction error coding in V1.

Delineating Reward/Avoidance Decision Process in the Impulsive-compulsive Spectrum Disorders through a Probabilistic Reversal Learning Task

Xiaoliu Zhang

Monash University

Jul 18, 2020

Impulsivity and compulsivity are behavioural traits that underlie many aspects of decision-making and form the characteristic symptoms of Obsessive Compulsive Disorder (OCD) and Gambling Disorder (GD). The neural underpinnings of aspects of reward and avoidance learning under the expression of these traits and symptoms are only partially understood. " "The present study combined behavioural modelling and neuroimaging technique to examine brain activity associated with critical phases of reward and loss processing in OCD and GD. " "Forty-two healthy controls (HC), forty OCD and twenty-three GD participants were recruited in our study to complete a two-session reinforcement learning (RL) task featuring a “probability switch (PS)” with imaging scanning. Finally, 39 HC (20F/19M, 34 yrs +/- 9.47), 28 OCD (14F/14M, 32.11 yrs ±9.53) and 16 GD (4F/12M, 35.53yrs ± 12.20) were included with both behavioural and imaging data available. The functional imaging was conducted by using 3.0-T SIEMENS MAGNETOM Skyra syngo MR D13C at Monash Biomedical Imaging. Each volume compromised 34 coronal slices of 3 mm thickness with 2000 ms TR and 30 ms TE. A total of 479 volumes were acquired for each participant in each session in an interleaved-ascending manner. " " The standard Q-learning model was fitted to the observed behavioural data and the Bayesian model was used for the parameter estimation. Imaging analysis was conducted using SPM12 (Welcome Department of Imaging Neuroscience, London, United Kingdom) in the Matlab (R2015b) environment. The pre-processing commenced with the slice timing, realignment, normalization to MNI space according to T1-weighted image and smoothing with a 8 mm Gaussian kernel. " " The frontostriatal brain circuit including the putamen and medial orbitofrontal (mOFC) were significantly more active in response to receiving reward and avoiding punishment compared to receiving an aversive outcome and missing reward at 0.001 with FWE correction at cluster level; While the right insula showed greater activation in response to missing rewards and receiving punishment. Compared to healthy participants, GD patients showed significantly lower activation in the left superior frontal and posterior cingulum at 0.001 for the gain omission. " " The reward prediction error (PE) signal was found positively correlated with the activation at several clusters expanding across cortical and subcortical region including the striatum, cingulate, bilateral insula, thalamus and superior frontal at 0.001 with FWE correction at cluster level. The GD patients showed a trend of decreased reward PE response in the right precentral extending to left posterior cingulate compared to controls at 0.05 with FWE correction. " " The aversive PE signal was negatively correlated with brain activity in regions including bilateral thalamus, hippocampus, insula and striatum at 0.001 with FWE correction. Compared with the control group, GD group showed an increased aversive PE activation in the cluster encompassing right thalamus and right hippocampus, and also the right middle frontal extending to the right anterior cingulum at 0.005 with FWE correction. " " Through the reversal learning task, the study provided a further support of the dissociable brain circuits for distinct phases of reward and avoidance learning. Also, the OCD and GD is characterised by aberrant patterns of reward and avoidance processing.

Counterfactual outcomes affect reward expectation and prediction errors in macaque frontal cortex

COSYNE 2022

VTA dopamine neurons signal phasic and ramping reward prediction error in goal-directed navigation

COSYNE 2022

Learning and expression of dopaminergic reward prediction error via plastic representations of time

COSYNE 2022

Learning and expression of dopaminergic reward prediction error via plastic representations of time

COSYNE 2022

Neurons in dlPFC signal unsigned reward prediction error independently from value

COSYNE 2022

Neurons in dlPFC signal unsigned reward prediction error independently from value

COSYNE 2022

Uncertainty-weighted prediction errors (UPEs) in cortical microcircuits

COSYNE 2022

Uncertainty-weighted prediction errors (UPEs) in cortical microcircuits

COSYNE 2022

Jae Hoon Shin, Jee Hang Lee, Sang Wan Lee

Controlling human cortical and striatal reinforcement learning with meta prediction error

COSYNE 2023

Quentin Chevy, Rui Ponte Costa, Zoltan Szadai, Rozsa Balazs, Adam Kepecs

A cortical microcircuit for reinforcement prediction error

COSYNE 2023

Wei Ji Ma, Dongjae Kim, Heiko Schuett

Reward prediction error neurons implement an efficient code for value

COSYNE 2023

Priyanka Gupta, Marie Dussauze, Uri Livneh, Dinu Albeanu

Sensorimotor prediction errors in the mouse olfactory cortex

COSYNE 2023

Justin Buck, Mark Slifstein, Jodi Weinstein, Roberto Gil, Jared Van Snellenberg, Christoph Juchem, Anissa Abi-Dargham, Guillermo Horga

Altered sensory prediction error signaling and dopamine function drive speech hallucinations in schizophrenia

COSYNE 2025

Eleonora Bano, Amelia Christensen, Fengrui Zhang, Adam Kepecs

Sensory Prediction Error signals in Tail of the Striatum Dopamine

COSYNE 2025

Daphne Zafiri, Ximena Icaria Salinas-Hernández, Eloah S. De Biasi, Leonor Rebelo, Sevil Duvarci

Dopamine prediction error signaling in a unique nigrostriatal circuit is critical for associative fear learning

FENS Forum 2024

Matthias Tsai, Jasper Teutsch, Willem Wybo, Fritjof Helmchen, Abhishek Banerjee, Walter Senn

Hierarchy of prediction errors shapes context-dependent sensory representations

FENS Forum 2024