
Reward Prediction Error

Topic spotlight · World Wide

Discover seminars, jobs, and research tagged with reward prediction error across World Wide.
Seminar · Neuroscience

Understanding reward-guided learning using large-scale datasets

Kim Stachenfeld
DeepMind, Columbia U
Jul 8, 2025

Understanding the neural mechanisms of reward-guided learning is a long-standing goal of computational neuroscience. Recent methodological innovations enable us to collect ever larger neural and behavioral datasets. This presents opportunities to achieve greater understanding of learning in the brain at scale, as well as methodological challenges. In the first part of the talk, I will discuss our recent insights into the mechanisms by which zebra finch songbirds learn to sing. Dopamine has long been thought to guide reward-based trial-and-error learning by encoding reward prediction errors. However, it is unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Longitudinal recordings of dopamine and bird songs reveal that dopamine activity is indeed consistent with encoding a reward prediction error during naturalistic learning. In the second part of the talk, I will describe recent work we are doing at DeepMind to develop tools for automatically discovering interpretable models of behavior directly from animal choice data. Our method, dubbed CogFunSearch, uses LLMs within an evolutionary search process to "discover" novel models in the form of Python programs that excel at accurately predicting animal behavior during reward-guided learning. The discovered programs reveal novel patterns of learning and choice behavior that update our understanding of how the brain solves reinforcement learning problems.
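
For readers less familiar with the reward-prediction-error idea tested in the first part of the talk, a minimal sketch is below; the learning rate and the toy 80%-rewarded cue are illustrative assumptions, not details from the talk or from CogFunSearch.

```python
import random

# Minimal sketch of reward-prediction-error (RPE) learning for a cue that is
# rewarded on 80% of trials. The learning rate and task are illustrative only,
# not taken from the talk.
alpha = 0.1          # learning rate
value = 0.0          # learned prediction of reward following the cue

for trial in range(500):
    reward = 1.0 if random.random() < 0.8 else 0.0
    rpe = reward - value        # prediction error: obtained minus expected
    value += alpha * rpe        # expectation moves toward the outcome

print(f"learned value ~ {value:.2f} (approaches the 0.8 reward rate)")
```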

Seminar · Neuroscience

Understanding reward-guided learning using large-scale datasets

Kim Stachenfeld
DeepMind, Columbia U
May 13, 2025

Seminar · Neuroscience

Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time

Harel Shouval
The University of Texas at Houston
Jun 13, 2023

The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), that is, signal the difference between expected future rewards and actual rewards. The prominence of the TD theory arises from the observation that the firing properties of dopaminergic neurons in the ventral tegmental area appear similar to those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We propose instead an alternative theoretical framework, which we call FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature-specific representations of time are learned, allowing neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX, dopamine acts as an instructive signal that helps build temporal models of the environment. FLEX is a general theoretical framework that has many possible biophysical implementations. To show that FLEX is a feasible approach, we present a specific biophysically plausible model that implements the principles of FLEX. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
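
For context, the conventional setup the abstract argues against, TD learning over a fixed temporal basis (a "complete serial compound" of delay indicators), can be sketched as follows. This is a textbook TD(0) illustration under assumed parameters, not an implementation of FLEX.

```python
import numpy as np

# Conventional TD(0) with a fixed temporal basis ("complete serial compound"):
# the cue is represented by one indicator per delay step, so a value weight is
# learned separately for each time point between cue and reward. Parameters
# and the single-cue trial structure are illustrative; FLEX is not implemented.
T = 20                          # time steps per trial
cue_time, reward_time = 2, 15
alpha, gamma = 0.1, 0.98
w = np.zeros(T)                 # one weight per delay-from-cue basis element

def features(t):
    """One-hot delay-from-cue representation: the fixed temporal basis."""
    x = np.zeros(T)
    delay = t - cue_time
    if 0 <= delay < T:
        x[delay] = 1.0
    return x

for trial in range(500):
    for t in range(T - 1):
        r = 1.0 if t + 1 == reward_time else 0.0
        rpe = r + gamma * (w @ features(t + 1)) - (w @ features(t))   # TD error
        w += alpha * rpe * features(t)

# After training, the positive TD error has shifted from reward delivery back
# to cue onset, the RPE-like dopamine signature discussed in the abstract.
print(w.round(2))
```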

Seminar · Neuroscience

Richly structured reward predictions in dopaminergic learning circuits

Angela J. Langdon
National Institute of Mental Health at National Institutes of Health (NIH)
May 16, 2023

Theories from reinforcement learning have been highly influential for interpreting neural activity in the biological circuits critical for animal and human learning. Central among these is the identification of phasic activity in dopamine neurons as a reward prediction error signal that drives learning in basal ganglia and prefrontal circuits. However, recent findings suggest that dopaminergic prediction error signals have access to complex, structured reward predictions and are sensitive to more properties of outcomes than learning theories with simple scalar value predictions might suggest. Here, I will present recent work in which we probed the identity-specific structure of reward prediction errors in an odor-guided choice task and found evidence for multiple predictive “threads” that segregate reward predictions, and reward prediction errors, according to the specific sensory features of anticipated outcomes. Our results point to an expanded class of neural reinforcement learning algorithms in which biological agents learn rich associative structure from their environment and leverage it to build reward predictions that include information about the specific, and perhaps idiosyncratic, features of available outcomes, using these to guide behavior in even quite simple reward learning tasks.
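
The "multiple predictive threads" idea can be illustrated with a toy model that keeps one prediction, and one prediction error, per outcome identity, with the classic scalar RPE recovered as their sum. The stimuli and learning rule below are illustrative assumptions, not the authors' task or fitted model.

```python
# Toy illustration of identity-specific prediction "threads": a separate
# prediction, and a separate prediction error, is kept per outcome identity,
# while the classic scalar RPE is their sum. Stimuli, rates, and the learning
# rule are illustrative and not taken from the authors' task or model.
outcomes = ["banana", "grape"]
prediction = {o: 0.0 for o in outcomes}    # one predictive thread per identity
alpha = 0.2

def trial(delivered):
    errors = {}
    for o in outcomes:
        received = 1.0 if o == delivered else 0.0
        errors[o] = received - prediction[o]      # identity-specific prediction error
        prediction[o] += alpha * errors[o]
    return errors, sum(errors.values())           # scalar RPE = sum across threads

for _ in range(50):
    trial("banana")                                # expect banana every trial

errors, scalar_rpe = trial("grape")                # surprise switch in reward identity
print(errors, scalar_rpe)                          # large +grape and -banana errors,
                                                   # near-zero scalar RPE overall
```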

Seminar · Neuroscience · Recording

Learning static and dynamic mappings with local self-supervised plasticity

Pantelis Vafeidis
California Institute of Technology
Sep 6, 2022

Animals exhibit remarkable learning capabilities with little direct supervision. Likewise, self-supervised learning is an emergent paradigm in artificial intelligence, closing the performance gap to supervised learning. In the context of biology, self-supervised learning corresponds to a setting where one sense or specific stimulus may serve as a supervisory signal for another. After learning, the latter can be used to predict the former. On the implementation level, it has been demonstrated that such predictive learning can occur at the single-neuron level, in compartmentalized neurons that separate and associate information from different streams. We demonstrate the power of such self-supervised learning over unsupervised (Hebb-like) learning rules, which depend heavily on stimulus statistics, in two examples. First, in the context of animal navigation, predictive learning can associate internal self-motion information, which is always available to the animal, with external visual landmark information, leading to accurate path integration in the dark. We focus on the well-characterized fly head-direction system and show that our setting learns a connectivity strikingly similar to the one reported in experiments. The mature network is a quasi-continuous attractor and reproduces key experiments in which optogenetic stimulation controls the internal representation of heading, and in which the network remaps to integrate with different gains. Second, we show that incorporating global gating by reward prediction errors allows the same setting to learn conditioning at the neuronal level with mixed selectivity. At its core, conditioning entails associating a neural activity pattern induced by an unconditioned stimulus (US) with the pattern arising in response to a conditioned stimulus (CS). Solving the generic problem of pattern-to-pattern associations naturally leads to emergent cognitive phenomena such as blocking, overshadowing, saliency effects, extinction, and interstimulus-interval effects. Surprisingly, we find that the same network offers a reductionist mechanism for causal inference by resolving the post hoc, ergo propter hoc fallacy.
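
In its simplest form, the kind of cross-stream predictive learning described above amounts to one stream teaching another through a local delta rule; the sketch below uses assumed dimensions and a random mapping purely for illustration, and is not the model presented in the talk.

```python
import numpy as np

# Highly simplified sketch of cross-stream predictive learning: one stream
# (e.g. self-motion) learns to predict another (e.g. visual landmarks) with a
# local delta rule, where each output unit uses only its own prediction error.
# Dimensions, the random mapping, and the learning rate are illustrative.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(3, 5))      # unknown motion-to-vision mapping
W = np.zeros((3, 5))                    # learned prediction weights
eta = 0.05

for step in range(5000):
    motion = rng.normal(size=5)         # internal stream, always available
    visual = true_map @ motion          # external stream acting as the "teacher"
    error = visual - W @ motion         # per-neuron prediction error
    W += eta * np.outer(error, motion)  # local, self-supervised weight update

# After learning, the motion stream alone reproduces the visual prediction.
print(f"max weight error: {np.max(np.abs(W - true_map)):.4f}")
```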

Seminar · Neuroscience · Recording

A role for dopamine in value-free learning

Luke Coddington
Dudman lab, HHMI Janelia
Jul 13, 2021

Recent success in training artificial agents and robots derives from a combination of direct learning of behavioral policies and indirect learning via value functions. Policy learning and value learning employ distinct algorithms that depend upon evaluation of errors in performance and reward prediction errors, respectively. In mammals, behavioral learning and the role of mesolimbic dopamine signaling have been extensively evaluated with respect to reward prediction errors, but there has been little consideration of how direct policy learning might inform our understanding. I’ll discuss our recent work on classical conditioning in naïve mice (https://www.biorxiv.org/content/10.1101/2021.05.31.446464v1) that provides multiple lines of evidence that phasic dopamine signaling regulates policy learning from performance errors in addition to its well-known roles in value learning. This work points towards new opportunities for unraveling the mechanisms of basal ganglia control over behavior under both adaptive and maladaptive learning conditions.
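
As a reference point for the value-versus-policy distinction, the sketch below runs both learning routes on a toy two-armed bandit: Q-learning driven by reward prediction errors, and REINFORCE-style direct policy learning. Both are standard textbook algorithms used only for illustration; neither is the model from the preprint.

```python
import numpy as np

# Toy two-armed bandit contrasting the two learning routes discussed above:
# one agent learns reward expectations via RPEs (Q-learning), the other learns
# a policy directly (REINFORCE). Both are standard textbook algorithms used
# purely for illustration; neither is the model from the paper.
rng = np.random.default_rng(1)
p_reward = [0.2, 0.8]                  # arm 1 pays off more often
alpha_v, alpha_p = 0.1, 0.1

q = np.zeros(2)                        # value learner: reward expectations
theta = np.zeros(2)                    # policy learner: action preferences

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for trial in range(2000):
    # Value learning: choose from values, update the chosen value with an RPE.
    a = rng.choice(2, p=softmax(q))
    r = float(rng.random() < p_reward[a])
    q[a] += alpha_v * (r - q[a])       # reward prediction error update

    # Policy learning: adjust action preferences in proportion to the outcome
    # (performance feedback), without maintaining any reward expectation.
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = float(rng.random() < p_reward[a])
    grad = -probs
    grad[a] += 1.0                     # gradient of log pi(a) w.r.t. theta
    theta += alpha_p * r * grad

print("values:", q.round(2), "policy:", softmax(theta).round(2))
```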

Seminar · Neuroscience

Generalization guided exploration

Charley Wu
Max Planck
Dec 15, 2020

How do people learn in real-world environments where the space of possible actions can be vast or even infinite? The study of human learning has made rapid progress in past decades, from discovering the neural substrate of reward prediction errors, to building AI capable of mastering the game of Go. Yet this line of research has primarily focused on learning through repeated interactions with the same stimuli. How are humans able to rapidly adapt to novel situations and learn from such sparse examples? I propose a theory of how generalization guides human learning, by making predictions about which unobserved options are most promising to explore. Inspired by Roger Shepard’s law of generalization, I show how a Bayesian function learning model provides a mechanism for generalizing limited experiences to a wide set of novel possibilities, based on the simple principle that similar actions produce similar outcomes. This model of generalization generates predictions about the expected reward and underlying uncertainty of unexplored options, where both are vital components in how people actively explore the world. This model allows us to explain developmental differences in the explorative behavior of children, and suggests a general principle of learning across spatial, conceptual, and structured domains.
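
The proposed mechanism, generalize via a similarity kernel and then explore where expected reward plus uncertainty is highest, can be sketched as a small kernel-regression loop with an upper-confidence-bound choice rule. The kernel, one-dimensional option space, and parameters below are illustrative assumptions rather than the model fitted in the talk.

```python
import numpy as np

# Sketch of generalization-guided exploration: a kernel regressor embodies
# "similar actions produce similar outcomes", and an upper-confidence-bound
# rule picks the option with the best mix of expected reward and uncertainty.
# The kernel, 1-D option space, and parameters are illustrative assumptions.
rng = np.random.default_rng(0)
options = np.linspace(0, 1, 30)           # space of possible actions
true_reward = np.sin(3 * options)         # unknown reward function

def rbf(a, b, length=0.15):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

X, y = [], []                             # actions tried and outcomes observed
noise, beta = 0.05, 2.0                   # observation noise, exploration bonus

for t in range(15):
    if X:
        K = rbf(np.array(X), np.array(X)) + noise * np.eye(len(X))
        Ks = rbf(options, np.array(X))
        mean = Ks @ np.linalg.solve(K, np.array(y))                    # expected reward
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)    # uncertainty
    else:
        mean, var = np.zeros_like(options), np.ones_like(options)
    choice = int(np.argmax(mean + beta * np.sqrt(np.clip(var, 0, None))))
    X.append(options[choice])
    y.append(true_reward[choice] + rng.normal(scale=noise))

print(f"best option: {options[np.argmax(true_reward)]:.2f}, last chosen: {X[-1]:.2f}")
```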

Seminar · Neuroscience

Delineating Reward/Avoidance Decision Process in the Impulsive-compulsive Spectrum Disorders through a Probabilistic Reversal Learning Task

Xiaoliu Zhang
Monash University
Jul 18, 2020

Impulsivity and compulsivity are behavioural traits that underlie many aspects of decision-making and form the characteristic symptoms of Obsessive Compulsive Disorder (OCD) and Gambling Disorder (GD). The neural underpinnings of reward and avoidance learning under the expression of these traits and symptoms are only partially understood. The present study combined behavioural modelling and neuroimaging to examine brain activity associated with critical phases of reward and loss processing in OCD and GD.

Forty-two healthy controls (HC), forty OCD and twenty-three GD participants were recruited to complete a two-session reinforcement learning (RL) task featuring a “probability switch (PS)” during fMRI scanning. In the final sample, 39 HC (20F/19M, 34 ± 9.47 yrs), 28 OCD (14F/14M, 32.11 ± 9.53 yrs) and 16 GD (4F/12M, 35.53 ± 12.20 yrs) had both behavioural and imaging data available. Functional imaging was conducted on a 3.0-T Siemens MAGNETOM Skyra (syngo MR D13C) at Monash Biomedical Imaging. Each volume comprised 34 coronal slices of 3 mm thickness with a 2000 ms TR and 30 ms TE; a total of 479 volumes were acquired per participant in each session in an interleaved-ascending order.

A standard Q-learning model was fitted to the observed behavioural data, with Bayesian methods used for parameter estimation. Imaging analysis was conducted using SPM12 (Wellcome Department of Imaging Neuroscience, London, United Kingdom) in the Matlab (R2015b) environment. Pre-processing comprised slice-timing correction, realignment, normalisation to MNI space based on the T1-weighted image, and smoothing with an 8 mm Gaussian kernel.

The frontostriatal circuit, including the putamen and medial orbitofrontal cortex (mOFC), was significantly more active in response to receiving reward and avoiding punishment than to receiving an aversive outcome or missing reward (p < 0.001, FWE-corrected at the cluster level), while the right insula showed greater activation in response to missing rewards and receiving punishment. Compared to healthy participants, GD patients showed significantly lower activation in the left superior frontal cortex and posterior cingulum for gain omission (p < 0.001).

The reward prediction error (PE) signal correlated positively with activation in several clusters extending across cortical and subcortical regions, including the striatum, cingulate, bilateral insula, thalamus and superior frontal cortex (p < 0.001, FWE-corrected at the cluster level). GD patients showed a trend towards decreased reward PE responses in the right precentral region extending to the left posterior cingulate relative to controls (p < 0.05, FWE-corrected). The aversive PE signal correlated negatively with brain activity in regions including the bilateral thalamus, hippocampus, insula and striatum (p < 0.001, FWE-corrected). Compared with controls, the GD group showed increased aversive PE activation in a cluster encompassing the right thalamus and right hippocampus, and in the right middle frontal region extending to the right anterior cingulum (p < 0.005, FWE-corrected).

Through this reversal learning task, the study provides further support for dissociable brain circuits underlying distinct phases of reward and avoidance learning, and shows that OCD and GD are characterised by aberrant patterns of reward and avoidance processing.
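
For concreteness, a minimal generative sketch of the kind of standard Q-learning model described above, on a two-option task with a mid-session probability switch, is given below; reward probabilities, parameter values, and trial counts are assumptions for illustration, and the Bayesian estimation procedure is not shown.

```python
import numpy as np

# Minimal generative sketch of the kind of standard Q-learning model described
# above, on a two-option task with a mid-session probability switch. Reward
# probabilities, parameters, and trial counts are illustrative assumptions;
# the authors' Bayesian parameter-estimation procedure is not shown.
rng = np.random.default_rng(0)
alpha, beta = 0.3, 5.0                  # learning rate, softmax inverse temperature
q = np.zeros(2)
rewards, rpes = [], []

for trial in range(200):
    p_reward = [0.8, 0.2] if trial < 100 else [0.2, 0.8]   # probability switch (PS)
    probs = np.exp(beta * q) / np.exp(beta * q).sum()      # softmax choice rule
    a = rng.choice(2, p=probs)
    r = float(rng.random() < p_reward[a])
    rpe = r - q[a]                                          # reward prediction error
    q[a] += alpha * rpe
    rewards.append(r)
    rpes.append(rpe)

# Trial-by-trial prediction errors like these are the model-derived signals
# that analyses such as the one above correlate with brain activity.
print(f"mean reward: {np.mean(rewards):.2f}")
```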

ePoster

VTA dopamine neurons signal phasic and ramping reward prediction error in goal-directed navigation

COSYNE 2022

ePoster

Learning and expression of dopaminergic reward prediction error via plastic representations of time

COSYNE 2022

ePoster

Neurons in dlPFC signal unsigned reward prediction error independently from value

COSYNE 2022

ePoster

Reward prediction error neurons implement an efficient code for value

Wei Ji Ma, Dongjae Kim, Heiko Schuett

COSYNE 2023