Temporal Difference Learning
Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), that is, signal the difference between expected and actual future rewards. The prominence of TD theory arises from the observation that the firing properties of dopaminergic neurons in the ventral tegmental area resemble those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We instead propose an alternative theoretical framework, termed FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature-specific representations of time are learned, allowing neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX, dopamine acts as an instructive signal that helps build temporal models of the environment. FLEX is a general theoretical framework with many possible biophysical implementations. To show that FLEX is a feasible approach, we present a specific, biophysically plausible model that implements its principles. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
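To make the RPE signal and the fixed-temporal-basis assumption concrete, below is a minimal sketch of classic TD(0) learning with a "complete serial compound" (tapped delay-line) stimulus representation, i.e., the kind of fixed temporal basis the abstract argues against. This is not the FLEX model itself; all names and parameter values are illustrative.

```python
# Minimal sketch (not the authors' FLEX model): TD(0) learning with a fixed
# temporal basis -- a complete serial compound / tapped delay line in which
# each time step after the cue has its own feature. The reward prediction
# error (delta) is the quantity compared to dopaminergic firing.
import numpy as np

T = 20            # time steps per trial
cue_time = 2      # cue onset
reward_time = 12  # reward delivery, fixed relative to the cue
gamma = 0.98      # temporal discount factor
alpha = 0.1       # learning rate

def features(t):
    """Fixed temporal basis: feature i is active exactly i steps after cue onset."""
    x = np.zeros(T)
    if t >= cue_time:
        x[t - cue_time] = 1.0
    return x

w = np.zeros(T)   # value weights; V(t) = w . x(t)

for trial in range(200):
    for t in range(T - 1):
        r = 1.0 if t == reward_time else 0.0
        v_t, v_next = w @ features(t), w @ features(t + 1)
        delta = r + gamma * v_next - v_t      # reward prediction error
        w += alpha * delta * features(t)      # TD(0) weight update

# After learning, the RPE at reward time shrinks toward zero, and a negative
# RPE appears only if the reward is omitted or shifted in time.
```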
Striatal circuits for reward learning and decision-making
How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens (NAc), which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex (PL) and midline regions of the thalamus (mTH). However, little is known about what is represented in the PL or mTH neurons that project to NAc (PL-NAc and mTH-NAc). By comparing these inputs during a reinforcement learning task in mice, we discovered that i) PL-NAc preferentially represents actions and choices, ii) mTH-NAc preferentially represents cues, and iii) choice-selective activity in PL-NAc is organized in sequences that persist beyond the outcome. Through computational modelling, we demonstrate that these sequences can support a neural implementation of temporal difference learning, a powerful algorithm for connecting actions and outcomes across time. Finally, we test and confirm predictions of our circuit model by direct manipulation of PL-NAc neurons. Thus, we integrate experiment and modelling to suggest a neural solution for credit assignment.
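As a rough illustration of how choice-selective sequences could serve as a temporal basis for TD learning, here is a sketch under our own simplifying assumptions (not the paper's exact circuit model): each choice activates its own sequence of units firing at successive lags, and dopamine-like RPEs update the readout weights so that credit for a delayed outcome propagates back to the moment of choice. Reward probabilities, sequence length, and variable names are hypothetical.

```python
# Sketch: choice-selective sequential activity as a temporal basis for TD
# learning, bridging the delay between a choice and its outcome.
import numpy as np

n_lags = 15                              # length of each choice-selective sequence
gamma, alpha = 0.95, 0.2
p_reward = {"left": 0.8, "right": 0.2}   # illustrative reward probabilities

w = {"left": np.zeros(n_lags), "right": np.zeros(n_lags)}  # value readout weights
rng = np.random.default_rng(0)

def sequence_activity(t):
    """One-hot activity at lag t within a choice-selective sequence
    (separate populations per choice; here separated via the weight dict)."""
    x = np.zeros(n_lags)
    x[t] = 1.0
    return x

for trial in range(500):
    choice = rng.choice(["left", "right"])
    outcome_lag = 10                     # reward arrives 10 steps after the choice
    for t in range(n_lags - 1):
        rewarded = (t == outcome_lag) and (rng.random() < p_reward[choice])
        r = 1.0 if rewarded else 0.0
        v_t = w[choice] @ sequence_activity(t)
        v_next = w[choice] @ sequence_activity(t + 1)
        delta = r + gamma * v_next - v_t                  # dopamine-like RPE
        w[choice] += alpha * delta * sequence_activity(t) # TD(0) update

# The weight at lag 0 approximates each choice's value at the moment it is
# made, i.e., credit has propagated back along the sequence.
print({c: round(w[c][0], 3) for c in w})
```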
Reward Bases: instant reward revaluation with temporal difference learning
COSYNE 2022
Minimal neural circuit elements for dopaminergic temporal difference learning
COSYNE 2025
Temporal difference learning models explain behavior and dopamine during contingency degradation
COSYNE 2025