ePoster

Temporal difference learning models explain behavior and dopamine during contingency degradation

Conference

COSYNE 2025 (2025)
Montreal, Canada

Authors & Affiliations

Mark Burrell, Venkatesh Murthy, Naoshige Uchida, Lechen Qian, Jay Hennig, Sara Matias, Samuel Gershman

Abstract

Associative learning relies on the contingency between stimuli and outcomes, yet the neural mechanisms instantiating contingency remain unclear. We investigated dopamine activity in the ventral striatum of mice performing a Pavlovian contingency degradation task. Both anticipatory licking and dopamine responses to a conditioned stimulus diminished when additional rewards were delivered without cues (uncued) but remained stable when the same additional rewards were paired with a new cue. These findings challenge traditional contingency-based accounts and the recent causal learning model ANCCR [1], both of which predict similar outcomes in the two conditions. Instead, our results are accurately explained by temporal difference (TD) learning models that incorporate appropriate representations of the inter-trial interval (ITI). We further demonstrated that recurrent neural networks (RNNs) trained within a TD framework [2] naturally develop state representations akin to those of our best handcrafted models. This suggests that TD error signals, as conveyed by dopaminergic activity, inherently reflect contingency by comparing expected outcomes with actual outcomes over time. By comparing various state representations, including the hidden-unit activity of the RNNs, we found that TD learning is sufficient to explain all of our results provided that time during the ITI is represented as a growing belief that the next trial is imminent. Our best handcrafted model was inspired by previous work on belief states [3] (a belief state being the posterior probability over possible states) and represents this growing belief as a slow transition between two states, informed by knowledge of the task's transition structure. The RNNs reliably learned a similar representation: belief states could be reliably decoded from their hidden-unit activity. Our findings suggest that the TD error can serve as a single measure that describes both contingency and dopaminergic activity.
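The mechanism described above can be made concrete with a small simulation. The following is a minimal illustrative sketch, not the authors' implementation: all parameter values, the episodic trial structure, and the hazard-shaped belief function are our assumptions. It runs TD(0) learning over a handcrafted state representation in which occupancy of a "late ITI" state encodes a growing belief that the next trial is imminent. Uncued rewards delivered during the ITI inflate the ITI states' values, which shrinks the TD error at cue onset, the quantity treated here as a proxy for the cue-evoked dopamine response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature indices: 0 = early ITI, 1 = late ITI ("trial imminent"),
# 2 = cue, 3 = reward. All constants are illustrative assumptions.
N_FEAT, ALPHA, GAMMA, TAU = 4, 0.1, 0.95, 8.0

def belief_features(t):
    """Belief over {early, late} ITI states: hazard-like growth with time."""
    p_late = 1.0 - np.exp(-t / TAU)
    x = np.zeros(N_FEAT)
    x[0], x[1] = 1.0 - p_late, p_late
    return x

def onehot(i):
    x = np.zeros(N_FEAT)
    x[i] = 1.0
    return x

def run_trial(w, uncued_reward_prob=0.0):
    """One episodic trial: a random-length ITI (optionally with uncued
    'degradation' rewards), then cue -> reward. Returns the TD error at
    cue onset, a crude proxy for the cue-evoked dopamine response."""
    x_prev = None
    for t in range(rng.integers(5, 15)):        # ITI of uncertain length
        x = belief_features(t)
        r = float(rng.random() < uncued_reward_prob)
        if x_prev is not None:
            delta = r + GAMMA * (w @ x) - (w @ x_prev)
            w += ALPHA * delta * x_prev         # TD(0) update, linear values
        x_prev = x
    cue_delta = 0.0
    for i, r in ((2, 0.0), (3, 1.0)):           # cue, then cued reward
        x = onehot(i)
        delta = r + GAMMA * (w @ x) - (w @ x_prev)
        if i == 2:
            cue_delta = delta
        w += ALPHA * delta * x_prev
        x_prev = x
    w += ALPHA * (0.0 - w @ x_prev) * x_prev    # episodic terminal reset
    return cue_delta

for label, p in (("contingent", 0.0), ("degraded (uncued rewards)", 0.5)):
    w = np.zeros(N_FEAT)
    deltas = [run_trial(w, uncued_reward_prob=p) for _ in range(3000)]
    print(f"{label}: mean cue TD error over last 500 trials = "
          f"{np.mean(deltas[-500:]):.3f}")
```

Because the ITI length is uncertain, the belief features cannot fully predict cue onset, so the cue retains a positive TD error in the contingent condition; adding uncued rewards raises the ITI states' values and that error falls, in line with the diminished dopamine responses reported above. A cued control (pairing the extra rewards with a distinct cue state) would be expected to leave the ITI values, and hence the original cue's response, largely intact.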

Unique ID: cosyne-25/temporal-difference-learning-models-46b693c2