ePoster

Learning and expression of dopaminergic reward prediction error via plastic representations of time

Ian Cone, Claudia Clopath, Harel Shouval
COSYNE 2022 (2022)
Lisbon, Portugal

Abstract

Various models and investigations over the years have attempted to pin down a mechanistic explanation for how dopamine (DA) neurons in the brain can exhibit reward prediction error (RPE), usually through direct analogy to temporal difference (TD) learning. However, there are two key hurdles in imagining how TD could plausibly be implemented in the brain. First, TD models of DA learning frequently require arbitrarily constructed and unrealistic components, such as a temporal chain of feature-specific neurons that uniformly tile the time from stimulus onset to reward arrival. Second, various predictions of TD clash with experimental observations of how dopaminergic RPE evolves over learning. Here, we present a biophysically plausible network architecture of spiking neurons that, when coupled with local Hebbian and eligibility trace learning rules, learns RPEs and can replicate results from multiple experimental paradigms. The model learns feature-specific representations of time, allowing neural representations of stimuli to adjust their timing and relation to rewards in an online manner. Following learning, our model DA neurons report a distribution of “optimistic” and “pessimistic” RPEs, akin to those seen in the distributional reinforcement learning literature. While DA firing in our model reflects an accurate RPE before and after learning, these two quantities are not necessarily synonymous during learning. This separation of DA neuron firing from a strict RPE allows our model to unify seemingly mutually exclusive experimental results, as well as make unique predictions that directly contrast with those of TD. One such prediction is that even after overtraining, reward omission will still result in a negative RPE at the time of expected reward, since the model’s representation of the cue-reward delay (and thereby cue-specific suppression of the reward-triggered dopamine) is maintained for timescales longer than the cue-evoked dopamine.
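For context, the "temporal chain of feature-specific neurons" the abstract critiques is the classic complete-serial-compound construction used in TD models of dopamine. The following is a minimal sketch of that baseline (not the authors' spiking model); trial length, cue/reward times, and learning rate are illustrative assumptions. It shows the TD error (the model RPE) migrating from reward time to cue onset over training, and the negative RPE produced when the reward is omitted.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the poster):
T = 20                 # time steps per trial
cue_t, rew_t = 2, 12   # cue onset and reward delivery times
alpha, gamma = 0.1, 1.0

# Complete serial compound: one weight per delay-line unit,
# where unit k is active exactly k steps after cue onset.
w = np.zeros(T)

def value(t):
    """Value of the state occupied at time t (0 before cue onset)."""
    return w[t - cue_t] if t >= cue_t else 0.0

def run_trial(reward=True, learn=True):
    """One trial; returns the TD error (model RPE) at each time step."""
    deltas = np.zeros(T)
    for t in range(1, T):
        r = 1.0 if (reward and t == rew_t) else 0.0
        delta = r + gamma * value(t) - value(t - 1)  # TD error
        deltas[t] = delta
        if learn and t - 1 >= cue_t:
            # credit the delay-line unit that was active at t-1
            w[t - 1 - cue_t] += alpha * delta
    return deltas

for _ in range(500):
    deltas = run_trial()

# After training, the RPE has moved from reward time to cue onset:
print(round(deltas[cue_t], 2), round(deltas[rew_t], 2))
# Omitting the reward yields a negative RPE at the expected time:
omission = run_trial(reward=False, learn=False)
print(round(omission[rew_t], 2))
```

Note that classic TD with this representation predicts the omission dip only while the delay-line weights persist; the poster's distinct prediction concerns how that cue-reward representation outlasts the cue-evoked dopamine itself.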

Unique ID: cosyne-22/learning-expression-dopaminergic-reward-13fe24b6