ePoster

A striatal probabilistic population code for reward underlies distributional reinforcement learning

Adam Lowet, Qiao Zheng, Sara Matias, Naoshige Uchida, Jan Drugowitsch
COSYNE 2022 (2022)
Lisbon, Portugal



Abstract

Research in machine learning has realized large performance gains on a variety of tasks by expanding the target of learning from the mean reward, as in traditional reinforcement learning (RL), to the entire distribution of rewards, an approach known as distributional RL. Dopamine (DA) neurons projecting from the midbrain to the striatum have long been thought to drive traditional RL in the mammalian brain. Moreover, a recent analysis of the response diversity of these neurons shows they have the appropriate properties to support distributional RL, and thus the learning of complete reward distributions. However, while representations of mean reward (frequently called "value") abound across brain regions, particularly in the striatum, little is known about how neurons encode information about higher-order moments of reward distributions, much less the complete shapes of these distributions. To fill this gap, we used Neuropixels probes to acutely record striatal activity from well-trained mice (n=9) in three classical conditioning tasks, in which unique odors were paired with particular reward distributions. We found that striatal neurons stably represent reward distributions, over and above mean reward, stimulus identity, and behavioral output. We then asked what mathematical form these codes take by modeling population responses as one of four code types: probabilistic population codes (PPCs), distributed distributional codes (DDCs), quantile codes, or expectile codes, which differ in the particular statistics they use to characterize encoded probability distributions. We consistently found that PPCs outperformed the other code types, allowing us to decode from single-trial population responses not only the identity of reward distributions, but also their precise shapes. These results bolster the core claim of distributional RL in neuroscience (that neurons encode full reward distributions) while challenging existing distributional RL models, which rely on other code types.
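To make the PPC idea concrete: under standard PPC assumptions (independent Poisson-spiking neurons and, here, a flat prior), the log posterior over the encoded variable is linear in single-trial spike counts, which is what makes decoding a full distribution from one trial tractable. The sketch below is purely illustrative and is not the paper's model; the tuning-curve shapes, neuron count, and parameter values are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 neurons with Gaussian tuning over reward size.
rewards = np.linspace(0, 10, 201)            # candidate reward values (grid)
centers = rng.uniform(0, 10, size=50)        # preferred rewards (assumed)
width = 1.5                                  # tuning width (assumed)
gain = 5.0                                   # peak rate, spikes per trial (assumed)

# Tuning curves: f_i(r) = gain * exp(-(r - c_i)^2 / (2 * width^2))
tuning = gain * np.exp(-(rewards[None, :] - centers[:, None]) ** 2
                       / (2 * width ** 2))   # shape (neurons, grid)

# Simulate one trial with true reward r* = 6: independent Poisson counts.
r_true = 6.0
rates = gain * np.exp(-(r_true - centers) ** 2 / (2 * width ** 2))
counts = rng.poisson(rates)

# PPC decoding: with Poisson noise and a flat prior,
#   log p(r | n)  ∝  sum_i n_i * log f_i(r)  -  sum_i f_i(r)
# i.e. linear in the spike counts n_i.
log_post = counts @ np.log(tuning + 1e-12) - tuning.sum(axis=0)
log_post -= log_post.max()                   # stabilize before exponentiating
post = np.exp(log_post)
post /= post.sum()                           # posterior over the reward grid

r_hat = rewards[np.argmax(post)]             # MAP estimate of reward
```

The decoded object is the whole posterior `post`, not just the point estimate `r_hat`, which is the sense in which a PPC carries an entire distribution in a single population response.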

Unique ID: cosyne-22/striatal-probabilistic-population-code-435846e5