Authors & Affiliations
Sara Matias, Malcolm Campbell, Shudi Xu, Adam Lowet, Jan Drugowitsch, Naoshige Uchida
Abstract
Animal behavior is controlled through the coordinated action of multiple learning systems in the brain. One of these systems, the basal ganglia, instantiates a reinforcement learning (RL) algorithm in which dopamine (DA) neurons transmit reward prediction error (RPE) signals, the difference between actual and expected rewards, to enable value learning via cortico-striatal plasticity. Recent studies have highlighted two novel aspects: first, that RPE signals from midbrain DA neurons can encode entire reward distributions through a distributional RL algorithm that mirrors cutting-edge machine learning approaches, and second, that dopamine axons projecting to different regions of the striatum exhibit functional heterogeneity, indicating that not all DA neurons encode RPE. To examine the functional and anatomical organization of RPE and non-RPE dopamine signals, we conducted multi-fiber photometry recordings of dopamine axonal activity across the entire striatum. We observed that while RPE signals are present throughout the striatum in a reward-based task, aversive signals are heterogeneous: for example, DA in the dorsomedial striatum is activated by airpuffs, whereas DA in the dorsolateral striatum shows a brief biphasic response. However, fiber photometry recordings cannot distinguish whether the recorded signals arise from a uniform population of dopamine axons or whether functionally heterogeneous axons intermingle within a given striatal area. To overcome this limitation, we performed projection-identified electrophysiological recordings from midbrain DA neurons to investigate whether DA neurons projecting to each striatal region encode the reward distribution. We found that pure RPE-encoding DA neurons project to the lateral nucleus accumbens shell (lAcbSh) and broadly across the striatum. Moreover, lAcbSh- and broadly-projecting DA neurons show structured RPE heterogeneity consistent with distributional RL predictions for a quantile-like population code. Our findings suggest that dopamine-based RL is organized through a “distributional critic” architecture superimposed on other outcome-specific information, supporting continuous, reward-informed behavioral control.
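For readers less familiar with the distributional RL framework referenced above, the toy sketch below (Python, with invented parameters) illustrates the core idea behind a quantile-like population code: a population of value predictors that scale positive and negative RPEs asymmetrically, so that each unit converges to a different statistic of the reward distribution rather than all units converging to its mean. This is only an illustrative model under assumed settings, not the analysis or parameters used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_reward():
    """Toy reward distribution delivered at trial outcome (arbitrary units)."""
    return rng.choice([0.1, 0.3, 1.2, 2.5, 5.0])

# Each simulated value-predicting unit i has an "optimism" parameter tau_i in (0, 1):
# positive RPEs are scaled by tau_i and negative RPEs by (1 - tau_i).
# Under this asymmetric update, each unit's prediction converges to the tau_i-th
# expectile of the reward distribution, a quantile-like statistic. Together, the
# population tiles the whole reward distribution instead of encoding only its mean.
taus = np.linspace(0.1, 0.9, 9)   # assumed spectrum of asymmetries across units
values = np.zeros_like(taus)      # per-unit value predictions
alpha = 0.02                      # learning rate (assumed)

for trial in range(50_000):
    r = sample_reward()
    rpe = r - values                             # per-unit reward prediction errors
    scale = np.where(rpe > 0, taus, 1.0 - taus)  # asymmetric scaling of +/- errors
    values += alpha * scale * rpe

print("tau:           ", np.round(taus, 2))
print("learned values:", np.round(values, 2))
print("mean reward:   ", round(np.mean([sample_reward() for _ in range(10_000)]), 2))
```

In this picture, the structured RPE heterogeneity described in the abstract would correspond to units (here, hypothetical DA-recipient predictors) differing systematically in how strongly they respond to better- versus worse-than-expected outcomes.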