ePoster

What do meta-reinforcement learning networks learn in two-stage decision-making?

Li Ji-An, Marcelo G Mattar
COSYNE 2022 (2022)
Lisbon, Portugal
Presented: Mar 17, 2022

Abstract

The striatum and prefrontal cortex (PFC) play critical roles in reinforcement learning (RL). The striatum implements a model-free RL algorithm via synaptic plasticity modulated by dopaminergic prediction errors. The PFC, in turn, is thought to implement a model-based algorithm through its neuronal dynamics. The roles and interplay of both regions can be successfully modeled in the meta-RL framework, whereby a striatal model-free learning algorithm adjusts the synaptic weights of the PFC network, giving rise to a free-standing learning algorithm implemented through neuronal dynamics. However, it is unclear which free-standing learning algorithm emerges in the PFC from this training procedure. To answer this question, we trained recurrent neural networks on the widely studied two-stage task, in which two first-stage actions probabilistically lead to two rewarding second-stage states. We then analyzed the networks' representational geometry. We found that the networks acquired a representation with neural activity grouped by second-stage state and reward. In this space, the points within each group (i.e., the neural activity on individual trials) formed curves, and the relative location of points along these curves roughly corresponded to action probabilities. To elucidate the mechanisms giving rise to these curves, we fit behavioral models to the networks' action probabilities, drawn from the model-free, reward-as-cue, model-based (MB), and latent-state (LS) algorithm families. The MB and LS families provided the best fits. Surprisingly, the trial-by-trial choice probabilities predicted by the LS model, but not the MB model, were consistent with the networks' action probabilities. Additionally, the more training the networks received, the more they sharpened their dynamics toward the LS representation. Our results demonstrate that the networks learned an augmented latent-state representation in the two-stage task. More generally, we offer a systematic approach for "opening the black box" of meta-RL agents, identifying emergent algorithms, and adjudicating between model families (e.g., MB vs. LS) previously thought to be difficult to distinguish in animal experiments.
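
As a point of reference for the task structure described above, the following is a minimal Python sketch of a reduced two-stage trial, in which two first-stage actions probabilistically lead to two second-stage states that are themselves rewarding. The transition and reward probabilities below are illustrative assumptions; the poster does not report the exact task parameters.

import numpy as np

# Minimal sketch of a reduced two-stage task: two first-stage actions lead
# probabilistically to one of two second-stage states, each of which delivers
# reward with its own probability. All numeric values are assumptions made
# for illustration, not the parameters used in the poster.
COMMON_PROB = 0.8          # assumed probability of the "common" transition
REWARD_PROBS = [0.8, 0.2]  # assumed reward probability of each second-stage state

def two_stage_trial(action, rng):
    """Run one trial: action in {0, 1} -> (second_stage_state, reward)."""
    # Action 0 commonly leads to state 0; action 1 commonly leads to state 1.
    state = action if rng.random() < COMMON_PROB else 1 - action
    reward = 1.0 if rng.random() < REWARD_PROBS[state] else 0.0
    return state, reward

rng = np.random.default_rng(0)
print(two_stage_trial(0, rng))  # prints one (second-stage state, reward) sample

In many versions of this task, the reward probabilities of the two second-stage states periodically reverse, which is what makes latent-state inference a sensible strategy.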
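The latent-state (LS) family referred to above is commonly formalized as Bayesian belief updating over which second-stage state is currently the high-reward one. The sketch below illustrates that general idea only; the parameters (P_HIGH, P_LOW, SWITCH_PROB, COMMON_PROB, BETA) are hypothetical, and this is not the specific model fit in this work.

import numpy as np

# Illustrative sketch of a latent-state (LS) style agent for a reduced
# two-stage task: it maintains a belief that state 0 (vs. state 1) is the
# currently "good" (high-reward) state and updates it by Bayes' rule after
# each observed (second-stage state, reward) outcome.
P_HIGH, P_LOW = 0.8, 0.2  # assumed reward probabilities in good / bad states
SWITCH_PROB = 0.02        # assumed per-trial probability that the good state reverses
COMMON_PROB = 0.8         # assumed common-transition probability
BETA = 5.0                # assumed softmax inverse temperature

def update_belief(belief, state, reward):
    """Posterior P(state 0 is good) after observing one trial's outcome."""
    # Likelihood of the observed reward under each hypothesis about the good state.
    p_r_given_good = P_HIGH if reward else 1 - P_HIGH
    p_r_given_bad = P_LOW if reward else 1 - P_LOW
    like0 = p_r_given_good if state == 0 else p_r_given_bad  # state 0 is good
    like1 = p_r_given_bad if state == 0 else p_r_given_good  # state 1 is good
    post = like0 * belief / (like0 * belief + like1 * (1 - belief))
    # Account for occasional reversals of which state is good.
    return post * (1 - SWITCH_PROB) + (1 - post) * SWITCH_PROB

def action_probs(belief):
    """Softmax over expected first-stage action values given the current belief."""
    # Expected reward of reaching state 0 and state 1.
    v_state = np.array([belief * P_HIGH + (1 - belief) * P_LOW,
                        belief * P_LOW + (1 - belief) * P_HIGH])
    # Expected value of each first-stage action via the transition model.
    q = np.array([COMMON_PROB * v_state[0] + (1 - COMMON_PROB) * v_state[1],
                  COMMON_PROB * v_state[1] + (1 - COMMON_PROB) * v_state[0]])
    e = np.exp(BETA * (q - q.max()))
    return e / e.sum()

belief = 0.5
belief = update_belief(belief, state=0, reward=1)  # one rewarded visit to state 0
print(action_probs(belief))                        # first-stage choice probabilities

Trial-by-trial choice probabilities produced this way (rather than by caching model-free values or by model-based planning over drifting reward estimates) are what the abstract compares against the networks' action probabilities.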

Unique ID: cosyne-22/what-metareinforcement-learning-networks-3d7613f3