ePoster

Arithmetic value representation for hierarchical behavior composition

Hiroshi Makino
COSYNE 2022
Lisbon, Portugal
Presented: Mar 19, 2022

Abstract

The ability to compose new skills from a pre-acquired behavior repertoire is a hallmark of intelligence in humans and other animals. In deep reinforcement learning (RL), artificial agents can extract reusable skills from past experience and recombine them in a hierarchical manner. It remains largely unknown, however, whether the brain composes novel behaviors in a similar way. Here we trained deep RL agents with the soft actor-critic (SAC) algorithm and studied their representation of RL variables during hierarchical learning. The objective of SAC is to maximize both future cumulative rewards and policy entropy, which together confer flexibility and robustness to perturbation on artificial agents. We demonstrate that the agents learned to solve a novel composite task by additively combining representations of previously learned action values from the constituent subtasks. Sample efficiency in the composite task was further improved by introducing a stochastic policy in the subtask, which endowed the agents with a wide range of action representations. These theoretical predictions were tested empirically in mice trained in the same behavioral paradigm, where mice with prior subtask training rapidly learned the composite task. Cortex-wide two-photon calcium imaging across the subtasks and the composite task revealed neural representations of combined action values analogous to those observed in the deep RL agents. The mixed representations of subtask action values seen in single mouse neurons did not emerge in agents whose new value function was constructed by taking the maximum of the subtask action values, highlighting the specificity of the additive operation. As with the deep RL agents, learning efficiency in mice was enhanced when the subtask policy was made more stochastic. Together, these results suggest that the brain composes a novel behavior through a simple arithmetic operation on pre-acquired action-value representations, combined with a stochastic policy.
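The contrast between additive and maximum-based value composition can be illustrated with a toy sketch. This is not the author's implementation: it assumes a small discrete action set (SAC is typically applied to continuous actions), and the action values, the function names, and the temperature parameter are all hypothetical, chosen only for illustration. The Boltzmann policy stands in for the entropy-regularized policy that a maximum-entropy objective favors.

```python
import numpy as np

def soft_policy(q_values, temperature=1.0):
    """Boltzmann policy: pi(a) is proportional to exp(Q(a) / temperature)."""
    logits = np.asarray(q_values, dtype=float) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def entropy(probabilities):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical action values learned in two subtasks (four discrete actions).
q_subtask_a = np.array([1.0, 0.2, 0.8, 0.1])
q_subtask_b = np.array([0.1, 0.3, 0.9, 0.95])

# Additive composition: sum the subtask action values.
q_additive = q_subtask_a + q_subtask_b
# Alternative composition: take the element-wise maximum.
q_maximum = np.maximum(q_subtask_a, q_subtask_b)

best_additive = int(np.argmax(q_additive))  # action that is good in both subtasks
best_maximum = int(np.argmax(q_maximum))    # action that is best in one subtask only

# A higher temperature yields a more stochastic (higher-entropy) policy.
policy_sharp = soft_policy(q_additive, temperature=0.1)
policy_broad = soft_policy(q_additive, temperature=2.0)

print("additive values:", q_additive, "-> preferred action", best_additive)
print("maximum values: ", q_maximum, "-> preferred action", best_maximum)
print("entropy (sharp):", entropy(policy_sharp))
print("entropy (broad):", entropy(policy_broad))
```

With these made-up numbers, the additive rule prefers the action that scores reasonably well in both subtasks, whereas the maximum rule prefers an action that scores highly in only one of them. The temperature comparison shows how a more stochastic policy spreads probability over a wider range of actions.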

Unique ID: cosyne-22/arithmetic-value-representation-hierarchical-97511ae6