Action Sequences
A recurrent network model of planning predicts hippocampal replay and human behavior
When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as 'rollouts'. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by, and in turn adaptively affect, prefrontal dynamics.
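The idea of planning by sampling imagined action sequences from the agent's own policy can be illustrated with a toy sketch. This is not the authors' model. The 4x4 grid, the uniform random stand-in policy, and the names `step`, `policy`, `rollout`, and `plan` are all assumptions made for illustration. The rule of keeping the shortest successful rollout is also a simplification. The abstract describes rollouts affecting the agent's network dynamics; it does not describe selecting among them.

```python
import random

# Hypothetical 4x4 grid world; size and action names are illustrative assumptions.
SIZE = 4
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state, action):
    """Internal model of the world: move one cell, clamped to the grid."""
    dx, dy = ACTIONS[action]
    x, y = state
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))


def policy(state, rng):
    """Stand-in policy: uniform over actions.

    A trained agent would instead sample from its network's action distribution.
    """
    return rng.choice(list(ACTIONS))


def rollout(start, goal, max_len, rng):
    """Sample one imagined action sequence from the policy; stop at the goal."""
    state, actions = start, []
    for _ in range(max_len):
        action = policy(state, rng)
        actions.append(action)
        state = step(state, action)
        if state == goal:
            return actions, True
    return actions, False


def plan(start, goal, n_rollouts, max_len, seed=0):
    """Run several rollouts; return the shortest successful one, or None."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_rollouts):
        actions, success = rollout(start, goal, max_len, rng)
        if success and (best is None or len(actions) < len(best)):
            best = actions
    return best
```

In this sketch, each additional rollout costs simulated thinking time, so `n_rollouts` plays the role of the time spent planning that the abstract weighs against its benefit.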
A recurrent network model of planning explains hippocampal replay and human behavior
When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as 'rollouts'. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by, and in turn adaptively affect, prefrontal dynamics.
The emergence and modulation of time in neural circuits and behavior
Spontaneous behavior in animals and humans shows a striking amount of variability both in the spatial domain (which actions to choose) and temporal domain (when to act). Concatenating actions into sequences and behavioral plans reveals the existence of a hierarchy of timescales ranging from hundreds of milliseconds to minutes. How do multiple timescales emerge from neural circuit dynamics? How do circuits modulate temporal responses to flexibly adapt to changing demands? In this talk, we will present recent results from experiments and theory suggesting a new computational mechanism generating the temporal variability underlying naturalistic behavior. We will show how neural activity from premotor areas unfolds through temporal sequences of attractors, which predict the intention to act. These sequences naturally emerge from recurrent cortical networks, where correlated neural variability plays a crucial role in explaining the observed variability in action timing. We will then discuss how reaction times in these recurrent circuits can be accelerated or slowed down via gain modulation, induced by neuromodulation or perturbations. Finally, we will present a general mechanism producing a reservoir of multiple timescales in recurrent networks.