Authors & Affiliations
Jacob Bakermans, Joseph Warren, James Whittington, Timothy Behrens
Abstract
In reinforcement learning (RL), we try to learn a function f that maps a state representation s to actions a: a = f(s). Learning f is hard, and is typically performed afresh in each environment. What if, instead, we learn how to rapidly construct s out of general building blocks, for which the same f always works? We demonstrate (1) that this is possible in spatial worlds using object- and goal-vector representations that exist in the brain and (2) that this ‘behavioural construction’ dramatically outperforms traditional RL in simple spatial problems. In a companion paper, we demonstrate that the ‘building-block’ representations can be learnt by predicting actions. For such a mechanism to work, each state representation s must contain information about the whole environment, so that actions a can be inferred directly from any s. To achieve this, whenever a new object or reward is encountered, we initialise a vector representation centred on that object. Because vector representations path-integrate, the correct object-centred representations can immediately be inferred at all remote locations, either during exploration or, critically, in replay. These representations are bound to their locations in memory, effectively building memories of future behaviour. When the agent later visits a new location, its s already contains information about all objects. We show that it is possible to learn a function f that maps s to the optimal a, including tortuous paths around multiple boundaries to the goal. Crucially, the same f works in all environments. Hence, instead of learning actions afresh in each environment, we construct a representation of the current situation from generalisable representations that predict good actions. By simply binding existing cortical representations in hippocampal memory, replay implicitly constructs state spaces (latent learning) and performs credit assignment.
Although demonstrated in physical space, this approach will work in any structured space where actions have consistent meaning.
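As a rough illustration of the mechanism described in the abstract, the following minimal sketch path-integrates a goal-centred vector along a replayed trajectory, binds it to each location in memory, and then reads actions out with one fixed function f. Everything here is an assumption made for illustration: the open grid without boundaries, the names `path_integrate`, `build_memory` and `navigate`, and the hand-written greedy `f`, which stands in for the learned f and does not handle boundaries.

```python
# Illustrative sketch only: an open grid world with no boundaries.
# All names and the greedy policy are hypothetical, not the authors' model.
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def path_integrate(vec, action):
    """Update the agent-to-goal vector after the agent takes `action`.

    The vector points from the agent to the goal, so a move by (dx, dy)
    shifts the vector by (-dx, -dy).
    """
    dx, dy = ACTIONS[action]
    return (vec[0] - dx, vec[1] - dy)


def f(goal_vec):
    """One fixed mapping from state (the goal vector) to an action."""
    gx, gy = goal_vec
    if (gx, gy) == (0, 0):
        return None  # already at the goal
    if abs(gx) >= abs(gy):
        return "E" if gx > 0 else "W"
    return "N" if gy > 0 else "S"


def build_memory(goal_pos, replayed_actions):
    """Replay a trajectory starting at the goal and bind vectors to locations."""
    pos, vec = goal_pos, (0, 0)  # vector is initialised at the goal itself
    memory = {pos: vec}
    for action in replayed_actions:
        dx, dy = ACTIONS[action]
        pos = (pos[0] + dx, pos[1] + dy)
        vec = path_integrate(vec, action)
        memory[pos] = vec  # location -> goal vector
    return memory


def navigate(start, memory, max_steps=50):
    """Follow f from a remembered location; return the visited positions."""
    pos, vec = start, memory[start]
    path = [pos]
    for _ in range(max_steps):
        action = f(vec)
        if action is None:
            break
        dx, dy = ACTIONS[action]
        pos = (pos[0] + dx, pos[1] + dy)
        vec = path_integrate(vec, action)
        path.append(pos)
    return path


# Two different "environments" (goal locations), one unchanged f.
memory_a = build_memory((3, 2), ["E", "E", "N"])
memory_b = build_memory((0, 0), ["S", "W"])
path_a = navigate((5, 3), memory_a)
path_b = navigate((-1, -1), memory_b)
```

In this toy setting the same `f` reaches the goal in both environments because the remembered vector, not the environment, carries the information needed to act. Handling boundaries, as the abstract describes, would require additional object-vector inputs that this sketch omits.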