Authors & Affiliations
Nada Abdelrahman, Wanchen Jiang, Joshua Dudman, Ann Hermundstad
Abstract
Animals quickly learn to navigate to rewarding or salient landmarks in their environments. However, existing models often require thousands of trials to acquire contingencies that animals master within tens of trials, and they do so via unstructured sequences of actions that do not resemble real behavior. In this work, we study rapid
learning in a hidden-target foraging task for mice in which animals learn to intercept an uncued target location
within an open arena. To study the computational underpinnings of this learning, we build an agent that controls
its speed and heading over time via a pre-specified set of generative functions; the parameters of these functions
can be chosen to smoothly link pairs of spatial locations (“anchor points”). To support learning, we assume
that the agent maintains and updates a belief about the target location, which is in turn used to sample anchor
points that guide the composition of subsequent trajectories. Three key features enable rapid learning: first, learning operates over a low-dimensional set of generative-model parameters rather than a high-dimensional set of discrete location-action pairs; second, the agent learns from both rewarded and unrewarded trajectories; and third, the agent samples anchor points that efficiently narrow the space of hypothesized target locations by iteratively halving it. As a result, the agent learns within tens of trials to intercept new targets regardless
of their spatial separation, matching learning rates observed in mice and significantly outperforming standard
reinforcement learning models. In doing so, the agent replicates new features of behavior, such as the progression
from more extended to more compact trajectories during learning. Together, this work integrates concepts that have typically been treated separately, such as motor planning, motor execution, and spatial learning, to understand how animals efficiently explore space and quickly modify their behavior based on experience.
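The abstract leaves the generative functions unspecified. As a minimal sketch, assuming minimum-jerk position profiles between anchor points (the function names, step counts, and anchor coordinates below are illustrative, not the authors' implementation), the speed and heading commands implied by a sequence of anchors could be composed like this:

```python
import numpy as np

def min_jerk_segment(p0, p1, n_steps=50):
    """Smooth position profile linking two anchor points with a minimum-jerk
    time course (bell-shaped speed, zero velocity at both endpoints)."""
    s = np.linspace(0.0, 1.0, n_steps)
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return p0 + np.outer(blend, p1 - p0)

def compose_trajectory(anchors, n_steps=50):
    """Concatenate smooth segments through a sequence of anchor points and
    recover the speed and heading commands they imply."""
    pos = np.vstack([min_jerk_segment(anchors[i], anchors[i + 1], n_steps)
                     for i in range(len(anchors) - 1)])
    vel = np.gradient(pos, axis=0)                  # finite-difference velocity
    speed = np.linalg.norm(vel, axis=1)
    heading = np.arctan2(vel[:, 1], vel[:, 0])
    return pos, speed, heading

# Example: out to a sampled anchor point and back to the start location.
anchors = [np.array([0.0, 0.0]), np.array([24.0, 7.0]), np.array([0.0, 0.0])]
pos, speed, heading = compose_trajectory(anchors)
print(f"{len(pos)} timesteps, peak speed {speed.max():.2f} units/step")
```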
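For the belief update, one way to realize "iteratively halving" the hypothesized target space is a binary-search-style sweep: each trajectory covers roughly half of the remaining belief mass, and the reward outcome indicates which half contains the target, so both rewarded and unrewarded trials are informative. The grid discretization and the functions below (bisecting_sweep, update_belief) are assumptions for illustration, not the authors' model:

```python
import numpy as np

GRID = 32                                        # arena discretized into GRID x GRID cells
rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij")
belief = np.full((GRID, GRID), 1.0 / GRID**2)    # uniform prior over target locations

def bisecting_sweep(belief, rng):
    """Pick a random sweep direction and a cut at the belief median, so the
    resulting trajectory covers roughly half of the remaining belief mass."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    proj = np.cos(theta) * xs + np.sin(theta) * ys   # project cells onto the direction
    order = np.argsort(proj, axis=None)
    cum = np.cumsum(belief.flat[order])
    cut = proj.flat[order][np.searchsorted(cum, 0.5)]
    return proj <= cut                               # cells the sweep passes through

def update_belief(belief, covered, rewarded):
    """A rewarded trial places the target inside the swept region; an
    unrewarded trial rules that region out (learning from both outcomes)."""
    mask = covered if rewarded else ~covered
    posterior = belief * mask
    return posterior / posterior.sum()

target = (24, 7)                                  # hidden target cell
for trial in range(20):
    covered = bisecting_sweep(belief, rng)
    rewarded = covered[target]
    belief = update_belief(belief, covered, rewarded)
    if np.count_nonzero(belief) == 1:
        print(f"target localized after {trial + 1} trials")  # ~log2(GRID**2) = 10
        break
```

Because each trial eliminates about half of the remaining hypotheses, localization takes on the order of log2 of the number of candidate locations, i.e. tens of trials rather than thousands, consistent with the learning rates the abstract reports.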