ePoster

World structure emerges from inferring basic actions for goal-directed behavior

Caroline Haimerl, Joseph Paton, Daniel McNamee
COSYNE 2025 (2025)
Montreal, Canada

Abstract

The brain needs to process high-dimensional sensory information in order to control the non-linear dynamics of body movements. One view posits that it does so modularly, with different brain regions solving specific and distinct objectives that can be studied quasi-independently (e.g. visual classification in the ventral visual stream or path integration in the entorhinal-hippocampal circuit). However, it is unclear how these modular abstract objectives would emerge and how they give rise to coherent behavior executed through concrete actions. An alternative paradigm proposes that the brain holistically computes appropriate motor commands to control the body, such that individual brain regions’ representations emerge as local intermediary computations serving the joint objective of collectively generating behavior. In line with the latter perspective, here we show that world structure arises from first principles in networks trained to generate flexible behavior. Specifically, we developed a deep inverse network that maps current and goal sensory observations directly to appropriate behavioral sequences during spatial navigation. Given the challenge of decoding long non-linear egocentric behavioral sequences, we hypothesized that the brain encodes high-dimensional observations in a “straightened”, linearized format to facilitate action readout, taking inspiration from recent theories in the perceptual domain. We embed this linearized latent spaces (LLS) hypothesis into our network architecture as a novel inductive bias by decoding behavior from the difference vector between the encoded latent current and goal states. We find that the trained network’s latent space reflects allocentric information about the environment, despite no such representations being explicitly present in the training data. In fact, spatial order and straightening emerge specifically when networks are trained to decode egocentric action sequences, but not allocentric ones. Our results suggest that deep inverse networks explain cognitive representations of world structure as intermediary representations for flexible behavior generation.
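To make the LLS inductive bias concrete, the sketch below shows one minimal way it could be wired up in PyTorch: a shared encoder maps the current and goal observations into a latent space, and an action decoder reads out a behavioral sequence from the goal-minus-current difference vector. The class name InverseLLSNet, the MLP encoder, the GRU decoder, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class InverseLLSNet(nn.Module):
    """Hypothetical inverse model: (current obs, goal obs) -> action sequence."""
    def __init__(self, obs_dim=64, latent_dim=32, n_actions=4, seq_len=10):
        super().__init__()
        self.seq_len = seq_len
        # Shared encoder; training pressure should shape this latent space
        # into the hypothesized "straightened" (linearized) format.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Recurrent readout of the action sequence from the latent
        # goal-minus-current difference vector (the LLS inductive bias).
        self.decoder = nn.GRU(latent_dim, 128, batch_first=True)
        self.action_head = nn.Linear(128, n_actions)

    def forward(self, obs_current, obs_goal):
        z_current = self.encoder(obs_current)          # (B, latent_dim)
        z_goal = self.encoder(obs_goal)                # (B, latent_dim)
        delta = z_goal - z_current                     # difference vector
        # Present the same difference vector at every decoding step.
        steps = delta.unsqueeze(1).repeat(1, self.seq_len, 1)
        h, _ = self.decoder(steps)                     # (B, T, 128)
        return self.action_head(h)                     # (B, T, n_actions)

# Example forward pass; training would minimize cross-entropy against
# ground-truth egocentric action sequences.
net = InverseLLSNet()
obs_now, obs_goal = torch.randn(8, 64), torch.randn(8, 64)
action_logits = net(obs_now, obs_goal)                 # shape (8, 10, 4)
```

Note that the decoder sees only the difference between the two latents, not the latents themselves; this is what forces behavior to be read out from the geometry of the latent space, in line with the inductive bias described above.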

Unique ID: cosyne-25/world-structure-emerges-from-inferring-7be575cd