Authors & Affiliations
Michele Nardin, Ann Hermundstad
Abstract
Maintaining physiological homeostasis is essential for survival. To achieve this, brains integrate various
inputs, such as internal physiological and external environmental signals, to guide behavior. Inspired
by the brain-body literature, we consider three key axes along which the dynamics of these inputs can
differ: their timescales, dependence on actions, and contribution to error signals that drive behavior. To
understand how these axes impact physiological homeostasis, we consider an agent that can influence its
internal physiology through interactions with the environment. We assume that environmental factors
vary over short timescales, and influence physiological states on longer timescales through the agent’s
actions. We further assume that the error signal, formalized through a loss function, depends solely on
physiology, akin to an internal evaluation of well-being. First, we show that simple decision-making
models that capture these assumptions allow for an efficient hierarchical policy structure, where a
master policy acts on a slow variable, and selects a relevant subpolicy that depends on a fast variable.
We then show how these hierarchies emerge in increasingly complex environment-brain-body models
trained with Q-learning and deep reinforcement learning. These agents navigate and interact with the
environment to maintain physiological homeostasis by minimizing a physiology-dependent loss function.
The behavior of trained agents can be partitioned into different slow modes that specify actions on
faster timescales, allowing for an almost lossless recasting of trained policies into compact, hierarchical
ones. This hierarchy results from the interaction between slow (physiological) and fast (environmental)
factors, which leads to specific behavioral adaptations depending on internal needs. Finally, these
models yield predictions about state-dependent stimulus encoding. Specifically, we find that physiology
modulates the mutual information between external stimuli and within-layer population responses,
favoring stimuli relevant to the current need. These results lead to experimentally testable predictions
for validating brain-body hierarchical decision-making models during naturalistic behavior.
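
The hierarchical structure described above can be illustrated with a minimal tabular sketch. This is not the authors' model or training procedure. The state names, action names, and tables below are hypothetical, chosen only to show how a master policy over a slow (physiological) variable can select a subpolicy over a fast (environmental) variable, and how a flat policy table can be regrouped into that compact form. In this toy case the recasting is exactly lossless because the flat table is built from the hierarchy; for trained agents the abstract reports it is almost lossless.

```python
from itertools import product

# Hypothetical slow (physiological) and fast (environmental) states.
SLOW_STATES = ["low_energy", "very_low_energy", "low_water", "sated"]
FAST_STATES = ["food_near", "water_near", "nothing_near"]

# Hypothetical subpolicies: each maps a fast state to an action.
SUBPOLICIES = {
    "seek_food": {"food_near": "eat", "water_near": "explore", "nothing_near": "explore"},
    "seek_water": {"food_near": "explore", "water_near": "drink", "nothing_near": "explore"},
    "rest": {"food_near": "stay", "water_near": "stay", "nothing_near": "stay"},
}

# Hypothetical master policy: maps a slow state to a subpolicy name.
MASTER_POLICY = {
    "low_energy": "seek_food",
    "very_low_energy": "seek_food",
    "low_water": "seek_water",
    "sated": "rest",
}


def hierarchical_action(slow, fast):
    """Pick a subpolicy from the slow state, then an action from the fast state."""
    return SUBPOLICIES[MASTER_POLICY[slow]][fast]


def flatten():
    """Expand the hierarchy into a flat table over (slow, fast) state pairs."""
    return {(s, f): hierarchical_action(s, f)
            for s, f in product(SLOW_STATES, FAST_STATES)}


def recast(flat_policy):
    """Regroup a flat table: slow states with identical rows share one subpolicy."""
    master, subpolicies, seen = {}, [], {}
    for s in SLOW_STATES:
        row = tuple(flat_policy[(s, f)] for f in FAST_STATES)
        if row not in seen:
            seen[row] = len(subpolicies)
            subpolicies.append(dict(zip(FAST_STATES, row)))
        master[s] = seen[row]
    return master, subpolicies


flat = flatten()
master, subs = recast(flat)
# Rebuild the flat table from the recovered hierarchy to check nothing was lost.
rebuilt = {(s, f): subs[master[s]][f] for (s, f) in flat}
print(len(flat), "flat entries;", len(subs), "subpolicies; lossless:", rebuilt == flat)
```

Here the flat table has one entry per (slow, fast) pair, whereas the hierarchy stores one subpolicy index per slow state plus a small set of shared subpolicies, which is the sense in which the recast policy is more compact.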