ePoster

BELIEF-BASED REINFORCEMENT LEARNING EXPLAINS THE DYNAMICS OF MEMORY-DEPENDENT NAVIGATION UNDER UNCERTAINTY

Gonzalo Hernández Ortega and 3 co-authors

Cajal Neuroscience Center (CNC), CSIC

FENS Forum 2026 (2026)
Barcelona, Spain
Board PS07-10AM-354

Presentation

Date TBA

Abstract

Reward-driven spatial navigation is widely used to study the factors involved in long-term memory. However, memory-guided navigation also depends on internal representations, environmental structure, and decision-making under uncertainty. In a foraging task, animals' decisions are affected not only by the memory of past reward locations but also by memory-independent factors such as the exploration-exploitation tradeoff. This complicates the interpretation of experimental results.

To dissociate the contribution of memory from other factors affecting decision-making, we used a partially observable Markov decision process (POMDP) framework. We modeled mouse behavior from the Morales et al. (2020) 8-port spatial navigation task, in which mice made choices in a high-throughput setting. In this framework, each mouse is treated as a reward-maximizing agent making decisions under uncertainty and with imperfect memory. In each session, a mouse starts with a prior belief (a probability distribution over reward ports) representing its imperfect memory of past reward locations. The belief state is updated via Bayesian inference from the action-outcome history and represents the animal's current uncertain knowledge of the reward location and reward availability.
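The belief update described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that exactly one of the 8 ports is rewarded, that a visit to the rewarded port delivers water with a fixed probability `p_reward`, and that unrewarded ports never deliver water. The function name `update_belief` and the value `p_reward=0.8` are hypothetical, and the sketch tracks a belief over reward location only, not over reward availability.

```python
import numpy as np

def update_belief(belief, port, rewarded, p_reward=0.8):
    """One Bayesian update of a belief over which port is rewarded.

    Assumptions (illustrative, not from the study): a single rewarded port,
    which pays out with probability p_reward when visited; all other ports
    never pay out.
    """
    belief = np.asarray(belief, dtype=float)
    # Likelihood of the observed outcome under each hypothesis
    # "port i is the rewarded port".
    likelihood = np.ones_like(belief)
    if rewarded:
        likelihood[:] = 0.0            # reward is impossible if another port is the rewarded one
        likelihood[port] = p_reward
    else:
        likelihood[port] = 1.0 - p_reward  # a miss is still possible at the rewarded port
    posterior = likelihood * belief
    total = posterior.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return posterior / total

# Start from a uniform prior over 8 ports (a mouse with no memory of past locations).
prior = np.full(8, 1.0 / 8.0)
b1 = update_belief(prior, port=2, rewarded=False)  # unrewarded visit lowers belief in port 2
b2 = update_belief(b1, port=5, rewarded=True)      # rewarded visit concentrates belief on port 5
```

An informative prior (for example, one peaked on the previous session's rewarded port) would stand in for the imperfect memory that the abstract describes; the update rule itself is unchanged.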

We compared a fixed greedy policy with a deep reinforcement-learning algorithm trained on empirical choice and outcome data, enabling likelihood-based policy comparison. Preliminary analyses show that policies that maximize only water reward are insufficient to capture mouse behavior, whereas agents that incorporate a physical distance penalty provide substantially better fits.
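As a rough illustration of likelihood-based policy comparison, the sketch below scores observed choices under a softmax policy whose utilities are the believed reward probabilities minus a distance cost. Everything specific here is an assumption made for the example and not taken from the study: the softmax form, the parameters `beta` and `cost`, the ring layout of the ports, and the two toy trials. The deep reinforcement-learning policy from the abstract is not reproduced.

```python
import numpy as np

def choice_log_likelihood(beliefs, positions, choices, distances, beta=5.0, cost=0.0):
    """Summed log-probability of observed choices under a softmax policy.

    Utility of port j from the current position = belief[j] - cost * distance.
    With cost=0 the policy depends on believed reward only.
    """
    total = 0.0
    for belief, pos, choice in zip(beliefs, positions, choices):
        utility = beta * (np.asarray(belief, dtype=float) - cost * distances[pos])
        utility -= utility.max()                          # numerical stability
        log_probs = utility - np.log(np.exp(utility).sum())
        total += log_probs[choice]
    return total

# Hypothetical layout: 8 ports on a ring, distances scaled to [0, 1].
n_ports = 8
idx = np.arange(n_ports)
gap = np.abs(idx[:, None] - idx[None, :])
distances = np.minimum(gap, n_ports - gap) / (n_ports / 2)

# Toy trials (not data from the study): uniform beliefs, choices of a neighboring port.
beliefs = [np.full(n_ports, 1.0 / n_ports)] * 2
positions = [0, 1]
choices = [1, 2]

ll_reward_only = choice_log_likelihood(beliefs, positions, choices, distances, cost=0.0)
ll_with_distance = choice_log_likelihood(beliefs, positions, choices, distances, cost=1.0)
print(ll_reward_only, ll_with_distance)
```

In these toy trials the animal always moves to an adjacent port, so the distance-penalized policy assigns the choices a higher likelihood than the reward-only policy. With real data, `beta` and `cost` would be fitted, and the comparison should account for the extra free parameter.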

Our results demonstrate that interpreting spatial navigation as a direct readout of memory is insufficient: belief-based decision-making under uncertainty, and not reward maximization alone, shapes the observed behavior.
