ePoster

USING DEEP REINFORCEMENT LEARNING TO REVEAL NEURAL REPRESENTATIONS OF EXPLORATION

Tamir Scherf and 2 co-authors

The Weizmann Institute of Science

FENS Forum 2026 (2026)
Barcelona, Spain
Board PS02-07PM-570

Presentation

Date TBA

Abstract

Choosing between exploring uncertain options in the hope of higher rewards and exploiting a known, familiar option is the core dilemma of the exploration-exploitation trade-off, which has been widely studied within the reinforcement learning (RL) framework. Here, we used deep RL (artificial neural networks trained with RL) to characterize neural representations of exploration and exploitation in humans performing a dynamic three-armed bandit task during fMRI. Behaviorally, exploration was associated with low outcome values and high uncertainty, whereas exploitation showed the opposite pattern; this relationship was observed both in the model and in participants. At the neural level, the internal activations of the deep RL model represented exploration and exploitation states: during trials with low outcome value and high exploration, action representations were clustered together, and as value increased, they gradually separated into distinct clusters. These model-derived state representations explained the representational structure of value-related brain regions, including the dACC, insula, and vmPFC. Critically, only in the vmPFC did we observe a correspondence between behavior and neural representation: the slope of across-action neural separation as a function of value correlated with the slope of exploration as a function of value. These results suggest a neural mechanism in which clustered neural representations of the choice options promote exploratory behavior. Together, these findings reveal a structured representational organization underlying exploration-exploitation behavior and demonstrate how deep RL models can be used to probe the neural code supporting adaptive decision-making.
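The behavioral pattern described above (more exploration after low outcomes) can be illustrated with a toy simulation. The sketch below is not the authors' deep RL model or task: it swaps in a simple delta-rule learner with softmax choice on a drifting three-armed bandit. The function names, the learning rate, the softmax temperature, the drift size, and the definition of an exploratory trial (choosing an arm whose estimated value is below the current maximum) are all assumptions made for illustration.

```python
import math
import random


def run_bandit(n_trials=5000, alpha=0.3, beta=5.0, drift_sd=0.05, seed=0):
    """Simulate a softmax delta-rule learner on a drifting three-armed bandit.

    All parameter values are illustrative assumptions, not taken from the study.
    Returns one (previous_outcome, explored) tuple per trial.
    """
    rng = random.Random(seed)
    means = [0.5, 0.5, 0.5]   # true mean payoff of each arm (drifts over time)
    q = [0.5, 0.5, 0.5]       # learner's value estimate for each arm
    prev_outcome = 0.5        # neutral placeholder before the first reward
    records = []
    for _ in range(n_trials):
        # Softmax choice over the current value estimates.
        weights = [math.exp(beta * v) for v in q]
        choice = rng.choices(range(3), weights=weights)[0]
        # Assumed definition: a trial is exploratory if the chosen arm
        # is not the one with the highest current estimate.
        explored = q[choice] < max(q)
        records.append((prev_outcome, explored))
        # Noisy reward, then a delta-rule update of the chosen arm only.
        reward = rng.gauss(means[choice], 0.1)
        q[choice] += alpha * (reward - q[choice])
        prev_outcome = reward
        # Random-walk drift of the true means, kept within [0, 1].
        means = [min(1.0, max(0.0, m + rng.gauss(0.0, drift_sd))) for m in means]
    return records


def exploration_rate_by_value(records):
    """Median-split trials on the previous outcome; return (low, high) exploration rates."""
    outcomes = sorted(o for o, _ in records)
    median = outcomes[len(outcomes) // 2]
    low = [e for o, e in records if o < median]
    high = [e for o, e in records if o >= median]
    return sum(low) / len(low), sum(high) / len(high)


if __name__ == "__main__":
    low_rate, high_rate = exploration_rate_by_value(run_bandit())
    print(f"exploration after low outcomes:  {low_rate:.2f}")
    print(f"exploration after high outcomes: {high_rate:.2f}")
```

In this toy setting, a low outcome pulls the chosen arm's estimate toward the others, so the softmax probabilities flatten and non-greedy choices become more likely. That is only a loose analogue of the clustering of action representations reported for the deep RL model and the vmPFC, which this sketch does not attempt to reproduce.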
