ePoster

Long-term consequences of actions affect human exploration in structured environments

Lior Fox, Ohad Dan, Gal Yarden, Yonatan Loewenstein
COSYNE 2022 (2022)
Lisbon, Portugal

Abstract

Exploration is an essential part of learning, and the question of how to achieve efficient exploration has been extensively studied in the field of Reinforcement Learning (RL). Recent work has demonstrated -- both in theory and in practice -- the importance and effectiveness of sophisticated exploration methods that are sensitive to the long-term exploratory consequences of actions and to the global structure of the environment. This is analogous to the standard concept of value in RL: how "good" an action is depends not only on the immediate reward gained by taking it, but also on the expected future rewards that follow. Similarly, how "good" an action is for exploration also depends on whether it leads to future states from which new knowledge can be gained. Whether and how humans implement these computational principles in their exploratory behavior is largely unknown. The standard paradigm used to study exploration in humans, the Multi-Armed Bandit, cannot address these questions because it is characterized by a single state, and as such does not entail long-term consequences of actions. Ultimately, the exploration "utility" of an action in a Bandit problem can be expressed as a local quantity, such as the number of times the action has previously been chosen. Based on recent RL algorithms for exploration in complex environments, we developed a novel experimental task involving a multi-state structured environment. This task allowed us to test predictions of the models by parametrically changing some properties of the environment. We found that human participants take into account the long-term consequences of actions when making exploratory choices, and are sensitive not only to local, immediate exploration gains, but also to the global underlying structure of the environment.
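The contrast between a local exploration signal and one that reflects long-term exploratory consequences can be illustrated with a minimal sketch. This is not the task or the model used in the study: the four-state environment, the `1 / (1 + count)` novelty measure, the discount factor `gamma`, and all function names are assumptions chosen only for illustration. The sketch propagates novelty backward through the environment in the same way that value propagates reward in RL.

```python
from typing import Dict, Tuple

State = str
Action = int
Counts = Dict[Tuple[State, Action], int]

# Hypothetical deterministic environment (an assumption for illustration):
# TRANSITIONS[state][action] gives the next state; "end" has no actions.
TRANSITIONS: Dict[State, Dict[Action, State]] = {
    "start": {0: "corridor", 1: "hub"},
    "corridor": {0: "end"},
    "hub": {0: "end", 1: "end", 2: "end", 3: "end"},
    "end": {},
}


def local_novelty(counts: Counts, state: State, action: Action) -> float:
    """Local signal: depends only on how often this action was chosen here."""
    return 1.0 / (1.0 + counts.get((state, action), 0))


def exploration_value(counts: Counts, state: State, action: Action,
                      gamma: float = 0.9) -> float:
    """Local novelty plus the discounted best novelty reachable afterwards.

    Mirrors the Bellman recursion for reward-based value, with novelty
    standing in for reward. The recursion terminates because the example
    environment has no cycles.
    """
    next_state = TRANSITIONS[state][action]
    future = [exploration_value(counts, next_state, a, gamma)
              for a in TRANSITIONS[next_state]]
    return local_novelty(counts, state, action) + gamma * max(future, default=0.0)


# Each first-step action has been tried once; one action was tried in each
# second-step state, leaving three untried actions in "hub".
counts: Counts = {
    ("start", 0): 1, ("start", 1): 1,
    ("corridor", 0): 1, ("hub", 0): 1,
}

local_0 = local_novelty(counts, "start", 0)
local_1 = local_novelty(counts, "start", 1)
value_0 = exploration_value(counts, "start", 0)
value_1 = exploration_value(counts, "start", 1)

print(f"local novelty:     corridor={local_0:.2f}  hub={local_1:.2f}")
print(f"exploration value: corridor={value_0:.2f}  hub={value_1:.2f}")
```

In this sketch the two actions at "start" are indistinguishable by local counts, yet the action leading to "hub" has the higher propagated value because untried actions remain there. A single-state bandit cannot produce this dissociation, since nothing lies beyond the chosen action.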

Unique ID: cosyne-22/longterm-consequences-actions-affect-03d21444