Authors & Affiliations
Joanna Aloor, Oliver Gauld, Joseph Warren, Matthew Mower, Olga Mavromati, Chunyu A. Duan
Abstract
Stochastic strategies are often advantageous in competitive environments because they prevent opponents from predicting one's actions. To study how stochastic behaviours arise, we trained head-fixed mice to play a zero-sum game, Matching Pennies, in which the optimal behaviour against a rational agent is to choose between actions with equal probability. On each trial, mice chose between two available actions: lick left or lick right. A computer opponent predicted each animal's upcoming choice by searching for statistical regularities in its choice and reward history, and choices were rewarded only when mice remained unpredictable against this opponent. Over weeks of game play, mice transitioned from highly predictable, biased choice patterns to unpredictable, stochastic decisions. As performance increased, choice patterns became more balanced, with higher entropy, and the computer opponent's predictions of the animals' choices became less accurate and more uncertain. To probe how mice learned to optimise their behaviour in this competitive environment, we performed mesoscale widefield calcium imaging of the dorsal cortex together with videography throughout learning, comparing cortical and pupil dynamics between distinct behavioural strategies identified with unsupervised methods (GLM-HMM). We found that the animals' strategies significantly modulated cortical activity across distinct task epochs, with increased preparatory (pre-choice) activity and decreased outcome (post-reward) activity when choosing stochastically. Furthermore, both cortical and pupil dynamics showed reduced signatures of reward history and reward expectation in the stochastic state. These findings provide novel insights into the neural basis of stochastic decision-making, revealing how decoupling future choices from past reward feedback may drive the emergence of optimal, unpredictable strategies in competitive environments.
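To make the task contingency concrete, the sketch below is an illustrative, simplified predictor-and-reward loop in Python, not the study's actual opponent algorithm (which the abstract does not specify). It assumes binary choices (0 = lick left, 1 = lick right), searches for biases in choices conditioned on recent choice-and-reward history, rewards the simulated animal only when its choice differs from the prediction, and computes choice entropy, the balance measure mentioned above. The history depth (max_order), the fallback to overall side bias, and the win-stay simulated mouse are all assumptions made purely for illustration.

```python
import numpy as np
from collections import defaultdict


def predict_next_choice(choices, rewards, max_order=2):
    """Predict the next choice (0 = left, 1 = right) from lists of past
    choices and rewards. For each history depth up to max_order, count how
    often each (choice, reward) pattern was followed by left vs right and
    use the most biased conditional probability; fall back to overall bias."""
    if len(choices) == 0:
        return np.random.randint(2)           # no history yet: guess
    prediction = int(np.mean(choices) > 0.5)  # overall side bias
    best_bias = abs(np.mean(choices) - 0.5)
    for order in range(1, max_order + 1):
        if len(choices) <= order:
            break
        counts = defaultdict(lambda: [0, 0])  # pattern -> [n_left, n_right]
        for t in range(order, len(choices)):
            pattern = tuple(zip(choices[t - order:t], rewards[t - order:t]))
            counts[pattern][choices[t]] += 1
        current = tuple(zip(choices[-order:], rewards[-order:]))
        n_left, n_right = counts[current]
        if n_left + n_right > 0:
            p_right = n_right / (n_left + n_right)
            if abs(p_right - 0.5) > best_bias:
                best_bias = abs(p_right - 0.5)
                prediction = int(p_right > 0.5)
    return prediction


def choice_entropy(choices):
    """Shannon entropy (bits) of the choice distribution; 1.0 = fully balanced."""
    p = float(np.mean(choices))
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


# Toy session: a win-stay "mouse" that repeats rewarded choices is predictable
# and easily exploited; reward is delivered only when the prediction fails.
rng = np.random.default_rng(0)
choices, rewards, predicted = [], [], 0
for t in range(500):
    pred = predict_next_choice(choices, rewards)
    choice = choices[-1] if (rewards and rewards[-1]) else int(rng.integers(2))
    predicted += int(choice == pred)
    choices.append(choice)
    rewards.append(int(choice != pred))
print("opponent accuracy:", predicted / 500, "entropy:", choice_entropy(choices))
```

In this toy setting, any conditional bias in the simulated choices is quickly exploited, so prediction accuracy above chance and entropy below 1 bit flag predictable play; the two summary numbers mirror the prediction-accuracy and entropy measures described in the abstract.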