ePoster

Using Markov Decision Processes to benchmark the performance of artificial and biological agents

Alexander Kazakov, Ana Polterovich, Maciej M. Jankowski, Johannes Niediek, Israel Nelken
COSYNE 2022 (2022)
Lisbon, Portugal
Presented: Mar 17, 2022

Abstract

When an agent is trained on a complex episodic task, different parts of the task may be learned at different rates. How do we determine which part of the task challenged the agent the most? Since reward is usually provided only at the end of each trial, it cannot be used to infer within-trial learning trends. Behavioral features such as speed or trial duration capture trends in the agent's decision-making, but do not necessarily indicate that the agent is getting better at the task. We address this issue by modeling the task as a Markov Decision Process (MDP). The Q values of the optimal policy then measure the quality of every action the agent takes. We illustrate the use of two such measures: the well-established Optimality-Gap, and the Action-Rank, a new measure that is shown analytically to be less sensitive to the model's hyper-parameters. We first validated this approach on synthetic data from a deep reinforcement learning agent (Deep Q-network, DQN), and then used it to analyze the behavior of a rat; both the DQN and the rat were trained on the same operant task (sound discrimination in a large arena). We observed that (1) rat behavior approached the optimal policy gradually throughout training; (2) most of the policy refinement occurred in a specific, short (<1 s) segment of the trial; (3) the first trials of each day showed sub-optimal performance. These results illustrate the ability of optimality-based measures to quantify fine features of the learning process. Importantly, optimality-based measures may contribute to cross-disciplinary research on learning in both artificial and biological agents.
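
The abstract does not spell out how the two measures are computed, so the following is only a minimal sketch under stated assumptions. It assumes the Optimality-Gap of a step is the difference between the best optimal Q value in the visited state and the optimal Q value of the action actually taken, and that the Action-Rank is the number of actions with a strictly higher optimal Q value than the chosen one. The toy three-state MDP, the discount factor `gamma`, and all function names are hypothetical and are not taken from the poster.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Return optimal Q values for a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) expected rewards.
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)                 # greedy state values
        Q_new = R + gamma * (P @ V)       # Bellman optimality backup
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

def optimality_gap(Q, states, actions):
    """Assumed definition: max_a Q*(s, a) - Q*(s, a_taken), per step."""
    return Q[states].max(axis=1) - Q[states, actions]

def action_rank(Q, states, actions, eps=1e-12):
    """Assumed definition: count of actions strictly better than the one taken."""
    chosen = Q[states, actions][:, None]
    return (Q[states] > chosen + eps).sum(axis=1)

# Hypothetical toy task: a 3-state chain, reward only at the end of the trial.
# Action 0 = stay, action 1 = advance; state 2 is terminal (absorbing).
P = np.zeros((3, 2, 3))
P[0, 0, 0] = P[0, 1, 1] = 1.0
P[1, 0, 1] = P[1, 1, 2] = 1.0
P[2, :, 2] = 1.0
R = np.zeros((3, 2))
R[1, 1] = 1.0                              # reward on completing the trial

Q = value_iteration(P, R, gamma=0.9)

# One recorded trial: the agent hesitates once in state 0, then advances twice.
states = np.array([0, 0, 1])
actions = np.array([0, 1, 1])
gaps = optimality_gap(Q, states, actions)
ranks = action_rank(Q, states, actions)
print(gaps, ranks)
```

In this toy case only the first step is sub-optimal, which both measures pick up even though the reward arrives only at the end of the trial. The size of the gap at that step depends on `gamma`, whereas the rank is an ordinal quantity.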

Unique ID: cosyne-22/using-markov-decision-processes-benchmark-413aa503