Topic: policy learning

Seminar: 4 seminars

Latest

Off-policy learning in the basal ganglia

May 3, 2023

I will discuss work with Jack Lindsey modeling reinforcement learning for action selection in the basal ganglia. I will argue that the presence of multiple brain regions, in addition to the basal ganglia, that contribute to motor control motivates the need for an off-policy basal ganglia learning algorithm. I will then describe a biological implementation of such an algorithm that predicts tuning of dopamine neurons to a quantity we call "action surprise," in addition to reward prediction error. In the same model, an implementation of learning from a motor efference copy also predicts a novel solution to the problem of multiplexing feedforward and efference-related striatal activity. The solution exploits the difference between D1- and D2-expressing medium spiny neurons and leads to predictions about striatal dynamics.

Seminar · Neuroscience · Recording

Learning Relational Rules from Rewards

Guillermo Puebla

University of Bristol

Oct 13, 2022

Humans perceive the world in terms of objects and relations between them. In fact, for any given pair of objects, there is a myriad of relations that apply to them. How does the cognitive system learn which relations are useful to characterize the task at hand? And how can it use these representations to build a relational policy to interact effectively with the environment? In this paper we propose that this problem can be understood through the lens of a sub-field of symbolic machine learning called relational reinforcement learning (RRL). To demonstrate the potential of our approach, we build a simple model of relational policy learning based on a function approximator developed in RRL. We trained and tested our model on three Atari games that required considering an increasing number of potential relations: Breakout, Pong and Demon Attack. In each game, our model was able to select adequate relational representations and build a relational policy incrementally. We discuss the relationship between our model and models of relational and analogical reasoning, as well as its limitations and future directions of research.

Seminar · Neuroscience · Recording

A role for dopamine in value-free learning

Luke Coddington

Dudman lab, HHMI Janelia

Jul 14, 2021

Recent success in training artificial agents and robots derives from a combination of direct learning of behavioral policies and indirect learning via value functions. Policy learning and value learning employ distinct algorithms that depend upon evaluation of errors in performance and reward prediction errors, respectively. In mammals, behavioral learning and the role of mesolimbic dopamine signaling have been extensively evaluated with respect to reward prediction errors; but there has been little consideration of how direct policy learning might inform our understanding. I’ll discuss our recent work on classical conditioning in naïve mice (https://www.biorxiv.org/content/10.1101/2021.05.31.446464v1) that provides multiple lines of evidence that phasic dopamine signaling regulates policy learning from performance errors in addition to its well-known roles in value learning. This work points towards new opportunities for unraveling the mechanisms of basal ganglia control over behavior under both adaptive and maladaptive learning conditions.

Seminar · Neuroscience · Recording

An inference perspective on meta-learning

Kate Rakelly

University of California Berkeley

Nov 26, 2020

While meta-learning algorithms are often viewed as algorithms that learn to learn, an alternative viewpoint frames meta-learning as inferring a hidden task variable from experience consisting of observations and rewards. From this perspective, learning to learn is learning to infer. This viewpoint can be useful in solving problems in meta-RL, which I’ll demonstrate through two examples: (1) enabling off-policy meta-learning, and (2) performing efficient meta-RL from image observations. I’ll also discuss how this perspective leads to an algorithm for few-shot image segmentation.
