Reinforcement Learning

Topic spotlight
Topic · World Wide

reinforcement learning

Discover seminars, jobs, and research tagged with reinforcement learning across World Wide.
118 curated items · 53 Seminars · 40 ePosters · 25 Positions
Position

Gatsby Computational Neuroscience Unit

Gatsby Computational Neuroscience Unit, UCL
London, UK
Dec 5, 2025

4-Year PhD Programme in Theoretical Neuroscience and Machine Learning. Call for applications! Deadline: 13 November 2022. The Gatsby Computational Neuroscience Unit is a leading research centre focused on theoretical neuroscience and machine learning. We study (un)supervised and reinforcement learning; inference, coding and neural dynamics; Bayesian and kernel methods; and deep learning; with applications to the analysis of perceptual processing and cognition, neural data, signal and image processing, machine vision, network data and nonparametric hypothesis testing. The unit provides a unique opportunity for a critical mass of theoreticians to interact closely with one another and with researchers at the Sainsbury Wellcome Centre for Neural Circuits and Behaviour (SWC), the Centre for Computational Statistics and Machine Learning (CSML) and related UCL departments such as Computer Science; Statistical Science; Artificial Intelligence; the ELLIS Unit at UCL; Neuroscience; and the nearby Alan Turing and Francis Crick Institutes. Our PhD programme provides a rigorous preparation for a research career. Students complete a 4-year PhD in either machine learning or theoretical and computational neuroscience, with a minor emphasis in the complementary field. Courses in the first year provide a comprehensive introduction to both fields and to systems neuroscience. Students are encouraged to work and interact closely with SWC/CSML researchers to take advantage of this uniquely multidisciplinary research environment. Full funding is available regardless of nationality. The unit also welcomes applicants who have secured or are seeking funding from other sources. To apply, please visit www.ucl.ac.uk/gatsby/study-and-work/phd-programme

Position

Dr. Tatsuo Okubo

Chinese Institute for Brain Research, Beijing
Beijing, China
Dec 5, 2025

We are a new group at the Chinese Institute for Brain Research (CIBR), Beijing, which focuses on using modern data science and machine learning tools on neuroscience data. We collaborate with various labs within CIBR to develop models and analysis pipelines to accelerate neuroscience research. We are looking for enthusiastic and talented machine learning engineers and data scientists to join this effort. Example projects include (but are not limited to) extracting hidden states from population neural activity, automating behavioral classification from videos, and segmenting neurons from confocal images using deep learning.

Position

Dr Flavia Mancini

Computational and Biological Learning, Department of Engineering, University of Cambridge
Cambridge, UK
Dec 5, 2025

This is an opportunity for a highly creative and skilled pre-doctoral Research Assistant to join the dynamic and multidisciplinary research environment of the Computational and Biological Learning research group (https://www.cbl-cambridge.org/), Department of Engineering, University of Cambridge. We are looking for a Research Assistant to work on projects related to statistical learning and contextual inference in the human brain. We have a particular focus on the learning of aversive states, as this has strong clinical significance for chronic pain and mental health disorders. The RA will be supervised by Dr Flavia Mancini (MRC Career Development Fellow and Head of the Nox Lab, www.noxlab.org), and is expected to collaborate with theoretical and experimental colleagues in Cambridge, Oxford and abroad. The post holder will be located in central Cambridge, Cambridgeshire, UK. As a general approach, we combine statistical learning tasks in humans and computational modelling (using Bayesian inference, reinforcement learning, deep learning and neural networks) with neuroimaging methods (including 7T fMRI). The successful candidate will strengthen this approach and be responsible for designing experiments and for collecting and analysing behavioural and fMRI data using computational modelling techniques. The key responsibilities and duties are: ideating and conducting research studies on statistical/aversive learning, combining behavioural tasks and computational modelling (using Bayesian inference, reinforcement learning, deep learning and/or neural networks) with fMRI in healthy volunteers and chronic pain patients; disseminating research findings; and maintaining and developing technical skills to expand their scientific potential. More info and how to apply: https://www.jobs.cam.ac.uk/job/35905/

Position

Erik C. Johnson

Johns Hopkins University Applied Physics Laboratory
Laurel, MD, USA
Dec 5, 2025

The Intelligent Systems Center at JHU/APL is an interdisciplinary research center for neuroscientists, AI researchers, and roboticists. Please see the individual listings for specific postings and application instructions. Postings for Neuroscience-Inspired AI researchers and Computational Neuroscience researchers may also be posted soon. https://prdtss.jhuapl.edu/jobs/senior-neural-decoding-researcher-2219 https://prdtss.jhuapl.edu/jobs/senior-reinforcement-learning-researcher-615 https://prdtss.jhuapl.edu/jobs/senior-computer-vision-researcher-2242 https://prdtss.jhuapl.edu/jobs/artificial-intelligence-software-developer-2255

Position

Francisco Pereira

Machine Learning Team, National Institute of Mental Health
Bethesda, Maryland, United States of America
Dec 5, 2025

The Machine Learning Team at the National Institute of Mental Health (NIMH) in Bethesda, MD, has an open position for a machine learning research scientist. The NIMH is the leading federal agency for research on mental disorders and neuroscience, and part of the National Institutes of Health (NIH). Our mission is to help NIMH scientists use machine learning methods to address a diverse set of research problems in clinical and cognitive psychology and neuroscience. These range from identifying biomarkers for aiding diagnoses to creating and testing models of mental processes in healthy subjects. Our overarching goal is to use machine learning to improve every aspect of the scientific effort, from helping discover or develop theories to generating actionable results. For more information, please refer to the full ad https://nih-fmrif.github.io/ml/index.html

Position · Computer Science

Prof. Dr.-Ing. Marcus Magnor

Technische Universität Braunschweig
Technische Universität Braunschweig, Germany
Dec 5, 2025

The job is a W3 Full Professorship for Artificial Intelligence in Interactive Systems at Technische Universität Braunschweig. The role involves expanding the research area of data-driven methods for interactive and intelligent systems at TU Braunschweig and strengthening the focal points 'Data Science' and 'Reliability' of the Department of Computer Science. The position holder is expected to have a strong background in Computer Science with a focus on Artificial Intelligence/Machine Learning, specifically in the areas of Dependable AI and Explainable AI. The role also involves teaching topic-related courses in the areas of Artificial Intelligence and Machine Learning to complement the Bachelor's and Master's degree programmes of the Department of Computer Science.

Position

Jun.-Prof. Dr.-Ing. Rania Rayyes

Karlsruhe Institute of Technology (KIT), Institut für Fördertechnik und Logistiksysteme (IFL), InnovationsCampus Mobilität der Zukunft (ICM)
Karlsruhe Institute of Technology (KIT), Gebäude 50.38, Gotthard-Franz-Straße 8, 76131 Karlsruhe
Dec 5, 2025

The main focus of this position is to develop novel AI systems and methods for robot applications: dexterous robot grasping, human-robot learning, and transfer learning for efficient online learning. The role offers close cooperation with other institutes, universities, and numerous industrial partners, a self-determined development environment for one's own research topics with active support for the doctoral research project, flexible working hours, and work in a young, interdisciplinary research team.

Position

N/A

University of Manchester
University of Manchester
Dec 5, 2025

1) Lecturer/Senior Lecturer (Assoc/Asst Prof) in Machine Learning: The University of Manchester is making a strategic investment in fundamentals of AI, to complement its existing strengths in AI applications across several prominent research fields in the University. Applications are welcome in any area of the fundamentals of machine learning, in particular probabilistic modelling, deep learning, reinforcement learning, causal modelling, human-in-the-loop ML, explainable AI, ethics, privacy and security. This position is meant to contribute to machine learning methodologies and not purely to their applications. You will be located in the Department of Computer Science and, in addition to the new centre for Fundamental AI research, you will belong to a large community of machine learning, data science and AI researchers. 2) Programme Manager – Centre for AI Fundamentals: The University of Manchester is seeking to appoint an individual with a strategic mindset and a track record of building and leading collaborative relationships and professional networks, expertise in a domain ideally related to artificial intelligence, excellent communication and interpersonal skills, experience in managing high-performing teams, and demonstrable ability to support the preparation of large, complex grant proposals to take up the role of Programme Manager for the Centre for AI Fundamentals. The successful candidate will play a major role in developing and shaping the Centre, working closely with its Director to grow the Centre and plan and deliver an exciting programme of activities, including leading key science translational activity and development of use cases in the Centre’s key domains, partnership development, bid writing, resource management, impact and public engagement strategies.

Position

Prof Zoe Kourtzi

Adaptive Brain Lab, University of Cambridge
University of Cambridge
Dec 5, 2025

Post-doctoral position in Cognitive Computational Neuroscience at the Adaptive Brain Lab. The role involves combining high field brain imaging (7T fMRI, MR Spectroscopy), electrophysiology (EEG), computational modelling (machine learning, reinforcement learning) and interventions (TMS, tDCS, pharmacology) to understand network dynamics for learning and brain plasticity. The research programme bridges work across scales (local circuits, global networks) and species (humans, rodents) to uncover the neurocomputations that support learning and brain plasticity.

Position

Quentin Huys

UCL
UCL
Dec 5, 2025

The aim of the position is to establish a thorough computational modelling framework for cognitive research in mental health settings, focusing in particular on longitudinal data, such as changes due to treatment or across development.

Position

Cassio de Campos

TU Eindhoven
TU Eindhoven, The Netherlands
Dec 5, 2025

We are looking for a highly motivated and skilled PhD candidate to work in the area of Reinforcement Learning (broadly speaking) in the Uncertainty in AI group of TU Eindhoven, The Netherlands. This is a full-time position with a competitive salary. TU Eindhoven is an English-language university.

Position

N/A

Donders Centre for Cognition, Donders Institute for Brain, Cognition and Behaviour, School of Artificial Intelligence at Radboud University Nijmegen
Radboud University Nijmegen
Dec 5, 2025

The AI Department of the Donders Centre for Cognition (DCC), embedded in the Donders Institute for Brain, Cognition and Behaviour, and the School of Artificial Intelligence at Radboud University Nijmegen are looking for a researcher in reinforcement learning with an emphasis on safety and robustness, an interest in natural computing, and an interest in applications in neurotechnology and other domains such as robotics, healthcare and/or sustainability. You will be expected to perform top-quality research in (deep) reinforcement learning, actively contribute to the DBI2 consortium, interact and collaborate with other researchers and specialists in academia and/or industry, and be an inspiring member of our staff with excellent communication skills. You are also expected to engage with students through teaching and Master's projects, not exceeding 20% of your time.

Position

Samuel Kaski

University of Manchester and Aalto University
Manchester, UK
Dec 5, 2025

The University of Manchester is making a strategic investment in the fundamentals of AI, to complement its existing strengths in AI applications across several prominent research fields in the University, which give high-profile application and collaboration opportunities for the outcomes of fundamental AI research. The university is one of the most active partners of the national Alan Turing Institute, and hosts 33 Turing Fellows as well as Fellows of the European Laboratory for Learning and Intelligent Systems (ELLIS) in the new ELLIS Unit Manchester. The university’s ambition is to establish a leading AI centre at the intersection of these opportunities. The university has recently launched a Centre for AI Fundamentals and has already recruited four new academics to it. These two lectureships continue this series of positions in establishing the new Centre.

Position

N/A

Aalto Probabilistic Machine Learning Group
Helsinki, Finland
Dec 5, 2025

We are hiring in Helsinki, Finland, for my research group and for the Finnish Center for Artificial Intelligence (FCAI) and the ELLIS Unit Helsinki. These are two separate calls; you can apply to one or both: 1. My research group, probabilistic machine learning. 2. The Finnish Center for Artificial Intelligence and the ELLIS Unit Helsinki. The positions can include a suitable combination of theoretical and methodological work, and applications such as drug design, synthetic biology, economics and neuroimaging.

Position

N/A

University of Neuchatel
Neuchatel, Switzerland
Dec 5, 2025

This project is about developing reinforcement-learning based AI systems that directly interact with some segment of society. The applications include matching and other allocation problems. The research will be performed at the interface between reinforcement learning, social choice theory, Bayesian inference, mechanism design, differential privacy and algorithmic fairness. The research will have both a theoretical and practical component, which will include some experiments with humans. However, a good theoretical background in probability, machine learning or game theory is necessary for all students. The positions are available from January 2024. The PhD lasts for 4 years and includes a small teaching component.

Position

N/A

University of Neuchatel
Neuchatel, Switzerland
Dec 5, 2025

The project is about developing reinforcement-learning based AI systems that directly interact with some segment of society. The applications include matching and other allocation problems. The research will be performed at the interface between reinforcement learning, social choice theory, Bayesian inference, mechanism design, differential privacy and algorithmic fairness. The research will have both a theoretical and practical component, which will include some experiments with humans.

Seminar · Neuroscience

Understanding reward-guided learning using large-scale datasets

Kim Stachenfeld
DeepMind, Columbia U
Jul 8, 2025

Understanding the neural mechanisms of reward-guided learning is a long-standing goal of computational neuroscience. Recent methodological innovations enable us to collect ever larger neural and behavioral datasets. This presents opportunities to achieve greater understanding of learning in the brain at scale, as well as methodological challenges. In the first part of the talk, I will discuss our recent insights into the mechanisms by which zebra finch songbirds learn to sing. Dopamine has been long thought to guide reward-based trial-and-error learning by encoding reward prediction errors. However, it is unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Longitudinal recordings of dopamine and bird songs reveal that dopamine activity is indeed consistent with encoding a reward prediction error during naturalistic learning. In the second part of the talk, I will talk about recent work we are doing at DeepMind to develop tools for automatically discovering interpretable models of behavior directly from animal choice data. Our method, dubbed CogFunSearch, uses LLMs within an evolutionary search process in order to "discover" novel models in the form of Python programs that excel at accurately predicting animal behavior during reward-guided learning. The discovered programs reveal novel patterns of learning and choice behavior that update our understanding of how the brain solves reinforcement learning problems.

Seminar · Neuroscience

Understanding reward-guided learning using large-scale datasets

Kim Stachenfeld
DeepMind, Columbia U
May 13, 2025

Understanding the neural mechanisms of reward-guided learning is a long-standing goal of computational neuroscience. Recent methodological innovations enable us to collect ever larger neural and behavioral datasets. This presents opportunities to achieve greater understanding of learning in the brain at scale, as well as methodological challenges. In the first part of the talk, I will discuss our recent insights into the mechanisms by which zebra finch songbirds learn to sing. Dopamine has been long thought to guide reward-based trial-and-error learning by encoding reward prediction errors. However, it is unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Longitudinal recordings of dopamine and bird songs reveal that dopamine activity is indeed consistent with encoding a reward prediction error during naturalistic learning. In the second part of the talk, I will talk about recent work we are doing at DeepMind to develop tools for automatically discovering interpretable models of behavior directly from animal choice data. Our method, dubbed CogFunSearch, uses LLMs within an evolutionary search process in order to "discover" novel models in the form of Python programs that excel at accurately predicting animal behavior during reward-guided learning. The discovered programs reveal novel patterns of learning and choice behavior that update our understanding of how the brain solves reinforcement learning problems.

Seminar · Neuroscience

Screen Savers: Protecting adolescent mental health in a digital world

Amy Orben
University of Cambridge UK
Dec 2, 2024

In our rapidly evolving digital world, there is increasing concern about the impact of digital technologies and social media on the mental health of young people. Policymakers and the public are nervous. Psychologists are facing mounting pressures to deliver evidence that can inform policies and practices to safeguard both young people and society at large. However, research progress is slow while technological change is accelerating. My talk will reflect on this, both as a question of psychological science and metascience. Digital companies have designed highly popular environments that differ in important ways from traditional offline spaces. By revisiting the foundations of psychology (e.g. development and cognition) and considering digital changes' impact on theories and findings, we gain deeper insights into questions such as the following. (1) How do digital environments exacerbate developmental vulnerabilities that predispose young people to mental health conditions? (2) How do digital designs interact with cognitive and learning processes, formalised through computational approaches such as reinforcement learning or Bayesian modelling? However, we also need to face deeper questions about what it means to do science about new technologies and the challenge of keeping pace with technological advancements. Therefore, I discuss the concept of ‘fast science’, where, during crises, scientists might lower their standards of evidence to come to conclusions quicker. Might psychologists want to take this approach in the face of technological change and looming concerns? The talk concludes with a discussion of such strategies for 21st-century psychology research in the era of digitalization.

Seminar · Neuroscience

Decision and Behavior

Sam Gershman, Jonathan Pillow, Kenji Doya
Harvard University; Princeton University; Okinawa Institute of Science and Technology
Nov 28, 2024

This webinar addressed computational perspectives on how animals and humans make decisions, spanning normative, descriptive, and mechanistic models. Sam Gershman (Harvard) presented a capacity-limited reinforcement learning framework in which policies are compressed under an information bottleneck constraint. This approach predicts pervasive perseveration, stimulus‐independent “default” actions, and trade-offs between complexity and reward. Such policy compression reconciles observed action stochasticity and response time patterns with an optimal balance between learning capacity and performance. Jonathan Pillow (Princeton) discussed flexible descriptive models for tracking time-varying policies in animals. He introduced dynamic Generalized Linear Models (Sidetrack) and hidden Markov models (GLM-HMMs) that capture day-to-day and trial-to-trial fluctuations in choice behavior, including abrupt switches between “engaged” and “disengaged” states. These models provide new insights into how animals’ strategies evolve under learning. Finally, Kenji Doya (OIST) highlighted the importance of unifying reinforcement learning with Bayesian inference, exploring how cortical-basal ganglia networks might implement model-based and model-free strategies. He also described Japan’s Brain/MINDS 2.0 and Digital Brain initiatives, aiming to integrate multimodal data and computational principles into cohesive “digital brains.”
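For readers who want the gist of the policy-compression framework in code: under a capacity limit, the optimal policy takes the form pi(a|s) ∝ p(a)·exp(beta·Q(s,a)), where p(a) is the marginal action distribution and beta sets the capacity. The sketch below is a minimal tabular illustration of that idea (my own sketch, not code from the webinar), assuming a uniform state distribution:

```python
import numpy as np

def compress_policy(Q, beta, n_iter=100):
    """Blahut-Arimoto-style iteration for a capacity-limited policy.
    Q: (n_states, n_actions) value table; beta: capacity parameter."""
    n_states, n_actions = Q.shape
    p_a = np.full(n_actions, 1.0 / n_actions)     # marginal action distribution
    for _ in range(n_iter):
        logits = np.log(p_a) + beta * Q           # log pi(a|s) up to a constant
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        p_a = pi.mean(axis=0)                     # re-estimate marginal (uniform states)
    return pi

Q = np.array([[1.0, 0.0], [0.0, 1.0]])
print(compress_policy(Q, beta=0.1))   # low capacity: near-uniform "default" actions
print(compress_policy(Q, beta=10.0))  # high capacity: state-dependent policy
```

At low beta the policy collapses toward stimulus-independent default actions, the perseveration pattern described in the talk.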

Seminar · Neuroscience

Contribution of computational models of reinforcement learning to neuroscience / computational modeling, reward, learning, decision-making, conditioning, navigation, dopamine, basal ganglia, prefrontal cortex, hippocampus

Mehdi Khamassi
Centre National de la Recherche Scientifique / Sorbonne University
Nov 7, 2024

Seminar · Neuroscience

Maintaining Plasticity in Neural Networks

Clare Lyle
DeepMind
Mar 12, 2024

Nonstationarity presents a variety of challenges for machine learning systems. One surprising pathology which can arise in nonstationary learning problems is plasticity loss, whereby making progress on new learning objectives becomes more difficult as training progresses. Networks which are unable to adapt in response to changes in their environment experience plateaus or even declines in performance in highly non-stationary domains such as reinforcement learning, where the learner must quickly adapt to new information even after hundreds of millions of optimization steps. The loss of plasticity manifests in a cluster of related empirical phenomena which have been identified by a number of recent works, including the primacy bias, implicit under-parameterization, rank collapse, and capacity loss. While this phenomenon is widely observed, it is still not fully understood. This talk will present exciting recent results which shed light on the mechanisms driving the loss of plasticity in a variety of learning problems and survey methods to maintain network plasticity in non-stationary tasks, with a particular focus on deep reinforcement learning.

Seminar · Neuroscience

A recurrent network model of planning predicts hippocampal replay and human behavior

Marcelo Mattar
NYU
Oct 19, 2023

When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as 'rollouts'. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by -- and in turn adaptively affect -- prefrontal dynamics.

Seminar · Neuroscience

Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time

Harel Shouval
The University of Texas at Houston
Jun 13, 2023

The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), which means they signal the difference between the expected future rewards and the actual rewards. The prominence of the TD theory arises from the observation that firing properties of dopaminergic neurons in the ventral tegmental area appear similar to those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We propose instead an alternative theoretical framework, coined FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature specific representations of time are learned, allowing for neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX dopamine acts as an instructive signal which helps build temporal models of the environment. FLEX is a general theoretical framework that has many possible biophysical implementations. In order to show that FLEX is a feasible approach, we present a specific biophysically plausible model which implements the principles of FLEX. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
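For context, the TD framework the abstract argues against can be captured in a few lines. The sketch below is a generic tabular TD(0) illustration of the reward prediction error (a textbook example, not an implementation of FLEX), with a fixed chain of states standing in for the fixed temporal basis:

```python
import numpy as np

n_states, alpha, gamma = 5, 0.1, 0.95
V = np.zeros(n_states)                        # value of each time step / state
for episode in range(500):
    for s in range(n_states - 1):
        r = 1.0 if s == n_states - 2 else 0.0 # reward arrives just before terminal state
        delta = r + gamma * V[s + 1] - V[s]   # reward prediction error (RPE)
        V[s] += alpha * delta                 # TD(0) update
print(V)  # values propagate backward from the rewarded state
```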

Seminar · Neuroscience

A recurrent network model of planning explains hippocampal replay and human behavior

Guillaume Hennequin
University of Cambridge, UK
May 30, 2023

When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as 'rollouts'. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by - and in turn adaptively affect - prefrontal dynamics.

Seminar · Neuroscience

Richly structured reward predictions in dopaminergic learning circuits

Angela J. Langdon
National Institute of Mental Health at National Institutes of Health (NIH)
May 16, 2023

Theories from reinforcement learning have been highly influential for interpreting neural activity in the biological circuits critical for animal and human learning. Central among these is the identification of phasic activity in dopamine neurons as a reward prediction error signal that drives learning in basal ganglia and prefrontal circuits. However, recent findings suggest that dopaminergic prediction error signals have access to complex, structured reward predictions and are sensitive to more properties of outcomes than learning theories with simple scalar value predictions might suggest. Here, I will present recent work in which we probed the identity-specific structure of reward prediction errors in an odor-guided choice task and found evidence for multiple predictive “threads” that segregate reward predictions, and reward prediction errors, according to the specific sensory features of anticipated outcomes. Our results point to an expanded class of neural reinforcement learning algorithms in which biological agents learn rich associative structure from their environment and leverage it to build reward predictions that include information about the specific, and perhaps idiosyncratic, features of available outcomes, using these to guide behavior in even quite simple reward learning tasks.

Seminar · Neuroscience

Off-policy learning in the basal ganglia

Ashok Litwin-Kumar
Columbia University, New York
May 2, 2023

I will discuss work with Jack Lindsey modeling reinforcement learning for action selection in the basal ganglia. I will argue that the presence of multiple brain regions, in addition to the basal ganglia, that contribute to motor control motivates the need for an off-policy basal ganglia learning algorithm. I will then describe a biological implementation of such an algorithm that predicts tuning of dopamine neurons to a quantity we call "action surprise," in addition to reward prediction error. In the same model, an implementation of learning from a motor efference copy also predicts a novel solution to the problem of multiplexing feedforward and efference-related striatal activity. The solution exploits the difference between D1 and D2-expressing medium spiny neurons and leads to predictions about striatal dynamics.

Seminar · Cognition

Beyond Volition

Patrick Haggard
University College London
Apr 26, 2023

Voluntary actions are actions that agents choose to make. Volition is the set of cognitive processes that implement such choice and initiation. These processes are often held essential to modern societies, because they form the cognitive underpinning for concepts of individual autonomy and individual responsibility. Nevertheless, psychology and neuroscience have struggled to define volition, and have also struggled to study it scientifically. Laboratory experiments on volition, such as those of Libet, have been criticised, often rather naively, as focussing exclusively on meaningless actions, and ignoring the factors that make voluntary action important in the wider world. In this talk, I will first review these criticisms, and then look at extending scientific approaches to volition in three directions that may enrich scientific understanding of volition. First, volition becomes particularly important when the range of possible actions is large and unconstrained - yet most experimental paradigms involve minimal response spaces. We have developed a novel paradigm for eliciting de novo actions through verbal fluency, and used this to estimate the elusive conscious experience of generativity. Second, volition can be viewed as a mechanism for flexibility, by promoting adaptation of behavioural biases. This view departs from the tradition of defining volition by contrasting internally-generated actions with externally-triggered actions, and instead links volition to model-based reinforcement learning. By using the context of competitive games to re-operationalise the classic Libet experiment, we identified a form of adaptive autonomy that allows agents to reduce biases in their action choices. Interestingly, this mechanism seems not to require explicit understanding and strategic use of action selection rules, in contrast to classical ideas about the relation between volition and conscious, rational thought. Third, I will consider volition teleologically, as a mechanism for achieving counterfactual goals through complex problem-solving. This perspective gives volition a key role in mediating between understanding and planning on the one hand, and instrumental action on the other. Taken together, these three cognitive phenomena of generativity, flexibility, and teleology may partly explain why volition is such an important cognitive function for organisation of human behaviour and human flourishing. I will end by discussing how this enriched view of volition can relate to individual autonomy and responsibility.

Seminar · Neuroscience · Recording

Memory-enriched computation and learning in spiking neural networks through Hebbian plasticity

Thomas Limbacher
TU Graz
Nov 8, 2022

Memory is a key component of biological neural systems that enables the retention of information over a huge range of temporal scales, ranging from hundreds of milliseconds up to years. While Hebbian plasticity is believed to play a pivotal role in biological memory, it has so far been analyzed mostly in the context of pattern completion and unsupervised learning. Here, we propose that Hebbian plasticity is fundamental for computations in biological neural systems. We introduce a novel spiking neural network (SNN) architecture that is enriched by Hebbian synaptic plasticity. We experimentally show that our memory-equipped SNN model outperforms state-of-the-art deep learning mechanisms in a sequential pattern-memorization task, as well as demonstrate superior out-of-distribution generalization capabilities compared to these models. We further show that our model can be successfully applied to one-shot learning and classification of handwritten characters, improving over the state-of-the-art SNN model. We also demonstrate the capability of our model to learn associations for audio to image synthesis from spoken and handwritten digits. Our SNN model further presents a novel solution to a variety of cognitive question answering tasks from a standard benchmark, achieving comparable performance to both memory-augmented ANN and SNN-based state-of-the-art solutions to this problem. Finally, we demonstrate that our model is able to learn from rewards on an episodic reinforcement learning task and attain a near-optimal strategy on a memory-based card game. Hence, our results show that Hebbian enrichment renders spiking neural networks surprisingly versatile in terms of their computational as well as learning capabilities. Since local Hebbian plasticity can easily be implemented in neuromorphic hardware, this also suggests that powerful cognitive neuromorphic systems can be built on this principle.
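As a rough, rate-based illustration of the Hebbian co-activity rule this architecture builds on (my sketch, not the paper's spiking model; all parameters are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post, eta = 8, 4, 0.01
W = rng.normal(0, 0.1, (n_post, n_pre))         # synaptic weights
for _ in range(1000):
    pre = (rng.random(n_pre) < 0.2).astype(float)  # presynaptic activity
    post = (W @ pre > 0.5).astype(float)           # simple threshold units
    W += eta * np.outer(post, pre)                 # Hebbian: co-active pairs strengthen
    W *= 0.999                                     # mild decay keeps weights bounded
```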

Seminar · Neuroscience · Recording

Learning Relational Rules from Rewards

Guillermo Puebla
University of Bristol
Oct 12, 2022

Humans perceive the world in terms of objects and relations between them. In fact, for any given pair of objects, there is a myriad of relations that apply to them. How does the cognitive system learn which relations are useful to characterize the task at hand? And how can it use these representations to build a relational policy to interact effectively with the environment? In this paper we propose that this problem can be understood through the lens of a sub-field of symbolic machine learning called relational reinforcement learning (RRL). To demonstrate the potential of our approach, we build a simple model of relational policy learning based on a function approximator developed in RRL. We trained and tested our model in three Atari games that required considering an increasing number of potential relations: Breakout, Pong and Demon Attack. In each game, our model was able to select adequate relational representations and build a relational policy incrementally. We discuss the relationship between our model and models of relational and analogical reasoning, as well as its limitations and future directions of research.

Seminar · Neuroscience · Recording

Learning in/about/from the basal ganglia

Jonathan Rubin
University of Pittsburgh
May 24, 2022

The basal ganglia are a collection of brain areas that are connected by a variety of synaptic pathways and are a site of significant reward-related dopamine release. These properties suggest a possible role for the basal ganglia in action selection, guided by reinforcement learning. In this talk, I will discuss a framework for how this function might be performed and computational results using an upward mapping to identify putative low-dimensional control ensembles that may be involved in tuning decision policy. I will also present some recent experimental results and theory – related to effects of extracellular ion dynamics – that run counter to the classical view of basal ganglia pathways and suggest a new interpretation of certain aspects of this framework. For those not so interested in the basal ganglia, I hope that the upward mapping approach and impact of extracellular ion dynamics will nonetheless be of interest!

Seminar · Neuroscience

Dissecting the role of accumbal D1 and D2 medium spiny neurons in information encoding

Munir Gunes Kutlu
Calipari Lab, Vanderbilt University
Feb 8, 2022

Nearly all motivated behaviors require the ability to associate outcomes with specific actions and make adaptive decisions about future behavior. The nucleus accumbens (NAc) is integrally involved in these processes. The NAc is a heterogeneous population primarily composed of D1 and D2 medium spiny projection (MSN) neurons that are thought to have opposed roles in behavior, with D1 MSNs promoting reward and D2 MSNs promoting aversion. Here we examined what types of information are encoded by the D1 and D2 MSNs using optogenetics, fiber photometry, and cellular resolution calcium imaging. First, we showed that mice responded for optical self-stimulation of both cell types, suggesting D2-MSN activation is not inherently aversive. Next, we recorded population and single cell activity patterns of D1 and D2 MSNs during reinforcement as well as Pavlovian learning paradigms that allow dissociation of stimulus value, outcome, cue learning, and action. We demonstrated that D1 MSNs respond to the presence and intensity of unconditioned stimuli – regardless of value. Conversely, D2 MSNs responded to the prediction of these outcomes during specific cues. Overall, these results provide foundational evidence for the discrete aspects of information that are encoded within the NAc D1 and D2 MSN populations. These results will significantly enhance our understanding of the involvement of the NAc MSNs in learning and memory as well as how these neurons contribute to the development and maintenance of substance use disorders.

Seminar · Neuroscience · Recording

NaV Long-term Inactivation Regulates Adaptation in Place Cells and Depolarization Block in Dopamine Neurons

Carmen Canavier
LSU Health Sciences Center, New Orleans
Feb 8, 2022

In behaving rodents, CA1 pyramidal neurons receive spatially-tuned depolarizing synaptic input while traversing a specific location within an environment called its place. Midbrain dopamine neurons participate in reinforcement learning, and bursts of action potentials riding a depolarizing wave of synaptic input signal rewards and reward expectation. Interestingly, slice electrophysiology in vitro shows that both types of cells exhibit a pronounced reduction in firing rate (adaptation) and even cessation of firing during sustained depolarization. We included a five-state Markov model of NaV1.6 (for CA1) and NaV1.2 (for dopamine neurons), respectively, in computational models of these two types of neurons. Our simulations suggest that long-term inactivation of this channel is responsible for the adaptation in CA1 pyramidal neurons, in response to triangular depolarizing current ramps. We also show that the differential contribution of slow inactivation in two subpopulations of midbrain dopamine neurons can account for their different dynamic ranges, as assessed by their responses to similar depolarizing ramps. These results suggest long-term inactivation of the sodium channel is a general mechanism for adaptation.

Seminar · Neuroscience · Recording

NMC4 Short Talk: What can deep reinforcement learning tell us about human motor learning and vice versa?

Michele Garibbo
University of Bristol
Nov 30, 2021

In the deep reinforcement learning (RL) community, motor control problems are usually approached from a reward-based learning perspective. However, humans are often believed to learn motor control through directed error-based learning. Within this learning setting, the control system is assumed to have access to exact error signals and their gradients with respect to the control signal. This is unlike reward-based learning, in which errors are assumed to be unsigned, encoding relative successes and failures. Here, we try to understand the relation between these two approaches, reward- and error-based learning, and ballistic arm reaches. To do so, we test canonical (deep) RL algorithms on a well-known sensorimotor perturbation in neuroscience: mirror-reversal of visual feedback during arm reaching. This test leads us to propose a potentially novel RL algorithm, denoted as model-based deterministic policy gradient (MB-DPG). This RL algorithm draws inspiration from error-based learning to qualitatively reproduce human reaching performance under mirror-reversal. Next, we show MB-DPG outperforms the other canonical (deep) RL algorithms on a single- and a multi-target ballistic reaching task, based on a biomechanical model of the human arm. Finally, we propose MB-DPG may provide an efficient computational framework to help explain error-based learning in neuroscience.
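A minimal one-dimensional sketch (my illustration, not MB-DPG itself) of the distinction the abstract draws: the error-based learner steps along the signed gradient of its error, while the reward-based learner must estimate the same direction from random perturbations and an unsigned scalar reward:

```python
import numpy as np

rng = np.random.default_rng(1)
target = 0.7                          # assumed "correct" motor command
w_err, w_rew, lr, sigma = 0.0, 0.0, 0.1, 0.1
for _ in range(200):
    # Error-based learner: exact signed gradient of the squared error.
    w_err -= lr * 2 * (w_err - target)
    # Reward-based learner: perturb the command, reinforce what helped
    # (REINFORCE-style update from a scalar reward minus a baseline).
    pert = sigma * rng.standard_normal()
    reward = -(w_rew + pert - target) ** 2
    baseline = -(w_rew - target) ** 2
    w_rew += lr * (reward - baseline) * pert / sigma**2
print(w_err, w_rew)  # both approach the target; the error-based route is direct
```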

Seminar · Neuroscience

Reinforcement Learning

Peter Dayan & Jonathan Rubin
Max Planck Institute for Biological Cybernetics; University of Pittsburgh
Nov 18, 2021

Seminar · Machine Learning · Recording

Playing StarCraft and saving the world using multi-agent reinforcement learning!

InstaDeep
Oct 28, 2021

"This is my C-14 Impaler gauss rifle! There are many like it, but this one is mine!" - a terran marine. If you have never heard of a terran marine before, then you have probably missed out on playing the very engaging and entertaining strategy computer game, StarCraft. However, don’t despair, because what we have in store might be even more exciting! In this interactive session, we will take you through, step by step, how to train a team of terran marines to defeat a team of marines controlled by the built-in game AI in StarCraft II. How will we achieve this? Using multi-agent reinforcement learning (MARL). MARL is a useful framework for building distributed intelligent systems. In MARL, multiple agents are trained to act as individual decision-makers of some larger system, while learning to work as a team. We will show you how to use Mava (https://github.com/instadeepai/Mava), a newly released research framework for MARL, to build a multi-agent learning system for StarCraft II. We will provide the necessary guidance, tools and background to understand the key concepts behind MARL, how to use Mava building blocks to build systems and how to train a system from scratch. We will conclude the session by briefly sharing various exciting real-world application areas for MARL at InstaDeep, such as large-scale autonomous train navigation and circuit board routing. These are problems that become exponentially more difficult to solve as they scale. Finally, we will argue that many of humanity’s most important practical problems are reminiscent of the ones just described. These include, for example, the need for sustainable management of distributed resources under the pressures of climate change, or efficient inventory control and supply routing in critical distribution networks, or robotic teams for rescue missions and exploration. We believe MARL has enormous potential to be applied in these areas and we hope to inspire you to get excited and interested in MARL and perhaps one day contribute to the field!
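Mava provides the system-level building blocks used in the session; as a library-free illustration of the core MARL idea (not Mava's API), two independent Q-learners can learn to coordinate from a shared team reward:

```python
import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])   # both agents get 1 if actions match
Q1, Q2 = np.zeros(2), np.zeros(2)             # one Q-table per agent
alpha, eps = 0.1, 0.1
for t in range(2000):
    a1 = rng.integers(2) if rng.random() < eps else int(np.argmax(Q1))
    a2 = rng.integers(2) if rng.random() < eps else int(np.argmax(Q2))
    r = payoff[a1, a2]                        # shared team reward
    Q1[a1] += alpha * (r - Q1[a1])            # each agent learns independently
    Q2[a2] += alpha * (r - Q2[a2])
print(Q1, Q2)  # the agents converge on a matching action pair
```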

Seminar · Neuroscience · Recording

Network dynamics in the basal ganglia and possible implications for Parkinson’s disease

Jonathan Rubin
University of Pittsburgh
Oct 13, 2021

The basal ganglia are a collection of brain areas that are connected by a variety of synaptic pathways and are a site of significant reward-related dopamine release. These properties suggest a possible role for the basal ganglia in action selection, guided by reinforcement learning. In this talk, I will discuss a framework for how this function might be performed. I will also present some recent experimental results and theory that call for a re-evaluation of certain aspects of this framework. Next, I will turn to the changes in basal ganglia activity observed to occur with the dopamine depletion associated with Parkinson’s disease. I will discuss some of the potential functional implications of some of these changes and, if time permits, will conclude with some new results that focus on delta oscillations under dopamine depletion.

Seminar · Neuroscience · Recording

Higher cognitive resources for efficient learning

Aurelio Cortese
ATR
Jun 17, 2021

A central issue in reinforcement learning (RL) is the ‘curse-of-dimensionality’, arising when the degrees-of-freedom are much larger than the number of training samples. In such circumstances, the learning process becomes too slow to be plausible. In the brain, higher cognitive functions (such as abstraction or metacognition) may be part of the solution by generating low dimensional representations on which RL can operate. In this talk I will discuss a series of studies in which we used functional magnetic resonance imaging (fMRI) and computational modeling to investigate the neuro-computational basis of efficient RL. We found that people can learn remarkably complex task structures non-consciously, but also that - intriguingly - metacognition appears tightly coupled to this learning ability. Furthermore, when people use an explicit (conscious) policy to select relevant information, learning is accelerated by abstractions. At the neural level, prefrontal cortex subregions are differentially involved in separate aspects of learning: dorsolateral prefrontal cortex pairs with metacognitive processes, while ventromedial prefrontal cortex with valuation and abstraction. I will discuss the implications of these findings, in particular new questions on the function of metacognition in adaptive behavior and the link with abstraction.

Seminar · Neuroscience

From function to cognition: New spectroscopic tools for studying brain neurochemistry in-vivo

Assaf Tal
Weizmann Institute
Apr 21, 2021

In this seminar, I will present new methods in magnetic resonance spectroscopy (MRS) we’ve been working on in the lab. The talk will be divided into two parts. In the first, I will talk about neurochemical changes we observe in glutamate and GABA during various paradigms, including simple motor tasks and reinforcement learning. In the second part, I’ll present a new approach to MRS that focuses on measuring the relaxation times (T1, T2) of metabolites, which reflect changes to specific cellular microenvironments. I will explain why these can be exciting markers for studying several in-vivo pathologies, and also present some preliminary data from a cohort of mild cognitive impairment (MCI) patients, showing changes that correlate with cognitive decline.

Seminar · Neuroscience · Recording

Choice engineering and the modeling of operant learning

Yonatan Loewenstein
The Hebrew University
Apr 6, 2021

Organisms modify their behavior in response to its consequences, a phenomenon referred to as operant learning. Contemporary modeling of this learning behavior is based on reinforcement learning algorithms. I will discuss some of the challenges that these models face, and propose a new approach to model selection that is based on testing their ability to engineer behavior. Finally, I will present the results of The Choice Engineering Competition – an academic competition that compared the efficacies of qualitative and quantitative models of operant learning in shaping behavior.

Seminar · Neuroscience · Recording

Peril, Prudence and Planning as Risk, Avoidance and Worry

Peter Dayan
University of Tübingen
Mar 31, 2021

Risk occupies a central role in both the theory and practice of decision-making. Although it is deeply implicated in many conditions involving dysfunctional behavior and thought, modern theoretical approaches to understanding and mitigating risk in either one-shot or sequential settings, which are derived largely from finance and economics, have yet to permeate fully the fields of neural reinforcement learning and computational psychiatry. I will discuss the use of dynamic and static versions of one prominent approach, namely conditional value-at-risk, to examine both the nature of risk avoidant choices, encompassing such things as justified gambler's fallacies, and the optimal planning that can lead to consideration of such choices, with implications for offline, ruminative, thinking.
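Concretely, the static version of conditional value-at-risk is just the expected return over the worst alpha-fraction of outcomes. A minimal sketch (my illustration, with assumed return distributions):

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of sampled returns."""
    returns = np.sort(np.asarray(returns))        # ascending: worst outcomes first
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

rng = np.random.default_rng(0)
safe = rng.normal(1.0, 0.1, 10_000)    # modest, reliable returns
risky = rng.normal(1.2, 2.0, 10_000)   # higher mean, heavy downside
print(cvar(safe, 0.05), cvar(risky, 0.05))  # a CVaR objective prefers the safe option
```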

Seminar · Neuroscience

Navigation Turing Test: Toward Human-like RL

Ida Momennejad
Microsoft Research NYC
Mar 25, 2021

tbc

Seminar · Neuroscience

A machine learning way to analyse white matter tractography streamlines / Application of artificial intelligence in correcting motion artifacts and reducing scan time in MRI

Dr Shenjun Zhong and Dr Kamlesh Pawar
Monash Biomedical Imaging
Mar 10, 2021

1. Embedding is all you need: A machine learning way to analyse white matter tractography streamlines - Dr Shenjun Zhong, Monash Biomedical Imaging Embedding white matter streamlines with various lengths into fixed-length latent vectors enables users to analyse them with general data mining techniques. However, finding a good embedding schema is still a challenging task, as the existing methods based on spatial coordinates rely on manually engineered features and/or labelled datasets. In this webinar, Dr Shenjun Zhong will discuss his novel deep learning model that identifies a latent space and solves the problem of streamline clustering without needing labelled data. Dr Zhong is a Research Fellow and Informatics Officer at Monash Biomedical Imaging. His research interests are sequence modelling, reinforcement learning and federated learning in the general medical imaging domain. 2. Application of artificial intelligence in correcting motion artifacts and reducing scan time in MRI - Dr Kamlesh Pawar, Monash Biomedical Imaging Magnetic Resonance Imaging (MRI) is a widely used imaging modality in clinics and research. Although MRI is useful, it comes with an overhead of longer scan time compared to other medical imaging modalities. The longer scan times also make patients uncomfortable, and even subtle movements during the scan may result in severe motion artifact in the images. In this seminar, Dr Kamlesh Pawar will discuss how artificial intelligence techniques can reduce scan time and correct motion artifacts. Dr Pawar is a Research Fellow at Monash Biomedical Imaging. His research interests include deep learning, MR physics, MR image reconstruction and computer vision.

Seminar · Neuroscience

Uncertainty in learning and decision making

Maarten Speekenbrink
UCL
Jan 19, 2021

Uncertainty plays a critical role in reinforcement learning and decision making. However, exactly how subjective uncertainty influences behaviour remains unclear. Multi-armed bandits are a useful framework to gain more insight into this. Paired with computational tools such as Kalman filters, they allow us to closely characterize the interplay between trial-by-trial value, uncertainty, learning, and choice. In this talk, I will present recent research where we also measured participants' visual fixations on the options in a multi-armed bandit task. The estimated value of each option, and the uncertainty in these estimates, influenced what participants looked at in the period before making a choice and their subsequent choice, as did fixation itself. Uncertainty also determined how long participants looked at the obtained outcomes. Our findings clearly show the importance of uncertainty in learning and decision making.
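A minimal sketch of the Kalman-filter bandit machinery described here (my illustration; parameter values are assumptions): each arm carries a posterior mean and variance, so value and uncertainty are both available on every trial to guide choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, obs_var, diff_var = 4, 1.0, 0.01
true_means = rng.normal(0, 1, n_arms)
mu = np.zeros(n_arms)                    # posterior mean value per arm
var = np.full(n_arms, 10.0)              # posterior variance (uncertainty) per arm
for t in range(500):
    ucb = mu + 1.0 * np.sqrt(var)        # uncertainty-guided (UCB-style) choice
    a = int(np.argmax(ucb))
    r = true_means[a] + rng.normal(0, np.sqrt(obs_var))
    k = var[a] / (var[a] + obs_var)      # Kalman gain
    mu[a] += k * (r - mu[a])             # value update scaled by uncertainty
    var[a] = (1 - k) * var[a] + diff_var # variance shrinks, then diffuses slightly
print(mu, true_means)                    # for simplicity only the chosen arm is updated
```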

Seminar · Neuroscience · Recording

An inference perspective on meta-learning

Kate Rakelly
University of California Berkeley
Nov 25, 2020

While meta-learning algorithms are often viewed as algorithms that learn to learn, an alternative viewpoint frames meta-learning as inferring a hidden task variable from experience consisting of observations and rewards. From this perspective, learning to learn is learning to infer. This viewpoint can be useful in solving problems in meta-RL, which I’ll demonstrate through two examples: (1) enabling off-policy meta-learning, and (2) performing efficient meta-RL from image observations. I’ll also discuss how this perspective leads to an algorithm for few-shot image segmentation.
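A toy illustration of the "learning to infer" viewpoint (my sketch, not the speaker's algorithm): the agent maintains a Bayesian belief over a hidden task variable, here which of two arms pays off, and adaptation is belief updating rather than gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
p_task = np.array([0.5, 0.5])            # belief over which arm is the good one
good_arm = 1                             # hidden task variable
for t in range(20):
    a = int(np.argmax(p_task))           # act greedily w.r.t. the current belief
    r = float(rng.random() < (0.8 if a == good_arm else 0.2))
    like = np.array([                    # P(reward=1 | task) for task = arm0/arm1 good
        0.8 if a == 0 else 0.2,
        0.8 if a == 1 else 0.2,
    ])
    like = like if r == 1 else 1 - like
    p_task = like * p_task / (like * p_task).sum()   # Bayesian posterior update
print(p_task)   # belief concentrates on the true task variable
```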

Seminar · Neuroscience

On cognitive maps and reinforcement learning in large-scale animal behaviour

Yossi Yovel
Tel Aviv University
Nov 24, 2020

Bats are extreme aviators and amazing navigators. Many bat species nightly commute dozens of kilometres in search of food, and some bat species annually migrate over thousands of kilometres. Studying bats in their natural environment has always been extremely challenging because of their small size (mostly <50 g) and agile nature. We have recently developed novel miniature technology allowing us to GPS-tag small bats, thus opening a new window to document their behaviour in the wild. We have used this technology to track fruit-bat pups over 5 months from birth to adulthood. Following the bats' full movement history allowed us to show that they use novel short-cuts which are typical for cognitive-map-based navigation. In a second study, we examined how nectar-feeding bats make foraging decisions under competition. We show that by relying on a simple reinforcement learning strategy, the bats can divide the resource between them without aggression or communication. Together, these results demonstrate the power of the large-scale natural approach for studying animal behavior.

Seminar · Neuroscience · Recording

On climate change, multi-agent systems and the behaviour of networked control

Arnu Pretorius
InstaDeep
Nov 17, 2020

Multi-agent reinforcement learning (MARL) has recently shown great promise as an approach to networked system control. Arguably, one of the most difficult and important tasks for which large scale networked system control is applicable is common-pool resource (CPR) management. Crucial CPRs include arable land, fresh water, wetlands, wildlife, fish stock, forests and the atmosphere, of which proper management is related to some of society’s greatest challenges such as food security, inequality and climate change. This talk will consist of three parts. In the first, we will briefly look at climate change and how it poses a significant threat to life on our planet. In the second, we will consider the potential of multi-agent systems for climate change mitigation and adaptation. And finally, in the third, we will discuss recent research from InstaDeep into better understanding the behaviour of networked MARL systems used for CPR management. More specifically, we will see how the tools from empirical game-theoretic analysis may be harnessed to analyse the differences in networked MARL systems. The results give new insights into the consequences associated with certain design choices and provide an additional dimension of comparison between systems beyond efficiency, robustness, scalability and mean control performance.

SeminarNeuroscience

A journey through connectomics: from manual tracing to the first fully automated basal ganglia connectomes

Joergen Kornfeld
Massachusetts Institute of Technology
Nov 16, 2020

The "mind of the worm", the first electron microscopy-based connectome of C. elegans, was an early sign of where connectomics is headed, followed by a long time of little progress in a field held back by the immense manual effort required for data acquisition and analysis. This changed over the last few years with several technological breakthroughs, which allowed increases in data set sizes by several orders of magnitude. Brain tissue can now be imaged in 3D up to a millimeter in size at nanometer resolution, revealing tissue features from synapses to the mitochondria of all contained cells. These breakthroughs in acquisition technology were paralleled by a revolution in deep-learning segmentation techniques, that equally reduced manual analysis times by several orders of magnitude, to the point where fully automated reconstructions are becoming useful. Taken together, this gives neuroscientists now access to the first wiring diagrams of thousands of automatically reconstructed neurons connected by millions of synapses, just one line of program code away. In this talk, I will cover these developments by describing the past few years' technological breakthroughs and discuss remaining challenges. Finally, I will show the potential of automated connectomics for neuroscience by demonstrating how hypotheses in reinforcement learning can now be tackled through virtual experiments in synaptic wiring diagrams of the songbird basal ganglia.

SeminarNeuroscienceRecording

The geometry of abstraction in hippocampus and pre-frontal cortex

Stefano Fusi
Columbia University
Oct 15, 2020

The curse of dimensionality plagues models of reinforcement learning and decision-making. The process of abstraction solves this by constructing abstract variables describing features shared by different specific instances, reducing dimensionality and enabling generalization in novel situations. Here we characterized neural representations in monkeys performing a task where a hidden variable described the temporal statistics of stimulus-response-outcome mappings. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training. This type of generalization requires a particular geometric format of neural representations. Neural ensembles in dorsolateral pre-frontal cortex, anterior cingulate cortex and hippocampus, and in simulated neural networks, simultaneously represented multiple hidden and explicit variables in a format reflecting abstraction. Task events engaging cognitive operations modulated this format. These findings elucidate how the brain and artificial systems represent abstract variables, variables critical for generalization that in turn confers cognitive flexibility.
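Operationally, the decoding test looks like the sketch below (using scikit-learn; variable names are ours): train a linear decoder on trials from some task conditions and score it on conditions never seen in training, so that above-chance held-out accuracy indicates an abstract representational geometry.

import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_condition_generalization(X, labels, conditions, train_conds, test_conds):
    """X: (n_trials, n_neurons) activity; labels: variable to decode;
    conditions: task-condition tag per trial."""
    train = np.isin(conditions, train_conds)
    test = np.isin(conditions, test_conds)
    decoder = LogisticRegression(max_iter=1000).fit(X[train], labels[train])
    return decoder.score(X[test], labels[test])   # accuracy on unseen conditions

Averaging this score over many train/test splits of the conditions gives a single number that separates abstract formats from high-dimensional, non-generalizing ones.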

SeminarNeuroscienceRecording

E-prop: A biologically inspired paradigm for learning in recurrent networks of spiking neurons

Franz Scherr
Technische Universität Graz
Aug 30, 2020

Transformative advances in deep learning, such as deep reinforcement learning, usually rely on gradient-based learning methods such as backpropagation through time (BPTT) as a core learning algorithm. However, BPTT is not considered biologically plausible, since it requires propagating gradients backwards in time and across neurons. Here, we propose e-prop, a novel gradient-based learning method with local and online weight update rules for recurrent neural networks, and in particular recurrent spiking neural networks (RSNNs). As a result, e-prop has the potential to bring a substantial fraction of the power of deep learning to RSNNs. In this presentation, we will motivate e-prop from the perspective of recent insights in neuroscience and show how these can be combined to form an algorithm for online gradient descent. The mathematical results will be supported by empirical evidence from supervised and reinforcement learning tasks. We will also discuss how limitations inherited from gradient-based learning methods, such as poor sample efficiency, can be addressed by an evolution-like optimization that enhances learning on particular task families. The emerging learning architecture can learn tasks from a single demonstration, hence enabling one-shot learning.
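The flavor of e-prop's local, online update can be seen in a stripped-down, non-spiking analogue (our toy, far simpler than the RSNN algorithm in the work itself): every input weight carries an eligibility trace that runs forward in time, and the weight change is simply that trace multiplied by an online learning signal broadcast through fixed random feedback weights, with no backward pass.

import numpy as np

def eprop_toy(xs, ys, n_hid=20, alpha=0.9, lr=1e-2, seed=0):
    """xs: (T, n_in) input stream; ys: (T,) scalar targets."""
    rng = np.random.default_rng(seed)
    n_in = xs.shape[1]
    w_in = 0.1 * rng.standard_normal((n_hid, n_in))
    w_out = 0.1 * rng.standard_normal(n_hid)
    b_fb = rng.standard_normal(n_hid)       # fixed random feedback weights
    h = np.zeros(n_hid)                     # leaky hidden state
    trace = np.zeros((n_hid, n_in))         # eligibility trace per input weight
    for x, y in zip(xs, ys):
        pre = w_in @ x
        psi = 1.0 - np.tanh(pre) ** 2       # pseudo-derivative of the nonlinearity
        h = alpha * h + np.tanh(pre)
        trace = alpha * trace + psi[:, None] * x[None, :]   # forward-in-time trace
        err = w_out @ h - y                 # online learning signal
        w_in -= lr * (err * b_fb)[:, None] * trace          # local update: signal times trace
        w_out -= lr * err * h
    return w_in, w_out

For this leaky architecture the trace is exactly the forward-accumulated gradient of the hidden state, so the only approximation is routing the error through random feedback rather than through the readout weights.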

SeminarNeuroscience

Reward foraging task and model-based analysis reveal how fruit flies learn the value of available options

Duda Kvitsiani
Aarhus University
Jul 28, 2020

Understanding what drives foraging decisions in animals requires careful manipulation of the value of available options while monitoring animal choices. Value-based decision-making tasks, in combination with formal learning models, have provided both an experimental and a theoretical framework for studying foraging decisions in lab settings. While these approaches have been used successfully to understand what drives choices in mammals, very little work has been done in fruit flies, even though fruit flies have served as a model organism for many complex behavioural paradigms. To fill this gap we developed a single-animal, trial-based decision-making task in which freely walking flies experienced optogenetic sugar-receptor neuron stimulation. We controlled the value of the available options by manipulating the probabilities of optogenetic stimulation. We show that flies integrate the reward history of chosen options and forget the value of unchosen options. We further discover that flies assign higher values to rewards experienced early in the behavioural session, consistent with formal reinforcement learning models. Finally, we show that the probabilistic rewards affect the walking trajectories of flies, suggesting that accumulated value controls the navigation vector of the flies in a graded fashion. These findings establish the fruit fly as a model organism for exploring the genetic and circuit basis of value-based decisions.
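A minimal version of the integrate-and-forget account (our sketch with illustrative parameters; it does not capture the early-session overweighting of rewards) learns the chosen option's value from a prediction error while decaying the unchosen option's value toward zero:

import numpy as np

def fly_bandit(n_trials=400, alpha=0.3, kappa=0.05, beta=3.0,
               p_stim=(0.7, 0.3), seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                      # learned values of the two options
    choices = []
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + np.exp(-beta * (q[0] - q[1])))  # softmax over two options
        a = 0 if rng.random() < p0 else 1
        r = float(rng.random() < p_stim[a])  # probabilistic optogenetic "sugar" reward
        q[a] += alpha * (r - q[a])           # integrate reward history of chosen option
        q[1 - a] *= 1.0 - kappa              # forget the value of the unchosen option
        choices.append(a)
    return q, choices

The forgetting term makes preferences track recent experience, which is the kind of behavioural signature such trial-based fly tasks are designed to expose.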

SeminarNeuroscience

Delineating Reward/Avoidance Decision Process in the Impulsive-compulsive Spectrum Disorders through a Probabilistic Reversal Learning Task

Xiaoliu Zhang
Monash University
Jul 18, 2020

Impulsivity and compulsivity are behavioural traits that underlie many aspects of decision-making and form the characteristic symptoms of Obsessive Compulsive Disorder (OCD) and Gambling Disorder (GD). The neural underpinnings of reward and avoidance learning under the expression of these traits and symptoms are only partially understood. The present study combined behavioural modelling and neuroimaging to examine brain activity associated with critical phases of reward and loss processing in OCD and GD.

Forty-two healthy controls (HC), forty OCD and twenty-three GD participants were recruited to complete a two-session reinforcement learning (RL) task featuring a "probability switch (PS)" during imaging. In the end, 39 HC (20F/19M, 34 ± 9.47 yrs), 28 OCD (14F/14M, 32.11 ± 9.53 yrs) and 16 GD (4F/12M, 35.53 ± 12.20 yrs) were included with both behavioural and imaging data available. Functional imaging was conducted on a 3.0 T Siemens MAGNETOM Skyra (syngo MR D13C) at Monash Biomedical Imaging. Each volume comprised 34 coronal slices of 3 mm thickness (TR = 2000 ms, TE = 30 ms), and 479 volumes were acquired per participant per session in an interleaved-ascending order.

The standard Q-learning model was fitted to the observed behavioural data, with Bayesian estimation of the parameters. Imaging analysis was conducted in SPM12 (Wellcome Department of Imaging Neuroscience, London, United Kingdom) in the Matlab (R2015b) environment. Pre-processing comprised slice-timing correction, realignment, normalization to MNI space based on the T1-weighted image, and smoothing with an 8 mm Gaussian kernel.

The frontostriatal circuit, including the putamen and medial orbitofrontal cortex (mOFC), was significantly more active in response to receiving reward and avoiding punishment than to receiving an aversive outcome and missing reward (p < 0.001, cluster-level FWE-corrected), while the right insula showed greater activation in response to missing rewards and receiving punishment. Compared to healthy participants, GD patients showed significantly lower activation in the left superior frontal gyrus and posterior cingulum for gain omission (p < 0.001).

The reward prediction error (PE) signal correlated positively with activation in several clusters spanning cortical and subcortical regions, including the striatum, cingulate, bilateral insula, thalamus and superior frontal cortex (p < 0.001, cluster-level FWE-corrected). GD patients showed a trend towards decreased reward PE responses in the right precentral gyrus extending to the left posterior cingulate compared to controls (p < 0.05, FWE-corrected). The aversive PE signal correlated negatively with activity in regions including the bilateral thalamus, hippocampus, insula and striatum (p < 0.001, FWE-corrected). Compared with controls, the GD group showed increased aversive PE activation in a cluster encompassing the right thalamus and right hippocampus, and in the right middle frontal gyrus extending to the right anterior cingulum (p < 0.005, FWE-corrected).

Through this reversal learning task, the study provides further support for dissociable brain circuits for distinct phases of reward and avoidance learning, and shows that OCD and GD are characterised by aberrant patterns of reward and avoidance processing.
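For readers unfamiliar with the modelling step, the outline of fitting a standard Q-learning model to trial-by-trial choices looks like this (a maximum-likelihood sketch for brevity; the study itself used Bayesian parameter estimation, and all names here are ours):

import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, choices, rewards):
    """Negative log-likelihood of a two-option Q-learning model;
    params = (learning rate alpha, inverse temperature beta)."""
    alpha, beta = params
    q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q)
        p /= p.sum()                    # softmax choice probabilities
        nll -= np.log(p[c] + 1e-12)     # likelihood of the observed choice
        q[c] += alpha * (r - q[c])      # value update from the prediction error
    return nll

def fit_q_model(choices, rewards):
    res = minimize(neg_log_lik, x0=[0.3, 3.0], args=(choices, rewards),
                   bounds=[(1e-3, 1.0), (1e-2, 20.0)])
    return res.x                        # per-participant (alpha, beta) estimates

The fitted model's trial-by-trial prediction errors are what get regressed against the fMRI signal to localize reward and aversive PE correlates.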

SeminarNeuroscienceRecording

Thinking Fast and Slow in AlphaZero and the Brain

Sebastian Bodenstein
Jun 16, 2020

In his bestseller 'Thinking, Fast and Slow', Daniel Kahneman popularized the idea that there are two fundamentally different processes of thought: a 'System 1' process that is unconscious and instinctive, and a 'System 2' process that is deliberative and requires conscious attention. There is a growing recognition that machine learning is mostly stuck at the 'System 1' level of cognition, and that moving to 'System 2' methods is key to solving long-standing challenges such as out-of-distribution generalization. In this talk, AlphaZero will be used as a case study of the power of combining 'System 1' and 'System 2' processes. The similarities and differences between AlphaZero and human learning will be explored, and lessons drawn for the future of machine learning.

SeminarNeuroscienceRecording

Deep learning for model-based RL

Timothy Lillicrap
Google DeepMind, University College London
Jun 11, 2020

Model-based approaches to control and decision making have long held the promise of being more powerful and data efficient than model-free counterparts. However, success with model-based methods has been limited to those cases where a perfect model can be queried. The game of Go was mastered by AlphaGo using a combination of neural networks and the MCTS planning algorithm. But planning required a perfect representation of the game rules. I will describe new algorithms that instead leverage deep neural networks to learn models of the environment which are then used to plan, and update policy and value functions. These new algorithms offer hints about how brains might approach planning and acting in complex environments.
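The loop these algorithms share can be caricatured in a few lines (a toy with one-dimensional linear dynamics, entirely our own, not the actual agent): learn a model of the environment from interaction, then plan by rolling candidate action sequences forward in the learned model.

import numpy as np

def learn_model_then_plan(n_data=500, horizon=10, n_cands=200, seed=0):
    rng = np.random.default_rng(seed)
    A_true, B_true = 0.9, 0.5                    # environment, unknown to the agent
    # 1) collect transitions with random actions
    s, S, U, S2 = 0.0, [], [], []
    for _ in range(n_data):
        a = rng.uniform(-1, 1)
        s_next = A_true * s + B_true * a + 0.01 * rng.standard_normal()
        S.append(s); U.append(a); S2.append(s_next); s = s_next
    # 2) fit a dynamics model s' ~ A s + B a by least squares
    X = np.column_stack([S, U])
    A_hat, B_hat = np.linalg.lstsq(X, np.array(S2), rcond=None)[0]
    # 3) plan in the learned model: keep the action sequence whose
    #    imagined rollout ends closest to the goal
    goal, best, best_cost = 1.0, None, np.inf
    for u in rng.uniform(-1, 1, size=(n_cands, horizon)):
        s_im = 0.0
        for a in u:
            s_im = A_hat * s_im + B_hat * a      # rollout in imagination
        cost = (s_im - goal) ** 2
        if cost < best_cost:
            best, best_cost = u, cost
    return (A_hat, B_hat), best[0]               # model estimate and first planned action

Replacing the least-squares model with a deep network, and random shooting with value-guided search, gives the family of agents the talk describes.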

SeminarNeuroscience

Striatal circuits for reward learning and decision-making

Ilana Witten
Princeton University
Jun 10, 2020

How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens (NAc), which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex (PL) and midline regions of the thalamus (mTH). However, little is known about what is represented in PL or mTH neurons that project to NAc (PL-NAc and mTH-NAc). By comparing these inputs during a reinforcement learning task in mice, we discovered that i) PL-NAc preferentially represents actions and choices, ii) mTH-NAc preferentially represents cues, iii) choice-selective activity in PL-NAc is organized in sequences that persist beyond the outcome. Through computational modelling, we demonstrate that these sequences can support the neural implementation of temporal difference learning, a powerful algorithm to connect actions and outcomes across time. Finally, we test and confirm predictions of our circuit model by direct manipulation of PL-NAc neurons. Thus, we integrate experiment and modelling to suggest a neural solution for credit assignment.
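To see how choice-triggered sequences can bridge the delay between an action and its outcome, consider this reduction of the idea (our own toy): units tile the delay one after another, and the TD error carries the terminal reward backwards through the sequence until the earliest unit predicts the outcome at choice time.

import numpy as np

def td_over_sequence(n_episodes=500, seq_len=8, gamma=0.95, alpha=0.1):
    w = np.zeros(seq_len)                 # value weight of each sequence unit
    for _ in range(n_episodes):
        for t in range(seq_len):
            v = w[t]                      # only unit t is active at delay t
            if t < seq_len - 1:
                delta = gamma * w[t + 1] - v       # no reward during the delay
            else:
                delta = 1.0 - v                    # reward arrives at the outcome
            w[t] += alpha * delta         # TD update on the active unit
    return w    # converges to gamma ** (seq_len - 1 - t): value ramps toward outcome

Because each unit is active at a fixed delay after the choice, this is tabular TD(0), and the learned weights implement exactly the credit assignment across time that the circuit model proposes.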

SeminarNeuroscienceRecording

The geometry of abstraction in artificial and biological neural networks

Stefano Fusi
Columbia University
Jun 10, 2020

The curse of dimensionality plagues models of reinforcement learning and decision-making. The process of abstraction solves this by constructing abstract variables describing features shared by different specific instances, reducing dimensionality and enabling generalization in novel situations. We characterized neural representations in monkeys performing a task where a hidden variable described the temporal statistics of stimulus-response-outcome mappings. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training. This type of generalization requires a particular geometric format of neural representations. Neural ensembles in dorsolateral pre-frontal cortex, anterior cingulate cortex and hippocampus, and in simulated neural networks, simultaneously represented multiple hidden and explicit variables in a format reflecting abstraction. Task events engaging cognitive operations modulated this format. These findings elucidate how the brain and artificial systems represent abstract variables, variables critical for generalization that in turn confers cognitive flexibility.

SeminarNeuroscienceRecording

Spanning the arc between optimality theories and data

Gasper Tkacik
Institute of Science and Technology Austria
Jun 1, 2020

Ideas about optimization are at the core of how we approach biological complexity. Quantitative predictions about biological systems have been successfully derived from first principles in the context of efficient coding, metabolic and transport networks, evolution, reinforcement learning, and decision making, by postulating that a system has evolved to optimize some utility function under biophysical constraints. Yet as normative theories become increasingly high-dimensional and optimal solutions stop being unique, it becomes progressively harder to judge whether theoretical predictions are consistent with, or "close to", data. I will illustrate these issues using efficient coding applied to simple neuronal models as well as to a complex and realistic biochemical reaction network. As a solution, we developed a statistical framework that smoothly interpolates between ab initio optimality predictions and Bayesian parameter inference from data, while also permitting statistically rigorous tests of optimality hypotheses.

ePoster

Adaptive brain-computer interfaces based on error-related potentials and reinforcement learning

Aline Xavier Fidencio, Christian Klaes, Ioannis Iossifidis

Bernstein Conference 2024

ePoster

How Do Bees See the World? A (Normative) Deep Reinforcement Learning Model for Insect Navigation

Stephan Lochner, Andrew Straw

Bernstein Conference 2024

ePoster

Competition and integration of sensory signals in a deep reinforcement learning agent

Sandhiya Vijayabaskaran, Sen Cheng

Bernstein Conference 2024

ePoster

Controversial Opinions on Model Based and Model Free Reinforcement Learning in the Brain

Felix Grün, Ioannis Iossifidis

Bernstein Conference 2024

ePoster

Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron

Christian Schmid, James Murray

Bernstein Conference 2024

ePoster

Neuromodulated online cognitive maps for reinforcement learning

Krubeal Danieli, Mikkel Lepperød, Marianne Fyhn

Bernstein Conference 2024

ePoster

Automatic Task Decomposition using Compositional Reinforcement Learning

COSYNE 2022

ePoster

Continual Reinforcement Learning with Multi-Timescale Successor Features

COSYNE 2022

ePoster

Deep Reinforcement Learning mimics Neural Strategies for Limb Movements

COSYNE 2022

ePoster

Energy efficient reinforcement learning as a matter of life and death

COSYNE 2022

ePoster

Integrating deep reinforcement learning agents with the C. elegans nervous system

COSYNE 2022

ePoster

Linking tonic dopamine and biased value predictions in a biologically inspired reinforcement learning model

COSYNE 2022

ePoster

Soft-actor-critic for model-free reinforcement learning of eye saccade control

COSYNE 2022

ePoster

A striatal probabilistic population code for reward underlies distributional reinforcement learning

COSYNE 2022

ePoster

Time cell encoding in deep reinforcement learning agents depends on mnemonic demands

COSYNE 2022

ePoster

What do meta-reinforcement learning networks learn in two-stage decision-making?

COSYNE 2022

ePoster

Controlling human cortical and striatal reinforcement learning with meta prediction error

Jae Hoon Shin, Jee Hang Lee, Sang Wan Lee

COSYNE 2023

ePoster

Cortical dopamine enables deep reinforcement learning and leverages dopaminergic heterogeneity

Jack Lindsey & Ashok Litwin-Kumar

COSYNE 2023

ePoster

Language emergence in reinforcement learning agents performing navigational tasks

Tobias Wieczorek, Maximilian Eggl, Tatjana Tchumatchenko, Carlos Wert Carvajal

COSYNE 2023

ePoster

Modelling ecological constraints on visual processing with deep reinforcement learning

Sacha Sokoloski, Jure Majnik, Thomas Euler, Philipp Berens

COSYNE 2023

ePoster

Reinforcement learning at multiple timescales in biological and artificial neural networks

Paul Masset, Pablo Tano, Athar Malik, HyungGoo Kim, Pol Bech, Alexandre Pouget, Naoshige Uchida

COSYNE 2023

ePoster

Two types of locus coeruleus norepinephrine neurons drive reinforcement learning

Zhixiao Su & Jeremiah Cohen

COSYNE 2023

ePoster

Violations of transitivity disrupt relational inference in humans and reinforcement learning models

Thomas Graham & Bernhard Spitzer

COSYNE 2023

ePoster

Brain-like neural dynamics for behavioral control develop through reinforcement learning

Olivier Codol, Nanda H Krishna, Guillaume Lajoie, Matthew G. Perich

COSYNE 2025

ePoster

Correctness is its own reward: bootstrapping error codes in self-guided reinforcement learning

Ziyi Gong, Fabiola Duarte Ortiz, Richard Mooney, John Pearson

COSYNE 2025

ePoster

Deep reinforcement learning trains agents to track odor plumes with active sensing

Lawrence Jianqiao Hu, Elliott Abe, Harsha Gurnani, Daniel Sitonic, Floris van Breugel, Edgar Y. Walker, Bing Brunton

COSYNE 2025

ePoster

Dual-Model Framework for Cerebellar Function: Integrating Reinforcement Learning and Adaptive Control

Carlos Stein N Brito, Daniel McNamee

COSYNE 2025

ePoster

A GPU-Accelerated Deep Reinforcement Learning Pipeline for Simulating Animal Behavior

Charles Zhang, Elliott Abe, Jason Foat, Bing Brunton, Talmo Pereira, Bence Olveczky, Emil Warnberg

COSYNE 2025

ePoster

Humans forage for reward in classic reinforcement learning tasks

Meriam Zid, Veldon-James Laurie, Alix Levine-Champagne, Akram Shourkeshti, Dameon Harrell, Alexander B Herman, Becket Ebitz

COSYNE 2025

ePoster

Intracranial recordings uncover neuronal dynamics of multidimensional reinforcement learning.

Christina Maher, Salman Qasim, Lizbeth Nunez Martinez, Angela Radulescu, Ignacio Saez

COSYNE 2025

ePoster

Inverse reinforcement learning with switching rewards and history dependency for studying behaviors

Jingyang Ke, Feiyang Wu, Jiyi Wang, Jeffrey Markowitz, Anqi Wu

COSYNE 2025

ePoster

Selective representation of reinforcement learning variables in subpopulations of the external globus pallidus

Lars Rollik, Marcus Stephenson-Jones

COSYNE 2025

ePoster

Acquiring musculoskeletal skills with curriculum-based reinforcement learning

Alberto Chiappa, Pablo Tano, Nisheet Patel, Abigaïl Ingster, Alexandre Pouget, Alexander Mathis

FENS Forum 2024

ePoster

Modeling the sensorimotor system with deep reinforcement learning

Alessandro Marin Vargas, Alberto Silvio Chiappa, Alexander Mathis

FENS Forum 2024