Reinforcement Learning

Topic spotlight
Topic · World Wide

reinforcement learning

Discover seminars, jobs, and research tagged with reinforcement learning across World Wide.
118 curated items · 53 Seminars · 40 ePosters · 25 Positions
Position

Gatsby Computational Neuroscience Unit

Gatsby Computational Neuroscience Unit, UCL
London, UK
Dec 5, 2025

4-Year PhD Programme in Theoretical Neuroscience and Machine Learning. Call for applications! Deadline: 13 November 2022. The Gatsby Computational Neuroscience Unit is a leading research centre focused on theoretical neuroscience and machine learning. We study (un)supervised and reinforcement learning; inference, coding and neural dynamics; Bayesian and kernel methods; and deep learning; with applications to the analysis of perceptual processing and cognition, neural data, signal and image processing, machine vision, network data and nonparametric hypothesis testing. The unit provides a unique opportunity for a critical mass of theoreticians to interact closely with one another and with researchers at the Sainsbury Wellcome Centre for Neural Circuits and Behaviour (SWC), the Centre for Computational Statistics and Machine Learning (CSML) and related UCL departments such as Computer Science; Statistical Science; Artificial Intelligence; the ELLIS Unit at UCL; Neuroscience; and the nearby Alan Turing and Francis Crick Institutes. Our PhD programme provides a rigorous preparation for a research career. Students complete a 4-year PhD in either machine learning or theoretical and computational neuroscience, with a minor emphasis in the complementary field. Courses in the first year provide a comprehensive introduction to both fields and to systems neuroscience. Students are encouraged to work and interact closely with SWC/CSML researchers to take advantage of this uniquely multidisciplinary research environment. Full funding is available regardless of nationality. The unit also welcomes applicants who have secured or are seeking funding from other sources. To apply, please visit www.ucl.ac.uk/gatsby/study-and-work/phd-programme

Position

Dr. Tatsuo Okubo

Chinese Institute for Brain Research, Beijing
Beijing, China
Dec 5, 2025

We are a new group at the Chinese Institute for Brain Research (CIBR), Beijing, which focuses on using modern data science and machine learning tools on neuroscience data. We collaborate with various labs within CIBR to develop models and analysis pipelines to accelerate neuroscience research. We are looking for enthusiastic and talented machine learning engineers and data scientists to join this effort. Example projects include (but are not limited to) extracting hidden states from population neural activity, automating behavioral classification from videos, and segmenting neurons from confocal images using deep learning.

Position

Dr Flavia Mancini

Computational and Biological Learning, Department of Engineering, University of Cambridge
Cambridge, UK
Dec 5, 2025

This is an opportunity for a highly creative and skilled pre-doctoral Research Assistant to join the dynamic and multidisciplinary research environment of the Computational and Biological Learning research group (https://www.cbl-cambridge.org/), Department of Engineering, University of Cambridge. We are looking for a Research Assistant to work on projects related to statistical learning and contextual inference in the human brain. We have a particular focus on the learning of aversive states, as this has strong clinical significance for chronic pain and mental health disorders. The RA will be supervised by Dr Flavia Mancini (MRC Career Development Fellow and Head of the Nox Lab, www.noxlab.org), and is expected to collaborate with theoretical and experimental colleagues in Cambridge, Oxford and abroad. The post holder will be located in central Cambridge, Cambridgeshire, UK. As a general approach, we combine statistical learning tasks in humans and computational modelling (using Bayesian inference, reinforcement learning, deep learning and neural networks) with neuroimaging methods (including 7T fMRI). The successful candidate will strengthen this approach and be responsible for designing experiments and for collecting and analysing behavioural and fMRI data using computational modelling techniques. The key responsibilities and duties are: ideating and conducting research studies on statistical/aversive learning, combining behavioural tasks and computational modelling (using Bayesian inference, reinforcement learning, deep learning and/or neural networks) with fMRI in healthy volunteers and chronic pain patients; disseminating research findings; and maintaining and developing technical skills to expand their scientific potential. More info and how to apply: https://www.jobs.cam.ac.uk/job/35905/

Position

Erik C. Johnson

Johns Hopkins University Applied Physics Laboratory
Laurel, MD, USA
Dec 5, 2025

The Intelligent Systems Center at JHU/APL is an interdisciplinary research center for neuroscientists, AI researchers, and roboticists. Please see the individual listings for specific postings and application instructions. Postings for Neuroscience-Inspired AI researchers and Computational Neuroscience researchers may also be posted soon. https://prdtss.jhuapl.edu/jobs/senior-neural-decoding-researcher-2219 https://prdtss.jhuapl.edu/jobs/senior-reinforcement-learning-researcher-615 https://prdtss.jhuapl.edu/jobs/senior-computer-vision-researcher-2242 https://prdtss.jhuapl.edu/jobs/artificial-intelligence-software-developer-2255

Position

Francisco Pereira

Machine Learning Team, National Institute of Mental Health
Bethesda, Maryland, United States of America
Dec 5, 2025

The Machine Learning Team at the National Institute of Mental Health (NIMH) in Bethesda, MD, has an open position for a machine learning research scientist. The NIMH is the leading federal agency for research on mental disorders and neuroscience, and part of the National Institutes of Health (NIH). Our mission is to help NIMH scientists use machine learning methods to address a diverse set of research problems in clinical and cognitive psychology and neuroscience. These range from identifying biomarkers for aiding diagnoses to creating and testing models of mental processes in healthy subjects. Our overarching goal is to use machine learning to improve every aspect of the scientific effort, from helping discover or develop theories to generating actionable results. For more information, please refer to the full ad https://nih-fmrif.github.io/ml/index.html

Position · Computer Science

Prof. Dr.-Ing. Marcus Magnor

Technische Universität Braunschweig
Technische Universität Braunschweig, Germany
Dec 5, 2025

The job is a W3 Full Professorship for Artificial Intelligence in Interactive Systems at Technische Universität Braunschweig. The role involves expanding the research area of data-driven methods for interactive and intelligent systems at TU Braunschweig and strengthening the focal points 'Data Science' and 'Reliability' of the Department of Computer Science. The position holder is expected to have a strong background in Computer Science with a focus on Artificial Intelligence/Machine Learning, specifically in the areas of Dependable AI and Explainable AI. The role also involves teaching topic-related courses in the areas of Artificial Intelligence and Machine Learning to complement the Bachelor's and Master's degree programmes of the Department of Computer Science.

Position

Jun.-Prof. Dr.-Ing. Rania Rayyes

Karlsruhe Institute of Technology (KIT), Institut für Fördertechnik und Logistiksysteme (IFL), InnovationsCampus Mobilität der Zukunft (ICM)
Karlsruhe Institute of Technology (KIT), Gebäude 50.38, Gotthard-Franz-Straße 8, 76131 Karlsruhe
Dec 5, 2025

The main focus of this position is to develop novel AI systems and methods for robot applications: dexterous robot grasping, human-robot learning, and transfer learning for efficient online learning. The role offers close cooperation with other institutes, universities, and numerous industrial partners, a self-determined development environment for one's own research topics with active support for the doctoral research project, flexible working hours, and work in a young, interdisciplinary research team.

Position

N/A

University of Manchester
University of Manchester
Dec 5, 2025

1) Lecturer/Senior Lecturer (Assoc/Asst Prof) in Machine Learning: The University of Manchester is making a strategic investment in fundamentals of AI, to complement its existing strengths in AI applications across several prominent research fields in the University. Applications are welcome in any area of the fundamentals of machine learning, in particular probabilistic modelling, deep learning, reinforcement learning, causal modelling, human-in-the-loop ML, explainable AI, ethics, privacy and security. This position is meant to contribute to machine learning methodologies and not purely to their applications. You will be located in the Department of Computer Science and, in addition to the new centre for Fundamental AI research, you will belong to a large community of machine learning, data science and AI researchers. 2) Programme Manager – Centre for AI Fundamentals: The University of Manchester is seeking to appoint an individual with a strategic mindset and a track record of building and leading collaborative relationships and professional networks, expertise in a domain ideally related to artificial intelligence, excellent communication and interpersonal skills, experience in managing high-performing teams, and demonstrable ability to support the preparation of large, complex grant proposals to take up the role of Programme Manager for the Centre for AI Fundamentals. The successful candidate will play a major role in developing and shaping the Centre, working closely with its Director to grow the Centre and plan and deliver an exciting programme of activities, including leading key science translational activity and development of use cases in the Centre’s key domains, partnership development, bid writing, resource management, impact and public engagement strategies.

Position

Prof Zoe Kourtzi

Adaptive Brain Lab, University of Cambridge
University of Cambridge
Dec 5, 2025

Post-doctoral position in Cognitive Computational Neuroscience at the Adaptive Brain Lab. The role involves combining high field brain imaging (7T fMRI, MR Spectroscopy), electrophysiology (EEG), computational modelling (machine learning, reinforcement learning) and interventions (TMS, tDCS, pharmacology) to understand network dynamics for learning and brain plasticity. The research programme bridges work across scales (local circuits, global networks) and species (humans, rodents) to uncover the neurocomputations that support learning and brain plasticity.

Position

Quentin Huys

UCL
UCL
Dec 5, 2025

The aim of the position is to establish a thorough computational modelling framework for cognitive research in mental health settings, focusing in particular on longitudinal data, such as changes due to treatment or across development.

Position

Cassio de Campos

TU Eindhoven
TU Eindhoven, The Netherlands
Dec 5, 2025

We are looking for a highly motivated and skilled PhD candidate to work in the area of Reinforcement Learning (broadly speaking) in the Uncertainty in AI group of TU Eindhoven, The Netherlands. This is a full-time position with a competitive salary. TU Eindhoven is an English-language university.

Position

N/A

Donders Centre for Cognition, Donders Institute for Brain, Cognition and Behaviour, School of Artificial Intelligence at Radboud University Nijmegen
Radboud University Nijmegen
Dec 5, 2025

The AI Department of the Donders Centre for Cognition (DCC), embedded in the Donders Institute for Brain, Cognition and Behaviour, and the School of Artificial Intelligence at Radboud University Nijmegen are looking for a researcher in reinforcement learning with an emphasis on safety and robustness, an interest in natural computing, and an interest in applications in neurotechnology and other domains such as robotics, healthcare and/or sustainability. You will be expected to perform top-quality research in (deep) reinforcement learning, actively contribute to the DBI2 consortium, interact and collaborate with other researchers and specialists in academia and/or industry, and be an inspiring member of our staff with excellent communication skills. You are also expected to engage with students through teaching and Master's projects, not exceeding 20% of your time.

Position

Samuel Kaski

University of Manchester and Aalto University
Manchester, UK
Dec 5, 2025

The University of Manchester is making a strategic investment in the fundamentals of AI, to complement its existing strengths in AI applications across several prominent research fields in the University, which give high-profile application and collaboration opportunities for the outcomes of fundamental AI research. The university is one of the most active partners of the national Alan Turing Institute, and hosts 33 Turing Fellows as well as Fellows of the European Laboratory for Learning and Intelligent Systems (ELLIS) in the new ELLIS Unit Manchester. The university’s ambition is to establish a leading AI centre at the intersection of these opportunities. The university has recently launched a Centre for AI Fundamentals and has already recruited four new academics to it. These two lectureships continue this series of positions in establishing the new Centre.

Position

N/A

Aalto Probabilistic Machine Learning Group
Helsinki, Finland
Dec 5, 2025

We are hiring in Helsinki, Finland, for my research group and for the Finnish Center for Artificial Intelligence (FCAI) and the ELLIS Unit Helsinki. These are two separate calls; you can apply to one or both: 1. My research group, probabilistic machine learning. 2. The Finnish Center for Artificial Intelligence and the ELLIS Unit Helsinki. The positions can include a suitable combination of theoretical and methodological work, and applications such as drug design, synthetic biology, economics and neuroimaging.

Position

N/A

University of Neuchatel
Neuchatel, Switzerland
Dec 5, 2025

This project is about developing reinforcement-learning based AI systems that directly interact with some segment of society. The applications include matching and other allocation problems. The research will be performed at the interface between reinforcement learning, social choice theory, Bayesian inference, mechanism design, differential privacy and algorithmic fairness. The research will have both a theoretical and practical component, which will include some experiments with humans. However, a good theoretical background in probability, machine learning or game theory is necessary for all students. The positions are available from January 2024. The PhD lasts for 4 years and includes a small teaching component.

Position

N/A

University of Neuchatel
Neuchatel, Switzerland
Dec 5, 2025

The project is about developing reinforcement-learning based AI systems that directly interact with some segment of society. The applications include matching and other allocation problems. The research will be performed at the interface between reinforcement learning, social choice theory, Bayesian inference, mechanism design, differential privacy and algorithmic fairness. The research will have both a theoretical and practical component, which will include some experiments with humans.

Seminar · Neuroscience

Understanding reward-guided learning using large-scale datasets

Kim Stachenfeld
DeepMind, Columbia U
Jul 8, 2025

Understanding the neural mechanisms of reward-guided learning is a long-standing goal of computational neuroscience. Recent methodological innovations enable us to collect ever larger neural and behavioral datasets. This presents opportunities to achieve greater understanding of learning in the brain at scale, as well as methodological challenges. In the first part of the talk, I will discuss our recent insights into the mechanisms by which zebra finch songbirds learn to sing. Dopamine has been long thought to guide reward-based trial-and-error learning by encoding reward prediction errors. However, it is unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Longitudinal recordings of dopamine and bird songs reveal that dopamine activity is indeed consistent with encoding a reward prediction error during naturalistic learning. In the second part of the talk, I will talk about recent work we are doing at DeepMind to develop tools for automatically discovering interpretable models of behavior directly from animal choice data. Our method, dubbed CogFunSearch, uses LLMs within an evolutionary search process in order to "discover" novel models in the form of Python programs that excel at accurately predicting animal behavior during reward-guided learning. The discovered programs reveal novel patterns of learning and choice behavior that update our understanding of how the brain solves reinforcement learning problems.

Seminar · Neuroscience

Understanding reward-guided learning using large-scale datasets

Kim Stachenfeld
DeepMind, Columbia U
May 13, 2025

Understanding the neural mechanisms of reward-guided learning is a long-standing goal of computational neuroscience. Recent methodological innovations enable us to collect ever larger neural and behavioral datasets. This presents opportunities to achieve greater understanding of learning in the brain at scale, as well as methodological challenges. In the first part of the talk, I will discuss our recent insights into the mechanisms by which zebra finch songbirds learn to sing. Dopamine has been long thought to guide reward-based trial-and-error learning by encoding reward prediction errors. However, it is unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Longitudinal recordings of dopamine and bird songs reveal that dopamine activity is indeed consistent with encoding a reward prediction error during naturalistic learning. In the second part of the talk, I will talk about recent work we are doing at DeepMind to develop tools for automatically discovering interpretable models of behavior directly from animal choice data. Our method, dubbed CogFunSearch, uses LLMs within an evolutionary search process in order to "discover" novel models in the form of Python programs that excel at accurately predicting animal behavior during reward-guided learning. The discovered programs reveal novel patterns of learning and choice behavior that update our understanding of how the brain solves reinforcement learning problems.

Seminar · Neuroscience

Screen Savers: Protecting adolescent mental health in a digital world

Amy Orben
University of Cambridge UK
Dec 2, 2024

In our rapidly evolving digital world, there is increasing concern about the impact of digital technologies and social media on the mental health of young people. Policymakers and the public are nervous. Psychologists are facing mounting pressures to deliver evidence that can inform policies and practices to safeguard both young people and society at large. However, research progress is slow while technological change is accelerating. My talk will reflect on this, both as a question of psychological science and metascience. Digital companies have designed highly popular environments that differ in important ways from traditional offline spaces. By revisiting the foundations of psychology (e.g. development and cognition) and considering digital changes' impact on theories and findings, we gain deeper insights into questions such as the following. (1) How do digital environments exacerbate developmental vulnerabilities that predispose young people to mental health conditions? (2) How do digital designs interact with cognitive and learning processes, formalised through computational approaches such as reinforcement learning or Bayesian modelling? However, we also need to face deeper questions about what it means to do science about new technologies and the challenge of keeping pace with technological advancements. Therefore, I discuss the concept of ‘fast science’, where, during crises, scientists might lower their standards of evidence to come to conclusions quicker. Might psychologists want to take this approach in the face of technological change and looming concerns? The talk concludes with a discussion of such strategies for 21st-century psychology research in the era of digitalization.

Seminar · Neuroscience

Decision and Behavior

Sam Gershman, Jonathan Pillow, Kenji Doya
Harvard University; Princeton University; Okinawa Institute of Science and Technology
Nov 28, 2024

This webinar addressed computational perspectives on how animals and humans make decisions, spanning normative, descriptive, and mechanistic models. Sam Gershman (Harvard) presented a capacity-limited reinforcement learning framework in which policies are compressed under an information bottleneck constraint. This approach predicts pervasive perseveration, stimulus‐independent “default” actions, and trade-offs between complexity and reward. Such policy compression reconciles observed action stochasticity and response time patterns with an optimal balance between learning capacity and performance. Jonathan Pillow (Princeton) discussed flexible descriptive models for tracking time-varying policies in animals. He introduced dynamic Generalized Linear Models (Sidetrack) and hidden Markov models (GLM-HMMs) that capture day-to-day and trial-to-trial fluctuations in choice behavior, including abrupt switches between “engaged” and “disengaged” states. These models provide new insights into how animals’ strategies evolve under learning. Finally, Kenji Doya (OIST) highlighted the importance of unifying reinforcement learning with Bayesian inference, exploring how cortical-basal ganglia networks might implement model-based and model-free strategies. He also described Japan’s Brain/MINDS 2.0 and Digital Brain initiatives, aiming to integrate multimodal data and computational principles into cohesive “digital brains.”
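For readers who want the gist of the policy-compression framework in code: under a capacity limit, the optimal policy takes the form pi(a|s) ∝ p(a)·exp(beta·Q(s,a)), where p(a) is the marginal action distribution and beta sets the capacity. The sketch below is a minimal tabular illustration of that idea (my own sketch, not code from the webinar), assuming a uniform state distribution:

```python
import numpy as np

def compress_policy(Q, beta, n_iter=100):
    """Blahut-Arimoto-style iteration for a capacity-limited policy.
    Q: (n_states, n_actions) value table; beta: capacity parameter."""
    n_states, n_actions = Q.shape
    p_a = np.full(n_actions, 1.0 / n_actions)     # marginal action distribution
    for _ in range(n_iter):
        logits = np.log(p_a) + beta * Q           # log pi(a|s) up to a constant
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        p_a = pi.mean(axis=0)                     # re-estimate marginal (uniform states)
    return pi

Q = np.array([[1.0, 0.0], [0.0, 1.0]])
print(compress_policy(Q, beta=0.1))   # low capacity: near-uniform "default" actions
print(compress_policy(Q, beta=10.0))  # high capacity: state-dependent policy
```

At low beta the policy collapses toward stimulus-independent default actions, the perseveration pattern described in the talk.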

Seminar · Neuroscience

Contribution of computational models of reinforcement learning to neuroscience / computational modeling, reward, learning, decision-making, conditioning, navigation, dopamine, basal ganglia, prefrontal cortex, hippocampus

Mehdi Khamassi
Centre National de la Recherche Scientifique / Sorbonne University
Nov 7, 2024

Seminar · Neuroscience

Maintaining Plasticity in Neural Networks

Clare Lyle
DeepMind
Mar 12, 2024

Nonstationarity presents a variety of challenges for machine learning systems. One surprising pathology which can arise in nonstationary learning problems is plasticity loss, whereby making progress on new learning objectives becomes more difficult as training progresses. Networks which are unable to adapt in response to changes in their environment experience plateaus or even declines in performance in highly non-stationary domains such as reinforcement learning, where the learner must quickly adapt to new information even after hundreds of millions of optimization steps. The loss of plasticity manifests in a cluster of related empirical phenomena which have been identified by a number of recent works, including the primacy bias, implicit under-parameterization, rank collapse, and capacity loss. While this phenomenon is widely observed, it is still not fully understood. This talk will present exciting recent results which shed light on the mechanisms driving the loss of plasticity in a variety of learning problems and survey methods to maintain network plasticity in non-stationary tasks, with a particular focus on deep reinforcement learning.

Seminar · Neuroscience

A recurrent network model of planning predicts hippocampal replay and human behavior

Marcelo Mattar
NYU
Oct 19, 2023

When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as 'rollouts'. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by -- and in turn adaptively affect -- prefrontal dynamics.

Seminar · Neuroscience

Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time

Harel Shouval
The University of Texas at Houston
Jun 13, 2023

The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), which means they signal the difference between the expected future rewards and the actual rewards. The prominence of the TD theory arises from the observation that firing properties of dopaminergic neurons in the ventral tegmental area appear similar to those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We propose instead an alternative theoretical framework, coined FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature specific representations of time are learned, allowing for neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX dopamine acts as an instructive signal which helps build temporal models of the environment. FLEX is a general theoretical framework that has many possible biophysical implementations. In order to show that FLEX is a feasible approach, we present a specific biophysically plausible model which implements the principles of FLEX. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
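For context, the TD framework the abstract argues against can be captured in a few lines. The sketch below is a generic tabular TD(0) illustration of the reward prediction error (a textbook example, not an implementation of FLEX), with a fixed chain of states standing in for the fixed temporal basis:

```python
import numpy as np

n_states, alpha, gamma = 5, 0.1, 0.95
V = np.zeros(n_states)                        # value of each time step / state
for episode in range(500):
    for s in range(n_states - 1):
        r = 1.0 if s == n_states - 2 else 0.0 # reward arrives just before terminal state
        delta = r + gamma * V[s + 1] - V[s]   # reward prediction error (RPE)
        V[s] += alpha * delta                 # TD(0) update
print(V)  # values propagate backward from the rewarded state
```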

Seminar · Neuroscience

A recurrent network model of planning explains hippocampal replay and human behavior

Guillaume Hennequin
University of Cambridge, UK
May 30, 2023

When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as 'rollouts'. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by - and in turn adaptively affect - prefrontal dynamics.

Seminar · Neuroscience

Richly structured reward predictions in dopaminergic learning circuits

Angela J. Langdon
National Institute of Mental Health at National Institutes of Health (NIH)
May 16, 2023

Theories from reinforcement learning have been highly influential for interpreting neural activity in the biological circuits critical for animal and human learning. Central among these is the identification of phasic activity in dopamine neurons as a reward prediction error signal that drives learning in basal ganglia and prefrontal circuits. However, recent findings suggest that dopaminergic prediction error signals have access to complex, structured reward predictions and are sensitive to more properties of outcomes than learning theories with simple scalar value predictions might suggest. Here, I will present recent work in which we probed the identity-specific structure of reward prediction errors in an odor-guided choice task and found evidence for multiple predictive “threads” that segregate reward predictions, and reward prediction errors, according to the specific sensory features of anticipated outcomes. Our results point to an expanded class of neural reinforcement learning algorithms in which biological agents learn rich associative structure from their environment and leverage it to build reward predictions that include information about the specific, and perhaps idiosyncratic, features of available outcomes, using these to guide behavior in even quite simple reward learning tasks.

Seminar · Neuroscience

Off-policy learning in the basal ganglia

Ashok Litwin-Kumar
Columbia University, New York
May 2, 2023

I will discuss work with Jack Lindsey modeling reinforcement learning for action selection in the basal ganglia. I will argue that the presence of multiple brain regions, in addition to the basal ganglia, that contribute to motor control motivates the need for an off-policy basal ganglia learning algorithm. I will then describe a biological implementation of such an algorithm that predicts tuning of dopamine neurons to a quantity we call "action surprise," in addition to reward prediction error. In the same model, an implementation of learning from a motor efference copy also predicts a novel solution to the problem of multiplexing feedforward and efference-related striatal activity. The solution exploits the difference between D1 and D2-expressing medium spiny neurons and leads to predictions about striatal dynamics.

Seminar · Cognition

Beyond Volition

Patrick Haggard
University College London
Apr 26, 2023

Voluntary actions are actions that agents choose to make. Volition is the set of cognitive processes that implement such choice and initiation. These processes are often held essential to modern societies, because they form the cognitive underpinning for concepts of individual autonomy and individual responsibility. Nevertheless, psychology and neuroscience have struggled to define volition, and have also struggled to study it scientifically. Laboratory experiments on volition, such as those of Libet, have been criticised, often rather naively, as focussing exclusively on meaningless actions, and ignoring the factors that make voluntary action important in the wider world. In this talk, I will first review these criticisms, and then look at extending scientific approaches to volition in three directions that may enrich scientific understanding of volition. First, volition becomes particularly important when the range of possible actions is large and unconstrained - yet most experimental paradigms involve minimal response spaces. We have developed a novel paradigm for eliciting de novo actions through verbal fluency, and used this to estimate the elusive conscious experience of generativity. Second, volition can be viewed as a mechanism for flexibility, by promoting adaptation of behavioural biases. This view departs from the tradition of defining volition by contrasting internally-generated actions with externally-triggered actions, and instead links volition to model-based reinforcement learning. By using the context of competitive games to re-operationalise the classic Libet experiment, we identified a form of adaptive autonomy that allows agents to reduce biases in their action choices. Interestingly, this mechanism seems not to require explicit understanding and strategic use of action selection rules, in contrast to classical ideas about the relation between volition and conscious, rational thought. Third, I will consider volition teleologically, as a mechanism for achieving counterfactual goals through complex problem-solving. This perspective gives volition a key role in mediating between understanding and planning on the one hand, and instrumental action on the other. Taken together, these three cognitive phenomena of generativity, flexibility, and teleology may partly explain why volition is such an important cognitive function for organisation of human behaviour and human flourishing. I will end by discussing how this enriched view of volition can relate to individual autonomy and responsibility.

Seminar · Neuroscience · Recording

Memory-enriched computation and learning in spiking neural networks through Hebbian plasticity

Thomas Limbacher
TU Graz
Nov 8, 2022

Memory is a key component of biological neural systems that enables the retention of information over a huge range of temporal scales, ranging from hundreds of milliseconds up to years. While Hebbian plasticity is believed to play a pivotal role in biological memory, it has so far been analyzed mostly in the context of pattern completion and unsupervised learning. Here, we propose that Hebbian plasticity is fundamental for computations in biological neural systems. We introduce a novel spiking neural network (SNN) architecture that is enriched by Hebbian synaptic plasticity. We experimentally show that our memory-equipped SNN model outperforms state-of-the-art deep learning mechanisms in a sequential pattern-memorization task, as well as demonstrate superior out-of-distribution generalization capabilities compared to these models. We further show that our model can be successfully applied to one-shot learning and classification of handwritten characters, improving over the state-of-the-art SNN model. We also demonstrate the capability of our model to learn associations for audio to image synthesis from spoken and handwritten digits. Our SNN model further presents a novel solution to a variety of cognitive question answering tasks from a standard benchmark, achieving comparable performance to both memory-augmented ANN and SNN-based state-of-the-art solutions to this problem. Finally, we demonstrate that our model is able to learn from rewards on an episodic reinforcement learning task and attain a near-optimal strategy on a memory-based card game. Hence, our results show that Hebbian enrichment renders spiking neural networks surprisingly versatile in terms of their computational as well as learning capabilities. Since local Hebbian plasticity can easily be implemented in neuromorphic hardware, this also suggests that powerful cognitive neuromorphic systems can be built on this principle.
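As a rough, rate-based illustration of the Hebbian co-activity rule this architecture builds on (my sketch, not the paper's spiking model; all parameters are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post, eta = 8, 4, 0.01
W = rng.normal(0, 0.1, (n_post, n_pre))         # synaptic weights
for _ in range(1000):
    pre = (rng.random(n_pre) < 0.2).astype(float)  # presynaptic activity
    post = (W @ pre > 0.5).astype(float)           # simple threshold units
    W += eta * np.outer(post, pre)                 # Hebbian: co-active pairs strengthen
    W *= 0.999                                     # mild decay keeps weights bounded
```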

Seminar · Neuroscience · Recording

Learning Relational Rules from Rewards

Guillermo Puebla
University of Bristol
Oct 12, 2022

Humans perceive the world in terms of objects and relations between them. In fact, for any given pair of objects, there is a myriad of relations that apply to them. How does the cognitive system learn which relations are useful to characterize the task at hand? And how can it use these representations to build a relational policy to interact effectively with the environment? In this paper we propose that this problem can be understood through the lens of a sub-field of symbolic machine learning called relational reinforcement learning (RRL). To demonstrate the potential of our approach, we build a simple model of relational policy learning based on a function approximator developed in RRL. We trained and tested our model in three Atari games that required considering an increasing number of potential relations: Breakout, Pong and Demon Attack. In each game, our model was able to select adequate relational representations and build a relational policy incrementally. We discuss the relationship between our model and models of relational and analogical reasoning, as well as its limitations and future directions of research.

Seminar · Neuroscience · Recording

Learning in/about/from the basal ganglia

Jonathan Rubin
University of Pittsburgh
May 24, 2022

The basal ganglia are a collection of brain areas that are connected by a variety of synaptic pathways and are a site of significant reward-related dopamine release. These properties suggest a possible role for the basal ganglia in action selection, guided by reinforcement learning. In this talk, I will discuss a framework for how this function might be performed and computational results using an upward mapping to identify putative low-dimensional control ensembles that may be involved in tuning decision policy. I will also present some recent experimental results and theory – related to effects of extracellular ion dynamics – that run counter to the classical view of basal ganglia pathways and suggest a new interpretation of certain aspects of this framework. For those not so interested in the basal ganglia, I hope that the upward mapping approach and impact of extracellular ion dynamics will nonetheless be of interest!

Seminar · Neuroscience

Dissecting the role of accumbal D1 and D2 medium spiny neurons in information encoding

Munir Gunes Kutlu
Calipari Lab, Vanderbilt University
Feb 8, 2022

Nearly all motivated behaviors require the ability to associate outcomes with specific actions and make adaptive decisions about future behavior. The nucleus accumbens (NAc) is integrally involved in these processes. The NAc is a heterogeneous population primarily composed of D1 and D2 medium spiny projection (MSN) neurons that are thought to have opposed roles in behavior, with D1 MSNs promoting reward and D2 MSNs promoting aversion. Here we examined what types of information are encoded by the D1 and D2 MSNs using optogenetics, fiber photometry, and cellular resolution calcium imaging. First, we showed that mice responded for optical self-stimulation of both cell types, suggesting D2-MSN activation is not inherently aversive. Next, we recorded population and single cell activity patterns of D1 and D2 MSNs during reinforcement as well as Pavlovian learning paradigms that allow dissociation of stimulus value, outcome, cue learning, and action. We demonstrated that D1 MSNs respond to the presence and intensity of unconditioned stimuli – regardless of value. Conversely, D2 MSNs responded to the prediction of these outcomes during specific cues. Overall, these results provide foundational evidence for the discrete aspects of information that are encoded within the NAc D1 and D2 MSN populations. These results will significantly enhance our understanding of the involvement of the NAc MSNs in learning and memory as well as how these neurons contribute to the development and maintenance of substance use disorders.

Seminar · Neuroscience · Recording

NaV Long-term Inactivation Regulates Adaptation in Place Cells and Depolarization Block in Dopamine Neurons

Carmen Canavier
LSU Health Sciences Center, New Orleans
Feb 8, 2022

In behaving rodents, CA1 pyramidal neurons receive spatially-tuned depolarizing synaptic input while traversing a specific location within an environment called its place. Midbrain dopamine neurons participate in reinforcement learning, and bursts of action potentials riding a depolarizing wave of synaptic input signal rewards and reward expectation. Interestingly, slice electrophysiology in vitro shows that both types of cells exhibit a pronounced reduction in firing rate (adaptation) and even cessation of firing during sustained depolarization. We included a five-state Markov model of NaV1.6 (for CA1) and NaV1.2 (for dopamine neurons), respectively, in computational models of these two types of neurons. Our simulations suggest that long-term inactivation of this channel is responsible for the adaptation in CA1 pyramidal neurons, in response to triangular depolarizing current ramps. We also show that the differential contribution of slow inactivation in two subpopulations of midbrain dopamine neurons can account for their different dynamic ranges, as assessed by their responses to similar depolarizing ramps. These results suggest long-term inactivation of the sodium channel is a general mechanism for adaptation.

Seminar · Neuroscience · Recording

NMC4 Short Talk: What can deep reinforcement learning tell us about human motor learning and vice versa?

Michele Garibbo
University of Bristol
Nov 30, 2021

In the deep reinforcement learning (RL) community, motor control problems are usually approached from a reward-based learning perspective. However, humans are often believed to learn motor control through directed error-based learning. Within this learning setting, the control system is assumed to have access to exact error signals and their gradients with respect to the control signal. This is unlike reward-based learning, in which errors are assumed to be unsigned, encoding relative successes and failures. Here, we try to understand the relation between these two approaches, reward- and error-based learning, and ballistic arm reaches. To do so, we test canonical (deep) RL algorithms on a well-known sensorimotor perturbation in neuroscience: mirror-reversal of visual feedback during arm reaching. This test leads us to propose a potentially novel RL algorithm, denoted as model-based deterministic policy gradient (MB-DPG). This RL algorithm draws inspiration from error-based learning to qualitatively reproduce human reaching performance under mirror-reversal. Next, we show MB-DPG outperforms the other canonical (deep) RL algorithms on a single- and a multi-target ballistic reaching task, based on a biomechanical model of the human arm. Finally, we propose MB-DPG may provide an efficient computational framework to help explain error-based learning in neuroscience.
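A minimal one-dimensional sketch (my illustration, not MB-DPG itself) of the distinction the abstract draws: the error-based learner steps along the signed gradient of its error, while the reward-based learner must estimate the same direction from random perturbations and an unsigned scalar reward:

```python
import numpy as np

rng = np.random.default_rng(1)
target = 0.7                          # assumed "correct" motor command
w_err, w_rew, lr, sigma = 0.0, 0.0, 0.1, 0.1
for _ in range(200):
    # Error-based learner: exact signed gradient of the squared error.
    w_err -= lr * 2 * (w_err - target)
    # Reward-based learner: perturb the command, reinforce what helped
    # (REINFORCE-style update from a scalar reward minus a baseline).
    pert = sigma * rng.standard_normal()
    reward = -(w_rew + pert - target) ** 2
    baseline = -(w_rew - target) ** 2
    w_rew += lr * (reward - baseline) * pert / sigma**2
print(w_err, w_rew)  # both approach the target; the error-based route is direct
```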

Seminar · Neuroscience

Reinforcement Learning

Peter Dayan & Jonathan Rubin
Max Planck Institute for Biological Cybernetics; University of Pittsburgh
Nov 18, 2021

Seminar · Machine Learning · Recording

Playing StarCraft and saving the world using multi-agent reinforcement learning!

InstaDeep
Oct 28, 2021

"This is my C-14 Impaler gauss rifle! There are many like it, but this one is mine!" - a terran marine. If you have never heard of a terran marine before, then you have probably missed out on playing the very engaging and entertaining strategy computer game, StarCraft. However, don’t despair, because what we have in store might be even more exciting! In this interactive session, we will take you through, step by step, how to train a team of terran marines to defeat a team of marines controlled by the built-in game AI in StarCraft II. How will we achieve this? Using multi-agent reinforcement learning (MARL). MARL is a useful framework for building distributed intelligent systems. In MARL, multiple agents are trained to act as individual decision-makers of some larger system, while learning to work as a team. We will show you how to use Mava (https://github.com/instadeepai/Mava), a newly released research framework for MARL, to build a multi-agent learning system for StarCraft II. We will provide the necessary guidance, tools and background to understand the key concepts behind MARL, how to use Mava building blocks to build systems and how to train a system from scratch. We will conclude the session by briefly sharing various exciting real-world application areas for MARL at InstaDeep, such as large-scale autonomous train navigation and circuit board routing. These are problems that become exponentially more difficult to solve as they scale. Finally, we will argue that many of humanity’s most important practical problems are reminiscent of the ones just described. These include, for example, the need for sustainable management of distributed resources under the pressures of climate change, or efficient inventory control and supply routing in critical distribution networks, or robotic teams for rescue missions and exploration. We believe MARL has enormous potential to be applied in these areas and we hope to inspire you to get excited and interested in MARL and perhaps one day contribute to the field!
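Mava provides the system-level building blocks used in the session; as a library-free illustration of the core MARL idea (not Mava's API), two independent Q-learners can learn to coordinate from a shared team reward:

```python
import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])   # both agents get 1 if actions match
Q1, Q2 = np.zeros(2), np.zeros(2)             # one Q-table per agent
alpha, eps = 0.1, 0.1
for t in range(2000):
    a1 = rng.integers(2) if rng.random() < eps else int(np.argmax(Q1))
    a2 = rng.integers(2) if rng.random() < eps else int(np.argmax(Q2))
    r = payoff[a1, a2]                        # shared team reward
    Q1[a1] += alpha * (r - Q1[a1])            # each agent learns independently
    Q2[a2] += alpha * (r - Q2[a2])
print(Q1, Q2)  # the agents converge on a matching action pair
```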

Seminar · Neuroscience · Recording

Network dynamics in the basal ganglia and possible implications for Parkinson’s disease

Jonathan Rubin
University of Pittsburgh
Oct 13, 2021

The basal ganglia are a collection of brain areas that are connected by a variety of synaptic pathways and are a site of significant reward-related dopamine release. These properties suggest a possible role for the basal ganglia in action selection, guided by reinforcement learning. In this talk, I will discuss a framework for how this function might be performed. I will also present some recent experimental results and theory that call for a re-evaluation of certain aspects of this framework. Next, I will turn to the changes in basal ganglia activity observed to occur with the dopamine depletion associated with Parkinson’s disease. I will discuss some of the potential functional implications of some of these changes and, if time permits, will conclude with some new results that focus on delta oscillations under dopamine depletion.

Seminar · Neuroscience · Recording

Higher cognitive resources for efficient learning

Aurelio Cortese
ATR
Jun 17, 2021

A central issue in reinforcement learning (RL) is the ‘curse-of-dimensionality’, arising when the degrees-of-freedom are much larger than the number of training samples. In such circumstances, the learning process becomes too slow to be plausible. In the brain, higher cognitive functions (such as abstraction or metacognition) may be part of the solution by generating low dimensional representations on which RL can operate. In this talk I will discuss a series of studies in which we used functional magnetic resonance imaging (fMRI) and computational modeling to investigate the neuro-computational basis of efficient RL. We found that people can learn remarkably complex task structures non-consciously, but also that - intriguingly - metacognition appears tightly coupled to this learning ability. Furthermore, when people use an explicit (conscious) policy to select relevant information, learning is accelerated by abstractions. At the neural level, prefrontal cortex subregions are differentially involved in separate aspects of learning: dorsolateral prefrontal cortex pairs with metacognitive processes, while ventromedial prefrontal cortex with valuation and abstraction. I will discuss the implications of these findings, in particular new questions on the function of metacognition in adaptive behavior and the link with abstraction.

Seminar · Neuroscience

From function to cognition: New spectroscopic tools for studying brain neurochemistry in-vivo

Assaf Tal
Weizmann Institute
Apr 21, 2021

In this seminar, I will present new methods in magnetic resonance spectroscopy (MRS) we’ve been working on in the lab. The talk will be divided into two parts. In the first, I will talk about neurochemical changes we observe in glutamate and GABA during various paradigms, including simple motor tasks and reinforcement learning. In the second part, I’ll present a new approach to MRS that focuses on measuring the relaxation times (T1, T2) of metabolites, which reflect changes to specific cellular microenvironments. I will explain why these can be exciting markers for studying several in-vivo pathologies, and also present some preliminary data from a cohort of mild cognitive impairment (MCI) patients, showing changes that correlate with cognitive decline.

Seminar · Neuroscience · Recording

Choice engineering and the modeling of operant learning

Yonatan Loewenstein
The Hebrew University
Apr 6, 2021

Organisms modify their behavior in response to its consequences, a phenomenon referred to as operant learning. Contemporary modeling of this learning behavior is based on reinforcement learning algorithms. I will discuss some of the challenges that these models face, and propose a new approach to model selection that is based on testing their ability to engineer behavior. Finally, I will present the results of The Choice Engineering Competition – an academic competition that compared the efficacies of qualitative and quantitative models of operant learning in shaping behavior.

Seminar · Neuroscience · Recording

Peril, Prudence and Planning as Risk, Avoidance and Worry

Peter Dayan
University of Tübingen
Mar 31, 2021

Risk occupies a central role in both the theory and practice of decision-making. Although it is deeply implicated in many conditions involving dysfunctional behavior and thought, modern theoretical approaches to understanding and mitigating risk in either one-shot or sequential settings, which are derived largely from finance and economics, have yet to permeate fully the fields of neural reinforcement learning and computational psychiatry. I will discuss the use of dynamic and static versions of one prominent approach, namely conditional value-at-risk, to examine both the nature of risk avoidant choices, encompassing such things as justified gambler's fallacies, and the optimal planning that can lead to consideration of such choices, with implications for offline, ruminative, thinking.
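Concretely, the static version of conditional value-at-risk is just the expected return over the worst alpha-fraction of outcomes. A minimal sketch (my illustration, with assumed return distributions):

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of sampled returns."""
    returns = np.sort(np.asarray(returns))        # ascending: worst outcomes first
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

rng = np.random.default_rng(0)
safe = rng.normal(1.0, 0.1, 10_000)    # modest, reliable returns
risky = rng.normal(1.2, 2.0, 10_000)   # higher mean, heavy downside
print(cvar(safe, 0.05), cvar(risky, 0.05))  # a CVaR objective prefers the safe option
```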

Seminar · Neuroscience

Navigation Turing Test: Toward Human-like RL

Ida Momennejad
Microsoft Research NYC
Mar 25, 2021

tbc

Seminar · Neuroscience

A machine learning way to analyse white matter tractography streamlines / Application of artificial intelligence in correcting motion artifacts and reducing scan time in MRI

Dr Shenjun Zhong and Dr Kamlesh Pawar
Monash Biomedical Imaging
Mar 10, 2021

1. Embedding is all you need: A machine learning way to analyse white matter tractography streamlines - Dr Shenjun Zhong, Monash Biomedical Imaging Embedding white matter streamlines with various lengths into fixed-length latent vectors enables users to analyse them with general data mining techniques. However, finding a good embedding schema is still a challenging task, as the existing methods based on spatial coordinates rely on manually engineered features and/or labelled datasets. In this webinar, Dr Shenjun Zhong will discuss his novel deep learning model that identifies a latent space and solves the problem of streamline clustering without needing labelled data. Dr Zhong is a Research Fellow and Informatics Officer at Monash Biomedical Imaging. His research interests are sequence modelling, reinforcement learning and federated learning in the general medical imaging domain. 2. Application of artificial intelligence in correcting motion artifacts and reducing scan time in MRI - Dr Kamlesh Pawar, Monash Biomedical Imaging Magnetic Resonance Imaging (MRI) is a widely used imaging modality in clinics and research. Although MRI is useful, it comes with an overhead of longer scan time compared to other medical imaging modalities. The longer scan times also make patients uncomfortable, and even subtle movements during the scan may result in severe motion artifact in the images. In this seminar, Dr Kamlesh Pawar will discuss how artificial intelligence techniques can reduce scan time and correct motion artifacts. Dr Pawar is a Research Fellow at Monash Biomedical Imaging. His research interests include deep learning, MR physics, MR image reconstruction and computer vision.

Seminar · Neuroscience

Uncertainty in learning and decision making

Maarten Speekenbrink
UCL
Jan 19, 2021

Uncertainty plays a critical role in reinforcement learning and decision making. However, exactly how subjective uncertainty influences behaviour remains unclear. Multi-armed bandits are a useful framework to gain more insight into this. Paired with computational tools such as Kalman filters, they allow us to closely characterize the interplay between trial-by-trial value, uncertainty, learning, and choice. In this talk, I will present recent research where we also measured participants' visual fixations on the options in a multi-armed bandit task. The estimated value of each option, and the uncertainty in these estimates, influenced what participants looked at in the period before making a choice and their subsequent choice, as did fixation itself. Uncertainty also determined how long participants looked at the obtained outcomes. Our findings clearly show the importance of uncertainty in learning and decision making.
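A minimal sketch of the Kalman-filter bandit machinery described here (my illustration; parameter values are assumptions): each arm carries a posterior mean and variance, so value and uncertainty are both available on every trial to guide choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, obs_var, diff_var = 4, 1.0, 0.01
true_means = rng.normal(0, 1, n_arms)
mu = np.zeros(n_arms)                    # posterior mean value per arm
var = np.full(n_arms, 10.0)              # posterior variance (uncertainty) per arm
for t in range(500):
    ucb = mu + 1.0 * np.sqrt(var)        # uncertainty-guided (UCB-style) choice
    a = int(np.argmax(ucb))
    r = true_means[a] + rng.normal(0, np.sqrt(obs_var))
    k = var[a] / (var[a] + obs_var)      # Kalman gain
    mu[a] += k * (r - mu[a])             # value update scaled by uncertainty
    var[a] = (1 - k) * var[a] + diff_var # variance shrinks, then diffuses slightly
print(mu, true_means)                    # for simplicity only the chosen arm is updated
```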

Seminar · Neuroscience · Recording

An inference perspective on meta-learning

Kate Rakelly
University of California Berkeley
Nov 25, 2020

While meta-learning algorithms are often viewed as algorithms that learn to learn, an alternative viewpoint frames meta-learning as inferring a hidden task variable from experience consisting of observations and rewards. From this perspective, learning to learn is learning to infer. This viewpoint can be useful in solving problems in meta-RL, which I’ll demonstrate through two examples: (1) enabling off-policy meta-learning, and (2) performing efficient meta-RL from image observations. I’ll also discuss how this perspective leads to an algorithm for few-shot image segmentation.
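A toy illustration of the "learning to infer" viewpoint (my sketch, not the speaker's algorithm): the agent maintains a Bayesian belief over a hidden task variable, here which of two arms pays off, and adaptation is belief updating rather than gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
p_task = np.array([0.5, 0.5])            # belief over which arm is the good one
good_arm = 1                             # hidden task variable
for t in range(20):
    a = int(np.argmax(p_task))           # act greedily w.r.t. the current belief
    r = float(rng.random() < (0.8 if a == good_arm else 0.2))
    like = np.array([                    # P(reward=1 | task) for task = arm0/arm1 good
        0.8 if a == 0 else 0.2,
        0.8 if a == 1 else 0.2,
    ])
    like = like if r == 1 else 1 - like
    p_task = like * p_task / (like * p_task).sum()   # Bayesian posterior update
print(p_task)   # belief concentrates on the true task variable
```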

Seminar · Neuroscience

On cognitive maps and reinforcement learning in large-scale animal behaviour

Yossi Yovel
Tel Aviv University
Nov 24, 2020

Bats are extreme aviators and amazing navigators. Many bat species nightly commute dozens of kilometres in search of food, and some bat species annually migrate over thousands of kilometres. Studying bats in their natural environment has always been extremely challenging because of their small size (mostly <50 g) and agile nature. We have recently developed novel miniature technology allowing us to GPS-tag small bats, thus opening a new window to document their behaviour in the wild. We have used this technology to track fruit-bat pups over 5 months from birth to adulthood. Following the bats' full movement history allowed us to show that they use novel short-cuts which are typical for cognitive-map-based navigation. In a second study, we examined how nectar-feeding bats make foraging decisions under competition. We show that by relying on a simple reinforcement learning strategy, the bats can divide the resource between them without aggression or communication. Together, these results demonstrate the power of the large-scale natural approach for studying animal behavior.

Seminar · Neuroscience · Recording

On climate change, multi-agent systems and the behaviour of networked control

Arnu Pretorius
InstaDeep
Nov 17, 2020

Multi-agent reinforcement learning (MARL) has recently shown great promise as an approach to networked system control. Arguably, one of the most difficult and important tasks for which large scale networked system control is applicable is common-pool resource (CPR) management. Crucial CPRs include arable land, fresh water, wetlands, wildlife, fish stock, forests and the atmosphere, of which proper management is related to some of society’s greatest challenges such as food security, inequality and climate change. This talk will consist of three parts. In the first, we will briefly look at climate change and how it poses a significant threat to life on our planet. In the second, we will consider the potential of multi-agent systems for climate change mitigation and adaptation. And finally, in the third, we will discuss recent research from InstaDeep into better understanding the behaviour of networked MARL systems used for CPR management. More specifically, we will see how the tools from empirical game-theoretic analysis may be harnessed to analyse the differences in networked MARL systems. The results give new insights into the consequences associated with certain design choices and provide an additional dimension of comparison between systems beyond efficiency, robustness, scalability and mean control performance.

SeminarNeuroscience

A journey through connectomics: from manual tracing to the first fully automated basal ganglia connectomes

Joergen Kornfeld
Massachusetts Institute of Technology
Nov 16, 2020

The "mind of the worm", the first electron microscopy-based connectome of C. elegans, was an early sign of where connectomics is headed, followed by a long time of little progress in a field held back by the immense manual effort required for data acquisition and analysis. This changed over the last few years with several technological breakthroughs, which allowed increases in data set sizes by several orders of magnitude. Brain tissue can now be imaged in 3D up to a millimeter in size at nanometer resolution, revealing tissue features from synapses to the mitochondria of all contained cells. These breakthroughs in acquisition technology were paralleled by a revolution in deep-learning segmentation techniques, that equally reduced manual analysis times by several orders of magnitude, to the point where fully automated reconstructions are becoming useful. Taken together, this gives neuroscientists now access to the first wiring diagrams of thousands of automatically reconstructed neurons connected by millions of synapses, just one line of program code away. In this talk, I will cover these developments by describing the past few years' technological breakthroughs and discuss remaining challenges. Finally, I will show the potential of automated connectomics for neuroscience by demonstrating how hypotheses in reinforcement learning can now be tackled through virtual experiments in synaptic wiring diagrams of the songbird basal ganglia.

SeminarNeuroscienceRecording

The geometry of abstraction in hippocampus and pre-frontal cortex

Stefano Fusi
Columbia University
Oct 15, 2020

The curse of dimensionality plagues models of reinforcement learning and decision-making. The process of abstraction solves this by constructing abstract variables describing features shared by different specific instances, reducing dimensionality and enabling generalization in novel situations. Here we characterized neural representations in monkeys performing a task where a hidden variable described the temporal statistics of stimulus-response-outcome mappings. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training. This type of generalization requires a particular geometric format of neural representations. Neural ensembles in dorsolateral pre-frontal cortex, anterior cingulate cortex and hippocampus, and in simulated neural networks, simultaneously represented multiple hidden and explicit variables in a format reflecting abstraction. Task events engaging cognitive operations modulated this format. These findings elucidate how the brain and artificial systems represent abstract variables, variables critical for generalization that in turn confers cognitive flexibility.
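Operationally, the decoding test looks like the sketch below (using scikit-learn; variable names are ours): train a linear decoder on trials from some task conditions and score it on conditions never seen in training, so that above-chance held-out accuracy indicates an abstract representational geometry.

import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_condition_generalization(X, labels, conditions, train_conds, test_conds):
    """X: (n_trials, n_neurons) activity; labels: variable to decode;
    conditions: task-condition tag per trial."""
    train = np.isin(conditions, train_conds)
    test = np.isin(conditions, test_conds)
    decoder = LogisticRegression(max_iter=1000).fit(X[train], labels[train])
    return decoder.score(X[test], labels[test])   # accuracy on unseen conditions

Averaging this score over many train/test splits of the conditions gives a single number that separates abstract formats from high-dimensional, non-generalizing ones.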

SeminarNeuroscienceRecording

E-prop: A biologically inspired paradigm for learning in recurrent networks of spiking neurons

Franz Scherr
Technische Universität Graz
Aug 30, 2020

Transformative advances in deep learning, such as deep reinforcement learning, usually rely on gradient-based learning methods such as backpropagation through time (BPTT) as a core learning algorithm. However, BPTT is not considered biologically plausible, since it requires propagating gradients backwards in time and across neurons. Here, we propose e-prop, a novel gradient-based learning method with local and online weight update rules for recurrent neural networks, and in particular recurrent spiking neural networks (RSNNs). As a result, e-prop has the potential to bring a substantial fraction of the power of deep learning to RSNNs. In this presentation, we will motivate e-prop from the perspective of recent insights in neuroscience and show how these can be combined to form an algorithm for online gradient descent. The mathematical results will be supported by empirical evidence from supervised and reinforcement learning tasks. We will also discuss how limitations inherited from gradient-based learning methods, such as poor sample efficiency, can be addressed by an evolution-like optimization that enhances learning on particular task families. The emerging learning architecture can learn tasks from a single demonstration, hence enabling one-shot learning.
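The flavor of e-prop's local, online update can be seen in a stripped-down, non-spiking analogue (our toy, far simpler than the RSNN algorithm in the work itself): every input weight carries an eligibility trace that runs forward in time, and the weight change is simply that trace multiplied by an online learning signal broadcast through fixed random feedback weights, with no backward pass.

import numpy as np

def eprop_toy(xs, ys, n_hid=20, alpha=0.9, lr=1e-2, seed=0):
    """xs: (T, n_in) input stream; ys: (T,) scalar targets."""
    rng = np.random.default_rng(seed)
    n_in = xs.shape[1]
    w_in = 0.1 * rng.standard_normal((n_hid, n_in))
    w_out = 0.1 * rng.standard_normal(n_hid)
    b_fb = rng.standard_normal(n_hid)       # fixed random feedback weights
    h = np.zeros(n_hid)                     # leaky hidden state
    trace = np.zeros((n_hid, n_in))         # eligibility trace per input weight
    for x, y in zip(xs, ys):
        pre = w_in @ x
        psi = 1.0 - np.tanh(pre) ** 2       # pseudo-derivative of the nonlinearity
        h = alpha * h + np.tanh(pre)
        trace = alpha * trace + psi[:, None] * x[None, :]   # forward-in-time trace
        err = w_out @ h - y                 # online learning signal
        w_in -= lr * (err * b_fb)[:, None] * trace          # local update: signal times trace
        w_out -= lr * err * h
    return w_in, w_out

For this leaky architecture the trace is exactly the forward-accumulated gradient of the hidden state, so the only approximation is routing the error through random feedback rather than through the readout weights.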

SeminarNeuroscience

Reward foraging task and model-based analysis reveal how fruit flies learn the value of available options

Duda Kvitsiani
Aarhus University
Jul 28, 2020

Understanding what drives foraging decisions in animals requires careful manipulation of the value of available options while monitoring animal choices. Value-based decision-making tasks, in combination with formal learning models, have provided both an experimental and a theoretical framework for studying foraging decisions in lab settings. While these approaches have been used successfully to understand what drives choices in mammals, very little work has been done in fruit flies, even though fruit flies have served as a model organism for many complex behavioural paradigms. To fill this gap we developed a single-animal, trial-based decision-making task in which freely walking flies experienced optogenetic sugar-receptor neuron stimulation. We controlled the value of the available options by manipulating the probabilities of optogenetic stimulation. We show that flies integrate the reward history of chosen options and forget the value of unchosen options. We further discover that flies assign higher values to rewards experienced early in the behavioural session, consistent with formal reinforcement learning models. Finally, we show that the probabilistic rewards affect the walking trajectories of flies, suggesting that accumulated value controls the navigation vector of the flies in a graded fashion. These findings establish the fruit fly as a model organism for exploring the genetic and circuit basis of value-based decisions.
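A minimal version of the integrate-and-forget account (our sketch with illustrative parameters; it does not capture the early-session overweighting of rewards) learns the chosen option's value from a prediction error while decaying the unchosen option's value toward zero:

import numpy as np

def fly_bandit(n_trials=400, alpha=0.3, kappa=0.05, beta=3.0,
               p_stim=(0.7, 0.3), seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                      # learned values of the two options
    choices = []
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + np.exp(-beta * (q[0] - q[1])))  # softmax over two options
        a = 0 if rng.random() < p0 else 1
        r = float(rng.random() < p_stim[a])  # probabilistic optogenetic "sugar" reward
        q[a] += alpha * (r - q[a])           # integrate reward history of chosen option
        q[1 - a] *= 1.0 - kappa              # forget the value of the unchosen option
        choices.append(a)
    return q, choices

The forgetting term makes preferences track recent experience, which is the kind of behavioural signature such trial-based fly tasks are designed to expose.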

SeminarNeuroscience

Delineating Reward/Avoidance Decision Process in the Impulsive-compulsive Spectrum Disorders through a Probabilistic Reversal Learning Task

Xiaoliu Zhang
Monash University
Jul 18, 2020

Impulsivity and compulsivity are behavioural traits that underlie many aspects of decision-making and form the characteristic symptoms of Obsessive Compulsive Disorder (OCD) and Gambling Disorder (GD). The neural underpinnings of reward and avoidance learning under the expression of these traits and symptoms are only partially understood. The present study combined behavioural modelling and neuroimaging to examine brain activity associated with critical phases of reward and loss processing in OCD and GD.

Forty-two healthy controls (HC), forty OCD and twenty-three GD participants were recruited to complete a two-session reinforcement learning (RL) task featuring a "probability switch (PS)" during imaging. In the end, 39 HC (20F/19M, 34 ± 9.47 yrs), 28 OCD (14F/14M, 32.11 ± 9.53 yrs) and 16 GD (4F/12M, 35.53 ± 12.20 yrs) were included with both behavioural and imaging data available. Functional imaging was conducted on a 3.0 T Siemens MAGNETOM Skyra (syngo MR D13C) at Monash Biomedical Imaging. Each volume comprised 34 coronal slices of 3 mm thickness (TR = 2000 ms, TE = 30 ms), and 479 volumes were acquired per participant per session in an interleaved-ascending order.

The standard Q-learning model was fitted to the observed behavioural data, with Bayesian estimation of the parameters. Imaging analysis was conducted in SPM12 (Wellcome Department of Imaging Neuroscience, London, United Kingdom) in the Matlab (R2015b) environment. Pre-processing comprised slice-timing correction, realignment, normalization to MNI space based on the T1-weighted image, and smoothing with an 8 mm Gaussian kernel.

The frontostriatal circuit, including the putamen and medial orbitofrontal cortex (mOFC), was significantly more active in response to receiving reward and avoiding punishment than to receiving an aversive outcome and missing reward (p < 0.001, cluster-level FWE-corrected), while the right insula showed greater activation in response to missing rewards and receiving punishment. Compared to healthy participants, GD patients showed significantly lower activation in the left superior frontal gyrus and posterior cingulum for gain omission (p < 0.001).

The reward prediction error (PE) signal correlated positively with activation in several clusters spanning cortical and subcortical regions, including the striatum, cingulate, bilateral insula, thalamus and superior frontal cortex (p < 0.001, cluster-level FWE-corrected). GD patients showed a trend towards decreased reward PE responses in the right precentral gyrus extending to the left posterior cingulate compared to controls (p < 0.05, FWE-corrected). The aversive PE signal correlated negatively with activity in regions including the bilateral thalamus, hippocampus, insula and striatum (p < 0.001, FWE-corrected). Compared with controls, the GD group showed increased aversive PE activation in a cluster encompassing the right thalamus and right hippocampus, and in the right middle frontal gyrus extending to the right anterior cingulum (p < 0.005, FWE-corrected).

Through this reversal learning task, the study provides further support for dissociable brain circuits for distinct phases of reward and avoidance learning, and shows that OCD and GD are characterised by aberrant patterns of reward and avoidance processing.
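For readers unfamiliar with the modelling step, the outline of fitting a standard Q-learning model to trial-by-trial choices looks like this (a maximum-likelihood sketch for brevity; the study itself used Bayesian parameter estimation, and all names here are ours):

import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, choices, rewards):
    """Negative log-likelihood of a two-option Q-learning model;
    params = (learning rate alpha, inverse temperature beta)."""
    alpha, beta = params
    q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q)
        p /= p.sum()                    # softmax choice probabilities
        nll -= np.log(p[c] + 1e-12)     # likelihood of the observed choice
        q[c] += alpha * (r - q[c])      # value update from the prediction error
    return nll

def fit_q_model(choices, rewards):
    res = minimize(neg_log_lik, x0=[0.3, 3.0], args=(choices, rewards),
                   bounds=[(1e-3, 1.0), (1e-2, 20.0)])
    return res.x                        # per-participant (alpha, beta) estimates

The fitted model's trial-by-trial prediction errors are what get regressed against the fMRI signal to localize reward and aversive PE correlates.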

SeminarNeuroscienceRecording

Thinking Fast and Slow in AlphaZero and the Brain

Sebastian Bodenstein
Jun 16, 2020

In his bestseller 'Thinking, Fast and Slow', Daniel Kahneman popularized the idea that there are two fundamentally different processes of thought: a 'System 1' process that is unconscious and instinctive, and a 'System 2' process that is deliberative and requires conscious attention. There is a growing recognition that machine learning is mostly stuck at the 'System 1' level of cognition, and that moving to 'System 2' methods is key to solving long-standing challenges such as out-of-distribution generalization. In this talk, AlphaZero will be used as a case study of the power of combining 'System 1' and 'System 2' processes. The similarities and differences between AlphaZero and human learning will be explored, and lessons drawn for the future of machine learning.

SeminarNeuroscienceRecording

Deep learning for model-based RL

Timothy Lillicrap
Google DeepMind, University College London
Jun 11, 2020

Model-based approaches to control and decision making have long held the promise of being more powerful and data efficient than model-free counterparts. However, success with model-based methods has been limited to those cases where a perfect model can be queried. The game of Go was mastered by AlphaGo using a combination of neural networks and the MCTS planning algorithm. But planning required a perfect representation of the game rules. I will describe new algorithms that instead leverage deep neural networks to learn models of the environment which are then used to plan, and update policy and value functions. These new algorithms offer hints about how brains might approach planning and acting in complex environments.
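The loop these algorithms share can be caricatured in a few lines (a toy with one-dimensional linear dynamics, entirely our own, not the actual agent): learn a model of the environment from interaction, then plan by rolling candidate action sequences forward in the learned model.

import numpy as np

def learn_model_then_plan(n_data=500, horizon=10, n_cands=200, seed=0):
    rng = np.random.default_rng(seed)
    A_true, B_true = 0.9, 0.5                    # environment, unknown to the agent
    # 1) collect transitions with random actions
    s, S, U, S2 = 0.0, [], [], []
    for _ in range(n_data):
        a = rng.uniform(-1, 1)
        s_next = A_true * s + B_true * a + 0.01 * rng.standard_normal()
        S.append(s); U.append(a); S2.append(s_next); s = s_next
    # 2) fit a dynamics model s' ~ A s + B a by least squares
    X = np.column_stack([S, U])
    A_hat, B_hat = np.linalg.lstsq(X, np.array(S2), rcond=None)[0]
    # 3) plan in the learned model: keep the action sequence whose
    #    imagined rollout ends closest to the goal
    goal, best, best_cost = 1.0, None, np.inf
    for u in rng.uniform(-1, 1, size=(n_cands, horizon)):
        s_im = 0.0
        for a in u:
            s_im = A_hat * s_im + B_hat * a      # rollout in imagination
        cost = (s_im - goal) ** 2
        if cost < best_cost:
            best, best_cost = u, cost
    return (A_hat, B_hat), best[0]               # model estimate and first planned action

Replacing the least-squares model with a deep network, and random shooting with value-guided search, gives the family of agents the talk describes.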

SeminarNeuroscience

Striatal circuits for reward learning and decision-making

Ilana Witten
Princeton University
Jun 10, 2020

How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens (NAc), which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex (PL) and midline regions of the thalamus (mTH). However, little is known about what is represented in PL or mTH neurons that project to NAc (PL-NAc and mTH-NAc). By comparing these inputs during a reinforcement learning task in mice, we discovered that i) PL-NAc preferentially represents actions and choices, ii) mTH-NAc preferentially represents cues, iii) choice-selective activity in PL-NAc is organized in sequences that persist beyond the outcome. Through computational modelling, we demonstrate that these sequences can support the neural implementation of temporal difference learning, a powerful algorithm to connect actions and outcomes across time. Finally, we test and confirm predictions of our circuit model by direct manipulation of PL-NAc neurons. Thus, we integrate experiment and modelling to suggest a neural solution for credit assignment.
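To see how choice-triggered sequences can bridge the delay between an action and its outcome, consider this reduction of the idea (our own toy): units tile the delay one after another, and the TD error carries the terminal reward backwards through the sequence until the earliest unit predicts the outcome at choice time.

import numpy as np

def td_over_sequence(n_episodes=500, seq_len=8, gamma=0.95, alpha=0.1):
    w = np.zeros(seq_len)                 # value weight of each sequence unit
    for _ in range(n_episodes):
        for t in range(seq_len):
            v = w[t]                      # only unit t is active at delay t
            if t < seq_len - 1:
                delta = gamma * w[t + 1] - v       # no reward during the delay
            else:
                delta = 1.0 - v                    # reward arrives at the outcome
            w[t] += alpha * delta         # TD update on the active unit
    return w    # converges to gamma ** (seq_len - 1 - t): value ramps toward outcome

Because each unit is active at a fixed delay after the choice, this is tabular TD(0), and the learned weights implement exactly the credit assignment across time that the circuit model proposes.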

SeminarNeuroscienceRecording

The geometry of abstraction in artificial and biological neural networks

Stefano Fusi
Columbia University
Jun 10, 2020

The curse of dimensionality plagues models of reinforcement learning and decision-making. The process of abstraction solves this by constructing abstract variables describing features shared by different specific instances, reducing dimensionality and enabling generalization in novel situations. We characterized neural representations in monkeys performing a task where a hidden variable described the temporal statistics of stimulus-response-outcome mappings. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training. This type of generalization requires a particular geometric format of neural representations. Neural ensembles in dorsolateral pre-frontal cortex, anterior cingulate cortex and hippocampus, and in simulated neural networks, simultaneously represented multiple hidden and explicit variables in a format reflecting abstraction. Task events engaging cognitive operations modulated this format. These findings elucidate how the brain and artificial systems represent abstract variables, variables critical for generalization that in turn confers cognitive flexibility.

SeminarNeuroscienceRecording

Spanning the arc between optimality theories and data

Gasper Tkacik
Institute of Science and Technology Austria
Jun 1, 2020

Ideas about optimization are at the core of how we approach biological complexity. Quantitative predictions about biological systems have been successfully derived from first principles in the context of efficient coding, metabolic and transport networks, evolution, reinforcement learning, and decision making, by postulating that a system has evolved to optimize some utility function under biophysical constraints. Yet as normative theories become increasingly high-dimensional and optimal solutions stop being unique, it becomes progressively harder to judge whether theoretical predictions are consistent with, or "close to", data. I will illustrate these issues using efficient coding applied to simple neuronal models as well as to a complex and realistic biochemical reaction network. As a solution, we developed a statistical framework that smoothly interpolates between ab initio optimality predictions and Bayesian parameter inference from data, while also permitting statistically rigorous tests of optimality hypotheses.

ePoster

Adaptive brain-computer interfaces based on error-related potentials and reinforcement learning

Aline Xavier Fidencio, Christian Klaes, Ioannis Iossifidis

Bernstein Conference 2024

ePoster

How Do Bees See the World? A (Normative) Deep Reinforcement Learning Model for Insect Navigation

Stephan Lochner, Andrew Straw

Bernstein Conference 2024

ePoster

Competition and integration of sensory signals in a deep reinforcement learning agent

Sandhiya Vijayabaskaran, Sen Cheng

Bernstein Conference 2024

ePoster

Controversial Opinions on Model Based and Model Free Reinforcement Learning in the Brain

Felix Grün, Ioannis Iossifidis

Bernstein Conference 2024

ePoster

Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron

Christian Schmid, James Murray

Bernstein Conference 2024

ePoster

Neuromodulated online cognitive maps for reinforcement learning

Krubeal Danieli, Mikkel Lepperød, Marianne Fyhn

Bernstein Conference 2024

ePoster

Automatic Task Decomposition using Compositional Reinforcement Learning

COSYNE 2022

ePoster

Continual Reinforcement Learning with Multi-Timescale Successor Features

COSYNE 2022

ePoster

Deep Reinforcement Learning mimics Neural Strategies for Limb Movements

COSYNE 2022

ePoster

Energy efficient reinforcement learning as a matter of life and death

COSYNE 2022

ePoster

Integrating deep reinforcement learning agents with the C. elegans nervous system

COSYNE 2022

ePoster

Linking tonic dopamine and biased value predictions in a biologically inspired reinforcement learning model

COSYNE 2022

ePoster

Soft-actor-critic for model-free reinforcement learning of eye saccade control

COSYNE 2022

ePoster

A striatal probabilistic population code for reward underlies distributional reinforcement learning

COSYNE 2022

ePoster

Time cell encoding in deep reinforcement learning agents depends on mnemonic demands

COSYNE 2022

ePoster

What do meta-reinforcement learning networks learn in two-stage decision-making?

COSYNE 2022

ePoster

Controlling human cortical and striatal reinforcement learning with meta prediction error

Jae Hoon Shin, Jee Hang Lee, Sang Wan Lee

COSYNE 2023

ePoster

Cortical dopamine enables deep reinforcement learning and leverages dopaminergic heterogeneity

Jack Lindsey & Ashok Litwin-Kumar

COSYNE 2023

ePoster

Language emergence in reinforcement learning agents performing navigational tasks

Tobias Wieczorek, Maximilian Eggl, Tatjana Tchumatchenko, Carlos Wert Carvajal

COSYNE 2023

ePoster

Modelling ecological constraints on visual processing with deep reinforcement learning

Sacha Sokoloski, Jure Majnik, Thomas Euler, Philipp Berens

COSYNE 2023

ePoster

Reinforcement learning at multiple timescales in biological and artificial neural networks

Paul Masset, Pablo Tano, Athar Malik, HyungGoo Kim, Pol Bech, Alexandre Pouget, Naoshige Uchida

COSYNE 2023

ePoster

Two types of locus coeruleus norepinephrine neurons drive reinforcement learning

Zhixiao Su & Jeremiah Cohen

COSYNE 2023

ePoster

Violations of transitivity disrupt relational inference in humans and reinforcement learning models

Thomas Graham & Bernhard Spitzer

COSYNE 2023

ePoster

Brain-like neural dynamics for behavioral control develop through reinforcement learning

Olivier Codol, Nanda H Krishna, Guillaume Lajoie, Matthew G. Perich

COSYNE 2025

ePoster

Correctness is its own reward: bootstrapping error codes in self-guided reinforcement learning

Ziyi Gong, Fabiola Duarte Ortiz, Richard Mooney, John Pearson

COSYNE 2025

ePoster

Deep reinforcement learning trains agents to track odor plumes with active sensing

Lawrence Jianqiao Hu, Elliott Abe, Harsha Gurnani, Daniel Sitonic, Floris van Breugel, Edgar Y. Walker, Bing Brunton

COSYNE 2025

ePoster

Dual-Model Framework for Cerebellar Function: Integrating Reinforcement Learning and Adaptive Control

Carlos Stein N Brito, Daniel McNamee

COSYNE 2025

ePoster

A GPU-Accelerated Deep Reinforcement Learning Pipeline for Simulating Animal Behavior

Charles Zhang, Elliott Abe, Jason Foat, Bing Brunton, Talmo Pereira, Bence Olveczky, Emil Warnberg

COSYNE 2025

ePoster

Humans forage for reward in classic reinforcement learning tasks

Meriam Zid, Veldon-James Laurie, Alix Levine-Champagne, Akram Shourkeshti, Dameon Harrell, Alexander B Herman, Becket Ebitz

COSYNE 2025

ePoster

Intracranial recordings uncover neuronal dynamics of multidimensional reinforcement learning.

Christina Maher, Salman Qasim, Lizbeth Nunez Martinez, Angela Radulescu, Ignacio Saez

COSYNE 2025

ePoster

Inverse reinforcement learning with switching rewards and history dependency for studying behaviors

Jingyang Ke, Feiyang Wu, Jiyi Wang, Jeffrey Markowitz, Anqi Wu

COSYNE 2025

ePoster

Selective representation of reinforcement learning variables in subpopulations of the external globus pallidus

Lars Rollik, Marcus Stephenson-Jones

COSYNE 2025

ePoster

Acquiring musculoskeletal skills with curriculum-based reinforcement learning

Alberto Chiappa, Pablo Tano, Nisheet Patel, Abigaïl Ingster, Alexandre Pouget, Alexander Mathis

FENS Forum 2024

ePoster

Modeling the sensorimotor system with deep reinforcement learning

Alessandro Marin Vargas, Alberto Silvio Chiappa, Alexander Mathis

FENS Forum 2024