Reinforcement Learning
Gatsby Computational Neuroscience Unit
4-Year PhD Programme in Theoretical Neuroscience and Machine Learning. Call for Applications! Deadline: 13 November 2022.

The Gatsby Computational Neuroscience Unit is a leading research centre focused on theoretical neuroscience and machine learning. We study (un)supervised and reinforcement learning; inference, coding and neural dynamics; Bayesian and kernel methods; deep learning; with applications to the analysis of perceptual processing and cognition, neural data, signal and image processing, machine vision, network data and nonparametric hypothesis testing. The unit provides a unique opportunity for a critical mass of theoreticians to interact closely with one another and with researchers at the Sainsbury Wellcome Centre for Neural Circuits and Behaviour (SWC), the Centre for Computational Statistics and Machine Learning (CSML) and related UCL departments such as Computer Science; Statistical Science; Artificial Intelligence; the ELLIS Unit at UCL; Neuroscience; and the nearby Alan Turing and Francis Crick Institutes.

Our PhD programme provides a rigorous preparation for a research career. Students complete a 4-year PhD in either machine learning or theoretical and computational neuroscience, with minor emphasis in the complementary field. Courses in the first year provide a comprehensive introduction to both fields and systems neuroscience. Students are encouraged to work and interact closely with SWC/CSML researchers to take advantage of this uniquely multidisciplinary research environment. Full funding is available regardless of nationality. The unit also welcomes applicants who have secured or are seeking funding from other sources.

To apply, please visit www.ucl.ac.uk/gatsby/study-and-work/phd-programme
Dr. Tatsuo Okubo
We are a new group at the Chinese Institute for Brain Research (CIBR), Beijing, which focuses on applying modern data science and machine learning tools to neuroscience data. We collaborate with various labs within CIBR to develop models and analysis pipelines that accelerate neuroscience research. We are looking for enthusiastic and talented machine learning engineers and data scientists to join this effort. Example projects include (but are not limited to) extracting hidden states from population neural activity, automating behavioral classification from videos, and segmenting neurons from confocal images using deep learning.
Dr Flavia Mancini
This is an opportunity for a highly creative and skilled pre-doctoral Research Assistant to join the dynamic and multidisciplinary research environment of the Computational and Biological Learning research group (https://www.cbl-cambridge.org/), Department of Engineering, University of Cambridge. We are looking for a Research Assistant to work on projects related to statistical learning and contextual inference in the human brain. We have a particular focus on the learning of aversive states, as this has strong clinical significance for chronic pain and mental health disorders. The RA will be supervised by Dr Flavia Mancini (MRC Career Development fellow and Head of the Nox Lab, www.noxlab.org), and is expected to collaborate with theoretical and experimental colleagues in Cambridge, Oxford and abroad. The post holder will be located in central Cambridge, Cambridgeshire, UK. As a general approach, we combine statistical learning tasks in humans and computational modelling (using Bayesian inference, reinforcement learning, deep learning and neural networks) with neuroimaging methods (including 7T fMRI). The successful candidate will strengthen this approach and be responsible for designing experiments, collecting behavioural and fMRI data, and analysing them using computational modelling techniques. The key responsibilities and duties are: ideating and conducting research studies on statistical/aversive learning, combining behavioural tasks and computational modelling (using Bayesian inference, reinforcement learning, deep learning and/or neural networks) with fMRI in healthy volunteers and chronic pain patients; disseminating research findings; and maintaining and developing technical skills to expand their scientific potential. More info and to apply: https://www.jobs.cam.ac.uk/job/35905/
Erik C. Johnson
The Intelligent Systems Center at JHU/APL is an interdisciplinary research center for neuroscientists, AI researchers, and roboticists. Please see the individual listings for specific postings and application instructions. Postings for Neuroscience-Inspired AI researchers and Computational Neuroscience researchers may also be posted soon.
https://prdtss.jhuapl.edu/jobs/senior-neural-decoding-researcher-2219
https://prdtss.jhuapl.edu/jobs/senior-reinforcement-learning-researcher-615
https://prdtss.jhuapl.edu/jobs/senior-computer-vision-researcher-2242
https://prdtss.jhuapl.edu/jobs/artificial-intelligence-software-developer-2255
Francisco Pereira
The Machine Learning Team at the National Institute of Mental Health (NIMH) in Bethesda, MD, has an open position for a machine learning research scientist. The NIMH is the leading federal agency for research on mental disorders and neuroscience, and part of the National Institutes of Health (NIH). Our mission is to help NIMH scientists use machine learning methods to address a diverse set of research problems in clinical and cognitive psychology and neuroscience. These range from identifying biomarkers for aiding diagnoses to creating and testing models of mental processes in healthy subjects. Our overarching goal is to use machine learning to improve every aspect of the scientific effort, from helping discover or develop theories to generating actionable results. For more information, please refer to the full ad https://nih-fmrif.github.io/ml/index.html
Prof. Thilo Stadelmann
Senior Lecturer in Autonomous Learning Systems and Reinforcement Learning (50-100%, incl. responsibility in research & leadership), serving as head of a new research group and member of the Centre Board.
Prof. Chin-Teng Lin
Become a postdoctoral researcher or PhD student at the Human-centric AI Centre (HAI) at the University of Technology Sydney, Australia and work in frontier research in the areas of deep machine learning, trusted AI, trustworthy human-autonomy teaming, natural BCIs, reliable GPT, extended reality and swarm intelligence. We are seeking applicants for PhD scholarships and postdoctoral research positions with prior experience and skills in some of the following areas: machine learning fundamentals and generative models, deep learning optimisation and reinforcement learning, brain-computer interfaces, drones and robotics, trusted AI, extended reality.
Prof. Dr.-Ing. Marcus Magnor
The job is a W3 Full Professorship for Artificial Intelligence in Interactive Systems at Technische Universität Braunschweig. The role involves expanding the research area of data-driven methods for interactive and intelligent systems at TU Braunschweig and strengthening the focal points 'Data Science' and 'Reliability' of the Department of Computer Science. The position holder is expected to have a strong background in Computer Science with a focus on Artificial Intelligence/Machine Learning, specifically in the areas of Dependable AI and Explainable AI. The role also involves teaching topic-related courses in the areas of Artificial Intelligence and Machine Learning to complement the Bachelor's and Master's degree programmes of the Department of Computer Science.
Jun.-Prof. Dr.-Ing. Rania Rayyes
The main focus of this position is to develop novel AI systems and methods for robot applications: dexterous robot grasping, human-robot learning, and transfer learning (efficient online learning). The role offers close cooperation with other institutes, universities, and numerous industrial partners; a self-determined development environment for the candidate's own research topics, with active support for the doctoral research project; flexible working hours; and work in a young, interdisciplinary research team.
N/A
The Research Training Group 2853 “Neuroexplicit Models of Language, Vision, and Action” is looking for 3 PhD students and 1 postdoc. Neuroexplicit models combine neural and human-interpretable (“explicit”) models in order to overcome the limitations that each model class has separately. They include neurosymbolic models, which combine neural and symbolic models, but also e.g. combinations of neural and physics-based models. In the RTG, we will improve the state of the art in natural language processing (“Language”), computer vision (“Vision”), and planning and reinforcement learning (“Action”) through the use of neuroexplicit models and investigate the cross-cutting design principles of effective neuroexplicit models (“Foundations”).
N/A
1) Lecturer/Senior Lecturer (Assoc/Asst Prof) in Machine Learning: The University of Manchester is making a strategic investment in fundamentals of AI, to complement its existing strengths in AI applications across several prominent research fields in the University. Applications are welcome in any area of the fundamentals of machine learning, in particular probabilistic modelling, deep learning, reinforcement learning, causal modelling, human-in-the-loop ML, explainable AI, ethics, privacy and security. This position is meant to contribute to machine learning methodologies and not purely to their applications. You will be located in the Department of Computer Science and, in addition to the new centre for Fundamental AI research, you will belong to a large community of machine learning, data science and AI researchers.

2) Programme Manager – Centre for AI Fundamentals: The University of Manchester is seeking to appoint an individual with a strategic mindset and a track record of building and leading collaborative relationships and professional networks, expertise in a domain ideally related to artificial intelligence, excellent communication and interpersonal skills, experience in managing high-performing teams, and demonstrable ability to support the preparation of large, complex grant proposals to take up the role of Programme Manager for the Centre for AI Fundamentals. The successful candidate will play a major role in developing and shaping the Centre, working closely with its Director to grow the Centre and plan and deliver an exciting programme of activities, including leading key science translational activity and development of use cases in the Centre’s key domains, partnership development, bid writing, resource management, impact and public engagement strategies.
Prof Zoe Kourtzi
Post-doctoral position in Cognitive Computational Neuroscience at the Adaptive Brain Lab. The role involves combining high field brain imaging (7T fMRI, MR Spectroscopy), electrophysiology (EEG), computational modelling (machine learning, reinforcement learning) and interventions (TMS, tDCS, pharmacology) to understand network dynamics for learning and brain plasticity. The research programme bridges work across scales (local circuits, global networks) and species (humans, rodents) to uncover the neurocomputations that support learning and brain plasticity.
Quentin Huys
The aim of the position is to establish a thorough computational modelling framework for cognitive research in mental health settings, focusing in particular on longitudinal data, such as changes due to treatment or over the course of development.
Cassio de Campos
We are looking for a highly motivated and skilled PhD candidate to work in the area of Reinforcement Learning (broadly speaking) in the Uncertainty in AI group of TU Eindhoven, The Netherlands. It is a full-time, formally employed position with a competitive salary. TU Eindhoven is an English-language university.
N/A
The AI Department of the Donders Centre for Cognition (DCC), embedded in the Donders Institute for Brain, Cognition and Behaviour, and the School of Artificial Intelligence at Radboud University Nijmegen are looking for a researcher in reinforcement learning with an emphasis on safety and robustness, an interest in natural computing, and an interest in applications in neurotechnology and other domains such as robotics, healthcare and/or sustainability. You will be expected to perform top-quality research in (deep) reinforcement learning, actively contribute to the DBI2 consortium, interact and collaborate with other researchers and specialists in academia and/or industry, and be an inspiring member of our staff with excellent communication skills. You are also expected to engage with students through teaching and the supervision of master's projects, not exceeding 20% of your time.
I-Chun Lin
The Gatsby Unit seeks to appoint, at any academic rank, a new principal investigator with an outstanding record of research achievement and an innovative research programme in theoretical neuroscience or machine learning. In theoretical neuroscience, we are particularly interested in candidates who focus on the mathematical underpinnings of adaptive intelligent behaviour in animals, or who develop mathematical tools and models to understand how neural circuits and systems function. In machine learning, we seek candidates who focus on the mathematical foundations of learning from data and experience, addressing fundamental questions in probabilistic or statistical machine learning and its understanding; areas of particular interest include generative or probabilistic modelling, causal discovery, reinforcement learning, the theory of deep learning, and links between these areas and neuroscience or cognitive science.
Arlindo Oliveira
The Machine Learning and Knowledge Discovery Group of INESC-ID is looking for qualified applicants for three fully funded PhD student positions on topics related to the application of deep learning techniques to problems with societal impact. These positions are funded by a large-scale research project in responsible AI, supported by the Recovery and Resilience Facility. The successful candidates will pursue a PhD degree in Computer Science and Engineering at Instituto Superior Técnico, in Lisbon, Portugal. The broad topics of research are: 1) normalization of geolocation records using deep learning techniques; 2) high-confidence information retrieval and question answering; 3) application of reinforcement learning methods to the generation of efficient algorithms.
Samuel Kaski
The University of Manchester is making a strategic investment in fundamentals of AI, to complement its existing strengths in AI applications across several prominent research fields in the University, which give high-profile application and collaboration opportunities for the outcomes of fundamental AI research. The university is one of the most active partners of the national Alan Turing Institute and hosts 33 Turing Fellows as well as Fellows of the European Laboratory for Learning and Intelligent Systems (ELLIS) in the new ELLIS Unit Manchester. The university's ambition is to establish a leading AI centre at the intersection of these opportunities. The university has recently launched a Centre for AI Fundamentals and has already recruited four new academics to it. These two lectureships continue this series of positions in establishing the new Centre.
N/A
We are hiring in Helsinki, Finland, both to my research group and to the Finnish Center for Artificial Intelligence FCAI and ELLIS Unit Helsinki. These are two separate calls, and you can apply to one or both: 1. My research group, probabilistic machine learning. 2. Finnish Center for Artificial Intelligence FCAI and ELLIS Unit Helsinki. The positions can include a suitable combination of theoretical and methodological work and applications such as drug design, synthetic biology, economics, and neuroimaging.
Frank
Multiple open professor positions at the Technical University of Applied Sciences Würzburg-Schweinfurt in Computer Vision, Reinforcement Learning, and Dynamical Systems.
N/A
This project is about developing reinforcement-learning-based AI systems that directly interact with some segment of society. The applications include matching and other allocation problems. The research will be performed at the interface between reinforcement learning, social choice theory, Bayesian inference, mechanism design, differential privacy and algorithmic fairness. The research will have both a theoretical and a practical component, which will include some experiments with humans. However, a good theoretical background in probability, machine learning or game theory is necessary for all students. The positions are available from January 2024. The PhD lasts for 4 years and includes a small teaching component.
N/A
The project is about developing reinforcement-learning-based AI systems that directly interact with some segment of society. The applications include matching and other allocation problems. The research will be performed at the interface between reinforcement learning, social choice theory, Bayesian inference, mechanism design, differential privacy and algorithmic fairness. The research will have both a theoretical and a practical component, which will include some experiments with humans.
Haim Sompolinsky, Kenneth Blum
The Swartz Program at Harvard University seeks applicants for a postdoctoral fellow in theoretical and computational neuroscience. Based on a grant from the Swartz Foundation, a Swartz postdoctoral fellowship is available at Harvard University with a start date in the summer or fall of 2024. Postdocs join a vibrant group of theoretical and experimental neuroscientists plus theorists in allied fields at Harvard’s Center for Brain Science. The Center for Brain Science includes faculty doing research on a wide variety of topics, including neural mechanisms of rodent learning, decision-making, and sex-specific and social behaviors; reinforcement learning in rodents and humans; human motor control; behavioral and fMRI studies of human cognition; circuit mechanisms of learning and behavior in worms, larval flies, and larval zebrafish; circuit mechanisms of individual differences in flies and humans; rodent and fly olfaction; inhibitory circuit development; retinal circuits; and large-scale reconstruction of detailed brain circuitry.
Prof Mark Humphries
The Humphries lab at the University of Nottingham is seeking a postdoc to study the neural basis of foraging, in collaboration with the groups of Matthew Apps (Birmingham) and Nathan Lepora (Bristol). Whether choosing to leave one shop for another, switching TV programs, or seeking berries to eat, humans and other animals make innumerable stay-or-leave decisions, but how we make them is not well understood. The goal of this project is to develop new computational accounts of stay-or-leave decisions, and to use them to test hypotheses for how humans, primates, and rodents learn and make these decisions. The work will draw on and develop new reinforcement learning and accumulation (e.g. diffusion) models of decision-making. The Humphries group seeks fundamental insights into how the joint activity of neurons encodes actions in the world (https://www.humphries-lab.org). This post will join our developing research program into how humans and other animals learn to make the right decisions (e.g. https://doi.org/10.1101/2022.08.30.505807).
Understanding reward-guided learning using large-scale datasets
Understanding the neural mechanisms of reward-guided learning is a long-standing goal of computational neuroscience. Recent methodological innovations enable us to collect ever larger neural and behavioral datasets. This presents opportunities to achieve greater understanding of learning in the brain at scale, as well as methodological challenges. In the first part of the talk, I will discuss our recent insights into the mechanisms by which zebra finch songbirds learn to sing. Dopamine has been long thought to guide reward-based trial-and-error learning by encoding reward prediction errors. However, it is unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Longitudinal recordings of dopamine and bird songs reveal that dopamine activity is indeed consistent with encoding a reward prediction error during naturalistic learning. In the second part of the talk, I will talk about recent work we are doing at DeepMind to develop tools for automatically discovering interpretable models of behavior directly from animal choice data. Our method, dubbed CogFunSearch, uses LLMs within an evolutionary search process in order to "discover" novel models in the form of Python programs that excel at accurately predicting animal behavior during reward-guided learning. The discovered programs reveal novel patterns of learning and choice behavior that update our understanding of how the brain solves reinforcement learning problems.
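As background for the dopamine result described above, here is a minimal sketch of a reward prediction error in a trial-by-trial value update (Rescorla-Wagner style); the learning rate, reward probability, and variable names are illustrative assumptions, not the speakers' model:

```python
import numpy as np

# Minimal illustration (not the speakers' code): a reward prediction error (RPE)
# in a Rescorla-Wagner-style trial-by-trial update. The dopamine signal discussed
# in the talk is hypothesized to behave like `rpe` below.
alpha = 0.1                          # learning rate (illustrative value)
value = 0.0                          # current reward prediction
rng = np.random.default_rng(0)

for trial in range(200):
    reward = rng.binomial(1, 0.8)    # outcome of the trial (e.g. a good song rendition)
    rpe = reward - value             # reward prediction error
    value += alpha * rpe             # move the prediction toward the outcome

print(f"learned value ~ {value:.2f} (true reward rate 0.8)")
```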
Screen Savers: Protecting adolescent mental health in a digital world
In our rapidly evolving digital world, there is increasing concern about the impact of digital technologies and social media on the mental health of young people. Policymakers and the public are nervous. Psychologists are facing mounting pressure to deliver evidence that can inform policies and practices to safeguard both young people and society at large. However, research progress is slow while technological change is accelerating. My talk will reflect on this, both as a question of psychological science and of metascience. Digital companies have designed highly popular environments that differ in important ways from traditional offline spaces. By revisiting the foundations of psychology (e.g. development and cognition) and considering the impact of digital changes on theories and findings, we gain deeper insights into questions such as the following. (1) How do digital environments exacerbate developmental vulnerabilities that predispose young people to mental health conditions? (2) How do digital designs interact with cognitive and learning processes, formalised through computational approaches such as reinforcement learning or Bayesian modelling? However, we also need to face deeper questions about what it means to do science about new technologies and the challenge of keeping pace with technological advancements. Therefore, I discuss the concept of ‘fast science’, where, during crises, scientists might lower their standards of evidence to reach conclusions more quickly. Might psychologists want to take this approach in the face of technological change and looming concerns? The talk concludes with a discussion of such strategies for 21st-century psychology research in the era of digitalization.
Decision and Behavior
This webinar addressed computational perspectives on how animals and humans make decisions, spanning normative, descriptive, and mechanistic models. Sam Gershman (Harvard) presented a capacity-limited reinforcement learning framework in which policies are compressed under an information bottleneck constraint. This approach predicts pervasive perseveration, stimulus‐independent “default” actions, and trade-offs between complexity and reward. Such policy compression reconciles observed action stochasticity and response time patterns with an optimal balance between learning capacity and performance. Jonathan Pillow (Princeton) discussed flexible descriptive models for tracking time-varying policies in animals. He introduced dynamic Generalized Linear Models (Sidetrack) and hidden Markov models (GLM-HMMs) that capture day-to-day and trial-to-trial fluctuations in choice behavior, including abrupt switches between “engaged” and “disengaged” states. These models provide new insights into how animals’ strategies evolve under learning. Finally, Kenji Doya (OIST) highlighted the importance of unifying reinforcement learning with Bayesian inference, exploring how cortical-basal ganglia networks might implement model-based and model-free strategies. He also described Japan’s Brain/MINDS 2.0 and Digital Brain initiatives, aiming to integrate multimodal data and computational principles into cohesive “digital brains.”
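To make the policy-compression idea concrete, here is a small illustrative sketch (not Gershman's code) of a capacity-limited policy in which action probabilities trade reward against deviation from a state-independent default policy; the Q-values, default distribution, and resource parameter beta are toy assumptions:

```python
import numpy as np

# Illustrative sketch of a capacity-limited (compressed) policy: actions are drawn
# from a softmax that trades reward against deviation from a "default" policy.
Q = np.array([[1.0, 0.2, 0.0],       # toy Q(s, a) for 2 states x 3 actions
              [0.1, 0.9, 0.0]])
p_default = np.ones(3) / 3            # marginal (default) action distribution
beta = 2.0                            # resource parameter: low beta -> heavy compression

def compressed_policy(Q, p_default, beta):
    # pi(a|s) proportional to p_default(a) * exp(beta * Q(s, a))
    logits = np.log(p_default) + beta * Q
    logits -= logits.max(axis=1, keepdims=True)
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

print(compressed_policy(Q, p_default, beta))
# As beta -> 0 the policy collapses onto the default (perseverative) policy;
# as beta -> infinity it becomes greedy with respect to Q.
```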
Unmotivated bias
In this talk, I will explore how social affective biases arise even in the absence of motivational factors, as an emergent outcome of the basic structure of social learning. In several studies, we found that initial negative interactions with some members of a group can cause subsequent avoidance of the entire group, and that this avoidance perpetuates stereotypes. Additional cognitive modeling revealed that approach and avoidance behavior based on biased beliefs not only influences the evaluative (positive or negative) impressions of group members, but also shapes the depth of the cognitive representations available for learning about individuals. In other words, people have richer cognitive representations of members of groups that are not avoided, akin to individualized vs. group-level categories. I will end by presenting a series of multi-agent reinforcement learning simulations that demonstrate the emergence of these social-structural feedback loops in the development and maintenance of affective biases.
Contribution of computational models of reinforcement learning to neuroscience. Keywords: computational modeling, reward, learning, decision-making, conditioning, navigation, dopamine, basal ganglia, prefrontal cortex, hippocampus
Maintaining Plasticity in Neural Networks
Nonstationarity presents a variety of challenges for machine learning systems. One surprising pathology which can arise in nonstationary learning problems is plasticity loss, whereby making progress on new learning objectives becomes more difficult as training progresses. Networks which are unable to adapt in response to changes in their environment experience plateaus or even declines in performance in highly non-stationary domains such as reinforcement learning, where the learner must quickly adapt to new information even after hundreds of millions of optimization steps. The loss of plasticity manifests in a cluster of related empirical phenomena which have been identified by a number of recent works, including the primacy bias, implicit under-parameterization, rank collapse, and capacity loss. While this phenomenon is widely observed, it is still not fully understood. This talk will present exciting recent results which shed light on the mechanisms driving the loss of plasticity in a variety of learning problems and survey methods to maintain network plasticity in non-stationary tasks, with a particular focus on deep reinforcement learning.
A recurrent network model of planning predicts hippocampal replay and human behavior
When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as 'rollouts'. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by, and in turn adaptively affect, prefrontal dynamics.
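To illustrate the rollout idea in the abstract above, here is a schematic sketch (not the paper's recurrent network) of planning by sampling imagined action sequences from the agent's own policy through an internal model; the toy chain environment, rollout depth, and untrained policy are assumptions for illustration only:

```python
import numpy as np

# Conceptual sketch of planning via policy "rollouts" through an internal model.
rng = np.random.default_rng(1)
n_states, n_actions, goal = 5, 2, 4          # toy chain world; action 1 moves right, 0 moves left

def model_step(s, a):                        # the agent's internal model of the environment
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == goal else 0.0
    return s_next, reward

policy = np.full((n_states, n_actions), 0.5)  # current (untrained) stochastic policy

def rollout_return(s, first_action, depth=4):
    """Imagine a trajectory that starts with `first_action`, then follows the policy."""
    total, a = 0.0, first_action
    for _ in range(depth):
        s, r = model_step(s, a)
        total += r
        a = rng.choice(n_actions, p=policy[s])
    return total

s = 0
scores = [np.mean([rollout_return(s, a) for _ in range(20)]) for a in range(n_actions)]
print("estimated return per candidate first action:", np.round(scores, 2))
# Planning favors the first action that moves toward the goal, even before learning.
```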
Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), which means they signal the difference between the expected future rewards and the actual rewards. The prominence of the TD theory arises from the observation that the firing properties of dopaminergic neurons in the ventral tegmental area appear similar to those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We propose instead an alternative theoretical framework, termed FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature-specific representations of time are learned, allowing neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX, dopamine acts as an instructive signal that helps build temporal models of the environment. FLEX is a general theoretical framework that has many possible biophysical implementations. To show that FLEX is a feasible approach, we present a specific, biophysically plausible model that implements the principles of FLEX. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
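For contrast with FLEX, the following toy sketch shows standard TD learning with the fixed temporal basis (complete serial compound) that the abstract argues is implausible; the trial length, cue and reward times, and learning rate are illustrative assumptions:

```python
import numpy as np

# Toy illustration (not the FLEX model) of TD learning with a FIXED temporal basis:
# each time step after cue onset has its own feature/weight, the assumption questioned above.
T, cue_t, reward_t = 20, 2, 12          # trial length, cue time, reward time
alpha, gamma = 0.3, 1.0
w = np.zeros(T)                         # one weight per post-cue time step

def value(t):
    return w[t - cue_t] if t >= cue_t else 0.0

for episode in range(300):
    for t in range(T - 1):
        r = 1.0 if t == reward_t else 0.0
        delta = r + gamma * value(t + 1) - value(t)   # TD reward prediction error
        if t >= cue_t:
            w[t - cue_t] += alpha * delta

# After training, the TD error at reward time shrinks and a positive error appears at
# cue onset: the classic fixed-basis account of dopamine responses.
```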
A recurrent network model of planning explains hippocampal replay and human behavior
When interacting with complex environments, humans can rapidly adapt their behavior to changes in task or context. To facilitate this adaptation, we often spend substantial periods of time contemplating possible futures before acting. For such planning to be rational, the benefits of planning to future behavior must at least compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where not only actions, but also planning, are controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences drawn from its own policy, which we refer to as 'rollouts'. Our results demonstrate that this agent learns to plan when planning is beneficial, explaining the empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded in a spatial navigation task, in terms of both their spatial statistics and their relationship to subsequent behavior. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by - and in turn adaptively affect - prefrontal dynamics.
Richly structured reward predictions in dopaminergic learning circuits
Theories from reinforcement learning have been highly influential for interpreting neural activity in the biological circuits critical for animal and human learning. Central among these is the identification of phasic activity in dopamine neurons as a reward prediction error signal that drives learning in basal ganglia and prefrontal circuits. However, recent findings suggest that dopaminergic prediction error signals have access to complex, structured reward predictions and are sensitive to more properties of outcomes than learning theories with simple scalar value predictions might suggest. Here, I will present recent work in which we probed the identity-specific structure of reward prediction errors in an odor-guided choice task and found evidence for multiple predictive “threads” that segregate reward predictions, and reward prediction errors, according to the specific sensory features of anticipated outcomes. Our results point to an expanded class of neural reinforcement learning algorithms in which biological agents learn rich associative structure from their environment and leverage it to build reward predictions that include information about the specific, and perhaps idiosyncratic, features of available outcomes, using these to guide behavior in even quite simple reward learning tasks.
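A minimal sketch of the "multiple predictive threads" idea may help: the learner keeps one prediction per anticipated outcome identity, each with its own prediction error. The two identities, learning rate, and update rule below are illustrative assumptions, not the study's model:

```python
import numpy as np

# Conceptual sketch of identity-specific reward prediction errors: one prediction
# per outcome feature instead of a single scalar value.
identities = ["banana", "grape"]            # illustrative outcome features
V = np.zeros(2)                             # one prediction per identity, for a single cue
alpha = 0.2

def update(outcome):
    global V
    rpe = outcome - V                       # identity-specific prediction errors
    V = V + alpha * rpe
    return rpe

for _ in range(30):
    update(np.array([1.0, 0.0]))            # cue repeatedly followed by "banana"

rpe = update(np.array([0.0, 1.0]))          # identity switches; total reward is unchanged
print(dict(zip(identities, np.round(rpe, 2))))
# Negative error on the expected identity, positive error on the surprising one,
# even though a purely scalar value prediction would register no error at all.
```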
Off-policy learning in the basal ganglia
I will discuss work with Jack Lindsey modeling reinforcement learning for action selection in the basal ganglia. I will argue that the presence of multiple brain regions, in addition to the basal ganglia, that contribute to motor control motivates the need for an off-policy basal ganglia learning algorithm. I will then describe a biological implementation of such an algorithm that predicts tuning of dopamine neurons to a quantity we call "action surprise," in addition to reward prediction error. In the same model, an implementation of learning from a motor efference copy also predicts a novel solution to the problem of multiplexing feedforward and efference-related striatal activity. The solution exploits the difference between D1 and D2-expressing medium spiny neurons and leads to predictions about striatal dynamics.
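The off-policy setting described above can be illustrated with a toy bandit in which actions come from a behavior policy that differs from the learner's own policy; the "action_surprise" variable below is only a stand-in (negative log-probability of the executed action under the learner's policy), not the talk's precise definition:

```python
import numpy as np

# Illustrative off-policy sketch: actions are generated by a behavior policy (other
# motor controllers), while the learner estimates values for its own target policy.
rng = np.random.default_rng(0)
behavior = np.array([0.6, 0.3, 0.1])        # fixed behavior policy, not the learner's
true_reward = np.array([0.2, 1.0, 0.0])
Q = np.zeros(3)
alpha, temperature = 0.1, 0.2

for step in range(2000):
    a = rng.choice(3, p=behavior)                      # off-policy action selection
    r = rng.normal(true_reward[a], 0.1)
    Q[a] += alpha * (r - Q[a])                         # per-action update, valid off-policy
    target_pi = np.exp(Q / temperature)
    target_pi /= target_pi.sum()
    action_surprise = -np.log(target_pi[a])            # large when the executed action is
                                                       # unlikely under the learner's own policy

print("learned Q:", np.round(Q, 2))          # values recovered despite the skewed behavior policy
print("last action surprise:", round(float(action_surprise), 2))
```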
Beyond Volition
Voluntary actions are actions that agents choose to make. Volition is the set of cognitive processes that implement such choice and initiation. These processes are often held to be essential to modern societies, because they form the cognitive underpinning for concepts of individual autonomy and individual responsibility. Nevertheless, psychology and neuroscience have struggled to define volition, and have also struggled to study it scientifically. Laboratory experiments on volition, such as those of Libet, have been criticised, often rather naively, as focussing exclusively on meaningless actions, and ignoring the factors that make voluntary action important in the wider world. In this talk, I will first review these criticisms, and then look at extending scientific approaches to volition in three directions that may enrich scientific understanding of volition. First, volition becomes particularly important when the range of possible actions is large and unconstrained, yet most experimental paradigms involve minimal response spaces. We have developed a novel paradigm for eliciting de novo actions through verbal fluency, and used this to estimate the elusive conscious experience of generativity. Second, volition can be viewed as a mechanism for flexibility, by promoting adaptation of behavioural biases. This view departs from the tradition of defining volition by contrasting internally-generated actions with externally-triggered actions, and instead links volition to model-based reinforcement learning. By using the context of competitive games to re-operationalise the classic Libet experiment, we identified a form of adaptive autonomy that allows agents to reduce biases in their action choices. Interestingly, this mechanism seems not to require explicit understanding and strategic use of action selection rules, in contrast to classical ideas about the relation between volition and conscious, rational thought. Third, I will consider volition teleologically, as a mechanism for achieving counterfactual goals through complex problem-solving. This perspective gives volition a key role in mediating between understanding and planning on the one hand, and instrumental action on the other. Taken together, these three cognitive phenomena of generativity, flexibility, and teleology may partly explain why volition is such an important cognitive function for the organisation of human behaviour and human flourishing. I will end by discussing how this enriched view of volition can relate to individual autonomy and responsibility.
Mapping learning and decision-making algorithms onto brain circuitry
In the first half of my talk, I will discuss our recent work on the midbrain dopamine system. The hypothesis that midbrain dopamine neurons broadcast an error signal for the prediction of reward is among the great successes of computational neuroscience. However, our recent results contradict a core aspect of this theory: that the neurons uniformly convey a scalar, global signal. I will review this work, as well as our new efforts to update models of the neural basis of reinforcement learning with our data. In the second half of my talk, I will discuss our recent findings of state-dependent decision-making mechanisms in the striatum.
Memory-enriched computation and learning in spiking neural networks through Hebbian plasticity
Memory is a key component of biological neural systems that enables the retention of information over a huge range of temporal scales, ranging from hundreds of milliseconds up to years. While Hebbian plasticity is believed to play a pivotal role in biological memory, it has so far been analyzed mostly in the context of pattern completion and unsupervised learning. Here, we propose that Hebbian plasticity is fundamental for computations in biological neural systems. We introduce a novel spiking neural network (SNN) architecture that is enriched by Hebbian synaptic plasticity. We experimentally show that our memory-equipped SNN model outperforms state-of-the-art deep learning mechanisms in a sequential pattern-memorization task, and demonstrates superior out-of-distribution generalization capabilities compared to these models. We further show that our model can be successfully applied to one-shot learning and classification of handwritten characters, improving over the state-of-the-art SNN model. We also demonstrate the capability of our model to learn associations for audio-to-image synthesis from spoken and handwritten digits. Our SNN model further presents a novel solution to a variety of cognitive question-answering tasks from a standard benchmark, achieving comparable performance to both memory-augmented ANN and SNN-based state-of-the-art solutions to this problem. Finally, we demonstrate that our model is able to learn from rewards on an episodic reinforcement learning task and attain a near-optimal strategy on a memory-based card game. Hence, our results show that Hebbian enrichment renders spiking neural networks surprisingly versatile in terms of their computational as well as learning capabilities. Since local Hebbian plasticity can easily be implemented in neuromorphic hardware, this also suggests that powerful cognitive neuromorphic systems can be built based on this principle.
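The core Hebbian mechanism can be sketched in a few lines: an outer-product weight update stores an association that can later be retrieved from a cue. This is far simpler than the spiking network in the abstract; the pattern dimension and learning rate are illustrative:

```python
import numpy as np

# Minimal sketch of Hebbian associative storage: Delta_W = eta * post x pre.
rng = np.random.default_rng(0)
dim = 50
eta = 1.0 / dim
W = np.zeros((dim, dim))

key = rng.choice([-1.0, 1.0], size=dim)        # cue pattern (e.g. a spoken digit code)
value = rng.choice([-1.0, 1.0], size=dim)      # associated pattern (e.g. an image code)

W += eta * np.outer(value, key)                # Hebbian association: post x pre

recalled = np.sign(W @ key)                    # read out the stored pattern from the cue
print("recall accuracy:", np.mean(recalled == value))   # 1.0 for a single stored pair
```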
Learning Relational Rules from Rewards
Humans perceive the world in terms of objects and relations between them. In fact, for any given pair of objects, there is a myriad of relations that apply to them. How does the cognitive system learn which relations are useful to characterize the task at hand? And how can it use these representations to build a relational policy to interact effectively with the environment? In this paper, we propose that this problem can be understood through the lens of a sub-field of symbolic machine learning called relational reinforcement learning (RRL). To demonstrate the potential of our approach, we build a simple model of relational policy learning based on a function approximator developed in RRL. We trained and tested our model in three Atari games that required considering an increasing number of potential relations: Breakout, Pong and Demon Attack. In each game, our model was able to select adequate relational representations and build a relational policy incrementally. We discuss the relationship between our model and models of relational and analogical reasoning, as well as its limitations and future directions of research.
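As a schematic of value learning over relational features (not the paper's RRL function approximator), the sketch below maps a few hypothetical Breakout-style relations to action values with a linear approximator and applies one Q-learning update:

```python
import numpy as np

# Schematic relational value estimation: the state is summarized by binary relations
# between objects, and a linear approximator maps those relations to action values.
rng = np.random.default_rng(0)

def relational_features(paddle_x, ball_x):
    return np.array([
        float(paddle_x < ball_x),      # left-of(paddle, ball)
        float(paddle_x > ball_x),      # right-of(paddle, ball)
        float(paddle_x == ball_x),     # aligned(paddle, ball)
    ])

n_actions = 3                                   # move left, stay, move right
W = rng.normal(0.0, 0.1, size=(n_actions, 3))   # one weight vector per action
alpha, gamma = 0.1, 0.9

def q_values(phi):
    return W @ phi

# One illustrative Q-learning update from a transition (phi, action, reward, phi_next).
phi, action, reward = relational_features(2, 5), 2, 0.0
phi_next = relational_features(3, 5)
td_error = reward + gamma * q_values(phi_next).max() - q_values(phi)[action]
W[action] += alpha * td_error * phi
print(np.round(W, 3))
```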
Learning in/about/from the basal ganglia
The basal ganglia are a collection of brain areas that are connected by a variety of synaptic pathways and are a site of significant reward-related dopamine release. These properties suggest a possible role for the basal ganglia in action selection, guided by reinforcement learning. In this talk, I will discuss a framework for how this function might be performed and computational results using an upward mapping to identify putative low-dimensional control ensembles that may be involved in tuning decision policy. I will also present some recent experimental results and theory, related to effects of extracellular ion dynamics, that run counter to the classical view of basal ganglia pathways and suggest a new interpretation of certain aspects of this framework. For those not so interested in the basal ganglia, I hope that the upward mapping approach and impact of extracellular ion dynamics will nonetheless be of interest!
Dissecting the role of accumbal D1 and D2 medium spiny neurons in information encoding
Nearly all motivated behaviors require the ability to associate outcomes with specific actions and make adaptive decisions about future behavior. The nucleus accumbens (NAc) is integrally involved in these processes. The NAc is a heterogeneous population primarily composed of D1 and D2 medium spiny projection neurons (MSNs) that are thought to have opposed roles in behavior, with D1 MSNs promoting reward and D2 MSNs promoting aversion. Here we examined what types of information are encoded by D1 and D2 MSNs using optogenetics, fiber photometry, and cellular-resolution calcium imaging. First, we showed that mice responded for optical self-stimulation of both cell types, suggesting that D2 MSN activation is not inherently aversive. Next, we recorded population and single-cell activity patterns of D1 and D2 MSNs during reinforcement as well as Pavlovian learning paradigms that allow dissociation of stimulus value, outcome, cue learning, and action. We demonstrated that D1 MSNs responded to the presence and intensity of unconditioned stimuli, regardless of value. Conversely, D2 MSNs responded to the prediction of these outcomes during specific cues. Overall, these results provide foundational evidence for the discrete aspects of information that are encoded within the NAc D1 and D2 MSN populations. These results will significantly enhance our understanding of the involvement of the NAc MSNs in learning and memory as well as how these neurons contribute to the development and maintenance of substance use disorders.
NaV Long-term Inactivation Regulates Adaptation in Place Cells and Depolarization Block in Dopamine Neurons
In behaving rodents, CA1 pyramidal neurons receive spatially tuned depolarizing synaptic input while the animal traverses a specific location within an environment, called the cell's place field. Midbrain dopamine neurons participate in reinforcement learning, and bursts of action potentials riding a depolarizing wave of synaptic input signal rewards and reward expectation. Interestingly, slice electrophysiology in vitro shows that both types of cells exhibit a pronounced reduction in firing rate (adaptation) and even cessation of firing during sustained depolarization. We included a five-state Markov model of NaV1.6 (for CA1) and NaV1.2 (for dopamine neurons), respectively, in computational models of these two types of neurons. Our simulations suggest that long-term inactivation of this channel is responsible for the adaptation in CA1 pyramidal neurons in response to triangular depolarizing current ramps. We also show that the differential contribution of slow inactivation in two subpopulations of midbrain dopamine neurons can account for their different dynamic ranges, as assessed by their responses to similar depolarizing ramps. These results suggest that long-term inactivation of the sodium channel is a general mechanism for adaptation.
Input and target-selective plasticity in sensory neocortex during learning
Behavioral experience shapes neural circuits, adding and subtracting connections between neurons that will ultimately control sensation and perception. We are using natural sensory experience to uncover basic principles of information processing in the cerebral cortex, with a focus on how sensory learning can selectively alter synaptic strength. I will discuss recent findings that differentiate reinforcement learning from sensory experience, showing rapid and selective plasticity of thalamic and inhibitory synapses within primary sensory cortex.
Why would we need Cognitive Science to develop better Collaborative Robots and AI Systems?
While classical industrial robots are mostly designed for repetitive tasks, assistive robots will be challenged by a variety of different tasks in close contact with humans. Here, learning through direct interaction with humans provides a potentially powerful tool for an assistive robot to acquire new skills and to incorporate prior human knowledge during the exploration of novel tasks. Moreover, an intuitive interactive teaching process may allow people without programming expertise to contribute to robotic skill learning and may help to increase acceptance of robotic systems in shared workspaces and everyday life. In this talk, I will discuss my recent research on interactive robot skill learning and the remaining challenges on the route to human-centered teaching of assistive robots. In particular, I will also discuss potential connections and overlap with cognitive science. The presented work covers learning a library of probabilistic movement primitives from human demonstrations, intention-aware adaptation of learned skills in shared workspaces, and multi-channel interactive reinforcement learning for sequential tasks.
Mice identify subgoal locations through an action-driven mapping process
Mammals instinctively explore and form mental maps of their spatial environments. Models of cognitive mapping in neuroscience mostly depict map-learning as a process of random or biased diffusion. In practice, however, animals explore spaces using structured, purposeful, sensory-guided actions. We have used threat-evoked escape behavior in mice to probe the relationship between ethological exploratory behavior and abstract spatial cognition. First, we show that in arenas with obstacles and a shelter, mice spontaneously learn efficient multi-step escape routes by memorizing allocentric subgoal locations. Using closed-loop neural manipulations to interrupt running movements during exploration, we next found that blocking runs targeting an obstacle edge abolished subgoal learning. We conclude that mice use an action-driven learning process to identify subgoals, and these subgoals are then integrated into an allocentric map-like representation. We suggest a conceptual framework for spatial learning that is compatible with the successor representation from reinforcement learning and sensorimotor enactivism from cognitive science.
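Since the abstract relates the findings to the successor representation, here is a brief sketch of how the SR can be learned with a TD-like update during exploration; the ring-shaped toy environment and parameters are assumptions for illustration:

```python
import numpy as np

# Sketch of the successor representation (SR): M[s, s'] is the expected discounted
# future occupancy of s' starting from s, learned with a TD-like update.
n_states, gamma, alpha = 6, 0.9, 0.1
M = np.eye(n_states)                     # SR matrix, initialized to identity
rng = np.random.default_rng(0)

state = 0
for step in range(5000):
    next_state = (state + rng.choice([1, -1])) % n_states   # random exploration on a ring
    onehot = np.eye(n_states)[state]
    td = onehot + gamma * M[next_state] - M[state]           # SR prediction error
    M[state] += alpha * td
    state = next_state

# With a reward vector R over states, values follow as V = M @ R, which is one way
# subgoal-containing states can acquire high value under the SR.
print(np.round(M, 2))
```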
NMC4 Short Talk: What can deep reinforcement learning tell us about human motor learning, and vice versa?
In the deep reinforcement learning (RL) community, motor control problems are usually approached from a reward-based learning perspective. However, humans are often believed to learn motor control through directed error-based learning. Within this learning setting, the control system is assumed to have access to exact error signals and their gradients with respect to the control signal. This is unlike reward-based learning, in which errors are assumed to be unsigned, encoding relative successes and failures. Here, we try to understand the relation between these two approaches, reward- and error-based learning, in the context of ballistic arm reaches. To do so, we test canonical (deep) RL algorithms on a well-known sensorimotor perturbation in neuroscience: mirror-reversal of visual feedback during arm reaching. This test leads us to propose a potentially novel RL algorithm, denoted as model-based deterministic policy gradient (MB-DPG). This RL algorithm draws inspiration from error-based learning to qualitatively reproduce human reaching performance under mirror-reversal. Next, we show that MB-DPG outperforms the other canonical (deep) RL algorithms on single- and multi-target ballistic reaching tasks, based on a biomechanical model of the human arm. Finally, we propose that MB-DPG may provide an efficient computational framework to help explain error-based learning in neuroscience.
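A toy example in the spirit of a model-based deterministic policy gradient (not the MB-DPG algorithm from the talk) may clarify the contrast with reward-based learning: the reach error is differentiated through a known, linear model of the task, and the mapping, target, and learning rate are illustrative assumptions:

```python
import numpy as np

# Toy sketch: a deterministic reach command is improved by back-propagating the
# signed error through a (here: known, linear) model, rather than from reward alone.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])             # "model": mapping from motor command to cursor position
target = np.array([0.5, -0.3])
u = np.zeros(2)                        # deterministic policy output (reach command)
lr = 0.2

for step in range(100):
    cursor = A @ u
    error = cursor - target
    grad_u = A.T @ (2 * error)         # gradient of squared error through the model
    u -= lr * grad_u

print("final reach error:", np.linalg.norm(A @ u - target))
# Under mirror reversal the true mapping flips (e.g. A[0, 0] = -1); if the internal
# model is not updated, this signed gradient points the wrong way, the kind of
# failure used to compare error-based and reward-based schemes.
```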
Playing StarCraft and saving the world using multi-agent reinforcement learning!
"This is my C-14 Impaler gauss rifle! There are many like it, but this one is mine!" - a terran marine. If you have never heard of a terran marine before, then you have probably missed out on playing the very engaging and entertaining strategy computer game StarCraft. However, don’t despair, because what we have in store might be even more exciting! In this interactive session, we will take you through, step by step, how to train a team of terran marines to defeat a team of marines controlled by the built-in game AI in StarCraft II. How will we achieve this? Using multi-agent reinforcement learning (MARL). MARL is a useful framework for building distributed intelligent systems. In MARL, multiple agents are trained to act as individual decision-makers of some larger system, while learning to work as a team. We will show you how to use Mava (https://github.com/instadeepai/Mava), a newly released research framework for MARL, to build a multi-agent learning system for StarCraft II. We will provide the necessary guidance, tools and background to understand the key concepts behind MARL, how to use Mava building blocks to build systems, and how to train a system from scratch. We will conclude the session by briefly sharing various exciting real-world application areas for MARL at InstaDeep, such as large-scale autonomous train navigation and circuit board routing. These are problems that become exponentially more difficult to solve as they scale. Finally, we will argue that many of humanity’s most important practical problems are reminiscent of the ones just described. These include, for example, the need for sustainable management of distributed resources under the pressures of climate change, efficient inventory control and supply routing in critical distribution networks, and robotic teams for rescue missions and exploration. We believe MARL has enormous potential to be applied in these areas and we hope to inspire you to get excited and interested in MARL and perhaps one day contribute to the field!
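For readers new to MARL, here is a framework-agnostic sketch of independent learners sharing a team reward in a tiny cooperative matrix game; this is not Mava code, and the payoff matrix and hyperparameters are illustrative:

```python
import numpy as np

# Framework-agnostic MARL sketch: two independent Q-learners receive a shared team
# reward and must learn to coordinate, the core idea behind multi-agent training.
rng = np.random.default_rng(0)
payoff = np.array([[1.0, 0.0],     # team reward indexed by (agent 1 action, agent 2 action)
                   [0.0, 5.0]])
Q = [np.zeros(2), np.zeros(2)]     # one Q-table per agent
alpha, eps = 0.1, 0.2

for episode in range(3000):
    actions = []
    for q in Q:
        if rng.random() < eps:
            actions.append(int(rng.integers(2)))    # explore
        else:
            actions.append(int(np.argmax(q)))       # exploit own Q-values
    reward = payoff[actions[0], actions[1]]
    for i in range(2):
        Q[i][actions[i]] += alpha * (reward - Q[i][actions[i]])

print([np.round(q, 2) for q in Q])  # both agents typically settle on the coordinated (1, 1) choice
```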
Network dynamics in the basal ganglia and possible implications for Parkinson’s disease
The basal ganglia are a collection of brain areas that are connected by a variety of synaptic pathways and are a site of significant reward-related dopamine release. These properties suggest a possible role for the basal ganglia in action selection, guided by reinforcement learning. In this talk, I will discuss a framework for how this function might be performed. I will also present some recent experimental results and theory that call for a re-evaluation of certain aspects of this framework. Next, I will turn to the changes in basal ganglia activity observed to occur with the dopamine depletion associated with Parkinson’s disease. I will discuss some of the potential functional implications of some of these changes and, if time permits, will conclude with some new results that focus on delta oscillations under dopamine depletion.
Tuts, a Talk and AGI!!
A panel discussion on "What might we still require to achieve AGI?", a set of Reinforcement Learning and Computer Vision domain tuts and a talk from George Konidaris
Higher cognitive resources for efficient learning
A central issue in reinforcement learning (RL) is the ‘curse of dimensionality’, which arises when the degrees of freedom far outnumber the training samples. In such circumstances, the learning process becomes too slow to be plausible. In the brain, higher cognitive functions (such as abstraction or metacognition) may be part of the solution by generating low-dimensional representations on which RL can operate. In this talk, I will discuss a series of studies in which we used functional magnetic resonance imaging (fMRI) and computational modeling to investigate the neuro-computational basis of efficient RL. We found that people can learn remarkably complex task structures non-consciously, but also that, intriguingly, metacognition appears tightly coupled to this learning ability. Furthermore, when people use an explicit (conscious) policy to select relevant information, learning is accelerated by abstractions. At the neural level, prefrontal cortex subregions are differentially involved in separate aspects of learning: dorsolateral prefrontal cortex pairs with metacognitive processes, while ventromedial prefrontal cortex pairs with valuation and abstraction. I will discuss the implications of these findings, in particular new questions on the function of metacognition in adaptive behavior and the link with abstraction.
Transforming task representations
Humans can adapt to a novel task on their first try. By contrast, artificial intelligence systems often require immense amounts of data to adapt. In this talk, I will discuss my recent work (https://www.pnas.org/content/117/52/32970) on creating deep learning systems that can adapt on their first try by exploiting relationships between tasks. Specifically, the approach is based on transforming a representation for a known task to produce a representation for the novel task, by inferring and then using a higher-order function that captures a relationship between the tasks. This approach can be interpreted as a type of analogical reasoning. I will show that task transformation can allow systems to adapt to novel tasks on their first try in domains ranging from card games, to mathematical objects, to image classification and reinforcement learning. I will discuss the analogical interpretation of this approach, an analogy between levels of abstraction within the model architecture that I refer to as homoiconicity, and what this work might suggest about using deep-learning models to infer analogies more generally.
On cognitive maps and reinforcement learning in large-scale animal behaviour
Bats are extreme aviators and amazing navigators. Many bat species nightly commute dozens of kilometres in search of food, and some bat species annually migrate over thousands of kilometres. Studying bats in their natural environment has always been extremely challenging because of their small size (mostly <50 g) and agile nature. We have recently developed novel miniature technology allowing us to GPS-tag small bats, thus opening a new window to document their behaviour in the wild. We have used this technology to track fruit-bat pups over 5 months from birth to adulthood. Following the bats’ full movement history allowed us to show that they use novel shortcuts, which are typical of cognitive-map-based navigation. In a second study, we examined how nectar-feeding bats make foraging decisions under competition. We show that by relying on a simple reinforcement learning strategy, the bats can divide the resource between them without aggression or communication. Together, these results demonstrate the power of the large-scale natural approach for studying animal behavior.
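A schematic sketch of the "simple reinforcement learning strategy" idea (not the study's actual model): each forager tracks the recent payoff of each flower patch, and shared nectar at crowded patches pushes competitors apart without communication. Patch numbers and parameters are illustrative:

```python
import numpy as np

# Toy competitive foraging sketch: two value-tracking foragers end up splitting
# the patches, because a shared (halved) payoff makes crowded patches less attractive.
rng = np.random.default_rng(0)
n_patches, alpha, eps = 2, 0.3, 0.1
V = [np.zeros(n_patches), np.zeros(n_patches)]     # value estimates for two bats

for night in range(500):
    choices = [int(np.argmax(v)) if rng.random() > eps else int(rng.integers(n_patches))
               for v in V]
    for bat, patch in enumerate(choices):
        reward = 1.0 / choices.count(patch)        # nectar is shared when both visit the same patch
        V[bat][patch] += alpha * (reward - V[bat][patch])

print("bat 0 prefers patch", int(np.argmax(V[0])), "| bat 1 prefers patch", int(np.argmax(V[1])))
```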
From function to cognition: New spectroscopic tools for studying brain neurochemistry in-vivo
In this seminar, I will present new methods in magnetic resonance spectroscopy (MRS) we’ve been working on in the lab. The talk will be divided into two parts. In the first, I will talk about neurochemical changes we observe in glutamate and GABA during various paradigms, including simple motor tasks and reinforcement learning. In the second part, I’ll present a new approach to MRS that focuses on measuring the relaxation times (T1, T2) of metabolites, which reflect changes to specific cellular microenvironments. I will explain why these can be exciting markers for studying several in-vivo pathologies, and also present some preliminary data from a cohort of mild cognitive impairment (MCI) patients, showing changes that correlate with cognitive decline.
Learning in pain: probabilistic inference and (mal)adaptive control
Pain is a major clinical problem affecting 1 in 5 people in the world. There are unresolved questions that urgently require answers to treat pain effectively, a crucial one being how the feeling of pain arises from brain activity. Computational models of pain consider how the brain processes noxious information and allow mapping neural circuits and networks to cognition and behaviour. To date, they have generally assumed two largely independent processes: perceptual and/or predictive inference, typically modelled as an approximate Bayesian process, and action control, typically modelled as a reinforcement learning process. However, inference and control are intertwined in complex ways, challenging the clarity of this distinction. I will discuss how they may comprise a parallel hierarchical architecture that combines pain inference, information-seeking, and adaptive value-based control. Finally, I will discuss whether and how these learning processes might contribute to chronic pain.
Mental Simulation, Imagination, and Model-Based Deep RL
Mental simulation—the capacity to imagine what will or what could be—is a salient feature of human cognition, playing a key role in a wide range of cognitive abilities. In artificial intelligence, the last few years have seen the development of methods which are analogous to mental models and mental simulation. In this talk, I will discuss recent methods in deep learning for constructing such models from data and learning to use them via reinforcement learning, and compare such approaches to human mental simulation. While a number of challenges remain in matching the capacity of human mental simulation, I will highlight some recent progress on developing more compositional and efficient model-based algorithms through the use of graph neural networks and tree search.
Choice engineering and the modeling of operant learning
Organisms modify their behavior in response to its consequences, a phenomenon referred to as operant learning. Contemporary modeling of this learning behavior is based on reinforcement learning algorithms. I will discuss some of the challenges that these models face, and propose a new approach to model selection that is based on testing the models' ability to engineer behavior. Finally, I will present the results of The Choice Engineering Competition – an academic competition that compared the efficacies of qualitative and quantitative models of operant learning in shaping behavior.
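To make the engineer-behavior criterion concrete, here is a minimal sketch in which a candidate reward-allocation rule is scored by how strongly it biases a simulated Q-learning agent with softmax choice toward a target alternative; the learner, the greedy allocation rule, and all parameter values below are illustrative stand-ins, not the models or constraints used in the actual competition.

    import numpy as np

    def simulate_engineered_choices(n_trials=100, n_rewards=25, alpha=0.3, beta=3.0,
                                    target=0, seed=0):
        # Toy 'choice engineering' loop: spend a fixed budget of rewards so as to
        # bias a simulated Q-learning agent toward the target option.
        rng = np.random.default_rng(seed)
        Q = np.zeros(2)                                    # values of the two alternatives
        rewards_left = n_rewards
        target_choices = 0
        for _ in range(n_trials):
            p = np.exp(beta * Q) / np.exp(beta * Q).sum()  # softmax choice probabilities
            choice = rng.choice(2, p=p)
            # Greedy allocation rule (illustrative): reward the target whenever it is
            # chosen and budget remains; never reward the other option.
            reward = 1.0 if (choice == target and rewards_left > 0) else 0.0
            rewards_left -= int(reward)
            Q[choice] += alpha * (reward - Q[choice])      # Q-learning update of the chosen option
            target_choices += int(choice == target)
        return target_choices / n_trials                   # engineering score: bias toward the target

    print(simulate_engineered_choices())

Under this framing, competing learning models would be compared by how well the reward schedules they prescribe maximize such a bias score in real learners.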
Peril, Prudence and Planning as Risk, Avoidance and Worry
Risk occupies a central role in both the theory and practice of decision-making. Although it is deeply implicated in many conditions involving dysfunctional behavior and thought, modern theoretical approaches to understanding and mitigating risk in either one-shot or sequential settings, which are derived largely from finance and economics, have yet to permeate fully the fields of neural reinforcement learning and computational psychiatry. I will discuss the use of dynamic and static versions of one prominent approach, namely conditional value-at-risk, to examine both the nature of risk avoidant choices, encompassing such things as justified gambler's fallacies, and the optimal planning that can lead to consideration of such choices, with implications for offline, ruminative, thinking.
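As a numerical anchor for the risk measure named above, the following minimal sketch computes a static conditional value-at-risk from sampled returns, framed so that lower returns are worse; the sampling distribution and the 5% level are illustrative assumptions, and the dynamic (nested) version discussed in the talk would re-apply such a measure recursively at each planning step rather than once over whole returns.

    import numpy as np

    def cvar(returns, alpha=0.1):
        # Static CVaR_alpha: the mean of the worst alpha-fraction of sampled returns.
        returns = np.sort(np.asarray(returns))
        k = max(1, int(np.ceil(alpha * len(returns))))
        return returns[:k].mean()

    samples = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=10_000)
    print(cvar(samples, alpha=0.05))   # expected return within the worst 5% of outcomes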
Navigation Turing Test: Toward Human-like RL
tbc
A machine learning way to analyse white matter tractography streamlines / Application of artificial intelligence in correcting motion artifacts and reducing scan time in MRI
1. Embedding is all you need: A machine learning way to analyse white matter tractography streamlines - Dr Shenjun Zhong, Monash Biomedical Imaging. Embedding white matter streamlines of various lengths into fixed-length latent vectors enables users to analyse them with general data mining techniques. However, finding a good embedding schema is still a challenging task, as the existing methods based on spatial coordinates rely on manually engineered features and/or labelled datasets. In this webinar, Dr Shenjun Zhong will discuss his novel deep learning model that identifies a latent space and solves the problem of streamline clustering without needing labelled data. Dr Zhong is a Research Fellow and Informatics Officer at Monash Biomedical Imaging. His research interests are sequence modelling, reinforcement learning and federated learning in the general medical imaging domain.
2. Application of artificial intelligence in correcting motion artifacts and reducing scan time in MRI - Dr Kamlesh Pawar, Monash Biomedical Imaging. Magnetic Resonance Imaging (MRI) is a widely used imaging modality in clinics and research. Although MRI is useful, it comes with the overhead of longer scan times compared to other medical imaging modalities. Longer scan times also make patients uncomfortable, and even subtle movements during the scan may result in severe motion artifacts in the images. In this seminar, Dr Kamlesh Pawar will discuss how artificial intelligence techniques can reduce scan time and correct motion artifacts. Dr Pawar is a Research Fellow at Monash Biomedical Imaging. His research interests include deep learning, MR physics, MR image reconstruction and computer vision.
Uncertainty in learning and decision making
Uncertainty plays a critical role in reinforcement learning and decision making. However, exactly how subjective uncertainty influences behaviour remains unclear. Multi-armed bandits are a useful framework to gain more insight into this. Paired with computational tools such as Kalman filters, they allow us to closely characterize the interplay between trial-by-trial value, uncertainty, learning, and choice. In this talk, I will present recent research where we also measured participants' visual fixations on the options in a multi-armed bandit task. The estimated value of each option, and the uncertainty in these estimates, influenced what subjects looked at in the period before making a choice and their subsequent choice, as did fixation itself. Uncertainty also determined how long participants looked at the obtained outcomes. Our findings clearly show the importance of uncertainty in learning and decision making.
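For readers unfamiliar with the Kalman-filter pairing mentioned above, here is a minimal sketch in which each bandit arm carries both a posterior mean (value) and a posterior variance (uncertainty) that are updated after each outcome; the priors, noise terms, and omission of a between-trial diffusion step for non-stationary rewards are simplifying assumptions, not the study's fitted model.

    import numpy as np

    def kalman_bandit_update(mean, var, choice, reward, obs_noise=1.0):
        # One trial of Kalman-filter value learning: only the chosen arm's
        # posterior mean and variance are updated.
        k = var[choice] / (var[choice] + obs_noise)   # Kalman gain (uncertainty-weighted learning rate)
        mean[choice] += k * (reward - mean[choice])   # prediction-error update of value
        var[choice] *= (1.0 - k)                      # uncertainty shrinks after each observation
        return mean, var

    mean, var = np.zeros(4), np.full(4, 5.0)          # prior value and uncertainty for 4 arms
    mean, var = kalman_bandit_update(mean, var, choice=2, reward=1.0)
    print(mean, var)

In a non-stationary task one would typically also add a small drift term to every arm's variance between trials, so that uncertainty about unchosen options grows over time.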
An inference perspective on meta-learning
While meta-learning algorithms are often viewed as algorithms that learn to learn, an alternative viewpoint frames meta-learning as inferring a hidden task variable from experience consisting of observations and rewards. From this perspective, learning to learn is learning to infer. This viewpoint can be useful in solving problems in meta-RL, which I’ll demonstrate through two examples: (1) enabling off-policy meta-learning, and (2) performing efficient meta-RL from image observations. I’ll also discuss how this perspective leads to an algorithm for few-shot image segmentation.
On climate change, multi-agent systems and the behaviour of networked control
Multi-agent reinforcement learning (MARL) has recently shown great promise as an approach to networked system control. Arguably, one of the most difficult and important tasks for which large scale networked system control is applicable is common-pool resource (CPR) management. Crucial CPRs include arable land, fresh water, wetlands, wildlife, fish stock, forests and the atmosphere, of which proper management is related to some of society’s greatest challenges such as food security, inequality and climate change. This talk will consist of three parts. In the first, we will briefly look at climate change and how it poses a significant threat to life on our planet. In the second, we will consider the potential of multi-agent systems for climate change mitigation and adaptation. And finally, in the third, we will discuss recent research from InstaDeep into better understanding the behaviour of networked MARL systems used for CPR management. More specifically, we will see how the tools from empirical game-theoretic analysis may be harnessed to analyse the differences in networked MARL systems. The results give new insights into the consequences associated with certain design choices and provide an additional dimension of comparison between systems beyond efficiency, robustness, scalability and mean control performance.
A journey through connectomics: from manual tracing to the first fully automated basal ganglia connectomes
The "mind of the worm", the first electron microscopy-based connectome of C. elegans, was an early sign of where connectomics is headed, followed by a long time of little progress in a field held back by the immense manual effort required for data acquisition and analysis. This changed over the last few years with several technological breakthroughs, which allowed increases in data set sizes by several orders of magnitude. Brain tissue can now be imaged in 3D up to a millimeter in size at nanometer resolution, revealing tissue features from synapses to the mitochondria of all contained cells. These breakthroughs in acquisition technology were paralleled by a revolution in deep-learning segmentation techniques, that equally reduced manual analysis times by several orders of magnitude, to the point where fully automated reconstructions are becoming useful. Taken together, this gives neuroscientists now access to the first wiring diagrams of thousands of automatically reconstructed neurons connected by millions of synapses, just one line of program code away. In this talk, I will cover these developments by describing the past few years' technological breakthroughs and discuss remaining challenges. Finally, I will show the potential of automated connectomics for neuroscience by demonstrating how hypotheses in reinforcement learning can now be tackled through virtual experiments in synaptic wiring diagrams of the songbird basal ganglia.
The geometry of abstraction in hippocampus and pre-frontal cortex
The curse of dimensionality plagues models of reinforcement learning and decision-making. The process of abstraction solves this by constructing abstract variables describing features shared by different specific instances, reducing dimensionality and enabling generalization in novel situations. Here we characterized neural representations in monkeys performing a task where a hidden variable described the temporal statistics of stimulus-response-outcome mappings. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training. This type of generalization requires a particular geometric format of neural representations. Neural ensembles in dorsolateral pre-frontal cortex, anterior cingulate cortex and hippocampus, and in simulated neural networks, simultaneously represented multiple hidden and explicit variables in a format reflecting abstraction. Task events engaging cognitive operations modulated this format. These findings elucidate how the brain and artificial systems represent abstract variables, variables critical for generalization that in turn confers cognitive flexibility.
E-prop: A biologically inspired paradigm for learning in recurrent networks of spiking neurons
Transformative advances in deep learning, such as deep reinforcement learning, usually rely on gradient-based learning methods such as backpropagation through time (BPTT) as a core learning algorithm. However, BPTT is not considered biologically plausible, since it requires propagating gradients backwards in time and across neurons. Here, we propose e-prop, a novel gradient-based learning method with local and online weight update rules for recurrent neural networks, and in particular recurrent spiking neural networks (RSNNs). As a result, e-prop has the potential to provide a substantial fraction of the power of deep learning to RSNNs. In this presentation, we will motivate e-prop from the perspective of recent insights in neuroscience and show how these have to be combined to form an algorithm for online gradient descent. The mathematical results will be supported by empirical evidence from supervised and reinforcement learning tasks. We will also discuss how limitations that are inherited from gradient-based learning methods, such as sample efficiency, can be addressed by considering an evolution-like optimization that enhances learning on particular task families. The emerging learning architecture can be used to learn tasks from a single demonstration, hence enabling one-shot learning.
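As a schematic illustration of the factorization underlying e-prop, the sketch below updates each weight as the product of two locally available quantities, an eligibility trace and a per-neuron learning signal, using a non-spiking leaky unit for brevity; the pseudo-derivative, the trace dynamics, and the learning signal here are placeholder assumptions, not the spiking-neuron derivation presented in the talk.

    import numpy as np

    def eprop_step(w, h, x, eligibility, learning_signal, lr=1e-3, decay=0.9):
        # One online, local weight update in the spirit of e-prop:
        # no gradients are propagated backwards in time or across neurons.
        pseudo_deriv = h * (1.0 - h)                                    # surrogate derivative of postsynaptic activation
        eligibility = decay * eligibility + np.outer(pseudo_deriv, x)   # low-pass eligibility trace per synapse
        w = w + lr * learning_signal[:, None] * eligibility             # learning signal gates the trace
        return w, eligibility

    n_pre, n_post = 5, 3
    w = np.zeros((n_post, n_pre))
    elig = np.zeros((n_post, n_pre))
    w, elig = eprop_step(w, h=np.full(n_post, 0.5), x=np.ones(n_pre),
                         eligibility=elig, learning_signal=np.array([0.1, -0.2, 0.05]))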
Working memory transforms goals into rewards
Humans continuously need to learn to make good choices – be it using a new video-conferencing setup, figuring out what questions to ask to successfully secure a reliable babysitter, or just selecting which location in a house is least likely to be interrupted by toddlers during work calls. However, the goals we seek to attain – such as using Zoom successfully – are often vaguely defined and previously unexperienced, and in that sense cannot be known by us to be rewarding. We hypothesized that learning to make good choices in such situations nevertheless leverages reinforcement learning processes, and that executive functions in general, and working memory in particular, play a crucial role in defining the reward function for arbitrary outcomes in such a way that they become reinforcing. I will show results from a novel behavioral protocol, as well as preliminary computational and imaging evidence supporting our hypothesis.
Reward foraging task and model-based analysis reveal how fruit flies learn the value of available options
Understanding what drives foraging decisions in animals requires careful manipulation of the value of available options while monitoring animal choices. Value-based decision-making tasks, in combination with formal learning models, have provided both an experimental and theoretical framework to study foraging decisions in lab settings. While these approaches have successfully been used to understand what drives choices in mammals, very little work has been done in fruit flies, even though fruit flies have served as a model organism for many complex behavioural paradigms. To fill this gap we developed a single-animal, trial-based decision-making task, where freely walking flies experienced optogenetic sugar-receptor neuron stimulation. We controlled the value of available options by manipulating the probabilities of optogenetic stimulation. We show that flies integrate the reward history of chosen options and forget the value of unchosen options. We further discover that flies assign higher values to rewards experienced early in the behavioural session, consistent with formal reinforcement learning models. Finally, we show that the probabilistic rewards affect the walking trajectories of flies, suggesting that accumulated value controls the navigation vector of flies in a graded fashion. These findings establish the fruit fly as a model organism to explore the genetic and circuit basis of value-based decisions.
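A minimal sketch of one update rule consistent with the two behavioural findings above (integration of reward history for the chosen option, forgetting of unchosen options); the decay-to-zero form and the parameter values are illustrative assumptions, not the fitted model from the study.

    def fly_value_update(values, choice, reward, alpha=0.3, forget=0.1):
        # Chosen option: value moves toward the received reward (reward-history integration).
        # Unchosen options: values decay, i.e. are gradually forgotten.
        for i, v in enumerate(values):
            if i == choice:
                values[i] = v + alpha * (reward - v)
            else:
                values[i] = (1.0 - forget) * v
        return values

    values = [0.0, 0.0]
    values = fly_value_update(values, choice=1, reward=1.0)
    print(values)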
Delineating Reward/Avoidance Decision Process in the Impulsive-compulsive Spectrum Disorders through a Probabilistic Reversal Learning Task
Impulsivity and compulsivity are behavioural traits that underlie many aspects of decision-making and form the characteristic symptoms of Obsessive Compulsive Disorder (OCD) and Gambling Disorder (GD). The neural underpinnings of reward and avoidance learning under the expression of these traits and symptoms are only partially understood. The present study combined behavioural modelling and neuroimaging to examine brain activity associated with critical phases of reward and loss processing in OCD and GD.
Forty-two healthy controls (HC), forty OCD and twenty-three GD participants were recruited to complete a two-session reinforcement learning (RL) task featuring a “probability switch (PS)” during imaging. In the final sample, 39 HC (20F/19M, 34 yrs ± 9.47), 28 OCD (14F/14M, 32.11 yrs ± 9.53) and 16 GD (4F/12M, 35.53 yrs ± 12.20) had both behavioural and imaging data available. Functional imaging was conducted using a 3.0-T SIEMENS MAGNETOM Skyra (syngo MR D13C) at Monash Biomedical Imaging. Each volume comprised 34 coronal slices of 3 mm thickness with a 2000 ms TR and 30 ms TE, and 479 volumes were acquired for each participant in each session in an interleaved-ascending manner.
A standard Q-learning model was fitted to the observed behavioural data, with Bayesian methods used for parameter estimation. Imaging analysis was conducted using SPM12 (Wellcome Department of Imaging Neuroscience, London, United Kingdom) in the Matlab (R2015b) environment. Pre-processing comprised slice timing, realignment, normalization to MNI space according to the T1-weighted image, and smoothing with an 8 mm Gaussian kernel.
The frontostriatal circuit, including the putamen and medial orbitofrontal cortex (mOFC), was significantly more active in response to receiving reward and avoiding punishment than to receiving an aversive outcome and missing reward (at 0.001 with FWE correction at cluster level), while the right insula showed greater activation in response to missing rewards and receiving punishment. Compared to healthy participants, GD patients showed significantly lower activation in the left superior frontal cortex and posterior cingulum (at 0.001) for gain omission.
The reward prediction error (PE) signal was positively correlated with activation in several clusters spanning cortical and subcortical regions, including the striatum, cingulate, bilateral insula, thalamus and superior frontal cortex (at 0.001 with FWE correction at cluster level). GD patients showed a trend towards decreased reward PE responses in the right precentral region extending to the left posterior cingulate compared to controls (at 0.05 with FWE correction). The aversive PE signal was negatively correlated with brain activity in regions including the bilateral thalamus, hippocampus, insula and striatum (at 0.001 with FWE correction). Compared with the control group, the GD group showed increased aversive PE activation in a cluster encompassing the right thalamus and right hippocampus, and in the right middle frontal cortex extending to the right anterior cingulum (at 0.005 with FWE correction).
Through this reversal learning task, the study provides further support for dissociable brain circuits underlying distinct phases of reward and avoidance learning, and shows that OCD and GD are characterised by aberrant patterns of reward and avoidance processing.
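As an illustration of the kind of model fit described above, the sketch below exposes the likelihood of a choice sequence under a standard Q-learning model with a softmax choice rule; the study used Bayesian parameter estimation, whereas this sketch only provides the likelihood one would place inside such an inference (shown here as a simple negative-log-likelihood evaluation), and the parameter values and toy data are illustrative.

    import numpy as np

    def q_learning_nll(params, choices, rewards, n_options=2):
        # Negative log-likelihood of observed choices under Q-learning with softmax choice.
        alpha, beta = params                       # learning rate, inverse temperature
        Q = np.zeros(n_options)
        nll = 0.0
        for c, r in zip(choices, rewards):
            logits = beta * Q
            p = np.exp(logits - logits.max())
            p /= p.sum()                           # softmax choice probabilities
            nll -= np.log(p[c] + 1e-12)
            Q[c] += alpha * (r - Q[c])             # update only the chosen option
        return nll

    choices = [0, 1, 1, 0, 1]                      # toy data for illustration
    rewards = [1, 0, 1, 1, 0]
    print(q_learning_nll((0.3, 3.0), choices, rewards))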
Deep reinforcement learning and its neuroscientific implications
The last few years have seen some dramatic developments in artificial intelligence research. What implications might these have for neuroscience? Investigations of this question have, to date, focused largely on deep neural networks trained using supervised learning, in tasks such as image classification. However, there is another area of recent AI work which has so far received less attention from neuroscientists, but which may have more profound neuroscientific implications: Deep reinforcement learning. Deep RL offers a rich framework for studying the interplay among learning, representation and decision-making, offering to the brain sciences a new set of research tools and a wide range of novel hypotheses. I’ll provide a high level introduction to deep RL, discuss some recent neuroscience-oriented investigations from my group at DeepMind, and survey some wider implications for research on brain and behavior.
Thinking Fast and Slow in AlphaZero and the Brain
In his bestseller 'Thinking, Fast and Slow', Daniel Kahneman popularized the idea that there are two fundamentally different processes of thought: a 'System 1' process that is unconscious and instinctive, and a 'System 2' process that is deliberative and requires conscious attention. There is a growing recognition that machine learning is mostly stuck at the 'System 1' level of cognition, and that moving to 'System 2' methods is key to solving long-standing challenges such as out-of-distribution generalization. In this talk, AlphaZero will be used as a case study of the power of combining 'System 1' and 'System 2' processes. The similarities and differences between AlphaZero and human learning will be explored, along with lessons for the future of machine learning.
Deep learning for model-based RL
Model-based approaches to control and decision making have long held the promise of being more powerful and data efficient than model-free counterparts. However, success with model-based methods has been limited to those cases where a perfect model can be queried. The game of Go was mastered by AlphaGo using a combination of neural networks and the MCTS planning algorithm. But planning required a perfect representation of the game rules. I will describe new algorithms that instead leverage deep neural networks to learn models of the environment which are then used to plan, and update policy and value functions. These new algorithms offer hints about how brains might approach planning and acting in complex environments.
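To ground the learn-a-model-then-plan idea, here is a minimal tabular sketch in which a learned deterministic transition table and reward table are queried by a short lookahead to select an action; agents of the kind described in the talk replace these tables with deep neural networks and the exhaustive lookahead with tree search, so everything below is a simplified stand-in rather than the actual algorithm.

    import numpy as np

    def plan_with_learned_model(T_hat, R_hat, state, depth=3, gamma=0.95):
        # Pick the action whose imagined return under the learned model is highest.
        n_states, n_actions = R_hat.shape

        def rollout_value(s, d):
            if d == 0:
                return 0.0
            # Greedy lookahead entirely inside the learned model (no environment queries).
            return max(R_hat[s, a] + gamma * rollout_value(T_hat[s, a], d - 1)
                       for a in range(n_actions))

        returns = [R_hat[state, a] + gamma * rollout_value(T_hat[state, a], depth - 1)
                   for a in range(n_actions)]
        return int(np.argmax(returns))

    rng = np.random.default_rng(0)
    T_hat = rng.integers(0, 4, size=(4, 2))   # toy learned next-state table
    R_hat = rng.random((4, 2))                # toy learned reward table
    print(plan_with_learned_model(T_hat, R_hat, state=0))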
Striatal circuits for reward learning and decision-making
How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens (NAc), which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex (PL) and midline regions of the thalamus (mTH). However, little is known about what is represented in PL or mTH neurons that project to NAc (PL-NAc and mTH-NAc). By comparing these inputs during a reinforcement learning task in mice, we discovered that i) PL-NAc preferentially represents actions and choices, ii) mTH-NAc preferentially represents cues, iii) choice-selective activity in PL-NAc is organized in sequences that persist beyond the outcome. Through computational modelling, we demonstrate that these sequences can support the neural implementation of temporal difference learning, a powerful algorithm to connect actions and outcomes across time. Finally, we test and confirm predictions of our circuit model by direct manipulation of PL-NAc neurons. Thus, we integrate experiment and modelling to suggest a neural solution for credit assignment.
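As a toy illustration of how sequential activity can carry temporal-difference learning, the sketch below runs TD(0) over a trial in which value is read out from sequentially active features, a schematic stand-in for the choice-selective sequences described above; the one-hot feature code, learning rate, and discount factor are assumptions for illustration only.

    import numpy as np

    def td_update_over_sequence(w, features, rewards, alpha=0.1, gamma=0.98):
        # TD(0) value learning: the TD error (a dopamine-like signal) assigns credit
        # to whichever features are active at each time step of the sequence.
        for t in range(len(rewards)):
            v_t = w @ features[t]
            v_next = w @ features[t + 1] if t + 1 < len(features) else 0.0
            delta = rewards[t] + gamma * v_next - v_t
            w += alpha * delta * features[t]
        return w

    T = 6
    features = [np.eye(T)[t] for t in range(T)]   # one feature active per time step (toy sequence)
    rewards = [0, 0, 0, 0, 0, 1.0]                # reward delivered at the end of the trial
    w = np.zeros(T)
    for _ in range(50):
        w = td_update_over_sequence(w, features, rewards)
    print(np.round(w, 2))                         # learned values ramp toward the rewarded time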
The geometry of abstraction in artificial and biological neural networks
The curse of dimensionality plagues models of reinforcement learning and decision-making. The process of abstraction solves this by constructing abstract variables describing features shared by different specific instances, reducing dimensionality and enabling generalization in novel situations. We characterized neural representations in monkeys performing a task where a hidden variable described the temporal statistics of stimulus-response-outcome mappings. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training. This type of generalization requires a particular geometric format of neural representations. Neural ensembles in dorsolateral pre-frontal cortex, anterior cingulate cortex and hippocampus, and in simulated neural networks, simultaneously represented multiple hidden and explicit variables in a format reflecting abstraction. Task events engaging cognitive operations modulated this format. These findings elucidate how the brain and artificial systems represent abstract variables, variables critical for generalization that in turn confers cognitive flexibility.
Spanning the arc between optimality theories and data
Ideas about optimization are at the core of how we approach biological complexity. Quantitative predictions about biological systems have been successfully derived from first principles in the context of efficient coding, metabolic and transport networks, evolution, reinforcement learning, and decision making, by postulating that a system has evolved to optimize some utility function under biophysical constraints. Yet as normative theories become increasingly high-dimensional and optimal solutions stop being unique, it becomes progressively harder to judge whether theoretical predictions are consistent with, or "close to", data. I will illustrate these issues using efficient coding applied to simple neuronal models as well as to a complex and realistic biochemical reaction network. As a solution, we developed a statistical framework which smoothly interpolates between ab initio optimality predictions and Bayesian parameter inference from data, while also permitting statistically rigorous tests of optimality hypotheses.
Adaptive brain-computer interfaces based on error-related potentials and reinforcement learning
Bernstein Conference 2024
How Do Bees See the World? A (Normative) Deep Reinforcement Learning Model for Insect Navigation
Bernstein Conference 2024
Competition and integration of sensory signals in a deep reinforcement learning agent
Bernstein Conference 2024
Controversial Opinions on Model Based and Model Free Reinforcement Learning in the Brain
Bernstein Conference 2024
Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron
Bernstein Conference 2024
Neuromodulated online cognitive maps for reinforcement learning
Bernstein Conference 2024
Automatic Task Decomposition using Compositional Reinforcement Learning
COSYNE 2022
Continual Reinforcement Learning with Multi-Timescale Successor Features
COSYNE 2022
Deep Reinforcement Learning mimics Neural Strategies for Limb Movements
COSYNE 2022
Energy efficient reinforcement learning as a matter of life and death
COSYNE 2022
Integrating deep reinforcement learning agents with the C. elegans nervous system
COSYNE 2022
Linking tonic dopamine and biased value predictions in a biologically inspired reinforcement learning model
COSYNE 2022
Soft-actor-critic for model-free reinforcement learning of eye saccade control
COSYNE 2022
A striatal probabilistic population code for reward underlies distributional reinforcement learning
COSYNE 2022
Time cell encoding in deep reinforcement learning agents depends on mnemonic demands
COSYNE 2022
What do meta-reinforcement learning networks learn in two-stage decision-making?
COSYNE 2022
Controlling human cortical and striatal reinforcement learning with meta prediction error
COSYNE 2023
Cortical dopamine enables deep reinforcement learning and leverages dopaminergic heterogeneity
COSYNE 2023
Language emergence in reinforcement learning agents performing navigational tasks
COSYNE 2023
Modelling ecological constraints on visual processing with deep reinforcement learning
COSYNE 2023
Reinforcement learning at multiple timescales in biological and artificial neural networks
COSYNE 2023
Two types of locus coeruleus norepinephrine neurons drive reinforcement learning
COSYNE 2023
Violations of transitivity disrupt relational inference in humans and reinforcement learning models
COSYNE 2023
Brain-like neural dynamics for behavioral control develop through reinforcement learning
COSYNE 2025
Correctness is its own reward: bootstrapping error codes in self-guided reinforcement learning
COSYNE 2025
Deep reinforcement learning trains agents to track odor plumes with active sensing
COSYNE 2025
Dual-Model Framework for Cerebellar Function: Integrating Reinforcement Learning and Adaptive Control
COSYNE 2025
A GPU-Accelerated Deep Reinforcement Learning Pipeline for Simulating Animal Behavior
COSYNE 2025
Humans forage for reward in classic reinforcement learning tasks
COSYNE 2025
Intracranial recordings uncover neuronal dynamics of multidimensional reinforcement learning.
COSYNE 2025
Inverse reinforcement learning with switching rewards and history dependency for studying behaviors
COSYNE 2025
Selective representation of reinforcement learning variables in subpopulations of the external globus pallidus
COSYNE 2025
Acquiring musculoskeletal skills with curriculum-based reinforcement learning
FENS Forum 2024
Modeling the sensorimotor system with deep reinforcement learning
FENS Forum 2024