Authors & Affiliations
Carlos Enrique Gutierrez, Jean Lienard, Benoît Girard, Hidetoshi Urakubo, Yuko Ishiwaka, Kenji Doya
Abstract
The basal ganglia (BG) play a crucial role in action-selection and reinforcement learning (RL), but how multiple nuclei, transmitters and receptors realize computations for reward-based learning is still unclear.
We built a topologically organized spiking BG model. Striatal medium spiny neurons (MSNs) were classified based on their expression of dopamine D1 and D2 receptors. We implemented spike-timing-dependent plasticity and two structural parameters: i) the asymmetry of connections between MSNs; and ii) the overlap between the direct and indirect pathways.
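As a minimal sketch of how an asymmetry parameter could shape lateral MSN connectivity, the Python example below scales down only the MSN-D1 to MSN-D2 connection probability. The function names, the linear scaling, and all numeric values are hypothetical and are not taken from the model. The overlap parameter ii) is not sketched because the abstract does not define how it is parameterized.

```python
import random

def lateral_connection_probs(base_p, asymmetry):
    """Connection probabilities between MSN types.

    Hypothetical parameterization: asymmetry in [0, 1] reduces only the
    D1 -> D2 probability; asymmetry = 1 removes those connections.
    """
    if not 0.0 <= asymmetry <= 1.0:
        raise ValueError("asymmetry must lie in [0, 1]")
    return {
        ("D1", "D1"): base_p,
        ("D1", "D2"): base_p * (1.0 - asymmetry),
        ("D2", "D1"): base_p,
        ("D2", "D2"): base_p,
    }

def sample_connections(types, probs, rng):
    """Draw directed (pre, post) index pairs, excluding self-connections."""
    edges = []
    for i, pre_type in enumerate(types):
        for j, post_type in enumerate(types):
            if i != j and rng.random() < probs[(pre_type, post_type)]:
                edges.append((i, j))
    return edges

def count_edges(edges, types, pre_type, post_type):
    """Count edges going from one MSN type to another."""
    return sum(1 for i, j in edges
               if types[i] == pre_type and types[j] == post_type)

rng = random.Random(0)                    # fixed seed for reproducibility
types = ["D1"] * 50 + ["D2"] * 50         # illustrative population sizes
probs = lateral_connection_probs(base_p=0.2, asymmetry=0.9)
edges = sample_connections(types, probs, rng)

d1_to_d2 = count_edges(edges, types, "D1", "D2")
d2_to_d1 = count_edges(edges, types, "D2", "D1")
print("D1->D2 edges:", d1_to_d2, "| D2->D1 edges:", d2_to_d1)
```

With a high asymmetry value, the sampled network contains far fewer MSN-D1 to MSN-D2 connections than MSN-D2 to MSN-D1 connections, which is the sparse regime the abstract describes.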
In action-selection simulations, we assumed two functional channels representing competing sensory inputs and actions. We activated two neighboring ensembles of cortical neurons and observed the responses in two adjacent MSN ensembles and in downstream nuclei.
In RL simulations, we investigated transient increases and decreases of dopamine in a generalization-discrimination task. In generalization learning (classical conditioning), upon selection of the preferred channel, reward was delivered as a dopamine burst, causing the potentiation of connections to MSN-D1. After several episodes, tests showed selection of the preferred channel for both stimuli.
In discrimination learning, the previously learned action selection upon a non-preferred channel triggered reward omission as a dopamine dip, causing the potentiation of cortical synapses to MSN-D2. After several episodes, the prediction was refined, producing the corresponding channel selection for each stimulus.
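The sign of these weight changes can be illustrated with a generic three-factor rule, in which a spike-timing eligibility trace is multiplied by the deviation of dopamine from baseline, with opposite signs for D1 and D2 receptors. This is a simplified Python sketch under stated assumptions, not the plasticity rule used in the model; the function names, time constant, learning rate, and dopamine levels are all hypothetical.

```python
import math

def stdp_eligibility(dt_ms, tau_ms=20.0):
    """Eligibility from one pre/post spike pair.

    dt_ms = t_post - t_pre. Simplification: only pre-before-post pairs
    (dt_ms > 0) leave a trace; other pairings contribute nothing.
    """
    return math.exp(-dt_ms / tau_ms) if dt_ms > 0 else 0.0

def dopamine_modulated_dw(receptor, dopamine, eligibility,
                          baseline=1.0, lr=0.1):
    """Weight change of a cortico-striatal synapse (assumed rule).

    D1 synapses potentiate when dopamine is above baseline (burst);
    D2 synapses potentiate when dopamine is below baseline (dip).
    """
    sign = {"D1": 1.0, "D2": -1.0}[receptor]
    return lr * sign * (dopamine - baseline) * eligibility

elig = stdp_eligibility(dt_ms=5.0)   # cortical spike 5 ms before MSN spike
burst, dip = 2.0, 0.0                # hypothetical dopamine levels

dw_d1_burst = dopamine_modulated_dw("D1", burst, elig)  # reward delivered
dw_d2_dip = dopamine_modulated_dw("D2", dip, elig)      # reward omitted
print("D1 change on burst:", dw_d1_burst)
print("D2 change on dip:  ", dw_d2_dip)
```

Both printed values are positive, matching the potentiation of MSN-D1 inputs after a burst and of MSN-D2 inputs after a dip. In this sketch the opposite pairings (D1 with a dip, D2 with a burst) give depression, which is a property of the assumed rule rather than a claim of the abstract.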
Our simulation results show that discrimination learning converges faster for higher values of the pathway overlap (ii). This suggests that overlapping pathways may provide learning advantages, which supports the idea of functional cooperation between the direct and indirect pathways. This was possible given a high asymmetry (i), with sparse connections from MSN-D1 to MSN-D2.
Based on our results, we hypothesize that lateral inhibition from MSN-D2 to other MSNs increases during dopamine dips, and that this modulation is crucial for the convergence of discrimination learning.
In addition, we demonstrate that simulation of this model can scale to the size of the macaque BG using the Fugaku supercomputer.