ePoster

The cost of behavioral flexibility: a modeling study of reversal learning using a spiking neural network

Behnam Ghazinouri, Sen Cheng
Bernstein Conference 2024
Goethe University, Frankfurt, Germany

Abstract

To survive in a changing world, animals often need to suppress a previously learned, but now obsolete, behavior and acquire a new one. This process is known as reversal learning (RL). The neural mechanisms underlying RL in spatial navigation have received limited attention, and it remains unclear which neural mechanisms maintain behavioral flexibility. To address this issue, we extended an existing closed-loop simulator of spatial navigation and learning based on spiking neural networks [1]. The activity of place cells and boundary cells was fed as input to action selection neurons, which drove the movement of the agent. When the agent reached the goal, its behavior was reinforced with spike-timing-dependent plasticity (STDP) coupled with an eligibility trace, which marks synaptic connections for later reward-based updates. The modeled RL task had an ABA design, in which the goal was switched between two locations, A and B, after 10 trials. This task touches on two trade-offs: stability vs. plasticity, well known in the neural network literature [2], and exploitation vs. exploration, which is central to the reinforcement learning literature [3]. In the RL task, maintaining behavioral flexibility requires exploration and plasticity, but this reduces performance and stability. The challenge is to understand how a biologically plausible spiking neural network maintains flexibility, and at what cost. To measure the agent's performance, we employed three methods: trial duration, proximity to the goal, and the similarity between the agent's traversed trajectory and an ideal trajectory, measured by dynamic time warping (DTW). All three measures yielded consistent results. A combination of symmetric STDP and optimized place field parameters performs well on the first target but lacks the flexibility to learn the second. In three other cases, the agent remains flexible but incurs different costs. Asymmetric STDP results in highly variable behavior. Using many small place fields leads to low overall performance. Providing an external supervisory signal (injecting noise when the agent goes unrewarded for too long) results in slow RL and variable performance on the second target, but better performance on the first. In conclusion, our model suggests that intrinsic neural mechanisms may not be sufficient to simultaneously ensure behavioral flexibility, rapid learning, and good performance. A second system might be needed to monitor and intervene in the agent's navigation and learning.
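To illustrate the learning rule described in the abstract, here is a minimal NumPy sketch of STDP coupled with an eligibility trace and a reward-gated weight update. All parameter values, function names, and the per-synapse array representation are illustrative assumptions, not the implementation used in the model of [1].

```python
import numpy as np

def stdp_window(dt, a_plus=0.01, a_minus=0.01, tau=20.0, symmetric=True):
    # Weight change for one spike pair with dt = t_post - t_pre (ms).
    if symmetric:
        # Symmetric STDP: potentiation for near-coincident spikes,
        # independent of spike order.
        return a_plus * np.exp(-abs(dt) / tau)
    # Asymmetric (Hebbian) STDP: potentiate pre-before-post pairs,
    # depress post-before-pre pairs.
    return a_plus * np.exp(-dt / tau) if dt >= 0 else -a_minus * np.exp(dt / tau)

def update_eligibility(trace, spike_pairs, tau_e=500.0, dt_step=1.0, symmetric=True):
    # Decay the per-synapse eligibility trace, then accumulate the
    # STDP-driven change for each (synapse index, dt) spike pair.
    # The trace marks synapses for a later reward-based update.
    trace *= np.exp(-dt_step / tau_e)
    for syn, dt in spike_pairs:
        trace[syn] += stdp_window(dt, symmetric=symmetric)
    return trace

def apply_reward(weights, trace, reward, lr=0.1, w_max=1.0):
    # At goal arrival, convert the accumulated eligibility into an
    # actual weight change, gated by the reward signal.
    return np.clip(weights + lr * reward * trace, 0.0, w_max)
```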
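The ABA task structure and the external supervisory signal can be summarized in a short Python sketch. The agent interface (run_trial, inject_noise) is hypothetical, and "unrewarded for too long" is interpreted here as a count of consecutive failed trials; the abstract does not specify the exact criterion or the noise parameters.

```python
def run_aba_task(agent, goal_a, goal_b, trials_per_block=10,
                 patience=3, noise_amplitude=0.1):
    # ABA design: goal at location A, then B, then back to A,
    # with 10 trials per block as in the modeled task.
    results, unrewarded = [], 0
    for block, goal in enumerate((goal_a, goal_b, goal_a)):
        for trial in range(trials_per_block):
            if unrewarded >= patience:
                # External supervisory signal: inject noise into the
                # action selection neurons to force exploration.
                agent.inject_noise(noise_amplitude)
            rewarded, duration, trajectory = agent.run_trial(goal)
            unrewarded = 0 if rewarded else unrewarded + 1
            results.append({"block": block, "trial": trial, "rewarded": rewarded,
                            "duration": duration, "trajectory": trajectory})
    return results
```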
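The third performance measure compares the agent's traversed trajectory with an ideal trajectory via dynamic time warping. Below is the textbook O(nm) DTW distance over 2-D paths; the abstract does not specify the variant used (step pattern, normalization), so this is the standard formulation.

```python
import numpy as np

def dtw_distance(traj_a, traj_b):
    # DTW distance between two trajectories, each an (n, 2) array of
    # x-y positions, using the standard dynamic program with a
    # Euclidean point-to-point cost.
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # advance traj_a only
                                 cost[i, j - 1],      # advance traj_b only
                                 cost[i - 1, j - 1])  # advance both
    return cost[n, m]

# Example: a meandering path vs. a straight ideal path to the goal.
agent_path = np.array([[0.0, 0.0], [0.4, 0.3], [1.2, 0.3], [2.0, 1.0]])
ideal_path = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
print(dtw_distance(agent_path, ideal_path))
```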

Unique ID: bernstein-24/cost-behavioral-flexibility-modeling-23f139f2