Authors & Affiliations
Raymond Chua, Christos Kaplanis, Doina Precup
Abstract
Learning and memory consolidation in the brain occur over multiple timescales. Inspired by this observation, it has been shown that catastrophic forgetting in reinforcement learning agents can be mitigated by consolidating Q-value function parameters at multiple timescales. In this work, we combine this approach with successor features, and show that by consolidating successor features and preferences learned over multiple timescales we can further mitigate catastrophic forgetting. In particular, we show that agents trained with this approach rapidly recall previously rewarding sites in large environments, whereas those trained without this decomposition and consolidation mechanism do not. These results therefore contribute to our understanding of the functional role of synaptic plasticity and memory systems operating at multiple timescales, and demonstrate that reinforcement learning can be improved by capturing features of biological memory with greater fidelity.
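The abstract combines two ideas: the successor-feature decomposition of the Q-function, Q(s, a) = psi(s, a) . w, and consolidation of parameters over multiple timescales. The sketch below, which is not the authors' implementation, illustrates one common way to realize the second idea: each parameter is duplicated into a chain of copies, where the fast copy learns and slower copies drift toward their neighbours at ever-smaller rates (a Benna-Fusi-style scheme). The sizes, flow rate `g0`, and halving schedule are illustrative assumptions.

```python
import numpy as np

def q_values(psi, w):
    """Successor-feature decomposition of the Q-function:
    Q(s, a) = psi(s, a) . w, with psi of shape (actions, features)
    and w the preference vector of shape (features,)."""
    return psi @ w

def consolidate(chain, g0=0.5):
    """One consolidation step over a chain of parameter copies.

    chain[0] is the fast, plastic copy (the one updated by TD learning);
    deeper copies change ever more slowly. Each copy drifts toward its
    neighbours, with flow rates halving at each deeper level."""
    new = [c.copy() for c in chain]
    for k in range(len(chain)):
        if k > 0:                        # flow from the faster neighbour
            new[k] += (g0 / 2 ** (k - 1)) * (chain[k - 1] - chain[k])
        if k + 1 < len(chain):           # flow from the slower neighbour
            new[k] += (g0 / 2 ** k) * (chain[k + 1] - chain[k])
    return new

# Toy setup: 3 actions, 4 state features, 3 timescales per parameter.
rng = np.random.default_rng(0)
psi_fast = rng.normal(size=(3, 4))
psi_chain = [psi_fast.copy() for _ in range(3)]   # successor features
w_chain = [np.zeros(4) for _ in range(3)]         # reward preferences

# Simulate a reward-driven update to the fast preference copy only,
# then let repeated consolidation propagate it to slower timescales.
w_chain[0] += 1.0
for _ in range(10):
    w_chain = consolidate(w_chain)

q = q_values(psi_chain[0], w_chain[0])  # Q-values from the fast copies
```

After consolidation the slow copies hold a smoothed record of the fast copy's history, and the backward flow pulls the fast copy toward those slow values, which is the mechanism that resists catastrophic forgetting when the task later changes. Consolidating psi and w separately, rather than the composed Q-values, is the decomposition the abstract argues enables rapid recall of previously rewarding sites.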