Authors & Affiliations
Viet Anh Khoa Tran, Emre Neftci, Willem Wybo
Abstract
The brain is remarkably adept at learning from a continuous stream of data without significantly forgetting previously learnt skills. Conventional machine learning models struggle with continual learning, as weight updates that optimize the current task interfere with previously learnt tasks. A simple remedy to this catastrophic forgetting is to freeze a network pretrained on a set of base tasks and train task-specific readouts on this shared trunk. However, this assumes that the representations in the frozen network are already separable under new tasks, an assumption that often fails and leads to sub-par performance. To continually learn from novel task data, previous methods propose weight consolidation, which preserves the weights that are most impactful for the performance of previous tasks, and memory-based approaches, in which the network is allowed to revisit a subset of images from previous tasks.
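To make the frozen-trunk baseline concrete, here is a minimal PyTorch-style sketch (our illustration, not the authors' implementation); the backbone architecture, the layer sizes, and the names `trunk` and `train_task` are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Pretrained feature extractor ("trunk"); any backbone would do here.
trunk = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 128)
)
for p in trunk.parameters():
    p.requires_grad_(False)        # frozen: no updates, hence no forgetting

readouts = {}                      # one task-specific readout per task

def train_task(task_id, loader, n_classes, epochs=1):
    """Train only a linear readout on top of the frozen shared trunk."""
    head = nn.Linear(128, n_classes)
    opt = torch.optim.SGD(head.parameters(), lr=1e-2)
    for _ in range(epochs):
        for x, y in loader:
            loss = nn.functional.cross_entropy(head(trunk(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    readouts[task_id] = head
```

Because the shared weights never change, earlier readouts remain valid, but performance is capped by whatever features the frozen trunk happens to provide, which is the limitation discussed above.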
For biological networks, prior work showed that dendritic top-down modulations provide a powerful mechanism to solve complex tasks, while the initial feedforward weights solely extract generic view-invariant features (A). This view aligns with the ‘neural collapse’ phenomenon from supervised machine learning: the optimal solution of such algorithms is invariant to task-irrelevant features, even though these features may be relevant for other tasks (B). Instead, we posit that feature extraction can be learned solely by optimizing the network to attract the representations of smoothly moving visual stimuli, akin to contrastive self-supervised learning (SSL) methods (C).
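The attraction objective on smoothly moving stimuli can be illustrated with a short sketch (an assumption about the setup, since the abstract does not specify the exact loss): representations of two temporally adjacent views of the same stimulus are pulled together via cosine similarity.

```python
import torch.nn.functional as F

def attraction_loss(encoder, frame_t, frame_t_plus_1):
    """Pull together the representations of two temporally adjacent views."""
    z_a = F.normalize(encoder(frame_t), dim=-1)
    z_b = F.normalize(encoder(frame_t_plus_1), dim=-1)
    return -(z_a * z_b).sum(dim=-1).mean()   # negative cosine similarity
```

In practice, contrastive SSL methods pair such an attraction term with a repulsion term or an asymmetric predictor to prevent representational collapse; the snippet only illustrates the attraction component referred to above.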
We propose a continual learner that optimizes the feedforward weights towards view-invariant representations while training task-specific modulations in a supervised manner towards separable class clusters, and we evaluate it in a standard task-incremental setting (C). We show that this simple approach avoids catastrophic forgetting of the class clusters, in contrast to training the whole network in a supervised manner, while also outperforming (1) task-specific readouts without modulations and (2) frozen feedforward weights (D). This suggests that (1) top-down modulations are necessary and sufficient to shift the representations towards separable clusters and (2) the SSL objective learns novel features from the newly presented objects while maintaining features relevant to previous tasks, without requiring specific synaptic consolidation mechanisms.
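Below is a hedged PyTorch sketch of how we read the proposed scheme; the multiplicative sigmoid gain as the top-down modulation, the `detach` that keeps supervised gradients away from the shared feedforward weights, and all layer sizes are our assumptions rather than details given in the abstract (task identifiers are strings, e.g. "task0").

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedNet(nn.Module):
    """Shared feedforward pathway plus per-task top-down modulations and readouts."""
    def __init__(self, d_in=784, d_hid=256, d_rep=128):
        super().__init__()
        self.ff = nn.Sequential(nn.Flatten(), nn.Linear(d_in, d_hid), nn.ReLU(),
                                nn.Linear(d_hid, d_rep))
        self.modulations = nn.ModuleDict()   # task-specific top-down gains
        self.readouts = nn.ModuleDict()      # task-specific classifiers
        self.d_rep = d_rep

    def add_task(self, task_id, n_classes):
        self.modulations[task_id] = nn.Linear(self.d_rep, self.d_rep)
        self.readouts[task_id] = nn.Linear(self.d_rep, n_classes)

    def forward(self, x, task_id=None):
        z = self.ff(x)
        if task_id is None:
            return z                         # unmodulated representation (SSL path)
        z = z.detach()                       # supervised gradients stay task-local
        gain = torch.sigmoid(self.modulations[task_id](z))
        return self.readouts[task_id](z * gain)

def training_step(net, frame_t, frame_t1, labels, task_id, opt_ff, opt_task,
                  ssl_weight=1.0):
    # (1) SSL attraction on adjacent frames updates only the shared feedforward weights.
    z_a = F.normalize(net(frame_t), dim=-1)
    z_b = F.normalize(net(frame_t1), dim=-1)
    ssl_loss = -(z_a * z_b).sum(dim=-1).mean()
    # (2) Supervised cross-entropy updates only the task-specific modulation and readout.
    sup_loss = F.cross_entropy(net(frame_t, task_id), labels)
    opt_ff.zero_grad(); opt_task.zero_grad()
    (ssl_weight * ssl_loss + sup_loss).backward()
    opt_ff.step(); opt_task.step()
```

A typical task-incremental loop would call net.add_task(task_id, n_classes) when a new task arrives, build opt_ff over net.ff.parameters() and opt_task over the new modulation and readout, and then call training_step on pairs of temporally adjacent frames with their labels.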