ePoster

A Study of a biologically plausible combination of Sparsity, Weight Imprinting and Forward Inhibition in Continual Learning

Golzar Atefi, Justus Westerhof, Felix Gers, Erik Rodner
Bernstein Conference 2024
Goethe University, Frankfurt, Germany

Abstract

Continual learning focuses on systems that incrementally acquire and update their knowledge. It is a pressing problem in machine learning because it is challenging to learn new tasks without forgetting knowledge of old ones, a phenomenon known as catastrophic forgetting [1]. Since biological systems efficiently learn and update their knowledge continually throughout their lifetime, we draw on principles inspired by these systems to tackle this challenge. One biological principle that is often overlooked in artificial neural networks is Dale’s principle [2], which states that a presynaptic neuron has either exclusively excitatory or exclusively inhibitory effects on its postsynaptic targets. We incorporate this separation into our network architecture, taking inspiration from feedforward inhibitory interneurons. We also investigate two types of inhibition, subtractive (rotational) and divisive (scaling) [3,4], and their respective relevance. Furthermore, instead of optimizing randomly initialized weights with SGD, we initialize the weights directly from scaled versions of the input samples, a technique known as weight imprinting [5]. This allows for one-shot learning, a common feature of mammalian brains, and circumvents biologically implausible non-linear optimization. As depicted in Figure 1, we use a two-layer neural network preceded by a fixed, pretrained feature extractor (e.g., a ResNet or ConvNet), as in [6]. The first layer projects the input into a high-dimensional representation with a winner-take-all mechanism inspired by the mushroom body circuitry of the fruit fly. This mechanism induces sparsity, which has been shown to reduce interference between tasks and encourage the formation of subnetworks [7]. The second layer combines imprinting and inhibition. Applying this method in a class-incremental setting on common benchmark datasets such as sequential MNIST, Fashion-MNIST and CIFAR-10, we show that our combination of approaches yields promising results without gradient descent or the memory buffer typically used in continual learning. Additionally, the model can be fine-tuned via gradient descent.
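As a rough illustration of the pipeline described above, the following NumPy sketch combines a fixed random expansion with top-k winner-take-all sparsity, running-mean weight imprinting, and a pooled feedforward inhibition signal applied either divisively or subtractively. The specific choices (random binary projection matrix, top-k threshold, class-mean imprinting, pooled-activity inhibition) and all names such as `wta_sparse_projection` and `ImprintingReadout` are assumptions for illustration and are not taken from the poster.

```python
import numpy as np

def wta_sparse_projection(features, proj, k):
    """Expand extractor features into a high-dimensional code and keep only
    the k most active units (winner-take-all); all other units are silenced."""
    h = proj @ features
    threshold = np.partition(h, -k)[-k]          # k-th largest activation
    return np.where(h >= threshold, h, 0.0)

class ImprintingReadout:
    """Hypothetical second layer: class weights are imprinted from sparse codes
    (no SGD), and the readout is modulated by a feedforward inhibition term."""

    def __init__(self, dim, inhibition="divisive"):
        self.dim = dim
        self.inhibition = inhibition
        self.weights = {}   # class label -> imprinted weight vector
        self.counts = {}    # class label -> number of imprinted samples

    def imprint(self, code, label):
        # One-shot update: keep a running mean of the sparse codes per class.
        if label not in self.weights:
            self.weights[label] = np.zeros(self.dim)
            self.counts[label] = 0
        self.counts[label] += 1
        self.weights[label] += (code - self.weights[label]) / self.counts[label]

    def predict(self, code):
        inhib = code.sum() / self.dim             # pooled feedforward inhibition (assumed)
        scores = {}
        for label, w in self.weights.items():
            drive = w @ code                      # excitatory drive
            if self.inhibition == "divisive":
                scores[label] = drive / (1.0 + inhib)     # divisive: scales the drive
            else:
                scores[label] = drive - inhib * w.sum()   # subtractive: shifts the drive
        return max(scores, key=scores.get)

# Usage sketch: 'features' stands in for the output of a frozen feature extractor.
rng = np.random.default_rng(0)
proj = (rng.random((2000, 512)) < 0.1).astype(float)   # fixed sparse expansion matrix
features = rng.standard_normal(512)
code = wta_sparse_projection(features, proj, k=100)

readout = ImprintingReadout(dim=2000, inhibition="divisive")
readout.imprint(code, label=0)
print(readout.predict(code))                           # -> 0
```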

Unique ID: bernstein-24/study-biologically-plausible-combination-b205e37b