Authors & Affiliations
Christian Klos, Raoul-Martin Memmesheimer
Abstract
The ability to train spiking neural network models is essential for the modeling of biological neural networks as well as for neuromorphic computing. The standard approach for training non-spiking neural networks is gradient descent. In spiking neural networks, however, gradient descent learning is complicated by the all-or-none character of spikes (Fig. A). It can lead to unexpected (dis-)appearances of spikes during training, resulting in disruptive, discontinuous changes in the network dynamics [1]. Further, it seemingly prevents exact gradients from systematically generating or removing spikes. These problems have so far been ignored or circumvented using heuristics [2, 3].
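For illustration (a minimal example with assumed model details, not part of the study itself): consider a leaky integrate-and-fire neuron \tau \dot{V} = -V + I with V(0) = 0, constant input I, threshold \Theta, and trial length T. Its voltage is V(t) = I\,(1 - e^{-t/\tau}), so it spikes within the trial if and only if I\,(1 - e^{-T/\tau}) > \Theta. The number of emitted spikes, and with it any spike-count-based loss, therefore jumps discontinuously as I crosses \Theta / (1 - e^{-T/\tau}); the exact gradient is zero on either side of the jump and undefined at it.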
Here we show that these problems can be solved, and how. Specifically, we demonstrate non-disruptive, exact gradient descent learning of spiking dynamics in neural network models. Perhaps surprisingly, suitable dynamics are generated by, among others, networks of the arguably simplest truly spiking neuron model, the standard quadratic leaky integrate-and-fire (QIF) neuron (Fig. B). The reason is that, as we show, the spikes of QIF neurons change continuously or even smoothly with the inputs and with the neuron and network parameters. In particular, spikes vanish and appear only at the end of a trial, where this does not influence any future dynamics. Interestingly, these properties enable not only gradient-based spike removal but also gradient-based spike addition. To achieve the latter, we continue the dynamics as pseudodynamics beyond the trial end. Specifically, the neurons continue to evolve as autonomous QIF neurons, but with an added suprathreshold drive, until they have spiked sufficiently often for the task at hand. The resulting pseudospike times depend continuously and mostly smoothly on the network parameters and allow errors to be backpropagated through neurons that are inactive during the actual trial.
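A sketch of the underlying smoothness, assuming for illustration the simplest QIF form \tau \dot{V} = V^2 + I with constant suprathreshold drive I > 0, initial voltage V_0, and a spike at V \to \infty: the solution is V(t) = \sqrt{I}\,\tan\!\big(\sqrt{I}\,t/\tau + \arctan(V_0/\sqrt{I})\big), which yields the spike time t_{\mathrm{sp}} = \frac{\tau}{\sqrt{I}}\big(\frac{\pi}{2} - \arctan\frac{V_0}{\sqrt{I}}\big). Thus t_{\mathrm{sp}} is a smooth function of the drive I and of V_0: changing a parameter shifts the spike continuously instead of making it jump.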
We apply our scheme to single QIF neurons and to networks of QIF neurons, using event-based simulations and automatic differentiation to compute spike-based gradients. This allows us, in particular, to induce spikes and to move them continuously to desired times (Fig. C). Further, we match the MNIST performance of previous studies that use time-to-first-spike coding (e.g., [4]), but starting from an initially silent deep network.
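The following minimal sketch illustrates the principle of differentiating through event-based spike times. It uses JAX, the closed-form single-neuron spike time from above, and the constant drive as the trained parameter; these simplifications are our own assumptions for illustration, not the implementation used in the study.

    import jax
    import jax.numpy as jnp

    tau, v0 = 1.0, 0.0  # membrane time constant and initial voltage (illustrative values)

    def spike_time(I):
        # Closed-form spike time of tau * dV/dt = V**2 + I for constant I > 0
        # (spike at V -> infinity); smooth in I, hence exactly differentiable.
        s = jnp.sqrt(I)
        return (tau / s) * (jnp.pi / 2 - jnp.arctan(v0 / s))

    target = 1.0                          # desired spike time
    loss = lambda I: (spike_time(I) - target) ** 2

    I = 2.0                               # initial drive; spike_time(2.0) is about 1.11
    for _ in range(200):                  # plain gradient descent on the input drive
        I = I - 1.0 * jax.grad(loss)(I)
    print(spike_time(I))                  # approaches the target: the spike is moved continuously

In the network setting, the same kind of exact gradients is obtained by automatic differentiation through the event-based simulation, with the pseudodynamics supplying differentiable pseudospike times for neurons that remain silent during the trial.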
Taken together, our results show how non-disruptive, exact learning is possible despite discrete spikes.