ePoster

Learning predictable factors from sequences: it’s not only about slow features

Ashena Gorgan Mohammadi, Manu Srinath Halvagal, Friedemann Zenke
Bernstein Conference 2024
Goethe University, Frankfurt, Germany

Abstract

Animals experience the world as a continuous sequence of complex, high-dimensional sensory stimuli. They need to build internal models from this ongoing stream of experiences with few supervisory and reinforcement signals. How they accomplish this feat remains unclear. One possibility is that the brain relies on a slowness objective (SO), which stipulates that neuronal representations change little over time$^1$. This allows slowly varying underlying factors to be extracted from sensory experience, in some cases with local plasticity rules$^2$. However, this approach precludes exploiting the rich learning signals generated by predictable changes in the underlying factors. During speech, for instance, the identity of a speaker changes slowly, whereas the spoken phonemes change rapidly but in a predictable manner. Leveraging the information from changing factors requires a mechanism that accounts for predictable transitions$^3$. One such mechanism from self-supervised machine learning$^4$ is a dedicated predictor network (PN), which has been linked to cortical microcircuits$^{5,6}$. This suggests an alternative possibility for how the brain builds internal models (Fig. A). However, it remains unknown to what extent rapidly varying factors contribute useful learning signals in practice, and whether a dedicated PN can leverage them effectively. Here, we investigate the ability of SO and PN to extract the underlying factors in natural sensory data. First, we demonstrate in a synthetic setting that PN can learn both fast and slow factors, whereas SO learns only slow factors (Fig. B). Next, we examine both algorithms on LibriSpeech$^7$, a real-world natural speech dataset, and attempt to infer speaker identity and phoneme identity from the learned representations. We find that a network trained with SO forms representations useful for classifying speaker identity, but not phonemes. In contrast, PN forms representations useful for both tasks (Fig. C). Finally, we evaluate how both mechanisms are affected by the choice of temporal prediction horizon and discuss the possible benefits of multi-timescale prediction. Our results support the notion that rapidly varying factors in natural sensory stimulus sequences constitute useful learning signals in addition to slow factors, and that exploiting them necessitates additional predictor networks. These findings motivate future work on mapping PNs to cortical microcircuit structures and on understanding how they interact with synaptic plasticity.
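To make the contrast between the two objectives concrete, here is a minimal sketch in PyTorch. The encoder and predictor architectures, the layer sizes, the squared-error losses, and the stop-gradient on the prediction target are illustrative assumptions chosen for clarity, not details taken from the poster.

```python
import torch
import torch.nn as nn

# Hypothetical encoder and predictor; dimensions are arbitrary.
dim_in, dim_z = 64, 8
encoder = nn.Sequential(nn.Linear(dim_in, 32), nn.ReLU(), nn.Linear(32, dim_z))
predictor = nn.Sequential(nn.Linear(dim_z, dim_z), nn.ReLU(), nn.Linear(dim_z, dim_z))

def slowness_loss(x_t, x_tk):
    # Slowness objective (SO): representations should change little between
    # time t and t+k. In practice a variance or decorrelation term is also
    # needed to prevent collapse to a constant code.
    z_t, z_tk = encoder(x_t), encoder(x_tk)
    return ((z_tk - z_t) ** 2).mean()

def prediction_loss(x_t, x_tk):
    # Predictor network (PN): a dedicated network predicts the future
    # representation, so predictable fast-changing factors also carry signal.
    # The stop-gradient on the target follows common self-supervised practice;
    # the authors' setup may differ.
    z_t, z_tk = encoder(x_t), encoder(x_tk)
    return ((predictor(z_t) - z_tk.detach()) ** 2).mean()

# Usage: x_t and x_tk are input batches k time steps apart; k plays the role
# of the temporal prediction horizon discussed in the abstract.
x_t, x_tk = torch.randn(16, dim_in), torch.randn(16, dim_in)
prediction_loss(x_t, x_tk).backward()
```

Under the slowness loss, gradients push representations of nearby time steps together regardless of whether the change was predictable; under the prediction loss, only the unpredictable part of the change is penalized, which is what lets rapidly varying but predictable factors survive in the representation.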

Unique ID: bernstein-24/learning-predictable-factors-from-35ba1a22