ePoster

PREDICTIVE LEARNING IN LATENT SPACE ALIGNS ARTIFICIAL AND NEURAL SPEECH REPRESENTATIONS

Alessandro Corsini and 5 co-authors

University of Ferrara

FENS Forum 2026 (2026)

Barcelona, Spain

Board PS02-07PM-573

Presentation

Date TBA

Abstract

Speech perception requires transforming complex acoustic signals into abstract representations that support linguistic understanding, yet the computational principles underlying this transformation in the brain remain unclear. Neural representations may support optimal compression of sensory input, or instead be optimized for extracting predictive information. These competing hypotheses can be tested by comparing neural activity with representations learned by artificial neural networks trained under corresponding learning objectives. To this end, we trained deep autoencoders and contrastive predictive coding (CPC) models on top of wav2vec 2.0 representations and compared their latent spaces with EEG activity. Autoencoders instantiated the compression hypothesis by learning representations through signal reconstruction, whereas CPC instantiated the predictive hypothesis by learning representations through prediction in latent space. CPC latents showed the strongest correspondence with neural signals. Moreover, poorer reconstruction accuracy was associated with better prediction of behavioral performance in a speech comprehension task (r = 0.45, p = 0.01). In contrast, the speech recognition performance of wav2vec 2.0 did not translate into stronger neural alignment. A principal component analysis revealed that the final 1% of variance in wav2vec 2.0 features was more robustly encoded in EEG activity than the first 99%, yet none of the examined subspaces predicted human comprehension performance. Strikingly, the original wav2vec model—natively trained with contrastive predictive coding—showed even stronger alignment with both neural and behavioral data (p = 0.01, Bonferroni corrected). Overall, these findings support predictive information extraction, rather than signal reconstruction, as a core computational principle underlying neural sensory representations.
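
The split of wav2vec 2.0 features into the first 99% and the final 1% of variance can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `split_by_variance`, the SVD-based PCA, and the synthetic feature matrix standing in for real wav2vec 2.0 features are all assumptions made for illustration.

```python
import numpy as np


def split_by_variance(features, threshold=0.99):
    """Project features onto principal components and split them into the
    leading components that reach `threshold` cumulative variance and the
    remaining low-variance components (hypothetical helper)."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes and the
    # squared singular values are proportional to the variance per component.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # Smallest number of components whose cumulative variance >= threshold.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    scores = centered @ vt.T
    return scores[:, :k], scores[:, k:], explained


# Synthetic stand-in for a (time frames x feature dimensions) matrix;
# real wav2vec 2.0 features would replace this array.
rng = np.random.default_rng(0)
scales = np.linspace(5.0, 0.1, 20)
features = rng.normal(size=(500, 20)) * scales

high, low, explained = split_by_variance(features, threshold=0.99)
print(high.shape, low.shape)
```

Under this sketch, each of the two subspaces (`high` and `low`) would then be passed separately to whatever EEG encoding analysis is used, so that their neural alignment can be compared.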

Figure panels: (a) experimental setup; (b) latent analysis setup; (c) PCA analysis of wav2vec 2.0 features; (d) the three alternative models used for hypothesis testing; (e) consistency of models; (f) performance on the prediction task; (g) EEG encoding of latents; (h) behavioral correlation results for the latent encodings; (j, k) wav2vec vs. wav2vec 2.0 encoding results.