ePoster

Rhythm-structured predictive coding for contextualized speech processing

Olesia Dogonasheva, Denis Zakharov, Anne-Lise Giraud, Boris Gutkin
Bernstein Conference 2024 (2024)
Goethe University, Frankfurt, Germany

Abstract

Our ability to perceive speech remains resilient to variations in voice, rate, and temporal interruptions. While inference models have addressed the first two [1], the computational principles by which speech understanding remains impervious to temporal restructuring are largely unexplored. Previous studies have reported intriguing recoveries in comprehension when speech is temporally modulated [2-4]. We show that predictive coding, constrained by endogenous rhythms, accounts for these puzzling results and enables robust lexical speech recovery. We build upon the major hypothesis that the rhythmic structure of speech establishes temporal windows, allowing brain circuits to process auditory signals effectively. Moreover, rhythmic activity is hierarchically structured in line with the structure of speech [5] and modulates predictive coding. Successful comprehension relies on actively minimizing contextual uncertainty and surprise [6], which modulate theta and delta rhythms, respectively [7]. Integrating this evidence, we propose a predictive coding framework (BRyBI) that implements a hierarchy of rhythms and actively minimizes both uncertainty and surprise. The theta rhythm in BRyBI reduces uncertainty in the subsequent phoneme distribution, and its entrainment by speech minimizes errors in recognized syllables. The delta rhythm, in turn, enables temporally structured minimization of semantic prediction error, thereby implementing online word-context inference. BRyBI allows for robust speech recognition under temporal perturbations such as compression, interruption, and segmentation. Furthermore, experimentally observed phenomena that have so far escaped explanation, such as error-related potentials, emerge naturally in BRyBI; speech-rhythm coherence decreases for theta and grows for delta with increased uncertainty/surprise.
The model reproduces key features of observed human performance, such as resistance to noise, invariance to voices, dialects, and tempos, and the ability to restore speech understanding in experiments with temporal interruption and signal segmentation. We found that the delta rhythm may serve to drive semantic contextual prediction and is therefore a bottleneck for speech comprehension. In sum, we suggest that oscillation-constrained predictive coding generically explains the results of multiple experiments with temporal scale alterations and provides a new view of the speech recognition process in the brain.

Unique ID: bernstein-24/rhythm-structured-predictive-coding-b0241838