AUDIOPRISM: OSCILLATORY FREQUENCY DECOMPOSITION ENHANCES SPEECH RECOGNITION THROUGH MULTI-SCALE TEMPORAL INTEGRATION
NISYS GmbH
Poster Board: PS01-07AM-357
Abstract
Current speech recognition systems struggle with the multi-scale temporal dependencies in acoustic signals and typically require complex, computationally intensive architectures that are unsuitable for real-time use. Biological auditory systems, in contrast, use oscillatory dynamics to process acoustic information efficiently across diverse timescales, a principle largely ignored by conventional neural networks.
To address this, we introduce AudioPrism, a novel hierarchical architecture inspired by the cochlea and auditory cortex. It combines frequency-specific oscillatory decomposition with robust recurrent neural dynamics. The first stage, the prism layer, uses damped harmonic oscillators spanning the speech bandwidth to split the input into frequency-resolved components. A second layer employs recurrent oscillatory dynamics for temporal integration, capturing both high-frequency phonetic transients and low-frequency prosodic modulations.
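The abstract does not give the equations of the first stage, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each oscillator is a linear, underdamped harmonic oscillator driven directly by the audio samples, discretized exactly as a complex one-pole recursion. The function name `oscillator_bank`, the `damping_ratio` parameter and its default value, and the chosen center frequencies are all hypothetical. Only the Python standard library is used.

```python
import cmath
import math


def oscillator_bank(signal, fs, freqs_hz, damping_ratio=0.05):
    """Drive a bank of damped harmonic oscillators with one audio signal.

    Hypothetical sketch: one linear underdamped oscillator per frequency,
    all sharing the same (assumed) damping ratio.
    Returns one list of output samples per oscillator.
    """
    poles = []
    for f in freqs_hz:
        omega = 2.0 * math.pi * f                          # natural frequency (rad/s)
        gamma = damping_ratio * omega                      # envelope decay rate (1/s)
        omega_d = omega * math.sqrt(1.0 - damping_ratio ** 2)  # damped frequency
        # Exact one-sample update of the complex oscillator state.
        poles.append(cmath.exp(complex(-gamma, omega_d) / fs))

    states = [0j] * len(poles)
    outputs = [[] for _ in poles]
    for u in signal:
        for k, pole in enumerate(poles):
            states[k] = pole * states[k] + u
            # The imaginary part has impulse response exp(-gamma*t) * sin(omega_d*t),
            # the shape of a damped harmonic oscillator's impulse response.
            outputs[k].append(states[k].imag)
    return outputs


# Usage: a 500 Hz tone should excite the 500 Hz channel most strongly.
fs = 16000
tone = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(fs // 2)]
freqs = [125.0, 250.0, 500.0, 1000.0, 2000.0]
channels = oscillator_bank(tone, fs, freqs)
rms = [math.sqrt(sum(x * x for x in ch) / len(ch)) for ch in channels]
```

Each channel acts as a resonant band-pass filter, so the list `channels` is a frequency-resolved version of the input that a downstream recurrent layer could consume. A vectorized array implementation would be faster; plain lists are used here only to keep the sketch dependency-free.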
AudioPrism was evaluated on the spoken digits of the Google Speech Commands dataset. Without the prism preprocessing stage, the oscillatory recurrent neural network performed near chance level. By contrast, AudioPrism achieved more than 70% accuracy within a few epochs. After training, the recurrent weights channeled acoustic information from high-frequency to low-frequency nodes, whose slow reverberations were crucial for the final readout. Ablation studies demonstrated that both frequency decomposition and the prism layer's filtering properties were essential for the performance gains.
Oscillatory dynamics enable the network to maintain multiple temporal representations simultaneously, with different frequency channels encoding information at distinct timescales. This multi-scale integration allows AudioPrism to capture the hierarchical structure of speech efficiently. Our results demonstrate that biologically inspired oscillatory dynamics can significantly enhance the computational efficiency of speech recognition systems, opening new directions for designing resource-efficient neural architectures for real-time applications.
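One way to see why different frequency channels carry different timescales is a short worked formula. It assumes a standard linear damped harmonic oscillator with a damping ratio $\zeta$ shared across nodes; the abstract does not state the actual parametrization, so the symbols $x_k$, $u$, $\zeta$, $\omega_k$ and $\tau_k$ are illustrative.

```latex
% Node k with natural frequency \omega_k = 2\pi f_k, driven by input u(t):
\ddot{x}_k(t) + 2\zeta\omega_k\,\dot{x}_k(t) + \omega_k^{2}\,x_k(t) = u(t)

% For \zeta < 1 the free response decays with envelope e^{-\zeta\omega_k t},
% which gives a memory timescale
\tau_k = \frac{1}{\zeta\,\omega_k} = \frac{1}{2\pi\,\zeta\,f_k}
```

Under this assumption, with $\zeta = 0.1$ a node at $f_k = 4$ Hz rings for about $0.40$ s, while a node at $f_k = 4$ kHz rings for about $0.40$ ms. Low-frequency nodes therefore hold information on the scale of syllables and prosody, and high-frequency nodes track fast phonetic transients.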