ePoster

AUDIOPRISM: OSCILLATORY FREQUENCY DECOMPOSITION ENHANCES SPEECH RECOGNITION THROUGH MULTI-SCALE TEMPORAL INTEGRATION

Bastian Pietras and 4 co-authors

NISYS GmbH

FENS Forum 2026 (2026)
Barcelona, Spain
Board PS01-07AM-357

Presentation

Date TBA


Abstract

Current speech recognition systems struggle with the multi-scale temporal dependencies in acoustic signals and therefore rely on complex, computationally intensive architectures that are unsuitable for real-time use. Biological auditory systems, in contrast, use oscillatory dynamics to process acoustic information efficiently across diverse timescales, a principle that conventional neural networks largely ignore.

To address this, we introduce AudioPrism, a novel hierarchical architecture inspired by the cochlea and auditory cortex. It combines frequency-specific oscillatory decomposition with robust recurrent neural dynamics. The initial stage, the prism layer, uses damped harmonic oscillators spanning the speech bandwidth to decompose the input into frequency-resolved components. A second layer employs recurrent oscillatory dynamics for temporal integration, capturing both high-frequency phonetic transients and low-frequency prosodic modulations.
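The frequency decomposition performed by the first stage can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dho_bank`, the damping value, the chosen frequencies, and the semi-implicit Euler integration are all assumptions made for illustration. Each unit is a textbook damped harmonic oscillator driven by the same input, so a unit responds most strongly when the input contains energy near its own natural frequency.

```python
import numpy as np


def dho_bank(signal, freqs_hz, damping, dt, gain=1.0):
    """Drive a bank of damped harmonic oscillators with one input signal.

    Hypothetical illustration, not the AudioPrism code. Each unit follows
        x'' + 2*damping*x' + omega**2 * x = gain * s(t)
    and is integrated with semi-implicit Euler (velocity first, then position).
    Returns an array of shape (len(signal), len(freqs_hz)).
    """
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    x = np.zeros_like(omega)  # oscillator positions
    v = np.zeros_like(omega)  # oscillator velocities
    out = np.empty((len(signal), omega.size))
    for t, s in enumerate(signal):
        # Update velocity from the driving force, damping, and restoring force.
        v += dt * (gain * s - 2.0 * damping * v - omega**2 * x)
        # Update position with the new velocity (keeps the scheme stable
        # as long as omega * dt stays well below 2).
        x += dt * v
        out[t] = x
    return out


if __name__ == "__main__":
    fs = 16000.0                        # assumed sampling rate in Hz
    t = np.arange(0.0, 0.2, 1.0 / fs)   # 200 ms of signal
    tone = np.sin(2.0 * np.pi * 400.0 * t)
    channels = dho_bank(tone, [100, 200, 400, 800, 1600], damping=50.0, dt=1.0 / fs)
    # RMS amplitude per channel over the second half, after transients decay.
    rms = np.sqrt(np.mean(channels[t.size // 2:] ** 2, axis=0))
    print(rms)
```

In this sketch the damping sets how sharply each unit is tuned: lower damping gives narrower frequency selectivity but a longer ring-down.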

AudioPrism was evaluated on the spoken digits of the Google Speech Commands dataset. Without the preprocessing prism, the oscillatory recurrent neural network performed near chance level. By contrast, AudioPrism achieved >70% accuracy within a few epochs. After training, the recurrent weights efficiently routed acoustic information from high-frequency to low-frequency nodes, whose slow reverberations were crucial for the final readout. Ablation studies demonstrated that both the frequency decomposition and the prism layer's filtering properties were essential for the performance gains.

Oscillatory dynamics enable the network to maintain multiple temporal representations simultaneously, with different frequency channels encoding information at distinct timescales. This multi-scale integration allows AudioPrism to capture the hierarchical structure of speech efficiently. Our results demonstrate that biologically inspired oscillatory dynamics can significantly enhance the computational efficiency of speech recognition systems, opening new directions for designing resource-efficient neural architectures for real-time applications.
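How a frequency channel maps onto a timescale can be made explicit with the textbook driven damped harmonic oscillator. The symbols below (natural frequency \omega_i, damping \gamma_i, input gain a_i, input signal s(t)) are assumptions for illustration; the poster's own parameterization may differ.

```latex
\ddot{x}_i(t) + 2\gamma_i\,\dot{x}_i(t) + \omega_i^{2}\,x_i(t) = a_i\,s(t),
\qquad
|H_i(\omega)| = \frac{a_i}{\sqrt{\left(\omega_i^{2}-\omega^{2}\right)^{2} + \left(2\gamma_i\,\omega\right)^{2}}}
```

Under this form, node i oscillates with a period of roughly 2\pi/\omega_i for weak damping, and its free response decays with time constant 1/\gamma_i, so weakly damped low-frequency nodes retain a trace of past input the longest. The gain |H_i(\omega)| peaks near \omega_i, which is the band-pass behavior expected of a frequency-selective node.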


Sketch of AudioPrism: (A) Network setup with the prism layer and the recurrent HORN layer. (B) Damped harmonic oscillator (DHO) dynamics. (C) Gain functions of the prism DHOs. (D) Exemplary time series from input to output nodes. (E) Test accuracy over training.
