Authors & Affiliations
Michael Thornton, Danilo Mandic, Tobias Reichenbach
Abstract
During speech perception, a listener’s electroencephalogram (EEG) reflects low-level acoustic processing as well as higher-level cognitive processes such as speech comprehension and selective attention. However, relating EEG signals to speech remains challenging, owing in part to the low signal-to-noise ratios (SNRs) of EEG recordings and to the high variability of EEG signals across individuals. To address this, we developed a deep-learning decoder that targeted the match-mismatch problem: given two candidate speech segments and a short segment of EEG measurements, the task is to identify which of the speech segments was being played to the participant when their EEG was recorded. The decoder exploited two well-known speech-related auditory responses: envelope tracking and speech-related frequency-following responses. The decoder generalised extremely well between participants and datasets, and won the ICASSP 2023 ‘Auditory EEG Decoding’ Signal Processing Grand Challenge. By varying the level of background noise, we showed that decoding accuracy varied with speech clarity. However, the accuracy did not decrease when participants listened to speech in a foreign language, suggesting that the decoder did not rely on speech comprehension. The decoder reliably detected the focus of participants’ auditory attention in competing-speakers scenarios, even though it was trained only on EEG recorded under single-speaker conditions. We conclude that highly robust auditory EEG decoders can be developed using deep learning with large EEG datasets. The participant-independent nature of the decoder has compelling implications for applications such as auditory attention decoding for cognitively steered hearing aids.
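To make the match-mismatch formulation concrete, the sketch below shows the decision rule on toy data. It is not the authors’ deep-learning decoder: instead of learned representations, it reconstructs a coarse envelope from the EEG with a placeholder linear decoder and selects the candidate speech segment whose envelope correlates best with the reconstruction. All function names, the simulated signals, and the channel-averaging weights are illustrative assumptions.

```python
import numpy as np

def toy_envelope(rng, n, kernel=8):
    """Smoothed positive noise as a stand-in for a speech envelope (illustrative)."""
    raw = np.abs(rng.standard_normal(n))
    return np.convolve(raw, np.ones(kernel) / kernel, mode="same")

def pearson_r(a, b):
    """Pearson correlation between two 1-D signals."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_mismatch(eeg, candidate_envelopes, decoder_weights):
    """Pick the candidate whose envelope best matches the EEG-derived estimate.

    eeg                 : (n_channels, n_samples) EEG segment
    candidate_envelopes : sequence of (n_samples,) speech envelopes
    decoder_weights     : (n_channels,) placeholder linear decoder
    """
    reconstruction = decoder_weights @ eeg            # weighted sum over channels
    scores = [pearson_r(reconstruction, env) for env in candidate_envelopes]
    return int(np.argmax(scores)), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, seconds, n_channels = 64, 5, 32               # hypothetical segment setup
    n = fs * seconds
    matched = toy_envelope(rng, n)                    # envelope of the played speech
    mismatched = toy_envelope(rng, n)                 # envelope of the foil segment
    # Toy EEG: every channel weakly tracks the played envelope, plus noise.
    eeg = matched[None, :] + 0.5 * rng.standard_normal((n_channels, n))
    weights = np.full(n_channels, 1.0 / n_channels)   # placeholder: channel average
    choice, scores = match_mismatch(eeg, [matched, mismatched], weights)
    print(f"chose candidate {choice} with scores {scores}")
```

In the decoder described in the abstract, the correlation of a linearly reconstructed envelope is replaced by learned neural-network representations that exploit both envelope tracking and speech-related frequency-following responses; the sketch only illustrates the two-alternative decision the task requires.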