TopicNeuro

speech recognition

4 Seminars1 ePoster

Latest

SeminarNeuroscienceRecording

Encoding and perceiving the texture of sounds: auditory midbrain codes for recognizing and categorizing auditory texture and for listening in noise

Monty Escabi
University of Connecticut
Oct 1, 2021

Natural soundscapes such as from a forest, a busy restaurant, or a busy intersection are generally composed of a cacophony of sounds that the brain needs to interpret either independently or collectively. In certain instances sounds - such as from moving cars, sirens, and people talking - are perceived in unison and are recognized collectively as single sound (e.g., city noise). In other instances, such as for the cocktail party problem, multiple sounds compete for attention so that the surrounding background noise (e.g., speech babble) interferes with the perception of a single sound source (e.g., a single talker). I will describe results from my lab on the perception and neural representation of auditory textures. Textures, such as a from a babbling brook, restaurant noise, or speech babble are stationary sounds consisting of multiple independent sound sources that can be quantitatively defined by summary statistics of an auditory model (McDermott & Simoncelli 2011). How and where in the auditory system are summary statistics represented and the neural codes that potentially contribute towards their perception, however, are largely unknown. Using high-density multi-channel recordings from the auditory midbrain of unanesthetized rabbits and complementary perceptual studies on human listeners, I will first describe neural and perceptual strategies for encoding and perceiving auditory textures. I will demonstrate how distinct statistics of sounds, including the sound spectrum and high-order statistics related to the temporal and spectral correlation structure of sounds, contribute to texture perception and are reflected in neural activity. Using decoding methods I will then demonstrate how various low and high-order neural response statistics can differentially contribute towards a variety of auditory tasks including texture recognition, discrimination, and categorization. Finally, I will show examples from our recent studies on how high-order sound statistics and accompanying neural activity underlie difficulties for recognizing speech in background noise.

SeminarNeuroscienceRecording

Decoding the neural processing of speech

Tobias Reichenbach
Friedrich-Alexander-University
Mar 23, 2021

Understanding speech in noisy backgrounds requires selective attention to a particular speaker. Humans excel at this challenging task, while current speech recognition technology still struggles when background noise is loud. The neural mechanisms by which we process speech remain, however, poorly understood, not least due to the complexity of natural speech. Here we describe recent progress obtained through applying machine-learning to neuroimaging data of humans listening to speech in different types of background noise. In particular, we develop statistical models to relate characteristic features of speech such as pitch, amplitude fluctuations and linguistic surprisal to neural measurements. We find neural correlates of speech processing both at the subcortical level, related to the pitch, as well as at the cortical level, related to amplitude fluctuations and linguistic structures. We also show that some of these measures allow to diagnose disorders of consciousness. Our findings may be applied in smart hearing aids that automatically adjust speech processing to assist a user, as well as in the diagnosis of brain disorders.

SeminarNeuroscienceRecording

Do deep learning latent spaces resemble human brain representations?

Rufin VanRullen
Centre de Recherche Cerveau et Cognition (CERCO)
Mar 13, 2021

In recent years, artificial neural networks have demonstrated human-like or super-human performance in many tasks including image or speech recognition, natural language processing (NLP), playing Go, chess, poker and video-games. One remarkable feature of the resulting models is that they can develop very intuitive latent representations of their inputs. In these latent spaces, simple linear operations tend to give meaningful results, as in the well-known analogy QUEEN-WOMAN+MAN=KING. We postulate that human brain representations share essential properties with these deep learning latent spaces. To verify this, we test whether artificial latent spaces can serve as a good model for decoding brain activity. We report improvements over state-of-the-art performance for reconstructing seen and imagined face images from fMRI brain activation patterns, using the latent space of a GAN (Generative Adversarial Network) model coupled with a Variational AutoEncoder (VAE). With another GAN model (BigBiGAN), we can decode and reconstruct natural scenes of any category from the corresponding brain activity. Our results suggest that deep learning can produce high-level representations approaching those found in the human brain. Finally, I will discuss whether these deep learning latent spaces could be relevant to the study of consciousness.

ePosterNeuroscience

Geometric Signatures of Speech Recognition: Insights from Deep Neural Networks to the Brain

Jiaqi Shang, Shailee Jain, Haim Sompolinsky, Edward Chang

COSYNE 2025

speech recognition coverage

5 items

Seminar4
ePoster1
Domain spotlight

Explore how speech recognition research is advancing inside Neuro.

Visit domain