
Speech

Topic spotlight · World Wide

Discover seminars, jobs, and research tagged with speech across World Wide.
63 curated items: 40 Seminars, 19 ePosters, 4 Positions
Updated 2 days ago
63 results
Position

Prof David Brang

University of Michigan
Ann Arbor, Michigan
Dec 5, 2025

We are seeking a full-time post-doctoral research fellow to study computational and neuroscientific models of perception and cognition. The research fellow will be jointly supervised by Dr. David Brang (https://sites.lsa.umich.edu/brang-lab/) and Dr. Zhongming Liu (https://libi.engin.umich.edu). The goal of this collaboration is to build computational models of cognitive and perceptual processes using data combined from electrocorticography (ECoG) and fMRI. The successful applicant will also have freedom to conduct additional research based on their interests, using a variety of methods, including ECoG, fMRI, DTI, lesion mapping, and EEG. The ideal start date is between spring and fall 2021, and the position is expected to last for at least two years, with the possibility of extension for subsequent years. We are also recruiting a post-doc for research on multisensory interactions (particularly how vision modulates speech perception) using cognitive neuroscience techniques, or to help with our large-scale brain tumor collaboration with Shawn Hervey-Jumper at UCSF (https://herveyjumperlab.ucsf.edu). In the latter collaboration, we collect iEEG (from ~50 patients/year) and lesion mapping data (from ~150 patients/year) in patients with brain tumors to study sensory and cognitive functions. The goals of this project are to better understand the physiology of tumors, study causal mechanisms of brain functions, and generalize iEEG/ECoG findings from epilepsy patients to a second patient population.

Position

Jörn Anemüller

Department of Medical Physics and Acoustics, University of Oldenburg
Oldenburg University
Dec 5, 2025

We are looking to fill a fully funded 3-year Ph.D. student position in the field of deep learning-based signal processing algorithms for speech enhancement and computational audition. The position is funded by the German research council (DFG) within the Collaborative Research Centre SFB 1330 “Hearing Acoustics” at the Department of Medical Physics and Acoustics, University of Oldenburg. Within project B3 of the research centre, the Computational Audition Group develops machine learning algorithms for signal processing of speech and audio data.

Position

Steve Schneider

University of Surrey
University of Surrey
Dec 5, 2025

The School of Computer Science and Electronic Engineering is seeking to recruit a full-time Lecturer in Natural Language Processing to grow our AI research. The School is home to two established research centres with expertise in AI and Machine Learning: the Computer Science Research Centre and the Centre for Vision, Speech and Signal Processing (CVSSP). This post is aligned to the Nature Inspired Computing and Engineering group within Computer Science. We encourage applications from across natural language processing, including language modelling, language generation (machine translation/summarisation), explainability and reasoning in NLP, and/or aligned multimodal challenges for NLP (vision-language, audio-language, and so on), and we are particularly interested in candidates who enhance our current strengths and bring complementary areas of AI expertise. Surrey has an established international reputation in AI research: 1st in the UK for computer vision and top 10 for AI, computer vision, machine learning and natural language processing (CSRankings.org), and 7th in the UK for REF2021 outputs in Computer Science research. Computer Science and CVSSP are at the core of the Surrey Institute for People-Centred AI (PAI), established in 2021 as a pan-University initiative which brings together leading AI research with cross-discipline expertise across health, social, behavioural, and engineering sciences, and business, law, and the creative arts to shape future AI to benefit people and society. PAI leads a portfolio of £100m in grant awards, including major research activities in creative industries and healthcare, and two doctoral training programmes with funding for over 100 PhD researchers: the UKRI AI Centre for Doctoral Training in AI for Digital Media Inclusion, and the Leverhulme Trust Doctoral Training Network in AI-Enabled Digital Accessibility.

Seminar · Neuroscience

Simulating Thought Disorder: Fine-Tuning Llama-2 for Synthetic Speech in Schizophrenia

Alban Elias Voppel
McGill University
Apr 30, 2025
Seminar · Neuroscience

Relating circuit dynamics to computation: robustness and dimension-specific computation in cortical dynamics

Shaul Druckmann
Stanford department of Neurobiology and department of Psychiatry and Behavioral Sciences
Apr 22, 2025

Neural dynamics represent the hard-to-interpret substrate of circuit computations. Advances in large-scale recordings have highlighted the sheer spatiotemporal complexity of circuit dynamics within and across circuits, portraying in detail the difficulty of interpreting such dynamics and relating them to computation. Indeed, even in extremely simplified experimental conditions, one observes high-dimensional temporal dynamics in the relevant circuits. This complexity can potentially be addressed by the notion that not all changes in population activity have equal meaning, i.e., a small change in the evolution of activity along a particular dimension may have a bigger effect on a given computation than a large change in another. We term such conditions dimension-specific computation. Considering motor preparatory activity in a delayed response task, we utilized neural recordings performed simultaneously with optogenetic perturbations to probe circuit dynamics. First, we revealed a remarkable robustness in the detailed evolution of certain dimensions of the population activity, beyond what was thought to be the case experimentally and theoretically. Second, the robust dimension in activity space carries nearly all of the decodable behavioral information, whereas other, non-robust dimensions contain nearly no decodable information, as if the circuit were set up to make informative dimensions stiff, i.e., resistive to perturbations, leaving uninformative dimensions sloppy, i.e., sensitive to perturbations. Third, we show that this robustness can be achieved by a modular organization of circuitry, whereby modules whose dynamics normally evolve independently can correct each other’s dynamics when an individual module is perturbed, a common design feature in robust systems engineering. Finally, we will present recent work extending this framework to understanding the neural dynamics underlying the preparation of speech.
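
The claim that decodable information is confined to particular activity dimensions can be illustrated with a minimal decoding sketch on simulated data (not the recordings or analysis code from the talk): project trial-by-trial population activity onto a candidate dimension and compare cross-validated decoding accuracy against an orthogonal dimension.

```python
# Minimal sketch of "dimension-specific" information: one population-activity
# dimension carries decodable behavioral information while an orthogonal
# dimension carries almost none. Synthetic data, not the analyses from the talk.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_neurons = 200, 50
labels = rng.integers(0, 2, n_trials)            # e.g., left vs right choice

# Simulated population activity: the two conditions separate along one
# "informative" dimension; the rest is condition-independent noise.
informative_dim = rng.normal(size=n_neurons)
informative_dim /= np.linalg.norm(informative_dim)
activity = rng.normal(size=(n_trials, n_neurons))
activity += np.outer(labels * 2.0 - 1.0, informative_dim)

# An orthogonal comparison dimension (Gram-Schmidt against the informative one).
orthogonal_dim = rng.normal(size=n_neurons)
orthogonal_dim -= (orthogonal_dim @ informative_dim) * informative_dim
orthogonal_dim /= np.linalg.norm(orthogonal_dim)

for name, dim in [("informative", informative_dim), ("orthogonal", orthogonal_dim)]:
    projected = (activity @ dim).reshape(-1, 1)
    acc = cross_val_score(LogisticRegression(), projected, labels, cv=5).mean()
    print(f"decoding accuracy along {name} dimension: {acc:.2f}")
```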

Seminar · Neuroscience

The representation of speech conversations in the human auditory cortex

Etienne Abassi
McGill University
Apr 2, 2025
Seminar · Neuroscience

LLMs and Human Language Processing

Maryia Toneva, Ariel Goldstein, Jean-Remi King
Max Planck Institute for Software Systems; Hebrew University; École Normale Supérieure
Nov 28, 2024

This webinar convened researchers at the intersection of Artificial Intelligence and Neuroscience to investigate how large language models (LLMs) can serve as valuable “model organisms” for understanding human language processing. Presenters showcased evidence that brain recordings (fMRI, MEG, ECoG) acquired while participants read or listened to unconstrained speech can be predicted by representations extracted from state-of-the-art text- and speech-based LLMs. In particular, text-based LLMs tend to align better with higher-level language regions, capturing more semantic aspects, while speech-based LLMs excel at explaining early auditory cortical responses. However, purely low-level features can drive part of these alignments, complicating interpretations. New methods, including perturbation analyses, highlight which linguistic variables matter for each cortical area and time scale. Further, “brain tuning” of LLMs—fine-tuning on measured neural signals—can improve semantic representations and downstream language tasks. Despite open questions about interpretability and exact neural mechanisms, these results demonstrate that LLMs provide a promising framework for probing the computations underlying human language comprehension and production at multiple spatiotemporal scales.
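
A minimal sketch of the encoding-model logic behind these brain-alignment results is shown below, assuming placeholder arrays for the LLM features and BOLD responses (the presenters' actual pipelines differ in preprocessing, hemodynamic delays, and validation).

```python
# Sketch of a linear encoding model: ridge-regress fMRI voxel responses onto
# LLM-derived stimulus features. Arrays are placeholders with hypothetical
# shapes; real pipelines also model hemodynamic delays, use within-subject
# cross-validation, and estimate noise ceilings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

n_timepoints, n_features, n_voxels = 1000, 768, 2000
rng = np.random.default_rng(0)
llm_features = rng.normal(size=(n_timepoints, n_features))  # e.g., one LLM layer, resampled to TRs
bold = rng.normal(size=(n_timepoints, n_voxels))            # voxel time series

X_train, X_test, y_train, y_test = train_test_split(
    llm_features, bold, test_size=0.2, shuffle=False)       # keep temporal order

model = Ridge(alpha=100.0).fit(X_train, y_train)
pred = model.predict(X_test)

# Per-voxel prediction performance (Pearson correlation between predicted
# and measured responses), a common "brain alignment" score.
pred_c = pred - pred.mean(0)
y_c = y_test - y_test.mean(0)
r = (pred_c * y_c).sum(0) / (np.linalg.norm(pred_c, axis=0) * np.linalg.norm(y_c, axis=0))
print("median voxelwise correlation:", np.median(r))
```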

Seminar · Neuroscience · Recording

Sophie Scott - The Science of Laughter from Evolution to Neuroscience

Sophie Scott
University College London, UK
Sep 9, 2024

Keynote Address to British Association of Cognitive Neuroscience, London, 10th September 2024

Seminar · Neuroscience

Exploring the cerebral mechanisms of acoustically-challenging speech comprehension - successes, failures and hope

Alexis Hervais-Adelman
University of Geneva
May 20, 2024

Comprehending speech under acoustically challenging conditions is an everyday task that we can often execute with ease. However, accomplishing this requires the engagement of cognitive resources, such as auditory attention and working memory. The mechanisms that contribute to the robustness of speech comprehension are of substantial interest in the context of mild to moderate hearing impairment, in which affected individuals typically report specific difficulties in understanding speech in background noise. Although hearing aids can help to mitigate this, they do not represent a universal solution; thus, finding alternative interventions is necessary. Given that age-related hearing loss (“presbycusis”) is inevitable, developing new approaches is all the more important in the context of aging populations. Moreover, untreated hearing loss in middle age has been identified as the most significant potentially modifiable predictor of dementia in later life. I will present research that has used a multi-methodological approach (fMRI, EEG, MEG and non-invasive brain stimulation) to try to elucidate the mechanisms that comprise the cognitive “last mile” of acoustically challenging speech comprehension and to find ways to enhance them.

Seminar · Neuroscience

Dyslexia, Rhythm, Language and the Developing Brain

Usha Goswami CBE
University of Cambridge
Feb 21, 2024

Recent insights from auditory neuroscience provide a new perspective on how the brain encodes speech. Using these recent insights, I will provide an overview of key factors underpinning individual differences in children’s development of language and phonology, providing a context for exploring atypical reading development (dyslexia). Children with dyslexia are relatively insensitive to acoustic cues related to speech rhythm patterns. This lack of rhythmic sensitivity is related to the atypical neural encoding of rhythm patterns in speech by the brain. I will describe our recent data from infants as well as children, demonstrating developmental continuity in the key neural variables.

Seminar · Cognition

Prosody in the voice, face, and hands changes which words you hear

Hans Rutger Bosker
Donders Institute of Radboud University
May 22, 2023

Speech may be characterized as conveying both segmental information (i.e., about vowels and consonants) and suprasegmental information - cued through pitch, intensity, and duration - also known as the prosody of speech. In this contribution, I will argue that prosody shapes low-level speech perception, changing which speech sounds we hear. Perhaps the most notable example of how prosody guides word recognition is the phenomenon of lexical stress, whereby suprasegmental F0, intensity, and duration cues can distinguish otherwise segmentally identical words, such as "PLAto" vs. "plaTEAU" in Dutch. Work from our group showcases the vast variability in how different talkers produce stressed vs. unstressed syllables, while also unveiling the remarkable flexibility with which listeners can learn to handle this between-talker variability. It also emphasizes that lexical stress is a multimodal linguistic phenomenon, with the voice, lips, and even hands conveying stress in concert. In turn, human listeners actively weigh these multisensory cues to stress depending on the listening conditions at hand. Finally, lexical stress is presented as having a robust and lasting impact on low-level speech perception, even down to changing vowel perception. Thus, prosody - in all its multisensory forms - is a potent factor in speech perception, determining what speech sounds we hear.

Seminar · Neuroscience · Recording

Silences, Spikes and Bursts: Three-Part Knot of the Neural Code

Richard Naud
University of Ottawa
Feb 28, 2023

When a neuron breaks silence, it can emit action potentials in a number of patterns. Some responses are so sudden and intense that electrophysiologists felt the need to single them out, labeling action potentials emitted at a particularly high frequency with a metonym – bursts. Is there more to bursts than a figure of speech? After all, sudden bouts of high-frequency firing are expected to occur whenever inputs surge. In this talk, I will discuss the implications of seeing the neural code as having three syllables: silences, spikes and bursts. In particular, I will describe recent theoretical and experimental results that implicate bursting in the implementation of top-down attention and the coordination of learning.

Seminar · Psychology

The speaker identification ability of blind and sighted listeners

Almut Braun
Bundeskriminalamt, Wiesbaden
Feb 21, 2023

Previous studies have shown that blind individuals outperform sighted controls in a variety of auditory tasks; however, only a few studies have investigated blind listeners’ speaker identification abilities. In addition, existing studies in the area show conflicting results. The presented empirical investigation with 153 blind (74 of them congenitally blind) and 153 sighted listeners is the first of its kind and scale in which long-term memory effects on blind listeners’ speaker identification abilities are examined. For the empirical investigation, all listeners were evenly assigned to one of nine subgroups (3 x 3 design) in order to investigate the influence of two parameters, each with three levels, on blind and sighted listeners’ speaker identification performance. The parameters were a) time interval, i.e. a time interval of 1, 3 or 6 weeks between the first exposure to the voice to be recognised (familiarisation) and the speaker identification task (voice lineup); and b) signal quality, i.e. voice recordings were presented in either studio quality, mobile phone quality or as recordings of whispered speech. Half of the presented voice lineups were target-present lineups in which the previously heard target voice was included. The other half consisted of target-absent lineups which contained solely distractor voices. Blind individuals outperformed sighted listeners only under studio-quality conditions. Furthermore, for blind and sighted listeners no significant performance differences were found with regard to the three investigated time intervals of 1, 3 and 6 weeks. Blind as well as sighted listeners were significantly better at picking the target voice from target-present lineups than at indicating that the target voice was absent in target-absent lineups. Within the blind group, no significant correlations were found between identification performance and onset or duration of blindness. Implications for the field of forensic phonetics are discussed.

Seminar · Neuroscience · Recording

Pitch and Time Interact in Auditory Perception

Jesse Pazdera
McMaster University, Canada
Oct 25, 2022

Research into pitch perception and time perception has typically treated the two as independent processes. However, previous studies of music and speech perception have suggested that pitch and timing information may be processed in an integrated manner, such that the pitch of an auditory stimulus can influence a person’s perception, expectation, and memory of its duration and tempo. Typically, higher-pitched sounds are perceived as faster and longer in duration than lower-pitched sounds with identical timing. We conducted a series of experiments to better understand the limits of this pitch-time integrality. Across several experiments, we tested whether the higher-equals-faster illusion generalizes across the broader frequency range of human hearing by asking participants to compare the tempo of a repeating tone played in one of six octaves to a metronomic standard. When participants heard tones from all six octaves, we consistently found an inverted U-shaped effect of the tone’s pitch height, such that perceived tempo peaked between A4 (440 Hz) and A5 (880 Hz) and decreased at lower and higher octaves. However, we found that the decrease in perceived tempo at extremely high octaves could be abolished by exposing participants to high-pitched tones only, suggesting that pitch-induced timing biases are context sensitive. We additionally tested how the timing of an auditory stimulus influences the perception of its pitch, using a pitch discrimination task in which probe tones occurred early, late, or on the beat within a rhythmic context. Probe timing strongly biased participants to rate later tones as lower in pitch than earlier tones. Together, these results suggest that pitch and time exert a bidirectional influence on one another, providing evidence for integrated processing of pitch and timing information in auditory perception. Identifying the mechanisms behind this pitch-time interaction will be critical for integrating current models of pitch and tempo processing.

Seminar · Neuroscience · Recording

Hierarchical transformation of visual event timing representations in the human brain: response dynamics in early visual cortex and timing-tuned responses in association cortices

Evi Hendrikx
Utrecht University
Sep 27, 2022

Quantifying the timing (duration and frequency) of brief visual events is vital to human perception, multisensory integration and action planning. For example, this allows us to follow and interact with the precise timing of speech and sports. Here we investigate how visual event timing is represented and transformed across the brain’s hierarchy: from sensory processing areas, through multisensory integration areas, to frontal action planning areas. We hypothesized that the dynamics of neural responses to sensory events in sensory processing areas allows derivation of event timing representations. This would allow higher-level processes such as multisensory integration and action planning to use sensory timing information, without the need for specialized central pacemakers or processes. Using 7T fMRI and neural model-based analyses, we found responses that monotonically increase in amplitude with visual event duration and frequency, becoming increasingly clear from primary visual cortex to lateral occipital visual field maps. Beginning in area MT/V5, we found a gradual transition from monotonic to tuned responses, with response amplitudes peaking at different event timings in different recording sites. While monotonic response components were limited to the retinotopic location of the visual stimulus, timing-tuned response components were independent of the recording sites' preferred visual field positions. These tuned responses formed a network of topographically organized timing maps in superior parietal, postcentral and frontal areas. From anterior to posterior timing maps, multiple events were increasingly integrated, response selectivity narrowed, and responses focused increasingly on the middle of the presented timing range. These results suggest that responses to event timing are transformed from the human brain’s sensory areas to the association cortices, with the event’s temporal properties being increasingly abstracted from the response dynamics and locations of early sensory processing. The resulting abstracted representation of event timing is then propagated through areas implicated in multisensory integration and action planning.

Seminar · Neuroscience · Recording

A Framework for a Conscious AI: Viewing Consciousness through a Theoretical Computer Science Lens

Lenore and Manuel Blum
Carnegie Mellon University
Aug 4, 2022

We examine consciousness from the perspective of theoretical computer science (TCS), a branch of mathematics concerned with understanding the underlying principles of computation and complexity, including the implications and surprising consequences of resource limitations. We propose a formal TCS model, the Conscious Turing Machine (CTM). The CTM is influenced by Alan Turing's simple yet powerful model of computation, the Turing machine (TM), and by the global workspace theory (GWT) of consciousness originated by cognitive neuroscientist Bernard Baars and further developed by him, Stanislas Dehaene, Jean-Pierre Changeux, George Mashour, and others. However, the CTM is not a standard Turing Machine. It’s not the input-output map that gives the CTM its feeling of consciousness, but what’s under the hood. Nor is the CTM a standard GW model. In addition to its architecture, what gives the CTM its feeling of consciousness is its predictive dynamics (cycles of prediction, feedback and learning), its internal multi-modal language Brainish, and certain special Long Term Memory (LTM) processors, including its Inner Speech and Model of the World processors. Phenomena generally associated with consciousness, such as blindsight, inattentional blindness, change blindness, dream creation, and free will, are considered. Explanations derived from the model draw confirmation from consistencies at a high level, well above the level of neurons, with the cognitive neuroscience literature. Reference. L. Blum and M. Blum, "A theory of consciousness from a theoretical computer science perspective: Insights from the Conscious Turing Machine," PNAS, vol. 119, no. 21, 24 May 2022. https://www.pnas.org/doi/epdf/10.1073/pnas.2115934119

Seminar · Neuroscience

Language Representations in the Human Brain: A naturalistic approach

Fatma Deniz
TU Berlin & Berkeley
Apr 26, 2022

Natural language is strongly context-dependent and can be perceived through different sensory modalities. For example, humans can easily comprehend the meaning of complex narratives presented through auditory speech, written text, or visual images. To understand how complex language-related information is represented in the human brain, we need to map the different linguistic and non-linguistic information perceived through different modalities across the cerebral cortex. To map this information to the brain, I suggest following a naturalistic approach: observing the human brain performing tasks in its naturalistic setting, designing quantitative models that transform real-world stimuli into specific hypothesis-related features, and building predictive models that can relate these features to brain responses. In my talk, I will present models of brain responses collected using functional magnetic resonance imaging while human participants listened to or read natural narrative stories. Using natural text and vector representations derived from natural language processing tools, I will present how we can study language processing in the human brain across modalities, at different levels of temporal granularity, and across different languages.

Seminar · Neuroscience · Recording

Artificial Intelligence and Racism – What are the implications for scientific research?

ALBA Network
Mar 6, 2022

As questions of race and justice have risen to the fore across the sciences, the ALBA Network has invited Dr Shakir Mohamed (Senior Research Scientist at DeepMind, UK) to give a keynote speech on Artificial Intelligence and racism and its implications for scientific research, followed by a discussion chaired by Dr Konrad Kording (Department of Neuroscience at the University of Pennsylvania, US, and neuromatch co-founder).

Seminar · Neuroscience

Electrophysiological investigations of natural speech and language processing

Edmund Lalor
University of Rochester, USA
Feb 13, 2022
Seminar · Neuroscience

Representation of speech temporal structure in human cortex

Yulia Oganian
Werner Reichardt Centre for Integrative Neuroscience (CIN), Tübingen
Feb 2, 2022
Seminar · Neuroscience

Hearing in an acoustically varied world

Kerry Walker
University of Oxford
Jan 24, 2022

In order for animals to thrive in their complex environments, their sensory systems must form representations of objects that are invariant to changes in some dimensions of their physical cues. For example, we can recognize a friend’s speech in a forest, a small office, and a cathedral, even though the sound reaching our ears will be very different in these three environments. I will discuss our recent experiments into how neurons in auditory cortex can form stable representations of sounds in this acoustically varied world. We began by using a normative computational model of hearing to examine how the brain may recognize a sound source across rooms with different levels of reverberation. The model predicted that reverberations can be removed from the original sound by delaying the inhibitory component of spectrotemporal receptive fields in the presence of stronger reverberation. Our electrophysiological recordings then confirmed that neurons in ferret auditory cortex apply this algorithm to adapt to different room sizes. Our results demonstrate that this neural process is dynamic and adaptive. These studies provide new insights into how we can recognize auditory objects even in highly reverberant environments, and direct further research questions about how reverb adaptation is implemented in the cortical circuit.

Seminar · Neuroscience · Recording

Development of multisensory perception and attention and their role in audiovisual speech processing

David Lewkowicz
Haskins Labs & Yale Child Study Ctr.
Oct 20, 2021
Seminar · Neuroscience

Speak your mind: cortical predictions of speech sensory feedback

Caroline Niziolek
University of Wisconsin, USA
Oct 20, 2021
Seminar · Neuroscience · Recording

Encoding and perceiving the texture of sounds: auditory midbrain codes for recognizing and categorizing auditory texture and for listening in noise

Monty Escabi
University of Connecticut
Sep 30, 2021

Natural soundscapes such as from a forest, a busy restaurant, or a busy intersection are generally composed of a cacophony of sounds that the brain needs to interpret either independently or collectively. In certain instances, sounds - such as from moving cars, sirens, and people talking - are perceived in unison and are recognized collectively as a single sound (e.g., city noise). In other instances, such as for the cocktail party problem, multiple sounds compete for attention so that the surrounding background noise (e.g., speech babble) interferes with the perception of a single sound source (e.g., a single talker). I will describe results from my lab on the perception and neural representation of auditory textures. Textures, such as from a babbling brook, restaurant noise, or speech babble, are stationary sounds consisting of multiple independent sound sources that can be quantitatively defined by summary statistics of an auditory model (McDermott & Simoncelli 2011). How and where summary statistics are represented in the auditory system, and which neural codes potentially contribute to their perception, however, are largely unknown. Using high-density multi-channel recordings from the auditory midbrain of unanesthetized rabbits and complementary perceptual studies on human listeners, I will first describe neural and perceptual strategies for encoding and perceiving auditory textures. I will demonstrate how distinct statistics of sounds, including the sound spectrum and high-order statistics related to the temporal and spectral correlation structure of sounds, contribute to texture perception and are reflected in neural activity. Using decoding methods, I will then demonstrate how various low- and high-order neural response statistics can differentially contribute to a variety of auditory tasks, including texture recognition, discrimination, and categorization. Finally, I will show examples from our recent studies of how high-order sound statistics and accompanying neural activity underlie difficulties in recognizing speech in background noise.
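
As a rough illustration of what "summary statistics of an auditory model" can mean in practice, the sketch below computes band envelopes and their marginal and cross-band correlation statistics for a placeholder sound; it is loosely inspired by McDermott & Simoncelli (2011), not the specific model or stimuli used in this work.

```python
# Rough sketch of texture summary statistics: decompose a sound into frequency
# bands, extract band envelopes, and summarize them by marginal moments and
# cross-band correlations. Placeholder noise stands in for a real texture.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
sound = np.random.default_rng(0).normal(size=int(fs * 2.0))  # placeholder "texture"

# A small bank of octave-spaced bandpass filters.
center_freqs = [250, 500, 1000, 2000, 4000]
envelopes = []
for fc in center_freqs:
    sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)],
                 btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, sound)
    envelopes.append(np.abs(hilbert(band)))        # Hilbert envelope of the band
envelopes = np.array(envelopes)                     # shape: (n_bands, n_samples)

# Summary statistics: per-band envelope mean/variance and the cross-band
# envelope correlation matrix that helps distinguish different textures.
env_mean = envelopes.mean(axis=1)
env_var = envelopes.var(axis=1)
env_corr = np.corrcoef(envelopes)
print("envelope means:", np.round(env_mean, 3))
print("cross-band correlations:\n", np.round(env_corr, 2))
```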

Seminar · Neuroscience · Recording

Multisensory speech perception

Michael Beauchamp
University of Pennsylvania
Sep 15, 2021
Seminar · Neuroscience

Exploring the neurogenetic basis of speech, language, and vocal communication

Sonja Vernes
Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands
Sep 15, 2021
Seminar · Neuroscience · Recording

Speech as a biomarker in ataxia: What can it tell us and how should we use it?

Adam Vogel
University of Melbourne, Australia
Jul 5, 2021
Seminar · Psychology

The Jena Voice Learning and Memory Test (JVLMT)

Romi Zäske
University of Jena
May 26, 2021

The ability to recognize someone’s voice spans a broad spectrum with phonagnosia on the low end and super recognition at the high end. Yet there is no standardized test to measure the individual ability to learn and recognize newly-learnt voices with samples of speech-like phonetic variability. We have developed the Jena Voice Learning and Memory Test (JVLMT), a 20-minute test based on item response theory and applicable across different languages. The JVLMT consists of three phases in which participants are familiarized with eight speakers in two stages and then perform a three-alternative forced choice recognition task, using pseudo sentences devoid of semantic content. Acoustic (dis)similarity analyses were used to create items with different levels of difficulty. Test scores are based on 22 Rasch-conform items. Items were selected and validated in online studies based on 232 and 454 participants, respectively. Mean accuracy is 0.51 with an SD of 0.18. The JVLMT showed high and moderate correlations with convergent validation tests (Bangor Voice Matching Test; Glasgow Voice Memory Test) and a weak correlation with a discriminant validation test (Digit Span). Empirical (marginal) reliability is 0.66. Four participants with super recognition (at least 2 SDs above the mean) and 7 participants with phonagnosia (at least 2 SDs below the mean) were identified. The JVLMT is a promising screening tool for voice recognition abilities in scientific and neuropsychological contexts.

Seminar · Neuroscience · Recording

Direction selectivity in hearing: monaural phase sensitivity in octopus neurons

Philip Joris
KU Leuven
May 16, 2021

The processing of temporal sound features is fundamental to hearing, and the auditory system displays a plethora of specializations, at many levels, to enable such processing. Octopus neurons are the most extreme temporally specialized cells in the auditory (and perhaps entire) brain, which makes them intriguing but also difficult to study. Notwithstanding the scant physiological data, these neurons have been a favorite cell type of modeling studies, which have proposed that octopus cells have critical roles in pitch and speech perception. We used a range of in vivo recording and labeling methods to examine the hypothesis that tonotopic ordering of cochlear afferents combines with dendritic delays to compensate for cochlear delay - which would explain the highly entrained responses of octopus cells to sound transients. Unexpectedly, the experiments revealed that these neurons have marked selectivity to the direction of fast frequency glides, which is tied in a surprising way to intrinsic membrane properties and subthreshold events. The data suggest that octopus cells have a role in temporal comparisons across frequency and may play a role in auditory scene analysis.

Seminar · Neuroscience

Learning Speech Perception and Action through Sensorimotor Interactions

Shihab Shamma
University of Maryland
Mar 28, 2021
Seminar · Neuroscience · Recording

Decoding the neural processing of speech

Tobias Reichenbach
Friedrich-Alexander-University
Mar 22, 2021

Understanding speech in noisy backgrounds requires selective attention to a particular speaker. Humans excel at this challenging task, while current speech recognition technology still struggles when background noise is loud. The neural mechanisms by which we process speech remain, however, poorly understood, not least due to the complexity of natural speech. Here we describe recent progress obtained by applying machine learning to neuroimaging data of humans listening to speech in different types of background noise. In particular, we develop statistical models to relate characteristic features of speech such as pitch, amplitude fluctuations and linguistic surprisal to neural measurements. We find neural correlates of speech processing both at the subcortical level, related to the pitch, as well as at the cortical level, related to amplitude fluctuations and linguistic structures. We also show that some of these measures can help diagnose disorders of consciousness. Our findings may be applied in smart hearing aids that automatically adjust speech processing to assist a user, as well as in the diagnosis of brain disorders.
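
One common way to relate speech features to neural measurements is a linear forward model (a temporal response function); the sketch below estimates one from a lagged speech envelope to a single EEG channel with ridge regression, using placeholder data rather than the datasets described in the talk.

```python
# Sketch of a forward model (temporal response function, TRF): predict an EEG
# channel from time-lagged copies of the speech amplitude envelope via ridge
# regression. Placeholder data; real analyses add more speech features
# (pitch, linguistic surprisal) and careful cross-validation.
import numpy as np

fs = 100                                   # Hz, downsampled EEG / envelope rate
n_samples = 60 * fs                        # one minute of data
rng = np.random.default_rng(0)
envelope = rng.normal(size=n_samples)      # placeholder speech envelope
eeg = rng.normal(size=n_samples)           # placeholder EEG channel

# Build a lagged design matrix covering 0-400 ms of envelope history.
lags = np.arange(0, int(0.4 * fs))
X = np.column_stack([np.roll(envelope, lag) for lag in lags])
X[: lags.max()] = 0                        # discard wrap-around samples

# Ridge solution: w = (X'X + lambda*I)^-1 X'y
lam = 1e2
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)

prediction = X @ w
r = np.corrcoef(prediction, eeg)[0, 1]
print(f"TRF length: {len(w)} lags, prediction correlation: {r:.3f}")
```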

Seminar · Neuroscience · Recording

Do deep learning latent spaces resemble human brain representations?

Rufin VanRullen
Centre de Recherche Cerveau et Cognition (CERCO)
Mar 11, 2021

In recent years, artificial neural networks have demonstrated human-like or super-human performance in many tasks including image or speech recognition, natural language processing (NLP), playing Go, chess, poker and video-games. One remarkable feature of the resulting models is that they can develop very intuitive latent representations of their inputs. In these latent spaces, simple linear operations tend to give meaningful results, as in the well-known analogy QUEEN-WOMAN+MAN=KING. We postulate that human brain representations share essential properties with these deep learning latent spaces. To verify this, we test whether artificial latent spaces can serve as a good model for decoding brain activity. We report improvements over state-of-the-art performance for reconstructing seen and imagined face images from fMRI brain activation patterns, using the latent space of a GAN (Generative Adversarial Network) model coupled with a Variational AutoEncoder (VAE). With another GAN model (BigBiGAN), we can decode and reconstruct natural scenes of any category from the corresponding brain activity. Our results suggest that deep learning can produce high-level representations approaching those found in the human brain. Finally, I will discuss whether these deep learning latent spaces could be relevant to the study of consciousness.
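
The "meaningful linear operations" point can be made concrete with a toy latent space in which royalty and gender are explicit axes; the QUEEN-WOMAN+MAN arithmetic then resolves to KING by nearest-cosine lookup. The vectors below are hand-built purely for illustration; real embedding or GAN/VAE latent spaces learn such directions from data.

```python
# Toy illustration of linear structure in a latent space: with hand-built
# vectors where royalty and gender are explicit axes, QUEEN - WOMAN + MAN
# lands closest to KING. Real latent spaces learn such directions from data.
import numpy as np

# axes: [royalty, femaleness, person-ness]
vocab = {
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "man":   np.array([0.0, 0.0, 1.0]),
    "woman": np.array([0.0, 1.0, 1.0]),
    "boy":   np.array([0.0, 0.0, 0.8]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = vocab["queen"] - vocab["woman"] + vocab["man"]
best = max((w for w in vocab if w not in {"queen", "woman", "man"}),
           key=lambda w: cosine(query, vocab[w]))
print("queen - woman + man ≈", best)       # prints "king"
```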

Seminar · Neuroscience · Recording

Kamala Harris and the Construction of Complex Ethnolinguistic Political Identity

Nicole Holliday
University of Pennsylvania
Feb 25, 2021

Over the past 50 years, sociolinguistic studies on black Americans have expanded in both theoretical and technical scope, and newer research has moved beyond seeing speakers, especially black speakers, as a monolithic sociolinguistic community (Wolfram 2007, Blake 2014). Yet there remains a dearth of critical work on complex identities existing within black American communities as well as how these identities are reflected and perceived in linguistic practice. At the same time, linguists have begun to take greater interest in the ways in which public figures, such as politicians, may illuminate the wider social meaning of specific linguistic variables. In this talk, I will present results from analyses of multiple aspects of ethnolinguistic variation in the speech of Vice President Kamala Harris during the 2019-2020 Democratic Party Primary debates. Together, these results show how VP Harris expertly employs both enregistered and subtle linguistic variables, including aspects of African American Language morphosyntax, vowels, and intonational phonology in the construction and performance of a highly specific sociolinguistic identity that reflects her unique positions politically, socially, and racially. The results of this study expand our knowledge about how the complexities of speaker identity are reflected in sociolinguistic variation, as well as press on the boundaries of what we know about how speakers in the public sphere use variation to reflect both who they are and who we want them to be.

Seminar · Neuroscience · Recording

Space for Thinking - Spatial Reference Frames and Abstract Concepts

Ariel Starr
University of Washington
Dec 9, 2020

People from cultures around the world tend to borrow from the domain of space to represent abstract concepts. For example, in the domain of time, we use spatial metaphors (e.g., describing the future as being in front and the past behind), accompany our speech with spatial gestures (e.g., gesturing to the left to refer to a past event), and use external tools that project time onto a spatial reference frame (e.g., calendars). Importantly, these associations are also present in the way we think and reason about time, suggesting that space and time are also linked in the mind. In this talk, I will explore the developmental origins and functional implications of these types of cross-dimensional associations. To start, I will discuss the roles that language and culture play in shaping how children in the US and India represent time. Next, I will use word learning and memory as test cases for exploring why cross-dimensional associations may be cognitively advantageous. Finally, I will talk about future directions and the practical implications of this line of work, with a focus on how encouraging spatial representations of abstract concepts could improve learning outcomes.

Seminar · Neuroscience

Low dimensional models and electrophysiological experiments to study neural dynamics in songbirds

Ana Amador
University of Buenos Aires
Dec 1, 2020

Birdsong emerges when a set of highly interconnected brain areas manages to generate a complex output. The similarities between birdsong production and human speech have positioned songbirds as unique animal models for studying the learning and production of this complex motor skill. In this work, we developed a low-dimensional model for a neural network in which the variables were the average activities of different neural populations within the nuclei of the song system. This neural network is active during the production, perception and learning of birdsong. We performed electrophysiological experiments to record neural activity from one of these nuclei and found that the low-dimensional model could reproduce the neural dynamics observed during the experiments. This model could also reproduce the respiratory motor patterns used to generate song. We showed that sparse activity in one of the neural nuclei could drive more complex activity downstream in the neural network. This interdisciplinary work shows how low-dimensional neural models can be a valuable tool for studying the emergence of complex motor tasks.
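
For readers unfamiliar with this class of models, the sketch below integrates a generic low-dimensional rate network (average population activities coupled through a sigmoid nonlinearity). The connectivity and inputs are made up; this is a Wilson-Cowan-style toy, not the specific song-system model developed in this work.

```python
# Generic low-dimensional rate-model sketch: a few coupled "average activity"
# variables driven through a sigmoid nonlinearity and integrated with Euler
# steps. Coupling weights and inputs are invented for illustration only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_populations = 3                          # e.g., three interconnected nuclei
W = np.array([[ 0.0,  1.5, -1.0],          # made-up coupling weights
              [ 1.2,  0.0, -0.8],
              [ 0.9,  0.5,  0.0]])
tau = 10.0                                 # ms, population time constant
dt = 0.1                                   # ms, Euler step
steps = 5000

rates = np.zeros(n_populations)
drive = np.array([1.0, 0.0, 0.0])          # sparse external input to one population
trace = np.zeros((steps, n_populations))

for t in range(steps):
    inputs = W @ rates + drive
    rates = rates + dt / tau * (-rates + sigmoid(inputs))
    trace[t] = rates

print("final population activities:", np.round(rates, 3))
```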

Seminar · Neuroscience

Monkey Talk – what studies about nonhuman primate vocal communication reveal about the evolution of speech

Julia Fischer
German Primate Center
Oct 20, 2020

The evolution of speech is considered to be one of the hardest problems in science. Studies of the communicative abilities of our closest living relatives, the nonhuman primates, aim to contribute to a better understanding of the emergence of this uniquely human capability. Following a brief introduction to the key building blocks that make up the human speech faculty, I will focus on the question of meaning in nonhuman primate vocalizations. While nonhuman primate calls may be highly context specific, thus giving rise to the notion of ‘referentiality’, comparisons across closely related species suggest that this specificity is evolved rather than learned. Yet, as in humans, the structure of calls varies with arousal and affective state, and there is some evidence for effects of sensory-motor integration in vocal production. Thus, the vocal production of nonhuman primates bears little resemblance to the symbolic and combinatorial features of human speech, while basic production mechanisms are shared. Listeners, in contrast, are able to learn the meaning of new sounds. A recent study using artificial predators shows that this learning may be extremely rapid. Furthermore, listeners are able to integrate information from multiple sources to make adaptive decisions, which renders the vocal communication system as a whole relatively flexible and powerful. In conclusion, constraints on the side of vocal production, including limits in social cognition and motivation to share experiences, rather than constraints on the side of the recipient, explain the differences in communicative abilities between humans and other animals.

Seminar · Neuroscience · Recording

Neural control of vocal interactions in songbirds

Daniela Vallentin
Max Planck Institute for Ornithology
May 14, 2020

During conversations we rapidly switch between listening and speaking, which often requires withholding or delaying our speech in order to hear others and avoid overlapping. This capacity for vocal turn-taking is exhibited by non-linguistic species as well; however, the neural circuit mechanisms that enable us to regulate the precise timing of our vocalizations during interactions are unknown. We aim to identify the neural mechanisms underlying the coordination of vocal interactions. Therefore, we paired zebra finches with a vocal robot (1 Hz call playback) and measured the birds’ call response times. We found that individual birds called with a stereotyped delay with respect to the robot call. Pharmacological inactivation of the premotor nucleus HVC revealed its necessity for the temporal coordination of calls. We further investigated the contributing neural activity within HVC by performing intracellular recordings from premotor neurons and inhibitory interneurons in calling zebra finches. We found that inhibition precedes excitation before and during call onset. To test whether inhibition guides call timing, we pharmacologically limited the impact of inhibition on premotor neurons. As a result, zebra finches converged on a similar delay time, i.e., birds called more rapidly after the vocal robot call, suggesting that HVC inhibitory interneurons regulate the coordination of social contact calls. In addition, we aim to investigate the vocal turn-taking capabilities of the common nightingale. Male nightingales learn over 100 different song motifs, which they use to attract mates or defend territories. Previously, it has been shown that nightingales counter-sing with each other following a temporal structure similar to human vocal turn-taking. These animals are also able to spontaneously imitate a motif of another nightingale. The neural mechanisms underlying this behaviour are not yet understood. In my lab, we further probe the capabilities of these animals in order to assess the dynamic range of their vocal turn-taking flexibility.

ePoster

Rhythm-structured predictive coding for contextualized speech processing

Olesia Dogonasheva, Denis Zakharov, Anne-Lise Giraud, Boris Gutkin

Bernstein Conference 2024

ePoster

Brain-Rhythm-based Inference (BRyBI) for time-scale invariant speech processing

Olesia Dogonasheva, Denis Zakharov, Anne-Lise Giraud, Boris Gutkin

COSYNE 2023

ePoster

Cross-trial alignment reveals a low-dimensional cortical manifold of naturalistic speech production

Cheol Jun Cho, Edward Chang, Gopala Anumanchipalli

COSYNE 2023

ePoster

Altered sensory prediction error signaling and dopamine function drive speech hallucinations in schizophrenia

Justin Buck, Mark Slifstein, Jodi Weinstein, Roberto Gil, Jared Van Snellenberg, Christoph Juchem, Anissa Abi-Dargham, Guillermo Horga

COSYNE 2025

ePoster

Bayesian integration of audiovisual speech by DNN models is similar to human observers

Haotian Ma, Xiang Zhang, Zhengjia Wang, John F. Magnotti, Michael S. Beauchamp

COSYNE 2025

ePoster

Geometric Signatures of Speech Recognition: Insights from Deep Neural Networks to the Brain

Jiaqi Shang, Shailee Jain, Haim Sompolinsky, Edward Chang

COSYNE 2025

ePoster

Human precentral gyrus neurons link speech sequences from listening to speaking

Duo Xu, Jason Chung, Quinn Greicius, Yizhen Zhang, Matthew Leonard, Edward Chang

COSYNE 2025

ePoster

Attentional modulation of the cortical contribution to the frequency-following response evoked by continuous speech

Alina Schüller, Achim Schilling, Patrick Krauss, Stefan Rampp, Tobias Reichenbach

FENS Forum 2024

ePoster

EEG beta de-synchronization signs the efficacy of a rehabilitation treatment for speech impairment in Parkinson’s disease population

Giovanni Vecchiato, Chiara Palmisano, Elena Hilary Rondoni, Ioannis Ugo Isaias, Daniele Volpe, Alberto Mazzoni

FENS Forum 2024

ePoster

Brain-rhythm-based inference (BRyBI) for time-scale invariant speech processing

Olesia Dogonasheva, Olesia Platonova, Sophie Bouton, Denis Zakharov, Anne-Lise Giraud, Boris Gutkin

FENS Forum 2024

ePoster

The cortical frequency-following response to continuous speech in musicians and non-musicians

Jasmin Riegel, Alina Schueller, Achim Schilling, Patrick Krauss, Tobias Reichenbach

FENS Forum 2024

ePoster

Decoding envelope and frequency-following responses to speech using deep neural networks

Michael Thornton, Danilo Mandic, Tobias Reichenbach

FENS Forum 2024

ePoster

Decoding of selective attention to speech in CI patients using linear and non-linear methods

Constantin Jehn, Adrian Kossmann, Anja Hahne, Niki Vavatzanidis, Tobias Reichenbach

FENS Forum 2024

ePoster

Decoding spatiotemporal processing of speech and melody in the brain

Akanksha Gupta, Agnès Trébuchon, Benjamin Morillon

FENS Forum 2024

ePoster

The effects and interactions of top-down influences on speech perception

Reuben Chaudhuri, Ryszard Auksztulewicz, Ruofan Wu, Colin Blakemore, Jan Schnupp

FENS Forum 2024

ePoster

EEG-based source analysis of the neural response at the fundamental frequency of speech

Jonas Auernheimer, Tobias Reichenbach

FENS Forum 2024

ePoster

Examining speech disfluency through the analysis of grey matter densities in 5-year-olds using voxel-based morphometry

Ashmeet Jolly, Elmo Pulli, Henry Railo, Elina Mainela-Arnold, Jetro Tuulari

FENS Forum 2024

ePoster

The neural processing of natural audiovisual speech in noise in autism: A TRF approach

Theo Vanneau, Michael Crosse, John Foxe, Sophie Molholm

FENS Forum 2024

ePoster

Web-based speech transcription tool for efficient quantification of memory performance

Marina Galanina, Michal Tomasz Kucewicz, Jesus Salvador Garcia-Salinas, Sathwik Prathapagiri, Nastaran Hamedi, Maria Renke

FENS Forum 2024