speech
Simulating Thought Disorder: Fine-Tuning Llama-2 for Synthetic Speech in Schizophrenia
Relating circuit dynamics to computation: robustness and dimension-specific computation in cortical dynamics
Neural dynamics represent the hard-to-interpret substrate of circuit computations. Advances in large-scale recordings have highlighted the sheer spatiotemporal complexity of circuit dynamics within and across circuits, portraying in detail the difficulty of interpreting such dynamics and relating them to computation. Indeed, even in extremely simplified experimental conditions, one observes high-dimensional temporal dynamics in the relevant circuits. This complexity can potentially be addressed by the notion that not all changes in population activity have equal meaning, i.e., a small change in the evolution of activity along a particular dimension may have a bigger effect on a given computation than a large change in another. We term such conditions dimension-specific computation. Considering motor preparatory activity in a delayed-response task, we used neural recordings performed simultaneously with optogenetic perturbations to probe circuit dynamics. First, we revealed a remarkable robustness in the detailed evolution of certain dimensions of the population activity, beyond what was thought to be the case experimentally and theoretically. Second, the robust dimension in activity space carried nearly all of the decodable behavioral information, whereas other, non-robust dimensions contained nearly none, as if the circuit were set up to make informative dimensions stiff, i.e., resistant to perturbations, leaving uninformative dimensions sloppy, i.e., sensitive to perturbations. Third, we show that this robustness can be achieved by a modular organization of circuitry, whereby modules whose dynamics normally evolve independently can correct each other’s dynamics when an individual module is perturbed, a common design feature in robust systems engineering. Finally, we will present recent work extending this framework to understanding the neural dynamics underlying the preparation of speech.
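To make the notion of dimension-specific computation concrete, here is a minimal sketch (synthetic data and labels, not the recorded datasets or the authors' analysis) of decoding a binary choice from population activity projected onto an informative coding dimension versus an orthogonal one.

```python
# Minimal sketch with synthetic data: one population dimension carries nearly all
# decodable choice information, an orthogonal dimension carries essentially none.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_neurons = 200, 50
choice = rng.integers(0, 2, n_trials)                       # left/right trial labels

# Synthetic population activity: choice information confined to one direction.
coding_dir = rng.standard_normal(n_neurons)
coding_dir /= np.linalg.norm(coding_dir)
activity = rng.standard_normal((n_trials, n_neurons))        # "sloppy" background variability
activity += np.outer(choice - 0.5, coding_dir) * 3.0         # "stiff", informative component

def decodability(projection):
    """Cross-validated accuracy of decoding choice from a 1-D projection."""
    clf = LogisticRegression()
    return cross_val_score(clf, projection.reshape(-1, 1), choice, cv=5).mean()

# Compare the informative dimension with a random orthogonal dimension.
random_dir = rng.standard_normal(n_neurons)
random_dir -= random_dir @ coding_dir * coding_dir
random_dir /= np.linalg.norm(random_dir)

print("decoding along coding dimension:     %.2f" % decodability(activity @ coding_dir))
print("decoding along orthogonal dimension: %.2f" % decodability(activity @ random_dir))
```

In this toy, the first projection decodes choice well above chance while the orthogonal one hovers near 0.5, mirroring the stiff/sloppy distinction described in the abstract.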
The representation of speech conversations in the human auditory cortex
LLMs and Human Language Processing
This webinar convened researchers at the intersection of Artificial Intelligence and Neuroscience to investigate how large language models (LLMs) can serve as valuable “model organisms” for understanding human language processing. Presenters showcased evidence that brain recordings (fMRI, MEG, ECoG) acquired while participants read or listened to unconstrained speech can be predicted by representations extracted from state-of-the-art text- and speech-based LLMs. In particular, text-based LLMs tend to align better with higher-level language regions, capturing more semantic aspects, while speech-based LLMs excel at explaining early auditory cortical responses. However, purely low-level features can drive part of these alignments, complicating interpretations. New methods, including perturbation analyses, highlight which linguistic variables matter for each cortical area and time scale. Further, “brain tuning” of LLMs—fine-tuning on measured neural signals—can improve semantic representations and downstream language tasks. Despite open questions about interpretability and exact neural mechanisms, these results demonstrate that LLMs provide a promising framework for probing the computations underlying human language comprehension and production at multiple spatiotemporal scales.
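As a concrete illustration of the encoding-model analyses summarized above, the following hedged sketch fits a ridge regression from LLM-derived features to brain responses and scores it by held-out correlation per voxel; all arrays are synthetic placeholders rather than any presenter's data.

```python
# Generic "encoding model" sketch: ridge regression from LLM features to brain
# responses, evaluated by held-out correlation per voxel/channel.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, n_voxels = 1000, 768, 200            # e.g. TRs x LLM dims x voxels
X = rng.standard_normal((n_samples, n_features))             # stand-in for LLM embeddings
Y = X @ rng.standard_normal((n_features, n_voxels)) * 0.05   # stand-in for brain responses
Y += rng.standard_normal(Y.shape)                            # measurement noise

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)
model = RidgeCV(alphas=np.logspace(0, 4, 9)).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# Prediction performance: Pearson correlation per voxel on held-out data.
r = [np.corrcoef(Y_te[:, v], Y_hat[:, v])[0, 1] for v in range(n_voxels)]
print("median held-out correlation: %.3f" % np.median(r))
```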
Sophie Scott - The Science of Laughter from Evolution to Neuroscience
Keynote Address to British Association of Cognitive Neuroscience, London, 10th September 2024
Exploring the cerebral mechanisms of acoustically-challenging speech comprehension - successes, failures and hope
Comprehending speech under acoustically challenging conditions is an everyday task that we can often execute with ease. However, accomplishing this requires the engagement of cognitive resources, such as auditory attention and working memory. The mechanisms that contribute to the robustness of speech comprehension are of substantial interest in the context of mild to moderate hearing impairment, in which affected individuals typically report specific difficulties in understanding speech in background noise. Although hearing aids can help to mitigate this, they do not represent a universal solution; thus, finding alternative interventions is necessary. Given that age-related hearing loss (“presbycusis”) is inevitable, developing new approaches is all the more important in the context of aging populations. Moreover, untreated hearing loss in middle age has been identified as the most significant potentially modifiable predictor of dementia in later life. I will present research that has used a multi-methodological approach (fMRI, EEG, MEG and non-invasive brain stimulation) to try to elucidate the mechanisms that comprise the cognitive “last mile” in acoustically challenging speech comprehension and to find ways to enhance them.
Dyslexia, Rhythm, Language and the Developing Brain
Recent insights from auditory neuroscience provide a new perspective on how the brain encodes speech. Using these recent insights, I will provide an overview of key factors underpinning individual differences in children’s development of language and phonology, providing a context for exploring atypical reading development (dyslexia). Children with dyslexia are relatively insensitive to acoustic cues related to speech rhythm patterns. This lack of rhythmic sensitivity is related to the atypical neural encoding of rhythm patterns in speech by the brain. I will describe our recent data from infants as well as children, demonstrating developmental continuity in the key neural variables.
Silences, Spikes and Bursts: Three-Part Knot of the Neural Code
When a neuron breaks silence, it can emit action potentials in a number of patterns. Some responses are so sudden and intense that electrophysiologists felt the need to single them out, labeling action potentials emitted at a particularly high frequency with a metonym – bursts. Is there more to bursts than a figure of speech? After all, sudden bouts of high-frequency firing are expected to occur whenever inputs surge. In this talk, I will discuss the implications of seeing the neural code as having three syllables: silences, spikes and bursts. In particular, I will describe recent theoretical and experimental results that implicate bursting in the implementation of top-down attention and the coordination of learning.
Motor contribution to auditory temporal predictions
Temporal predictions are fundamental instruments for facilitating sensory selection, allowing humans to exploit regularities in the world. Recent evidence indicates that the motor system instantiates predictive timing mechanisms, helping to synchronize temporal fluctuations of attention with the timing of events in a task-relevant stream, thus facilitating sensory selection. Accordingly, in the auditory domain, auditory-motor interactions are observed during the perception of speech and music, two temporally structured sensory streams. I will present a behavioral and neurophysiological account of this theory and will detail the parameters governing the emergence of this auditory-motor coupling, through a set of behavioral and magnetoencephalography (MEG) experiments.
Pitch and Time Interact in Auditory Perception
Research into pitch perception and time perception has typically treated the two as independent processes. However, previous studies of music and speech perception have suggested that pitch and timing information may be processed in an integrated manner, such that the pitch of an auditory stimulus can influence a person’s perception, expectation, and memory of its duration and tempo. Typically, higher-pitched sounds are perceived as faster and longer in duration than lower-pitched sounds with identical timing. We conducted a series of experiments to better understand the limits of this pitch-time integrality. Across several experiments, we tested whether the higher-equals-faster illusion generalizes across the broader frequency range of human hearing by asking participants to compare the tempo of a repeating tone played in one of six octaves to a metronomic standard. When participants heard tones from all six octaves, we consistently found an inverted U-shaped effect of the tone’s pitch height, such that perceived tempo peaked between A4 (440 Hz) and A5 (880 Hz) and decreased at lower and higher octaves. However, we found that the decrease in perceived tempo at extremely high octaves could be abolished by exposing participants to high-pitched tones only, suggesting that pitch-induced timing biases are context sensitive. We additionally tested how the timing of an auditory stimulus influences the perception of its pitch, using a pitch discrimination task in which probe tones occurred early, late, or on the beat within a rhythmic context. Probe timing strongly biased participants to rate later tones as lower in pitch than earlier tones. Together, these results suggest that pitch and time exert a bidirectional influence on one another, providing evidence for integrated processing of pitch and timing information in auditory perception. Identifying the mechanisms behind this pitch-time interaction will be critical for integrating current models of pitch and tempo processing.
Hierarchical transformation of visual event timing representations in the human brain: response dynamics in early visual cortex and timing-tuned responses in association cortices
Quantifying the timing (duration and frequency) of brief visual events is vital to human perception, multisensory integration and action planning. For example, this allows us to follow and interact with the precise timing of speech and sports. Here we investigate how visual event timing is represented and transformed across the brain’s hierarchy: from sensory processing areas, through multisensory integration areas, to frontal action planning areas. We hypothesized that the dynamics of neural responses to sensory events in sensory processing areas allows derivation of event timing representations. This would allow higher-level processes such as multisensory integration and action planning to use sensory timing information, without the need for specialized central pacemakers or processes. Using 7T fMRI and neural model-based analyses, we found responses that monotonically increase in amplitude with visual event duration and frequency, becoming increasingly clear from primary visual cortex to lateral occipital visual field maps. Beginning in area MT/V5, we found a gradual transition from monotonic to tuned responses, with response amplitudes peaking at different event timings in different recording sites. While monotonic response components were limited to the retinotopic location of the visual stimulus, timing-tuned response components were independent of the recording sites' preferred visual field positions. These tuned responses formed a network of topographically organized timing maps in superior parietal, postcentral and frontal areas. From anterior to posterior timing maps, multiple events were increasingly integrated, response selectivity narrowed, and responses focused increasingly on the middle of the presented timing range. These results suggest that responses to event timing are transformed from the human brain’s sensory areas to the association cortices, with the event’s temporal properties being increasingly abstracted from the response dynamics and locations of early sensory processing. The resulting abstracted representation of event timing is then propagated through areas implicated in multisensory integration and action planning.
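The contrast between monotonic and timing-tuned responses can be illustrated with a small model-comparison sketch; this is an assumption-laden toy (synthetic responses, a simple Gaussian tuning function), not the authors' 7T analysis pipeline.

```python
# Toy comparison of a monotonic response model vs. a timing-tuned (Gaussian) model
# for a single synthetic recording site, scored by variance explained.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
duration = np.tile(np.linspace(0.05, 1.0, 10), 10)       # s, event duration per condition
frequency = np.repeat(np.linspace(1, 10, 10), 10)        # Hz, event frequency per condition

def monotonic(X, a, b, c):
    d, f = X
    return a * d + b * f + c

def tuned(X, amp, d0, f0, sd, sf, base):
    d, f = X
    return amp * np.exp(-((d - d0) ** 2) / (2 * sd ** 2)
                        - ((f - f0) ** 2) / (2 * sf ** 2)) + base

# Synthetic "voxel" tuned to 0.4 s / 4 Hz events, plus measurement noise.
response = tuned((duration, frequency), 1.0, 0.4, 4.0, 0.15, 1.5, 0.1)
response += 0.1 * rng.standard_normal(response.size)

for name, model, p0 in [("monotonic", monotonic, (1, 1, 0)),
                        ("tuned", tuned, (1, 0.5, 5, 0.2, 2, 0))]:
    popt, _ = curve_fit(model, (duration, frequency), response, p0=p0, maxfev=10000)
    resid = response - model((duration, frequency), *popt)
    r2 = 1 - resid.var() / response.var()
    print(f"{name:9s} model R^2 = {r2:.2f}")
```

For a tuned site the Gaussian model explains far more variance; for an early visual site with monotonically increasing responses the ordering would reverse, which is the kind of dissociation the abstract describes across the cortical hierarchy.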
A Framework for a Conscious AI: Viewing Consciousness through a Theoretical Computer Science Lens
We examine consciousness from the perspective of theoretical computer science (TCS), a branch of mathematics concerned with understanding the underlying principles of computation and complexity, including the implications and surprising consequences of resource limitations. We propose a formal TCS model, the Conscious Turing Machine (CTM). The CTM is influenced by Alan Turing's simple yet powerful model of computation, the Turing machine (TM), and by the global workspace theory (GWT) of consciousness originated by cognitive neuroscientist Bernard Baars and further developed by him, Stanislas Dehaene, Jean-Pierre Changeux, George Mashour, and others. However, the CTM is not a standard Turing Machine. It’s not the input-output map that gives the CTM its feeling of consciousness, but what’s under the hood. Nor is the CTM a standard GW model. In addition to its architecture, what gives the CTM its feeling of consciousness is its predictive dynamics (cycles of prediction, feedback and learning), its internal multi-modal language Brainish, and certain special Long Term Memory (LTM) processors, including its Inner Speech and Model of the World processors. Phenomena generally associated with consciousness, such as blindsight, inattentional blindness, change blindness, dream creation, and free will, are considered. Explanations derived from the model draw confirmation from consistencies at a high level, well above the level of neurons, with the cognitive neuroscience literature. Reference. L. Blum and M. Blum, "A theory of consciousness from a theoretical computer science perspective: Insights from the Conscious Turing Machine," PNAS, vol. 119, no. 21, 24 May 2022. https://www.pnas.org/doi/epdf/10.1073/pnas.2115934119
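For readers who think in code, here is a deliberately crude toy loop loosely inspired by the competition-and-broadcast cycle described above; it is not the Blums' formal CTM definition. The Inner Speech and Model of the World processor names come from the abstract, while "Vision", "Planning", the random weights, and the propose/receive methods are placeholders.

```python
# Toy global-workspace-style loop: processors submit weighted "chunks", one winner
# is selected, and the winner is broadcast back to all processors.
import random

random.seed(0)

class Processor:
    def __init__(self, name):
        self.name = name
        self.received = []                      # broadcasts seen so far

    def propose(self, t):
        # Each processor proposes a chunk with an importance weight (random here).
        return {"source": self.name, "info": f"{self.name}@t{t}", "weight": random.random()}

    def receive(self, chunk):
        self.received.append(chunk)

processors = [Processor(n) for n in ("InnerSpeech", "ModelOfTheWorld", "Vision", "Planning")]

for t in range(5):
    chunks = [p.propose(t) for p in processors]
    winner = max(chunks, key=lambda c: c["weight"])    # stand-in for the CTM's competition for the stage
    for p in processors:
        p.receive(winner)                              # broadcast of the winning chunk to all processors
    print(f"t={t}: broadcast from {winner['source']} (weight {winner['weight']:.2f})")
```

A faithful implementation would also include the multi-modal Brainish chunk language and the cycles of prediction, feedback and learning that the abstract emphasizes as the source of the CTM's "feeling of consciousness".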
Language Representations in the Human Brain: A naturalistic approach
Natural language is strongly context-dependent and can be perceived through different sensory modalities. For example, humans can easily comprehend the meaning of complex narratives presented through auditory speech, written text, or visual images. To understand how complex language-related information is represented in the human brain, it is necessary to map the different linguistic and non-linguistic information perceived under different modalities across the cerebral cortex. To map this information to the brain, I suggest following a naturalistic approach: observing the human brain performing tasks in a naturalistic setting, designing quantitative models that transform real-world stimuli into specific hypothesis-related features, and building predictive models that can relate these features to brain responses. In my talk, I will present models of brain responses collected using functional magnetic resonance imaging while human participants listened to or read natural narrative stories. Using natural text and vector representations derived from natural language processing tools, I will show how we can study language processing in the human brain across modalities, at different levels of temporal granularity, and across different languages.
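One concrete step in such a naturalistic pipeline is aligning word-level stimulus features to the fMRI sampling grid; the sketch below assumes placeholder word times and embeddings (word_times, word_embeddings are not real annotations) and simply averages word vectors within each TR.

```python
# Align word-level feature vectors from a narrative to fMRI volumes by averaging
# the embeddings of the words whose onsets fall within each TR window.
import numpy as np

rng = np.random.default_rng(0)
tr = 2.0                                          # s, repetition time
n_trs = 300
n_words, emb_dim = 1200, 300
word_times = np.sort(rng.uniform(0, n_trs * tr, n_words))    # word onset times (s)
word_embeddings = rng.standard_normal((n_words, emb_dim))    # stand-in word vectors

def words_to_trs(times, embeddings, tr, n_trs):
    """Average word embeddings within each TR window; empty TRs get zeros."""
    design = np.zeros((n_trs, embeddings.shape[1]))
    idx = np.clip((times // tr).astype(int), 0, n_trs - 1)
    for t in range(n_trs):
        mask = idx == t
        if mask.any():
            design[t] = embeddings[mask].mean(axis=0)
    return design

X = words_to_trs(word_times, word_embeddings, tr, n_trs)
print(X.shape)   # (n_trs, emb_dim): one feature row per fMRI volume
# A regularized regression (as in the encoding-model sketch above), typically with
# delayed copies of X to absorb the hemodynamic lag, would then map these features
# to voxel responses.
```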
Artificial Intelligence and Racism – What are the implications for scientific research?
As questions of race and justice have risen to the fore across the sciences, the ALBA Network has invited Dr Shakir Mohamed (Senior Research Scientist at DeepMind, UK) to provide a keynote speech on Artificial Intelligence and racism, and the implications for scientific research, followed by a discussion chaired by Dr Konrad Kording (Department of Neuroscience, University of Pennsylvania, US; neuromatch co-founder).
Electrophysiological investigations of natural speech and language processing
Representation of speech temporal structure in human cortex
Towards an inclusive neurobiology of language
Understanding how our brains process language is one of the fundamental issues in cognitive science. In order to reach such understanding, it is critical to cover the full spectrum of manners in which humans acquire and experience language. However, due to a myriad of socioeconomic factors, research has disproportionately focused on monolingual English speakers. In this talk, I present a series of studies that systematically target fundamental questions about bilingual language use across a range of conversational contexts, both in production and comprehension. The results lay the groundwork to propose a more inclusive theory of the neurobiology of language, with an architecture that assumes a common selection principle at each linguistic level and can account for attested features of both bilingual and monolingual speech in, but crucially also out of, experimental settings.
Hearing in an acoustically varied world
In order for animals to thrive in their complex environments, their sensory systems must form representations of objects that are invariant to changes in some dimensions of their physical cues. For example, we can recognize a friend’s speech in a forest, a small office, and a cathedral, even though the sound reaching our ears will be very different in these three environments. I will discuss our recent experiments into how neurons in auditory cortex can form stable representations of sounds in this acoustically varied world. We began by using a normative computational model of hearing to examine how the brain may recognize a sound source across rooms with different levels of reverberation. The model predicted that reverberations can be removed from the original sound by delaying the inhibitory component of spectrotemporal receptive fields in the presence of stronger reverberation. Our electrophysiological recordings then confirmed that neurons in ferret auditory cortex apply this algorithm to adapt to different room sizes. Our results demonstrate that this neural process is dynamic and adaptive. These studies provide new insights into how we can recognize auditory objects even in highly reverberant environments, and direct further research questions about how reverb adaptation is implemented in the cortical circuit.
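A toy calculation can make the delayed-inhibition prediction intuitive: the linear filter that undoes a simple exponential reverberation of a sound envelope has an excitatory lobe followed by a lagged inhibitory one. This is my own simplification, not the normative model from the talk, and it does not capture how the inhibitory delay scales with reverberation strength.

```python
# Invert a simple exponential reverberation filter and show that the resulting
# dereverberating filter pairs an excitatory peak with a delayed inhibitory trough.
import numpy as np

fs = 200                                          # Hz, envelope sampling rate
t = np.arange(0, 0.4, 1 / fs)
reverb = 0.9 ** (t * fs)                          # decaying reverberant tail
reverb /= reverb.sum()

# Regularized frequency-domain inverse of the reverberation filter.
n_fft = 1024
H = np.fft.rfft(reverb, n_fft)
inverse = np.fft.irfft(np.conj(H) / (np.abs(H) ** 2 + 1e-4), n_fft)[: t.size]

peak_exc = np.argmax(inverse) / fs * 1000
peak_inh = np.argmin(inverse) / fs * 1000
print(f"excitatory peak at {peak_exc:.0f} ms, inhibitory trough at {peak_inh:.0f} ms")
```

The inverse filter subtracts a delayed copy of its input, which is the intuition behind implementing dereverberation with inhibition that lags excitation.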
Conflict in Multisensory Perception
Multisensory perception is often studied through the effects of inter-sensory conflict, such as in the McGurk effect, the Ventriloquist illusion, and the Rubber Hand Illusion. Moreover, Bayesian approaches to cue fusion and causal inference overwhelmingly draw on cross-modal conflict to measure and to model multisensory perception. Given the prevalence of conflict, it is remarkable that accounts of multisensory perception have so far neglected the theory of conflict monitoring and cognitive control, established about twenty years ago. I hope to make a case for the role of conflict monitoring and resolution during multisensory perception. To this end, I will present EEG and fMRI data showing that cross-modal conflict in speech, resulting in either integration or segregation, triggers neural mechanisms of conflict detection and resolution. I will also present data supporting a role of these mechanisms during perceptual conflict in general, using Binocular Rivalry, surrealistic imagery, and cinema. Based on this preliminary evidence, I will argue that it is worth considering the potential role of conflict in multisensory perception and its incorporation in a causal inference framework. Finally, I will raise some potential problems associated with this proposal.
Development of multisensory perception and attention and their role in audiovisual speech processing
Speak your mind: cortical predictions of speech sensory feedback
Encoding and perceiving the texture of sounds: auditory midbrain codes for recognizing and categorizing auditory texture and for listening in noise
Natural soundscapes such as those of a forest, a busy restaurant, or a busy intersection are generally composed of a cacophony of sounds that the brain needs to interpret either independently or collectively. In certain instances, sounds - such as those from moving cars, sirens, and people talking - are perceived in unison and are recognized collectively as a single sound (e.g., city noise). In other instances, such as the cocktail party problem, multiple sounds compete for attention so that the surrounding background noise (e.g., speech babble) interferes with the perception of a single sound source (e.g., a single talker). I will describe results from my lab on the perception and neural representation of auditory textures. Textures, such as a babbling brook, restaurant noise, or speech babble, are stationary sounds consisting of multiple independent sound sources that can be quantitatively defined by summary statistics of an auditory model (McDermott & Simoncelli 2011). How and where in the auditory system summary statistics are represented, and which neural codes potentially contribute to their perception, however, are largely unknown. Using high-density multi-channel recordings from the auditory midbrain of unanesthetized rabbits and complementary perceptual studies in human listeners, I will first describe neural and perceptual strategies for encoding and perceiving auditory textures. I will demonstrate how distinct statistics of sounds, including the sound spectrum and high-order statistics related to the temporal and spectral correlation structure of sounds, contribute to texture perception and are reflected in neural activity. Using decoding methods, I will then demonstrate how various low- and high-order neural response statistics can differentially contribute to a variety of auditory tasks, including texture recognition, discrimination, and categorization. Finally, I will show examples from our recent studies of how high-order sound statistics and the accompanying neural activity underlie difficulties in recognizing speech in background noise.
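The summary-statistic idea can be sketched in a few lines: band-pass filter a sound, extract band envelopes, and compute envelope moments and cross-band correlations. The filter bank and statistics below are deliberately minimal stand-ins for the full McDermott & Simoncelli texture model, and the input is noise rather than a recorded texture.

```python
# Simplified texture summary statistics: band envelopes, their moments, and
# cross-band envelope correlations.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
rng = np.random.default_rng(0)
sound = rng.standard_normal(fs * 2)                   # stand-in for a texture recording

edges = np.geomspace(100, 6000, 9)                    # 8 log-spaced frequency bands
envelopes = []
for lo, hi in zip(edges[:-1], edges[1:]):
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, sound)
    envelopes.append(np.abs(hilbert(band)))           # band envelope via Hilbert transform
E = np.array(envelopes)                               # (n_bands, n_samples)

stats = {
    "envelope means": E.mean(axis=1),
    "envelope coeff. of variation": E.std(axis=1) / E.mean(axis=1),
    "cross-band correlations": np.corrcoef(E),
}
for name, value in stats.items():
    print(name, np.round(np.atleast_1d(value).ravel()[:4], 3))
```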
Multisensory speech perception
Exploring the neurogenetic basis of speech, language, and vocal communication
Speech as a biomarker in ataxia: What can it tell us and how should we use it?
Direction selectivity in hearing: monaural phase sensitivity in octopus neurons
The processing of temporal sound features is fundamental to hearing, and the auditory system displays a plethora of specializations, at many levels, to enable such processing. Octopus neurons are the most extreme temporally specialized cells in the auditory (and perhaps the entire) brain, which makes them intriguing but also difficult to study. Notwithstanding the scant physiological data, these neurons have been a favorite cell type of modeling studies, which have proposed that octopus cells have critical roles in pitch and speech perception. We used a range of in vivo recording and labeling methods to examine the hypothesis that tonotopic ordering of cochlear afferents combines with dendritic delays to compensate for cochlear delay - which would explain the highly entrained responses of octopus cells to sound transients. Unexpectedly, the experiments revealed that these neurons have marked selectivity to the direction of fast frequency glides, which is tied in a surprising way to intrinsic membrane properties and subthreshold events. The data suggest that octopus cells have a role in temporal comparisons across frequency and may play a role in auditory scene analysis.
Learning Speech Perception and Action through Sensorimotor Interactions
Decoding the neural processing of speech
Understanding speech in noisy backgrounds requires selective attention to a particular speaker. Humans excel at this challenging task, while current speech recognition technology still struggles when background noise is loud. The neural mechanisms by which we process speech remain, however, poorly understood, not least due to the complexity of natural speech. Here we describe recent progress obtained through applying machine learning to neuroimaging data from humans listening to speech in different types of background noise. In particular, we develop statistical models to relate characteristic features of speech, such as pitch, amplitude fluctuations and linguistic surprisal, to neural measurements. We find neural correlates of speech processing both at the subcortical level, related to the pitch, and at the cortical level, related to amplitude fluctuations and linguistic structures. We also show that some of these measures can be used to diagnose disorders of consciousness. Our findings may be applied in smart hearing aids that automatically adjust speech processing to assist a user, as well as in the diagnosis of brain disorders.
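A typical instance of such a statistical model is a temporal response function (TRF) estimated by ridge regression from lagged speech features to a neural channel; the sketch below uses synthetic data and a made-up response kernel purely for illustration, not the talk's datasets.

```python
# Estimate a temporal response function relating a lagged speech envelope to a
# (synthetic) EEG channel, and score it by held-out prediction correlation.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
fs, n = 100, 6000                                     # Hz, samples (60 s)
envelope = np.abs(rng.standard_normal(n))             # stand-in speech-envelope feature

lags = np.arange(0, 40)                               # 0-390 ms of lags at 100 Hz
X = np.column_stack([np.roll(envelope, k) for k in lags])
X[:lags.size] = 0.0                                   # discard wrapped samples

true_trf = np.exp(-lags / 10.0) * np.sin(lags / 4.0)  # synthetic neural response kernel
eeg = X @ true_trf + rng.standard_normal(n)           # synthetic EEG channel

model = Ridge(alpha=10.0).fit(X[: n // 2], eeg[: n // 2])
pred = model.predict(X[n // 2:])
r = np.corrcoef(pred, eeg[n // 2:])[0, 1]
print("held-out prediction correlation: %.2f" % r)    # model.coef_ approximates the TRF
```

Extending the feature matrix with pitch or word-level surprisal columns turns the same machinery into the multi-feature models described in the abstract.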
Do deep learning latent spaces resemble human brain representations?
In recent years, artificial neural networks have demonstrated human-like or super-human performance in many tasks including image or speech recognition, natural language processing (NLP), playing Go, chess, poker and video-games. One remarkable feature of the resulting models is that they can develop very intuitive latent representations of their inputs. In these latent spaces, simple linear operations tend to give meaningful results, as in the well-known analogy QUEEN-WOMAN+MAN=KING. We postulate that human brain representations share essential properties with these deep learning latent spaces. To verify this, we test whether artificial latent spaces can serve as a good model for decoding brain activity. We report improvements over state-of-the-art performance for reconstructing seen and imagined face images from fMRI brain activation patterns, using the latent space of a GAN (Generative Adversarial Network) model coupled with a Variational AutoEncoder (VAE). With another GAN model (BigBiGAN), we can decode and reconstruct natural scenes of any category from the corresponding brain activity. Our results suggest that deep learning can produce high-level representations approaching those found in the human brain. Finally, I will discuss whether these deep learning latent spaces could be relevant to the study of consciousness.
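The decoding step can be sketched as a regularized linear mapping from voxel patterns to latent vectors, evaluated by identification accuracy; the generative model itself is omitted here and all data are synthetic placeholders, not the GAN/VAE latents or fMRI patterns from the talk.

```python
# Linear (ridge) decoding from fMRI-like patterns to generative-model latents,
# scored by how often each decoded latent is closest to its own true latent.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n_stim, n_voxels, latent_dim = 400, 3000, 120
latents = rng.standard_normal((n_stim, latent_dim))           # stand-in GAN/VAE latents
brain = latents @ rng.standard_normal((latent_dim, n_voxels))
brain += 3.0 * rng.standard_normal(brain.shape)               # noisy "fMRI" patterns

train, test = slice(0, 350), slice(350, None)
decoder = RidgeCV(alphas=np.logspace(1, 5, 9)).fit(brain[train], latents[train])
decoded = decoder.predict(brain[test])

# Identification accuracy via cosine similarity between decoded and true latents.
sims = decoded @ latents[test].T
sims /= np.linalg.norm(decoded, axis=1, keepdims=True) * np.linalg.norm(latents[test], axis=1)
acc = np.mean(np.argmax(sims, axis=1) == np.arange(decoded.shape[0]))
print("identification accuracy: %.2f" % acc)
# In the full pipeline, decoded latents would be passed through the pretrained
# generator to reconstruct the corresponding face or natural-scene images.
```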
Kamala Harris and the Construction of Complex Ethnolinguistic Political Identity
Over the past 50 years, sociolinguistic studies on black Americans have expanded in both theoretical and technical scope, and newer research has moved beyond seeing speakers, especially black speakers, as a monolithic sociolinguistic community (Wolfram 2007, Blake 2014). Yet there remains a dearth of critical work on complex identities existing within black American communities as well as how these identities are reflected and perceived in linguistic practice. At the same time, linguists have begun to take greater interest in the ways in which public figures, such as politicians, may illuminate the wider social meaning of specific linguistic variables. In this talk, I will present results from analyses of multiple aspects of ethnolinguistic variation in the speech of Vice President Kamala Harris during the 2019-2020 Democratic Party Primary debates. Together, these results show how VP Harris expertly employs both enregistered and subtle linguistic variables, including aspects of African American Language morphosyntax, vowels, and intonational phonology in the construction and performance of a highly specific sociolinguistic identity that reflects her unique positions politically, socially, and racially. The results of this study expand our knowledge about how the complexities of speaker identity are reflected in sociolinguistic variation, as well as press on the boundaries of what we know about how speakers in the public sphere use variation to reflect both who they are and who we want them to be.
Space for Thinking - Spatial Reference Frames and Abstract Concepts
People from cultures around the world tend to borrow from the domain of space to represent abstract concepts. For example, in the domain of time, we use spatial metaphors (e.g., describing the future as being in front and the past behind), accompany our speech with spatial gestures (e.g., gesturing to the left to refer to a past event), and use external tools that project time onto a spatial reference frame (e.g., calendars). Importantly, these associations are also present in the way we think and reason about time, suggesting that space and time are also linked in the mind. In this talk, I will explore the developmental origins and functional implications of these types of cross-dimensional associations. To start, I will discuss the roles that language and culture play in shaping how children in the US and India represent time. Next, I will use word learning and memory as test cases for exploring why cross-dimensional associations may be cognitively advantageous. Finally, I will talk about future directions and the practical implications of this line of work, with a focus on how encouraging spatial representations of abstract concepts could improve learning outcomes.
Low dimensional models and electrophysiological experiments to study neural dynamics in songbirds
Birdsong emerges when a set of highly interconnected brain areas manages to generate a complex output. The similarities between birdsong production and human speech have positioned songbirds as unique animal models for studying learning and production of this complex motor skill. In this work, we developed a low dimensional model for a neural network in which the variables were the average activities of different neural populations within the nuclei of the song system. This neural network is active during production, perception and learning of birdsong. We performed electrophysiological experiments to record neural activity from one of these nuclei and found that the low dimensional model could reproduce the neural dynamics observed during the experiments. Also, this model could reproduce the respiratory motor patterns used to generate song. We showed that sparse activity in one of the neural nuclei could drive a more complex activity downstream in the neural network. This interdisciplinary work shows how low dimensional neural models can be a valuable tool for studying the emergence of complex motor tasks.
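As a flavor of what a low-dimensional rate model looks like, the sketch below integrates a toy network of three average-activity populations driven by sparse pulses in an upstream "nucleus"; the equations, weights and time constants are illustrative assumptions, not the model from the talk.

```python
# Toy low-dimensional rate model: three interconnected populations, forward-Euler
# integration, with sparse pulses driving the upstream population.
import numpy as np

def simulate(T=2.0, dt=1e-3):
    t = np.arange(0.0, T, dt)
    x = np.zeros((t.size, 3))                        # average population activities
    tau = np.array([0.02, 0.05, 0.1])                # s, population time constants
    W = np.array([[0.0,  0.0,  0.0],                 # population 0: driven only by external pulses
                  [2.5,  0.0, -1.0],                 # population 1: excited by 0, inhibited by 2
                  [1.5, -2.0,  0.0]])                # population 2: excited by 0, inhibited by 1
    drive = np.zeros_like(t)
    drive[(t % 0.5) < 0.02] = 1.0                    # sparse, brief pulses to population 0

    for i in range(1, t.size):
        inp = W @ x[i - 1] + np.array([drive[i - 1], 0.0, 0.0])
        # clip before the saturating nonlinearity to keep target rates non-negative
        x[i] = x[i - 1] + dt * (-x[i - 1] + np.tanh(np.clip(inp, 0, None))) / tau
    return t, x

t, x = simulate()
print("peak activity per population:", np.round(x.max(axis=0), 2))
```

In this toy, the brief upstream pulses produce longer-lasting and differentiated responses in the two downstream populations, which is the qualitative point of describing the song system with a handful of population-level variables.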
Monkey Talk – what studies about nonhuman primate vocal communication reveal about the evolution of speech
The evolution of speech is considered to be one of the hardest problems in science. Studies of the communicative abilities of our closest living relatives, the nonhuman primates, aim to contribute to a better understanding of the emergence of this uniquely human capability. Following a brief introduction to the key building blocks that make up the human speech faculty, I will focus on the question of meaning in nonhuman primate vocalizations. While nonhuman primate calls may be highly context specific, thus giving rise to the notion of ‘referentiality’, comparisons across closely related species suggest that this specificity is evolved rather than learned. Yet, as in humans, the structure of calls varies with arousal and affective state, and there is some evidence for effects of sensory-motor integration in vocal production. Thus, the vocal production of nonhuman primates bears little resemblance to the symbolic and combinatorial features of human speech, while basic production mechanisms are shared. Listeners, in contrast, are able to learn the meaning of new sounds. A recent study using an artificial predator shows that this learning may be extremely rapid. Furthermore, listeners are able to integrate information from multiple sources to make adaptive decisions, which renders the vocal communication system as a whole relatively flexible and powerful. In conclusion, constraints on the side of vocal production, including limits in social cognition and motivation to share experiences, rather than constraints on the side of the recipient, explain the differences in communicative abilities between humans and other animals.
Towards a speech neuroprosthesis
I will review advances in understanding the cortical encoding of speech-related oral movements. These discoveries are being translated to develop algorithms to decode speech from population neural activity.
Unsupervised deep learning identifies semantic disentanglement in single inferotemporal neurons
Irina is a research scientist at DeepMind, where she works in the Frontiers team. Her work aims to bring together insights from the fields of neuroscience and physics to advance general artificial intelligence through improved representation learning. Before joining DeepMind, Irina was a British Psychological Society Undergraduate Award winner for her achievements as an undergraduate student in Experimental Psychology at Westminster University, followed by a DPhil at the Oxford Centre for Computational Neuroscience and Artificial Intelligence, where she focused on understanding the computational principles underlying speech processing in the auditory brain. During her DPhil, Irina also worked on developing poker AI, applying machine learning in the finance sector, and on speech recognition at Google Research. https://arxiv.org/pdf/2006.14304.pdf
Neural control of vocal interactions in songbirds
During conversations we rapidly switch between listening and speaking, which often requires withholding or delaying our speech in order to hear others and avoid overlapping. This capacity for vocal turn-taking is exhibited by non-linguistic species as well; however, the neural circuit mechanisms that enable us to regulate the precise timing of our vocalizations during interactions are unknown. We aim to identify the neural mechanisms underlying the coordination of vocal interactions. To this end, we paired zebra finches with a vocal robot (1 Hz call playback) and measured the birds' call response times. We found that individual birds called with a stereotyped delay with respect to the robot call. Pharmacological inactivation of the premotor nucleus HVC revealed its necessity for the temporal coordination of calls. We further investigated the contributing neural activity within HVC by performing intracellular recordings from premotor neurons and inhibitory interneurons in calling zebra finches. We found that inhibition precedes excitation before and during call onset. To test whether inhibition guides call timing, we pharmacologically limited the impact of inhibition on premotor neurons. As a result, zebra finches converged on a similar delay time, i.e., birds called more rapidly after the vocal robot call, suggesting that HVC inhibitory interneurons regulate the coordination of social contact calls. In addition, we aim to investigate the vocal turn-taking capabilities of the common nightingale. Male nightingales learn over 100 different song motifs, which are used to attract mates or defend territories. Previously, it has been shown that nightingales counter-sing with each other following a temporal structure similar to human vocal turn-taking. These animals are also able to spontaneously imitate a motif of another nightingale. The neural mechanisms underlying this behaviour are not yet understood. In my lab, we further probe the capabilities of these animals in order to access the dynamic range of their vocal turn-taking flexibility.
Rhythm-structured predictive coding for contextualized speech processing
Bernstein Conference 2024
Brain-Rhythm-based Inference (BRyBI) for time-scale invariant speech processing
COSYNE 2023
Cross-trial alignment reveals a low-dimensional cortical manifold of naturalistic speech production
COSYNE 2023
Altered sensory prediction error signaling and dopamine function drive speech hallucinations in schizophrenia
COSYNE 2025
Bayesian integration of audiovisual speech by DNN models is similar to human observers
COSYNE 2025
Geometric Signatures of Speech Recognition: Insights from Deep Neural Networks to the Brain
COSYNE 2025
Human precentral gyrus neurons link speech sequences from listening to speaking
COSYNE 2025
Attentional modulation of the cortical contribution to the frequency-following response evoked by continuous speech
FENS Forum 2024
EEG beta de-synchronization signs the efficacy of a rehabilitation treatment for speech impairment in Parkinson’s disease population
FENS Forum 2024
Brain-rhythm-based inference (BRyBI) for time-scale invariant speech processing
FENS Forum 2024
The cortical frequency-following response to continuous speech in musicians and non-musicians
FENS Forum 2024
Decoding envelope and frequency-following responses to speech using deep neural networks
FENS Forum 2024
Decoding of selective attention to speech in CI patients using linear and non-linear methods
FENS Forum 2024
Decoding spatiotemporal processing of speech and melody in the brain
FENS Forum 2024
EEG-based source analysis of the neural response at the fundamental frequency of speech
FENS Forum 2024
The effects and interactions of top-down influences on speech perception
FENS Forum 2024
Examining speech disfluency through the analysis of grey matter densities in 5-year-olds using voxel-based morphometry
FENS Forum 2024
The neural processing of natural audiovisual speech in noise in autism: A TRF approach
FENS Forum 2024
Web-based speech transcription tool for efficient quantification of memory performance
FENS Forum 2024