Speech Perception
speech perception
Prosody in the voice, face, and hands changes which words you hear
Speech may be characterized as conveying both segmental information (i.e., about vowels and consonants) as well as suprasegmental information - cued through pitch, intensity, and duration - also known as the prosody of speech. In this contribution, I will argue that prosody shapes low-level speech perception, changing which speech sounds we hear. Perhaps the most notable example of how prosody guides word recognition is the phenomenon of lexical stress, whereby suprasegmental F0, intensity, and duration cues can distinguish otherwise segmentally identical words, such as "PLAto" vs. "plaTEAU" in Dutch. Work from our group showcases the vast variability in how different talkers produce stressed vs. unstressed syllables, while also unveiling the remarkable flexibility with which listeners can learn to handle this between-talker variability. It also emphasizes that lexical stress is a multimodal linguistic phenomenon, with the voice, lips, and even hands conveying stress in concert. In turn, human listeners actively weigh these multisensory cues to stress depending on the listening conditions at hand. Finally, lexical stress is presented as having a robust and lasting impact on low-level speech perception, even down to changing vowel perception. Thus, prosody - in all its multisensory forms - is a potent factor in speech perception, determining what speech sounds we hear.
Motor contribution to auditory temporal predictions
Temporal predictions are fundamental instruments for facilitating sensory selection, allowing humans to exploit regularities in the world. Recent evidence indicates that the motor system instantiates predictive timing mechanisms, helping to synchronize temporal fluctuations of attention with the timing of events in a task-relevant stream, thus facilitating sensory selection. Accordingly, in the auditory domain auditory-motor interactions are observed during perception of speech and music, two temporally structured sensory streams. I will present a behavioral and neurophysiological account for this theory and will detail the parameters governing the emergence of this auditory-motor coupling, through a set of behavioral and magnetoencephalography (MEG) experiments.
Pitch and Time Interact in Auditory Perception
Research into pitch perception and time perception has typically treated the two as independent processes. However, previous studies of music and speech perception have suggested that pitch and timing information may be processed in an integrated manner, such that the pitch of an auditory stimulus can influence a person’s perception, expectation, and memory of its duration and tempo. Typically, higher-pitched sounds are perceived as faster and longer in duration than lower-pitched sounds with identical timing. We conducted a series of experiments to better understand the limits of this pitch-time integrality. Across several experiments, we tested whether the higher-equals-faster illusion generalizes across the broader frequency range of human hearing by asking participants to compare the tempo of a repeating tone played in one of six octaves to a metronomic standard. When participants heard tones from all six octaves, we consistently found an inverted U-shaped effect of the tone’s pitch height, such that perceived tempo peaked between A4 (440 Hz) and A5 (880 Hz) and decreased at lower and higher octaves. However, we found that the decrease in perceived tempo at extremely high octaves could be abolished by exposing participants to high-pitched tones only, suggesting that pitch-induced timing biases are context sensitive. We additionally tested how the timing of an auditory stimulus influences the perception of its pitch, using a pitch discrimination task in which probe tones occurred early, late, or on the beat within a rhythmic context. Probe timing strongly biased participants to rate later tones as lower in pitch than earlier tones. Together, these results suggest that pitch and time exert a bidirectional influence on one another, providing evidence for integrated processing of pitch and timing information in auditory perception. Identifying the mechanisms behind this pitch-time interaction will be critical for integrating current models of pitch and tempo processing.
Multisensory speech perception
Direction selectivity in hearing: monaural phase sensitivity in octopus neurons
The processing of temporal sound features is fundamental to hearing, and the auditory system displays a plethora of specializations, at many levels, to enable such processing. Octopus neurons are the most extreme temporally-specialized cells in the auditory (and perhaps entire) brain, which make them intriguing but also difficult to study. Notwithstanding the scant physiological data, these neurons have been a favorite cell type of modeling studies which have proposed that octopus cells have critical roles in pitch and speech perception. We used a range of in vivo recording and labeling methods to examine the hypothesis that tonotopic ordering of cochlear afferents combines with dendritic delays to compensate for cochlear delay - which would explain the highly entrained responses of octopus cells to sound transients. Unexpectedly, the experiments revealed that these neurons have marked selectivity to the direction of fast frequency glides, which is tied in a surprising way to intrinsic membrane properties and subthreshold events. The data suggest that octopus cells have a role in temporal comparisons across frequency and may play a role in auditory scene analysis.
Learning Speech Perception and Action through Sensorimotor Interactions
The effects and interactions of top-down influences on speech perception
FENS Forum 2024