Children perceive speech onsets by ear and eye

Adults use vision to perceive low-fidelity speech; yet how children acquire this ability is not well understood. The literature indicates that children show reduced sensitivity to visual speech from kindergarten to adolescence. We hypothesized that this pattern reflects the effects of complex tasks and a growth period with harder-to-utilize cognitive resources, not lack of sensitivity. We investigated sensitivity to visual speech in children via the phonological priming produced by low-fidelity (non-intact onset) auditory speech presented audiovisually (see dynamic face articulate consonant/rhyme b/ag; hear non-intact onset/rhyme: –b/ag) vs. auditorily (see still face; hear exactly the same auditory input). Audiovisual speech produced greater priming from four to fourteen years, indicating that visual speech filled in the non-intact auditory onsets. The influence of visual speech depended uniquely on phonology and speechreading. Children, like adults, perceive speech onsets multimodally. Findings are critical for incorporating visual speech into developmental theories of speech perception.

Speech Perception as a Multimodal Phenomenon

Current Directions in Psychological Science ◽

10.1111/j.1467-8721.2008.00615.x ◽

2008 ◽

Vol 17 (6) ◽

pp. 405-409 ◽

Cited By ~ 79

Author(s):

Lawrence D. Rosenblum

Keyword(s):

Speech Perception ◽

Semantic Context ◽

Visual Speech ◽

Audiovisual Speech ◽

Lexical Status ◽

Auditory Speech ◽

Lip Reading ◽

Imaging Research ◽

Speech Information ◽

The Brain

Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal speech information could explain the reported automaticity, immediacy, and completeness of audiovisual speech integration. However, recent findings suggest that speech integration can be influenced by higher cognitive properties such as lexical status and semantic context. Proponents of amodal accounts will need to explain these results.

Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution

10.1101/2020.04.16.045716 ◽

2020 ◽

Author(s):

Brian A. Metzger ◽

John F. Magnotti ◽

Zhengjia Wang ◽

Elizabeth Nesbitt ◽

Patrick J. Karas ◽

...

Keyword(s):

Speech Perception ◽

Speech Processing ◽

Time Course ◽

Brain Area ◽

Superior Temporal Gyrus ◽

Visual Speech ◽

Audiovisual Speech ◽

Neural Responses ◽

Auditory Speech ◽

Human Epilepsy

Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial encephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained, positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.

Significance statement: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial encephalography (iEEG) deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
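
The idea that jittering the onset asynchrony makes the two unisensory responses separately estimable can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: it assumes a simple linear model with one finite-impulse-response kernel per modality, omits the interaction term, and uses invented kernel shapes, trial counts, and noise levels. The function names `fir_design` and `simulate_and_deconvolve` are hypothetical.

```python
import numpy as np


def fir_design(onsets, n_samples, n_lags):
    """Build an (n_samples, n_lags) matrix of lagged onset indicators."""
    X = np.zeros((n_samples, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            t = onset + lag
            if 0 <= t < n_samples:
                X[t, lag] += 1.0
    return X


def simulate_and_deconvolve(seed=0, n_trials=200, n_lags=20,
                            max_jitter=10, noise_sd=0.1):
    """Simulate jittered audiovisual trials and recover both kernels.

    All numbers here are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    lags = np.arange(n_lags)
    h_aud = np.exp(-lags / 3.0)      # assumed phasic auditory response
    h_vis = 0.3 * np.ones(n_lags)    # assumed sustained visual response

    spacing = 3 * n_lags             # trials are spaced so they never overlap
    n_samples = n_trials * spacing + 2 * n_lags
    aud_onsets = np.arange(n_trials) * spacing + n_lags
    # Visual onset leads or lags the auditory onset by a random jitter.
    jitter = rng.integers(-max_jitter, max_jitter + 1, size=n_trials)
    vis_onsets = aud_onsets + jitter

    # Every trial contains both modalities; only the asynchrony varies.
    X = np.hstack([fir_design(aud_onsets, n_samples, n_lags),
                   fir_design(vis_onsets, n_samples, n_lags)])
    y = X @ np.concatenate([h_aud, h_vis])
    y = y + noise_sd * rng.standard_normal(n_samples)

    # Ordinary least squares separates the two kernels because the jitter
    # keeps the auditory and visual regressors from being collinear.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h_aud, h_vis, beta[:n_lags], beta[n_lags:]


if __name__ == "__main__":
    h_aud, h_vis, est_aud, est_vis = simulate_and_deconvolve()
    print("max auditory kernel error:", np.abs(est_aud - h_aud).max())
    print("max visual kernel error:", np.abs(est_vis - h_vis).max())
```

With a fixed asynchrony (set `max_jitter=0`), the auditory and visual regressors would be identical columns and only the sum of the two kernels could be recovered, which is the confound the jitter is meant to remove.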

Converging Evidence from Electrocorticography and BOLD fMRI for a Sharp Functional Boundary in Superior Temporal Gyrus Related to Multisensory Speech Processing

10.1101/272823 ◽

2018 ◽

Author(s):

Muge Ozker ◽

Michael S. Beauchamp

Keyword(s):

Speech Perception ◽

Speech Processing ◽

Recognition Task ◽

Superior Temporal Gyrus ◽

Visual Speech ◽

Audiovisual Speech ◽

Auditory Modality ◽

Bold Fmri ◽

Auditory Speech ◽

Functional Boundary

Although humans can understand speech using the auditory modality alone, in noisy environments visual speech information from the talker’s mouth can rescue otherwise unintelligible auditory speech. To investigate the neural substrates of multisensory speech perception, we recorded neural activity from the human superior temporal gyrus using two very different techniques: either directly, using surface electrodes implanted in five participants with epilepsy (electrocorticography, ECOG), or indirectly, using blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) in six healthy control fMRI participants. Both ECOG and fMRI participants viewed the same clear and noisy audiovisual speech stimuli and performed the same speech recognition task. Both techniques demonstrated a sharp functional boundary in the STG, which corresponded to an anatomical boundary defined by the posterior edge of Heschl’s gyrus. On the anterior side of the boundary, cortex responded more strongly to clear audiovisual speech than to noisy audiovisual speech, suggesting that anterior STG is primarily involved in processing unisensory auditory speech. On the posterior side of the boundary, cortex preferred noisy audiovisual speech or showed no preference and showed robust responses to auditory-only and visual-only speech, suggesting that posterior STG is specialized for processing multisensory audiovisual speech. For both ECOG and fMRI, the transition between the functionally distinct regions happened within 10 mm of anterior-to-posterior distance along the STG. We relate this boundary to the multisensory neural code underlying speech perception and propose that it represents an important functional division within the human speech perception network.

Schizotypal traits are not related to multisensory integration or audiovisual speech perception

10.31234/osf.io/9vqtf ◽

2020 ◽

Author(s):

Anne-Marie Muller ◽

Tyler C. Dalal ◽

Ryan A Stevenson

Keyword(s):

Speech Perception ◽

Multisensory Integration ◽

Temporal Processing ◽

Visual Speech ◽

Audiovisual Speech ◽

Audiovisual Speech Perception ◽

Speech In Noise ◽

Auditory Speech ◽

Synchrony Judgment ◽

Schizophrenia Spectrum

Multisensory integration, the process by which sensory information from different sensory modalities is bound together, is hypothesized to contribute to perceptual symptomatology in schizophrenia, including hallucinations and aberrant speech perception. Differences in multisensory integration and temporal processing, an important component of multisensory integration, have been consistently found among individuals with schizophrenia. Evidence is emerging that these differences extend across the schizophrenia spectrum, including individuals in the general population with higher levels of schizotypal traits. In the current study, we measured (1) multisensory integration using an audiovisual speech-in-noise task and the McGurk task. Using the speech-in-noise task, we assessed (2) susceptibility to distracting auditory speech to test the hypothesis that increased perception of distracting speech that is subsequently bound with mismatching visual speech contributes to hallucination-like experiences. As a measure of (3) temporal processing, we used the ternary synchrony judgment task. We measured schizotypal traits using the Schizotypal Personality Questionnaire (SPQ), hypothesizing that higher levels of schizotypal traits, specifically the Unusual Perceptual Experiences and Odd Speech subscales, would be associated with (1) decreased multisensory integration, (2) increased susceptibility to distracting auditory speech, and (3) less precise temporal processing. Surprisingly, neither subscale was associated with any of the measures. These results suggest that these perceptual differences may not be present across the schizophrenia spectrum.

Increased connectivity among sensory and motor regions during visual and audiovisual speech perception

10.1101/2020.12.15.422726 ◽

2020 ◽

Author(s):

Jonathan E Peelle ◽

Brent Spehar ◽

Michael S Jones ◽

Sarah McConkey ◽

Joel Myerson ◽

...

Keyword(s):

Visual Cortex ◽

Speech Perception ◽

Brain Activity ◽

Premotor Cortex ◽

Temporal Cortex ◽

Primary Auditory Cortex ◽

Visual Speech ◽

Auditory Signal ◽

Audiovisual Speech ◽

Audiovisual Speech Perception

In everyday conversation, we usually process the talker's face as well as the sound of their voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here we used fMRI to monitor brain activity while adults (n = 60) were presented with visual-only, auditory-only, and audiovisual words. As expected, audiovisual speech perception recruited both auditory and visual cortex, with a trend towards increased recruitment of premotor cortex in more difficult conditions (for example, in substantial background noise). We then investigated neural connectivity using psychophysiological interaction (PPI) analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. Taken together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech.

Audiovisual speech perception in infancy: The influence of vowel identity and infants’ productive abilities on sensitivity to (mis)matches between auditory and visual speech cues.

Developmental Psychology ◽

10.1037/a0039964 ◽

2016 ◽

Vol 52 (2) ◽

pp. 191-204 ◽

Cited By ~ 10

Author(s):

Nicole Altvater-Mackensen ◽

Nivedita Mani ◽

Tobias Grossmann

Keyword(s):

Speech Perception ◽

Visual Speech ◽

Audiovisual Speech ◽

Audiovisual Speech Perception ◽

Speech Cues

Sound Location Can Influence Audiovisual Speech Perception When Spatial Attention Is Manipulated

Seeing and Perceiving ◽

10.1163/187847511x557308 ◽

2011 ◽

Vol 24 (1) ◽

pp. 67-90 ◽

Cited By ~ 11

Author(s):

Riikka Möttönen ◽

Kaisa Tiippana ◽

Mikko Sams ◽

Hanna Puharinen

Keyword(s):

Speech Perception ◽

Spatial Attention ◽

Reaction Times ◽

Mcgurk Effect ◽

Visual Speech ◽

Audiovisual Speech ◽

Sound Location ◽

Audiovisual Speech Perception ◽

The Right ◽

Talking Face

Audiovisual speech perception has been considered to operate independently of sound location, since the McGurk effect (altered auditory speech perception caused by conflicting visual speech) has been shown to be unaffected by whether speech sounds are presented in the same location as a talking face or in a different one. Here we show that sound location effects arise with manipulation of spatial attention. Sounds were presented from loudspeakers in five locations: the centre (location of the talking face) and 45°/90° to the left/right. Auditory spatial attention was focused on a location by presenting the majority (90%) of sounds from this location. In Experiment 1, the majority of sounds emanated from the centre, and the McGurk effect was enhanced there. In Experiment 2, the major location was 90° to the left, causing the McGurk effect to be stronger on the left and centre than on the right. Under control conditions, when sounds were presented with equal probability from all locations, the McGurk effect tended to be stronger for sounds emanating from the centre, but this tendency was not reliable. Additionally, reaction times were the shortest for a congruent audiovisual stimulus, and this was the case independent of location. Our main finding is that sound location can modulate audiovisual speech perception, and that spatial attention plays a role in this modulation.

Visual speech differentially modulates beta, theta, and high gamma bands in auditory cortex

10.1101/2020.09.07.284455 ◽

2020 ◽

Cited By ~ 1

Author(s):

Karthik Ganesan ◽

John Plass ◽

Adriene M. Beltz ◽

Zhongming Liu ◽

Marcia Grabowecky ◽

...

Keyword(s):

Speech Perception ◽

Auditory Cortex ◽

Auditory Processing ◽

Visual Information ◽

Visual Speech ◽

Visual Signals ◽

Audiovisual Speech ◽

Frequency Bands ◽

Beta Power ◽

High Gamma

Speech perception is a central component of social communication. While speech perception is primarily driven by sounds, accurate perception in everyday settings is also supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Previous research has shown that visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG), likely through feedback connections from the multisensory posterior superior temporal sulcus (pSTS). However, it is unknown whether visual modulation of auditory processing in the STG is a unitary phenomenon or, rather, consists of multiple temporally, spatially, or functionally discrete processes. To explore these questions, we examined neural responses to audiovisual speech in electrodes implanted intracranially in the temporal cortex of 21 patients undergoing clinical monitoring for epilepsy. We found that visual speech modulates auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differ across theta, beta, and high-gamma frequency bands. Before speech onset, visual information increased high-gamma power in the posterior STG and suppressed beta power in mid-STG regions, suggesting crossmodal prediction of speech signals in these areas. After sound onset, visual speech decreased theta power in the middle and posterior STG, potentially reflecting a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.

Significance Statement: Visual speech cues are often needed to disambiguate distorted speech sounds in the natural environment. However, understanding how the brain encodes and transmits visual information for usage by the auditory system remains a challenge. One persistent question is whether visual signals have a unitary effect on auditory processing or elicit multiple distinct effects throughout auditory cortex. To better understand how vision modulates speech processing, we measured neural activity produced by audiovisual speech from electrodes surgically implanted in auditory areas of 21 patients with epilepsy. Group-level statistics using linear mixed-effects models demonstrated distinct patterns of activity across different locations, timepoints, and frequency bands, suggesting the presence of multiple audiovisual mechanisms supporting speech perception processes in auditory cortex.

Audiovisual speech perception and word recognition

The Oxford Handbook of Psycholinguistics ◽

10.1093/oxfordhb/9780198568971.013.0002 ◽

2007 ◽

pp. 18-36 ◽

Cited By ~ 5

Author(s):

Dominic W. Massaro ◽

Alexandra Jesse

Keyword(s):

Speech Perception ◽

Word Recognition ◽

Language Learning ◽

Visual Speech ◽

Audiovisual Speech ◽

Audiovisual Speech Perception ◽

Main Research ◽

Face To Face ◽

Facial Information ◽

Communication Methods

This article gives an overview of the main research questions and findings unique to audiovisual speech perception research, and discusses what general questions about speech perception and cognition the research in this field can answer. The influence of a second perceptual source in audiovisual speech perception, compared to auditory speech perception, immediately raises the question of how the information from the different perceptual sources is used to reach the best overall decision. The article explores how our understanding of speech benefits from having the speaker's face present, and how this benefit makes transparent the nature of speech perception and word recognition. Modern communication methods such as Voice over Internet Protocol have found wide acceptance, but people are reluctant to forfeit face-to-face communication. The article also considers the role of visual speech as a language-learning tool in multimodal training, information and information processing in audiovisual speech perception, lexicon and word recognition, facial information for speech perception, and theories of audiovisual speech perception.

Hearing Lips and Seeing Voices: the Origins and Development of the ‘McGurk Effect’ and Reflections on Audio–Visual Speech Perception Over the Last 40 Years

Multisensory Research ◽

10.1163/22134808-00002548 ◽

2018 ◽

Vol 31 (1-2) ◽

pp. 7-18 ◽

Cited By ~ 3

Author(s):

John MacDonald

Keyword(s):

Speech Perception ◽

Visual Illusion ◽

Simultaneous Presentation ◽

Mcgurk Effect ◽

Visual Speech ◽

Audiovisual Speech ◽

Audiovisual Speech Perception ◽

Profound Impact ◽

Visual Speech Perception

In 1976 Harry McGurk and I published a paper in Nature, entitled ‘Hearing Lips and Seeing Voices’. The paper described a new audio–visual illusion we had discovered that showed the perception of auditorily presented speech could be influenced by the simultaneous presentation of incongruent visual speech. This hitherto unknown effect has since had a profound impact on audiovisual speech perception research. The phenomenon has come to be known as the ‘McGurk effect’, and the original paper has been cited in excess of 4800 times. In this paper I describe the background to the discovery of the effect, the rationale for the generation of the initial stimuli, the construction of the exemplars used and the serendipitous nature of the finding. The paper will also cover the reaction (and non-reaction) to the Nature publication and the growth of research on, and utilizing, the ‘McGurk effect’, and will end with some reflections on the significance of the finding.
