Congruent Visual Speech Enhances Cortical Entrainment to Continuous Auditory Speech in Noise-Free Conditions

Multisensory integration, the process by which sensory information from different sensory modalities are bound together, is hypothesized to contribute to perceptual symptomatology in schizophrenia, including hallucinations and aberrant speech perception. Differences in multisensory integration and temporal processing, an important component of multisensory integration, have been consistently found among individuals with schizophrenia. Evidence is emerging that these differences extend across the schizophrenia spectrum, including individuals in the general population with higher levels of schizotypal traits. In the current study, we measured (1) multisensory integration using an audiovisual speech-in-noise task, and the McGurk task. Using the speech-in-noise task, we assessed (2) susceptibility to distracting auditory speech to test the hypothesis that increased perception of distracting speech that is subsequently bound with mismatching visual speech contributes to hallucination-like experiences. As a measure of (3) temporal processing, we used the ternary synchrony judgment task. We measured schizotypal traits using the Schizotypal Personality Questionnaire (SPQ), hypothesizing that higher levels of schizotypal traits, specifically Unusual Perceptual Experiences and Odd Speech subscales, would be associated with (1) decreased multisensory integration, (2) increased susceptibility to distracting auditory speech, and (3) less precise temporal processing. Surprisingly, neither subscales were associated with any of the measures. These results suggest that these perceptual differences may not be present across the schizophrenia spectrum.

Download Full-text

Evaluating the Effort Expended to Understand Speech in Noise Using a Dual-Task Paradigm: The Effects of Providing Visual Speech Cues

Journal of Speech Language and Hearing Research ◽

10.1044/1092-4388(2009/08-0140) ◽

2010 ◽

Vol 53 (1) ◽

pp. 18-33 ◽

Cited By ~ 97

Author(s):

Sarah Fraser ◽

Jean-Pierre Gagné ◽

Majolaine Alepins ◽

Pascale Dubois

Keyword(s):

Dual Task ◽

Visual Speech ◽

Speech In Noise ◽

Speech Cues ◽

Dual Task Paradigm

Download Full-text

Audio-visual speech in noise perception in dyslexia

Developmental Science ◽

10.1111/desc.12504 ◽

2016 ◽

Vol 21 (1) ◽

pp. e12504 ◽

Cited By ~ 8

Author(s):

Thijs van Laarhoven ◽

Mirjam Keetels ◽

Lemmy Schakel ◽

Jean Vroomen

Keyword(s):

Visual Speech ◽

Speech In Noise

Download Full-text

Children perceive speech onsets by ear and eye

Journal of Child Language ◽

10.1017/s030500091500077x ◽

2016 ◽

Vol 44 (1) ◽

pp. 185-215 ◽

Cited By ~ 8

Author(s):

SUSAN JERGER ◽

MARKUS F. DAMIAN ◽

NANCY TYE-MURRAY ◽

HERVÉ ABDI

Keyword(s):

Speech Perception ◽

Growth Period ◽

Visual Speech ◽

Cognitive Resources ◽

Audiovisual Speech ◽

Auditory Input ◽

Phonological Priming ◽

Complex Tasks ◽

Auditory Speech ◽

Developmental Theories

AbstractAdults use vision to perceive low-fidelity speech; yet how children acquire this ability is not well understood. The literature indicates that children show reduced sensitivity to visual speech from kindergarten to adolescence. We hypothesized that this pattern reflects the effects of complex tasks and a growth period with harder-to-utilize cognitive resources, not lack of sensitivity. We investigated sensitivity to visual speech in children via the phonological priming produced by low-fidelity (non-intact onset) auditory speech presented audiovisually (see dynamic face articulate consonant/rhyme b/ag; hear non-intact onset/rhyme: –b/ag) vs. auditorily (see still face; hear exactly same auditory input). Audiovisual speech produced greater priming from four to fourteen years, indicating that visual speech filled in the non-intact auditory onsets. The influence of visual speech depended uniquely on phonology and speechreading. Children – like adults – perceive speech onsets multimodally. Findings are critical for incorporating visual speech into developmental theories of speech perception.

Download Full-text

Visual speech recalibrates auditory speech identification

The Journal of the Acoustical Society of America ◽

10.1121/1.4778922 ◽

2002 ◽

Vol 112 (5) ◽

pp. 2245-2245

Author(s):

Paul Bertelson ◽

Jean Vroomen ◽

Beatrice de Gelder

Keyword(s):

Visual Speech ◽

Auditory Speech ◽

Speech Identification

Download Full-text

Visual Speech Perception in Children With Language Learning Impairments

Journal of Speech Language and Hearing Research ◽

10.1044/2015_jslhr-s-14-0269 ◽

2016 ◽

Vol 59 (1) ◽

pp. 1-14 ◽

Cited By ~ 8

Author(s):

Victoria C. P. Knowland ◽

Sam Evans ◽

Caroline Snell ◽

Stuart Rosen

Keyword(s):

Speech Perception ◽

Language Learning ◽

Visual Cues ◽

Visual Speech ◽

Typically Developing ◽

Cross Sectional ◽

Learning Impairments ◽

Speech In Noise ◽

Listening In Noise ◽

Talking Face

Purpose The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with diagnosed LLI (mean age: 8 years 10 months, range: 5 years 2 months to 11 years 6 months) completed a silent speechreading task and a speech-in-noise task with and without visual support from the talking face. The speech-in-noise task involved the identification of a target word in a carrier sentence with a single competing speaker as a masker. Results Children in the LLI group showed a deficit in speechreading when compared with their typically developing peers. Beyond the single-word level, this deficit became more apparent in older children. On the speech-in-noise task, a substantial benefit of visual cues was found regardless of age or group membership, although the LLI group showed an overall developmental delay in speech perception. Conclusion Although children with LLI were less accurate than their peers on the speechreading and speech-in noise-tasks, both groups were able to make equivalent use of visual cues to boost performance accuracy when listening in noise.

Download Full-text

Linking Activity in Human Superior Temporal Cortex to Perception of Noisy Audiovisual Speech

10.1101/2020.04.02.021774 ◽

2020 ◽

Author(s):

Johannes Rennig ◽

Michael S Beauchamp

Keyword(s):

Univariate Analysis ◽

Temporal Cortex ◽

Visual Speech ◽

Button Press ◽

Auditory Information ◽

Bold Response ◽

Auditory Speech ◽

Auditory Component ◽

Visual Speech Information ◽

Speech Information

AbstractRegions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech. We hypothesized that these multisensory responses in pSTG/S underlie the observation that comprehension of noisy auditory speech is improved when it is accompanied by visual speech. To test this idea, we presented audiovisual sentences that contained either a clear auditory component or a noisy auditory component while measuring brain activity using BOLD fMRI. Participants reported the intelligibility of the speech on each trial with a button press. Perceptually, adding visual speech to noisy auditory sentences rendered them much more intelligible. Post-hoc trial sorting was used to examine brain activations during noisy sentences that were more or less intelligible, focusing on multisensory speech regions in the pSTG/S identified with an independent visual speech localizer. Univariate analysis showed that less intelligible noisy audiovisual sentences evoked a weaker BOLD response, while more intelligible sentences evoked a stronger BOLD response that was indistinguishable from clear sentences. To better understand these differences, we conducted a multivariate representational similarity analysis. The pattern of response for intelligible noisy audiovisual sentences was more similar to the pattern for clear sentences, while the response pattern for unintelligible noisy sentences was less similar. These results show that for both univariate and multivariate analyses, successful integration of visual and noisy auditory speech normalizes responses in pSTG/S, providing evidence that multisensory subregions of pSTG/S are responsible for the perceptual benefit of visual speech.Significance StatementEnabling social interactions, including the production and perception of speech, is a key function of the human brain. Speech perception is a complex computational problem that the brain solves using both visual information from the talker’s facial movements and auditory information from the talker’s voice. Visual speech information is particularly important under noisy listening conditions when auditory speech is difficult or impossible to understand alone Regions of the human cortex in posterior superior temporal lobe respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech. We show that the pattern of activity in cortex reflects the successful multisensory integration of auditory and visual speech information in the service of perception.

Download Full-text

Lip movements entrain the observers’ low-frequency brain oscillations to facilitate speech intelligibility

eLife ◽

10.7554/elife.14521 ◽

2016 ◽

Vol 5 ◽

Cited By ~ 65

Author(s):

Hyojin Park ◽

Christoph Kayser ◽

Gregor Thut ◽

Joachim Gross

Keyword(s):

Visual Cortex ◽

Speech Processing ◽

Speech Intelligibility ◽

Brain Activity ◽

Low Frequency ◽

Visual Speech ◽

Visual Signals ◽

Partial Coherence ◽

Auditory Speech ◽

Oscillatory Brain Activity

During continuous speech, lip movements provide visual temporal signals that facilitate speech processing. Here, using MEG we directly investigated how these visual signals interact with rhythmic brain activity in participants listening to and seeing the speaker. First, we investigated coherence between oscillatory brain activity and speaker’s lip movements and demonstrated significant entrainment in visual cortex. We then used partial coherence to remove contributions of the coherent auditory speech signal from the lip-brain coherence. Comparing this synchronization between different attention conditions revealed that attending visual speech enhances the coherence between activity in visual cortex and the speaker’s lips. Further, we identified a significant partial coherence between left motor cortex and lip movements and this partial coherence directly predicted comprehension accuracy. Our results emphasize the importance of visually entrained and attention-modulated rhythmic brain activity for the enhancement of audiovisual speech processing.

Download Full-text

Vision perceptually restores auditory spectral dynamics in speech

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2002887117 ◽

2020 ◽

Vol 117 (29) ◽

pp. 16920-16927 ◽

Cited By ~ 2

Author(s):

John Plass ◽

David Brang ◽

Satoru Suzuki ◽

Marcia Grabowecky

Keyword(s):

Visual Cues ◽

Perceptual System ◽

Visual Speech ◽

Auditory Information ◽

Speech Comprehension ◽

Facilitation Effect ◽

Time Frequency ◽

Auditory Speech ◽

High Level ◽

Frequency Modulations

Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time–frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.

Download Full-text

Sources of variance in the audiovisual perception of speech in noise

Seeing and Perceiving ◽

10.1163/187847612x647568 ◽

2012 ◽

Vol 25 (0) ◽

pp. 123

Author(s):

Celina C. Nahanni ◽

Justin M. Deonarine ◽

Martin Paré ◽

Kevin G. Munhall

Keyword(s):

Individual Differences ◽

Acoustic Noise ◽

Auditory Signal ◽

Large Set ◽

Speech In Noise ◽

Auditory Speech ◽

Presentation Modes ◽

Audiovisual Perception ◽

Report Data ◽

Noisy Background

The sight of a talker’s face dramatically influences the perception of auditory speech. This effect is most commonly observed when subjects are presented audiovisual (AV) stimuli in the presence of acoustic noise. However, the magnitude of the gain in perception that vision adds varies considerably in published work. Here we report data from an ongoing study of individual differences in AV speech perception when English words are presented in an acoustically noisy background. A large set of monosyllablic nouns was presented at 7 signal-to-noise ratios (pink noise) in both AV and auditory-only (AO) presentation modes. The stimuli were divided into 14 blocks of 25 words and each block was equated for spoken frequency using the SUBTLEXus database (Brysbaert and New, 2009). The presentation of the stimulus blocks was counterbalanced across subjects for noise level and presentation. In agreement with Sumby and Pollack (1954), the accuracy of both AO and AV increase monotonically with signal strength with the greatest visual gain being when the auditory signal was weakest. These average results mask considerable variability due to subject (individual differences in auditory and visual perception), stimulus (lexical type, token articulation) and presentation (signal and noise attributes) factors. We will discuss how these sources of variance impede comparisons between studies.

Download Full-text