Learning to Integrate Auditory and Visual Information in Speech Perception

2004
Author(s): Joseph D. W. Stephens, Lori L. Holt

2012 · Vol 25 (0) · pp. 148
Author(s): Marcia Grabowecky, Emmanuel Guzman-Martinez, Laura Ortega, Satoru Suzuki

Watching moving lips facilitates auditory speech perception when the mouth is attended. However, recent evidence suggests that visual attention and awareness are mediated by separate mechanisms. We investigated whether lip movements suppressed from visual awareness can facilitate speech perception. We used a word categorization task in which participants listened to spoken words and determined as quickly and accurately as possible whether or not each word named a tool. While participants listened to the words, they watched a visual display that presented a video clip of the speaker synchronously speaking the auditorily presented words or of the same speaker articulating different words. Critically, the speaker’s face was either visible (the aware trials) or suppressed from awareness using continuous flash suppression (the suppressed trials). Aware and suppressed trials were randomly intermixed. A secondary probe-detection task ensured that participants attended to the mouth region regardless of whether the face was visible or suppressed. On the aware trials, responses to the tool targets were no faster with synchronous than with asynchronous lip movements, perhaps because the visual information was inconsistent with the auditory information on 50% of the trials. However, on the suppressed trials, responses to the tool targets were significantly faster with synchronous than with asynchronous lip movements. These results demonstrate that even when a random dynamic mask renders a face invisible, lip movements are processed by the visual system with sufficiently high temporal resolution to facilitate speech perception.


2000 · Vol 23 (3) · pp. 327-328
Author(s): Lawrence Brancazio, Carol A. Fowler

The present description of the Merge model addresses only auditory, not audiovisual, speech perception. However, recent findings in the audiovisual domain are relevant to the model. We outline a test, which we are currently conducting, of the adequacy of Merge modified to accept visual information about articulation.


2012 · Vol 132 (3) · pp. 2050-2050
Author(s): Qudsia Tahmina, Moulesh Bhandary, Behnam Azimi, Yi Hu, Rene L. Utianski, ...

1990 · Vol 1 (1) · pp. 55-63
Author(s): Dominic W. Massaro, Michael M. Cohen

The research reported in this paper uses novel stimuli to study how speech perception is influenced by information presented to ear and eye. Auditory and visual sources of information (syllables) were synthesized and presented in isolation or in factorial combination. A five-step continuum between the syllables /ba/ and /da/ was synthesized along both auditory and visual dimensions by varying properties of the syllable at its onset. The onsets of the second and third formants were manipulated in the audible speech. For the visible speech, the shape of the lips and the jaw position at the onset of the syllable were manipulated. Subjects’ identification judgments of the test syllables presented on videotape were influenced by both auditory and visual information. The results were used to test between a fuzzy logical model of speech perception (FLMP) and a categorical model of perception (CMP). These tests indicate that evaluation and integration of the two sources of information make available continuous as opposed to just categorical information. In addition, the integration of the two sources appears to be nonadditive, in that the least ambiguous source has the largest impact on the judgment. The two sources of information appear to be evaluated, integrated, and identified as described by the FLMP, an optimal algorithm for combining information from multiple sources. The research provides a theoretical framework for understanding the improvement in speech perception by hearing-impaired listeners when auditory speech is supplemented with other sources of information.
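The multiplicative integration rule at the core of the FLMP can be made concrete with a worked form. For a two-alternative /ba/–/da/ judgment, let a_i denote the degree of auditory support for /da/ at auditory level i, and v_j the degree of visual support at visual level j; this is the standard two-alternative rendering of the model, and the notation is illustrative rather than taken from the paper:

    \[ P(\text{/da/} \mid a_i, v_j) \;=\; \frac{a_i\, v_j}{a_i\, v_j + (1 - a_i)\,(1 - v_j)} \]

Written this way, the reported nonadditivity is explicit: a source near 0 or 1 (unambiguous) dominates the product, whereas a source near 0.5 (ambiguous) leaves the judgment largely to the other modality.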


2020 · Vol 20 (11) · pp. 434
Author(s): Brian A. Metzger, John F. Magnotti, Elizabeth Nesbitt, Daniel Yoshor, Michael S. Beauchamp

2019 · Vol 62 (2) · pp. 307-317
Author(s): Jianghua Lei, Huina Gong, Liang Chen

Purpose: The study was designed primarily to determine whether the use of hearing aids (HAs) by individuals with hearing impairment in China would affect their speechreading performance. Method: Sixty-seven young adults with hearing impairment who used HAs and 78 young adults with hearing impairment who did not completed newly developed Chinese speechreading tests targeting 3 linguistic levels (i.e., words, phrases, and sentences). Results: Participants with HAs were more accurate at speechreading than those without HAs across the 3 linguistic levels. For both groups, speechreading accuracy was higher for phrases than for words and sentences, and speechreading speed was slower for sentences than for words and phrases. Furthermore, there was a positive correlation between years of HA use and speechreading accuracy; longer HA use was associated with more accurate speechreading. Conclusions: Young HA users in China show enhanced speechreading performance relative to their peers with hearing impairment who do not use HAs. This result argues against the perceptual dependence hypothesis, which holds that greater dependence on visual information leads to improvement in visual speech perception.


2020 · pp. 1-8
Author(s): Hyun Jin Lee, Jeon Mi Lee, Jae Young Choi, Jinsei Jung

Introduction: Patients with postlingual deafness usually depend on visual information for communication, and their lipreading ability could influence cochlear implantation (CI) outcomes. However, it is unclear whether preoperative visual dependency in postlingual deafness positively or negatively affects auditory rehabilitation after CI. Herein, we investigated the influence of preoperative audiovisual perception on CI outcomes. Method: In this retrospective case-comparison study, 118 patients with postlingual deafness who underwent unilateral CI were enrolled. Speech perception was evaluated under both audiovisual (AV) and audio-only (AO) conditions before and after CI. Before CI, the speech perception test was performed under hearing aid (HA)-assisted conditions; after CI, it was performed under the CI-only condition. Only patients with a preoperative AO speech perception score of 10% or less were included. Results: Multivariable regression analysis showed that age, gender, residual hearing, operation side, education level, and HA usage were not correlated with either postoperative AV (pAV) or AO (pAO) speech perception. However, duration of deafness showed a significant negative correlation with both pAO (p = 0.003) and pAV (p = 0.015) speech perception. Notably, the preoperative AV speech perception score was not correlated with pAO speech perception (R² = 0.00134, p = 0.693) but was positively associated with pAV speech perception (R² = 0.0731, p = 0.003). Conclusion: Preoperative dependency on audiovisual information may positively influence pAV speech perception in patients with postlingual deafness.
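To make the regression described above concrete, a multivariable model of this kind might be set up as in the following minimal sketch. The data file and column names (age, sex, residual_hearing, side, education, ha_use, deafness_years, preop_av, postop_ao) are hypothetical placeholders for illustration, not the study's actual variables.

    # Hypothetical sketch: multivariable regression of postoperative audio-only (AO)
    # speech perception on clinical predictors. All column names are illustrative.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("ci_outcomes.csv")  # assumed file: one row per patient

    model_ao = smf.ols(
        "postop_ao ~ age + sex + residual_hearing + side + education + ha_use"
        " + deafness_years + preop_av",
        data=df,
    ).fit()
    print(model_ao.summary())  # coefficients, p values, and R-squared

An analogous model with a postoperative AV score as the outcome would correspond to the pAV analysis, and single-predictor versions (outcome regressed on the preoperative AV score alone) would yield R² values of the kind quoted in the Results.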


Author(s): Karthik Ganesan, John Plass, Adriene M. Beltz, Zhongming Liu, Marcia Grabowecky, ...

Abstract: Speech perception is a central component of social communication. While speech perception is primarily driven by sounds, accurate perception in everyday settings is also supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Previous research has shown that visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG), likely through feedback connections from the multisensory posterior superior temporal sulcus (pSTS). However, it is unknown whether visual modulation of auditory processing in the STG is a unitary phenomenon or, rather, consists of multiple temporally, spatially, or functionally discrete processes. To explore these questions, we examined neural responses to audiovisual speech in electrodes implanted intracranially in the temporal cortex of 21 patients undergoing clinical monitoring for epilepsy. We found that visual speech modulates auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differ across theta, beta, and high-gamma frequency bands. Before speech onset, visual information increased high-gamma power in the posterior STG and suppressed beta power in mid-STG regions, suggesting crossmodal prediction of speech signals in these areas. After sound onset, visual speech decreased theta power in the middle and posterior STG, potentially reflecting a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.

Significance Statement: Visual speech cues are often needed to disambiguate distorted speech sounds in the natural environment. However, understanding how the brain encodes and transmits visual information for usage by the auditory system remains a challenge. One persistent question is whether visual signals have a unitary effect on auditory processing or elicit multiple distinct effects throughout auditory cortex. To better understand how vision modulates speech processing, we measured neural activity produced by audiovisual speech from electrodes surgically implanted in auditory areas of 21 patients with epilepsy. Group-level statistics using linear mixed-effects models demonstrated distinct patterns of activity across different locations, timepoints, and frequency bands, suggesting the presence of multiple audiovisual mechanisms supporting speech perception processes in auditory cortex.
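The group-level statistics mentioned in the significance statement (linear mixed-effects models) could be organized roughly as in the sketch below. It is illustrative only: the data frame, its columns (high_gamma, condition, patient), and the single-band, random-intercept model are assumptions, not the study's actual analysis pipeline.

    # Illustrative sketch: does high-gamma power differ between audiovisual and
    # auditory-only trials? A random intercept per patient respects the nesting
    # of electrodes within subjects. All column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("stg_band_power.csv")  # assumed: one row per electrode and condition

    model = smf.mixedlm("high_gamma ~ condition", df, groups=df["patient"])
    result = model.fit()
    print(result.summary())  # fixed effect of condition plus between-patient variance

A fuller analysis would fit analogous models per frequency band, region, and time window; the intercept-only random-effects structure here is simply the most basic choice.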

