Learning to Integrate Auditory and Visual Information in Speech Perception

2004
Author(s): Joseph D. W. Stephens, Lori L. Holt

2012 · Vol 25 (0) · pp. 148
Author(s): Marcia Grabowecky, Emmanuel Guzman-Martinez, Laura Ortega, Satoru Suzuki

Watching moving lips facilitates auditory speech perception when the mouth is attended. However, recent evidence suggests that visual attention and awareness are mediated by separate mechanisms. We investigated whether lip movements suppressed from visual awareness can facilitate speech perception. We used a word categorization task in which participants listened to spoken words and determined as quickly and accurately as possible whether or not each word named a tool. While participants listened to the words, they watched a visual display that presented a video clip of the speaker synchronously speaking the auditorily presented words or of the same speaker articulating different words. Critically, the speaker’s face was either visible (the aware trials) or suppressed from awareness using continuous flash suppression (the suppressed trials). Aware and suppressed trials were randomly intermixed. A secondary probe-detection task ensured that participants attended to the mouth region regardless of whether the face was visible or suppressed. On the aware trials, responses to the tool targets were no faster with synchronous than with asynchronous lip movements, perhaps because the visual information was inconsistent with the auditory information on 50% of the trials. However, on the suppressed trials, responses to the tool targets were significantly faster with synchronous than with asynchronous lip movements. These results demonstrate that even when a random dynamic mask renders a face invisible, lip movements are processed by the visual system with sufficiently high temporal resolution to facilitate speech perception.


2000 · Vol 23 (3) · pp. 327-328
Author(s): Lawrence Brancazio, Carol A. Fowler

The present description of the Merge model addresses only auditory, not audiovisual, speech perception. However, recent findings in the audiovisual domain are relevant to the model. We outline a test, which we are currently conducting, of the adequacy of Merge modified to accept visual information about articulation.


2012 · Vol 132 (3) · pp. 2050-2050
Author(s): Qudsia Tahmina, Moulesh Bhandary, Behnam Azimi, Yi Hu, Rene L. Utianski, ...

1990 · Vol 1 (1) · pp. 55-63
Author(s): Dominic W. Massaro, Michael M. Cohen

The research reported in this paper uses novel stimuli to study how speech perception is influenced by information presented to ear and eye. Auditory and visual sources of information (syllables) were synthesized and presented in isolation or in factorial combination. A five-step continuum between the syllables /ba/ and /da/ was synthesized along both auditory and visual dimensions by varying properties of the syllable at its onset. The onsets of the second and third formants were manipulated in the audible speech. For the visible speech, the shape of the lips and the jaw position at the onset of the syllable were manipulated. Subjects’ identification judgments of the test syllables presented on videotape were influenced by both auditory and visual information. The results were used to test between a fuzzy logical model of speech perception (FLMP) and a categorical model of perception (CMP). These tests indicate that evaluation and integration of the two sources of information make available continuous as opposed to just categorical information. In addition, the integration of the two sources appears to be nonadditive, in that the least ambiguous source has the largest impact on the judgment. The two sources of information appear to be evaluated, integrated, and identified as described by the FLMP, an optimal algorithm for combining information from multiple sources. The research provides a theoretical framework for understanding the improvement in speech perception by hearing-impaired listeners when auditory speech is supplemented with other sources of information.
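The multiplicative integration rule at the core of the FLMP can be made concrete with a worked form. For a two-alternative /ba/–/da/ judgment, let a_i denote the degree of auditory support for /da/ at auditory level i, and v_j the degree of visual support at visual level j; this is the standard two-alternative rendering of the model, and the notation is illustrative rather than taken from the paper:

    \[ P(\text{/da/} \mid a_i, v_j) \;=\; \frac{a_i\, v_j}{a_i\, v_j + (1 - a_i)\,(1 - v_j)} \]

Written this way, the reported nonadditivity is explicit: a source near 0 or 1 (unambiguous) dominates the product, whereas a source near 0.5 (ambiguous) leaves the judgment largely to the other modality.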


2020 · Vol 20 (11) · pp. 434
Author(s): Brian A. Metzger, John F. Magnotti, Elizabeth Nesbitt, Daniel Yoshor, Michael S. Beauchamp

2019 · Vol 62 (2) · pp. 307-317
Author(s): Jianghua Lei, Huina Gong, Liang Chen

Purpose: The study was designed primarily to determine whether the use of hearing aids (HAs) by individuals with hearing impairment in China would affect their speechreading performance. Method: Sixty-seven young adults with hearing impairment who used HAs and 78 young adults with hearing impairment who did not completed newly developed Chinese speechreading tests targeting 3 linguistic levels (i.e., words, phrases, and sentences). Results: Participants with HAs were more accurate at speechreading than those without HAs across the 3 linguistic levels. For both groups, speechreading accuracy was higher for phrases than for words and sentences, and speechreading speed was slower for sentences than for words and phrases. Furthermore, there was a positive correlation between years of HA use and speechreading accuracy; longer HA use was associated with more accurate speechreading. Conclusions: Young HA users in China show enhanced speechreading performance relative to their peers with hearing impairment who do not use HAs. This result argues against the perceptual dependence hypothesis, which holds that greater dependence on visual information leads to improvement in visual speech perception.


2020 · pp. 1-8
Author(s): Hyun Jin Lee, Jeon Mi Lee, Jae Young Choi, Jinsei Jung

Introduction: Patients with postlingual deafness usually depend on visual information for communication, and their lipreading ability could influence cochlear implantation (CI) outcomes. However, it is unclear whether preoperative visual dependency in postlingual deafness positively or negatively affects auditory rehabilitation after CI. Herein, we investigated the influence of preoperative audiovisual perception on CI outcomes. Method: In this retrospective case-comparison study, 118 patients with postlingual deafness who underwent unilateral CI were enrolled. Speech perception was evaluated under both audiovisual (AV) and audio-only (AO) conditions before and after CI. Before CI, the speech perception test was performed under hearing aid (HA)-assisted conditions; after CI, it was performed under the CI-only condition. Only patients with a preoperative AO speech perception score of 10% or less were included. Results: Multivariable regression analysis showed that age, gender, residual hearing, operation side, education level, and HA usage were not correlated with either postoperative AV (pAV) or AO (pAO) speech perception. However, duration of deafness showed a significant negative correlation with both pAO (p = 0.003) and pAV (p = 0.015) speech perception. Notably, the preoperative AV speech perception score was not correlated with pAO speech perception (R² = 0.00134, p = 0.693) but was positively associated with pAV speech perception (R² = 0.0731, p = 0.003). Conclusion: Preoperative dependency on audiovisual information may positively influence pAV speech perception in patients with postlingual deafness.
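To make the regression described above concrete, a multivariable model of this kind might be set up as in the following minimal sketch. The data file and column names (age, sex, residual_hearing, side, education, ha_use, deafness_years, preop_av, postop_ao) are hypothetical placeholders for illustration, not the study's actual variables.

    # Hypothetical sketch: multivariable regression of postoperative audio-only (AO)
    # speech perception on clinical predictors. All column names are illustrative.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("ci_outcomes.csv")  # assumed file: one row per patient

    model_ao = smf.ols(
        "postop_ao ~ age + sex + residual_hearing + side + education + ha_use"
        " + deafness_years + preop_av",
        data=df,
    ).fit()
    print(model_ao.summary())  # coefficients, p values, and R-squared

An analogous model with a postoperative AV score as the outcome would correspond to the pAV analysis, and single-predictor versions (outcome regressed on the preoperative AV score alone) would yield R² values of the kind quoted in the Results.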


Author(s): Karthik Ganesan, John Plass, Adriene M. Beltz, Zhongming Liu, Marcia Grabowecky, ...

Abstract: Speech perception is a central component of social communication. While speech perception is primarily driven by sounds, accurate perception in everyday settings is also supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Previous research has shown that visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG), likely through feedback connections from the multisensory posterior superior temporal sulcus (pSTS). However, it is unknown whether visual modulation of auditory processing in the STG is a unitary phenomenon or, rather, consists of multiple temporally, spatially, or functionally discrete processes. To explore these questions, we examined neural responses to audiovisual speech in electrodes implanted intracranially in the temporal cortex of 21 patients undergoing clinical monitoring for epilepsy. We found that visual speech modulates auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differ across theta, beta, and high-gamma frequency bands. Before speech onset, visual information increased high-gamma power in the posterior STG and suppressed beta power in mid-STG regions, suggesting crossmodal prediction of speech signals in these areas. After sound onset, visual speech decreased theta power in the middle and posterior STG, potentially reflecting a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.

Significance Statement: Visual speech cues are often needed to disambiguate distorted speech sounds in the natural environment. However, understanding how the brain encodes and transmits visual information for usage by the auditory system remains a challenge. One persistent question is whether visual signals have a unitary effect on auditory processing or elicit multiple distinct effects throughout auditory cortex. To better understand how vision modulates speech processing, we measured neural activity produced by audiovisual speech from electrodes surgically implanted in auditory areas of 21 patients with epilepsy. Group-level statistics using linear mixed-effects models demonstrated distinct patterns of activity across different locations, timepoints, and frequency bands, suggesting the presence of multiple audiovisual mechanisms supporting speech perception processes in auditory cortex.
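The group-level statistics mentioned in the significance statement (linear mixed-effects models) could be organized roughly as in the sketch below. It is illustrative only: the data frame, its columns (high_gamma, condition, patient), and the single-band, random-intercept model are assumptions, not the study's actual analysis pipeline.

    # Illustrative sketch: does high-gamma power differ between audiovisual and
    # auditory-only trials? A random intercept per patient respects the nesting
    # of electrodes within subjects. All column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("stg_band_power.csv")  # assumed: one row per electrode and condition

    model = smf.mixedlm("high_gamma ~ condition", df, groups=df["patient"])
    result = model.fit()
    print(result.summary())  # fixed effect of condition plus between-patient variance

A fuller analysis would fit analogous models per frequency band, region, and time window; the intercept-only random-effects structure here is simply the most basic choice.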

