Cross-modal Suppression of Auditory Association Cortex by Visual Speech as a Mechanism for Audiovisual Speech Perception

2019. Author(s): Patrick J. Karas, John F. Magnotti, Brian A. Metzger, Lin L. Zhu, Kristen B. Smith, et al.

Abstract
Vision provides a perceptual head start for speech perception because most speech is “mouth-leading”: visual information from the talker’s mouth is available before auditory information from the voice. However, some speech is “voice-leading” (auditory before visual). Consistent with a model in which vision modulates subsequent auditory processing, there was a larger perceptual benefit of visual speech for mouth-leading vs. voice-leading words (28% vs. 4%). The neural substrates of this difference were examined by recording broadband high-frequency activity from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. Responses were smaller for audiovisual vs. auditory-only mouth-leading words (34% difference) while there was little difference (5%) for voice-leading words. Evidence for cross-modal suppression of auditory cortex complements our previous work showing enhancement of visual cortex (Ozker et al., 2018b) and confirms that multisensory interactions are a powerful modulator of activity throughout the speech perception network.

Impact Statement
Human perception and brain responses differ between words in which mouth movements are visible before the voice is heard and words for which the reverse is true.
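The broadband high-frequency measure described in this abstract can be approximated offline with a standard bandpass-plus-envelope pipeline. The Python sketch below is illustrative only: the signal, sampling rate, and baseline choice are assumptions, and only the 75-150 Hz band is taken from the text.

```python
# Minimal sketch: estimate broadband high-frequency (75-150 Hz) power
# from one intracranial electrode. `raw` is a placeholder voltage trace
# sampled at fs = 1000 Hz (both are assumed values, not the study's data).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0                                   # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
raw = rng.standard_normal(10 * int(fs))       # placeholder signal, 10 s

# Band-pass filter in the high-frequency broadband range (75-150 Hz).
b, a = butter(4, [75 / (fs / 2), 150 / (fs / 2)], btype="bandpass")
filtered = filtfilt(b, a, raw)

# Analytic amplitude via the Hilbert transform, squared to get power.
power = np.abs(hilbert(filtered)) ** 2

# Express power as percent change from a (here, global) baseline.
percent_change = 100 * (power - power.mean()) / power.mean()
print(percent_change[:5])
```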

eLife, 2019, Vol 8. Author(s): Patrick J Karas, John F Magnotti, Brian A Metzger, Lin L Zhu, Kristen B Smith, et al.

Visual information about speech content from the talker’s mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
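One simple way to quantify the suppression reported here is to compare, electrode by electrode, mean responses to audiovisual versus auditory-only words. The sketch below uses invented per-trial amplitudes and a paired t-test purely to show the computation; it is not the paper's statistical procedure.

```python
# Sketch: percent suppression of audiovisual (AV) vs. auditory-only (AUD)
# responses across electrodes. Rows = electrodes, columns = trials
# (simulated response amplitudes).
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
aud = rng.gamma(shape=2.0, scale=50.0, size=(28, 40))   # AUD-only trials
av = aud * 0.7 + rng.normal(0, 5, size=(28, 40))        # suppressed AV trials

aud_mean = aud.mean(axis=1)          # per-electrode mean, AUD-only
av_mean = av.mean(axis=1)            # per-electrode mean, AV

suppression = 100 * (aud_mean - av_mean) / aud_mean
t, p = ttest_rel(aud_mean, av_mean)  # paired across electrodes
print(f"mean suppression = {suppression.mean():.1f}%, t = {t:.2f}, p = {p:.2g}")
```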


eLife, 2018, Vol 7. Author(s): Muge Ozker, Daniel Yoshor, Michael S Beauchamp

Human faces contain multiple sources of information. During speech perception, visual information from the talker’s mouth is integrated with auditory information from the talker's voice. By directly recording neural responses from small populations of neurons in patients implanted with subdural electrodes, we found enhanced visual cortex responses to speech when auditory speech was absent (rendering visual speech especially relevant). Receptive field mapping demonstrated that this enhancement was specific to regions of the visual cortex with retinotopic representations of the mouth of the talker. Connectivity between frontal cortex and other brain regions was measured with trial-by-trial power correlations. Strong connectivity was observed between frontal cortex and mouth regions of visual cortex; connectivity was weaker between frontal cortex and non-mouth regions of visual cortex or auditory cortex. These results suggest that top-down selection of visual information from the talker’s mouth by frontal cortex plays an important role in audiovisual speech perception.
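The trial-by-trial power correlation used here as a connectivity measure amounts to correlating, across trials, the per-trial power at two recording sites. The following sketch uses simulated per-trial power values; the site labels in the comments are assumptions for illustration.

```python
# Sketch: trial-by-trial power correlation between two electrodes as a
# simple functional-connectivity estimate. `power_a` and `power_b` hold
# one power value per trial for each site (simulated data).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
shared = rng.standard_normal(120)                    # shared trial-to-trial drive
power_a = 2.0 * shared + rng.standard_normal(120)    # e.g., a frontal electrode
power_b = 1.5 * shared + rng.standard_normal(120)    # e.g., a visual "mouth" electrode

r, p = pearsonr(power_a, power_b)
print(f"trial-by-trial power correlation r = {r:.2f} (p = {p:.2g})")
```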


Author(s): Karthik Ganesan, John Plass, Adriene M. Beltz, Zhongming Liu, Marcia Grabowecky, et al.

Abstract
Speech perception is a central component of social communication. While speech perception is primarily driven by sounds, accurate perception in everyday settings is also supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Previous research has shown that visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG), likely through feedback connections from the multisensory posterior superior temporal sulcus (pSTS). However, it is unknown whether visual modulation of auditory processing in the STG is a unitary phenomenon or, rather, consists of multiple temporally, spatially, or functionally discrete processes. To explore these questions, we examined neural responses to audiovisual speech recorded from electrodes implanted intracranially in the temporal cortex of 21 patients undergoing clinical monitoring for epilepsy. We found that visual speech modulates auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differ across theta, beta, and high-gamma frequency bands. Before speech onset, visual information increased high-gamma power in the posterior STG and suppressed beta power in mid-STG regions, suggesting crossmodal prediction of speech signals in these areas. After sound onset, visual speech decreased theta power in the middle and posterior STG, potentially reflecting a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.

Significance Statement
Visual speech cues are often needed to disambiguate distorted speech sounds in the natural environment. However, understanding how the brain encodes and transmits visual information for use by the auditory system remains a challenge. One persistent question is whether visual signals have a unitary effect on auditory processing or elicit multiple distinct effects throughout auditory cortex. To better understand how vision modulates speech processing, we measured neural activity produced by audiovisual speech from electrodes surgically implanted in auditory areas of 21 patients with epilepsy. Group-level statistics using linear mixed-effects models demonstrated distinct patterns of activity across locations, timepoints, and frequency bands, suggesting the presence of multiple audiovisual mechanisms supporting speech perception processes in auditory cortex.
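The group-level linear mixed-effects approach mentioned in the significance statement can be illustrated with statsmodels, treating patient as the grouping factor. The data-frame layout, factor coding, and simulated values below are assumptions, not the authors' model specification.

```python
# Sketch: linear mixed-effects model of band-limited power with a random
# intercept per patient. All data are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 21 * 3 * 2 * 10  # 21 patients x 3 bands x 2 conditions x 10 electrodes
df = pd.DataFrame({
    "patient": np.repeat(np.arange(21), n // 21),
    "band": np.tile(np.repeat(["theta", "beta", "high_gamma"], 2 * 10), 21),
    "condition": np.tile(np.repeat(["auditory", "audiovisual"], 10), 21 * 3),
    "power": rng.standard_normal(n),
})

# Power as a function of condition, band, and their interaction,
# with patient as the random-intercept grouping factor.
model = smf.mixedlm("power ~ condition * band", df, groups=df["patient"])
result = model.fit()
print(result.summary())
```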


2012, Vol 25 (0), pp. 194. Author(s): Carolina Sánchez-García, Sonia Kandel, Christophe Savariaux, Nara Ikumi, Salvador Soto-Faraco

When both are present, visual and auditory information are combined to decode the speech signal. Past research has addressed the extent to which visual information helps distinguish confusable speech sounds, but has usually ignored the continuous nature of speech perception. Here we examine the time course of the contribution of visual and auditory information during speech perception. To this end, we designed an audio–visual gating task with videos recorded with a high-speed camera. Participants were asked to identify gradually longer fragments of pseudowords varying in the central consonant. Spanish consonant phonemes with different degrees of visual and acoustic saliency were included and tested in visual-only, auditory-only, and audio–visual trials. The data showed different patterns of contribution of unimodal and bimodal information during identification, depending on the visual saliency of the presented phonemes. In particular, for phonemes that are clearly more salient in one modality than the other, audio–visual performance equalled that of the best unimodal condition. For phonemes with more balanced saliency, audio–visual performance was better than either unimodal condition. These results shed new light on the time course of audio–visual speech integration.
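The comparison at the heart of this result, audio–visual accuracy against the better of the two unimodal accuracies for each phoneme, takes only a few lines. The accuracy table below is invented purely to show the computation.

```python
# Sketch: compare audio-visual identification accuracy to the best
# unimodal accuracy per phoneme. Accuracies are hypothetical.
import pandas as pd

acc = pd.DataFrame({
    "phoneme":  ["p", "t", "k", "f"],
    "visual":   [0.90, 0.35, 0.30, 0.85],
    "auditory": [0.55, 0.80, 0.75, 0.50],
    "av":       [0.91, 0.82, 0.88, 0.92],
})

acc["best_unimodal"] = acc[["visual", "auditory"]].max(axis=1)
acc["av_gain"] = acc["av"] - acc["best_unimodal"]   # > 0 means a true bimodal benefit
print(acc[["phoneme", "best_unimodal", "av", "av_gain"]])
```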


Neurosurgery, 2019, Vol 66 (Supplement_1). Author(s): Patrick J Karas, John F Magnotti, Zhengjia Wang, Brian A Metzger, Daniel Yoshor, et al.

Abstract
INTRODUCTION
Speech is multisensory. The addition of visual speech to auditory speech greatly improves comprehension, especially under noisy auditory conditions. However, the neural mechanism for this visual enhancement of auditory speech is poorly understood. We used electrocorticography (ECoG) to study how auditory, visual, and audiovisual speech is processed in the posterior superior temporal gyrus (pSTG), an area of auditory association cortex involved in audiovisual speech integration. We hypothesized that early visual mouth movements modulate audiovisual speech integration through a mechanism of cross-modal suppression, predicting that the pSTG response to early mouth movements should correlate with the comprehension benefit gained by adding visual speech to auditory speech.
METHODS
Words were presented under auditory-only (AUD), visual-only (VIS), and audiovisual (AV) conditions to epilepsy patients (n = 8) implanted with intracranial electrodes for phase-2 monitoring. We measured high-frequency broadband activity (75-150 Hz), a marker for local neuronal firing, in 28 electrodes over the pSTG.
RESULTS
The early neural response to visual-only words was compared to the reduction in neural response from AUD to AV words, a reduction that correlates with the improvement in speech comprehension produced by adding visual to auditory speech. For words that showed a comprehension benefit with the addition of visual speech, there was a strong early response to visual speech and a correlation between the early visual response and the AUD-AV difference (r = 0.64, P = 10^-4). For words where visual speech did not provide any comprehension benefit, there was a weak early visual response and no correlation (r = 0.18, P = .35).
CONCLUSION
Words with a visual speech comprehension benefit also elicit a strong neural response to early visual speech in the pSTG, while words with no comprehension benefit do not cause a strong early response. This suggests that cross-modal suppression of auditory association cortex (pSTG) by early visual speech plays an important role in audiovisual speech perception.
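The electrode-level result reported above amounts to a Pearson correlation, across pSTG electrodes, between the early visual-only response and the AUD-minus-AV response reduction. The sketch below uses simulated placeholder values rather than the study's measurements.

```python
# Sketch: correlate the early response to visual-only speech with the
# AUD-minus-AV response reduction across pSTG electrodes (n = 28).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
early_visual = rng.gamma(2.0, 20.0, size=28)                # VIS-only response (simulated)
aud_minus_av = 0.8 * early_visual + rng.normal(0, 10, 28)   # response reduction (simulated)

r, p = pearsonr(early_visual, aud_minus_av)
print(f"r = {r:.2f}, p = {p:.3g}")
```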


Perception, 10.1068/p5852, 2007, Vol 36 (10), pp. 1535-1545. Author(s): Ian T Everdell, Heidi Marsh, Micheal D Yurick, Kevin G Munhall, Martin Paré

Speech perception under natural conditions entails integration of auditory and visual information. Understanding how visual and auditory speech information are integrated requires detailed descriptions of the nature and processing of visual speech information. To understand better the process of gathering visual information, we studied the distribution of face-directed fixations of humans performing an audiovisual speech perception task to characterise the degree of asymmetrical viewing and its relationship to speech intelligibility. Participants showed stronger gaze fixation asymmetries while viewing dynamic faces, compared to static faces or face-like objects, especially when gaze was directed to the talkers' eyes. Although speech perception accuracy was significantly enhanced by the viewing of congruent, dynamic faces, we found no correlation between task performance and gaze fixation asymmetry. Most participants preferentially fixated the right side of the faces and their preferences persisted while viewing horizontally mirrored stimuli, different talkers, or static faces. These results suggest that the asymmetrical distributions of gaze fixations reflect the participants' viewing preferences, rather than being a product of asymmetrical faces, but that this behavioural bias does not predict correct audiovisual speech perception.


2011, Vol 26 (S2), pp. 1512-1512. Author(s): G.R. Szycik, Z. Ye, B. Mohammadi, W. Dillo, B.T. te Wildt, et al.

Introduction
Natural speech perception relies on both auditory and visual information. These channels provide redundant and complementary information, such that speech perception is enhanced in healthy subjects when both are present.
Objectives
Patients with schizophrenia have been reported to have problems with this audiovisual integration process, but little is known about which neural processes are altered.
Aims
In this study we investigated functional connectivity of Broca's area in patients with schizophrenia.
Methods
Functional magnetic resonance imaging (fMRI) was performed in 15 schizophrenia patients and 15 healthy controls to study functional connectivity of Broca's area during perception of videos of bisyllabic German nouns in which audio and video either matched (congruent condition) or did not match (incongruent condition; e.g., video = hotel, audio = island).
Results
There were differences in connectivity between experimental groups and between conditions. Broca's area of the patient group showed connections to more brain areas than that of the control group. This difference was more prominent in the incongruent condition, for which only one connection, between Broca's area and the supplementary motor area, was found in control participants, whereas patients showed connections to eight widely distributed brain areas.
Conclusions
The findings imply that audiovisual integration problems in schizophrenia result from maladaptive connectivity of Broca's area, particularly when confronted with incongruent stimuli, and are discussed in light of recent audiovisual speech models.


2021, pp. 1-17. Author(s): Yuta Ujiie, Kohske Takahashi

Abstract
While visual information from facial speech modulates auditory speech perception, it is less influential on audiovisual speech perception among autistic individuals than among typically developed individuals. In this study, we investigated the relationship between autistic traits (Autism-Spectrum Quotient; AQ) and the influence of visual speech on the recognition of Rubin's vase-type speech stimuli with degraded facial speech information. Participants were 31 university students (13 males and 18 females; mean age: 19.2 years, SD: 1.13) who reported normal (or corrected-to-normal) hearing and vision. All participants completed three speech recognition tasks (visual, auditory, and audiovisual stimuli) and the Japanese version of the AQ. The results showed that speech recognition accuracy for visual (i.e., lip-reading) and auditory stimuli was not significantly related to participants' AQ. In contrast, audiovisual speech perception was less influenced by facial speech among individuals with high rather than low autistic traits. The weaker influence of visual information on audiovisual speech perception in autism spectrum disorder (ASD) was robust regardless of the clarity of the visual information, suggesting a difficulty in the process of audiovisual integration rather than in the visual processing of facial speech.
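The individual-differences question here reduces to relating AQ scores to an index of how much the visual signal changes recognition. A minimal sketch follows; the index definition and all values are assumptions for illustration.

```python
# Sketch: relate autistic traits (AQ) to a visual-influence index,
# here taken as audiovisual-minus-auditory recognition accuracy.
# All data are simulated placeholders.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
aq = rng.integers(5, 40, size=31).astype(float)                 # AQ scores, 31 participants
visual_influence = 0.4 - 0.005 * aq + rng.normal(0, 0.05, 31)   # AV minus A-only accuracy

rho, p = spearmanr(aq, visual_influence)
print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")
```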


2020. Author(s): Jonathan E Peelle, Brent Spehar, Michael S Jones, Sarah McConkey, Joel Myerson, et al.

In everyday conversation, we usually process the talker's face as well as the sound of their voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here we used fMRI to monitor brain activity while adults (n = 60) were presented with visual-only, auditory-only, and audiovisual words. As expected, audiovisual speech perception recruited both auditory and visual cortex, with a trend towards increased recruitment of premotor cortex in more difficult conditions (for example, in substantial background noise). We then investigated neural connectivity using psychophysiological interaction (PPI) analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. Taken together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech.
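A psychophysiological interaction (PPI) regressor is, at its simplest, the element-wise product of a mean-centered seed timecourse and a centered task regressor, entered into a GLM alongside the main effects. The sketch below is only a schematic of that design matrix with invented data; a standard PPI pipeline would also deconvolve the seed signal and convolve regressors with the HRF.

```python
# Sketch: simplified PPI design matrix for one target region.
# Columns: intercept, task (audiovisual vs. unimodal), seed timecourse,
# and their interaction (the PPI term). Data are simulated.
import numpy as np

rng = np.random.default_rng(6)
n_vols = 200
task = np.r_[np.zeros(100), np.ones(100)]          # 0 = unimodal, 1 = audiovisual
seed = rng.standard_normal(n_vols)                 # auditory-cortex seed timecourse (simulated)
target = 0.3 * seed + 0.5 * (task - 0.5) * seed + rng.standard_normal(n_vols)

ppi = (task - task.mean()) * (seed - seed.mean())  # interaction (PPI) regressor
X = np.column_stack([np.ones(n_vols), task, seed, ppi])

beta, *_ = np.linalg.lstsq(X, target, rcond=None)
print(f"PPI beta (condition-dependent coupling) = {beta[3]:.2f}")
```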


2012, Vol 25 (0), pp. 148. Author(s): Marcia Grabowecky, Emmanuel Guzman-Martinez, Laura Ortega, Satoru Suzuki

Watching moving lips facilitates auditory speech perception when the mouth is attended. However, recent evidence suggests that visual attention and awareness are mediated by separate mechanisms. We investigated whether lip movements suppressed from visual awareness can facilitate speech perception. We used a word categorization task in which participants listened to spoken words and determined as quickly and accurately as possible whether or not each word named a tool. While participants listened to the words, they watched a visual display that presented a video clip of the speaker synchronously speaking the auditorily presented words, or the same speaker articulating different words. Critically, the speaker’s face was either visible (the aware trials) or suppressed from awareness using continuous flash suppression. Aware and suppressed trials were randomly intermixed. A secondary probe-detection task ensured that participants attended to the mouth region regardless of whether the face was visible or suppressed. On the aware trials, responses to the tool targets were no faster with synchronous than with asynchronous lip movements, perhaps because the visual information was inconsistent with the auditory information on 50% of the trials. However, on the suppressed trials, responses to the tool targets were significantly faster with synchronous than with asynchronous lip movements. These results demonstrate that even when a random dynamic mask renders a face invisible, lip movements are processed by the visual system with sufficiently high temporal resolution to facilitate speech perception.
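The key behavioural contrast on the suppressed trials, response times for synchronous versus asynchronous lip movements, is a paired comparison; the sketch below uses simulated per-participant mean response times rather than the reported data.

```python
# Sketch: paired comparison of mean response times (ms) for synchronous
# vs. asynchronous lip movements on suppressed (invisible-face) trials.
# Per-participant means are simulated.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(7)
rt_async = rng.normal(760, 40, size=24)           # asynchronous lip movements
rt_sync = rt_async - rng.normal(25, 10, size=24)  # faster with synchronous lips

t, p = ttest_rel(rt_sync, rt_async)
print(f"mean speed-up = {np.mean(rt_async - rt_sync):.1f} ms, t = {t:.2f}, p = {p:.3g}")
```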

