Visual speech is processed differently in auditory and visual cortex: evidence from MEG and partial coherence analysis

2021
Author(s):
Mate Aller
Heidi Solberg Økland
Lucy J MacGregor
Helen Blank
Matthew H. Davis

Speech perception in noisy environments is enhanced by seeing the facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explored phase locking to auditory and visual signals in MEG recordings from 14 human participants (6 female) who reported words from single spoken sentences. We manipulated acoustic clarity and the presence of visual speech so that critical speech information was available in the auditory modality, the visual modality, or both. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed non-zero phase locking to the auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only (AO) speech matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus (STG) did not show above-chance partial coherence with visual speech signals during AV conditions, but did show partial coherence during visual-only (VO) conditions. Hence, visual speech enabled stronger phase locking to auditory signals in visual areas, whereas phase locking to visual speech in auditory regions occurred only during silent lip-reading. These differences in cross-modal interactions between auditory and visual speech signals are interpreted in terms of cross-modal predictive mechanisms during speech perception.
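As a rough illustration of the coherence measure described above (not the authors' source-space pipeline), the sketch below estimates coherence between an auditory speech envelope and a single MEG channel in the 2-6 Hz syllable-rate band using scipy. The sampling rate, segment length, and placeholder signals are assumptions.

```python
# Minimal sketch: coherence between a speech envelope and one MEG channel,
# averaged over the 2-6 Hz band. Placeholder signals stand in for real data.
import numpy as np
from scipy.signal import coherence

fs = 250                                        # assumed common sampling rate (Hz)
rng = np.random.default_rng(0)
envelope = rng.standard_normal(60 * fs)         # placeholder auditory envelope
meg = envelope + rng.standard_normal(60 * fs)   # placeholder MEG channel

f, coh = coherence(envelope, meg, fs=fs, nperseg=4 * fs)
band = (f >= 2) & (f <= 6)
print("mean 2-6 Hz coherence:", coh[band].mean())
```

The published analysis was carried out on source-localised responses with statistical testing against chance, rather than on raw sensor channels as in this toy example.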

eLife
2016
Vol 5
Author(s):
Hyojin Park
Christoph Kayser
Gregor Thut
Joachim Gross

During continuous speech, lip movements provide visual temporal signals that facilitate speech processing. Here, using MEG, we directly investigated how these visual signals interact with rhythmic brain activity in participants listening to and seeing the speaker. First, we investigated coherence between oscillatory brain activity and the speaker's lip movements and demonstrated significant entrainment in visual cortex. We then used partial coherence to remove contributions of the coherent auditory speech signal from the lip-brain coherence. Comparing this synchronization between different attention conditions revealed that attending to visual speech enhances the coherence between activity in visual cortex and the speaker's lips. Further, we identified significant partial coherence between left motor cortex and lip movements, and this partial coherence directly predicted comprehension accuracy. Our results emphasize the importance of visually entrained and attention-modulated rhythmic brain activity for the enhancement of audiovisual speech processing.
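Partial coherence of the kind used here (removing the contribution of the correlated auditory speech signal from lip-brain coherence) can be written in terms of Welch auto- and cross-spectra. The function below is an illustrative implementation of that textbook formulation, not the authors' pipeline; variable names and parameters are assumptions.

```python
# Sketch of partial coherence between x (lip movement signal) and y (brain
# signal) after removing the linear contribution of z (auditory speech signal).
import numpy as np
from scipy.signal import csd, welch

def partial_coherence(x, y, z, fs, nperseg):
    f, Sxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, Sxz = csd(x, z, fs=fs, nperseg=nperseg)
    _, Szy = csd(z, y, fs=fs, nperseg=nperseg)
    _, Sxx = welch(x, fs=fs, nperseg=nperseg)
    _, Syy = welch(y, fs=fs, nperseg=nperseg)
    _, Szz = welch(z, fs=fs, nperseg=nperseg)
    # Partialise each (cross-)spectrum on z, then form the squared coherence.
    Sxy_z = Sxy - Sxz * Szy / Szz
    Sxx_z = Sxx - np.abs(Sxz) ** 2 / Szz
    Syy_z = Syy - np.abs(Szy) ** 2 / Szz
    return f, np.abs(Sxy_z) ** 2 / (Sxx_z * Syy_z)
```

Comparing this quantity with ordinary coherence (the same expression without the partialisation terms) shows how much of the lip-brain coupling is explained by the shared auditory signal.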


2020
Author(s):
Jonathan E Peelle
Brent Spehar
Michael S Jones
Sarah McConkey
Joel Myerson
...  

In everyday conversation, we usually process the talker's face as well as the sound of their voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here we used fMRI to monitor brain activity while adults (n = 60) were presented with visual-only, auditory-only, and audiovisual words. As expected, audiovisual speech perception recruited both auditory and visual cortex, with a trend towards increased recruitment of premotor cortex in more difficult conditions (for example, in substantial background noise). We then investigated neural connectivity using psychophysiological interaction (PPI) analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, and extended to a wide network of regions in posterior temporal cortex and prefrontal cortex. Taken together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech.
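For readers unfamiliar with PPI, the analysis adds an interaction regressor (the seed timecourse multiplied by the psychological variable) to the voxel-wise model alongside the main effects. The sketch below is schematic: the block timings are invented, and the deconvolution and HRF-convolution steps that fMRI toolboxes normally apply are omitted.

```python
# Schematic PPI design matrix: intercept, task contrast, seed timecourse,
# and their product (the PPI regressor). Timings are placeholders.
import numpy as np

n_vols = 200
rng = np.random.default_rng(1)
seed = rng.standard_normal(n_vols)   # seed ROI timecourse (e.g., auditory cortex)
task = np.zeros(n_vols)
task[40:80] = 1                      # audiovisual blocks (placeholder)
task[120:160] = -1                   # unimodal blocks (placeholder)

ppi = seed * task                    # interaction regressor
design = np.column_stack([np.ones(n_vols), task, seed, ppi])
# `design` would be fit at every voxel; a reliable PPI beta indicates
# condition-dependent coupling with the seed region.
```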


2009
pp. 439-461
Author(s):
Lynne E. Bernstein
Jintao Jiang

The information in optical speech signals is phonetically impoverished compared to the information in acoustic speech signals presented under good listening conditions. Yet high lipreading scores among prelingually deaf adults show that optical speech signals are in fact rich in phonetic information. Hearing lipreaders are not as accurate as deaf lipreaders, but they too demonstrate perception of detailed optical phonetic information. This chapter briefly sketches the historical context of, and impediments to, knowledge about optical phonetics and visual speech perception (lipreading). The authors review findings on deaf and hearing lipreaders, then review recent results on relationships between optical speech signals and visual speech perception, and extend the discussion of these relationships to the development of visual speech synthesis. They advocate a close relationship between visual speech perception research and the development of synthetic visible speech.


Author(s):  
Karthik Ganesan
John Plass
Adriene M. Beltz
Zhongming Liu
Marcia Grabowecky
...  

Speech perception is a central component of social communication. While speech perception is primarily driven by sounds, accurate perception in everyday settings is also supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Previous research has shown that visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG), likely through feedback connections from the multisensory posterior superior temporal sulcus (pSTS). However, it is unknown whether visual modulation of auditory processing in the STG is a unitary phenomenon or, rather, consists of multiple temporally, spatially, or functionally discrete processes. To explore these questions, we examined neural responses to audiovisual speech in electrodes implanted intracranially in the temporal cortex of 21 patients undergoing clinical monitoring for epilepsy. We found that visual speech modulates auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differ across theta, beta, and high-gamma frequency bands. Before speech onset, visual information increased high-gamma power in the posterior STG and suppressed beta power in mid-STG regions, suggesting crossmodal prediction of speech signals in these areas. After sound onset, visual speech decreased theta power in the middle and posterior STG, potentially reflecting a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.

Significance Statement: Visual speech cues are often needed to disambiguate distorted speech sounds in the natural environment. However, understanding how the brain encodes and transmits visual information for use by the auditory system remains a challenge. One persistent question is whether visual signals have a unitary effect on auditory processing or elicit multiple distinct effects throughout auditory cortex. To better understand how vision modulates speech processing, we measured neural activity produced by audiovisual speech from electrodes surgically implanted in auditory areas of 21 patients with epilepsy. Group-level statistics using linear mixed-effects models demonstrated distinct patterns of activity across different locations, timepoints, and frequency bands, suggesting the presence of multiple audiovisual mechanisms supporting speech perception processes in auditory cortex.
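A group-level linear mixed-effects analysis of the kind mentioned in the significance statement might be specified along the following lines. This is a hedged sketch with synthetic data: the column names, factors, and model terms are hypothetical, and the authors' actual specification may differ.

```python
# Sketch: band power modelled as a function of condition and STG region,
# with a random intercept per patient. Data here are synthetic placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "patient": rng.integers(0, 21, n).astype(str),
    "condition": rng.choice(["auditory_only", "audiovisual"], n),
    "region": rng.choice(["mid_STG", "posterior_STG"], n),
})
df["high_gamma_power"] = rng.standard_normal(n)

model = smf.mixedlm("high_gamma_power ~ condition * region",
                    data=df, groups=df["patient"])
result = model.fit()
print(result.summary())
# Electrode-level variation nested within patients could additionally be
# modelled via the `vc_formula` argument.
```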


2000
Vol 84 (3)
pp. 1168-1179
Author(s):
Peter D. Larsen
Craig D. Lewis
Gerard L. Gebber
Sheng Zhong

We have studied the relationship between pulse-synchronous baroreceptor input (represented by the arterial pulse, AP) and the cardiac-related rhythm in sympathetic nerve discharge (SND) of urethan-anesthetized cats by using partial autospectral and partial coherence analysis. Partial autospectral analysis was used to mathematically remove the portion of SND that can be directly attributed to the AP, while partial coherence analysis was used to remove the portion of the relationship between the discharges of sympathetic nerve pairs that can be attributed to linear AP-SND relationships common to the nerves. The ordinary autospectrum of SND (AS_SND) and the coherence functions relating the discharges of nerve pairs (Coh_SND-SND) contained a peak at the frequency of the heartbeat. When the predominant mode of coordination between AP and SND was a phase walk, partialization of the autospectra of SND with AP (AS_SND/AP) left considerable power in the cardiac-related band. In contrast, when the predominant mode of coordination between AP and SND was phase-locking, virtually no cardiac-related activity remained in AS_SND/AP. Partialization of Coh_SND-SND with AP reduced the peak coherence within the cardiac-related band in both modes of coordination, but to a much greater extent during phase-locking. After baroreceptor denervation, Coh_SND-SND at the cardiac frequency remained significant, although a clear peak above background coherence was no longer apparent. These results are consistent with a model in which the central circuits controlling different sympathetic nerves share baroreceptor inputs and, in addition, are physically interconnected. The baroreceptor-sympathetic relationship contains both linear and nonlinear components, the former reflected by phase-locking and the latter by phase walk. The residual power in AS_SND/AP during phase walk can be attributed to the nonlinear relationship, and the residual peak in partialized nerve-to-nerve coherence (Coh_SND-SND/AP) arises largely from nonlinearities that are common to the two nerves. During both phase walk and phase-locking, in addition to common nonlinear AP-SND relationships, coupling of the central circuits generating the nerve activities may contribute to Coh_SND-SND/AP, because significant Coh_SND-SND was still observed following baroreceptor denervation.
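For reference, the partialisation used in these analyses has a standard spectral form. Writing S_xy(f) for the (cross-)spectrum between signals x and y, with x and y the discharges of two sympathetic nerves and z the arterial pulse, the textbook expressions are as follows (this is the general formulation, not necessarily the authors' exact implementation):

```latex
% Partial autospectrum of x after removing linear contributions of z:
S_{xx/z}(f) = S_{xx}(f) - \frac{\lvert S_{xz}(f) \rvert^{2}}{S_{zz}(f)}

% Partial coherence between x and y given z:
\mathrm{Coh}_{xy/z}(f) =
  \frac{\bigl\lvert S_{xy}(f) - S_{xz}(f)\, S_{zz}^{-1}(f)\, S_{zy}(f) \bigr\rvert^{2}}
       {S_{xx/z}(f)\, S_{yy/z}(f)}
```

Residual cardiac-related power in S_{xx/z}, or residual coherence in Coh_{xy/z}, therefore reflects structure that is not linearly predictable from the arterial pulse.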


Perception
10.1068/p3316
2003
Vol 32 (8)
pp. 921-936
Author(s):
Maxine V McCotter
Timothy R Jordan

We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance-inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour-inverted (CI), LI, and colour-and-luminance-inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was poorer for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and its influence on perception of incongruent auditory speech, was poorer for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), suggesting that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.
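As an illustration only (not the authors' stimulus-generation procedure), the four display types can be approximated in CIELAB space, where the luminance (L*) and chromatic (a*, b*) channels are separable. The placeholder image and the exact channel operations below are assumptions.

```python
# Sketch: grey-scale (GS), luminance-inverted (LI), colour-inverted (CI), and
# colour-and-luminance-inverted (CLI) versions of an image via CIELAB.
import numpy as np
from skimage import color

rgb = np.random.default_rng(0).random((64, 64, 3))  # placeholder for a face image
lab = color.rgb2lab(rgb)

gs = color.rgb2gray(rgb)                 # GS: luminance only
li = lab.copy()
li[..., 0] = 100 - li[..., 0]            # LI: invert L*, keep a*/b*
ci = lab.copy()
ci[..., 1:] = -ci[..., 1:]               # CI: invert a*/b*, keep L*
cli = lab.copy()
cli[..., 0] = 100 - cli[..., 0]
cli[..., 1:] = -cli[..., 1:]             # CLI: invert both

li_rgb = np.clip(color.lab2rgb(li), 0, 1)
ci_rgb = np.clip(color.lab2rgb(ci), 0, 1)
cli_rgb = np.clip(color.lab2rgb(cli), 0, 1)
```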


This paper reviews progress in understanding the psychology of lipreading and audio-visual speech perception. It considers four questions. What distinguishes better from poorer lipreaders? What are the effects of introducing a delay between the acoustical and optical speech signals? What have attempts to produce computer animations of talking faces contributed to our understanding of the visual cues that distinguish consonants and vowels? Finally, how should the process of audio-visual integration in speech perception be described; that is, how are the sights and sounds of talking faces represented at their conflux?


eLife
2018
Vol 7
Author(s):
Muge Ozker
Daniel Yoshor
Michael S Beauchamp

Human faces contain multiple sources of information. During speech perception, visual information from the talker’s mouth is integrated with auditory information from the talker's voice. By directly recording neural responses from small populations of neurons in patients implanted with subdural electrodes, we found enhanced visual cortex responses to speech when auditory speech was absent (rendering visual speech especially relevant). Receptive field mapping demonstrated that this enhancement was specific to regions of the visual cortex with retinotopic representations of the mouth of the talker. Connectivity between frontal cortex and other brain regions was measured with trial-by-trial power correlations. Strong connectivity was observed between frontal cortex and mouth regions of visual cortex; connectivity was weaker between frontal cortex and non-mouth regions of visual cortex or auditory cortex. These results suggest that top-down selection of visual information from the talker’s mouth by frontal cortex plays an important role in audiovisual speech perception.
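Trial-by-trial power correlation, the connectivity measure referred to above, can be sketched as follows: band-limited power is estimated per trial for each electrode and then correlated across trials between electrode pairs. The frequency band, filter settings, and placeholder data below are assumptions rather than the paper's parameters.

```python
# Sketch: correlate per-trial high-frequency power between two electrodes.
# Placeholder random data stand in for intracranial recordings.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert
from scipy.stats import pearsonr

def trial_power(trials, fs, band=(70, 150)):
    """trials: (n_trials, n_samples) array; returns mean band power per trial."""
    b, a = butter(4, band, btype="bandpass", fs=fs)
    analytic = hilbert(filtfilt(b, a, trials, axis=-1), axis=-1)
    return (np.abs(analytic) ** 2).mean(axis=-1)

rng = np.random.default_rng(2)
frontal = rng.standard_normal((100, 1000))   # 100 trials x 1 s at 1 kHz
visual = rng.standard_normal((100, 1000))
r, p = pearsonr(trial_power(frontal, fs=1000), trial_power(visual, fs=1000))
print(f"trial-by-trial power correlation: r = {r:.2f}, p = {p:.2f}")
```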

