The phase of cortical oscillations determines the perceptual fate of visual cues in naturalistic audiovisual speech

2020 · Vol 6 (45) · pp. eabc6348
Author(s): Raphaël Thézé, Anne-Lise Giraud, Pierre Mégevand

When we see our interlocutor, our brain seamlessly extracts visual cues from their face and processes them along with the sound of their voice, making speech an intrinsically multimodal signal. Visual cues are especially important in noisy environments, when the auditory signal is less reliable. Neuronal oscillations might be involved in the cortical processing of audiovisual speech by selecting which sensory channel contributes more to perception. To test this, we designed computer-generated naturalistic audiovisual speech stimuli in which a single mismatched phoneme-viseme pair in a key word of each sentence created bistable perception. Neurophysiological recordings (high-density scalp and intracranial electroencephalography) revealed that the precise phase angle of theta-band oscillations in posterior temporal and occipital cortex of the right hemisphere determined whether the auditory or the visual speech cue drove perception. We demonstrate that the phase of cortical oscillations acts as an instrument for sensory selection in audiovisual speech processing.
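The central measure in this design is the phase angle of theta-band activity at the moment the ambiguous speech cue arrives. Below is a minimal sketch, assuming a single EEG channel, of how such a phase angle might be extracted with a band-pass filter and the Hilbert transform; the variable names (eeg, fs, onset_s) and filter settings are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch: extract the theta-band (4-8 Hz) phase angle of one EEG
# channel at a time point of interest, e.g. the onset of the mismatched
# phoneme-viseme pair. All data and parameters here are placeholders.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000                                   # sampling rate in Hz (assumed)
eeg = np.random.randn(10 * fs)              # placeholder single-channel trace

# Zero-phase band-pass filter in the theta band
b, a = butter(4, [4, 8], btype="bandpass", fs=fs)
theta = filtfilt(b, a, eeg)

# Instantaneous phase from the analytic signal
phase = np.angle(hilbert(theta))            # radians, in (-pi, pi]

onset_s = 5.0                               # time of the critical speech cue (s)
phase_at_onset = phase[int(onset_s * fs)]
print(f"theta phase at cue onset: {phase_at_onset:.2f} rad")
```

Binning trials by this phase angle and comparing the proportion of auditorily versus visually driven percepts across bins would be one way to relate phase to perceptual outcome.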

2020
Author(s): Aisling E. O’Sullivan, Michael J. Crosse, Giovanni M. Di Liberto, Alain de Cheveigné, Edmund C. Lalor

Abstract: Seeing a speaker’s face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker’s face provides temporal cues to auditory cortex, and articulatory information from the speaker’s mouth can aid the recognition of specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here we sought to provide insight into this question by examining EEG responses to natural audiovisual, audio-only, and visual-only speech in quiet and in noise. Specifically, we represented our speech stimuli in terms of their spectrograms and their phonetic features, and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis. The encoding of both spectrotemporal and phonetic features was more robust in audiovisual speech responses than would have been expected from the summation of the audio-only and visual-only speech responses, consistent with the literature on multisensory integration. Furthermore, the strength of this multisensory enhancement was more pronounced at the level of phonetic processing for speech in noise relative to speech in quiet, indicating that listeners rely more on articulatory details from visual speech in challenging listening conditions. These findings support the notion that the integration of audio and visual speech is a flexible, multistage process that adapts to optimize comprehension based on the current listening conditions. Significance Statement: During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and to vary flexibly depending on the listening conditions. Here we examine audiovisual integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how audiovisual integration adapts to degraded listening conditions. We find significant integration at both stages regardless of listening conditions, and when the speech is noisy, we find enhanced integration at the phonetic stage of processing. These findings provide support for the multistage integration framework and demonstrate its flexibility in terms of a greater reliance on visual articulatory information in challenging listening conditions.
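As a rough illustration of the analysis logic (not the authors' code), the sketch below uses canonical correlation analysis to index how strongly a stimulus feature representation is encoded in the EEG, and compares the audiovisual response against an additive combination of the unimodal responses. The arrays, shapes, and the simple additive comparison are simplified assumptions.

```python
# Minimal sketch: quantify stimulus-feature encoding in EEG with canonical
# correlation analysis (CCA), and compare audiovisual (AV) responses with the
# summed unimodal responses (A+V). All data here are simulated placeholders.
import numpy as np
from sklearn.cross_decomposition import CCA

n_times, n_feats, n_chans = 5000, 16, 64
stim = np.random.randn(n_times, n_feats)    # e.g. spectrogram or phonetic features
eeg_av = np.random.randn(n_times, n_chans)  # EEG recorded to AV speech
eeg_a = np.random.randn(n_times, n_chans)   # EEG recorded to audio-only speech
eeg_v = np.random.randn(n_times, n_chans)   # EEG recorded to visual-only speech

def encoding_strength(stim, eeg, n_components=3):
    """First canonical correlation between stimulus features and EEG."""
    cca = CCA(n_components=n_components)
    s_c, e_c = cca.fit_transform(stim, eeg)
    return np.corrcoef(s_c[:, 0], e_c[:, 0])[0, 1]

r_av = encoding_strength(stim, eeg_av)
r_sum = encoding_strength(stim, eeg_a + eeg_v)   # additive (A+V) model
print(f"AV encoding: {r_av:.3f}  vs  A+V encoding: {r_sum:.3f}")
# Multisensory enhancement would appear as r_av > r_sum.
```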


2020
Author(s): Jonathan E Peelle, Brent Spehar, Michael S Jones, Sarah McConkey, Joel Myerson, ...

In everyday conversation, we usually process the talker's face as well as the sound of their voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here we used fMRI to monitor brain activity while adults (n = 60) were presented with visual-only, auditory-only, and audiovisual words. As expected, audiovisual speech perception recruited both auditory and visual cortex, with a trend towards increased recruitment of premotor cortex in more difficult conditions (for example, in substantial background noise). We then investigated neural connectivity using psychophysiological interaction (PPI) analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, and extended to a wide network of regions in posterior temporal cortex and prefrontal cortex. Taken together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech.
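A PPI analysis asks whether the coupling between a seed region and a target region changes with task condition. The sketch below shows the basic regression on simulated time series; in practice the interaction term is usually formed at the neural level after deconvolution and re-convolved with the haemodynamic response, a step omitted here. All variables are hypothetical placeholders, not the study's fMRI data.

```python
# Minimal sketch of a psychophysiological interaction (PPI) regression: the
# target region's time course is modelled by the seed time course, the task
# regressor (e.g. audiovisual vs unimodal blocks), and their interaction.
# The interaction beta indexes condition-dependent connectivity.
import numpy as np
import statsmodels.api as sm

n_vols = 300
seed = np.random.randn(n_vols)                       # seed ROI time course (e.g. A1)
task = np.repeat([0, 1], n_vols // 2).astype(float)  # 0 = unimodal, 1 = audiovisual
ppi = seed * (task - task.mean())                    # interaction (mean-centred task)
target = np.random.randn(n_vols)                     # candidate target region

X = sm.add_constant(np.column_stack([seed, task, ppi]))
fit = sm.OLS(target, X).fit()
print("PPI beta:", fit.params[3], " p =", fit.pvalues[3])
```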


Author(s): Nada Chaari, Hatice Camgöz Akdağ, Islem Rekik

Abstract: The estimation of a connectional brain template (CBT) that integrates a population of brain networks while capturing shared and differential connectional patterns across individuals remains unexplored in gender fingerprinting. This paper presents the first study to estimate gender-specific CBTs using multi-view cortical morphological networks (CMNs) estimated from conventional T1-weighted magnetic resonance imaging (MRI). Specifically, each CMN view is derived from a specific cortical attribute (e.g. thickness), encoded in a network quantifying the dissimilarity in morphology between pairs of cortical brain regions. To this end, we propose the Multi-View Clustering and Fusion Network (MVCF-Net), a novel multi-view network fusion method that jointly identifies consistent and differential clusters of multi-view datasets in order to capture both similar and distinct connectional traits across samples. Our MVCF-Net method estimates representative and well-centered CBTs for the male and female populations independently, ultimately identifying their fingerprinting regions of interest (ROIs), in four main steps. First, we apply a multi-view network clustering model based on manifold optimization, which groups CMNs into shared and differential clusters while preserving their alignment across views. Second, for each view, we linearly fuse the CMNs belonging to each cluster, producing local CBTs. Third, for each cluster, we non-linearly integrate the local CBTs across views, producing a cluster-specific CBT. Finally, by linearly fusing the cluster-specific CBTs, we estimate the final CBT of the input population. MVCF-Net produced the most centered and representative CBTs for the male and female populations and identified the most discriminative ROIs marking gender differences. The two most gender-discriminative ROIs involved the lateral occipital cortex and pars opercularis in the left hemisphere and the middle temporal gyrus and lingual gyrus in the right hemisphere.
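To make the four fusion steps concrete, the toy sketch below walks through them on random multi-view networks, with k-means standing in for the manifold-based multi-view clustering and an element-wise median standing in for the non-linear cross-view integration; it illustrates the idea only and is not the MVCF-Net implementation.

```python
# Toy sketch of the four-step CBT estimation pipeline on simulated multi-view
# cortical morphological networks (CMNs). Clustering is stubbed with k-means;
# the median stands in for the non-linear cross-view integration.
import numpy as np
from sklearn.cluster import KMeans

n_subjects, n_views, n_rois, n_clusters = 40, 4, 35, 3
# cmn[s, v] is the n_rois x n_rois morphological dissimilarity network
cmn = np.abs(np.random.randn(n_subjects, n_views, n_rois, n_rois))

# Step 1 (stub): cluster subjects using their networks flattened across views
flat = cmn.reshape(n_subjects, -1)
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(flat)

cluster_cbts = []
for c in range(n_clusters):
    members = cmn[labels == c]                 # subjects assigned to this cluster
    # Step 2: linear fusion within each view -> local CBTs (views x ROIs x ROIs)
    local_cbts = members.mean(axis=0)
    # Step 3: non-linear integration across views -> cluster-specific CBT
    cluster_cbts.append(np.median(local_cbts, axis=0))

# Step 4: linear fusion of cluster-specific CBTs -> population CBT
population_cbt = np.mean(cluster_cbts, axis=0)
print(population_cbt.shape)                    # (35, 35)
```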


NeuroImage · 2019 · Vol 192 · pp. 76-87
Author(s): Zhenghan Qi, Michelle Han, Yunxin Wang, Carlo de los Angeles, Qi Liu, ...

Author(s): Lise Van der Haegen, Qing Cai

It is intriguing that the two halves of the human brain look so similar yet are in fact quite different at the anatomical level, and even more so at the functional level. In particular, the highly frequent co-occurrence of right-handedness and left-hemisphere dominance for language has led to an abundance of laterality research. This chapter discusses the most important recent findings on the laterality (i.e., left or right hemisphere) and degree of hemispheric specialization for speech production, auditory speech processing, and reading. Following a descriptive overview of these three core sub-processes of language, the chapter summarizes possible influences on the lateralization of each, including anatomical, evolutionary, genetic, developmental, and experiential factors, as well as handedness and impairment. It will become clear that language is a heterogeneous cognitive function shaped by a variety of underlying factors. Next, the often-underestimated role of the right hemisphere in language is discussed with respect to prosody and metaphor comprehension, as well as individual differences in the lateralization of healthy and language-impaired brains. Finally, recent insights into the relationship between lateralized language and non-language functions are discussed, highlighting the unique contribution of lateralization research to the growing knowledge of general human brain mechanisms.


2021 · pp. 1-25
Author(s): Tania S. ZAMUNER, Theresa RABIDEAU, Margarethe MCDONALD, H. Henny YEUNG

Abstract: This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between the apparent success of visual speech processing in young children in visual-looking tasks and the apparent difficulty of speech processing in older children on explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both the AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier for /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.
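The developmental claim rests on testing whether target looking exceeds chance within each modality and age group. A minimal sketch of such a test is given below, with simulated looking proportions and a one-sample t-test against the two-image chance level of 0.5; the study's actual statistical model may well differ.

```python
# Minimal sketch: test whether the proportion of target looking exceeds chance
# (0.5 for a two-image display) in one modality and age group. Values are
# simulated placeholders rather than the study's eye-tracking data.
import numpy as np
from scipy import stats

prop_target_looks = np.random.beta(6, 4, size=20)   # one proportion per child
t, p = stats.ttest_1samp(prop_target_looks, popmean=0.5, alternative="greater")
print(f"mean looking = {prop_target_looks.mean():.2f}, t = {t:.2f}, p = {p:.3f}")
```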


Author(s): Andrew Kirk, L.C. Ang

Abstract: A 64-year-old man presented with a three-day history of progressive Broca’s aphasia, followed within 3 weeks by exclusively right-sided myoclonus, rigidity, and dystonia. Within 4 weeks he was globally aphasic. He died within 7 weeks of onset. In the final week, rigidity and myoclonus became bilateral. CT and MRI were normal. SPECT showed diminished perfusion of the left hemisphere. EEG showed periodic discharges on the left. At autopsy, there were marked cortical spongiform change, neuronal loss, and gliosis throughout the left hemisphere and in the right occipital cortex. Elsewhere in the right hemisphere, spongiform change was minimal or absent. There was moderate spongiform change in the molecular layer of the cerebellar cortex, much more marked on the left. The clinical and pathological unilateral cerebral predominance extended to the ipsilateral cerebellum. Creutzfeldt-Jakob disease is an important consideration in patients with rapidly progressive unilateral cerebral signs associated with a movement disorder.


2011 · Vol 24 (1) · pp. 67-90
Author(s): Riikka Möttönen, Kaisa Tiippana, Mikko Sams, Hanna Puharinen

Abstract: Audiovisual speech perception has been considered to operate independently of sound location, since the McGurk effect (altered auditory speech perception caused by conflicting visual speech) has been shown to be unaffected by whether speech sounds are presented in the same location as a talking face or in a different one. Here we show that sound location effects arise with manipulation of spatial attention. Sounds were presented from loudspeakers in five locations: the centre (the location of the talking face) and 45°/90° to the left/right. Auditory spatial attention was focused on a location by presenting the majority (90%) of sounds from that location. In Experiment 1, the majority of sounds emanated from the centre, and the McGurk effect was enhanced there. In Experiment 2, the majority location was 90° to the left, causing the McGurk effect to be stronger on the left and centre than on the right. Under control conditions, when sounds were presented with equal probability from all locations, the McGurk effect tended to be stronger for sounds emanating from the centre, but this tendency was not reliable. Additionally, reaction times were shortest for the congruent audiovisual stimulus, independent of location. Our main finding is that sound location can modulate audiovisual speech perception, and that spatial attention plays a role in this modulation.


RELC Journal · 2020 · pp. 003368822096663
Author(s): Debra M Hardison, Martha C Pennington

This article reviews research findings on visual input in speech processing, in the form of facial cues and co-speech gestures, for second-language (L2) learners, and provides pedagogical implications for the teaching of listening and speaking. It traces the foundations of auditory–visual speech research and explores the role of a speaker’s facial cues in L2 perception training and of gestural cues in listening comprehension. There is a strong role for pedagogy in maximizing the salience of multimodal cues for L2 learners. Visible articulatory movements that precede the acoustic signal, and the preparation phase of a hand gesture that precedes the acoustic onset of a word, prime perceivers’ attention to upcoming information and facilitate processing; visible gestures that co-occur with speech aid ongoing processing and comprehension. L2 learners benefit from an awareness of these visual cues and from exposure to such input.


2019
Author(s): Violet Aurora Brown, Julia Feld Strand

The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept (McGurk & MacDonald, 1976). McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration (e.g., Alsius et al., 2005), but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing: susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal (Van Engen et al., 2017), and distinct cortical regions are recruited when processing congruent versus incongruent speech (Erickson et al., 2014). In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We found that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent and incongruent syllables, but that McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed (Erickson et al., 2014; Van Engen et al., 2017), they do not appear to differ in terms of processing time or effort. However, responses that result in McGurk fusions are made more quickly than those that result in non-fusions, though the attentional cost is comparable for the two response types.
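The key response-time comparison is between trials that yielded McGurk fusion percepts and those that did not. The sketch below illustrates that comparison with simulated RTs and a Welch t-test; the study's actual analysis (for example, mixed-effects models over trials and participants) is likely more elaborate.

```python
# Minimal sketch: compare response times for trials yielding McGurk fusion
# percepts versus non-fusion percepts with an independent-samples Welch t-test.
# RTs are simulated placeholders, not the reported data.
import numpy as np
from scipy import stats

rt_fusion = np.random.normal(650, 80, size=120)      # ms, fusion responses
rt_nonfusion = np.random.normal(700, 80, size=90)    # ms, non-fusion responses
t, p = stats.ttest_ind(rt_fusion, rt_nonfusion, equal_var=False)
print(f"fusion {rt_fusion.mean():.0f} ms vs non-fusion {rt_nonfusion.mean():.0f} ms, p = {p:.3f}")
```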

