Cultural and linguistic factors in audiovisual speech processing: The McGurk effect in Chinese subjects

1997 ◽ Vol 59 (1) ◽ pp. 73-80 ◽ Author(s): Kaoru Sekiyama
2019 ◽ Author(s): Violet Aurora Brown, Julia Feld Strand

The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept (McGurk & MacDonald, 1976). McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration (e.g., Alsius et al., 2005), but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing—susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal (Van Engen et al., 2017), and distinct cortical regions are recruited when processing congruent versus incongruent speech (Erickson et al., 2014). In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent compared to incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed (Erickson et al., 2014; Van Engen, Xie, & Chandrasekaran, 2017), they do not appear to differ in terms of processing time or effort. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.
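
A minimal sketch of how such a within-participant response-time comparison could be run, assuming trial-averaged RTs per participant and per condition; the data below are simulated for illustration only and do not reflect the authors' actual analysis code.

```python
# Hypothetical sketch: comparing mean response times across audiovisual
# conditions, in the spirit of Experiments 1-2 (not the authors' code).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_participants = 30

# Simulated per-participant mean RTs (ms), for illustration only.
rt_congruent   = rng.normal(650, 60, n_participants)
rt_incongruent = rt_congruent + rng.normal(0, 25, n_participants)   # ~no added cost
rt_fusion      = rng.normal(640, 60, n_participants)
rt_nonfusion   = rt_fusion + rng.normal(40, 25, n_participants)     # slower responses

# Paired comparisons within participants.
t_cong, p_cong = stats.ttest_rel(rt_congruent, rt_incongruent)
t_fus,  p_fus  = stats.ttest_rel(rt_fusion, rt_nonfusion)

print(f"congruent vs. incongruent: t = {t_cong:.2f}, p = {p_cong:.3f}")
print(f"fusion vs. non-fusion:     t = {t_fus:.2f}, p = {p_fus:.3f}")
```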


2006 ◽ Vol 98 (1) ◽ pp. 66-73 ◽ Author(s): Roy H. Hamilton, Jeffrey T. Shenton, H. Branch Coslett

2015 ◽ Vol 19 (2) ◽ pp. 77-100 ◽ Author(s): Przemysław Tomalski

Abstract Apart from their remarkable phonological skills, young infants show, before their first birthday, the ability to match the mouth articulation they see with the speech sounds they hear. They are able to detect audiovisual conflict in speech and to selectively attend to the articulating mouth depending on audiovisual congruency. Early audiovisual speech processing is an important aspect of language development, related not only to phonological knowledge but also to language production in subsequent years. This article reviews recent experimental work delineating the complex developmental trajectory of audiovisual mismatch detection. The central issue is the role of age-related changes in visual scanning of audiovisual speech and the corresponding changes in neural signatures of audiovisual speech processing in the second half of the first year of life. This phenomenon is discussed in the context of recent theories of perceptual development and existing data on the neural organisation of the infant ‘social brain’.


2020 ◽ Vol 10 (1) ◽ Author(s): Raphaël Thézé, Mehdi Ali Gadiri, Louis Albert, Antoine Provost, Anne-Lise Giraud, ...

Abstract Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has been widely applied to the exploration of audiovisual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability in the quality of the stimuli usually employed prevents comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized with computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e., /v/) with a bilabial occlusive phoneme (i.e., /b/), an audiovisual mismatch known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and the audiovisual lag. Overall, the effect was observed in 40% of trials; the proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech while affording greater control over stimulus timing and content.
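
A minimal sketch of how the illusion rate could be tallied per noise level and audiovisual lag from a trial-level response log; the column names and example values are assumptions for illustration, not the authors' materials or analysis.

```python
# Hypothetical sketch: tallying the /v/-illusion rate per background-noise
# level and audiovisual lag from trial-level responses (columns are assumed).
import pandas as pd

# Example trial log: one row per incongruent (viseme /v/, phoneme /b/) trial.
trials = pd.DataFrame({
    "noise_db":   [0, 0, 6, 6, 12, 12],      # assumed background-noise levels
    "lag_ms":     [0, 80, 0, 80, 0, 80],     # assumed audiovisual lags
    "reported_v": [0, 1, 1, 0, 1, 1],        # 1 = illusory /v/ percept reported
})

# Illusion rate per condition: mean of the binary illusion indicator.
illusion_rate = (
    trials.groupby(["noise_db", "lag_ms"])["reported_v"]
          .mean()
          .rename("illusion_rate")
          .reset_index()
)
print(illusion_rate)
```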


2014 ◽ Vol 5 ◽ Author(s): Laura C. Erickson, Brandon A. Zielinski, Jennifer E. V. Zielinski, Guoying Liu, Peter E. Turkeltaub, ...

2019 ◽ Vol 63 (4) ◽ pp. 856-876 ◽ Author(s): Yueqiao Han, Martijn Goudbeek, Maria Mos, Marc Swerts

Speech perception is a multisensory process: what we hear can be affected by what we see. For instance, the McGurk effect occurs when auditory speech is presented in synchrony with discrepant visual information. Most studies have targeted the McGurk effect at the segmental level of speech (mainly consonant perception), which tends to be visually salient and lip-reading based. The present study extends this body of literature to the suprasegmental level by investigating a McGurk effect for the identification of tones in Mandarin Chinese. Previous studies have shown that visual information plays a role in Chinese tone perception and that the different tones correlate with distinct movements of the head and neck. We constructed congruent and incongruent auditory-visual materials (10 syllables with 16 tone combinations each) and presented them to native speakers of Mandarin Chinese and to speakers of tone-naïve languages. In line with our previous work, we found that tone identification varies across individual tones, with tone 3 (the low-dipping tone) being the easiest to identify and tone 4 (the high-falling tone) the most difficult. Both groups of participants relied mainly on the auditory input rather than the visual input, and this auditory reliance was even stronger for the Chinese participants. The results showed no evidence of auditory-visual integration among native participants, whereas visual information was helpful for tone-naïve participants. Even for this group, however, visual information only marginally increased accuracy in the tone identification task, and this increase depended on the tone in question.
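
A minimal sketch of how the 10-syllable by 16-tone-combination stimulus grid described above could be enumerated (4 auditory tones crossed with 4 visual tones per syllable); the syllable labels are placeholders, not the authors' actual items.

```python
# Hypothetical sketch: enumerating a 10 x 16 audiovisual stimulus grid
# (4 auditory tones x 4 visual tones per syllable), as described above.
from itertools import product

syllables = [f"syl{i}" for i in range(1, 11)]   # placeholder syllable labels
tones = [1, 2, 3, 4]                            # the four Mandarin tones

stimuli = [
    {
        "syllable": syl,
        "auditory_tone": a_tone,
        "visual_tone": v_tone,
        "congruent": a_tone == v_tone,
    }
    for syl, (a_tone, v_tone) in product(syllables, product(tones, tones))
]

assert len(stimuli) == 10 * 16                  # 160 audiovisual items
print(sum(s["congruent"] for s in stimuli))     # 40 congruent, 120 incongruent
```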


2001 ◽ Vol 18 (1) ◽ pp. 9-21 ◽ Author(s): Tsuhan Chen
