Own-race faces promote integrated audiovisual speech information

2021 ◽  
pp. 174702182110444
Author(s):  
Yuta Ujiie ◽  
Kohske Takahashi

The other-race effect indicates a perceptual advantage when processing own-race faces. This effect has been demonstrated in individuals’ recognition of facial identity and emotional expressions. However, it remains unclear whether the other-race effect also exists in multisensory domains. We conducted two experiments to provide evidence for the other-race effect in facial speech recognition, using the McGurk effect. Experiment 1 tested this issue among East Asian adults, examining the magnitude of the McGurk effect with stimuli featuring speakers of two different races (own-race vs. other-race). We found that own-race faces induced a stronger McGurk effect than other-race faces. Experiment 2 indicated that the other-race effect was not simply due to different levels of attention being paid to the mouths of own- and other-race speakers. Our findings demonstrate that own-race faces enhance the weight of visual input during audiovisual speech perception, providing evidence of the own-race effect in the audiovisual interaction underlying speech perception in adults.

Author(s):  
Lawrence D. Rosenblum

Research on visual and audiovisual speech information has profoundly influenced the fields of psycholinguistics, perception psychology, and cognitive neuroscience. Visual speech findings have provided some of the most important human demonstrations of our new conception of the perceptual brain as supremely multimodal. This “multisensory revolution” has seen tremendous growth in research on how the senses integrate, cross-facilitate, and share their experience with one another. The ubiquity and apparent automaticity of multisensory speech have led many theorists to propose that the speech brain is agnostic with regard to sense modality: it might not know or care from which modality speech information comes. Instead, the speech function may act to extract supramodal informational patterns that are common in form across energy streams. Alternatively, other theorists have argued that any common information existing across the modalities is minimal and rudimentary, so that multisensory perception largely depends on the observer’s associative experience between the streams. From this perspective, the auditory stream is typically considered primary for the speech brain, with visual speech simply appended to its processing. If the utility of multisensory speech is a consequence of supramodal informational coherence, then cross-sensory “integration” may be primarily a consequence of the informational input itself. If true, then one would expect to see evidence for integration occurring early in the perceptual process, as well as in a largely complete and automatic/impenetrable manner. Alternatively, if multisensory speech perception is based on associative experience between the modal streams, then no constraints are dictated on how completely or automatically the senses integrate. There is behavioral and neurophysiological research supporting both perspectives. Much of this research is based on testing the well-known McGurk effect, in which audiovisual speech information is thought to integrate to the extent that visual information can affect what listeners report hearing. However, there is now good reason to believe that the McGurk effect is not a valid test of multisensory integration. For example, there are clear cases in which responses indicate that the effect fails, while other measures suggest that integration is actually occurring. By mistakenly conflating the McGurk effect with speech integration itself, researchers may draw incorrect conclusions about the completeness and automaticity of multisensory integration. Future research should use more sensitive behavioral and neurophysiological measures of cross-modal influence to examine these issues.


2011 ◽  
Vol 24 (1) ◽  
pp. 67-90 ◽  
Author(s):  
Riikka Möttönen ◽  
Kaisa Tiippana ◽  
Mikko Sams ◽  
Hanna Puharinen

Audiovisual speech perception has been considered to operate independently of sound location, since the McGurk effect (altered auditory speech perception caused by conflicting visual speech) has been shown to be unaffected by whether speech sounds are presented in the same location as a talking face or in a different one. Here we show that sound location effects arise with manipulation of spatial attention. Sounds were presented from loudspeakers in five locations: the centre (the location of the talking face) and 45°/90° to the left/right. Auditory spatial attention was focused on a location by presenting the majority (90%) of sounds from that location. In Experiment 1, the majority of sounds emanated from the centre, and the McGurk effect was enhanced there. In Experiment 2, the majority location was 90° to the left, and the McGurk effect was stronger on the left and at the centre than on the right. Under control conditions, when sounds were presented with equal probability from all locations, the McGurk effect tended to be stronger for sounds emanating from the centre, but this tendency was not reliable. Additionally, reaction times were shortest for congruent audiovisual stimuli, independent of location. Our main finding is that sound location can modulate audiovisual speech perception, and that spatial attention plays a role in this modulation.


2018 ◽  
Vol 31 (1-2) ◽  
pp. 7-18 ◽  
Author(s):  
John MacDonald

In 1976 Harry McGurk and I published a paper in Nature, entitled ‘Hearing Lips and Seeing Voices’. The paper described a new audio–visual illusion we had discovered, which showed that the perception of auditorily presented speech could be influenced by the simultaneous presentation of incongruent visual speech. This hitherto unknown effect has since had a profound impact on audiovisual speech perception research. The phenomenon has come to be known as the ‘McGurk effect’, and the original paper has been cited in excess of 4800 times. In this paper I describe the background to the discovery of the effect, the rationale for the generation of the initial stimuli, the construction of the exemplars used, and the serendipitous nature of the finding. The paper will also cover the reaction (and non-reaction) to the Nature publication and the growth of research on, and utilizing, the ‘McGurk effect’, and will end with some reflections on the significance of the finding.


Author(s):  
Yuta Ujiie ◽  
So Kanazawa ◽  
Masami K. Yamaguchi

This study investigated the difference in the McGurk effect between own-race-face and other-race-face stimuli among Japanese infants from 5 to 9 months of age. The McGurk effect results from infants using information from a speaker’s face in audiovisual speech integration. We hypothesized that the McGurk effect varies with the speaker’s race because of the other-race effect, which indicates an advantage for own-race faces in our face processing system. Experiment 1 demonstrated the other-race effect on audiovisual speech integration: infants aged 5–6 months and 8–9 months were likely to perceive the McGurk effect when observing an own-race-face speaker, but not when observing an other-race-face speaker. Experiment 2 found the other-race effect on audiovisual speech integration regardless of irrelevant speech identity cues. Experiment 3 confirmed the infants’ ability to differentiate the two auditory syllables. These results showed that infants are likely to integrate a voice with an own-race face, but not with an other-race face. This implies a role for experience with own-race faces in the development of audiovisual speech integration. Our findings also contribute to the discussion of whether perceptual narrowing is a modality-general, pan-sensory process.


Perception ◽  
1997 ◽  
Vol 26 (1_suppl) ◽  
pp. 347-347
Author(s):  
M Sams

Persons with hearing loss use visual information from articulation to improve their speech perception. Even persons with normal hearing utilise visual information, especially when the signal-to-noise ratio is poor. A dramatic demonstration of the role of vision in speech perception is the audiovisual fusion called the ‘McGurk effect’. When the auditory syllable /pa/ is presented in synchrony with a face articulating the syllable /ka/, the subject usually perceives /ta/ or /ka/. The illusory perception is clearly auditory in nature. We recently studied this audiovisual fusion (acoustical /p/, visual /k/) for Finnish (1) syllables and (2) words. Only 3% of the subjects perceived the syllables according to the acoustical input; i.e., in 97% of the subjects the perception was influenced by the visual information. For words, the percentage of acoustical identifications was 10%. The results demonstrate a very strong influence of visual information about articulation in face-to-face speech perception. Word meaning and sentence context have a negligible influence on the fusion. We have also recorded neuromagnetic responses of the human cortex while subjects both heard and saw speech. Some subjects showed a distinct response to a ‘McGurk’ stimulus. The response was rather late, emerging about 200 ms after the onset of the auditory stimulus. We suggest that the perisylvian cortex, close to the source area for the auditory 100 ms response (M100), may be activated by the discordant stimuli. The behavioural and neuromagnetic results suggest a precognitive audiovisual speech integration occurring at a relatively early processing level.


2020 ◽  
Vol 10 (6) ◽  
pp. 328
Author(s):  
Melissa Randazzo ◽  
Ryan Priefer ◽  
Paul J. Smith ◽  
Amanda Nagler ◽  
Trey Avery ◽  
...  

The McGurk effect, an incongruent pairing of visual /ga/ with acoustic /ba/, creates the fusion illusion /da/ and is the cornerstone of research in audiovisual speech perception. Combination illusions occur when the input modalities are reversed (auditory /ga/, visual /ba/), yielding the percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex in the absence of changes to the acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of the visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions, fusion with the visual aspect of the stimulus changing and combination with the auditory aspect of the stimulus changing, as well as two unimodal auditory-only and visual-only conditions. Fusion and combination deviants exerted a similar influence in generating congruency predictions, with significant differences between standards and deviants in the N100 time window. The presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created; when the visual signal is held constant and the auditory signal changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.

