Automatic audiovisual integration in speech perception

2005 ◽  
Vol 167 (1) ◽  
pp. 66-75 ◽  
Author(s):  
Maurizio Gentilucci ◽  
Luigi Cattaneo

2012 ◽  
Author(s):  
Joseph D. W. Stephens ◽  
Julian L. Scrivens ◽  
Amy A. Overman

1995 ◽  
Vol 48 (2) ◽  
pp. 320-333 ◽  
Author(s):  
Eugen Diesch

If a place-of-articulation contrast is created between the auditory and the visual component syllables of videotaped speech, the syllable that listeners report hearing frequently differs phonetically from the auditory component. These "McGurk effects", as they have come to be called, show that speech perception may involve some kind of intermodal process. There are two classes of these phenomena: fusions and combinations. Perception of the syllable /da/ when auditory /ba/ and visual /ga/ are presented provides a clear example of the former, and perception of the string /bga/ after presentation of auditory /ga/ and visual /ba/ an unambiguous instance of the latter. Besides perceptual fusions and combinations, listeners sometimes report hearing the visually presented component syllable, which likewise shows an influence of vision on audition. It is argued that these "visual" responses arise from basically the same underlying processes that yield fusions and combinations, respectively. In the present study, the visual component of audiovisually incongruent CV syllables was presented in either the left or the right visual hemifield. Audiovisual fusion responses showed a left-hemifield advantage, and audiovisual combination responses a right-hemifield advantage. This finding suggests that the process of audiovisual integration differs between audiovisual fusions and combinations and, furthermore, that the two cerebral hemispheres contribute differentially to the two classes of response.


2012 ◽  
Vol 25 (0) ◽  
pp. 105 ◽  
Author(s):  
Tobias Søren Andersen

Seeing the talking face can influence the phoneme perceived from the voice. This facilitates speech perception in the natural case where the face and voice are congruent, and can cause the McGurk illusion when they are not. The classical example of the McGurk illusion is acoustic /aba/ being perceived as /ada/ when dubbed onto a face articulating /aga/. In order to fully understand the underlying process of integrating information across the senses, we need a computational account with predictive power. The Fuzzy Logical Model of Perception is one such computational account of audiovisual integration in speech perception. Here we describe alternative accounts in which integration is based on an early, continuous internal representation into which the phonetic classes fall. We show that these alternative accounts can provide just as good a fit when corrected for the number of free parameters. We also show, using cross-validation, that they have greater, but not great, predictive power. Finally, we show that introducing a regularization term can remedy the lack of predictive power. With regularization, models based on continuous representations have the highest predictive power.
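The Fuzzy Logical Model of Perception mentioned above combines evidence from each modality multiplicatively: each modality assigns every phoneme category a fuzzy truth value (support) in [0, 1], and the response probability for category *i* is a_i·v_i normalized over all categories. The sketch below illustrates this integration rule on made-up support values (the numbers are illustrative, not fitted data from the study):

```python
def flmp_integrate(auditory_support, visual_support):
    """Fuzzy Logical Model of Perception integration rule.

    Both arguments map phoneme labels to support values in [0, 1].
    Returns response probabilities P(i | A, V) = a_i * v_i / sum_j a_j * v_j.
    """
    joint = {k: auditory_support[k] * visual_support[k]
             for k in auditory_support}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

# Illustrative McGurk-style case: the voice mostly supports /aba/,
# the face mostly supports /aga/, and /ada/ receives moderate support
# from both modalities, so the fused percept /ada/ wins after
# multiplicative integration.
auditory = {"aba": 0.8, "ada": 0.6, "aga": 0.1}
visual = {"aba": 0.1, "ada": 0.6, "aga": 0.8}
probs = flmp_integrate(auditory, visual)
print(max(probs, key=probs.get))  # prints "ada"
```

The multiplicative rule is what makes the FLMP a "late integration" account: each modality is categorized independently before the supports are combined. The continuous-representation alternatives discussed in the abstract instead integrate before categorization.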


2007 ◽  
Vol 60 (10) ◽  
pp. 1446-1456 ◽  
Author(s):  
Stefan R. Schweinberger ◽  
David Robertson ◽  
Jürgen M. Kaufmann

While audiovisual integration is well known in speech perception, faces and speech are also informative with respect to speaker recognition. To date, audiovisual integration in the recognition of familiar people has never been demonstrated. Here we show systematic benefits and costs for the recognition of familiar voices when these are combined with time-synchronized articulating faces, of corresponding or noncorresponding speaker identity, respectively. While these effects were strong for familiar voices, they were smaller or nonsignificant for unfamiliar voices, suggesting that the effects depend on the previous creation of a multimodal representation of a person's identity. Moreover, the effects were reduced or eliminated when voices were combined with the same faces presented as static pictures, demonstrating that the effects do not simply reflect the use of facial identity as a “cue” for voice recognition. This is the first direct evidence for audiovisual integration in person recognition.

