Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit

2021 ◽  
Vol 11 (1) ◽  
pp. 49
Author(s):  
Kaylah Lalonde ◽  
Lynne A. Werner

The natural environments in which infants and children learn speech and language are noisy and multimodal. Adults rely on the multimodal nature of speech to compensate for noisy environments during speech communication. Multiple mechanisms underlie mature audiovisual benefit to speech perception, including reduced uncertainty as to when auditory speech will occur, use of correlations between the amplitude envelope of auditory and visual signals in fluent speech, and use of visual phonetic knowledge for lexical access. This paper reviews evidence regarding infants’ and children’s use of temporal and phonetic mechanisms in audiovisual speech perception benefit. The ability to use temporal cues for audiovisual speech perception benefit emerges in infancy. Although infants are sensitive to the correspondence between auditory and visual phonetic cues, the ability to use this correspondence for audiovisual benefit may not emerge until age four. A more cohesive account of the development of audiovisual speech perception may follow from a more thorough understanding of the development of sensitivity to and use of various temporal and phonetic cues.
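One of the mechanisms listed above, the correlation between the acoustic amplitude envelope and the visible articulation signal, can be illustrated with a short sketch. The Python snippet below is a minimal illustration only, not the analysis used in any of the studies reviewed here: it assumes a hypothetical audio waveform and a hypothetical per-frame mouth-opening signal, extracts the amplitude envelope with a Hilbert transform, and correlates the two.

```python
# Minimal sketch of the audio-visual envelope-correlation idea; signal names,
# sampling rates, and the synthetic data are illustrative only.
import numpy as np
from scipy.signal import hilbert, resample

def envelope_correlation(audio, mouth_area):
    """Correlate the audio amplitude envelope with a per-video-frame mouth signal."""
    envelope = np.abs(hilbert(audio))                # amplitude envelope via analytic signal
    envelope = resample(envelope, len(mouth_area))   # one envelope value per video frame
    return np.corrcoef(envelope, mouth_area)[0, 1]   # Pearson correlation

# Synthetic example: a noise carrier modulated at a 3 Hz "syllable" rate,
# paired with a mouth-opening signal sharing the same rhythm.
audio_sr, video_fps, seconds = 16000, 30, 4
t_audio = np.linspace(0, seconds, audio_sr * seconds, endpoint=False)
t_video = np.linspace(0, seconds, video_fps * seconds, endpoint=False)
modulation = 0.5 * (1 + np.sin(2 * np.pi * 3 * t_audio))
audio = modulation * np.random.randn(len(t_audio))
mouth_area = 0.5 * (1 + np.sin(2 * np.pi * 3 * t_video))
print(f"envelope-mouth correlation: {envelope_correlation(audio, mouth_area):.2f}")
```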

2019 ◽  
Author(s):  
Jonathan E. Peelle

Understanding the neural systems supporting speech perception can shed light on the representations, processes, and variability in human communication. In the case of speech and language disorders, uncovering the neurological underpinnings can sometimes lead to surgical or medical treatments. Even in the case of healthy listeners, better understanding the interactions among hierarchical brain systems during speech processing can deepen our understanding of perceptual and language processes, and how these might be affected during development, with hearing loss, or in background noise. Current neurobiological frameworks largely agree on the importance of bilateral temporal cortex for processing auditory speech, with the addition of left frontal cortex for more complex linguistic structures (such as sentences). Although visual cortex is clearly important for audiovisual speech processing, there is continued debate about where and how auditory and visual signals are integrated. Studies offer evidence supporting multisensory roles for posterior superior temporal sulcus, auditory cortex, and motor cortex. Rather than a single integration mechanism, it may be that visual and auditory inputs are combined in different ways depending on the type of information being processed. Importantly, core speech regions are not always sufficient for successfully understanding spoken language. Increased linguistic complexity or acoustic challenge forces listeners to recruit additional neural systems. In many cases, compensatory activity is seen in executive and attention systems, such as the cingulo-opercular or frontoparietal networks. These patterns of increased activity appear to depend on the auditory and cognitive abilities of individual listeners, indicating a systems-level balance between neural systems that dynamically adjusts to the acoustic properties of the speech and current task demands. Speech perception is thus a shining example of flexible neural processing and behavioral stability.


Perception ◽  
10.1068/p3316 ◽  
2003 ◽  
Vol 32 (8) ◽  
pp. 921-936 ◽  
Author(s):  
Maxine V McCotter ◽  
Timothy R Jordan

We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and its influence on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.
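The display types named in this abstract (GS, LI, CI, CLI) can be approximated with standard image operations. The sketch below, in Python with NumPy and Pillow, is only a rough illustration under assumed transforms (grey-scale negation for LI, full RGB negation for CLI, a hue rotation for CI); the stimulus-preparation pipeline actually used in the study is not described here, and "face.png" is a placeholder path.

```python
# Rough approximations of the face display types named above; the exact
# transforms used to create the published stimuli are assumptions here.
import numpy as np
from PIL import Image

img = Image.open("face.png").convert("RGB")    # placeholder stimulus path
rgb = np.asarray(img, dtype=np.uint8)

gs = np.asarray(img.convert("L"))              # GS: grey-scale
li = 255 - gs                                  # LI: luminance-inverted (grey-scale negative)
cli = 255 - rgb                                # CLI: colour- and luminance-inverted (RGB negative)

# CI: colour-inverted with luminance roughly preserved, approximated here by
# rotating hue 180 degrees in HSV space.
hsv = np.asarray(img.convert("HSV")).copy()
hsv[..., 0] = (hsv[..., 0].astype(int) + 128) % 256
ci = Image.fromarray(hsv, mode="HSV").convert("RGB")

for name, arr in [("GS", gs), ("LI", li), ("CLI", cli)]:
    Image.fromarray(arr).save(f"face_{name}.png")
ci.save("face_CI.png")
```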


2020 ◽  
Author(s):  
Anne-Marie Muller ◽  
Tyler C. Dalal ◽  
Ryan A Stevenson

Multisensory integration, the process by which sensory information from different sensory modalities is bound together, is hypothesized to contribute to perceptual symptomatology in schizophrenia, including hallucinations and aberrant speech perception. Differences in multisensory integration and temporal processing, an important component of multisensory integration, have been consistently found among individuals with schizophrenia. Evidence is emerging that these differences extend across the schizophrenia spectrum, including individuals in the general population with higher levels of schizotypal traits. In the current study, we measured (1) multisensory integration using an audiovisual speech-in-noise task and the McGurk task. Using the speech-in-noise task, we also assessed (2) susceptibility to distracting auditory speech, to test the hypothesis that increased perception of distracting speech that is subsequently bound with mismatching visual speech contributes to hallucination-like experiences. As a measure of (3) temporal processing, we used the ternary synchrony judgment task. We measured schizotypal traits using the Schizotypal Personality Questionnaire (SPQ), hypothesizing that higher levels of schizotypal traits, specifically on the Unusual Perceptual Experiences and Odd Speech subscales, would be associated with (1) decreased multisensory integration, (2) increased susceptibility to distracting auditory speech, and (3) less precise temporal processing. Surprisingly, neither subscale was associated with any of the measures. These results suggest that these perceptual differences may not be present across the schizophrenia spectrum.
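The association tests implied by these hypotheses can be sketched generically. The Python snippet below is a hypothetical example, with made-up scores and measures, of correlating an SPQ subscale against task-derived indices such as McGurk susceptibility or temporal-binding-window width; it is not the analysis reported in the study.

```python
# Hypothetical sketch of correlating an SPQ subscale with task-derived
# measures; all data and variable names are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
unusual_perceptual = rng.integers(0, 10, n)    # SPQ Unusual Perceptual Experiences score
mcgurk_rate = rng.uniform(0, 1, n)             # proportion of McGurk fusion responses
tbw_width_ms = rng.uniform(100, 500, n)        # temporal binding window width (ms)

for label, measure in [("McGurk rate", mcgurk_rate), ("TBW width", tbw_width_ms)]:
    rho, p = stats.spearmanr(unusual_perceptual, measure)
    print(f"{label}: rho = {rho:+.2f}, p = {p:.3f}")
```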


2020 ◽  
Vol 63 (7) ◽  
pp. 2245-2254 ◽  
Author(s):  
Jianrong Wang ◽  
Yumeng Zhu ◽  
Yu Chen ◽  
Abdilbar Mamat ◽  
Mei Yu ◽  
...  

Purpose The primary purpose of this study was to explore the audiovisual speech perception strategies adopted by normal-hearing and deaf people in processing familiar and unfamiliar languages. Our primary hypothesis was that they would adopt different perception strategies due to different sensory experiences at an early age, limitations of the physical device, the developmental gap in language, and other factors. Method Thirty normal-hearing adults and 33 prelingually deaf adults participated in the study. They were asked to perform judgment and listening tasks while watching videos of a Uygur–Mandarin bilingual speaker in a familiar language (Standard Chinese) or an unfamiliar language (Modern Uygur) while their eye movements were recorded by eye-tracking technology. Results Task had a slight influence on the distribution of selective attention, whereas participant group and language had significant influences. Specifically, the normal-hearing and the deaf participants mainly gazed at the speaker's eyes and mouth, respectively; moreover, while the normal-hearing participants gazed longer at the speaker's mouth when confronted with the unfamiliar language (Modern Uygur), the deaf participants did not change their attention allocation pattern when perceiving the two languages. Conclusions Normal-hearing and deaf adults adopt different audiovisual speech perception strategies: normal-hearing adults mainly look at the eyes, and deaf adults mainly look at the mouth. Additionally, language and task can also modulate the speech perception strategy.
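The gaze measure at the heart of these results is essentially the proportion of looking time spent in each area of interest (AOI), such as the eyes or the mouth. A minimal sketch follows, using hypothetical fixation records rather than real eye-tracking output.

```python
# Minimal sketch of an AOI dwell-time proportion; fixation records and AOI
# labels are hypothetical stand-ins for eye-tracker output.
from collections import defaultdict

fixations = [
    {"aoi": "eyes", "duration_ms": 240},
    {"aoi": "mouth", "duration_ms": 410},
    {"aoi": "mouth", "duration_ms": 180},
    {"aoi": "other", "duration_ms": 90},
]

totals = defaultdict(int)
for fix in fixations:
    totals[fix["aoi"]] += fix["duration_ms"]

grand_total = sum(totals.values())
proportions = {aoi: dur / grand_total for aoi, dur in totals.items()}
print(proportions)   # approximately {'eyes': 0.26, 'mouth': 0.64, 'other': 0.10}
```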


2007 ◽  
Vol 11 (4) ◽  
pp. 233-241 ◽  
Author(s):  
Nancy Tye-Murray ◽  
Mitchell Sommers ◽  
Brent Spehar

2019 ◽  
Vol 128 ◽  
pp. 93-100 ◽  
Author(s):  
Masahiro Imafuku ◽  
Masahiko Kawai ◽  
Fusako Niwa ◽  
Yuta Shinya ◽  
Masako Myowa

2020 ◽  
Author(s):  
Jonathan E Peelle ◽  
Brent Spehar ◽  
Michael S Jones ◽  
Sarah McConkey ◽  
Joel Myerson ◽  
...  

In everyday conversation, we usually process the talker's face as well as the sound of their voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here we used fMRI to monitor brain activity while adults (n = 60) were presented with visual-only, auditory-only, and audiovisual words. As expected, audiovisual speech perception recruited both auditory and visual cortex, with a trend towards increased recruitment of premotor cortex in more difficult conditions (for example, in substantial background noise). We then investigated neural connectivity using psychophysiological interaction (PPI) analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. Taken together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech.
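The psychophysiological interaction (PPI) analysis mentioned here models each target region's signal as a function of the seed time series, the task regressor, and their interaction. The toy sketch below shows only that core regression; the variable names and synthetic data are assumptions, and real pipelines additionally handle HRF convolution, deconvolution, and nuisance regressors.

```python
# Toy illustration of the PPI regression: target ~ seed + task + seed*task.
# Synthetic, already-preprocessed time series stand in for real fMRI data.
import numpy as np

n_scans = 200
rng = np.random.default_rng(1)

seed = rng.standard_normal(n_scans)                  # seed region (e.g., auditory cortex) time series
task = np.tile([0] * 10 + [1] * 10, n_scans // 20)   # audiovisual-block regressor (0/1)
ppi = seed * (task - task.mean())                    # interaction term with mean-centred task

target = 0.5 * seed + 0.8 * ppi + rng.standard_normal(n_scans)   # synthetic target region

X = np.column_stack([np.ones(n_scans), seed, task, ppi])
betas, *_ = np.linalg.lstsq(X, target, rcond=None)
print(dict(zip(["intercept", "seed", "task", "ppi"], np.round(betas, 2))))
```

A reliably nonzero PPI beta indicates that seed-target coupling changes with task condition, which is the sense in which connectivity can be "stronger in audiovisual conditions than in unimodal conditions."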

