Developmental change in children’s speech processing of auditory and visual cues: An eyetracking study

2021
pp. 1-25
Author(s):  
Tania S. ZAMUNER ◽  
Theresa RABIDEAU ◽  
Margarethe MCDONALD ◽  
H. Henny YEUNG

Abstract: This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between the apparent success of visual speech processing by young children in visual looking tasks and the apparent difficulty of speech processing shown by older children on explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both the AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier for /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.
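
The above-chance comparison described here amounts to testing target-looking proportions against the 0.5 level expected when two pictures are on screen. A minimal Python sketch with invented column names and values (not the authors' data or code):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical per-participant data: mean proportion of looking time to the
# target image in each speech modality (column names and values are invented).
data = pd.DataFrame({
    "participant": np.repeat([1, 2, 3, 4], 3),
    "modality": ["AV", "A-only", "V-only"] * 4,
    "prop_target_looking": [0.72, 0.68, 0.51, 0.80, 0.74, 0.55,
                            0.66, 0.61, 0.48, 0.77, 0.70, 0.53],
})

for modality, group in data.groupby("modality"):
    # Two pictures on screen, so chance-level target looking is 0.5.
    t, p = stats.ttest_1samp(group["prop_target_looking"], popmean=0.5)
    print(f"{modality}: mean = {group['prop_target_looking'].mean():.2f}, "
          f"t = {t:.2f}, p = {p:.3f}")
```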

2020
Vol 41 (4)
pp. 933-961
Author(s):  
Rebecca Holt ◽  
Laurence Bruggeman ◽  
Katherine Demuth

Abstract: Processing speech can be slow and effortful for children, especially in adverse listening conditions, such as the classroom. This can have detrimental effects on children’s academic achievement. We therefore asked whether primary school children’s speech processing could be made faster and less effortful via the presentation of visual speech cues (speaker’s facial movements), and whether any audio-visual benefit would be modulated by the presence of noise or by characteristics of individual participants. A phoneme monitoring task with concurrent pupillometry was used to measure 7- to 11-year-old children’s speech processing speed and effort, with and without visual cues, in both quiet and noise. Results demonstrated that visual cues to speech can facilitate children’s speech processing, but that these benefits may also be subject to variability according to children’s motivation. Children showed faster processing and reduced effort when visual cues were available, regardless of listening condition. However, examination of individual variability revealed that the reduction in effort was driven by the children who performed better on a measure of phoneme isolation (used to quantify how difficult they found the phoneme monitoring task).
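
Pupillometric effort measures of this kind are typically derived from baseline-corrected pupil traces, with larger dilation indexing greater listening effort. The sketch below illustrates one common approach with assumed sampling rate, window choices, and function names; it is not the authors' pipeline.

```python
import numpy as np

def baseline_corrected_pupil(trace, sample_rate_hz, baseline_s=0.5):
    """Subtract the mean pupil size in a pre-stimulus baseline window
    from the whole trial trace (trace: 1-D array, one trial)."""
    n_baseline = int(baseline_s * sample_rate_hz)
    baseline = np.nanmean(trace[:n_baseline])
    return trace - baseline

# Hypothetical trial: 3 s of pupil data sampled at 60 Hz.
rng = np.random.default_rng(0)
trial = 3.0 + 0.3 * rng.standard_normal(180)
corrected = baseline_corrected_pupil(trial, sample_rate_hz=60)

# Mean dilation after stimulus onset serves as a simple per-trial effort index.
effort_index = np.nanmean(corrected[30:])
print(f"Mean baseline-corrected dilation: {effort_index:.3f}")
```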


RELC Journal
2020
pp. 003368822096663
Author(s):  
Debra M Hardison ◽  
Martha C Pennington

This article reviews research findings on visual input in speech processing, in the form of facial cues and co-speech gestures, for second-language (L2) learners, and provides pedagogical implications for the teaching of listening and speaking. It traces the foundations of auditory–visual speech research and explores the role of a speaker’s facial cues in L2 perception training and of gestural cues in listening comprehension. Pedagogy has a strong role to play in maximizing the salience of multimodal cues for L2 learners. Visible articulatory gestures that precede the acoustic signal, and the preparation phase of a hand gesture that precedes the acoustic onset of a word, prime perceivers’ attention by signalling upcoming information and thus facilitate processing; visible gestures that co-occur with speech aid ongoing processing and comprehension. L2 learners benefit from an awareness of these visual cues and from exposure to such input.


2020
Author(s):  
Aisling E. O’Sullivan ◽  
Michael J. Crosse ◽  
Giovanni M. Di Liberto ◽  
Alain de Cheveigné ◽  
Edmund C. Lalor

Abstract: Seeing a speaker’s face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker’s face provides temporal cues to auditory cortex, and articulatory information from the speaker’s mouth can aid the recognition of specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here we sought to provide insight into this question by examining EEG responses to natural audiovisual, audio, and visual speech in quiet and in noise. Specifically, we represented our speech stimuli in terms of their spectrograms and their phonetic features, and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis. The encoding of both spectrotemporal and phonetic features was more robust in audiovisual speech responses than would be expected from the summation of the audio and visual speech responses, consistent with the literature on multisensory integration. Furthermore, the strength of this multisensory enhancement was more pronounced at the level of phonetic processing for speech in noise relative to speech in quiet, indicating that listeners rely more on articulatory details from visual speech in challenging listening conditions. These findings support the notion that the integration of audio and visual speech is a flexible, multistage process that adapts to optimize comprehension based on the current listening conditions.

Significance Statement: During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and to vary flexibly depending on the listening conditions. Here we examine audiovisual integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how audiovisual integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions, and when the speech is noisy, we find enhanced integration at the phonetic stage of processing. These findings provide support for the multistage integration framework and demonstrate its flexibility in terms of a greater reliance on visual articulatory information in challenging listening conditions.
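
As a rough illustration of the encoding analysis described here (canonical correlation between stimulus features and EEG, with the audiovisual response compared against the summed audio + visual responses), the Python sketch below uses toy data and scikit-learn's CCA. The real analysis would additionally use time-lagged stimulus features and cross-validation, so treat this as schematic only; all names and shapes are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n_samples, n_features, n_channels = 5000, 16, 64

# Toy stimulus features (e.g., spectrogram bands or phonetic features) and
# simulated multichannel EEG responses in each modality.
stim = rng.standard_normal((n_samples, n_features))
eeg_av = stim @ rng.standard_normal((n_features, n_channels)) \
         + rng.standard_normal((n_samples, n_channels))
eeg_a = 0.5 * eeg_av + rng.standard_normal((n_samples, n_channels))
eeg_v = 0.2 * eeg_av + rng.standard_normal((n_samples, n_channels))

def encoding_strength(stimulus, response, n_components=4):
    """First canonical correlation between stimulus features and EEG."""
    cca = CCA(n_components=n_components).fit(stimulus, response)
    u, v = cca.transform(stimulus, response)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

r_av = encoding_strength(stim, eeg_av)
r_sum = encoding_strength(stim, eeg_a + eeg_v)   # additive A+V model
print(f"AV encoding r = {r_av:.2f}, A+V encoding r = {r_sum:.2f}")
# A multisensory enhancement would show r_av > r_sum.
```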


Author(s):  
Katherine M. Simeon ◽  
Tina M. Grieco-Calub

Purpose: The purpose of this study was to examine the extent to which phonological competition and semantic priming influence lexical access in school-aged children with cochlear implants (CIs) and children with normal acoustic hearing. Method: Participants included children who were 5–10 years of age with either normal hearing (n = 41) or bilateral severe to profound sensorineural hearing loss and who used CIs (n = 13). All participants completed a two-alternative forced-choice word recognition task while eye gaze to visual images was recorded and quantified. In this task, the target image was juxtaposed with a competitor image that was either a phonological onset competitor (i.e., shared the same initial consonant–vowel–consonant syllable as the target) or an unrelated distractor. Half of the trials were preceded by an image prime that was semantically related to the target image. Results: Children with CIs showed evidence of phonological competition during real-time processing of speech. This effect, however, was smaller and occurred later in the time course of speech processing than what was observed in children with normal hearing. The presence of a semantically related visual prime reduced the effects of phonological competition in both groups of children, but to a greater degree in children with CIs. Conclusions: Children with CIs were able to process single words similarly to their counterparts with normal hearing. However, children with CIs appeared to rely more on surrounding semantic information than their normal-hearing counterparts.
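
In gaze data of this kind, the phonological-competition effect is usually quantified as the extra looks drawn by the onset competitor relative to an unrelated distractor across the trial time course. A minimal sketch with invented values and column names (not the study's data or analysis code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
time_bins = np.arange(0, 1000, 50)            # ms after target-word onset

# Hypothetical gaze summary: for each 50 ms bin, the proportion of trials on
# which the child fixated the non-target picture, separately for trials where
# that picture was an onset competitor vs. an unrelated distractor.
gaze = pd.DataFrame({
    "time_ms": np.tile(time_bins, 2),
    "condition": np.repeat(["competitor", "unrelated"], time_bins.size),
    "prop_nontarget_looks": np.r_[0.35 + 0.05 * rng.random(time_bins.size),
                                  0.15 + 0.05 * rng.random(time_bins.size)],
})

# Competition effect per time bin: extra looks drawn by the onset competitor.
curve = gaze.pivot(index="time_ms", columns="condition",
                   values="prop_nontarget_looks")
curve["competition_effect"] = curve["competitor"] - curve["unrelated"]
print(curve.head())
```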


2009
Vol 20 (5)
pp. 539-542
Author(s):  
Catherine T. Best ◽  
Michael D. Tyler ◽  
Tiffany N. Gooding ◽  
Corey B. Orlando ◽  
Chelsea A. Quann

Efficient word recognition depends on detecting critical phonetic differences among similar-sounding words, or sensitivity to phonological distinctiveness, an ability evident at 19 months of age but unreliable at 14 to 15 months of age. However, little is known about phonological constancy, the equally crucial ability to recognize a word's identity across natural phonetic variations, such as those in cross-dialect pronunciation differences. We show that 15- and 19-month-old children recognize familiar words spoken in their native dialect, but that only the older children recognize familiar words in a dissimilar nonnative dialect, providing evidence for emergence of phonological constancy by 19 months. These results are compatible with a perceptual-attunement account of developmental change in early word recognition, but not with statistical-learning or phonological accounts. Thus, the complementary skills of phonological constancy and distinctiveness both appear at around 19 months of age, together providing the child with a fundamental insight that permits rapid vocabulary growth and later reading acquisition.


Author(s):  
Liesbeth Gijbels ◽  
Jason D. Yeatman ◽  
Kaylah Lalonde ◽  
Adrian K. C. Lee

Purpose: It is generally accepted that adults use visual cues to improve speech intelligibility in noisy environments, but findings regarding visual speech benefit in children are mixed. We explored factors that contribute to audiovisual (AV) gain in young children's speech understanding. We examined whether there is an AV benefit to speech-in-noise recognition in children in first grade and whether the visual salience of phonemes influences their AV benefit. We also explored whether individual differences in AV speech enhancement could be explained by vocabulary knowledge, phonological awareness, or general psychophysical testing performance. Method: Thirty-seven first graders completed online psychophysical experiments. We used an online single-interval, four-alternative forced-choice picture-pointing task with age-appropriate consonant–vowel–consonant words to measure auditory-only, visual-only, and AV word recognition in noise at −2 and −8 dB SNR. We obtained standard measures of vocabulary and phonological awareness and included a general psychophysical test to examine correlations with AV benefit. Results: We observed a significant overall AV gain among children in first grade. This effect was mainly attributable to the benefit at −8 dB SNR for visually distinct targets. Individual differences were not explained by any of the child variables. Boys showed lower auditory-only performance, leading to significantly larger AV gains. Conclusions: This study shows an AV benefit of distinctive visual cues to word recognition in challenging noisy conditions in first graders. The cognitive and linguistic constraints of the task may have minimized the impact of individual differences in vocabulary and phonological awareness on AV benefit. The gender difference should be studied in a larger sample and age range.
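
The AV-gain measure implied here is simply the improvement in word-recognition accuracy when visual speech is added to the auditory signal, computed per SNR. A minimal sketch with hypothetical scores, not the study data:

```python
import pandas as pd

# Hypothetical proportions correct per child, SNR, and modality (invented values).
scores = pd.DataFrame({
    "child": [1, 1, 2, 2, 3, 3],
    "snr_db": [-2, -8] * 3,
    "audio_only": [0.85, 0.45, 0.80, 0.40, 0.82, 0.48],
    "audiovisual": [0.88, 0.65, 0.84, 0.62, 0.85, 0.66],
})

# AV gain: improvement from adding visual speech to the auditory signal.
scores["av_gain"] = scores["audiovisual"] - scores["audio_only"]
print(scores.groupby("snr_db")["av_gain"].mean())
```

On the pattern reported above, the mean gain would be larger at −8 dB than at −2 dB SNR.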


2007
Vol 34 (2)
pp. 227-249
Author(s):  
NEREYDA HURTADO ◽  
VIRGINIA A. MARCHMAN ◽  
ANNE FERNALD

Research on the development of efficiency in spoken language understanding has focused largely on middle-class children learning English. Here we extend this research to Spanish-learning children (n=49; M=2;0; range=1;3–3;1) living in the USA in Latino families from primarily low socioeconomic backgrounds. Children looked at pictures of familiar objects while listening to speech naming one of the objects. Analyses of eye movements revealed developmental increases in the efficiency of speech processing. Older children and children with larger vocabularies were more efficient at processing spoken language as it unfolds in real time, as previously documented with English learners. Children whose mothers had less education tended to be slower and less accurate than children of comparable age and vocabulary size whose mothers had more schooling, consistent with previous findings of slower rates of language learning in children from disadvantaged backgrounds. These results add to the cross-linguistic literature on the development of spoken word recognition and to the study of the impact of socioeconomic status (SES) factors on early language development.
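
Processing efficiency in such looking-while-listening studies is typically indexed by the accuracy and latency of gaze shifts to the named picture. A minimal sketch of the latency measure, with an invented trial and function name (not the authors' analysis code):

```python
import numpy as np

def gaze_shift_rt(fixated_image, sample_rate_hz, onset_sample):
    """Return latency (ms) of the first post-onset sample on the target,
    for trials starting on the distractor; NaN if no shift occurs."""
    post = fixated_image[onset_sample:]
    target_samples = np.flatnonzero(post == "target")
    if fixated_image[onset_sample] != "distractor" or target_samples.size == 0:
        return np.nan
    return 1000.0 * target_samples[0] / sample_rate_hz

# Hypothetical 30 Hz eye-tracking trial: the child starts on the distractor
# and shifts to the target about 400 ms after target-word onset.
trial = np.array(["distractor"] * 30 + ["target"] * 30)
print(gaze_shift_rt(trial, sample_rate_hz=30, onset_sample=18))
```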


2020
Vol 6 (45)
pp. eabc6348
Author(s):  
Raphaël Thézé ◽  
Anne-Lise Giraud ◽  
Pierre Mégevand

When we see our interlocutor, our brain seamlessly extracts visual cues from their face and processes them along with the sound of their voice, making speech an intrinsically multimodal signal. Visual cues are especially important in noisy environments, when the auditory signal is less reliable. Neuronal oscillations might be involved in the cortical processing of audiovisual speech by selecting which sensory channel contributes more to perception. To test this, we designed computer-generated naturalistic audiovisual speech stimuli where one mismatched phoneme-viseme pair in a key word of sentences created bistable perception. Neurophysiological recordings (high-density scalp and intracranial electroencephalography) revealed that the precise phase angle of theta-band oscillations in posterior temporal and occipital cortex of the right hemisphere was crucial to select whether the auditory or the visual speech cue drove perception. We demonstrate that the phase of cortical oscillations acts as an instrument for sensory selection in audiovisual speech processing.
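
For readers unfamiliar with the method, theta-band phase at a given moment is typically obtained by band-pass filtering the EEG and taking the angle of its analytic (Hilbert) signal. The sketch below illustrates this with simulated data and assumed band, channel, and sampling choices; it is not the authors' pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_phase_at(signal, sample_rate_hz, event_sample, band=(4.0, 8.0)):
    """Band-pass the signal in the theta range and return the instantaneous
    phase (radians) at the given event sample."""
    b, a = butter(4, band, btype="bandpass", fs=sample_rate_hz)
    filtered = filtfilt(b, a, signal)
    phase = np.angle(hilbert(filtered))
    return phase[event_sample]

# Hypothetical single-channel EEG: 10 s at 250 Hz with an embedded 6 Hz rhythm.
rng = np.random.default_rng(4)
fs = 250
t = np.arange(0, 10, 1 / fs)
eeg = np.sin(2 * np.pi * 6 * t) + 0.5 * rng.standard_normal(t.size)
print(f"Theta phase at key-word onset: {theta_phase_at(eeg, fs, 1250):.2f} rad")
# Across trials, one could then test (e.g., with circular statistics) whether
# phase angle predicts whether the auditory or the visual cue drove perception.
```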


1997
Author(s):  
Paul D. Allopenna ◽  
James S. Magnuson ◽  
Michael K. Tanenhaus
