Multisensory Integration-Attention Trade-Off in Cochlear-Implanted Deaf Individuals

2021, Vol 15
Author(s): Luuk P. H. van de Rijt, A. John van Opstal, Marc M. van Wanrooij

The cochlear implant (CI) allows profoundly deaf individuals to partially recover hearing. Still, due to the coarse acoustic information provided by the implant, CI users have considerable difficulties in recognizing speech, especially in noisy environments. CI users therefore rely heavily on visual cues to augment speech recognition, more so than normal-hearing individuals. However, it is unknown how attention to one (focused) or both (divided) modalities plays a role in multisensory speech recognition. Here we show that unisensory speech listening and speechreading were negatively impacted in divided-attention tasks for CI users, but not for normal-hearing individuals. Our psychophysical experiments revealed that, as expected, listening thresholds were consistently better for the normal-hearing, while lipreading thresholds were largely similar for the two groups. Moreover, audiovisual speech recognition for normal-hearing individuals could be described well by probabilistic summation of auditory and visual speech recognition, while CI users were better integrators than expected from statistical facilitation alone. Our results suggest that this benefit in integration comes at a cost. Unisensory speech recognition is degraded for CI users when attention needs to be divided across modalities. We conjecture that CI users exhibit an integration-attention trade-off. They focus solely on a single modality during focused-attention tasks, but need to divide their limited attentional resources in situations with uncertainty about the upcoming stimulus modality. We argue that in order to determine the benefit of a CI for speech recognition, situational factors need to be discounted by presenting speech in realistic or complex audiovisual environments.
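The probabilistic-summation benchmark referred to above can be made concrete in a few lines of code. The sketch below is a minimal illustration, assuming recognition performance is expressed as proportions correct and that the auditory and visual channels are statistically independent; the function name and example scores are illustrative and do not come from the study.

```python
# Probabilistic summation (statistical facilitation): a word counts as
# recognized audiovisually if either the auditory or the visual channel
# alone would recognize it, assuming the two channels are independent.

def probabilistic_summation(p_auditory: float, p_visual: float) -> float:
    """Expected audiovisual proportion correct given the unisensory scores."""
    return p_auditory + p_visual - p_auditory * p_visual


# Made-up unisensory scores: 40% correct by listening alone,
# 30% correct by lipreading alone.
p_av_benchmark = probabilistic_summation(0.40, 0.30)
print(f"benchmark audiovisual score: {p_av_benchmark:.2f}")  # 0.58
# An observed audiovisual score above this benchmark indicates integration
# beyond statistical facilitation, as reported here for CI users.
```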

2020
Author(s): Luuk P. H. van de Rijt, A. John van Opstal, Marc M. van Wanrooij

The cochlear implant (CI) allows profoundly deaf individuals to partially recover hearing. Still, due to the coarse acoustic information provided by the implant, CI users have considerable difficulties in recognizing speech, especially in noisy environments, even years after implantation. CI users therefore rely heavily on visual cues to augment speech comprehension, more so than normal-hearing individuals. However, it is unknown how attention to one (focused) or both (divided) modalities plays a role in multisensory speech recognition. Here we show that unisensory speech listening and speech reading were negatively impacted in divided-attention tasks for CI users, but not for normal-hearing individuals. Our psychophysical experiments revealed that, as expected, listening thresholds were consistently better for the normal-hearing, while lipreading thresholds were largely similar for the two groups. Moreover, audiovisual speech recognition for normal-hearing individuals could be described well by probabilistic summation of auditory and visual speech recognition, while CI users were better integrators than expected from statistical facilitation alone. Our results suggest that this benefit in integration, however, comes at a cost. Unisensory speech recognition is degraded for CI users when attention needs to be divided across modalities, i.e., in situations with uncertainty about the upcoming stimulus modality. We conjecture that CI users exhibit an integration-attention trade-off. They focus solely on a single modality during focused-attention tasks, but need to divide their limited attentional resources to more modalities during divided-attention tasks. We argue that in order to determine the benefit of a CI for speech comprehension, situational factors need to be discounted by presenting speech in realistic or complex audiovisual environments.

Significance statement: Deaf individuals using a cochlear implant require significant amounts of effort to listen in noisy environments due to their impoverished hearing. Lipreading can benefit them and reduce the burden of listening by providing an additional source of information. Here we show that the improved speech recognition for audiovisual stimulation comes at a cost, however, as the cochlear-implant users now need to listen and speech-read simultaneously, paying attention to both modalities. The data suggest that cochlear-implant users run into the limits of their attentional resources, and we argue that they, unlike normal-hearing individuals, always need to consider whether a multisensory benefit outweighs the unisensory cost in everyday environments.


1993, Vol 36 (2), pp. 431-436
Author(s): Brian E. Walden, Debra A. Busacco, Allen A. Montgomery

The benefit derived from visual cues in auditory-visual speech recognition and patterns of auditory and visual consonant confusions were compared for 20 middle-aged and 20 elderly men who were moderately to severely hearing impaired. Consonant-vowel nonsense syllables and CID sentences were presented to the subjects under auditory-only, visual-only, and auditory-visual test conditions. Benefit was defined as the difference between the scores in the auditory-only and auditory-visual conditions. The results revealed that the middle-aged and elderly subjects obtained similar benefit from visual cues in auditory-visual speech recognition. Further, patterns of consonant confusions were similar for the two groups.
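Benefit here is a simple difference score. The sketch below, with made-up group means rather than data from the study, shows the computation.

```python
# Visual benefit as defined above: auditory-visual score minus
# auditory-only score, in percentage points.

def visual_benefit(auditory_only: float, auditory_visual: float) -> float:
    return auditory_visual - auditory_only


# Hypothetical group means (percent correct), for illustration only.
print(visual_benefit(55.0, 80.0))  # e.g., middle-aged group: 25.0-point benefit
print(visual_benefit(50.0, 75.0))  # e.g., elderly group: also 25.0 points
```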


2009, pp. 128-148
Author(s): Eric Petajan

Automatic Speech Recognition (ASR) is the most natural input modality from humans to machines. When the hands are busy or a full keyboard is not available, speech input is in especially high demand. Since the most compelling application scenarios for ASR include noisy environments (mobile phones, public kiosks, cars), visual speech processing must be incorporated to provide robust performance. This chapter motivates and describes the MPEG-4 Face and Body Animation (FBA) standard for representing visual speech data as part of a whole virtual human specification. The very low bit-rate FBA codec included with the standard enables thin clients to access processing and communication services over any network, including enhanced visual communication, animated entertainment, man-machine dialog, and audio/visual speech recognition.


Author(s): Preety Singh, Vijay Laxmi, M. S. Gaur

Audio-Visual Speech Recognition (AVSR) is an emerging technology that improves machine perception of speech by taking into account the bimodality of human speech. It is inspired by the fact that human beings subconsciously use visual cues to interpret speech. This chapter surveys techniques for audio-visual speech recognition. Through this survey, the authors discuss the steps involved in a robust mechanism for perception of speech for human-computer interaction. The main emphasis is on visual speech recognition, which takes only the visual cues into account. Previous research has shown that visual-only speech recognition systems pose many challenges. The authors present a speech recognition system in which only the visual modality is used for recognition of the spoken word. Significant features are extracted from lip images, and these features are used to build n-gram feature vectors. Classification of speech using these modified feature vectors results in improved recognition accuracy for the spoken word.
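The abstract outlines a pipeline of lip-feature extraction, n-gram feature vectors, and word classification without implementation details. The sketch below is one possible reading of that pipeline; the toy per-frame features, the n-gram size, the SVM classifier, and the synthetic clips are all assumptions for illustration, not the authors' method.

```python
# Sketch of a visual-only word recognizer along the lines described above:
# per-frame lip features -> n-gram feature vectors -> word classification.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def lip_features(frame: np.ndarray) -> np.ndarray:
    """Toy per-frame descriptors (stand-ins for tracked mouth geometry)."""
    return np.array([frame.mean(), frame.std(), frame.max() - frame.min()])

def ngram_vector(frames: list, n: int = 3) -> np.ndarray:
    """Concatenate features of n consecutive frames, then average the n-grams."""
    feats = [lip_features(f) for f in frames]
    grams = [np.concatenate(feats[i:i + n]) for i in range(len(feats) - n + 1)]
    return np.mean(grams, axis=0)

# Synthetic "video clips": 20 clips of 10 frames each, from two word classes.
clips = [rng.normal(loc=label, size=(10, 32, 32)) for label in (0, 1) for _ in range(10)]
labels = [label for label in (0, 1) for _ in range(10)]

X = np.array([ngram_vector(list(clip)) for clip in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict([ngram_vector(list(clips[0]))]))  # expected: [0]
```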


Author(s): Guillaume Gravier, Gerasimos Potamianos, Chalapathy Neti
