DISCRIMINATIVE LEARNING OF VISUAL DATA FOR AUDIOVISUAL SPEECH RECOGNITION

1999 ◽  
Vol 08 (01) ◽  
pp. 43-52 ◽  
Author(s):  
Alexandrina Rogozan

In recent years a number of techniques have been proposed to improve the accuracy and robustness of automatic speech recognition in noisy environments. Among these, supplementing the acoustic information with visual data, mostly extracted from the speaker's lip shapes, has proved successful. We have already demonstrated the effectiveness of integrating visual data at two different levels during speech decoding, according to both direct and separate identification strategies (DI+SI). This paper outlines methods for reinforcing visible-speech recognition within the separate-identification framework. First, we define visual-specific units using a self-organizing mapping technique. Second, we complement the stochastic learning of these units with a discriminative neural-network-based technique for speech recognition purposes. Finally, we show on a connected-letter speech recognition task that these methods improve the performance of the DI+SI-based system under varying noise-level conditions.
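As a rough illustration of the first step, a self-organizing map can cluster per-frame lip-shape feature vectors into a small grid of prototype vectors, each of which can serve as a visual-specific unit. The sketch below is a minimal NumPy implementation under assumed parameters (feature dimensionality, map size, training schedule); it is not the configuration used in the paper, and the random "lip-shape" features stand in for real visual measurements.

```python
import numpy as np

def train_som(features, grid=(6, 6), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a small self-organizing map to visual feature vectors.

    features : (n_frames, n_dims) array of per-frame lip-shape features.
    Returns a codebook of shape (grid[0] * grid[1], n_dims); each codebook
    vector plays the role of one visual-specific unit.
    """
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    codebook = rng.normal(size=(n_units, features.shape[1]))
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])],
                      dtype=float)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1.0 - epoch / epochs) + 0.5  # shrinking neighbourhood
        for x in rng.permutation(features):
            winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
            dist2 = np.sum((coords - coords[winner]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))    # neighbourhood weights
            codebook += lr * h[:, None] * (x - codebook)
    return codebook

# Toy usage: cluster 1000 random 4-D "lip-shape" frames into 36 visual units.
frames = np.random.rand(1000, 4)
units = train_som(frames)
labels = np.argmin(np.linalg.norm(frames[:, None, :] - units[None, :, :], axis=2),
                   axis=1)
```

Once frames are labelled with such units, their stochastic models can then be refined discriminatively, for example with a neural network trained to separate confusable units, in the spirit of the second step described above.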

2021 ◽  
Vol 15 ◽  
Author(s):  
Luuk P. H. van de Rijt ◽  
A. John van Opstal ◽  
Marc M. van Wanrooij

The cochlear implant (CI) allows profoundly deaf individuals to partially recover hearing. Still, due to the coarse acoustic information provided by the implant, CI users have considerable difficulties in recognizing speech, especially in noisy environments. CI users therefore rely heavily on visual cues to augment speech recognition, more so than normal-hearing individuals. However, it is unknown how attention to one (focused) or both (divided) modalities plays a role in multisensory speech recognition. Here we show that unisensory speech listening and reading were negatively impacted in divided-attention tasks for CI users—but not for normal-hearing individuals. Our psychophysical experiments revealed that, as expected, listening thresholds were consistently better for the normal-hearing, while lipreading thresholds were largely similar for the two groups. Moreover, audiovisual speech recognition for normal-hearing individuals could be described well by probabilistic summation of auditory and visual speech recognition, while CI users were better integrators than expected from statistical facilitation alone. Our results suggest that this benefit in integration comes at a cost. Unisensory speech recognition is degraded for CI users when attention needs to be divided across modalities. We conjecture that CI users exhibit an integration-attention trade-off. They focus solely on a single modality during focused-attention tasks, but need to divide their limited attentional resources in situations with uncertainty about the upcoming stimulus modality. We argue that in order to determine the benefit of a CI for speech recognition, situational factors need to be discounted by presenting speech in realistic or complex audiovisual environments.
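The probabilistic-summation benchmark mentioned above has a simple closed form under the usual independence assumption: a token is recognized audiovisually if it is recognized through at least one modality. The snippet below illustrates the comparison; the recognition proportions are made-up numbers, not data from the study.

```python
def probability_summation(p_auditory, p_visual):
    """Predicted audiovisual recognition rate if the two modalities contribute
    independently (statistical facilitation only): 1 - (1 - pA)(1 - pV)."""
    return 1.0 - (1.0 - p_auditory) * (1.0 - p_visual)

# Hypothetical proportions of correctly recognized words.
p_a, p_v = 0.40, 0.30
p_av_predicted = probability_summation(p_a, p_v)   # 0.58
p_av_observed = 0.70                               # illustrative observation only

# Observed > predicted indicates integration beyond statistical facilitation,
# which is the pattern the study reports for CI users.
print(f"predicted {p_av_predicted:.2f}, observed {p_av_observed:.2f}")
```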


2004 ◽  
Author(s):  
Martin Graciarena ◽  
Federico Cesari ◽  
Horacio Franco ◽  
Greg Myers ◽  
Cregg Cowan ◽  
...  

2018 ◽  
Vol 39 (04) ◽  
pp. 349-363 ◽  
Author(s):  
Eric Hoover ◽  
Pamela Souza

Substantial loss of cochlear function is required to elevate pure-tone thresholds to the severe hearing loss range; yet, individuals with severe or profound hearing loss continue to rely on hearing for communication. Despite the impairment, sufficient information is encoded at the periphery to make acoustic hearing a viable option. However, the probability of significant cochlear and/or neural damage associated with the loss has consequences for sound perception and speech recognition. These consequences include degraded frequency selectivity, which can be assessed with tests including psychoacoustic tuning curves and broadband rippled stimuli. Because speech recognition depends on the ability to resolve frequency detail, a listener with severe hearing loss is likely to have impaired communication in both quiet and noisy environments. However, the extent of the impairment varies widely among individuals. A better understanding of the fundamental abilities of listeners with severe and profound hearing loss and the consequences of those abilities for communication can support directed treatment options in this population.
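One way to picture the broadband rippled stimuli mentioned above is to synthesize noise whose spectrum carries a sinusoidal ripple on a log-frequency axis; detecting (or discriminating the phase of) the ripple requires intact frequency selectivity. The sketch below is only illustrative: the ripple density, depth, and passband are assumed values, not parameters of any particular clinical test.

```python
import numpy as np

def rippled_noise(duration=0.5, fs=44100, ripples_per_octave=2.0,
                  depth_db=20.0, f_lo=100.0, f_hi=8000.0, seed=0):
    """Broadband noise with a sinusoidal spectral ripple on a log-frequency axis."""
    rng = np.random.default_rng(seed)
    n = int(duration * fs)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    # Flat magnitude inside the passband, zero outside.
    mag = np.where((freqs >= f_lo) & (freqs <= f_hi), 1.0, 0.0)
    # Sinusoidal ripple in dB along log2(frequency / f_lo).
    ripple_db = (depth_db / 2.0) * np.sin(
        2 * np.pi * ripples_per_octave * np.log2(np.maximum(freqs, 1.0) / f_lo))
    mag = mag * 10.0 ** (ripple_db / 20.0)
    # Random phases give a noise-like waveform with the desired spectrum.
    phase = rng.uniform(0, 2 * np.pi, size=freqs.shape)
    signal = np.fft.irfft(mag * np.exp(1j * phase), n=n)
    return signal / np.max(np.abs(signal))

stimulus = rippled_noise()  # 0.5 s of rippled noise at 44.1 kHz, scaled to +/- 1
```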


2018 ◽  
Vol 22 (1) ◽  
pp. 47-58 ◽  
Author(s):  
M. Kalamani ◽  
M. Krishnamoorthi ◽  
R. S. Valarmathi

2018 ◽  
Author(s):  
Tim Schoof ◽  
Pamela Souza

Objective: Older hearing-impaired adults typically experience difficulties understanding speech in noise. Most hearing aids address this issue using digital noise reduction. While noise reduction does not necessarily improve speech recognition, it may reduce the resources required to process the speech signal. The freed resources may, in turn, aid the ability to perform another task while listening to speech (i.e., multitasking). This study examined to what extent changing the strength of digital noise reduction in hearing aids affects the ability to multitask. Design: Multitasking was measured using a dual-task paradigm combining a speech recognition task and a visual monitoring task. The speech recognition task involved sentence recognition in the presence of six-talker babble at signal-to-noise ratios (SNRs) of 2 and 7 dB. Participants were fitted with commercially available hearing aids programmed with three noise-reduction settings: off, mild, and strong. Study sample: 18 hearing-impaired older adults. Results: There were no effects of noise reduction on the ability to multitask or on the ability to recognize speech in noise. Conclusions: Adjusting noise-reduction settings in the clinic may not necessarily improve performance on tasks such as these.
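For concreteness, the SNR conditions in the design above amount to scaling the babble relative to the speech before mixing. The sketch below uses the standard RMS definition of SNR; the waveforms here are placeholder noise, standing in for recorded sentences and six-talker babble.

```python
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    """Return speech + babble with the babble scaled so that the
    RMS speech-to-babble ratio equals snr_db."""
    babble = np.resize(babble, speech.shape)         # match lengths for simplicity
    rms_speech = np.sqrt(np.mean(speech ** 2))
    rms_babble = np.sqrt(np.mean(babble ** 2))
    target_rms = rms_speech / (10.0 ** (snr_db / 20.0))
    return speech + babble * (target_rms / rms_babble)

# Placeholder 1-second signals at 16 kHz.
fs = 16000
speech = np.random.randn(fs) * 0.1
babble = np.random.randn(fs) * 0.1
mix_hard = mix_at_snr(speech, babble, 2.0)   # +2 dB SNR condition
mix_easy = mix_at_snr(speech, babble, 7.0)   # +7 dB SNR condition
```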

