Tuning of Acoustic Modeling and Adaptation Technique for a Real Speech Recognition Task

Author(s):  
Jan Vaněk ◽  
Josef Michálek ◽  
Josef Psutka


2019 ◽  
Vol 29 (1) ◽  
pp. 1261-1274 ◽  
Author(s):  
Vishal Passricha ◽  
Rajesh Kumar Aggarwal

Abstract Deep neural networks (DNNs) have been playing a significant role in acoustic modeling. Convolutional neural networks (CNNs) are an advanced variant of DNNs and achieve a 4–12% relative improvement in word error rate (WER) over DNNs. The spectral variations and local correlations present in the speech signal make CNNs particularly well suited to speech recognition. Recently, it has been demonstrated that bidirectional long short-term memory (BLSTM) networks produce higher recognition rates in acoustic modeling because they can learn stronger higher-level representations of acoustic data. Because both the spatial and the temporal properties of the speech signal are essential for a high recognition rate, combining the two kinds of network is a natural step. In this paper, a hybrid CNN-BLSTM architecture is proposed to exploit both sets of properties and to improve continuous speech recognition. We further explore design choices for the CNN, including weight sharing, the appropriate number of hidden units, and the pooling strategy, and examine how many BLSTM layers are effective. The paper also addresses another shortcoming of CNNs: speaker-adapted features cannot be modeled directly in a CNN. Finally, various non-linearities, with and without dropout, are analyzed for speech tasks. Experiments indicate that the proposed hybrid architecture with speaker-adapted features and a maxout non-linearity with dropout yields 5.8% and 10% relative reductions in WER over the CNN and DNN systems, respectively.
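
For illustration, the following is a minimal sketch of such a hybrid acoustic model, written in PyTorch: a small CNN front end over the time-frequency plane, a BLSTM stack over the resulting frame sequence, and a maxout output stage with dropout. All names, layer sizes, kernel shapes, and the `Maxout` module here are assumptions made for this sketch, not the configuration from the paper.

```python
# Minimal, hypothetical sketch of a hybrid CNN-BLSTM acoustic model.
# Layer sizes, kernel shapes, and the Maxout module are illustrative
# assumptions, not the architecture reported in the paper.
import torch
import torch.nn as nn


class Maxout(nn.Module):
    """Maxout non-linearity: the element-wise max over `pieces`
    parallel affine projections of the input."""
    def __init__(self, in_dim, out_dim, pieces=2):
        super().__init__()
        self.out_dim, self.pieces = out_dim, pieces
        self.linear = nn.Linear(in_dim, out_dim * pieces)

    def forward(self, x):
        y = self.linear(x)                      # (..., out_dim * pieces)
        y = y.view(*x.shape[:-1], self.out_dim, self.pieces)
        return y.max(dim=-1).values             # (..., out_dim)


class CNNBLSTM(nn.Module):
    def __init__(self, n_mels=40, n_states=2000, lstm_layers=2):
        super().__init__()
        # CNN front end exploits local spectral correlations;
        # pooling along frequency adds robustness to spectral shifts.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(3, 3), padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),   # pool over frequency only
        )
        conv_dim = 32 * (n_mels // 2)
        # BLSTM back end models temporal dynamics in both directions.
        self.blstm = nn.LSTM(conv_dim, 512, num_layers=lstm_layers,
                             batch_first=True, bidirectional=True)
        self.maxout = Maxout(2 * 512, 512)
        self.dropout = nn.Dropout(0.3)
        self.out = nn.Linear(512, n_states)     # per-frame state scores

    def forward(self, feats):                   # feats: (batch, time, n_mels)
        x = feats.unsqueeze(1).transpose(2, 3)  # (batch, 1, n_mels, time)
        x = self.conv(x)                        # (batch, 32, n_mels/2, time)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)
        x, _ = self.blstm(x)
        x = self.dropout(self.maxout(x))
        return self.out(x)


# Smoke test on a batch of 100 frames of 40-dim filterbank features.
model = CNNBLSTM()
print(model(torch.randn(4, 100, 40)).shape)     # torch.Size([4, 100, 2000])
```

Pooling over the frequency axis only (the `(2, 1)` kernel) is one common way to gain robustness to spectral shifts without blurring the temporal resolution the BLSTM depends on.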


2018 ◽  
Author(s):  
Tim Schoof ◽  
Pamela Souza

Objective: Older hearing-impaired adults typically experience difficulties understanding speech in noise. Most hearing aids address this issue using digital noise reduction. While noise reduction does not necessarily improve speech recognition, it may reduce the resources required to process the speech signal. Those freed resources may, in turn, aid the ability to perform another task while listening to speech (i.e., to multitask). This study examined to what extent changing the strength of digital noise reduction in hearing aids affects the ability to multitask. Design: Multitasking was measured using a dual-task paradigm combining a speech recognition task and a visual monitoring task. The speech recognition task involved sentence recognition in the presence of six-talker babble at signal-to-noise ratios (SNRs) of 2 and 7 dB. Participants were fitted with commercially available hearing aids programmed with three noise reduction settings: off, mild, and strong. Study sample: 18 hearing-impaired older adults. Results: Noise reduction had no effect on the ability to multitask or on the ability to recognize speech in noise. Conclusions: Adjusting noise reduction settings in the clinic cannot be expected to improve performance on all tasks.
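
For concreteness, mixing sentences with six-talker babble at a fixed SNR comes down to scaling the babble so that the speech-to-noise power ratio hits the target. The sketch below is a NumPy illustration with placeholder random signals; it is not the stimulus code used in the study.

```python
# Hypothetical sketch of mixing speech with babble at a target SNR.
# The signals here are random placeholders, not the study's stimuli.
import numpy as np


def mix_at_snr(speech, babble, snr_db):
    """Scale `babble` so the speech/noise power ratio equals snr_db,
    then return the mixture (arrays assumed same length and rate)."""
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    # Required noise power for the target SNR: P_s / P_n = 10^(snr/10)
    target_p_babble = p_speech / (10 ** (snr_db / 10))
    return speech + babble * np.sqrt(target_p_babble / p_babble)


rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # 1 s of placeholder "speech" at 16 kHz
babble = rng.standard_normal(16000)   # placeholder six-talker babble

for snr in (2, 7):                    # the two SNRs used in the study
    mix = mix_at_snr(speech, babble, snr)
    achieved = 10 * np.log10(np.mean(speech ** 2) /
                             np.mean((mix - speech) ** 2))
    print(f"target {snr} dB -> achieved {achieved:.2f} dB")
```

Running it prints an achieved SNR equal to each target (2 and 7 dB), confirming the scaling.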

