Tuning of Acoustic Modeling and Adaptation Technique for a Real Speech Recognition Task

Author(s):  
Jan Vaněk ◽  
Josef Michálek ◽  
Josef Psutka


2019 ◽  
Vol 29 (1) ◽  
pp. 1261-1274 ◽  
Author(s):  
Vishal Passricha ◽  
Rajesh Kumar Aggarwal

Abstract Deep neural networks (DNNs) have been playing a significant role in acoustic modeling. Convolutional neural networks (CNNs) are an advanced variant of DNNs and achieve a 4–12% relative improvement in word error rate (WER) over DNNs. The spectral variations and local correlations present in the speech signal make CNNs particularly well suited to speech recognition. Recently, it has been demonstrated that bidirectional long short-term memory (BLSTM) networks produce higher recognition rates in acoustic modeling because they can learn stronger higher-level representations of acoustic data. Because both the spatial and the temporal properties of the speech signal are essential for a high recognition rate, combining the two kinds of network is a natural step. In this paper, a hybrid CNN-BLSTM architecture is proposed to exploit both sets of properties and to improve continuous speech recognition. We further explore design choices for the CNN, including weight sharing, the appropriate number of hidden units, and the pooling strategy, and examine how many BLSTM layers are effective. The paper also addresses another shortcoming of CNNs: speaker-adapted features cannot be modeled directly in a CNN. Finally, various non-linearities, with and without dropout, are analyzed for speech tasks. Experiments indicate that the proposed hybrid architecture with speaker-adapted features and a maxout non-linearity with dropout yields 5.8% and 10% relative reductions in WER over the CNN and DNN systems, respectively.
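
For illustration, the following is a minimal sketch of such a hybrid acoustic model, written in PyTorch: a small CNN front end over the time-frequency plane, a BLSTM stack over the resulting frame sequence, and a maxout output stage with dropout. All names, layer sizes, kernel shapes, and the `Maxout` module here are assumptions made for this sketch, not the configuration from the paper.

```python
# Minimal, hypothetical sketch of a hybrid CNN-BLSTM acoustic model.
# Layer sizes, kernel shapes, and the Maxout module are illustrative
# assumptions, not the architecture reported in the paper.
import torch
import torch.nn as nn


class Maxout(nn.Module):
    """Maxout non-linearity: the element-wise max over `pieces`
    parallel affine projections of the input."""
    def __init__(self, in_dim, out_dim, pieces=2):
        super().__init__()
        self.out_dim, self.pieces = out_dim, pieces
        self.linear = nn.Linear(in_dim, out_dim * pieces)

    def forward(self, x):
        y = self.linear(x)                      # (..., out_dim * pieces)
        y = y.view(*x.shape[:-1], self.out_dim, self.pieces)
        return y.max(dim=-1).values             # (..., out_dim)


class CNNBLSTM(nn.Module):
    def __init__(self, n_mels=40, n_states=2000, lstm_layers=2):
        super().__init__()
        # CNN front end exploits local spectral correlations;
        # pooling along frequency adds robustness to spectral shifts.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(3, 3), padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),   # pool over frequency only
        )
        conv_dim = 32 * (n_mels // 2)
        # BLSTM back end models temporal dynamics in both directions.
        self.blstm = nn.LSTM(conv_dim, 512, num_layers=lstm_layers,
                             batch_first=True, bidirectional=True)
        self.maxout = Maxout(2 * 512, 512)
        self.dropout = nn.Dropout(0.3)
        self.out = nn.Linear(512, n_states)     # per-frame state scores

    def forward(self, feats):                   # feats: (batch, time, n_mels)
        x = feats.unsqueeze(1).transpose(2, 3)  # (batch, 1, n_mels, time)
        x = self.conv(x)                        # (batch, 32, n_mels/2, time)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)
        x, _ = self.blstm(x)
        x = self.dropout(self.maxout(x))
        return self.out(x)


# Smoke test on a batch of 100 frames of 40-dim filterbank features.
model = CNNBLSTM()
print(model(torch.randn(4, 100, 40)).shape)     # torch.Size([4, 100, 2000])
```

Pooling over the frequency axis only (the `(2, 1)` kernel) is one common way to gain robustness to spectral shifts without blurring the temporal resolution the BLSTM depends on.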


2018 ◽  
Author(s):  
Tim Schoof ◽  
Pamela Souza

Objective: Older hearing-impaired adults typically experience difficulties understanding speech in noise. Most hearing aids address this issue using digital noise reduction. While noise reduction does not necessarily improve speech recognition, it may reduce the resources required to process the speech signal. Those freed resources may, in turn, aid the ability to perform another task while listening to speech (i.e., to multitask). This study examined to what extent changing the strength of digital noise reduction in hearing aids affects the ability to multitask. Design: Multitasking was measured using a dual-task paradigm combining a speech recognition task and a visual monitoring task. The speech recognition task involved sentence recognition in the presence of six-talker babble at signal-to-noise ratios (SNRs) of 2 and 7 dB. Participants were fitted with commercially available hearing aids programmed with three noise reduction settings: off, mild, and strong. Study sample: 18 hearing-impaired older adults. Results: Noise reduction had no effect on the ability to multitask or on the ability to recognize speech in noise. Conclusions: Adjusting noise reduction settings in the clinic cannot be expected to improve performance on all tasks.
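
For concreteness, mixing sentences with six-talker babble at a fixed SNR comes down to scaling the babble so that the speech-to-noise power ratio hits the target. The sketch below is a NumPy illustration with placeholder random signals; it is not the stimulus code used in the study.

```python
# Hypothetical sketch of mixing speech with babble at a target SNR.
# The signals here are random placeholders, not the study's stimuli.
import numpy as np


def mix_at_snr(speech, babble, snr_db):
    """Scale `babble` so the speech/noise power ratio equals snr_db,
    then return the mixture (arrays assumed same length and rate)."""
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    # Required noise power for the target SNR: P_s / P_n = 10^(snr/10)
    target_p_babble = p_speech / (10 ** (snr_db / 10))
    return speech + babble * np.sqrt(target_p_babble / p_babble)


rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # 1 s of placeholder "speech" at 16 kHz
babble = rng.standard_normal(16000)   # placeholder six-talker babble

for snr in (2, 7):                    # the two SNRs used in the study
    mix = mix_at_snr(speech, babble, snr)
    achieved = 10 * np.log10(np.mean(speech ** 2) /
                             np.mean((mix - speech) ** 2))
    print(f"target {snr} dB -> achieved {achieved:.2f} dB")
```

Running it prints an achieved SNR equal to each target (2 and 7 dB), confirming the scaling.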

