A study on deep neural network acoustic model adaptation for robust far-field speech recognition

Mapping Intimacies ◽

10.21437/interspeech.2015-525 ◽

2015 ◽

Author(s):

Seyedmahdad Mirsamadi ◽

John H. L. Hansen

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network ◽

Far Field ◽

Acoustic Model ◽

Model Adaptation

Geo-location dependent deep neural network acoustic model for speech recognition

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2016.7472803 ◽

2016 ◽

Author(s):

Guoli Ye ◽

Chaojun Liu ◽

Yifan Gong

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network ◽

Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition

Neural Networks ◽

10.1016/j.neunet.2021.04.017 ◽

2021 ◽

Author(s):

Guanjun Li ◽

Shan Liang ◽

Shuai Nie ◽

Wenju Liu ◽

Zhanlei Yang

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network ◽

Far Field ◽

Generalized Sidelobe Canceller ◽

Dual Channel ◽

Sidelobe Canceller

Acoustic landmarks contain more information about the phone string than other frames for automatic speech recognition with deep neural network acoustic model

The Journal of the Acoustical Society of America ◽

10.1121/1.5039837 ◽

2018 ◽

Vol 143 (6) ◽

pp. 3207-3219 ◽

Author(s):

Di He ◽

Boon Pang Lim ◽

Xuesong Yang ◽

Mark Hasegawa-Johnson ◽

Deming Chen

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Deep Neural Network ◽

Online Speech Recognition Using Multichannel Parallel Acoustic Score Computation and Deep Neural Network (DNN)- Based Voice-Activity Detector

Applied Sciences ◽

10.3390/app10124091 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4091 ◽

Author(s):

Yoo Rhee Oh ◽

Kiyoung Park ◽

Jeon Gyu Park

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network ◽

Computation Method ◽

Acoustic Model ◽

Voice Activity Detector ◽

Context Sensitive ◽

Speech Features ◽

Voice Activity

This paper aims to design an online, low-latency, high-performance speech recognition system using a bidirectional long short-term memory (BLSTM) acoustic model. To achieve this, we adopt a server-client model and a context-sensitive-chunk-based approach. The speech recognition server manages a main thread and a decoder thread for each client, along with a single worker thread. The main thread communicates with the connected client, extracts speech features, and buffers them. The decoder thread performs speech recognition, including the proposed multichannel parallel acoustic score computation of a BLSTM acoustic model, the proposed deep neural network-based voice activity detector, and Viterbi decoding. The proposed acoustic score computation method uses the worker thread to estimate the acoustic scores of a context-sensitive-chunk BLSTM acoustic model for the batched speech features from concurrent clients. The proposed deep neural network-based voice activity detector detects short pauses in an utterance to reduce response latency when the user utters long sentences. In experiments on Korean speech recognition, the proposed acoustic score computation increases the number of concurrent clients from 22 to 44. When combined with the frame skipping method, the number increases further to 59 clients with a small accuracy degradation. Moreover, the proposed deep neural network-based voice activity detector reduces the average user-perceived latency from 11.71 s to 3.09–5.41 s.
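The batching idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_into_chunks` and `build_batch`, the chunk and context sizes, and the list-based feature format are all assumptions made for illustration. The sketch only shows how buffered features from several clients might be cut into context-sensitive chunks and gathered into one batch that a single worker could score together.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]
# (frames of left context in the chunk, number of target frames, chunk frames)
Chunk = Tuple[int, int, List[Frame]]


def split_into_chunks(frames: Sequence[Frame], chunk_size: int, context: int) -> List[Chunk]:
    """Cut a feature sequence into chunks of `chunk_size` target frames,
    each padded with up to `context` neighbouring frames on both sides."""
    chunks: List[Chunk] = []
    for start in range(0, len(frames), chunk_size):
        end = min(start + chunk_size, len(frames))
        left = max(0, start - context)            # clip context at the sequence start
        right = min(len(frames), end + context)   # clip context at the sequence end
        chunks.append((start - left, end - start, list(frames[left:right])))
    return chunks


def build_batch(
    pending: Dict[str, List[Frame]], chunk_size: int, context: int
) -> List[Tuple[str, int, int, List[Frame]]]:
    """Gather chunks from every client's buffer into one batch.
    Each entry keeps its client id so scores can be routed back afterwards."""
    batch = []
    for client_id, frames in pending.items():
        for offset, length, chunk in split_into_chunks(frames, chunk_size, context):
            batch.append((client_id, offset, length, chunk))
    return batch


if __name__ == "__main__":
    # Two hypothetical clients with 1-dimensional dummy features.
    pending = {"client-a": [[0.0]] * 5, "client-b": [[1.0]] * 3}
    for client_id, offset, length, chunk in build_batch(pending, chunk_size=4, context=2):
        print(client_id, offset, length, len(chunk))
```

In a real system the batch would be passed to the acoustic model in one forward pass, and only the scores of the target frames (skipping the `offset` context frames) would be returned to each client's decoder thread.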

Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation

10.21437/interspeech.2014-497 ◽

2014 ◽

Author(s):

Yan Huang ◽

Dong Yu ◽

Chaojun Liu ◽

Yifan Gong

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Acoustic Model ◽

Model Adaptation ◽

Regularized Model

Hybrid Deep Neural Network Acoustic Model for Taiwanese Speech Recognition

2020 8th International Conference on Orange Technology (ICOT) ◽

10.1109/icot51877.2020.9468762 ◽

2020 ◽

Author(s):

Che-Wen Chen ◽

Yu-Fu Yeh ◽

Chun-Liang Lin ◽

Shih-Pang Tseng ◽

Jhing-Fa Wang

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network ◽

I-vector based deep neural network acoustic model adaptation using multilingual language resource

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) ◽

10.1109/apsipa.2016.7820698 ◽

2016 ◽

Author(s):

Haihua Xu ◽

Wei Rao ◽

Xiong Xiao ◽

Hao Huang ◽

Eng-Siong Chng ◽

...

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Acoustic Model ◽

Model Adaptation ◽

Language Resource

Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech Recognition

10.21437/interspeech.2016-919 ◽

2016 ◽

Author(s):

Satoshi Tsujioka ◽

Sakriani Sakti ◽

Koichiro Yoshino ◽

Graham Neubig ◽

Satoshi Nakamura

Keyword(s):

Speech Recognition ◽

Joint Estimation ◽

Acoustic Model ◽

Model Adaptation

An initial attempt on task-specific adaptation for deep neural network-based large vocabulary continuous speech recognition

10.21437/interspeech.2012-9 ◽

2012 ◽

Author(s):

Yeming Xiao ◽

Zhen Zhang ◽

Shang Cai ◽

Jielin Pan ◽

Yonghong Yan

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Initial Attempt ◽

Specific Adaptation ◽

Large Vocabulary

Gaussian map based acoustic model adaptation using untranscribed data for speech recognition in severely adverse environments

10.21437/interspeech.2012-481 ◽

2012 ◽

Author(s):

Wooil Kim ◽

John H. L. Hansen

Keyword(s):

Speech Recognition ◽

Acoustic Model ◽

Model Adaptation ◽

Gaussian Map ◽

Adverse Environments