High-performance robust speech recognition using stereo training data

Automatic Speech Recognition (ASR) systems automatically convert human speech into the corresponding transcription. They have a wide range of applications, such as robot control, call center analytics, and voice chatbots. Recent studies on English ASR have reported performance that surpasses human ability. These systems were trained on large amounts of data and performed well in many environments. For Vietnamese, there have been many studies on improving the performance of existing ASR systems; however, many of them were conducted on small-scale data, which does not reflect realistic scenarios. Although the corpora used to train these systems were carefully designed to maintain phonetic balance, efforts to collect them at a large scale remain limited. Specifically, only one particular Vietnamese accent was evaluated in existing works. In this paper, we first describe our efforts in collecting a large data set that covers all three major accents of Vietnam, from the Northern, Central, and Southern regions. Then, we detail our ASR system development procedure, which uses the collected data set and evaluates different model architectures to find the best structure for Vietnamese. In the VLSP 2018 challenge, our system achieved the best performance with a word error rate (WER) of 6.5%; on our internal test set of more than 10 hours of speech collected in real environments, the system also performs well, with 11% WER.
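The abstract above reports results as word error rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not the authors' scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize text (case, punctuation) first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" counts one substitution and one deletion over four reference words. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.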

Exploring Low-Dimensional Structures of Modulation Spectra for Robust Speech Recognition

10.21437/interspeech.2017-611 ◽ 2017

Author(s): Bi-Cheng Yan ◽ Chin-Hong Shih ◽ Shih-Hung Liu ◽ Berlin Chen

Keyword(s): Speech Recognition ◽ Robust Speech Recognition ◽ Low Dimensional

Toward Robust Speech Recognition and Understanding

The Journal of VLSI Signal Processing Systems for Signal Image and Video Technology ◽ 10.1007/s11265-005-4149-x ◽ 2005 ◽ Vol 41 (3) ◽ pp. 245-254 ◽ Cited By ~ 2

Author(s): Sadaoki Furui

Keyword(s): Speech Recognition ◽ Robust Speech Recognition

Deep bidirectional neural networks for robust speech recognition under heavy background noise

Materials Today Proceedings ◽ 10.1016/j.matpr.2021.02.640 ◽ 2021

Author(s): Jeevan Reddy Koya ◽ S.P. Venu Madhava Rao

Keyword(s): Neural Networks ◽ Speech Recognition ◽ Background Noise ◽ Robust Speech Recognition

Phonetic Variation Modeling and a Language Model Adaptation for Korean English Code-Switching Speech Recognition

Applied Sciences ◽ 10.3390/app11062866 ◽ 2021 ◽ Vol 11 (6) ◽ pp. 2866

Author(s): Damheo Lee ◽ Donghyun Kim ◽ Seung Yun ◽ Sanghun Kim

Keyword(s): Speech Recognition ◽ Language Model ◽ Reduction Rate ◽ Code Switching ◽ Training Data ◽ Target Domain ◽ Phonetic Variation ◽ Language Model Adaptation ◽ Imbalanced Training Data ◽ LM Adaptation

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variations in English pronunciation spoken by Korean speakers should be considered. Thus, we tried to find a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted the CS sentences semantically similar to the target domain and then applied the language model (LM) adaptation to solve the biased modeling toward Korean due to the imbalanced training data. In this experiment, training data were AI Hub (1033 h) in Korean and Librispeech (960 h) in English. As a result, when compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were applied to the LM adaptation. If we considered only English words, the word correction rate improved up to 24.2% compared to that of the baseline. The proposed method seems to be very effective in CS speech recognition.

Estimating the phase volume fraction of multi-phase steel via unsupervised deep learning

Scientific Reports ◽ 10.1038/s41598-021-85407-y ◽ 2021 ◽ Vol 11 (1)

Author(s): Sung Wook Kim ◽ Seong-Hoon Kang ◽ Se-Jong Kim ◽ Seungchul Lee

Keyword(s): High Performance ◽ Materials Science ◽ Volume Fraction ◽ Training Data ◽ Phase Fraction ◽ Phase Volume ◽ Generative Adversarial Network ◽ Phase Volume Fraction ◽ Original Dataset ◽ Multi Phase

Advanced high strength steel (AHSS) is a steel with a multi-phase microstructure that is processed under several conditions to meet current high-performance requirements from industry. Deep neural networks (DNNs) have emerged as a promising tool in materials science for estimating the phase volume fraction of these steels. Despite their advantages, one major drawback is that they require a sufficient amount of correctly labeled training data. This is a challenge in many areas where obtaining and labeling data is extremely labor-intensive. To overcome this challenge, an unsupervised way of training a DNN, which does not require any manual labeling, is proposed. An information maximizing generative adversarial network (InfoGAN) is used to learn the underlying probability distribution of each phase and generate realistic sample points with class labels. The generated data is then used to train an MLP classifier, which in turn predicts the labels for the original dataset. The results show a mean relative error of at most 4.53% and as low as 0.73%, which implies that the estimated phase fraction closely matches the true phase fraction. This indicates that the proposed methodology is highly feasible for fast and precise estimation of phase volume fraction in both industry and academia.

Performance evaluation of front-end algorithms for robust speech recognition

Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, 2005. ◽ 10.1109/isspa.2005.1581037 ◽ 2006 ◽ Cited By ~ 4

Author(s): O. Cheng ◽ W. Abdulla ◽ Z. Salcic

Keyword(s): Performance Evaluation ◽ Speech Recognition ◽ Robust Speech Recognition ◽ Front End

Analysis of CFA-BF: Novel combined fixed/adaptive beamforming for robust speech recognition in real car environments

Speech Communication ◽ 10.1016/j.specom.2009.09.001 ◽ 2010 ◽ Vol 52 (2) ◽ pp. 134-149 ◽ Cited By ~ 6

Author(s): John H.L. Hansen ◽ Xianxian Zhang

Keyword(s): Speech Recognition ◽ Adaptive Beamforming ◽ Robust Speech Recognition

Limited Training Data Robust Speech Recognition Using Kernel-Based Acoustic Models

Investigating the impact of the training data volume for robust speech recognition using multi-task learning

Development of High-Performance and Large-Scale Vietnamese Automatic Speech Recognition Systems
