Open Source German Distant Speech Recognition: Corpus and Acoustic Model

In this paper, various acoustic and language modeling methodologies, as well as labeling methods for automatic speech recognition of spoken dialogues in emergency call centers, were investigated and comparatively analyzed. Because dialogue speech in call centers has a specific context and is recorded in noisy, emotional environments, available speech recognition systems perform poorly. Therefore, to recognize dialogue speech accurately, the main modules of speech recognition systems (language models and acoustic training methodologies), as well as symmetric data labeling approaches, were investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were identified using extrinsic and intrinsic evaluation methods. Lastly, our suggested data labeling approaches with spelling correction were compared with common labeling methods and outperformed them by a notable margin. Based on the experimental results, we determined that DNN/HMM for the acoustic model, a trigram model with Kneser–Ney discounting for the language model, and spelling correction applied to the training data before training as the labeling method are effective configurations for dialogue speech recognition in emergency call centers. This research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which contains call agents' speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Although the speech from the emergency call center is in Azerbaijani, a language of the Turkic group, our approaches are not tightly tied to language-specific features. Hence, we anticipate that the suggested approaches can be applied to other languages of the same group.
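The abstract names a trigram model with Kneser–Ney discounting but does not state the variant or the discount values. As a minimal sketch, assuming interpolated Kneser–Ney with a single absolute discount D = 0.75 (a commonly used default, not a value reported in the paper) and `<s>`/`</s>` sentence padding, the class below shows how the trigram estimate is interpolated with continuation-count estimates for bigrams and unigrams. The class and method names are hypothetical.

```python
from collections import Counter


class KneserNeyTrigramLM:
    """Interpolated Kneser-Ney trigram LM with one absolute discount (sketch)."""

    def __init__(self, sentences, discount=0.75):
        if not 0 < discount < 1:
            raise ValueError("discount must be in (0, 1)")
        self.d = discount
        # Raw trigram counts over padded sentences.
        self.tri = Counter()
        for sent in sentences:
            toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
            for i in range(2, len(toks)):
                self.tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
        if not self.tri:
            raise ValueError("no training data")

        self.ctx_count = Counter()  # c(u, v, .): total count of context (u, v)
        self.ctx_types = Counter()  # N1+(u, v, .): distinct words after (u, v)
        self.cont_bi = Counter()    # N1+(., v, w): distinct u preceding (v, w)
        for (u, v, w), c in self.tri.items():
            self.ctx_count[(u, v)] += c
            self.ctx_types[(u, v)] += 1
            self.cont_bi[(v, w)] += 1

        self.mid_total = Counter()  # N1+(., v, .): sum of N1+(., v, w) over w
        self.mid_types = Counter()  # distinct w with N1+(., v, w) > 0
        self.cont_uni = Counter()   # N1+(., w): distinct v preceding w
        for (v, w), n in self.cont_bi.items():
            self.mid_total[v] += n
            self.mid_types[v] += 1
            self.cont_uni[w] += 1
        self.num_bigram_types = len(self.cont_bi)
        self.vocab = set(self.cont_uni)  # every predictable word, incl. </s>

    def p_uni(self, w):
        # Continuation unigram: how many distinct histories w completes.
        return self.cont_uni[w] / self.num_bigram_types

    def p_bi(self, w, v):
        total = self.mid_total[v]
        if total == 0:  # history never seen: back off entirely
            return self.p_uni(w)
        first = max(self.cont_bi[(v, w)] - self.d, 0) / total
        backoff = self.d * self.mid_types[v] / total
        return first + backoff * self.p_uni(w)

    def prob(self, w, u, v):
        """P(w | u, v) with interpolated Kneser-Ney smoothing."""
        total = self.ctx_count[(u, v)]
        if total == 0:  # trigram context never seen: use the bigram level
            return self.p_bi(w, v)
        first = max(self.tri[(u, v, w)] - self.d, 0) / total
        backoff = self.d * self.ctx_types[(u, v)] / total
        return first + backoff * self.p_bi(w, v)
```

For example, after training on a few transcribed utterances, `lm.prob("police", "call", "the")` returns the smoothed probability of "police" following "call the". Production toolkits such as SRILM and KenLM typically implement modified Kneser–Ney, which uses separate discounts for n-grams seen once, twice, and three or more times; the single-discount version here keeps the recursion easy to follow.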

The Implementation of a Vocabulary and Grammar for an Open-Source Speech-Recognition Programming Platform

Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility - ASSETS '15

10.1145/2700648.2811346

2015

Author(s):

Jean K. Rodriguez-Cartagena

Andrea C. Claudio-Palacios

Natalia Pacheco-Tallaj

Valerie Santiago González

Patricia Ordonez-Franco

Keyword(s):

Speech Recognition

Open Source

Gaussian map based acoustic model adaptation using untranscribed data for speech recognition in severely adverse environments

10.21437/interspeech.2012-481

2012

Author(s):

Wooil Kim

John H. L. Hansen

Keyword(s):

Speech Recognition

Acoustic Model

Model Adaptation

Gaussian Map

Adverse Environments

Joint training of speech separation, filterbank and acoustic model for robust automatic speech recognition

10.21437/interspeech.2015-597

2015

Author(s):

Zhong-Qiu Wang

DeLiang Wang

Keyword(s):

Speech Recognition

Automatic Speech Recognition

Acoustic Model

Speech Separation

Joint Training

On the jointly unsupervised feature vector normalization and acoustic model compensation for robust speech recognition

10.21437/interspeech.2007-100

2007

Author(s):

Luis Buera

Antonio Miguel

Eduardo Lleida

Óscar Saz

Alfonso Ortega

Keyword(s):

Speech Recognition

Feature Vector

Robust Speech Recognition

Acoustic Model

A language model for Amdo Tibetan speech recognition

MATEC Web of Conferences

10.1051/matecconf/202133606016

2021

Vol 336

pp. 06016

Author(s):

Taiben Suan

Rangzhuoma Cai

Zhijie Cai

Ba Zu

Baojia Gong

Keyword(s):

Speech Recognition

Network Architecture

Language Model

Acoustic Model

End To End

We built a language model based on the Transformer network architecture, which relies on attention mechanisms and dispenses with recurrence and convolutions entirely. By transliterating Tibetan into the International Phonetic Alphabet (IPA), we trained the language model with the syllables and phonemes of Tibetan words as modeling units, so that it predicts the corresponding Tibetan sentences from the contextual semantics of the IPA sequence. Combined with an acoustic model, the resulting Tibetan speech recognition system was compared with end-to-end Tibetan speech recognition.
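The abstract states only that the language model is a Transformer that relies on attention rather than recurrence or convolution. As an illustration, assuming a single attention head with no learned projections, positional encodings, or layer stacking, the function below computes scaled dot-product self-attention with a causal mask, so that the output at each syllable or phoneme position depends only on that position and the ones before it, as left-to-right sentence prediction requires. The function name is hypothetical, and this is not the authors' model.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask (sketch).

    q, k, v: lists of vectors, one per position (e.g. per IPA unit).
    Position i attends only to positions 0..i.
    """
    d_k = len(k[0])
    out = []
    for i, qi in enumerate(q):
        # Scores against the current and earlier positions only (causal mask).
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the attention-weighted sum of the visible value vectors.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out
```

Because the loop only ever reads `k[0..i]` and `v[0..i]`, changing a later position's vectors leaves every earlier output unchanged; this is the property that lets such a model be trained to predict each unit from its left context.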

A De Novo Divide-and-Merge Paradigm for Acoustic Model Optimization in Automatic Speech Recognition

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2020/513

2020

Author(s):

Conghui Tan

Di Jiang

Jinhua Peng

Xueyang Wu

Qian Xu

...

Keyword(s):

Speech Recognition

Automatic Speech Recognition

De Novo

Superior Performance

Acoustic Model

Acoustic Models

Public Data

Speech Data

Low Efficiency

Novel Algorithms

Due to rising awareness of privacy protection and the voluminous scale of speech data, it is becoming infeasible for Automatic Speech Recognition (ASR) system developers to train the acoustic model on the complete data as before. In this paper, we propose a novel Divide-and-Merge paradigm to address salient problems plaguing the ASR field. In the Divide phase, multiple acoustic models are trained on different subsets of the complete speech data, while in the Merge phase two novel algorithms are used to generate a high-quality acoustic model from those trained on the data subsets. We first propose the Genetic Merge Algorithm (GMA), which is highly specialized for optimizing acoustic models but suffers from low efficiency. We further propose the SGD-Based Optimizational Merge Algorithm (SOMA), which effectively alleviates the efficiency bottleneck of GMA while maintaining superior performance. Extensive experiments on public data show that the proposed methods can significantly outperform state-of-the-art methods.
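The abstract describes the Divide-and-Merge pipeline only at a high level, and GMA and SOMA are the authors' own algorithms, which are not reproduced here. The sketch below only illustrates the shape of the pipeline under loudly stated assumptions: a toy least-squares line fit stands in for acoustic-model training, and the Merge step is plain size-weighted parameter averaging, a naive baseline swapped in for illustration rather than either proposed algorithm. All function names are hypothetical.

```python
def divide(samples, num_parts):
    """Divide phase: split the data into disjoint subsets (round-robin)."""
    return [samples[i::num_parts] for i in range(num_parts)]


def train_toy_model(subset):
    """Hypothetical stand-in for acoustic-model training.

    Fits y = a * x + b by ordinary least squares on one data subset.
    """
    n = len(subset)
    mx = sum(x for x, _ in subset) / n
    my = sum(y for _, y in subset) / n
    sxx = sum((x - mx) ** 2 for x, _ in subset)
    sxy = sum((x - mx) * (y - my) for x, y in subset)
    a = sxy / sxx
    return {"a": a, "b": my - a * mx}


def merge_by_averaging(models, sizes):
    """Merge phase, naive baseline (NOT GMA or SOMA).

    Averages each named parameter, weighting by subset size.
    """
    total = sum(sizes)
    return {
        name: sum(m[name] * s for m, s in zip(models, sizes)) / total
        for name in models[0]
    }
```

A run looks like `parts = divide(samples, 3)`, then `models = [train_toy_model(p) for p in parts]`, then `merge_by_averaging(models, [len(p) for p in parts])`. Averaging recovers the true line in this toy case because every subset yields the same fit; parameters of real acoustic models trained on different subsets need not be directly comparable, so naive averaging offers no such guarantee.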
