Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition

Context: Automatic speech recognition requires language and acoustic models for each dialect it is meant to serve. The purpose of this research is to train an acoustic model, a statistical language model and a grammar language model for the Spanish language, specifically for the dialect of the city of San Jose de Cucuta, Colombia, for use in a command control system. Existing models for the Spanish language have problems recognizing the fundamental frequency, spectral content, accent, pronunciation and tone of Cucuta's dialect, or simply lack a suitable language model for it.

Method: In this project, we used a Raspberry Pi B+ embedded system running Raspbian, a Linux distribution, and two open-source software packages: the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University. Both packages are based on Hidden Markov Models for the calculation of voice parameters. In addition, we used 1913 audio recordings of speakers from San Jose de Cucuta and the Norte de Santander department. These recordings were used to train and test the automatic speech recognition system.

Results: We obtained a language model that consists of two files: the statistical language model (.lm) and the JSGF grammar model (.jsgf). For the acoustic component, two models were trained; the improved version achieved a 100% accuracy rate in the training results and an 83% accuracy rate in the audio tests for command recognition. Finally, we prepared a manual for creating acoustic and language models with CMU Sphinx.

Conclusions: The number of participants in the training of the language and acoustic models has a significant influence on the quality of the recognizer's voice processing. Using a large dictionary for training and a short dictionary containing only the command words for deployment is important for obtaining a better response from the automatic speech recognition system. Given the accuracy rate above 80% in the voice recognition tests, the proposed models are suitable for applications that assist people with visual or motor impairments.
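To make the idea of a statistical language model for a small command vocabulary concrete, the following is a minimal sketch of a bigram model in Python. It is not the authors' pipeline and does not call CMU Sphinx or the CMU-Cambridge toolkit: the command sentences are hypothetical, and add-one (Laplace) smoothing is an assumption chosen for brevity rather than the smoothing those tools apply.

```python
import math
from collections import Counter

# Hypothetical command sentences; the study's actual corpus is not reproduced here.
CORPUS = [
    "encender luz",
    "apagar luz",
    "encender ventilador",
    "apagar ventilador",
]


def train_bigram(sentences):
    """Count history unigrams and bigrams, padding sentences with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])                  # counts of each history word
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts of (history, word)
    # Words that can be predicted: every corpus word plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab


def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev) (an assumed smoothing choice)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))


def sentence_logprob(sentence, model):
    """Natural-log probability of a whole sentence under the bigram model."""
    unigrams, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log(bigram_prob(p, w, unigrams, bigrams, vocab))
        for p, w in zip(tokens[:-1], tokens[1:])
    )


model = train_bigram(CORPUS)
seen = sentence_logprob("encender luz", model)    # word order present in the corpus
unseen = sentence_logprob("luz encender", model)  # reversed order, never observed
print(seen > unseen)
```

The model assigns a higher score to the word order that occurs in the training sentences, which is the property a recognizer relies on when ranking competing hypotheses for a spoken command.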

A Tree-Based Statistical Language Model for Natural Language Speech Recognition

Readings in Speech Recognition ◽ 10.1016/b978-0-08-051584-7.50046-2 ◽ 1990 ◽ pp. 507-514 ◽ Cited By ~ 5

Author(s): Lalit R. Bahl ◽ Peter F. Brown ◽ Peter V. de Souza ◽ Robert L. Mercer

Keyword(s): Speech Recognition ◽ Natural Language ◽ Language Model ◽ Statistical Language Model

A tree-based statistical language model for natural language speech recognition

IEEE Transactions on Acoustics Speech and Signal Processing ◽ 10.1109/29.32278 ◽ 1989 ◽ Vol 37 (7) ◽ pp. 1001-1008 ◽ Cited By ~ 101

Author(s): L.R. Bahl ◽ P.F. Brown ◽ P.V. de Souza ◽ R.L. Mercer

Keyword(s): Speech Recognition ◽ Natural Language ◽ Language Model ◽ Statistical Language Model

Speech-Driven Text Retrieval: Using Target IR Collections for Statistical Language Model Adaptation in Speech Recognition

Information Retrieval Techniques for Speech Applications - Lecture Notes in Computer Science ◽ 10.1007/3-540-45637-6_9 ◽ 2002 ◽ pp. 94-104 ◽ Cited By ~ 7

Author(s): Atsushi Fujii ◽ Katunobu Itou ◽ Tetsuya Ishikawa

Keyword(s): Speech Recognition ◽ Language Model ◽ Text Retrieval ◽ Model Adaptation ◽ Statistical Language Model ◽ Language Model Adaptation

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

2021 IEEE Spoken Language Technology Workshop (SLT) ◽ 10.1109/slt48900.2021.9383515 ◽ 2021

Author(s): Zhong Meng ◽ Sarangarajan Parthasarathy ◽ Eric Sun ◽ Yashesh Gaur ◽ Naoyuki Kanda ◽ ...

Keyword(s): Speech Recognition ◽ Language Model ◽ Model Estimation ◽ End To End

Phonetic Variation Modeling and a Language Model Adaptation for Korean English Code-Switching Speech Recognition

Applied Sciences ◽ 10.3390/app11062866 ◽ 2021 ◽ Vol 11 (6) ◽ pp. 2866

Author(s): Damheo Lee ◽ Donghyun Kim ◽ Seung Yun ◽ Sanghun Kim

Keyword(s): Speech Recognition ◽ Language Model ◽ Reduction Rate ◽ Code Switching ◽ Training Data ◽ Target Domain ◽ Phonetic Variation ◽ Language Model Adaptation ◽ Imbalanced Training Data ◽ LM Adaptation

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, because Korean speakers pronounce English words with characteristic phonetic variations, we sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences that are semantically similar to the target domain and applied language model (LM) adaptation to correct the bias toward Korean caused by the imbalanced training data. In this experiment, the training data were AI Hub (1033 h) in Korean and Librispeech (960 h) in English. Compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were applied to the LM adaptation. When only English words were considered, the word correction rate improved by up to 24.2% over the baseline. The proposed method appears to be effective for CS speech recognition.
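As a reading aid for the reported percentages, the sketch below shows how a relative error reduction is commonly computed from a baseline error rate and an improved one. The formula is the usual definition and is assumed here; the abstract does not state the paper's exact definition of ERR, and the input values are hypothetical, not taken from the paper.

```python
def relative_error_reduction(baseline_error, proposed_error):
    """Return the relative reduction in error rate, as a percentage.

    Assumed definition: 100 * (baseline - proposed) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical error rates, chosen only to illustrate the arithmetic.
baseline = 20.0  # baseline error rate, in percent
proposed = 17.0  # error rate after adaptation, in percent

err = relative_error_reduction(baseline, proposed)
print(f"Relative error reduction: {err:.1f}%")  # prints "Relative error reduction: 15.0%"
```

Under this definition, a drop of 3 percentage points from a 20% baseline is a 15% relative reduction, which is why relative figures can look large even when the absolute change is small.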
