Dynamic out-of-vocabulary word registration to language model for speech recognition

Author(s):  
Norihide Kitaoka ◽  
Bohan Chen ◽  
Yuya Obashi

Abstract: We propose a method of dynamically registering out-of-vocabulary (OOV) words by assigning the pronunciations of these words to pre-inserted OOV tokens, i.e., by editing the pronunciations of those tokens. To do this, when training the language model (LM) for speech recognition, we add OOV tokens to an additional, partial copy of our corpus, either at random positions or at selected part-of-speech (POS) tags in the chosen utterances. This yields an LM containing OOV tokens to which we can assign pronunciations. We also investigate the impact of acoustic complexity and of the “natural” occurrence frequency of OOV words on the recognition of registered OOV words. The proposed OOV word registration method is evaluated using two modern automatic speech recognition (ASR) systems, Julius and Kaldi, with DNN-HMM acoustic models and N-gram language models (plus an additional evaluation using RNN rescoring with Kaldi). Our experimental results show that, with the proposed registration method, modern ASR systems can recognize OOV words without retraining the language model; that the acoustic complexity of OOV words affects their recognition; and that differences between the “natural” and the assigned occurrence frequencies of OOV words have little impact on the final recognition results.
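The corpus-augmentation step described above can be sketched as follows. This is a minimal illustration of the random-replacement variant only; the function name, parameters, and `<OOV_k>` token format are illustrative assumptions, not taken from the paper.

```python
import random

def insert_oov_tokens(utterances, num_tokens=10, fraction=0.2, seed=0):
    """Build a partial copy of the corpus in which one word per selected
    utterance is replaced by a placeholder token <OOV_0> ... <OOV_{n-1}>.
    Appending this copy to the LM training text leaves the trained n-gram
    model containing the placeholder tokens, whose pronunciations can later
    be edited to register new words without retraining the LM.
    NOTE: illustrative sketch; names and defaults are assumptions."""
    rng = random.Random(seed)
    k = max(1, int(fraction * len(utterances)))
    selected = rng.sample(range(len(utterances)), k)
    augmented = []
    for i in selected:
        words = utterances[i].split()
        j = rng.randrange(len(words))            # position to overwrite
        token = f"<OOV_{rng.randrange(num_tokens)}>"
        augmented.append(" ".join(words[:j] + [token] + words[j + 1:]))
    return utterances + augmented                # original corpus + partial copy
```

The POS-based variant would differ only in how the replacement position `j` is chosen, targeting words with a selected part-of-speech tag instead of a random position.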

2016 ◽  
Vol 25 (02) ◽  
pp. 1650006
Author(s):  
Aleksander Smywinski-Pohl ◽  
Bartosz Ziółko

In this paper we investigate the usefulness of morphosyntactic information, as well as clustering, in modeling Polish for automatic speech recognition. Polish is an inflectional language, so we examine an N-gram model based on morphosyntactic features. We present how individual types of features influence the model and which types are best suited for building a language model for automatic speech recognition. We compare these results with those of a class-based model automatically derived from the training corpus, and show that our clustering approach performs significantly better than the frequently used SRILM clustering method, although the difference is apparent only for smaller corpora.
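The class-based factorization underlying such models can be sketched in a few lines: the bigram probability is approximated as P(w | w_prev) ≈ P(c(w) | c(w_prev)) · P(w | c(w)), so statistics are shared across all words in a class, which mitigates the sparsity caused by rich inflection. The code below is a toy unsmoothed illustration of that factorization, not the paper's model; the word-to-class mapping is supplied by hand here, whereas the paper derives it from morphosyntactic features or automatic clustering.

```python
from collections import defaultdict

def train_class_bigram(sentences, word2class):
    """Toy class-based bigram LM: P(w | w_prev) ~ P(c(w) | c(w_prev)) * P(w | c(w)).
    Illustrative sketch only; no smoothing, counts from whitespace-split text."""
    class_bigram = defaultdict(lambda: defaultdict(int))  # c_prev -> c -> count
    emission = defaultdict(lambda: defaultdict(int))      # c -> w -> count
    for sent in sentences:
        words = sent.split()
        for w in words:
            emission[word2class[w]][w] += 1
        for prev, cur in zip(words, words[1:]):
            class_bigram[word2class[prev]][word2class[cur]] += 1

    def prob(w, w_prev):
        c, c_prev = word2class[w], word2class[w_prev]
        trans = class_bigram[c_prev]
        p_class = trans[c] / sum(trans.values()) if trans else 0.0
        emit = emission[c]
        p_word = emit[w] / sum(emit.values()) if emit else 0.0
        return p_class * p_word

    return prob
```

Because the transition statistics live on classes rather than individual word forms, an inflected form seen rarely (or never) after a given word still receives useful probability mass from its class.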


Author(s):  
ROMAN BERTOLAMI ◽  
HORST BUNKE

Current multiple classifier systems for unconstrained handwritten text recognition do not provide a straightforward way to utilize language model information. In this paper, we describe a generic method to integrate a statistical n-gram language model into the combination of multiple offline handwritten text line recognizers. The proposed method first builds a word transition network and then rescores this network with an n-gram language model. Experimental evaluation conducted on a large dataset of offline handwritten text lines shows that the proposed approach improves the recognition accuracy over a reference system as well as over the original combination method that does not include a language model.
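The rescoring step can be illustrated with a small sketch: each position in the word transition network holds the candidate words (with combined recognizer scores) proposed by the individual recognizers, and every full path is rescored by adding a weighted n-gram LM log-probability. This brute-force path enumeration is a toy stand-in for lattice rescoring; the function names, the bigram-only LM, and the weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import math
from itertools import product

def rescore_network(network, lm_logprob, lm_weight=0.5):
    """Pick the best word sequence from a word transition network.
    `network` is a list of positions; each position is a list of
    (word, recognizer_log_score) candidates. Each complete path is
    rescored with a weighted bigram LM log-probability, letting the LM
    override decisions made by recognizer-score combination alone.
    NOTE: exhaustive enumeration is for illustration; real systems
    rescore the lattice with dynamic programming."""
    best_path, best_score = None, -math.inf
    for path in product(*network):
        words = [w for w, _ in path]
        score = sum(s for _, s in path)
        for prev, cur in zip(["<s>"] + words, words):
            score += lm_weight * lm_logprob(prev, cur)
        if score > best_score:
            best_path, best_score = words, score
    return best_path, best_score
```

With a language model attached, an acoustically plausible but linguistically unlikely candidate (e.g. "cot" after "the" in running text) can lose to a candidate the LM prefers.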


Ingeniería ◽  
2017 ◽  
Vol 22 (3) ◽  
pp. 362 ◽  
Author(s):  
Juan David Celis Nuñez ◽  
Rodrigo Andres Llanos Castro ◽  
Byron Medina Delgado ◽  
Sergio Basilio Sepúlveda Mora ◽  
Sergio Alexander Castro Casadiego

Context: Automatic speech recognition requires the development of language and acoustic models for the different existing dialects. The purpose of this research is the training of an acoustic model, a statistical language model, and a grammar language model for Spanish, specifically for the dialect of the city of San Jose de Cucuta, Colombia, for use in a command control system. Existing models for Spanish have problems recognizing the fundamental frequency and spectral content, the accent, the pronunciation, and the tone of Cucuta's dialect, or simply lack a language model for it.

Method: We used a Raspberry Pi B+ embedded system running the Raspbian operating system (a Linux distribution) and two open-source packages: the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University; this software is based on Hidden Markov Models for the calculation of voice parameters. In addition, we used 1913 recorded audio samples with the voices of people from San Jose de Cucuta and the Norte de Santander department, for training and testing the automatic speech recognition system.

Results: We obtained a language model consisting of two files: the statistical language model (.lm) and the JSGF grammar model (.jsgf). On the acoustic side, two models were trained, one of them an improved version that achieved a 100 % accuracy rate on the training data and an 83 % accuracy rate in the audio tests for command recognition. Finally, we wrote a manual for the creation of acoustic and language models with the CMU Sphinx software.

Conclusions: The number of participants in the training process of the language and acoustic models significantly influences the quality of the recognizer's voice processing. Using a large dictionary for the training process and a short dictionary containing only the command words for the implementation is important for a better response from the automatic speech recognition system. Given the accuracy rate above 80 % in the voice recognition tests, the proposed models are suitable for applications oriented to the assistance of people with visual or motion impairments.
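The command-recognition accuracy reported above is, in essence, the fraction of test utterances whose recognized command exactly matches the reference command. A minimal sketch of that metric, assuming exact-match scoring (the function name and the exact scoring rule are assumptions, not taken from the article):

```python
def command_accuracy(references, hypotheses):
    """Fraction of test utterances whose recognized command exactly matches
    the reference command. Illustrative sketch of the kind of metric behind
    a reported test accuracy such as 83 %; assumes exact-match scoring."""
    if len(references) != len(hypotheses):
        raise ValueError("mismatched test set sizes")
    correct = sum(r == h for r, h in zip(references, hypotheses))
    return correct / len(references)
```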

