A language model for Amdo Tibetan speech recognition

MATEC Web of Conferences ◽

10.1051/matecconf/202133606016 ◽

2021 ◽

Vol 336 ◽

pp. 06016

Author(s):

Taiben Suan ◽

Rangzhuoma Cai ◽

Zhijie Cai ◽

Ba Zu ◽

Baojia Gong

Keyword(s):

Speech Recognition ◽

Network Architecture ◽

Language Model ◽

Acoustic Model ◽

End To End

We built a language model which is based on Transformer network architecture, used attention mechanisms to dispensing with recurrence and convalutions entirely. Through the transliteration of Tibetan to International Phonetic Alphabets, the language model was trained using the syllables and phonemes of the Tibetan word as modeling units to predict corresponding Tibetan sentences according to the context semantics of IPA. And it combined with the acoustic model as the Tibetan speech recognition was compared with end-to-end Tibetan speech recognition.

Download Full-text

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

2021 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt48900.2021.9383515 ◽

2021 ◽

Author(s):

Zhong Meng ◽

Sarangarajan Parthasarathy ◽

Eric Sun ◽

Yashesh Gaur ◽

Naoyuki Kanda ◽

...

Keyword(s):

Speech Recognition ◽

Language Model ◽

Model Estimation ◽

End To End

Download Full-text

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2019.8683602 ◽

2019 ◽

Cited By ~ 5

Author(s):

Alexander H. Liu ◽

Hung-yi Lee ◽

Lin-shan Lee

Keyword(s):

Speech Recognition ◽

Language Model ◽

Adversarial Training ◽

End To End

Download Full-text

Improved mixed language speech recognition using asymmetric acoustic model and language model with code-switch inversion constraints

2013 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2013.6639094 ◽

2013 ◽

Cited By ~ 5

Author(s):

Ying Li ◽

Pascale Fung

Keyword(s):

Speech Recognition ◽

Language Model ◽

Acoustic Model ◽

Mixed Language

Download Full-text

Research on the Language Model according to the Recognition Unit for End-to-End Speech Recognition

KIISE Transactions on Computing Practices ◽

10.5626/ktcp.2021.27.6.255 ◽

2021 ◽

Vol 27 (6) ◽

pp. 255-262

Author(s):

Hyungbae Jeon ◽

Byung Ok Kang ◽

Hoon Chung ◽

Yoo Rhee Oh ◽

Yun Kyung Lee ◽

...

Keyword(s):

Speech Recognition ◽

Language Model ◽

End To End

Download Full-text

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

10.21437/interspeech.2021-2075 ◽

2021 ◽

Author(s):

Zhong Meng ◽

Yu Wu ◽

Naoyuki Kanda ◽

Liang Lu ◽

Xie Chen ◽

...

Keyword(s):

Speech Recognition ◽

Error Rate ◽

Language Model ◽

Word Error Rate ◽

Model Fusion ◽

End To End

Download Full-text

Building Acoustic and Language Model for Continuous Speech Recognition in Bahasa Indonesia

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v6i2.2684 ◽

2020 ◽

Vol 6 (2) ◽

Author(s):

Vincent Elbert Budiman ◽

Andreas Widjaja

Keyword(s):

Speech Recognition ◽

Error Rate ◽

Language Model ◽

Beam Width ◽

Acoustic Model ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Word Error Rate ◽

Testing Data ◽

Bahasa Indonesia

Here a development of an Acoustic and Language Model is presented. Low Word Error Rate is an early good sign of a good Language and Acoustic Model. Although there are still parameters other than Words Error Rate, our work focused on building Bahasa Indonesia with approximately 2000 common words and achieved the minimum threshold of 25% Word Error Rate. There were several experiments consist of different cases, training data, and testing data with Word Error Rate and Testing Ratio as the main comparison. The language and acoustic model were built using Sphinx4 from Carnegie Mellon University using Hidden Markov Model for the acoustic model and ARPA Model for the language model. The models configurations, which are Beam Width and Force Alignment, directly correlates with Word Error Rate. The configurations were set to 1e-80 for Beam Width and 1e-60 for Force Alignment to prevent underfitting or overfitting of the acoustic model. The goals of this research are to build continuous speech recognition in Bahasa Indonesia which has low Word Error Rate and to determine the optimum numbers of training and testing data which minimize the Word Error Rate.

Download Full-text

Development of Artificial Intelligence and Prospects for Its Application

Economics and Management ◽

10.35854/1998-1627-2021-2-132-138 ◽

2021 ◽

Vol 27 (2) ◽

pp. 132-138

Author(s):

V. Ya. Dmitriev ◽

T. A. Ignat'eva ◽

V. P. Pilyavskiy

Keyword(s):

Artificial Intelligence ◽

Speech Recognition ◽

Information And Communication Technologies ◽

Language Model ◽

Communication Technologies ◽

Conceptual Apparatus ◽

Information And Communication ◽

End To End ◽

Definition Of ◽

The Impact

Aim. To analyze the concept of “artificial intelligence”, to justify the effectiveness of using artificial intelligence technologies.Tasks. To study the conceptual apparatus; to propose and justify the author’s definition of the “artificial intelligence” concept; to describe the technology of speech recognition using artificial intelligence.Methodology. The authors used such general scientific methods of cognition as comparison, deduction and induction, analysis, generalization and systematization.Results. Based on a comparative analysis of the existing conceptual apparatus, it is concluded that there is no single concept of “artificial intelligence”. Each author puts his own vision into it. In this regard, the author’s definition of the “artificial intelligence” concept is formulated. It is determined that an important area of applying artificial intelligence technologies in various fields of activity is speech recognition technology. It is shown that the first commercially successful speech recognition prototypes appeared already by the 1990s, and since the beginning of the 21st century. The great interest in “end-to-end” automatic speech recognition has become obvious. While traditional phonetic approaches have requested pronunciation, acoustic, and language model data, end-to-end models simultaneously consider all components of speech recognition, thereby facilitating the stages of self-learning and development. It is established that a significant increase in the” mental “ capabilities of computer technology and the development of new algorithms have led to new achievements in this direction. These advances are driven by the growing demand for speech recognition.Conclusions. According to the authors, artificial intelligence is a complex of computer programs that duplicate the functions of the human brain, opening up the possibility of informal learning based on big data processing, allowing to solve the problems of pattern recognition (text, image, speech) and the formation of management decisions. Currently, the active development of information and communication technologies and artificial intelligence concepts has led to a wide practical application of intelligent technologies, especially in control systems. The impact of these systems can be found in the work of mobile phones and expert systems, in forecasting and other areas. Among the obstacles to the development of this technology is the lack of accuracy in speech and voice recognition systems in the conditions of sound interference, which is always present in the external environment. However, the recent advances overcome this disadvantage.

Download Full-text

Measuring information provided by language model and acoustic model in probabilistic speech recognition: Theory and experimental results

Speech Communication ◽

10.1016/0167-6393(90)90028-8 ◽

1990 ◽

Vol 9 (5-6) ◽

pp. 531-539 ◽

Cited By ~ 2

Author(s):

Marco Ferretti ◽

Giulio Maltese ◽

Stefano Scarci

Keyword(s):

Speech Recognition ◽

Language Model ◽

Experimental Results ◽

Acoustic Model ◽

Recognition Theory

Download Full-text

A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) ◽

10.1109/asru46091.2019.9003790 ◽

2019 ◽

Author(s):

Erik McDermott ◽

Hasim Sak ◽

Ehsan Variani

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Language Model ◽

Density Ratio ◽

Model Fusion ◽

Ratio Approach ◽

End To End

Download Full-text

Encoder-decoder models for recognition of Russian speech

Information and Control Systems ◽

10.31799/1684-8853-2019-4-45-53 ◽

2019 ◽

pp. 45-53

Author(s):

Nikita Markovnikov ◽

Irina Kipyatkova

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Recognition Accuracy ◽

Language Model ◽

Hybrid Models ◽

Attention Mechanism ◽

Russian Language ◽

End To End ◽

The Russian Language ◽

Decoding Speed

Problem: Classical systems of automatic speech recognition are traditionally built using an acoustic model based on hidden Markovmodels and a statistical language model. Such systems demonstrate high recognition accuracy, but consist of several independentcomplex parts, which can cause problems when building models. Recently, an end-to-end recognition method has been spread, usingdeep artificial neural networks. This approach makes it easy to implement models using just one neural network. End-to-end modelsoften demonstrate better performance in terms of speed and accuracy of speech recognition. Purpose: Implementation of end-toendmodels for the recognition of continuous Russian speech, their adjustment and comparison with hybrid base models in terms ofrecognition accuracy and computational characteristics, such as the speed of learning and decoding. Methods: Creating an encoderdecodermodel of speech recognition using an attention mechanism; applying techniques of stabilization and regularization of neuralnetworks; augmentation of data for training; using parts of words as an output of a neural network. Results: An encoder-decodermodel was obtained using an attention mechanism for recognizing continuous Russian speech without extracting features or usinga language model. As elements of the output sequence, we used parts of words from the training set. The resulting model could notsurpass the basic hybrid models, but surpassed the other baseline end-to-end models, both in recognition accuracy and in decoding/learning speed. The word recognition error was 24.17% and the decoding speed was 0.3 of the real time, which is 6% faster than thebaseline end-to-end model and 46% faster than the basic hybrid model. We showed that end-to-end models could work without languagemodels for the Russian language, while demonstrating a higher decoding speed than hybrid models. The resulting model was trained onraw data without extracting any features. We found that for the Russian language the hybrid type of an attention mechanism gives thebest result compared to location-based or context-based attention mechanisms. Practical relevance: The resulting models require lessmemory and less speech decoding time than the traditional hybrid models. That fact can allow them to be used locally on mobile deviceswithout using calculations on remote servers.

Download Full-text