Location-Based End-to-End Speech Recognition with Multiple Language Models

Author(s):  
Zhijie Lin ◽  
Kaiyang Lin ◽  
Shiling Chen ◽  
Linlin Li ◽  
Zhou Zhao

End-to-End deep learning approaches for Automatic Speech Recognition (ASR) have become a new trend. In these approaches, the language model, now active in many areas, can be considered an important and effective method for semantic error correction. Many existing systems use a single language model. In this paper, however, multiple language models (LMs) are applied during decoding. One LM is used for selecting appropriate candidates, and the others, considering both context and grammar, make the final decision. Experiments on a general location-based dataset show the effectiveness of our method.
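The two-stage use of multiple LMs described above can be sketched as a rerank-then-decide scheme. The following is a minimal illustrative sketch, not the authors' implementation: the LMs here are toy scoring callables, and the shortlist size and weights are assumptions.

```python
# Hypothetical sketch: rerank ASR hypotheses with multiple language models.
# Each "LM" here is a toy callable returning a score; in a real system these
# would be trained selector, context, and grammar models.

def rerank(hypotheses, selector_lm, decision_lms, weights):
    """Keep the top candidates by the selector LM, then pick the final
    answer by a weighted sum of the remaining LMs' scores."""
    # Stage 1: the selector LM narrows the candidate list.
    shortlist = sorted(hypotheses, key=selector_lm, reverse=True)[:3]

    # Stage 2: context/grammar LMs jointly decide among the shortlist.
    def combined(h):
        return sum(w * lm(h) for w, lm in zip(weights, decision_lms))

    return max(shortlist, key=combined)

# Toy scoring functions (scores are illustrative only).
hyps = ["turn left at main street",
        "turn lift at main street",
        "torn left at main street"]
selector = lambda h: -h.count("lift") - h.count("torn")   # crude acoustic proxy
context_lm = lambda h: 1.0 if "left" in h else 0.0        # context plausibility
grammar_lm = lambda h: 1.0 if h.startswith("turn") else 0.0  # grammaticality
best = rerank(hyps, selector, [context_lm, grammar_lm], [0.5, 0.5])
# best -> "turn left at main street"
```

The split of responsibilities (one LM filters, the rest vote) mirrors the paper's description; the linear score combination is one common way to merge multiple LM judgments.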

Author(s):  
Deepang Raval ◽  
Vyom Pathak ◽  
Muktan Patel ◽  
Brijesh Bhatt

We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that includes Convolutional Neural Network, Bidirectional Long Short-Term Memory layers, Dense layers, and Connectionist Temporal Classification as a loss function. To improve the performance of the system with a limited dataset, we present a prefix decoding technique based on a combined language model (a word-level and a character-level language model) and a post-processing technique based on Bidirectional Encoder Representations from Transformers. To gain key insights into our Automatic Speech Recognition (ASR) system, we used the system's inferences and proposed different analysis methods. These insights help us understand and improve the ASR system, and provide intuition into the language used for the ASR system. We trained the model on the Microsoft Speech Corpus and observe a 5.87% decrease in Word Error Rate (WER) with respect to the base-model WER.
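The combined-language-model scoring used during prefix decoding can be illustrated as a log-linear interpolation of a word-level and a character-level LM score. This is a toy sketch under stated assumptions, not the paper's system: the unigram models, the tiny corpus, and the weights `alpha`/`beta` are all illustrative.

```python
import math

def combined_lm_score(prefix, word_lm, char_lm, alpha=0.7, beta=0.3):
    """Log-linear interpolation of a word-level and a character-level LM
    score for a decoded prefix (weights alpha/beta are illustrative)."""
    return alpha * word_lm(prefix) + beta * char_lm(prefix)

# Toy models: unigram word LM and character-frequency LM over a tiny corpus.
corpus = "the cat sat on the mat"
words = corpus.split()
word_probs = {w: words.count(w) / len(words) for w in set(words)}

def word_lm(prefix):
    # Sum of log unigram probabilities; unseen words get a small floor.
    return sum(math.log(word_probs.get(w, 1e-6)) for w in prefix.split())

chars = corpus.replace(" ", "")
char_probs = {c: chars.count(c) / len(chars) for c in set(chars)}

def char_lm(prefix):
    # Character-level score smooths over words the word LM has never seen.
    return sum(math.log(char_probs.get(c, 1e-6)) for c in prefix.replace(" ", ""))

# A prefix made of in-corpus material should outscore an out-of-corpus one.
seen, unseen = "the cat", "xyz qqq"
```

In a real prefix beam search, such a combined score would be added to the CTC path score at each extension step; the character-level term helps rank prefixes whose words are not yet complete or are out of vocabulary.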


Symmetry ◽  
2019 ◽  
Vol 11 (8) ◽  
pp. 1018 ◽  
Author(s):  
Dong Wang ◽  
Xiaodong Wang ◽  
Shaohe Lv

Automatic speech recognition, especially large-vocabulary continuous speech recognition, is an important issue in the field of machine learning. For a long time, the hidden Markov model (HMM)-Gaussian mixture model (GMM) framework has been the mainstream for speech recognition. Recently, however, the HMM-deep neural network (DNN) model and the end-to-end model using deep learning have achieved performance beyond HMM-GMM. Both using deep learning techniques,


Author(s):  
Rishabh Nevatia

Abstract: Lip reading is the visual task of interpreting phrases from lip movements. While speech is one of the most common ways of communicating among individuals, understanding what a person wants to convey with access only to their lip movements remains, to date, an unsolved task. Various stages are involved in the process of automated lip reading, ranging from feature extraction to applying neural networks. This paper covers the various deep learning approaches used for lip reading.
Keywords: Automatic Speech Recognition, Lip Reading, Neural Networks, Feature Extraction, Deep Learning


2021 ◽  
pp. 1-13
Author(s):  
Hamzah A. Alsayadi ◽  
Abdelaziz A. Abdelhamid ◽  
Islam Hegazy ◽  
Zaki T. Fayed

The Arabic language has a set of sound marks called diacritics; these diacritics play an essential role in the meaning of words and their articulation. A change in some diacritics leads to a change in the context of the sentence. However, the existence of these marks in the corpus transcription affects the accuracy of speech recognition. In this paper, we investigate the effect of diacritics on Arabic speech recognition based on end-to-end deep learning. The applied end-to-end approach includes CNN-LSTM and an attention-based technique presented in the state-of-the-art framework Espresso, using PyTorch. In addition, to the best of our knowledge, the CNN-LSTM with attention-based approach has not been used in the task of Arabic Automatic Speech Recognition (ASR). To fill this gap, this paper proposes a new approach based on a CNN-LSTM with attention-based method for Arabic ASR. The language model in this approach is trained using RNN-LM and LSTM-LM on the non-diacritized transcription of the speech corpus. The Standard Arabic Single Speaker Corpus (SASSC), after omitting the diacritics, is used to train and test the deep learning model. Experimental results show that the removal of diacritics decreased the out-of-vocabulary rate and the perplexity of the language model. In addition, the word error rate (WER) is significantly improved compared to diacritized data. The achieved average reduction in WER is 13.52%.
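The WER metric reported above is the word-level edit distance between reference and hypothesis, normalized by reference length. A minimal, self-contained sketch of that standard computation (not tied to the paper's evaluation code):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference length,
    computed via dynamic-programming edit distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three gives WER = 1/3.
example = wer("the cat sat", "the cat sit")
```

A "13.52% average reduction in WER" then means the diacritic-free system's WER is lower than the diacritized baseline's by that margin.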


2016 ◽  
Vol 25 (02) ◽  
pp. 1650006
Author(s):  
Aleksander Smywinski-Pohl ◽  
Bartosz Ziółko

In this paper we investigate the usefulness of morphosyntactic information, as well as clustering, in modeling Polish for automatic speech recognition. Polish is an inflectional language; thus we investigate the usefulness of an N-gram model based on morphosyntactic features. We present how individual types of features influence the model and which types are best suited for building a language model for automatic speech recognition. We compared the results of applying them with a class-based model automatically derived from the training corpus. We show that our approach to clustering performs significantly better than the frequently used SRILM clustering method. However, this difference is apparent only for smaller corpora.
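A class-based N-gram model of the kind compared above factors the word bigram probability through word classes: P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i). This toy sketch uses a hand-written word-to-class map and a tiny corpus purely for illustration; a real system would induce the classes (morphosyntactically or automatically) from training data.

```python
# Hypothetical sketch of a class-based bigram model:
# P(w_i | w_{i-1}) is factored as P(c_i | c_{i-1}) * P(w_i | c_i).
from collections import Counter

word2class = {"kot": "NOUN", "pies": "NOUN", "biegnie": "VERB", "spi": "VERB"}
corpus = [["kot", "biegnie"], ["pies", "spi"], ["kot", "spi"]]

# Count class-to-class transitions and class-to-word emissions.
class_bigrams = Counter()
class_counts = Counter()
emissions = Counter()
for sent in corpus:
    classes = [word2class[w] for w in sent]
    for c1, c2 in zip(classes, classes[1:]):
        class_bigrams[(c1, c2)] += 1
    for w, c in zip(sent, classes):
        emissions[(c, w)] += 1
        class_counts[c] += 1

def class_bigram_prob(prev_word, word):
    c_prev, c = word2class[prev_word], word2class[word]
    # P(c | c_prev): class-bigram count over all bigrams starting in c_prev.
    first_counts = sum(v for (a, _), v in class_bigrams.items() if a == c_prev)
    p_class = class_bigrams[(c_prev, c)] / first_counts if first_counts else 0.0
    # P(word | c): emission count over the class's total occurrences.
    p_word = emissions[(c, word)] / class_counts[c]
    return p_class * p_word
```

Sharing statistics at the class level is what lets such models generalize from small corpora, which is consistent with the paper's observation that the clustering choice matters most for smaller training sets.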


2020 ◽  
Author(s):  
Ryo Masumura ◽  
Naoki Makishima ◽  
Mana Ihori ◽  
Akihiko Takashima ◽  
Tomohiro Tanaka ◽  
...  

Author(s):  
Zhong Meng ◽  
Sarangarajan Parthasarathy ◽  
Eric Sun ◽  
Yashesh Gaur ◽  
Naoyuki Kanda ◽  
...  
