Location-Based End-to-End Speech Recognition with Multiple Language Models

Author(s):  
Zhijie Lin ◽  
Kaiyang Lin ◽  
Shiling Chen ◽  
Linlin Li ◽  
Zhou Zhao

End-to-End deep learning approaches for Automatic Speech Recognition (ASR) have become a new trend. In these approaches, the language model, now active in many areas, can be considered an important and effective method for semantic error correction. Many existing systems use a single language model. In this paper, however, multiple language models (LMs) are applied during decoding. One LM is used for selecting appropriate candidates, and the others, considering both context and grammar, make the final decision. Experiments on a general location-based dataset show the effectiveness of our method.
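The two-stage use of multiple LMs described above can be sketched as a rerank-then-decide scheme. The following is a minimal illustrative sketch, not the authors' implementation: the LMs here are toy scoring callables, and the shortlist size and weights are assumptions.

```python
# Hypothetical sketch: rerank ASR hypotheses with multiple language models.
# Each "LM" here is a toy callable returning a score; in a real system these
# would be trained selector, context, and grammar models.

def rerank(hypotheses, selector_lm, decision_lms, weights):
    """Keep the top candidates by the selector LM, then pick the final
    answer by a weighted sum of the remaining LMs' scores."""
    # Stage 1: the selector LM narrows the candidate list.
    shortlist = sorted(hypotheses, key=selector_lm, reverse=True)[:3]

    # Stage 2: context/grammar LMs jointly decide among the shortlist.
    def combined(h):
        return sum(w * lm(h) for w, lm in zip(weights, decision_lms))

    return max(shortlist, key=combined)

# Toy scoring functions (scores are illustrative only).
hyps = ["turn left at main street",
        "turn lift at main street",
        "torn left at main street"]
selector = lambda h: -h.count("lift") - h.count("torn")   # crude acoustic proxy
context_lm = lambda h: 1.0 if "left" in h else 0.0        # context plausibility
grammar_lm = lambda h: 1.0 if h.startswith("turn") else 0.0  # grammaticality
best = rerank(hyps, selector, [context_lm, grammar_lm], [0.5, 0.5])
# best -> "turn left at main street"
```

The split of responsibilities (one LM filters, the rest vote) mirrors the paper's description; the linear score combination is one common way to merge multiple LM judgments.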

Author(s):  
Deepang Raval ◽  
Vyom Pathak ◽  
Muktan Patel ◽  
Brijesh Bhatt

We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that includes Convolutional Neural Network, Bidirectional Long Short-Term Memory layers, Dense layers, and Connectionist Temporal Classification as a loss function. To improve the performance of the system with a limited dataset, we present a prefix decoding technique based on a combined language model (a word-level and a character-level language model) and a post-processing technique based on Bidirectional Encoder Representations from Transformers. To gain key insights into our Automatic Speech Recognition (ASR) system, we used the system's inferences and proposed different analysis methods. These insights help us understand and improve the ASR system, and provide intuition into the language used for the ASR system. We trained the model on the Microsoft Speech Corpus and observe a 5.87% decrease in Word Error Rate (WER) with respect to the base-model WER.
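The combined-language-model scoring used during prefix decoding can be illustrated as a log-linear interpolation of a word-level and a character-level LM score. This is a toy sketch under stated assumptions, not the paper's system: the unigram models, the tiny corpus, and the weights `alpha`/`beta` are all illustrative.

```python
import math

def combined_lm_score(prefix, word_lm, char_lm, alpha=0.7, beta=0.3):
    """Log-linear interpolation of a word-level and a character-level LM
    score for a decoded prefix (weights alpha/beta are illustrative)."""
    return alpha * word_lm(prefix) + beta * char_lm(prefix)

# Toy models: unigram word LM and character-frequency LM over a tiny corpus.
corpus = "the cat sat on the mat"
words = corpus.split()
word_probs = {w: words.count(w) / len(words) for w in set(words)}

def word_lm(prefix):
    # Sum of log unigram probabilities; unseen words get a small floor.
    return sum(math.log(word_probs.get(w, 1e-6)) for w in prefix.split())

chars = corpus.replace(" ", "")
char_probs = {c: chars.count(c) / len(chars) for c in set(chars)}

def char_lm(prefix):
    # Character-level score smooths over words the word LM has never seen.
    return sum(math.log(char_probs.get(c, 1e-6)) for c in prefix.replace(" ", ""))

# A prefix made of in-corpus material should outscore an out-of-corpus one.
seen, unseen = "the cat", "xyz qqq"
```

In a real prefix beam search, such a combined score would be added to the CTC path score at each extension step; the character-level term helps rank prefixes whose words are not yet complete or are out of vocabulary.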


Symmetry ◽  
2019 ◽  
Vol 11 (8) ◽  
pp. 1018 ◽  
Author(s):  
Dong Wang ◽  
Xiaodong Wang ◽  
Shaohe Lv

Automatic speech recognition, especially large-vocabulary continuous speech recognition, is an important issue in the field of machine learning. For a long time, the hidden Markov model (HMM)-Gaussian mixture model (GMM) framework has been the mainstream for speech recognition. Recently, however, the HMM-deep neural network (DNN) model and the end-to-end model using deep learning have achieved performance beyond HMM-GMM. Both using deep learning techniques,


Author(s):  
Rishabh Nevatia

Abstract: Lip reading is the visual task of interpreting phrases from lip movements. While speech is one of the most common ways of communicating among individuals, understanding what a person wants to convey with access only to their lip movements remains, to date, an unsolved task. Various stages are involved in the process of automated lip reading, ranging from feature extraction to applying neural networks. This paper covers the various deep learning approaches used for lip reading.
Keywords: Automatic Speech Recognition, Lip Reading, Neural Networks, Feature Extraction, Deep Learning


2021 ◽  
pp. 1-13
Author(s):  
Hamzah A. Alsayadi ◽  
Abdelaziz A. Abdelhamid ◽  
Islam Hegazy ◽  
Zaki T. Fayed

The Arabic language has a set of sound marks called diacritics; these diacritics play an essential role in the meaning of words and their articulation. A change in some diacritics leads to a change in the context of the sentence. However, the existence of these marks in the corpus transcription affects the accuracy of speech recognition. In this paper, we investigate the effect of diacritics on Arabic speech recognition based on end-to-end deep learning. The applied end-to-end approach includes CNN-LSTM and an attention-based technique presented in the state-of-the-art framework Espresso, using PyTorch. In addition, to the best of our knowledge, the CNN-LSTM with attention-based approach has not been used in the task of Arabic Automatic Speech Recognition (ASR). To fill this gap, this paper proposes a new approach based on a CNN-LSTM with attention-based method for Arabic ASR. The language model in this approach is trained using RNN-LM and LSTM-LM on the non-diacritized transcription of the speech corpus. The Standard Arabic Single Speaker Corpus (SASSC), after omitting the diacritics, is used to train and test the deep learning model. Experimental results show that the removal of diacritics decreased the out-of-vocabulary rate and the perplexity of the language model. In addition, the word error rate (WER) is significantly improved compared to diacritized data. The achieved average reduction in WER is 13.52%.
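The WER metric reported above is the word-level edit distance between reference and hypothesis, normalized by reference length. A minimal, self-contained sketch of that standard computation (not tied to the paper's evaluation code):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference length,
    computed via dynamic-programming edit distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three gives WER = 1/3.
example = wer("the cat sat", "the cat sit")
```

A "13.52% average reduction in WER" then means the diacritic-free system's WER is lower than the diacritized baseline's by that margin.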


2016 ◽  
Vol 25 (02) ◽  
pp. 1650006
Author(s):  
Aleksander Smywinski-Pohl ◽  
Bartosz Ziółko

In this paper we investigate the usefulness of morphosyntactic information, as well as clustering, in modeling Polish for automatic speech recognition. Polish is an inflectional language; thus we investigate the usefulness of an N-gram model based on morphosyntactic features. We present how individual types of features influence the model and which types are best suited for building a language model for automatic speech recognition. We compared the results of applying them with a class-based model automatically derived from the training corpus. We show that our approach to clustering performs significantly better than the frequently used SRILM clustering method. However, this difference is apparent only for smaller corpora.
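A class-based N-gram model of the kind compared above factors the word bigram probability through word classes: P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i). This toy sketch uses a hand-written word-to-class map and a tiny corpus purely for illustration; a real system would induce the classes (morphosyntactically or automatically) from training data.

```python
# Hypothetical sketch of a class-based bigram model:
# P(w_i | w_{i-1}) is factored as P(c_i | c_{i-1}) * P(w_i | c_i).
from collections import Counter

word2class = {"kot": "NOUN", "pies": "NOUN", "biegnie": "VERB", "spi": "VERB"}
corpus = [["kot", "biegnie"], ["pies", "spi"], ["kot", "spi"]]

# Count class-to-class transitions and class-to-word emissions.
class_bigrams = Counter()
class_counts = Counter()
emissions = Counter()
for sent in corpus:
    classes = [word2class[w] for w in sent]
    for c1, c2 in zip(classes, classes[1:]):
        class_bigrams[(c1, c2)] += 1
    for w, c in zip(sent, classes):
        emissions[(c, w)] += 1
        class_counts[c] += 1

def class_bigram_prob(prev_word, word):
    c_prev, c = word2class[prev_word], word2class[word]
    # P(c | c_prev): class-bigram count over all bigrams starting in c_prev.
    first_counts = sum(v for (a, _), v in class_bigrams.items() if a == c_prev)
    p_class = class_bigrams[(c_prev, c)] / first_counts if first_counts else 0.0
    # P(word | c): emission count over the class's total occurrences.
    p_word = emissions[(c, word)] / class_counts[c]
    return p_class * p_word
```

Sharing statistics at the class level is what lets such models generalize from small corpora, which is consistent with the paper's observation that the clustering choice matters most for smaller training sets.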


2020 ◽  
Author(s):  
Ryo Masumura ◽  
Naoki Makishima ◽  
Mana Ihori ◽  
Akihiko Takashima ◽  
Tomohiro Tanaka ◽  
...  

Author(s):  
Zhong Meng ◽  
Sarangarajan Parthasarathy ◽  
Eric Sun ◽  
Yashesh Gaur ◽  
Naoyuki Kanda ◽  
...  
