scholarly journals Bangladeshi Bangla Speech Corpus for Automatic Speech Recognition Research

Author(s):  
Shafkat Kibria ◽  
Ahnaf Mozib Samin ◽  
M. Humayon Kobir ◽  
M. Shahidur Rahman ◽  
M. Reza Selim ◽  
...  
Author(s):  
Aye Nyein Mon ◽  
Win Pa Pa ◽  
Ye Kyaw Thu

This paper introduces a speech corpus which is developed for Myanmar Automatic Speech Recognition (ASR) research. Automatic Speech Recognition (ASR) research has been conducted by the researchers around the world to improve their language technologies. Speech corpora are important in developing the ASR and the creation of the corpora is necessary especially for low-resourced languages. Myanmar language can be regarded as a low-resourced language because of lack of pre-created resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus1) is created for Myanmar ASR research. The corpus consists of two types of domain: news and daily conversations. The total size of the speech corpus is over 42 hrs. There are 25 hrs of web news and 17 hrs of conversational recorded data.<br />The corpus was collected from 177 females and 84 males for the news data and 42 females and 4 males for conversational domain. This corpus was used as training data for developing Myanmar ASR. Three different types of acoustic models  such as Gaussian Mixture Model (GMM) - Hidden Markov Model (HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN) models were built and compared their results. Experiments were conducted on different data  sizes and evaluation is done by two test sets: TestSet1, web news and TestSet2, recorded conversational data. It showed that the performance of Myanmar ASRs using this corpus gave satisfiable results on both test sets. The Myanmar ASR  using this corpus leading to word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.<br /><br />


Author(s):  
Devaraja Adiga ◽  
Rishabh Kumar ◽  
Amrith Krishna ◽  
Preethi Jyothi ◽  
Ganesh Ramakrishnan ◽  
...  

Author(s):  
Deepang Raval ◽  
Vyom Pathak ◽  
Muktan Patel ◽  
Brijesh Bhatt

We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that includes Convolutional Neural Network, Bi-directional Long Short Term Memory layers, Dense layers, and Connectionist Temporal Classification as a loss function. To improve the performance of the system with the limited size of the dataset, we present a combined language model (Word-level language Model and Character-level language model)-based prefix decoding technique and Bidirectional Encoder Representations from Transformers-based post-processing technique. To gain key insights from our Automatic Speech Recognition (ASR) system, we used the inferences from the system and proposed different analysis methods. These insights help us in understanding and improving the ASR system as well as provide intuition into the language used for the ASR system. We have trained the model on the Microsoft Speech Corpus, and we observe a 5.87% decrease in Word Error Rate (WER) with respect to base-model WER.


Sign in / Sign up

Export Citation Format

Share Document