UCSY-SC1: A Myanmar speech corpus for automatic speech recognition

Aye Nyein Mon; Win Pa Pa; Ye Kyaw Thu

doi:10.11591/ijece.v9i4.pp3194-3202


International Journal of Electrical and Computer Engineering (IJECE), Vol 9 (4), pp. 3194-3202, 2019


Keyword(s): Neural Network; Speech Recognition; Automatic Speech Recognition; Gaussian Mixture; Error Rates; Training Data; Speech Corpus; Total Size; Test Sets; Web News

This paper introduces a speech corpus developed for Myanmar Automatic Speech Recognition (ASR) research. ASR research is conducted by researchers around the world to improve the language technologies of their own languages. Speech corpora are essential for developing ASR systems, and creating them is especially necessary for low-resourced languages. Myanmar can be regarded as a low-resourced language because it lacks pre-existing resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus 1) is created for Myanmar ASR research. The corpus covers two domains: news and daily conversations. Its total size is over 42 hours: 25 hours of web news and 17 hours of recorded conversational data.

The news data were collected from 177 female and 84 male speakers, and the conversational data from 42 female and 4 male speakers. The corpus was used as training data for developing Myanmar ASR. Three types of acoustic models, Gaussian Mixture Model - Hidden Markov Model (GMM-HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN), were built and their results compared. Experiments were conducted with different training data sizes and evaluated on two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). The Myanmar ASR systems trained on this corpus gave satisfactory results on both test sets, reaching word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.
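
The word error rates above follow the standard definition WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the minimum-cost alignment of the recognizer output against the N reference words. The following is a minimal Python sketch of that computation via word-level Levenshtein distance. It assumes whitespace-separated tokens; Myanmar script does not consistently mark word boundaries with spaces, so transcripts would have to be segmented first, and the authors' segmentation and scoring tools are not described in this abstract. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as word-level Levenshtein distance / N.

    Assumes tokens are separated by whitespace (a hypothetical pre-segmented input).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deletion out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 0.16666666666666666
```

Reported as a percentage, a WER of 15.61% therefore means roughly 15.61 edit operations per 100 reference words; because insertions count, WER can exceed 100%.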


Automatic Speech Recognition (ASR) System for Isolated Marathi Words: using HTK

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 8 (12), pp. 3702-3705, 2019. doi:10.35940/ijitee.l2651.1081219

Keyword(s): Speech Recognition; Automatic Speech Recognition; Viterbi Algorithm; Gaussian Mixture; Speech Corpus; Word Level; Speaker Independent; Token Passing; Mel Frequency Cepstral Coefficient; ASR System

The present manuscript focuses on building an automatic speech recognition (ASR) system for the Marathi language (M-ASR) using the Hidden Markov Model Toolkit (HTK), and details its experimentation and implementation with HTK. In this work, a total of 106 isolated Marathi words were recognized; these unique words were used to train and evaluate the M-ASR system. The speech corpus (database) was created by the authors from isolated Marathi words uttered by speakers of both genders. The system uses Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction and Gaussian mixture models (GMMs) for acoustic modeling. Unknown utterances are decoded with the Viterbi algorithm based on token passing. The proposed M-ASR system is speaker independent and achieves 96.23% word-level recognition accuracy.
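
Token passing is a way of implementing Viterbi search over a recognition network: at each frame, every state keeps only its best-scoring incoming path. The following is a minimal Python sketch of the underlying Viterbi trellis recursion, not HTK's decoder. As simplifying assumptions, it uses a toy HMM with discrete emission tables instead of GMM likelihoods over MFCC vectors, and the state names, observation symbols and probabilities are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely state path and its log-probability."""
    if not observations:
        raise ValueError("need at least one observation")

    def log(p: float) -> float:
        # Work in the log domain to avoid underflow on long utterances.
        return math.log(p) if p > 0 else -math.inf

    # best[t][s]: log-score of the best path that ends in state s at time t.
    best: List[Dict[str, float]] = [
        {s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0)) for s in states}
    ]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Keep only the best incoming path into s (the Viterbi max step).
            prev_state, score = max(
                ((r, best[t - 1][r] + log(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev_state
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    # Hypothetical two-state model: "sil" (silence) vs "speech".
    states = ["sil", "speech"]
    start_p = {"sil": 0.8, "speech": 0.2}
    trans_p = {"sil": {"sil": 0.7, "speech": 0.3}, "speech": {"sil": 0.3, "speech": 0.7}}
    emit_p = {"sil": {"low": 0.9, "high": 0.1}, "speech": {"low": 0.2, "high": 0.8}}
    path, score = viterbi(["low", "high", "high", "low"], states, start_p, trans_p, emit_p)
    print(path)  # ['sil', 'speech', 'speech', 'sil']
```

In a real isolated-word recognizer, each word model would be its own HMM, and the recognized word is the one whose best path scores highest.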


Gaussian mixture models for adaptation of deep neural network acoustic models in automatic speech recognition systems

Scientific and Technical Journal of Information Technologies, Mechanics and Optics, pp. 1063-1072, 2016. doi:10.17586/2226-1494-2016-16-6-1063-1072

Author(s): N.A. Tomashenko; Yu.Yu. Khokhlov; A. Larcher; Ya. Estève; Yu.N. Matveev

Keyword(s): Neural Network; Speech Recognition; Mixture Models; Automatic Speech Recognition; Deep Neural Network; Gaussian Mixture Models; Gaussian Mixture; Acoustic Models; Recognition Systems


Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021. doi:10.1109/icassp39728.2021.9413453

Author(s): Chao-Han Huck Yang; Jun Qi; Samuel Yen-Chi Chen; Pin-Yu Chen; Sabato Marco Siniscalchi; ...

Keyword(s): Neural Network; Feature Extraction; Speech Recognition; Convolutional Neural Network; Automatic Speech Recognition


Convolutional Grid Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

Communications in Computer and Information Science - Neural Information Processing, pp. 718-726, 2019. doi:10.1007/978-3-030-36802-9_76

Author(s): Jiabin Xue; Tieran Zheng; Jiqing Han

Keyword(s): Neural Network; Speech Recognition; Automatic Speech Recognition; Recurrent Neural Network; Short Term Memory; Short Term; Term Memory; Long Short Term Memory


Convolutional Neural Network for Automatic Speech Recognition of Filipino Language

International Journal of Advanced Trends in Computer Science and Engineering, Vol 9 (1.1 S I), pp. 34-40, 2020. doi:10.30534/ijatcse/2020/0791.12020

Author(s): Felizardo Reyes Jr.; Arnel Fajardo

Keyword(s): Neural Network; Speech Recognition; Convolutional Neural Network; Automatic Speech Recognition


Chhattisgarhi speech corpus for research and development in automatic speech recognition

International Journal of Speech Technology, Vol 21 (2), pp. 193-210, 2018. doi:10.1007/s10772-018-9496-7


Author(s): Narendra D. Londhe; Ghanahshyam B. Kshirsagar

Keyword(s): Speech Recognition; Research And Development; Automatic Speech Recognition; Speech Corpus


Development of Automatic Speech Recognition for Xitsonga Using Subspace Gaussian Mixture Model

2021. doi:10.1109/icabcd51485.2021.9519355

Author(s): Vukosi Rikhotso; Thipe Modipa; Madimetja Jonas Manamela; Tumisho Bilson Mokgonyane

Keyword(s): Speech Recognition; Gaussian Mixture Model; Mixture Model; Automatic Speech Recognition; Gaussian Mixture


Learning from past mistakes: improving automatic speech recognition output via noisy-clean phrase context modeling

APSIPA Transactions on Signal and Information Processing, Vol 8, 2019. doi:10.1017/atsip.2018.31


Author(s): Prashanth Gurunath Shivakumar; Haoqi Li; Kevin Knight; Panayiotis Georgiou

Keyword(s): Neural Network; Speech Recognition; Automatic Speech Recognition; Language Model; Context Modeling; Short Term; Extensive Analysis; Network Language; Correction System

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, words may be pruned on acoustic grounds using short-term context before rescoring with long-term linguistic context. In this work, we model ASR as a phrase-based noisy transformation channel and propose an error correction system that learns from the aggregate errors of all the independent modules constituting the ASR and attempts to invert them. The proposed system can exploit long-term context using a neural network language model, can better choose among existing ASR output possibilities, and can re-introduce previously pruned or unseen (Out-Of-Vocabulary) phrases. It provides corrections under poorly performing ASR conditions without degrading accurate transcriptions, and the corrections are larger for ASR on out-of-domain and mismatched data. Our system consistently improves over the baseline ASR, even when the baseline is further optimized through Recurrent Neural Network (RNN) language model rescoring. This demonstrates that any ASR improvements can be exploited independently, and that our proposed system can potentially still provide benefits on highly optimized ASR. Finally, we present an extensive analysis of the types of errors corrected by our system.
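
In a noisy-channel formulation of error correction, the corrected phrase c for an ASR output phrase n is typically chosen to maximize P(n | c) P(c): a channel model of how clean phrases get garbled by the recognizer, times a language model prior over clean phrases. The following is a toy Python sketch of that selection step only, not the authors' system. It assumes hand-set, hypothetical log-probability tables in place of the learned phrase-based channel model and neural network language model, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical tables: channel scores are log P(noisy ASR phrase | clean phrase),
# keyed by (noisy, clean); LM scores are log P(clean phrase).
ChannelTable = Dict[Tuple[str, str], float]
LMTable = Dict[str, float]


def noisy_channel_score(noisy: str, clean: str, channel: ChannelTable, lm: LMTable) -> float:
    """log P(noisy | clean) + log P(clean); entries missing from a table score -inf."""
    return channel.get((noisy, clean), -math.inf) + lm.get(clean, -math.inf)


def correct_phrase(noisy: str, candidates: List[str], channel: ChannelTable, lm: LMTable) -> str:
    """Return the candidate clean phrase with the highest noisy-channel score.

    The noisy phrase itself should normally be a candidate, so that the
    'leave it unchanged' option competes with the rewrites.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: noisy_channel_score(noisy, c, channel, lm))


if __name__ == "__main__":
    noisy = "wreck a nice beach"
    candidates = [noisy, "recognize speech"]
    channel = {
        (noisy, noisy): math.log(0.6),               # recognizer output is usually right
        (noisy, "recognize speech"): math.log(0.1),  # but this confusion is plausible
    }
    lm = {noisy: math.log(1e-6), "recognize speech": math.log(1e-3)}
    print(correct_phrase(noisy, candidates, channel, lm))  # recognize speech
```

Keeping the identity mapping as a candidate is what lets such a corrector leave accurate transcriptions untouched: a rewrite wins only when the language model gain outweighs the channel penalty.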


Power-Law Nonlinearity with Maximally Uniform Distribution Criterion for Improved Neural Network Training in Automatic Speech Recognition

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019. doi:10.1109/asru46091.2019.9003973


Author(s): Chanwoo Kim; Mehul Kumar; Kwangyoun Kim; Dhananjaya Gowda

Keyword(s): Neural Network; Speech Recognition; Uniform Distribution; Automatic Speech Recognition; Power Law; Neural Network Training; Network Training


Robust automatic speech recognition based on neural network in reverberant environments

Civil, Architecture and Environmental Engineering, Volume 2, 2017. doi:10.1201/9781315116242-55

Author(s): L Bai; H Li; Y He

Keyword(s): Neural Network; Speech Recognition; Automatic Speech Recognition; Reverberant Environments
