Data augmentation on spontaneous Indonesian automatic speech recognition using statistical machine translation

Grammatical error correction (GEC) is an important application of natural language processing, and GEC systems have long been explored in both academic and industrial communities. The past decade has seen significant progress in GEC, driven by the increasing popularity of machine learning and deep learning. However, no survey has yet untangled the large body of research and progress in this field. We present the first survey of GEC, offering a comprehensive retrospective of the literature in this area. We first define the GEC task and introduce the public datasets and data annotation schema. We then discuss six kinds of basic approaches, six commonly applied performance-boosting techniques for GEC systems, and three data augmentation methods. Since GEC is typically viewed as a sister task of machine translation (MT), we place more emphasis on statistical machine translation (SMT)-based and neural machine translation (NMT)-based approaches because of their importance. Similarly, some performance-boosting techniques are adapted from MT and successfully combined with GEC systems to improve final performance. After introducing evaluation in GEC, we analyze empirical results in depth across GEC approaches and GEC systems to reveal a clearer pattern of progress, with error type analysis and system recapitulation. Finally, we discuss five prospective directions for future GEC research.

Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition

Speech Communication ◽

10.1016/j.specom.2021.08.002 ◽

2021 ◽

Author(s):

Gary Yeung ◽

Ruchao Fan ◽

Abeer Alwan

Keyword(s):

Speech Recognition ◽

Fundamental Frequency ◽

Automatic Speech Recognition ◽

Data Augmentation ◽

Frequency Feature

On statistical machine translation method for lexicon refinement in speech recognition

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) ◽

10.1109/chinasip.2015.7230355 ◽

2015 ◽

Author(s):

Haihua Xu ◽

Xiong Xiao ◽

Eng-Siong Chng ◽

Haizhou Li

Keyword(s):

Speech Recognition ◽

Machine Translation ◽

Statistical Machine Translation ◽

Translation Method

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

10.21437/interspeech.2019-2680 ◽

2019 ◽

Cited By ~ 103

Author(s):

Daniel S. Park ◽

William Chan ◽

Yu Zhang ◽

Chung-Cheng Chiu ◽

Barret Zoph ◽

...

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Data Augmentation

Sisyphus, a Workflow Manager Designed for Machine Translation and Automatic Speech Recognition

10.18653/v1/d18-2015 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jan-Thorsten Peter ◽

Eugen Beck ◽

Hermann Ney

Keyword(s):

Speech Recognition ◽

Machine Translation ◽

Automatic Speech Recognition ◽

Workflow Manager

Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings

Information ◽

10.3390/info12020062 ◽

2021 ◽

Vol 12 (2) ◽

pp. 62 ◽

Cited By ~ 1

Author(s):

Eshete Derb Emiru ◽

Shengwu Xiong ◽

Yaxing Li ◽

Awet Fesseha ◽

Moussa Diallo

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Data Augmentation ◽

Recognition System ◽

Speech Recognition System ◽

Automatic Speech Recognition System ◽

Attention Model ◽

Recognition Systems ◽

End To End ◽

Connectionist Temporal Classification

Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems operate at the word and character levels of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes a hybrid connectionist temporal classification (CTC) with attention end-to-end architecture and a syllabification algorithm for an Amharic automatic speech recognition (AASR) system using phoneme-based subword units. The syllabification algorithm inserts the epenthetic vowel እ[ɨ], which is not covered by our grapheme-to-phoneme (G2P) conversion algorithm, developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained on several kinds of Amharic units, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to yield more accurate speech recognition systems than their character-based, phoneme-based, and character-based subword counterparts. Further improvement was obtained by combining the proposed phoneme-based subwords with the syllabification algorithm and the SpecAugment data augmentation technique. The word error rate (WER) reduction was 18.38% compared to the baseline of character-based acoustic modeling with word-based recurrent neural network language modeling (RNNLM). These phoneme-based subword models are also useful for improving machine translation and speech translation tasks.
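The abstract above reports its result as a word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. Real evaluations often
    normalize text (case, punctuation) before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.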

Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation

2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP) ◽

10.1109/mmsp48831.2020.9287127 ◽

2020 ◽

Author(s):

Nirayo Hailu ◽

Ingo Siegert ◽

Andreas Nürnberger

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Data Augmentation

Language To Language Translation System

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206363 ◽

2020 ◽

pp. 289-293

Author(s):

Ms Pratheeksha ◽

Pratheeksha Rai ◽

Ms Vijetha

Keyword(s):

Speech Recognition ◽

Machine Translation ◽

Automatic Speech Recognition ◽

Speech Synthesis ◽

Language Translation ◽

Target Language ◽

Translation System ◽

Text To Speech ◽

Source Language ◽

Text To Speech Synthesis

In a language-to-language translation system, phrases spoken in one language are immediately spoken in another language by the device. Language-to-language translation is a three-step software process comprising automatic speech recognition, machine translation, and voice synthesis. This paper reviews major speech translation projects that use different approaches to speech recognition, translation, and text-to-speech synthesis, highlighting the main pros and cons of each approach. Language translation takes a conversational phrase in one language as input and produces the translated spoken phrase in another language as output. The three components of language-to-language translation are connected sequentially. Automatic speech recognition (ASR) converts the spoken phrases of the source language into text in the same language, machine translation then translates the source-language text into target-language text, and finally the speech synthesizer converts the target-language text to speech.
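The sequential connection of the three components described above can be sketched as simple function composition. This is an illustrative sketch only: `make_speech_translator`, the toy stages, and the tiny lexicon are all hypothetical stand-ins, and a real system would plug in actual ASR, MT, and TTS models.

```python
from typing import Callable

def make_speech_translator(
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain ASR, MT, and TTS stages into one speech-to-speech function."""
    def run(audio: bytes) -> bytes:
        source_text = recognize(audio)        # speech -> source-language text
        target_text = translate(source_text)  # source text -> target text
        return synthesize(target_text)        # target text -> speech
    return run

# Toy stand-ins (assumptions for illustration): "audio" is just UTF-8 text,
# and "translation" is a word-by-word lookup in a made-up lexicon.
TOY_LEXICON = {"hello": "halo", "world": "dunia"}

def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")

def toy_translate(text: str) -> str:
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())

def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")

speech_to_speech = make_speech_translator(toy_recognize, toy_translate, toy_synthesize)
```

Keeping each stage behind a plain callable means any one of them can be swapped without touching the others, which mirrors the cascade design the abstract describes. One known trade-off of such cascades is that recognition errors propagate into the translation stage.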

MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414483 ◽

2021 ◽

Author(s):

Linghui Meng ◽

Jin Xu ◽

Xu Tan ◽

Jindong Wang ◽

Tao Qin ◽

...

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Data Augmentation ◽

Low Resource

Extremely low-resource neural machine translation for Asian languages

Machine Translation ◽

10.1007/s10590-020-09258-6 ◽

2020 ◽

Vol 34 (4) ◽

pp. 347-382

Author(s):

Raphael Rubino ◽

Benjamin Marie ◽

Raj Dabre ◽

Atsushi Fujita ◽

Masao Utiyama ◽

...

Keyword(s):

Machine Translation ◽

Data Augmentation ◽

Statistical Machine Translation ◽

Synthetic Data ◽

Parameter Tuning ◽

Data Generation ◽

Neural Machine Translation ◽

Low Resource ◽

Translation Quality ◽

Asian Languages

This paper presents a set of effective approaches to handle extremely low-resource language pairs for self-attention based neural machine translation (NMT), focusing on English and four Asian languages. Starting from an initial set of parallel sentences used to train bilingual baseline models, we introduce additional monolingual corpora and data processing techniques to improve translation quality. We describe a series of best practices and empirically validate the methods through an evaluation conducted on eight translation directions, based on state-of-the-art NMT approaches such as hyper-parameter search, data augmentation with forward and backward translation in combination with tags and noise, as well as joint multilingual training. Experiments show that the commonly used default architecture of self-attention NMT models does not reach the best results, validating previous work on the importance of hyper-parameter tuning. Additionally, empirical results indicate the amount of synthetic data required to efficiently increase the parameters of the models leading to the best translation quality measured by automatic metrics. We show that the best NMT models trained on large amounts of tagged back-translations outperform three other synthetic data generation approaches. Finally, comparison with statistical machine translation (SMT) indicates that extremely low-resource NMT requires a large amount of synthetic parallel data obtained with back-translation in order to close the performance gap with the preceding SMT approach.
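The tagged back-translation mentioned above is commonly implemented by translating target-side monolingual sentences back into the source language and prefixing each synthetic source sentence with a reserved token, so the model can tell synthetic from genuine parallel data. The sketch below illustrates that idea under stated assumptions: the `<BT>` token, the function name, and the word-reversing stand-in for a target-to-source model are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, List, Tuple

BT_TAG = "<BT>"  # hypothetical reserved token marking synthetic source text

def tagged_back_translation(
    target_monolingual: Iterable[str],
    back_translate: Callable[[str], str],
    tag: str = BT_TAG,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    `back_translate` stands in for a trained target-to-source MT model.
    The target side stays as the original human-written sentence.
    """
    pairs = []
    for target_sentence in target_monolingual:
        synthetic_source = back_translate(target_sentence)
        pairs.append((f"{tag} {synthetic_source}", target_sentence))
    return pairs

def toy_back_translate(sentence: str) -> str:
    # Placeholder "model": reverses word order. Illustration only.
    return " ".join(reversed(sentence.split()))

synthetic_pairs = tagged_back_translation(["a b c", "d e"], toy_back_translate)
```

The synthetic pairs would then be mixed with the genuine parallel sentences for training; the mixing ratio and any added noise are choices this sketch leaves out.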
