AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline

Author(s):  
Hui Bu ◽  
Jiayu Du ◽  
Xingyu Na ◽  
Bengu Wu ◽  
Hao Zheng

2021 ◽
Author(s):  
Yerbolat Khassanov ◽  
Saida Mussakhojayeva ◽  
Almas Mirzakhmetov ◽  
Alen Adiyev ◽  
Mukhamet Nurpeiissov ◽  
...  

2021 ◽  
pp. 437-447
Author(s):  
Muhammadjon Musaev ◽  
Saida Mussakhojayeva ◽  
Ilyos Khujayorov ◽  
Yerbolat Khassanov ◽  
Mannon Ochilov ◽  
...  

Author(s):  
Jean K. Rodriguez-Cartagena ◽  
Andrea C. Claudio-Palacios ◽  
Natalia Pacheco-Tallaj ◽  
Valerie Santiago González ◽  
Patricia Ordonez-Franco

Author(s):  
Aye Nyein Mon ◽  
Win Pa Pa ◽  
Ye Kyaw Thu

This paper introduces a speech corpus developed for Myanmar Automatic Speech Recognition (ASR) research. ASR research is conducted by researchers around the world to advance their language technologies, and speech corpora are essential to that work; creating such corpora is especially necessary for low-resourced languages. Myanmar can be regarded as a low-resourced language because of the lack of existing resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus1) was created for Myanmar ASR research. The corpus covers two domains, news and daily conversation, and totals over 42 hours of speech: 25 hours of web news and 17 hours of recorded conversation.

The corpus was collected from 177 female and 84 male speakers for the news domain, and from 42 female and 4 male speakers for the conversational domain. It was used as training data for developing Myanmar ASR systems. Three types of acoustic model were built and their results compared: Gaussian Mixture Model - Hidden Markov Model (GMM-HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN). Experiments were conducted on different training-data sizes, and evaluation used two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). Myanmar ASR systems trained on this corpus gave satisfactory results on both test sets, with word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.
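The performance figures above are word error rates (WER). As a point of reference, the sketch below shows the standard Levenshtein-distance definition of WER; the abstract does not name the authors' actual scoring tool, so this is an illustration of the metric, not their script.

# Minimal sketch of word error rate (WER) scoring, the metric reported
# above (15.61% on TestSet1, 24.43% on TestSet2). Standard Levenshtein
# definition; the paper's own scoring tool is not specified here.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions to match an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions to match an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one deleted word out of six reference words -> WER of 1/6.
print(wer("the corpus has forty two hours", "the corpus has forty hours"))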


2020 ◽  
Vol 31 (04) ◽  
pp. 271-276
Author(s):  
Grant King ◽  
Nicole E. Corbin ◽  
Lori J. Leibold ◽  
Emily Buss

Abstract

Background: Speech recognition in complex multisource environments is challenging, particularly for listeners with hearing loss. One source of difficulty is the reduced ability of listeners with hearing loss to benefit from spatial separation of the target and masker, an effect called spatial release from masking (SRM). Despite the prevalence of complex multisource environments in everyday life, SRM is not routinely evaluated in the audiology clinic.

Purpose: The purpose of this study was to demonstrate the feasibility of assessing SRM in adults using widely available tests of speech-in-speech recognition that can be conducted using standard clinical equipment.

Research Design: Participants were 22 young adults with normal hearing. The task was masked sentence recognition, using each of five clinically available corpora with speech maskers. The target always sounded as if it originated from directly in front of the listener, and the masker sounded as if it originated either from the front (colocated with the target) or from the side (separated from the target). In the real spatial manipulation conditions, source location was manipulated by routing the target and masker either to a single speaker or to two speakers: one directly in front of the participant and one mounted in an adjacent corner, 90° to the right. In the perceived spatial separation conditions, the target and masker were presented from both speakers with delays that made them sound either colocated or separated.

Results: With real spatial manipulations, the mean SRM ranged from 7.1 to 11.4 dB, depending on the speech corpus. With perceived spatial manipulations, the mean SRM ranged from 1.8 to 3.1 dB. Whereas real separation improves the signal-to-noise ratio in the ear contralateral to the masker, SRM in the perceived spatial separation conditions is based solely on interaural timing cues.

Conclusions: The finding of robust SRM with widely available speech corpora supports the feasibility of measuring this important aspect of hearing in the audiology clinic. The finding of a small but significant SRM in the perceived spatial separation conditions suggests that modified materials could be used to evaluate the use of interaural timing cues specifically.
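The SRM values above are differences between masked speech-reception thresholds in the two spatial conditions. A minimal sketch of that arithmetic follows; the threshold values in the example are illustrative placeholders, not data from the study.

# Minimal sketch of how spatial release from masking (SRM) is
# quantified: the colocated threshold minus the separated threshold,
# both in dB. Positive SRM means spatial separation helped.

def spatial_release(colocated_threshold_db: float,
                    separated_threshold_db: float) -> float:
    """SRM in dB: improvement in threshold from spatial separation."""
    return colocated_threshold_db - separated_threshold_db

# The study reports mean SRM of 7.1-11.4 dB with real spatial
# manipulations; e.g. a -2 dB colocated threshold and a -11 dB
# separated threshold (hypothetical values) give 9 dB of release.
print(spatial_release(-2.0, -11.0))  # -> 9.0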

