Phonetically rich and balanced speech corpus for Arabic speaker-independent continuous automatic speech recognition systems

Creation of Connected Word Speech Corpus for Bangla Speech Recognition Systems

2018 ◽

pp. 1-6

Author(s):

Md. Farukuzzaman Khan ◽

M. Abdus Sobhan

Keyword(s):

Speech Recognition ◽

Application Area ◽

Property A ◽

Speech Corpus ◽

Research Activities ◽

Speaker Independent ◽

Type Size ◽

New Speakers ◽

Recognition Systems ◽

Test Database

A new speech corpus of connected Bangla words, derived from the newspaper text corpus BdNC01, has been recorded. It is designed for research on speaker-independent Bangla speech recognition. The training database contains speech from 100 speakers, each of whom uttered 52 sentences as connected words. Another 50 new speakers read the full list to build a test database. Every utterance was repeated 5 times on different days to account for day-to-day variation in speaker characteristics. With a total of 62 hours of recordings, the corpus is described as the largest of its type, size, and application area. This paper describes the motivation for the corpus and the processes undertaken in its construction, and concludes with the usability of the corpus.
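The figures in this abstract imply a corpus size that can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes the 50 test speakers also repeated each sentence 5 times and that the 62 hours cover both the training and test databases. The abstract states neither, and the constant and function names are illustrative.

```python
# Figures quoted in the abstract.
TRAIN_SPEAKERS = 100
TEST_SPEAKERS = 50
SENTENCES_PER_SPEAKER = 52
REPETITIONS = 5          # assumed to apply to test speakers as well
TOTAL_HOURS = 62         # assumed to cover training and test data together


def utterance_count(speakers, sentences=SENTENCES_PER_SPEAKER,
                    repetitions=REPETITIONS):
    """Number of recorded utterances for a group of speakers."""
    return speakers * sentences * repetitions


train_utts = utterance_count(TRAIN_SPEAKERS)
test_utts = utterance_count(TEST_SPEAKERS)
total_utts = train_utts + test_utts

# Average length of one utterance under the assumptions above.
mean_seconds = TOTAL_HOURS * 3600 / total_utts

print(f"training utterances: {train_utts}")
print(f"test utterances:     {test_utts}")
print(f"mean utterance length: {mean_seconds:.2f} s")
```

Under these assumptions the average utterance lasts a little under six seconds, which is plausible for a read newspaper sentence.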


Automatic Speech Recognition (ASR) System for Isolated Marathi Words: using HTK

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l2651.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 3702-3705

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Viterbi Algorithm ◽

Gaussian Mixture ◽

Speech Corpus ◽

Word Level ◽

Speaker Independent ◽

Token Passing ◽

Mel Frequency Cepstral Coefficient ◽

Asr System

This manuscript describes the construction of an automatic speech recognition (ASR) system for the Marathi language (M-ASR) using the Hidden Markov Model Toolkit (HTK), and details the experiments and implementation. In total, 106 unique isolated Marathi words are used to train and evaluate the system. We created the speech corpus (database) from isolated Marathi words uttered by speakers of mixed gender. The system uses Mel Frequency Cepstral Coefficients (MFCC) as features together with a Gaussian mixture model (GMM). A Viterbi algorithm based on token passing decodes unknown utterances. The proposed M-ASR system is speaker independent and reports a word-level recognition accuracy of 96.23%.
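For readers unfamiliar with Viterbi decoding, the following is a minimal sketch of the textbook dynamic-programming form on a toy discrete HMM. It is not HTK's token-passing implementation and not the authors' system: the two states, three observation symbols, and all probabilities are invented for illustration, and the function assumes a non-empty observation sequence.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def to_logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: two hidden states, three observation symbols.
states = ["A", "B"]
log_start = to_logs({"A": 0.6, "B": 0.4})
log_trans = to_logs({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}})
log_emit = to_logs({"A": {"x": 0.5, "y": 0.4, "z": 0.1},
                    "B": {"x": 0.1, "y": 0.3, "z": 0.6}})

path, log_prob = viterbi(["x", "y", "z"], states, log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on long sequences, which matters for speech, where an utterance spans hundreds of feature frames.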


Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems

10.21437/interspeech.2013-672 ◽

2013 ◽

Author(s):

Kartik Audhkhasi ◽

Andreas M. Zavou ◽

Panayiotis G. Georgiou ◽

Shrikanth Narayanan

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition Systems


Evaluation of Automatic Speech Recognition Systems

10.5753/sbbd.2021.17889 ◽

2021 ◽

Author(s):

Matheus Xavier Sampaio ◽

Regis Pires Magalhães ◽

Ticiana Linhares Coelho da Silva ◽

Lívia Almada Cruz ◽

Davi Romero de Vasconcelos ◽

...

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Smart Homes ◽

The Other ◽

Learning Models ◽

Recognition Systems ◽

Microsoft Azure

Automatic Speech Recognition (ASR) is an essential task for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Given the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work evaluates the performance of commercial ASR solutions that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results show that the evaluated solutions differ only slightly, although Microsoft Azure Speech outperformed the other analyzed APIs.


Improving Aphasic Speech Recognition by Using Novel Semi-Supervised Learning Methods on AphasiaBank for English and Spanish

Applied Sciences ◽

10.3390/app11198872 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8872

Author(s):

Iván G. Torre ◽

Mónica Romero ◽

Aitor Álvarez

Keyword(s):

Speech Recognition ◽

Supervised Learning ◽

Automatic Speech Recognition ◽

English Language ◽

Spanish Language ◽

Learning Methods ◽

Text Data ◽

Lower Performance ◽

Recognition Systems ◽

Fine Tune

Automatic speech recognition for patients with aphasia is a challenging task for which studies have been published in only a few languages. Understandably, the systems reported in the literature in this field show significantly lower performance than those focused on transcribing non-pathological clean speech. This is mainly due to the difficulty of recognizing less intelligible speech and to the scarcity of annotated aphasic data. This work applies novel semi-supervised learning methods to the AphasiaBank dataset to address these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language, for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described, using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The results obtained encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data to train recognition models is a challenging reality.


UCSY-SC1: A Myanmar speech corpus for automatic speech recognition

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i4.pp3194-3202 ◽

2019 ◽

Vol 9 (4) ◽

pp. 3194 ◽

Cited By ~ 1

Author(s):

Aye Nyein Mon ◽

Win Pa Pa ◽

Ye Kyaw Thu

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Gaussian Mixture ◽

Error Rates ◽

Training Data ◽

Speech Corpus ◽

Total Size ◽

Test Sets ◽

Web News

This paper introduces a speech corpus developed for Myanmar Automatic Speech Recognition (ASR) research. ASR research has been conducted by researchers around the world to improve their language technologies. Speech corpora are important in developing ASR, and creating them is especially necessary for low-resourced languages. Myanmar can be regarded as a low-resourced language because of the lack of pre-created resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus1) is created for Myanmar ASR research. The corpus covers two domains: news and daily conversations. The total size of the speech corpus is over 42 hrs: 25 hrs of web news and 17 hrs of recorded conversational data. The corpus was collected from 177 females and 84 males for the news data and from 42 females and 4 males for the conversational domain. This corpus was used as training data for developing Myanmar ASR. Three types of acoustic models were built and their results compared: Gaussian Mixture Model (GMM) - Hidden Markov Model (HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN). Experiments were conducted on different data sizes, and evaluation was done on two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). The Myanmar ASR systems trained on this corpus gave satisfactory results on both test sets, with word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.
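The word error rates quoted above are conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition. It assumes whitespace-tokenized text, which is a simplification (Myanmar script in particular needs a word segmenter first), and it is not the scoring tool used by the authors. The function name and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.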


Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

ETRI Journal ◽

10.4218/etrij.13.0112.0074 ◽

2013 ◽

Vol 35 (1) ◽

pp. 100-108 ◽

Cited By ~ 13

Author(s):

Yasser Shekofteh ◽

Farshad Almasganj

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Phase Space ◽

Automatic Speech Recognition ◽

Reconstructed Phase Space ◽

Recognition Systems


Chhattisgarhi speech corpus for research and development in automatic speech recognition

International Journal of Speech Technology ◽

10.1007/s10772-018-9496-7 ◽

2018 ◽

Vol 21 (2) ◽

pp. 193-210 ◽

Cited By ~ 2

Author(s):

Narendra D. Londhe ◽

Ghanahshyam B. Kshirsagar

Keyword(s):

Speech Recognition ◽

Research And Development ◽

Automatic Speech Recognition ◽

Speech Corpus


On the Application of Automated Software Testing Techniques to the Development and Maintenance of Speech Recognition Systems

Advanced Automated Software Testing ◽

10.4018/978-1-4666-0089-8.ch002 ◽

2012 ◽

pp. 30-48

Author(s):

Daniel Bolanos

Keyword(s):

Speech Recognition ◽

Software Testing ◽

Automatic Speech Recognition ◽

Automated Testing ◽

Automated Software Testing ◽

Testing Framework ◽

Methods And Techniques ◽

Testing Techniques ◽

Recognition Systems ◽

Automated Software

This chapter provides practitioners in the field with a set of guidelines to help them through the process of elaborating an adequate automated testing framework to competently test automatic speech recognition systems. Through this chapter the testing process of such a system is analyzed from different angles, and different methods and techniques are proposed that are well suited for this task.


Confidence Measures in Automatic Speech Recognition Systems for Error Detection in Restricted Domains

Advances in Speech and Language Technologies for Iberian Languages - Lecture Notes in Computer Science ◽

10.1007/978-3-319-13623-3_18 ◽

2014 ◽

pp. 168-177

Author(s):

Julia Olcoz ◽

Alfonso Ortega ◽

Antonio Miguel ◽

Eduardo Lleida

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Error Detection ◽

Confidence Measures ◽

Restricted Domains ◽

Recognition Systems


Creation of Connected Word Speech Corpus for Bangla Speech Recognition Systems