speech corpus
Recently Published Documents


TOTAL DOCUMENTS: 458 (FIVE YEARS: 149)

H-INDEX: 17 (FIVE YEARS: 4)

Author(s):  
Deepang Raval ◽  
Vyom Pathak ◽  
Muktan Patel ◽  
Brijesh Bhatt

We present a novel approach for improving the performance of an end-to-end speech recognition system for the Gujarati language. We follow a deep learning-based approach that includes a Convolutional Neural Network, Bi-directional Long Short Term Memory layers, Dense layers, and Connectionist Temporal Classification as the loss function. To improve the performance of the system given the limited size of the dataset, we present a combined language model (word-level and character-level) prefix decoding technique and a Bidirectional Encoder Representations from Transformers-based post-processing technique. To gain key insights from our Automatic Speech Recognition (ASR) system, we analysed the system's inferences using several proposed analysis methods. These insights help in understanding and improving the ASR system and provide intuition about the language it is applied to. We trained the model on the Microsoft Speech Corpus and observe a 5.87% decrease in Word Error Rate (WER) relative to the base model.


Author(s):  
Héctor A. Sánchez-Hevia ◽  
Roberto Gil-Pita ◽  
Manuel Utrilla-Manso ◽  
Manuel Rosa-Zurera

This paper analyses the performance of different types of Deep Neural Networks for jointly estimating age and identifying gender from speech, to be applied in Interactive Voice Response systems used in call centres. Deep Neural Networks are used because they have recently demonstrated strong discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks of different sizes are analysed to determine how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla's Common Voice dataset, an open and crowdsourced speech corpus. The results for gender classification are very good regardless of the type of neural network and improve with network size. For classification by age group, the combination of convolutional and temporal neural networks is the best option among those analysed, and again, the larger the network, the better the results. The results are promising for use in IVR systems, with the best systems achieving a gender identification error below 2% and an age-group classification error below 20%.


Electronics ◽  
2022 ◽  
Vol 11 (1) ◽  
pp. 168
Author(s):  
Mohsen Bakouri ◽  
Mohammed Alsehaimi ◽  
Husham Farouk Ismail ◽  
Khaled Alshareef ◽  
Ali Ganoun ◽  
...  

Many wheelchair users depend on others to control the movement of their wheelchairs, which significantly affects their independence and quality of life. Smart wheelchairs offer a degree of self-dependence and the freedom to drive their own vehicles. In this work, we designed and implemented a low-cost software and hardware method to steer a robotic wheelchair. Moreover, from this method, we developed our own Android mobile app based on Flutter software. A convolutional neural network (CNN)-based network-in-network (NIN) architecture integrated with a voice recognition model was also developed and configured to build the mobile app. The technique was implemented and configured using an offline Wi-Fi hotspot between the software and hardware components. Five voice commands (yes, no, left, right, and stop) guided and controlled the wheelchair through the Raspberry Pi and DC motor drives. The overall system was evaluated on an English isolated-word speech corpus trained and validated with recordings by native Arabic speakers, to assess the performance of the Android OS application. Maneuverability was also evaluated in terms of accuracy for indoor and outdoor navigation. The results indicated an accuracy of approximately 87.2% in recognizing the five voice commands. Additionally, in the real-time performance test, the root-mean-square deviation (RMSD) values between the planned and actual nodes for indoor and outdoor maneuvering were 1.721 × 10−5 and 1.743 × 10−5, respectively.


Author(s):  
Alaa Ehab Sakran ◽  
Mohsen Rashwan ◽  
Sherif Mahdy Abdou

In this paper, an automatic phoneme-level segmentation system was built using the Kaldi toolkit for a Quran verses dataset consisting of a speech corpus of 80 hours and its corresponding text corpus, comprising 1100 recorded Quran verses from 100 non-Arab reciters. Starting with the extraction of Mel Frequency Cepstral Coefficients (MFCCs), the language model (LM) and acoustic model (AM) training phases were carried through to the Deep Neural Network (DNN) level using 770 recordings (70 reciters) for training. The system was tested on 220 recordings (20 reciters), and a development set of 280 recordings (10 reciters) was also selected. Automatic and manual segmentation were compared, and with Time Delay Neural Network (TDNN)-based acoustic modelling the result obtained was 99% on both the test set and the development set.


Author(s):  
Татьяна Николаевна Балабанова ◽  
Алексей Владимирович Болдышев ◽  
Сергей Вячеславович Уманец

In this work, the speech signal is treated as a set of fragments containing speech components and noise fragments corresponding to the pauses between words. The task is to construct a decision function capable of accepting or rejecting the hypothesis that speech is absent in a given segment of the speech signal. Using a subband method, the energy distribution over frequencies is computed for each segment of the speech signal. This distribution is then approximated by a mixture of radial basis functions (Gaussian functions). The mixture is a weighted sum of radial basis functions and a uniformly distributed component. A decision rule is formed from the ratio of the maximum values of the mixture components. For the computational experiment, a "dead zone" nonlinearity is introduced, a choice motivated by the characteristics of the electrical activity of the pathways and centres of the auditory system. The paper presents the results of applying the algorithm for detecting pauses in a speech signal. The labelled speech database of the US Defense Advanced Research Projects Agency, the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus, was used as the working material. A total of 100 recordings were processed, with an analysis segment length of 9 milliseconds and a sampling rate of 16000 Hz. To verify the performance of the proposed algorithm, type I errors ("missed target", when the algorithm did not mark a pause that was present in the manual annotation) and type II errors ("false alarm", when a pause was marked erroneously) were evaluated. The results obtained in the computational experiments indicate that the proposed approach is quite effective for detecting pauses in a speech signal.


Author(s):  
Shafkat Kibria ◽  
Ahnaf Mozib Samin ◽  
M. Humayon Kobir ◽  
M. Shahidur Rahman ◽  
M. Reza Selim ◽  
...  

Author(s):  
Linda Gaile

Research on the simultaneous interpreting process and the associated source and target languages requires both the oral source speeches and their simultaneous interpretation into the target language. Only relatively recently have translation and interpreting researchers been able to access digitized linguistic corpora, including parallel and speech corpora for different language pairs, from which they can build their own purpose-oriented corpora of original and target-language oral texts. Such a corpus can then be analysed qualitatively or quantitatively using different software and investigated for specific linguistic phenomena. This article focuses on the benefits of retrieving data from digitized language and speech corpora, which can be an important aid for analysing the orally delivered target text of simultaneous interpretation. At the heart of this question is the European Parliament speech corpus, from which authentic speeches in the source language (German) and their simultaneous interpretation into the target language (Latvian) can be obtained to create a sub-corpus for the German-Latvian language pair. Among other questions, the article explores which interpreting strategies can be used for simultaneous interpreting from German into Latvian, and it presents the application of the EXMARaLDA Partitur-Editor software, which makes it possible to create a time-aligned transcription of the source language and the simultaneously interpreted target language, as well as to develop a speech corpus.


Author(s):  
Shinnosuke Isobe ◽  
Ryuichi Hirose ◽  
Takumi Nishiwaki ◽  
Tomohiro Hattori ◽  
Satoshi Tamura ◽  
...  
