Optimising Speaker-Dependent Feature Extraction Parameters to Improve Automatic Speech Recognition Performance for People with Dysarthria

Within the field of Automatic Speech Recognition (ASR), handling impaired speech is a major challenge because standard approaches are ineffective in the presence of dysarthria. The first aim of our work is to confirm the effectiveness of a new speech analysis technique for speakers with dysarthria. This approach fine-tunes the size and shift parameters of the spectral analysis window used to compute the initial short-time Fourier transform, in order to improve the performance of a speaker-dependent ASR system. The second aim is to determine whether there is a correlation between a speaker’s voice features and the optimal window and shift parameters that minimise the error of an ASR system for that specific speaker. For our experiments, we used both impaired and unimpaired Italian speech. Specifically, we used 30 speakers with dysarthria from the IDEA database and 10 professional speakers from the CLIPS database. Both databases are freely available. The results confirm that, if a standard ASR system performs poorly for a speaker with dysarthria, it can be improved by using the new speech analysis. Conversely, the new approach brings no benefit for unimpaired or mildly impaired speech. Furthermore, there is a correlation between some of the speaker’s voice features and their optimal parameters.
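
As a rough illustration of the two parameters being tuned, the following Python sketch frames a signal with a configurable window size and shift, then computes a magnitude spectrum per frame. The function names `frame_signal` and `magnitude_spectrum`, the Hamming window, and the naive DFT are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
import cmath
import math


def frame_signal(samples, sample_rate, window_ms, shift_ms):
    """Split samples into overlapping, Hamming-windowed frames.

    window_ms and shift_ms are the analysis window size and shift in
    milliseconds (hypothetical parameter names).
    """
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * shift_ms / 1000.0))
    if win <= 0 or hop <= 0:
        raise ValueError("window and shift must be positive")
    if win == 1:
        hamming = [1.0]
    else:
        hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win - 1))
                   for n in range(win)]
    frames = []
    # Only full frames are kept; trailing samples are dropped.
    for start in range(0, len(samples) - win + 1, hop):
        frames.append([samples[start + n] * hamming[n] for n in range(win)])
    return frames


def magnitude_spectrum(frame):
    """Magnitude of a naive O(N^2) DFT for the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum
```

A speaker-dependent search could then loop over candidate (window, shift) pairs, train and score a recogniser for each, and keep the pair with the lowest error. That loop depends on the ASR toolkit in use and is not shown here.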

Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners With Simulated Age-Related Hearing Loss

Journal of Speech Language and Hearing Research ◽ 10.1044/2017_jslhr-s-16-0269 ◽ 2017 ◽ Vol 60 (9) ◽ pp. 2394-2405 ◽ Cited By ~ 6

Author(s): Lionel Fontan ◽ Isabelle Ferrané ◽ Jérôme Farinas ◽ Julien Pinquier ◽ Julien Tardieu ◽ ...

Keyword(s): Hearing Loss ◽ Speech Recognition ◽ Automatic Speech Recognition ◽ Hearing Aids ◽ Speech Processing ◽ Fine Tuning ◽ Language Models ◽ Age Related ◽ Age Related Hearing Loss ◽ ASR System

Purpose: The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist audiologists/hearing-aid dispensers in the fine-tuning of hearing aids. Method: Sixty young participants with normal hearing listened to speech materials mimicking the perceptual consequences of ARHL at different levels of severity. Two intelligibility tests (repetition of words and sentences) and 1 comprehension test (responding to oral commands by moving virtual objects) were administered. Several language models were developed and used by the ASR system in order to fit human performances. Results: Strong significant positive correlations were observed between human and ASR scores, with coefficients up to .99. However, the spectral smearing used to simulate losses in frequency selectivity caused larger declines in ASR performance than in human performance. Conclusion: Both intelligibility and comprehension scores for listeners with simulated ARHL are highly correlated with the performances of an ASR-based system. In the future, it needs to be determined if the ASR system is similarly successful in predicting speech processing in noise and by older people with ARHL.
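
The comparison of human and ASR scores can be sketched as a correlation computation. The abstract does not say which coefficient was used; the sketch below assumes Pearson's r, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Assumption: xs could hold human scores and ys the matching ASR
    scores, one pair per test condition.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

With one score pair per severity level, a value close to 1 would indicate that ASR performance falls in step with human performance as the simulated loss worsens.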

A HYBRID METHOD FOR AUTOMATIC SPEECH RECOGNITION PERFORMANCE IMPROVEMENT IN REAL WORLD NOISY ENVIRONMENT

Journal of Computer Science ◽ 10.3844/jcssp.2013.94.104 ◽ 2013 ◽ Vol 9 (1) ◽ pp. 94-104 ◽ Cited By ~ 1

Author(s): Shrawankar

Keyword(s): Speech Recognition ◽ Performance Improvement ◽ Automatic Speech Recognition ◽ Hybrid Method ◽ Real World ◽ Recognition Performance ◽ Noisy Environment

The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance

The Journal of the Acoustical Society of America ◽ 10.1121/1.4967208 ◽ 2016 ◽ Vol 140 (5) ◽ pp. EL416-EL422 ◽ Cited By ~ 7

Author(s): Ming Tu ◽ Alan Wisler ◽ Visar Berisha ◽ Julie M. Liss

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ Recognition Performance ◽ Dysarthric Speech ◽ The Relationship

Isolated word Automatic Speech Recognition (ASR) System using MFCC, DTW & KNN

2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast) ◽ 10.1109/apmediacast.2016.7878163 ◽ 2016 ◽ Cited By ~ 3

Author(s): Muhammad Atif Imtiaz ◽ Gulistan Raja

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ Isolated Word ◽ ASR System

Automatic Speech Recognition with Stuttering Speech Removal using Long Short-Term Memory (LSTM)

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.e6230.018520 ◽ 2020 ◽ Vol 8 (5) ◽ pp. 1677-1681

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ Speech Signal ◽ Short Term Memory ◽ Long Short Term Memory ◽ Increase In Accuracy ◽ Two Stages ◽ The Given ◽ ASR System

Stuttering, or stammering, is a speech disorder in which sounds, syllables, or words are repeated or prolonged, disrupting the normal flow of speech. Stuttering can make it hard to communicate with other people, which often affects an individual's quality of life. An Automatic Speech Recognition (ASR) system is a technology that converts an audio speech signal into the corresponding text. ASR systems currently play a major role in controlling or providing input to various applications. Such ASR systems and machine translation applications suffer considerably from stuttering (speech dysfluency). Dysfluencies reduce the word recognition accuracy of an ASR system by increasing word insertion, substitution and deletion rates. In this work we focus on detecting and removing prolongations, silent pauses and repetitions to generate a proper text sequence for the given stuttered speech signal. The stuttered speech recognition consists of two stages, namely classification using LSTM and testing in ASR. The major phases of the classification system are re-sampling, segmentation, pre-emphasis, epoch extraction and classification. The work is carried out on the UCLASS stuttering dataset using MATLAB, with a 4% to 6% increase in accuracy compared with ANN and SVM.
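
To make the effect of dysfluencies on word-level errors concrete, here is a minimal Python sketch that counts substitutions, deletions and insertions between a reference transcript and a recogniser output using a standard edit-distance alignment. The function name `wer_counts` and the example sentences are hypothetical; the paper itself reports a MATLAB implementation that is not reproduced here.

```python
def wer_counts(reference, hypothesis):
    """Count word substitutions, deletions and insertions.

    Uses a Levenshtein alignment over whitespace-separated words and
    returns the counts together with the word error rate.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = (cost, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)      # delete every reference word
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)      # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (c, s, d, n)          # words match, no cost
            else:
                diagonal = (c + 1, s + 1, d, n)  # substitution
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(diagonal, deletion, insertion,
                           key=lambda entry: entry[0])
    cost, subs, dels, ins = dp[len(ref)][len(hyp)]
    return {"substitutions": subs, "deletions": dels,
            "insertions": ins, "wer": cost / len(ref)}
```

For instance, comparing the reference "please open the door" with a hypothetical stuttered output "p p please open open the door" attributes all errors to insertions, which is the kind of error that removing repetitions before recognition aims to reduce.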

Study of algorithms to combine multiple automatic speech recognition (ASR) system outputs

10.17760/d10019273 ◽ 2009

Author(s): Harish Kashyap Krishnamurthy

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ ASR System

Performance Analysis of various Front-end and Back End Amalgamations for Noise-robust DNN-based ASR

Recent Advances in Computer Science and Communications ◽ 10.2174/2666255813999200730225301 ◽ 2020 ◽ Vol 13

Author(s): Mohit Dua ◽ Pawandeep Singh Sethi ◽ Vinam Agrawal ◽ Raghav Chawla

Keyword(s): Feature Extraction ◽ Speech Recognition ◽ Automatic Speech Recognition ◽ Gaussian Mixture ◽ Performance Comparison ◽ Acoustic Modeling ◽ Extraction Techniques ◽ Front End ◽ Noise Robust ◽ ASR System

Introduction: An Automatic Speech Recognition (ASR) system recognizes speech utterances and can therefore be used to convert speech into text for various purposes. These systems are deployed in different environments, clean or noisy, and are used by people of all ages and types. This diversity presents some of the major difficulties faced in the development of an ASR system. Thus, an ASR system needs to be efficient, while also being accurate and robust. Our main goal is to minimize the error rate during the training as well as the testing phases while implementing an ASR system. The performance of ASR depends upon the combination of feature extraction techniques and back-end techniques. In this paper, the performance of different combinations of feature extraction techniques and various types of back-end techniques is compared using a continuous speech recognition system. Methods: Hidden Markov Models (HMMs), Subspace Gaussian Mixture Models (SGMMs) and Deep Neural Networks (DNNs) with DNN-HMM architecture, namely Karel's, Dan's and the hybrid DNN-SGMM architecture, are used at the back end of the implemented system. Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), and Gammatone Frequency Cepstral Coefficients (GFCC) are used as feature extraction techniques at the front end of the proposed system. The Kaldi toolkit has been used for the implementation of the proposed work. The system is trained on the Texas Instruments-Massachusetts Institute of Technology (TIMIT) speech corpus for the English language. Results: The experimental results show that MFCC outperforms GFCC and PLP in noiseless conditions, while PLP tends to outperform MFCC and GFCC in noisy conditions. Furthermore, the hybrid of Dan's DNN implementation with SGMM performs best for back-end acoustic modeling. The proposed architecture, with the PLP feature extraction technique at the front end and the hybrid of Dan's DNN implementation with SGMM at the back end, outperforms the other combinations in a noisy environment. Conclusion: Automatic speech recognition has numerous applications in our lives, such as home automation, personal assistants and robotics. It is highly desirable to build an ASR system with good performance. The performance of automatic speech recognition is affected by various factors, including vocabulary size, whether the system is speaker dependent or independent, whether speech is isolated, discontinuous or continuous, and adverse conditions such as noise. The paper presented an ensemble architecture that uses PLP for feature extraction at the front end and a hybrid of SGMM + Dan's DNN at the back end to build a noise-robust ASR system. Discussion: The work presented in this paper compares the performance of continuous ASR systems developed using different combinations of front-end feature extraction (MFCC, PLP, and GFCC) and back-end acoustic modeling (mono-phone, tri-phone, SGMM, DNN and hybrid DNN-SGMM) techniques. Each type of front-end technique is tested in combination with each type of back-end technique. Finally, the results of the combinations thus formed are compared to find the best-performing combination in noisy and clean conditions.

A new approach to speaker adaptation by modelling pronunciation in automatic speech recognition

Speech Communication ◽ 10.1016/0167-6393(93)90026-h ◽ 1993 ◽ Vol 13 (3-4) ◽ pp. 281-286 ◽ Cited By ~ 1

Author(s): Florian Schiel

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ New Approach

Speech Feature Evaluation for Bangla Automatic Speech Recognition

Technical Challenges and Design Issues in Bangla Language Processing ◽ 10.4018/978-1-4666-3970-6.ch009 ◽ 2013 ◽ pp. 169-208 ◽ Cited By ~ 1

Author(s): Mohammed Rokibul Alam Kotwal ◽ Foyzul Hassan ◽ Mohammad Nurul Huda

Keyword(s): Speech Recognition ◽ Automatic Speech Recognition ◽ Recognition Performance ◽ Dynamic Parameters ◽ Mel Frequency Cepstral Coefficients ◽ Phoneme Recognition ◽ Hybrid Features ◽ Feature Evaluation ◽ Speech Feature ◽ Speech Features

This chapter presents Bangla (widely known as Bengali) Automatic Speech Recognition (ASR) techniques by evaluating different speech features, such as Mel Frequency Cepstral Coefficients (MFCCs), Local Features (LFs), and phoneme probabilities extracted by time-delay artificial neural networks of different architectures. Moreover, canonicalization of speech features is also performed for Gender-Independent (GI) ASR. In the canonicalization process, the authors have designed three classifiers, for male, female, and GI speakers, and extracted the output probabilities from these classifiers to find the maximum. The maximization of output probabilities for each speech file provides higher correctness and accuracy for GI speech recognition. In addition, dynamic parameters (velocity and acceleration coefficients) are used in the experiments to obtain higher accuracy in phoneme recognition. The experiments also show that dynamic parameters combined with hybrid features increase phoneme recognition performance to a certain extent. These parameters not only increase the accuracy of the ASR system, but also reduce the computational complexity of Hidden Markov Model (HMM)-based classifiers with fewer mixture components.
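
The dynamic parameters mentioned above can be illustrated with the widely used regression formula for delta features, d_t = sum over n of n * (c_{t+n} - c_{t-n}) divided by 2 * sum of n^2, with edge frames replicated. The chapter's exact formula is not given in the abstract, so this Python sketch, including the function name `delta` and the default `width`, is an assumption.

```python
def delta(features, width=2):
    """Regression-based delta coefficients over time.

    features is a list of frames, each a list of floats. Frames beyond
    the start or end are replaced by the first or last frame. Applying
    the function once gives velocity coefficients; applying it to its
    own output gives acceleration coefficients.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    result = []
    for t in range(num_frames):
        row = []
        for k in range(len(features[t])):
            acc = 0.0
            for n in range(1, width + 1):
                after = features[min(num_frames - 1, t + n)][k]
                before = features[max(0, t - n)][k]
                acc += n * (after - before)
            row.append(acc / denom)
        result.append(row)
    return result
```

In a typical front end, the static coefficients, `delta(static)` and `delta(delta(static))` would be concatenated frame by frame to form the final feature vector.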

Improvement In Automatic Speech Recognition Performance In Noisy Environments Using Time-Domain Blind Source Separation

2007 IEEE 15th Signal Processing and Communications Applications ◽ 10.1109/siu.2007.4298592 ◽ 2007

Author(s): Cemil Demir ◽ F. Kerem Harmanci

Keyword(s): Speech Recognition ◽ Blind Source Separation ◽ Automatic Speech Recognition ◽ Time Domain ◽ Recognition Performance ◽ Source Separation ◽ Noisy Environments
