Confidence Measures in Automatic Speech Recognition Systems for Error Detection in Restricted Domains

FRAME-SYNCHRONOUS AND LOCAL CONFIDENCE MEASURES FOR AUTOMATIC SPEECH RECOGNITION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001411008543 ◽

2011 ◽

Vol 25 (02) ◽

pp. 157-182 ◽

Cited By ~ 4

Author(s):

JOSEPH RAZIK ◽

ODILE MELLA ◽

DOMINIQUE FOHR ◽

JEAN-PAUL HATON

Keyword(s):

Speech Recognition ◽

Likelihood Ratio ◽

Automatic Speech Recognition ◽

Posterior Probability ◽

State Of The Art ◽

Confidence Measure ◽

Confidence Measures ◽

Automatic Transcription ◽

Recognition Systems ◽

Large Vocabulary Speech Recognition

In this paper, we introduce two new confidence measures for large vocabulary speech recognition systems. The major feature of these measures is that they can be computed without waiting for the end of the audio stream. We proposed two kinds of confidence measures: frame-synchronous and local. The frame-synchronous ones can be computed as soon as a frame is processed by the recognition engine and are based on a likelihood ratio. The local measures estimate a local posterior probability in the vicinity of the word to analyze. We evaluated our confidence measures within the framework of the automatic transcription of French broadcast news with the EER criterion. Our local measures achieved results very close to the best state-of-the-art measure (EER of 23% compared to 22.0%). We then conducted a preliminary experiment to assess the contribution of our confidence measure in improving the comprehension of an automatic transcription for the hearing impaired. We introduced several modalities to highlight words of low confidence in this transcription. We showed that these modalities used with our local confidence measure improved the comprehension of automatic transcription.

Download Full-text

Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection

Journal Of Big Data ◽

10.1186/s40537-020-00391-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Asmaa El Hannani ◽

Rahhal Errattahi ◽

Fatima Zahra Salmam ◽

Thomas Hain ◽

Hassan Ouahmane

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Error Detection ◽

State Of The Art ◽

Rapid Development ◽

Unified Framework ◽

Human Machine Interaction ◽

Detection Analysis ◽

Extensive Evaluation ◽

Effectiveness And Efficiency

AbstractSpeech based human-machine interaction and natural language understanding applications have seen a rapid development and wide adoption over the last few decades. This has led to a proliferation of studies that investigate Error detection and classification in Automatic Speech Recognition (ASR) systems. However, different data sets and evaluation protocols are used, making direct comparisons of the proposed approaches (e.g. features and models) difficult. In this paper we perform an extensive evaluation of the effectiveness and efficiency of state-of-the-art approaches in a unified framework for both errors detection and errors type classification. We make three primary contributions throughout this paper: (1) we have compared our Variant Recurrent Neural Network (V-RNN) model with three other state-of-the-art neural based models, and have shown that the V-RNN model is the most effective classifier for ASR error detection in term of accuracy and speed, (2) we have compared four features’ settings, corresponding to different categories of predictor features and have shown that the generic features are particularly suitable for real-time ASR error detection applications, and (3) we have looked at the post generalization ability of our error detection framework and performed a detailed post detection analysis in order to perceive the recognition errors that are difficult to detect.

Download Full-text

Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems

10.21437/interspeech.2013-672 ◽

2013 ◽

Author(s):

Kartik Audhkhasi ◽

Andreas M. Zavou ◽

Panayiotis G. Georgiou ◽

Shrikanth Narayanan

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition Systems

Download Full-text

Evaluation of Automatic Speech Recognition Systems

10.5753/sbbd.2021.17889 ◽

2021 ◽

Author(s):

Matheus Xavier Sampaio ◽

Regis Pires Magalhães ◽

Ticiana Linhares Coelho da Silva ◽

Lívia Almada Cruz ◽

Davi Romero de Vasconcelos ◽

...

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Smart Homes ◽

The Other ◽

Learning Models ◽

Recognition Systems ◽

Microsoft Azure

Automatic Speech Recognition (ASR) is an essential task for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Due to the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work aims to evaluate the performance of commercial solutions for ASR that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results demonstrate that the evaluated solutions slightly differ. However, Microsoft Azure Speech outperformed the other analyzed APIs.

Download Full-text

Improving Aphasic Speech Recognition by Using Novel Semi-Supervised Learning Methods on AphasiaBank for English and Spanish

Applied Sciences ◽

10.3390/app11198872 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8872

Author(s):

Iván G. Torre ◽

Mónica Romero ◽

Aitor Álvarez

Keyword(s):

Speech Recognition ◽

Supervised Learning ◽

Automatic Speech Recognition ◽

English Language ◽

Spanish Language ◽

Learning Methods ◽

Text Data ◽

Lower Performance ◽

Recognition Systems ◽

Fine Tune

Automatic speech recognition in patients with aphasia is a challenging task for which studies have been published in a few languages. Reasonably, the systems reported in the literature within this field show significantly lower performance than those focused on transcribing non-pathological clean speech. It is mainly due to the difficulty of recognizing a more unintelligible voice, as well as due to the scarcity of annotated aphasic data. This work is mainly focused on applying novel semi-supervised learning methods to the AphasiaBank dataset in order to deal with these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described by using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The interesting results obtained encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data to train recognition models is a challenging reality.

Download Full-text

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

ETRI Journal ◽

10.4218/etrij.13.0112.0074 ◽

2013 ◽

Vol 35 (1) ◽

pp. 100-108 ◽

Cited By ~ 13

Author(s):

Yasser Shekofteh ◽

Farshad Almasganj

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Phase Space ◽

Automatic Speech Recognition ◽

Reconstructed Phase Space ◽

Recognition Systems

Download Full-text

On the Application of Automated Software Testing Techniques to the Development and Maintenance of Speech Recognition Systems

Advanced Automated Software Testing ◽

10.4018/978-1-4666-0089-8.ch002 ◽

2012 ◽

pp. 30-48

Author(s):

Daniel Bolanos

Keyword(s):

Speech Recognition ◽

Software Testing ◽

Automatic Speech Recognition ◽

Automated Testing ◽

Automated Software Testing ◽

Testing Framework ◽

Methods And Techniques ◽

Testing Techniques ◽

Recognition Systems ◽

Automated Software

This chapter provides practitioners in the field with a set of guidelines to help them through the process of elaborating an adequate automated testing framework to competently test automatic speech recognition systems. Through this chapter the testing process of such a system is analyzed from different angles, and different methods and techniques are proposed that are well suited for this task.

Download Full-text

An MTF‐based blind restoration of temporal power envelopes as a front‐end processor for automatic speech recognition systems in reverberant environments

The Journal of the Acoustical Society of America ◽

10.1121/1.2933278 ◽

2008 ◽

Vol 123 (5) ◽

pp. 3180-3180

Author(s):

Xugang Lu ◽

Masashi Unoki ◽

Masato Akagi

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Front End ◽

Reverberant Environments ◽

Blind Restoration ◽

Recognition Systems

Download Full-text

Errors and error correction in automatic speech recognition systems

Ergonomics ◽

10.1080/00140139408964959 ◽

1994 ◽

Vol 37 (11) ◽

pp. 1943-1957 ◽

Cited By ~ 20

Author(s):

J. M. NOYES ◽

C. R. FRANKISH

Keyword(s):

Speech Recognition ◽

Error Correction ◽

Automatic Speech Recognition ◽

Recognition Systems

Download Full-text

The efficient incorporation of MLP features into automatic speech recognition systems

Computer Speech & Language ◽

10.1016/j.csl.2010.07.005 ◽

2011 ◽

Vol 25 (3) ◽

pp. 519-534 ◽

Cited By ~ 10

Author(s):

J. Park ◽

F. Diehl ◽

M.J.F. Gales ◽

M. Tomalin ◽

P.C. Woodland

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition Systems

Download Full-text