A study on bias-based speech signal conditioning techniques for improving the robustness of automatic speech recognition

Author(s):  
Md Foezur Rahman Chowdhury
Sid-Ahmed Selouani
Douglas O'Shaughnessy
2020
Vol 8 (5)
pp. 1677-1681

Stuttering, or stammering, is a speech disorder in which sounds, syllables, or words are repeated or prolonged, disrupting the normal flow of speech. Stuttering can make it hard to communicate with other people, which often affects a person's quality of life. An Automatic Speech Recognition (ASR) system is a technology that converts an audio speech signal into the corresponding text. ASR systems now play a major role in controlling or providing input to various applications. Such ASR systems and machine translation applications suffer considerably from stuttering (speech dysfluency). Dysfluencies degrade the word recognition accuracy of an ASR system by increasing word insertion, substitution, and deletion rates. In this work we focus on detecting and removing prolongations, silent pauses, and repetitions in order to generate the correct text sequence for a given stuttered speech signal. The stuttered speech recognition system consists of two stages: classification using an LSTM and testing in an ASR system. The major phases of the classification system are re-sampling, segmentation, pre-emphasis, epoch extraction, and classification. The current work is carried out on the UCLASS stuttering dataset using MATLAB and yields a 4% to 6% increase in accuracy compared with ANN and SVM classifiers.
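The abstract names re-sampling, segmentation, and pre-emphasis as front-end phases but gives no parameters. A minimal sketch of two of these generic steps (standard first-order pre-emphasis and fixed-length framing) is shown below; the filter coefficient, frame length, and hop size are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def pre_emphasize(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split a signal into overlapping fixed-length frames (simple segmentation)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
```

The framed, pre-emphasized signal is what a subsequent feature-extraction and classification stage (such as the LSTM mentioned above) would operate on.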


Author(s):  
Sergio Suárez-Guerra
Jose Luis Oropeza-Rodriguez

This chapter presents state-of-the-art automatic speech recognition (ASR) technology, a highly successful area of computer science that draws on multiple disciplines such as signal processing and analysis, mathematical statistics, applied artificial intelligence, and linguistics. In the most widely used ASR systems, the basic unit of information used to characterize the speech signal is the phoneme. Recently, however, several researchers have questioned this representation and demonstrated the limitations of phonemes, suggesting that better-performing ASR systems can be developed by replacing the phoneme with triphones or syllables as the basic unit used to characterize the speech signal. The chapter presents an overview of the most successful techniques used in ASR systems, together with some recently proposed ASR systems that aim to improve on the characteristics of conventional ASR systems.
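As a small illustration of the alternative units the chapter discusses, the sketch below expands a phoneme sequence into context-dependent triphone labels in the common left-center+right notation; the phoneme symbols and the 'sil' boundary marker are illustrative assumptions rather than examples from the chapter.

```python
def to_triphones(phonemes: list[str]) -> list[str]:
    """Expand a phoneme sequence into triphone labels of the form
    left-center+right, padding the utterance boundaries with 'sil'."""
    padded = ["sil"] + list(phonemes) + ["sil"]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]

# Example: to_triphones(["k", "ae", "t"])
# -> ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```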


2014
Vol 24 (2)
pp. 259-270
Author(s):
Ryszard Makowski
Robert Hossa

Speech segmentation is an essential stage in designing automatic speech recognition systems, and several algorithms have been proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors' studies was to design an algorithm that could be employed at the stage of automatic speech recognition, which would make it possible to avoid some problems related to speech signal parametrization. Posing the problem in this way requires the algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al. (2006), and it is a modified version of Brandt's algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of the speech signal. The starting point for the proposed method is the time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable, and capable of rapidly tracking changes in second-order signal statistics. A transition from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal tract changes. To account for the properties of human hearing, detection of inter-phoneme boundaries is performed on the basis of statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those of another algorithm. The obtained segmentation results are satisfactory.
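The paper derives reflection coefficients with the Schur algorithm and builds its boundary statistic on the mel spectrum; the sketch below is a simplified stand-in that obtains the same reflection (PARCOR) coefficients per frame via the Levinson-Durbin recursion and flags a boundary whenever the coefficient vector changes sharply between consecutive frames. The model order, threshold, and the direct use of the coefficients (rather than the mel-spectrum statistics) are assumptions made for illustration only.

```python
import numpy as np

def reflection_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Reflection (PARCOR) coefficients of one frame, computed with the
    Levinson-Durbin recursion on the frame's autocorrelation sequence."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12            # small epsilon guards against silent frames
    k = np.zeros(order)
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        km = -acc / err
        k[m - 1] = km
        a[1:m + 1] = a[1:m + 1] + km * a[m - 1::-1][:m]
        err *= 1.0 - km * km
    return k

def boundary_frames(frames: np.ndarray, threshold: float = 0.5) -> list[int]:
    """Mark frames where the reflection-coefficient vector jumps, i.e. where the
    second-order statistics change (candidate inter-phoneme boundaries)."""
    ks = np.array([reflection_coefficients(f) for f in frames])
    dist = np.linalg.norm(np.diff(ks, axis=0), axis=1)
    return [i + 1 for i, d in enumerate(dist) if d > threshold]
```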


Author(s):  
Keshav Sinha
Rasha Subhi Hameed
Partha Paul
Karan Pratap Singh

In recent years, advances in voice-based authentication have led to numerous forensic voice authentication technologies. For verification, the speech reference model is collected from various open-source clusters. In this chapter, the primary focus is on an automatic speech recognition (ASR) technique that stores, retrieves, and processes data in a scalable manner. There are various conventional techniques for speech recognition, such as BWT, SVD, and MFCC, but in automatic speech recognition the efficiency of these conventional techniques degrades. To overcome this problem, the authors propose a speech recognition system using E-SVD, D3-MFCC, and dynamic time warping (DTW). D3-MFCC captures the important qualities of the speech signal while discarding unimportant and distracting features.
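E-SVD and D3-MFCC are the paper's own feature-processing steps and are not reproduced here; the sketch below shows only the standard dynamic time warping cost between two feature sequences (frames × coefficients), which is the matching step such a system relies on. The Euclidean frame distance and the unconstrained warping path are illustrative assumptions.

```python
import numpy as np

def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Classic dynamic time warping cost between two feature sequences,
    each shaped (num_frames, num_coefficients)."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])
```

A template-matching recognizer would compare the feature sequence of an unknown utterance against each stored reference and select the reference with the smallest DTW cost.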


2017
Vol 7 (1)
pp. 1-8
Author(s):
Yedilkhan Amirgaliyev
Minsoo Hahn
Timur Mussabayev

Parameterization of the speech signal using analysis algorithms synchronized with the pitch frequency is discussed. Speech parameterization is performed using the average zero-crossing count function and the signal energy function. The parameterization results are used to segment the speech signal and to isolate segments with stable spectral characteristics. The segmentation results can be used to generate a person's digital voice pattern or be applied in automatic speech recognition. The stages needed for continuous speech segmentation are described.
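A minimal sketch of the two parameterization functions named above (short-time energy and the zero-crossing count per frame) is given below; the frame length and hop size are illustrative assumptions, and the pitch-synchronous analysis described in the paper is not reproduced.

```python
import numpy as np

def short_time_features(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Per-frame short-time energy and zero-crossing count for a mono signal."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.sum(frame ** 2))
        signs = np.signbit(frame).astype(np.int8)
        zero_crossings = int(np.count_nonzero(np.diff(signs)))
        feats.append((energy, zero_crossings))
    return np.asarray(feats)
```

Runs of consecutive frames whose energy and zero-crossing values stay roughly constant correspond to the "segments with stable spectral characteristics" mentioned in the abstract.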

