Vocal tract length invariant features for automatic speech recognition

IEEE Workshop on Automatic Speech Recognition and Understanding, 2005. ◽

10.1109/asru.2005.1566473 ◽

2005 ◽

Author(s):

A. Mertins ◽

J. Rademacher

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Vocal Tract ◽

Tract Length ◽

Invariant Features

Enhancing vocal tract length normalization with elastic registration for automatic speech recognition

10.21437/interspeech.2012-393 ◽

2012 ◽

Author(s):

Florian Müller ◽

Alfred Mertins

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Vocal Tract ◽

Elastic Registration ◽

Tract Length ◽

Vocal Tract Length Normalization

Low‐dimensional, auditory feature vectors that improve vocal‐tract‐length normalization in automatic speech recognition

The Journal of the Acoustical Society of America ◽

10.1121/1.2932824 ◽

2008 ◽

Vol 123 (5) ◽

pp. 3066-3066 ◽

Author(s):

Jessica J. Monaghan ◽

Christian Feldbauer ◽

Tom C. Walters ◽

Roy D. Patterson

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Vocal Tract ◽

Tract Length ◽

Feature Vectors ◽

Auditory Feature ◽

Vocal Tract Length Normalization ◽

Low Dimensional

A novel feature transformation for vocal tract length normalization in automatic speech recognition

IEEE Transactions on Speech and Audio Processing ◽

10.1109/89.725321 ◽

1998 ◽

Vol 6 (6) ◽

pp. 549-557 ◽

Author(s):

T. Claes ◽

I. Dologlou ◽

L. ten Bosch ◽

D. van Compernolle

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Vocal Tract ◽

Feature Transformation ◽

Tract Length ◽

Vocal Tract Length Normalization

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8462037 ◽

2018 ◽

Author(s):

Wei-Ning Hsu ◽

James Glass

Keyword(s):

Speech Recognition ◽

Unsupervised Learning ◽

Automatic Speech Recognition ◽

Invariant Features

Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition

2014 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt.2014.7078563 ◽

2014 ◽

Author(s):

Romain Serizel ◽

Diego Giuliani

Keyword(s):

Speech Recognition ◽

Vocal Tract ◽

Feature vs. Model Based Vocal Tract Length Normalization for a Speech Recognition-Based Interactive Toy

Active Media Technology - Lecture Notes in Computer Science ◽

10.1007/3-540-45336-9_17 ◽

2001 ◽

pp. 134-143 ◽

Author(s):

Chun Keung Chau ◽

Chak Shun Lai ◽

Bertram Emil Shi

Keyword(s):

Speech Recognition ◽

Vocal Tract ◽

Tract Length ◽

Model Based ◽

Vocal Tract Length Normalization ◽

Interactive Toy

Frequency-Warping Invariant Features for Automatic Speech Recognition

2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings ◽

10.1109/icassp.2006.1661453 ◽

2006 ◽

Author(s):

A. Mertins ◽

J. Rademacher

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Frequency Warping ◽

Invariant Features

Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00216-5 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Masoud Geravanchizadeh ◽

Elnaz Forouhandeh ◽

Meysam Bashirpour

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Vocal Tract ◽

Gaussian Mixture ◽

Recognition System ◽

Speech Recognition System ◽

Emotional States ◽

Emotional Speech ◽

Automatic Speech Recognition System ◽

Frequency Warping

Abstract: The performance of speech recognition systems trained on neutral utterances degrades significantly when they are tested on emotional speech. Since anyone may speak emotionally in real-world environments, an automatic speech recognition system must take the emotional state of the speech into account. Little work has been done on emotion-affected speech recognition; so far, most research has focused on classifying speech emotions. In this paper, vocal tract length normalization is employed to make emotion-affected speech recognition more robust. Two speech recognition structures are used, based on hybrids of the hidden Markov model with a Gaussian mixture model and with a deep neural network. Frequency warping is applied in the filterbank and/or discrete cosine transform domain(s) of the feature extraction process. The warping is conducted so as to normalize the emotional feature components and bring them close to their corresponding neutral feature components. The performance of the proposed system is evaluated in neutrally trained/emotionally tested conditions for different speech features and emotional states (i.e., Anger, Disgust, Fear, Happy, and Sad). Frequency warping is employed for different acoustical features. The system is built on the Kaldi automatic speech recognition toolkit, with the Persian emotional speech database and the crowd-sourced emotional multi-modal actors dataset as input corpora. The experiments show that, in general, the warped emotional features yield better recognition performance than their unwarped counterparts. The deep neural network-hidden Markov model system also outperforms the hybrid with the Gaussian mixture model.

Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition

10.21437/interspeech.2008-449 ◽

2008 ◽

Author(s):

D. R. Sanand ◽

V. Balaji ◽

Rani R. Sandhya ◽

S. Umesh

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Centre Of Gravity ◽

Invariant Features

Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v39p118 ◽

2016 ◽

Vol 39 (2) ◽

pp. 105-109 ◽

Author(s):

Swapnanil Gogoi ◽

Utpal Bhattacharjee

Keyword(s):

Speech Recognition ◽

Vocal Tract ◽

Recognition Performance ◽

Tract Length ◽

Vocal Tract Length Normalization
