Feature compensation based on independent noise estimation for robust speech recognition

Abstract: In this paper, we propose a novel feature compensation algorithm based on independent noise estimation. The algorithm employs a Gaussian mixture model (GMM) with few Gaussian components to rapidly estimate the noise parameters from the noisy speech and to monitor noise variation. The estimated noise model is combined with a GMM with sufficient Gaussian mixtures to produce the noisy GMM used for clean speech estimation, so that the parameters are updated if and only if the noise varies. Experimental results show that the proposed algorithm achieves recognition accuracy similar to that of traditional GMM-based feature compensation while significantly reducing the computational cost, which makes it more useful for resource-limited mobile devices.
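The update-only-on-change idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional log-spectral feature, a clean-speech GMM given in advance, a log-add mismatch function for the noisy means, unchanged variances, and a hypothetical `threshold` that decides when the noise has varied. The class name `Compensator` and all parameters are illustrative.

```python
import math
from typing import List, Optional, Sequence


class Compensator:
    """Toy GMM-based feature compensation for one log-spectral band.

    Assumptions (not from the paper): scalar features, log-add
    combination of clean and noise means, variances left unchanged.
    """

    def __init__(self, weights: Sequence[float], means: Sequence[float],
                 variances: Sequence[float], threshold: float = 0.5) -> None:
        self.w = list(weights)        # clean-speech mixture weights
        self.mu_x = list(means)       # clean-speech component means
        self.var = list(variances)    # component variances (reused as-is)
        self.threshold = threshold    # hypothetical noise-change threshold
        self.noise_mean: Optional[float] = None
        self.mu_y: List[float] = []   # cached noisy-speech means
        self.rebuilds = 0             # how often the noisy GMM was rebuilt

    def set_noise(self, noise_mean: float) -> bool:
        """Rebuild the noisy means only if the noise estimate moved enough."""
        if (self.noise_mean is not None
                and abs(noise_mean - self.noise_mean) <= self.threshold):
            return False              # noise unchanged: keep the cached model
        self.noise_mean = noise_mean
        self.mu_y = [math.log(math.exp(m) + math.exp(noise_mean))
                     for m in self.mu_x]
        self.rebuilds += 1
        return True

    def compensate(self, y: float) -> float:
        """MMSE-style clean estimate: subtract the posterior-weighted shift."""
        if not self.mu_y:
            raise RuntimeError("call set_noise() before compensate()")
        logp = [math.log(w) - 0.5 * math.log(2 * math.pi * v)
                - 0.5 * (y - m) ** 2 / v
                for w, m, v in zip(self.w, self.mu_y, self.var)]
        top = max(logp)               # subtract the max for numerical stability
        post = [math.exp(l - top) for l in logp]
        total = sum(post)
        shift = sum(p / total * (my - mx)
                    for p, my, mx in zip(post, self.mu_y, self.mu_x))
        return y - shift
```

In this sketch the expensive step (rebuilding every noisy mean) runs only when `set_noise` reports a change, which mirrors the cost saving the abstract describes; a real system would work on feature vectors and would obtain `noise_mean` from the small noise GMM.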


MODIFIED QUANTILE BASED ONLINE NOISE ADAPTATION METHOD FOR A ROBUST SPEECH RECOGNITION INTERFACE

International Journal of Pattern Recognition and Artificial Intelligence, 2007, Vol 21 (04), pp. 759-772, DOI: 10.1142/s0218001407005661

Author(s): HEUNGKYU LEE, JUNE KIM

Keyword(s): Speech Recognition, Estimation Method, Estimation Procedure, Gaussian Mixture, Noise Estimation, Noise Model, Robust Speech Recognition, Model Adaptation, Speech Database, Adaptation Method

This paper proposes an online noise model adaptation technique that uses a modified quantile-based noise estimation method for GMM-based feature compensation of noisy speech, aimed at a robust speech recognition interface in real car environments. The proposed method is designed as an active online model adaptation method that copes with varying environmental noise conditions and improves speech recognition accuracy. Compensation is performed in the logarithmic filter-bank energy domain, and a modified quantile-based noise estimation method using the beta-order harmonic mean is employed in the online noise estimation procedure. The method is evaluated on the Aurora 2 speech database, where it gives more robust results than the other algorithms compared.
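For orientation, plain quantile-based noise estimation can be sketched as follows. This is the basic, unmodified idea only: the abstract does not define the beta-order harmonic mean modification, so it is not reproduced here. The function names, the window length, and the choice of the median (`q = 0.5`) are assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, List, Sequence


def quantile_noise_estimate(frames: Sequence[Sequence[float]],
                            q: float = 0.5) -> List[float]:
    """Return the q-quantile over time of each band's log energy.

    Speech is absent from a band in many frames, so a low-to-middle
    quantile of the sorted energies tends to reflect the noise level.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    n_bands = len(frames[0])
    estimate = []
    for band in range(n_bands):
        track = sorted(frame[band] for frame in frames)
        index = int(q * (len(track) - 1))   # floor index of the q-quantile
        estimate.append(track[index])
    return estimate


def online_noise_tracker(stream: Iterable[Sequence[float]],
                         window: int = 50,
                         q: float = 0.5) -> Iterator[List[float]]:
    """Yield an updated noise estimate after every incoming frame."""
    buffer = deque(maxlen=window)           # sliding window of recent frames
    for frame in stream:
        buffer.append(frame)
        yield quantile_noise_estimate(list(buffer), q)
```

The sliding window lets the estimate follow slowly changing noise without an explicit speech/non-speech decision, which is the property that makes quantile-based estimators attractive for online adaptation.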


Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition

2014, DOI: 10.21437/interspeech.2014-162

Author(s): M. J. Alam, Patrick Kenny, Pierre Dumouchel, Douglas O'Shaughnessy

Keyword(s): Speech Recognition, Gaussian Mixture Model, Mixture Model, Gaussian Mixture, Noise Spectrum, Robust Speech Recognition, Spectrum Estimation, Model Based


Emotional Speech Recognition Based on Weighted Distance Optimization System

International Journal of Pattern Recognition and Artificial Intelligence, 2020, Vol 34 (11), pp. 2050027, DOI: 10.1142/s0218001420500275

Author(s): Mona Nagy ElBedwehy, G. M. Behery, Reda Elbarougy

Keyword(s): Speech Recognition, Recognition Accuracy, Gaussian Mixture, Research Field, Superior Performance, Emotional States, Emotional Speech, Weighted Distance, Human Emotion, Emotional Speech Recognition

Emotion plays a major role in how humans express their feelings through speech. Emotional speech recognition is an important research field in human–computer interaction. Ultimately, endowing machines with the ability to perceive users' emotions will enable more intuitive and reliable interaction. Researchers have presented many models for recognizing human emotion from speech. One of the best known is the Gaussian mixture model (GMM). Nevertheless, one or more GMM components may have ill-conditioned or singular covariance matrices when the number of features is high and some features are correlated. In this research, a new system based on weighted distance optimization (WDO) has been developed for recognizing emotional speech. The main purpose of the WDO system (WDOS) is to address the GMM shortcomings and increase the recognition accuracy. In a comparative study of all emotional states and of the characteristics of individual emotional states, WDOS achieved considerable success. WDOS reaches an accuracy of 86.03% for the Japanese language, improving Japanese emotion recognition accuracy by 18.43% compared with GMM and [Formula: see text]-mean.
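The covariance problem mentioned for GMMs can be seen in a small sketch with two perfectly correlated features. The helper names and the `ridge` parameter are hypothetical, and the diagonal loading shown is a generic remedy used here only for contrast; it is not the WDO system, whose details the abstract does not give.

```python
from typing import Sequence, Tuple


def covariance_2d(xs: Sequence[float],
                  ys: Sequence[float]) -> Tuple[float, float, float]:
    """Return (var_x, cov_xy, var_y) of two equally long feature tracks."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return var_x, cov_xy, var_y


def determinant(cov: Tuple[float, float, float], ridge: float = 0.0) -> float:
    """Determinant of the 2x2 covariance after adding `ridge` to its diagonal."""
    var_x, cov_xy, var_y = cov
    return (var_x + ridge) * (var_y + ridge) - cov_xy ** 2


# The second feature is exactly twice the first, so the two are fully correlated.
feature_a = [1.0, 2.0, 3.0, 4.0]
feature_b = [2.0, 4.0, 6.0, 8.0]
cov = covariance_2d(feature_a, feature_b)
singular_det = determinant(cov)           # zero: the matrix cannot be inverted
loaded_det = determinant(cov, ridge=0.01) # positive after diagonal loading
```

A zero determinant means the Gaussian density of that component is undefined, which is the failure the abstract refers to when features are correlated.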
