An ensemble technique for speech recognition in noisy environments

Author(s):  
Imad Qasim Habeeb ◽  
Tamara Z. Fadhil ◽  
Yaseen Naser Jurn ◽  
Zeyad Qasim Habeeb ◽  
Hanan Najm Abdulkhudhur

Automatic speech recognition (ASR) is a technology that allows computers and mobile devices to recognize spoken language and translate it into text. ASR systems often produce poor accuracy for noisy speech signals. Therefore, this research proposes an ensemble technique that does not rely on a single filter for perfect noise reduction but incorporates information from multiple noise reduction filters to improve the final ASR accuracy. The core of this technique is the generation of K copies of the speech signal using three noise reduction filters. The speech features of these copies differ slightly, so that different texts are extracted from them when processed by the ASR system; the best among these texts can then be elected as the final ASR output. The ensemble technique was compared with three related current noise reduction techniques in terms of CER and WER. The test results were encouraging, showing relative decreases of 16.61% and 11.54% in CER and WER compared with the best current technique. The ASR field will benefit from the contribution of this research to increase the recognition accuracy of human speech in the presence of background noise.
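The abstract does not specify the three filters or the election rule, so the following is only a minimal sketch of the ensemble idea: Wiener, median, and Savitzky-Golay filters stand in for the three noise reduction filters, a placeholder recognize() callable stands in for the ASR decoder, and the election step is shown as a simple majority vote over the K transcripts, which is one plausible reading of "elected".

```python
# Minimal sketch of the ensemble idea described above (not the authors' exact
# pipeline). recognize() is a hypothetical stand-in for any ASR decoder; the
# three SciPy filters are illustrative substitutes for the paper's filters.
from collections import Counter

from scipy.signal import medfilt, savgol_filter, wiener


def make_copies(noisy_signal):
    """Generate K slightly different de-noised copies of the speech signal."""
    return [
        wiener(noisy_signal),                  # Wiener filtering
        medfilt(noisy_signal, kernel_size=5),  # median filtering
        savgol_filter(noisy_signal, 11, 3),    # Savitzky-Golay smoothing
    ]


def ensemble_asr(noisy_signal, recognize):
    """Decode every copy and elect the most common transcript as the output."""
    transcripts = [recognize(copy) for copy in make_copies(noisy_signal)]
    best, _ = Counter(transcripts).most_common(1)[0]  # majority vote
    return best
```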

Author(s):  
Poonam Bansal ◽  
Amita Dev ◽  
Shail Jain

In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower-order autocorrelation coefficients and uses only the higher-order autocorrelation coefficients for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence is used here as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further by the Mel filter bank, a log operation, and the discrete cosine transform to get the cepstral coefficients. These cepstral coefficients are referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as the MFCC features for clean speech, while their recognition performance is better than that of the MFCC features for noisy speech.
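A rough sketch of this pipeline is given below, assuming a single pre-windowed speech frame as input. The cut-off order, FFT size, Mel filter-bank size, and number of cepstra are illustrative choices rather than the paper's values, and the differentiation and relative-spectrum steps implied by the DRHOASS name are omitted.

```python
# Rough sketch of the higher-order-autocorrelation cepstra described above.
# Parameter values are illustrative; the differentiation step is omitted.
import numpy as np
import librosa
from scipy.fftpack import dct


def higher_order_autocorr_cepstra(frame, sr=16000, low_order_cut=10,
                                  n_fft=512, n_mels=26, n_ceps=13):
    # Autocorrelation of the frame, non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Discard the lower-order coefficients, which noise corrupts the most.
    ac_high = ac[low_order_cut:]
    # Window the higher-order sequence; its magnitude spectrum serves as
    # the estimate of the power spectrum of the speech signal.
    power_est = np.abs(np.fft.rfft(ac_high * np.hamming(len(ac_high)), n=n_fft))
    # Mel filter bank, log operation, and DCT give the cepstral coefficients.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    log_mel = np.log(mel_fb @ power_est + 1e-10)
    return dct(log_mel, type=2, norm="ortho")[:n_ceps]
```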


2001 ◽  
Vol 34 (1-2) ◽  
pp. 3-12 ◽  
Author(s):  
Joerg Bitzer ◽  
Klaus Uwe Simmer ◽  
Karl-Dirk Kammeyer

Author(s):  
Ziad A. Alqadi ◽  
Sayel Shareef Rimawi

The stage of extracting the features of a speech file is one of the most important stages in building a system that identifies a person by voice. Accordingly, the choice of the feature extraction method matters because of its subsequent negative or positive effects on the speech recognition system. In this research, we analyze the most popular methods of speech signal feature extraction: LPC, K-means clustering, WPT decomposition, and MLBP. These methods are implemented and tested using various speech files. The amplitude and sampling frequency are changed to see the effects of these changes on the extracted features; an illustrative test of this kind is sketched below. Based on the results of the analysis, some recommendations are given.
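As a small illustration of this kind of test, restricted to LPC only: the file name, LPC order, and the amplitude and sampling-rate changes below are assumptions, not values from the paper.

```python
# Illustrative LPC-only version of the test described above: extract features
# from the original file, an amplitude-scaled copy, and a resampled copy,
# then compare. File name and parameter values are assumptions.
import numpy as np
import librosa


def lpc_features(signal, order=12):
    """LPC coefficients of one signal (the leading coefficient is always 1)."""
    return librosa.lpc(np.asarray(signal, dtype=float), order=order)


y, sr = librosa.load("speech_sample.wav", sr=None)    # hypothetical file

base = lpc_features(y)
scaled = lpc_features(0.5 * y)                        # amplitude change
halved = lpc_features(librosa.resample(y, orig_sr=sr, target_sr=sr // 2))

print("effect of amplitude change :", np.max(np.abs(base - scaled)))
print("effect of sampling change  :", np.max(np.abs(base - halved)))
```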


2002 ◽  
Vol 14 (02) ◽  
pp. 55-66
Author(s):  
CHENG-CHI TAI ◽  
CHIH-HSING CHANG ◽  
CHUAN-CHING TAN ◽  
TSUNG-WEN HUANG ◽  
CHING-CHAU SU

In this paper, we present a noise reduction technique for hearing-aid systems. The proposed algorithm adopts an adaptive beamformer combined with a subband filtering technique. The structure of conventional hearing aids is relatively simple: they amplify ambient sounds, which include the speech signal as well as noise. Because noise and the human speech signal are amplified at the same time, hearing-aid users cannot clearly hear the speech signal in noisy environments. The direction of sound can be used to discriminate the speech signal from noise by combining an adaptive noise canceller and an adaptive beamformer. We have developed a system based on a constrained adaptive noise canceller that preserves the speech signal arriving from straight ahead and minimizes background noise arriving from other directions. The system also uses a subband filtering technique to reduce the computational requirement and enhance the flexibility of the system. The performance of this system is illustrated using simulated and real-world noises. The results show that the developed system preserves the straight-ahead speech signal and substantially rejects noises from other directions.
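A bare-bones two-microphone sketch of this kind of structure is shown below (a Griffiths-Jim style arrangement, with the subband decomposition omitted): the sum channel keeps the frontal speech, the difference channel serves as a noise reference, and a normalised LMS filter adapts to cancel off-axis noise. The filter length and step size are illustrative, and the microphones are assumed to be already time-aligned for a frontal source.

```python
# Bare-bones two-microphone adaptive noise canceller in the spirit of the
# system described above; the subband filtering stage is omitted.
import numpy as np


def constrained_noise_canceller(mic_left, mic_right, n_taps=32, mu=0.1):
    fixed = 0.5 * (mic_left + mic_right)  # sum channel: keeps frontal speech
    noise_ref = mic_left - mic_right      # difference channel: frontal speech cancels
    w = np.zeros(n_taps)
    out = np.zeros_like(fixed)
    for n in range(n_taps, len(fixed)):
        x = noise_ref[n - n_taps:n][::-1]     # latest reference samples
        e = fixed[n] - w @ x                  # enhanced output sample
        w += mu * e * x / (x @ x + 1e-8)      # normalised LMS update
        out[n] = e
    return out
```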

