A Modified MFCC for Improved Wavelet-Based Denoising on Robust Speech Recognition

Author(s):  
Risanuri Hidayat
Anggun Winursito

Current research on speech recognition aims to create noise-resistant systems. Mel Frequency Cepstral Coefficients (MFCC) extraction is a popular feature extraction method in speech recognition, but it is vulnerable to noise interference, and that weakness is the main motivation for this work toward a robust speech recognition system. The system was developed by improving denoising performance using a wavelet transform. The modification was derived by analyzing the weakness of the wavelet denoising process in a recognition system that uses the MFCC method. The analysis was conducted at one of the MFCC stages, the Fast Fourier Transform (FFT) stage. The proposed method applies wavelet denoising only to the noise-related data identified by the analysis of the FFT stage. The study used speech data consisting of eleven isolated English words with added noise of several different characteristics. Results showed that the proposed method achieved better accuracy than conventional wavelet denoising methods at signal-to-noise ratios (SNR) of 10 dB, 15 dB, and 20 dB using a Fejér-Korovkin 6 wavelet. The largest accuracy gain of the proposed method was at an SNR of 15 dB, with a rise of 4.63%, followed by a 3.96% increase at 20 dB and 2.3% at 10 dB. The performance of the proposed method was then compared with other methods. The results show that the proposed method has the best performance on clean speech and on noisy speech at SNRs of 10 dB, 15 dB, and 20 dB.
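For readers unfamiliar with the conventional wavelet denoising that the abstract uses as its baseline, the following is a minimal sketch, not the authors' method. It assumes a one-level Haar wavelet in place of the Fejér-Korovkin 6 wavelet, a synthetic sine wave in place of speech, a known noise standard deviation, and the universal threshold; all function names are hypothetical. It also denoises the whole signal rather than only the noise-related data selected from the FFT stage.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar transform of an even-length sequence."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(x, t):
    """Threshold only the detail band, then reconstruct the signal."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, t))

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Assumed toy data: a slow sine wave plus Gaussian noise (not speech).
random.seed(0)
n = 1024
sigma = 0.3
clean = [math.sin(2 * math.pi * i / 256) for i in range(n)]
noisy = [c + random.gauss(0.0, sigma) for c in clean]

# Universal threshold, assuming the noise standard deviation is known.
threshold = sigma * math.sqrt(2 * math.log(n))
denoised = denoise(noisy, threshold)

print("MSE before:", mse(clean, noisy))
print("MSE after: ", mse(clean, denoised))
```

Because the detail band of a slowly varying signal is dominated by noise, thresholding it removes roughly half of the noise energy in this toy setup. A multi-level decomposition with a smoother wavelet would typically remove more.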

2020
Author(s):
Chaofeng Lan
Yuanyuan Zhang
Hongyun Zhao

This paper draws on the training method of the Recurrent Neural Network (RNN). By increasing the number of hidden layers of the RNN, changing the activation function on the input layer from the traditional Sigmoid to Leaky ReLU, and zero-padding the first and last groups of data to improve data utilization, the authors construct an improved noise-reduction model, the Denoise Recurrent Neural Network (DRNN), with high calculation speed and good convergence. The model is intended to solve the problem of low speaker recognition rates in noisy environments. Using this model, the paper studies random semantic speech signals from a speech library with a sampling rate of 16 kHz and a duration of 5 seconds. The experimental signal-to-noise ratios are −10 dB, −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional model and the improved model on the speech recognition rate is analyzed. The research shows that the improved model can effectively remove noise from the feature parameters and improve the speech recognition rate. The improvement in the speaker recognition rate is more pronounced when the signal-to-noise ratio is low. At a signal-to-noise ratio of 0 dB, the speaker recognition rate is reported to increase by 40%, which is described as an 85% improvement compared with the traditional speech model. The recognition rate rises gradually as the signal-to-noise ratio increases; at 15 dB, the speaker recognition rate is 93%.
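The switch from Sigmoid to Leaky ReLU mentioned in the abstract can be illustrated with a small sketch. This is not the authors' network; it only compares the two activation functions and their gradients. The negative-side slope of 0.01 is an assumption (the abstract does not state one), and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, squashing inputs into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; it peaks at 0.25 and decays toward zero."""
    s = sigmoid(x)
    return s * (1.0 - s)

def leaky_relu(x, slope=0.01):
    """Identity for positive inputs, a small linear slope for the rest."""
    return x if x > 0 else slope * x

def leaky_relu_grad(x, slope=0.01):
    """Derivative of Leaky ReLU; it never drops below `slope`."""
    return 1.0 if x > 0 else slope

# Compare gradients at a few inputs, including saturated ones.
for x in (-6.0, -1.0, 1.0, 6.0):
    print(x, sigmoid_grad(x), leaky_relu_grad(x))
```

For inputs of large magnitude the sigmoid gradient shrinks toward zero, while the Leaky ReLU gradient stays at 1 for positive inputs and at the chosen slope for negative ones. That property is commonly cited as a reason for faster convergence, which is consistent with the convergence claim in the abstract.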


2010
Author(s):  
Gökhan Ince ◽  
Kazuhiro Nakadai ◽  
Tobias Rodemann ◽  
Hiroshi Tsujino ◽  
Jun-ichi Imura

2005
Vol 17 (4)
pp. 447-455
Author(s):  
Shingo Yoshizawa ◽  
Noboru Hayasaka
Naoya Wada ◽  
Yoshikazu Miyanaga

This paper presents a VLSI architecture for a robust speech recognition system that enables high-speed, low-power operation. The proposed architecture improves recognition accuracy in noisy environments and realizes short-time response by implementing parallel and pipeline processing. We demonstrate improved processing time and power consumption by evaluating circuit performance in 0.25-μm CMOS technology. We also detail a verification platform that helps users implement our hardware-based robust speech recognition system. The verification platform facilitates software conversion to hardware and promptly provides testing environments on field-programmable gate arrays.

