End-to-End Noisy Speech Recognition Using Fourier and Hilbert Spectrum Features

Despite the progress of deep neural networks over the last decade, state-of-the-art speech recognizers are still far from satisfactory performance in noisy environments. Methods for improving noise robustness usually add components to the recognition system, and these components often need optimization. For this reason, data augmentation of the input features derived from the Short-Time Fourier Transform (STFT) has become a popular approach. However, for many speech processing tasks, there is evidence that combining STFT-based and Hilbert–Huang transform (HHT)-based features improves overall performance. The Hilbert spectrum can be obtained using adaptive mode decomposition (AMD) techniques, which are noise-robust and suitable for non-linear and non-stationary signal analysis. In this study, we developed a DeepSpeech2-based recognition system that uses a combination of STFT and HHT spectrum-based features. We propose several ways to combine those features at different levels of the neural network. All evaluations were performed using the WSJ and CHiME-4 databases. Experimental results show that combining STFT and HHT spectra leads to a 5–7% relative improvement in noisy speech recognition.


Noisy speech recognition system for car cellular phone

Gateway to 21st Century Communications Village. VTC 1999-Fall. IEEE VTS 50th Vehicular Technology Conference (Cat. No.99CH36324) ◽ 10.1109/vetecf.1999.797332 ◽ 1999

Author(s): R. Sankar ◽ N.S. Sethi

Keyword(s): Speech Recognition ◽ Cellular Phone ◽ Recognition System ◽ Speech Recognition System ◽ Noisy Speech ◽ Noisy Speech Recognition


Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data

Sensors ◽ 10.3390/s20082326 ◽ 2020 ◽ Vol 20 (8) ◽ pp. 2326

Author(s): Ayesha Pervaiz ◽ Fawad Hussain ◽ Huma Israr ◽ Muhammad Ali Tahir ◽ Fawad Riasat Raja ◽ ...

Keyword(s): Machine Learning ◽ Speech Recognition ◽ Speech Processing ◽ Recognition System ◽ Training Data ◽ Machine Learning Techniques ◽ Noise Robustness ◽ Human Computer Interfaces ◽ Learning Techniques ◽ Speech Corpora

The advent of new devices, technology, and machine learning techniques, together with the availability of free large speech corpora, has made rapid and accurate speech recognition possible. Over the last two decades, researchers and organizations have carried out extensive research on new techniques and their applications in speech processing systems. There are several speech-command-based applications in robotics, IoT, ubiquitous computing, and various human-computer interfaces. Various researchers have worked on improving the efficiency of speech-command-based systems using the speech command dataset. However, none of them addressed noise in that dataset. Noise is one of the major challenges in any speech recognition system: real-time noise is a highly variable and unavoidable factor that degrades the performance of speech recognition systems, particularly those that have not been trained on noise effectively. We thoroughly analyse the latest trends in speech recognition and evaluate different machine-learning-based and deep-learning-based techniques on the speech command dataset. A novel technique for noise robustness is proposed that augments the training data with noise. Our proposed technique is tested on clean and noisy data along with locally generated data and achieves much better results than existing state-of-the-art techniques, thus setting a new benchmark.
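The abstract does not say how noise is added to the training data. As a rough illustration only, one common form of noise augmentation mixes a noise recording into a clean utterance at a chosen signal-to-noise ratio (SNR). The sketch below assumes that approach; the function names `mix_at_snr` and `measure_snr_db`, the synthetic sine "speech", and the Gaussian noise are all hypothetical and are not taken from the paper.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR (dB).

    Hypothetical helper: a generic additive-noise sketch, not the paper's method.
    """
    speech = list(speech)
    noise = list(noise)
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Tile the noise if it is shorter than the utterance, then trim to length.
    if len(noise) < len(speech):
        noise = noise * (len(speech) // len(noise) + 1)
    noise = noise[:len(speech)]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise signal is silent")

    # Solve snr_db = 10 * log10(speech_power / (gain**2 * noise_power)) for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measure_snr_db(speech, mixture):
    """Measure the SNR of a mixture against its clean reference, in dB."""
    signal_energy = sum(s * s for s in speech)
    noise_energy = sum((m - s) ** 2 for s, m in zip(speech, mixture))
    return 10 * math.log10(signal_energy / noise_energy)


# Stand-in data: a 1-second 440 Hz tone as "speech" and white noise as "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)
print(round(measure_snr_db(speech, noisy), 3))
```

In a training pipeline, a helper like this would typically be applied to each clean utterance with a randomly chosen noise clip and SNR, so that the model sees both clean and degraded versions of the same command.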
