Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment

Abstract: This paper draws on the training method of the Recurrent Neural Network (RNN). The number of hidden layers of the RNN is increased, the activation function on the input layer is changed from the traditional Sigmoid to Leaky ReLU, and the first and last groups of data are zero-padded to make more effective use of the data. The result is an improved Denoise Recurrent Neural Network (DRNN) noise-reduction model with high calculation speed and good convergence, constructed to address the low speaker recognition rate in noisy environments. Using this model, random-semantic speech signals from the speech library, sampled at 16 kHz and 5 seconds long, are studied. The experimental signal-to-noise ratios are -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional model and the improved model on the speech recognition rate is analyzed. The research shows that the improved model can effectively remove noise from the feature parameters and improve the speech recognition rate. The improvement in the speaker recognition rate is more pronounced when the signal-to-noise ratio is low. When the signal-to-noise ratio is 0 dB, the speaker recognition rate is increased by 40%, which can be an 85% improvement compared with the traditional speech model. As the signal-to-noise ratio increases, the recognition rate gradually increases. When the signal-to-noise ratio is 15 dB, the speaker recognition rate is 93%.
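The abstract lists the test signal-to-noise ratios but not how the noisy signals were produced. A common way to build such conditions is to scale a noise recording so that the speech-to-noise power ratio hits the target and then add it to the clean speech. The sketch below illustrates that idea only; the function names `mix_at_snr` and `measured_snr_db` and the toy signals are assumptions made for illustration, not the authors' procedure.

```python
import math


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that 10*log10(P_speech / P_noise) equals snr_db.

    Hypothetical helper: both inputs are equal-length lists of floats.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB, recovered from the clean signal and the noisy mixture."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


if __name__ == "__main__":
    # Toy stand-ins for a speech clip and a noise clip (assumed data).
    speech = [math.sin(0.1 * i) for i in range(1000)]
    noise = [float((i * 7919) % 13 - 6) for i in range(1000)]
    for snr in (-10, -5, 0, 5, 10, 15, 20, 25):
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech keeps the speech level identical across conditions, so any difference in recognition rate can be attributed to the noise level.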


A High Performance FPGA-Based Accelerator Design for End-to-End Speaker Recognition System

2019 International Conference on Field-Programmable Technology (ICFPT), 2019. DOI: 10.1109/icfpt47387.2019.00033

Author(s): Mingjun Jiao, Yue Li, Pengbo Dang, Wei Cao, Lingli Wang

Keyword(s): Speaker Recognition, High Performance, Recognition System, End To End, Accelerator Design


MFCC and CMN Based Speaker Recognition in Noisy Environment

International Journal of Electronics Signals and Systems, 2013, pp. 48-51. DOI: 10.47893/ijess.2013.1137

Author(s): Debashish Dev Mishra, Utpal Bhattacharjee, Shikhar Kumar Sarma

Keyword(s): Speaker Recognition, Gaussian Mixture Models, Gaussian Mixture, Training Data, Noisy Environment, Noisy Environments, Mel Frequency Cepstral Coefficients, Automatic Speaker Recognition, Cepstral Mean Normalization, Testing Environments

The performance of automatic speaker recognition (ASR) systems degrades drastically in the presence of noise and other distortions, especially when there is a noise level mismatch between the training and testing environments. This paper explores the problem of speaker recognition in noisy conditions, assuming that speech signals are corrupted by noise. A major problem of most speaker recognition systems is their unsatisfactory performance in noisy environments. In this experimental research, we have studied a combination of Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Cepstral Mean Normalization (CMN) for speech enhancement. Our system uses a Gaussian Mixture Model (GMM) classifier and is implemented in the MATLAB® 7 programming environment. Speaker data is used for both training and testing: the test data is matched against a speaker model trained on the training data using GMM modeling. Finally, experiments are carried out to test the new model for ASR given limited training data and differing levels and types of realistic background noise. The results demonstrate the robustness of the new system.
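Cepstral Mean Normalization subtracts, for each cepstral coefficient, its mean over all frames of an utterance. Because a fixed channel shows up as an additive offset in the cepstral domain, the subtraction cancels it. The sketch below is a minimal illustration in Python rather than the MATLAB implementation the abstract mentions; the function name `cepstral_mean_normalize` and the toy frame values are assumptions.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over time from each frame.

    `frames` is a list of equal-length lists, one list of cepstral
    coefficients (for example MFCCs) per frame. Hypothetical helper.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(dim)]
    return [[frame[k] - means[k] for k in range(dim)] for frame in frames]


if __name__ == "__main__":
    # Toy 2-coefficient "MFCC" frames (assumed values, not real features).
    clean = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
    # The same frames after a constant channel offset in the cepstral domain.
    shifted = [[c0 + 10.0, c1 - 5.0] for c0, c1 in clean]
    print(cepstral_mean_normalize(clean))
    print(cepstral_mean_normalize(shifted))
```

Both calls print the same normalized frames, which is the property that makes CMN useful when training and testing channels differ.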


Comparative study between different classifiers based speaker recognition system using MFCC for noisy environment

2015 International Conference on Green Computing and Internet of Things (ICGCIoT), 2015. DOI: 10.1109/icgciot.2015.7380600. Cited By ~ 2

Author(s): Abhilasha Sukhwal, Mahendra Kumar

Keyword(s): Comparative Study, Speaker Recognition, Recognition System, Noisy Environment


Speaker Recognition Using Wavelet Packet Entropy, I-Vector, and Cosine Distance Scoring

Journal of Electrical and Computer Engineering, 2017, Vol 2017, pp. 1-9. DOI: 10.1155/2017/1735698. Cited By ~ 2

Author(s): Lei Lei, She Kun

Keyword(s): Speaker Recognition, Wavelet Packet, Time Cost, Noisy Environment, Proposed Model, Speech Database, Recognition Result, Spectrum Feature, The Difference, Cosine Distance

Today, more and more people benefit from speaker recognition. However, the accuracy of speaker recognition often drops rapidly because of low-quality speech and noise. This paper proposes a new speaker recognition model based on wavelet packet entropy (WPE), i-vector, and cosine distance scoring (CDS). In the proposed model, WPE transforms the speech into short-term spectrum feature vectors (short vectors) and resists the noise. An i-vector is generated from those short vectors and characterizes the speech to improve recognition accuracy. CDS quickly compares two i-vectors to produce the recognition result. The proposed model is evaluated on the TIMIT speech database. The experimental results show that the proposed model performs well in both clean and noisy environments and is insensitive to low-quality speech, but its time cost is high. To reduce the time cost, parallel computation is used.
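Cosine distance scoring compares two i-vectors by the cosine of the angle between them, so the score depends on direction only and not on vector length. The sketch below shows the scoring step on its own; the function names, the toy vectors, and the 0.5 threshold are assumptions for illustration and are not taken from the paper.

```python
import math


def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors given as lists of floats."""
    if len(w1) != len(w2):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("i-vectors must be non-zero")
    return dot / (norm1 * norm2)


def same_speaker(enrolled, test, threshold=0.5):
    """Accept the claim if the score reaches the (assumed) threshold."""
    return cosine_score(enrolled, test) >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real i-vectors have hundreds of dimensions.
    enrolled = [0.2, 0.9, -0.1]
    close = [0.25, 0.8, -0.05]
    far = [-0.7, 0.1, 0.6]
    print(cosine_score(enrolled, close), same_speaker(enrolled, close))
    print(cosine_score(enrolled, far), same_speaker(enrolled, far))
```

Scoring needs only one dot product and two norms per comparison, which is consistent with the abstract describing CDS as the fast step of the model.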
