Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment

Author(s):  
Danwei Cai ◽  
Xiaoyi Qin ◽  
Ming Li
Author(s):  
Johan Rohdin ◽  
Anna Silnova ◽  
Mireia Diez ◽  
Oldrich Plchot ◽  
Pavel Matejka ◽  
...  

2020 ◽  
Author(s):  
Chaofeng Lan ◽  
Yuanyuan Zhang ◽  
Hongyun Zhao

Abstract This paper draws on the training method of the Recurrent Neural Network (RNN). By increasing the number of hidden layers of the RNN, changing the activation function of the input layer from the traditional Sigmoid to Leaky ReLU, and zero-padding the first and last groups of data to make more effective use of the data, an improved noise-reduction model, the Denoise Recurrent Neural Network (DRNN), is constructed with high calculation speed and good convergence to address the low speaker recognition rate in noisy environments. With this model, random semantic speech signals from the speech library, sampled at 16 kHz with a duration of 5 seconds, are studied. The experimental signal-to-noise ratios are -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional model and the improved model on the speech recognition rate is analyzed. The research shows that the improved model can effectively remove noise from the feature parameters and improve the speech recognition rate. The improvement in the speaker recognition rate is more pronounced when the signal-to-noise ratio is low. When the signal-to-noise ratio is 0 dB, the speaker recognition rate is increased by 40%, which can be an 85% improvement compared with the traditional speech model. The recognition rate also rises gradually as the signal-to-noise ratio increases. When the signal-to-noise ratio is 15 dB, the speaker recognition rate is 93%.
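The signal-to-noise ratios listed in this abstract can be illustrated with a short sketch of how a noisy test signal at a chosen SNR is commonly produced. This is not the authors' code: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the sketch assumes additive noise of the same length as the speech.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; both inputs are lists of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Target noise power is p_speech / 10^(snr_db / 10); amplitude scales by its square root.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR in dB of `mixture`, given the clean `speech` it was built from."""
    residual = [m - s for s, m in zip(speech, mixture)]
    p_speech = sum(s * s for s in speech)
    p_noise = sum(r * r for r in residual)
    return 10 * math.log10(p_speech / p_noise)
```

Calling `mix_at_snr(speech, noise, 0)` yields a mixture in which speech and noise have equal power, which corresponds to the 0 dB condition mentioned in the abstract.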


Author(s):  
DEBASHISH DEV MISHRA ◽  
UTPAL BHATTACHARJEE ◽  
SHIKHAR KUMAR SARMA

The performance of automatic speaker recognition (ASR) systems degrades drastically in the presence of noise and other distortions, especially when the noise level differs between the training and testing environments. This paper explores speaker recognition in noisy conditions, assuming that the speech signals are corrupted by noise. A major problem of most speaker recognition systems is their unsatisfactory performance in noisy environments. In this experimental research, we study a combination of Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Cepstral Mean Normalization (CMN) for speech enhancement. Our system uses a Gaussian Mixture Model (GMM) classifier and is implemented in the MATLAB®7 programming environment. Speaker data is used for both training and testing: the test data is matched against a speaker model that is trained on the training data using GMM modeling. Finally, experiments are carried out to test the new model for ASR given limited training data and differing levels and types of realistic background noise. The results demonstrate the robustness of the new system.
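Cepstral Mean Normalization, as named in this abstract, is usually done by subtracting the per-coefficient mean over all frames of an utterance, which removes any constant offset in the cepstral domain. The sketch below is a minimal pure-Python illustration, not the authors' MATLAB implementation; the function name `cepstral_mean_normalize` and the list-of-lists frame layout are assumptions.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean across frames.

    `frames` is assumed to be a list of equal-length lists, one list of
    cepstral coefficients (e.g. MFCCs) per frame. Returns a new list.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    if any(len(f) != n_coeffs for f in frames):
        raise ValueError("all frames must have the same number of coefficients")
    # Mean of each coefficient over the whole utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]
```

Because a fixed offset added to every frame shifts the mean by the same amount, two recordings that differ only by such an offset give the same normalized features.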


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Lei Lei ◽  
She Kun

Today, more and more people benefit from speaker recognition. However, its accuracy often drops rapidly in the presence of low-quality speech and noise. This paper proposes a new speaker recognition model based on wavelet packet entropy (WPE), i-vector, and cosine distance scoring (CDS). In the proposed model, WPE transforms the speech into short-term spectrum feature vectors (short vectors) and resists noise. An i-vector is generated from those short vectors and characterizes the speech to improve recognition accuracy. CDS quickly compares two i-vectors to produce the recognition result. The proposed model is evaluated on the TIMIT speech database. The experimental results show that the proposed model performs well in clean and noisy environments and is insensitive to low-quality speech, but its time cost is high. Parallel computation is used to reduce the time cost.
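Cosine distance scoring, as described in this abstract, compares two fixed-length vectors by the cosine of the angle between them. The sketch below shows the general idea in pure Python. It is not the paper's implementation: the names `cosine_score` and `identify`, and the dictionary of enrolled vectors, are hypothetical, and the short lists stand in for real i-vectors.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors (higher means more similar)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def identify(test_vector, enrolled):
    """Return the enrolled speaker whose vector scores highest against `test_vector`.

    `enrolled` is assumed to map a speaker label to that speaker's vector.
    """
    return max(enrolled, key=lambda name: cosine_score(test_vector, enrolled[name]))
```

Scoring needs only one dot product and two norms per comparison, which is consistent with the abstract's description of CDS as a fast comparison step.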

