Performance of speaker recognition system using shifted MFCC, delta spectral cepstral coefficient (DSCC) and fuzzy techniques

2018, Vol 7 (2.8), pp. 278
Author(s):  
Priyanka Bansal ◽  
Syed Akhtar Imam

Speech and speaker recognition systems are biometric-inspired systems with scope in a variety of online and offline applications. In biometric use, we must consider the variability of the speech signal caused by noise, which greatly degrades the performance of Automatic Speaker Recognition (ASR) in real-world environments. Real-world speech is degraded by different types of noise, such as background noise, interference noise and crosstalk noise. In this paper, we use Delta Spectral Cepstral Coefficients (DSCC) and shifted MFCC with fuzzy modeling techniques to improve the performance of ASR even in noisy surroundings, making use of the additional speech information present at high frequencies in the spectral domain. The combination of fuzzy modeling and DSCC yields a combined algorithm with reasonably high robustness to noise. Experimental results show that accuracy improves by 10-20% even at 5-8 dB SNR in the presence of background noise, turbulent environmental conditions or white noise. Thus the proposed model is more mature than older methods.
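The abstract above reports results at 5-8 dB SNR with white noise. As a rough illustration of what such a test condition means, the sketch below mixes Gaussian white noise into a signal at a requested SNR. It is not the authors' experimental setup: the function names `add_white_noise` and `measured_snr_db` are hypothetical, and a pure tone stands in for real speech.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return the signal plus Gaussian white noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(signal_power / noise_power), so solve for the
    # noise power that gives the requested ratio and rescale the noise to it.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of a noisy signal against its clean reference."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Assumption: a 440 Hz tone sampled at 16 kHz stands in for a speech recording.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_white_noise(clean, snr_db=5.0)
print(round(measured_snr_db(clean, noisy), 2))
```

Scaling the noise rather than the signal keeps the clean reference unchanged, so the same recording can be reused across several SNR levels.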

Author(s):  
Tumisho Billson Mokgonyane ◽  
Tshephisho Joseph Sefara ◽  
Thipe Isaiah Modipa ◽  
Madimetja Jonas Manamela

Author(s):  
Satyanand Singh

This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on a Bayesian Distance Learning Metric as a feature extractor. In this modeling, I explore the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. The distance metric is approximated by a weighted covariance matrix built from the higher eigenvectors of the covariance matrix, and this approximation is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, this method is insensitive to normalization compared to cosine scores, and it is very effective when training data is limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance of the combined cosine score, an EER of 1.767%, is obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, an EER of 1.775%, is obtained using LDA200 + NCA200 + LDA100. Bayesian_dml overcomes the combined norm of cosine scores and is the best reported result for the short2-short3 condition on NIST SRE 2008 data.
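The abstract above compares systems by cosine score and equal error rate (EER). The sketch below illustrates only those two quantities, using toy vectors and toy scores. It does not implement the Bayesian distance metric described in the abstract, and the function names and data are hypothetical.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two vectors, e.g. a pair of i-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The result is the
    mean of the false-acceptance and false-rejection rates at the threshold
    where the two rates are closest.
    """
    best_gap, eer = None, None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (assumed for illustration): same-speaker and different-speaker trials.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.5, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25
```

With only a handful of trials the threshold sweep is coarse. Evaluations on large trial lists typically interpolate between operating points instead.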


Author(s):  
P. Chakraborty ◽  
F. Ahmed ◽  
Md. Monirul Kabir ◽  
Md. Shahjahan ◽  
Kazuyuki Murase
