Segmental Analysis of Speech Signal for Robust Speaker Recognition System

Author(s):  
Rupali V. Pawar ◽  
R. M. Jalnekar ◽  
J. S. Chitode
Author(s):  
Dea Sifana Ramadhina ◽  
Rita Magdalena ◽  
Sofia Saidah

Voice is one of the parameters used to identify a person. From the voice, information such as gender, age, and even the identity of the speaker can be obtained. Speaker recognition is a method for narrowing down crimes and frauds committed by voice, and thus for minimizing the faking of one's identity. The Mel Frequency Cepstrum Coefficient (MFCC) method can be used in a speech recognition system; feature extraction with MFCC produces acoustic features of the speech signal. For classification, Hidden Markov Models (HMM) are used to match an unidentified speaker's voice with the voices in the database. In this research, the system is used to verify the speaker, namely 15 text-dependent in Indonesian. When testing speakers who are also in the database, the highest accuracy is 99.16%.
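The matching step described above can be illustrated with a small sketch. It assumes that the MFCC vectors have already been quantized into discrete symbols (a simplification not stated in the abstract) and that one HMM per enrolled speaker is available. The model parameters, speaker names, and the helper names `forward_log_likelihood` and `identify_speaker` are hypothetical and are not taken from the paper.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm."""
    n = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then apply the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def identify_speaker(obs, models):
    """Return the best-scoring speaker name and all per-speaker scores."""
    scores = {
        name: forward_log_likelihood(obs, *params)
        for name, params in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical models: (start, transition, emission) over 3 codebook symbols.
models = {
    "speaker_a": ([0.6, 0.4],
                  [[0.7, 0.3], [0.4, 0.6]],
                  [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]]),
    "speaker_b": ([0.5, 0.5],
                  [[0.5, 0.5], [0.5, 0.5]],
                  [[0.1, 0.1, 0.8], [0.1, 0.7, 0.2]]),
}

best, scores = identify_speaker([0, 0, 1, 0], models)
print(best, scores)
```

A verification system would typically compare the winning score against a threshold rather than always accepting the best match; that step is omitted here.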


2019 ◽  
Vol 58 ◽  
pp. 403-421 ◽  
Author(s):  
Ondřej Novotný ◽  
Oldřich Plchot ◽  
Ondřej Glembek ◽  
Jan “Honza” Černocký ◽  
Lukáš Burget

2018 ◽  
Vol 7 (2.8) ◽  
pp. 278
Author(s):  
Priyanka Bansal ◽  
Syed Akhtar Imam

Speech and speaker recognition systems are biometric-inspired systems with scope in various online and offline applications. In biometrics, we consider the variability of the speech signal due to the presence of noise, which greatly degrades the efficiency of Automatic Speaker Recognition (ASR) in real-world environments. Real-world speech is degraded by different types of noise, such as background noise, interference noise, and crosstalk noise. In this paper, we use Delta Spectrum Cepstrum Coefficients (DSCC) and Shifted MFCC with fuzzy modeling techniques to improve the performance of ASR even in noisy surroundings, with the help of the additional speech information present at high frequencies in the spectral domain. The combination of fuzzy modeling and DSCC yields a combined algorithm with reasonably high robustness to noise. Experimental results show that accuracy improves by 10-20% even at 5-8 dB SNR in the presence of background noise, turbulent environmental conditions, or white noise. Thus the proposed model improves on older methods.
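To make the 5-8 dB SNR test condition concrete, the following sketch shows one common way to add white Gaussian noise to a clean signal at a chosen SNR. It is an illustration only, not the authors' experimental setup; the function names, the synthetic sine "speech" signal, and the 8 kHz sampling rate are assumptions.

```python
import math
import random

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the target SNR (dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power.
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]

def measured_snr_db(clean, noisy):
    """SNR in dB, computed from the clean signal and its noisy version."""
    noise = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(noise))

# Synthetic stand-in for speech: a 200 Hz tone sampled at 8 kHz (assumed values).
clean = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
noisy = add_white_noise(clean, snr_db=5.0)
print(round(measured_snr_db(clean, noisy), 3))
```

Because the noise is rescaled after it is generated, the measured SNR matches the requested value up to floating-point error, which makes it easy to sweep a range such as 5 to 8 dB when comparing feature sets.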


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Dongdong Li ◽  
Yingchun Yang ◽  
Weihui Dai

In the field of information security, voice is one of the most important biometric traits. In particular, with the development of voice communication over the Internet and telephone systems, huge volumes of voice data have become accessible. In speaker recognition, a voiceprint can serve as a unique password with which users prove their identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper addresses this problem by introducing a cost-sensitive learning technique that reweights the probability of affective test utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture for the recognition system and its components is proposed. An experiment conducted on the Mandarin Affective Speech Corpus shows an 8% improvement in identification rate over traditional speaker recognition.

