Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector

2012 ◽  
Vol 15 (3) ◽  
pp. 351-364 ◽  
Author(s):  
A. K. Sarkar ◽  
S. Umesh

Author(s):  
Walid Hussein ◽  
Sarah Akram Essmat ◽  
Nestor Yoma ◽  
Fernando Huenupán

This paper proposes and evaluates classifiers based on Vocal Tract Length Normalization (VTLN) in a text-dependent speaker verification (SV) task with short testing utterances. This type of task is important in commercial applications and is not easily addressed with methods designed for long utterances, such as JFA and i-vectors. In contrast, VTLN is a speaker compensation scheme that can yield significant improvements in speech recognition accuracy with only a few seconds of speech. A novel scheme generates new classifiers by incorporating the observation vector sequence compensated with VTLN. The modified sequence of feature vectors and the corresponding warping factors are used to build classifiers whose scores are combined by a Support Vector Machine (SVM) based SV system. The proposed scheme provides an average reduction in EER of 14% compared with the baseline system based on the likelihood of the observation vectors.
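The score-combination step described in this abstract can be sketched as follows. A linear combiner (a simple stand-in for the paper's SVM-based fusion, here a Fisher discriminant to keep the sketch dependency-light) weights the per-trial scores of several classifiers into a single verification score. All scores below are synthetic, and the three-classifier setup is an illustrative assumption, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-trial score vectors: one score from each of three
# VTLN-derived classifiers (synthetic numbers, not from the paper).
target = rng.normal(1.0, 0.5, size=(200, 3))     # genuine-speaker trials
impostor = rng.normal(-1.0, 0.5, size=(200, 3))  # impostor trials

# Fisher linear discriminant direction: a linear score combiner standing
# in for the paper's SVM-based fusion of classifier scores.
sw = np.cov(target.T) + np.cov(impostor.T)                  # within-class scatter
w = np.linalg.solve(sw, target.mean(0) - impostor.mean(0))  # fusion weights

fused_t = target @ w
fused_i = impostor @ w
threshold = 0.5 * (fused_t.mean() + fused_i.mean())

# Accept a trial when its fused score exceeds the threshold.
accuracy = ((fused_t > threshold).mean() + (fused_i <= threshold).mean()) / 2
print(f"fused accuracy on synthetic trials: {accuracy:.2f}")
```

In practice the fusion weights (or the SVM) would be trained on a held-out development set, and the threshold tuned to the operating point (e.g. the EER point) rather than placed midway between class means.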


2012 ◽  
Author(s):  
Hiroaki Hatano ◽  
Tatsuya Kitamura ◽  
Hironori Takemoto ◽  
Parham Mokhtari ◽  
Kiyoshi Honda ◽  
...  

2018 ◽  
Vol 29 (1) ◽  
pp. 565-582
Author(s):  
T.R. Jayanthi Kumari ◽  
H.S. Jayanna

Abstract In many biometric applications, speaker verification with limited data plays a significant role in practically oriented systems. The performance of such systems needs to be improved by applying techniques suited to the limited data condition, in which both the training and testing utterances last only a few seconds. This article demonstrates the importance of the speaker verification system under the limited data condition using feature- and score-level fusion techniques. The baseline speaker verification system uses vocal tract features, namely mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), and excitation source features, namely the linear prediction (LP) residual and the LP residual phase, together with i-vector modeling on the NIST 2003 data set. In feature-level fusion, the vocal tract features are fused with the excitation source features, yielding an average equal error rate (EER) of approximately 4%, compared with the individual feature performance. Two types of score-level fusion are then demonstrated. In the first, the scores of the vocal tract and excitation source features are fused at score level while the modeling technique remains the same, providing an average EER reduction of approximately 2% compared with feature-level fusion. In the second, the scores of different modeling techniques are combined, resulting in an EER reduction of approximately 4.5% compared with score-level fusion of different features.
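The two fusion levels described in this abstract can be sketched as follows. Feature-level fusion concatenates the vocal tract and excitation source streams frame by frame; score-level fusion combines the per-system verification scores for a trial. The feature dimensions, score values, and equal fusion weights below are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: per-frame vocal tract features (MFCC-like, 13-dim)
# and excitation source features (LP-residual-based, 4-dim); dimensions
# are assumed for illustration.
vocal_tract = rng.normal(size=(100, 13))
excitation = rng.normal(size=(100, 4))

# Feature-level fusion: concatenate the two streams frame by frame,
# giving one 17-dimensional feature vector per frame.
fused_features = np.hstack([vocal_tract, excitation])

# Score-level fusion: combine per-system scores for one trial with a
# weighted sum (weights would normally be tuned on development data).
score_vt, score_exc = 2.1, 1.4   # hypothetical system scores
w_vt, w_exc = 0.5, 0.5           # assumed equal weights
fused_score = w_vt * score_vt + w_exc * score_exc

print(f"fused feature dim: {fused_features.shape[1]}, fused score: {fused_score:.2f}")
```

Feature-level fusion changes the input to a single model, while score-level fusion keeps the individual systems intact and only combines their outputs, which is why the article can also fuse scores across different modeling techniques.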


2007 ◽  
Vol 121 (2) ◽  
pp. EL90-EL95 ◽  
Author(s):  
Marie Rivenez ◽  
Christopher J. Darwin ◽  
Léonore Bourgeon ◽  
Anne Guillaume
