A study on dimensions of feature space for text-independent speaker verification systems

Author(s):  
A. Mansouri ◽  
J. Cardenas-Barrera ◽  
E. Castillo-Guerra

Author(s):  
Khamis A. Al-Karawi

Background & Objective: Speaker Recognition (SR) techniques have matured considerably over the past few decades. Existing methods typically use robust features extracted from clean speech signals and can therefore achieve very high recognition accuracy under idealized conditions. For critical applications, such as security and forensics, the robustness and reliability of the system are crucial. Methods: Background noise and reverberation, as often occur in real-world applications, are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust feature-extraction technique for speech processing is proposed, capable of operating in both clean and noisy conditions. Mel Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCCs) are mature techniques and the features most commonly used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale, while GFCCs are obtained from a bank of Gammatone filters, originally proposed to model human cochlear filtering. This paper investigates the performance of GFCC and conventional MFCC features in clean and noisy conditions. The effects of Signal-to-Noise Ratio (SNR) and language mismatch on system performance are also taken into account. Conclusion: Experimental results show a significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Performance in terms of recognition rates under various noise types and SNRs was quantified via simulation. Results of the study are presented and discussed.
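The MFCC pipeline the abstract describes (log energies in mel-spaced frequency bands, followed by a decorrelating transform) can be sketched as follows. This is a minimal, hypothetical NumPy implementation, not the authors' feature extractor; the frame, hop, filterbank, and cepstrum sizes are illustrative defaults.

```python
# Hypothetical MFCC sketch: frame the signal, take the power spectrum,
# pool energies through a mel-spaced triangular filterbank, then log + DCT.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centre frequencies evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Frame and window the signal (25 ms frames, 10 ms hop at 16 kHz).
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum, then log mel-filterbank energies.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II over the log energies; keep the lowest n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n + 0.5)[None, :]
                   * np.arange(n_ceps)[:, None])
    return log_energy @ basis.T
```

One second of 16 kHz audio yields 98 frames of 13 coefficients with these defaults. A GFCC front end would replace the mel filterbank with a Gammatone filterbank but keep the same overall frame/log/DCT structure.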


2000 ◽  
Vol 10 (1-3) ◽  
pp. 42-54 ◽  
Author(s):  
Roland Auckenthaler ◽  
Michael Carey ◽  
Harvey Lloyd-Thomas

2020 ◽  
Vol 44 (4) ◽  
pp. 596-605 ◽  
Author(s):  
I.A. Rakhmanenko ◽  
A.A. Shelupanov ◽  
E.Y. Kostyuchenko

This paper is devoted to the use of a convolutional deep belief network as a speech feature extractor for automatic text-independent speaker verification. The paper describes the scope and problems of automatic speaker verification systems, and reviews the types of modern speaker verification systems and the types of speech features they use. The structure and learning algorithm of convolutional deep belief networks are described. The use of speech features extracted from three layers of a trained convolutional deep belief network is proposed. Experimental studies of the proposed features were performed on two speech corpora: the authors' own corpus, comprising audio recordings of 50 speakers, and the TIMIT corpus, comprising audio recordings of 630 speakers. The accuracy of the proposed features was assessed using different types of classifiers. Direct use of these features did not increase accuracy compared to traditional spectral speech features such as mel-frequency cepstral coefficients. However, using these features in a classifier ensemble reduced the equal error rate to 0.21% on the 50-speaker corpus and to 0.23% on the TIMIT corpus.
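Both abstracts above report verification performance as an equal error rate (EER): the operating point at which the false-acceptance and false-rejection rates are equal. A generic, hypothetical sketch of computing EER from genuine and impostor trial scores by a threshold sweep (not the evaluation code used in either paper) is:

```python
# Hypothetical EER computation: sweep a decision threshold over all observed
# scores and find where false acceptance and false rejection cross.
import numpy as np

def equal_error_rate(genuine, impostor):
    """genuine/impostor: arrays of verification scores (higher = more likely
    the target speaker). Returns EER in [0, 1]."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # FAR: impostor trials accepted; FRR: genuine trials rejected.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```

With perfectly separated scores the EER is 0; an EER of 0.21% as reported above means the crossing point leaves about 1 error in 500 trials of each kind.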


Author(s):  
Gautam Bhattacharya ◽  
Md Jahangir Alam ◽  
Vishwa Gupta ◽  
Patrick Kenny
