Robust speaker recognition based on level-building voice activity detection

Abstract: The performance of most state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to the mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce this mismatch. An adaptive voice activity detection (VAD) algorithm using a zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of the proposed SV system was studied under degraded conditions with 50 speakers selected from the NIST 2003 database. Degraded conditions were simulated by adding different types of noise, chosen from the NOISEX-92 database, to the original speech utterances at signal-to-noise ratios (SNRs) from 0 to 20 dB. The widely used 39-dimensional Mel-frequency cepstral coefficient (MFCC) features (13 MFCCs augmented with 13 velocity and 13 acceleration coefficients) were used, and a Gaussian mixture model–universal background model (GMM-UBM) was used for speaker modeling. The performance of the proposed system was compared against that of an SV system with an energy-based VAD at its front end. The proposed SV system showed encouraging results when EMD-based VAD was used at its front end.
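Degraded test conditions like those described above are usually produced by scaling a noise recording so that the mixture reaches a target SNR. The following sketch shows one common way to do this. It assumes a global (whole-utterance) SNR and tiles the noise when it is shorter than the speech; the function name `add_noise_at_snr` is hypothetical and not taken from the paper.

```python
import numpy as np


def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech + scaled noise whose global SNR equals snr_db (hypothetical helper)."""
    # Tile the noise if it is shorter than the utterance, then trim it to length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the gain so that 10*log10(speech_power / (gain**2 * noise_power)) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

Calling it with `snr_db` set to 0, 5, 10, 15 and 20 would cover the 0 to 20 dB range; the 5 dB step is an assumption, since the abstract gives only the range.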

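The 39-dimensional feature vector in the abstract above stacks 13 static MFCCs with their first (velocity) and second (acceleration) time derivatives. A minimal sketch follows. It assumes the common regression formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2) with a window of N = 2 frames and repetition of the edge frames; the paper does not state its window. The function names `deltas` and `stack_39` are hypothetical.

```python
import numpy as np


def deltas(feats: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression-based delta coefficients along the time axis (frames x dims)."""
    T = feats.shape[0]
    # Repeat the first/last frame so every frame has N neighbours on each side.
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, N + 1):
        # padded[N + n + t] is frame t+n and padded[N - n + t] is frame t-n.
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom


def stack_39(mfcc13: np.ndarray) -> np.ndarray:
    """Append velocity and acceleration coefficients: 13 -> 39 dimensions."""
    velocity = deltas(mfcc13)
    acceleration = deltas(velocity)
    return np.hstack([mfcc13, velocity, acceleration])
```

Computing acceleration as the delta of the velocity, rather than with a separate second-order formula, is the usual HTK-style choice.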

Comparative Analysis of Speaker Recognition System Based on Voice Activity Detection Technique, MFCC and PLP Features

Intelligent Computing Techniques for Smart Energy Systems, Lecture Notes in Electrical Engineering, 2019, pp. 781-787. DOI: 10.1007/978-981-15-0214-9_82

Cited by ~2

Author(s): Akanksha Kalia, Shikar Sharma, Saurabh Kumar Pandey, Vinay Kumar Jadoun, Madhulika Das

Keyword(s): Comparative Analysis, Speaker Recognition, Recognition System, Voice Activity Detection, Detection Technique, Activity Detection, Voice Activity

Artificial neural networks for voice activity detection Technology

Journal of Advanced Sciences and Engineering Technologies, Vol 5 (1), 2022, pp. 23-31. DOI: 10.32441/jaset.05.01.03

Author(s): Al smadi Takialddin, Ahmed Handam

Keyword(s): Speaker Recognition, Recognition System, Detection Algorithm, Voice Activity Detection, Activity Detection, Probability of Error, Detection Technology, Proposed Modification, Voice Activity

Voice biometrics is an actively developing field that includes two related tasks of recognizing a speaker by voice: the verification task, which checks whether a recording (phonogram) belongs to a claimed speaker, and the identification task, which determines who the speaker is. An open question remains how to improve the quality of verification and identification algorithms in real conditions and reduce the probability of error. In this work, a voice activity detection (VAD) algorithm is proposed as a modification of an algorithm based on pitch statistics. VAD is investigated as a component of a speaker recognition system, so its main purpose is to improve the quality of the system as a whole. Using the proposed modified VAD algorithm and an energy-based VAD algorithm, the influence of the choice of VAD on the quality of the speaker recognition system is analyzed.
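The energy-based VAD used as the comparison baseline can be as simple as thresholding short-time frame energy. The sketch below is a generic illustration, not the pitch-statistics modification proposed in this work. The 25 ms frame, the 10 ms hop and the rule "speech if within 30 dB of the loudest frame" are assumptions chosen for illustration, and `energy_vad` is a hypothetical name.

```python
import numpy as np


def energy_vad(signal: np.ndarray, sr: int, frame_ms: float = 25.0,
               hop_ms: float = 10.0, drop_db: float = 30.0) -> np.ndarray:
    """Per-frame speech/non-speech mask from short-time log energy (illustrative baseline)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame) // hop)
    log_energy = np.empty(n_frames)
    for i in range(n_frames):
        seg = signal[i * hop : i * hop + frame]
        # A small floor avoids log10(0) on digital silence.
        log_energy[i] = 10.0 * np.log10(np.sum(seg ** 2) + 1e-12)
    # Speech if the frame is no more than drop_db below the loudest frame.
    return log_energy > log_energy.max() - drop_db
```

Because the threshold is relative to the loudest frame, this baseline adapts to recording level but not to stationary background noise, which is one reason noise-robust alternatives such as pitch-based VAD are studied.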

A study of voice activity detection techniques for NIST speaker recognition evaluations

Computer Speech & Language, Vol 28 (1), 2014, pp. 295-313. DOI: 10.1016/j.csl.2013.07.003

Cited by ~57

Author(s): Man-Wai Mak, Hon-Bill Yu

Keyword(s): Speaker Recognition, Voice Activity Detection, Activity Detection, Detection Techniques, Voice Activity

The Delta-Phase Spectrum With Application to Voice Activity Detection and Speaker Recognition

IEEE Transactions on Audio, Speech, and Language Processing, Vol 19 (7), 2011, pp. 2026-2038. DOI: 10.1109/tasl.2011.2109379

Cited by ~29

Author(s): Iain McCowan, David Dean, Mitchell McLaren, Robert Vogt, Sridha Sridharan

Keyword(s): Speaker Recognition, Voice Activity Detection, Phase Spectrum, Activity Detection, Delta Phase, Voice Activity

A Hierarchical Framework Approach for Voice Activity Detection and Speech Enhancement

The Scientific World Journal, Vol 2014, 2014, pp. 1-8. DOI: 10.1155/2014/723643

Cited by ~2

Author(s): Yan Zhang, Zhen-min Tang, Yan-ping Li, Yang Luo

Keyword(s): Speech Enhancement, Speaker Recognition, Wiener Filter, Voice Activity Detection, Activity Detection, Hierarchical Framework, Framework Approach, Noisy Conditions, Voice Activity, TIMIT Database

Accurate and effective voice activity detection (VAD) is a fundamental step for robust speech or speaker recognition. In this study, we propose a hierarchical framework approach for VAD and speech enhancement. A modified Wiener filter (MWF) is used for noise reduction in the speech enhancement block. In the feature selection and voting block, several discriminative features are combined in a voting paradigm for reliability and discriminative power. The proposed approach is evaluated against other VAD techniques using two well-known databases, TIMIT and NOISEX-92. Experimental results show that the proposed method performs well under a variety of noisy conditions.
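The voting block can be pictured as several simple per-frame detectors whose binary decisions are combined by majority vote. In the sketch below, the three features (log energy, zero-crossing rate and spectral flatness), the frame sizes and all thresholds are hypothetical choices for illustration; the paper's actual feature set and its modified Wiener filter front end are not reproduced.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (frames x samples)."""
    n = 1 + max(0, (len(x) - frame) // hop)
    return np.stack([x[i * hop : i * hop + frame] for i in range(n)])


def voting_vad(x: np.ndarray, sr: int, energy_db: float = -30.0,
               zcr_max: float = 0.25, flatness_max: float = 0.3) -> np.ndarray:
    """Majority vote over three per-frame detectors (hypothetical features/thresholds)."""
    frames = frame_signal(x, int(0.025 * sr), int(0.010 * sr))
    # Vote 1: mean log power above a fixed threshold.
    log_e = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    v_energy = log_e > energy_db
    # Vote 2: zero-crossing rate (fraction of sign changes) below a threshold.
    signs = np.signbit(frames).astype(int)
    zcr = np.mean(np.abs(np.diff(signs, axis=1)), axis=1)
    v_zcr = zcr < zcr_max
    # Vote 3: spectral flatness (geometric / arithmetic mean of the power spectrum) below a threshold.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-12
    flatness = np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)
    v_flat = flatness < flatness_max
    votes = v_energy.astype(int) + v_zcr.astype(int) + v_flat.astype(int)
    return votes >= 2  # speech when at least two of the three detectors agree
```

Majority voting lets one unreliable detector be outvoted by the other two, which is one way to read the reliability argument in the abstract; weighting the votes would be a natural variation.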

A REVIEW ON VOICE ACTIVITY DETECTION AND MEL-FREQUENCY CEPSTRAL COEFFICIENTS FOR SPEAKER RECOGNITION (TREND ANALYSIS)

Asian Journal of Pharmaceutical and Clinical Research, Vol 9 (9), 2016, pp. 360. DOI: 10.22159/ajpcr.2016.v9s3.14352

Author(s): P Mahalakshmi

Keyword(s): Signal Processing, Speaker Recognition, Voice Activity Detection, Activity Detection, Mel Frequency Cepstral Coefficients, Detection Techniques, Cepstral Coefficients, Clear Idea, Mel Frequency Cepstral Coefficient, Voice Activity

Abstract:

Objective: The objective of this review article is to give a comprehensive review of the techniques used for speech recognition over two decades.

Methods: Voice activity detection (VAD) and speech activity detection (SAD) techniques, which are used to distinguish speech from non-speech segments, are discussed, together with the Mel-frequency cepstral coefficient (MFCC) technique, which extracts features from the speech signal.

Results: The review shows that research on MFCC has been dominant in signal processing compared with VAD and other existing techniques.

Conclusion: Speaker recognition techniques from earlier and current research are compared, and the better technique is identified through a review of the literature spanning two decades.

Keywords: Cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.
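For reference, a textbook MFCC pipeline (pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, logarithm, DCT) is sketched below. The parameter values (16 kHz sampling, 25 ms frames with a 10 ms hop, 512-point FFT, 26 filters, 13 coefficients, pre-emphasis factor 0.97) and the HTK-style mel formula are common defaults assumed here, not values taken from the reviewed papers; the input must be at least one frame long.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters spaced uniformly on the mel scale (n_filters x n_fft//2+1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):   # rising edge
            fb[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            fb[m - 1, k] = (right - k) / (right - centre)
    return fb


def dct_ii(x: np.ndarray, n_out: int) -> np.ndarray:
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal: np.ndarray, sr: int = 16000, n_fft: int = 512, frame_ms: float = 25.0,
         hop_ms: float = 10.0, n_filters: int = 26, n_ceps: int = 13) -> np.ndarray:
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Pre-emphasis boosts high frequencies before spectral analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + max(0, (len(emphasized) - frame) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame] for i in range(n_frames)])
    frames = frames * np.hamming(frame)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    # Log compression, then DCT to decorrelate the filterbank energies.
    return dct_ii(np.log(mel_energy + 1e-10), n_ceps)
```

The 13 coefficients per frame produced here are the static part of the 39-dimensional vectors used by GMM-UBM systems such as the first entry in this list.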
