MFCC AND CMN BASED SPEAKER RECOGNITION IN NOISY ENVIRONMENT

Author(s):  
DEBASHISH DEV MISHRA ◽  
UTPAL BHATTACHARJEE ◽  
SHIKHAR KUMAR SARMA

The performance of automatic speaker recognition (ASR) systems degrades drastically in the presence of noise and other distortions, especially when there is a noise-level mismatch between the training and testing environments. This paper explores the problem of speaker recognition in noisy conditions, assuming that the speech signals are corrupted by noise. In this experimental study, we combine Mel frequency cepstral coefficients (MFCC) for feature extraction with cepstral mean normalization (CMN) for speech enhancement. Our system uses a Gaussian mixture model (GMM) classifier and is implemented in the MATLAB® 7 programming environment. Speaker data are used for both training and testing: the test data are matched against a speaker model trained on the training data using GMM modeling. Finally, experiments are carried out to test the new model for ASR given limited training data and differing levels and types of realistic background noise. The results demonstrate the robustness of the new system.
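CMN itself is a simple operation: a stationary channel multiplies the spectrum, so it adds a constant offset in the log-cepstral domain, and subtracting the per-utterance mean removes that offset. A minimal sketch in Python (NumPy), with a toy MFCC matrix standing in for real features:

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean from each cepstral dimension.

    cepstra: (num_frames, num_coeffs) array of MFCC vectors.
    A fixed channel adds a constant offset in the cepstral domain,
    so removing the mean suppresses stationary channel effects.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Hypothetical MFCC matrix: 4 frames x 3 coefficients, with the first
# coefficient carrying a constant channel offset.
mfcc = np.array([[1.0, 2.0, 3.0],
                 [1.0, 4.0, 3.0],
                 [1.0, 2.0, 5.0],
                 [1.0, 4.0, 5.0]])
normalized = cepstral_mean_normalization(mfcc)
# Every column of `normalized` now has zero mean; the constant first
# column is mapped to all zeros.
```

Note that CMN only removes time-invariant (convolutional) distortion; it does not compensate for additive noise, which is why the paper still evaluates under varying noise levels.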

2016 ◽  
Vol 25 (3) ◽  
pp. 387-399
Author(s):  
P. Mahesha ◽  
D.S. Vinod

The classification of dysfluencies is one of the important steps in the objective measurement of stuttering disorder. In this work, the focus is on investigating the applicability of automatic speaker recognition (ASR) methods to stuttering dysfluency recognition. The system designed for this task relies on the Gaussian mixture model (GMM), the most widely used probabilistic modeling technique in ASR. The GMM parameters are estimated from Mel frequency cepstral coefficients (MFCCs). This statistical speaker-modeling technique represents the fundamental characteristic sounds of the speech signal. Using this model, we build a dysfluency recognizer capable of recognizing dysfluencies independently of the speaker and of what is being said. The performance of the system is evaluated for different types of dysfluencies, such as syllable repetition, word repetition, prolongation, and interjection, using speech samples from the University College London Archive of Stuttered Speech (UCLASS).
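The per-class GMM recognizer described above can be sketched with scikit-learn: fit one GMM per dysfluency class and label a test utterance by the model with the highest average log-likelihood. The Gaussian toy data below merely stands in for MFCC frames of two dysfluency types; the class means and dimensions are assumptions for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy MFCC-like features for two hypothetical dysfluency classes
# (e.g. repetition vs. prolongation); real features would come from
# stuttered-speech recordings such as UCLASS.
class_a = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
class_b = rng.normal(loc=3.0, scale=1.0, size=(200, 4))

# One GMM per class, as in ASR-style probabilistic modeling.
gmm_a = GaussianMixture(n_components=2, random_state=0).fit(class_a)
gmm_b = GaussianMixture(n_components=2, random_state=0).fit(class_b)

def classify(frames):
    # score() returns the average per-frame log-likelihood;
    # pick the class model that explains the frames best.
    scores = [gmm_a.score(frames), gmm_b.score(frames)]
    return int(np.argmax(scores))

probe = rng.normal(loc=3.0, scale=1.0, size=(50, 4))
label = classify(probe)  # 1, i.e. class_b
```

Because each class model pools frames from many speakers, the resulting recognizer is speaker-independent, matching the paper's goal of recognizing dysfluencies irrespective of who is speaking.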


2021 ◽  
Vol 39 (1B) ◽  
pp. 30-40
Author(s):  
Ahmed M. Ahmed ◽  
Aliaa K. Hassan

Speaker recognition is the process of recognizing a person by his/her voice through specific features extracted from the voice signal. Automatic speaker recognition (ASP) is a biometric authentication system. In the last decade, many advances in the speaker recognition field have been attained, along with many techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent works in ASP technology. The study discusses several ASP modeling techniques, such as the Gaussian mixture model (GMM), vector quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as linear predictive coding (LPC) and Mel frequency cepstral coefficients (MFCC), are also examined. Finally, as a result of this study, we find that MFCC and GMM can be considered the most successful techniques in the field of speaker recognition so far.
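Of the modeling techniques surveyed, vector quantization is the easiest to illustrate: each speaker is represented by a small codebook of centroids, and a test utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion. A sketch using scikit-learn's KMeans on toy feature vectors (the data, codebook size, and dimensions are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Toy feature vectors for two hypothetical enrolled speakers.
spk1_feats = rng.normal(0.0, 1.0, size=(300, 8))
spk2_feats = rng.normal(4.0, 1.0, size=(300, 8))

# VQ speaker model: a small codebook of centroids per speaker.
codebook1 = KMeans(n_clusters=4, n_init=10, random_state=1).fit(spk1_feats)
codebook2 = KMeans(n_clusters=4, n_init=10, random_state=1).fit(spk2_feats)

def distortion(codebook, frames):
    # Mean squared distance from each frame to its nearest codeword.
    d = codebook.transform(frames).min(axis=1)
    return float(np.mean(d ** 2))

def identify(frames):
    # Lower average distortion means a better-matching speaker.
    return 1 if distortion(codebook1, frames) < distortion(codebook2, frames) else 2

probe = rng.normal(4.0, 1.0, size=(60, 8))
speaker = identify(probe)  # 2
```

GMMs can be viewed as a soft generalization of this scheme: each codeword becomes a Gaussian component with its own covariance and weight, which is one reason GMMs typically outperform plain VQ.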


Author(s):  
AMITA PAL ◽  
SMARAJIT BOSE ◽  
GOPAL K. BASAK ◽  
AMITAVA MUKHOPADHYAY

For solving speaker identification problems, the approach proposed by Reynolds [IEEE Signal Process. Lett. 2 (1995) 46–48], using Gaussian mixture models (GMMs) based on Mel frequency cepstral coefficients (MFCCs) as features, is one of the most effective available in the literature. The use of GMMs for modeling speaker identity is motivated by the interpretation that the Gaussian components represent general speaker-dependent spectral shapes, and by the capability of Gaussian mixtures to model arbitrary densities. In this work, we first illustrate, with the help of a new bilingual speech corpus, how the well-known principal component transformation, in conjunction with the principle of classifier combination, can be used to significantly enhance the performance of MFCC-GMM speaker recognition systems. Subsequently, we rigorously establish the same result on the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.
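The principal component transformation mentioned above decorrelates the feature dimensions, which suits the diagonal-covariance GMMs commonly used in MFCC-GMM systems. A minimal sketch with scikit-learn's PCA (with whitening) on synthetic correlated features; the toy data stand in for MFCC vectors that share a common latent factor:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Synthetic correlated features: 5 dimensions driven by one shared
# latent factor plus small independent noise.
latent = rng.normal(size=(500, 1))
feats = np.hstack([latent + 0.1 * rng.normal(size=(500, 1))
                   for _ in range(5)])

# The principal component transformation rotates the features onto
# uncorrelated axes; whitening additionally scales them to unit variance.
pca = PCA(whiten=True).fit(feats)
decorrelated = pca.transform(feats)

# The sample covariance of the transformed features is (numerically)
# the identity matrix: zero correlation between dimensions.
cov = np.cov(decorrelated, rowvar=False)
```

With correlations removed, each diagonal-covariance Gaussian component models the data more faithfully, which is the intuition behind the performance gains reported on the bilingual corpus and on NTIMIT.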


Author(s):  
Amara Fethi ◽  
Fezari Mohamed

In this paper we investigate the properties of automatic speaker recognition (ASR) to develop a system for voice pathology detection, where a model corresponds not to a speaker but to a group of patients who share the same diagnosis. An essential part of this work is the database (described later): the voice samples (healthy and pathological) are chosen from a German database covering many diseases, and spasmodic dysphonia is selected for this study. The problem is addressed with statistical pattern recognition techniques: Mel frequency cepstral coefficients (MFCC) are modeled first with the Gaussian mixture model (GMM), which is widely used in ASR, and then with a support vector machine (SVM). The obtained results are compared in order to determine the better-performing classifier. The performance of each method is evaluated in terms of accuracy, sensitivity, and specificity. The best performance is obtained with 12 MFCC coefficients, energy, and second derivatives together with an SVM using a polynomial kernel function; the classification rate is 90% for the normal class and 93% for the pathological class. This work is developed under MATLAB.
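The SVM stage of such a system can be sketched as a two-class (normal vs. pathological) classifier with a polynomial kernel. The feature dimensions and class separation below are toy assumptions for illustration, not the paper's actual MFCC + energy + second-derivative features:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Toy stand-ins for per-utterance feature vectors of the two classes.
healthy = rng.normal(0.0, 1.0, size=(100, 14))
pathological = rng.normal(2.5, 1.0, size=(100, 14))

X = np.vstack([healthy, pathological])
y = np.array([0] * 100 + [1] * 100)  # 0 = normal, 1 = pathological

# Polynomial-kernel SVM, the kernel reported best in the study.
clf = SVC(kernel="poly", degree=3).fit(X, y)

# Classify unseen pathological-like samples.
pred = clf.predict(rng.normal(2.5, 1.0, size=(10, 14)))
```

Unlike the GMM, which models each class's feature distribution generatively, the SVM learns the discriminative boundary directly, which can explain its edge when training data per class are limited.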


2017 ◽  
Vol 2017 ◽  
pp. 1-6 ◽  
Author(s):  
Mohammed Algabri ◽  
Hassan Mathkour ◽  
Mohamed A. Bencherif ◽  
Mansour Alsulaiman ◽  
Mohamed A. Mekhtiche

Presently, lawyers, law enforcement agencies, and judges in courts use speech and other biometric features to recognize suspects. In general, speaker recognition is used to discriminate people based on their voices. The process of determining whether a suspected speaker is the source of a trace is called forensic speaker recognition. In such applications, the voice samples are most probably noisy, the recording sessions might mismatch each other, the sessions might not contain sufficient recordings for recognition purposes, and the suspect voices are recorded through a mobile channel. The identification of a person through his voice within a forensic-quality context is challenging. In this paper, we propose a method for forensic speaker recognition for the Arabic language; the King Saud University Arabic Speech Database is used for obtaining experimental results. The advantage of this database is that each speaker's voice is recorded in both clean and noisy environments, through a microphone and a mobile channel. This diversity facilitates its use in forensic experimentation. Mel frequency cepstral coefficients are used for feature extraction, and the Gaussian mixture model-universal background model (GMM-UBM) is used for speaker modeling. Our approach shows low equal error rates (EER) within noisy environments and with very short test samples.
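GMM-UBM verification scores a test utterance by the log-likelihood ratio between the suspect's speaker model and a universal background model trained on many speakers. The sketch below fits the speaker model directly rather than MAP-adapting the UBM, as a full system would, and uses synthetic data in place of real MFCCs:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Toy data: pooled background speech (many speakers, broad spread)
# and enrollment speech from one target speaker.
background = rng.normal(0.0, 2.0, size=(1000, 6))
target = rng.normal(1.5, 1.0, size=(300, 6))

# Universal background model and target speaker model.
ubm = GaussianMixture(n_components=4, random_state=0).fit(background)
spk = GaussianMixture(n_components=4, random_state=0).fit(target)

def llr(frames):
    # Average per-frame log-likelihood ratio: speaker model vs. UBM.
    # Positive values support the hypothesis "same speaker".
    return spk.score(frames) - ubm.score(frames)

genuine_score = llr(rng.normal(1.5, 1.0, size=(100, 6)))   # > 0
impostor_score = llr(rng.normal(-2.0, 1.0, size=(100, 6)))  # < 0
```

Thresholding the ratio trades false acceptances against false rejections; the equal error rate (EER) reported in the paper is the operating point where the two error rates coincide.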


2021 ◽  
Vol 10 (4) ◽  
pp. 2310-2319
Author(s):  
Duraid Y. Mohammed ◽  
Khamis Al-Karawi ◽  
Ahmed Aljuboori

Automatic speaker recognition can achieve remarkable performance in matched training and test conditions. Conversely, results drop significantly under mismatched noisy conditions. Furthermore, feature extraction significantly affects performance. Mel frequency cepstral coefficients (MFCCs) are most commonly used in this field of study. The literature has reported that training and testing conditions need to be highly correlated. Taken together, these facts support strong recommendations for using MFCC features under similar environmental (train/test) conditions for speaker recognition. However, with noise and reverberation present, MFCC performance is not reliable. To address this, we propose a new feature, 'entrocy', for accurate and robust speaker recognition, which we mainly employ to support MFCC coefficients in noisy environments. Entrocy is the Fourier transform of the entropy, a measure of the fluctuation of the information in sound segments over time. Entrocy features are combined with MFCCs to generate a composite feature set, which is tested using the Gaussian mixture model (GMM) speaker recognition method. The proposed method shows improved recognition accuracy over a range of signal-to-noise ratios.
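The abstract defines entrocy as the Fourier transform of the segment-wise entropy. The sketch below follows that definition literally; the segment length, the histogram-based entropy estimate, and the mean removal before the FFT are assumptions of this illustration, not details from the paper:

```python
import numpy as np

def segment_entropy(signal, seg_len):
    """Shannon entropy (bits) of the amplitude histogram of each
    consecutive non-overlapping segment of the signal."""
    entropies = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[start:start + seg_len]
        hist, _ = np.histogram(seg, bins=16)
        p = hist / hist.sum()
        p = p[p > 0]  # drop empty bins; 0*log(0) is taken as 0
        entropies.append(-np.sum(p * np.log2(p)))
    return np.array(entropies)

def entrocy(signal, seg_len=256):
    """Magnitude spectrum of the per-segment entropy sequence:
    how the information content of the signal fluctuates over time."""
    e = segment_entropy(signal, seg_len)
    return np.abs(np.fft.rfft(e - e.mean()))

rng = np.random.default_rng(5)
sig = rng.normal(size=8192)           # toy stand-in for a speech signal
feat = entrocy(sig)                   # 8192/256 = 32 segments -> 17 bins
```

Entropy is less sensitive to additive noise than spectral magnitudes, which motivates appending such a feature to MFCCs before GMM modeling.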

