scholarly journals Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

<p><span lang="EN-GB">In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. With a view to give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The current paper investigates Text-independent speaker identification system by using 16 coefficients from both the MFCCs and PNCCs features. Eight different speakers are selected from the GRID-Audiovisual database with two females and six males. The speakers are modeled using the coupling between the Universal Background Model and Gaussian Mixture Models (GMM-UBM) in order to get a fast scoring technique and better performance. The system shows 100% in terms of speaker identification accuracy. The results illustrated that PNCCs features have better performance compared to the MFCCs features to identify females compared to male speakers. Furthermore, feature wrapping reported better performance compared to the CMVN method. </span></p>

Author(s):  
Hesham A. Alabbasi ◽  
Ali M. Jalil ◽  
Fadhil S. Hasan

The robustness of speaker identification system over additive noise channel is crucial for real-world applications. In speaker identification (SID) systems, the extracted features from each speech frame are an essential factor for building a reliable identification system. For clean environments, the identification system works well; in noisy environments, there is an additive noise, which is affect the system. To eliminate the problem of additive noise and to achieve a high accuracy in speaker identification system a proposed algorithm for feature extraction based on speech enhancement and a combined features is presents. In this paper, a wavelet thresholding pre-processing stage, and feature warping (FW) techniques are used with two combined features named power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC) to improve the identification system robustness against different types of additive noises. Universal Background Model Gaussian Mixture Model (UBM-GMM) is used for features matching between the claim and actual speakers. The results showed performance improvement for the proposed feature extraction algorithm of identification system comparing with conventional features over most types of noises and different SNR ratios.


2011 ◽  
Vol 2011 ◽  
pp. 1-8 ◽  
Author(s):  
Phaklen EhKan ◽  
Timothy Allen ◽  
Steven F. Quigley

In today's society, highly accurate personal identification systems are required. Passwords or pin numbers can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals due to different speakers having different resonances of the vocal tract. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation for classification of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time. This has important potential applications in security and in automated call centre applications. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.


2019 ◽  
Vol 17 (2) ◽  
pp. 170-177
Author(s):  
Lei Deng ◽  
Yong Gao

In this paper, authors propose an auditory feature extraction algorithm in order to improve the performance of the speaker recognition system in noisy environments. In this auditory feature extraction algorithm, the Gammachirp filter bank is adapted to simulate the auditory model of human cochlea. In addition, the following three techniques are applied: cube-root compression method, Relative Spectral Filtering Technique (RASTA), and Cepstral Mean and Variance Normalization algorithm (CMVN).Subsequently, based on the theory of Gaussian Mixes Model-Universal Background Model (GMM-UBM), the simulated experiment was conducted. The experimental results implied that speaker recognition systems with the new auditory feature has better robustness and recognition performance compared to Mel-Frequency Cepstral Coefficients(MFCC), Relative Spectral-Perceptual Linear Predictive (RASTA-PLP),Cochlear Filter Cepstral Coefficients (CFCC) and gammatone Frequency Cepstral Coefficeints (GFCC)


Author(s):  
Anny Tandyo ◽  
Martono Martono ◽  
Adi Widyatmoko

Article discussed a speaker identification system. Which was a part of speaker recognition. The system identified asubject based on the voice from a group of pattern had been saved before. This system used a wavelet discrete transformationas a feature extraction method and an artificial neural network of back-propagation as a classification method. The voiceinput was processed by the wavelet discrete transformation in order to obtain signal coefficient of low frequency as adecomposition result which kept voice characteristic of everyone. The coefficient then was classified artificial neural networkof back-propagation. A system trial was conducted by collecting voice samples directly by using 225 microphones in nonsoundproof rooms; contained of 15 subjects (persons) and each of them had 15 voice samples. The 10 samples were used as atraining voice and 5 others as a testing voice. Identification accuracy rate reached 84 percent. The testing was also done onthe subjects who pronounced same words. It can be concluded that, the similar selection of words by different subjects has noinfluence on the accuracy rate produced by system.Keywords: speaker identification, wavelet discrete transformation, artificial neural network, back-propagation.


Author(s):  
Minho Jin ◽  
Chang D. Yoo

A speaker recognition system verifies or identifies a speaker’s identity based on his/her voice. It is considered as one of the most convenient biometric characteristic for human machine communication. This chapter introduces several speaker recognition systems and examines their performances under various conditions. Speaker recognition can be classified into either speaker verification or speaker identification. Speaker verification aims to verify whether an input speech corresponds to a claimed identity, and speaker identification aims to identify an input speech by selecting one model from a set of enrolled speaker models. Both the speaker verification and identification system consist of three essential elements: feature extraction, speaker modeling, and matching. The feature extraction pertains to extracting essential features from an input speech for speaker recognition. The speaker modeling pertains to probabilistically modeling the feature of the enrolled speakers. The matching pertains to matching the input feature to various speaker models. Speaker modeling techniques including Gaussian mixture model (GMM), hidden Markov model (HMM), and phone n-grams are presented, and in this chapter, their performances are compared under various tasks. Several verification and identification experimental results presented in this chapter indicate that speaker recognition performances are highly dependent on the acoustical environment. A comparative study between human listeners and an automatic speaker verification system is presented, and it indicates that an automatic speaker verification system can outperform human listeners. The applications of speaker recognition are summarized, and finally various obstacles that must be overcome are discussed.


2016 ◽  
Vol 25 (4) ◽  
pp. 529-538
Author(s):  
H.S. Jayanna ◽  
B.G. Nagaraja

AbstractMost of the state-of-the-art speaker identification systems work on a monolingual (preferably English) scenario. Therefore, English-language autocratic countries can use the system efficiently for speaker recognition. However, there are many countries, including India, that are multilingual in nature. People in such countries have habituated to speak multiple languages. The existing speaker identification system may yield poor performance if a speaker’s train and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of the modeling techniques, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers for multilingual speaker identification, is presented. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers of our own database. It is observed from the experimental results that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and it is observed that the combination feature gives better performance in all the crosslingual speaker identification experiments.


2021 ◽  
Vol 39 (1B) ◽  
pp. 30-40
Author(s):  
Ahmed M. Ahmed ◽  
Aliaa K. Hassan

Speaker Recognition Defined by the process of recognizing a person by his\her voice through specific features that extract from his\her voice signal. An Automatic Speaker recognition (ASP) is a biometric authentication system. In the last decade, many advances in the speaker recognition field have been attained, along with many techniques in feature extraction and modeling phases. In this paper, we present an overview of the most recent works in ASP technology. The study makes an effort to discuss several modeling ASP techniques like Gaussian Mixture Model GMM, Vector Quantization (VQ), and Clustering Algorithms. Also, several feature extraction techniques like Linear Predictive Coding (LPC) and Mel frequency cepstral coefficients (MFCC) are examined. Finally, as a result of this study, we found MFCC and GMM methods could be considered as the most successful techniques in the field of speaker recognition so far.


Author(s):  
K. Manikandan ◽  
E. Chandra

Speaker Identification denotes the speech samples of known speaker and it identifies the best matches of the input model. The SGMFC method is the combination of Sub Gaussian Mixture Model (SGMM) with the Mel-frequency Cepstral Coefficients (MFCC) for feature extraction. The SGMFC method minimizes the error rate, memory footprint and also computational throughput measure needs of a medium-vocabulary speaker identification system, supposed for preparation on a transportable or otherwise. Fuzzy C-means and k-means clustering are used in the SGMM method to attain the improved efficiency and their outcomes with parameters such as precision, sensitivity and specificity are compared.


Sign in / Sign up

Export Citation Format

Share Document