Feature Extraction of Automatic Speaker Recognition, Analysis and Evaluation in Real Environment

Author(s):  
Edward L. Campbell ◽  
Gabriel Hernández ◽  
José Ramón Calvo
2021 ◽  
Vol 39 (1B) ◽  
pp. 30-40
Author(s):  
Ahmed M. Ahmed ◽  
Aliaa K. Hassan

Speaker recognition is the process of recognizing a person by his/her voice through specific features extracted from his/her voice signal. An automatic speaker recognition (ASP) system is a biometric authentication system. In the last decade, many advances have been made in the speaker recognition field, along with many techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent work in ASP technology. The study discusses several ASP modeling techniques, such as the Gaussian Mixture Model (GMM), Vector Quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as Linear Predictive Coding (LPC) and Mel-frequency cepstral coefficients (MFCC), are also examined. Finally, as a result of this study, we found that the MFCC and GMM methods could be considered the most successful techniques in the field of speaker recognition so far.
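
As an illustration of the Vector Quantization modeling idea mentioned in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes each enrolled speaker is represented by a small codebook of feature vectors and that an unknown utterance is assigned to the speaker whose codebook yields the lowest average distortion. The names `distortion`, `identify`, `codebooks`, and the 2-D toy vectors are hypothetical; real systems would use higher-dimensional features such as MFCCs and codebooks learned by clustering.

```python
import math


def distortion(frames, codebook):
    """Average Euclidean distance from each frame to its nearest codeword."""
    total = 0.0
    for frame in frames:
        total += min(math.dist(frame, code) for code in codebook)
    return total / len(frames)


def identify(frames, codebooks):
    """Return the speaker name whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: distortion(frames, codebooks[name]))


# Hypothetical toy codebooks (assumption: 2-D vectors stand in for real features).
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 4.0)],
}

# Hypothetical frames from an unknown utterance.
test_frames = [(0.2, 0.1), (0.9, 1.2), (0.1, -0.1)]
best = identify(test_frames, codebooks)
```

In this toy setup the test frames lie close to the first codebook, so `best` names that speaker. A GMM-based system would replace the distortion with a log-likelihood under each speaker's mixture model.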


Author(s):  
Shung-Yung Lung

This chapter presents some of the main contributions in several topics of speaker recognition since 1978. Representative books and surveys on speaker recognition published during this period are listed. Theoretical models for automatic speaker recognition are contrasted with practical design methodology. Research contributions to measure process, feature extraction, and classification are selectively discussed, including contributions to measure analysis, feature selection, and the experimental design of speaker classifiers. The chapter concludes with a representative set of applications of speaker recognition technology.


Automatic speaker recognition is the process of automatically identifying a person from his/her voice. A robust feature extraction algorithm is required for effective and efficient classification. In this paper, a new method is proposed for identifying the speaker using an artificial neural network. Mel-frequency cepstral coefficients (MFCC) are used as the feature extraction technique, providing useful features for the recognition process. Input samples are created from the extracted feature values, and classification is then performed using a Multilayer Perceptron (MLP) trained by backpropagation. The proposed method gives an accuracy of 94.44%.
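
To make the "mel-frequency" part of MFCC concrete, here is a small sketch of the mel-scale warping that typically places the filterbank used before the cepstral step. It assumes the widely used formula mel = 2595 * log10(1 + f / 700), which is one common variant and may not be the exact one used in the paper. The function names and the choice of 10 filters over 0 to 4000 Hz are illustrative assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of filters spaced uniformly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical configuration: 10 filters between 0 Hz and 4000 Hz.
centers = mel_center_frequencies(10, 0.0, 4000.0)
```

Because the spacing is uniform in mels, the centers are packed densely at low frequencies and spread out at high frequencies. A full MFCC pipeline would additionally frame the signal, take a power spectrum, apply the filterbank, take logarithms, and apply a discrete cosine transform.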


2021 ◽  
Vol 10 (1) ◽  
pp. 374-382
Author(s):  
Ayoub Bouziane ◽  
Jamal Kharroubi ◽  
Arsalane Zarghili

A common limitation of previous comparative studies on speaker feature extraction techniques is that the comparison is done independently of the speaker modeling technique used and its parameters. The aim of the present paper is twofold. Firstly, it aims to review the most significant advancements in feature extraction techniques used for automatic speaker recognition. Secondly, it seeks to evaluate and compare the currently dominant ones using an objective comparison methodology that overcomes the various limitations and drawbacks of the previous comparative studies. The results of the experiments carried out underline the importance of the proposed comparison methodology.


Author(s):  
Satyanand Singh

In this paper, I present high-level speaker-specific feature extraction that considers intonation, linguistic rhythm, linguistic stress, and prosodic features taken directly from speech signals. I assume that rhythm is related to language units such as syllables and appears as changes in measurable parameters such as fundamental frequency, duration, and energy. In this work, syllable-type features are selected as the basic unit for expressing the prosodic features. The approximate segmentation of continuous speech into syllable units is achieved by automatically locating the vowel starting point. Knowledge of high-level speaker-specific characteristics is used as a reference for extracting the prosodic features of the speech signal. High-level speaker-specific features extracted using this method may be useful in applications such as speaker recognition, where explicit phoneme/syllable boundaries are not readily available. The effectiveness of these features for automatic speaker recognition was evaluated on the TIMIT and HTIMIT corpora, with TIMIT, initially sampled at 16 kHz, downsampled to 8 kHz. In summary, the experiment, the basic discriminating system, and the HMM system are built on the TIMIT corpus with a set of 48 phonemes. The proposed ASR system shows efficiency improvements of 1.99%, 2.10%, 2.16%, and 2.19% compared to the traditional ASR system for and of 16 kHz TIMIT utterances.
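
Two of the measurable prosodic parameters named in this abstract, fundamental frequency and energy, can be illustrated with a short sketch. This is not the author's method: it assumes a simple autocorrelation pitch estimator applied to a single frame, and it uses a synthetic 200 Hz tone at an 8 kHz sampling rate as hypothetical input. The function names and the 50 to 400 Hz search range are illustrative choices.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def estimate_f0(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical test signal: 0.1 s of a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(tone, sample_rate)
energy = frame_energy(tone)
```

On this clean tone the estimator recovers a period of 40 samples, that is, 200 Hz. Real speech would need voicing detection and protection against octave errors, which this sketch omits.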


Identifying a person from his or her voice characteristics is an essential trait of human interaction. Automatic speaker recognition (ASR) systems are developed to determine the identity of a speaker in forensics, business interactions, and law enforcement. This can be achieved by extracting prosodic, linguistic, and acoustic speech characteristics. Furthermore, optimized neural network-based approaches for classifying the extracted features are reviewed. In this paper, we survey the literature on speaker recognition using neural networks combined with optimization algorithms, as developed for ASR systems in previous years. We discuss different aspects of ASR systems, including features, neural network-based classification, performance metrics, and standard evaluation data sets. The ASR system is discussed in two parts. The first part covers different feature extraction techniques, and the second part covers the classification approaches that identify the speaker. We carry out this evaluation through a comparative analysis of various speaker recognition approaches and compare their results.

