Pertinent Prosodic Features for Speaker Identification by Voice

Author(s):  
Halim Sayoud ◽  
Siham Ouamour

Most existing speaker recognition systems use “state-of-the-art” acoustic features. However, in many cases a speaker can be recognized only by his or her prosodic features, especially the accent. For this reason, the authors investigate pertinent prosodic features that can be combined with classic acoustic features to improve recognition accuracy. The authors have developed a new prosodic model using a modified LVQ (Learning Vector Quantization) algorithm, called MLVQ (Modified LVQ). This model is composed of three reduced prosodic features: the mean pitch, the original duration, and the low-frequency energy. Since these features are heterogeneous, a new optimized metric is proposed, called the Optimized Distance for Heterogeneous Features (ODHEF). Speaker identification tests are performed on an Arabic corpus, because the NIST evaluations showed that speaker verification scores depend on the spoken language and that some of the worst scores were obtained for Arabic. Experimental results show good performance for the new prosodic approach.
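The abstract gives neither the MLVQ update rule nor the ODHEF formula. As a rough illustration only, the sketch below trains a classic LVQ1 codebook (one prototype per speaker) on the three prosodic features. As a hypothetical stand-in for ODHEF, it compares vectors with a Euclidean distance in which each feature is divided by its standard deviation on the training set. The names `train_lvq1`, `scaled_distance`, and `identify` are illustrative, and this is not the authors' MLVQ or ODHEF.

```python
import math
import random

# Hypothetical feature order: (mean pitch in Hz, duration in s, low-frequency energy).

def feature_scales(data):
    """Per-feature standard deviation over the training set."""
    n = len(data)
    scales = []
    for j in range(len(data[0])):
        mean = sum(v[j] for v in data) / n
        var = sum((v[j] - mean) ** 2 for v in data) / n
        scales.append(math.sqrt(var) or 1.0)  # guard against a constant feature
    return scales

def scaled_distance(a, b, scales):
    """Stand-in for ODHEF (assumption): Euclidean distance after dividing each feature by its spread."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

def train_lvq1(data, labels, epochs=30, lr=0.1, seed=0):
    """Classic LVQ1 with one prototype per speaker; this is not the authors' MLVQ."""
    rng = random.Random(seed)
    scales = feature_scales(data)
    protos = {}
    for v, lab in zip(data, labels):
        protos.setdefault(lab, list(v))  # initialise at each speaker's first sample
    order = list(range(len(data)))
    for epoch in range(epochs):
        rng.shuffle(order)
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in order:
            v, lab = data[i], labels[i]
            winner = min(protos, key=lambda k: scaled_distance(v, protos[k], scales))
            sign = 1.0 if winner == lab else -1.0  # attract if correct, repel if wrong
            p = protos[winner]
            for j in range(len(p)):
                p[j] += sign * rate * (v[j] - p[j])
    return protos, scales

def identify(v, protos, scales):
    """Return the speaker whose prototype is closest under the scaled distance."""
    return min(protos, key=lambda k: scaled_distance(v, protos[k], scales))
```

Scaling each feature by its spread keeps the pitch term, measured in tens of Hz, from swamping the duration term, measured in fractions of a second. The actual ODHEF weighting may differ.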


Author(s):  
Minho Jin ◽  
Chang D. Yoo

A speaker recognition system verifies or identifies a speaker’s identity based on his or her voice. It is considered one of the most convenient biometric characteristics for human-machine communication. This chapter introduces several speaker recognition systems and examines their performance under various conditions. Speaker recognition can be classified into speaker verification and speaker identification. Speaker verification aims to verify whether an input utterance corresponds to a claimed identity, whereas speaker identification aims to identify the speaker of an input utterance by selecting one model from a set of enrolled speaker models. Both speaker verification and speaker identification systems consist of three essential elements: feature extraction, speaker modeling, and matching. Feature extraction pertains to extracting from the input speech the features essential for speaker recognition. Speaker modeling pertains to probabilistically modeling the features of the enrolled speakers. Matching pertains to comparing the input features against the various speaker models. Speaker modeling techniques, including the Gaussian mixture model (GMM), the hidden Markov model (HMM), and phone n-grams, are presented, and their performance is compared across various tasks. Several verification and identification experiments presented in this chapter indicate that speaker recognition performance is highly dependent on the acoustic environment. A comparative study between human listeners and an automatic speaker verification system is presented; it indicates that an automatic speaker verification system can outperform human listeners. The applications of speaker recognition are summarized, and finally the various obstacles that must be overcome are discussed.
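The matching step for GMM speaker models can be sketched as follows. The sketch assumes diagonal-covariance GMMs whose parameters have already been trained (for example, by EM, which is not shown), and a universal background model (UBM) as the alternative hypothesis for verification. Identification returns the enrolled model with the highest average per-frame log-likelihood. Verification compares the average log-likelihood ratio against a threshold. The function names, the UBM, and the default threshold are illustrative assumptions.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(frame, gmm):
    """log p(frame | GMM) via log-sum-exp; gmm is a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(frame, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood (frames treated as independent)."""
    return sum(gmm_loglik(f, gmm) for f in frames) / len(frames)

def identify(frames, models):
    """Closed-set identification: the enrolled speaker whose model scores best."""
    return max(models, key=lambda spk: avg_loglik(frames, models[spk]))

def verify(frames, claimed_gmm, ubm, threshold=0.0):
    """Accept the identity claim if the average log-likelihood ratio exceeds the threshold."""
    llr = avg_loglik(frames, claimed_gmm) - avg_loglik(frames, ubm)
    return llr > threshold, llr
```

In practice, the threshold is tuned on development data to trade false acceptances against false rejections.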


Cryptography ◽  
2020 ◽  
pp. 277-294
Author(s):  
S. Selva Nidhyananthan ◽  
M. Prasad ◽  
R. Shantha Selva Kumari

Speech, being a unique characteristic of an individual, is widely used in speaker verification and speaker identification tasks, in applications such as authentication and surveillance, respectively. This paper presents a framework for a secure speaker recognition system using the BGN cryptosystem, in which the system performs the necessary operations without being able to observe the speech input provided by the user during the speaker recognition process. Secure speaker recognition makes use of Secure Multiparty Computation (SMC) based on the homomorphic properties of the cryptosystem. Among cryptosystems with homomorphic properties, BGN is preferable because it is partially doubly homomorphic: it can perform an arbitrary number of additions but only one multiplication. The main disadvantage of the BGN cryptosystem, however, is its execution time. In the proposed system, the execution time is reduced by a factor of 12 by replacing the conventional composite-order group with a prime-order group. This leads to efficient secure speaker recognition.
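BGN itself relies on bilinear pairings and is too involved for a short example. To illustrate the underlying idea that a server can compute on features it cannot read, the sketch below swaps in the simpler Paillier cryptosystem. Unlike BGN, Paillier is additively homomorphic only and supports no ciphertext-by-ciphertext multiplication. The client encrypts integer-quantised features. The server holds plaintext non-negative integer weights and computes an encrypted weighted sum, such as a linear matching score. Only the key holder can decrypt the result. The primes are toy values chosen for demonstration and provide no security, and all function names are illustrative.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier keys from two small primes (demo only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid since g = n + 1
    return n, (lam, mu, n)

def encrypt(m, n, rng):
    """E(m) = (n+1)^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c, priv):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu, n = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(c1, c2, n):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

def encrypted_weighted_sum(enc_features, weights, n):
    """Server side: prod E(x_i)^w_i = E(sum w_i * x_i) for non-negative integer weights."""
    n2 = n * n
    acc = 1  # trivial encryption of 0
    for c, w in zip(enc_features, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc
```

For example, encrypting the features `[12, 7, 30]` and applying the weights `[3, 5, 2]` on the server side decrypts to 131. BGN's single extra multiplication is what allows a protocol to evaluate a quadratic term, such as a squared distance, on two encrypted operands. Paillier alone cannot do this.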


2021 ◽  
pp. 3256-3281
Author(s):  
Thabit Sultan Mohammed ◽  
Karim M. Aljebory ◽  
Mohammed Aref Abdul Rasheed ◽  
Muzhir Shaban Al-Ani ◽  
Ali Makki Sagheer

The theories and applications of speaker identification, recognition, and verification are well-established fields, yet many publications and advances in related products are still emerging. In this paper, research publications from the past 25 years (1996 to 2020) were studied and analysed, with a main focus on speaker identification, speaker recognition, and speaker verification. The study was carried out using the ScienceDirect database. Several kinds of references, such as review articles, research articles, encyclopaedias, book chapters, conference abstracts, and others, were categorized and investigated. A summary of this literature is presented, together with statistical analyses of the publications and their categories over this period. Important information, including the dataset used, the size of the data adopted, the implemented methods, and the accuracy of the obtained results, is extracted from the explored publications and tabulated. The results show that research articles outnumber the other categories of publications, and that the number of studies in speech and speaker identification, recognition, and verification shows an increasing trend. Based on normalized comparative factors, we found that many of the research publications reached a high level of accuracy; the significantly superior techniques were therefore identified and discussed for future research. This survey should benefit anyone who wishes to advance their research in voice identification, recognition, and verification.


2015 ◽  
Vol 9 (4) ◽  
pp. 1-19
Author(s):  
S. Selva Nidhyananthan ◽  
M. Prasad ◽  
R. Shantha Selva Kumari

Speech, being a unique characteristic of an individual, is widely used in speaker verification and speaker identification tasks, in applications such as authentication and surveillance, respectively. This paper presents a framework for a secure speaker recognition system using the BGN cryptosystem, in which the system performs the necessary operations without being able to observe the speech input provided by the user during the speaker recognition process. Secure speaker recognition makes use of Secure Multiparty Computation (SMC) based on the homomorphic properties of the cryptosystem. Among cryptosystems with homomorphic properties, BGN is preferable because it is partially doubly homomorphic: it can perform an arbitrary number of additions but only one multiplication. The main disadvantage of the BGN cryptosystem, however, is its execution time. In the proposed system, the execution time is reduced by a factor of 12 by replacing the conventional composite-order group with a prime-order group. This leads to efficient secure speaker recognition.


2020 ◽  
Author(s):  
Obonee Kushum ◽  
Julkar Nayeen Mahi ◽  
Milon Biswas

Given the increasing popularity of smartphones as all-in-one computing devices for corporate work and everyday personal use, it is no wonder that mobile devices have become the most appealing attack surface for today's cybercriminals. In this context, obscene or harassing phone calls can be among the most stressful and frightening invasions of privacy a person experiences. Mobile security has therefore become increasingly important in mobile computing. Various applications block spam calls by SIM card number, maintaining a spam database that identifies the source of incoming calls. Unfortunately, their efficiency is not up to the mark, because it is usually pointless to track and block a SIM card number when spam callers constantly change their numbers. With this in mind, we present a new concept in which fraudsters are recognized by their voices, even in a noisy environment and from a few seconds of speech, since a person can change his or her number several times but cannot change his or her voice. We use several algorithms and techniques, including speaker verification, speaker identification, forensic speaker recognition (FSR), spectrogram masking, voice filtering, Mel-Frequency Cepstral Coefficients (MFCC), and a combination of a Gaussian Mixture Model (GMM) and a Hidden Markov Model (HMM). Moreover, the system does not require any personal information from its users, so their safety is also preserved. The findings of this study will be useful for lawyers, law enforcement agencies, and judges in the courts to recognize suspects.
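The abstract lists MFCCs among the features. The sketch below outlines the standard MFCC pipeline for a single frame: Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT-II. It is written in pure Python with a naive DFT for readability. The sample rate, frame length, filter count, and number of coefficients are illustrative assumptions, not the authors' settings. A real system would use an FFT library and would add pre-emphasis, framing over the whole utterance, and noise handling such as the spectrogram masking mentioned above.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """|DFT|^2 / N for bins 0..N/2 of a Hamming-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):  # rising edge
            filt[k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge
            filt[k] = (right - k) / max(right - centre, 1)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate=8000, num_filters=20, num_ceps=13):
    """MFCCs of one frame: log mel filterbank energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Log filterbank energies; the floor avoids log(0) for empty bands.
    energies = [math.log(max(sum(f * s for f, s in zip(filt, spec)), 1e-10)) for filt in bank]
    # DCT-II decorrelates the log energies; keep the first num_ceps coefficients.
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters) for j, e in enumerate(energies))
        for c in range(num_ceps)
    ]
```

Frame-level MFCC vectors like these are what a GMM or HMM speaker model would then score.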


2021 ◽  
Author(s):  
Chander Prabha ◽  
Sukhvinder Kaur ◽  
Meenu Gupta ◽  
Fadi Al-Turjman

An important application of speech processing is speaker recognition, which automatically recognizes the person speaking in an audio recording based on the speaker-specific information contained in the speech features. It involves speaker verification and speaker identification. This paper presents an efficient method based on the discrete wavelet transform and an optimized variance spectral flux to enhance the performance of a speaker identification system. The feature extraction technique uses the Daubechies 40 (db40) wavelet to compress and de-noise the speech signal by decomposing it into approximation and detail coefficients at level 1. The approximation coefficients contain 99.9% of the speech information, with the detail coefficients holding the remainder. The optimized variance spectral flux is therefore applied to the wavelet approximation coefficients; it efficiently extracts the frequency content of the speech signal and yields distinctive features. The distance between extracted features is obtained by applying the traditional Bayesian information criterion. Experiments were conducted on recordings of 33 speakers (23 female and 10 male) for text-independent speaker identification. The effectiveness of the proposed system is evaluated using detection error trade-off curves, receiver operating characteristic curves, and the area under the curve. The proposed system achieves a speaker identification rate of 94.38%, compared with 90.70% for the traditional method using Mel-frequency spectral coefficients.
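The db40 filter has 80 taps, so this sketch uses the Haar wavelet (db1) as a short stand-in for the level-1 decomposition. It then computes a frame-wise spectral flux on the approximation coefficients and summarizes the flux by its variance. The abstract does not define the "optimized" variance spectral flux. The flux definition used here (squared difference between successive L2-normalised magnitude spectra) and the variance summary are therefore assumptions, and `vsf_feature` is a hypothetical name. The BIC distance step is not shown. Because the Haar transform is orthonormal, the energies of the two sub-bands sum to the signal energy. This makes it easy to see that a low-frequency signal leaves nearly all of its energy in the approximation coefficients.

```python
import cmath
import math

def haar_level1(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = list(signal) + [signal[-1]]  # pad odd-length input by repeating the last sample
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def energy(xs):
    return sum(x * x for x in xs)

def magnitude_spectrum(frame):
    """|DFT| for bins 0..N/2 (naive O(N^2) DFT, kept short for clarity)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_flux(frames):
    """Squared change between successive L2-normalised magnitude spectra."""
    spectra = []
    for f in frames:
        mag = magnitude_spectrum(f)
        norm = math.sqrt(sum(m * m for m in mag)) or 1.0
        spectra.append([m / norm for m in mag])
    return [sum((a - b) ** 2 for a, b in zip(cur, prev)) for prev, cur in zip(spectra, spectra[1:])]

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def vsf_feature(signal, frame_len=64, hop=32):
    """Hypothetical pipeline: Haar approximation -> framed spectral flux -> variance."""
    approx, _ = haar_level1(signal)
    frames = [approx[i:i + frame_len] for i in range(0, len(approx) - frame_len + 1, hop)]
    return variance(spectral_flux(frames))
```

The input must be long enough to give at least two frames after halving; otherwise, there is no flux value to summarize.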


Author(s):  
Dong Wang

In this article, we conduct a comprehensive simulation study of the optimal scores of speaker recognition systems based on speaker embeddings. For that purpose, we first revisit the optimal scores for the speaker identification (SI) task and the speaker verification (SV) task in the sense of minimum Bayes risk (MBR), and show that the optimal scores for the two tasks can be formulated in a single form, the normalized likelihood (NL). We show that when the underlying model is linear Gaussian, the NL score is mathematically equivalent to the PLDA likelihood ratio (LR), and that the empirical scores based on cosine distance and Euclidean distance can be seen as approximations of this linear Gaussian NL score under some conditions. Based on the unified NL score, we conduct a comprehensive simulation study to investigate the behavior of the scoring component on both the SI and SV tasks, both when the distribution of the speaker vectors perfectly matches the assumptions of the NL model and when some mismatch is involved. Importantly, our simulation is based on the statistics of speaker vectors derived from a practical speaker recognition system, and hence reflects the behavior of NL scoring in real-life scenarios that are full of imperfections, including non-Gaussianity, non-homogeneity, and domain/condition mismatch.
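To make the unified NL score concrete, here is a sketch under the simplest linear Gaussian model. The model assumes a zero global mean, an isotropic between-speaker variance b, and a within-speaker variance w shared by all dimensions. This is an illustrative assumption, not the article's simulation setup. Each speaker mean is drawn from N(0, bI), and each speaker vector is drawn from N(mean, wI). Given n enrollment vectors with average x̄_k, the posterior predictive density for a test vector x is Gaussian. Its mean is nb/(nb + w) · x̄_k, and its per-dimension variance is w + bw/(nb + w). The NL score is the log ratio of this predictive density to the speaker-independent marginal N(0, (b + w)I). Identification picks the speaker with the highest NL. Verification compares NL against a log threshold, which is left as a parameter here.

```python
import math

def log_normal_iso(x, mean, var):
    """Log density of N(mean, var * I) at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)

def nl_score(x, enrol, b, w):
    """Normalized likelihood log p_k(x) - log p(x) under the assumed linear Gaussian model."""
    n = len(enrol)
    dim = len(x)
    xbar = [sum(v[j] for v in enrol) / n for j in range(dim)]
    shrink = n * b / (n * b + w)    # posterior mean shrinks the enrollment average toward 0
    post_var = b * w / (n * b + w)  # posterior variance of the speaker mean
    pred_mean = [shrink * m for m in xbar]
    log_pk = log_normal_iso(x, pred_mean, w + post_var)  # speaker-conditional predictive
    log_p = log_normal_iso(x, [0.0] * dim, b + w)        # marginal over all speakers
    return log_pk - log_p

def identify(x, speakers, b, w):
    """SI: choose the enrolled speaker with the highest NL score."""
    return max(speakers, key=lambda k: nl_score(x, speakers[k], b, w))

def verify(x, enrol, b, w, log_threshold=0.0):
    """SV: accept if the NL score exceeds the log threshold."""
    return nl_score(x, enrol, b, w) > log_threshold
```

The same `nl_score` function serves both tasks. Only the decision rule differs: argmax for SI and a threshold for SV. This mirrors the unification described in the abstract.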

