Combating Phone Harassment through Voice Analysis Filtration of Anonymous Reports

Mapping Intimacies ◽

10.21203/rs.3.rs-52452/v1 ◽

2020 ◽

Author(s):

Obonee Kushum ◽

Julkar Nayeen Mahi ◽

Milon Biswas

Keyword(s):

Speaker Recognition ◽

Speaker Identification ◽

Speaker Verification ◽

Personal Information ◽

Gaussian Mixture ◽

Law Enforcement Agencies ◽

Safety Issues ◽

Phone Calls ◽

Sim Card ◽

Attack Surface

Abstract Given the increasing popularity of smartphones as all-in-one computing devices for corporate work and everyday personal use, it is no wonder that mobile devices have become the most appealing attack surface for today's cyber criminals. Obscene or harassing phone calls can be among the most stressful and frightening invasions of privacy a person experiences, so mobile security has become increasingly important in mobile computing. Various applications block spam calls by SIM card number, maintaining a spam database that identifies the source of incoming calls. Unfortunately, their efficiency is not up to the mark: tracking and blocking a SIM card number is usually pointless because spam callers constantly change their numbers. We therefore present a new concept in which fraudsters are recognized by their voices, even in a noisy environment and from only a few seconds of speech, since a caller can change numbers several times but cannot change his or her voice. We use several algorithms and techniques, including speaker verification, speaker identification, forensic speaker recognition (FSR), spectrogram masking, voice filtering, Mel-Frequency Cepstral Coefficients (MFCC), and a combination of the Gaussian Mixture Model (GMM) and the Hidden Markov Model (HMM). Moreover, the system does not require any personal information from its users, so their safety is also maintained. The findings of this study will be useful for lawyers, law enforcement agencies, and judges in the courts to recognize their suspects.


Speaker Verification and Identification

Behavioral Biometrics for Human Identification ◽

10.4018/978-1-60566-725-6.ch013 ◽

2010 ◽

pp. 264-289

Author(s):

Minho Jin ◽

Chang D. Yoo

Keyword(s):

Feature Extraction ◽

Speaker Recognition ◽

Speaker Identification ◽

Speaker Verification ◽

Gaussian Mixture ◽

Recognition System ◽

Essential Elements ◽

Verification System ◽

Biometric Characteristic ◽

Speaker Modeling

A speaker recognition system verifies or identifies a speaker’s identity based on his/her voice. It is considered one of the most convenient biometric characteristics for human-machine communication. This chapter introduces several speaker recognition systems and examines their performance under various conditions. Speaker recognition can be classified into either speaker verification or speaker identification. Speaker verification aims to verify whether an input speech corresponds to a claimed identity, and speaker identification aims to identify an input speech by selecting one model from a set of enrolled speaker models. Both speaker verification and speaker identification systems consist of three essential elements: feature extraction, speaker modeling, and matching. Feature extraction pertains to extracting essential features from an input speech for speaker recognition. Speaker modeling pertains to probabilistically modeling the features of the enrolled speakers. Matching pertains to matching the input features to various speaker models. Speaker modeling techniques including the Gaussian mixture model (GMM), hidden Markov model (HMM), and phone n-grams are presented, and their performance is compared under various tasks. Several verification and identification experimental results presented in this chapter indicate that speaker recognition performance is highly dependent on the acoustical environment. A comparative study between human listeners and an automatic speaker verification system is presented, and it indicates that an automatic speaker verification system can outperform human listeners. The applications of speaker recognition are summarized, and finally various obstacles that must be overcome are discussed.
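The difference between identification and verification described above can be illustrated with a small sketch. It is not taken from the chapter: it assumes each speaker is modeled by a single diagonal-covariance Gaussian (a one-component stand-in for a GMM), and the names `enrolled`, `background`, and `threshold` are hypothetical.

```python
import math

def log_likelihood(frames, model):
    """Average per-frame log-likelihood under a diagonal-covariance Gaussian."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, mu, var in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total / len(frames)

def identify(frames, enrolled):
    """Identification: select the enrolled model that scores highest."""
    return max(enrolled, key=lambda name: log_likelihood(frames, enrolled[name]))

def verify(frames, claimed, enrolled, background, threshold=0.0):
    """Verification: accept the claim if the claimed model beats a
    background model by more than the threshold (log-likelihood ratio)."""
    ratio = (log_likelihood(frames, enrolled[claimed])
             - log_likelihood(frames, background))
    return ratio > threshold

# Toy two-dimensional "features"; real systems would use MFCC vectors.
enrolled = {
    "alice": ([0.0, 0.0], [1.0, 1.0]),
    "bob": ([3.0, 3.0], [1.0, 1.0]),
}
background = ([1.5, 1.5], [4.0, 4.0])
test_frames = [[2.9, 3.2], [3.1, 2.8], [3.0, 3.1]]

print(identify(test_frames, enrolled))
print(verify(test_frames, "bob", enrolled, background))
print(verify(test_frames, "alice", enrolled, background))
```

Identification is a closed-set choice among enrolled models, whereas verification is a yes/no decision on one claimed identity, which is why it needs a threshold and some reference (here the background model).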


Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i2.pp782-789 ◽

2020 ◽

Vol 18 (2) ◽

pp. 782

Author(s):

Musab T. S. Al-Kaltakchi ◽

Haithem Abd Al-Raheem Taha ◽

Mohanad Abd Shehab ◽

Mohamed A.M. Abdullah

Keyword(s):

Feature Extraction ◽

Speaker Recognition ◽

Speaker Identification ◽

Gaussian Mixture ◽

Identification Accuracy ◽

Identification System ◽

Good Representation ◽

Mel Frequency Cepstral Coefficients ◽

Normalization Methods ◽

Cepstral Coefficients

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID-Audiovisual database. The speakers are modeled by coupling the Universal Background Model with Gaussian Mixture Models (GMM-UBM) in order to obtain a fast scoring technique and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features at identifying female speakers compared with male speakers. Furthermore, feature warping performed better than the CMVN method.
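As a rough illustration of why CMVN helps against a linear channel, the sketch below normalizes each cepstral coefficient to zero mean and unit variance over an utterance. A stationary convolutional channel shows up as an additive offset in the cepstral domain, so subtracting the per-utterance mean removes it. This is a generic sketch, not the authors' implementation; the function name `cmvn` and the toy feature values are assumptions.

```python
import math

def cmvn(features):
    """Per-utterance cepstral mean-variance normalization.

    features: list of frames, each a list of cepstral coefficients.
    Returns frames where every coefficient has zero mean and unit variance.
    """
    n_frames = len(features)
    n_coeffs = len(features[0])
    means = [sum(f[j] for f in features) / n_frames for j in range(n_coeffs)]
    stds = []
    for j in range(n_coeffs):
        var = sum((f[j] - means[j]) ** 2 for f in features) / n_frames
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero
    return [[(f[j] - means[j]) / stds[j] for j in range(n_coeffs)]
            for f in features]

# Toy "clean" features and the same features with a constant channel offset.
clean = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]]
shifted = [[a + 5.0, b - 2.0] for a, b in clean]

print(cmvn(clean))
print(cmvn(shifted))
```

Both calls print essentially the same normalized frames, showing that the constant offset introduced by the simulated channel is cancelled.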


Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm

Journal of Intelligent Systems ◽

10.1515/jisys-2013-0085 ◽

2014 ◽

Vol 23 (4) ◽

pp. 359-378

Author(s):

M. S. Rudramurthy ◽

V. Kamakshi Prasad ◽

R. Kumaraswamy

Keyword(s):

Speaker Recognition ◽

Speaker Verification ◽

Signal To Noise Ratio ◽

Gaussian Mixture ◽

Detection Algorithm ◽

Voice Activity Detection ◽

Activity Detection ◽

Front End ◽

Different Types ◽

Voice Activity

Abstract The performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.
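Simulating a degraded condition at a chosen signal-to-noise ratio, as described above, amounts to scaling the noise before adding it. The sketch below shows one common way to do this; it is not the authors' code, and the synthetic `speech` and `noise` signals stand in for real utterances and NOISEX-92 recordings.

```python
import math

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db,
    then add it to the speech sample by sample."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def measured_snr_db(speech, noisy):
    """Recover the SNR from the clean and noisy signals (for checking)."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))

# Synthetic stand-ins: a sine wave as "speech", an alternating signal as "noise".
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [(-1) ** t * 0.3 for t in range(100)]

noisy = mix_at_snr(speech, noise, snr_db=10)
print(round(measured_snr_db(speech, noisy), 3))
```

Sweeping `snr_db` from 0 to 20 would reproduce the range of conditions mentioned in the abstract.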


An Experimental Comparison of Modeling Techniques and Combination of Speaker – Specific Information from Different Languages for Multilingual Speaker Identification

Journal of Intelligent Systems ◽

10.1515/jisys-2014-0128 ◽

2016 ◽

Vol 25 (4) ◽

pp. 529-538

Author(s):

H.S. Jayanna ◽

B.G. Nagaraja

Keyword(s):

Speaker Recognition ◽

English Language ◽

Speaker Identification ◽

Poor Performance ◽

Gaussian Mixture ◽

Experimental Comparison ◽

Identification System ◽

Specific Information ◽

Self Organizing Map ◽

Modeling Techniques

Abstract Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario. Therefore, countries where English dominates can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries are accustomed to speaking multiple languages. An existing speaker identification system may perform poorly if a speaker’s training and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of modeling techniques, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers, is presented for multilingual speaker identification. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and the combined feature gives better performance in all the crosslingual speaker identification experiments.
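The abstract does not say how the GMM-UBM speaker models are derived. In the widely used formulation, each speaker model is obtained by MAP-adapting the means of a universal background model toward that speaker's data. The one-dimensional sketch below illustrates this under that assumption; the function names, the toy UBM, and the relevance factor of 16 are illustrative choices, not values from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM component means toward one speaker's data.

    Each adapted mean is alpha * (data mean for that component)
    + (1 - alpha) * (UBM mean), with alpha = n / (n + relevance),
    where n is the soft count of frames assigned to the component.
    """
    n_comp = len(means)
    counts = [0.0] * n_comp
    sums = [0.0] * n_comp
    for x in data:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        for k in range(n_comp):
            resp = likes[k] / total  # posterior probability of component k
            counts[k] += resp
            sums[k] += resp * x
    adapted = []
    for k in range(n_comp):
        alpha = counts[k] / (counts[k] + relevance)
        data_mean = sums[k] / counts[k] if counts[k] > 0 else means[k]
        adapted.append(alpha * data_mean + (1 - alpha) * means[k])
    return adapted

# Toy two-component UBM and 48 frames of speaker data near 2.5.
ubm_weights, ubm_means, ubm_vars = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
speaker_data = [2.5] * 48

print(map_adapt_means(speaker_data, ubm_weights, ubm_means, ubm_vars))
```

Components that see plenty of speaker data move toward it, while components with almost no data stay at the UBM value, which is what makes the approach usable with short enrollment recordings.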


Secure Speaker Recognition using BGN Cryptosystem with Prime Order Bilinear Group

Cryptography ◽

10.4018/978-1-7998-1763-5.ch016 ◽

2020 ◽

pp. 277-294

Author(s):

S. Selva Nidhyananthan ◽

M. Prasad ◽

R. Shantha Selva Kumari

Keyword(s):

Speaker Recognition ◽

Execution Time ◽

Prime Order ◽

Speaker Identification ◽

Speaker Verification ◽

Recognition System ◽

Secure Multiparty Computation ◽

Multiparty Computation ◽

Speech Input ◽

Order Group

Speech, being a unique characteristic of an individual, is widely used in speaker verification and speaker identification tasks in applications such as authentication and surveillance, respectively. This paper proposes a framework for a secure speaker recognition system using the BGN cryptosystem, in which the system is able to perform the necessary operations without being able to observe the speech input provided by the user during the speaker recognition process. Secure speaker recognition makes use of Secure Multiparty Computation (SMC) based on the homomorphic properties of the cryptosystem. Among cryptosystems with homomorphic properties, BGN is preferable because it is partially doubly homomorphic: it can perform an arbitrary number of additions but only one multiplication. The main disadvantage of the BGN cryptosystem is its execution time. In the proposed system, the execution time is reduced by a factor of 12 by replacing the conventional composite-order group with a prime-order group. This leads to efficient secure speaker recognition.


Pertinent Prosodic Features for Speaker Identification by Voice

Advancing the Next-Generation of Mobile Computing ◽

10.4018/978-1-4666-0119-2.ch015 ◽

2012 ◽

pp. 227-241

Author(s):

Halim Sayoud ◽

Siham Ouamour

Keyword(s):

Vector Quantization ◽

Speaker Recognition ◽

Speaker Identification ◽

Speaker Verification ◽

Low Frequency ◽

Arabic Language ◽

Prosodic Features ◽

Acoustic Features ◽

Heterogeneous Features ◽

The Mean

Most existing speaker recognition systems use “state of the art” acoustic features. However, a speaker can often be recognized only by his or her prosodic features, especially the accent. For this reason, the authors investigate some pertinent prosodic features that can be combined with classic acoustic features in order to improve recognition accuracy. The authors have developed a new prosodic model using a modified LVQ (Learning Vector Quantization) algorithm, called MLVQ (Modified LVQ). This model is composed of three reduced prosodic features: the mean of the pitch, original duration, and low-frequency energy. Since these features are heterogeneous, a new optimized metric called Optimized Distance for Heterogeneous Features (ODHEF) has been proposed. Speaker identification tests are carried out on an Arabic corpus because the NIST evaluations showed that speaker verification scores depend on the spoken language and that some of the worst scores were obtained for the Arabic language. Experimental results show good performance for the new prosodic approach.


SPEAKER IDENTIFICATION BY AGGREGATING GAUSSIAN MIXTURE MODELS (GMMs) BASED ON UNCORRELATED MFCC-DERIVED FEATURES

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001414560060 ◽

2014 ◽

Vol 28 (04) ◽

pp. 1456006 ◽

Cited By ~ 2

Author(s):

AMITA PAL ◽

SMARAJIT BOSE ◽

GOPAL K. BASAK ◽

AMITAVA MUKHOPADHYAY

Keyword(s):

Mixture Models ◽

Speaker Recognition ◽

Speaker Identification ◽

Gaussian Mixture Models ◽

Principal Component ◽

Gaussian Mixture ◽

Recognition System ◽

Mel Frequency Cepstral Coefficients ◽

Speech Corpus ◽

Signal Process

For solving speaker identification problems, the approach proposed by Reynolds [IEEE Signal Process. Lett. 2 (1995) 46–48], using Gaussian Mixture Models (GMMs) based on Mel Frequency Cepstral Coefficients (MFCCs) as features, is one of the most effective available in the literature. The use of GMMs for modeling speaker identity is motivated by the interpretation that the Gaussian components represent some general speaker-dependent spectral shapes, and also by the capability of Gaussian mixtures to model arbitrary densities. In this work, we first illustrate, with the help of a new bilingual speech corpus, how the well-known principal component transformation, in conjunction with the principle of classifier combination, can be used to enhance the performance of MFCC-GMM speaker recognition systems significantly. Subsequently, we rigorously establish the same result using the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.
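The principal component transformation mentioned above rotates the feature axes so that the transformed features are uncorrelated, which suits GMMs with diagonal covariance matrices. The following two-dimensional sketch shows the idea with a closed-form rotation; it is an illustration only, not the authors' procedure, and the toy `points` are made up.

```python
import math

def covariance_2d(points):
    """Return the mean and the covariance entries (cxx, cyy, cxy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), cxx, cyy, cxy

def pca_rotate_2d(points):
    """Center the data and rotate it onto its principal axes.

    For a 2x2 covariance matrix the principal-axis angle is
    theta = 0.5 * atan2(2 * cxy, cxx - cyy).
    """
    (mx, my), cxx, cyy, cxy = covariance_2d(points)
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - mx) * c + (y - my) * s, -(x - mx) * s + (y - my) * c)
            for x, y in points]

# Two strongly correlated toy features.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
rotated = pca_rotate_2d(points)

print(covariance_2d(points)[3])   # clearly non-zero cross-covariance
print(covariance_2d(rotated)[3])  # approximately zero after rotation
```

In higher dimensions the same effect is obtained by projecting onto the eigenvectors of the full covariance matrix.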


Robust speaker verification by combining MFCC and entrocy in noisy conditions

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i4.2957 ◽

2021 ◽

Vol 10 (4) ◽

pp. 2310-2319

Author(s):

Duraid Y. Mohammed ◽

Khamis Al-Karawi ◽

Ahmed Aljuboori

Keyword(s):

Speaker Recognition ◽

Speaker Verification ◽

Gaussian Mixture ◽

Mel Frequency Cepstral Coefficients ◽

Automatic Speaker Recognition ◽

Robust Speaker Recognition ◽

Noisy Conditions ◽

New Feature ◽

Highly Correlated ◽

The Fourier Transform

Automatic speaker recognition may achieve remarkable performance in matched training and test conditions. Conversely, results drop significantly in mismatched noisy conditions. Furthermore, feature extraction significantly affects performance. Mel-frequency cepstral coefficients (MFCCs) are the most commonly used features in this field of study. The literature has reported that the conditions for training and testing are highly correlated. Taken together, these facts support strong recommendations for using MFCC features in similar environmental conditions (train/test) for speaker recognition. However, with noise and reverberation present, MFCC performance is not reliable. To address this, we propose a new feature, 'entrocy', for accurate and robust speaker recognition, which we mainly employ to support MFCC coefficients in noisy environments. Entrocy is the Fourier transform of the entropy, a measure of the fluctuation of the information in sound segments over time. Entrocy features are combined with MFCCs to generate a composite feature set, which is tested using the Gaussian mixture model (GMM) speaker recognition method. The proposed method shows improved recognition accuracy over a range of signal-to-noise ratios.
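The abstract defines entrocy only as the Fourier transform of the entropy measured over time. One possible reading is sketched below: compute an entropy value per frame, then take the magnitude of the discrete Fourier transform of that entropy track. The framing, the histogram-based entropy estimate, and all function names are assumptions made for illustration and may differ from the authors' actual definition.

```python
import cmath
import math

def frame_entropy(frame, n_bins=8):
    """Shannon entropy (bits) of an amplitude histogram of one frame."""
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return 0.0  # a constant frame carries no amplitude information
    counts = [0] * n_bins
    for x in frame:
        idx = min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def entropy_track(signal, frame_len):
    """Entropy of each non-overlapping frame of the signal."""
    return [frame_entropy(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def dft_magnitude(seq):
    """Magnitude of the discrete Fourier transform of a short sequence."""
    n = len(seq)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(seq)))
            for k in range(n)]

# Toy signal: flat frames alternate with frames spanning eight levels.
flat = [0.0] * 8
varied = [float(v) for v in range(8)]
signal = flat + varied + flat + varied

track = entropy_track(signal, frame_len=8)
print(track)
print(dft_magnitude(track))
```

In this toy case the entropy alternates from frame to frame, so the transform of the entropy track concentrates at the highest frequency bin, which is the kind of fluctuation pattern such a feature would capture.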


Analysis of Methods and Techniques Used for Speaker Identification, Recognition, and Verification: A Study on Quarter-Century Research Outcomes

Iraqi Journal of Science ◽

10.24996/ijs.2021.62.9.38 ◽

2021 ◽

pp. 3256-3281

Author(s):

Thabit Sultan Mohammed ◽

Karim M. Aljebory ◽

Mohammed Aref Abdul Rasheed ◽

Muzhir Shaban Al-Ani ◽

Ali Makki Sagheer

Keyword(s):

Speaker Recognition ◽

Speaker Identification ◽

Speaker Verification ◽

Research Articles ◽

Quarter Century ◽

Survey Paper ◽

Voice Identification ◽

Increasing Trend ◽

High Level ◽

Published Research

The theories and applications of speaker identification, recognition, and verification are well-established fields, yet many publications and advances in the relevant products are still emerging. In this paper, research-related publications of the past 25 years (from 1996 to 2020) were studied and analysed. Our main focus was on speaker identification, speaker recognition, and speaker verification. The study was carried out using the Science Direct databases. Several types of references, such as review articles, research articles, encyclopaedias, book chapters, conference abstracts, and others, were categorized and investigated. A summary of this literature is presented in this paper, together with statistical analyses of the publications and their categories over the period. Important information, including the dataset used, the size of the data adopted, the implemented methods, and the accuracy of the obtained results in the analysed research, is extracted from the explored publications and tabulated. The results show that published research articles outnumber the other categories of publications. The amount of research in speech and speaker identification, recognition, and verification shows an increasing trend. Based on the normalized comparative factors of the research publications, we found that many of them reached a high level of accuracy in their findings; hence the significantly superior techniques were derived and discussed for future research. This survey paper should be beneficial for all those who wish to advance their research in the area of voice identification, recognition, and verification.


APPLICATION OF GAUSSIAN SUPERVECTOR IN SPEECH ANALYSIS

International Journal of Electronics and Electrical Engineering ◽

10.47893/ijeee.2015.1158 ◽

2015 ◽

pp. 215-219

Author(s):

KAULESHWAR PRASAD ◽

PIYUSH LOTIA

Keyword(s):

Speaker Recognition ◽

Speaker Identification ◽

Gaussian Mixture ◽

Speech Analysis ◽

Speech Signals ◽

Powerful Method ◽

Mel Frequency Cepstral Coefficients ◽

Basic Goal ◽

Unique Identity ◽

Key Features

The idea of speaker identification here is to implement a recognizer in MATLAB that can identify a person by processing his/her voice. The basic goal of the paper is to classify and recognize the speech of different persons. This classification is mainly based on extracting several key features, such as Mel Frequency Cepstral Coefficients (MFCC), from the speech signals of those persons using feature extraction in MATLAB. These features may consist of pitch, amplitude, frequency, etc. Using a statistical model such as the Gaussian mixture model (GMM) and the features extracted from those speech signals, we build a unique identity for each person enrolled for speaker recognition. An elegant and powerful method for finding the maximum likelihood solution is the Expectation-Maximization (EM) algorithm. The performance of the technique has been measured by three parameters: number of speakers in the database, number of persons tested, and the % error.
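To make the Expectation-Maximization step concrete, here is a minimal sketch that fits a two-component, one-dimensional GMM. The paper works in MATLAB on MFCC vectors; this Python version, its function names, the toy data, and the initial parameters are all illustrative assumptions.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, init_weights, init_means, init_vars, n_iter=50):
    """Fit a 1-D Gaussian mixture by alternating E and M steps."""
    weights, means, variances = list(init_weights), list(init_means), list(init_vars)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            likes = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(likes)
            resp.append([l / total for l in likes])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to keep the density finite
    return weights, means, variances

# Toy data: one cluster around 0 and one around 5.
data = [-0.5, -0.2, 0.0, 0.1, 0.4, 0.2, 4.6, 4.9, 5.0, 5.2, 5.3]
weights, means, variances = em_gmm_1d(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])

print(weights)
print(means)
```

Each iteration cannot decrease the data likelihood, which is why EM is the standard way to train the per-speaker GMMs described above.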
