Combating Phone Harassment through VoiceAnalysis Filtration of Anonymous Reports

2020 ◽  
Author(s):  
Obonee Kushum ◽  
Julkar Nayeen Mahi ◽  
Milon Biswas

Abstract: Given the increasing popularity of smartphones as all-in-one computing devices for corporate work and everyday personal use, it is no wonder that mobile devices have become the most appealing attack surface for today's cyber criminals. Obscene or harassing phone calls can be among the most stressful and frightening invasions of privacy a person experiences, so mobile security has become increasingly important in mobile computing. Various applications block spam calls by SIM card number, building a spam database that identifies the source of incoming calls. Unfortunately, their efficiency is not up to the mark: tracking and blocking a SIM card number is usually pointless, because spam callers constantly change their numbers. Considering this point, we present a new concept in which fraudsters are recognized by their voices, even in a noisy environment and from a few seconds of speech, since a caller can change his number several times but cannot change his voice. We use several algorithms and techniques, such as speaker verification, speaker identification, forensic speaker recognition (FSR), spectrogram masking, voice filtering, Mel-Frequency Cepstral Coefficients (MFCC), and a combination of Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM). Moreover, the system does not require any personal information from its users, so their safety is also maintained. The findings of this study will be useful for lawyers, law enforcement agencies, and judges in the courts to recognize their suspects.

Author(s):  
Minho Jin ◽  
Chang D. Yoo

A speaker recognition system verifies or identifies a speaker’s identity based on his or her voice. Voice is considered one of the most convenient biometric characteristics for human-machine communication. This chapter introduces several speaker recognition systems and examines their performance under various conditions. Speaker recognition can be classified into either speaker verification or speaker identification. Speaker verification aims to verify whether an input speech corresponds to a claimed identity, and speaker identification aims to identify an input speech by selecting one model from a set of enrolled speaker models. Both speaker verification and speaker identification systems consist of three essential elements: feature extraction, speaker modeling, and matching. Feature extraction pertains to extracting essential features from an input speech for speaker recognition. Speaker modeling pertains to probabilistically modeling the features of the enrolled speakers. Matching pertains to matching the input features to the various speaker models. Speaker modeling techniques including the Gaussian mixture model (GMM), hidden Markov model (HMM), and phone n-grams are presented, and their performance is compared on various tasks. Several verification and identification experimental results presented in this chapter indicate that speaker recognition performance is highly dependent on the acoustical environment. A comparative study between human listeners and an automatic speaker verification system is presented, and it indicates that an automatic speaker verification system can outperform human listeners. The applications of speaker recognition are summarized, and finally various obstacles that must be overcome are discussed.
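The difference between identification and verification described above can be illustrated with a small sketch. It is not the chapter's implementation: each speaker model is simplified to a single diagonal Gaussian rather than a full GMM, and the speaker names, feature values, background model, and threshold are all hypothetical.

```python
import math

def gaussian_loglik(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def identify(frames, models):
    """Identification: select the enrolled model with the highest score."""
    return max(models, key=lambda name: gaussian_loglik(frames, *models[name]))

def verify(frames, claimed, background, threshold=0.0):
    """Verification: accept the claim if the log-likelihood ratio
    against a background model exceeds the threshold."""
    llr = gaussian_loglik(frames, *claimed) - gaussian_loglik(frames, *background)
    return llr > threshold

# Hypothetical enrolled models: (mean vector, variance vector) per speaker.
models = {
    "alice": ([0.0, 0.0], [1.0, 1.0]),
    "bob": ([3.0, 3.0], [1.0, 1.0]),
}
background = ([1.5, 1.5], [4.0, 4.0])          # stand-in for a background model
test_frames = [[2.9, 3.2], [3.1, 2.7], [2.8, 3.0]]  # made-up feature frames

print(identify(test_frames, models))
print(verify(test_frames, models["bob"], background))
print(verify(test_frames, models["alice"], background))
```

Identification compares the input against every enrolled model, whereas verification scores only the claimed model against a reference, which is why the two tasks need different decision rules.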


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from each of the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID audiovisual database. The speakers are modeled by coupling the Universal Background Model with Gaussian Mixture Models (GMM-UBM) in order to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features at identifying female speakers compared with male speakers. Furthermore, feature warping performed better than the CMVN method.
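As an illustration of the normalization step mentioned above, the following is a minimal sketch of cepstral mean-variance normalization applied per coefficient over an utterance. The feature values are made up, and the small `eps` guard is an assumption added here to avoid division by zero; the paper's exact configuration is not given in the abstract.

```python
import math

def cmvn(frames, eps=1e-8):
    """Normalize each coefficient to zero mean and unit variance
    across all frames of one utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for f in frames
    ]

# Made-up 2-coefficient feature frames for one utterance.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(frames)

# A constant offset on every frame (how a fixed linear channel appears
# in the cepstral domain) is removed by the mean subtraction.
shifted = cmvn([[x + 5.0 for x in f] for f in frames])
```

Because a stationary linear channel adds a roughly constant vector to every cepstral frame, subtracting the per-utterance mean cancels it, which is the motivation for using CMVN against channel effects.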


2014 ◽  
Vol 23 (4) ◽  
pp. 359-378
Author(s):  
M. S. Rudramurthy ◽  
V. Kamakshi Prasad ◽  
R. Kumaraswamy

Abstract: The performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.
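Simulating a degraded condition at a chosen signal-to-noise ratio amounts to scaling the noise before adding it to the clean speech. The sketch below shows one common way to do this; it is not taken from the paper, and the synthetic sine "speech" and pseudo-random "noise" are placeholders for real recordings such as those from NOISEX-92.

```python
import math
import random

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so that speech power / noise power matches the target SNR."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def measured_snr_db(speech, noisy):
    """Recover the SNR from the clean and noisy signals."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))

# Placeholder signals (assumptions for illustration only).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 5 * t / 400) for t in range(400)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]

for target in (0, 5, 10, 15, 20):  # the 0 to 20 dB range mentioned above
    noisy = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, noisy), 3))
```

The gain follows from requiring 10 log10(P_speech / (gain² P_noise)) to equal the target value in decibels.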


2016 ◽  
Vol 25 (4) ◽  
pp. 529-538
Author(s):  
H.S. Jayanna ◽  
B.G. Nagaraja

Abstract: Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario. Therefore, predominantly English-speaking countries can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries are accustomed to speaking multiple languages. An existing speaker identification system may perform poorly if a speaker’s training and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of modeling techniques for multilingual speaker identification is presented, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose combining speaker-specific information from different languages for crosslingual speaker identification, and we observe that the combined feature gives better performance in all the crosslingual speaker identification experiments.


Cryptography ◽  
2020 ◽  
pp. 277-294
Author(s):  
S. Selva Nidhyananthan ◽  
M. Prasad ◽  
R. Shantha Selva Kumari

Speech, being a unique characteristic of an individual, is widely used in speaker verification and speaker identification tasks in applications such as authentication and surveillance, respectively. This paper presents a framework for a secure speaker recognition system using the BGN cryptosystem, in which the system performs the necessary operations without being able to observe the speech input provided by the user during the speaker recognition process. Secure speaker recognition makes use of Secure Multiparty Computation (SMC) based on the homomorphic properties of the cryptosystem. Among cryptosystems with homomorphic properties, BGN is preferable because it is partially doubly homomorphic: it can perform an arbitrary number of additions and only one multiplication. The main disadvantage of the BGN cryptosystem is its execution time. In the proposed system, the execution time is reduced by a factor of 12 by replacing the conventional composite-order group with a prime-order group. This leads to efficient secure speaker recognition.


Author(s):  
Halim Sayoud ◽  
Siham Ouamour

Most existing speaker recognition systems use “state of the art” acoustic features. However, a speaker can often be recognized only by his or her prosodic features, especially the accent. For this reason, the authors investigate some pertinent prosodic features that can be combined with other classic acoustic features in order to improve recognition accuracy. The authors have developed a new prosodic model using a modified LVQ (Learning Vector Quantization) algorithm, called MLVQ (Modified LVQ). This model is composed of three reduced prosodic features: the mean of the pitch, original duration, and low-frequency energy. Since these features are heterogeneous, a new optimized metric has been proposed, called Optimized Distance for Heterogeneous Features (ODHEF). Speaker identification tests are done on an Arabic corpus because the NIST evaluations showed that speaker verification scores depend on the spoken language and that some of the worst scores were obtained for the Arabic language. Experimental results show good performance for the new prosodic approach.


Author(s):  
AMITA PAL ◽  
SMARAJIT BOSE ◽  
GOPAL K. BASAK ◽  
AMITAVA MUKHOPADHYAY

For solving speaker identification problems, the approach proposed by Reynolds [IEEE Signal Process. Lett. 2 (1995) 46–48], using Gaussian Mixture Models (GMMs) based on Mel Frequency Cepstral Coefficients (MFCCs) as features, is one of the most effective available in the literature. The use of GMMs for modeling speaker identity is motivated by the interpretation that the Gaussian components represent some general speaker-dependent spectral shapes, and also by the capability of Gaussian mixtures to model arbitrary densities. In this work, we first illustrate, with the help of a new bilingual speech corpus, how the well-known principal component transformation, in conjunction with the principle of classifier combination, can be used to enhance the performance of MFCC-GMM speaker recognition systems significantly. Subsequently, we rigorously establish the same result using the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.


2017 ◽  
Vol 2017 ◽  
pp. 1-6 ◽  
Author(s):  
Mohammed Algabri ◽  
Hassan Mathkour ◽  
Mohamed A. Bencherif ◽  
Mansour Alsulaiman ◽  
Mohamed A. Mekhtiche

Presently, lawyers, law enforcement agencies, and judges in courts use speech and other biometric features to recognize suspects. In general, speaker recognition is used for discriminating between people based on their voices. The process of determining whether a suspected speaker is the source of a trace is called forensic speaker recognition. In such applications, the voice samples are most probably noisy, the recording sessions might not match each other, the sessions might not contain sufficient recording for recognition purposes, and the suspects’ voices are recorded through a mobile channel. Identifying a person through his or her voice in a forensic-quality context is therefore challenging. In this paper, we propose a method for forensic speaker recognition for the Arabic language; the King Saud University Arabic Speech Database is used for obtaining experimental results. The advantage of this database is that each speaker’s voice is recorded in both clean and noisy environments, through a microphone and a mobile channel. This diversity facilitates its use in forensic experiments. Mel-Frequency Cepstral Coefficients are used for feature extraction, and the Gaussian mixture model-universal background model is used for speaker modeling. Our approach has shown low equal error rates (EER) in noisy environments and with very short test samples.
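The equal error rate reported above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following sketch estimates it by sweeping a decision threshold over observed scores; the score lists are invented for illustration and are not results from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds and return (eer, threshold) at the point
    where false acceptance and false rejection rates are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Invented verification scores (higher means "more likely the same speaker").
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)
```

With finite score sets the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; interpolation along the error curves is another common choice.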


2021 ◽  
Vol 10 (4) ◽  
pp. 2310-2319
Author(s):  
Duraid Y. Mohammed ◽  
Khamis Al-Karawi ◽  
Ahmed Aljuboori

Automatic speaker recognition may achieve remarkable performance in matched training and test conditions. Conversely, results drop significantly in mismatched noisy conditions. Furthermore, feature extraction significantly affects performance. Mel-frequency cepstral coefficients (MFCCs) are the most commonly used features in this field of study. The literature has reported that the conditions for training and testing are highly correlated. Taken together, these facts support strong recommendations for using MFCC features in similar environmental conditions (train/test) for speaker recognition. However, when noise and reverberation are present, MFCC performance is not reliable. To address this, we propose a new feature, 'entrocy', for accurate and robust speaker recognition, which we mainly employ to support MFCC coefficients in noisy environments. Entrocy is the Fourier transform of the entropy, a measure of the fluctuation of the information in sound segments over time. Entrocy features are combined with MFCCs to generate a composite feature set, which is tested using the Gaussian mixture model (GMM) speaker recognition method. The proposed method shows improved recognition accuracy over a range of signal-to-noise ratios.
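The abstract defines entrocy only briefly, so the following is a loose sketch of the general idea rather than the authors' feature. It assumes, purely for illustration, that entropy is the Shannon entropy of an amplitude histogram per non-overlapping frame, and that the transform is a plain discrete Fourier transform of that entropy sequence. The bin count, framing, and test signal are all hypothetical.

```python
import cmath
import math

def frame_entropy(frame, bins=8):
    """Shannon entropy (bits) of the amplitude histogram of one frame."""
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return 0.0  # a constant frame carries no amplitude variation
    counts = [0] * bins
    for x in frame:
        idx = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(frame)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def dft_magnitudes(seq):
    """Magnitude of the discrete Fourier transform of a short sequence."""
    n = len(seq)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(seq)))
        for k in range(n)
    ]

def entropy_spectrum(signal, frame_len):
    """Per-frame entropies and the DFT magnitudes of that entropy track."""
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    entropies = [frame_entropy(f) for f in frames]
    return entropies, dft_magnitudes(entropies)

# Hypothetical signal: flat frames alternating with ramp frames.
flat = [0.0] * 8
ramp = [float(i) for i in range(8)]
entropies, spectrum = entropy_spectrum(flat + ramp + flat + ramp, frame_len=8)
print(entropies)
```

In this toy signal the entropy alternates between frames, so the energy of its spectrum concentrates at the alternation rate, which is the kind of temporal fluctuation the description above refers to.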

