scholarly journals Combating Phone Harassment through VoiceAnalysis Filtration of Anonymous Reports

2020 ◽  
Author(s):  
Obonee Kushum ◽  
Julkar Nayeen Mahi ◽  
Milon Biswas

Abstract Given the increasing popularity of smartphones as all-in-one computing devices for corporate work and everyday personal use, it is no wonder that mobile devices have become the most appealing attack surface for today's cyber criminals. In that case obscene or harassing phone calls can be one of the most stressful and frightening invasions of privacy a person experiences. Thus Mobile security has become increasingly important in mobile computing. There exist various applications that block spam calls through the SIM card numbers by establishing a spam database which identities the source of income calls. But unfortunately, their effciency of work is not up to the mark, since its usually pointless to track and block the SIM card number, as the number of spam callers is constantly changed. Considering this point, we are presenting a new concept in which frauds will be recognized through their vocals, even in a noisy environment, with a few seconds of speech, as one can change his number several times but can't change his voice. Here we have used several algorithms and techniques, such as speaker verification, speaker identification, forensic speaker recognition (FSR), spectrogram masking, voice ltering, Mel-Frequency Cepstral Coeffcient (MFCC) and a combination of Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM). Moreover, this system doesn't require any kind of personal information of the users. In this consequence, safety issues also remain in force. Findings of this study will be useful for lawyers, law enforcement agencies, and judges in the courts to recognize their suspects.

Author(s):  
Minho Jin ◽  
Chang D. Yoo

A speaker recognition system verifies or identifies a speaker’s identity based on his/her voice. It is considered as one of the most convenient biometric characteristic for human machine communication. This chapter introduces several speaker recognition systems and examines their performances under various conditions. Speaker recognition can be classified into either speaker verification or speaker identification. Speaker verification aims to verify whether an input speech corresponds to a claimed identity, and speaker identification aims to identify an input speech by selecting one model from a set of enrolled speaker models. Both the speaker verification and identification system consist of three essential elements: feature extraction, speaker modeling, and matching. The feature extraction pertains to extracting essential features from an input speech for speaker recognition. The speaker modeling pertains to probabilistically modeling the feature of the enrolled speakers. The matching pertains to matching the input feature to various speaker models. Speaker modeling techniques including Gaussian mixture model (GMM), hidden Markov model (HMM), and phone n-grams are presented, and in this chapter, their performances are compared under various tasks. Several verification and identification experimental results presented in this chapter indicate that speaker recognition performances are highly dependent on the acoustical environment. A comparative study between human listeners and an automatic speaker verification system is presented, and it indicates that an automatic speaker verification system can outperform human listeners. The applications of speaker recognition are summarized, and finally various obstacles that must be overcome are discussed.


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

<p><span lang="EN-GB">In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. With a view to give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The current paper investigates Text-independent speaker identification system by using 16 coefficients from both the MFCCs and PNCCs features. Eight different speakers are selected from the GRID-Audiovisual database with two females and six males. The speakers are modeled using the coupling between the Universal Background Model and Gaussian Mixture Models (GMM-UBM) in order to get a fast scoring technique and better performance. The system shows 100% in terms of speaker identification accuracy. The results illustrated that PNCCs features have better performance compared to the MFCCs features to identify females compared to male speakers. Furthermore, feature wrapping reported better performance compared to the CMVN method. </span></p>


2014 ◽  
Vol 23 (4) ◽  
pp. 359-378
Author(s):  
M. S. Rudramurthy ◽  
V. Kamakshi Prasad ◽  
R. Kumaraswamy

AbstractThe performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.


2016 ◽  
Vol 25 (4) ◽  
pp. 529-538
Author(s):  
H.S. Jayanna ◽  
B.G. Nagaraja

AbstractMost of the state-of-the-art speaker identification systems work on a monolingual (preferably English) scenario. Therefore, English-language autocratic countries can use the system efficiently for speaker recognition. However, there are many countries, including India, that are multilingual in nature. People in such countries have habituated to speak multiple languages. The existing speaker identification system may yield poor performance if a speaker’s train and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of the modeling techniques, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers for multilingual speaker identification, is presented. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers of our own database. It is observed from the experimental results that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and it is observed that the combination feature gives better performance in all the crosslingual speaker identification experiments.


Cryptography ◽  
2020 ◽  
pp. 277-294
Author(s):  
S. Selva Nidhyananthan ◽  
M. Prasad ◽  
R. Shantha Selva Kumari

Speech being a unique characteristic of an individual is widely used in speaker verification and speaker identification tasks in applications such as authentication and surveillance respectively. In this paper, framework for secure speaker recognition system using BGN Cryptosystem, where the system is able to perform the necessary operations without being able to observe the speech input provided by the user during speaker recognition process. Secure speaker recognition makes use of Secure Multiparty Computation (SMC) based on the homomorphic properties of cryptosystem. Among the cryptosytem with homomorphic properties BGN is preferable, because it is partially doubly homomorphic, which can perform arbitrary number of addition and only one multiplication. But the main disadvantage of using BGN cryptosystem is its execution time. In proposed system, the execution time is reduced by a factor of 12 by replacing conventional composite order group by prime order group. This leads to an efficient secure speaker recognition.


Author(s):  
Halim Sayoud ◽  
Siham Ouamour

Most existing systems of speaker recognition use “state of the art” acoustic features. However, many times one can only recognize a speaker by his or her prosodic features, especially by the accent. For this reason, the authors investigate some pertinent prosodic features that can be associated with other classic acoustic features, in order to improve the recognition accuracy. The authors have developed a new prosodic model using a modified LVQ (Learning Vector Quantization) algorithm, which is called MLVQ (Modified LVQ). This model is composed of three reduced prosodic features: the mean of the pitch, original duration, and low-frequency energy. Since these features are heterogeneous, a new optimized metric has been proposed that is called Optimized Distance for Heterogeneous Features (ODHEF). Tests of speaker identification are done on Arabic corpus because the NIST evaluations showed that speaker verification scores depend on the spoken language and that some of the worst scores were got for the Arabic language. Experimental results show good performances of the new prosodic approach.


Author(s):  
AMITA PAL ◽  
SMARAJIT BOSE ◽  
GOPAL K. BASAK ◽  
AMITAVA MUKHOPADHYAY

For solving speaker identification problems, the approach proposed by Reynolds [IEEE Signal Process. Lett.2 (1995) 46–48], using Gaussian Mixture Models (GMMs) based on Mel Frequency Cepstral Coefficients (MFCCs) as features, is one of the most effective available in the literature. The use of GMMs for modeling speaker identity is motivated by the interpretation that the Gaussian components represent some general speaker-dependent spectral shapes, and also by the capability of Gaussian mixtures to model arbitrary densities. In this work, we have initially illustrated, with the help of a new bilingual speech corpus, how the well-known principal component transformation, in conjunction with the principle of classifier combination can be used to enhance the performance of the MFCC-GMM speaker recognition systems significantly. Subsequently, we have emphatically and rigorously established the same using the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.


2021 ◽  
Vol 10 (4) ◽  
pp. 2310-2319
Author(s):  
Duraid Y. Mohammed ◽  
Khamis Al-Karawi ◽  
Ahmed Aljuboori

Automatic speaker recognition may achieve remarkable performance in matched training and test conditions. Conversely, results drop significantly in incompatible noisy conditions. Furthermore, feature extraction significantly affects performance. Mel-frequency cepstral coefficients MFCCs are most commonly used in this field of study. The literature has reported that the conditions for training and testing are highly correlated. Taken together, these facts support strong recommendations for using MFCC features in similar environmental conditions (train/test) for speaker recognition. However, with noise and reverberation present, MFCC performance is not reliable. To address this, we propose a new feature 'entrocy' for accurate and robust speaker recognition, which we mainly employ to support MFCC coefficients in noisy environments. Entrocy is the fourier transform of the entropy, a measure of the fluctuation of the information in sound segments over time. Entrocy features are combined with MFCCs to generate a composite feature set which is tested using the gaussian mixture model (GMM) speaker recognition method. The proposed method shows improved recognition accuracy over a range of signal-to-noise ratios.


2021 ◽  
pp. 3256-3281
Author(s):  
Thabit Sultan Mohammed ◽  
Karim M. Aljebory ◽  
Mohammed Aref Abdul Rasheed ◽  
Muzhir Shaban Al-Ani ◽  
Ali Makki Sagheer

The theories and applications of speaker identification, recognition, and verification are among the well-established fields. Many publications and advances in the relevant products are still emerging. In this paper, research-related publications of the past 25 years (from 1996 to 2020) were studied and analysed. Our main focus was on speaker identification, speaker recognition, and speaker verification. The study was carried out using the Science Direct databases. Several references, such as review articles, research articles, encyclopaedia, book chapters, conference abstracts, and others, were categorized and investigated. Summary of these kinds of literature is presented in this paper, together with statistical analyses to represent the publications and their categories over the mentioned period. Important information, including the dataset used, the size of the data adopted, the implemented methods, and the accuracy of the obtained results in the analysed research, are extracted from the explored publications and tabulated. The results show that the sum of published research articles is outnumbering other categories of publications. The number of researches in speech and speaker identification, recognition, and verification shows an increasing trend. Based on the normalized comparative factors of research publications, we found that many of them reached a high level of accuracy in their findings; hence the significantly superior techniques were derived and discussed for future researches. This survey paper would be beneficial for all those who wish to enhance their researches in the area of voice identification, recognition, and verification.


Author(s):  
KAULESHWAR PRASAD ◽  
PIYUSH LOTIA

The idea of the Speaker Identification is to implement a recognizer using Matlab which can identify a person by processing his/her voice. The basic goal of the paper is to classify and recognize the speeches of different persons. This classification is mainly based on extracting several key features like Mel Frequency Cepstral Coefficients (MFCC) from the speech signals of those persons by using the process of feature extraction using MATLAB. The above features may consists of pitch, amplitude, frequency etc. Using a statistical model like Gaussian mixture model (GMM) and features extracted from those speech signals we build a unique identity for each person who enrolled for speaker recognition. There is an elegant and powerful method for finding the maximum likelihood and that method is called Expectation and Maximization algorithm. The performance of the technique has been measured by three parameters: Number of Speakers in Database, Number of Persons Tested and the % Error.


Sign in / Sign up

Export Citation Format

Share Document