Performance Evaluation of Mel and Bark Scale based Features for Text-Independent Speaker Identification

This work evaluates the performance of the Mel and Bark scales for a text-independent speaker identification system. Both scales are designed according to the human auditory system: the Mel scale is defined by the ear's interpretation of pitch, whereas the Bark scale is based on critical-band selectivity, at which loudness becomes significantly different. In speech and speaker recognition systems, the filter bank structure is defined using these scales to extract speaker-specific speech features. Bark scale centre frequencies are found to be more effective than Mel scale centre frequencies for Indian dialect speaker databases. The recognition rate achieved using the Bark scale filter bank is 96% for the AISSMSIOIT database and 95% for the Marathi database.
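The abstract does not say which Mel or Bark formulas were used, so the following is only a minimal sketch under stated assumptions: the common 2595·log10(1 + f/700) Mel formula and Traunmüller's approximation of the Bark scale. The helper `centre_frequencies` is a hypothetical name; it places filter centres at equal steps on the warped scale, which is the usual way such filter banks are laid out.

```python
import math

def hz_to_mel(f_hz):
    # Widely used Mel formula (an assumption; the abstract gives none).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Algebraic inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hz_to_bark(f_hz):
    # Traunmueller's approximation of the Bark scale (an assumption).
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark.
    return 1960.0 * (z + 0.53) / (26.28 - z)

def centre_frequencies(to_scale, from_scale, f_low, f_high, n_filters):
    # Hypothetical helper: n_filters centres equally spaced on the warped
    # scale, strictly between f_low and f_high, returned in Hz.
    lo, hi = to_scale(f_low), to_scale(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [from_scale(lo + step * (i + 1)) for i in range(n_filters)]

mel_centres = centre_frequencies(hz_to_mel, mel_to_hz, 100.0, 4000.0, 20)
bark_centres = centre_frequencies(hz_to_bark, bark_to_hz, 100.0, 4000.0, 20)
```

Comparing `mel_centres` with `bark_centres` shows how the two scales distribute the same number of filters differently over the same band, which is the quantity the evaluation above contrasts.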


Data Augmentation for Speaker Identification under Stress Conditions to Combat Gender-Based Violence

Applied Sciences ◽

10.3390/app9112298 ◽

2019 ◽

Vol 9 (11) ◽

pp. 2298 ◽

Cited By ~ 4

Author(s):

Esther Rituerto-González ◽

Alba Mínguez-Sánchez ◽

Ascensión Gallardo-Antolín ◽

Carmen Peláez-Moreno

Keyword(s):

Speaker Recognition ◽

Data Augmentation ◽

Speaker Identification ◽

Stress Conditions ◽

Identification System ◽

Gender Based Violence ◽

Augmentation Techniques ◽

Recognition Systems ◽

Gender Based ◽

Using Data

This paper presents a speaker identification system for a personalized wearable device to combat gender-based violence. Speaker recognition systems exhibit a decrease in performance when the user is under emotional or stress conditions. The objective of this paper is therefore to measure the effects of stress in speech and ultimately to mitigate their consequences on a speaker identification task, using data augmentation techniques specifically tailored for this purpose given the lack of data resources for this condition. Extensive experimentation has been carried out to assess the effectiveness of the proposed techniques. First, we conclude that the best performance is always obtained when naturally stressed samples are included in the training set. Second, when these are not available, their substitution and augmentation with synthetically generated stress-like samples improves the performance of the system.


New Feature Vectors using GFCC for Speaker Identification

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i8.146 ◽

2018 ◽

Vol 6 (8) ◽

pp. 243

Author(s):

A. Nagesh

Keyword(s):

Speaker Recognition ◽

Speaker Identification ◽

Signal To Noise Ratio ◽

Main Idea ◽

Extraction Methods ◽

Identification System ◽

Identification Performance ◽

Feature Vectors ◽

Overall Performance ◽

New Feature

The feature vectors of a speaker identification (SID) system play a crucial role in its overall performance. Many new feature vector extraction methods are based on MFCC, but the ultimate goal is to maximize the performance of the SID system. The objective of this paper is to derive a new set of feature vectors based on Gammatone Frequency Cepstral Coefficients (GFCC), using a Gaussian Mixture Model (GMM), for speaker identification. MFCCs are the default feature vectors for speaker recognition, but they are not very robust in the presence of additive noise. In recent studies, GFCC features have shown very good robustness against noise and acoustic change. The main idea is to use GFCC features with GMM-based feature extraction to improve overall speaker identification performance in low signal-to-noise ratio (SNR) conditions.


Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i2.pp782-789 ◽

2020 ◽

Vol 18 (2) ◽

pp. 782

Author(s):

Musab T. S. Al-Kaltakchi ◽

Haithem Abd Al-Raheem Taha ◽

Mohanad Abd Shehab ◽

Mohamed A.M. Abdullah

Keyword(s):

Feature Extraction ◽

Speaker Recognition ◽

Speaker Identification ◽

Gaussian Mixture ◽

Identification Accuracy ◽

Identification System ◽

Good Representation ◽

Mel Frequency Cepstral Coefficients ◽

Normalization Methods ◽

Cepstral Coefficients

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight different speakers, two females and six males, are selected from the GRID-Audiovisual database. The speakers are modeled by coupling the Universal Background Model with Gaussian Mixture Models (GMM-UBM) in order to obtain a fast scoring technique and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features when identifying female speakers as compared to male speakers. Furthermore, feature warping reported better performance than the CMVN method.
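As an illustration of the CMVN step mentioned above, here is a minimal sketch, assuming per-utterance normalization over a list of frames (each frame being a list of cepstral coefficients). The function name `cmvn` and the `eps` guard are illustrative choices, not taken from the paper.

```python
from statistics import fmean, pstdev

def cmvn(frames, eps=1e-8):
    # frames: list of frames, each a list of cepstral coefficients.
    # Each coefficient track is shifted to zero mean and scaled to unit
    # variance over the utterance; eps guards against division by zero.
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(v - m) / (s + eps) for v, m, s in zip(frame, means, stds)]
        for frame in frames
    ]

# Toy utterance: three frames with two coefficients each.
normalized = cmvn([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

Subtracting the mean removes a constant offset in the cepstral domain, which is how a fixed linear channel appears there; the variance scaling additionally equalizes the dynamic range of each coefficient.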


Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure

Wireless Personal Communications ◽

10.1007/s11277-016-3530-3 ◽

2016 ◽

Vol 91 (3) ◽

pp. 1321-1333 ◽

Cited By ~ 5

Author(s):

S. Selva Nidhyananthan ◽

R. Shantha Selva Kumari ◽

T. Senthur Selvi

Keyword(s):

Filter Bank ◽

Speaker Identification ◽

Bank Structure ◽

Robust Speaker Identification ◽

Noise Robust


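SPEAKER IDENTIFICATION MENGGUNAKAN TRANSFORMASI WAVELET DISKRIT DAN JARINGAN SARAF TIRUAN BACK-PROPAGATION [Speaker Identification Using Discrete Wavelet Transform and Back-Propagation Artificial Neural Network]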

CommIT (Communication and Information Technology) Journal ◽

10.21512/commit.v2i1.482 ◽

2008 ◽

Vol 2 (1) ◽

pp. 1

Author(s):

Anny Tandyo ◽

Martono Martono ◽

Adi Widyatmoko

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Speaker Recognition ◽

Speaker Identification ◽

Back Propagation ◽

Identification Accuracy ◽

Identification System ◽

Accuracy Rate ◽

Discrete Transformation ◽

Artificial Neural

This article discusses a speaker identification system, which is a part of speaker recognition. The system identifies a subject by voice from a group of previously saved patterns. It uses the discrete wavelet transform as the feature extraction method and a back-propagation artificial neural network as the classification method. The voice input is processed by the discrete wavelet transform to obtain low-frequency signal coefficients as the decomposition result, which retain the voice characteristics of each person. The coefficients are then classified by the back-propagation artificial neural network. A system trial was conducted by collecting 225 voice samples directly by microphone in non-soundproof rooms: 15 subjects (persons), each providing 15 voice samples. Ten samples were used for training and the other five for testing. The identification accuracy rate reached 84 percent. Testing was also done on subjects who pronounced the same words. It can be concluded that the selection of similar words by different subjects has no influence on the accuracy rate produced by the system.

Keywords: speaker identification, wavelet discrete transformation, artificial neural network, back-propagation.
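The abstract does not name the wavelet family or the number of decomposition levels, so the following is only a sketch of the idea, assuming the Haar wavelet. The function names `haar_dwt` and `low_frequency_features` are hypothetical; the second keeps only the low-frequency (approximation) coefficients after repeated decomposition, in the spirit of the feature extraction described above.

```python
import math

def haar_dwt(signal):
    # One level of the Haar discrete wavelet transform (assumed wavelet).
    # Returns (approximation, detail) coefficient lists.
    samples = list(signal)
    if len(samples) % 2:
        samples.append(samples[-1])  # pad odd-length input by repeating the last sample
    scale = 1.0 / math.sqrt(2.0)
    approx = [(samples[i] + samples[i + 1]) * scale for i in range(0, len(samples), 2)]
    detail = [(samples[i] - samples[i + 1]) * scale for i in range(0, len(samples), 2)]
    return approx, detail

def low_frequency_features(signal, levels):
    # Repeatedly decompose the signal and keep only the approximation branch.
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_dwt(approx)
    return approx

features = low_frequency_features([4.0] * 8, levels=3)
```

Each level halves the number of coefficients, so the resulting vector is a compact low-frequency summary that could be fed to a classifier such as a back-propagation network.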


Acoustic and auxiliary speech features for speaker identification system

2015 57th International Symposium ELMAR (ELMAR) ◽

10.1109/elmar.2015.7334508 ◽

2015 ◽

Author(s):

Juraj Kacur ◽

Peter Truchly

Keyword(s):

Speaker Identification ◽

Identification System ◽

Speech Features


An Experimental Comparison of Modeling Techniques and Combination of Speaker – Specific Information from Different Languages for Multilingual Speaker Identification

Journal of Intelligent Systems ◽

10.1515/jisys-2014-0128 ◽

2016 ◽

Vol 25 (4) ◽

pp. 529-538

Author(s):

H.S. Jayanna ◽

B.G. Nagaraja

Keyword(s):

Speaker Recognition ◽

English Language ◽

Speaker Identification ◽

Poor Performance ◽

Gaussian Mixture ◽

Experimental Comparison ◽

Identification System ◽

Specific Information ◽

Self Organizing Map ◽

Modeling Techniques

Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario. Therefore, countries where English is the dominant language can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries are habituated to speaking multiple languages. An existing speaker identification system may yield poor performance if a speaker's training and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of modeling techniques for multilingual speaker identification is presented, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and we observe that the combined feature gives better performance in all the crosslingual speaker identification experiments.


A CNN based Speaker Recognition System using an Alternate Bone Microphone

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7647.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4224-4227

Keyword(s):

Speaker Recognition ◽

Input Data ◽

Speaker Identification ◽

Recognition System ◽

The Other ◽

Identification System ◽

Video Recorder ◽

State Of Art

State-of-the-art speaker recognition systems use acoustic microphone speech to identify or verify a speaker. A multimodal speaker recognition system includes modalities of input data recorded using sources such as an acoustic mic, array mic, throat mic, bone mic and video recorder. In this paper, we implement a multimodal speaker identification system with three modalities of speech as input, recorded from different microphones: air mic, throat mic and bone mic. We propose an alternate way of recording bone speech using a throat microphone, and present the results of a speaker recognition system implemented using a CNN and spectrograms. The results support our claim that the throat microphone is a suitable mic for recording bone-conducted speech. The accuracy of the speaker recognition system using speech recorded from the air microphone improves by about 10% after including the other modalities, throat and bone speech, along with the air-conducted speech.


A Novel Approach to Increase the Efficiency of a Multi-lingual Real-time Speaker Identification System

International Journal of Systems Applications, Engineering & Development ◽

10.46300/91015.2020.14.21 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Real Time ◽

Speech Enhancement ◽

Speaker Recognition ◽

Speaker Identification ◽

Cost Effective ◽

Recognition System ◽

Identification System ◽

Novel Approach ◽

Activity Method ◽

Voice Activity

Real-time speaker recognition systems are now very popular owing to their cost-effective nature. However, producing a more efficient speaker identification system is very challenging. In this work, we address a multi-lingual real-time speaker identification system and take a novel approach to enhancing its efficiency. We take real speech signals and apply different speech enhancement methods together with our proposed voice activity detection (VAD) method. By doing so, we increase the accuracy of the system by a relative 2% compared with existing methods.


A Self-Organizing Algorithm for Vector Quantizer Design Applied to Signal Processing

International Journal of Neural Systems ◽

10.1142/s0129065799000216 ◽

1999 ◽

Vol 09 (03) ◽

pp. 219-226 ◽

Cited By ~ 9

Author(s):

F. MADEIRO ◽

R. M. VILAR ◽

J. M. FECHINE ◽

B. G. AGUIAR NETO

Keyword(s):

Signal Processing ◽

Speaker Recognition ◽

Speaker Identification ◽

Rate Distortion ◽

Identification System ◽

Performance Bounds ◽

Codebook Design ◽

Vector Quantizer ◽

Definition Of ◽

Self Organizing

Vector quantization plays an important role in many signal processing problems, such as speech/speaker recognition and signal compression. This paper presents an unsupervised algorithm for vector quantizer design. Although the proposed method is inspired by Kohonen learning, it does not incorporate the classical definition of topological neighborhood as an array of nodes. Simulations are carried out to compare the performance of the proposed algorithm, named SOA (self-organizing algorithm), to that of the traditional LBG (Linde-Buzo-Gray) algorithm. The authors present an evaluation of the codebook design for Gauss-Markov and Gaussian sources, since the theoretical optimal performance bounds for these sources, as described by Shannon's Rate-Distortion Theory, are known. In speech and image compression, SOA codebooks lead to reconstructed (vector-quantized) signals of better quality than those obtained using LBG codebooks. Additionally, the influence of the initial codebook on the algorithm's performance is investigated, and the algorithm's ability to learn representative patterns is evaluated. In a speaker identification system, it is shown that the codebooks designed by SOA lead to higher identification rates than those designed by LBG.
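For orientation, the LBG baseline that the abstract compares against can be sketched as follows. This is a simplified illustration of the splitting-plus-Lloyd-iteration idea, not the paper's SOA method and not a tuned implementation; the function names, the additive split offset `eps`, the fixed iteration count and the restriction to power-of-two codebook sizes are all assumptions made for brevity.

```python
def nearest(codebook, vec):
    # Index of the codeword with the smallest squared Euclidean distance to vec.
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], vec)),
    )

def centroid(vectors):
    # Component-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(vectors, size, eps=0.01, iters=20):
    # Simplified LBG: start from the global centroid, then repeatedly
    # split every codeword in two and refine with Lloyd iterations.
    # `size` is assumed to be a power of two.
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(codebook, v)].append(v)
            # Keep the old codeword if its cell is empty.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook

training = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
codebook = sorted(lbg(training, size=2))
```

In a VQ-based speaker identification setup, one such codebook would typically be trained per speaker, and a test utterance assigned to the speaker whose codebook gives the lowest average quantization distortion.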
