New Feature Vectors using GFCC for Speaker Identification

Author(s):  
A. Nagesh

The feature vectors of a speaker identification (SID) system play a crucial role in its overall performance. Many feature extraction methods are based on MFCC, but ultimately the goal is to maximize SID performance. The objective of this paper is to derive a new set of feature vectors based on Gammatone Frequency Cepstral Coefficients (GFCC), modeled with a Gaussian Mixture Model (GMM), for speaker identification. MFCCs are the default feature vectors for speaker recognition, but they are not very robust in the presence of additive noise. In recent studies, GFCC features have shown very good robustness against noise and acoustic change. The main idea is that GMM-based modeling of GFCC features improves overall speaker identification performance in low signal-to-noise ratio (SNR) conditions.
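GFCC extraction replaces the mel filter bank with a gammatone filter bank whose centre frequencies are spaced on an ERB scale. The abstract does not state the exact configuration, so the sketch below assumes the common Glasberg-Moore ERB-rate formula and an illustrative filter count:

```python
import math

def hz_to_erb_rate(f_hz):
    # Glasberg & Moore ERB-rate scale: number of ERBs below f_hz
    return 21.4 * math.log10(4.37e-3 * f_hz + 1.0)

def erb_rate_to_hz(erb):
    # Inverse of the ERB-rate formula
    return (10 ** (erb / 21.4) - 1.0) / 4.37e-3

def gammatone_center_freqs(f_low, f_high, n_filters):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]

# 32 gammatone channels between 50 Hz and 8 kHz (illustrative values)
freqs = gammatone_center_freqs(50.0, 8000.0, 32)
```

Like the mel scale, the ERB spacing places filters densely at low frequencies, which is what gives gammatone-based features their auditory motivation.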

The performance of the Mel scale and the Bark scale is evaluated for a text-independent speaker identification system. Both scales are designed according to the human auditory system: the Mel scale follows the human ear's interpretation of pitch, while the Bark scale is based on the critical bands at which loudness selectivity becomes significantly different. Filter-bank structures defined on these scales are used in speech and speaker recognition systems to extract speaker-specific features. It is found that Bark-scale centre frequencies are more effective than Mel-scale centre frequencies for Indian-dialect speaker databases. The recognition rate achieved using the Bark-scale filter bank is 96% for the AISSMSIOIT database and 95% for the Marathi database.
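The two scales compared above have standard closed forms; a small sketch using the common 2595·log10 mel formula and Zwicker's Bark approximation (the paper's exact filter-bank parameters are not given here):

```python
import math

def hz_to_mel(f):
    # O'Shaughnessy mel formula
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    # Zwicker's critical-band (Bark) approximation
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

# The scales differ in how densely they place filter centres at low
# frequencies; compare where 1 kHz lands on each scale.
mel_1k = hz_to_mel(1000.0)    # ~1000 mel by construction
bark_1k = hz_to_bark(1000.0)  # ~8.5 Bark
```

Centre frequencies for a filter bank follow by spacing points uniformly on the chosen scale and inverting, exactly as with the mel filter bank in MFCC extraction.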


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

<p><span lang="EN-GB">In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The current paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers (two females and six males) are selected from the GRID audiovisual database. The speakers are modeled using the coupling between the Universal Background Model and Gaussian Mixture Models (GMM-UBM) to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results illustrate that PNCC features outperform MFCC features at identifying female speakers compared with male speakers. Furthermore, feature warping reported better performance than the CMVN method.</span></p>
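CMVN, one of the two normalizations compared above, standardizes each cepstral dimension over the utterance; a minimal sketch of the per-utterance variant (the sliding-window form works analogously):

```python
import numpy as np

def cmvn(features, eps=1e-10):
    """Cepstral Mean-Variance Normalization.

    features: (num_frames, num_coeffs) array of cepstral coefficients.
    Each coefficient track is shifted to zero mean and scaled to unit
    variance over the utterance, suppressing stationary channel effects.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

# Toy utterance: 200 frames of 16 coefficients (illustrative data)
rng = np.random.default_rng(0)
feats = rng.normal(loc=3.0, scale=2.0, size=(200, 16))
norm = cmvn(feats)
```

A stationary linear channel adds a constant offset to each cepstral coefficient, which the mean subtraction removes; feature warping instead maps each coefficient's short-term distribution onto a standard normal.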


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Md. Rabiul Islam ◽  
Md. Abdus Sobhan

The aim of this paper is to propose a feature-fusion-based Audio-Visual Speaker Identification (AVSI) system under varied illumination conditions. Among the different fusion strategies, feature-level fusion is used for the proposed AVSI system, with a Hidden Markov Model (HMM) used for learning and classification. Since the feature set contains richer information about the raw biometric data than any other level, integration at the feature level is expected to provide better authentication results. In this paper, Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to form the audio feature vectors, and Active Shape Model (ASM) based appearance and shape facial features are concatenated to form the visual feature vectors. These combined audio and visual features are used for the feature fusion. To reduce the dimension of the audio and visual feature vectors, Principal Component Analysis (PCA) is used. The VALID audio-visual database, with four different illumination levels, is used to measure the performance of the proposed system. Experimental results demonstrate the significance of the proposed audio-visual speaker identification system with various combinations of audio and visual features.


Author(s):  
Anny Tandyo ◽  
Martono Martono ◽  
Adi Widyatmoko

This article discusses a speaker identification system, which is a part of speaker recognition. The system identifies a subject based on the voice, against a group of previously stored patterns. The system uses the discrete wavelet transform as a feature extraction method and a back-propagation artificial neural network as a classification method. The voice input is processed by the discrete wavelet transform to obtain the low-frequency signal coefficients of the decomposition, which retain the voice characteristics of each person. The coefficients are then classified by the back-propagation artificial neural network. A system trial was conducted by collecting 225 voice samples directly, using microphones in non-soundproof rooms, from 15 subjects, each providing 15 voice samples; 10 samples per subject were used for training and the other 5 for testing. The identification accuracy rate reached 84 percent. Testing was also done on subjects who pronounced the same words. It can be concluded that the selection of similar words by different subjects has no influence on the accuracy rate produced by the system.
Keywords: speaker identification, discrete wavelet transform, artificial neural network, back-propagation.
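The decomposition step above keeps the low-frequency (approximation) coefficients as features. The article does not state which mother wavelet was used, so the one-level sketch below assumes the Haar wavelet:

```python
import math

def haar_dwt_level(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists; the approximation
    half carries the low-frequency content used as speaker features.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad to even length
    s = 1.0 / math.sqrt(2.0)
    approx = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

approx, detail = haar_dwt_level([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])
```

Repeating the transform on the approximation half yields deeper decomposition levels; the orthonormal scaling preserves signal energy across the two halves.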


2020 ◽  
Vol 10 (15) ◽  
pp. 5256
Author(s):  
Jian Xue ◽  
Lan Tang ◽  
Xinggan Zhang ◽  
Lin Jin

To deal with the degraded reliability of radar emitter identification (REID) based on the traditional five parameters in a complex electromagnetic environment, a new feature extraction method based on the autocorrelation function of coherent signals, which makes full use of the coherent characteristic of modern radar emitters, is proposed in this paper. The main idea is to use the instantaneous autocorrelation function to obtain the correlation results of coherent and noncoherent signals. To this end, a new feature parameter, the ratio of the secondary peak value to the main peak value (SMR), is defined to describe the difference in correlation results between coherent and noncoherent signals. Through simulation analysis, the feasibility of using the SMR as a coherent feature for REID is verified. To evaluate the effectiveness of the coherent feature, an analytic hierarchy process (AHP) is introduced to compare the comprehensive performance of the coherent feature and the existing parameters, and a convolutional neural network (CNN) and a support vector machine (SVM) are selected as classifiers to check the recognition capability of the proposed feature. Simulation results show that the proposed feature can not only serve as a new feature for REID but can also supplement the existing feature parameters to improve REID accuracy, as it is less sensitive to changes in signal-to-noise ratio (SNR) and signal modulation type.
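The SMR can be illustrated on simulated data: a coherent pulse train (pulses sharing one phase reference) keeps a strong secondary autocorrelation peak at the pulse-repetition lag, while a noncoherent one (random phase per pulse) does not. The pulse parameters below are illustrative, not from the paper:

```python
import numpy as np

def pulse_train(n_pulses, pulse_len, pri, freq, coherent, rng):
    """Pulsed sinusoid; coherent pulses share one phase reference."""
    x = np.zeros(n_pulses * pri)
    for k in range(n_pulses):
        t = np.arange(pulse_len)
        phase = (2 * np.pi * freq * (k * pri + t) if coherent
                 else 2 * np.pi * freq * t + rng.uniform(0, 2 * np.pi))
        x[k * pri : k * pri + pulse_len] = np.cos(phase)
    return x

def smr(x, pri):
    """Ratio of the secondary autocorrelation peak (near lag = PRI)
    to the main (zero-lag) peak."""
    r = np.abs(np.correlate(x, x, mode="full"))
    main = r[len(x) - 1]                  # zero-lag peak
    lo = len(x) - 1 + pri - 5             # search window around the PRI lag
    return r[lo : lo + 11].max() / main

rng = np.random.default_rng(2)
pri, freq = 100, 0.1  # PRI in samples; normalized frequency 0.1 cycles/sample
coh = smr(pulse_train(8, 20, pri, freq, True, rng), pri)
noncoh_vals = [smr(pulse_train(8, 20, pri, freq, False, rng), pri)
               for _ in range(10)]
```

The coherent SMR sits near 1 while noncoherent trials scatter well below it, which is the separation the proposed feature exploits.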


2016 ◽  
Vol 25 (4) ◽  
pp. 529-538
Author(s):  
H.S. Jayanna ◽  
B.G. Nagaraja

Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario, so countries where English predominates can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries are habituated to speaking multiple languages. An existing speaker identification system may yield poor performance if a speaker's train and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of modeling techniques, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers, for multilingual speaker identification is presented. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose combining speaker-specific information from different languages for crosslingual speaker identification, and the combined feature gives better performance in all the crosslingual speaker identification experiments.
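GMM-UBM scoring, which the study found best, reduces to a log-likelihood ratio between a speaker's adapted model and the universal background model. A minimal diagonal-covariance sketch (component counts, the "adapted" model, and the toy data are all illustrative):

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]               # (T, M, D)
    log_norm = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                       + (diff ** 2 / variances[None]).sum(axis=2))
    weighted = log_norm + np.log(weights)[None, :]
    # logsumexp over components, then average over frames
    mx = weighted.max(axis=1, keepdims=True)
    return float(np.mean(mx[:, 0] + np.log(np.exp(weighted - mx).sum(axis=1))))

def llr_score(frames, speaker_gmm, ubm):
    """Log-likelihood ratio: positive favours the claimed speaker."""
    return gmm_loglik(frames, *speaker_gmm) - gmm_loglik(frames, *ubm)

rng = np.random.default_rng(3)
ubm = (np.array([0.5, 0.5]),
       np.array([[0.0, 0.0], [4.0, 4.0]]),
       np.ones((2, 2)))
# Stand-in for a MAP-adapted speaker model: UBM means nudged toward the speaker
spk = (ubm[0], ubm[1] + 1.0, ubm[2])
frames = rng.normal(loc=1.0, scale=1.0, size=(300, 2))  # near speaker data
score = llr_score(frames, spk, ubm)
```

Because only the adapted means differ from the UBM, scoring is fast: the same component posteriors can be shared between the two likelihood evaluations.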


The security of systems is a vital issue for any society; hence, authentication mechanisms that protect the confidentiality of users are important. This paper proposes a speech-based security system that is able to identify Arabic speakers using the Arabic word (شكرا), which means "thank you". Pre-processing steps are performed on the speech signals to enhance the signal-to-noise ratio. Speaker features are obtained as Mel-Frequency Cepstral Coefficients (MFCC). Moreover, feature selection (FS) and a radial basis function neural network (RBFNN) are implemented to classify and identify speakers. The proposed security system achieves a 97.5% accuracy rate in user identification.
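The RBFNN classifier mentioned above maps MFCC vectors through Gaussian radial basis units to class scores; a minimal forward-pass sketch (the centres, width, and weights here are illustrative, not trained on the paper's data):

```python
import numpy as np

def rbf_forward(x, centers, sigma, out_weights):
    """Forward pass of a radial basis function network.

    Hidden unit j fires exp(-||x - c_j||^2 / (2 sigma^2)); the output
    layer is a linear combination of hidden activations per class.
    """
    d2 = ((x[None, :] - centers) ** 2).sum(axis=1)   # squared distances
    hidden = np.exp(-d2 / (2.0 * sigma ** 2))        # (n_centers,)
    return hidden @ out_weights                      # (n_classes,)

# Toy setup: two speaker classes, one RBF centre per class (assumed values)
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
out_weights = np.eye(2)          # centre j votes for class j
scores = rbf_forward(np.array([0.2, -0.1]), centers,
                     sigma=1.0, out_weights=out_weights)
pred = int(np.argmax(scores))
```

In practice the centres come from clustering the training MFCCs and the output weights from least squares, with feature selection pruning the input dimensions beforehand.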


State-of-the-art speaker recognition systems use acoustic microphone speech to identify or verify a speaker. A multimodal speaker recognition system includes input modalities recorded from sources such as an acoustic microphone, a microphone array, a throat microphone, a bone-conduction microphone, and a video recorder. In this paper, we implement a multimodal speaker identification system with three speech modalities as input, recorded from different microphones: an air microphone, a throat microphone, and a bone microphone. We propose an alternative way of recording bone-conducted speech using a throat microphone, and present the results of a speaker recognition system implemented using a CNN on spectrograms. The obtained results support our claim that the throat microphone is suitable for recording bone-conducted speech, and the accuracy of the speaker recognition system using only air-microphone speech improves by about 10% after including the throat and bone speech modalities along with the air-conducted speech.
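A log-magnitude spectrogram of the kind fed to the CNN above can be computed with a short-time Fourier transform; a minimal numpy sketch (the frame and hop sizes are illustrative, not the paper's):

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT spectrogram: (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-10)

fs = 8000
t = np.arange(fs) / fs                   # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)      # 1 kHz test tone
spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())   # bin spacing is fs / frame_len
```

One such time-frequency image per modality (air, throat, bone) gives the CNN parallel inputs whose channel-specific spectral signatures it can fuse.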


Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1144
Author(s):  
Jian Xue ◽  
Lan Tang ◽  
Xinggan Zhang ◽  
Lin Jin

Aiming at the reduced reliability of signal sorting based on the traditional five parameters and intrapulse features in a complex electromagnetic environment, a new signal sorting method based on radar coherent characteristics is proposed. The main idea of this method is to use spectrum analysis to obtain the spectrum images of coherent and noncoherent signals. Image-processing technology is used to extract the feature difference between the two spectrum images, and the central-moment feature is introduced to describe this difference. Through simulation analysis, the feasibility of using the central-moment feature as the coherent feature for signal sorting was proved. To check the effectiveness of the proposed feature, a number of simulations were conducted to demonstrate its sorting capability. The simulations show that the proposed feature can not only be used as a new feature for signal sorting but can also supplement the five typical parameters and the intrapulse feature to improve the sorting accuracy rate. They also show that the proposed method achieves satisfactory sorting results at low signal-to-noise ratio (SNR); when the SNR was 5 dB, the sorting accuracy rate reached 98%.
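The central-moment feature above treats the spectrum as a distribution over frequency. The paper works on two-dimensional spectrum images, so the one-dimensional sketch below is a simplification of the same idea:

```python
import numpy as np

def spectral_central_moments(spectrum, orders=(2, 3, 4)):
    """Central moments of a magnitude spectrum treated as a distribution.

    The spectrum is normalized to sum to 1 over frequency bins; moments
    are taken about the spectral centroid (the distribution's mean bin).
    """
    bins = np.arange(len(spectrum), dtype=float)
    p = spectrum / spectrum.sum()
    centroid = (bins * p).sum()
    return centroid, [(((bins - centroid) ** k) * p).sum() for k in orders]

# Symmetric toy spectrum centred on bin 2: odd central moments vanish
centroid, moments = spectral_central_moments(np.array([1.0, 2.0, 4.0, 2.0, 1.0]))
```

A coherent signal's spectrum concentrates energy in sharp lines while a noncoherent one smears it, so these moments separate the two spectrum shapes.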

