Speaker Verification Employing Combinations of Self-Attention Mechanisms

One of the most recent speaker recognition methods that demonstrates outstanding performance in noisy environments extracts the speaker embedding using an attention mechanism instead of average or statistics pooling. In the attention method, speaker recognition performance is improved by employing multiple heads rather than a single head. In this paper, we propose advanced methods to extract a new embedding by compensating for the disadvantages of the single-head and multi-head attention methods. The combination method comprising single-head and split-based multi-head attentions shows a 5.39% Equal Error Rate (EER). When the single-head and projection-based multi-head attention methods are combined, the speaker recognition performance improves by 4.45%, which is the best performance in this work. Our experimental results demonstrate that the attention mechanism reflects the speaker’s properties more effectively than average or statistics pooling, and that the speaker verification system could be further improved by employing combinations of different attention techniques.
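
The contrast between average pooling and attention-based pooling described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the dot-product scoring against a `query` vector, the function names, and the reading of "split-based" as cutting each frame into equal chunks with one head per chunk are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def average_pooling(frames):
    """Every frame contributes equally to the utterance-level vector."""
    return [sum(col) / len(frames) for col in zip(*frames)]

def attention_pooling(frames, query):
    """Single head: frames are weighted by a softmax over frame scores.

    Assumption: the score is a dot product with a hypothetical query vector.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]

def split_multi_head_pooling(frames, queries):
    """Assumed split-based variant: one head per equal-sized feature chunk.

    The frame dimension must be divisible by the number of heads.
    """
    size = len(frames[0]) // len(queries)
    pooled = []
    for h, query in enumerate(queries):
        chunk = [frame[h * size:(h + 1) * size] for frame in frames]
        pooled.extend(attention_pooling(chunk, query))
    return pooled
```

With an all-zero query every frame receives the same weight, so attention pooling reduces to average pooling; a non-zero query shifts the embedding toward the frames it scores highly.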


Robustness Speaker Recognition Based on Feature Space in Clean and Noisy Condition

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327909666181219143918 ◽

2019 ◽

Vol 9 (4) ◽

pp. 497-506 ◽

Cited By ~ 1

Author(s):

Khamis A. Al-Karawi

Keyword(s):

Speech Processing ◽

Speaker Recognition ◽

System Performance ◽

Speaker Verification ◽

Signal To Noise Ratio ◽

Recognition Performance ◽

Feature Space ◽

Signal To Noise ◽

Verification Systems ◽

Noisy Condition

Background & Objective: Speaker Recognition (SR) techniques have reached a relatively mature state over the past few decades. Existing methods typically use robust features extracted from clean speech signals, and can therefore achieve very high recognition accuracy in idealized conditions. For critical applications, such as security and forensics, the robustness and reliability of the system are crucial. Methods: Background noise and reverberation, which occur in many real-world applications, are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust technique is proposed to extract features for speech processing, capable of operating in both clean and noisy conditions. Mel Frequency Cepstrum Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCCs) are mature techniques and the most common features used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale, whereas GFCCs are derived from a bank of Gammatone filters, which was originally proposed to model human cochlear filtering. This paper investigates the performance of GFCC and the conventional MFCC feature in clean and noisy conditions. The effects of the Signal-to-Noise Ratio (SNR) and language mismatch on the system performance have been taken into account in this work. Conclusion: Experimental results have shown significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Performance in terms of recognition rates under various types of noise and various SNRs was quantified via simulation. Results of the study are also presented and discussed.
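
To make "frequency bands distributed over a mel scale" concrete, the sketch below places band centres at equal spacing on the mel axis and converts them back to Hz. The 2595/700 mapping is one widely used convention and is assumed here; the abstract does not say which variant the author used, and the function names are hypothetical.

```python
import math

def hz_to_mel(freq_hz):
    """Common Hz-to-mel mapping (assumed variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_bands, f_min, f_max):
    """Centre frequencies (Hz) of n_bands bands equally spaced in mel."""
    low, high = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (high - low) / (n_bands + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_bands)]
```

Because the spacing is uniform in mel rather than in Hz, the bands are narrow at low frequencies and progressively wider at high frequencies.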


Misuse Detection for Mobile Devices Using Behaviour Profiling

International Journal of Cyber Warfare and Terrorism ◽

10.4018/ijcwt.2011010105 ◽

2011 ◽

Vol 1 (1) ◽

pp. 41-53 ◽

Cited By ~ 6

Author(s):

Fudong Li ◽

Nathan Clarke ◽

Maria Papadaki ◽

Paul Dowland

Keyword(s):

Mobile Devices ◽

Error Rate ◽

Text Message ◽

Modern Society ◽

Experimental Results ◽

Equal Error Rate ◽

Misuse Detection ◽

General Application ◽

Point Of Entry ◽

Application Specific

Mobile devices have become essential to modern society; however, as their popularity has grown, so has the requirement to ensure devices remain secure. This paper proposes a behaviour-based profiling technique using a mobile user’s application usage to detect abnormal activities. By operating transparently to the user, the approach offers significant advantages over traditional point-of-entry authentication and can provide continuous protection. The experiment employed the MIT Reality dataset and a total of 45,529 log entries. Four experiments were devised based on an application-level dataset containing the general application; two application-specific datasets combined with telephony and text message data; and a combined dataset that included both application-level and application-specific data. Based on the experiments, a user’s profile was built using either static or dynamic profiles, and the best experimental results for the application-level applications, telephone, text message, and multi-instance applications were an EER (Equal Error Rate) of 13.5%, 5.4%, 2.2%, and 10%, respectively.
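
The Equal Error Rate reported here and in several other abstracts on this page is the operating point at which the false acceptance rate equals the false rejection rate. The following is a rough sketch of how it can be estimated from two lists of scores; it assumes that higher scores mean "more likely genuine", and the function name and threshold sweep are illustrative rather than taken from any of the listed papers.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping the threshold over all observed scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance: impostor trials that pass the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection: genuine trials that fail the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With a finite set of trials the two error curves rarely cross exactly, so this sketch reports the mean of the two rates at the threshold where they are closest.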


Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Deep Length Normalization for Text-Independent Speaker Verification System

Electronics ◽

10.3390/electronics9101706 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1706

Author(s):

Soonshin Seo ◽

Ji-Hwan Kim

Keyword(s):

State Of The Art ◽

Speaker Verification ◽

Model Parameters ◽

Equal Error Rate ◽

Layer Depth ◽

Verification System ◽

Evaluation Dataset ◽

Representational Power ◽

Fully Connected ◽

Text Independent Speaker Verification

One of the most important parts of a text-independent speaker verification system is speaker embedding generation. Previous studies demonstrated that shortcut connections-based multi-layer aggregation improves the representational power of a speaker embedding system. However, model parameters are relatively large in number, and unspecified variations increase in the multi-layer aggregation. Therefore, in this study, we propose a self-attentive multi-layer aggregation with feature recalibration and deep length normalization for a text-independent speaker verification system. To reduce the number of model parameters, we set the ResNet with the scaled channel width and layer depth as a baseline. To control the variability in the training, we apply a self-attention mechanism to perform multi-layer aggregation with dropout regularizations and batch normalizations. Subsequently, we apply a feature recalibration layer to the aggregated feature using fully-connected layers and nonlinear activation functions. Further, deep length normalization is used on a recalibrated feature in the training process. Experimental results using the VoxCeleb1 evaluation dataset showed that the performance of the proposed methods was comparable to that of state-of-the-art models (equal error rate of 4.95% and 2.86%, using the VoxCeleb1 and VoxCeleb2 training datasets, respectively).


Analysis of Speaker Verification System Using Support Vector Machine

JOURNAL OF ADVANCES IN CHEMISTRY ◽

10.24297/jac.v13i10.5839 ◽

2017 ◽

Vol 13 (10) ◽

pp. 6531-6542

Author(s):

P Shanmugapriya ◽

Y. Venkataramani

Keyword(s):

Support Vector Machine ◽

Speaker Recognition ◽

Speaker Verification ◽

Fuzzy Theory ◽

Support Vector ◽

Fuzzy Support Vector Machine ◽

System A ◽

Verification System ◽

Svm Model

The integration of GMM supervectors and the Support Vector Machine (SVM) has become one of the most popular strategies in text-independent speaker verification systems. This paper describes the application of the Fuzzy Support Vector Machine (FSVM) to the classification of speakers using GMM supervectors. Supervectors are formed by stacking the mean vectors of GMMs adapted from the UBM using maximum a posteriori (MAP) adaptation. GMM supervectors characterize a speaker’s acoustic characteristics, which are used for developing a speaker-dependent fuzzy SVM model. Introducing fuzzy theory into the support vector machine yields better classification accuracy and requires fewer support vectors. Experiments were conducted on the 2001 NIST speaker recognition evaluation corpus. The performance of the GMM-FSVM based speaker verification system is compared with the conventional GMM-UBM and GMM-SVM based systems. Experimental results indicate that the fuzzy SVM based speaker verification system with GMM supervectors achieves better performance than the GMM-UBM system.
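
The supervector construction mentioned in this abstract (stacking MAP-adapted component means) can be sketched as follows. This uses the standard relevance-factor form of mean-only MAP adaptation; the function names, the default relevance value, and the choice to pass per-frame component responsibilities in as an argument (instead of computing them from a UBM) are assumptions made to keep the example short.

```python
def map_adapt_means(frames, responsibilities, ubm_means, relevance=16.0):
    """Mean-only MAP adaptation of UBM component means.

    responsibilities[t][k] is the posterior of component k for frame t
    (assumed to be precomputed from the UBM).
    """
    dim = len(ubm_means[0])
    adapted = []
    for k, prior_mean in enumerate(ubm_means):
        n_k = sum(resp[k] for resp in responsibilities)  # soft frame count
        if n_k > 0:
            data_mean = [
                sum(resp[k] * frame[d]
                    for resp, frame in zip(responsibilities, frames)) / n_k
                for d in range(dim)
            ]
        else:
            data_mean = list(prior_mean)  # no data: keep the UBM mean
        alpha = n_k / (n_k + relevance)  # more data means more trust in it
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, prior_mean)])
    return adapted

def supervector(means):
    """Stack the component means into one long vector."""
    return [value for mean in means for value in mean]
```

Components that see little or no speaker data stay close to the UBM means, so every utterance yields a supervector of the same length that an SVM can then consume.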


Speaker Verification and Identification

Behavioral Biometrics for Human Identification ◽

10.4018/978-1-60566-725-6.ch013 ◽

2010 ◽

pp. 264-289

Author(s):

Minho Jin ◽

Chang D. Yoo

Keyword(s):

Feature Extraction ◽

Speaker Recognition ◽

Speaker Identification ◽

Speaker Verification ◽

Gaussian Mixture ◽

Recognition System ◽

Essential Elements ◽

Verification System ◽

Biometric Characteristic ◽

Speaker Modeling

A speaker recognition system verifies or identifies a speaker’s identity based on his/her voice. Voice is considered one of the most convenient biometric characteristics for human-machine communication. This chapter introduces several speaker recognition systems and examines their performance under various conditions. Speaker recognition can be classified into either speaker verification or speaker identification. Speaker verification aims to verify whether an input speech corresponds to a claimed identity, and speaker identification aims to identify an input speech by selecting one model from a set of enrolled speaker models. Both speaker verification and identification systems consist of three essential elements: feature extraction, speaker modeling, and matching. Feature extraction pertains to extracting essential features from an input speech for speaker recognition. Speaker modeling pertains to probabilistically modeling the features of the enrolled speakers. Matching pertains to matching the input features to various speaker models. Speaker modeling techniques including the Gaussian mixture model (GMM), the hidden Markov model (HMM), and phone n-grams are presented, and their performance is compared under various tasks. Several verification and identification experimental results presented in this chapter indicate that speaker recognition performance is highly dependent on the acoustical environment. A comparative study between human listeners and an automatic speaker verification system is presented, and it indicates that an automatic speaker verification system can outperform human listeners. The applications of speaker recognition are summarized, and finally various obstacles that must be overcome are discussed.
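
The difference between the two tasks defined above comes down to the matching step, which a short sketch can show. The chapter discusses probabilistic models such as GMMs; here each speaker model is replaced by a single vector scored with cosine similarity, purely as a stand-in to keep the example small. All names and the threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_vec, claimed_model, threshold):
    """Verification: one comparison against the claimed identity (yes/no)."""
    return cosine(test_vec, claimed_model) >= threshold

def identify(test_vec, enrolled):
    """Identification: pick the best-matching model among all enrolled ones."""
    return max(enrolled, key=lambda name: cosine(test_vec, enrolled[name]))
```

Verification is a one-to-one decision whose errors depend on the threshold, while identification is a one-to-many selection whose difficulty grows with the number of enrolled speakers.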


Combining Cryptography with EEG Biometrics

Computational Intelligence and Neuroscience ◽

10.1155/2018/1867548 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11 ◽

Cited By ~ 15

Author(s):

Robertas Damaševičius ◽

Rytis Maskeliūnas ◽

Egidijus Kazanavičius ◽

Marcin Woźniak

Keyword(s):

Error Rate ◽

User Authentication ◽

Discrete Logarithm ◽

Security Analysis ◽

Experimental Results ◽

Biometric Authentication ◽

Equal Error Rate ◽

Eeg Data ◽

Authentication System ◽

Biometric Cryptosystem

Cryptographic frameworks depend on key sharing for ensuring security of data. While the keys in cryptographic frameworks must be correctly reproducible and not unequivocally connected to the identity of a user, in biometric frameworks this is different. Joining cryptography techniques with biometrics can solve these issues. We present a biometric authentication method based on the discrete logarithm problem and Bose-Chaudhuri-Hocquenghem (BCH) codes, perform its security analysis, and demonstrate its security characteristics. We evaluate a biometric cryptosystem using our own dataset of electroencephalography (EEG) data collected from 42 subjects. The experimental results show that the described biometric user authentication system is effective, achieving an Equal Error Rate (EER) of 0.024.


Palm Vein Recognition Algorithm Using Curvelet and Wavelet

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.710.655 ◽

2013 ◽

Vol 710 ◽

pp. 655-659

Author(s):

Zhi Xian Jiu ◽

Qiang Li

Keyword(s):

Wavelet Transform ◽

Error Rate ◽

Minimum Distance ◽

Image Database ◽

Recognition System ◽

Recognition Algorithm ◽

Experimental Results ◽

Equal Error Rate ◽

Vein Recognition ◽

Palm Vein

In this paper we report on a curvelet and wavelet based palm vein recognition algorithm. Using our palm vein image database, we employed a minimum distance classifier to test the performance of the system. Experimental results show that the algorithm based on the curvelet transform can reach an equal error rate of 1.7%, while the algorithm based on the wavelet transform can only reach an equal error rate of 2.3%, indicating that the curvelet based palm vein recognition system improves representation.


Bidirectional Attention for Text-Dependent Speaker Verification

Sensors ◽

10.3390/s20236784 ◽

2020 ◽

Vol 20 (23) ◽

pp. 6784

Author(s):

Xin Fang ◽

Tian Gao ◽

Liang Zou ◽

Zhenhua Ling

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Cost Function ◽

Error Rate ◽

Speaker Verification ◽

Feature Learning ◽

Biometric Authentication ◽

Equal Error Rate ◽

Text Dependent Speaker Verification ◽

Target Speaker

Automatic speaker verification provides a flexible and effective way for biometric authentication. Previous deep learning-based methods have demonstrated promising results, but a few problems still require better solutions. In prior works examining speaker discriminative neural networks, the speaker representation of the target speaker is regarded as a fixed one when comparing with utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose to combine CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce one individual cost function to identify the phonetic contents, which contributes to calculating the attention score more specifically. These interactive features are complementary to the constant ones, which are extracted from individual speakers separately and do not vary with the evaluation utterances. The proposed method achieved a competitive equal error rate of 6.26% on the internal “DAN DAN NI HAO” benchmark dataset with 1250 utterances and outperformed various baseline methods, including the traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models.


A Preliminary Study on Non-Intrusive User Authentication Method Using Smartphone Sensors

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.284-287.3270 ◽

2013 ◽

Vol 284-287 ◽

pp. 3270-3274 ◽

Cited By ~ 1

Author(s):

Chien Cheng Lin ◽

Chin Chun Chang ◽

De Ron Liang ◽

Ching Han Yang

Keyword(s):

Error Rate ◽

User Authentication ◽

Experimental Results ◽

Equal Error Rate ◽

Behavioral Biometrics ◽

Reported Study ◽

Smartphone Sensors ◽

Preliminary Study ◽

Orientation Sensor

This paper proposes a non-intrusive authentication method based on two sensors of smartphones, namely, the orientation sensor and the touchscreen. We have found that these two sensors are capable of capturing behavioral biometrics of a user while the user is engaged in relatively stationary activities. The experimental results for two types of flick operation show equal error rates of about 3.5% and 5%, respectively. To the best of our knowledge, this work is the first publicly reported study that simultaneously adopts the orientation sensor and the touchscreen to build an authentication model for smartphone users. Finally, we show that the proposed approach can be used together with existing intrusive mechanisms, such as passwords and/or fingerprints, to build a more robust authentication framework for smartphone users.


STUDY OF PERCEPTUAL SIMILARITY BETWEEN DIFFERENT LEXICONS

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001404003629 ◽

2004 ◽

Vol 18 (07) ◽

pp. 1321-1338 ◽

Cited By ~ 4

Author(s):

CINTHIA O. A. FREITAS ◽

FLÁVIO BORTOLOZZI ◽

ROBERT SABOURIN

Keyword(s):

Visual Perception ◽

Error Rate ◽

Recognition Performance ◽

Recognition Rate ◽

Experimental Results ◽

Perceptual Similarity ◽

Training Set ◽

Feature Similarity ◽

Observation Sequence ◽

French Words

The study investigates the perceptual feature similarity between different lexicons based on visual perception of the words and their representation through an observation sequence. We confirm that it is possible to use databases that are similar in terms of morphological/perceptual features to improve the recognition performance. In this work, we demonstrate through experimentation that it is possible to improve the recognition rate of handwritten Portuguese words by adding samples of French words to the training set. Experimental results show the efficiency of this strategy in reducing the error rate.
