Regularized Within-Class Precision Matrix Based PLDA in Text-Dependent Speaker Verification

2020 ◽  
Vol 10 (18) ◽  
pp. 6571 ◽  
Author(s):  
Sung-Hyun Yoon ◽  
Jong-June Jeon ◽  
Ha-Jin Yu

In the field of speaker verification, probabilistic linear discriminant analysis (PLDA) is the dominant method for back-end scoring. To estimate the PLDA model, the between-class covariance and within-class precision matrices must be estimated from samples. However, the empirical covariance and precision matrices estimated from samples suffer from estimation errors due to the limited number of samples available. In this paper, we propose a method to improve conventional PLDA by estimating the PLDA model using a regularized within-class precision matrix. We use the graphical least absolute shrinkage and selection operator (GLASSO) for regularization. GLASSO regularization decreases the estimation errors in the empirical precision matrix by making the matrix sparse, which corresponds to reflecting the conditional independence structure. Experimental results on text-dependent speaker verification show that the proposed method reduces the relative equal error rate by up to 23% compared with conventional PLDA.
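The core idea above, shrinking an empirical precision matrix toward a sparse one with an L1 penalty, can be sketched with scikit-learn's `graphical_lasso`. This is a minimal illustration, not the paper's implementation: the toy embeddings, dimensionality, and the penalty weight `alpha` are assumptions for demonstration only.

```python
# Hedged sketch: GLASSO-regularized precision matrix, as one might apply it
# to within-class speaker-embedding statistics before PLDA scoring.
# The data below is synthetic; `alpha` is an illustrative choice.
import numpy as np
from sklearn.covariance import empirical_covariance, graphical_lasso

rng = np.random.default_rng(0)
# Toy stand-in for within-class speaker embeddings: 200 samples, 10 dims
X = rng.standard_normal((200, 10))

emp_cov = empirical_covariance(X)
# Larger alpha -> stronger L1 penalty -> sparser precision matrix
cov_reg, prec_reg = graphical_lasso(emp_cov, alpha=0.2)

# Zero entries in the precision matrix encode conditional independence
# between the corresponding embedding dimensions
n_zero_entries = int(np.sum(np.isclose(prec_reg, 0.0)))
print(n_zero_entries, np.allclose(prec_reg, prec_reg.T))
```

Because the synthetic data has an identity population covariance, the penalized estimate drives most off-diagonal precision entries exactly to zero; in the paper's setting, the sparse `prec_reg` would replace the raw empirical within-class precision when fitting the PLDA model.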

Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6784 ◽  
Author(s):  
Xin Fang ◽  
Tian Gao ◽  
Liang Zou ◽  
Zhenhua Ling

Automatic speaker verification provides a flexible and effective way for biometric authentication. Previous deep-learning-based methods have demonstrated promising results, but several problems still require better solutions. In prior work on speaker-discriminative neural networks, the representation of the target speaker is treated as fixed when compared against utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose combining CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce an individual cost function to identify the phonetic content, which helps compute the attention scores more precisely. These interactive features are complementary to the constant ones, which are extracted from individual speakers separately and do not vary with the evaluation utterances. The proposed method achieved a competitive equal error rate of 6.26% on the internal “DAN DAN NI HAO” benchmark dataset with 1250 utterances and outperformed various baseline methods, including traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models.
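The bidirectional attention idea, each utterance's frames attending over the other utterance's frames to produce interactive features, can be sketched in NumPy. This is a simplified illustration under assumed shapes and scaled dot-product scoring; the paper's CNN front-end, phonetic cost function, and exact attention form are not reproduced here.

```python
# Hedged sketch of evaluation<->enrollment bidirectional attention.
# Frame counts, feature dimension, and dot-product scoring are assumptions.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 16                              # feature dimension (illustrative)
E = rng.standard_normal((30, d))    # enrollment frames, shape (T_e, d)
V = rng.standard_normal((40, d))    # evaluation frames, shape (T_v, d)

# Similarity between every evaluation frame and every enrollment frame
S = V @ E.T / np.sqrt(d)            # shape (T_v, T_e)

# Evaluation -> enrollment: each evaluation frame summarizes enrollment
eval_to_enr = softmax(S, axis=1) @ E    # shape (T_v, d)
# Enrollment -> evaluation: each enrollment frame summarizes evaluation
enr_to_eval = softmax(S.T, axis=1) @ V  # shape (T_e, d)

print(eval_to_enr.shape, enr_to_eval.shape)  # (40, 16) (30, 16)
```

The two attended sequences are the "interactive" features: unlike a fixed speaker embedding, they change with every evaluation utterance, which is what lets the model exploit enrollment-evaluation joint information.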

