Bidirectional Attention for Text-Dependent Speaker Verification

Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6784
Author(s):  
Xin Fang ◽  
Tian Gao ◽  
Liang Zou ◽  
Zhenhua Ling

Automatic speaker verification provides a flexible and effective means of biometric authentication. Previous deep learning-based methods have demonstrated promising results, but several problems still require better solutions. In prior work on speaker-discriminative neural networks, the representation of the target speaker is treated as fixed when it is compared with utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose to combine CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce a separate cost function to identify the phonetic content, which helps compute the attention scores more precisely. These interactive features are complementary to the constant ones, which are extracted from individual speakers separately and do not vary with the evaluation utterances. The proposed method achieved a competitive equal error rate of 6.26% on the internal “DAN DAN NI HAO” benchmark dataset with 1250 utterances and outperformed various baseline methods, including the traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models.
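As a rough illustration of the bidirectional attention idea described above, the following minimal sketch lets each enrollment frame attend over the evaluation frames and vice versa. It is not the authors' model: the dot-product scoring, the toy two-dimensional frame vectors, and the function names `attend` and `bidirectional_attention` are all assumptions made for illustration, and the CNN feature extractor and phonetic cost function are omitted.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys):
    """For each query frame, return the attention-weighted average of the key frames.

    Dot-product scoring is an assumption of this sketch.
    """
    dim = len(keys[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        attended.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return attended

def bidirectional_attention(enroll, evaluation):
    # Direction 1: enrollment frames attend over evaluation frames.
    enroll_to_eval = attend(enroll, evaluation)
    # Direction 2: evaluation frames attend over enrollment frames.
    eval_to_enroll = attend(evaluation, enroll)
    return enroll_to_eval, eval_to_enroll

# Hypothetical frame-level features (3 enrollment frames, 2 evaluation frames).
enroll = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
evaluation = [[1.0, 0.0], [0.0, 2.0]]
enroll_to_eval, eval_to_enroll = bidirectional_attention(enroll, evaluation)
```

Because both outputs depend on the pairing of the two utterances, they change whenever the evaluation utterance changes, which is the sense in which such features are "interactive" rather than constant.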

Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 443
Author(s):  
Chyan-long Jan

Because of financial information asymmetry, stakeholders usually do not know a company’s real financial condition until financial distress occurs. Financial distress not only threatens a company’s operational sustainability and damages the rights and interests of its stakeholders; it may also harm the national economy and society. Hence, it is very important to build high-accuracy financial distress prediction models. The purpose of this study is to build accurate and effective financial distress prediction models using two representative deep learning algorithms: deep neural networks (DNN) and convolutional neural networks (CNN). In addition, important variables are selected by the chi-squared automatic interaction detector (CHAID). The data on Taiwan’s listed and OTC sample companies are taken from the Taiwan Economic Journal (TEJ) database for the period from 2000 to 2019 and comprise 86 companies in financial distress and 258 not in financial distress, for a total of 344 companies. According to the empirical results, the CHAID-CNN model, which combines variables selected by CHAID with CNN modeling, has the highest financial distress prediction accuracy, 94.23%, and the lowest type I and type II error rates, 0.96% and 4.81%, respectively.


2020 ◽  
Author(s):  
Anbiao Huang ◽  
Shuo Gao ◽  
Arokia Nathan

In Internet of Things (IoT) applications, keystroke authentication methods based on a user’s touch behavior have received increasing attention among authentication techniques because of their unique benefits. In this paper, we present a technique for obtaining high user authentication accuracy by utilizing a user’s touch time and force information, which are obtained from an assembled piezoelectric touch panel. Combining artificial neural networks with the user’s touch features achieves an equal error rate (EER) of 1.09%, thereby advancing the development of security techniques in the field of IoT.
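The equal error rate is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping every observed score as a decision threshold. The scores are made up for illustration, and the helper name `equal_error_rate` and the convention that a trial is accepted when its score is at or above the threshold are assumptions of this sketch, not details taken from the paper.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold (an assumption).
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance rate: impostor trials that are accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials that are rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Hypothetical similarity scores, not data from any study.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)  # 0.2 for these made-up scores
```

With few trials the two error curves may never cross exactly, which is why the sketch reports the midpoint at the threshold where they are closest; evaluation toolkits often interpolate instead.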


2020 ◽  
Vol 10 (18) ◽  
pp. 6571 ◽  
Author(s):  
Sung-Hyun Yoon ◽  
Jong-June Jeon ◽  
Ha-Jin Yu

In the field of speaker verification, probabilistic linear discriminant analysis (PLDA) is the dominant method for back-end scoring. To estimate the PLDA model, the between-class covariance and within-class precision matrices must be estimated from samples. However, the empirical covariance/precision estimated from samples has estimation errors due to the limited number of samples available. In this paper, we propose a method to improve the conventional PLDA by estimating the PLDA model using the regularized within-class precision matrix. We use the graphical least absolute shrinkage and selection operator (GLASSO) for the regularization. The GLASSO regularization decreases the estimation errors in the empirical precision matrix by making the precision matrix sparse, which reflects the conditional independence structure. The experimental results on text-dependent speaker verification reveal that the proposed method achieves a relative reduction in equal error rate of up to 23% compared with the conventional PLDA.
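For reference, the graphical lasso in its standard form estimates a sparse precision matrix by maximizing an L1-penalized Gaussian log-likelihood. In the expression below, S denotes an empirical covariance matrix (here it would presumably be the within-class covariance), Θ the precision matrix being estimated, and λ ≥ 0 the regularization strength; these symbols are chosen for illustration, and the exact variant used by the authors (for example, whether diagonal entries are penalized) is not stated in the abstract.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0} \;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda\,\lVert \Theta \rVert_{1}
```

A zero off-diagonal entry of the estimated Θ means that, under a Gaussian model, the corresponding pair of variables is conditionally independent given all the others, which is the structure the abstract refers to.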


2020 ◽  
Vol 10 (12) ◽  
pp. 4092 ◽  
Author(s):  
Sung-Hyun Yoon ◽  
Ha-Jin Yu

Recurrent neural networks (RNNs) can model the time dependency of time-series data. They have also been widely used in text-dependent speaker verification to extract speaker-and-phrase-discriminant embeddings. As with other neural networks, RNNs are trained in mini-batch units. To feed input sequences into an RNN in mini-batch units, all the sequences in each mini-batch must have the same length. However, the sequences have variable lengths, and these lengths are not known in advance. Truncation and padding are most commonly used to make all sequences the same length. However, truncation and padding cause information distortion because some information is lost and/or unnecessary information is added, which can degrade the performance of text-dependent speaker verification. In this paper, we propose a method to handle variable-length sequences for RNNs without adding information distortion by truncating the output sequence so that it has the same length as the corresponding original input sequence. The experimental results for the text-dependent speaker verification task in part 2 of RSR 2015 show that our method achieves a relative reduction in equal error rate of approximately 1.3% to 27.1%, depending on the task, compared to the baselines, at the cost of a small overhead in execution time.
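The idea of truncating the output sequence back to the original input length can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_rnn` is a hypothetical stand-in for a unidirectional recurrent layer on scalar inputs, and padding is appended at the end of each sequence.

```python
def pad_batch(sequences, pad_value=0.0):
    """Right-pad every sequence to the longest length in the mini-batch."""
    max_len = max(len(s) for s in sequences)
    lengths = [len(s) for s in sequences]
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    return padded, lengths

def toy_rnn(sequence, decay=0.5):
    """Hypothetical stand-in for a unidirectional RNN: h_t = decay * h_{t-1} + x_t."""
    h, outputs = 0.0, []
    for x in sequence:
        h = decay * h + x
        outputs.append(h)
    return outputs

def run_batch(sequences):
    padded, lengths = pad_batch(sequences)
    outputs = [toy_rnn(seq) for seq in padded]
    # Truncate each output sequence to its original input length so that
    # time steps produced from padding never reach later pooling stages.
    return [out[:n] for out, n in zip(outputs, lengths)]

batch = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
truncated = run_batch(batch)
```

For a causal recurrence like this one, right-padding cannot influence earlier time steps, so the truncated outputs match what each sequence would have produced on its own.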


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Robertas Damaševičius ◽  
Rytis Maskeliūnas ◽  
Egidijus Kazanavičius ◽  
Marcin Woźniak

Cryptographic frameworks depend on key sharing for ensuring security of data. While the keys in cryptographic frameworks must be correctly reproducible and not unequivocally connected to the identity of a user, in biometric frameworks this is different. Joining cryptography techniques with biometrics can solve these issues. We present a biometric authentication method based on the discrete logarithm problem and Bose-Chaudhuri-Hocquenghem (BCH) codes, perform its security analysis, and demonstrate its security characteristics. We evaluate a biometric cryptosystem using our own dataset of electroencephalography (EEG) data collected from 42 subjects. The experimental results show that the described biometric user authentication system is effective, achieving an Equal Error Rate (EER) of 0.024.


Author(s):  
Dong-Dong Chen ◽  
Wei Wang ◽  
Wei Gao ◽  
Zhi-Hua Zhou

Deep neural networks have achieved great success in various real applications, but they require a large amount of labeled data for training. In this paper, we propose tri-net, a deep neural network which is able to use massive unlabeled data to help learning with limited labeled data. We consider model initialization, diversity augmentation and pseudo-label editing simultaneously. In our work, we utilize output smearing to initialize modules, use fine-tuning on labeled data to augment diversity and eliminate unstable pseudo-labels to alleviate the influence of suspicious pseudo-labeled data. Experiments show that our method achieves the best performance in comparison with state-of-the-art semi-supervised deep learning methods. In particular, it achieves an 8.30% error rate on CIFAR-10 using only 4000 labeled examples.

