When Speaker Recognition Meets Noisy Labels: Optimizations for Front-ends and Back-ends

2021 ◽
Author(s):  
Lin Li ◽  
Fuchuan Tong ◽  
Qingyang Hong

A typical speaker recognition system involves two modules: a feature extractor front-end and a speaker identity back-end. Although deep neural networks have achieved superior performance for the front-end, their success depends on the availability of large-scale, correctly labeled datasets. Because label noise is unavoidable in speaker recognition datasets, both the front-end and the back-end are affected by it, which degrades speaker recognition performance. In this paper, we first conduct comprehensive experiments to improve the understanding of the effects of label noise on both the front-end and the back-end. We then propose a simple yet effective training paradigm and loss correction method to handle label noise for the front-end. We combine our proposed method with the recently proposed Bayesian estimation of PLDA for noisy labels, and the whole system shows strong robustness to label noise. Furthermore, we present two practical applications of the improved system: one corrects noisy labels based on an utterance’s chunk-level predictions, and the other algorithmically filters out high-confidence noisy samples within a dataset. By applying the second application to the NIST SRE04-10 dataset and verifying the filtered utterances through human validation, we find that approximately 1% of the SRE04-10 dataset consists of label errors.
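As a rough illustration of the chunk-level label-correction idea mentioned in the abstract, the sketch below averages per-chunk softmax posteriors over an utterance and relabels it only on confident disagreement. It is not the authors' implementation; `front_end_posteriors`, the chunk length, and the confidence threshold are hypothetical placeholders.

```python
# Minimal sketch of chunk-level label correction (assumed pipeline, not the paper's code).
import numpy as np

def split_into_chunks(utterance, chunk_len):
    """Split a 1-D sample/feature sequence into fixed-length chunks (assumes >= 1 chunk fits)."""
    n = len(utterance) // chunk_len
    return [utterance[i * chunk_len:(i + 1) * chunk_len] for i in range(n)]

def correct_label(utterance, assigned_label, front_end_posteriors,
                  chunk_len=16000, threshold=0.9):
    """Average chunk-level posteriors; relabel only on confident disagreement."""
    chunks = split_into_chunks(utterance, chunk_len)
    post = np.mean([front_end_posteriors(c) for c in chunks], axis=0)
    predicted = int(np.argmax(post))
    if predicted != assigned_label and post[predicted] >= threshold:
        return predicted      # confident disagreement: correct the label
    return assigned_label     # otherwise keep the original label
```

The same averaged posteriors could be thresholded more aggressively to flag (rather than relabel) high-confidence noisy samples, as in the filtering application described above.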


2021 ◽  
Vol 11 (21) ◽  
pp. 10079
Author(s):  
Muhammad Firoz Mridha ◽  
Abu Quwsar Ohi ◽  
Muhammad Mostafa Monowar ◽  
Md. Abdul Hamid ◽  
Md. Rashedul Islam ◽  
...  

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built in two stages: the first extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of speech embeddings, which are primarily pre-trained on a large-scale dataset. Because the embedding systems are pre-trained, the performance of speaker recognition models depends heavily on the domain adaptation policy and may degrade if the models are trained with inadequate data. This paper introduces a speaker recognition strategy for unlabeled data that generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy relies on the assumption that a small speech segment contains a single speaker. Based on this assumption, pairwise constraints are constructed with noise augmentation policies and used to train the AutoEmbedder architecture, which generates speaker embeddings. Without relying on a domain adaptation policy, the process produces clusterable speaker embeddings in an unsupervised manner, termed unsupervised vectors (u-vectors). The evaluation is conducted on two popular English speaker recognition datasets, TIMIT and LibriSpeech, along with a Bengali dataset that illustrates the diversity of domain shifts faced by speaker recognition systems. Finally, we conclude that the proposed approach achieves satisfactory performance using pairwise architectures.
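A minimal sketch of the pairwise-constraint idea, assuming each short segment contains a single speaker: chunks cut from the same segment (plus a noise-augmented copy) form positive pairs, and chunks from different segments form negative pairs. The `add_noise` augmentation and the chunk length are placeholders, not the paper's policies.

```python
# Sketch of pairwise constraint construction for unsupervised embedding training.
import random
import numpy as np

def add_noise(chunk, snr_db=15.0):
    """Placeholder white-noise augmentation at a nominal SNR (chunk is a NumPy array)."""
    noise = np.random.randn(*chunk.shape)
    scale = np.sqrt(np.mean(chunk ** 2) /
                    (10 ** (snr_db / 10.0) * np.mean(noise ** 2) + 1e-12))
    return chunk + scale * noise

def make_pairs(segments, chunk_len, n_pairs):
    """Yield (chunk_a, chunk_b, same_speaker) pairs; segments are assumed longer than chunk_len."""
    pairs = []
    for _ in range(n_pairs):
        if random.random() < 0.5:                       # positive (same-segment) pair
            seg = random.choice(segments)
            start = random.randint(0, len(seg) - chunk_len)
            chunk = seg[start:start + chunk_len]
            pairs.append((chunk, add_noise(chunk), 1))
        else:                                           # negative (different-segment) pair
            seg_a, seg_b = random.sample(segments, 2)
            a = random.randint(0, len(seg_a) - chunk_len)
            b = random.randint(0, len(seg_b) - chunk_len)
            pairs.append((seg_a[a:a + chunk_len], seg_b[b:b + chunk_len], 0))
    return pairs
```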


2019 ◽  
Vol 17 (2) ◽  
pp. 170-177
Author(s):  
Lei Deng ◽  
Yong Gao

In this paper, the authors propose an auditory feature extraction algorithm to improve the performance of speaker recognition systems in noisy environments. In this algorithm, a Gammachirp filter bank is adopted to simulate the auditory model of the human cochlea. In addition, the following three techniques are applied: cube-root compression, the Relative Spectral filtering technique (RASTA), and Cepstral Mean and Variance Normalization (CMVN). Subsequently, simulated experiments were conducted based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM) framework. The experimental results show that speaker recognition systems using the new auditory feature have better robustness and recognition performance than Mel-Frequency Cepstral Coefficients (MFCC), Relative Spectral-Perceptual Linear Prediction (RASTA-PLP), Cochlear Filter Cepstral Coefficients (CFCC), and Gammatone Frequency Cepstral Coefficients (GFCC).
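For concreteness, the following sketch shows two of the named post-processing steps, cube-root compression and per-utterance CMVN, applied to filterbank outputs. The Gammachirp filterbank itself is assumed to be computed elsewhere; `fbank_energies` (frames x channels) is a placeholder, not the authors' code.

```python
# Sketch of cube-root compression followed by CMVN on filterbank features.
import numpy as np

def cube_root_compression(fbank_energies):
    """Cube-root amplitude compression in place of the usual log compression."""
    return np.power(np.maximum(fbank_energies, 1e-10), 1.0 / 3.0)

def cmvn(features):
    """Per-utterance mean and variance normalization of each coefficient."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True) + 1e-10
    return (features - mean) / std

# Example: feats = cmvn(cube_root_compression(fbank_energies))
# The resulting features would then feed a GMM-UBM speaker recognition back-end.
```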


2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Gaohang Yu ◽  
Shanzhou Niu ◽  
Jianhua Ma ◽  
Yisheng Song

Combining the multivariate spectral gradient method with a projection scheme, this paper presents an adaptive prediction-correction method for solving large-scale nonlinear systems of monotone equations. The proposed method possesses some favorable properties: (1) it makes progress step by step, that is, the distance between the iterates and the solution set decreases monotonically; (2) the global convergence result is independent of the merit function and its Lipschitz continuity; (3) it is a derivative-free method and, owing to its low storage requirement, can be applied to large-scale nonsmooth equations. Preliminary numerical results show that the proposed method is very effective. Some practical applications of the proposed method are demonstrated and tested on sparse signal reconstruction, compressed sensing, and image deconvolution problems.
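To illustrate the prediction-correction (projection) framework for monotone equations F(x) = 0, here is a generic derivative-free sketch in the Solodov-Svaiter style. The plain direction d = -F(x) stands in for the paper's multivariate spectral gradient direction, and the line-search parameters are illustrative, so this is not the authors' algorithm.

```python
# Sketch of a hyperplane-projection method for monotone equations F(x) = 0.
import numpy as np

def projection_method(F, x0, sigma=1e-4, beta=0.5, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        d = -Fx                                    # prediction direction (placeholder)
        t = 1.0
        while True:                                # derivative-free line search
            z = x + t * d
            if -F(z) @ d >= sigma * t * (d @ d) or t < 1e-12:
                break
            t *= beta
        Fz = F(z)
        # Correction: project x onto the hyperplane {y : F(z)^T (y - z) = 0}.
        x = x - (Fz @ (x - z)) / (Fz @ Fz + 1e-16) * Fz
    return x

# Example: F(x) = x + sin(x) is monotone, so the iterates should drive F toward 0.
# root = projection_method(lambda x: x + np.sin(x), np.ones(5))
```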


2021 ◽  
Vol 12 (3) ◽  
pp. 1-23
Author(s):  
Yan Liu ◽  
Bin Guo ◽  
Daqing Zhang ◽  
Djamal Zeghlache ◽  
Jingmin Chen ◽  
...  

Optimal store placement aims to identify the optimal location for a new brick-and-mortar store that can maximize its sales by analyzing and mining users’ preferences from large-scale urban data. In recent years, the expansion of chain enterprises into new cities has brought challenges in two aspects: (1) data scarcity in new cities, so most existing models tend not to work well (i.e., they overfit), because their superior performance is conditioned on large-scale training samples; (2) data distribution discrepancy among different cities, so knowledge learned from other cities cannot be utilized directly in new cities. In this article, we propose a task-adaptive model-agnostic meta-learning framework, namely, MetaStore, to tackle these two challenges and improve the prediction performance for optimal store placement in new cities with insufficient data, by transferring prior knowledge learned from multiple data-rich cities. Specifically, we develop a task-adaptive meta-learning algorithm to learn city-specific prior initializations from multiple cities, which is capable of handling multimodal data distributions and accelerates adaptation to new cities compared with other methods. In addition, we design an effective learning strategy for MetaStore that promotes faster convergence and optimization by sampling high-quality data for each training batch, in view of the noisy data encountered in practical applications. The extensive experimental results demonstrate that our proposed method leads to state-of-the-art performance compared with various baselines.
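As background for the model-agnostic meta-learning framework the abstract builds on, the following toy sketch shows a generic first-order MAML inner/outer loop on linear regression tasks. It illustrates only the meta-learning scaffolding, not MetaStore's task-adaptive initializations or its data-sampling strategy; the task format and learning rates are assumptions.

```python
# Toy first-order MAML sketch: learn an initialization that adapts quickly per task.
import numpy as np

def mse_grad(theta, X, y):
    """Gradient of 0.5 * mean squared error for a linear model."""
    return X.T @ (X @ theta - y) / len(y)

def meta_train(tasks, dim, inner_lr=0.01, outer_lr=0.1, steps=100):
    """Each task is a (X_support, y_support, X_query, y_query) tuple."""
    theta = np.zeros(dim)
    for _ in range(steps):
        outer_grad = np.zeros(dim)
        for X_s, y_s, X_q, y_q in tasks:
            adapted = theta - inner_lr * mse_grad(theta, X_s, y_s)   # inner adaptation step
            outer_grad += mse_grad(adapted, X_q, y_q)                # first-order outer gradient
        theta -= outer_lr * outer_grad / len(tasks)                  # meta-update
    return theta
```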


Author(s):  
Tai D. Nguyen ◽  
Ronald Gronsky ◽  
Jeffrey B. Kortright

Nanometer-period Ru/C multilayers are among the prime candidates for normal-incidence reflecting mirrors at wavelengths < 10 nm. Superior performance, which requires uniform layers and smooth interfaces, and high stability of the layered structure under thermal loading are some of the demands in practical applications. Previous studies, however, show that the Ru layers in the 2 nm period Ru/C multilayer agglomerate upon moderate annealing, and the layered structure is no longer retained. This agglomeration and crystallization of the Ru layers upon annealing, forming almost spherical crystallites, results from the reduction of surface or interfacial energy from the amorphous, high-energy, non-equilibrium state of the as-prepared sample through diffusive rearrangement of the atoms. Proposed models for the mechanism of thin-film agglomeration include one analogous to the Rayleigh instability and another based on grain boundary grooving in polycrystalline films. These models, however, are not necessarily appropriate for explaining the agglomeration of the sub-nanometer amorphous Ru layers in Ru/C multilayers. The Ru-C phase diagram shows a wide miscibility gap, which indicates a preference for phase separation between these two materials and provides an additional driving force for agglomeration. In this paper, we study the evolution of the microstructure and layered structure via in-situ Transmission Electron Microscopy (TEM) and attempt to determine the order of occurrence of agglomeration and crystallization in the Ru layers by observing the diffraction patterns.


2018 ◽  
Vol 1 (2) ◽  
pp. 34-44
Author(s):  
Faris E Mohammed ◽  
Dr. Eman M ALdaidamony ◽  
Prof. A. M Raid

Individual identification is a significant process that occupies a large portion of day-to-day activities. Identification is required in workplaces, private zones, banks, etc. Individuals possess many characteristics that can be used for recognition, such as the finger vein, iris, and face. Finger vein and iris key-points are considered among the most promising biometric authentication techniques for their security and convenience. SIFT is a new and promising technique for pattern recognition. However, many related techniques suffer from shortcomings such as feature loss, difficulty in feature key-point extraction, and the introduction of noise points. In this manuscript, a new technique named SIFT-based iris and SIFT-based finger vein identification with normalization and enhancement is proposed to achieve better performance. In comparison with other SIFT-based iris or SIFT-based finger vein recognition algorithms, the suggested technique can overcome the difficulties of key-point extraction and exclude noise points without feature loss. Experimental results demonstrate that the normalization and enhancement steps are critical for SIFT-based recognition of the iris and finger vein, and the proposed technique can accomplish satisfactory recognition performance. Keywords: SIFT, Iris Recognition, Finger Vein Identification, Biometric Systems.
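For readers unfamiliar with SIFT-based matching, here is a minimal sketch using OpenCV's SIFT detector and a ratio-test matcher. It only illustrates the generic key-point pipeline and deliberately omits the paper's normalization and enhancement steps; the images, ratio threshold, and match-count scoring are assumptions.

```python
# Sketch of SIFT key-point extraction and ratio-test matching with OpenCV.
import cv2

def sift_match(image_a, image_b, ratio=0.75):
    """Return the number of ratio-test matches between two grayscale images."""
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(image_a, None)
    kp_b, desc_b = sift.detectAndCompute(image_b, None)
    if desc_a is None or desc_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(desc_a, desc_b, k=2)
    # Lowe's ratio test keeps only distinctive matches.
    good = [p[0] for p in matches if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)

# A higher match count between an enrollment image and a probe image suggests
# the same iris or finger-vein pattern; a threshold on this count gives a decision.
```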

