Combining Speaker Recognition and Metric Learning for Speaker-Dependent Representation Learning

Due to variations in viewpoint, pose, and illumination, a given individual may appear considerably different across camera views. Tracking individuals across camera networks with non-overlapping fields of view is still a challenging problem. Previous works mainly address feature representation and metric learning separately, which tends to yield suboptimal solutions. To address this issue, we propose a novel framework that learns the feature representation and the metric jointly. Unlike previous works, we represent each pair of pedestrian images as a new resized input and use a linear Support Vector Machine in place of the softmax activation function for similarity learning. Dropout and data augmentation are also employed to prevent the network from overfitting. Extensive experiments on two publicly available datasets, VIPeR and CUHK01, demonstrate the effectiveness of the proposed approach.
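As a rough illustration of what using a linear SVM instead of softmax for similarity learning can look like, the sketch below trains a hinge-loss scoring head on toy pair features. It is not the authors' implementation: the function names, the two-dimensional features, the learning rate, and the regularization weight are all assumptions made for the example, and the features merely stand in for whatever the network's last hidden layer would produce for an image pair.

```python
# Illustrative sketch only: a linear SVM (hinge loss) head that labels an
# image pair as "same person" (+1) or "different person" (-1).
# All names and hyperparameters here are hypothetical.

def svm_score(weights, bias, features):
    """Linear decision value w.x + b for one pair's feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def hinge_loss(weights, bias, features, label):
    """Hinge loss max(0, 1 - y * score), with label y in {+1, -1}."""
    return max(0.0, 1.0 - label * svm_score(weights, bias, features))


def sgd_step(weights, bias, features, label, lr=0.1, reg=0.01):
    """One subgradient step on (reg / 2) * ||w||^2 + hinge loss."""
    margin_violated = label * svm_score(weights, bias, features) < 1.0
    new_weights = []
    for w, x in zip(weights, features):
        grad = reg * w - (label * x if margin_violated else 0.0)
        new_weights.append(w - lr * grad)
    new_bias = bias + (lr * label if margin_violated else 0.0)
    return new_weights, new_bias


# Toy pair features (assumed stand-ins for learned pair representations).
pairs = [
    ([1.0, 0.2], +1),
    ([0.9, 0.1], +1),
    ([0.1, 1.0], -1),
    ([0.2, 0.8], -1),
]

w, b = [0.0, 0.0], 0.0
for _ in range(100):
    for feats, y in pairs:
        w, b = sgd_step(w, b, feats, y)
```

Unlike softmax with cross-entropy, the hinge loss stops penalizing a pair once it clears the margin, so only pairs near the decision boundary keep driving the updates.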

Deep Siamese Metric Learning: A Highly Scalable Approach to Searching Unordered Sets of Trajectories

ACM Transactions on Intelligent Systems and Technology ◽ 10.1145/3465057 ◽ 2022 ◽ Vol 13 (1) ◽ pp. 1-23

Author(s): Christoffer Löffler ◽ Luca Reeb ◽ Daniel Dzibela ◽ Robert Marzilger ◽ Nicolas Witt ◽ ...

Keyword(s): Assignment Problem ◽ Network Architecture ◽ Metric Learning ◽ Representation Learning ◽ Trajectory Data ◽ Convolutional Network ◽ Professional Soccer ◽ Gating Mechanism ◽ Previous State ◽ Low Dimensional

This work proposes metric learning for fast similarity-based scene retrieval of unstructured ensembles of trajectory data from large databases. We present a novel representation learning approach using Siamese Metric Learning that approximates a distance-preserving low-dimensional representation and learns to estimate reasonable solutions to the assignment problem. To this end, we employ a Temporal Convolutional Network architecture that we extend with a gating mechanism to enable learning from sparse data, leading to solutions to the assignment problem that exhibit varying degrees of sparsity. Our experimental results on professional soccer tracking data provide insights into learned features and embeddings, as well as into generalization, sensitivity, and network architectural considerations. The low approximation errors of the learned representations and the interactive performance, with retrieval times several orders of magnitude smaller, show that we outperform the previous state of the art.
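To make the assignment problem concrete, the following sketch computes an exact matching cost between two unordered sets of trajectories by brute force. This is the kind of quantity a learned embedding would approximate, not the paper's method: the per-trajectory distance (mean pointwise Euclidean distance), the function names, and the toy scenes are assumptions chosen for illustration.

```python
from itertools import permutations
from math import dist


def trajectory_distance(a, b):
    """Mean Euclidean distance between two equal-length 2D trajectories."""
    return sum(dist(p, q) for p, q in zip(a, b)) / len(a)


def assignment_distance(scene_a, scene_b):
    """Minimum total cost over all one-to-one matchings of trajectories.

    Brute force over permutations, so O(n!) in the number of trajectories;
    only usable for tiny scenes.
    """
    n = len(scene_a)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(
            trajectory_distance(scene_a[i], scene_b[j])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


# Two toy scenes holding the same trajectories in a different order.
scene_a = [[(0, 0), (1, 0)], [(0, 5), (1, 5)]]
scene_b = [[(0, 5), (1, 5)], [(0, 0), (1, 0)]]

cost, matching = assignment_distance(scene_a, scene_b)
```

Here `matching` maps each trajectory index in `scene_a` to its partner in `scene_b`. Because the exact search grows factorially with the number of trajectories, comparing fixed-size embeddings instead is attractive for retrieval over large databases.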

Centroid-based Deep Metric Learning for Speaker Recognition

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽ 10.1109/icassp.2019.8683393 ◽ 2019 ◽ Cited By ~ 10

Author(s): Jixuan Wang ◽ Kuan-Chieh Wang ◽ Marc T. Law ◽ Frank Rudzicz ◽ Michael Brudno

Keyword(s): Speaker Recognition ◽ Metric Learning ◽ Deep Metric Learning

Bayesian distance metric learning and its application in automatic speaker recognition systems

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v9i4.pp2960-2967 ◽ 2019 ◽ Vol 9 (4) ◽ pp. 2960

Author(s): Satyanand Singh

Keyword(s): Distance Learning ◽ Covariance Matrix ◽ Speaker Recognition ◽ Metric Learning ◽ Recognition System ◽ Training Data ◽ Distance Metric ◽ Automatic Speaker Recognition ◽ Data Pair ◽ Metric Distance

This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on Bayesian distance metric learning as a feature extractor. In this modeling, I explore the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. The distance metric is approximated by a weighted covariance matrix built from the higher eigenvectors of the covariance matrix, and this approximation is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, this method is insensitive to normalization compared to cosine scores, and it is very effective when training data are limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance of the combined cosine score, an EER of 1.767%, is obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, an EER of 1.775%, is obtained using LDA200 + NCA200 + LDA100. Bayesian_dml overcomes the combined norm of cosine scores and is the best result reported for the short2-short3 condition of the NIST SRE 2008 data.
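For readers unfamiliar with the two quantities the abstract reports, the sketch below shows one simple way to compute a cosine score between two vectors and to estimate an equal error rate (EER) from lists of target and non-target trial scores. It is a generic illustration, not the paper's evaluation code: the function names and toy scores are assumptions, and the EER is estimated by sweeping thresholds over the observed scores without interpolation.

```python
from math import sqrt


def cosine_score(u, v):
    """Cosine similarity between two vectors (e.g. two i-vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping thresholds over the observed scores.

    Returns the average of the false acceptance and false rejection rates
    at the threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False acceptance: non-target trials scoring at or above threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: target trials scoring below threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy trial scores (hypothetical): same-speaker and different-speaker pairs.
target = [0.9, 0.8, 0.7, 0.4]
nontarget = [0.5, 0.3, 0.2, 0.1]

eer = equal_error_rate(target, nontarget)  # 0.25 for these toy scores
```

With these toy scores, one target trial falls below the best threshold and one non-target trial falls above it, so both error rates equal 25%.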
