Zero-shot Metric Learning

Author(s):  
Xinyi Xu ◽  
Huanhuan Cao ◽  
Yanhua Yang ◽  
Erkun Yang ◽  
Cheng Deng

In this work, we tackle the zero-shot metric learning problem and propose a novel method abbreviated as ZSML, with the purpose to learn a distance metric that measures the similarity of unseen categories (even unseen datasets). ZSML achieves strong transferability by capturing multi-nonlinear yet continuous relation among data. It is motivated by two facts: 1) relations can be essentially described from various perspectives; and 2) traditional binary supervision is insufficient to represent continuous visual similarity. Specifically, we first reformulate a collection of specific-shaped convolutional kernels to combine data pairs and generate multiple relation vectors. Furthermore, we design a new cross-update regression loss to discover continuous similarity. Extensive experiments including intra-dataset transfer and inter-dataset transfer on four benchmark datasets demonstrate that ZSML can achieve state-of-the-art performance.

Author(s):  
Tao Jin ◽  
Siyu Huang ◽  
Ming Chen ◽  
Yingming Li ◽  
Zhongfei Zhang

In this paper, we focus on the problem of applying the transformer structure to video captioning effectively. The vanilla transformer is proposed for uni-modal language generation task such as machine translation. However, video captioning is a multimodal learning problem, and the video features have much redundancy between different time steps. Based on these concerns, we propose a novel method called sparse boundary-aware transformer (SBAT) to reduce the redundancy in video representation. SBAT employs boundary-aware pooling operation for scores from multihead attention and selects diverse features from different scenarios. Also, SBAT includes a local correlation scheme to compensate for the local information loss brought by sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to boost the multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms the state-of-the-art methods under most of the metrics.


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Wei Yang ◽  
Luhui Xu ◽  
Xiaopan Chen ◽  
Fengbin Zheng ◽  
Yang Liu

Learning a proper distance metric for histogram data plays a crucial role in many computer vision tasks. The chi-squared distance is a nonlinear metric and is widely used to compare histograms. In this paper, we show how to learn a general form of chi-squared distance based on the nearest neighbor model. In our method, the margin of sample is first defined with respect to the nearest hits (nearest neighbors from the same class) and the nearest misses (nearest neighbors from the different classes), and then the simplex-preserving linear transformation is trained by maximizing the margin while minimizing the distance between each sample and its nearest hits. With the iterative projected gradient method for optimization, we naturally introduce thel2,1norm regularization into the proposed method for sparse metric learning. Comparative studies with the state-of-the-art approaches on five real-world datasets verify the effectiveness of the proposed method.


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Huaping Guo ◽  
Weimei Zhi ◽  
Hongbing Liu ◽  
Mingliang Xu

In recent years, imbalanced learning problem has attracted more and more attentions from both academia and industry, and the problem is concerned with the performance of learning algorithms in the presence of data with severe class distribution skews. In this paper, we apply the well-known statistical model logistic discrimination to this problem and propose a novel method to improve its performance. To fully consider the class imbalance, we design a new cost function which takes into account the accuracies of both positive class and negative class as well as the precision of positive class. Unlike traditional logistic discrimination, the proposed method learns its parameters by maximizing the proposed cost function. Experimental results show that, compared with other state-of-the-art methods, the proposed one shows significantly better performance on measures of recall,g-mean,f-measure, AUC, and accuracy.


Author(s):  
Yuqing Ma ◽  
Shihao Bai ◽  
Shan An ◽  
Wei Liu ◽  
Aishan Liu ◽  
...  

Few-shot learning, aiming to learn novel concepts from few labeled examples, is an interesting and very challenging problem with many practical advantages. To accomplish this task, one should concentrate on revealing the accurate relations of the support-query pairs. We propose a transductive relation-propagation graph neural network (TRPN) to explicitly model and propagate such relations across support-query pairs. Our TRPN treats the relation of each support-query pair as a graph node, named relational node, and resorts to the known relations between support samples, including both intra-class commonality and inter-class uniqueness, to guide the relation propagation in the graph, generating the discriminative relation embeddings for support-query pairs. A pseudo relational node is further introduced to propagate the query characteristics, and a fast, yet effective transductive learning strategy is devised to fully exploit the relation information among different queries. To the best of our knowledge, this is the first work that explicitly takes the relations of support-query pairs into consideration in few-shot learning, which might offer a new way to solve the few-shot learning problem. Extensive experiments conducted on several benchmark datasets demonstrate that our method can significantly outperform a variety of state-of-the-art few-shot learning methods.


Author(s):  
Kai Liu ◽  
Lodewijk Brand ◽  
Hua Wang ◽  
Feiping Nie

Metric Learning, which aims at learning a distance metric for a given data set, plays an important role in measuring the distance or similarity between data objects. Due to its broad usefulness, it has attracted a lot of interest in machine learning and related areas in the past few decades. This paper proposes to learn the distance metric from the side information in the forms of must-links and cannot-links. Given the pairwise constraints, our goal is to learn a Mahalanobis distance that minimizes the ratio of the distances of the data pairs in the must-links to those in the cannot-links. Different from many existing papers that use the traditional squared L2-norm distance, we develop a robust model that is less sensitive to data noise or outliers by using the not-squared L2-norm distance. In our objective, the orthonormal constraint is enforced to avoid degenerate solutions. To solve our objective, we have derived an efficient iterative solution algorithm. We have conducted extensive experiments, which demonstrated the superiority of our method over state-of-the-art.


2021 ◽  
Author(s):  
Xiao-wei CHEN

<p>Generalized zero-shot learning (GZSL) is one of the most realistic problems, but also one of the most challenging problems due to the partiality of the classifier to supervised classes. Instance-borrowing methods and synthesizing methods solve this problem to some extent with the help of testing semantics, but therefore neither can be used under the class-inductive instance-inductive (CIII) training setting where testing data are not available, and the latter require the training process of a classifier after generating examples. In contrast, a novel method called Semantic Borrowing for improving GZSL methods with compatibility metric learning under CIII is proposed in this paper. It borrows similar semantics in the training set, so that the classifier can model the relationship between the semantics of zero-shot and supervised classes more accurately during training. In practice, the information of semantics of unseen or unknown classes would not be available for training while this approach does NOT need any information of semantics of unseen or unknown classes. The experimental results on representative GZSL benchmark datasets show that it can reduce the partiality of the classifier to supervised classes and improve the performance of generalized zero-shot classification.</p>


Author(s):  
Shuo Chen ◽  
Chen Gong ◽  
Jian Yang ◽  
Xiang Li ◽  
Yang Wei ◽  
...  

In the past decades, intensive efforts have been put to design various loss functions and metric forms for metric learning problem. These improvements have shown promising results when the test data is similar to the training data. However, the trained models often fail to produce reliable distances on the ambiguous test pairs due to the different samplings between training set and test set. To address this problem, the Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the sampling bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e. confusion and distinguishment. In confusion stage, the ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In distinguishment stage, a metric is exhaustively learned to try its best to distinguish both adversarial pairs and original training pairs. Thanks to the challenges posed by the confusion stage in such competing process, the AML model is able to grasp plentiful difficult knowledge that has not been contained by the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated into optimization framework, of which the global convergence is theoretically proved. The experimental results on toy data and practical datasets clearly demonstrate the superiority of AML to representative state-of-the-art metric learning models.


Author(s):  
Yongguo Ling ◽  
Zhiming Luo ◽  
Yaojin Lin ◽  
Shaozi Li

The challenges of visible-thermal person re-identification (VT-ReID) lies in the inter-modality discrepancy and the intra-modality variations. An appropriate metric learning plays a crucial role in optimizing the feature similarity between the two modalities. However, most existing metric learning-based methods mainly constrain the similarity between individual instances or class centers, which are inadequate to explore the rich data relationships in the cross-modality data. Besides, most of these methods fail to consider the importance of different pairs, incurring an inefficiency and ineffectiveness of optimization. To address these issues, we propose a Multi-Constraint (MC) similarity learning method that jointly considers the cross-modality relationships from three different aspects, i.e., Instance-to-Instance (I2I), Center-to-Instance (C2I), and Center-to-Center (C2C). Moreover, we devise an Adaptive Weighting Loss (AWL) function to implement the MC efficiently. In the AWL, we first use an adaptive margin pair mining to select informative pairs and then adaptively adjust weights of mined pairs based on their similarity. Finally, the mined and weighted pairs are used for the metric learning. Extensive experiments on two benchmark datasets demonstrate the superior performance of the proposed over the state-of-the-art methods.


Computation ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 20 ◽  
Author(s):  
Yunfei Teng ◽  
Anna Choromanska

The unsupervised image-to-image translation aims at finding a mapping between the source ( A ) and target ( B ) image domains, where in many applications aligned image pairs are not available at training. This is an ill-posed learning problem since it requires inferring the joint probability distribution from marginals. Joint learning of coupled mappings F A B : A → B and F B A : B → A is commonly used by the state-of-the-art methods, like CycleGAN to learn this translation by introducing cycle consistency requirement to the learning problem, i.e., F A B ( F B A ( B ) ) ≈ B and F B A ( F A B ( A ) ) ≈ A . Cycle consistency enforces the preservation of the mutual information between input and translated images. However, it does not explicitly enforce F B A to be an inverse operation to F A B . We propose a new deep architecture that we call invertible autoencoder (InvAuto) to explicitly enforce this relation. This is done by forcing an encoder to be an inverted version of the decoder, where corresponding layers perform opposite mappings and share parameters. The mappings are constrained to be orthonormal. The resulting architecture leads to the reduction of the number of trainable parameters (up to 2 times). We present image translation results on benchmark datasets and demonstrate state-of-the art performance of our approach. Finally, we test the proposed domain adaptation method on the task of road video conversion. We demonstrate that the videos converted with InvAuto have high quality and show that the NVIDIA neural-network-based end-to-end learning system for autonomous driving, known as PilotNet, trained on real road videos performs well when tested on the converted ones.


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Muhamamd Adnan Syed ◽  
Zhenjun Han ◽  
Zhaoju Li ◽  
Jianbin Jiao

In person reidentification distance metric learning suffers a great challenge from impostor persons. Mostly, distance metrics are learned by maximizing the similarity between positive pair against impostors that lie on different transform modals. In addition, these impostors are obtained from Gallery view for query sample only, while the Gallery sample is totally ignored. In real world, a given pair of query and Gallery experience different changes in pose, viewpoint, and lighting. Thus, impostors only from Gallery view can not optimally maximize their similarity. Therefore, to resolve these issues we have proposed an impostor resilient multimodal metric (IRM3). IRM3 is learned for each modal transform in the image space and uses impostors from both Probe and Gallery views to effectively restrict large number of impostors. Learned IRM3 is then evaluated on three benchmark datasets, VIPeR, CUHK01, and CUHK03, and shows significant improvement in performance compared to many previous approaches.


Sign in / Sign up

Export Citation Format

Share Document