Towards Optimal Fine Grained Retrieval via Decorrelated Centralized Loss with Normalize-Scale Layer

Author(s):  
Xiawu Zheng ◽  
Rongrong Ji ◽  
Xiaoshuai Sun ◽  
Baochang Zhang ◽  
Yongjian Wu ◽  
...  

Recent advances in fine-grained image retrieval favor training a convolutional neural network (CNN) with a specifically designed fully connected layer and loss function to obtain discriminative feature representations. Essentially, such a loss should establish a robust metric that efficiently distinguishes high-dimensional features within and outside fine-grained categories. In this respect, the existing loss functions are deficient in two ways: (a) The feature relationship is encoded inside the training batch. Such a local scope leads to low accuracy. (b) The error is established by the mean square, which needs pairwise distance computation over the training set and results in low efficiency. In this paper, we propose a novel metric learning scheme, termed Normalize-Scale Layer and Decorrelated Global Centralized Ranking Loss, which achieves extremely efficient and discriminative learning, i.e., a 5× speedup over triplet loss and a 12% recall boost on CARS196. Our method originates from the classic softmax loss, which has a global structure but directly optimizes neither the distance metric nor the inter/intra-class distance. We tackle this issue through a hypersphere layer and a global centralized ranking loss with pairwise decorrelated learning. In particular, we first propose a Normalize-Scale Layer to eliminate the gap between metric distance (for measuring distance in retrieval) and dot product (for dimension reduction in classification). Second, the relationship between features is encoded under a global centralized ranking loss, which aims to optimize metric distance globally and accelerate the learning procedure. Finally, the centers are further decorrelated by the Gram-Schmidt process, leading to extreme efficiency (with 20 epochs in the training procedure) and discriminability in feature learning. We have conducted quantitative evaluations on two fine-grained retrieval benchmarks. The superior performance demonstrates the merits of the proposed approach over the state of the art.
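A minimal sketch of two ingredients named in this abstract, assuming plain Python lists as feature vectors. The function names (`normalize_scale`, `gram_schmidt`) and the scale value are illustrative assumptions, not the authors' implementation. It shows that once features are L2-normalized and multiplied by a scale `s`, squared Euclidean distance and dot product are tied by `||a - b||^2 = 2*s^2 - 2*(a . b)`, and that a Gram-Schmidt pass turns a set of class centers into mutually orthogonal unit vectors.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def normalize_scale(x, scale=1.0):
    """L2-normalize x, then multiply by a fixed scale (hypothetical layer)."""
    norm = math.sqrt(dot(x, x))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [scale * v / norm for v in x]


def gram_schmidt(vectors, eps=1e-12):
    """Modified Gram-Schmidt: orthonormal vectors spanning the inputs."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)  # component of w along an earlier direction
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis


s = 2.0  # assumed scale, chosen only for illustration
a = normalize_scale([3.0, 4.0], s)
b = normalize_scale([1.0, 0.0], s)
# On the scaled hypersphere, distance and dot product carry the same ranking.
print(sq_dist(a, b), 2 * s ** 2 - 2 * dot(a, b))

centers = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(dot(centers[0], centers[1]))
```

Because the identity holds for every pair of normalized features, ranking neighbours by largest dot product or by smallest Euclidean distance gives the same order, which is one way to read the "gap" the abstract refers to.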

Author(s):  
Hong Liu ◽  
Jie Li ◽  
Yongjian Wu ◽  
Rongrong Ji

Symmetric positive definite (SPD) matrices have attracted increasing research attention in image/video analysis, owing to their merit in capturing the Riemannian geometry in their structured 2D feature representation. However, computation in the vector space on SPD matrices cannot capture the geometric properties, which degrades classification performance. To this end, Riemannian-based deep networks have become a promising solution for SPD matrix classification, because of their excellence in performing non-linear learning over SPD matrices. Besides, Riemannian metric learning typically adopts a kNN classifier that cannot be extended to large-scale datasets, which limits its application in many time-critical scenarios. In this paper, we propose a Bag-of-Matrix-Summarization (BoMS) method to be combined with a Riemannian network, which handles the above issues towards highly efficient and scalable SPD feature representation. Our key innovation lies in the idea of summarizing data in a Riemannian geometric space instead of the vector space. First, the whole training set is compressed into a small number of matrix features to ensure high scalability. Second, given such a compressed set, a constant-length vector representation is extracted by efficiently measuring the distribution variations between the summarized data and the latent feature of the Riemannian network. Finally, the proposed BoMS descriptor is integrated into the Riemannian network, upon which the whole framework is trained end-to-end via matrix back-propagation. Experiments on four different classification tasks demonstrate the superior performance of the proposed method over the state-of-the-art methods.


Author(s):  
Pingyang Dai ◽  
Rongrong Ji ◽  
Haibin Wang ◽  
Qiong Wu ◽  
Yuyu Huang

Person re-identification (Re-ID) is an important task in video surveillance that automatically searches for and identifies people across different cameras. Despite the extensive progress of Re-ID with RGB cameras, few works have studied Re-ID between infrared and RGB images, which is essentially a cross-modality problem and is widely encountered in real-world scenarios. The key challenge is twofold: the lack of discriminative information to re-identify the same person across RGB and infrared modalities, and the difficulty of learning a robust metric for such large-scale cross-modality retrieval. In this paper, we tackle these two challenges by proposing a novel cross-modality generative adversarial network (termed cmGAN). To handle the issue of insufficient discriminative information, we leverage cutting-edge generative adversarial training to design our own discriminator, which learns discriminative feature representations from different modalities. To handle the issue of large-scale cross-modality metric learning, we integrate both an identification loss and a cross-modality triplet loss, which minimize inter-class ambiguity while maximizing cross-modality similarity among instances. The entire cmGAN can be trained in an end-to-end manner using a standard deep neural network framework. We have quantified the performance of our work on the newly released SYSU RGB-IR Re-ID benchmark and report superior performance, in terms of both the Cumulative Match Characteristic (CMC) curve and Mean Average Precision (MAP), over the state-of-the-art work [Wu et al., 2017].


Author(s):  
Xiaoyu He ◽  
Yong Wang ◽  
Shuang Zhao ◽  
Chunli Yao

Abstract Currently, convolutional neural networks (CNNs) have made remarkable achievements in skin lesion classification because of their end-to-end feature representation abilities. However, precise skin lesion classification is still challenging because of the following three issues: (1) insufficient training samples, (2) inter-class similarities and intra-class variations, and (3) lack of the ability to focus on discriminative skin lesion parts. To address these issues, we propose a deep metric attention learning CNN (DeMAL-CNN) for skin lesion classification. In DeMAL-CNN, a triplet-based network (TPN) is first designed based on deep metric learning, which consists of three weight-shared embedding extraction networks. TPN takes a triplet of samples as input and uses the triplet loss to optimize the embeddings, which not only increases the number of training samples but also yields embeddings that are robust to inter-class similarities and intra-class variations. In addition, a mixed attention mechanism that considers both spatial-wise and channel-wise attention information is designed and integrated into each embedding extraction network, which further strengthens the skin lesion localization ability of DeMAL-CNN. After the embeddings are extracted, three weight-shared classification layers are used to generate the final predictions. In the training procedure, we combine the triplet loss with the classification loss as a hybrid loss to train DeMAL-CNN. We compare DeMAL-CNN with the baseline method, attention methods, advanced challenge methods, and state-of-the-art skin lesion classification methods on the ISIC 2016 and ISIC 2017 datasets, and test its generalization ability on the PH2 dataset. The results demonstrate its effectiveness.
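The hybrid objective described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the DeMAL-CNN code: the margin, the `weight` balancing the two terms, and the choice to average the classification loss over the three triplet members are all hypothetical, and embeddings and logits are plain Python lists rather than network outputs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def hybrid_loss(embeddings, logits, labels, margin=1.0, weight=1.0):
    """Triplet term plus weighted mean classification term (assumed form)."""
    anchor, positive, negative = embeddings
    metric_term = triplet_loss(anchor, positive, negative, margin)
    class_term = sum(cross_entropy(z, y) for z, y in zip(logits, labels))
    return metric_term + weight * class_term / len(labels)


# One triplet: anchor and positive share class 0, the negative is class 1.
embeddings = ([0.0, 0.0], [0.0, 1.0], [0.0, 3.0])
logits = ([2.0, 0.0], [1.5, 0.2], [0.1, 1.8])
labels = (0, 0, 1)
print(hybrid_loss(embeddings, logits, labels))
```

The triplet term vanishes once the negative is at least `margin` farther from the anchor than the positive, so in that regime only the classification term still contributes to the loss.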


Author(s):  
Mang Ye ◽  
Zheng Wang ◽  
Xiangyuan Lan ◽  
Pong C. Yuen

Cross-modality person re-identification between the thermal and visible domains is extremely important for night-time surveillance applications. Existing works in this field mainly focus on learning sharable feature representations to handle the cross-modality discrepancies. However, besides the cross-modality discrepancy caused by different camera spectrums, visible-thermal person re-identification also suffers from large cross-modality and intra-modality variations caused by different camera views and human poses. In this paper, we propose a dual-path network with a novel bi-directional dual-constrained top-ranking loss to learn discriminative feature representations. It is advantageous in two aspects: 1) it learns features end-to-end directly from the data without extra metric learning steps, and 2) it simultaneously handles the cross-modality and intra-modality variations to ensure the discriminability of the learnt representations. Meanwhile, an identity loss is further incorporated to model the identity-specific information and handle large intra-class variations. Extensive experiments on two datasets demonstrate superior performance compared to the state of the art.


Author(s):  
Yufei Li ◽  
Xiaoyong Ma ◽  
Xiangyu Zhou ◽  
Pengzhen Cheng ◽  
Kai He ◽  
...  

Abstract Motivation Bio-entity Coreference Resolution focuses on identifying the coreferential links in biomedical texts, which is crucial for completing bio-events’ attributes and interconnecting events into bio-networks. Previously, as one of the most powerful tools, deep neural network-based general domain systems have been applied to the biomedical domain with domain-specific information integration. However, such methods may introduce considerable noise because they insufficiently combine context with complex domain-specific information. Results In this paper, we explore how to leverage an external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short-Term Memory network (LSTM), which encodes the knowledge information inside the LSTM more flexibly. Moreover, we further propose a knowledge attention module to extract informative knowledge effectively based on contexts. Experiments on the BioNLP and CRAFT datasets show state-of-the-art performance, with a gain of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on cross-sentence coreferences. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (16) ◽  
pp. 2757-2765 ◽  
Author(s):  
Balachandran Manavalan ◽  
Shaherin Basith ◽  
Tae Hwan Shin ◽  
Leyi Wei ◽  
Gwang Lee

Abstract Motivation Cardiovascular disease is the primary cause of death globally, accounting for approximately 17.7 million deaths per year. One of the risk factors linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there has been no comprehensive analysis, assessment of diverse features, or implementation of various machine-learning (ML) algorithms for antihypertensive peptide (AHTP) model construction. Results In this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM), using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. Because ERT-based trained models performed consistently better than other algorithms regardless of the feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML algorithms (ERT, GB, RF and SVM), and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of the four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets. Availability and implementation The user-friendly online prediction tool, mAHTPred, is freely accessible at http://thegleelab.org/mAHTPred. Supplementary information Supplementary data are available at Bioinformatics online.


Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2158
Author(s):  
Xin Zhang ◽  
Jiwei Qin ◽  
Jiong Zheng

For personalized recommender systems, matrix factorization and its variants have become mainstream in collaborative filtering. However, the dot product in matrix factorization does not satisfy the triangle inequality and therefore fails to capture fine-grained information. Metric learning-based models have been shown to be better at capturing fine-grained information than matrix factorization. Nevertheless, most of these models only focus on rating data and social information, which are not sufficient for dealing with the challenges of data sparsity. In this paper, we propose a metric learning-based social recommendation model called SRMC. SRMC exploits users’ co-occurrence patterns to discover their potentially similar or dissimilar users with symmetric relationships and change their relative positions to achieve better recommendations. Experiments on three public datasets show that our model is more effective than the compared models.
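The triangle-inequality point in this abstract can be checked on a tiny example. The sketch below is illustrative only: it treats the negated dot product as a dissimilarity score (an assumption made here so that it can be compared with a distance), and the two user vectors and one item vector are made up.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def neg_dot(u, v):
    """Negated dot product, used here as a dissimilarity score."""
    return -dot(u, v)


def euclidean(u, v):
    """Euclidean distance, a true metric."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def satisfies_triangle(dist, x, y, z, tol=1e-12):
    """Check dist(x, y) <= dist(x, z) + dist(z, y) for one triple."""
    return dist(x, y) <= dist(x, z) + dist(z, y) + tol


# Hypothetical embeddings: two users who both match the same item.
user_a = [1.0, 0.0]
user_b = [0.0, 1.0]
item = [1.0, 1.0]

print(satisfies_triangle(neg_dot, user_a, user_b, item))
print(satisfies_triangle(euclidean, user_a, user_b, item))
```

With the dot product, both users can score highly against the same item while scoring zero against each other, so closeness to a shared item says nothing about closeness between the users. A metric such as Euclidean distance bounds the user-to-user distance by the two user-to-item distances, which is the property metric learning-based recommenders rely on.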


Author(s):  
Xiawu Zheng ◽  
Rongrong Ji ◽  
Xiaoshuai Sun ◽  
Yongjian Wu ◽  
Feiyue Huang ◽  
...  

Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemes are typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, the deep features are coarsely extracted at the image level rather than precisely at the object level, and are therefore disturbed by background clutter. On the other hand, training CNN features with a standard triplet loss is time-consuming and incapable of learning discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that overcomes these issues in a unified framework. Firstly, we introduce a novel centralized ranking loss (CRL), which achieves very efficient (1,000× training speedup compared with the triplet loss) and discriminative feature learning by a "centralized" global pooling. Secondly, a weakly supervised attractive feature extraction is proposed, which segments object contours with top-down saliency. Consequently, the contours are integrated into the CNN response map to precisely extract features "within" the target object. Interestingly, we have discovered that CRL and weakly supervised learning can reinforce each other. We evaluate the performance of the proposed scheme on widely used benchmarks including CUB200-2011 and CARS196. We have reported significant gains over the state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017] on CARS196, and 3.7% on CUB200-2011.


2021 ◽  
Vol 2050 (1) ◽  
pp. 012006
Author(s):  
Xili Dai ◽  
Chunmei Ma ◽  
Jingwei Sun ◽  
Tao Zhang ◽  
Haigang Gong ◽  
...  

Abstract Training deep neural networks from only a few examples has been an interesting topic that motivated few-shot learning. In this paper, we study the fine-grained image classification problem in a challenging few-shot learning setting, and propose the Self-Amplificated Network (SAN), a method based on meta-learning to tackle this problem. The SAN model consists of three parts: the Encoder, Amplification and Similarity Modules. The Encoder Module encodes a fine-grained image input into a feature vector. The Amplification Module amplifies subtle differences between fine-grained images based on a self-attention mechanism composed of multi-head attention. The Similarity Module measures how similar the query image and the support set are in order to determine the classification result. In-depth experiments on three benchmark datasets show that our network achieves superior performance over the competing baselines.
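For readers unfamiliar with the self-attention building block mentioned in this abstract, here is a minimal single-head sketch of scaled dot-product self-attention. It is not the SAN Amplification Module: the learned query, key and value projections and the multiple heads are omitted (queries, keys and values are simply the input tokens), and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all input tokens.
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return outputs


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(features):
    print(row)
```

Each output row mixes all input tokens, with larger weights on the tokens most similar to the query token. A multi-head version would run several such maps on separately projected inputs and concatenate the results.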

