scholarly journals Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking

Author(s):  
Mang Ye ◽  
Zheng Wang ◽  
Xiangyuan Lan ◽  
Pong C. Yuen

Cross-modality person re-identification between the thermal and visible domains is extremely important for night-time surveillance applications. Existing works in this filed mainly focus on learning sharable feature representations to handle the cross-modality discrepancies. However, besides the cross-modality discrepancy caused by different camera spectrums, visible thermal person re-identification also suffers from large cross-modality and intra-modality variations caused by different camera views and human poses. In this paper, we propose a dual-path network with a novel bi-directional dual-constrained top-ranking loss to learn discriminative feature representations. It is advantageous in two aspects: 1) end-to-end feature learning directly from the data without extra metric learning steps, 2) it simultaneously handles the cross-modality and intra-modality variations to ensure the discriminability of the learnt representations. Meanwhile, identity loss is further incorporated to model the identity-specific information to handle large intra-class variations. Extensive experiments on two datasets demonstrate the superior performance compared to the state-of-the-arts.

Author(s):  
Xiawu Zheng ◽  
Rongrong Ji ◽  
Xiaoshuai Sun ◽  
Yongjian Wu ◽  
Feiyue Huang ◽  
...  

Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemesare typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, the deep features are coarsely extracted at image level rather than precisely at object level, which are interrupted by background clutters. On the other hand, training CNN features with a standard triplet loss is time consuming and incapable to learn discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that conquers these issues in a unified framework. Firstly, we introduce a novel centralized ranking loss (CRL), which achieves a very efficient (1,000times training speedup comparing to the triplet loss) and discriminative feature learning by a ?centralized? global pooling. Secondly, a weakly supervised attractive feature extraction is proposed, which segments object contours with top-down saliency. Consequently, the contours are integrated into the CNN response map to precisely extract features ?within? the target object. Interestingly, we have discovered that the combination of CRL and weakly supervised learning can reinforce each other. We evaluate the performance ofthe proposed scheme on widely-used benchmarks including CUB200-2011 and CARS196. We havereported significant gains over the state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017]on CARS196, and 3.7% on CUB200-2011.  


Author(s):  
Pingyang Dai ◽  
Rongrong Ji ◽  
Haibin Wang ◽  
Qiong Wu ◽  
Yuyu Huang

Person re-identification (Re-ID) is an important task in video surveillance which automatically searches and identifies people across different cameras. Despite the extensive Re-ID progress in RGB cameras, few works have studied the Re-ID between infrared and RGB images, which is essentially a cross-modality problem and widely encountered in real-world scenarios. The key challenge lies in two folds, i.e., the lack of discriminative information to re-identify the same person between RGB and infrared modalities, and the difficulty to learn a robust metric towards such a large-scale cross-modality retrieval. In this paper, we tackle the above two challenges by proposing a novel cross-modality generative adversarial network (termed cmGAN). To handle the issue of insufficient discriminative information, we leverage the cutting-edge generative adversarial training to design our own discriminator to learn discriminative feature representation from different modalities. To handle the issue of large-scale cross-modality metric learning, we integrates both identification loss and cross-modality triplet loss, which minimize inter-class ambiguity while maximizing cross-modality similarity among instances. The entire cmGAN can be trained in an end-to-end manner by using standard deep neural network framework. We have quantized the performance of our work in the newly-released SYSU RGB-IR Re-ID benchmark, and have reported superior performance, i.e., Cumulative Match Characteristic curve (CMC) and Mean Average Precision (MAP), over the state-of-the-art works [Wu et al., 2017], respectively.


Author(s):  
Xiawu Zheng ◽  
Rongrong Ji ◽  
Xiaoshuai Sun ◽  
Baochang Zhang ◽  
Yongjian Wu ◽  
...  

Recent advances on fine-grained image retrieval prefer learning convolutional neural network (CNN) with specific fullyconnect layer designed loss function for discriminative feature representation. Essentially, such loss should establish a robust metric to efficiently distinguish high-dimensional features within and outside fine-grained categories. To this end, the existing loss functions are defected in two aspects: (a) The feature relationship is encoded inside the training batch. Such a local scope leads to low accuracy. (b) The error is established by the mean square, which needs pairwise distance computation in training set and results in low efficiency. In this paper, we propose a novel metric learning scheme, termed Normalize-Scale Layer and Decorrelated Global Centralized Ranking Loss, which achieves extremely efficient and discriminative learning, i.e., 5× speedup over triplet loss and 12% recall boost on CARS196. Our method originates from the classic softmax loss, which has a global structure but does not directly optimize the distance metric as well as the inter/intra class distance. We tackle this issue through a hypersphere layer and a global centralized ranking loss with a pairwise decorrelated learning. In particular, we first propose a Normalize-Scale Layer to eliminate the gap between metric distance (for measuring distance in retrieval) and dot product (for dimension reduction in classification). Second, the relationship between features is encoded under a global centralized ranking loss, which targets at optimizing metric distance globally and accelerating learning procedure. Finally, the centers are further decorrelated by Gram-Schmidt process, leading to extreme efficiency (with 20 epochs in training procedure) and discriminability in feature learning. We have conducted quantitative evaluations on two fine-grained retrieval benchmark. The superior performance demonstrates the merits of the proposed approach over the state-of-the-arts.


Author(s):  
Yi Hao ◽  
Nannan Wang ◽  
Jie Li ◽  
Xinbo Gao

Person Re-identification(re-ID) has great potential to contribute to video surveillance that automatically searches and identifies people across different cameras. Heterogeneous person re-identification between thermal(infrared) and visible images is essentially a cross-modality problem and important for night-time surveillance application. Current methods usually train a model by combining classification and metric learning algorithms to obtain discriminative and robust feature representations. However, the combined loss function ignored the correlation between classification subspace and feature embedding subspace. In this paper, we use Sphere Softmax to learn a hypersphere manifold embedding and constrain the intra-modality variations and cross-modality variations on this hypersphere. We propose an end-to-end dualstream hypersphere manifold embedding network(HSMEnet) with both classification and identification constraint. Meanwhile, we design a two-stage training scheme to acquire decorrelated features, we refer the HSME with decorrelation as D-HSME. We conduct experiments on two crossmodality person re-identification datasets. Experimental results demonstrate that our method outperforms the state-of-the-art methods on two datasets. On RegDB dataset, rank-1 accuracy is improved from 33.47% to 50.85%, and mAP is improved from 31.83% to 47.00%.


2019 ◽  
Vol 11 (22) ◽  
pp. 2718 ◽  
Author(s):  
Zhe Meng ◽  
Lingling Li ◽  
Licheng Jiao ◽  
Zhixi Feng ◽  
Xu Tang ◽  
...  

The convolutional neural network (CNN) can automatically extract hierarchical feature representations from raw data and has recently achieved great success in the classification of hyperspectral images (HSIs). However, most CNN based methods used in HSI classification neglect adequately utilizing the strong complementary yet correlated information from each convolutional layer and only employ the last convolutional layer features for classification. In this paper, we propose a novel fully dense multiscale fusion network (FDMFN) that takes full advantage of the hierarchical features from all the convolutional layers for HSI classification. In the proposed network, shortcut connections are introduced between any two layers in a feed-forward manner, enabling features learned by each layer to be accessed by all subsequent layers. This fully dense connectivity pattern achieves comprehensive feature reuse and enforces discriminative feature learning. In addition, various spectral-spatial features with multiple scales from all convolutional layers are fused to extract more discriminative features for HSI classification. Experimental results on three widely used hyperspectral scenes demonstrate that the proposed FDMFN can achieve better classification performance in comparison with several state-of-the-art approaches.


Author(s):  
Yongguo Ling ◽  
Zhiming Luo ◽  
Yaojin Lin ◽  
Shaozi Li

The challenges of visible-thermal person re-identification (VT-ReID) lies in the inter-modality discrepancy and the intra-modality variations. An appropriate metric learning plays a crucial role in optimizing the feature similarity between the two modalities. However, most existing metric learning-based methods mainly constrain the similarity between individual instances or class centers, which are inadequate to explore the rich data relationships in the cross-modality data. Besides, most of these methods fail to consider the importance of different pairs, incurring an inefficiency and ineffectiveness of optimization. To address these issues, we propose a Multi-Constraint (MC) similarity learning method that jointly considers the cross-modality relationships from three different aspects, i.e., Instance-to-Instance (I2I), Center-to-Instance (C2I), and Center-to-Center (C2C). Moreover, we devise an Adaptive Weighting Loss (AWL) function to implement the MC efficiently. In the AWL, we first use an adaptive margin pair mining to select informative pairs and then adaptively adjust weights of mined pairs based on their similarity. Finally, the mined and weighted pairs are used for the metric learning. Extensive experiments on two benchmark datasets demonstrate the superior performance of the proposed over the state-of-the-art methods.


Author(s):  
Yang Yang ◽  
Yi-Feng Wu ◽  
De-Chuan Zhan ◽  
Zhi-Bin Liu ◽  
Yuan Jiang

In real-world applications, data are often with multiple modalities, and many multi-modal learning approaches are proposed for integrating the information from different sources. Most of the previous multi-modal methods utilize the modal consistency to reduce the complexity of the learning problem, therefore the modal completeness needs to be guaranteed. However, due to the data collection failures, self-deficiencies, and other various reasons, multi-modal instances are often incomplete in real applications, and have the inconsistent anomalies even in the complete instances, which jointly result in the inconsistent problem. These degenerate the multi-modal feature learning performance, and will finally affect the generalization abilities in different tasks. In this paper, we propose a novel Deep Robust Unsupervised Multi-modal Network structure (DRUMN) for solving this real problem within a unified framework. The proposed DRUMN can utilize the extrinsic heterogeneous information from unlabeled data against the insufficiency caused by the incompleteness. On the other hand, the inconsistent anomaly issue is solved with an adaptive weighted estimation, rather than adjusting the complex thresholds. As DRUMN can extract the discriminative feature representations for each modality, experiments on real-world multimodal datasets successfully validate the effectiveness of our proposed method.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hoanh Nguyen

Vehicle detection is a crucial task in autonomous driving systems. Due to large variance of scales and heavy occlusion of vehicle in an image, this task is still a challenging problem. Recent vehicle detection methods typically exploit feature pyramid to detect vehicles at different scales. However, the drawbacks in the design prevent the multiscale features from being completely exploited. This paper introduces a feature pyramid architecture to address this problem. In the proposed architecture, an improving region proposal network is designed to generate intermediate feature maps which are then used to add more discriminative representations to feature maps generated by the backbone network, as well as improving the computational cost of the network. To generate more discriminative feature representations, this paper introduces multilayer enhancement module to reweight feature representations of feature maps generated by the backbone network to increase the discrimination of foreground objects and background regions in each feature map. In addition, an adaptive RoI pooling module is proposed to pool features from all pyramid levels for each proposal and fuse them for the detection network. Experimental results on the KITTI vehicle detection benchmark and the PASCAL VOC 2007 car dataset show that the proposed approach obtains better detection performance compared with recent methods on vehicle detection.


2019 ◽  
Vol 11 (3) ◽  
pp. 281 ◽  
Author(s):  
Wei Xiong ◽  
Yafei Lv ◽  
Yaqi Cui ◽  
Xiaohan Zhang ◽  
Xiangqi Gu

Effective feature representations play a decisive role in content-based remote sensing image retrieval (CBRSIR). Recently, learning-based features have been widely used in CBRSIR and they show powerful ability of feature representations. In addition, a significant effort has been made to improve learning-based features from the perspective of the network structure. However, these learning-based features are not sufficiently discriminative for CBRSIR. In this paper, we propose two effective schemes for generating discriminative features for CBRSIR. In the first scheme, the attention mechanism and a new attention module are introduced to the Convolutional Neural Networks (CNNs) structure, causing more attention towards salient features, and the suppression of other features. In the second scheme, a multi-task learning network structure is proposed, to force learning-based features to be more discriminative, with inter-class dispersion and intra-class compaction, through penalizing the distances between the feature representations and their corresponding class centers. Then, a new method for constructing more challenging datasets is first used for remote sensing image retrieval, to better validate our schemes. Extensive experiments on challenging datasets are conducted to evaluate the effectiveness of our two schemes, and the comparison of the results demonstrate that our proposed schemes, especially the fusion of the two schemes, can improve the baseline methods by a significant margin.


Author(s):  
Sarah A. Ebiaredoh-Mienye ◽  
E. Esenogho ◽  
Theo G. Swart

Presently, the use of a credit card has become an integral part of contemporary banking and financial system. Predicting potential credit card defaulters or debtors is a crucial business opportunity for financial institutions. For now, some machine learning methods have been applied to achieve this task. However, with the dynamic and imbalanced nature of credit card default data, it is challenging for classical machine learning algorithms to proffer robust models with optimal performance. Research has shown that the performance of machine learning algorithms can be significantly improved when provided with optimal features. In this paper, we propose an unsupervised feature learning method to improve the performance of various classifiers using a stacked sparse autoencoder (SSAE). The SSAE was optimized to achieve improved performance. The proposed SSAE learned excellent feature representations that were used to train the classifiers. The performance of the proposed approach is compared with an instance where the classifiers were trained using the raw data. Also, a comparison is made with previous scholarly works, and the proposed approach showed superior performance over other methods.


Sign in / Sign up

Export Citation Format

Share Document