Cross-Modality Person Re-Identification with Generative Adversarial Training

Author(s):  
Pingyang Dai ◽  
Rongrong Ji ◽  
Haibin Wang ◽  
Qiong Wu ◽  
Yuyu Huang

Person re-identification (Re-ID) is an important task in video surveillance that automatically searches for and identifies people across different cameras. Despite extensive Re-ID progress on RGB cameras, few works have studied Re-ID between infrared and RGB images, which is essentially a cross-modality problem widely encountered in real-world scenarios. The key challenge is two-fold: the lack of discriminative information to re-identify the same person between the RGB and infrared modalities, and the difficulty of learning a robust metric for such large-scale cross-modality retrieval. In this paper, we tackle both challenges by proposing a novel cross-modality generative adversarial network (termed cmGAN). To handle the issue of insufficient discriminative information, we leverage generative adversarial training to design a discriminator that learns discriminative feature representations from the different modalities. To handle the issue of large-scale cross-modality metric learning, we integrate both an identification loss and a cross-modality triplet loss, which together minimize inter-class ambiguity while maximizing cross-modality similarity among instances. The entire cmGAN can be trained end-to-end using a standard deep neural network framework. We quantify the performance of our approach on the newly-released SYSU RGB-IR Re-ID benchmark and report superior Cumulative Match Characteristic (CMC) and mean average precision (mAP) results over the state-of-the-art work [Wu et al., 2017].
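The two training signals described above can be sketched numerically. Below is a minimal NumPy illustration of an identification (softmax cross-entropy) loss combined with a cross-modality triplet loss; the function names, margin, and weighting are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def identification_loss(logits, label):
    # Softmax cross-entropy over identity classes.
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def cross_modality_triplet_loss(anchor_rgb, pos_ir, neg_ir, margin=0.3):
    # Hinge triplet loss: anchor from RGB, positive/negative from infrared,
    # pulling the same identity together across modalities.
    return max(0.0, euclidean(anchor_rgb, pos_ir)
                    - euclidean(anchor_rgb, neg_ir) + margin)

def combined_loss(logits, label, anchor, pos, neg, lam=1.0, margin=0.3):
    # Joint objective: identity discrimination plus cross-modality metric.
    return (identification_loss(logits, label)
            + lam * cross_modality_triplet_loss(anchor, pos, neg, margin))
```

The triplet term vanishes once the cross-modality negative is farther from the anchor than the positive by at least the margin, leaving only the identification term.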

Author(s):  
Hong Liu ◽  
Jie Li ◽  
Yongjian Wu ◽  
Rongrong Ji

The symmetric positive definite (SPD) matrix has attracted increasing research attention in image/video analysis, owing to its ability to capture Riemannian geometry in a structured 2D feature representation. However, computation on SPD matrices in the vector space cannot capture their geometric properties, which degrades classification performance. To this end, Riemannian deep networks have become a promising solution for SPD matrix classification because of their excellence in performing non-linear learning over SPD matrices. However, Riemannian metric learning typically adopts a kNN classifier that cannot be extended to large-scale datasets, which limits its application in many time-efficient scenarios. In this paper, we propose a Bag-of-Matrix-Summarization (BoMS) method to be combined with a Riemannian network, which handles the above issues towards a highly efficient and scalable SPD feature representation. Our key innovation lies in summarizing data in the Riemannian geometric space instead of the vector space. First, the whole training set is compressed into a small number of matrix features to ensure high scalability. Second, given such a compressed set, a constant-length vector representation is extracted by efficiently measuring the distribution variations between the summarized data and the latent features of the Riemannian network. Finally, the proposed BoMS descriptor is integrated into the Riemannian network, upon which the whole framework is trained end-to-end via matrix back-propagation. Experiments on four different classification tasks demonstrate the superior performance of the proposed method over state-of-the-art methods.
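Working in the Riemannian geometric space rather than the vector space typically means measuring distances after a matrix logarithm. A minimal NumPy sketch of the log-Euclidean distance, a standard Riemannian surrogate metric for SPD matrices (this is background machinery, not the paper's BoMS descriptor):

```python
import numpy as np

def spd_logm(S):
    # Matrix logarithm of an SPD matrix via eigendecomposition:
    # S = V diag(w) V^T  ->  log(S) = V diag(log w) V^T.
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(A, B):
    # Log-Euclidean distance: Frobenius norm between matrix logarithms,
    # respecting the manifold geometry that plain vector-space distance ignores.
    return float(np.linalg.norm(spd_logm(A) - spd_logm(B), 'fro'))
```

For example, the identity and a uniformly scaled identity matrix are separated by the log of the scale factor rather than by the raw entry difference.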


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2158
Author(s):  
Juan Du ◽  
Kuanhong Cheng ◽  
Yue Yu ◽  
Dabao Wang ◽  
Huixin Zhou

Panchromatic (PAN) images contain abundant spatial information that is useful for earth observation, but they always suffer from low resolution (LR) due to sensor limitations and the large-scale view field. Current super-resolution (SR) methods based on traditional attention mechanisms have shown remarkable advantages but remain imperfect at reconstructing the edge details of SR images. To address this problem, an improved SR model involving a self-attention augmented Wasserstein generative adversarial network (SAA-WGAN) is designed to mine reference information among multiple features for detail enhancement. We use an encoder-decoder network followed by a fully convolutional network (FCN) as the backbone to extract multi-scale features and reconstruct the high-resolution (HR) results. To exploit the relevance between multi-layer feature maps, we first integrate a convolutional block attention module (CBAM) into each skip-connection of the encoder-decoder subnet, generating weighted maps to automatically enhance both channel-wise and spatial-wise feature representations. Besides, considering that the HR results and LR inputs are highly similar in structure, yet this similarity cannot be fully reflected by traditional attention mechanisms, we therefore designed a self-augmented attention (SAA) module, where the attention weights are produced dynamically via a similarity function between hidden features; this design allows the network to flexibly adjust the relevance among multi-layer features and retain long-range inter-feature information, which helps preserve details. In addition, the pixel-wise loss is combined with perceptual and gradient losses to achieve comprehensive supervision. Experiments on benchmark datasets demonstrate that the proposed method outperforms other SR methods in terms of both objective evaluation and visual effect.
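Producing attention weights dynamically via a similarity function between hidden features, as the SAA module does, follows the general dot-product self-attention pattern. A minimal NumPy sketch under that assumption (the projection matrices and scaling are illustrative, not the paper's exact module):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Attention weights come from a similarity (scaled dot-product)
    # function between hidden features, so every position can draw on
    # long-range information from all other positions.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    return softmax(scores) @ V
```

Each output row is a convex combination of the value vectors, with the mixing weights adapted to the input features rather than fixed.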


2020 ◽  
Vol 28 (5) ◽  
pp. 975-988
Author(s):  
Sivamurugan Vellakani ◽  
Indumathi Pushbam

The human eye is affected by different diseases, including choroidal neovascularization (CNV), diabetic macular edema (DME) and age-related macular degeneration (AMD). This work aims to design an artificial intelligence (AI) based clinical decision support system for eye disease detection and classification, to help ophthalmologists detect and classify CNV, DME and drusen more effectively using Optical Coherence Tomography (OCT) images depicting different tissues. The methodology involves different deep learning convolutional neural network (CNN) models and long short-term memory (LSTM) networks. The best image captioning model is selected after a performance analysis comparing nine different image captioning systems for assisting ophthalmologists in detecting and classifying eye diseases. The quantitative analysis shows that the image captioning model designed using DenseNet201 with LSTM achieves superior performance, with an overall accuracy of 0.969, a positive predictive value of 0.972 and a true-positive rate of 0.969, using OCT images enhanced by a generative adversarial network (GAN). The corresponding values for the Xception with LSTM image captioning model are 0.969, 0.969 and 0.938, respectively. Thus, these two models yield superior performance and have the potential to assist ophthalmologists in making optimal diagnostic decisions.


2020 ◽  
Vol 34 (07) ◽  
pp. 10909-10916
Author(s):  
Ligong Han ◽  
Ruijiang Gao ◽  
Mun Kim ◽  
Xin Tao ◽  
Bo Liu ◽  
...  

Conditional generative adversarial networks have shown exceptional generation performance over the past few years. However, they require large numbers of annotations. To address this problem, we propose a novel generative adversarial network that utilizes weak supervision in the form of pairwise comparisons (PC-GAN) for image attribute editing. In light of Bayesian uncertainty estimation and noise-tolerant adversarial training, PC-GAN can estimate attribute ratings efficiently and demonstrates robust noise resistance. Through extensive experiments, we show both qualitatively and quantitatively that PC-GAN performs comparably with fully-supervised methods and outperforms unsupervised baselines. Code and supplementary material can be found on the project website*.
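Recovering latent attribute ratings from pairwise comparisons can be illustrated with the classic Bradley-Terry model; this is a well-known stand-in for intuition, not PC-GAN's actual Bayesian estimator:

```python
import numpy as np

def bradley_terry(wins, iters=200):
    # Minorize-maximize updates for the Bradley-Terry model.
    # wins[i][j] = number of comparisons in which item i was preferred
    # over item j; the fitted p[i] act as latent attribute ratings.
    W = np.asarray(wins, dtype=float)
    n = W.shape[0]
    p = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            denom = sum((W[i, j] + W[j, i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            if denom > 0:
                p[i] = W[i].sum() / denom
        p /= p.sum()  # fix the scale, which is otherwise unidentifiable
    return p
```

With two items where the first wins twice as often, the fitted ratings settle at a 2:1 ratio, matching the observed preference frequencies.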


2019 ◽  
Vol 9 (8) ◽  
pp. 1550 ◽  
Author(s):  
Aihong Shen ◽  
Huasheng Wang ◽  
Junjie Wang ◽  
Hongchen Tan ◽  
Xiuping Liu ◽  
...  

Person re-identification (re-ID) is a fundamental problem in the field of computer vision. The performance of deep learning-based person re-ID models suffers from a lack of training data. In this work, we introduce a novel image-specific data augmentation method at the feature map level to enforce feature diversity in the network. Furthermore, an attention assignment mechanism is proposed to ensure that the person re-ID classifier focuses on nearly all important regions of the input person image. To achieve this, a three-stage framework is proposed. First, a baseline classification network is trained for person re-ID. Second, an attention assignment network is built on the baseline network, in which the attention module learns to suppress the response of the currently detected regions and re-assign attention to other important locations. By this means, multiple regions important for classification are highlighted by the attention map. Finally, the attention map is integrated into the attention-aware adversarial network (AAA-Net), which generates high-performance classification results with an adversarial training strategy. We evaluate the proposed method on two large-scale benchmark datasets, Market1501 and DukeMTMC-reID. Experimental results show that our algorithm performs favorably against state-of-the-art methods.
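The suppress-and-reassign step in the second stage can be pictured with a simple array operation: zero out the strongest responses of the current attention map and renormalize, so previously weaker regions are highlighted. A toy sketch (the quantile threshold is an illustrative assumption; the paper learns this suppression):

```python
import numpy as np

def reassign_attention(attn, quantile=0.8):
    # Suppress the currently strongest responses, then renormalize so
    # attention shifts to the remaining (previously weaker) regions.
    attn = np.asarray(attn, dtype=float)
    thresh = np.quantile(attn, quantile)
    masked = np.where(attn >= thresh, 0.0, attn)
    total = masked.sum()
    return masked / total if total > 0 else masked
```

Applied iteratively, such a step forces the classifier to look beyond the single most salient region of the person image.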


2018 ◽  
Author(s):  
Gongbo Liang ◽  
Sajjad Fouladvand ◽  
Jie Zhang ◽  
Michael A. Brooks ◽  
Nathan Jacobs ◽  
...  

Abstract: Computed tomography (CT) is a widely-used diagnostic image modality routinely used for assessing anatomical tissue characteristics. However, non-standardized imaging protocols are commonplace, which poses a fundamental challenge in large-scale cross-center CT image analysis. One approach to address the problem is to standardize CT images using generative adversarial network (GAN) models. A GAN learns the data distribution of the training images and generates synthesized images under the same distribution. However, existing GAN models are not directly applicable to this task, mainly due to the lack of constraints on the mode of the data to generate. Furthermore, they treat every image equally, but in real applications some images are more difficult to standardize than others. All of this may lead to the lack-of-detail problem in CT image synthesis. We present a new GAN model called GANai to mitigate the differences in radiomic features, such as intensity, across CT images captured using non-standard imaging protocols. Given source images, GANai composes new images by specifying a high-level goal: the image features of the synthesized images should be similar to those of the standard images. GANai introduces an alternative improvement training strategy to alternately and steadily improve model performance. The new training strategy enables a series of technical improvements, including phase-specific loss functions, phase-specific training data, and the adoption of ensemble learning, leading to better model performance. The experimental results show that GANai is significantly better than the existing state-of-the-art image synthesis algorithms on CT image standardization. Also, it significantly improves the efficiency and stability of GAN model training.


Author(s):  
Zhou Zhao ◽  
Ben Gao ◽  
Vincent W. Zheng ◽  
Deng Cai ◽  
Xiaofei He ◽  
...  

Link prediction is a challenging problem in complex network analysis, arising in many disciplines such as social networks and telecommunication networks. Many existing approaches estimate the proximity of the link endpoints from their features or the local neighborhoods around them, and thus suffer from a localized view of network connections and insufficient discriminative feature representation. In this paper, we consider link prediction from the viewpoint of learning a discriminative path-based proximity ranking metric embedding. We propose a novel ranking metric network learning framework that jointly exploits both node-level and path-level attentional proximity of the endpoints for link prediction. We then develop a path-based dual-level reasoning attentional learning method with a recurrent neural network for proximity ranking metric embedding. Extensive experiments on two large-scale datasets show that our method achieves better performance than other state-of-the-art solutions to the problem.
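Path-based proximity between endpoints can be illustrated with the classic Katz index, which weights paths by length using powers of the adjacency matrix; this hand-crafted score is a stand-in for intuition, not the learned attentional proximity metric proposed in the paper:

```python
import numpy as np

def path_proximity(adj, i, j, max_len=3, beta=0.5):
    # Katz-style index: sum over path lengths k of beta^k times the
    # number of paths of length k between endpoints i and j
    # (A^k counts length-k walks in an unweighted graph).
    A = np.asarray(adj, dtype=float)
    P = np.zeros_like(A)
    Ak = np.eye(A.shape[0])
    for k in range(1, max_len + 1):
        Ak = Ak @ A
        P += beta ** k * Ak
    return float(P[i, j])
```

Unlike purely local neighborhood scores, this aggregates evidence from longer paths, which is the same global view the path-level attention aims to capture.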


2021 ◽  
Author(s):  
Zhen Liang ◽  
Xihao Zhang ◽  
Rushuang Zhou ◽  
Li Zhang ◽  
Linling Li ◽  
...  

How to effectively and efficiently extract valid and reliable features from high-dimensional electroencephalography (EEG) signals, and in particular how to fuse spatial and temporal dynamic brain information into a better feature representation, is a critical issue in brain data analysis. Most current EEG studies work in a task-driven manner and explore valid EEG features with a supervised model, which is limited by the given labels to a great extent. In this paper, we propose a practical hybrid unsupervised deep convolutional recurrent generative adversarial network for EEG feature characterization and fusion, termed EEGFuseNet. EEGFuseNet is trained in an unsupervised manner, and deep EEG features covering both spatial and temporal dynamics are characterized automatically. Compared to existing features, the characterized deep EEG features are more generic and independent of any specific EEG task. The performance of the deep, low-dimensional features extracted by EEGFuseNet is carefully evaluated in an unsupervised emotion recognition application on three public emotion databases. The results demonstrate that the proposed EEGFuseNet is a robust and reliable model that is easy to train and performs efficiently in the representation and fusion of dynamic EEG features. In particular, EEGFuseNet is established as an optimal unsupervised fusion model with promising cross-subject emotion recognition performance. This shows that EEGFuseNet is capable of characterizing and fusing deep features that reflect cortical dynamics corresponding to changes in emotional state, and demonstrates the possibility of realizing EEG-based cross-subject emotion recognition in a purely unsupervised manner.


2021 ◽  
Author(s):  
Khandakar Tanvir Ahmed ◽  
Jiao Sun ◽  
Jeongsik Yong ◽  
Wei Zhang

Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases such as cancer in the era of precision medicine. With the advent of high-throughput technologies, more comprehensive multi-omics data are now available that can effectively link genotype to phenotype. However, the interactive relations among multi-omics datasets make it particularly challenging to incorporate different biological layers to discover coherent biological signatures and predict phenotypic outcomes. In this study, we introduce omicsGAN, a generative adversarial network (GAN) model that integrates two omics datasets and their interaction network. The model captures information from the interaction network as well as the two omics datasets and fuses them to generate synthetic data with better predictive signals. Large-scale experiments on The Cancer Genome Atlas (TCGA) breast cancer and ovarian cancer datasets validate that (1) the model can effectively integrate two omics datasets (i.e., mRNA and microRNA expression data) and their interaction network (i.e., the microRNA-mRNA interaction network), and the synthetic omics data generated by the proposed model yield better performance on cancer outcome classification and patient survival prediction than the original omics datasets; and (2) the integrity of the interaction network plays a vital role in generating synthetic data with higher predictive quality: using a random interaction network does not allow the framework to learn meaningful information from the omics datasets and therefore results in synthetic data with weaker predictive signals.
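The role of the interaction network can be pictured with a simple propagation step: each omics layer is blended with the mean signal of its neighbors in the row-normalized bipartite microRNA-mRNA network. A toy sketch of this fusion idea (the blending weight and normalization are illustrative assumptions; omicsGAN learns the actual mapping adversarially):

```python
import numpy as np

def network_fuse(mrna, mirna, adj, alpha=0.5):
    # adj[g, m] = 1 if mRNA g interacts with microRNA m.
    # Rows of mrna/mirna are features, columns are samples.
    adj = np.asarray(adj, dtype=float)
    row = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    col = adj.T / np.maximum(adj.sum(axis=0, keepdims=True).T, 1e-12)
    # Each layer keeps a fraction alpha of its own signal and takes the
    # rest from the average of its interaction-network neighbors.
    mrna_new = alpha * mrna + (1 - alpha) * row @ mirna
    mirna_new = alpha * mirna + (1 - alpha) * col @ mrna
    return mrna_new, mirna_new
```

Replacing `adj` with a random network scrambles which neighbors are averaged, which mirrors the paper's finding that network integrity drives the quality of the fused data.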

