scholarly journals Invertible Autoencoder for Domain Adaptation

Computation ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 20 ◽  
Author(s):  
Yunfei Teng ◽  
Anna Choromanska

The unsupervised image-to-image translation aims at finding a mapping between the source ( A ) and target ( B ) image domains, where in many applications aligned image pairs are not available at training. This is an ill-posed learning problem since it requires inferring the joint probability distribution from marginals. Joint learning of coupled mappings F A B : A → B and F B A : B → A is commonly used by the state-of-the-art methods, like CycleGAN to learn this translation by introducing cycle consistency requirement to the learning problem, i.e., F A B ( F B A ( B ) ) ≈ B and F B A ( F A B ( A ) ) ≈ A . Cycle consistency enforces the preservation of the mutual information between input and translated images. However, it does not explicitly enforce F B A to be an inverse operation to F A B . We propose a new deep architecture that we call invertible autoencoder (InvAuto) to explicitly enforce this relation. This is done by forcing an encoder to be an inverted version of the decoder, where corresponding layers perform opposite mappings and share parameters. The mappings are constrained to be orthonormal. The resulting architecture leads to the reduction of the number of trainable parameters (up to 2 times). We present image translation results on benchmark datasets and demonstrate state-of-the art performance of our approach. Finally, we test the proposed domain adaptation method on the task of road video conversion. We demonstrate that the videos converted with InvAuto have high quality and show that the NVIDIA neural-network-based end-to-end learning system for autonomous driving, known as PilotNet, trained on real road videos performs well when tested on the converted ones.

Author(s):  
Yonghao Xu ◽  
Bo Du ◽  
Lefei Zhang ◽  
Qian Zhang ◽  
Guoli Wang ◽  
...  

Recent years have witnessed the great success of deep learning models in semantic segmentation. Nevertheless, these models may not generalize well to unseen image domains due to the phenomenon of domain shift. Since pixel-level annotations are laborious to collect, developing algorithms which can adapt labeled data from source domain to target domain is of great significance. To this end, we propose self-ensembling attention networks to reduce the domain gap between different datasets. To the best of our knowledge, the proposed method is the first attempt to introduce selfensembling model to domain adaptation for semantic segmentation, which provides a different view on how to learn domain-invariant features. Besides, since different regions in the image usually correspond to different levels of domain gap, we introduce the attention mechanism into the proposed framework to generate attention-aware features, which are further utilized to guide the calculation of consistency loss in the target domain. Experiments on two benchmark datasets demonstrate that the proposed framework can yield competitive performance compared with the state of the art methods.


Author(s):  
Pin Jiang ◽  
Aming Wu ◽  
Yahong Han ◽  
Yunfeng Shao ◽  
Meiyu Qi ◽  
...  

Semi-supervised domain adaptation (SSDA) is a novel branch of machine learning that scarce labeled target examples are available, compared with unsupervised domain adaptation. To make effective use of these additional data so as to bridge the domain gap, one possible way is to generate adversarial examples, which are images with additional perturbations, between the two domains and fill the domain gap. Adversarial training has been proven to be a powerful method for this purpose. However, the traditional adversarial training adds noises in arbitrary directions, which is inefficient to migrate between domains, or generate directional noises from the source to target domain and reverse. In this work, we devise a general bidirectional adversarial training method and employ gradient to guide adversarial examples across the domain gap, i.e., the Adaptive Adversarial Training (AAT) for source to target domain and Entropy-penalized Virtual Adversarial Training (E-VAT) for target to source domain. Particularly, we devise a Bidirectional Adversarial Training (BiAT) network to perform diverse adversarial trainings jointly. We evaluate the effectiveness of BiAT on three benchmark datasets and experimental results demonstrate the proposed method achieves the state-of-the-art.


Author(s):  
Yuqing Ma ◽  
Shihao Bai ◽  
Shan An ◽  
Wei Liu ◽  
Aishan Liu ◽  
...  

Few-shot learning, aiming to learn novel concepts from few labeled examples, is an interesting and very challenging problem with many practical advantages. To accomplish this task, one should concentrate on revealing the accurate relations of the support-query pairs. We propose a transductive relation-propagation graph neural network (TRPN) to explicitly model and propagate such relations across support-query pairs. Our TRPN treats the relation of each support-query pair as a graph node, named relational node, and resorts to the known relations between support samples, including both intra-class commonality and inter-class uniqueness, to guide the relation propagation in the graph, generating the discriminative relation embeddings for support-query pairs. A pseudo relational node is further introduced to propagate the query characteristics, and a fast, yet effective transductive learning strategy is devised to fully exploit the relation information among different queries. To the best of our knowledge, this is the first work that explicitly takes the relations of support-query pairs into consideration in few-shot learning, which might offer a new way to solve the few-shot learning problem. Extensive experiments conducted on several benchmark datasets demonstrate that our method can significantly outperform a variety of state-of-the-art few-shot learning methods.


Author(s):  
Tao Jin ◽  
Siyu Huang ◽  
Ming Chen ◽  
Yingming Li ◽  
Zhongfei Zhang

In this paper, we focus on the problem of applying the transformer structure to video captioning effectively. The vanilla transformer is proposed for uni-modal language generation task such as machine translation. However, video captioning is a multimodal learning problem, and the video features have much redundancy between different time steps. Based on these concerns, we propose a novel method called sparse boundary-aware transformer (SBAT) to reduce the redundancy in video representation. SBAT employs boundary-aware pooling operation for scores from multihead attention and selects diverse features from different scenarios. Also, SBAT includes a local correlation scheme to compensate for the local information loss brought by sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to boost the multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms the state-of-the-art methods under most of the metrics.


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6735
Author(s):  
Yi Zhang ◽  
Shizhou Zhang ◽  
Ying Li ◽  
Yanning Zhang

Timely and accurate change detection on satellite images by using computer vision techniques has been attracting lots of research efforts in recent years. Existing approaches based on deep learning frameworks have achieved good performance for the task of change detection on satellite images. However, under the scenario of disjoint changed areas in various shapes on land surface, existing methods still have shortcomings in detecting all changed areas correctly and representing the changed areas boundary. To deal with these problems, we design a coarse-to-fine detection framework via a boundary-aware attentive network with a hybrid loss to detect the change in high resolution satellite images. Specifically, we first perform an attention guided encoder-decoder subnet to obtain the coarse change map of the bi-temporal image pairs, and then apply residual learning to obtain the refined change map. We also propose a hybrid loss to provide the supervision from pixel, patch, and map levels. Comprehensive experiments are conducted on two benchmark datasets: LEBEDEV and SZTAKI to verify the effectiveness of the proposed method and the experimental results show that our model achieves state-of-the-art performance.


Author(s):  
Xinyi Xu ◽  
Huanhuan Cao ◽  
Yanhua Yang ◽  
Erkun Yang ◽  
Cheng Deng

In this work, we tackle the zero-shot metric learning problem and propose a novel method abbreviated as ZSML, with the purpose to learn a distance metric that measures the similarity of unseen categories (even unseen datasets). ZSML achieves strong transferability by capturing multi-nonlinear yet continuous relation among data. It is motivated by two facts: 1) relations can be essentially described from various perspectives; and 2) traditional binary supervision is insufficient to represent continuous visual similarity. Specifically, we first reformulate a collection of specific-shaped convolutional kernels to combine data pairs and generate multiple relation vectors. Furthermore, we design a new cross-update regression loss to discover continuous similarity. Extensive experiments including intra-dataset transfer and inter-dataset transfer on four benchmark datasets demonstrate that ZSML can achieve state-of-the-art performance.


Electronics ◽  
2021 ◽  
Vol 10 (20) ◽  
pp. 2479
Author(s):  
Jia Chen ◽  
Fan Wang ◽  
Chunjiang Li ◽  
Yingjie Zhang ◽  
Yibo Ai ◽  
...  

Multi object tracking (MOT) is a key research technology in the environment sensing system of automatic driving, which is very important to driving safety. Online multi object tracking needs to accurately extend the trajectory of multiple objects without using future frame information, so it will face greater challenges. Most of the existing online MOT methods are anchor-based detectors, which have many misdetections and missed detection problems, and have a poor effect on the trajectory extension of adjacent object objects when they are occluded and overlapped. In this paper, we propose a discrimination learning online tracker that can effectively solve the occlusion problem based on an anchor-free detector. This method uses the different weight characteristics of the object when the occlusion occurs and realizes the extension of the competition trajectory through the discrimination module to prevent the ID-switch problem. In the experimental part, we compared the algorithm with other trackers on two public benchmark datasets, MOT16 and MOT17, and proved that our algorithm has achieved state-of-the-art performance, and conducted a qualitative analysis on the convincing autonomous driving dataset KITTI.


Author(s):  
Pei Yang ◽  
Qi Tan ◽  
Jieping Ye ◽  
Hanghang Tong ◽  
Jingrui He

In this paper, we propose a deep multi-Task learning model based on Adversarial-and-COoperative nets (TACO). The goal is to use an adversarial-and-cooperative strategy to decouple the task-common and task-specific knowledge, facilitating the fine-grained knowledge sharing among tasks. TACO accommodates multiple game players, i.e., feature extractors, domain discriminator, and tri-classifiers. They play the MinMax games adversarially and cooperatively to distill the task-common and task-specific features, while respecting their discriminative structures. Moreover, it adopts a divide-and-combine strategy to leverage the decoupled multi-view information to further improve the generalization performance of the model. The experimental results show that our proposed method significantly outperforms the state-of-the-art algorithms on the benchmark datasets in both multi-task learning and semi-supervised domain adaptation scenarios.


2021 ◽  
Vol 16 (1) ◽  
pp. 1-23
Author(s):  
Min-Ling Zhang ◽  
Jun-Peng Fang ◽  
Yi-Bo Wang

In multi-label classification, the task is to induce predictive models which can assign a set of relevant labels for the unseen instance. The strategy of label-specific features has been widely employed in learning from multi-label examples, where the classification model for predicting the relevancy of each class label is induced based on its tailored features rather than the original features. Existing approaches work by generating a group of tailored features for each class label independently, where label correlations are not fully considered in the label-specific features generation process. In this article, we extend existing strategy by proposing a simple yet effective approach based on BiLabel-specific features. Specifically, a group of tailored features is generated for a pair of class labels with heuristic prototype selection and embedding. Thereafter, predictions of classifiers induced by BiLabel-specific features are ensembled to determine the relevancy of each class label for unseen instance. To thoroughly evaluate the BiLabel-specific features strategy, extensive experiments are conducted over a total of 35 benchmark datasets. Comparative studies against state-of-the-art label-specific features techniques clearly validate the superiority of utilizing BiLabel-specific features to yield stronger generalization performance for multi-label classification.


Sign in / Sign up

Export Citation Format

Share Document