Invertible Autoencoder for Domain Adaptation

Yunfei Teng; Anna Choromanska

doi:10.3390/computation7020020

Computation ◽

10.3390/computation7020020 ◽

2019 ◽

Vol 7 (2) ◽

pp. 20 ◽

Cited By ~ 2

Author(s):

Yunfei Teng ◽

Anna Choromanska

Keyword(s):

Domain Adaptation ◽

State Of The Art ◽

Joint Probability ◽

Autonomous Driving ◽

Learning System ◽

Joint Probability Distribution ◽

Learning Problem ◽

Image Translation ◽

Benchmark Datasets ◽

Image Pairs

Unsupervised image-to-image translation aims at finding a mapping between the source (A) and target (B) image domains, where in many applications aligned image pairs are not available at training. This is an ill-posed learning problem since it requires inferring the joint probability distribution from marginals. Joint learning of coupled mappings F_AB : A → B and F_BA : B → A is commonly used by state-of-the-art methods, like CycleGAN, to learn this translation by introducing a cycle consistency requirement to the learning problem, i.e., F_AB(F_BA(B)) ≈ B and F_BA(F_AB(A)) ≈ A. Cycle consistency enforces the preservation of the mutual information between input and translated images. However, it does not explicitly enforce F_BA to be an inverse operation to F_AB. We propose a new deep architecture that we call invertible autoencoder (InvAuto) to explicitly enforce this relation. This is done by forcing an encoder to be an inverted version of the decoder, where corresponding layers perform opposite mappings and share parameters. The mappings are constrained to be orthonormal. The resulting architecture leads to a reduction of the number of trainable parameters (up to 2 times). We present image translation results on benchmark datasets and demonstrate state-of-the-art performance of our approach. Finally, we test the proposed domain adaptation method on the task of road video conversion. We demonstrate that the videos converted with InvAuto have high quality and show that the NVIDIA neural-network-based end-to-end learning system for autonomous driving, known as PilotNet, trained on real road videos performs well when tested on the converted ones.
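
The idea of an encoder and decoder that share orthonormal parameters and perform opposite mappings can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes plain linear layers with no nonlinearities or convolutions, and the function names (`random_orthonormal`, `encode`, `decode`) are hypothetical. Because the transpose of an orthonormal matrix is its inverse, the decoder can reuse the encoder weights and still invert it exactly.

```python
import numpy as np

def random_orthonormal(n, seed=0):
    """Return an n x n orthonormal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def encode(x, weights):
    """Apply each shared weight matrix in order: x -> W_k ... W_1 x."""
    for w in weights:
        x = w @ x
    return x

def decode(z, weights):
    """Invert encode by applying the transposes in reverse order.

    For an orthonormal W, W.T equals the inverse of W, so no extra parameters are needed.
    """
    for w in reversed(weights):
        z = w.T @ z
    return z

# Three illustrative layers acting on a 4-dimensional input (sizes are assumptions).
weights = [random_orthonormal(4, seed=s) for s in range(3)]
x = np.arange(4, dtype=float)
reconstruction_error = np.max(np.abs(decode(encode(x, weights), weights) - x))
```

In this toy setting the decoder holds no parameters of its own, which is the source of the roughly twofold parameter reduction mentioned in the abstract.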

Self-Ensembling Attention Networks: Addressing Domain Shift for Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015581 ◽

2019 ◽

Vol 33 ◽

pp. 5581-5588 ◽

Cited By ~ 3

Author(s):

Yonghao Xu ◽

Bo Du ◽

Lefei Zhang ◽

Qian Zhang ◽

Guoli Wang ◽

...

Keyword(s):

Domain Adaptation ◽

State Of The Art ◽

Semantic Segmentation ◽

Great Success ◽

Learning Models ◽

Target Domain ◽

Attention Networks ◽

Source Domain ◽

Benchmark Datasets ◽

Different Levels

Recent years have witnessed the great success of deep learning models in semantic segmentation. Nevertheless, these models may not generalize well to unseen image domains due to the phenomenon of domain shift. Since pixel-level annotations are laborious to collect, developing algorithms that can adapt labeled data from a source domain to a target domain is of great significance. To this end, we propose self-ensembling attention networks to reduce the domain gap between different datasets. To the best of our knowledge, the proposed method is the first attempt to introduce a self-ensembling model to domain adaptation for semantic segmentation, which provides a different view on how to learn domain-invariant features. In addition, since different regions in an image usually correspond to different levels of domain gap, we introduce the attention mechanism into the proposed framework to generate attention-aware features, which are further utilized to guide the calculation of the consistency loss in the target domain. Experiments on two benchmark datasets demonstrate that the proposed framework can yield competitive performance compared with state-of-the-art methods.
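
A rough sketch of how an attention map might guide a consistency loss in a self-ensembling setup is given below. The abstract does not specify the details, so several things here are assumptions: the teacher is taken to be an exponential moving average of the student (a common self-ensembling choice), the loss is a squared difference between class probabilities, and the names `ema_update` and `weighted_consistency_loss` are hypothetical.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Move each teacher parameter toward the student parameter (assumed EMA teacher)."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

def weighted_consistency_loss(student_probs, teacher_probs, attention):
    """Attention-weighted mean of the per-pixel squared difference between predictions.

    student_probs, teacher_probs: arrays of shape (H, W, C) with class probabilities.
    attention: array of shape (H, W) with non-negative weights, not all zero.
    """
    per_pixel = np.sum((student_probs - teacher_probs) ** 2, axis=-1)
    return float(np.sum(attention * per_pixel) / np.sum(attention))

# Toy example on a 2 x 2 image with 3 classes (all values are illustrative).
teacher = ema_update({"w": np.zeros(2)}, {"w": np.ones(2)})
probs = np.full((2, 2, 3), 1.0 / 3.0)
attention = np.array([[1.0, 0.5], [0.5, 0.0]])
loss_same = weighted_consistency_loss(probs, probs, attention)
```

Pixels with a larger attention weight contribute more to the loss, so regions assumed to carry a larger domain gap dominate the consistency term.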

Bidirectional Adversarial Training for Semi-Supervised Domain Adaptation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/130 ◽

2020 ◽

Author(s):

Pin Jiang ◽

Aming Wu ◽

Yahong Han ◽

Yunfeng Shao ◽

Meiyu Qi ◽

...

Keyword(s):

Additional Data ◽

Domain Adaptation ◽

State Of The Art ◽

Powerful Method ◽

Target Domain ◽

Unsupervised Domain Adaptation ◽

Benchmark Datasets ◽

Adversarial Examples ◽

Adversarial Training ◽

Effective Use

Semi-supervised domain adaptation (SSDA) is a novel branch of machine learning in which, compared with unsupervised domain adaptation, a small number of labeled target examples are available. To make effective use of these additional data so as to bridge the domain gap, one possible way is to generate adversarial examples, which are images with additional perturbations, between the two domains and fill the domain gap. Adversarial training has been proven to be a powerful method for this purpose. However, traditional adversarial training adds noise in arbitrary directions, which is inefficient for migrating between domains, or generates directional noise from the source to the target domain and the reverse. In this work, we devise a general bidirectional adversarial training method and employ gradients to guide adversarial examples across the domain gap, i.e., Adaptive Adversarial Training (AAT) for the source-to-target direction and Entropy-penalized Virtual Adversarial Training (E-VAT) for the target-to-source direction. In particular, we devise a Bidirectional Adversarial Training (BiAT) network to perform diverse adversarial trainings jointly. We evaluate the effectiveness of BiAT on three benchmark datasets, and experimental results demonstrate that the proposed method achieves state-of-the-art performance.

Transductive Relation-Propagation Network for Few-shot Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/112 ◽

2020 ◽

Author(s):

Yuqing Ma ◽

Shihao Bai ◽

Shan An ◽

Wei Liu ◽

Aishan Liu ◽

...

Keyword(s):

Neural Network ◽

Learning Strategy ◽

State Of The Art ◽

Challenging Problem ◽

Learning Problem ◽

Graph Node ◽

Learning Methods ◽

Transductive Learning ◽

Benchmark Datasets ◽

Propagation Network

Few-shot learning, aiming to learn novel concepts from few labeled examples, is an interesting and very challenging problem with many practical advantages. To accomplish this task, one should concentrate on revealing the accurate relations of the support-query pairs. We propose a transductive relation-propagation graph neural network (TRPN) to explicitly model and propagate such relations across support-query pairs. Our TRPN treats the relation of each support-query pair as a graph node, named relational node, and resorts to the known relations between support samples, including both intra-class commonality and inter-class uniqueness, to guide the relation propagation in the graph, generating the discriminative relation embeddings for support-query pairs. A pseudo relational node is further introduced to propagate the query characteristics, and a fast, yet effective transductive learning strategy is devised to fully exploit the relation information among different queries. To the best of our knowledge, this is the first work that explicitly takes the relations of support-query pairs into consideration in few-shot learning, which might offer a new way to solve the few-shot learning problem. Extensive experiments conducted on several benchmark datasets demonstrate that our method can significantly outperform a variety of state-of-the-art few-shot learning methods.

SBAT: Video Captioning with Sparse Boundary-Aware Transformer

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/88 ◽

2020 ◽

Author(s):

Tao Jin ◽

Siyu Huang ◽

Ming Chen ◽

Yingming Li ◽

Zhongfei Zhang

Keyword(s):

State Of The Art ◽

Multimodal Interaction ◽

Multimodal Learning ◽

Local Correlation ◽

Learning Problem ◽

Generation Task ◽

Language Generation ◽

Video Captioning ◽

Benchmark Datasets ◽

Novel Method

In this paper, we focus on the problem of applying the transformer structure to video captioning effectively. The vanilla transformer was proposed for uni-modal language generation tasks such as machine translation. However, video captioning is a multimodal learning problem, and the video features have much redundancy between different time steps. Based on these concerns, we propose a novel method called sparse boundary-aware transformer (SBAT) to reduce the redundancy in video representation. SBAT employs a boundary-aware pooling operation on the scores from multi-head attention and selects diverse features from different scenarios. SBAT also includes a local correlation scheme to compensate for the local information loss caused by the sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to boost the multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms the state-of-the-art methods under most of the metrics.

Coarse-to-Fine Satellite Images Change Detection Framework via Boundary-Aware Attentive Network

Sensors ◽

10.3390/s20236735 ◽

2020 ◽

Vol 20 (23) ◽

pp. 6735

Author(s):

Yi Zhang ◽

Shizhou Zhang ◽

Ying Li ◽

Yanning Zhang

Keyword(s):

Change Detection ◽

Land Surface ◽

Satellite Images ◽

State Of The Art ◽

High Resolution Satellite Images ◽

Benchmark Datasets ◽

Image Pairs ◽

Learning Frameworks ◽

Coarse To Fine ◽

Change Map

Timely and accurate change detection on satellite images using computer vision techniques has attracted considerable research effort in recent years. Existing approaches based on deep learning frameworks have achieved good performance for the task of change detection on satellite images. However, when the changed areas on the land surface are disjoint and vary in shape, existing methods still have shortcomings in detecting all changed areas correctly and in representing the boundaries of the changed areas. To deal with these problems, we design a coarse-to-fine detection framework via a boundary-aware attentive network with a hybrid loss to detect changes in high-resolution satellite images. Specifically, we first use an attention-guided encoder-decoder subnet to obtain the coarse change map of the bi-temporal image pairs, and then apply residual learning to obtain the refined change map. We also propose a hybrid loss to provide supervision at the pixel, patch, and map levels. Comprehensive experiments are conducted on two benchmark datasets, LEBEDEV and SZTAKI, to verify the effectiveness of the proposed method, and the experimental results show that our model achieves state-of-the-art performance.

Zero-shot Metric Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/555 ◽

2019 ◽

Cited By ~ 1

Author(s):

Xinyi Xu ◽

Huanhuan Cao ◽

Yanhua Yang ◽

Erkun Yang ◽

Cheng Deng

Keyword(s):

State Of The Art ◽

Metric Learning ◽

Visual Similarity ◽

Distance Metric ◽

Learning Problem ◽

Combine Data ◽

Benchmark Datasets ◽

Novel Method ◽

Multiple Relation ◽

Continuous Relation

In this work, we tackle the zero-shot metric learning problem and propose a novel method, abbreviated as ZSML, with the aim of learning a distance metric that measures the similarity of unseen categories (even unseen datasets). ZSML achieves strong transferability by capturing multi-nonlinear yet continuous relations among data. It is motivated by two facts: 1) relations can be essentially described from various perspectives; and 2) traditional binary supervision is insufficient to represent continuous visual similarity. Specifically, we first reformulate a collection of specific-shaped convolutional kernels to combine data pairs and generate multiple relation vectors. Furthermore, we design a new cross-update regression loss to discover continuous similarity. Extensive experiments, including intra-dataset transfer and inter-dataset transfer on four benchmark datasets, demonstrate that ZSML can achieve state-of-the-art performance.

Online Multiple Object Tracking Using a Novel Discriminative Module for Autonomous Driving

Electronics ◽

10.3390/electronics10202479 ◽

2021 ◽

Vol 10 (20) ◽

pp. 2479

Author(s):

Jia Chen ◽

Fan Wang ◽

Chunjiang Li ◽

Yingjie Zhang ◽

Yibo Ai ◽

...

Keyword(s):

Object Tracking ◽

State Of The Art ◽

Autonomous Driving ◽

Driving Safety ◽

Multiple Objects ◽

Automatic Driving ◽

Research Technology ◽

Experimental Part ◽

Sensing System ◽

Benchmark Datasets

Multi-object tracking (MOT) is a key research technology in the environment sensing system of automatic driving and is very important to driving safety. Online multi-object tracking must accurately extend the trajectories of multiple objects without using future frame information, which makes it more challenging. Most existing online MOT methods are based on anchor-based detectors, which suffer from many misdetections and missed detections and perform poorly on the trajectory extension of adjacent objects when they are occluded or overlap. In this paper, we propose a discrimination learning online tracker, based on an anchor-free detector, that can effectively solve the occlusion problem. This method uses the different weight characteristics of the object when occlusion occurs and realizes the extension of the competition trajectory through the discrimination module to prevent the ID-switch problem. In the experimental part, we compare the algorithm with other trackers on two public benchmark datasets, MOT16 and MOT17, show that it achieves state-of-the-art performance, and conduct a qualitative analysis on the autonomous driving dataset KITTI.

Deep Multi-Task Learning with Adversarial-and-Cooperative Nets

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/566 ◽

2019 ◽

Cited By ~ 1

Author(s):

Pei Yang ◽

Qi Tan ◽

Jieping Ye ◽

Hanghang Tong ◽

Jingrui He

Keyword(s):

Knowledge Sharing ◽

Domain Adaptation ◽

State Of The Art ◽

Specific Knowledge ◽

Fine Grained ◽

Task Learning ◽

Combine Strategy ◽

Benchmark Datasets ◽

Adaptation Scenarios ◽

And Task

In this paper, we propose a deep multi-Task learning model based on Adversarial-and-COoperative nets (TACO). The goal is to use an adversarial-and-cooperative strategy to decouple the task-common and task-specific knowledge, facilitating the fine-grained knowledge sharing among tasks. TACO accommodates multiple game players, i.e., feature extractors, domain discriminator, and tri-classifiers. They play the MinMax games adversarially and cooperatively to distill the task-common and task-specific features, while respecting their discriminative structures. Moreover, it adopts a divide-and-combine strategy to leverage the decoupled multi-view information to further improve the generalization performance of the model. The experimental results show that our proposed method significantly outperforms the state-of-the-art algorithms on the benchmark datasets in both multi-task learning and semi-supervised domain adaptation scenarios.

JOINT PROBABILITY DISTRIBUTION MODEL OF WIND AND WAVES FOR OFFSHORE WIND TURBINE IN JAPAN SEA AREA

Journal of Japan Society of Civil Engineers Ser B3 (Ocean Engineering) ◽

10.2208/jscejoe.75.i_31 ◽

2019 ◽

Vol 75 (2) ◽

pp. I_31-I_36

Author(s):

Yoji TANAKA ◽

Takeshi YOSHIOKA ◽

Keiji NAKAI ◽

Toshihiko NAGAI

Keyword(s):

Probability Distribution ◽

Wind Turbine ◽

Joint Probability ◽

Japan Sea ◽

Offshore Wind ◽

Distribution Model ◽

Offshore Wind Turbine ◽

Joint Probability Distribution ◽

Sea Area

BiLabel-Specific Features for Multi-Label Classification

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3458283 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-23

Author(s):

Min-Ling Zhang ◽

Jun-Peng Fang ◽

Yi-Bo Wang

Keyword(s):

Predictive Models ◽

Comparative Studies ◽

State Of The Art ◽

Classification Model ◽

Generation Process ◽

Prototype Selection ◽

Class Label ◽

Benchmark Datasets ◽

Label Correlations ◽

Class Labels

In multi-label classification, the task is to induce predictive models that can assign a set of relevant labels to an unseen instance. The strategy of label-specific features has been widely employed in learning from multi-label examples, where the classification model for predicting the relevancy of each class label is induced based on its tailored features rather than the original features. Existing approaches work by generating a group of tailored features for each class label independently, where label correlations are not fully considered in the label-specific feature generation process. In this article, we extend the existing strategy by proposing a simple yet effective approach based on BiLabel-specific features. Specifically, a group of tailored features is generated for a pair of class labels with heuristic prototype selection and embedding. Thereafter, predictions of classifiers induced by BiLabel-specific features are ensembled to determine the relevancy of each class label for an unseen instance. To thoroughly evaluate the BiLabel-specific features strategy, extensive experiments are conducted over a total of 35 benchmark datasets. Comparative studies against state-of-the-art label-specific features techniques clearly validate the superiority of utilizing BiLabel-specific features to yield stronger generalization performance for multi-label classification.
