Towards Unsupervised Deformable-Instances Image-to-Image Translation

Author(s):  
Sitong Su ◽  
Jingkuan Song ◽  
Lianli Gao ◽  
Junchen Zhu

Replacing objects in images is a practical functionality of Photoshop, e.g., clothes changing. This task is defined as Unsupervised Deformable-Instances Image-to-Image Translation (UDIT), which maps multiple foreground instances from a source domain to a target domain, involving significant changes in shape. In this paper, we propose an effective pipeline named Mask-Guided Deformable-instances GAN (MGD-GAN), which first generates target masks in batch and then utilizes them to synthesize corresponding instances on the background image, with all instances efficiently translated and the background well preserved. To improve the quality of synthesized images and stabilize training, we design a training procedure that transforms the unsupervised mask-to-instance process into a supervised one by creating paired examples. To objectively evaluate the UDIT task, we design new evaluation metrics based on object detection. Extensive experiments on four datasets demonstrate the significant advantages of our MGD-GAN over existing methods, both quantitatively and qualitatively. Furthermore, our training time is greatly reduced compared to the state of the art. The code is available at https://github.com/sitongsu/MGD_GAN.
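As an illustrative sketch of the two-stage pipeline described above (the sub-network names and the composition rule are our assumptions, not the paper's actual architecture):

```python
import torch
import torch.nn as nn

class MGDPipelineSketch(nn.Module):
    """Mask-guided two-stage translation, sketched with stand-in modules.

    Stage 1 translates source instance masks into target-shaped masks;
    stage 2 synthesizes instances onto the preserved background.
    """
    def __init__(self, mask_gen: nn.Module, inst_gen: nn.Module):
        super().__init__()
        self.mask_gen = mask_gen  # hypothetical mask-translation sub-network
        self.inst_gen = inst_gen  # hypothetical instance-synthesis sub-network

    def forward(self, src_masks, background):
        tgt_masks = self.mask_gen(src_masks)              # stage 1: batch mask generation
        instances = self.inst_gen(tgt_masks, background)  # stage 2: instance synthesis
        # Composite so the background is kept wherever no instance is drawn.
        return tgt_masks * instances + (1 - tgt_masks) * background
```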

2021 ◽  
Author(s):  
Tien Tai Doan ◽  
Guillaume Ghyselinck ◽  
Blaise Hanczar

We propose a new method of multimodal image translation, called InfoMUNIT, which is an extension of the state-of-the-art method MUNIT. Our method allows controlling the style of the generated images and improves their quality and diversity. It learns to maximize the mutual information between a subset of the style code and the distribution of the output images. Experiments show that our model can not only translate one image from the source domain to multiple images in the target domain but also explore and manipulate features of the outputs without annotation. Furthermore, it achieves superior diversity and competitive image quality compared to state-of-the-art methods in multiple image translation tasks.
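The mutual-information objective can be illustrated with an InfoGAN-style auxiliary predictor; the sketch below assumes a hypothetical network `q_net` that regresses the controlled subset of the style code from the generated image (one plausible reading, not InfoMUNIT's exact formulation):

```python
import torch.nn.functional as F

def info_loss(q_net, fake_image, style_subset):
    """Variational lower bound on mutual information, InfoGAN-style.

    Minimizing the error of q_net's prediction of the controlled style
    code from the generated image maximizes a lower bound on
    I(style_subset; fake_image).
    """
    return F.mse_loss(q_net(fake_image), style_subset)
```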


2020 ◽  
Vol 34 (04) ◽  
pp. 4099-4106
Author(s):  
Yuwei He ◽  
Xiaoming Jin ◽  
Guiguang Ding ◽  
Yuchen Guo ◽  
Jungong Han ◽  
...  

Instance-correspondence (IC) data are potent resources for heterogeneous transfer learning (HeTL) due to their capability of bridging the source and target domains at the instance level. To this end, people tend to use machine-generated IC data, because manually establishing IC data is expensive and laborious. However, existing IC data generators are not perfect and often produce data of low quality, thus hampering the performance of domain adaptation. In this paper, instead of improving the IC data generator, which may not be the optimal approach, we accept the fact that data quality varies and find a better way to use the data. Specifically, we propose a novel heterogeneous transfer learning method named Transfer Learning with Weighted Correspondence (TLWC), which utilizes IC data to adapt the source domain to the target domain. Rather than treating all IC data equally, TLWC assigns each IC data pair a weight that reflects the quality of the data. We conduct extensive experiments on HeTL datasets, and the state-of-the-art results verify the effectiveness of TLWC.
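The weighting idea can be sketched as a per-pair weighted alignment term; the feature projections and weight scores below are illustrative stand-ins rather than the paper's actual formulation:

```python
import torch

def weighted_correspondence_loss(f_src, f_tgt, weights):
    """Quality-weighted alignment of instance-correspondence pairs.

    f_src, f_tgt : (n, d) projected features of n IC pairs
    weights      : (n,) per-pair quality scores in [0, 1]
    Low-quality machine-generated pairs receive small weights and
    contribute little to the alignment objective.
    """
    per_pair = ((f_src - f_tgt) ** 2).sum(dim=1)  # squared distance per pair
    return (weights * per_pair).sum() / weights.sum().clamp(min=1e-8)
```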


Author(s):  
Masoumeh Zareapoor ◽  
Jie Yang

Image-to-image translation aims to learn a mapping from a source domain to a target domain. However, this task faces three main challenges: lack of paired datasets, multimodality, and diversity. Convolutional neural networks (CNNs), despite their strong performance in many computer vision tasks, fail to capture the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representational model we seek. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer, which explicitly allows the spatial manipulation of data during training. This differentiable module can be inserted into the convolutional layers of the generative model, allowing the generated distributions to be freely altered for image-to-image translation. To reap the benefits of the proposed module, our architecture incorporates a new loss function that facilitates effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.
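The trainable transformer described here is in the spirit of a spatial transformer network; below is a minimal sketch using standard differentiable grid sampling, with localization-head sizes that are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Differentiable spatial manipulation of feature maps, STN-style."""

    def __init__(self, channels: int):
        super().__init__()
        # Localization head: regress a 2x3 affine matrix from the features.
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(8),
            nn.Flatten(),
            nn.Linear(channels * 64, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialize to the identity transform so training starts stably.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

Because grid sampling is differentiable, such a module can sit between convolutional layers of the generator and train end-to-end with the rest of the network.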


Author(s):  
Jianxin Lin ◽  
Yingce Xia ◽  
Yijun Wang ◽  
Tao Qin ◽  
Zhibo Chen

Image translation across different domains has attracted much attention in both the machine learning and computer vision communities. Taking the translation from a source domain to a target domain as an example, existing algorithms mainly rely on two kinds of loss for training: one is the discrimination loss, which is used to differentiate images generated by the models from natural images; the other is the reconstruction loss, which measures the difference between an original image and its reconstructed version. In this work, we introduce a new kind of loss, the multi-path consistency loss, which evaluates the difference between the direct translation from the source domain to the target domain and the indirect translation from the source domain through an auxiliary domain to the target domain, to regularize training. For multi-domain translation (with at least three domains), which focuses on building translation models between any two domains, at each training iteration we randomly select three domains, set them respectively as the source, auxiliary, and target domains, build the multi-path consistency loss, and optimize the network. For two-domain translation, we introduce an additional auxiliary domain and construct the multi-path consistency loss. We conduct various experiments to demonstrate the effectiveness of our proposed methods, including face-to-face translation, paint-to-photo translation, and de-raining/de-noising translation.
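For one sampled domain triple, the consistency term compares the direct path s→t with the indirect path s→a→t; in the sketch below, `G` is a hypothetical dict of translator networks keyed by (source, destination) domain pairs:

```python
import torch.nn.functional as F

def multi_path_consistency_loss(G, x, s, a, t):
    """L1 distance between direct and indirect translations of x.

    direct:   s -> t in one hop
    indirect: s -> a -> t through the auxiliary domain a
    """
    direct = G[(s, t)](x)
    indirect = G[(a, t)](G[(s, a)](x))
    return F.l1_loss(direct, indirect)
```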


Author(s):  
Alejandro Moreo Fernández ◽  
Andrea Esuli ◽  
Fabrizio Sebastiani

Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a “target” domain when the only available training data belong to a different “source” domain. In this extended abstract, we briefly describe our new DA method, called Distributional Correspondence Indexing (DCI), for sentiment classification. DCI derives term representations in a vector space common to both domains, where each dimension reflects a term's distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. The experiments we have conducted show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification.
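A minimal sketch of the pivot-based projection, using cosine similarity between term occurrence vectors as the correspondence function (the DCI work studies several distributional correspondence functions; this particular choice is ours):

```python
import numpy as np

def dci_embedding(term_doc, pivot_ids):
    """Project each term onto one dimension per pivot (illustrative).

    term_doc  : (V, D) term-by-document occurrence matrix
    pivot_ids : indices of pivot terms that behave similarly across domains
    Each output dimension reflects a term's distributional correspondence
    to one pivot, here measured by cosine similarity.
    """
    X = term_doc.astype(float)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return Xn @ Xn[pivot_ids].T  # (V, |pivots|)
```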


2021 ◽  
Vol 2021 ◽  
pp. 1-6
Author(s):  
Peng Liu ◽  
Fuyu Li ◽  
Shanshan Yuan ◽  
Wanyi Li

Object detection in thermal images is an important computer vision task with many applications, such as unmanned vehicles, robotics, surveillance, and night vision. Deep learning-based detectors have achieved major progress, but they usually need large amounts of labelled training data. However, labelled data for object detection in thermal images are scarce and expensive to collect. How to take advantage of the large number of labelled visible images and adapt them to the thermal image domain remains an open problem. This paper proposes an unsupervised image-generation-enhanced adaptation method for object detection in thermal images. To reduce the gap between the visible domain and the thermal domain, the proposed method generates simulated fake thermal images that are similar to the target images while preserving the annotation information of the visible source domain. The image generation comprises a CycleGAN-based image-to-image translation and an intensity inversion transformation. The generated fake thermal images are used as a renewed source domain, and an off-the-shelf domain-adaptive Faster R-CNN is then utilized to reduce the gap between the generated intermediate domain and the thermal target domain. Experiments demonstrate the effectiveness and superiority of the proposed method.
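The intensity inversion step is simple to sketch, assuming 8-bit grayscale images (the CycleGAN translation itself is a separately trained model and is not shown):

```python
import numpy as np

def intensity_inversion(gray_image: np.ndarray) -> np.ndarray:
    """Invert pixel intensities of an 8-bit grayscale image.

    Bright visible-domain regions become dark and vice versa, moving a
    translated image closer to thermal appearance while leaving the
    source-domain bounding-box annotations untouched.
    """
    return 255 - gray_image
```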


Author(s):  
Yonghao Xu ◽  
Bo Du ◽  
Lefei Zhang ◽  
Qian Zhang ◽  
Guoli Wang ◽  
...  

Recent years have witnessed the great success of deep learning models in semantic segmentation. Nevertheless, these models may not generalize well to unseen image domains due to the phenomenon of domain shift. Since pixel-level annotations are laborious to collect, developing algorithms that can adapt labeled data from the source domain to the target domain is of great significance. To this end, we propose self-ensembling attention networks to reduce the domain gap between different datasets. To the best of our knowledge, the proposed method is the first attempt to introduce a self-ensembling model into domain adaptation for semantic segmentation, which provides a different view on how to learn domain-invariant features. Besides, since different regions of an image usually correspond to different levels of domain gap, we introduce an attention mechanism into the proposed framework to generate attention-aware features, which are further utilized to guide the calculation of the consistency loss in the target domain. Experiments on two benchmark datasets demonstrate that the proposed framework can yield competitive performance compared with state-of-the-art methods.
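Self-ensembling commonly pairs a student network with an exponential-moving-average teacher and penalizes their disagreement on target images; the attention-weighted variant below is an illustrative reading, with the attention map treated as a hypothetical per-pixel weight:

```python
import torch

def attention_consistency_loss(student_logits, teacher_logits, attention):
    """Attention-weighted consistency between student and EMA teacher.

    student_logits, teacher_logits : (B, C, H, W) segmentation outputs
    attention                      : (B, 1, H, W) per-pixel weights, larger
                                     where the domain gap is larger
    """
    p_student = student_logits.softmax(dim=1)
    p_teacher = teacher_logits.softmax(dim=1).detach()  # teacher is a fixed target
    per_pixel = ((p_student - p_teacher) ** 2).sum(dim=1, keepdim=True)
    return (attention * per_pixel).mean()
```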


2021 ◽  
Author(s):  
Shuo Jiang ◽  
Jie Hu ◽  
Jianxi Luo

Design-by-Analogy (DbA) is a design methodology that draws inspiration from a source domain to a target domain to generate new solutions to problems or designs, which can help designers mitigate design fixation and improve design ideation outcomes. Recently, increasingly available design databases and rapidly advancing data science and artificial intelligence technologies have presented new opportunities for developing data-driven methods and tools for DbA support. Herein, we survey prior data-driven DbA studies and categorize and analyze each study according to its data, methods, and applications across four categories: analogy encoding, retrieval, mapping, and evaluation. Based on this structured literature analysis, the paper elucidates the state of the art of data-driven DbA research to date and benchmarks it against the frontier of data science and AI research to identify promising research opportunities and directions for the field.


AI Magazine ◽  
2011 ◽  
Vol 32 (1) ◽  
pp. 12 ◽  
Author(s):  
Daniel G. Shapiro ◽  
Hector Munoz-Avila ◽  
David Stracuzzi

This issue summarizes the state of the art in structured knowledge transfer, which is an emerging approach to the general problem of knowledge acquisition and reuse. Its goal is to capture, in a general form, the internal structure of the objects, relations, strategies, and processes used to solve tasks drawn from a source domain, and exploit that knowledge to improve performance in a target domain.


2021 ◽  
Vol 11 (6) ◽  
pp. 2599
Author(s):  
Felix Nobis ◽  
Felix Fent ◽  
Johannes Betz ◽  
Markus Lienkamp

State-of-the-art 3D object detection for autonomous driving is achieved by processing lidar sensor data with deep-learning methods. However, the detection quality of the state of the art is still far from enabling safe driving in all conditions. Additional sensor modalities need to be used to increase the confidence and robustness of the overall detection result. Researchers have recently explored radar data as an additional input source for universal 3D object detection. This paper proposes artificial neural network architectures to segment sparse radar point cloud data. Segmentation is an intermediate step towards radar object detection as a complementary concept to lidar object detection. Conceptually, we adapt Kernel Point Convolution (KPConv) layers for radar data. Additionally, we introduce a long short-term memory (LSTM) variant based on KPConv layers to make use of the information content in the time dimension of radar data. This is motivated by classical radar processing, where tracking features over time is imperative to generate confident object proposals. We benchmark several variants of the network on the public nuScenes data set against a state-of-the-art PointNet-based approach. The performance of the networks is limited by the quality of the publicly available data. The quality of the radar data and radar labels is of great importance to the training and evaluation of machine learning models. Therefore, the advantages and disadvantages of the available data set with regard to its radar data are discussed in detail. The need for a radar-focused data set for object detection is expressed. We assume that higher segmentation scores should be achievable with better-quality data for all models compared, and that differences between the models would manifest more clearly. To facilitate research with additional radar data, the modular code for this research will be made available to the public.
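The temporal idea can be sketched independently of the KPConv backbone: per-point features from each radar sweep are fed through an LSTM so the classifier sees the time dimension; the feature extractor is assumed to exist and is not shown:

```python
import torch
import torch.nn as nn

class TemporalRadarSegHead(nn.Module):
    """LSTM over per-frame radar point features (illustrative sketch).

    A backbone (e.g., KPConv layers) is assumed to produce one feature
    vector per point per frame; the LSTM accumulates evidence over the
    sweep history before per-point classification, mirroring classical
    radar processing where features are tracked over time.
    """
    def __init__(self, feat_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, point_feats):
        # point_feats: (num_points, T, feat_dim), one sequence per point
        out, _ = self.lstm(point_feats)
        return self.classifier(out[:, -1])  # class logits at the latest frame
```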

