Exploiting Images for Video Recognition with Hierarchical Generative Adversarial Networks

Author(s): Feiwu Yu, Xinxiao Wu, Yuchao Sun, Lixin Duan

Existing deep learning methods for video recognition usually require a large number of labeled videos for training. For a new task, however, videos are often unlabeled, and annotating them is time-consuming and labor-intensive. Instead of relying on human annotation, we try to make use of existing fully labeled images to help recognize those videos. However, due to domain shifts and heterogeneous feature representations, the performance of classifiers trained on images may be dramatically degraded on video recognition tasks. In this paper, we propose a novel method, called Hierarchical Generative Adversarial Networks (HiGAN), to enhance recognition in videos (i.e., the target domain) by transferring knowledge from images (i.e., the source domain). The HiGAN model consists of a low-level conditional GAN and a high-level conditional GAN. By taking advantage of this two-level adversarial learning, our method is capable of learning a domain-invariant feature representation of source images and target videos. Comprehensive experiments on two challenging video recognition datasets (i.e., UCF101 and HMDB51) demonstrate the effectiveness of the proposed method compared with existing state-of-the-art domain adaptation methods.
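The two-level adversarial scheme can be illustrated with a minimal PyTorch sketch. All module names, feature dimensions, and the exact staging below are our assumptions for illustration, not the authors' released implementation.

import torch
import torch.nn as nn

# Hypothetical sketch: a generator maps source (image) features toward
# the target (video) feature space; a discriminator tries to tell the
# two apart. HiGAN stacks two such adversarial games.
class FeatureGenerator(nn.Module):
    def __init__(self, dim=2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
    def forward(self, x):
        return self.net(x)

class DomainDiscriminator(nn.Module):
    def __init__(self, dim=2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, 1))
    def forward(self, x):
        return self.net(x)

low_g, low_d = FeatureGenerator(), DomainDiscriminator()
high_g, high_d = FeatureGenerator(), DomainDiscriminator()
bce = nn.BCEWithLogitsLoss()

img_feat = torch.randn(8, 2048)   # source image features (e.g., CNN pooling layer)
vid_feat = torch.randn(8, 2048)   # target video features

# Low-level game: align raw features across domains.
fake_low = low_g(img_feat)
d_loss_low = bce(low_d(vid_feat), torch.ones(8, 1)) + \
             bce(low_d(fake_low.detach()), torch.zeros(8, 1))
g_loss_low = bce(low_d(fake_low), torch.ones(8, 1))

# High-level game: align the representation built on the low-level output.
fake_high = high_g(fake_low.detach())
d_loss_high = bce(high_d(vid_feat), torch.ones(8, 1)) + \
              bce(high_d(fake_high.detach()), torch.zeros(8, 1))
g_loss_high = bce(high_d(fake_high), torch.ones(8, 1))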

2019, Vol. 11 (11), pp. 1369
Author(s): Bilel Benjdira, Yakoub Bazi, Anis Koubaa, Kais Ouni

Segmenting aerial images has great potential in surveillance and scene understanding of urban areas. It provides a means for automatically reporting the different events that happen in inhabited areas, which markedly benefits public safety and traffic management applications. Since the wide adoption of convolutional neural network methods, the accuracy of semantic segmentation algorithms can easily surpass 80% if a robust dataset is provided. Despite this success, deploying a pretrained segmentation model to survey a new city that is not included in the training set significantly decreases accuracy. This is due to the domain shift between the source dataset on which the model is trained and the new target domain of the new city's images. In this paper, we address this issue and consider the challenge of domain adaptation in the semantic segmentation of aerial images. We designed an algorithm that reduces the impact of domain shift using generative adversarial networks (GANs). In our experiments, we tested the proposed methodology on the International Society for Photogrammetry and Remote Sensing (ISPRS) semantic segmentation dataset and found that our method improves overall accuracy from 35% to 52% when passing from the Potsdam domain (considered the source domain) to the Vaihingen domain (considered the target domain). In addition, the method allows the classes inverted due to sensor variation to be recovered efficiently; in particular, it improves their average segmentation accuracy from 14% to 61%.
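A common way to realize this kind of GAN-based appearance adaptation is to translate source tiles toward the target domain's look before segmentation. The sketch below shows the generic pattern only; the generator architecture and usage are placeholders, not the paper's exact design.

import torch
import torch.nn as nn

# Illustrative only: a GAN generator restyles a source-domain aerial
# tile toward the target domain; a discriminator trained on real target
# tiles (omitted here) would supply the adversarial signal.
class Translator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
    def forward(self, x):
        return self.net(x)

translator = Translator()
source_tile = torch.rand(1, 3, 256, 256)     # e.g., a Potsdam tile
target_style_tile = translator(source_tile)  # Vaihingen-like appearance
# The segmentation network then trains on appearance-adapted inputs
# whose ground-truth labels are unchanged, reducing the domain shift.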


Author(s): Daniel Fleury, Angelica Fleury

The upsurge of Generative Adversarial Networks (GANs) over the previous five years has led to advancements in unsupervised data manipulation, sourced feature translation, and precise input-output synthesis through competitive optimization of the discriminator and generator networks. More specifically, the recent rise of cycle-consistent GANs enables style transfer from a discrete source (input A) to a target domain (input B) by preprocessing object features for a multi-discriminative adversarial network. Traditionally, cyclical adversarial networks have been exploited for unpaired image-to-image translation and domain adaptation by determining mapped relationships between an input A graphic and an input B graphic. However, this integral mechanism of domain adaptation can also be applied to the complex acoustical features of human speech. Although well-established datasets, such as the 2018 Voice Conversion Challenge repository, paved the way for female-male voice transformation, cycle-GANs have rarely been re-engineered for voices outside these datasets. More critically, cycle-GANs have massive potential to extract surface-level and hidden features to distort an input A source into a texturally unrelated target voice. By preprocessing, compressing, and packaging unique acoustical voice properties, CycleGANs can learn to decompose speech signals and implement new translation models while preserving emotion, the intent of words, rhythm, and accents. Given the potential of CycleGAN's autoencoder in realistic unsupervised voice-to-voice conversion and feature adaptation, the researchers raise the ethical implications of controlling source input A to manipulate target voice B, particularly in cases of defamation and sabotage of target B's words. This paper analyzes the potential of cycle-consistent GANs in deceptive voice-to-voice conversion by manipulating interview excerpts of political candidates.
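The cycle-consistency constraint at the heart of this approach is compact enough to sketch. Below, the mappings and feature dimensions are hypothetical stand-ins for acoustic features (e.g., mel-cepstral frames); adversarial losses are omitted for brevity.

import torch
import torch.nn as nn

# G maps speaker A -> B; F maps speaker B -> A. The cycle loss asks
# F(G(a)) ≈ a and G(F(b)) ≈ b, so linguistic content (words, rhythm)
# survives the round trip while the (omitted) adversarial losses push
# G(a) toward B's timbre.
G = nn.Sequential(nn.Linear(24, 128), nn.ReLU(), nn.Linear(128, 24))
F = nn.Sequential(nn.Linear(24, 128), nn.ReLU(), nn.Linear(128, 24))
l1 = nn.L1Loss()

a = torch.randn(32, 24)  # frames from voice A
b = torch.randn(32, 24)  # frames from voice B (unpaired with a)

cycle_loss = l1(F(G(a)), a) + l1(G(F(b)), b)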


2019, Vol. 9 (11), pp. 2192
Author(s): Simone Bonechi, Paolo Andreini, Monica Bianchini, Akshay Pai, Franco Scarselli

In recent years, Deep Neural Networks (DNNs) have led to impressive results in a wide variety of machine learning tasks, typically relying on the existence of a huge amount of supervised data. However, in many applications (e.g., biomedical image analysis), gathering large sets of labeled data can be very difficult and costly. Unsupervised domain adaptation exploits data from a source domain, where annotations are available, to train a model able to generalize also to a target domain, where labels are unavailable. Recent research has shown that Generative Adversarial Networks (GANs) can be successfully employed for domain adaptation, although deciding when to stop learning is a major concern for GANs. In this work, we propose some confidence measures that can be used to stop the GAN training early, also showing how such measures can be employed to predict the reliability of the network output. The effectiveness of the proposed approach has been tested in two domain adaptation tasks, with very promising results.
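The generic pattern of confidence-driven early stopping looks like the sketch below. The measure shown (mean winning-class softmax probability) is our illustrative proxy, not one of the paper's proposed measures.

import torch

# Track a confidence score over epochs; stop when it stops improving.
def mean_max_prob(logits):
    # average of the winning-class softmax probability over a batch
    return torch.softmax(logits, dim=1).max(dim=1).values.mean().item()

best, patience, stale = 0.0, 5, 0
for epoch in range(100):
    logits = torch.randn(256, 10)   # stand-in for predictions on target data
    score = mean_max_prob(logits)
    if score > best:
        best, stale = score, 0      # a checkpoint would be saved here
    else:
        stale += 1
    if stale >= patience:           # early stop: no recent improvement
        break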


2020, Vol. 10 (3), pp. 1092
Author(s): Bilel Benjdira, Adel Ammar, Anis Koubaa, Kais Ouni

Despite the significant advances in semantic segmentation of aerial imagery, a considerable limitation is blocking its adoption in real cases: if we test a segmentation model on a new area that is not included in its initial training set, accuracy decreases remarkably. This is caused by the domain shift between the new target domain and the source domain used to train the model. In this paper, we address this challenge and propose a new algorithm that uses a Generative Adversarial Network (GAN) architecture to minimize the domain shift and increase the model's ability to work on new target domains. The proposed architecture contains two GAN networks. The first converts a chosen image from the target domain into a semantic label map. The second converts this generated label map into an image that belongs to the source domain but preserves the semantic map of the target image. The resulting image is then used by the semantic segmentation model to generate a better semantic label for the originally chosen image. Our algorithm is tested on the ISPRS semantic segmentation dataset and improves global accuracy by a margin of up to 24% when passing from the Potsdam domain to the Vaihingen domain. This margin can be increased further by adding labeled data from the target domain. To minimize the cost of supervision in the translation process, we also propose a methodology for using these labeled data efficiently.
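The two-GAN pipeline composes cleanly, as the schematic below shows. Every network here is a one-layer placeholder standing in for a full generator or segmenter; shapes and class count are assumptions.

import torch
import torch.nn as nn

# Schematic of the pipeline: GAN 1 maps a target-domain image to a
# semantic label map; GAN 2 maps that label map back to a source-styled
# image with the same semantics, which the source-trained segmenter can
# then label more reliably.
img2label = nn.Conv2d(3, 6, 1)     # stand-in for the first generator
label2src = nn.Conv2d(6, 3, 1)     # stand-in for the second generator
segmenter = nn.Conv2d(3, 6, 1)     # source-domain segmentation model

target_img = torch.rand(1, 3, 256, 256)           # e.g., a Vaihingen tile
coarse_label = img2label(target_img).softmax(1)   # GAN-generated semantics
src_style_img = label2src(coarse_label)           # source-looking rendering
final_label = segmenter(src_style_img).argmax(1)  # improved prediction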


2021, Vol. 32 (1)
Author(s): Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe, Elisa Ricci

Most domain adaptation methods consider the problem of transferring knowledge to the target domain from a single source dataset. However, in practical applications, we typically have access to multiple sources. In this paper we propose the first approach for multi-source domain adaptation (MSDA) based on generative adversarial networks. Our method is inspired by the observation that the appearance of a given image depends on three factors: the domain, the style (characterized in terms of low-level feature variations) and the content. For this reason, we propose to project the source image features onto a space where only the dependence on the content is kept, and then re-project this invariant representation onto the pixel space using the target domain and style. In this way, new labeled images can be generated, which are used to train a final target classifier. We test our approach on common MSDA benchmarks, showing that it outperforms state-of-the-art methods.
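The factorization can be caricatured as an encode-then-restyle step. The sketch below is a loose interpretation under our own assumptions (a simple domain embedding plays the role of the style/domain code); the paper's actual projection is more elaborate.

import torch
import torch.nn as nn

# Encode an image to a content-only code, then decode back to pixels
# conditioned on the target domain's code, keeping the source labels.
content_enc = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                            nn.Conv2d(32, 64, 3, 2, 1))
decoder = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                        nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())
domain_embed = nn.Embedding(4, 64)   # one code per domain; index 3 = target

x_src = torch.rand(2, 3, 64, 64)                # labeled source images
c = content_enc(x_src)                          # content-only features
style = domain_embed(torch.tensor([3, 3]))      # target-domain code
x_tgt_like = decoder(c + style[:, :, None, None])
# x_tgt_like inherits the source labels, so it can train a target classifier.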


Mathematics, 2021, Vol. 9 (4), pp. 325
Author(s): Ángel González-Prieto, Alberto Mozo, Edgar Talavera, Sandra Gómez-Canaval

Generative Adversarial Networks (GANs) are powerful machine learning models capable of generating fully synthetic samples of a desired phenomenon with high resolution. Despite their success, the training process of a GAN is highly unstable, and typically several accessory heuristics must be applied to the networks to reach acceptable convergence. In this paper, we introduce a novel method for analyzing the convergence and stability of GAN training. For this purpose, we propose to decompose the objective function of the adversarial min–max game defining a periodic GAN into its Fourier series. By studying the dynamics of the truncated Fourier series under the continuous alternating gradient descent algorithm, we are able to approximate the real flow and identify the main features of GAN convergence. This approach is confirmed empirically by studying the training flow in a 2-parameter GAN aiming to generate an unknown exponential distribution. As a by-product, we show that convergent orbits in GANs are small perturbations of periodic orbits, so the Nash equilibria are spiral attractors. This theoretically justifies the slow and unstable training observed in GANs.
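Schematically, the object under study can be written as a truncated Fourier series of the min–max objective, with the generator parameter θ descending and the discriminator parameter φ ascending the flow (our notation, which may differ from the paper's exact parameterization):

V_N(\theta, \varphi) \;=\; \sum_{|m| \le N} \sum_{|n| \le N} c_{m,n}\, e^{i(m\theta + n\varphi)},
\qquad
\dot{\theta} = -\partial_\theta V_N, \qquad \dot{\varphi} = +\partial_\varphi V_N.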


2021, Vol. 13 (4), pp. 596
Author(s): David Vint, Matthew Anderson, Yuhao Yang, Christos Ilioudis, Gaetano Di Caterina, et al.

In recent years, technological advances leading to the production of high-resolution Synthetic Aperture Radar (SAR) images have enabled increasingly effective target recognition capabilities. However, high spatial resolution is not always achievable, and for some particular sensing modes, such as Foliage Penetrating Radars, low-resolution imaging is often the only option. In this paper, the problem of automatic target recognition in low-resolution Foliage Penetrating (FOPEN) SAR is addressed through the use of Convolutional Neural Networks (CNNs) able to extract both low- and high-level features of the imaged targets. Additionally, to address the issue of limited dataset size, Generative Adversarial Networks are used to enlarge the training set. Finally, a Receiver Operating Characteristic (ROC)-based post-classification decision approach is used to reduce classification errors and measure the capability of the classifier to provide a reliable output. The effectiveness of the proposed framework is demonstrated using real FOPEN SAR data.
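A ROC-based decision stage generally works by picking a confidence threshold on a validation set and rejecting low-confidence outputs at test time. The sketch below shows this generic pattern with synthetic scores and Youden's J as the operating point; the paper's specific criterion may differ.

import numpy as np
from sklearn.metrics import roc_curve

# Choose the confidence threshold from a validation ROC curve, then
# reject (or flag) test outputs whose confidence falls below it.
rng = np.random.default_rng(0)
val_labels = rng.integers(0, 2, 500)            # 1 = correct classification
val_scores = rng.random(500) * (0.5 + 0.5 * val_labels)  # synthetic scores

fpr, tpr, thr = roc_curve(val_labels, val_scores)
best_thr = thr[np.argmax(tpr - fpr)]            # Youden's J operating point

test_score = 0.42
decision = "accept" if test_score >= best_thr else "reject"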


2020, Vol. 7 (4), pp. 191569
Author(s): Edoardo Lisi, Mohammad Malekzadeh, Hamed Haddadi, F. Din-Houn Lau, Seth Flaxman

Conditional generative adversarial networks (CGANs) are a recent and popular method for generating samples from a probability distribution conditioned on latent information. The latent information often comes in the form of a discrete label from a small set. We propose a novel method for training CGANs which allows us to condition on a sequence of continuous latent distributions f^(1), …, f^(K). This training allows CGANs to generate samples from a sequence of distributions. We apply our method to paintings from a sequence of artistic movements, where each movement is considered to be its own distribution. Exploiting the temporal aspect of the data, a vector autoregressive (VAR) model is fitted to the means of the latent distributions that we learn and used for one-step-ahead forecasting, predicting the latent distribution of a future art movement f^(K+1). Realizations from this distribution can be used by the CGAN to generate 'future' paintings. In experiments, this novel methodology generates accurate predictions of the evolution of art. The training set consists of a large dataset of past paintings. While there is no agreement on exactly which art period we currently find ourselves in, we test on plausible candidate sets of present art and show that the mean distance to our predictions is small.
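The forecasting step is standard VAR machinery, as the sketch below shows. The latent means here are synthetic stand-ins, and the dimensions are assumptions.

import numpy as np
from statsmodels.tsa.api import VAR

# Fit a vector autoregression to the sequence of learned latent means
# mu^(1), ..., mu^(K) and predict the next movement's mean mu^(K+1).
K, d = 10, 4                                     # 10 movements, 4-dim means
mus = np.cumsum(np.random.randn(K, d), axis=0)   # stand-in trajectory

model = VAR(mus).fit(maxlags=1)
mu_next = model.forecast(mus[-model.k_ar:], steps=1)   # one-step-ahead
# mu_next would parameterize the latent distribution f^(K+1) on which
# the CGAN conditions to render 'future' paintings.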


2020, Vol. 34 (07), pp. 11386-11393
Author(s): Shuang Li, Chi Liu, Qiuxia Lin, Binhui Xie, Zhengming Ding, et al.

Tremendous research efforts have been made to advance deep domain adaptation (DA) by seeking domain-invariant features. Most existing deep DA models focus only on aligning feature representations of task-specific layers across domains while adopting a fully shared convolutional architecture for the source and target. However, we argue that such strongly shared convolutional layers might be harmful for domain-specific feature learning when the source and target data distributions differ to a large extent. In this paper, we relax the shared-convnets assumption made by previous DA methods and propose a Domain Conditioned Adaptation Network (DCAN), which aims to excite distinct convolutional channels with a domain-conditioned channel attention mechanism. As a result, critical low-level domain-dependent knowledge can be explored appropriately. As far as we know, this is the first work to explore domain-wise convolutional channel activation for deep DA networks. Moreover, to effectively align high-level feature distributions across the two domains, we further deploy domain-conditioned feature correction blocks after the task-specific layers, which explicitly correct the domain discrepancy. Extensive experiments on three cross-domain benchmarks demonstrate that the proposed approach outperforms existing methods by a large margin, especially on very tough cross-domain learning tasks.
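Domain-conditioned channel attention can be sketched as a squeeze-and-excitation-style gate with one excitation branch per domain. The names, reduction ratio, and two-branch routing below are our assumptions, not the authors' code.

import torch
import torch.nn as nn

# A separate excitation branch per domain lets source and target data
# activate different convolutional channels of a shared feature map.
class DomainConditionedAttention(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                          nn.Linear(channels // reduction, channels), nn.Sigmoid())
            for _ in range(2)])   # branch 0: source, branch 1: target
    def forward(self, x, domain):
        w = self.branches[domain](self.pool(x).flatten(1))
        return x * w[:, :, None, None]   # channel-wise gating

attn = DomainConditionedAttention()
feat = torch.randn(4, 64, 32, 32)
src_out = attn(feat, domain=0)    # source-specific channel activation
tgt_out = attn(feat, domain=1)    # target-specific channel activation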

