Deep Convolutional Sum-Product Networks

We give conditions under which convolutional neural networks (CNNs) define valid sum-product networks (SPNs). One subclass, called convolutional SPNs (CSPNs), can be implemented using tensors, but also can suffer from being too shallow. Fortunately, tensors can be augmented while maintaining valid SPNs. This yields a larger subclass of CNNs, which we call deep convolutional SPNs (DCSPNs), where the convolutional and sum-pooling layers form rich directed acyclic graph structures. One salient feature of DCSPNs is that they are a rigorous probabilistic model. As such, they can exploit multiple kinds of probabilistic reasoning, including marginal inference and most probable explanation (MPE) inference. This allows an alternative method for learning DCSPNs using vectorized differentiable MPE, which plays a similar role to the generator in generative adversarial networks (GANs). Image sampling is yet another application demonstrating the robustness of DCSPNs. Our preliminary results on image sampling are encouraging, since the DCSPN sampled images exhibit variability. Experiments on image completion show that DCSPNs significantly outperform competing methods by achieving several state-of-the-art mean squared error (MSE) scores in both left-completion and bottom-completion in benchmark datasets.

Download Full-text

Deep image synthesis from intuitive user input: A review and perspectives

Computational Visual Media ◽

10.1007/s41095-021-0234-8 ◽

2021 ◽

Vol 8 (1) ◽

pp. 3-31

Author(s):

Yuan Xue ◽

Yuan-Chen Guo ◽

Han Zhang ◽

Tao Xu ◽

Song-Hai Zhang ◽

...

Keyword(s):

Image Synthesis ◽

Generative Models ◽

Generative Adversarial Networks ◽

Image Generation ◽

Art And Design ◽

User Input ◽

Adversarial Networks ◽

Benchmark Datasets ◽

Deep Image ◽

Realistic Images

AbstractIn many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross fertilization between major image generation paradigms, and evaluation and comparison of generation methods.

Download Full-text

Keyphrase Generation for Scientific Articles Using GANs (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7238 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13931-13932

Author(s):

Avinash Swaminathan ◽

Raj Kuwar Gupta ◽

Haimin Zhang ◽

Debanjan Mahata ◽

Rakesh Gosangi ◽

...

Keyword(s):

State Of The Art ◽

Generative Adversarial Networks ◽

Scientific Article ◽

Adversarial Networks ◽

Benchmark Datasets ◽

Art Performance

In this paper, we present a keyphrase generation approach using conditional Generative Adversarial Networks (GAN). In our GAN model, the generator outputs a sequence of keyphrases based on the title and abstract of a scientific article. The discriminator learns to distinguish between machine-generated and human-curated keyphrases. We evaluate this approach on standard benchmark datasets. Our model achieves state-of-the-art performance in generation of abstractive keyphrases and is also comparable to the best performing extractive techniques. We also demonstrate that our method generates more diverse keyphrases and make our implementation publicly available1.

Download Full-text

Various Generative Adversarial Networks Model for Synthetic Prohibitory Sign Image Generation

Applied Sciences ◽

10.3390/app11072913 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2913

Author(s):

Christine Dewi ◽

Rung-Ching Chen ◽

Yan-Ting Liu ◽

Hui Yu

Keyword(s):

Mean Squared Error ◽

Vision System ◽

Similarity Index ◽

Training Data ◽

Generative Adversarial Networks ◽

Synthetic Image ◽

Real Image ◽

Traffic Sign ◽

Research Issues ◽

Adversarial Networks

A synthetic image is a critical issue for computer vision. Traffic sign images synthesized from standard models are commonly used to build computer recognition algorithms for acquiring more knowledge on various and low-cost research issues. Convolutional Neural Network (CNN) achieves excellent detection and recognition of traffic signs with sufficient annotated training data. The consistency of the entire vision system is dependent on neural networks. However, locating traffic sign datasets from most countries in the world is complicated. This work uses various generative adversarial networks (GAN) models to construct intricate images, such as Least Squares Generative Adversarial Networks (LSGAN), Deep Convolutional Generative Adversarial Networks (DCGAN), and Wasserstein Generative Adversarial Networks (WGAN). This paper also discusses, in particular, the quality of the images produced by various GANs with different parameters. For processing, we use a picture with a specific number and scale. The Structural Similarity Index (SSIM) and Mean Squared Error (MSE) will be used to measure image consistency. Between the generated image and the corresponding real image, the SSIM values will be compared. As a result, the images display a strong similarity to the real image when using more training images. LSGAN outperformed other GAN models in the experiment with maximum SSIM values achieved using 200 images as inputs, 2000 epochs, and size 32 × 32.

Download Full-text

Distilling Portable Generative Adversarial Networks for Image Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5765 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3585-3592 ◽

Cited By ~ 1

Author(s):

Hanting Chen ◽

Yunhe Wang ◽

Han Shu ◽

Changyuan Wen ◽

Chunjing Xu ◽

...

Keyword(s):

Generative Models ◽

Generative Adversarial Networks ◽

Adversarial Networks ◽

Image Translation ◽

Level Information ◽

Benchmark Datasets ◽

Knowledge Distillation ◽

Network Compression ◽

High Level ◽

And Storage

Despite Generative Adversarial Networks (GANs) have been widely used in various image-to-image translation tasks, they can be hardly applied on mobile devices due to their heavy computation and storage cost. Traditional network compression methods focus on visually recognition tasks, but never deal with generation tasks. Inspired by knowledge distillation, a student generator of fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator. To promote the capability of student generator, we include a student discriminator to measure the distances between real images, and images generated by student and teacher generators. An adversarial learning process is therefore established to optimize student generator and student discriminator. Qualitative and quantitative analysis by conducting experiments on benchmark datasets demonstrate that the proposed method can learn portable generative models with strong performance.

Download Full-text

One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Networks

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/344 ◽

2019 ◽

Cited By ~ 1

Author(s):

Tao He ◽

Yuan-Fang Li ◽

Lianli Gao ◽

Dongxiang Zhang ◽

Jingkuan Song

Keyword(s):

Adaptive Learning ◽

Digital Data ◽

Generative Adversarial Networks ◽

Target Domain ◽

Source Domain ◽

Learning Framework ◽

Common Space ◽

Adversarial Networks ◽

Shared Space ◽

Benchmark Datasets

With the recent explosive increase of digital data, image recognition and retrieval become a critical practical application. Hashing is an effective solution to this problem, due to its low storage requirement and high query speed. However, most of past works focus on hashing in a single (source) domain. Thus, the learned hash function may not adapt well in a new (target) domain that has a large distributional difference with the source domain. In this paper, we explore an end-to-end domain adaptive learning framework that simultaneously and precisely generates discriminative hash codes and classifies target domain images. Our method encodes two domains images into a semantic common space, followed by two independent generative adversarial networks arming at crosswise reconstructing two domains’ images, reducing domain disparity and improving alignment in the shared space. We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to the other state-of-the-art methods on the tasks of object recognition and image retrieval.

Download Full-text

Recurrent Relational Memory Network for Unsupervised Image Captioning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/128 ◽

2020 ◽

Author(s):

Dan Guo ◽

Yang Wang ◽

Peipei Song ◽

Meng Wang

Keyword(s):

Generative Adversarial Networks ◽

Relational Reasoning ◽

Relational Memory ◽

Image Captioning ◽

Adversarial Networks ◽

The Arts ◽

Benchmark Datasets ◽

Unsupervised Training ◽

Visual Concepts ◽

Memory Network

Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where the existing arts usually adopt GAN (Generative Adversarial Networks) models. In this paper, we propose a novel memory-based network rather than GAN, named Recurrent Relational Memory Network (R2M). Unlike complicated and sensitive adversarial learning that non-ideally performs for long sentence generation, R2M implements a concepts-to-sentence memory translator through two-stage memory mechanisms: fusion and recurrent memories, correlating the relational reasoning between common visual concepts and the generated words for long periods. R2M encodes visual context through unsupervised training on images, while enabling the memory to learn from irrelevant textual corpus via supervised fashion. Our solution enjoys less learnable parameters and higher computational efficiency than GAN-based methods, which heavily bear parameter sensitivity. We experimentally validate the superiority of R2M than state-of-the-arts on all benchmark datasets.

Download Full-text

Can Cross Entropy Loss Be Robust to Label Noise?

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/305 ◽

2020 ◽

Author(s):

Lei Feng ◽

Senlin Shu ◽

Zhuoyi Lin ◽

Fengmao Lv ◽

Li Li ◽

...

Keyword(s):

Mean Squared Error ◽

Absolute Error ◽

Training Data ◽

Cross Entropy ◽

Loss Functions ◽

Label Noise ◽

Entropy Loss ◽

Squared Error ◽

Great Performance ◽

Benchmark Datasets

Trained with the standard cross entropy loss, deep neural networks can achieve great performance on correctly labeled data. However, if the training data is corrupted with label noise, deep models tend to overfit the noisy labels, thereby achieving poor generation performance. To remedy this issue, several loss functions have been proposed and demonstrated to be robust to label noise. Although most of the robust loss functions stem from Categorical Cross Entropy (CCE) loss, they fail to embody the intrinsic relationships between CCE and other loss functions. In this paper, we propose a general framework dubbed Taylor cross entropy loss to train deep models in the presence of label noise. Specifically, our framework enables to weight the extent of fitting the training labels by controlling the order of Taylor Series for CCE, hence it can be robust to label noise. In addition, our framework clearly reveals the intrinsic relationships between CCE and other loss functions, such as Mean Absolute Error (MAE) and Mean Squared Error (MSE). Moreover, we present a detailed theoretical analysis to certify the robustness of this framework. Extensive experimental results on benchmark datasets demonstrate that our proposed approach significantly outperforms the state-of-the-art counterparts.

Download Full-text

RIS-GAN: Explore Residual and Illumination with Generative Adversarial Networks for Shadow Removal

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6979 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12829-12836 ◽

Cited By ~ 4

Author(s):

Ling Zhang ◽

Chengjiang Long ◽

Xiaolong Zhang ◽

Chunxia Xiao

Keyword(s):

Ground Truth ◽

Superior Performance ◽

Generative Adversarial Networks ◽

Shadow Removal ◽

Illumination Estimation ◽

Adversarial Networks ◽

The Arts ◽

Benchmark Datasets ◽

Coarse To Fine ◽

Ground Truth Information

Residual images and illumination estimation have been proved very helpful in image enhancement. In this paper, we propose a general and novel framework RIS-GAN which explores residual and illumination with Generative Adversarial Networks for shadow removal. Combined with the coarse shadow-removal image, the estimated negative residual images and inverse illumination maps can be used to generate indirect shadow-removal images to refine the coarse shadow-removal result to the fine shadow-free image in a coarse-to-fine fashion. Three discriminators are designed to distinguish whether the predicted negative residual images, shadow-removal images, and the inverse illumination maps are real or fake jointly compared with the corresponding ground-truth information. To our best knowledge, we are the first one to explore residual and illumination for shadow removal. We evaluate our proposed method on two benchmark datasets, i.e., SRD and ISTD, and the extensive experiments demonstrate that our proposed method achieves the superior performance to state-of-the-arts, although we have no particular shadow-aware components designed in our generators.

Download Full-text

Super-resolution in Structural Geological Models.

10.5194/egusphere-egu21-10333 ◽

2021 ◽

Author(s):

Mustaeen Ur Rehman Qazi ◽

Florian Wellmann

Keyword(s):

High Resolution ◽

Structural Model ◽

Mean Squared Error ◽

Super Resolution ◽

Generative Adversarial Networks ◽

Low Resolution ◽

Generative Adversarial Network ◽

Data Set ◽

Adversarial Networks ◽

Geological Models

<p>Structural geological models are often calculated on a specific spatial resolution &#8211; for example in the form of grid representations, or when surfaces are extracted from implicit fields. However, the structural inventory in these models is limited by the underlying mathematical formulations. It is therefore logical that, above a certain resolution, no additional information is added to the representation.</p><p>We evaluate here if Deep Neural Networks can be trained to obtain a high-resolution representation based on a low-resolution structural model, at different levels of resolution. More specifically, we test the use of state-of-the-art Generative Adversarial Networks (GAN&#8217;s) for image superresolution in the context of 2-D geological model sections. These techniques aim to learn the hidden structure or information in high resolution image data set and then reproduce highly detailed and super resolved image from its low resolution counterpart. We propose the use of Generative Adversarial Networks GANS for super resolution of geological images and 2D geological models represented as images. In this work a generative adversarial network called SRGAN has been used which uses a perceptual loss function consisting of an adversarial loss, mean squared error loss and content loss for photo realistic image super resolution. First results are promising, but challenges remain due to the different interpretation of color in images for which these GAN&#8217;s are typically used, whereas we are mostly interested in structures.</p>

Download Full-text

Cycle-SUM: Cycle-Consistent Adversarial LSTM Networks for Unsupervised Video Summarization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019143 ◽

2019 ◽

Vol 33 ◽

pp. 9143-9150 ◽

Cited By ~ 10

Author(s):

Li Yuan ◽

Francis EH Tay ◽

Ping Li ◽

Li Zhou ◽

Jiashi Feng

Keyword(s):

Video Summarization ◽

Close Relation ◽

Generative Adversarial Networks ◽

Original Video ◽

Adversarial Networks ◽

Video Frames ◽

Learning Procedure ◽

Information Maximization ◽

Benchmark Datasets ◽

Lstm Network

In this paper, we present a novel unsupervised video summarization model that requires no manual annotation. The proposed model termed Cycle-SUM adopts a new cycleconsistent adversarial LSTM architecture that can effectively maximize the information preserving and compactness of the summary video. It consists of a frame selector and a cycle-consistent learning based evaluator. The selector is a bi-direction LSTM network that learns video representations that embed the long-range relationships among video frames. The evaluator defines a learnable information preserving metric between original video and summary video and “supervises” the selector to identify the most informative frames to form the summary video. In particular, the evaluator is composed of two generative adversarial networks (GANs), in which the forward GAN is learned to reconstruct original video from summary video while the backward GAN learns to invert the processing. The consistency between the output of such cycle learning is adopted as the information preserving metric for video summarization. We demonstrate the close relation between mutual information maximization and such cycle learning procedure. Experiments on two video summarization benchmark datasets validate the state-of-theart performance and superiority of the Cycle-SUM model over previous baselines.

Download Full-text