Recurrent Relational Memory Network for Unsupervised Image Captioning

Author(s):  
Dan Guo ◽  
Yang Wang ◽  
Peipei Song ◽  
Meng Wang

Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing methods usually adopt Generative Adversarial Network (GAN) models. In this paper, we propose a novel memory-based network rather than a GAN, named Recurrent Relational Memory Network (R2M). Unlike complicated and sensitive adversarial learning, which performs poorly on long sentence generation, R2M implements a concepts-to-sentence memory translator through a two-stage memory mechanism: fusion memory and recurrent memory, which model the relational reasoning between common visual concepts and the generated words over long periods. R2M encodes visual context through unsupervised training on images, while the memory learns from an irrelevant textual corpus in a supervised fashion. Our solution has fewer learnable parameters and higher computational efficiency than GAN-based methods, which suffer from parameter sensitivity. We experimentally validate the superiority of R2M over the state of the art on all benchmark datasets.
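A minimal sketch of the two-stage memory idea described above, assuming a pooled-concept fusion step followed by a GRU-style recurrent memory that is carried across decoding steps; all module names, sizes, and the mean-pooling shortcut are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of a two-stage (fusion + recurrent) memory decoder.
# Module names, sizes, and pooling are assumptions for exposition only.
import torch
import torch.nn as nn

class FusionRecurrentMemory(nn.Module):
    def __init__(self, concept_dim=300, mem_dim=512, vocab_size=10000):
        super().__init__()
        self.fuse = nn.Linear(concept_dim + mem_dim, mem_dim)   # fusion memory
        self.cell = nn.GRUCell(mem_dim, mem_dim)                 # recurrent memory
        self.out = nn.Linear(mem_dim, vocab_size)                # word distribution

    def forward(self, concepts, max_len=16):
        # concepts: (batch, num_concepts, concept_dim) detected visual concepts
        batch = concepts.size(0)
        memory = concepts.new_zeros(batch, self.cell.hidden_size)
        logits = []
        for _ in range(max_len):
            # fuse pooled concept features with the current memory
            # (a real attention mechanism is omitted for brevity)
            pooled = concepts.mean(dim=1)
            fused = torch.tanh(self.fuse(torch.cat([pooled, memory], dim=-1)))
            memory = self.cell(fused, memory)   # carry relational state across steps
            logits.append(self.out(memory))
        return torch.stack(logits, dim=1)       # (batch, max_len, vocab_size)

# usage: greedy word ids for a batch of 2 images with 5 detected concepts each
model = FusionRecurrentMemory()
words = model(torch.randn(2, 5, 300)).argmax(-1)  # shape (2, 16)
```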

2020 ◽  
Vol 34 (07) ◽  
pp. 12829-12836 ◽  
Author(s):  
Ling Zhang ◽  
Chengjiang Long ◽  
Xiaolong Zhang ◽  
Chunxia Xiao

Residual images and illumination estimation have proved very helpful in image enhancement. In this paper, we propose a general and novel framework, RIS-GAN, which explores residuals and illumination with generative adversarial networks for shadow removal. Combined with the coarse shadow-removal image, the estimated negative residual images and inverse illumination maps can be used to generate indirect shadow-removal images that refine the coarse shadow-removal result into a fine shadow-free image in a coarse-to-fine fashion. Three discriminators are designed to jointly distinguish whether the predicted negative residual images, shadow-removal images, and inverse illumination maps are real or fake, compared with the corresponding ground-truth information. To the best of our knowledge, we are the first to explore residuals and illumination for shadow removal. We evaluate our proposed method on two benchmark datasets, i.e., SRD and ISTD, and extensive experiments demonstrate that our proposed method achieves performance superior to the state of the art, although we have no particular shadow-aware components designed in our generators.
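A minimal sketch of how the three cues might be recombined in the coarse-to-fine step: the negative residual is added to the shadow image, the inverse illumination map is multiplied with it, and the resulting estimates are blended with the coarse prediction. The fusion weights and function names are assumptions for illustration, not the RIS-GAN implementation.

```python
# Sketch of recombining residual and illumination branches with the coarse prediction.
# Fusion weights and names are illustrative assumptions.
import numpy as np

def compose_shadow_free(shadow_img, coarse_free, neg_residual, inv_illum, w=(0.4, 0.3, 0.3)):
    """shadow_img, coarse_free: HxWx3 in [0, 1]; neg_residual: HxWx3 additive correction;
    inv_illum: HxWx3 multiplicative correction (>= 1 inside the shadow region)."""
    residual_free = np.clip(shadow_img + neg_residual, 0.0, 1.0)  # indirect estimate via residual
    illum_free = np.clip(shadow_img * inv_illum, 0.0, 1.0)        # indirect estimate via illumination
    fine = w[0] * coarse_free + w[1] * residual_free + w[2] * illum_free
    return np.clip(fine, 0.0, 1.0)

# usage with dummy inputs
img = np.random.rand(64, 64, 3)
fine = compose_shadow_free(img, img, np.zeros_like(img), np.ones_like(img))
```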


2021 ◽  
Vol 8 (1) ◽  
pp. 3-31
Author(s):  
Yuan Xue ◽  
Yuan-Chen Guo ◽  
Han Zhang ◽  
Tao Xu ◽  
Song-Hai Zhang ◽  
...  

In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works on image synthesis from intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. It motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.


Author(s):  
Cory J. Butz ◽  
Jhonatan S. Oliveira ◽  
André E. Dos Santos ◽  
André L. Teixeira

We give conditions under which convolutional neural networks (CNNs) define valid sum-product networks (SPNs). One subclass, called convolutional SPNs (CSPNs), can be implemented using tensors, but can also suffer from being too shallow. Fortunately, tensors can be augmented while maintaining valid SPNs. This yields a larger subclass of CNNs, which we call deep convolutional SPNs (DCSPNs), where the convolutional and sum-pooling layers form rich directed acyclic graph structures. One salient feature of DCSPNs is that they constitute a rigorous probabilistic model. As such, they can exploit multiple kinds of probabilistic reasoning, including marginal inference and most probable explanation (MPE) inference. This enables an alternative method for learning DCSPNs using vectorized differentiable MPE, which plays a role similar to that of the generator in generative adversarial networks (GANs). Image sampling is yet another application demonstrating the robustness of DCSPNs. Our preliminary results on image sampling are encouraging, since the DCSPN-sampled images exhibit variability. Experiments on image completion show that DCSPNs significantly outperform competing methods, achieving several state-of-the-art mean squared error (MSE) scores in both left-completion and bottom-completion on benchmark datasets.
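To make the marginal and MPE queries mentioned above concrete, here is a toy valid SPN over two binary variables (a complete sum node over decomposable product nodes); the structure, weights, and enumeration-based MPE are purely illustrative and far simpler than a DCSPN.

```python
# Toy sum-product network over two binary variables X1, X2.
# Structure and weights are made up for exposition.
import numpy as np

# Leaf distributions under two mixture components k = 0, 1.
p_x1 = np.array([[0.8, 0.2], [0.3, 0.7]])   # row k: [P(X1=0), P(X1=1)]
p_x2 = np.array([[0.6, 0.4], [0.1, 0.9]])
weights = np.array([0.5, 0.5])               # root sum node; children share scope {X1, X2} (complete)

def joint(x1, x2):
    # product nodes are decomposable: each child covers a disjoint scope (X1 vs. X2)
    return float(np.sum(weights * p_x1[:, x1] * p_x2[:, x2]))

def marginal_x1(x1):
    # marginalize X2 by summing its leaf over both states (evaluates to 1 per component)
    return float(np.sum(weights * p_x1[:, x1] * p_x2.sum(axis=1)))

def mpe():
    # MPE in an SPN replaces sum nodes with max nodes and traces back the best branch;
    # for this tiny model we simply enumerate the complete assignments.
    return max((joint(a, b), (a, b)) for a in (0, 1) for b in (0, 1))

print(joint(1, 1), marginal_x1(1), mpe())
```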


2020 ◽  
Vol 34 (10) ◽  
pp. 13931-13932
Author(s):  
Avinash Swaminathan ◽  
Raj Kuwar Gupta ◽  
Haimin Zhang ◽  
Debanjan Mahata ◽  
Rakesh Gosangi ◽  
...  

In this paper, we present a keyphrase generation approach using conditional generative adversarial networks (GANs). In our GAN model, the generator outputs a sequence of keyphrases based on the title and abstract of a scientific article. The discriminator learns to distinguish between machine-generated and human-curated keyphrases. We evaluate this approach on standard benchmark datasets. Our model achieves state-of-the-art performance in the generation of abstractive keyphrases and is also comparable to the best-performing extractive techniques. We further demonstrate that our method generates more diverse keyphrases, and we make our implementation publicly available.
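A skeleton of such a conditional text GAN, assuming a GRU encoder-decoder generator conditioned on the document and a discriminator scoring (document, keyphrase) pairs; names and sizes are illustrative and not taken from the paper's released code.

```python
# Minimal skeleton: encoder-decoder generator + (document, keyphrase) discriminator.
# Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class KeyphraseGenerator(nn.Module):
    def __init__(self, vocab=5000, emb=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)   # encodes title + abstract tokens
        self.decoder = nn.GRU(emb, hid, batch_first=True)   # emits keyphrase tokens
        self.proj = nn.Linear(hid, vocab)

    def forward(self, doc_ids, phrase_ids):
        _, h = self.encoder(self.embed(doc_ids))
        out, _ = self.decoder(self.embed(phrase_ids), h)     # teacher-forced decoding
        return self.proj(out)                                # (batch, len, vocab) logits

class KeyphraseDiscriminator(nn.Module):
    def __init__(self, vocab=5000, emb=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.doc_enc = nn.GRU(emb, hid, batch_first=True)
        self.phr_enc = nn.GRU(emb, hid, batch_first=True)
        self.score = nn.Linear(2 * hid, 1)                   # human-curated (real) vs. generated (fake)

    def forward(self, doc_ids, phrase_ids):
        _, hd = self.doc_enc(self.embed(doc_ids))
        _, hp = self.phr_enc(self.embed(phrase_ids))
        return self.score(torch.cat([hd[-1], hp[-1]], dim=-1))

# usage with dummy token ids
doc = torch.randint(0, 5000, (4, 50))
phr = torch.randint(0, 5000, (4, 6))
logits = KeyphraseGenerator()(doc, phr)
real_score = KeyphraseDiscriminator()(doc, phr)
```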


2020 ◽  
Vol 34 (04) ◽  
pp. 3585-3592 ◽  
Author(s):  
Hanting Chen ◽  
Yunhe Wang ◽  
Han Shu ◽  
Changyuan Wen ◽  
Chunjing Xu ◽  
...  

Although Generative Adversarial Networks (GANs) have been widely used in various image-to-image translation tasks, they can hardly be applied on mobile devices due to their heavy computation and storage cost. Traditional network compression methods focus on visual recognition tasks but never deal with generation tasks. Inspired by knowledge distillation, a student generator with fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator. To promote the capability of the student generator, we include a student discriminator to measure the distances between real images and the images generated by the student and teacher generators. An adversarial learning process is thereby established to optimize the student generator and the student discriminator. Qualitative and quantitative analyses from experiments on benchmark datasets demonstrate that the proposed method can learn portable generative models with strong performance.
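One possible shape of such a distillation step, sketched below: the student imitates the teacher's translated images pixel-wise while a student discriminator supplies an adversarial signal. For brevity the teacher's outputs stand in for the "real" side, a simplification of the paper's real/teacher/student comparison; architectures, loss weights, and names are assumptions.

```python
# Sketch of one generator-distillation step (teacher frozen, student + student-D trained).
# Architectures and loss weights are illustrative assumptions.
import torch
import torch.nn as nn

teacher_G = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 3, 3, padding=1))
student_G = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1))
student_D = nn.Sequential(nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
                          nn.Flatten(), nn.Linear(16 * 16 * 16, 1))

opt_G = torch.optim.Adam(student_G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(student_D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(8, 3, 32, 32)                  # source-domain batch
with torch.no_grad():
    y_teacher = teacher_G(x)                   # frozen teacher translation
y_student = student_G(x)

# Student discriminator: teacher outputs as "real", student outputs as "fake".
d_loss = bce(student_D(y_teacher), torch.ones(8, 1)) + \
         bce(student_D(y_student.detach()), torch.zeros(8, 1))
opt_D.zero_grad(); d_loss.backward(); opt_D.step()

# Student generator: imitate the teacher pixel-wise and fool the student discriminator.
g_loss = nn.functional.l1_loss(y_student, y_teacher) + \
         0.1 * bce(student_D(y_student), torch.ones(8, 1))
opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```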


Author(s):  
Tao He ◽  
Yuan-Fang Li ◽  
Lianli Gao ◽  
Dongxiang Zhang ◽  
Jingkuan Song

With the recent explosive increase of digital data, image recognition and retrieval have become critical practical applications. Hashing is an effective solution to this problem due to its low storage requirement and high query speed. However, most past work focuses on hashing in a single (source) domain. Thus, the learned hash function may not adapt well to a new (target) domain that has a large distributional difference from the source domain. In this paper, we explore an end-to-end domain-adaptive learning framework that simultaneously and precisely generates discriminative hash codes and classifies target-domain images. Our method encodes images from the two domains into a semantic common space, followed by two independent generative adversarial networks aiming at crosswise reconstructing the two domains' images, reducing domain disparity and improving alignment in the shared space. We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to other state-of-the-art methods on the tasks of object recognition and image retrieval.
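A minimal sketch of the shared hashing-encoder idea: features from both domains pass through one encoder, a tanh-relaxed hash layer produces codes that are binarized at retrieval time, and a classifier operates on the codes. The cross-reconstruction GANs are omitted, and all names and sizes are illustrative assumptions.

```python
# Sketch of a shared hashing encoder over pre-extracted features from two domains.
# Names, sizes, and the ranking step are illustrative assumptions.
import torch
import torch.nn as nn

class SharedHashEncoder(nn.Module):
    def __init__(self, feat_dim=2048, code_bits=64, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU())
        self.hash_head = nn.Linear(512, code_bits)        # relaxed hash layer
        self.cls_head = nn.Linear(code_bits, num_classes) # classifies from the codes

    def forward(self, feats):
        z = self.backbone(feats)
        codes = torch.tanh(self.hash_head(z))             # in (-1, 1) during training
        return codes, self.cls_head(codes)

enc = SharedHashEncoder()
src, tgt = torch.randn(4, 2048), torch.randn(4, 2048)    # source / target domain features
codes_s, logits_s = enc(src)
codes_t, _ = enc(tgt)
binary_t = torch.sign(codes_t)                            # binarize for retrieval
scores = binary_t @ torch.sign(codes_s).t()               # inner product approximates Hamming ranking
```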


2020 ◽  
Vol 34 (07) ◽  
pp. 11588-11595 ◽  
Author(s):  
Junhao Liu ◽  
Kai Wang ◽  
Chunpu Xu ◽  
Zhou Zhao ◽  
Ruifeng Xu ◽  
...  

Image captioning is usually built on either generation-based or retrieval-based approaches. Both have certain strengths but suffer from their own limitations. In this paper, we propose an Interactive Dual Generative Adversarial Network (IDGAN) for image captioning, which combines the retrieval-based and generation-based methods to learn a better image captioning ensemble. IDGAN consists of two generators and two discriminators, where the generation-based and retrieval-based generators mutually benefit from each other's complementary targets, learned from two dual adversarial discriminators. Specifically, the generation-based and retrieval-based generators produce improved synthetic and retrieved candidate captions using informative feedback signals from the two respective discriminators, which are trained to distinguish generated captions from true captions and to assign top rankings to true captions, thus combining the merits of both retrieval-based and generation-based approaches. Extensive experiments on the MSCOCO dataset demonstrate that the proposed IDGAN model significantly outperforms the compared methods for image captioning.
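A sketch of how the two adversarial signals could be wired, assuming stand-in scorers for the two discriminators: a binary real/fake loss for the generation branch and a margin ranking loss that pushes true captions above retrieved candidates. None of the modules below come from the authors' code.

```python
# Sketch of the dual discriminator losses with stand-in (image, caption) scorers.
# All modules and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class CaptionScorer(nn.Module):
    """Scores an (image, caption) feature pair; stands in for either discriminator."""
    def __init__(self, img_dim=512, cap_dim=512):
        super().__init__()
        self.score = nn.Bilinear(img_dim, cap_dim, 1)
    def forward(self, img, cap):
        return self.score(img, cap)

d_gen, d_ret = CaptionScorer(), CaptionScorer()
img = torch.randn(8, 512)
true_cap, gen_cap, ret_cap = (torch.randn(8, 512) for _ in range(3))

# Generation branch: binary real/fake loss on generated captions.
bce = nn.BCEWithLogitsLoss()
loss_gen_d = bce(d_gen(img, true_cap), torch.ones(8, 1)) + \
             bce(d_gen(img, gen_cap), torch.zeros(8, 1))

# Retrieval branch: margin ranking loss pushing true captions above retrieved candidates.
margin = nn.MarginRankingLoss(margin=0.2)
loss_ret_d = margin(d_ret(img, true_cap), d_ret(img, ret_cap), torch.ones(8, 1))
```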

