Cycle-SUM: Cycle-Consistent Adversarial LSTM Networks for Unsupervised Video Summarization

Author(s):  
Li Yuan ◽  
Francis EH Tay ◽  
Ping Li ◽  
Li Zhou ◽  
Jiashi Feng

In this paper, we present a novel unsupervised video summarization model that requires no manual annotation. The proposed model, termed Cycle-SUM, adopts a new cycle-consistent adversarial LSTM architecture that can effectively maximize the information preservation and compactness of the summary video. It consists of a frame selector and a cycle-consistent learning based evaluator. The selector is a bidirectional LSTM network that learns video representations embedding the long-range relationships among video frames. The evaluator defines a learnable information preserving metric between the original video and the summary video and “supervises” the selector to identify the most informative frames to form the summary video. In particular, the evaluator is composed of two generative adversarial networks (GANs), in which the forward GAN learns to reconstruct the original video from the summary video while the backward GAN learns to invert the process. The consistency between the outputs of this cycle learning is adopted as the information preserving metric for video summarization. We demonstrate the close relation between mutual information maximization and this cycle learning procedure. Experiments on two video summarization benchmark datasets validate the state-of-the-art performance and superiority of the Cycle-SUM model over previous baselines.
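To make the selector component described above concrete, the following is a minimal PyTorch-style sketch of a bidirectional-LSTM frame selector that scores frames for inclusion in the summary; the module name, feature dimensions, and the 15% selection ratio are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a bidirectional-LSTM frame selector in the spirit of
# Cycle-SUM (hypothetical module/parameter names; not the authors' code).
import torch
import torch.nn as nn

class FrameSelector(nn.Module):
    """Scores each video frame so the top-scoring frames form the summary."""
    def __init__(self, feat_dim=1024, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.score = nn.Linear(2 * hidden_dim, 1)

    def forward(self, frame_feats):                        # (B, T, feat_dim)
        h, _ = self.lstm(frame_feats)                      # (B, T, 2*hidden_dim)
        return torch.sigmoid(self.score(h)).squeeze(-1)    # (B, T) importance scores

# Example: score 120 frames of CNN features and keep the top 15% as the summary.
feats = torch.randn(1, 120, 1024)
scores = FrameSelector()(feats)
summary_idx = scores.topk(int(0.15 * 120), dim=1).indices
```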

2021 ◽  
Vol 8 (1) ◽  
pp. 3-31
Author(s):  
Yuan Xue ◽  
Yuan-Chen Guo ◽  
Han Zhang ◽  
Tao Xu ◽  
Song-Hai Zhang ◽  
...  

In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.


Author(s):  
Cory J. Butz ◽  
Jhonatan S. Oliveira ◽  
André E. Dos Santos ◽  
André L. Teixeira

We give conditions under which convolutional neural networks (CNNs) define valid sum-product networks (SPNs). One subclass, called convolutional SPNs (CSPNs), can be implemented using tensors, but can also suffer from being too shallow. Fortunately, tensors can be augmented while maintaining valid SPNs. This yields a larger subclass of CNNs, which we call deep convolutional SPNs (DCSPNs), where the convolutional and sum-pooling layers form rich directed acyclic graph structures. One salient feature of DCSPNs is that they are a rigorous probabilistic model. As such, they can exploit multiple kinds of probabilistic reasoning, including marginal inference and most probable explanation (MPE) inference. This allows an alternative method for learning DCSPNs using vectorized differentiable MPE, which plays a similar role to the generator in generative adversarial networks (GANs). Image sampling is yet another application demonstrating the robustness of DCSPNs. Our preliminary results on image sampling are encouraging, since the DCSPN-sampled images exhibit variability. Experiments on image completion show that DCSPNs significantly outperform competing methods, achieving several state-of-the-art mean squared error (MSE) scores in both left-completion and bottom-completion on benchmark datasets.
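As a rough illustration of the sum-pooling idea mentioned above, the sketch below implements a log-space weighted sum over channels at every spatial location, which is one way a layer of sum nodes can be realized densely over an image grid; the class name, dimensions, and normalization choice are assumptions rather than the authors' DCSPN implementation.

```python
# Minimal sketch of a sum-pooling layer in the SPN sense: in log-space, a sum
# node is a log-sum-exp over children weighted by normalized mixture weights
# (hypothetical names; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SumPool(nn.Module):
    """Mixes `in_ch` child log-densities into `out_ch` sum nodes per pixel."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(out_ch, in_ch))

    def forward(self, log_p):                        # (B, in_ch, H, W) log-densities
        w = F.log_softmax(self.logits, dim=1)        # normalized log-weights
        # broadcast: (B, 1, in_ch, H, W) + (1, out_ch, in_ch, 1, 1)
        x = log_p.unsqueeze(1) + w.view(1, w.size(0), w.size(1), 1, 1)
        return torch.logsumexp(x, dim=2)             # (B, out_ch, H, W)

layer = SumPool(in_ch=8, out_ch=4)
out = layer(torch.randn(2, 8, 16, 16).log_softmax(dim=1))
```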


2020 ◽  
Vol 34 (10) ◽  
pp. 13931-13932
Author(s):  
Avinash Swaminathan ◽  
Raj Kuwar Gupta ◽  
Haimin Zhang ◽  
Debanjan Mahata ◽  
Rakesh Gosangi ◽  
...  

In this paper, we present a keyphrase generation approach using conditional Generative Adversarial Networks (GANs). In our GAN model, the generator outputs a sequence of keyphrases based on the title and abstract of a scientific article. The discriminator learns to distinguish between machine-generated and human-curated keyphrases. We evaluate this approach on standard benchmark datasets. Our model achieves state-of-the-art performance in the generation of abstractive keyphrases and is also comparable to the best-performing extractive techniques. We also demonstrate that our method generates more diverse keyphrases, and we make our implementation publicly available.
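For intuition, here is a minimal sketch of the kind of sequence discriminator such a conditional GAN could use to separate human-curated from generated keyphrases; the GRU encoder, vocabulary size, and dimensions are assumptions, not the paper's released implementation.

```python
# Minimal sketch of a keyphrase discriminator (hypothetical sizes and names).
import torch
import torch.nn as nn

class KeyphraseDiscriminator(nn.Module):
    """Predicts whether a keyphrase token sequence is human-curated (1) or generated (0)."""
    def __init__(self, vocab_size=30000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.clf = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):                    # (B, L) int64 token ids
        _, h_n = self.gru(self.emb(token_ids))       # h_n: (1, B, hidden_dim)
        return torch.sigmoid(self.clf(h_n[-1]))      # (B, 1) probability "human-curated"

disc = KeyphraseDiscriminator()
prob = disc(torch.randint(0, 30000, (4, 6)))         # batch of 4 keyphrases, 6 tokens each
```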


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 389
Author(s):  
Esteban Piacentino ◽  
Alvaro Guarner ◽  
Cecilio Angulo

In personalized healthcare, an ecosystem for the manipulation of reliable and safe private data should be orchestrated. This paper describes an approach for the generation of synthetic electrocardiograms (ECGs) based on Generative Adversarial Networks (GANs) with the objective of anonymizing users’ information for privacy reasons. The aim is to create valuable data that can be used in both educational and research settings while avoiding the risk of sensitive data leakage. Since GANs are mainly exploited on images and video frames, we propose processing general raw data after transforming it into an image, so that it can be managed through a GAN and then decoded back to the original data domain. We first demonstrate the feasibility of this transformation and processing hypothesis, and then address the main drawbacks of each step of the procedure for the particular case of ECGs. Hence, a novel research pathway on health data anonymization using GANs is opened, along which further developments are expected.
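The core transformation step can be illustrated with a small sketch: a 1-D ECG window is normalized and reshaped into a square grayscale image for the GAN, then decoded back to the signal domain. The window length, image side, and helper names are assumptions for illustration only.

```python
# Minimal sketch of the signal-to-image round trip described above
# (hypothetical sizes; the GAN itself is omitted).
import numpy as np

def ecg_to_image(signal, side=64):
    """Map a 1-D ECG window of length side*side to a (side, side) image in [0, 1]."""
    lo, hi = signal.min(), signal.max()
    norm = (signal - lo) / (hi - lo + 1e-8)
    return norm.reshape(side, side), (lo, hi)

def image_to_ecg(image, scale):
    """Invert the mapping back to the original amplitude range."""
    lo, hi = scale
    return image.reshape(-1) * (hi - lo) + lo

ecg = np.sin(np.linspace(0, 40 * np.pi, 64 * 64))    # stand-in ECG window
img, scale = ecg_to_image(ecg)
recovered = image_to_ecg(img, scale)
assert np.allclose(ecg, recovered, atol=1e-4)
```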


2020 ◽  
Vol 34 (04) ◽  
pp. 3585-3592 ◽  
Author(s):  
Hanting Chen ◽  
Yunhe Wang ◽  
Han Shu ◽  
Changyuan Wen ◽  
Chunjing Xu ◽  
...  

Although Generative Adversarial Networks (GANs) have been widely used in various image-to-image translation tasks, they can hardly be applied on mobile devices due to their heavy computation and storage cost. Traditional network compression methods focus on visual recognition tasks and rarely deal with generation tasks. Inspired by knowledge distillation, a student generator with fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator. To promote the capability of the student generator, we include a student discriminator that measures the distances between real images and the images generated by the student and teacher generators. An adversarial learning process is therefore established to optimize the student generator and the student discriminator. Qualitative and quantitative experiments on benchmark datasets demonstrate that the proposed method can learn portable generative models with strong performance.
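A minimal sketch of this kind of distillation objective is given below: the student generator imitates the teacher's output through an L1 term while a student discriminator supplies the adversarial signal. The loss weighting and function names are assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a distillation-style objective for GAN compression
# (hypothetical weighting; not the authors' exact loss).
import torch
import torch.nn.functional as F

def student_generator_loss(student_out, teacher_out, disc_on_student, lam=10.0):
    """Adversarial term + weighted L1 imitation of the teacher generator."""
    adv = F.binary_cross_entropy_with_logits(
        disc_on_student, torch.ones_like(disc_on_student))  # fool the discriminator
    distill = F.l1_loss(student_out, teacher_out)            # inherit teacher knowledge
    return adv + lam * distill

# Example with dummy tensors standing in for translated images / discriminator logits.
loss = student_generator_loss(torch.randn(2, 3, 64, 64),
                              torch.randn(2, 3, 64, 64),
                              torch.randn(2, 1))
```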


Author(s):  
Tao He ◽  
Yuan-Fang Li ◽  
Lianli Gao ◽  
Dongxiang Zhang ◽  
Jingkuan Song

With the recent explosive increase of digital data, image recognition and retrieval have become critical practical applications. Hashing is an effective solution to this problem due to its low storage requirement and high query speed. However, most past work focuses on hashing in a single (source) domain, so the learned hash function may not adapt well to a new (target) domain whose distribution differs substantially from that of the source domain. In this paper, we explore an end-to-end domain-adaptive learning framework that simultaneously and precisely generates discriminative hash codes and classifies target-domain images. Our method encodes images from the two domains into a common semantic space, followed by two independent generative adversarial networks that crosswise reconstruct the two domains’ images, reducing domain disparity and improving alignment in the shared space. We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to other state-of-the-art methods on the tasks of object recognition and image retrieval.
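As a simple illustration of the shared hashing component, the sketch below maps image features from either domain to relaxed and binary hash codes; the dimensions and network shape are assumptions, and the crosswise GAN reconstruction branches are omitted.

```python
# Minimal sketch of hash-code generation in a shared semantic space
# (hypothetical dimensions; not the paper's architecture).
import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    """Maps image features from either domain to K-bit binary hash codes."""
    def __init__(self, feat_dim=2048, code_bits=64):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                nn.Linear(512, code_bits))

    def forward(self, feats):
        relaxed = torch.tanh(self.fc(feats))   # continuous relaxation used during training
        return relaxed, torch.sign(relaxed)    # binary codes used at retrieval time

enc = HashEncoder()
relaxed, codes = enc(torch.randn(8, 2048))     # 8 images -> 64-bit codes in {-1, +1}
```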


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7471
Author(s):  
Shuozhi Wang ◽  
Jianqiang Mei ◽  
Lichao Yang ◽  
Yifan Zhao

The measurement accuracy and reliability of thermography is largely limited by the relatively low spatial resolution of infrared (IR) cameras in comparison to digital cameras. Using a high-end IR camera to achieve high spatial resolution can be costly or sometimes infeasible due to the high sample rate required. There is therefore a strong demand to improve the quality of IR images, particularly at edges, without upgrading the hardware, in the context of surveillance and industrial inspection systems. This paper proposes a novel Conditional Generative Adversarial Network (CGAN)-based framework to enhance IR edges by learning high-frequency features from corresponding visual images. A dual discriminator, focusing on edges and on content/background, is introduced to guide the cross-imaging-modality learning procedure of the U-Net generator in the high and low frequencies, respectively. Results demonstrate that the proposed framework can effectively enhance barely visible edges in IR images without introducing artefacts, while the content information is well preserved. Unlike most similar studies, this method only requires IR images at test time, which increases its applicability to scenarios where only one imaging modality is available, such as active thermography.
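The dual-discriminator objective can be sketched as follows: one adversarial term judges an edge map extracted from the generated image and another judges the full content, alongside an L1 term on edges learned from the visual image. The Sobel-based edge proxy, weights, and dummy discriminators below are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a dual-discriminator generator objective
# (hypothetical edge proxy and weighting).
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)

def edges(img):
    """Crude horizontal-gradient edge map standing in for the edge branch input."""
    return F.conv2d(img, SOBEL_X, padding=1)

def generator_loss(fake_ir, real_visual, d_edge, d_content, lam=100.0):
    adv_edge = F.binary_cross_entropy_with_logits(
        d_edge(edges(fake_ir)), torch.ones(fake_ir.size(0), 1))
    adv_content = F.binary_cross_entropy_with_logits(
        d_content(fake_ir), torch.ones(fake_ir.size(0), 1))
    recon = F.l1_loss(edges(fake_ir), edges(real_visual))   # learn high-frequency detail
    return adv_edge + adv_content + lam * recon

# Dummy discriminators (global average as a stand-in logit) for a smoke test.
dummy_d = lambda x: x.mean(dim=(1, 2, 3)).unsqueeze(1)
loss = generator_loss(torch.randn(2, 1, 32, 32), torch.randn(2, 1, 32, 32),
                      dummy_d, dummy_d)
```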


Author(s):  
Dan Guo ◽  
Yang Wang ◽  
Peipei Song ◽  
Meng Wang

Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing work usually adopts GAN (Generative Adversarial Network) models. In this paper, we propose a novel memory-based network rather than a GAN, named Recurrent Relational Memory Network (R2M). Unlike complicated and sensitive adversarial learning, which performs poorly for long sentence generation, R2M implements a concepts-to-sentence memory translator through two-stage memory mechanisms, fusion and recurrent memories, correlating the relational reasoning between common visual concepts and the generated words over long periods. R2M encodes visual context through unsupervised training on images, while enabling the memory to learn from an unrelated textual corpus in a supervised fashion. Our solution has fewer learnable parameters and higher computational efficiency than GAN-based methods, which suffer heavily from parameter sensitivity. We experimentally validate the superiority of R2M over the state of the art on all benchmark datasets.
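A very loose sketch of a concepts-to-sentence translator with a fusion step followed by a recurrent memory is shown below; the attention-style fusion, greedy decoding loop, and all dimensions are assumptions made in the spirit of the description rather than the R2M design.

```python
# Very loose sketch: fuse visual-concept embeddings, then decode words with a
# recurrent memory (hypothetical names and design; not the authors' R2M).
import torch
import torch.nn as nn

class ConceptsToSentence(nn.Module):
    def __init__(self, emb_dim=300, mem_dim=512, vocab_size=10000, max_len=12):
        super().__init__()
        self.attn = nn.Linear(emb_dim, 1)            # fusion: score each concept
        self.init = nn.Linear(emb_dim, mem_dim)      # fused concepts -> initial memory
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRUCell(emb_dim, mem_dim)      # recurrent memory over words
        self.out = nn.Linear(mem_dim, vocab_size)
        self.max_len = max_len

    def forward(self, concept_embs):                 # (B, K, emb_dim) concept vectors
        w = torch.softmax(self.attn(concept_embs), dim=1)
        h = torch.tanh(self.init((w * concept_embs).sum(dim=1)))   # fusion memory
        tok = torch.zeros(concept_embs.size(0), dtype=torch.long)  # assumed <BOS> id 0
        words = []
        for _ in range(self.max_len):                # greedy decoding
            h = self.rnn(self.word_emb(tok), h)
            tok = self.out(h).argmax(dim=1)
            words.append(tok)
        return torch.stack(words, dim=1)             # (B, max_len) token ids

sent = ConceptsToSentence()(torch.randn(2, 4, 300))
```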


Author(s):  
Xiangyang Li ◽  
Shuqiang Jiang ◽  
Jungong Han

Dense captioning is a challenging task that not only detects visual elements in images but also generates natural language sentences to describe them. Previous approaches do not leverage object information in images for this task. However, objects provide valuable cues for predicting the locations of caption regions, as caption regions often highly overlap with objects (i.e., caption regions are usually parts of objects or combinations of them). Meanwhile, objects also provide important information for describing a target caption region, as the corresponding description not only depicts its properties but also involves its interactions with objects in the image. In this work, we propose a novel scheme with an object context encoding Long Short-Term Memory (LSTM) network to automatically learn complementary object context for each caption region, transferring knowledge from objects to caption regions. All contextual objects are arranged as a sequence and progressively fed into the context encoding module to obtain context features. Both the learned object context features and the region features are then used to predict the bounding box offsets and generate the descriptions. The context learning procedure is carried out in conjunction with the optimization of both location prediction and caption generation, thus enabling the object context encoding LSTM to capture and aggregate useful object context. Experiments on benchmark datasets demonstrate the superiority of our proposed approach over state-of-the-art methods.
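The object-context encoding step can be sketched as follows: contextual object features are fed sequentially into an LSTM and the final state is fused with the region feature before the prediction heads; the dimensions and fusion layer are assumptions, and the bounding-box and caption heads are omitted.

```python
# Minimal sketch of object-context encoding for a caption region
# (hypothetical dimensions; prediction heads omitted).
import torch
import torch.nn as nn

class ObjectContextEncoder(nn.Module):
    def __init__(self, obj_dim=512, region_dim=512, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(obj_dim, hidden_dim, batch_first=True)
        self.fuse = nn.Linear(hidden_dim + region_dim, hidden_dim)

    def forward(self, obj_feats, region_feat):    # (B, N, obj_dim), (B, region_dim)
        _, (h_n, _) = self.lstm(obj_feats)        # h_n: (1, B, hidden_dim)
        ctx = h_n[-1]                             # aggregated object context
        return torch.relu(self.fuse(torch.cat([ctx, region_feat], dim=1)))

enc = ObjectContextEncoder()
fused = enc(torch.randn(2, 5, 512), torch.randn(2, 512))   # feeds bbox/caption heads
```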

