Combination of Variational Autoencoders and Generative Adversarial Network into an Unsupervised Generative Model

2021 ◽  
pp. 101-110
Author(s):  
Ali Jaber Almalki ◽  
Pawel Wocjan


Algorithms ◽
2020 ◽  
Vol 13 (12) ◽  
pp. 319
Author(s):  
Wang Xi ◽  
Guillaume Devineau ◽  
Fabien Moutarde ◽  
Jie Yang

Generative models for images, audio, text, and other low-dimension data have achieved great success in recent years. Generating artificial human movements can also be useful for many applications, including the improvement of data augmentation methods for human gesture recognition. The objective of this research is to develop a generative model for skeletal human movement that allows control of the action type of the generated motion while preserving the authenticity of the result and the natural style variability of gesture execution. We propose to use a conditional Deep Convolutional Generative Adversarial Network (DC-GAN) applied to pseudo-images representing skeletal pose sequences in the Tree Structure Skeleton Image (TSSI) format. We evaluate our approach on the 3D skeletal data provided in the large NTU RGB+D public dataset. Our generative model can output qualitatively correct skeletal human movements for any of the 60 action classes. We also quantitatively evaluate the performance of our model by computing the Fréchet Inception Distance (FID), which shows a strong correlation with human judgement. To the best of our knowledge, our work is the first successful class-conditioned generative model for human skeletal motions based on a pseudo-image representation of skeletal pose sequences.
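
As a rough illustration of the architecture this abstract describes, the PyTorch sketch below shows a class-conditional DC-GAN generator that concatenates a noise vector with a learned label embedding and upsamples to a TSSI-style pseudo-image. Layer widths and the 32x32 pseudo-image resolution are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class ConditionalDCGANGenerator(nn.Module):
    """Sketch of a class-conditional DC-GAN generator for skeletal pseudo-images."""
    def __init__(self, n_classes=60, z_dim=100, embed_dim=32, channels=3):
        super().__init__()
        self.label_embed = nn.Embedding(n_classes, embed_dim)
        self.net = nn.Sequential(
            # project (noise + label embedding) from 1x1 to a 4x4 feature map
            nn.ConvTranspose2d(z_dim + embed_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            # final 32x32 pseudo-image: rows ~ joints in TSSI order,
            # columns ~ frames, channels ~ (x, y, z) joint coordinates
            nn.ConvTranspose2d(64, channels, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, labels):
        cond = torch.cat([z, self.label_embed(labels)], dim=1)
        return self.net(cond.unsqueeze(-1).unsqueeze(-1))

g = ConditionalDCGANGenerator()
fake = g(torch.randn(8, 100), torch.randint(0, 60, (8,)))  # -> (8, 3, 32, 32)
```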


2020 ◽  
Author(s):  
Michal Varga ◽  
Ján Jadlovský ◽  
Slávka Jadlovská

Abstract: In this paper, we propose a methodology for the generative enhancement of existing 3D image classifiers. This methodology is based on combining the strengths of both non-generative classifiers and generative modeling. Its purpose is to streamline the creation of new classifiers by embedding existing compatible classifiers in a generative network architecture. The demonstration of this process and the evaluation of its effects are performed using a 3D convolutional classifier and its generative equivalent, a conditional generative adversarial network classifier. The results show that the generative model achieves greater classification performance, gaining a relative classification accuracy improvement of 7.43%. The improvement in accuracy is also present when compared to a plain convolutional classifier trained on a dataset augmented with examples produced by a trained generator. This suggests there is desirable knowledge sharing within the hybrid discriminator-classifier network.
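
For intuition, here is a minimal PyTorch sketch of a hybrid discriminator-classifier in the spirit of this idea (close to an AC-GAN discriminator): an existing 3D-convolutional trunk is reused and given two heads, one adversarial and one for class prediction. The layer sizes and the 32^3 voxel input are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HybridDiscriminatorClassifier(nn.Module):
    """Shared 3D-conv trunk feeding both a real/fake head and a class head."""
    def __init__(self, n_classes=10, in_ch=1):
        super().__init__()
        self.trunk = nn.Sequential(               # reused "existing classifier" body
            nn.Conv3d(in_ch, 32, 3, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 3, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.adv_head = nn.Linear(64, 1)          # real vs. generated
        self.cls_head = nn.Linear(64, n_classes)  # object class logits

    def forward(self, voxels):
        h = self.trunk(voxels)
        return self.adv_head(h), self.cls_head(h)

d = HybridDiscriminatorClassifier()
adv, cls = d(torch.rand(4, 1, 32, 32, 32))  # batch of 32^3 voxel grids
```

Training such a network on both labeled real data and generated samples is what allows the adversarial and classification tasks to share features in the trunk.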


Electronics ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 688
Author(s):  
Sung-Wook Park ◽  
Jun-Ho Huh ◽  
Jong-Chan Kim

In the field of deep learning, generative models did not attract much attention until GANs (generative adversarial networks) appeared. In 2014, Ian Goodfellow proposed the generative model called GANs. GANs use different structures and objective functions from earlier generative models. For example, GANs use two neural networks: a generator that creates realistic images, and a discriminator that distinguishes whether its input is real or synthetic. If there are no problems in the training process, GANs can generate images whose authenticity even experts find difficult to judge. Currently, GANs are the most researched subject in the field of computer vision, which deals with image style translation, synthesis, and generation, and various models have been unveiled; the issues raised are also being addressed one by one. In image synthesis, BEGAN (Boundary Equilibrium Generative Adversarial Network), which outperforms previously announced GANs, learns the latent space of the image while balancing the generator and discriminator. Nonetheless, BEGAN also suffers from mode collapse, wherein the generator produces only a few images or a single one. Although BEGAN-CS (Boundary Equilibrium Generative Adversarial Network with Constrained Space), which improved the loss function, was introduced, it did not solve mode collapse. The discriminator of BEGAN-CS is an AE (autoencoder), which cannot create a particularly useful or structured latent space, and its compression performance is also poor. In this paper, this characteristic of the AE is considered to be related to the occurrence of mode collapse. Thus, we used a VAE (Variational AutoEncoder), which adds statistical techniques to the AE. In our experiments, the proposed model did not suffer from mode collapse and converged to a better state than BEGAN-CS.
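
As a loose illustration of the proposed change, the PyTorch sketch below replaces the autoencoder discriminator of BEGAN/BEGAN-CS with a VAE: the discriminator encodes an image to a mean and log-variance, reparameterizes, decodes, and is scored by reconstruction error, as in BEGAN. All shapes, layer sizes, and the fully-connected encoder/decoder are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEDiscriminator(nn.Module):
    """BEGAN-style discriminator with a VAE in place of the plain AE."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 512), nn.ReLU())
        self.mu, self.logvar = nn.Linear(512, z_dim), nn.Linear(512, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                                 nn.Linear(512, 64 * 64 * 3))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(z).view_as(x)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        # BEGAN scores samples by reconstruction error; KL regularizes the latent space
        return F.l1_loss(recon, x), kl

# BEGAN-style balancing (for reference): L_D = L(x_real) - k_t * L(G(z)),
# L_G = L(G(z)), with k_t updated toward gamma * L(x_real) / L(G(z)).
```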


2021 ◽  
Author(s):  
Amin Heyrani Nobari ◽  
Wei Chen ◽  
Faez Ahmed

Abstract: Typical engineering design tasks require iteratively modifying designs until they meet certain constraints, i.e., performance or attribute requirements. Past work has proposed ways to solve the inverse design problem, where desired designs are directly generated from specified requirements, thus avoiding the trial-and-error process. Among those approaches, conditional deep generative models show great potential since 1) they work for complex high-dimensional designs and 2) they can generate multiple alternative designs given any condition. In this work, we propose a conditional deep generative model, Range-GAN, to achieve automatic design synthesis subject to range constraints. The proposed model addresses the sparse conditioning issue in data-driven inverse design problems by introducing a label-aware self-augmentation approach. We also propose a new uniformity loss to ensure that generated designs evenly cover the given requirement range. Through a real-world example of constrained 3D shape generation, we show that the label-aware self-augmentation leads to an average improvement of 14% in constraint satisfaction for generated 3D shapes, and the uniformity loss leads to a 125% average increase in the uniformity of generated shapes' attributes. This work lays the foundation for data-driven inverse design problems in which range constraints are considered and there are sparse regions in the condition space. For further information and code for this paper, please refer to http://decode.mit.edu/projects/rangegan/.
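
The abstract does not state the exact form of the uniformity loss; the sketch below shows one plausible formulation of such a penalty, matching the sorted, range-normalized attribute values of a generated batch to ideal uniform quantiles. This is an assumed construction for illustration, not the Range-GAN loss itself (see the project page above for the authors' code).

```python
import torch

def uniformity_loss(attrs: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    """attrs: (batch,) attribute values measured on generated designs."""
    t = (attrs.clamp(lo, hi) - lo) / (hi - lo)       # normalize into [0, 1]
    t_sorted, _ = torch.sort(t)
    n = t_sorted.numel()
    targets = (torch.arange(n, dtype=t.dtype) + 0.5) / n  # ideal uniform quantiles
    return torch.mean((t_sorted - targets) ** 2)     # zero when coverage is even

# Values bunched into a narrow band of the range incur a high penalty:
loss = uniformity_loss(torch.rand(64) * 0.3 + 0.2, lo=0.0, hi=1.0)
```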


Author(s):  
Yi Yu ◽  
Abhishek Srivastava ◽  
Simon Canales

Melody generation from lyrics has been a challenging research issue in the field of artificial intelligence and music, one that enables us to learn and discover latent relationships between lyrics and their accompanying melodies. Unfortunately, the limited availability of paired lyrics–melody datasets with alignment information has hindered research progress. To address this problem, we create a large dataset consisting of 12,197 MIDI songs, each with paired lyrics and melody alignment, by leveraging different music sources from which the alignment relationship between syllables and music attributes is extracted. Most importantly, we propose a novel deep generative model, a conditional Long Short-Term Memory (LSTM)–Generative Adversarial Network for melody generation from lyrics, which contains a deep LSTM generator and a deep LSTM discriminator, both conditioned on lyrics. In particular, the lyrics-conditioned melody and the alignment relationship between the syllables of the given lyrics and the notes of the predicted melody are generated simultaneously. Extensive experimental results demonstrate the effectiveness of our proposed lyrics-to-melody generative model, which can infer plausible and tuneful melody sequences from lyrics.
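
A minimal PyTorch sketch of a lyrics-conditioned LSTM generator in the spirit described above: each syllable embedding is concatenated with a noise vector and mapped to melody attributes, one step per syllable. The vocabulary size, hidden sizes, and the choice of three output attributes (e.g., pitch, duration, rest) are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LyricsConditionedLSTMGenerator(nn.Module):
    """Generates melody attributes aligned one-to-one with input syllables."""
    def __init__(self, vocab=4000, embed=128, z_dim=32, hidden=256, n_attrs=3):
        super().__init__()
        self.syllables = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed + z_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_attrs)  # e.g. pitch, duration, rest

    def forward(self, syllable_ids, z):
        # z: (batch, seq_len, z_dim) noise, one vector per syllable
        x = torch.cat([self.syllables(syllable_ids), z], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # (batch, seq_len, n_attrs), aligned to the lyrics

g = LyricsConditionedLSTMGenerator()
melody = g(torch.randint(0, 4000, (2, 16)), torch.randn(2, 16, 32))
```

In the adversarial setup the abstract describes, a second lyrics-conditioned LSTM would score (syllable, note) sequences as real or generated.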


Author(s):  
Lei Ren ◽  
Ying Song

Abstract: Ambient occlusion (AO) is a widely used real-time rendering technique that estimates light intensity on visible scene surfaces. Recently, a number of learning-based AO approaches have been proposed, bringing a new angle to screen-space shading via a unified learning framework with competitive quality and speed. However, most such methods exhibit high error on complex scenes or tend to ignore details. We propose an end-to-end generative adversarial network for the production of realistic AO, and we explore the importance of perceptual loss in the generative model to AO accuracy. An attention mechanism is also described to improve the accuracy of details; its effectiveness is demonstrated on a wide variety of scenes.
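
To illustrate what a perceptual loss term for AO prediction can look like (not the authors' implementation; the VGG-16 layer cut and the loss weights in the comment are assumptions), the sketch below compares fixed VGG features of predicted and reference AO maps.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PerceptualLoss(nn.Module):
    """L1 distance between frozen VGG-16 features of two images."""
    def __init__(self, layers=16):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:layers]
        self.features = vgg.eval()
        for p in self.features.parameters():
            p.requires_grad_(False)  # fixed feature extractor, never trained

    def forward(self, pred_ao, gt_ao):
        # AO maps are single-channel; tile to 3 channels for VGG input
        pred3, gt3 = pred_ao.repeat(1, 3, 1, 1), gt_ao.repeat(1, 3, 1, 1)
        return nn.functional.l1_loss(self.features(pred3), self.features(gt3))

# Illustrative total generator loss (weights assumed):
# loss_G = adv_loss + 100 * l1_loss(pred_ao, gt_ao) + 10 * perceptual(pred_ao, gt_ao)
```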

