Improving GAN with Neighbors Embedding and Gradient Matching

Author(s):  
Ngoc-Trung Tran ◽  
Tuan-Anh Bui ◽  
Ngai-Man Cheung

We propose two new techniques for training Generative Adversarial Networks (GANs) in the unsupervised setting. Our objectives are to alleviate mode collapse in GANs and improve the quality of the generated samples. First, we propose neighbor embedding, a manifold learning-based regularization that explicitly retains local structures of latent samples in the generated samples. This prevents the generator from producing nearly identical data samples from different latent samples, and reduces mode collapse. We propose an inverse t-SNE regularizer to achieve this. Second, we propose a new technique, gradient matching, to align the distributions of the generated samples and the real samples. As it is challenging to work with high-dimensional sample distributions, we propose to align these distributions through the scalar discriminator scores. We constrain the difference between the discriminator scores of the real samples and generated ones. We further constrain the difference between the gradients of these discriminator scores. We derive these constraints from Taylor approximations of the discriminator function. We perform experiments to demonstrate that our proposed techniques are computationally simple and easy to incorporate into existing systems. When gradient matching and neighbor embedding are applied together, our GN-GAN achieves outstanding results on 1D/2D synthetic, CIFAR-10 and STL-10 datasets, e.g., an FID score of 30.80 on the STL-10 dataset. Our code is available at: https://github.com/tntrung/gan
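The second technique in the abstract above, constraining the gap between discriminator scores and between their gradients, can be illustrated with a rough sketch. This is not the authors' implementation: the one-dimensional `tanh` discriminator, its parameters `w` and `b`, the weight `lam`, and the squared-gap form of the penalty are all assumptions chosen only to keep the example small. The neighbor-embedding regularizer is not covered here.

```python
import math

def d_score(x, w=1.5, b=-0.2):
    """Toy scalar discriminator D(x) = tanh(w*x + b); w and b are made-up values."""
    return math.tanh(w * x + b)

def d_grad(x, w=1.5, b=-0.2):
    """Analytic derivative dD/dx = w * (1 - tanh(w*x + b)**2)."""
    return w * (1.0 - math.tanh(w * x + b) ** 2)

def mean(values):
    return sum(values) / len(values)

def matching_penalty(real, fake, lam=1.0):
    """Squared gap between mean scores plus lam times the squared gap between mean gradients."""
    score_gap = mean([d_score(x) for x in real]) - mean([d_score(x) for x in fake])
    grad_gap = mean([d_grad(x) for x in real]) - mean([d_grad(x) for x in fake])
    return score_gap ** 2 + lam * grad_gap ** 2

# Hypothetical 1D samples: one generated batch near the real data, one far from it.
real = [0.9, 1.0, 1.1, 1.2]
close_fake = [0.95, 1.0, 1.05, 1.2]
far_fake = [-1.0, -0.8, -1.2, -0.9]

print(matching_penalty(real, close_fake))
print(matching_penalty(real, far_fake))
```

In this toy setting the penalty is small when the generated batch sits near the real data and grows as the two batches move apart, which is the behavior a generator-side regularizer of this kind would rely on.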

Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1231
Author(s):  
Xiangde Zhang ◽  
Jian Zhang

Mode collapse has always been a fundamental problem in generative adversarial networks. The recently proposed Zero Gradient Penalty (0GP) regularization can alleviate mode collapse, but it exacerbates the discriminator’s misjudgment problem, that is, the discriminator judges some generated samples to be more real than real samples. In actual training, the discriminator directs the generated samples toward samples with higher discriminator outputs. A serious misjudgment problem in the discriminator therefore causes the generator to produce unnatural images and reduces generation quality. This paper proposes Real Sample Consistency (RSC) regularization. In the training process, we randomly divided the samples into two parts and minimized the loss of the discriminator’s outputs corresponding to these two parts, forcing the discriminator to output the same value for all real samples. We analyzed the effectiveness of our method. The experimental results showed that our method can alleviate the discriminator’s misjudgment and perform better, with a more stable training process, than 0GP regularization. Our real sample consistency regularization improved the FID score for the conditional generation of Fake-As-Real GAN (FARGAN) from 14.28 to 9.8 on CIFAR-10. Our RSC regularization improved the FID score from 23.42 to 17.14 on CIFAR-100 and from 53.79 to 46.92 on ImageNet2012. Our RSC regularization improved the average distance between the generated and real samples from 0.028 to 0.025 on synthetic data. The loss of the generator and discriminator in standard GAN with our regularization was close to the theoretical loss and remained stable during the training process.
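The split-and-compare step described in this abstract might look roughly like the sketch below. The abstract does not give the exact loss, so the mean squared difference between paired outputs is an assumption, as are the function name `rsc_penalty` and the sample scores, which stand in for discriminator outputs on a batch of real samples.

```python
import random

def rsc_penalty(real_scores, rng):
    """Shuffle the discriminator outputs on real samples, split them into two halves,
    and return the mean squared difference between paired outputs (assumed loss form)."""
    scores = list(real_scores)
    rng.shuffle(scores)
    half = len(scores) // 2
    first, second = scores[:half], scores[half:2 * half]
    return sum((a - b) ** 2 for a, b in zip(first, second)) / half

rng = random.Random(0)
consistent = [0.8] * 8                                   # discriminator agrees on all real samples
scattered = [0.1, 0.9, 0.4, 0.7, 0.2, 0.95, 0.5, 0.3]    # discriminator disagrees

print(rsc_penalty(consistent, rng))
print(rsc_penalty(scattered, rng))
```

The penalty vanishes only when the discriminator assigns the same value to every real sample, which matches the stated goal of the regularizer.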


2018 ◽  
Vol 8 (12) ◽  
pp. 2351 ◽  
Author(s):  
Caidan Zhao ◽  
Mingxian Shi ◽  
Zhibiao Cai ◽  
Caiyun Chen

Nowadays, it is increasingly important to deal with the potential security issues of the internet of things (IoT). Indeed, using the physical layer features of IoT wireless signals to achieve individual identity authentication is an effective way to enhance the security of IoT. However, traditional classifiers need to know all the categories in advance to build the recognition models. Realistically, it is difficult to collect all types of samples, so an unknown target class may be mistakenly identified as a known one. Consequently, this paper constructs an improved open-categorical classification model based on generative adversarial networks (OCC-GAN) to solve the above problems. Here, we have modified the loss functions of the generative model G and the discriminative model D. Compared to the traditional GAN model, which generates fake samples that overlap with the real samples, our proposed G model generates fake samples as negative samples that evenly surround the real samples, while the D model learns to distinguish between real samples and fake samples. Besides, we add auxiliary training not only to gain a better recognition result but also to improve the efficiency of the model. Furthermore, our proposed model is verified through experimental study. Compared to other common methods, such as one-class support vector machine (OC-SVM) and one-versus-rest support vector machine (OvR-SVM), the OCC-GAN model performs better. The recognition rate of the OCC-GAN model can reach more than 90% with a recall rate of 97% on data from the IoT module.


2021 ◽  
Vol 11 (2) ◽  
pp. 721
Author(s):  
Hyung Yong Kim ◽  
Ji Won Yoon ◽  
Sung Jun Cheon ◽  
Woo Hyun Kang ◽  
Nam Soo Kim

Recently, generative adversarial networks (GANs) have been successfully applied to speech enhancement. However, there still remain two issues that need to be addressed: (1) GAN-based training is typically unstable due to its non-convex property, and (2) most of the conventional methods do not fully take advantage of the speech characteristics, which could result in a sub-optimal solution. In order to deal with these problems, we propose a progressive generator that can handle the speech in a multi-resolution fashion. Additionally, we propose a multi-scale discriminator that discriminates the real and generated speech at various sampling rates to stabilize GAN training. The proposed structure was compared with the conventional GAN-based speech enhancement algorithms using the VoiceBank-DEMAND dataset. Experimental results showed that the proposed approach can make the training faster and more stable, which improves the performance on various metrics for speech enhancement.


Author(s):  
Khaled ELKarazle ◽  
Valliappan Raman ◽  
Patrick Then

Age estimation models can be employed in many applications, including soft biometrics, content access control, targeted advertising, and many more. However, as some facial images are taken in unconstrained conditions, their quality degrades, which results in the loss of several essential ageing features. This study investigates how introducing a new layer of data processing based on a super-resolution generative adversarial network (SRGAN) model can influence the accuracy of age estimation by enhancing the quality of both the training and testing samples. Additionally, we introduce a novel convolutional neural network (CNN) classifier to distinguish between several age classes. We train one of our classifiers on a reconstructed version of the original dataset and compare its performance with an identical classifier trained on the original version of the same dataset. Our findings reveal that the classifier trained on the reconstructed dataset produces better classification accuracy, opening the door for more research into building data-centric machine learning systems.


2021 ◽  
Vol 15 ◽  
Author(s):  
Jiasong Wu ◽  
Xiang Qiu ◽  
Jing Zhang ◽  
Fuzhi Wu ◽  
Youyong Kong ◽  
...  

Generative adversarial networks (GANs) and variational autoencoders (VAEs) provide impressive image generation from Gaussian white noise, but both are difficult to train, since they need a generator (or encoder) and a discriminator (or decoder) to be trained simultaneously, which can easily lead to unstable training. To solve or alleviate these synchronous training problems of GANs and VAEs, researchers recently proposed generative scattering networks (GSNs), which use wavelet scattering networks (ScatNets) as the encoder to obtain features (or ScatNet embeddings) and convolutional neural networks (CNNs) as the decoder to generate an image. The advantage of GSNs is that the parameters of ScatNets do not need to be learned, while their disadvantage is that the representation ability of ScatNets is slightly weaker than that of CNNs. In addition, the dimensionality reduction method of principal component analysis (PCA) can easily lead to overfitting in the training of GSNs and, therefore, affect the quality of generated images in the testing process. To further improve the quality of generated images while keeping the advantages of GSNs, this study proposes generative fractional scattering networks (GFRSNs), which use more expressive fractional wavelet scattering networks (FrScatNets) instead of ScatNets as the encoder to obtain features (or FrScatNet embeddings) and use CNNs similar to those of GSNs as the decoder to generate an image. Additionally, this study develops a new dimensionality reduction method named feature-map fusion (FMF), used instead of PCA, to better retain the information of FrScatNets; it also discusses the effect of image fusion on the quality of the generated image. The experimental results obtained on the CIFAR-10 and CelebA datasets show that the proposed GFRSNs can lead to better generated images than the original GSNs on testing datasets. The experimental results of the proposed GFRSNs with deep convolutional GAN (DCGAN), progressive GAN (PGAN), and CycleGAN are also given.


2019 ◽  
Vol 9 (18) ◽  
pp. 3908 ◽  
Author(s):  
Jintae Kim ◽  
Shinhyeok Oh ◽  
Oh-Woog Kwon ◽  
Harksoo Kim

To generate proper responses to user queries, multi-turn chatbot models should selectively consider dialogue histories. However, previous chatbot models have simply concatenated or averaged vector representations of all previous utterances without considering contextual importance. To mitigate this problem, we propose a multi-turn chatbot model in which previous utterances participate in response generation using different weights. The proposed model calculates the contextual importance of previous utterances by using an attention mechanism. In addition, we propose a training method that uses two types of Wasserstein generative adversarial networks to improve the quality of responses. In experiments with the DailyDialog dataset, the proposed model outperformed the previous state-of-the-art models based on various performance measures.
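The idea of weighting previous utterances by contextual importance can be sketched with plain dot-product attention. This is only an illustration: the abstract does not say which scoring function the model uses, and the two-dimensional vectors, `query`, and `history` below are hypothetical stand-ins for learned utterance representations.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, utterances):
    """Return (weights, context): dot-product attention of the query over utterance vectors."""
    scores = [sum(q * u for q, u in zip(query, vec)) for vec in utterances]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * vec[i] for w, vec in zip(weights, utterances)) for i in range(dim)]
    return weights, context

# Hypothetical encodings of three previous utterances and of the current query.
history = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.2]

weights, context = attend(query, history)
print(weights)
print(context)
```

Unlike concatenating or averaging, the resulting context vector leans toward the utterances that score highest against the current query.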


2020 ◽  
Vol 12 (16) ◽  
pp. 2586 ◽  
Author(s):  
Pawel Burdziakowski

Visual data acquired from small unmanned aerial vehicles (UAVs) may be affected by image blur. Image blurring caused by camera motion during exposure significantly impacts image interpretation quality and consequently the quality of photogrammetric products. On blurred images, it is difficult to visually locate ground control points, and the number of identified feature points decreases rapidly as the blur kernel increases. The nature of blur can be non-uniform, which makes it hard to forecast for traditional deblurring methods. Due to the above, the author of this publication concluded that the neural methods developed in recent years were able to eliminate blur on UAV images with an unpredictable or highly variable blur nature. In this research, a new, rapid method based on generative adversarial networks (GANs) was applied for deblurring. A data set for neural network training was developed based on real aerial images collected over the last few years. More than 20 full sets of photogrammetric products were developed, including point clouds, orthoimages and digital surface models. The sets were generated from both blurred and deblurred images using the presented method. The results presented in the publication show that the method for improving blurred photo quality significantly contributed to an improvement in the general quality of typical photogrammetric products. The geometric accuracy of the products generated from deblurred photos was maintained despite the rising blur kernel. The quality of textures and input photos was increased. This research proves that the developed method based on neural networks can be used for deblurring, even in highly blurred images, and it significantly increases the final geometric quality of the photogrammetric products. In practical cases, it will be possible to implement an additional feature in the photogrammetric software, which will eliminate unwanted blur and allow one to use almost all blurred images in the modelling process.


2020 ◽  
Vol 34 (04) ◽  
pp. 4852-4859
Author(s):  
Jinduo Liu ◽  
Junzhong Ji ◽  
Guangxu Xun ◽  
Liuyi Yao ◽  
Mengdi Huai ◽  
...  

Inferring effective connectivity between different brain regions from functional magnetic resonance imaging (fMRI) data has become an important research topic in neuroinformatics in recent years. However, current methods have limited usage in effective connectivity studies due to the high noise and small sample size of fMRI data. In this paper, we propose a novel framework for inferring effective connectivity based on generative adversarial networks (GANs), named EC-GAN. The proposed framework infers effective connectivity via an adversarial process, in which we simultaneously train two models: a generator and a discriminator. The generator consists of a set of effective connectivity generators based on structural equation models, which can generate the fMRI time series of each brain region via effective connectivity. Meanwhile, the discriminator is employed to distinguish between the joint distributions of the real and generated fMRI time series. Experimental results on simulated data show that EC-GAN can better infer effective connectivity compared to other state-of-the-art methods. The real-world experiments indicate that EC-GAN can provide a new and reliable perspective for analyzing the effective connectivity of fMRI data.


Author(s):  
Bingcai Wei ◽  
Liye Zhang ◽  
Kangtao Wang ◽  
Qun Kong ◽  
Zhuang Wang

Extracting traffic information from images plays an increasingly significant role in the Internet of Vehicles (IoV). However, due to the high-speed movement and bumps of the vehicle, images are blurred during acquisition. In addition, on rainy days, rain attached to the lens blocks the target and distorts the image. These problems create great obstacles to extracting key information from transportation images, which affects the real-time judgment of the vehicle control system on road conditions, causes decision-making errors in the system, and may even contribute to traffic accidents. In this paper, we propose a motion-blur restoration and rain removal algorithm for IoV based on generative adversarial networks and transfer learning. Dynamic scene deblurring and image de-raining are both challenging classical research directions in low-level vision. For both tasks, firstly, instead of using ReLU in a conventional residual block, we designed a residual block containing three 256-channel convolutional layers, and we used the Leaky-ReLU activation function. Secondly, we used generative adversarial networks with our Resblocks for the image deblurring task, as well as for the image de-raining task. Thirdly, experimental results on the synthetic blur dataset GOPRO and the real blur dataset RealBlur confirm the effectiveness of our model for image deblurring. Finally, for the image de-raining task based on transfer learning, we can fine-tune the pre-trained model with less training data and show good results on several datasets used for image rain removal.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Ruixin Ma ◽  
Junying Lou ◽  
Peng Li ◽  
Jing Gao

Generating pictures from text is an interesting, classic, and challenging task. Benefiting from the development of generative adversarial networks (GANs), the generation quality of this task has been greatly improved. Many excellent cross-modal GAN models have been put forward. These models add extensive layers and constraints to generate impressive pictures. However, the complexity and computation of existing cross-modal GANs are too high for them to be deployed on mobile terminals. To solve this problem, this paper designs a compact cross-modal GAN based on canonical polyadic decomposition. We replace an original convolution layer with three small convolution layers and use an autoencoder to stabilize and speed up training. The experimental results show that our model achieves 20% times of compression in both parameters and FLOPs without loss of quality on generated images.
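To see why replacing one convolution layer with three small ones saves parameters, the weight counts can be compared directly. The sketch below assumes one common three-layer form of a rank-R canonical polyadic factorization (a 1x1 convolution, a k x k depthwise convolution, and another 1x1 convolution); the abstract does not specify the exact layer shapes or rank, so the layout, the function names, and the numbers 256, 3, and 64 are illustrative only.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def cp_conv_params(c_in, c_out, k, rank):
    """Weights in an assumed three-layer rank-`rank` replacement:
    1x1 conv (c_in -> rank), k x k depthwise conv on `rank` channels, 1x1 conv (rank -> c_out)."""
    return c_in * rank + rank * k * k + rank * c_out

# Hypothetical layer: 256 input channels, 256 output channels, 3x3 kernel, rank 64.
full = conv_params(256, 256, 3)
small = cp_conv_params(256, 256, 3, rank=64)

print(full, small, full / small)
```

The rank controls the trade-off: a smaller rank gives stronger compression but a coarser approximation of the original kernel.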

