Is Synthetic The New Real? Performance Analysis of Time Series Generation Techniques with Focus on Network Load Forecasting

2021 ◽  
Author(s):  
Muhammad Haris Naveed ◽  
Umair Hashmi ◽  
Nayab Tajved ◽  
Neha Sultan ◽  
Ali Imran

This paper explores whether Generative Adversarial Networks (GANs) can produce realistic network load data that can be utilized to train machine learning models in lieu of real data. In this regard, we evaluate the performance of three recent GAN architectures on the Telecom Italia data set across a set of qualitative and quantitative metrics. Our results show that GAN generated synthetic data is indeed similar to real data and forecasting models trained on this data achieve similar performance to those trained on real data.
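The comparison in this abstract (forecasters trained on synthetic data versus on real data, both evaluated on real data) is commonly called train-on-synthetic, test-on-real (TSTR). The sketch below is a minimal illustration of that protocol only. It assumes a toy sinusoidal "load" series in place of the Telecom Italia data, a lag-1 linear forecaster in place of the paper's models, and a noisier copy of the series standing in for GAN output; all function names are hypothetical.

```python
import math
import random


def make_series(n, seed, noise):
    """Toy 'network load' series: a daily (24-step) sinusoid plus Gaussian noise."""
    rng = random.Random(seed)
    return [10 + 5 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, noise) for t in range(n)]


def fit_lag1(series):
    """Least-squares fit of the one-step forecaster y[t] = a * y[t-1] + b."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    return a, my - a * mx


def mae_on(series, model):
    """Mean absolute one-step-ahead error of a fitted (a, b) model on a series."""
    a, b = model
    errs = [abs(series[t] - (a * series[t - 1] + b)) for t in range(1, len(series))]
    return sum(errs) / len(errs)


real_train = make_series(24 * 30, seed=1, noise=0.5)
real_test = make_series(24 * 7, seed=2, noise=0.5)
synthetic_train = make_series(24 * 30, seed=3, noise=0.6)  # hypothetical stand-in for GAN output

trtr = mae_on(real_test, fit_lag1(real_train))       # train on real, test on real
tstr = mae_on(real_test, fit_lag1(synthetic_train))  # train on synthetic, test on real
print(f"TRTR MAE={trtr:.3f}  TSTR MAE={tstr:.3f}  ratio={tstr / trtr:.2f}")
```

A TSTR/TRTR ratio close to 1 is the kind of evidence the abstract describes; the held-out test set is always real data so that both models are judged on the same target distribution.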


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0260308
Author(s):  
Mauro Castelli ◽  
Luca Manzoni ◽  
Tatiane Espindola ◽  
Aleš Popovič ◽  
Andrea De Lorenzo

Wireless networks are among the fundamental technologies used to connect people. Given the constant advancements in the field, telecommunication operators must guarantee a high-quality service to retain their customer portfolio. To ensure this high-quality service, it is common to establish partnerships with specialized technology companies that deliver software services to monitor the networks and identify faults and their solutions. A common barrier faced by these specialized companies is the lack of data to develop and test their products. This paper investigates the use of generative adversarial networks (GANs), which are state-of-the-art generative models, for generating synthetic telecommunication data related to Wi-Fi signal quality. We developed, trained, and compared two of the most widely used GAN architectures: the Vanilla GAN and the Wasserstein GAN (WGAN). Both models produced satisfactory results and were able to generate synthetic data similar to the real data. In particular, the distribution of the synthetic data overlaps the distribution of the real data for all of the considered features. Moreover, the synthetic features generated by both models reproduce the associations observed among the real features. We chose the WGAN as the final model, but both models are suitable for addressing the problem at hand.
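The abstract compares the Vanilla GAN with the WGAN without stating their objectives. As a hedged illustration of the standard formulations (not necessarily the exact configuration used in this paper), the two differ mainly in the loss: the vanilla discriminator outputs probabilities scored with log-losses, while the WGAN critic outputs unbounded scores whose mean difference estimates a Wasserstein distance, with weight clipping as the original way to keep the critic roughly Lipschitz. The function names and input scores below are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def vanilla_gan_losses(d_real, d_fake):
    """Vanilla GAN losses from discriminator probabilities in (0, 1).

    The generator loss uses the common non-saturating form -E[log D(G(z))].
    """
    d_loss = -mean([math.log(p) for p in d_real]) - mean([math.log(1 - p) for p in d_fake])
    g_loss = -mean([math.log(p) for p in d_fake])
    return d_loss, g_loss


def wgan_losses(c_real, c_fake):
    """WGAN losses from unbounded critic scores (no sigmoid, no log)."""
    critic_loss = -(mean(c_real) - mean(c_fake))  # critic maximizes E[C(x)] - E[C(G(z))]
    gen_loss = -mean(c_fake)                      # generator raises the critic's score on fakes
    return critic_loss, gen_loss


def clip_weights(weights, c=0.01):
    """Weight clipping from the original WGAN: keep every critic weight in [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


print(vanilla_gan_losses([0.9, 0.8], [0.2, 0.1]))
print(wgan_losses([1.5, 2.5], [-0.5, 0.5]))
print(clip_weights([0.5, -0.003, -0.2]))
```

Later WGAN variants replace clipping with a gradient penalty; which variant the authors used is not stated in the abstract.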


Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2220
Author(s):  
Luis Gonzalez-Abril ◽  
Cecilio Angulo ◽  
Juan-Antonio Ortega ◽  
José-Luis Lopez-Guerra

The digital twin in health care is the dynamic digital representation of the patient's anatomy and physiology through computational models that are continuously updated from clinical data. Furthermore, used in combination with machine learning technologies, it should help doctors along the therapeutic path and in minimally invasive intervention procedures. Confidentiality of medical records is a very delicate issue; therefore, an anonymization process is mandatory to preserve patients' privacy. Moreover, data availability is very limited in some health domains, such as lung cancer treatment. Hence, generating synthetic data that conform to real data would solve this issue. In this paper, the use of generative adversarial networks (GANs) to generate synthetic data for lung cancer patients is introduced as a tool to solve this problem in the form of anonymized synthetic patients. The generated synthetic patients are validated both with statistical methods and by oncologists, using the indirect mortality rate obtained for patients in different stages.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Vajira Thambawita ◽  
Jonas L. Isaksen ◽  
Steven A. Hicks ◽  
Jonas Ghouse ◽  
Gustav Ahlberg ◽  
...  

Recent global developments underscore the prominent role big data have in modern medical science. Privacy issues, however, are a prevalent obstacle to collecting and sharing data between researchers. Synthetic data generated to represent real data, carrying similar information and distribution, may alleviate this problem. In this study, we present generative adversarial networks (GANs) capable of generating realistic synthetic DeepFake 10-s 12-lead electrocardiograms (ECGs). We developed and compared two methods, named WaveGAN* and Pulse2Pulse. We trained the GANs with 7,233 real normal ECGs to produce 121,977 DeepFake normal ECGs. By verifying the ECGs with a commercial ECG interpretation program (MUSE 12SL, GE Healthcare), we demonstrate that Pulse2Pulse was superior to WaveGAN* at producing realistic ECGs. ECG intervals and amplitudes were similar between the DeepFake and real ECGs. Although these synthetic ECGs mimic the dataset used to create them, they are not linked to any individual and may thus be used freely. The synthetic dataset will be available as open access for researchers at OSF.io, and the DeepFake generator will be available on the Python Package Index (PyPI) for generating synthetic ECGs. In conclusion, we were able to generate realistic synthetic ECGs using generative adversarial neural networks trained on normal ECGs from two population studies, thereby addressing the relevant privacy issues in medical datasets.


2020 ◽  
pp. 1-13
Author(s):  
Yundong Li ◽  
Yi Liu ◽  
Han Dong ◽  
Wei Hu ◽  
Chen Lin

Intrusion detection in the railway clearance is crucial for avoiding railway accidents caused by the intrusion of abnormal objects, such as pedestrians, falling rocks, and animals. However, detecting intrusions with deep learning methods in infrared images captured at night remains challenging because of the lack of sufficient training samples. To address this issue, this study proposes a transfer strategy that converts daytime RGB images into the nighttime style of infrared images. The proposed method consists of two stages. In the first stage, a data generation model based on generative adversarial networks is trained using RGB images and a small number of infrared images, and synthetic samples are then generated with the trained model. In the second stage, a single shot multibox detector (SSD) model is trained on the synthetic data and used to detect abnormal objects in nighttime infrared images. To validate the effectiveness of the proposed method, two groups of experiments, on railway and non-railway scenes, are conducted. Experimental results demonstrate the effectiveness of the proposed method, with an improvement of 17.8% in nighttime object detection.


Geophysics ◽  
2006 ◽  
Vol 71 (5) ◽  
pp. U67-U76 ◽  
Author(s):  
Robert J. Ferguson

The possibility of improving regularization/datuming of seismic data is investigated by treating wavefield extrapolation as an inversion problem. Weighted, damped least squares is then used to produce the regularized/datumed wavefield. Regularization/datuming is extremely costly because it requires computing the Hessian, so an efficient approximation is introduced that computes only a limited number of diagonals of the operators involved. Real and synthetic data examples demonstrate the utility of this approach. For synthetic data, regularization/datuming is demonstrated for large extrapolation distances using a highly irregular recording array. Without approximation, regularization/datuming returns a regularized wavefield with fewer operator artifacts than a nonregularizing method such as generalized phase shift plus interpolation (PSPI). Approximate regularization/datuming returns a regularized wavefield at roughly two orders of magnitude lower cost, but it is dip limited (though in a controllable way) compared with the full method. The Foothills structural data set, a freely available data set from the Rocky Mountains of Canada, demonstrates the application to real data. These data have highly irregular sampling along the shot coordinate and suffer from significant near-surface effects. Approximate regularization/datuming returns common receiver data that are superior in appearance to those from conventional datuming.
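The abstract frames regularization/datuming as weighted, damped least squares and cuts cost by keeping only a limited number of diagonals. The following is a minimal real-valued sketch of that general idea, assuming a small dense toy operator `A` rather than the paper's complex-valued wavefield-extrapolation operators; the function names and the `bandwidth` parameter are hypothetical, and where exactly the paper truncates may differ. It solves (AᵀWA + εI) m = AᵀW d and, when `bandwidth` is set, discards Hessian entries more than that many diagonals away from the main one.

```python
def matmul_t_w(a, w, b):
    """Return A^T diag(w) B for list-of-rows matrices A (m x n) and B (m x p)."""
    m, n, p = len(a), len(a[0]), len(b[0])
    return [[sum(a[k][i] * w[k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def solve(h, rhs):
    """Gaussian elimination with partial pivoting for a small dense system H x = rhs."""
    n = len(h)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(h)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def damped_wls(a, d, w, eps, bandwidth=None):
    """Solve (A^T W A + eps I) m = A^T W d, optionally keeping only |i - j| <= bandwidth."""
    n = len(a[0])
    h = matmul_t_w(a, w, a)  # Hessian of the weighted least-squares misfit
    for i in range(n):
        h[i][i] += eps  # damping term
        if bandwidth is not None:
            for j in range(n):
                if abs(i - j) > bandwidth:
                    h[i][j] = 0.0  # approximation: drop far off-diagonal entries
    rhs = [row[0] for row in matmul_t_w(a, w, [[di] for di in d])]
    return solve(h, rhs)


A = [[1.0, 0.5, 0.1], [0.2, 1.0, 0.4], [0.0, 0.3, 1.0], [0.6, 0.0, 0.2]]
d = [1.0, 2.0, 0.5, 0.3]
w = [1.0, 1.0, 0.5, 2.0]  # toy weights, e.g. down-weighting less reliable traces
full = damped_wls(A, d, w, eps=0.1)
banded = damped_wls(A, d, w, eps=0.1, bandwidth=1)
print(full, banded)
```

For clarity this sketch builds the full Hessian and then zeroes entries; a practical implementation would save cost by never computing the off-band entries and by using a banded solver.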


2020 ◽  
Author(s):  
Alceu Bissoto ◽  
Sandra Avila

Melanoma is the most lethal type of skin cancer. Because of the possibility of metastasis, early diagnosis is crucial to increasing patients' survival rate. Automated skin lesion analysis can play an essential role by reaching people who do not have access to a specialist. However, since deep learning became the state of the art for skin lesion analysis, data have become a decisive factor in pushing solutions further. The core objective of this M.Sc. dissertation is to tackle the problems that arise from limited datasets. In the first part, we use generative adversarial networks to generate synthetic data that augment our classification model's training datasets to boost performance. Our method generates high-resolution, clinically meaningful skin lesion images that, when added to our classification model's training dataset, consistently improved performance in different scenarios and on distinct datasets. We also investigate how our classification models perceive the synthetic samples and how these samples can aid the model's generalization. Finally, we investigate a problem that usually arises from having a few relatively small datasets that are thoroughly reused in the literature: bias. To this end, we designed experiments to study how our models use data, verifying how they exploit correct (based on medical algorithms) and spurious (based on artifacts introduced during image acquisition) correlations. Disturbingly, even in the absence of any clinical information about the lesion being diagnosed, our classification models performed much better than chance (even competing with specialists' benchmarks), strongly suggesting inflated performance.


2020 ◽  
Vol 128 (10-11) ◽  
pp. 2665-2683 ◽  
Author(s):  
Grigorios G. Chrysos ◽  
Jean Kossaifi ◽  
Stefanos Zafeiriou

Conditional image generation lies at the heart of computer vision, and conditional generative adversarial networks (cGANs) have recently become the method of choice for this task owing to their superior performance. The focus so far has largely been on performance improvement, with little effort devoted to making cGANs more robust to noise. However, the regression (of the generator) might lead to arbitrarily large errors in the output, which makes cGANs unreliable for real-world applications. In this work, we introduce a novel conditional GAN model, called RoCGAN, which leverages structure in the target space of the model to address the issue. Specifically, we augment the generator with an unsupervised pathway, which promotes the outputs of the generator to span the target manifold, even in the presence of intense noise. We prove that RoCGAN shares similar theoretical properties with GAN and establish the merits of our model with both synthetic and real data. We perform a thorough experimental validation on large-scale datasets of natural scenes and faces and observe that our model outperforms existing cGAN architectures by a large margin. We also empirically demonstrate the performance of our approach in the face of two types of noise (adversarial and Bernoulli).


2020 ◽  
Vol 223 (3) ◽  
pp. 1565-1583
Author(s):  
Hoël Seillé ◽  
Gerhard Visser

Bayesian inversion of magnetotelluric (MT) data is a powerful but computationally expensive approach to estimate the subsurface electrical conductivity distribution and the associated uncertainty. Approximating the Earth's subsurface with 1-D physics considerably speeds up calculation of the forward problem, making the Bayesian approach tractable, but can lead to biased results when the assumption is violated. We propose a methodology to quantitatively compensate for the bias caused by the 1-D Earth assumption within a 1-D trans-dimensional Markov chain Monte Carlo sampler. Our approach determines site-specific likelihood functions, which are calculated using a dimensionality discrepancy error model derived by a machine learning algorithm trained on a set of synthetic 3-D conductivity training images. This is achieved by exploiting known geometrical dimensional properties of the MT phase tensor. A complex synthetic model that mimics a sedimentary basin environment is used to illustrate the ability of our workflow to reliably estimate uncertainty in the inversion results, even in the presence of strong 2-D and 3-D effects. Using this dimensionality discrepancy error model, we demonstrate that on this synthetic data set our workflow performs better in 80 per cent of the cases than the existing practice of using constant errors. Finally, our workflow is benchmarked against real data acquired in Queensland, Australia, and shows its ability to detect the depth to basement accurately.
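The abstract contrasts site-specific likelihoods built from a dimensionality discrepancy error model with the practice of using constant errors. One common way to fold a modelled discrepancy into a Gaussian likelihood is to add it to the measurement error in quadrature; the sketch below assumes exactly that, which may not match the paper's formulation. The function names, the toy data, and the `discrepancy` values (which would come from the trained machine learning model) are all hypothetical.

```python
import math


def gaussian_loglik(observed, predicted, sigmas):
    """Independent Gaussian log-likelihood with per-datum standard deviations."""
    return sum(
        -0.5 * ((o - p) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
        for o, p, s in zip(observed, predicted, sigmas)
    )


def combined_sigmas(data_err, discrepancy):
    """Assumption: add the modelled 1-D vs 3-D discrepancy to the data error in quadrature."""
    return [math.hypot(e, dsc) for e, dsc in zip(data_err, discrepancy)]


observed = [0.80, 0.95, 1.40]
predicted = [0.78, 0.90, 1.00]     # toy 1-D forward response
data_err = [0.05, 0.05, 0.05]      # constant errors
discrepancy = [0.0, 0.02, 0.35]    # hypothetical: large where 3-D effects are strong

const = gaussian_loglik(observed, predicted, data_err)
adapt = gaussian_loglik(observed, predicted, combined_sigmas(data_err, discrepancy))
print(f"constant errors: {const:.2f}  with discrepancy model: {adapt:.2f}")
```

Inside a Markov chain Monte Carlo sampler, this log-likelihood would be evaluated for each proposed conductivity model; inflating the error only where the 1-D assumption is expected to fail keeps those data from dominating the posterior without down-weighting the well-modelled ones.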

