Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?

Author(s):  
Jeremy Georges-Filteau ◽
Elisa Cirillo

Abstract After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential is unexploited because of the fiercely private nature of patient-related data and the regulations that protect it. Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking way to learn generative models that produce realistic synthetic data. They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, digital twin simulations in industrial sectors, and medical imaging. The digital twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs possess many capabilities relevant to common problems in healthcare: lack of data, class imbalance, rare diseases, and preserving privacy. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which are data related for the reasons stated above. Considering these facts, publications concerning GANs applied to OHD seemed to be severely lacking. To uncover the reasons for this slow adoption, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD were initially challenging for the existing GAN algorithms (unlike medical imaging, for which state-of-the-art models were directly transferable) and that the evaluation of synthetic data lacked clear metrics. We find more publications on the subject than expected, starting slowly in 2017, and since then at an increasing rate. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modelling, and reproducibility.

2020 ◽  
Author(s):  
Jeremy Georges-Filteau ◽
Elisa Cirillo

Abstract After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential is unexploited because of the fiercely private nature of patient-related data and the regulations that protect it. Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking way to learn generative models that produce realistic synthetic data. They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, digital twin simulations in industrial sectors, and medical imaging. The digital twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs possess many capabilities relevant to common problems in healthcare: lack of data, class imbalance, rare diseases, and preserving privacy. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which are data related for the reasons stated above. Considering these facts, publications concerning GANs applied to OHD seemed to be severely lacking. To uncover the reasons for this slow adoption, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD were initially challenging for the existing GAN algorithms (unlike medical imaging, for which state-of-the-art models were directly transferable) and that the evaluation of synthetic data lacked clear metrics. We find more publications on the subject than expected, starting slowly in 2017, and since then at an increasing rate. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modelling, and reproducibility.


Author(s):  
Jeremy Georges-Filteau ◽  
Elisa Cirillo

After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential is unexploited because of the fiercely private nature of patient-related data and the regulations governing its distribution. Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking approach to efficiently learn generative models that produce realistic Synthetic Data (SD). They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, simulations in the industrial and marketing sectors known as digital twins, and medical imaging. The digital twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs possess a multitude of capabilities relevant to common problems in healthcare: augmenting small datasets, correcting class imbalance, domain translation for rare diseases, and preserving privacy. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which are data related and could be alleviated by the capabilities of GANs. Considering these facts, publications concerning the development of GANs applied to OHD seemed to be severely lacking. To uncover the reasons for the slow adoption of GANs for OHD, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD and the evaluation of the SD were initially challenging for the existing GAN algorithms (unlike medical imaging, for which state-of-the-art models were directly transferable), and that the choice of metrics was ambiguous. We find many publications on the subject, starting slowly in 2017 and since then being published at an increasing rate. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modeling, and reproducibility.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Stefan Lenz ◽  
Moritz Hess ◽  
Harald Binder

Abstract Background The best way to calculate statistics from medical data is to use the data of individual patients. In some settings, this data is difficult to obtain due to privacy restrictions. In Germany, for example, it is not possible to pool routine data from different hospitals for research purposes without the consent of the patients. Methods The DataSHIELD software provides an infrastructure and a set of statistical methods for joint, privacy-preserving analyses of distributed data. The contained algorithms are reformulated to work with aggregated data from the participating sites instead of the individual data. If a desired algorithm is not implemented in DataSHIELD or cannot be reformulated in such a way, using artificial data is an alternative. Generating artificial data is possible using so-called generative models, which are able to capture the distribution of given data. Here, we employ deep Boltzmann machines (DBMs) as generative models. For the implementation, we use the package “BoltzmannMachines” from the Julia programming language and wrap it for use with DataSHIELD, which is based on R. Results We present a methodology together with a software implementation that builds on DataSHIELD to create artificial data that preserve complex patterns from distributed individual patient data. Such data sets of artificial patients, which are not linked to real patients, can then be used for joint analyses. As an exemplary application, we conduct a distributed analysis with DBMs on a synthetic data set, which simulates genetic variant data. Patterns from the original data can be recovered in the artificial data using hierarchical clustering of the virtual patients, demonstrating the feasibility of the approach. 
Additionally, we compare DBMs, variational autoencoders, generative adversarial networks, and multivariate imputation as generative approaches by assessing the utility and disclosure of synthetic data generated from real genetic variant data in a distributed setting with data of a small sample size. Conclusions Our implementation adds to DataSHIELD the ability to generate artificial data that can be used for various analyses, e.g., for pattern recognition with deep learning. This also demonstrates more generally how DataSHIELD can be flexibly extended with advanced algorithms from languages other than R.
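
The idea of reformulating an algorithm so that it works on aggregates from each site, rather than on individual records, can be illustrated with a pooled mean and variance. The following is a minimal sketch in Python and is not DataSHIELD's API: the names `site_summary` and `pooled_mean_variance` are hypothetical, and each site is assumed to share only its count, sum, and sum of squares.

```python
from typing import Iterable, List, Tuple

# Hypothetical names for illustration only; this is not DataSHIELD code.
Summary = Tuple[int, float, float]  # (count, sum, sum of squares)


def site_summary(values: Iterable[float]) -> Summary:
    """Run locally at a site; only these three aggregates leave the site."""
    n, s, ss = 0, 0.0, 0.0
    for v in values:
        n += 1
        s += v
        ss += v * v
    return n, s, ss


def pooled_mean_variance(summaries: List[Summary]) -> Tuple[float, float]:
    """Combine the site aggregates into the overall mean and sample variance."""
    n = sum(item[0] for item in summaries)
    s = sum(item[1] for item in summaries)
    ss = sum(item[2] for item in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    mean = s / n
    variance = (ss - s * s / n) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    hospital_a = site_summary([1.0, 2.0, 3.0])
    hospital_b = site_summary([4.0, 5.0])
    print(pooled_mean_variance([hospital_a, hospital_b]))
```

The result equals what a pooled analysis of all five values would give, although no individual value is exchanged. The sum-of-squares formula is used here for brevity; it can lose precision on large or badly scaled data, and very small site counts can themselves be disclosive.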


2021 ◽  
Vol 25 (1) ◽  
pp. 147-151
Author(s):  
I. V. Lantukh ◽  
N. F. Merkulova ◽  
V. M. Ostapenko

Annotation. The article examines the problem of medical research, which is especially relevant and necessary today, during the COVID-19 pandemic. Medical research has an ethical nature owing to two interrelated aspects: the first relates to professional medical practice, the second to the patient's personality. Human medical research is based on the "rule of consent", which is necessary to protect the subject of medical research against various threats. The ethical implications of medical research stem from the need to comply with social requirements. The ratio of internal (professional) to external (public) control over medical research is both a moral and a social problem. Public control over medical research should be limited to such an extent as to leave room for the professional work of scientists. One aspect of this problem is related to the physical well-being of the subject of medical research: an adequate balance between risk and success is determined solely by the physician. The second aspect is related to the well-being of the person being studied as an individual and comes down to the question of who should determine this balance. Physicians attribute this right exclusively to themselves: only they can obtain the necessary information without putting pressure on their patients. It is important to affirm the "principle of support" for medical research: the only one who can assess the human aspect of research is the subject himself. At first, the patient usually trusts his doctor, but later he must be able to decide how justified this trust was. The scientist-physician must realize that his future as a researcher depends not only on scientific but also on moral qualities. On the other hand, fear of the sad consequences of an experiment should not be an obstacle to scientific progress. Important characteristics of an experiment are its reliability and validity.
Therefore, medical experiments are an important tool for the development of medical knowledge about a person, about his health.


Information ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 386
Author(s):  
Şahan Yoruç Selçuk ◽  
Perin Ünal ◽  
Özlem Albayrak ◽  
Moez Jomâa

Digital twins, virtual representations of real-life physical objects or processes, are becoming widely used in many different industrial sectors. One of the main uses of digital twins is predictive maintenance, and these technologies are being adapted to various new applications and datatypes in many industrial processes. The aim of this study was to propose a methodology to generate synthetic vibration data using a digital twin model and a predictive maintenance workflow, consisting of preprocessing, feature engineering, and classification model training, to classify faulty and healthy vibration data for state estimation. To assess the success of the proposed workflow, the mentioned steps were applied to a publicly available vibration dataset and the synthetic data from the digital twin, using five different state-of-the-art classification algorithms. For several of the classification algorithms, the accuracy result for the classification of healthy and faulty data achieved on the public dataset reached approximately 86%, and on the synthetic data, approximately 98%. These results showed the great potential for the proposed methodology, and future work in the area.
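
The preprocessing and feature engineering steps of such a workflow can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the abstract does not say which features were used, so the window splitting and the three features below (RMS, peak, crest factor) are assumptions, as are the function names. The classification step is left out.

```python
import math
from typing import Dict, List, Sequence


def split_windows(signal: Sequence[float], size: int) -> List[Sequence[float]]:
    """Preprocessing: cut the signal into non-overlapping windows, dropping any remainder."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def window_features(window: Sequence[float]) -> Dict[str, float]:
    """Feature engineering: summary statistics of one vibration window (assumed features)."""
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    peak = max(abs(x) for x in window)
    crest = peak / rms if rms > 0 else 0.0
    return {"rms": rms, "peak": peak, "crest_factor": crest}


if __name__ == "__main__":
    signal = [3.0, -3.0, 3.0, -3.0, 0.5, -0.5, 0.5, -0.5, 1.0]
    for window in split_windows(signal, 4):
        print(window_features(window))
```

Each window becomes one feature row, so real and synthetic signals can be fed to the same classifier in the same format.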


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0260308
Author(s):  
Mauro Castelli ◽  
Luca Manzoni ◽  
Tatiane Espindola ◽  
Aleš Popovič ◽  
Andrea De Lorenzo

Wireless networks are among the fundamental technologies used to connect people. Considering the constant advancements in the field, telecommunication operators must guarantee a high-quality service to keep their customer portfolio. To ensure this high-quality service, it is common to establish partnerships with specialized technology companies that deliver software services in order to monitor the networks and identify faults and respective solutions. A common barrier faced by these specialized companies is the lack of data to develop and test their products. This paper investigates the use of generative adversarial networks (GANs), which are state-of-the-art generative models, for generating synthetic telecommunication data related to Wi-Fi signal quality. We developed, trained, and compared two of the most used GAN architectures: the Vanilla GAN and the Wasserstein GAN (WGAN). Both models presented satisfactory results and were able to generate synthetic data similar to the real ones. In particular, the distribution of the synthetic data overlaps the distribution of the real data for all of the considered features. Moreover, the considered generative models can reproduce the same associations observed for the synthetic features. We chose the WGAN as the final model, but both models are suitable for addressing the problem at hand.
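
One way to check, feature by feature, how closely synthetic data follows the real distribution is the one-dimensional Wasserstein-1 distance, the same distance a WGAN critic approximates during training. The sketch below is in Python and is an assumption about how such a check could look; the abstract does not state which comparison the authors used, and the function names are hypothetical. It assumes both samples have the same size, in which case the distance is the mean absolute difference between the sorted values.

```python
from typing import List, Sequence


def wasserstein_1d(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size one-dimensional samples."""
    if len(real) != len(synthetic) or len(real) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(real), sorted(synthetic))
    return sum(abs(r - s) for r, s in pairs) / len(real)


def per_feature_distance(real_rows: Sequence[Sequence[float]],
                         synthetic_rows: Sequence[Sequence[float]]) -> List[float]:
    """Apply the distance column by column; rows are records, columns are features."""
    real_cols = list(zip(*real_rows))
    synthetic_cols = list(zip(*synthetic_rows))
    return [wasserstein_1d(r, s) for r, s in zip(real_cols, synthetic_cols)]


if __name__ == "__main__":
    real = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
    synthetic = [[3.0, 30.0], [1.0, 10.0], [2.0, 20.0]]
    print(per_feature_distance(real, synthetic))
```

A value near zero means the marginal distributions overlap closely. The check looks at one feature at a time, so it says nothing about associations between features, which need a separate comparison.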


Author(s):  
Thomas P. Trappenberg

This chapter presents an introduction to the important topic of building generative models. These are models that aim to capture the variety within a class, such as cars or trees. A generative model should be able to generate feature vectors for instances of the class it represents, and such a model should therefore be able to characterize the class with all its variations. The subject is discussed in both a Bayesian and a deep learning context, and in both a supervised and an unsupervised context. This area is related to important algorithms such as k-means clustering, expectation maximization (EM), naïve Bayes, generative adversarial networks (GANs), and variational autoencoders (VAE).
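
What it means for a model to generate feature vectors for a class can be shown with a very small example. The sketch below is in Python and is not taken from the chapter: it assumes, in the spirit of naïve Bayes, that the features of a class are independent and Gaussian, and the function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Params = List[Tuple[float, float]]  # one (mean, standard deviation) pair per feature


def fit_gaussian_features(rows: Sequence[Sequence[float]]) -> Params:
    """Estimate a mean and standard deviation for each feature of one class."""
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def sample_feature_vector(params: Params, rng: random.Random) -> List[float]:
    """Generate one new feature vector by sampling each feature independently."""
    return [rng.gauss(mu, sigma) for mu, sigma in params]


if __name__ == "__main__":
    # Toy feature vectors (length in m, weight in kg) for a hypothetical class "car".
    cars = [[4.2, 1300.0], [4.6, 1500.0], [3.9, 1100.0], [4.4, 1400.0]]
    params = fit_gaussian_features(cars)
    rng = random.Random(0)
    print(sample_feature_vector(params, rng))
```

Because each feature is sampled independently, this model ignores correlations such as longer cars being heavier. Capturing that kind of structure is what richer generative models such as GANs and VAEs are meant to do.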


Author(s):  
Judy Simon

Computer vision, also known as computational visual perception, is a branch of artificial intelligence that allows computers to interpret digital pictures and videos in a manner comparable to biological vision. It entails the development of techniques for simulating biological vision. The aim of computer vision is to extract more meaningful information from visual input than biological vision does. Computer vision is exploding due to the avalanche of data being produced today. Powerful generative models, such as Generative Adversarial Networks (GANs), are responsible for significant advances in the field of picture creation. This research focuses on textual content descriptors in the images used by GANs to generate synthetic data from the MNIST dataset, to either supplement or replace the original data while training classifiers. This can provide better performance than traditional image enlarging procedures due to the good handling of synthetic data. It shows that training classifiers on synthetic data is as effective as training them on pure data alone, and it also reveals that, for small training data sets, supplementing the dataset by first training GANs on the data may lead to a significant increase in classifier performance.


2019 ◽  
Vol 214 ◽  
pp. 06003 ◽  
Author(s):  
Kamil Deja ◽  
Tomasz Trzciński ◽
Łukasz Graczykowski

Simulating the detector response is a key component of every high-energy physics experiment. The methods currently used for this purpose provide high-fidelity results. However, this precision comes at the price of a high computational cost. In this work, we introduce our research aimed at fast generation of the possible responses of detector clusters to particle collisions. We present the results for the real-life example of the Time Projection Chamber in the ALICE experiment at CERN. The essential component of our solution is a generative model that can simulate synthetic data points that bear high similarity to the real data. Leveraging recent advancements in machine learning, we propose to use conditional Generative Adversarial Networks. In this work we present a method to simulate data samples that could be recorded in the detector, based on the initial information about particles. We propose and evaluate several models based on convolutional or recursive networks. The main advantage offered by the proposed method is a significant speed-up in execution time, reaching up to a factor of 10^2 with respect to the currently used simulation tool. Nevertheless, this speed-up comes at the price of a lower simulation quality. In this work we adapt available methods and show their quantitative and qualitative limitations.


2021 ◽  
Vol 13 (23) ◽  
pp. 13396
Author(s):  
Ghufran Ahmed ◽  
Rauf Ahmed Shams Malick ◽  
Adnan Akhunzada ◽  
Sumaiyah Zahid ◽  
Muhammad Rabeet Sagri ◽  
...  

The poultry industry is a major contributor to the food industry. Rising worldwide demand for poultry chickens raises concerns about their quality. Quality measures in the poultry industry contribute to the production and supply of eggs and meat. With the increasing demand for poultry meat, precautionary measures for the well-being of the chickens are a concern for industry stakeholders. Modern technological advancements help the poultry industry monitor and track the health of poultry chickens. These advancements include identifying the chickens' sickness and well-being using video surveillance, voice observations, and feces examinations, together with IoT-based wearable sensing devices such as accelerometers and gyro devices. These motion-sensing devices are placed on a chicken and transmit the chicken's movement data to the cloud for further analysis. Analyzing such data and providing more accurate predictions about chicken health is a challenging issue. In this paper, an IoT-based predictive service framework for the early detection of diseases in poultry chickens is proposed. The study contributes by extending the dataset with synthetic data generated using Generative Adversarial Networks (GANs). In the experiments, sick and healthy chickens on a poultry farm are classified by applying machine learning classification models to the synthetic data and the real dataset. Theoretical analysis and experimental results show that the proposed system achieves an accuracy of 97%. Moreover, the accuracy of different classification models is compared to identify the most accurate and best-performing classification technique. The study is mainly focused on proposing an Industrial IoT-based predictive service framework that can classify poultry chickens more accurately in real time.

