Grouped Correlational Generative Adversarial Networks for Discrete Electronic Health Records

Author(s):  
Fan Yang ◽  
Zhongping Yu ◽  
Yunfan Liang ◽  
Xiaolu Gan ◽  
Kaibiao Lin ◽  
...  
2018 ◽  
Vol 26 (3) ◽  
pp. 228-241 ◽  
Author(s):  
Mrinal Kanti Baowaly ◽  
Chia-Ching Lin ◽  
Chao-Lin Liu ◽  
Kuan-Ta Chen

AbstractObjectiveThe aim of this study was to generate synthetic electronic health records (EHRs). The generated EHR data will be more realistic than those generated using the existing medical Generative Adversarial Network (medGAN) method.Materials and MethodsWe modified medGAN to obtain two synthetic data generation models—designated as medical Wasserstein GAN with gradient penalty (medWGAN) and medical boundary-seeking GAN (medBGAN)—and compared the results obtained using the three models. We used 2 databases: MIMIC-III and National Health Insurance Research Database (NHIRD), Taiwan. First, we trained the models and generated synthetic EHRs by using these three 3 models. We then analyzed and compared the models’ performance by using a few statistical methods (Kolmogorov–Smirnov test, dimension-wise probability for binary data, and dimension-wise average count for count data) and 2 machine learning tasks (association rule mining and prediction).ResultsWe conducted a comprehensive analysis and found our models were adequately efficient for generating synthetic EHR data. The proposed models outperformed medGAN in all cases, and among the 3 models, boundary-seeking GAN (medBGAN) performed the best.DiscussionTo generate realistic synthetic EHR data, the proposed models will be effective in the medical industry and related research from the viewpoint of providing better services. Moreover, they will eliminate barriers including limited access to EHR data and thus accelerate research on medical informatics.ConclusionThe proposed models can adequately learn the data distribution of real EHRs and efficiently generate realistic synthetic EHRs. The results show the superiority of our models over the existing model.


2020 ◽  
Vol 27 (9) ◽  
pp. 1411-1419 ◽  
Author(s):  
Dongha Lee ◽  
Hwanjo Yu ◽  
Xiaoqian Jiang ◽  
Deevakar Rogith ◽  
Meghana Gudala ◽  
...  

Abstract Objective Recent studies on electronic health records (EHRs) started to learn deep generative models and synthesize a huge amount of realistic records, in order to address significant privacy issues surrounding the EHR. However, most of them only focus on structured records about patients’ independent visits, rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs based on the generative autoencoder. Materials and Methods We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities, by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated the performances of DAAE in terms of predictive modeling, plausibility, and privacy preservation. Results Our generated sequences of EHRs showed the comparable performances to real data for a predictive modeling task, and achieved the best score in plausibility evaluation conducted by medical experts among all baseline models. In addition, differentially private optimization of our model enables to generate synthetic sequences without increasing the privacy leakage of patients’ data. Conclusions DAAE can effectively synthesize sequential EHRs by addressing its main challenges: the synthetic records should be realistic enough not to be distinguished from the real records, and they should cover all the training patients to reproduce the performance of specific downstream tasks.


2016 ◽  
Vol 34 (2) ◽  
pp. 163-165 ◽  
Author(s):  
William B. Ventres ◽  
Richard M. Frankel

Sign in / Sign up

Export Citation Format

Share Document