Creating Artificial Human Genomes Using Generative Models
Abstract
Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation of this field is the reduced access to many genetic databases due to concerns about violations of individual privacy, even though these databases would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrated that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the high-dimensional distributions of real genomic datasets and create high-quality artificial genomes (AGs) with little to no privacy loss. To illustrate the promising outcomes of our method, we showed that (i) imputation quality for low-frequency alleles can be improved by augmenting reference panels with AGs, (ii) scores obtained from selection tests on AGs and real genomes are highly correlated and (iii) AGs can inherit genotype-phenotype associations. AGs have the potential to become valuable assets in genetic studies by providing high-quality anonymous substitutes for private databases.