Genotype Imputation with Homomorphic Encryption

2021 ◽  
Author(s):  
Fook Mun Chan ◽  
Ahmad Qaisar Ahmad Al Badawi ◽  
Jun Jie Sim ◽  
Benjamin Hong Meng Tan ◽  
Foo Chuan Sheng ◽  
...  
Cell Systems ◽  
2021 ◽  
Author(s):  
Miran Kim ◽  
Arif Ozgun Harmanci ◽  
Jean-Philippe Bossuat ◽  
Sergiu Carpov ◽  
Jung Hee Cheon ◽  
...  

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Esha Sarkar ◽  
Eduardo Chielle ◽  
Gamze Gursoy ◽  
Oleg Mazonka ◽  
Mark Gerstein ◽  
...  

2020 ◽  
Author(s):  
Gamze Gürsoy ◽  
Eduardo Chielle ◽  
Charlotte M. Brannon ◽  
Michail Maniatakos ◽  
Mark Gerstein

Abstract: Genotype imputation is the statistical inference of unknown genotypes using known population haplotype structures observed in large genomic datasets, such as HapMap and the 1000 Genomes Project. Genotype imputation can help further our understanding of the relationships between genotypes and traits, and is extremely useful for analyses such as genome-wide association studies and expression quantitative trait loci inference. Increasing the number of genotyped genomes will increase the statistical power for inferring genotype-phenotype relationships, but the amount of data required and the compute-intensive nature of the genotype imputation problem overwhelm servers. Hence, many institutions are moving towards outsourcing to cloud services to scale up research in a cost-effective manner. This raises privacy concerns, which we propose to address via homomorphic encryption. Homomorphic encryption is a type of encryption that allows data analysis on ciphertexts, and would thereby avoid the decryption of private genotypes in the cloud. Here we develop an efficient, privacy-preserving genotype imputation algorithm, p-Impute, using homomorphic encryption. Our results show that the performance of p-Impute is equivalent to that of state-of-the-art plaintext solutions, achieving a micro-averaged area under the curve of up to 99% while requiring a scalable amount of memory and computational time.
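To make the idea of "data analysis on ciphertexts" concrete, here is a minimal sketch using a toy Paillier cryptosystem, which is additively homomorphic. This is an illustration only: Paillier is swapped in for simplicity and is not claimed to be the scheme used by p-Impute, the primes are far too small to be secure, and the genotype values, weights, and function names are all hypothetical.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier key generation; these tiny primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def weighted_sum(n, ciphertexts, weights):
    """Server side: compute Enc(sum(w_i * m_i)) without ever decrypting.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    non-negative integer power multiplies its plaintext by that integer.
    """
    n2 = n * n
    acc = 1  # 1 is a trivial encryption of 0
    for c, w in zip(ciphertexts, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

# Hypothetical data: genotypes coded 0/1/2 and integer model weights.
n, priv = keygen()
genotypes = [0, 1, 2, 1]
weights = [3, 1, 2, 5]
encrypted = [encrypt(n, g) for g in genotypes]   # done by the data owner
result = weighted_sum(n, encrypted, weights)     # done by the untrusted server
print(decrypt(n, priv, result))                  # done by the data owner
```

The server only ever handles `encrypted` and `result`; the private key stays with the data owner, which is the property the abstract relies on.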


2020 ◽  
Author(s):  
Miran Kim ◽  
Arif Harmanci ◽  
Jean-Philippe Bossuat ◽  
Sergiu Carpov ◽  
Jung Hee Cheon ◽  
...  

Abstract: Genotype imputation is a fundamental step in genomic data analyses such as GWAS, where missing variant genotypes are predicted using the existing genotypes of nearby ‘tag’ variants. Imputation greatly decreases the genotyping cost and provides high-quality estimates of common variant genotypes. As population panels grow, e.g., the TOPMED Project, genotype imputation is becoming more accurate, but it requires high computational power. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. To address this problem, we developed the first fully secure genotype imputation by utilizing ultra-fast homomorphic encryption (HE) techniques that can evaluate millions of imputation models in seconds. In HE-based methods, the genotype data is end-to-end encrypted, i.e., encrypted in transit, at rest, and, most importantly, in analysis, and can be decrypted only by the data owner. We compared secure imputation with three other state-of-the-art non-secure methods under different settings. We found that HE-based methods provide full genetic data security with comparable or slightly lower accuracy. In addition, HE-based methods have time and memory requirements that are comparable to, and in some cases lower than, those of the non-secure methods. We provide five different implementations and workflows that make use of three cutting-edge HE schemes (BFV, CKKS, TFHE) developed by the top contestants of the iDASH19 Genome Privacy Challenge. Our results provide strong evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. In addition, the publicly available codebases provide a reference for the development of secure genomic data analysis methods.
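As a rough plaintext illustration of predicting a missing genotype from a nearby tag variant, the sketch below fits a one-predictor least-squares model on a small reference panel and rounds the predicted dosage to a genotype. It assumes a simple linear model with a single tag variant; the abstract does not specify the models used, the panel values are made up, and nothing here is encrypted.

```python
def fit_simple_model(tag, target):
    """Least-squares fit of target ~ intercept + slope * tag over a reference panel."""
    n = len(tag)
    mean_x = sum(tag) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in tag)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tag, target))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return intercept, slope

def impute(intercept, slope, tag_genotype):
    """Predict a dosage from the tag genotype and clamp it to a 0/1/2 genotype."""
    dosage = intercept + slope * tag_genotype
    return min(2, max(0, round(dosage)))

# Hypothetical reference panel: genotypes (0/1/2) of eight individuals
# at one tag variant and at the target variant to be imputed.
ref_tag = [0, 0, 1, 1, 2, 2, 1, 0]
ref_target = [0, 0, 1, 1, 2, 2, 1, 1]

intercept, slope = fit_simple_model(ref_tag, ref_target)
for tag_genotype in (0, 1, 2):
    print(tag_genotype, impute(intercept, slope, tag_genotype))
```

A model of this shape only needs additions and multiplications by fixed weights at prediction time, which is the kind of arithmetic that HE schemes can evaluate on encrypted inputs.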


Cell Systems ◽  
2021 ◽  
Author(s):  
Gamze Gürsoy ◽  
Eduardo Chielle ◽  
Charlotte M. Brannon ◽  
Michail Maniatakos ◽  
Mark Gerstein

2013 ◽  
Author(s):  
Tal Rabin ◽  
Nigel Smart ◽  
Daniel Wichs ◽  
Craig Gentry ◽  
Zvika Brakerski ◽  
...  

2020 ◽  
Author(s):  
Megha Kolhekar ◽  
Ashish Pandey ◽  
Ayushi Raina ◽  
Rijin Thomas ◽  
Vaibhav Tiwari ◽  
...  

2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service is developed to enable researchers to estimate genotypes on haplotyped data without performing whole-genome sequencing. However, genotype imputation is computationally intensive, and thus it remains a challenge to satisfy the high performance requirements of genome-wide association studies (GWAS). Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to enhance its execution performance. Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, message passing interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. Owing to the design of multi-level parallelization, we can exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation. Results: Experimental results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. Moreover, we conduct the experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance of genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as imputation as a service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
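The chunk partition, parallel execution, and concatenation steps can be sketched in a few lines. This is a simplified stand-in, not the authors' implementation: a Python thread pool replaces the MPI and OpenMP layers, and `impute_chunk` is a hypothetical placeholder that fills missing values with the most common genotype in the chunk rather than running a real imputation model.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(items, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def impute_chunk(chunk):
    """Hypothetical placeholder: replace None with the chunk's most common genotype."""
    observed = [g for g in chunk if g is not None]
    fill = Counter(observed).most_common(1)[0][0] if observed else 0
    return [fill if g is None else g for g in chunk]

def parallel_impute(genotypes, chunk_size=4, workers=3):
    """Partition, process chunks concurrently, then concatenate in the original order."""
    chunks = partition(genotypes, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(impute_chunk, chunks))  # map preserves chunk order
    return [g for chunk in results for g in chunk]

# Hypothetical genotype vector with missing entries marked as None.
genotypes = [0, None, 0, 1, 1, None, 1, 2, None, 2]
print(parallel_impute(genotypes))
```

Because each chunk is processed independently, the same structure extends to distributing chunks across processes or machines, which is the role the abstract assigns to job scheduling and MPI.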

