Genotype Imputation with Homomorphic Encryption

Abstract: Genotype imputation is the statistical inference of unknown genotypes using known population haplotype structures observed in large genomic datasets, such as HapMap and the 1000 Genomes Project. Genotype imputation can further our understanding of the relationships between genotypes and traits, and is extremely useful for analyses such as genome-wide association studies and expression quantitative trait loci inference. Increasing the number of genotyped genomes increases the statistical power for inferring genotype-phenotype relationships, but the amount of data required and the compute-intensive nature of the genotype imputation problem overwhelm servers. Hence, many institutions are moving towards outsourcing to cloud services to scale up research in a cost-effective manner. This raises privacy concerns, which we propose to address via homomorphic encryption. Homomorphic encryption is a type of encryption that allows data analysis on ciphertexts, and would thereby avoid the decryption of private genotypes in the cloud. Here we develop an efficient, privacy-preserving genotype imputation algorithm, p-Impute, using homomorphic encryption. Our results show that the performance of p-Impute is equivalent to that of state-of-the-art plaintext solutions, achieving up to 99% micro area under the curve score while requiring a scalable amount of memory and computational time.

Ultra-Fast Homomorphic Encryption Models enable Secure Outsourcing of Genotype Imputation

10.1101/2020.07.02.183459 ◽

2020 ◽

Author(s):

Miran Kim ◽

Arif Harmanci ◽

Jean-Philippe Bossuat ◽

Sergiu Carpov ◽

Jung Hee Cheon ◽

...

Keyword(s):

Data Analysis ◽

Homomorphic Encryption ◽

Genomic Data ◽

Genetic Data ◽

Genotype Imputation ◽

Privacy Concerns ◽

Data Owner ◽

Lower Accuracy ◽

Genomic Data Analysis ◽

In Transit

Abstract: Genotype imputation is a fundamental step in genomic data analysis such as GWAS, where missing variant genotypes are predicted using the existing genotypes of nearby ‘tag’ variants. Imputation greatly decreases the genotyping cost and provides high-quality estimates of common variant genotypes. As population panels grow, e.g., the TOPMED Project, genotype imputation is becoming more accurate, but it requires high computational power. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. To address this problem, we developed the first fully secure genotype imputation by utilizing ultra-fast homomorphic encryption (HE) techniques that can evaluate millions of imputation models in seconds. In HE-based methods, the genotype data is end-to-end encrypted, i.e., encrypted in transit, at rest, and, most importantly, in analysis, and can be decrypted only by the data owner. We compared secure imputation with three state-of-the-art non-secure methods under different settings. We found that HE-based methods provide full genetic data security with comparable or slightly lower accuracy. In addition, HE-based methods have time and memory requirements that are comparable to, and in some cases lower than, those of the non-secure methods. We provide five different implementations and workflows that make use of three cutting-edge HE schemes (BFV, CKKS, TFHE) developed by the top contestants of the iDASH19 Genome Privacy Challenge. Our results provide strong evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. In addition, the publicly available codebases provide a reference for the development of secure genomic data analysis methods.
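The idea of evaluating an imputation model on data that stays encrypted can be illustrated with a toy sketch. It is not the method of any paper listed here: it swaps in a textbook Paillier scheme (additively homomorphic only) in place of BFV, CKKS, or TFHE, it assumes the model is a linear score over tag-variant genotypes coded 0/1/2, and the integer weights, bias, and function names are hypothetical. The primes are far too small to be secure.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key pair. The default primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), with L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer m (reduced mod n) with fresh randomness."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Decrypt and map the result back to a signed integer."""
    n, _ = pub
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n * mu) % n
    return m - n if m > n // 2 else m


def encrypted_linear_score(pub, enc_genotypes, weights, bias):
    """Server side: compute Enc(bias + sum(w_i * g_i)) without the private key.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    plaintext power multiplies the plaintext by that constant.
    """
    n, _ = pub
    n2 = n * n
    acc = encrypt(pub, bias)
    for c, w in zip(enc_genotypes, weights):
        acc = (acc * pow(c, w % n, n2)) % n2
    return acc


if __name__ == "__main__":
    pub, priv = keygen()
    tag_genotypes = [0, 1, 2, 1]   # data owner's private tag-variant genotypes
    weights = [3, -2, 5, 1]        # hypothetical integer model weights
    enc = [encrypt(pub, g) for g in tag_genotypes]       # data owner encrypts
    enc_score = encrypted_linear_score(pub, enc, weights, bias=4)  # server
    print(decrypt(pub, priv, enc_score))                 # data owner decrypts
```

In this sketch the server only ever handles ciphertexts and the public key, which mirrors the "encrypted in analysis" property described above; real schemes such as those named in the abstract additionally support encrypted multiplication and batching.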

Privacy-preserving genotype imputation with fully homomorphic encryption

Cell Systems ◽

10.1016/j.cels.2021.10.003 ◽

2021 ◽

Author(s):

Gamze Gürsoy ◽

Eduardo Chielle ◽

Charlotte M. Brannon ◽

Michail Maniatakos ◽

Mark Gerstein

Keyword(s):

Homomorphic Encryption ◽

Privacy Preserving ◽

Genotype Imputation ◽

Fully Homomorphic Encryption

Quantum Error Correction Code Scheme used for Homomorphic Encryption like Quantum Computation

Journal of Information and Security ◽

10.33778/kcsa.2019.19.3.061 ◽

2019 ◽

Vol 19 (3) ◽

pp. 61-70

Author(s):

Il Kwon Sohn ◽

Jonghyun Lee ◽

Wonhyuk Lee ◽

Woojin Seok ◽

...

Keyword(s):

Error Correction ◽

Quantum Computation ◽

Homomorphic Encryption ◽

Quantum Error Correction ◽

Code Scheme ◽

Error Correction Code ◽

Quantum Error

Secure Outsourced Association Rule Mining using Homomorphic Encryption

International Journal of Engineering Research and Science ◽

10.25125/engineering-journal-ijoer-sep-2017-22 ◽

2017 ◽

Vol 3 (9) ◽

pp. 70-76

Author(s):

Sandeep Varma ◽

Liji P I

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Homomorphic Encryption ◽

Rule Mining

Advanced Homomorphic Encryption its Applications and Derivatives (AHEAD)

10.21236/ada590003 ◽

2013 ◽

Author(s):

Tal Rabin ◽

Nigel Smart ◽

Daniel Wichs ◽

Craig Gentry ◽

Zvika Brakerski ◽

...

Keyword(s):

Homomorphic Encryption

A Neural Network Application of Fully Homomorphic Encryption for Cloud Computing

SSRN Electronic Journal ◽

10.2139/ssrn.3565268 ◽

2020 ◽

Author(s):

Megha Kolhekar ◽

Ashish Pandey ◽

Ayushi Raina ◽

Rijin Thomas ◽

Vaibhav Tiwari ◽

...

Keyword(s):

Neural Network ◽

Cloud Computing ◽

Homomorphic Encryption ◽

Fully Homomorphic Encryption ◽

Network Application

Multi-level Parallelization of Genotype Imputation on Supercomputers

Current Bioinformatics ◽

10.2174/1574893615999200420071307 ◽

2020 ◽

Vol 15 ◽

Author(s):

Weiwen Zhang ◽

Long Wang ◽

Theint Theint Aye ◽

Juniarto Samsudin ◽

Yongqing Zhu

Keyword(s):

Association Study ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Genome Wide Association Study ◽

Job Scheduling ◽

Genotype Imputation ◽

Job Level ◽

Multi Level ◽

High Performance Requirement

Background: Genotype imputation as a service is developed to enable researchers to estimate genotypes on haplotyped data without performing whole genome sequencing. However, genotype imputation is computationally intensive, and it remains a challenge to satisfy the high performance requirement of genome-wide association studies (GWAS). Objective: In this paper, we propose a high performance computing solution for genotype imputation on supercomputers to improve its execution performance. Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, message passing interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. This multi-level design lets us exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation. Results: Experimental results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. Moreover, we conduct experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time of genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as imputation as a service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
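The chunk partition, parallel execution, and concatenation steps named in the Method can be sketched on a single machine. This is only an illustration under stated assumptions: a Python thread pool stands in for the MPI and OpenMP layers, the per-chunk "imputation" is a hypothetical placeholder that fills missing calls with the most common observed genotype in the chunk, and all function names are invented for this sketch.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

MISSING = None  # hypothetical marker for an untyped genotype


def partition(genotypes, chunk_size):
    """Split the genotype list into consecutive chunks."""
    return [genotypes[i:i + chunk_size]
            for i in range(0, len(genotypes), chunk_size)]


def impute_chunk(chunk):
    """Placeholder imputation: fill missing calls with the chunk's most
    common observed genotype (0 if nothing in the chunk was observed)."""
    observed = [g for g in chunk if g is not MISSING]
    fill = Counter(observed).most_common(1)[0][0] if observed else 0
    return [fill if g is MISSING else g for g in chunk]


def impute_parallel(genotypes, chunk_size=4, workers=2):
    """Partition, impute chunks concurrently, then concatenate in order."""
    chunks = partition(genotypes, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(impute_chunk, chunks))  # map keeps chunk order
    return [g for chunk in results for g in chunk]


if __name__ == "__main__":
    data = [0, MISSING, 0, 1, 2, 2, MISSING, 1, MISSING]
    print(impute_parallel(data))
```

Because each chunk is processed independently, the same structure maps naturally onto separate processes or machines; a real pipeline would replace `impute_chunk` with an actual imputation tool and handle chunk boundaries more carefully.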