scholarly journals Ultra-Fast Homomorphic Encryption Models enable Secure Outsourcing of Genotype Imputation

2020 ◽  
Author(s):  
Miran Kim ◽  
Arif Harmanci ◽  
Jean-Philippe Bossuat ◽  
Sergiu Carpov ◽  
Jung Hee Cheon ◽  
...  

ABSTRACTGenotype imputation is a fundamental step in genomic data analysis such as GWAS, where missing variant genotypes are predicted using the existing genotypes of nearby ‘tag’ variants. Imputation greatly decreases the genotyping cost and provides high-quality estimates of common variant genotypes. As population panels increase, e.g., the TOPMED Project, genotype imputation is becoming more accurate, but it requires high computational power. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. To address this problem, we developed the first fully secure genotype imputation by utilizing ultra-fast homomorphic encryption (HE) techniques that can evaluate millions of imputation models in seconds. In HE-based methods, the genotype data is end-to-end encrypted, i.e., encrypted in transit, at rest, and, most importantly, in analysis, and can be decrypted only by the data owner. We compared secure imputation with three other state-of-the-art non-secure methods under different settings. We found that HE-based methods provide full genetic data security with comparable or slightly lower accuracy. In addition, HE-based methods have time and memory requirements that are comparable and even lower than the non-secure methods. We provide five different implementations and workflows that make use of three cutting-edge HE schemes (BFV, CKKS, TFHE) developed by the top contestants of the iDASH19 Genome Privacy Challenge. Our results provide strong evidence that HE-based methods can practically perform resource-intensive computations for high throughput genetic data analysis. In addition, the publicly available codebases provide a reference for the development of secure genomic data analysis methods.

2018 ◽  
Author(s):  
Can Kockan ◽  
Kaiyuan Zhu ◽  
Natnatee Dokmai ◽  
Nikolai Karpov ◽  
Oguzhan Kulekci ◽  
...  

Current practices in collaborative genomic data analysis (e.g. PCAWG) necessitate all involved parties to exchange individual patient data and perform all analysis locally, or use a trusted server for maintaining all data to perform analysis in a single site (e.g. the Cancer Genome Collaboratory). Since both approaches involve sharing genomic sequence data - which is typically not feasible due to privacy issues, collaborative data analysis remains to be a rarity in genomic medicine. In order to facilitate efficient and effective collaborative or remote genomic computation we introduce SkSES (Sketching algorithms for Secure Enclave based genomic data analysiS), a computational framework for performing data analysis and querying on multiple, individually encrypted genomes from several institutions in an untrusted cloud environment. Unlike other techniques for secure/privacy preserving genomic data analysis, which typically rely on sophisticated cryptographic techniques with prohibitively large computational overheads, SkSES utilizes the secure enclaves supported by current generation microprocessor architectures such as Intel's SGX. The key conceptual contribution of SkSES is its use of sketching data structures that can fit in the limited memory available in a secure enclave. While streaming/sketching algorithms have been developed for many applications in computer science, their feasibility in genomics has remained largely unexplored. On the other hand, even though privacy and security issues are becoming critical in genomic medicine, available cryptographic techniques based on, e.g. homomorphic encryption or garbled circuits, fail to address the performance demands of this rapidly growing field. The alternative offered by Intel's SGX, a combination of hardware and software solutions for secure data analysis, is severely limited by the relatively small size of a secure enclave, a private region of the memory protected from other processes. SkSES addresses this limitation through the use of sketching data structures to support efficient secure and privacy preserving SNP analysis across individually encrypted VCF files from multiple institutions. In particular SkSES provides the users the ability to query for the "k" most significant SNPs among any set of user specified SNPs and any value of "k" - even when the total number of SNPs to be maintained is far beyond the memory capacity of the secure enclave. Results: We tested SkSES on the complete iDASH-2017 competition data set comprised of 1000 case and 1000 control samples related to an unknown phenotype. SkSES was able to identify the top SNPs with respect to the chi-squared statistic, among any user specified subset of SNPs across this data set of 2000 individually encrypted complete human genomes quickly and accurately - demonstrating the feasibility of secure and privacy preserving computation for genomic medicine via Intel's SGX. Availability: https://github.com/ndokmai/sgx-genome-variants-search Contact: [email protected]


Patterns ◽  
2020 ◽  
Vol 1 (6) ◽  
pp. 100093
Author(s):  
Silu Huang ◽  
Charles Blatti ◽  
Saurabh Sinha ◽  
Aditya Parameswaran

Sign in / Sign up

Export Citation Format

Share Document