Citizen-Centered, Auditable, and Privacy-Preserving Population Genomics

2019 ◽  
Author(s):  
Dennis Grishin ◽  
Jean Louis Raisaro ◽  
Juan Ramón Troncoso-Pastoriza ◽  
Kamal Obbad ◽  
Kevin Quinn ◽  
...  

Abstract The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal-genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a novel approach that combines cryptographic privacy-preserving technologies, such as homomorphic encryption and secure multi-party computation, with the auditability of blockchains. This approach provides strong security guarantees against realistic threat models by empowering individual citizens to decide who can query and access their genomic data and by ensuring end-to-end data confidentiality. Our open-source implementation supports queries on the encrypted genomic data of hundreds of thousands of individuals, with minimal overhead. Our work opens a path towards multi-functional, privacy-preserving genomic-data analysis. One Sentence Summary A citizen-centered open-source response to the privacy concerns that hinder population genomics, based on modern cryptography.
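As a rough illustration of the secure multi-party computation idea mentioned in this abstract, the sketch below uses additive secret sharing to total several private counts without any single party seeing an individual value. It is not the authors' protocol: the modulus, the function names, and the example counts are all assumptions made for illustration.

```python
import secrets

# Assumed public modulus; it only needs to exceed the largest possible total.
MODULUS = 2**61 - 1

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine shares; any proper subset of them reveals nothing about the value."""
    return sum(shares) % MODULUS

def secure_sum(private_values, n_parties=3):
    """Total the private values; computing party j only ever sees column j."""
    all_shares = [share(v, n_parties) for v in private_values]
    partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]
    return reconstruct(partial_sums)

# Hypothetical per-site counts of individuals carrying some variant.
carrier_counts = [12, 7, 30]
total = secure_sum(carrier_counts)
```

Each computing party adds up only the shares it receives, so the individual counts stay hidden unless all parties collude.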

2019 ◽  
Author(s):  
Nour Almadhoun ◽  
Erman Ayday ◽  
Özgür Ulusoy

Abstract Motivation The rapid progress in genome sequencing has led to high availability of genomic data. However, due to growing privacy concerns about participants' sensitive information, access to the results and data of genomic studies is restricted to trusted individuals. On the other hand, paving the way to biomedical discoveries requires granting open access to genomic databases. Privacy-preserving mechanisms can be a solution for granting wider access to such data while protecting their owners. In particular, there has been growing interest in applying the concept of differential privacy (DP) while sharing summary statistics about genomic data. DP provides a mathematically rigorous approach, but it does not consider the dependence between tuples in a database, which may degrade the privacy guarantees it offers. Results In this work, focusing on genomic databases, we show this drawback of DP and we propose techniques to mitigate it. First, using a real-world genomic dataset, we demonstrate the feasibility of an inference attack on differentially private query results by utilizing the correlations between the tuples in the dataset. The results show that the adversary can infer sensitive genomic data about a user from the differentially private query results by exploiting correlations between genomes of family members. Second, we propose a mechanism for privacy-preserving sharing of statistics from genomic datasets that attains privacy guarantees while taking into consideration the dependence between tuples. By evaluating our mechanism on different genomic datasets, we empirically demonstrate that our proposed mechanism can achieve up to 50% better privacy than traditional DP-based solutions. Availability https://github.com/nourmadhoun/Differential-privacy-genomic-inference-attack. Supplementary information Supplementary data are available at Bioinformatics online.
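For readers unfamiliar with the baseline this abstract critiques, the following is a minimal sketch of the standard Laplace mechanism for a counting query. It is not the mechanism the authors propose. The function names and the genotype encoding are assumptions, and the sensitivity of 1 is exactly the independence assumption that correlated family members violate.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(genotypes, predicate, epsilon, rng=None):
    """Return an epsilon-DP noisy count of records satisfying predicate.

    Assumes records are independent, so adding or removing one person
    changes the count by at most 1 (sensitivity = 1).
    """
    rng = rng or random.Random()
    true_count = sum(1 for g in genotypes if predicate(g))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical encoding: number of minor alleles (0, 1, or 2) at one SNP.
cohort = [0, 1, 2, 1, 0, 0, 2, 1, 1, 0]
noisy = dp_count(cohort, lambda g: g > 0, epsilon=1.0, rng=random.Random(0))
```

When relatives appear in the same dataset, one person's record is informative about several tuples, so noise calibrated to a sensitivity of 1 can understate the leakage.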


Author(s):  
Neelu Khare ◽  
Kumaran U.

The tremendous growth of social networking systems enables the active participation of a wide variety of users. This has led to an increased probability of security and privacy concerns. To address the issue, the article defines a secure and privacy-preserving approach to protect user data across cloud-based online social networks. The proposed approach models social networks as a directed graph, such that a user can share sensitive information with another user only if there exists a directed edge from one to the other. Data is shared efficiently between connected users using attribute-based encryption (ABE) with different data access levels. The proposed ABE technique makes use of a trapdoor function to re-encrypt the data without the use of proxy re-encryption techniques. Experimental evaluation indicates that the proposed approach provides comparatively better results than the existing techniques.


2018 ◽  
Author(s):  
Can Kockan ◽  
Kaiyuan Zhu ◽  
Natnatee Dokmai ◽  
Nikolai Karpov ◽  
Oguzhan Kulekci ◽  
...  

Current practices in collaborative genomic data analysis (e.g. PCAWG) require all involved parties to exchange individual patient data and perform all analysis locally, or to use a trusted server that maintains all data and performs the analysis at a single site (e.g. the Cancer Genome Collaboratory). Since both approaches involve sharing genomic sequence data, which is typically not feasible due to privacy issues, collaborative data analysis remains a rarity in genomic medicine. In order to facilitate efficient and effective collaborative or remote genomic computation we introduce SkSES (Sketching algorithms for Secure Enclave based genomic data analysiS), a computational framework for performing data analysis and querying on multiple, individually encrypted genomes from several institutions in an untrusted cloud environment. Unlike other techniques for secure/privacy preserving genomic data analysis, which typically rely on sophisticated cryptographic techniques with prohibitively large computational overheads, SkSES utilizes the secure enclaves supported by current generation microprocessor architectures such as Intel's SGX. The key conceptual contribution of SkSES is its use of sketching data structures that can fit in the limited memory available in a secure enclave. While streaming/sketching algorithms have been developed for many applications in computer science, their feasibility in genomics has remained largely unexplored. On the other hand, even though privacy and security issues are becoming critical in genomic medicine, available cryptographic techniques based on, e.g., homomorphic encryption or garbled circuits fail to address the performance demands of this rapidly growing field. The alternative offered by Intel's SGX, a combination of hardware and software solutions for secure data analysis, is severely limited by the relatively small size of a secure enclave, a private region of the memory protected from other processes. 
SkSES addresses this limitation through the use of sketching data structures to support efficient, secure and privacy preserving SNP analysis across individually encrypted VCF files from multiple institutions. In particular, SkSES gives users the ability to query for the "k" most significant SNPs among any set of user specified SNPs and any value of "k", even when the total number of SNPs to be maintained is far beyond the memory capacity of the secure enclave. Results: We tested SkSES on the complete iDASH-2017 competition data set comprising 1000 case and 1000 control samples related to an unknown phenotype. SkSES was able to identify the top SNPs with respect to the chi-squared statistic, among any user specified subset of SNPs across this data set of 2000 individually encrypted complete human genomes, quickly and accurately, demonstrating the feasibility of secure and privacy preserving computation for genomic medicine via Intel's SGX. Availability: https://github.com/ndokmai/sgx-genome-variants-search
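The top-"k" query described above ranks SNPs by a chi-squared statistic over case and control allele counts. The sketch below shows that ranking with exact counts held in ordinary memory. It does not reproduce the sketching data structures or the SGX enclave that SkSES actually relies on. The function names, the count layout, and the SNP identifiers are assumptions.

```python
import heapq

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = case alt alleles, b = case ref alleles,
    c = control alt alleles, d = control ref alleles.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom

def top_k_snps(allele_counts, k):
    """Return the k SNP ids with the largest chi-squared statistic."""
    scored = ((chi_squared_2x2(*counts), snp) for snp, counts in allele_counts.items())
    return [snp for _, snp in heapq.nlargest(k, scored)]

# Hypothetical allele counts for three SNPs.
counts = {
    "rs1": (50, 50, 50, 50),
    "rs2": (90, 10, 10, 90),
    "rs3": (60, 40, 40, 60),
}
best = top_k_snps(counts, 2)
```

Exact counting like this needs memory proportional to the number of SNPs, which is the cost the sketching approach is meant to avoid inside a small enclave.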


2021 ◽  
Vol 13 (11) ◽  
pp. 2221
Author(s):  
Munirah Alkhelaiwi ◽  
Wadii Boulila ◽  
Jawad Ahmad ◽  
Anis Koubaa ◽  
Maha Driss

Satellite images have drawn increasing interest from a wide variety of users, including business and government, ever since their increased usage in important fields ranging from weather, forestry and agriculture to surface changes and biodiversity monitoring. Recent advances in the field have also introduced various deep learning (DL) architectures to satellite imagery as a means of extracting useful information. However, this new approach comes with its own issues, including the fact that many users rely on ready-made cloud services (both public and private) in order to take advantage of built-in DL algorithms and thus avoid the complexity of developing their own DL architectures. This presents new challenges in protecting data against unauthorized access, mining and usage of sensitive information extracted from that data. Therefore, new privacy concerns regarding sensitive data in satellite images have arisen. This research proposes an efficient approach that takes advantage of privacy-preserving deep learning (PPDL)-based techniques to address privacy concerns regarding data from satellite images when applying public DL models. In this paper, we propose a partially homomorphic encryption scheme (a Paillier scheme), which enables processing of confidential information without exposure of the underlying data. Our method achieves robust results when applied to a custom convolutional neural network (CNN) as well as to existing transfer learning methods. The proposed encryption scheme also allows for training CNN models on encrypted data directly, which requires lower computational overhead. Our experiments have been performed on a real-world dataset covering several regions across Saudi Arabia. The results demonstrate that our CNN-based models were able to retain data utility while maintaining data privacy. 
Security parameters such as correlation coefficient (−0.004), entropy (7.95), energy (0.01), contrast (10.57), number of pixel change rate (4.86), unified average change intensity (33.66), and more are in favor of our proposed encryption scheme. To the best of our knowledge, this research is also one of the first studies that applies PPDL-based techniques to satellite image data in any capacity.
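To make the "partially homomorphic" property of the Paillier scheme concrete, here is a toy sketch of textbook Paillier showing that ciphertexts can be added, and scaled by a plaintext constant, without decryption. It is not the authors' implementation and says nothing about how they combine encryption with CNN training. The tiny primes are an assumption chosen for readability and are completely insecure. The code requires Python 3.8+ for the modular inverse via `pow`.

```python
import math
import secrets

def keygen(p, q):
    """Textbook Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid for g = n + 1, where L(g^lam mod n^2) = lam mod n
    return (n, n + 1), (lam, mu)

def encrypt(pub, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

# Toy primes for illustration only; real keys use primes of 1024+ bits each.
pub, priv = keygen(293, 433)
n2 = pub[0] ** 2
c1, c2 = encrypt(pub, 200), encrypt(pub, 55)   # e.g. two pixel intensities
c_sum = (c1 * c2) % n2                          # encrypts 200 + 55
c_scaled = pow(c1, 3, n2)                       # encrypts 3 * 200
```

Multiplying two ciphertexts adds the underlying plaintexts, and raising a ciphertext to a plaintext power multiplies the plaintext by that constant. Paillier does not support multiplying two encrypted values together.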


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Rastislav Hekel ◽  
Jaroslav Budis ◽  
Marcel Kucharik ◽  
Jan Radvanszky ◽  
Zuzana Pös ◽  
...  

Abstract Background The current and future applications of genomic data may raise ethical and privacy concerns. Processing and storing this data introduces a risk of abuse by potential offenders, since the human genome contains sensitive personal information. For this reason, we have developed a privacy-preserving method, named Varlock, that provides secure storage of sequenced genomic data. We used a public set of population allele frequencies to mask the personal alleles detected in genomic reads. Each personal allele described by the public set is masked by a randomly selected population allele, drawn according to its frequency. Masked alleles are preserved in an encrypted confidential file that can be shared in whole or in part using public-key cryptography. Results Our method masked the personal variants and introduced new variants detected in a personal masked genome. Alternative alleles with lower population frequency were masked and introduced more often. We performed a joint PCA analysis of personal and masked VCFs, showing that the VCFs between the two groups cannot be trivially mapped. Moreover, the method is reversible and personal alleles in specific genomic regions can be unmasked on demand. Conclusion Our method masks personal alleles within genomic reads while preserving valuable non-sensitive properties of sequenced DNA fragments for further research. Personal alleles in the desired genomic regions may be restored and shared with patients, clinics, and researchers. We suggest that the method can provide an additional security layer for storing and sharing of the raw aligned reads.
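The masking step described in this abstract can be pictured with a heavily simplified sketch: each personal allele at a position covered by the public frequency table is replaced by an allele drawn according to population frequency, and the original is kept in a side record so the change can be reversed. This is not Varlock's code. It operates on a plain position-to-allele dictionary rather than on aligned reads, the side record is left unencrypted, and all names and values are assumptions.

```python
import random

def mask(personal, population_freqs, rng):
    """Replace each covered allele by a frequency-weighted random draw.

    Returns the masked genotype and a diff of original alleles that were
    changed. In a real system the diff would be stored encrypted.
    """
    masked, diff = {}, {}
    for pos, allele in personal.items():
        freqs = population_freqs.get(pos)
        if freqs is None:
            masked[pos] = allele  # position not in the public set: left unchanged
            continue
        alleles = list(freqs)
        drawn = rng.choices(alleles, weights=[freqs[a] for a in alleles], k=1)[0]
        masked[pos] = drawn
        if drawn != allele:
            diff[pos] = allele
    return masked, diff

def unmask(masked, diff, positions=None):
    """Restore original alleles everywhere, or only at the given positions."""
    restored = dict(masked)
    for pos, allele in diff.items():
        if positions is None or pos in positions:
            restored[pos] = allele
    return restored

# Hypothetical positions, alleles, and population frequencies.
personal = {101: "A", 205: "G", 333: "T"}
freqs = {101: {"A": 0.7, "C": 0.3}, 205: {"G": 0.1, "T": 0.9}}
masked, diff = mask(personal, freqs, random.Random(1))
restored = unmask(masked, diff)
```

Passing a set of positions to `unmask` mirrors the abstract's point that personal alleles can be restored in selected regions only.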


2020 ◽  
Author(s):  
Miran Kim ◽  
Arif Harmanci ◽  
Jean-Philippe Bossuat ◽  
Sergiu Carpov ◽  
Jung Hee Cheon ◽  
...  

Abstract Genotype imputation is a fundamental step in genomic data analysis such as GWAS, where missing variant genotypes are predicted using the existing genotypes of nearby ‘tag’ variants. Imputation greatly decreases the genotyping cost and provides high-quality estimates of common variant genotypes. As population panels grow (e.g., the TOPMED Project), genotype imputation is becoming more accurate, but it requires high computational power. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. To address this problem, we developed the first fully secure genotype imputation by utilizing ultra-fast homomorphic encryption (HE) techniques that can evaluate millions of imputation models in seconds. In HE-based methods, the genotype data is end-to-end encrypted, i.e., encrypted in transit, at rest, and, most importantly, in analysis, and can be decrypted only by the data owner. We compared secure imputation with three other state-of-the-art non-secure methods under different settings. We found that HE-based methods provide full genetic data security with comparable or slightly lower accuracy. In addition, HE-based methods have time and memory requirements that are comparable to or even lower than those of the non-secure methods. We provide five different implementations and workflows that make use of three cutting-edge HE schemes (BFV, CKKS, TFHE) developed by the top contestants of the iDASH19 Genome Privacy Challenge. Our results provide strong evidence that HE-based methods can practically perform resource-intensive computations for high throughput genetic data analysis. In addition, the publicly available codebases provide a reference for the development of secure genomic data analysis methods.


2020 ◽  
Author(s):  
Rastislav Hekel ◽  
Jaroslav Budiš ◽  
Marcel Kucharík ◽  
Jan Radvanszky ◽  
Tomáš Szemes

Abstract Introduction Current and future applications of genomic data may raise ethical and privacy concerns. Processing and storing genomic data introduces a risk of abuse by a potential adversary since the human genome contains information about sensitive personal traits. For this reason, we developed a privacy preserving method, called Varlock, for secure storage and dissemination of sequenced genomic data. Materials and methods Varlock uses a set of population allele frequencies to mask personal alleles detected in genomic reads. Each detected allele is replaced by a randomly selected population allele, drawn according to its frequency. Masked alleles are preserved in an encrypted confidential file that can be shared, in whole or in part, using public-key cryptography. Results Our method masked personal variants and introduced new variants called on an individual’s genome, while alternative alleles with lower population frequency were masked and introduced more often. We performed joint PCA analysis of personal and masked VCFs, showing that the VCFs between the two groups cannot be trivially mapped. Moreover, the method is reversible; therefore, personal alleles can be unmasked in specific genomic regions on demand. Conclusion Our method masks personal alleles within mapped reads while preserving valuable non-sensitive properties of sequenced DNA fragments for further research. Accordingly, masked reads can be stored publicly, since they are deprived of sensitive personal information. Personal alleles may be restored in arbitrary genomic regions for interested parties: patients, medical units, and researchers.


2020 ◽  
Vol 13 (S7) ◽  
Author(s):  
Tsung-Ting Kuo ◽  
Xiaoqian Jiang ◽  
Haixu Tang ◽  
XiaoFeng Wang ◽  
Tyler Bath ◽  
...  

2018 ◽  
Vol 1 ◽  
pp. 1-23 ◽  
Author(s):  
Dennis Grishin ◽  
Kamal Obbad ◽  
Preston Estep ◽  
Kevin Quinn ◽  
Sarah Wait Zaranek ◽  
...  
