iDASH secure genome analysis competition 2018: blockchain genomic data access logging, homomorphic encryption on GWAS, and DNA segment searching

Abstract: The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal-genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a novel approach that combines cryptographic privacy-preserving technologies, such as homomorphic encryption and secure multi-party computation, with the auditability of blockchains. This approach provides strong security guarantees against realistic threat models by empowering individual citizens to decide who can query and access their genomic data and by ensuring end-to-end data confidentiality. Our open-source implementation supports queries on the encrypted genomic data of hundreds of thousands of individuals, with minimal overhead. Our work opens a path towards multi-functional, privacy-preserving genomic-data analysis.

One Sentence Summary: A citizen-centered open-source response to the privacy concerns that hinder population genomics, based on modern cryptography.

Towards scalable genomic data access

Nature Computational Science ◽ 10.1038/s43588-021-00089-w ◽ 2021

Author(s): Mikel Hernaez

Keyword(s): Genomic Data ◽ Data Access

Efficient Private Conjunctive Query Protocol Over Encrypted Data

Cryptography ◽ 10.3390/cryptography5010002 ◽ 2021 ◽ Vol 5 (1) ◽ pp. 2

Author(s): Tushar Kanti Saha ◽ Takeshi Koshiba

Keyword(s): Homomorphic Encryption ◽ Data Access ◽ Conjunctive Query ◽ Conjunctive Queries ◽ Private Data ◽ Binary Format ◽ Main Technique ◽ Batch Technique ◽ Outsourced Database ◽ Packing Method

Conjunctive queries play a key role in retrieving data from a database. A query whose predicate contains many conditions connected by an “and/&/∧” operator is called a conjunctive query. Retrieving the outcome of a conjunctive query from thousands of records is a heavy computational task. Private data access to an outsourced database is required to keep the database secure from adversaries; thus, private conjunctive queries (PCQs) are indispensable. Cheon, Kim, and Kim (CKK) proposed a PCQ protocol using search-and-compute circuits, relying on somewhat homomorphic encryption (SwHE) for its security. As their protocol is far from practical, we propose a practical batch private conjunctive query (BPCQ) protocol that applies a batch technique for processing conjunctive queries over an outsourced database, in which both database and queries are encoded in binary format. As the main technique in our protocol, we develop a new data-packing method to pack many data items into a single polynomial with the batch technique. We further enhance the performance of the binary-encoded BPCQ protocol by replacing the binary encoding with N-ary encoding. Finally, we compare the performance of the binary-encoded and the N-ary-encoded BPCQ protocols.
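To make the notion of a conjunctive query over binary-encoded data concrete, the following is a minimal plaintext sketch in Python. It involves no encryption, batching, or packing and is not the BPCQ protocol itself; the function names, the 8-bit width, and the sample records are all assumptions chosen for illustration.

```python
def to_bits(value, width):
    """Encode a non-negative integer as a fixed-width bit list, MSB first."""
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the given width")
    return [(value >> i) & 1 for i in reversed(range(width))]


def conjunctive_query(records, conditions, width=8):
    """Return indices of records that satisfy every condition (logical AND).

    records    -- list of dicts mapping attribute name to integer value
    conditions -- dict mapping attribute name to the required integer value
    """
    # Encode the query once; each condition becomes a bit pattern.
    encoded_query = {attr: to_bits(v, width) for attr, v in conditions.items()}
    hits = []
    for index, record in enumerate(records):
        # A record matches only if all queried attributes agree bit for bit.
        if all(to_bits(record[attr], width) == bits
               for attr, bits in encoded_query.items()):
            hits.append(index)
    return hits


# Hypothetical sample data, not taken from the paper.
records = [
    {"age": 34, "dept": 2, "grade": 5},
    {"age": 34, "dept": 3, "grade": 5},
    {"age": 41, "dept": 2, "grade": 5},
]
hits = conjunctive_query(records, {"age": 34, "grade": 5})
print(hits)
```

In a private setting the bitwise comparisons would be evaluated over ciphertexts rather than plaintext bits, which is where the computational cost discussed in the abstract arises.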

Genomic Data-Sharing Practices

The Journal of Law Medicine & Ethics ◽ 10.1177/1073110519840482 ◽ 2019 ◽ Vol 47 (1) ◽ pp. 31-40 ◽ Cited By ~ 4

Author(s): Angela G. Villanueva ◽ Robert Cook-Deegan ◽ Jill O. Robinson ◽ Amy L. McGuire ◽ Mary A. Majumder

Keyword(s): Data Sharing ◽ Medical Information ◽ Public Information ◽ Genomic Data ◽ Data Access ◽ Privacy And Security ◽ Participant Engagement ◽ Information Commons ◽ Derived Data

Making data broadly accessible is essential to creating a medical information commons (MIC). Transparency about data-sharing practices can cultivate trust among prospective and existing MIC participants. We present an analysis of 34 initiatives sharing DNA-derived data based on public information. We describe data-sharing practices captured, including practices related to consent, privacy and security, data access, oversight, and participant engagement. Our results reveal that data-sharing initiatives have some distance to go in achieving transparency.

Plotgardener: Cultivating precise multi-panel figures in R

10.1101/2021.09.08.459338 ◽ 2021

Author(s): Nicole E Kramer ◽ Eric S Davis ◽ Craig D Wenger ◽ Erika M Deoudes ◽ Sarah M Parker ◽ ...

Keyword(s): Programming Languages ◽ Genomic Data ◽ Data Access ◽ Manuscript Preparation ◽ Data Sets ◽ New Paradigm ◽ Link Type ◽ Bioconductor Project ◽ Invaluable Tool ◽ R Programming

The R programming language is one of the most widely used programming languages for transforming raw genomic data sets into meaningful biological conclusions through analysis and visualization, which has been largely facilitated by infrastructure and tools developed by the Bioconductor project. However, existing plotting packages rely on relative positioning and sizing of plots, which is often sufficient for exploratory analysis but is poorly suited for the creation of publication-quality multi-panel images inherent to scientific manuscript preparation. We present plotgardener, a coordinate-based genomic data visualization package that offers a new paradigm for multi-plot figure generation in R. Plotgardener allows precise, programmatic control over the placement, aesthetics, and arrangements of plots while maximizing user experience through fast and memory-efficient data access, support for a wide variety of data and file types, and tight integration with the Bioconductor environment. Plotgardener also allows precise placement and sizing of ggplot2 plots, making it an invaluable tool for R users and data scientists from virtually any discipline.

Availability:
Package: https://bioconductor.org/packages/plotgardener
Code: https://github.com/PhanstielLab/plotgardener
Documentation: https://phanstiellab.github.io/plotgardener/

Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave

10.1101/468355 ◽ 2018

Author(s): Can Kockan ◽ Kaiyuan Zhu ◽ Natnatee Dokmai ◽ Nikolai Karpov ◽ Oguzhan Kulekci ◽ ...

Keyword(s): Data Analysis ◽ Data Structures ◽ Homomorphic Encryption ◽ Genomic Medicine ◽ Genomic Data ◽ Privacy Preserving ◽ Snp Analysis ◽ Data Set ◽ Genomic Data Analysis ◽ Cryptographic Techniques

Current practices in collaborative genomic data analysis (e.g. PCAWG) require all involved parties either to exchange individual patient data and perform all analysis locally, or to use a trusted server that maintains all data so that analysis is performed at a single site (e.g. the Cancer Genome Collaboratory). Since both approaches involve sharing genomic sequence data, which is typically not feasible due to privacy issues, collaborative data analysis remains a rarity in genomic medicine. In order to facilitate efficient and effective collaborative or remote genomic computation we introduce SkSES (Sketching algorithms for Secure Enclave based genomic data analysiS), a computational framework for performing data analysis and querying on multiple, individually encrypted genomes from several institutions in an untrusted cloud environment. Unlike other techniques for secure/privacy preserving genomic data analysis, which typically rely on sophisticated cryptographic techniques with prohibitively large computational overheads, SkSES utilizes the secure enclaves supported by current generation microprocessor architectures such as Intel's SGX. The key conceptual contribution of SkSES is its use of sketching data structures that can fit in the limited memory available in a secure enclave. While streaming/sketching algorithms have been developed for many applications in computer science, their feasibility in genomics has remained largely unexplored. On the other hand, even though privacy and security issues are becoming critical in genomic medicine, available cryptographic techniques based on, e.g., homomorphic encryption or garbled circuits fail to address the performance demands of this rapidly growing field. The alternative offered by Intel's SGX, a combination of hardware and software solutions for secure data analysis, is severely limited by the relatively small size of a secure enclave, a private region of the memory protected from other processes. SkSES addresses this limitation through the use of sketching data structures to support efficient secure and privacy preserving SNP analysis across individually encrypted VCF files from multiple institutions. In particular, SkSES lets users query for the "k" most significant SNPs among any set of user specified SNPs and for any value of "k", even when the total number of SNPs to be maintained is far beyond the memory capacity of the secure enclave.

Results: We tested SkSES on the complete iDASH-2017 competition data set, comprising 1000 case and 1000 control samples related to an unknown phenotype. SkSES was able to identify the top SNPs with respect to the chi-squared statistic, among any user specified subset of SNPs across this data set of 2000 individually encrypted complete human genomes, quickly and accurately, demonstrating the feasibility of secure and privacy preserving computation for genomic medicine via Intel's SGX.

Availability: https://github.com/ndokmai/sgx-genome-variants-search
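The "k most significant SNPs by chi-squared statistic" query can be illustrated with a small plaintext Python sketch. This is not the SkSES implementation: it uses exact counts held in ordinary memory, with no enclave, no encryption, and no sketching data structure. The 2x2 allele-count layout, the function names, and the sample counts are assumptions made for illustration.

```python
import heapq


def chi_squared_2x2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-squared statistic for a 2x2 case/control allele table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # An empty row or column carries no association signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def top_k_snps(counts, k):
    """Return the k SNP identifiers with the largest chi-squared statistic.

    counts -- dict mapping SNP id to (case_alt, case_ref, ctrl_alt, ctrl_ref)
    """
    scored = ((chi_squared_2x2(*table), snp) for snp, table in counts.items())
    return [snp for _, snp in heapq.nlargest(k, scored)]


# Hypothetical allele counts, not taken from the iDASH data set.
counts = {
    "snp_a": (30, 70, 10, 90),
    "snp_b": (20, 80, 20, 80),
    "snp_c": (25, 75, 15, 85),
}
top_two = top_k_snps(counts, 2)
print(top_two)
```

An exact table like `counts` grows with the number of SNPs, which is the memory pressure that motivates replacing it with a compact sketch inside an enclave.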

From the principles of genomic data sharing to the practices of data access committees

EMBO Molecular Medicine ◽ 10.15252/emmm.201405002 ◽ 2015 ◽ Vol 7 (5) ◽ pp. 507-509 ◽ Cited By ~ 30

Author(s): Mahsa Shabani ◽ Bartha Maria Knoppers ◽ Pascal Borry

Keyword(s): Data Sharing ◽ Genomic Data ◽ Data Access

Public Views on Models for Accessing Genomic and Health Data for Research: Mixed Methods Study

Journal of Medical Internet Research ◽ 10.2196/14384 ◽ 2019 ◽ Vol 21 (8) ◽ pp. e14384 ◽ Cited By ~ 5

Author(s): Kerina H Jones ◽ Helen Daniels ◽ Emma Squires ◽ David V Ford

Keyword(s): Open Access ◽ Large Scale ◽ Genomic Data ◽ Data Access ◽ Health Data ◽ Social Acceptability ◽ Safe Haven ◽ Use Of Data ◽ Lack Of Information ◽ Public Views

Background: The literature abounds with increasing numbers of research studies using genomic data in combination with health data (eg, health records and phenotypic and lifestyle data), with great potential for large-scale research and precision medicine. However, concerns have been raised about social acceptability and risks posed for individuals and their kin. Although there has been public engagement on various aspects of this topic, there is a lack of information about public views on data access models.

Objective: This study aimed to address the lack of information on the social acceptability of access models for reusing genomic data collected for research in conjunction with health data. Models considered were open web-based access, released externally to researchers, and access within a data safe haven.

Methods: Views were ascertained using a series of 8 public workshops (N=116). The workshops included an explanation of benefits and risks in using genomic data with health data, a facilitated discussion, and an exit questionnaire. The resulting quantitative data were analyzed using descriptive and inferential statistics, and the qualitative data were analyzed for emerging themes.

Results: Respondents placed a high value on the reuse of genomic data but raised concerns including data misuse, information governance, and discrimination. They showed a preference for giving consent and use of data within a safe haven over external release or open access. Perceived risks with open access included data being used by unscrupulous parties, with external release included data security, and with safe havens included the need for robust safeguards.

Conclusions: This is the first known study exploring public views of access models for reusing anonymized genomic and health data in research. It indicated that people are generally amenable but prefer data safe havens because of perceived sensitivities. We recommend that public views be incorporated into guidance on models for the reuse of genomic and health data.
