Bringing Code to Data: Do Not Forget Governance (Preprint)

2020 ◽  
Author(s):  
Christine Suver ◽  
Adrian Thorogood ◽  
Megan Doerr ◽  
John Wilbanks ◽  
Bartha Knoppers

Developing or independently evaluating algorithms in biomedical research is difficult because of restrictions on access to clinical data. Access is restricted because of privacy concerns, the proprietary treatment of data by institutions (fueled in part by the cost of data hosting, curation, and distribution), concerns over misuse, and the complexities of applicable regulatory frameworks. The use of cloud technology and services can address many of the barriers to data sharing. For example, researchers can access data in high performance, secure, and auditable cloud computing environments without the need for copying or downloading. An alternative path to accessing data sets requiring additional protection is the model-to-data approach. In model-to-data, researchers submit algorithms to run on secure data sets that remain hidden. Model-to-data is designed to enhance security and local control while enabling communities of researchers to generate new knowledge from sequestered data. Model-to-data has not yet been widely implemented, but pilots have demonstrated its utility when technical or legal constraints preclude other methods of sharing. We argue that model-to-data can make a valuable addition to our data sharing arsenal, with 2 caveats. First, model-to-data should only be adopted where necessary to supplement rather than replace existing data-sharing approaches given that it requires significant resource commitments from data stewards and limits scientific freedom, reproducibility, and scalability. Second, although model-to-data reduces concerns over data privacy and loss of local control when sharing clinical data, it is not an ethical panacea. Data stewards will remain hesitant to adopt model-to-data approaches without guidance on how to do so responsibly. To address this gap, we explored how commitments to open science, reproducibility, security, respect for data subjects, and research ethics oversight must be re-evaluated in a model-to-data context.
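The model-to-data workflow described above can be made concrete with a small sketch. The harness below is purely illustrative (all names are hypothetical, and a real deployment would add sandboxing, audit logging, and steward review of outputs): the data steward runs a submitted model inside a controlled environment and releases only aggregate metrics, never the underlying records.

```python
# Minimal sketch of a model-to-data evaluation harness (hypothetical API).
# The data steward runs submitted models inside a controlled environment;
# researchers receive only aggregate performance metrics, never the records.

from typing import Callable, List, Tuple

# Hidden clinical data set: (features, label) pairs. Held by the steward only.
_SEQUESTERED_DATA: List[Tuple[List[float], int]] = [
    ([0.2, 1.1], 1),
    ([0.9, 0.3], 0),
    ([0.4, 0.8], 1),
    ([1.2, 0.1], 0),
]

def evaluate_submission(model: Callable[[List[float]], int]) -> dict:
    """Run a submitted model on the sequestered data; return only metrics."""
    correct = sum(1 for x, y in _SEQUESTERED_DATA if model(x) == y)
    # Only an aggregate accuracy leaves the enclave, not predictions or rows.
    return {"n": len(_SEQUESTERED_DATA),
            "accuracy": correct / len(_SEQUESTERED_DATA)}

# A researcher submits a model without ever seeing the data.
def submitted_model(features: List[float]) -> int:
    return 1 if features[1] > features[0] else 0

report = evaluate_submission(submitted_model)
print(report)  # {'n': 4, 'accuracy': 1.0}
```

The key design point the paper emphasizes is visible even in this toy: the researcher's code crosses the boundary, the data never do.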

10.2196/18087 ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. e18087


2019 ◽  
Vol 16 (5) ◽  
pp. 539-546 ◽  
Author(s):  
Frank Rockhold ◽  
Christina Bromley ◽  
Erin K Wagner ◽  
Marc Buyse

Open data sharing and access has the potential to promote transparency and reproducibility in research, contribute to education and training, and prompt innovative secondary research. Yet there are many reasons why researchers do not share their data. These include, among others, time and resource constraints, patient data privacy issues, lack of access to appropriate funding, insufficient recognition of the data originators’ contribution, and the concern that commercial or academic competitors may benefit from analyses based on shared data. Nevertheless, there is a positive interest within and across the research and patient communities in creating shared data resources. In this perspective, we highlight the spectrum of “openness” and “data access” that exists at present, examine the strengths and weaknesses of current data access platforms, present current examples of data sharing platforms, and propose guidelines to revise current data sharing practices going forward.


Cryptography ◽  
2019 ◽  
Vol 3 (1) ◽  
pp. 7 ◽  
Author(s):  
Karuna Pande Joshi ◽  
Agniva Banerjee

An essential requirement of any information management system is to protect data and resources against breach or improper modification, while at the same time ensuring data access to legitimate users. Systems handling personal data are mandated to track its flow to comply with data protection regulations. We have built a novel framework that integrates a semantically rich data privacy knowledge graph with Hyperledger Fabric blockchain technology to develop an automated access-control and audit mechanism that enforces users' data privacy policies while sharing their data with third parties. Our blockchain-based data-sharing solution addresses two of the most critical challenges: transaction verification and permissioned data obfuscation. Our solution ensures accountability for data sharing in the cloud by incorporating a secure and efficient system for end-to-end provenance. In this paper, we describe this framework along with the comprehensive, semantically rich knowledge graph that we have developed to capture rules embedded in data privacy policy documents. Our framework can be used by organizations to automate compliance for their cloud data sets.
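To illustrate the access-control-plus-audit idea, the sketch below checks a data request against policy rules and appends every decision to a hash-chained log. This is not the authors' implementation: the rule schema and field names are hypothetical, and a simple hash-chained list stands in for the knowledge graph and the Hyperledger Fabric ledger.

```python
# Illustrative sketch: policy-rule access checks with a tamper-evident,
# hash-chained audit trail (a stand-in for a blockchain ledger).
# Rule schema and field names are hypothetical.

import hashlib
import json

POLICY_RULES = [
    # Toy stand-in for policy rules extracted from privacy documents.
    {"category": "genomic", "purpose": "research", "allow": True},
    {"category": "genomic", "purpose": "marketing", "allow": False},
]

audit_log = []  # each entry is hash-chained to the previous one

def request_access(requester: str, category: str, purpose: str) -> bool:
    """Decide a request against the policy rules and record the decision."""
    granted = any(
        r["allow"] and r["category"] == category and r["purpose"] == purpose
        for r in POLICY_RULES
    )
    prev = audit_log[-1]["hash"] if audit_log else "0" * 64
    entry = {"requester": requester, "category": category,
             "purpose": purpose, "granted": granted, "prev": prev}
    # Hash covers the previous hash, so tampering breaks the chain.
    entry["hash"] = hashlib.sha256(
        (prev + json.dumps(entry, sort_keys=True)).encode()).hexdigest()
    audit_log.append(entry)
    return granted

print(request_access("lab-a", "genomic", "research"))   # True
print(request_access("ad-co", "genomic", "marketing"))  # False
```

Chaining each entry's hash over its predecessor gives the end-to-end provenance property the abstract describes: any retroactive edit to a logged decision invalidates every later entry.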


2020 ◽  
Vol 43 (4) ◽  
pp. 1-23 ◽  
Author(s):  
Jessica Mozersky ◽  
Heidi Walsh ◽  
Meredith Parsons ◽  
Tristan McIntosh ◽  
Kari Baldwin ◽  
...  

Data sharing maximizes the value of data, which are time- and resource-intensive to collect. Major funding bodies in the United States (US), like the National Institutes of Health (NIH), require data sharing, and researchers frequently share de-identified quantitative data. In contrast, qualitative data are rarely shared in the US, but the increasing trend towards data sharing and open science suggests that qualitative data sharing (QDS) may be required in the future. Qualitative methods are often used to explore sensitive health topics, raising unique ethical challenges in protecting confidentiality while maintaining enough contextual detail for secondary analyses. Here, we report findings from semi-structured in-depth interviews with 30 data repository curators, 30 qualitative researchers, and 30 institutional review board (IRB) staff members to explore their experience and knowledge of QDS. Our findings indicate that all stakeholder groups lack preparedness for QDS. Researchers are the least knowledgeable and are often unfamiliar with the concept of sharing qualitative data in a repository. Curators are highly supportive of QDS, but not all have experience curating qualitative data sets, and they indicated they would like guidance and standards specific to QDS. IRB members lack familiarity with QDS, although they support it as long as proper legal and regulatory procedures are followed. IRB members and data curators are not prepared to advise researchers on legal and regulatory matters, potentially leaving researchers, who have the least knowledge, with no guidance. Ethical and productive QDS will require overcoming barriers, creating standards, and changing long-held practices among all stakeholder groups.


2021 ◽  
Author(s):  
Anita Jwa ◽  
Russell Poldrack

Sharing data is a scientific imperative that accelerates scientific discoveries, reinforces open science inquiry, and allows for efficient use of public investment and research resources. Considering these benefits, data sharing has been widely promoted in diverse fields and neuroscience has been no exception to this movement. For all its promise, however, the sharing of human neuroimaging data raises critical ethical and legal issues, such as data privacy. Recently, the heightened risks to data privacy posed by the exponential development in artificial intelligence and machine learning techniques have made data sharing more challenging; the regulatory landscape around data sharing has also been evolving rapidly. Here we present an in-depth ethical and regulatory analysis that will examine how neuroimaging data are currently shared against the backdrop of the relevant regulations and policies and how advanced software tools and algorithms might undermine subjects’ privacy in neuroimaging data sharing. This analysis will inform researchers on responsible practice of neuroimaging data sharing and shed light on a regulatory framework to provide adequate protection of neuroimaging data while maximizing the benefits of data sharing.


Author(s):  
N. Bessis ◽  
T. French ◽  
M. Burakova-Lorgnier ◽  
W. Huang

This chapter is about conceptualizing the applicability of grid-related technologies for supporting intelligence in decision-making. It aims to discuss how the Open Grid Services Architecture Data Access and Integration (OGSA-DAI) middleware can facilitate the discovery of, and controlled access to, vast data sets to assist intelligence in decision-making. Trust is also identified as one of the main challenges for intelligence in decision-making. On this basis, the implications and challenges of using grid technologies to serve this purpose are also discussed. To further explain the concepts and practices associated with intelligence in decision-making using grid technologies, a scenario-based minicase, “Synergy Financial Solutions Ltd”, is employed to provide the reader with a central and continuous point of reference.


Author(s):  
Longzhi Yang ◽  
Jie Li ◽  
Noe Elisa ◽  
Tom Prickett ◽  
Fei Chao

Big data refers to large, complex, structured or unstructured data sets. Big data technologies enable organisations to generate, collect, manage, analyse, and visualise big data sets, and provide insights to inform diagnosis, prediction, or other decision-making tasks. One of the critical concerns in handling big data is the adoption of appropriate big data governance frameworks to (1) curate big data in a required manner to support quality data access for effective machine learning and (2) ensure the framework regulates the storage and processing of the data from providers and users in a trustworthy way within the related regulatory frameworks (both legally and ethically). This paper proposes a framework of big data governance that guides organisations to make better data-informed business decisions within the related regulatory framework, with close attention paid to data security, privacy, and accessibility. In order to demonstrate this process, the work also presents an example implementation of the framework based on a case study of big data governance in cybersecurity. This framework has the potential to guide the management of big data in different organisations for information sharing and cooperative decision-making.


2021 ◽  
pp. 174077452110385
Author(s):  
Enrique Vazquez ◽  
Henri Gouraud ◽  
Florian Naudet ◽  
Cary P Gross ◽  
Harlan M Krumholz ◽  
...  

Background/Aims: Over the past decade, numerous data sharing platforms have been launched, providing access to de-identified individual patient-level data and supporting documentation. We evaluated the characteristics of prominent clinical data sharing platforms, including types of studies listed as available for request, data requests received, and rates of dissemination of research findings from data requests. Methods: We reviewed publicly available information listed on the websites of six prominent clinical data sharing platforms: Biological Specimen and Data Repository Information Coordinating Center, ClinicalStudyDataRequest.com, Project Data Sphere, Supporting Open Access to Researchers–Bristol Myers Squibb, Vivli, and the Yale Open Data Access Project. We recorded key platform characteristics, including listed studies and available supporting documentation, information on the number and status of data requests, and rates of dissemination of research findings from data requests (i.e. publications in peer-reviewed journals, preprints, conference abstracts, or results reported on the platform’s website). Results: The number of clinical studies listed as available for request varied among five data sharing platforms: Biological Specimen and Data Repository Information Coordinating Center (n = 219), ClinicalStudyDataRequest.com (n = 2897), Project Data Sphere (n = 154), Vivli (n = 5426), and the Yale Open Data Access Project (n = 395); Supporting Open Access to Researchers did not provide a list of Bristol Myers Squibb studies available for request. Individual patient-level data were nearly always reported as being available for request, as opposed to only Clinical Study Reports (Biological Specimen and Data Repository Information Coordinating Center = 211/219 (96.3%); ClinicalStudyDataRequest.com = 2884/2897 (99.6%); Project Data Sphere = 154/154 (100.0%); and the Yale Open Data Access Project = 355/395 (89.9%)); Vivli did not provide downloadable study metadata. Of 1201 data requests listed on the ClinicalStudyDataRequest.com, Supporting Open Access to Researchers–Bristol Myers Squibb, Vivli, and Yale Open Data Access Project platforms, 586 requests (48.8%) were approved (i.e. data access granted). The majority were for secondary analyses and/or developing/validating methods (ClinicalStudyDataRequest.com = 262/313 (83.7%); Supporting Open Access to Researchers–Bristol Myers Squibb = 22/30 (73.3%); Vivli = 63/84 (75.0%); the Yale Open Data Access Project = 111/159 (69.8%)); four were for re-analyses or corroborations of previous research findings (ClinicalStudyDataRequest.com = 3/313 (1.0%) and the Yale Open Data Access Project = 1/159 (0.6%)). Ninety-five (16.1%) approved data requests had results disseminated via peer-reviewed publications (ClinicalStudyDataRequest.com = 61/313 (19.5%); Supporting Open Access to Researchers–Bristol Myers Squibb = 3/30 (10.0%); Vivli = 4/84 (4.8%); the Yale Open Data Access Project = 27/159 (17.0%)). Forty-two (6.8%) additional requests reported results through preprints, conference abstracts, or on the platform’s website (ClinicalStudyDataRequest.com = 12/313 (3.8%); Supporting Open Access to Researchers–Bristol Myers Squibb = 3/30 (10.0%); Vivli = 2/84 (2.4%); Yale Open Data Access Project = 25/159 (15.7%)). Conclusion: Across six prominent clinical data sharing platforms, information on studies and request metrics varied in availability and format. Most data requests focused on secondary analyses, and approximately one-quarter of all approved requests publicly disseminated their results. To further promote the use of shared clinical data, platforms should increase transparency, consistently clarify the availability of the listed studies and supporting documentation, and ensure that research findings from data requests are disseminated.


Circulation ◽  
2017 ◽  
Vol 135 (suppl_1) ◽  
Author(s):  
Jennifer A Brody ◽  
Alanna Morrison ◽  
Joshua C Bis ◽  
Jeffrey O’Connell ◽  
Jennifer Huffman ◽  
...  

The growing volume and complexity of whole-genome sequence (WGS) and multi-omic data requires new analytic approaches beyond those developed for the GWAS era. In response to this challenge, we present an Analysis Commons, which brings together genotype and phenotype data from multiple studies along with a suite of powerful and validated analysis tools into a secure cloud-computing framework that is equitably accessible by associated investigators. This framework is designed to address the emerging challenges of multi-center WGS analyses—data sharing mechanisms, phenotype harmonization, -omics integration, annotation—and the need for flexible, secure, efficient, high-performance computing for numerous users. The Analysis Commons is built on the DNAnexus cloud platform, which provides large parallel compute resources and robust security protocols. To permit multi-center data sharing, we implemented two parallel data sharing approaches: (1) a multi-lateral consortium agreement that enables data sharing across multiple studies, and (2) coordinated dbGaP applications among groups of institutions. Investigators with detailed knowledge of the phenotypes and contributing studies harmonize data from multiple sources for maintenance in a central database. The Analysis Commons supports multiple association-analysis software packages, as well as tools for annotation and visualization. Importantly, approved investigators have full access to the combined data sets, facilitating the rapid development and deployment of new methods. We demonstrate the Analysis Commons model with an analysis of fibrinogen in 3999 participants from the Old Order Amish Study and the Framingham Heart Study with WGS from the Trans-Omics for Precision Medicine (TOPMed) Program. We performed and validated single-variant and SKAT analyses using GENESIS and MMAP pipelines, accounting for relatedness with linear mixed models. We confirmed a known association of a nonsynonymous variant in FGG (p=2.5e-9, MAF=0.34%, rs148685782). No other single variant or SKAT association was significant after correcting for the number of tests. Analyses were run in parallel across 1408 cores and took less than one hour of wall-clock time. The Analysis Commons offers the necessary infrastructure support for analysis of WGS and multi-omic data in a setting that empowers phenotype, analytic, and computational experts to transform raw data into knowledge of the determinants of cardiovascular health.
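The per-variant regression step in such pipelines can be illustrated with a deliberately simplified sketch: it uses simulated data, assumes NumPy is available, and omits the mixed-model random effect that GENESIS and MMAP use to account for relatedness.

```python
# Toy sketch of a single-variant association test (ordinary least squares).
# Real pipelines (GENESIS, MMAP) fit linear *mixed* models with a kinship-based
# random effect; this example keeps only the fixed-effect regression on
# genotype dosage, using simulated data.

import numpy as np

rng = np.random.default_rng(0)
n = 500
genotype = rng.integers(0, 3, size=n).astype(float)  # dosage coded 0/1/2
phenotype = 0.5 * genotype + rng.normal(size=n)      # simulated effect = 0.5

# OLS: phenotype ~ intercept + genotype
X = np.column_stack([np.ones(n), genotype])
beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
resid = phenotype - X @ beta
sigma2 = resid @ resid / (n - 2)                     # residual variance
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # SE of genotype effect
t_stat = beta[1] / se
print(f"beta={beta[1]:.3f}, se={se:.3f}, t={t_stat:.1f}")
```

A genome-wide run repeats this test per variant (hence the appeal of the 1408-core parallelism described above), then applies a multiple-testing correction across all variants.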


2018 ◽  
Vol 27 (01) ◽  
pp. 005-006 ◽  
Author(s):  
John Holmes ◽  
Lina Soualmia ◽  
Brigitte Séroussi

Objectives: To provide an introduction to the 2018 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: This editorial provides an overview and introduction to the 2018 IMIA Yearbook, whose special topic is “Between access and privacy: Challenges in sharing health data”. The special topic editors and section are discussed, and the new section of the 2018 Yearbook, Cancer Informatics, is introduced. Changes in the Yearbook editorial team are also described. Results: With the exponential burgeoning of health-related data, and the attendant demands for sharing and using these data, the special topic for 2018 is noteworthy for its timeliness. Data sharing brings responsibility for the preservation of data privacy, and for this, patient perspectives are of paramount importance in understanding how patients view their health data and how their privacy should be protected. Conclusion: With the increase in the availability of health-related data from many different sources and contexts, there is an urgent need for informaticians to become aware of their role in maintaining the balance between data sharing and privacy.

