Design and operation of a distributed health data network

Author(s):  
Jeffrey Brown

Introduction Several large health data networks, such as FDA Sentinel, PCORnet, and the Canadian Network of Observational Drug Effect Studies (CNODES), facilitate multi-site research using real-world electronic health data such as administrative claims data, electronic health record data, and registries. Experience in operating multiple health data networks will be described. Objectives and Approach Over the past 15 years, substantial progress has been made in developing the optimal network operational design, governance, and technical architecture to facilitate the creation and operation of large-scale distributed health data networks. The design, architecture, and operation of a sustainable health data network require balancing the needs of network stakeholders such as funders, data sources, investigators, and regulatory bodies while enabling rapid and efficient use of data to support evidence generation and decision making. Important topics include protection of patient privacy, security, data autonomy, distributed analytics, data quality, and protection of confidential information. Results The design and architecture of existing distributed health data networks provide guidance regarding the potential operational model for new networks and identify areas of research to improve network functionality and capabilities. Most health data networks adopt a common data model approach to facilitate multi-site querying and data quality assessment. This approach is coupled with distributed querying, in which data partners maintain physical and operational control of their data. This design maximizes protection of confidential and proprietary information and minimizes the need to share patient-level data. Privacy-preserving distributed regression approaches and methods that obviate the need to share person-level data while generating robust results help to ensure network participation.
Strong security and governance structures are also necessary for effective operation of a distributed network. Conclusion/Implications Distributed health data networks offer the opportunity to use real-world data for public health surveillance and comparative safety and effectiveness research across large populations. The operational design, technical and analytic architecture, and governance models of networks drive their acceptance and success.
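
The distributed querying pattern described in this abstract can be illustrated with a minimal sketch. It is not the implementation used by any of the named networks: the table layout, field names, diagnosis codes, and the functions `run_local_query` and `combine_results` are all hypothetical. The sketch only shows the idea that each data partner runs the same query against its own data in a shared layout and returns aggregate counts, so no patient-level rows leave the site.

```python
from collections import Counter

# Hypothetical local tables. Each data partner stores rows in the same
# (common data model) layout and keeps physical control of them.
SITE_A = [
    {"patient_id": "a1", "age": 67, "diagnosis": "I63"},
    {"patient_id": "a2", "age": 54, "diagnosis": "E11"},
    {"patient_id": "a3", "age": 71, "diagnosis": "I63"},
]
SITE_B = [
    {"patient_id": "b1", "age": 80, "diagnosis": "I63"},
    {"patient_id": "b2", "age": 45, "diagnosis": "E11"},
]


def run_local_query(rows, min_age):
    """Runs at the data partner and returns aggregate counts only."""
    counts = Counter()
    for row in rows:
        if row["age"] >= min_age:
            counts[row["diagnosis"]] += 1
    return dict(counts)


def combine_results(site_results):
    """Runs at the coordinating center, which sees only per-site aggregates."""
    total = Counter()
    for result in site_results:
        total.update(result)  # Counter.update adds counts together
    return dict(total)


# The same query is sent to every site; only the summaries are combined.
site_results = [run_local_query(site, min_age=50) for site in (SITE_A, SITE_B)]
network_counts = combine_results(site_results)
print(network_counts)
```

Because every site uses the same layout, the query can be written once and executed unchanged at each partner, which is the practical benefit of pairing a common data model with distributed querying.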

2019 ◽  
Vol 26 (12) ◽  
pp. 1664-1674 ◽  
Author(s):  
Sophia R Newcomer ◽  
Stan Xu ◽  
Martin Kulldorff ◽  
Matthew F Daley ◽  
Bruce Fireman ◽  
...  

Abstract Objective In health informatics, there have been concerns with reuse of electronic health data for research, including potential bias from incorrect or incomplete outcome ascertainment. In this tutorial, we provide a concise review of predictive value–based quantitative bias analysis (QBA), which comprises epidemiologic methods that use estimates of data quality accuracy to quantify the bias caused by outcome misclassification. Target Audience Health informaticians and investigators reusing large, electronic health data sources for research. Scope When electronic health data are reused for research, validation of outcome case definitions is recommended, and positive predictive values (PPVs) are the most commonly reported measure. Typically, case definitions with high PPVs are considered to be appropriate for use in research. However, in some studies, even small amounts of misclassification can cause bias. In this tutorial, we introduce methods for quantifying this bias that use predictive values as inputs. Using epidemiologic principles and examples, we first describe how multiple factors influence misclassification bias, including outcome misclassification levels, outcome prevalence, and whether outcome misclassification levels are the same or different by exposure. We then review 2 predictive value–based QBA methods and why outcome PPVs should be stratified by exposure for bias assessment. Using simulations, we apply and evaluate the methods in hypothetical electronic health record–based immunization schedule safety studies. By providing an overview of predictive value–based QBA, we hope to bridge the disciplines of health informatics and epidemiology to inform how the impact of data quality issues can be quantified in research using electronic health data sources.
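
A small worked example may help show why exposure-stratified predictive values matter. The sketch below is an illustration under stated assumptions, not the tutorial's own code: all counts and predictive values are hypothetical, and the function names are invented for this example. It uses the standard relationship that the expected number of true cases equals observed cases times PPV plus observed non-cases times (1 − NPV), applied separately in each exposure group.

```python
def corrected_cases(observed_cases, observed_noncases, ppv, npv):
    """Expected true cases given predictive values of the case definition."""
    return observed_cases * ppv + observed_noncases * (1.0 - npv)


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Ratio of outcome risk in the exposed to risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical cohort: 10,000 people per exposure group.
n_exposed, n_unexposed = 10_000, 10_000
obs_exposed, obs_unexposed = 200, 100

# Hypothetical exposure-stratified predictive values. NPV is set to 1.0
# purely to keep the arithmetic easy to follow.
ppv_exposed, ppv_unexposed = 0.9, 0.6
npv_exposed, npv_unexposed = 1.0, 1.0

observed_rr = risk_ratio(obs_exposed, n_exposed, obs_unexposed, n_unexposed)

true_exposed = corrected_cases(
    obs_exposed, n_exposed - obs_exposed, ppv_exposed, npv_exposed
)
true_unexposed = corrected_cases(
    obs_unexposed, n_unexposed - obs_unexposed, ppv_unexposed, npv_unexposed
)
corrected_rr = risk_ratio(true_exposed, n_exposed, true_unexposed, n_unexposed)

print(f"observed RR:  {observed_rr:.2f}")
print(f"corrected RR: {corrected_rr:.2f}")
```

In this hypothetical scenario the PPV differs by exposure, so the corrected risk ratio moves away from the observed one. If the same PPV applied to both groups and NPV were 1.0, the correction would cancel in the ratio, which is why a single overall PPV can hide differential misclassification.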


10.2196/16810 ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. e16810 ◽  
Author(s):  
Benjamin Scott Glicksberg ◽  
Shohei Burns ◽  
Rob Currie ◽  
Ann Griffin ◽  
Zhen Jane Wang ◽  
...  

Background Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments, but various barriers make this difficult. Not sharing these data to ensure patient privacy is at the cost of little to no learning from real-world data produced during cancer care. Furthermore, recent research has demonstrated a willingness of patients with cancer to share their treatment experiences to fuel research, despite potential risks to privacy. Objective The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for the dissemination of deidentified clinical and genomic data with a focus on late-stage cancer. Methods We created and piloted a blockchain-authenticated system to enable secure sharing of deidentified patient data derived from standard of care imaging, genomic testing, and electronic health records (EHRs), called the Cancer Gene Trust (CGT). We prospectively consented and collected data for a pilot cohort (N=18), which we uploaded to the CGT. EHR data were extracted from both a hospital cancer registry and a common data model (CDM) format to identify optimal data extraction and dissemination practices. Specifically, we scored and compared the level of completeness between two EHR data extraction formats against the gold standard source documentation for patients with available data (n=17). Results Although the total completeness scores were greater for the registry reports than those for the CDM, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured using the registry reports, which can be used to improve the continually adapting CDM. In terms of the overall pilot study, we found that CGT enables rapid integration of real-world data of patients with cancer in a more clinically useful time frame. 
We also developed an open-source Web application to allow users to seamlessly search, browse, explore, and download CGT data. Conclusions Our pilot demonstrates the willingness of patients with cancer to participate in data sharing and how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, empowering findings by third-party researchers and clinicians. We demonstrate the feasibility of CGT as a framework to share health data trapped in silos to further cancer research. Further studies to optimize data representation, stream, and integrity are required.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e13554-e13554
Author(s):  
Bethany Levick ◽  
Sue Cheeseman ◽  
Eun Ji Nam ◽  
Haewon Doh ◽  
Subin Lim ◽  
...  

e13554 Background: The value of real-world evidence derived from the care of patients managed outside the context of clinical trials is well recognised. However, the ability to link data from multiple centres, especially those from different countries, is complicated by complex legal and information governance differences. The Oncology Evidence Network is a collaboration of large hospital centres, with strong clinical informatics capabilities in six countries in Europe and Asia working with the support of an industrial partner to provide high quality, real world data reflecting routine clinical care. We have developed an efficient workflow based on a study-specific common data model (CDM) clinically validated at each site and analysed with a single analysis script, which embeds a set of data quality rules. Local implementation allows each centre to generate analytical outputs aligned across the different sites without the need for any patient level data to leave the participating site. This approach has been designed and tested in Epithelial Ovarian Cancer (EOC) patients. Methods: A CDM was agreed using expert advisors from each centre. Clinical alignment was achieved through iterative assessment of clinical vignettes, to ensure common definitions of clinical assessment, prognosis, and treatment algorithms in EOC patients. A data guide detailing variable level derivations and validation rules, general data coding principles, and conversions/codes from international coding systems was developed. The analysis scripts were implemented as a bespoke package (OpenOvary) in R. The package includes functions to validate the data against the CDM, and generate a standard output including tables, numerical summaries and Kaplan-Meier analysis of progression and overall survival. Results: 2,925 patient records from 6 centres across 6 countries were included in the study with 27 key data items curated by each centre. 
Treatment data is available detailing relevant surgical procedures and their outcomes, and regimens of SACT throughout patients’ care from diagnosis to death. Data completeness was generally high for key data items, with missing data ranging from 0-16% for FIGO stage at diagnosis and 0-14% for tumour morphology. The CDM and R script will be made publicly available for other centres to adopt and facilitate analysis of their local data. Conclusions: This collaboration has brought together a substantial body of data describing the care and outcomes for EOC patients. A CDM and flexible shared analysis approach enabled unified analysis and reporting whilst avoiding the transfer of patient level data and its pooling into a common database. The process of clinical and data alignment has generated a replicable model for rapid extension to other study centres to join the EOC study, or application to other disease areas.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Changgee Chang ◽  
Yi Deng ◽  
Xiaoqian Jiang ◽  
Qi Long

Abstract Distributed health data networks (DHDNs) leverage data from multiple sources or sites such as electronic health records (EHRs) from multiple healthcare systems and have drawn increasing interest in recent years, as they do not require sharing of subject-level data and hence lower the hurdles for collaboration between institutions considerably. However, DHDNs face a number of challenges in data analysis, particularly in the presence of missing data. The current state-of-the-art methods for handling incomplete data require pooling data into a central repository before analysis, which is not feasible in DHDNs. In this paper, we address the missing data problem in distributed environments such as DHDNs, which has not been investigated previously. We develop communication-efficient distributed multiple imputation methods for incomplete data that are horizontally partitioned. Since subject-level data are not shared or transferred outside of each site in the proposed methods, they enhance protection of patient privacy and have the potential to strengthen public trust in analysis of sensitive health data. We investigate, through extensive simulation studies, the performance of these methods. Our methods are applied to the analysis of an acute stroke dataset collected from multiple hospitals, mimicking a DHDN where health data are horizontally partitioned across hospitals and subject-level data cannot be shared or sent to a central data repository.
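
The sketch below is not the authors' imputation algorithm. It only illustrates, under simplifying assumptions, the general idea behind analysing horizontally partitioned data without moving subject-level records: each site computes a handful of summary statistics, and a coordinating site fits a model from those summaries alone. The example uses simple linear regression with one predictor, and the data, `local_summaries`, and `pooled_fit` are hypothetical.

```python
def local_summaries(xs, ys):
    """Computed inside a site; only these five numbers leave the site."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def pooled_fit(summaries):
    """Least-squares intercept and slope from summed site-level statistics."""
    n = sum(s["n"] for s in summaries)
    sx = sum(s["sx"] for s in summaries)
    sy = sum(s["sy"] for s in summaries)
    sxx = sum(s["sxx"] for s in summaries)
    sxy = sum(s["sxy"] for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical complete-case data held at two hospitals (rows = subjects).
site_1 = local_summaries([1, 2, 3], [3, 5, 7])
site_2 = local_summaries([4, 5], [9, 11])

intercept, slope = pooled_fit([site_1, site_2])
print(intercept, slope)
```

Because the sums are additive across sites, the pooled estimates match what a fit on the stacked data would give, while each site sends a fixed number of values regardless of how many subjects it holds.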


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e19358-e19358
Author(s):  
Shohei Burns ◽  
Eric Andrew Collisson

e19358 Background: Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments but various barriers make this difficult. Not sharing these data to ensure patient privacy is at the cost of little to no learning from real-world data produced during cancer care. Furthermore, recent research has demonstrated a willingness of cancer patients to share their treatment experiences to fuel research, despite potential risks to privacy. The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for dissemination of de-identified clinical and genomic data with a focus on late stage cancer. Methods: We created and piloted a blockchain-authenticated system to enable securely sharing de-identified patient data derived from standard of care imaging, genomic testing, and electronic health records (EHR), called the Cancer Gene Trust (CGT). We prospectively consented and collected data for a pilot cohort (n = 18), which we uploaded to CGT. EHR data were extracted from both a hospital cancer registry and a common data model format to identify optimal data extraction and dissemination practices. Specifically, we scored and compared the level of completeness between two EHR data extraction formats against the gold-standard source documentation for patients with available data (n = 17). Results: While the total completeness scores were greater for the registry reports than the common data model, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured using the registry reports, which can be used to improve the continually adapting common data model. In terms of the overall pilot study, we found that CGT enables rapid integration of real-world cancer patient data in a more clinically useful timeframe. 
We also developed an open-source web application to allow users to seamlessly search, browse, explore, and download CGT data. Conclusions: Our pilot demonstrates the willingness of cancer patients to participate in data sharing and how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, empowering findings by third party researchers and clinicians. We demonstrate the feasibility of CGT as a framework to share health data trapped in silos to further cancer research. Further studies to optimize data representation, stream, and integrity are required.


2014 ◽  
Vol 33 (7) ◽  
pp. 1178-1186 ◽  
Author(s):  
Lesley H. Curtis ◽  
Jeffrey Brown ◽  
Richard Platt

2020 ◽  
Vol 27 (10) ◽  
pp. 1520-1528 ◽  
Author(s):  
Andrew P Reimer ◽  
Alex Milinovich

Abstract Objective Patients who undergo medical transfer represent 1 patient population that remains infrequently studied due to challenges in aggregating data across the multiple domains and sources that are necessary to capture the entire episode of patient care. To facilitate access to and secondary use of transport patient data, we developed the Transport Data Repository, which combines data from 3 separate domains and many sources within our health system. Methods The repository is a relational database anchored by the Unified Medical Language System unique concept identifiers to integrate, map, and standardize the data into a common data model. Primary data domains included sending and receiving hospital encounters, medical transport record, and custom hospital transport log data. A 4-step mapping process was developed: 1) automatic source code match, 2) exact text match, 3) fuzzy matching, and 4) manual matching. Results 431 090 total mappings were generated in the Transport Data Repository, consisting of 69 010 unique concepts with 77% of the data being mapped automatically. Transport Source Data yielded significantly lower mapping results with only 8% of data entities automatically mapped and a significant amount (43%) remaining unmapped. Discussion The multistep mapping process resulted in a majority of data being automatically mapped. Poor matching of transport medical record data is due to the third-party vendor data being generated and stored in a nonstandardized format. Conclusion The multistep mapping process developed and implemented is necessary to normalize electronic health data from multiple domains and sources into a common data model to support secondary use of data.
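
The 4-step mapping process can be sketched as a simple cascade. This is an illustration only, not the repository's actual code: the concept identifiers (`CUI_A` and so on), the vocabulary entries, the `map_entry` function, and the fuzzy cutoff of 0.85 are all assumptions made for the example. Fuzzy matching here uses `difflib.get_close_matches` from the Python standard library, which may differ from the technique the authors used.

```python
import difflib

# Hypothetical target vocabulary: placeholder concept id -> code and text.
CONCEPTS = {
    "CUI_A": {"code": "HR", "text": "heart rate"},
    "CUI_B": {"code": "SBP", "text": "systolic blood pressure"},
    "CUI_C": {"code": "RR", "text": "respiratory rate"},
}
CODE_INDEX = {c["code"]: cid for cid, c in CONCEPTS.items()}
TEXT_INDEX = {c["text"]: cid for cid, c in CONCEPTS.items()}


def map_entry(source_code, source_text, fuzzy_cutoff=0.85):
    """Return (concept id or None, name of the step that resolved it)."""
    # Step 1: automatic source code match.
    if source_code in CODE_INDEX:
        return CODE_INDEX[source_code], "code"

    # Step 2: exact text match after trimming and lowercasing.
    normalized = source_text.strip().lower()
    if normalized in TEXT_INDEX:
        return TEXT_INDEX[normalized], "exact_text"

    # Step 3: fuzzy text match above a similarity cutoff.
    close = difflib.get_close_matches(
        normalized, list(TEXT_INDEX), n=1, cutoff=fuzzy_cutoff
    )
    if close:
        return TEXT_INDEX[close[0]], "fuzzy"

    # Step 4: nothing matched, so queue the entry for manual review.
    return None, "manual"


print(map_entry("HR", "pulse"))
print(map_entry(None, "Heart Rate "))
print(map_entry(None, "systolic blod pressure"))
print(map_entry(None, "vendor free text 42"))
```

Ordering the steps from cheapest and most reliable to most labour-intensive means manual review is reserved for the residue that automated matching cannot resolve, such as free text stored in a nonstandardized vendor format.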

