Methods for enhancing the reproducibility of clinical epidemiology research in linked electronic health records: results and lessons learned from the CALIBER platform

ABSTRACT Objectives: Electronic health records (EHR) across primary, secondary, and tertiary care are increasingly being linked for research at a population level. The increasing volume, variety, velocity, and veracity of big biomedical data make research reproducibility challenging. Research reproducibility and replicability are essential for the external validity and generalizability of scientific findings, and the lack of standardized approaches and tools, together with the relative opacity of data manipulation methods, is detrimental to their integrity. The objective of this study was to explore, evaluate and propose methods, tools and approaches for addressing some of the challenges associated with reproducibility when using linked national electronic health records for research. Approach: We systematically searched literature and internet resources for well-established and appropriate methods, tools, and approaches used in related scientific disciplines. The identified techniques were systematically evaluated in terms of their capacity to facilitate reproducible research in routinely collected health data across the life course of a research project: from protocol creation and raw data curation to data transformation and statistical analysis through to dissemination of findings and impact. Most importantly, the identified techniques were tested and applied in a contemporary database of linked electronic health records. CALIBER is a research data platform of linked national electronic health records from primary care (Clinical Practice Research Datalink), secondary care (Hospital Episode Statistics), an acute coronary syndrome disease registry (Myocardial Ischaemia National Audit Project) and cause-specific mortality (Office for National Statistics) for roughly 2 million adults. Results: Firstly, we present the review of methods and approaches which we identified through our search. Secondly, we propose a set of recommendations for applying them within the context of research projects making use of linked routinely collected health data. Focal interests included: a) documentation of data (attributes, relationships, and interpretation), b) data processing (source code, instructions, and parameters), c) results (visualizations, figures), and d) any supplementary material. Thirdly, we present approaches around a) raw data curation using international metadata standards, b) study protocol encoding, c) provenance and sharing of data transformation and statistical analysis operations, d) public and private data retention, and e) computable EHR-driven phenotypes. Conclusion: The complexity and size of routinely collected health data are increasing through linkages across distributed data sources. The scientific community benefits from findings which can be replicated. This study presents a number of methods, tools and approaches across the project life course that researchers can use to ensure that their studies are reproducible and replicable by the wider scientific community.


Patterns of care for people presenting to Australian general practice with musculoskeletal complaints based on routinely collected data: protocol for an observational cohort study using the Population Level Analysis and Reporting (POLAR) database

BMJ Open, 2021, Vol 11 (9), pp. e055528, doi:10.1136/bmjopen-2021-055528

Author(s): Romi Haas, Ljoudmila Busija, Alexandra Gorelik, Denise A O'Connor, Christopher Pearce, ...

Keyword(s): General Practice, Cohort Study, Electronic Health Records, Population Level, Health Data, Patterns Of Care, Health Records, Routinely Collected Health Data, Electronic Health, Level Analysis

Introduction: General practice is integral to the Australian healthcare system. Outcome Health’s POpulation Level Analysis and Reporting (POLAR) database uses de-identified electronic health records to analyse general practice data in Australia. Previous studies using routinely collected health data for research have not consistently reported the codes and algorithms used to describe the population, exposures, interventions and outcomes in sufficient detail to allow replication. This paper reports a study protocol investigating patterns of care for people presenting with musculoskeletal conditions to general practice in Victoria, Australia. Its focus is on the systematic approach used to classify and select eligible records from the POLAR database to facilitate replication. This will be useful for other researchers using routinely collected health data for research. Methods and analysis: This is a retrospective cohort study. Patient-related data will be obtained through electronic health records from a subset of general practices across three primary health networks (PHN) in southeastern Victoria. Data for patients with a low back, neck, shoulder and/or knee condition who received at least one general practitioner (GP) face-to-face consultation between 1 January 2014 and 31 December 2018 will be included. Data quality checks will be conducted to exclude patients with poor data recording and/or non-continuous follow-up. Relational data files with eligible and valid records will be merged to select the study cohort and the GP care received (consultations, imaging requests, prescriptions and referrals) between diagnosis and 31 December 2018. The number and characteristics of patients and GPs, and the number, type and timing of imaging requests, prescriptions for pain relief and referrals to other health providers will be investigated. Ethics and dissemination: Ethics approval was obtained from the Cabrini and Monash University Human Research Ethics Committees (Reference Numbers 02-21-01-19 and 16975, respectively). Study findings will be reported to Outcome Health and participating PHNs, disseminated in academic journals and presented at conferences.
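The eligibility rule described in this protocol (an eligible body region plus at least one face-to-face GP consultation inside the study window) can be illustrated with a minimal Python sketch. The record layout and the field names `patient_id`, `region`, `date` and `face_to_face` are hypothetical and are not taken from the POLAR schema; the data quality checks and the merging of relational files mentioned in the protocol are not shown.

```python
from datetime import date

# Study window and eligible body regions as stated in the protocol abstract.
STUDY_START = date(2014, 1, 1)
STUDY_END = date(2018, 12, 31)
CONDITIONS = {"low back", "neck", "shoulder", "knee"}


def select_cohort(diagnoses, consultations):
    """Return sorted patient ids that have an eligible condition and at
    least one face-to-face GP consultation within the study window.

    Both arguments are lists of dicts with hypothetical field names.
    """
    with_condition = {
        d["patient_id"] for d in diagnoses if d["region"] in CONDITIONS
    }
    consulted = {
        c["patient_id"]
        for c in consultations
        if c["face_to_face"] and STUDY_START <= c["date"] <= STUDY_END
    }
    return sorted(with_condition & consulted)


# Toy records, invented purely for illustration.
diagnoses = [
    {"patient_id": "p1", "region": "knee"},
    {"patient_id": "p2", "region": "hip"},   # region not in scope
    {"patient_id": "p3", "region": "neck"},
]
consultations = [
    {"patient_id": "p1", "date": date(2015, 3, 2), "face_to_face": True},
    {"patient_id": "p2", "date": date(2016, 7, 9), "face_to_face": True},
    {"patient_id": "p3", "date": date(2019, 1, 5), "face_to_face": True},  # outside window
]

print(select_cohort(diagnoses, consultations))
```

Publishing a rule in this executable form, alongside the code lists it depends on, is one way to report selection criteria in enough detail for replication.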


Knowledge Driven Phenotyping

2019, doi:10.1101/19013748

Author(s): Honghan Wu, Minhong Wang, Qianyi Zeng, Wenjun Chen, Jeff Z. Pan, ...

Keyword(s): Electronic Health Records, Heterogeneous Data, Health Data, Data Sources, Health Records, Knowledge And Skills, Preliminary Results, Routinely Collected Health Data, Electronic Health

Abstract: Extracting patient phenotypes from routinely collected health data (such as Electronic Health Records) requires translating clinically-sound phenotype definitions into queries/computations executable on the underlying data sources by clinical researchers. This requires significant knowledge and skills to deal with heterogeneous and often imperfect data. Translations are time-consuming, error-prone and, most importantly, hard to share and reproduce across different settings. This paper proposes a knowledge driven framework that (1) decouples the specification of phenotype semantics from underlying data sources; (2) can automatically populate and conduct phenotype computations on heterogeneous data spaces. We report preliminary results of deploying this framework on five Scottish health datasets.
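The idea of decoupling phenotype semantics from the underlying data sources can be illustrated with a small Python sketch. This is not the framework proposed in the paper: the dictionary-based mapping, the source names `source_a` and `source_b`, the field names and the function `compute_phenotype` are all assumptions made for illustration, and the clinical codes are included only as examples.

```python
# A phenotype is specified once, in terms of shared concepts only.
PHENOTYPE = {"name": "type 2 diabetes", "concepts": {"T2DM"}}

# Hypothetical per-source mappings from local codes to shared concepts.
# Each source declares which field holds its codes and how they map.
SOURCE_MAPPINGS = {
    "source_a": {"code_field": "read_code", "map": {"C10F.": "T2DM"}},
    "source_b": {"code_field": "icd10", "map": {"E11": "T2DM", "E10": "T1DM"}},
}


def compute_phenotype(phenotype, source_name, records):
    """Return sorted patient ids in `records` whose local code maps to one
    of the phenotype's concepts under the named source's mapping."""
    mapping = SOURCE_MAPPINGS[source_name]
    field = mapping["code_field"]
    return sorted(
        {
            r["patient_id"]
            for r in records
            if mapping["map"].get(r[field]) in phenotype["concepts"]
        }
    )


# Toy records, invented purely for illustration.
records_a = [
    {"patient_id": "a1", "read_code": "C10F."},
    {"patient_id": "a2", "read_code": "H33.."},
]
records_b = [
    {"patient_id": "b1", "icd10": "E10"},
    {"patient_id": "b2", "icd10": "E11"},
]

print(compute_phenotype(PHENOTYPE, "source_a", records_a))
print(compute_phenotype(PHENOTYPE, "source_b", records_b))
```

In this arrangement the phenotype definition never mentions a source-specific code, so adding a new dataset means adding a mapping rather than rewriting the definition.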


The process of sourcing and preparing electronic health records data to implement a machine-learning algorithm for early identification of maternal cardiovascular risk (Preprint)

2021, doi:10.2196/preprints.34932

Author(s): Nawar Shara, Kelley M. Anderson, Noor Falah, Maryam F. Ahmad, Darya Tavazoei, ...

Keyword(s): Machine Learning, Cardiovascular Risk, Electronic Health Records, Patient Care, Early Identification, Health Data, Patient Specific, Patient Records, Health Records, Electronic Health

BACKGROUND: Healthcare data are fragmenting as patients seek care from diverse sources. Consequently, patient care is negatively impacted by disparate health records. Machine learning (ML) offers a disruptive force in its ability to inform and improve patient care and outcomes [6]. However, the differences that exist in each individual’s health records, combined with the lack of health-data standards and with systemic issues that render the data unreliable and fail to create a single view of each patient, create challenges for ML. While these problems exist throughout healthcare, they are especially prevalent within maternal health, and exacerbate the maternal morbidity and mortality (MMM) crisis in the United States. OBJECTIVE: Maternal patient records were extracted from the electronic health records (EHRs) of a large tertiary healthcare system and made into patient-specific, complete datasets through a systematic method so that an ML-based risk-assessment algorithm could effectively identify maternal cardiovascular risk prior to evidence of diagnosis or intervention within the patient’s record. METHODS: We outline the effort that was required to define the specifications of the computational systems, the dataset, and access to relevant systems, while ensuring that data security requirements, privacy laws, and policies were met. Data acquisition included the concatenation, anonymization, and normalization of health data across multiple EHRs in preparation for its use by a proprietary risk-stratification algorithm designed to establish patient-specific baselines and to identify cardiovascular risk based on deviations from those baselines, in order to inform early interventions. RESULTS: Patient records can be made actionable for the goal of effectively employing ML, specifically to identify cardiovascular risk in pregnant patients. CONCLUSIONS: Upon acquiring data, including the concatenation, anonymization, and normalization of said data across multiple EHRs, the use of an ML-based tool can provide early identification of cardiovascular risk in pregnant patients. CLINICALTRIAL: N/A


Exchanging personal health data with electronic health records: A standardized information model for patient generated health data and observations of daily living

International Journal of Medical Informatics, 2018, Vol 120, pp. 116-125, doi:10.1016/j.ijmedinf.2018.10.006, Cited By ~ 3

Author(s): Panagiotis Plastiras, Dympna O’Sullivan

Keyword(s): Electronic Health Records, Daily Living, Information Model, Health Data, Personal Health, Health Records, Electronic Health


Determinants of Privacy Concerns and Intention to Share Personal Health Data on Electronic Health Records--Model

PsycTESTS Dataset, 2021, doi:10.1037/t82057-000

Author(s): Emna Cherif, Nora Bezaz, Manel Mzoughi

Keyword(s): Electronic Health Records, Health Data, Personal Health, Health Records, Privacy Concerns, Electronic Health


Archetype-Based Electronic Health Records: A Literature Review and Evaluation of Their Applicability to Health Data Interoperability and Access

Health Information Management Journal, 2009, Vol 38 (2), pp. 7-17, doi:10.1177/183335830903800202, Cited By ~ 23

Author(s): Dennis Wollersheim, Anny Sari, Wenny Rahayu

Keyword(s): Literature Review, Electronic Health Records, Health Data, Data Interoperability, Health Records, Electronic Health


What is the Impact of Electronic Health Records on the Quality of Health Data?

Health Information Management Journal, 2014, Vol 43 (1), pp. 42-43, doi:10.1177/183335831404300106, Cited By ~ 6

Author(s): Joanne Callen

Keyword(s): Electronic Health Records, Health Data, Health Records, Electronic Health, The Impact


Facilitating the ethical use of health data for the benefit of society: electronic health records, consent and the duty of easy rescue

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences, 2016, Vol 374 (2083), pp. 20160130, doi:10.1098/rsta.2016.0130, Cited By ~ 29

Author(s): Sebastian Porsdam Mann, Julian Savulescu, Barbara J. Sahakian

Keyword(s): Informed Consent, Electronic Health Records, Data Science, Real Life, Health Data, Sensitive Information, Minimal Risk, Sensitive Data, Health Records, Electronic Health

Advances in data science allow for sophisticated analysis of increasingly large datasets. In the medical context, large volumes of data collected for healthcare purposes are contained in electronic health records (EHRs). The real-life character and sheer amount of data contained in them make EHRs an attractive resource for public health and biomedical research. However, medical records contain sensitive information that could be misused by third parties. Medical confidentiality and respect for patients' privacy and autonomy protect patient data, barring access to health records unless consent is given by the data subject. This creates a situation in which much of the beneficial records-based research is prevented from being used or is seriously undermined, because the refusal of consent by some patients introduces a systematic deviation, known as selection bias, from a representative sample of the general population, thus distorting research findings. Although research exemptions for the requirement of informed consent exist, they are rarely used in practice due to concerns over liability and a general culture of caution. In this paper, we argue that the problem of research access to sensitive data can be understood as a tension between the medical duties of confidentiality and beneficence. We attempt to show that the requirement of informed consent is not appropriate for all kinds of records-based research by distinguishing studies involving minimal risk from those that feature moderate or greater risks. We argue that the duty of easy rescue—the principle that persons should benefit others when this can be done at no or minimal risk to themselves—grounds the removal of consent requirements for minimally risky records-based research. Drawing on this discussion, we propose a risk-adapted framework for the facilitation of ethical uses of health data for the benefit of society. This article is part of the themed issue ‘The ethical impact of data science’.


Robust measurement of the real world effectiveness of Tofacitinib for the treatment of Ulcerative Colitis using electronic health records: a protocol and statistical analysis plan v1 (protocols.io.2bqgamw)

protocols.io, 2019, doi:10.17504/protocols.io.2bqgamw

Author(s): Vivek A, Atul J

Keyword(s): Ulcerative Colitis, Statistical Analysis, Electronic Health Records, Real World, Statistical Analysis Plan, Health Records, The Real, Electronic Health


Healthchain: A novel framework on privacy preservation of electronic health records using blockchain technology

PLoS ONE, 2020, Vol 15 (12), pp. e0243043, doi:10.1371/journal.pone.0243043

Author(s): Shekha Chenthara, Khandakar Ahmed, Hua Wang, Frank Whittaker, Zhenxiang Chen

Keyword(s): Electronic Health Records, Data Security, Data Privacy, Health Data, Cyber Attacks, Security Risk, Patient Privacy, Health Records, Blockchain Technology, Electronic Health

The privacy of Electronic Health Records (EHRs) faces a major hurdle when private health data are outsourced to the cloud, as there is a danger of leaking health information to unauthorized parties. In fact, EHRs are stored in centralized databases, which increases the security risk footprint and requires trust in a single authority that cannot effectively protect data from internal attacks. This research focuses on ensuring patient privacy and data security while sharing sensitive data across the same or different organisations, as well as healthcare providers, in a distributed environment. It develops a privacy-preserving framework, Healthchain, based on Blockchain technology, that maintains the security, privacy, scalability and integrity of e-health data. The Blockchain is built on Hyperledger Fabric, a permissioned distributed ledger solution, using Hyperledger Composer, and stores EHRs using the InterPlanetary File System (IPFS). Moreover, the data stored in IPFS is encrypted using a unique cryptographic public key encryption algorithm to create a robust blockchain solution for electronic health data. The objective of the research is to provide a foundation for developing security solutions against cyber-attacks by exploiting the inherent features of the blockchain, and thus contribute to the robustness of healthcare information sharing environments. The results show that, under the proposed model, healthcare records are not traceable by unauthorized parties, as the model stores only the encrypted hash of the records; this demonstrates its effectiveness in terms of data security, enhanced data privacy, improved data scalability, interoperability and data integrity while sharing and accessing medical records among stakeholders across the Healthchain network.
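The pattern described in this abstract, keeping encrypted records off-chain and placing only a hash on the ledger, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Healthchain implementation: `OffChainStore` is a hypothetical in-memory stand-in for a content-addressed store such as IPFS (which uses its own content identifiers rather than a bare SHA-256 hex digest), the ledger is a plain list rather than Hyperledger Fabric, and the ciphertext is treated as opaque bytes that were encrypted elsewhere.

```python
import hashlib


class OffChainStore:
    """Hypothetical in-memory stand-in for a content-addressed store:
    blobs are keyed by the SHA-256 digest of their bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob: bytes) -> str:
        digest = hashlib.sha256(blob).hexdigest()
        self._blobs[digest] = blob
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


def record_on_ledger(ledger, store, patient_ref, ciphertext):
    """Store the already-encrypted record off-chain and append only its
    hash (plus an opaque patient reference) to the ledger."""
    digest = store.put(ciphertext)
    ledger.append({"patient_ref": patient_ref, "record_hash": digest})
    return digest


def verify(ledger_entry, store):
    """Check that the off-chain blob still matches the hash on the ledger."""
    blob = store.get(ledger_entry["record_hash"])
    return hashlib.sha256(blob).hexdigest() == ledger_entry["record_hash"]


# Toy usage: the ledger entry contains no record content, only the hash.
store = OffChainStore()
ledger = []
digest = record_on_ledger(ledger, store, "patient-ref-1", b"ciphertext-bytes")
print(ledger[0], verify(ledger[0], store))
```

Because the ledger holds only the digest, reading it reveals nothing about the record's contents, while any change to the off-chain blob is detectable by recomputing the hash.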
