Learning About Missing Data Mechanisms in Electronic Health Records-based Research

While electronic health records data provide unique opportunities for research, numerous methodological issues must be considered. Among these, selection bias due to incomplete or missing data has received far less attention than other issues. Unfortunately, standard missing data approaches (e.g., inverse-probability weighting and multiple imputation) generally fail to acknowledge the complex interplay of heterogeneous decisions made by patients, providers, and health systems that govern whether specific data elements in the electronic health records are observed. This, in turn, makes the missing-at-random assumption that underlies these standard approaches difficult to justify. In the clinical literature, the collection of decisions that gives rise to the observed data is referred to as the data provenance. Building on a recently proposed framework for modularizing the data provenance, we develop a general and scalable framework for estimation and inference with respect to regression models based on inverse-probability weighting that allows for a hierarchy of missingness mechanisms to better align with the complex nature of electronic health records data. We show that the proposed estimator is consistent and asymptotically Normal, derive the form of the asymptotic variance, and propose two consistent estimators. Simulations show that naïve application of standard methods may yield biased point estimates, that the proposed estimators have good small-sample properties, and that researchers may have to contend with a bias-variance trade-off as they consider how to handle missing data. The proposed methods are motivated by an ongoing electronic health records-based study of bariatric surgery.
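The idea of weighting by a hierarchy of missingness mechanisms can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes a toy setting with one binary covariate, two hypothetical stages (a patient has a visit, then a measurement is recorded at that visit), and a target that is simply the outcome mean (an intercept-only regression). All names, probabilities, and the simulated data are assumptions made for illustration.

```python
import random


def simulate(n, seed=0):
    """Simulate (x, y, r1, r2): covariate, outcome, and two missingness stages."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)  # true mean of y is 2.0
        # Stage 1 (hypothetical): the patient has a visit at all.
        r1 = rng.random() < (0.8 if x else 0.4)
        # Stage 2 (hypothetical): given a visit, the measurement is recorded.
        r2 = r1 and rng.random() < (0.9 if x else 0.5)
        rows.append((x, y, r1, r2))
    return rows


def stratum_rate(rows, x, at_risk, event):
    """Empirical P(event | covariate == x, at_risk) within one stratum."""
    denom = [r for r in rows if r[0] == x and at_risk(r)]
    return sum(1 for r in denom if event(r)) / len(denom)


def complete_case_mean(rows):
    """Naive estimate: average y over fully observed records only."""
    ys = [y for _, y, _, r2 in rows if r2]
    return sum(ys) / len(ys)


def ipw_mean(rows):
    """Weighted mean with weights 1 / (P(stage 1 | x) * P(stage 2 | stage 1, x))."""
    p1 = {x: stratum_rate(rows, x, lambda r: True, lambda r: r[2]) for x in (0, 1)}
    p2 = {x: stratum_rate(rows, x, lambda r: r[2], lambda r: r[3]) for x in (0, 1)}
    num = den = 0.0
    for x, y, _, r2 in rows:
        if r2:
            w = 1.0 / (p1[x] * p2[x])  # product over the hierarchy of stages
            num += w * y
            den += w
    return num / den


rows = simulate(20000, seed=1)
cc = complete_case_mean(rows)
ipw = ipw_mean(rows)
print(f"complete-case mean: {cc:.3f}")
print(f"two-stage IPW mean: {ipw:.3f}")
```

In this toy setup each stage is modelled separately and the weights multiply across stages, which is one simple way to read "a hierarchy of missingness mechanisms"; the variance estimators described in the abstract are not covered here.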


Handling the Missing Data Problem in Electronic Health Records for Cancer Prediction

Spring Simulation Conference (SpringSim 2020) ◽ 10.22360/springsim.2020.msm.006 ◽ 2020

Keyword(s): Missing Data ◽ Electronic Health Records ◽ Health Records ◽ Cancer Prediction ◽ Missing Data Problem ◽ Data Problem ◽ Electronic Health


Interpatient Similarity-based Imputation of Missing Data in Electronic Health Records

2019 IEEE International Conference on Healthcare Informatics (ICHI) ◽ 10.1109/ichi.2019.8904868 ◽ 2019

Author(s): Ali Jazayeri ◽ Ou Stella Liang ◽ Christopher C. Yang

Keyword(s): Missing Data ◽ Electronic Health Records ◽ Health Records ◽ Electronic Health


Challenges associated with missing data in electronic health records: A case study of a risk prediction model for diabetes using data from Slovenian primary care

Health Informatics Journal ◽ 10.1177/1460458217733288 ◽ 2017 ◽ Vol 25 (3) ◽ pp. 951-959 ◽ Cited By ~ 4

Author(s): Gregor Stiglic ◽ Primoz Kocbek ◽ Nino Fijacko ◽ Aziz Sheikh ◽ Majda Pajnkihar

Keyword(s): Missing Data ◽ Electronic Health Records ◽ Prediction Model ◽ Electronic Health Record ◽ Health Record ◽ Electronic Health Record Data ◽ Health Records ◽ Record Data ◽ Electronic Health

The increasing availability of data stored in electronic health records brings substantial opportunities for advancing patient care and population health. This is, however, fundamentally dependent on the completeness and quality of data in these electronic health records. We sought to use electronic health record data to populate a risk prediction model for identifying patients with undiagnosed type 2 diabetes mellitus. We found, however, substantial amounts of missing data (up to 90%) in some healthcare centres. Attempts at imputing these missing data, or at using a reduced dataset by removing incomplete records, resulted in a major deterioration in the performance of the prediction model. This case study illustrates the substantial wasted opportunities resulting from incomplete records by simulating missing and incomplete records in the predictive modelling process. Government and professional bodies need to prioritise efforts to address these data shortcomings in order to ensure that electronic health record data are maximally exploited for patient and population benefit.
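The kind of simulation described above can be sketched in a few lines. This is a hypothetical toy, not the study's data or model: it assumes a single synthetic predictor, values removed completely at random, mean imputation, and discrimination measured by a rank-based AUC. The function names and the missingness rates of 0%, 50%, and 90% are illustrative assumptions.

```python
import random


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney), using average ranks for tied scores."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def mean_impute(values, missing_rate, rng):
    """Delete values completely at random, then fill gaps with the observed mean."""
    masked = [None if rng.random() < missing_rate else v for v in values]
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


rng = random.Random(42)
n = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]              # synthetic risk predictor
labels = [1 if xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]

auc_full = auc(x, labels)
auc_half = auc(mean_impute(x, 0.5, rng), labels)
auc_high = auc(mean_impute(x, 0.9, rng), labels)

for name, value in [("0% missing", auc_full),
                    ("50% missing", auc_half),
                    ("90% missing", auc_high)]:
    print(f"{name}: AUC = {value:.3f}")
```

In this toy setting the imputed records all share one score, so discrimination drifts toward chance as the missing fraction grows; real electronic health record data would typically involve several predictors and missingness that is not completely at random.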
