Deriving Weight from Big Data: A Comparison of Body Weight Measurement Cleaning Algorithms (Preprint)
BACKGROUND Patient body weight is a frequently used measure in biomedical studies, yet there are no standard methods for processing and cleaning weight data. Conflicting documentation on constructing body weight measurements presents challenges for research and program evaluation.
OBJECTIVE We sought to describe and compare methods for extracting and cleaning weight data from electronic health record (EHR) databases in order to develop guidelines for standardized approaches that promote reproducibility.
METHODS We conducted a systematic review of studies published from 2008 to 2018 that used Veterans Health Administration (VHA) EHR weight data and documented the algorithms for constructing patient weight. We applied these algorithms to a cohort of veterans with at least one Primary Care visit in 2016. The resulting weight measures were compared at the patient and site levels.
RESULTS We identified 496 studies and included 62 that used weight as an outcome variable; 48% included a replicable algorithm. Algorithms ranged from simple cut-offs for implausible weights to complex models using repeated measures within a patient over time. After applying the algorithms, we found differences in the number of weight values retained (86% to 99% of raw data) and decreased variance (SD = 68 to 54), but little difference in average weights across methods (216 to 220 lbs). The percentage of patients with at least 5% weight loss over one year ranged from 18% to 24%.
CONCLUSIONS Determining the best method to assess weight using EHR data can be computationally demanding. Our results suggest that for many studies, applying simple cut-offs that require fewer computing resources and are easier to understand may be sufficient. We present guidelines for situations where more complex approaches may be warranted.
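To make the "simple cut-off" idea concrete, the following is a minimal Python sketch, not any of the reviewed algorithms. The bounds `MIN_LB` and `MAX_LB`, the function names, and the sample data are all hypothetical, and the 5% weight-loss check assumes a comparison of the first and last cleaned measurements, which is only one of several possible definitions.

```python
from datetime import date

# Hypothetical plausibility bounds in pounds (illustrative, not from the study).
MIN_LB = 75.0
MAX_LB = 700.0


def apply_cutoffs(measurements, low=MIN_LB, high=MAX_LB):
    """Keep only (date, weight_lb) pairs whose weight lies within [low, high]."""
    return [(d, w) for d, w in measurements if low <= w <= high]


def lost_at_least(measurements, fraction=0.05):
    """Return True if the last weight is at least `fraction` below the first.

    Assumption: weight loss is measured from the earliest to the latest
    measurement in the supplied list.
    """
    if len(measurements) < 2:
        return False
    ordered = sorted(measurements)  # sorts by date, the first tuple element
    first, last = ordered[0][1], ordered[-1][1]
    return (first - last) / first >= fraction


# Made-up measurements for one patient over one year.
raw = [
    (date(2016, 1, 10), 220.0),
    (date(2016, 4, 2), 2150.0),   # implausible value, likely a data-entry error
    (date(2016, 8, 15), 212.0),
    (date(2016, 12, 20), 205.0),
]

clean = apply_cutoffs(raw)
print(len(clean), lost_at_least(clean))  # prints "3 True"
```

A filter of this kind needs only one pass over the data and no per-patient modeling, which is the sense in which cut-offs are cheaper and easier to explain than approaches that use a patient's measurements over time.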