Immortal time bias for life-long conditions in retrospective observational studies using electronic health records

Author(s):  
Freya Tyrer ◽  
Krishnan Bhaskaran ◽  
Mark J Rutherford

Background: Immortal time bias is common in observational studies but is typically described for pharmacoepidemiology studies where there is a delay between cohort entry and treatment initiation. Methods: This study used the Clinical Practice Research Datalink (CPRD) and linked national mortality data in England from 2000 to 2019 to investigate immortal time bias for a specific life-long condition, intellectual disability. Life expectancy (Chiang’s abridged life table approach) was compared for 33,867 exposed and 980,586 unexposed individuals aged 10+ years using five methods: (1) treating immortal time as observation time; (2) excluding time before date of first exposure diagnosis; (3) matching cohort entry to first exposure diagnosis; (4) excluding time before proxy date of entering first exposure diagnosis (by the physician); and (5) treating exposure as a time-dependent measure. Results: When not considered in the design or analysis (Method 1), immortal time bias led to disproportionately high life expectancy for the exposed population during earlier calendar periods (additional years expected to live: 2000–2004: 65.6 [95% CI: 63.6, 67.6]; 2005–2009: 59.9 [58.8, 60.9]; 2010–2014: 58.0 [57.1, 58.9]; 2015–2019: 58.2 [56.8, 59.7]). Date of entry of diagnosis (Method 4) was unreliable in this CPRD cohort. The remaining methods (Methods 2, 3 and 5) appeared to solve the main theoretical problem but residual bias may have remained. Conclusions: We conclude that immortal time bias is a significant issue for studies of life-long conditions that use electronic health record data and requires careful consideration of how clinical diagnoses are entered onto electronic health record systems.
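To make the contrast between Method 1 and Method 5 concrete, the following minimal Python sketch splits one person's follow-up into unexposed and exposed person-time. The function names, the day-level granularity, and the example dates are illustrative assumptions; they are not taken from the study, which does not publish its implementation in the abstract.

```python
from datetime import date


def split_person_time(entry, exit_, diagnosis=None):
    """Time-dependent exposure (in the spirit of Method 5).

    Returns (unexposed_days, exposed_days) for one person. Follow-up
    before the first recorded diagnosis counts as unexposed; follow-up
    from the diagnosis date onwards counts as exposed.
    """
    if exit_ < entry:
        raise ValueError("exit date precedes entry date")
    total = (exit_ - entry).days
    if diagnosis is None or diagnosis >= exit_:
        return total, 0  # no diagnosis recorded during follow-up
    if diagnosis <= entry:
        return 0, total  # diagnosis recorded before cohort entry
    unexposed = (diagnosis - entry).days
    return unexposed, total - unexposed


def naive_person_time(entry, exit_, diagnosis=None):
    """Immortal time treated as observation time (in the spirit of Method 1).

    All follow-up of an ever-diagnosed person is counted as exposed,
    including the pre-diagnosis period in which, by design, the person
    could not have died.
    """
    if exit_ < entry:
        raise ValueError("exit date precedes entry date")
    total = (exit_ - entry).days
    if diagnosis is None or diagnosis >= exit_:
        return total, 0
    return 0, total


# Hypothetical person: enters in 2000, diagnosis recorded in 2005, exits in 2010.
entry, dx, exit_ = date(2000, 1, 1), date(2005, 1, 1), date(2010, 1, 1)
print(split_person_time(entry, exit_, dx))
print(naive_person_time(entry, exit_, dx))
```

Under the naive classification the five pre-diagnosis years are added to the exposed denominator without any possibility of contributing deaths, which is the mechanism that inflates life expectancy in the exposed group.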

2013 ◽  
Vol 20 (e1) ◽  
pp. e118-e124 ◽  
Author(s):  
Jason Scott Mathias ◽  
Ankit Agrawal ◽  
Joe Feinglass ◽  
Andrew J Cooper ◽  
David William Baker ◽  
...  

2011 ◽  
Vol 4 (0) ◽  
Author(s):  
Michael Klompas ◽  
Chaim Kirby ◽  
Jason McVetta ◽  
Paul Oppedisano ◽  
John Brownstein ◽  
...  

Author(s):  
José Carlos Ferrão ◽  
Mónica Duarte Oliveira ◽  
Daniel Gartner ◽  
Filipe Janela ◽  
Henrique M. G. Martins

BMJ Open ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. e037405
Author(s):  
Daniel Dedman ◽  
Melissa Cabecinha ◽  
Rachael Williams ◽  
Stephen J W Evans ◽  
Krishnan Bhaskaran ◽  
...  

Objective: To identify observational studies which used data from more than one primary care electronic health record (EHR) database, and summarise key characteristics including: objective and rationale for using multiple data sources; methods used to manage, analyse and (where applicable) combine data; and approaches used to assess and report heterogeneity between data sources. Design: A systematic review of published studies. Data sources: Pubmed and Embase databases were searched using a list of named primary care EHR databases; supplementary hand searches were made of the reference lists of studies retained after initial screening. Study selection: Observational studies published between January 2000 and May 2018 were selected, which included at least two different primary care EHR databases. Results: 6054 studies were identified from database and hand searches, and 109 were included in the final review, the majority published between 2014 and 2018. Included studies used 38 different primary care EHR data sources. Forty-seven studies (44%) were descriptive or methodological. Of 62 analytical studies, 22 (36%) presented separate results from each database, with no attempt to combine them; 29 (48%) combined individual patient data in a one-stage meta-analysis and 21 (34%) combined estimates from each database using two-stage meta-analysis. Discussion and exploration of heterogeneity was inconsistent across studies. Conclusions: Comparing patterns and trends in different populations, or in different primary care EHR databases from the same populations, is important and a common objective for multi-database studies. When combining results from several databases using meta-analysis, provision of separate results from each database is helpful for interpretation. We found that these were often missing, particularly for studies using one-stage approaches, which also often lacked details of any statistical adjustment for heterogeneity and/or clustering. For two-stage meta-analysis, a clear rationale should be provided for choice of fixed effect and/or random effects or other models.
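As an illustration of why the choice between fixed effect and random effects models matters in a two-stage meta-analysis, the sketch below pools per-database estimates both ways. It assumes inverse-variance weighting for the fixed effect model and the DerSimonian-Laird estimator for the between-database variance; the review itself does not prescribe these estimators, and the function names and input values are hypothetical.

```python
import math


def fixed_effect(estimates, std_errors):
    """Inverse-variance fixed effect pooling. Returns (pooled, pooled_se)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def random_effects_dl(estimates, std_errors):
    """DerSimonian-Laird random effects pooling.

    Returns (pooled, pooled_se, tau2), where tau2 is the estimated
    between-database variance (truncated at zero).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled_fe, _ = fixed_effect(estimates, std_errors)
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, estimates))
    c = total - sum(w ** 2 for w in weights) / total
    k = len(estimates)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight each database by its within- plus between-database variance.
    star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    star_total = sum(star)
    pooled = sum(w * y for w, y in zip(star, estimates)) / star_total
    return pooled, math.sqrt(1.0 / star_total), tau2


# Hypothetical log hazard ratios and standard errors from two databases.
log_hr = [0.0, 1.0]
se = [0.1, 0.1]
print(fixed_effect(log_hr, se))
print(random_effects_dl(log_hr, se))
```

When the databases disagree, the random effects standard error is much wider than the fixed effect one, which is one reason the per-database results and the model rationale are worth reporting alongside the pooled estimate.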


Author(s):  
Jeffrey G Klann ◽  
Griffin M Weber ◽  
Hossein Estiri ◽  
Bertrand Moal ◽  
Paul Avillach ◽  
...  

Introduction: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 with federated analyses of electronic health record (EHR) data. Objective: We sought to develop and validate a computable phenotype for COVID-19 severity. Methods: Twelve 4CE sites participated. First we developed an EHR-based severity phenotype consisting of six code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach and compared selected predictors of severity to the 4CE phenotype at one site. Results: The full 4CE severity phenotype had pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability - up to 0.65 across sites. At one pilot site, the expert-derived phenotype had mean AUC 0.903 (95% CI: 0.886, 0.921), compared to AUC 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared to chart review. Discussion: We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions. Conclusion: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.


2020 ◽  
Vol 41 (S1) ◽  
pp. s39-s39
Author(s):  
Pontus Naucler ◽  
Suzanne D. van der Werff ◽  
John Valik ◽  
Logan Ward ◽  
Anders Ternhag ◽  
...  

Background: Healthcare-associated infection (HAI) surveillance is essential for most infection prevention programs and continuous epidemiological data can be used to inform healthcare personnel, allocate resources, and evaluate interventions to prevent HAIs. Many HAI surveillance systems today are based on time-consuming and resource-intensive manual reviews of patient records. The objective of HAI-proactive, a Swedish triple-helix innovation project, is to develop and implement a fully automated HAI surveillance system based on electronic health record data. Furthermore, the project aims to develop machine-learning–based screening algorithms for early prediction of HAI at the individual patient level. Methods: The project is performed with support from Sweden’s Innovation Agency in collaboration among academic, health, and industry partners. Development of rule-based and machine-learning algorithms is performed within a research database, which consists of all electronic health record data from patients admitted to the Karolinska University Hospital. Natural language processing is used for processing free-text medical notes. To validate algorithm performance, manual annotation was performed based on international HAI definitions from the European Centre for Disease Prevention and Control, Centers for Disease Control and Prevention, and Sepsis-3 criteria. Currently, the project is building a platform for real-time data access to implement the algorithms within Region Stockholm. Results: The project has developed a rule-based surveillance algorithm for sepsis that continuously monitors patients admitted to the hospital, with a sensitivity of 0.89 (95% CI, 0.85–0.93), a specificity of 0.99 (0.98–0.99), a positive predictive value of 0.88 (0.83–0.93), and a negative predictive value of 0.99 (0.98–0.99). The healthcare-associated urinary tract infection surveillance algorithm, which is based on free-text analysis and negations to define symptoms, had a sensitivity of 0.73 (0.66–0.80) and a positive predictive value of 0.68 (0.61–0.75). The sensitivity and positive predictive value of an algorithm based on significant bacterial growth in urine culture only were 0.99 (0.97–1.00) and 0.39 (0.34–0.44), respectively. The surveillance system detected differences in incidences between hospital wards and over time. Development of surveillance algorithms for pneumonia, catheter-related infections and Clostridioides difficile infections, as well as machine-learning–based models for early prediction, is ongoing. We intend to present results from all algorithms. Conclusions: With access to electronic health record data, we have shown that it is feasible to develop a fully automated HAI surveillance system based on algorithms using both structured data and free text for the main healthcare-associated infections. Funding: Sweden’s Innovation Agency and Stockholm County Council. Disclosures: None.
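The four validation measures reported above all derive from a 2×2 table of algorithm output against manual annotation. The sketch below shows the arithmetic; the function name and the counts are hypothetical and chosen only for illustration, since the abstract reports proportions rather than the underlying counts.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute validation measures from confusion-matrix counts.

    tp/fn: annotated cases flagged / missed by the algorithm.
    fp/tn: annotated non-cases flagged / correctly not flagged.
    A measure with a zero denominator is returned as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases detected
        "specificity": ratio(tn, tn + fp),  # share of non-cases left unflagged
        "ppv": ratio(tp, tp + fp),          # share of flags that are true cases
        "npv": ratio(tn, tn + fn),          # share of non-flags that are non-cases
    }


# Hypothetical counts, not taken from the study.
for name, value in diagnostic_metrics(tp=89, fp=12, fn=11, tn=988).items():
    print(f"{name}: {value:.3f}")
```

Because positive predictive value depends on how common the infection is among those screened, a highly sensitive rule (such as the urine-culture-only algorithm) can still produce many false alarms.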

