Unlocking the Potential of Electronic Health Records for Health Research

Electronic health records (EHRs), originally designed to facilitate health care delivery, are becoming a valuable data source for health research. EHR systems have two components: the front end, where data are entered by healthcare workers including physicians and nurses, and the back-end electronic data warehouse, where data are stored in a relational database. EHR data elements can be categorized as structured, unstructured free-text, and imaging data. The Sunrise Clinical Manager (SCM) EHR is one example of an inpatient EHR system, which covers the city of Calgary (Alberta, Canada). This system, under the management of Alberta Health Services, is now being explored for research use. The purpose of the present paper is to describe the SCM EHR for research purposes and to show how that description generalizes to other EHRs. We further discuss advantages, challenges (e.g. potential bias and data quality issues), and analytical capacities and requirements associated with using EHRs.


Documentation of social determinants in electronic health records with and without standardized terminologies: A comparative study

Proceedings of Singapore Healthcare ◽

10.1177/2010105818785641 ◽

2018 ◽

Vol 28 (1) ◽

pp. 39-47 ◽

Cited By ~ 1

Author(s):

Karen A Monsen ◽

Joyce M Rudenick ◽

Nicole Kapinos ◽

Kathryn Warmbold ◽

Siobhan K McMahon ◽

...

Keyword(s):

Electronic Health Records ◽

Free Text ◽

SNOMED CT ◽

Health Records ◽

Behavioral Determinants ◽

Omaha System ◽

Standardized Terminology ◽

Electronic Health ◽

Data Elements ◽

Improve Health

Background: Electronic health records (EHRs) are a promising new source of population health data that may improve health outcomes. However, little is known about the extent to which social and behavioral determinants of health (SBDH) are currently documented in EHRs, including how SBDH are documented, and by whom. Standardized nursing terminologies have been developed to assess and document SBDH. Objective: We examined the documentation of SBDH in EHRs with and without standardized nursing terminologies. Methods: We carried out a review of the literature for SBDH phrases organized by topic, which were used for analyses. Key informant interviews were conducted regarding SBDH phrases. Results: In nine EHRs (six acute care, three community care) 107 SBDH phrases were documented using free text, structured text, and standardized terminologies in diverse screens and by multiple clinicians, admitting personnel, and other staff. SBDH phrases were documented using one of three standardized terminologies (N = average number of phrases per terminology per EHR): ICD-9/10 (N = 1); SNOMED CT (N = 1); Omaha System (N = 79). Most often, standardized terminology data were documented by nurses or other clinical staff versus receptionists or other non-clinical personnel. Documentation ‘unknown’ differed significantly between EHRs with and without the Omaha System (mean = 26.0 (standard deviation (SD) = 8.7) versus mean = 74.5 (SD = 16.5)) (p = .005). SBDH documentation in EHRs differed based on the presence of a nursing terminology. Conclusions: The Omaha System enabled a more comprehensive, holistic assessment and documentation of interoperable SBDH data. Further research is needed to determine SBDH data elements that are needed across settings, the uses of SBDH data in practice, and to examine patient perspectives related to SBDH assessments.


1665. Using Electronic Health Records to Describe TB in Community Health Settings: a Cohort Analysis in a Large Safety-Net Population

Open Forum Infectious Diseases ◽

10.1093/ofid/ofaa439.1843 ◽

2020 ◽

Vol 7 (Supplement_1) ◽

pp. S819-S820

Author(s):

Jonathan Todd ◽

Jon Puro ◽

Matthew Jones ◽

Jee Oakley ◽

Laura A Vonnahme ◽

...

Keyword(s):

United States ◽

Risk Factors ◽

Electronic Health Records ◽

Safety Net ◽

The United States ◽

Research Network ◽

Health Records ◽

Treatment Regimens ◽

Data Source ◽

Electronic Health

Background: Over 80% of tuberculosis (TB) cases in the United States are attributed to reactivation of latent TB infection (LTBI). Eliminating TB in the United States requires expanding identification and treatment of LTBI. Centralized electronic health records (EHRs) are an unexplored data source to identify persons with LTBI. We explored EHR data to evaluate TB and LTBI screening and diagnoses within OCHIN, Inc., a U.S. practice-based research network with a high proportion of Federally Qualified Health Centers. Methods: From the EHRs of patients who had an encounter at an OCHIN member clinic between January 1, 2012 and December 31, 2016, we extracted demographic variables, TB risk factors, TB screening tests, International Classification of Diseases (ICD) 9 and 10 codes, and treatment regimens. Based on test results, ICD codes, and treatment regimens, we developed a novel algorithm to classify patient records into LTBI categories: definite, probable or possible. We used multivariable logistic regression, with a referent group of all cohort patients not classified as having LTBI or TB, to identify associations between TB risk factors and LTBI. Results: Among 2,190,686 patients, 6.9% (n=151,195) had a TB screening test; among those, 8% tested positive. Non-U.S.–born or non-English–speaking persons comprised 24% of our cohort; 11% were tested for TB infection, and 14% had a positive test. Risk factors in the multivariable model significantly associated with being classified as having LTBI included preferring non-English language (adjusted odds ratio [aOR] 4.20, 95% confidence interval [CI] 4.09–4.32); non-Hispanic Asian (aOR 5.17, 95% CI 4.94–5.40), non-Hispanic black (aOR 3.02, 95% CI 2.91–3.13), or Native Hawaiian/other Pacific Islander (aOR 3.35, 95% CI 2.92–3.84) race; and HIV infection (aOR 3.09, 95% CI 2.84–3.35). Conclusion: This study demonstrates the utility of EHR data for understanding TB screening practices and as an important data source that can be used to enhance public health surveillance of LTBI prevalence. Increasing screening among high-risk populations remains an important step toward eliminating TB in the United States. These results underscore the importance of offering TB screening in non-U.S.–born populations. Disclosures: All Authors: No reported disclosures
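The adjusted odds ratios and confidence intervals quoted in this abstract are the usual way of reporting logistic-regression output. As a minimal sketch, assuming a standard Wald interval (the abstract does not state how its intervals were computed), a fitted coefficient and its standard error can be converted to that form as follows. The function name and the example coefficient are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval (z = 1.96 for 95%)."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, chosen only for illustration.
or_, lo, hi = odds_ratio_with_ci(beta=1.10, se=0.05)
print(f"aOR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A narrow interval such as those reported above corresponds to a small standard error, which is expected with a cohort of more than two million patients.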


A Framework for Systematic Assessment of Clinical Trial Population Representativeness Using Electronic Health Records Data

Applied Clinical Informatics ◽

10.1055/s-0041-1733846 ◽

2021 ◽

Vol 12 (04) ◽

pp. 816-825

Author(s):

Yingcheng Sun ◽

Alex Butler ◽

Ibrahim Diallo ◽

Jae Hyun Kim ◽

Casey Ta ◽

...

Keyword(s):

Clinical Trial ◽

Clinical Trials ◽

Electronic Health Records ◽

The United States ◽

Design Stage ◽

Common Data Model ◽

Free Text ◽

Eligibility Criteria ◽

Health Records ◽

Electronic Health

Background: Clinical trials are the gold standard for generating robust medical evidence, but clinical trial results often raise generalizability concerns, which can be attributed to the lack of population representativeness. Electronic health record (EHR) data are useful for estimating the population representativeness of clinical trial study populations. Objectives: This research aims to estimate the population representativeness of clinical trials systematically using EHR data during the early design stage. Methods: We present an end-to-end analytical framework for transforming free-text clinical trial eligibility criteria into executable database queries conformant with the Observational Medical Outcomes Partnership Common Data Model and for systematically quantifying the population representativeness for each clinical trial. Results: We calculated the population representativeness of 782 novel coronavirus disease 2019 (COVID-19) trials and 3,827 type 2 diabetes mellitus (T2DM) trials in the United States, respectively, using this framework. With the use of overly restrictive eligibility criteria, 85.7% of the COVID-19 trials and 30.1% of T2DM trials had poor population representativeness. Conclusion: This research demonstrates the potential of using EHR data to assess the population representativeness of clinical trials, providing data-driven metrics to inform the selection and optimization of eligibility criteria.
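To make the idea of turning an eligibility criterion into an executable query concrete, here is a minimal sketch, not the authors' framework. It assumes the free-text criterion has already been parsed into a small dictionary, uses two tables named after the OMOP Common Data Model (`person`, `condition_occurrence`) with only the columns needed, and measures representativeness as the simple fraction of patients with the condition who remain eligible. The concept ID, the toy rows, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical structured form of one parsed eligibility criterion:
# "adults aged 18-75 with the target condition".
# concept_id 1001 is a placeholder, not a real vocabulary concept.
criterion = {"concept_id": 1001, "min_age": 18, "max_age": 75, "index_year": 2021}


def criterion_to_sql(c):
    """Render the criterion as a parameterized query over OMOP-style tables."""
    sql = (
        "SELECT COUNT(DISTINCT p.person_id) "
        "FROM person p "
        "JOIN condition_occurrence co ON co.person_id = p.person_id "
        "WHERE co.condition_concept_id = ? "
        "AND (? - p.year_of_birth) BETWEEN ? AND ?"
    )
    return sql, (c["concept_id"], c["index_year"], c["min_age"], c["max_age"])


def eligible_fraction(conn, c):
    """Fraction of patients with the condition who also satisfy the age limits."""
    sql, params = criterion_to_sql(c)
    eligible = conn.execute(sql, params).fetchone()[0]
    target = conn.execute(
        "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (c["concept_id"],),
    ).fetchone()[0]
    return eligible / target if target else float("nan")


# Toy in-memory data standing in for an EHR warehouse (entirely made up).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
    CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
    INSERT INTO person VALUES (1, 1960), (2, 1930), (3, 1990), (4, 1975);
    INSERT INTO condition_occurrence VALUES (1, 1001), (2, 1001), (3, 1001), (4, 2002);
    """
)
print(eligible_fraction(conn, criterion))
```

In this toy data, three patients have the condition and one of them is excluded by the upper age limit, so restrictive criteria show up directly as a lower eligible fraction.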


Using Electronic Health Records for Population Health Research: A Review of Methods and Applications

Annual Review of Public Health ◽

10.1146/annurev-publhealth-032315-021353 ◽

2016 ◽

Vol 37 (1) ◽

pp. 61-81 ◽

Cited By ~ 135

Author(s):

Joan A. Casey ◽

Brian S. Schwartz ◽

Walter F. Stewart ◽

Nancy E. Adler

Keyword(s):

Electronic Health Records ◽

Population Health ◽

Health Research ◽

Health Records ◽

Electronic Health


Validity of acute cardiovascular outcome diagnoses in European electronic health records: a systematic review protocol

BMJ Open ◽

10.1136/bmjopen-2019-031373 ◽

2019 ◽

Vol 9 (10) ◽

pp. e031373 ◽

Cited By ~ 1

Author(s):

Jennifer Anne Davidson ◽

Amitava Banerjee ◽

Rutendo Muzambi ◽

Liam Smeeth ◽

Charlotte Warren-Gash

Keyword(s):

Systematic Review ◽

Electronic Health Records ◽

Predictive Value ◽

Grey Literature ◽

Cochrane Library ◽

Free Text ◽

Health Records ◽

Coronary Syndrome ◽

Validation Measure ◽

Electronic Health

Introduction: Cardiovascular diseases (CVDs) are among the leading causes of death globally. Electronic health records (EHRs) provide a rich data source for research on CVD risk factors, treatments and outcomes. Researchers must be confident in the validity of diagnoses in EHRs, particularly when diagnosis definitions and use of EHRs change over time. Our systematic review provides an up-to-date appraisal of the validity of stroke, acute coronary syndrome (ACS) and heart failure (HF) diagnoses in European primary and secondary care EHRs. Methods and analysis: We will systematically review the published and grey literature to identify studies validating diagnoses of stroke, ACS and HF in European EHRs. MEDLINE, EMBASE, SCOPUS, Web of Science, Cochrane Library, OpenGrey and EThOS will be searched from the dates of inception to April 2019. A prespecified search strategy of subject headings and free-text terms in the title and abstract will be used. Two reviewers will independently screen titles and abstracts to identify eligible studies, followed by full-text review. We require studies to compare clinical codes with a suitable reference standard. Additionally, at least one validation measure (sensitivity, specificity, positive predictive value or negative predictive value) or raw data, for the calculation of a validation measure, is necessary. We will then extract data from the eligible studies using standardised tables and assess risk of bias in individual studies using the Quality Assessment of Diagnostic Accuracy Studies 2 tool. Data will be synthesised into a narrative format and heterogeneity assessed. Meta-analysis will be considered when a sufficient number of homogeneous studies are available. The overall quality of evidence will be assessed using the Grading of Recommendations, Assessment, Development and Evaluation tool. Ethics and dissemination: This is a systematic review, so it does not require ethical approval. Our results will be submitted for peer-review publication. PROSPERO registration number: CRD42019123898
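The four validation measures named in this protocol all follow from the same two-by-two comparison of EHR codes against a reference standard. The sketch below shows the arithmetic under that assumption; the class name, function name, and counts are hypothetical and do not come from any study covered by the review.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing an EHR diagnosis code with a reference standard."""
    tp: int  # code present, reference standard positive
    fp: int  # code present, reference standard negative
    fn: int  # code absent, reference standard positive
    tn: int  # code absent, reference standard negative


def validation_measures(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV and NPV (NaN if a denominator is zero)."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical counts, for illustration only.
measures = validation_measures(Confusion(tp=80, fp=20, fn=10, tn=890))
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Studies that report only raw counts can therefore still contribute, which is why the protocol accepts raw data in place of a published measure.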


De-identifying Free Text of Japanese Dummy Electronic Health Records

10.18653/v1/w18-5608 ◽

2018 ◽

Author(s):

Kohei Kajiyama ◽

Hiromasa Horiguchi ◽

Takashi Okumura ◽

Mizuki Morita ◽

Yoshinobu Kano

Keyword(s):

Electronic Health Records ◽

Free Text ◽

Health Records ◽

Electronic Health


Abstract MP21: Feasibility of Electronic Health Records-based community surveillance of cardiovascular disease: Findings from the Atherosclerosis Risk in Communities Study.

Circulation ◽

10.1161/circ.137.suppl_1.mp21 ◽

2018 ◽

Vol 137 (suppl_1) ◽

Author(s):

Brittany M Bogle ◽

Wayne D Rosamond ◽

Aaron R Folsom ◽

Paul Sorlie ◽

Elsayed Z Soliman ◽

...

Keyword(s):

Cardiovascular Disease ◽

Electronic Health Records ◽

Cardiac Biomarkers ◽

Free Text ◽

Health Records ◽

Efficient System ◽

Atherosclerosis Risk In Communities ◽

Atherosclerosis Risk ◽

Electronic Health ◽

ARIC Study

Background: Accurate community surveillance of cardiovascular disease requires hospital record abstraction, which is typically a manual process. The costly and time-intensive nature of manual abstraction precludes its use on a regional or national scale in the US. Whether an efficient system can accurately reproduce traditional community surveillance methods by processing electronic health records (EHRs) has not been established. Objective: We sought to develop and test an EHR-based system to reproduce abstraction and classification procedures for acute myocardial infarction (MI) as defined by the Atherosclerosis Risk in Communities (ARIC) Study. Methods: Records from hospitalizations in 2014 within ARIC community surveillance areas were sampled using a broad set of ICD discharge codes likely to harbor MI. These records were manually abstracted by ARIC study personnel and used to classify MI according to ARIC protocols. We requested EHRs in a unified data structure for the same hospitalizations at 6 hospitals and built programs to convert free text and structured data into the ARIC criteria elements necessary for MI classification. Per ARIC protocol, MI was classified based on cardiac biomarkers, cardiac pain, and Minnesota-coded electrocardiogram abnormalities. We compared MI classified from manually abstracted data to (1) EHR-based classification and (2) final ICD-9 coded discharge diagnoses (410-414). Results: These preliminary results are based on hospitalizations from 1 hospital. Of 684 hospitalizations, 355 qualified for full manual abstraction; 83 (23%) of these were classified as definite MI and 78 (22%) as probable MI. Our EHR-based abstraction is sensitive (>75%) and highly specific (>83%) in classifying ARIC-defined definite MI and definite or probable MI (Table). 
Conclusions: Our results support the potential of a process to extract comprehensive sets of data elements from EHRs at different hospitals, with completeness and accuracy sufficient for a standardized definition of hospitalized MI.


Managing and Integrating Diverse Sources of Urban Data

Urban Public Health ◽

10.1093/oso/9780190885304.003.0008 ◽

2020 ◽

pp. 181-196

Author(s):

Gina S. Lovasi ◽

Steve Melly

Keyword(s):

Electronic Health Records ◽

Data Management ◽

Urban Health ◽

Health Research ◽

Health Records ◽

Electronic Health ◽

Urban Data

This chapter highlights strategies and challenges in bringing together multiple types of geographically referenced data for urban health research, such as linkage of electronic health records to area-based characteristics. The discussion highlights practical considerations that arise in data management, as well as strategies to safeguard confidentiality.


Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records

Rheumatology ◽

10.1093/rheumatology/kez375 ◽

2019 ◽

Vol 59 (5) ◽

pp. 1059-1065 ◽

Cited By ~ 1

Author(s):

Sizheng Steven Zhao ◽

Chuan Hong ◽

Tianrun Cai ◽

Chang Xu ◽

Jie Huang ◽

...

Keyword(s):

Electronic Health Records ◽

Predictive Value ◽

Area Under The Curve ◽

Free Text ◽

Text Data ◽

Health Records ◽

Disease Concepts ◽

ICD Codes ◽

Electronic Health

Objectives: To develop classification algorithms that accurately identify axial SpA (axSpA) patients in electronic health records, and compare the performance of algorithms incorporating free-text data against approaches using only International Classification of Diseases (ICD) codes. Methods: An enriched cohort of 7853 eligible patients was created from electronic health records of two large hospitals using automated searches (⩾1 ICD codes combined with simple text searches). Key disease concepts from free-text data were extracted using NLP and combined with ICD codes to develop algorithms. We created both supervised regression-based algorithms (on a training set of 127 axSpA cases and 423 non-cases) and unsupervised algorithms to identify patients with high probability of having axSpA from the enriched cohort. Their performance was compared against classifications using ICD codes only. Results: NLP extracted four disease concepts of high predictive value: ankylosing spondylitis, sacroiliitis, HLA-B27 and spondylitis. The unsupervised algorithm, incorporating both the NLP concept and ICD code for AS, identified the greatest number of patients. By setting the probability threshold to attain 80% positive predictive value, it identified 1509 axSpA patients (mean age 53 years, 71% male). Sensitivity was 0.78, specificity 0.94 and area under the curve 0.93. The two supervised algorithms performed similarly but identified fewer patients. All three outperformed traditional approaches using ICD codes alone (area under the curve 0.80–0.87). Conclusion: Algorithms incorporating free-text data can accurately identify axSpA patients in electronic health records. Large cohorts identified using these novel methods offer exciting opportunities for future clinical research.
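Choosing a probability threshold to attain a target positive predictive value can be illustrated with a short sketch. This is one plausible procedure under stated assumptions, not the authors' method: it assumes each patient has a predicted probability and a chart-reviewed label, and it returns the lowest cut-off at which PPV among flagged patients still meets the target, so that as many patients as possible are identified. The function name, scores, and labels are hypothetical.

```python
def threshold_for_ppv(scores, labels, target_ppv=0.80):
    """Return the lowest score threshold whose PPV meets the target, or None.

    scores: predicted probabilities; labels: 1 for a confirmed case, 0 otherwise.
    PPV is not guaranteed to change monotonically with the threshold, so every
    candidate cut-off is checked and the lowest qualifying one is kept.
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        ppv = sum(flagged) / len(flagged)
        if ppv >= target_ppv:
            best = t
    return best


# Hypothetical scores and chart-review labels, for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.20]
labels = [1, 1, 1, 0, 1, 0, 0]
print(threshold_for_ppv(scores, labels))
```

Lowering the threshold further would flag more patients but let PPV fall below the target, which is the trade-off between cohort size and purity that the abstract describes.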


From screening to ascertainment of the primary outcome using electronic health records: Challenges in the STRIDE trial

Clinical Trials ◽

10.1177/1740774520920898 ◽

2020 ◽

Vol 17 (4) ◽

pp. 346-350

Author(s):

Denise Esserman

Keyword(s):

Electronic Health Records ◽

Electronic Health Record ◽

Health Care Systems ◽

Health Record ◽

Electronic Health Record Data ◽

Health Records ◽

Record Data ◽

Care Systems ◽

Electronic Health ◽

Data Elements

Electronic health record data are a rich resource and can be utilized to answer a wealth of research questions. It is important when using electronic health record data in clinical trials that systems be put in place and vetted prior to enrollment to ensure data elements can be collected consistently across all health care systems. It is often overlooked how something conceptualized on paper (e.g. use of the electronic health record in a study) can be difficult to implement in practice. This article discusses some of the challenges in using electronic health records in the conduct of the STRIDE (Strategies to Reduce Injuries and Develop Confidence in Elders) trial, how we handled those challenges, and the lessons we learned for the conduct of future trials looking to employ the electronic health record.
