Broadening the reach of the FDA Sentinel system: A roadmap for integrating electronic health record data in a causal analysis framework

Abstract The Sentinel System is a major component of the United States Food and Drug Administration’s (FDA) approach to active medical product safety surveillance. While Sentinel has historically relied on large quantities of health insurance claims data, leveraging longitudinal electronic health records (EHRs) that contain more detailed clinical information, as structured and unstructured features, may address some of the current gaps in capabilities. We identify key challenges when using EHR data to investigate medical product safety in a scalable and accelerated way, outline potential solutions, and describe the Sentinel Innovation Center’s initiatives to put solutions into practice by expanding and strengthening the existing system with a query-ready, large-scale data infrastructure of linked EHR and claims data. We describe our initiatives in four strategic priority areas: (1) data infrastructure, (2) feature engineering, (3) causal inference, and (4) detection analytics, with the goal of incorporating emerging data science innovations to maximize the utility of EHR data for medical product safety surveillance.


Assessing the National Prevalence of HIV Screening in the United States using Electronic Health Record Data

Cureus ◽

10.7759/cureus.5043 ◽

2019 ◽

Author(s):

Joshua D Niforatos ◽

Jonathon W Wanta ◽

Emily Durbak ◽

Jacqueline Cavendish ◽

Justin A Yax

Keyword(s):

United States ◽

Electronic Health Record ◽

The United States ◽

Health Record ◽

HIV Screening ◽

Electronic Health Record Data ◽

Record Data ◽

Electronic Health ◽

National Prevalence


Structured Approach for Evaluating Strategies for Cancer Ascertainment Using Large-Scale Electronic Health Record Data

JCO Clinical Cancer Informatics ◽

10.1200/cci.17.00072 ◽

2018 ◽

pp. 1-12 ◽

Cited By ~ 6

Author(s):

Ashley Earles ◽

Lin Liu ◽

Ranier Bustamante ◽

Pat Coke ◽

Julie Lynch ◽

...

Keyword(s):

Large Scale ◽

High Sensitivity ◽

Performance Comparison ◽

Administrative Claims ◽

Electronic Health Record Data ◽

Perfect Agreement ◽

Electronic Health ◽

Record Review ◽

Structured Approach ◽

Manual Record

Purpose Cancer ascertainment using large-scale electronic health records is a challenge. Our aim was to propose and apply a structured approach for evaluating multiple candidate approaches for cancer ascertainment using colorectal cancer (CRC) ascertainment within the US Department of Veterans Affairs (VA) as a use case. Methods The proposed approach for evaluating cancer ascertainment strategies includes assessment of individual strategy performance, comparison of agreement across strategies, and review of discordant diagnoses. We applied this approach to compare three strategies for CRC ascertainment within the VA: administrative claims data consisting of International Classification of Diseases, Ninth Revision (ICD9) diagnosis codes; the VA Central Cancer Registry (VACCR); and the newly accessible Oncology Domain, consisting of cases abstracted by local cancer registrars. The study sample consisted of 1,839,043 veterans with index colonoscopy performed from 1999 to 2014. Strategy-specific performance was estimated based on manual record review of 100 candidate CRC cases and 100 colonoscopy controls. Strategies were further compared using Cohen’s κ and focused review of discordant CRC diagnoses. Results A total of 92,197 individuals met at least one CRC definition. All three strategies had high sensitivity and specificity for incident CRC. However, the ICD9-based strategy demonstrated poor positive predictive value (58%). VACCR and Oncology Domain had almost perfect agreement with each other (κ, 0.87) but only moderate agreement with ICD9-based diagnoses (κ, 0.51 and 0.57, respectively). Among discordant cases reviewed, 15% of ICD9-positive but VACCR- or Oncology Domain–negative cases had incident CRC. Conclusion Evaluating novel strategies for identifying cancer requires a structured approach, including validation against manual record review, agreement among candidate strategies, and focused review of discordant findings. 
Without careful assessment of ascertainment methods, analyses may be subject to bias and limited in clinical impact.
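
The agreement and predictive-value measures named in this abstract (Cohen's κ and positive predictive value) can be illustrated with a minimal Python sketch. It assumes binary 0/1 labels per patient; the function names and the toy label vectors are hypothetical and are not taken from the study.

```python
def confusion_counts(pred, truth):
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def positive_predictive_value(pred, truth):
    """Share of predicted positives that are true positives."""
    tp, fp, _, _ = confusion_counts(pred, truth)
    return tp / (tp + fp)


def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters (here: two ascertainment strategies)."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    pa = sum(a) / n  # fraction labeled positive by strategy a
    pb = sum(b) / n  # fraction labeled positive by strategy b
    expected = pa * pb + (1 - pa) * (1 - pb)  # chance agreement
    if expected == 1.0:
        return 1.0  # both strategies are constant and identical
    return (observed - expected) / (1 - expected)


# Toy labels, invented for illustration only.
strategy_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
strategy_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
kappa = cohens_kappa(strategy_a, strategy_b)
ppv = positive_predictive_value(strategy_a, strategy_b)
```

Here `strategy_b` stands in for the reference labels when computing the positive predictive value; in the study that role is played by manual record review.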


Utilization of Deep Learning for Subphenotype Identification in Sepsis-Associated Acute Kidney Injury

Clinical Journal of the American Society of Nephrology ◽

10.2215/cjn.09330819 ◽

2020 ◽

Vol 15 (11) ◽

pp. 1557-1565 ◽

Cited By ~ 2

Author(s):

Kumardeep Chaudhary ◽

Akhil Vaid ◽

Áine Duffy ◽

Ishan Paranjpe ◽

Suraj Jaladanki ◽

...

Keyword(s):

Intensive Care ◽

Deep Learning ◽

Medical Information ◽

Tertiary Care ◽

Vital Signs ◽

Kidney Injury ◽

The United States ◽

Care Hospital ◽

Electronic Health Record Data ◽

Electronic Health

Background and objectives Sepsis-associated AKI is a heterogeneous clinical entity. We aimed to agnostically identify sepsis-associated AKI subphenotypes using deep learning on routinely collected data in electronic health records. Design, setting, participants, & measurements We used the Medical Information Mart for Intensive Care III database, which consists of electronic health record data from intensive care units in a tertiary care hospital in the United States. We included patients ≥18 years with sepsis who developed AKI within 48 hours of intensive care unit admission. We then used deep learning to utilize all available vital signs, laboratory measurements, and comorbidities to identify subphenotypes. Outcomes were mortality 28 days after AKI and dialysis requirement. Results We identified 4001 patients with sepsis-associated AKI. We utilized 2546 combined features for K-means clustering, identifying three subphenotypes. Subphenotype 1 had 1443 patients, and subphenotype 2 had 1898 patients, whereas subphenotype 3 had 660 patients. Subphenotype 1 had the lowest proportion of liver disease and lowest Simplified Acute Physiology Score II scores compared with subphenotypes 2 and 3. The proportions of patients with CKD were similar between subphenotypes 1 and 3 (15%) but highest in subphenotype 2 (21%). Subphenotype 1 had lower median bilirubin levels, aspartate aminotransferase, and alanine aminotransferase compared with subphenotypes 2 and 3. Patients in subphenotype 1 also had lower median lactate, lactate dehydrogenase, and white blood cell count than patients in subphenotypes 2 and 3. Subphenotype 1 also had lower creatinine and BUN than subphenotypes 2 and 3. Dialysis requirement was lowest in subphenotype 1 (4% versus 7% [subphenotype 2] versus 26% [subphenotype 3]). The mortality 28 days after AKI was lowest in subphenotype 1 (23% versus 35% [subphenotype 2] versus 49% [subphenotype 3]).
After adjustment, the adjusted odds ratio for mortality for subphenotype 3, with subphenotype 1 as a reference, was 1.9 (95% confidence interval, 1.5 to 2.4). Conclusions Utilizing routinely collected laboratory variables, vital signs, and comorbidities, we were able to identify three distinct subphenotypes of sepsis-associated AKI with differing outcomes.
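
The K-means clustering step mentioned in this abstract can be sketched in plain Python as follows. This covers only the clustering step on toy two-dimensional points; it does not reproduce the deep-learning feature processing or the 2546 features used in the study. The deterministic farthest-point initialization and all names are assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from the centroids chosen so far (an illustrative choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Toy data: three well-separated groups, invented for illustration.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
toy_labels, toy_centroids = kmeans(toy_points, k=3)
```

On real EHR features, scaling the inputs and checking cluster stability across initializations would matter far more than in this toy example.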


Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system

Pharmacoepidemiology and Drug Safety ◽

10.1002/pds.2328 ◽

2012 ◽

Vol 21 ◽

pp. 41-49 ◽

Cited By ~ 59

Author(s):

Jeremy A. Rassen ◽

Sebastian Schneeweiss

Keyword(s):

Surveillance System ◽

Propensity Scores ◽

Medical Product ◽

Product Safety ◽

High Dimensional ◽

Safety Surveillance


Stratifying risk for dementia onset using large‐scale electronic health record data: A retrospective cohort study

Alzheimer's & Dementia ◽

10.1016/j.jalz.2019.09.084 ◽

2020 ◽

Vol 16 (3) ◽

pp. 531-540 ◽

Cited By ~ 2

Author(s):

Thomas H. McCoy ◽

Larry Han ◽

Amelia M. Pellegrini ◽

Rudolph E. Tanzi ◽

Sabina Berretta ◽

...

Keyword(s):

Cohort Study ◽

Electronic Health Record ◽

Retrospective Cohort Study ◽

Retrospective Cohort ◽

Large Scale ◽

Health Record ◽

Electronic Health Record Data ◽

Record Data ◽

Electronic Health ◽

Dementia Onset


Statistical performance of group sequential methods for observational post-licensure medical product safety surveillance: A simulation study

Statistics and Its Interface ◽

10.4310/sii.2012.v5.n4.a1 ◽

2012 ◽

Vol 5 (4) ◽

pp. 381-390 ◽

Cited By ~ 13

Author(s):

Andrea Cook ◽

Lisa Jackson ◽

Jennifer Nelson ◽

Shanshan Zhao

Keyword(s):

Simulation Study ◽

Medical Product ◽

Product Safety ◽

Group Sequential ◽

Sequential Methods ◽

Safety Surveillance ◽

Group Sequential Methods


Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning

PLoS ONE ◽

10.1371/journal.pone.0256428 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0256428

Author(s):

Aixia Guo ◽

Nikhilesh R. Mazumder ◽

Daniela P. Ladner ◽

Randi E. Foraker

Keyword(s):

Machine Learning ◽

Liver Cirrhosis ◽

Deep Learning ◽

Risk Prediction ◽

The United States ◽

MELD Score ◽

Receiver Operating Curve ◽

Categorical Variables ◽

Electronic Health Record Data ◽

Electronic Health

Objective Liver cirrhosis is a leading cause of death and affects millions of people in the United States. Early mortality prediction among patients with cirrhosis might give healthcare providers more opportunity to effectively treat the condition. We hypothesized that laboratory test results and other related diagnoses would be associated with mortality in this population. Our other assumption was that a deep learning model could outperform the current Model for End-Stage Liver Disease (MELD) score in predicting mortality. Materials and methods We utilized electronic health record data from 34,575 patients with a diagnosis of cirrhosis from a large medical center to study associations with mortality. Three time windows of mortality (365 days, 180 days, and 90 days) and two cases with different numbers of variables (all 41 available variables and the 4 variables in MELD-Na) were studied. Missing values were imputed using multiple imputation for continuous variables and the mode for categorical variables. Deep learning and machine learning algorithms, i.e., deep neural networks (DNN), random forest (RF), and logistic regression (LR), were employed to study the associations between baseline features, such as laboratory measurements and diagnoses, and mortality for each time window, using 5-fold cross-validation. Metrics such as area under the receiver operating characteristic curve (AUC), overall accuracy, sensitivity, and specificity were used to evaluate models. Results Models comprising all variables outperformed those with the 4 MELD-Na variables for all prediction cases, and the DNN model outperformed the LR and RF models. For example, the DNN model achieved AUCs of 0.88, 0.86, and 0.85 for 90-, 180-, and 365-day mortality, respectively, compared with the MELD score, which resulted in corresponding AUCs of 0.81, 0.79, and 0.76 for the same instances. The DNN and LR models had a significantly better F1 score than MELD at all time points examined.
Conclusion Besides the 4 MELD-Na variables, other variables such as alkaline phosphatase, alanine aminotransferase, and hemoglobin were also among the most informative features. Machine learning and deep learning models outperformed the current standard of risk prediction among patients with cirrhosis. Advanced informatics techniques showed promise for risk prediction in patients with cirrhosis.
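
Two evaluation ingredients named in this abstract, the AUC and 5-fold cross-validation, can be sketched in plain Python. The AUC below uses the pairwise (Mann-Whitney) definition, and the fold splitter simply partitions row indices; both functions and the toy scores are illustrative assumptions and do not reproduce the study's models or data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy predicted risks and observed outcomes, invented for illustration.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_outcomes = [1, 1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_outcomes)
toy_folds = k_fold_indices(12, k=5)
```

In a cross-validation loop, each fold in turn would serve as the held-out test set while the remaining folds are used for fitting; shuffling or stratifying the indices first is a common refinement that this sketch omits.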


Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data

JAMIA Open ◽

10.1093/jamiaopen/ooz056 ◽

2019 ◽

Vol 2 (4) ◽

pp. 570-579 ◽

Cited By ~ 5

Author(s):

Na Hong ◽

Andrew Wen ◽

Feichen Shen ◽

Sunghwan Sohn ◽

Chen Wang ◽

...

Keyword(s):

Electronic Health Record ◽

Language Processing ◽

Clinical Data ◽

Large Scale ◽

Structured Data ◽

Health Record ◽

Data Normalization ◽

Electronic Health Record Data ◽

Electronic Health ◽

Clinical Resource

Abstract Objective To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data, leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. Methods We established an FHIR-based clinical data normalization pipeline known as NLP2FHIR that mainly comprises: (1) a module for a core natural language processing (NLP) engine with an FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory using Mayo Clinic’s unstructured EHR data. We constructed a gold standard by reusing annotation corpora from previous NLP projects. Results A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. The elements that need to integrate structured data from each clinical resource were identified. Unstructured data modeling achieved F scores ranging from 0.69 to 0.99 for various FHIR element representations (0.69–0.99 for Condition; 0.75–0.84 for Procedure; 0.71–0.99 for MedicationStatement; and 0.75–0.95 for FamilyMemberHistory). Conclusion We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based tools for clinical data normalization, which are indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future developments of the FHIR specifications with regard to handling unstructured clinical data.
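
To make the idea of representing an extracted clinical mention as an FHIR resource more concrete, here is a minimal Python sketch that builds a Condition resource as a JSON-serializable dictionary. It is not the NLP2FHIR API and does not reflect its mapping or normalization rules; the function name, the patient identifier, and the example mention are hypothetical.

```python
import json


def mention_to_condition(patient_id, text, code, system, display):
    """Build a minimal FHIR Condition resource from one extracted mention.

    Only a few core elements are populated; a real pipeline would add
    status, onset, provenance, and other elements as needed.
    """
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [{"system": system, "code": code, "display": display}],
            "text": text,  # the phrase as it appeared in the clinical note
        },
    }


# Hypothetical mention; the patient ID is a placeholder.
condition = mention_to_condition(
    patient_id="example-1",
    text="type 2 diabetes",
    code="44054006",
    system="http://snomed.info/sct",
    display="Diabetes mellitus type 2",
)
condition_json = json.dumps(condition, indent=2)
```

Keeping the original phrase in `code.text` alongside the coded concept preserves the link between the normalized element and the unstructured source.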


ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data

JAMIA Open ◽

10.1093/jamiaopen/ooy059 ◽

2019 ◽

Vol 2 (1) ◽

pp. 10-14 ◽

Cited By ~ 6

Author(s):

Benjamin S Glicksberg ◽

Boris Oskotsky ◽

Nicholas Giangreco ◽

Phyllis M Thangaraj ◽

Vivek Rudrapatna ◽

...

Keyword(s):

Electronic Health Record ◽

Data Science ◽

Demographic Data ◽

R Package ◽

Common Data Model ◽

Health Record ◽

Massachusetts Institute Of Technology ◽

Data Types ◽

Electronic Health Record Data ◽

Electronic Health

Abstract Objectives Electronic health record (EHR) data are increasingly used for biomedical discoveries. The nature of the data, however, requires expertise in both data science and EHR structure. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) standardizes the language and structure of EHR data to promote interoperability of EHR data for research. While the OMOP CDM is valuable and more attuned to research purposes, it still requires extensive domain knowledge to utilize effectively, potentially limiting more widespread adoption of EHR data for research and quality improvement. Materials and methods We have created ROMOP: an R package for direct interfacing with EHR data in the OMOP CDM format. Results ROMOP streamlines typical EHR-related data processes. Its functions include exploration of data types, extraction and summarization of patient clinical and demographic data, and patient searches using any CDM vocabulary concept. Conclusion ROMOP is freely available under the Massachusetts Institute of Technology (MIT) license and can be obtained from GitHub (http://github.com/BenGlicksberg/ROMOP). We detail instructions for setup and use in the Supplementary Materials. Additionally, we provide a public sandbox server containing synthesized clinical data for users to explore OMOP data and ROMOP (http://romop.ucsf.edu).
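
The kind of patient search by vocabulary concept described in this abstract can be sketched against a toy OMOP-style database. The sketch uses Python's `sqlite3` module rather than R, so it is not ROMOP's interface; it assumes a small subset of the OMOP CDM `person` and `condition_occurrence` columns, and the rows and concept IDs are synthetic values chosen for illustration.

```python
import sqlite3

# Illustrative concept ID; real IDs should be looked up in the CDM vocabulary tables.
EXAMPLE_CONCEPT_ID = 201826


def build_toy_cdm():
    """Create an in-memory database with a tiny subset of OMOP CDM tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            gender_concept_id INTEGER,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8507, 1950), (2, 8532, 1972), (3, 8532, 1988)],
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [
            (10, 1, EXAMPLE_CONCEPT_ID, "2015-03-01"),
            (11, 2, EXAMPLE_CONCEPT_ID, "2018-07-15"),
            (12, 3, 317009, "2019-01-20"),
        ],
    )
    return conn


def patients_with_concept(conn, concept_id):
    """Return (person_id, year_of_birth) for patients with the given condition concept."""
    cursor = conn.execute(
        """
        SELECT DISTINCT p.person_id, p.year_of_birth
        FROM person AS p
        JOIN condition_occurrence AS c ON c.person_id = p.person_id
        WHERE c.condition_concept_id = ?
        ORDER BY p.person_id
        """,
        (concept_id,),
    )
    return cursor.fetchall()


toy_conn = build_toy_cdm()
matching_patients = patients_with_concept(toy_conn, EXAMPLE_CONCEPT_ID)
```

The parameterized query (`?` placeholder) keeps the concept ID out of the SQL string, which is the usual way to avoid injection problems when the ID comes from user input.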
