Assessing quality and agreement of structured data in automatic versus manual abstraction of the electronic health record for a clinical epidemiology study

Objective We evaluate data agreement between an electronic health record (EHR) sample abstracted by automated characterization with a standard abstracted by manual review. Study Design and Setting We obtain data for an epidemiology cohort study using standard manual abstraction of the EHR and automated identification of the same patients using a structured algorithm to query the EHR. Summary measures of agreement (e.g., Cohen’s kappa) are reported for 12 variables commonly used in epidemiological studies. Results Best agreement between abstraction methods is observed among demographic characteristics such as age, sex, and race, and for positive history of disease. Poor agreement is found in missing data and negative history, suggesting potential impact for researchers using automated EHR characterization. EHR data quality depends upon providers, who may be influenced by both institutional and federal government documentation guidelines. Conclusion Automated EHR abstraction discrepancies may decrease power and increase bias; therefore, caution is warranted when selecting variables from EHRs for epidemiological study using an automated characterization approach. Validation of automated methods must also continue to advance in sophistication with other technologies, such as machine learning and natural language processing, to extract non-structured data from the EHR, for application to EHR characterization for clinical epidemiology.
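The agreement statistic named in the abstract, Cohen's kappa, can be illustrated with a minimal sketch. The function name, the labels, and the eight example records below are hypothetical and are not taken from the study; they only show how observed agreement is corrected for chance agreement between a manual and an automated abstraction of one variable.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of records on which the two abstraction methods agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each method's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: history of disease for 8 patients (not from the study).
manual = ["yes", "yes", "no", "no", "yes", "no", "no", "no"]
automated = ["yes", "yes", "no", "missing", "yes", "no", "missing", "no"]

kappa = cohens_kappa(manual, automated)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two methods agree on every positive history, and all disagreement comes from records that the automated query returns as missing where manual review recorded a negative history, which is the pattern the abstract describes.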


Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data

JAMIA Open ◽

10.1093/jamiaopen/ooz056 ◽

2019 ◽

Vol 2 (4) ◽

pp. 570-579 ◽

Cited By ~ 5

Author(s):

Na Hong ◽

Andrew Wen ◽

Feichen Shen ◽

Sunghwan Sohn ◽

Chen Wang ◽

...

Keyword(s):

Electronic Health Record ◽

Language Processing ◽

Clinical Data ◽

Large Scale ◽

Structured Data ◽

Health Record ◽

Data Normalization ◽

Electronic Health Record Data ◽

Electronic Health ◽

Clinical Resource

Abstract Objective To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. Methods We established an FHIR-based clinical data normalization pipeline known as NLP2FHIR that mainly comprises: (1) a module for a core natural language processing (NLP) engine with an FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory using Mayo Clinic’s unstructured EHR data. We constructed a gold standard reusing annotation corpora from previous NLP projects. Results A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. The elements that need to integrate structured data from each clinical resource were identified. The performance of unstructured data modeling achieved F scores ranging from 0.69 to 0.99 for various FHIR element representations (0.69–0.99 for Condition; 0.75–0.84 for Procedure; 0.71–0.99 for MedicationStatement; and 0.75–0.95 for FamilyMemberHistory). Conclusion We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based tools of clinical data normalization that is indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future developments of the FHIR specifications with regard to handling unstructured clinical data.
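The F scores reported above combine precision and recall against a gold standard. The sketch below assumes the balanced F1 variant (the abstract does not state the beta value) and uses hypothetical (element, value) pairs; it is not the NLP2FHIR evaluation code.

```python
def f_score(gold, predicted, beta=1.0):
    """F-beta score of predicted annotations against gold annotations.

    Both arguments are iterables of hashable items, here hypothetical
    (FHIR element, normalized value) pairs. beta=1.0 gives the balanced F1.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical gold-standard and pipeline output for one clinical note.
gold = {
    ("Condition.code", "diabetes mellitus"),
    ("Condition.code", "hypertension"),
    ("Condition.bodySite", "left knee"),
    ("Condition.severity", "mild"),
}
predicted = {
    ("Condition.code", "diabetes mellitus"),
    ("Condition.code", "hypertension"),
    ("Condition.bodySite", "left knee"),
    ("Condition.severity", "severe"),  # wrong value counts as a miss
}

f1 = f_score(gold, predicted)
print(f"F1 = {f1:.2f}")
```

Scoring each FHIR element separately in this way would yield per-element ranges like those quoted in the abstract.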


Automated Identification of Potential Candidates for Human Immunodeficiency Virus Pre-exposure Prophylaxis Using Electronic Health Record Data

Open Forum Infectious Diseases ◽

10.1093/ofid/ofw194.63 ◽

2016 ◽

Vol 3 (suppl_1) ◽

Cited By ~ 1

Author(s):

Douglas Krakower ◽

Susan Gruber ◽

John T. Menchaca ◽

Judith C. Maro ◽

Noelle Cocoros ◽

...

Keyword(s):

Human Immunodeficiency Virus ◽

Electronic Health Record ◽

Health Record ◽

Automated Identification ◽

Electronic Health Record Data ◽

Immunodeficiency Virus ◽

Record Data ◽

Electronic Health ◽

Exposure Prophylaxis


Enhancing goals of care communication by oncologists using a pathway-based intervention.

Journal of Clinical Oncology ◽

10.1200/jco.2020.39.28_suppl.324 ◽

2021 ◽

Vol 39 (28_suppl) ◽

pp. 324-324

Author(s):

Isaac S. Chua ◽

Elise Tarbi ◽

Jocelyn H. Siegel ◽

Kate Sciacca ◽

Anne Kwok ◽

...

Keyword(s):

Electronic Health Record ◽

Advanced Cancer ◽

Language Processing ◽

Clinical Pathways ◽

Free Text ◽

Health Record ◽

Goals Of Care ◽

Progress Notes ◽

Electronic Health ◽

Pilot Sample

324 Background: Delivering goal-concordant care to patients with advanced cancer requires identifying eligible patients who would benefit from goals of care (GOC) conversations; training clinicians how to have these conversations; conducting conversations in a timely manner; and documenting GOC conversations that can be readily accessed by care teams. We used an existing, locally developed electronic cancer care clinical pathways system to guide oncologists toward these conversations. Methods: To identify eligible patients, pathways directors from 12 oncology disease centers identified therapeutic decision nodes for each pathway that corresponded to a predicted life expectancy of ≤1 year. When oncologists selected one of these pre-identified pathways nodes, the decision was captured in a relational database. From these patients, we sought evidence of GOC documentation within the electronic health record by extracting coded data from the advance care planning (ACP) module, a designated area within the electronic health record for clinicians to document GOC conversations. We also used rule-based natural language processing (NLP) to capture free text GOC documentation within these same patients’ progress notes. A domain expert reviewed all progress notes identified by NLP to confirm the presence of GOC documentation. Results: In a pilot sample obtained between March 20 and September 25, 2020, we identified a total of 21 pathway nodes conveying a poor prognosis, which represented 91 unique patients with advanced cancer. Among these patients, the mean age was 62 (SD 13.8) years; 55 (60.4%) patients were female, and 69 (75.8%) were non-Hispanic White. The cancers most represented were thoracic (32 [35.2%]), breast (31 [34.1%]), and head and neck (13 [14.3%]). Within the 3 months leading up to the pathways decision date, a total of 62 (68.1%) patients had any GOC documentation. Twenty-one (23.1%) patients had documentation in both the ACP module and NLP-identified progress notes; 5 (5.5%) had documentation in the ACP module only; and 36 (39.6%) had documentation in progress notes only. Twenty-two unique clinicians utilized the ACP module, of whom 1 (4.5%) was an oncologist and 21 (95.5%) were palliative care clinicians. Conclusions: Approximately two thirds of patients had any GOC documentation. A total of 26 (28.6%) patients had any GOC documentation in the ACP module, and only 1 oncologist documented using the ACP module, where care teams can most easily retrieve GOC information. These findings provide an important baseline for future quality improvement efforts (e.g., implementing serious illness communications training, increasing support around ACP module utilization, and incorporating behavioral nudges) to enhance oncologists’ ability to conduct and to document timely, high quality GOC conversations.
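The documentation breakdown in the Results can be checked with simple arithmetic. The sketch below uses only the counts reported in the abstract; the variable names are illustrative.

```python
# Counts reported in the abstract (91 unique patients in the pilot sample).
total_patients = 91
both = 21        # ACP module and NLP-identified progress notes
acp_only = 5     # ACP module only
notes_only = 36  # progress notes only

any_goc = both + acp_only + notes_only  # any GOC documentation
any_acp = both + acp_only               # any documentation in the ACP module


def pct(count, total=total_patients):
    """Percentage of the cohort, rounded to one decimal as in the abstract."""
    return round(100 * count / total, 1)


summary = {
    "any GOC documentation": (any_goc, pct(any_goc)),
    "ACP module (any)": (any_acp, pct(any_acp)),
    "both sources": (both, pct(both)),
    "ACP module only": (acp_only, pct(acp_only)),
    "progress notes only": (notes_only, pct(notes_only)),
}
for label, (count, percent) in summary.items():
    print(f"{label}: {count} ({percent}%)")
```

The three mutually exclusive groups sum to the 62 patients with any documentation, and the two ACP groups sum to the 26 patients cited in the Conclusions.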


Development and Validation of a Natural Language Processing Algorithm to Extract Descriptors of Microbial Keratitis From the Electronic Health Record

Cornea ◽

10.1097/ico.0000000000002755 ◽

2021 ◽

Vol Publish Ahead of Print ◽

Author(s):

Maria A. Woodward ◽

Nenita Maganti ◽

Leslie M. Niziol ◽

Sejal Amin ◽

Andrew Hou ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Electronic Health Record ◽

Language Processing ◽

Processing Algorithm ◽

Health Record ◽

Microbial Keratitis ◽

Electronic Health ◽

Development And Validation ◽

Natural Language Processing Algorithm


Semantic computational analysis of anticoagulation use in atrial fibrillation from real world data

10.1101/19011643 ◽

2019 ◽

Author(s):

Daniel M. Bean ◽

James Teo ◽

Honghan Wu ◽

Ricardo Oliveira ◽

Raj Patel ◽

...

Keyword(s):

Atrial Fibrillation ◽

Natural Language Processing ◽

Natural Language ◽

Electronic Health Record ◽

Open Source ◽

Language Processing ◽

Risk Scores ◽

Free Text ◽

Health Record ◽

Electronic Health

Abstract Atrial fibrillation (AF) is the most common arrhythmia and significantly increases stroke risk. This risk is effectively managed by oral anticoagulation. Recent studies using national registry data indicate increased use of anticoagulation resulting from changes in guidelines and the availability of newer drugs. The aim of this study is to develop and validate an open source risk scoring pipeline for free-text electronic health record data using natural language processing. AF patients discharged from 1st January 2011 to 1st October 2017 were identified from discharge summaries (N=10,030, 64.6% male, average age 75.3 ± 12.3 years). A natural language processing pipeline was developed to identify risk factors in clinical text and calculate risk for ischaemic stroke (CHA2DS2-VASc) and bleeding (HAS-BLED). Scores were validated vs two independent experts for 40 patients. Automatic risk scores were in strong agreement with the two independent experts for CHA2DS2-VASc (average kappa 0.78 vs experts, compared to 0.85 between experts). Agreement was lower for HAS-BLED (average kappa 0.54 vs experts, compared to 0.74 between experts). In high-risk patients (CHA2DS2-VASc ≥2) oral anticoagulant (OAC) use has increased significantly over the last 7 years, driven by the availability of direct oral anticoagulants (DOACs) and the transitioning of patients from antiplatelet (AP) medication alone to OAC. Factors independently associated with OAC use included components of the CHA2DS2-VASc and HAS-BLED scores as well as discharging specialty and frailty. OAC use was highest in patients discharged under cardiology (69%). Electronic health record text can be used for automatic calculation of clinical risk scores at scale. Open source tools are available today for this task but require further validation. Analysis of routinely-collected EHR data can replicate findings from large-scale curated registries.
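The final scoring step of such a pipeline can be sketched as follows. This is not the authors' implementation: it assumes the risk factors have already been extracted from the text as boolean flags, and the `Patient` class and its field names are hypothetical. The point weights follow the standard CHA2DS2-VASc definition.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for risk factors already extracted from text."""
    age: int
    female: bool
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    stroke_or_tia: bool = False      # prior stroke, TIA, or thromboembolism
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc stroke risk score (0 to 9)."""
    score = 0
    score += 1 if p.heart_failure else 0     # C: congestive heart failure
    score += 1 if p.hypertension else 0      # H: hypertension
    if p.age >= 75:                          # A2: age 75 or older
        score += 2
    elif p.age >= 65:                        # A: age 65 to 74
        score += 1
    score += 1 if p.diabetes else 0          # D: diabetes mellitus
    score += 2 if p.stroke_or_tia else 0     # S2: stroke/TIA/thromboembolism
    score += 1 if p.vascular_disease else 0  # V: vascular disease
    score += 1 if p.female else 0            # Sc: sex category (female)
    return score


# Hypothetical patients, not drawn from the study cohort.
example = Patient(age=76, female=True, hypertension=True, diabetes=True)
score = cha2ds2_vasc(example)
high_risk = score >= 2  # threshold used in the abstract
print(score, high_risk)
```

Keeping the scoring rule separate from the text extraction makes it possible to validate each stage independently, for instance by comparing scores against expert review as the abstract describes.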


A dynamic reaction picklist for improving allergy reaction documentation in the electronic health record

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa042 ◽

2020 ◽

Vol 27 (6) ◽

pp. 917-923

Author(s):

Liqin Wang ◽

Suzanne V Blackley ◽

Kimberly G Blumenthal ◽

Sharmitha Yerneni ◽

Foster R Goss ◽

...

Keyword(s):

Electronic Health Record ◽

Language Processing ◽

Clinical Decision ◽

Free Text ◽

Health Record ◽

Dynamic Reaction ◽

Value Set ◽

Expert Review ◽

Electronic Health ◽

Development And Validation

Abstract Objective Incomplete and static reaction picklists in the allergy module led to free-text and missing entries that inhibit the clinical decision support intended to prevent adverse drug reactions. We developed a novel, data-driven, “dynamic” reaction picklist to improve allergy documentation in the electronic health record (EHR). Materials and Methods We split 3 decades of allergy entries in the EHR of a large Massachusetts healthcare system into development and validation datasets. We consolidated duplicate allergens and those with the same ingredients or allergen groups. We created a reaction value set via expert review of a previously developed value set and then applied natural language processing to reconcile reactions from structured and free-text entries. Three association rule-mining measures were used to develop a comprehensive reaction picklist dynamically ranked by allergen. The dynamic picklist was assessed using recall at top k suggested reactions, comparing performance to the static picklist. Results The modified reaction value set contained 490 reaction concepts. Among 4 234 327 allergy entries collected, 7463 unique consolidated allergens and 469 unique reactions were identified. Of the 3 dynamic reaction picklists developed, the 1 with the optimal ranking achieved recalls of 0.632, 0.763, and 0.822 at the top 5, 10, and 15, respectively, significantly outperforming the static reaction picklist ranked by reaction frequency. Conclusion The dynamic reaction picklist developed using EHR data and a statistical measure was superior to the static picklist and suggested proper reactions for allergy documentation. Further studies might evaluate the usability and impact on allergy documentation in the EHR.
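The evaluation metric, recall at top k, can be sketched as below. The exact definition used in the study is not given in the abstract; this version assumes recall is the fraction of validation entries whose recorded reaction appears among the first k suggestions for that allergen. The picklists and entries are hypothetical.

```python
def recall_at_k(picklists, entries, k):
    """Fraction of (allergen, reaction) entries whose reaction appears in the
    top-k suggestions of that allergen's ranked picklist.

    picklists: dict mapping allergen -> list of reactions, best-ranked first.
    entries:   list of (allergen, recorded_reaction) pairs.
    """
    if not entries:
        raise ValueError("entries must be non-empty")
    hits = 0
    for allergen, reaction in entries:
        top_k = picklists.get(allergen, [])[:k]
        if reaction in top_k:
            hits += 1
    return hits / len(entries)


# Hypothetical dynamic picklists, ranked per allergen (not from the study).
picklists = {
    "penicillin": ["rash", "hives", "anaphylaxis", "itching", "nausea"],
    "codeine": ["nausea", "vomiting", "itching", "rash", "constipation"],
}
# Hypothetical validation entries.
entries = [
    ("penicillin", "hives"),
    ("penicillin", "nausea"),
    ("codeine", "vomiting"),
    ("codeine", "rash"),
]

for k in (2, 4, 5):
    print(k, recall_at_k(picklists, entries, k))
```

A static picklist corresponds to using one ranking for every allergen, so the same function can compare both designs on the same validation entries.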


Text Mining of the Electronic Health Record: An Information Extraction Approach for Automated Identification and Subphenotyping of HFpEF Patients for Clinical Trials

Journal of Cardiovascular Translational Research ◽

10.1007/s12265-017-9752-2 ◽

2017 ◽

Vol 10 (3) ◽

pp. 313-321 ◽

Cited By ~ 17

Author(s):

Siddhartha R. Jonnalagadda ◽

Abhishek K. Adupa ◽

Ravi P. Garg ◽

Jessica Corona-Cox ◽

Sanjiv J. Shah

Keyword(s):

Clinical Trials ◽

Text Mining ◽

Electronic Health Record ◽

Information Extraction ◽

Health Record ◽

Automated Identification ◽

Electronic Health


Performance of a Natural Language Processing Method to Extract Stone Composition From the Electronic Health Record

Urology ◽

10.1016/j.urology.2019.07.007 ◽

2019 ◽

Vol 132 ◽

pp. 56-62 ◽

Cited By ~ 1

Author(s):

Cosmin A. Bejan ◽

Daniel J. Lee ◽

Yaomin Xu ◽

Ryan S. Hsi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Electronic Health Record ◽

Language Processing ◽

Processing Method ◽

Health Record ◽

Stone Composition ◽

Electronic Health


The role of electronic records in reporting adverse drug reactions.

Journal of Clinical Oncology ◽

10.1200/jco.2012.30.34_suppl.309 ◽

2012 ◽

Vol 30 (34_suppl) ◽

pp. 309-309

Author(s):

Alanna M. Poirier ◽

Paul Nachowicz ◽

Subhasis Misra

Keyword(s):

Adverse Drug Reaction ◽

Drug Reaction ◽

Electronic Health Record ◽

Adverse Drug Reactions ◽

Improve Patient Safety ◽

Cancer Center ◽

Health Record ◽

Drug Reactions ◽

History Of ◽

Electronic Health

309 Background: The Pharmacy and Therapeutics committee at a regional cancer center is responsible for reporting and trending existing adverse drug reactions. The electronic health record did not have an option to document the history of an event or have an alert function if a medication was re-ordered. The frequency of documented adverse drug reactions did not correlate to what was being observed on the units with the use of a paper document. Methods: In August 2010, a Lean Six Sigma project was initiated to improve adverse drug reaction reporting. An adverse drug reaction document along with standard work instructions was completed by March 2011. A report was built in the electronic health record, and a computer-based learning module was created and rolled out to clinical staff by October 2011. Results: The turn-around time to document an adverse drug reaction in the patient's chart decreased from 6.8 days to 0.7 days. The documented adverse drug reactions increased by 37%, as verified by the use of supportive medications. Conclusions: The root cause for under-reporting was attributed to lack of knowledge, process, and automation. The history of an adverse drug reaction can now be viewed, and an automatic alert is produced requiring physician acknowledgement, decreasing the chance of repeated discomfort or harm to the patient. Adverse drug reaction documentation can be retrieved within 24 hours, analyzed, trended, and used for educational purposes to improve patient safety. [Table: see text]
