Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-automated Simulation Based on the LeoPARDS Trial

2019
Author(s): Hegler Tissot, Anoop Shah, Ruth Agbakoba, Amos Folarin, Luis Romao, ...

Abstract Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is a resource-intensive task when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records has been explored as a way of identifying trial participants, but much of the information is in unstructured free text rather than a computable form. We developed an electronic health record pipeline that combines structured electronic health record data with free text in order to simulate recruitment into the LeoPARDS trial. We applied an algorithm to identify eligible patients using a moving 1-hour time window, and compared the set of patients identified by our approach with those actually screened and recruited for the trial. We manually reviewed clinical records for a random sample of additional patients identified by the algorithm but not identified for screening in the original trial. Our approach identified 308 patients, of whom 208 were screened in the actual trial. We identified all 40 patients with CCHIC data available who were actually recruited to LeoPARDS in our centre. The algorithm identified 96 patients on the same day as manual screening and 62 patients one or two days earlier. Analysis of electronic health records incorporating natural language processing tools could effectively replicate recruitment in a critical care trial, and identify some eligible patients at an earlier stage. If implemented in real time, this could improve the efficiency of clinical trial recruitment.
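The moving 1-hour eligibility window described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the LeoPARDS criteria are far richer, and the two required concepts here (`"sepsis"` and `"vasopressor"`) are hypothetical stand-ins for the real inclusion rules.

```python
from datetime import datetime, timedelta

def first_eligible_window(events, window=timedelta(hours=1)):
    """events: list of (timestamp, concept) tuples sorted by time.
    Returns the start of the first 1-hour window containing both
    required concepts, or None if no window qualifies."""
    required = {"vasopressor", "sepsis"}  # hypothetical stand-in criteria
    for start, _ in events:
        concepts = {c for t, c in events if start <= t < start + window}
        if required <= concepts:
            return start
    return None

events = [
    (datetime(2017, 1, 1, 10, 0), "sepsis"),
    (datetime(2017, 1, 1, 10, 30), "vasopressor"),
    (datetime(2017, 1, 1, 14, 0), "sepsis"),
]
print(first_eligible_window(events))  # 2017-01-01 10:00:00
```

In a real-time deployment, the same check would run against the stream of structured observations and NLP-extracted concepts as they arrive.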

2021, Vol 39 (28_suppl), pp. 324-324
Author(s): Isaac S. Chua, Elise Tarbi, Jocelyn H. Siegel, Kate Sciacca, Anne Kwok, ...

Background: Delivering goal-concordant care to patients with advanced cancer requires identifying eligible patients who would benefit from goals of care (GOC) conversations; training clinicians how to have these conversations; conducting conversations in a timely manner; and documenting GOC conversations so that they can be readily accessed by care teams. We used an existing, locally developed electronic cancer care clinical pathways system to guide oncologists toward these conversations. Methods: To identify eligible patients, pathways directors from 12 oncology disease centers identified therapeutic decision nodes for each pathway that corresponded to a predicted life expectancy of ≤1 year. When oncologists selected one of these pre-identified pathways nodes, the decision was captured in a relational database. For these patients, we sought evidence of GOC documentation within the electronic health record by extracting coded data from the advance care planning (ACP) module—a designated area within the electronic health record for clinicians to document GOC conversations. We also used rule-based natural language processing (NLP) to capture free-text GOC documentation within these same patients’ progress notes. A domain expert reviewed all progress notes identified by NLP to confirm the presence of GOC documentation. Results: In a pilot sample obtained between March 20 and September 25, 2020, we identified a total of 21 pathway nodes conveying a poor prognosis, which represented 91 unique patients with advanced cancer. Among these patients, the mean age was 62 (SD 13.8) years; 55 (60.4%) patients were female, and 69 (75.8%) were non-Hispanic White. The cancers most represented were thoracic (32 [35.2%]), breast (31 [34.1%]), and head and neck (13 [14.3%]). Within the 3 months leading up to the pathways decision date, a total of 62 (68.1%) patients had any GOC documentation. Twenty-one (23.1%) patients had documentation in both the ACP module and NLP-identified progress notes; 5 (5.5%) had documentation in the ACP module only; and 36 (39.6%) had documentation in progress notes only. Twenty-two unique clinicians utilized the ACP module, of whom 1 (4.5%) was an oncologist and 21 (95.5%) were palliative care clinicians. Conclusions: Approximately two thirds of patients had any GOC documentation. A total of 26 (28.6%) patients had any GOC documentation in the ACP module, and only 1 oncologist documented using the ACP module, where care teams can most easily retrieve GOC information. These findings provide an important baseline for future quality improvement efforts (e.g., implementing serious illness communications training, increasing support around ACP module utilization, and incorporating behavioral nudges) to enhance oncologists’ ability to conduct and to document timely, high-quality GOC conversations.
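A rule-based NLP pass over progress notes, as described above, can be as simple as a curated phrase list compiled into a regular expression. The phrase list below is hypothetical, not the study's actual rule set; in practice it would be refined against expert review, as the authors did.

```python
import re

# Illustrative (hypothetical) trigger phrases for goals-of-care documentation.
GOC_PATTERNS = [
    r"\bgoals of care\b",
    r"\bcode status\b",
    r"\b(?:DNR|DNI)\b",
    r"\badvance (?:care planning|directive)\b",
]
GOC_RE = re.compile("|".join(GOC_PATTERNS), re.IGNORECASE)

def has_goc_documentation(note: str) -> bool:
    """Flag a progress note that appears to contain GOC documentation."""
    return bool(GOC_RE.search(note))

print(has_goc_documentation("Discussed goals of care with patient and family."))  # True
print(has_goc_documentation("Follow-up CT scheduled."))  # False
```

Notes flagged this way would then go to a domain expert for confirmation, mirroring the study's review step.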


2019
Author(s): Daniel M. Bean, James Teo, Honghan Wu, Ricardo Oliveira, Raj Patel, ...

Abstract Atrial fibrillation (AF) is the most common arrhythmia and significantly increases stroke risk. This risk is effectively managed by oral anticoagulation. Recent studies using national registry data indicate increased use of anticoagulation resulting from changes in guidelines and the availability of newer drugs. The aim of this study is to develop and validate an open-source risk scoring pipeline for free-text electronic health record data using natural language processing. AF patients discharged from 1st January 2011 to 1st October 2017 were identified from discharge summaries (N=10,030; 64.6% male; average age 75.3 ± 12.3 years). A natural language processing pipeline was developed to identify risk factors in clinical text and calculate risk for ischaemic stroke (CHA2DS2-VASc) and bleeding (HAS-BLED). Scores were validated against two independent experts for 40 patients. Automatic risk scores were in strong agreement with the two independent experts for CHA2DS2-VASc (average kappa 0.78 vs experts, compared to 0.85 between experts). Agreement was lower for HAS-BLED (average kappa 0.54 vs experts, compared to 0.74 between experts). In high-risk patients (CHA2DS2-VASc ≥2), OAC use increased significantly over the last 7 years, driven by the availability of DOACs and the transitioning of patients from antiplatelet (AP) medication alone to OAC. Factors independently associated with OAC use included components of the CHA2DS2-VASc and HAS-BLED scores as well as discharging specialty and frailty. OAC use was highest in patients discharged under cardiology (69%). Electronic health record text can be used for automatic calculation of clinical risk scores at scale. Open-source tools are available today for this task but require further validation. Analysis of routinely collected EHR data can replicate findings from large-scale curated registries.
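Once risk factors have been extracted from text, computing CHA2DS2-VASc is a fixed published scoring rule. The sketch below implements that standard rule (it is not the authors' pipeline code); the same pattern applies to HAS-BLED with its own component list.

```python
def cha2ds2_vasc(chf: bool, hypertension: bool, age: int, diabetes: bool,
                 stroke_tia: bool, vascular: bool, female: bool) -> int:
    """Standard CHA2DS2-VASc stroke-risk score for AF patients."""
    score = 0
    score += 1 if chf else 0           # C: congestive heart failure
    score += 1 if hypertension else 0  # H: hypertension
    score += 2 if age >= 75 else (1 if age >= 65 else 0)  # A2 / A
    score += 1 if diabetes else 0      # D: diabetes mellitus
    score += 2 if stroke_tia else 0    # S2: prior stroke/TIA
    score += 1 if vascular else 0      # V: vascular disease
    score += 1 if female else 0        # Sc: sex category (female)
    return score

# 78-year-old woman with hypertension and a prior TIA: 1 + 2 + 2 + 1 = 6
print(cha2ds2_vasc(False, True, 78, False, True, False, True))  # 6
```

A score of ≥2 corresponds to the "high-risk" threshold used in the abstract's OAC-use analysis.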


2020, Vol 27 (6), pp. 917-923
Author(s): Liqin Wang, Suzanne V Blackley, Kimberly G Blumenthal, Sharmitha Yerneni, Foster R Goss, ...

Abstract Objective Incomplete and static reaction picklists in the allergy module lead to free-text and missing entries that inhibit the clinical decision support intended to prevent adverse drug reactions. We developed a novel, data-driven, “dynamic” reaction picklist to improve allergy documentation in the electronic health record (EHR). Materials and Methods We split 3 decades of allergy entries in the EHR of a large Massachusetts healthcare system into development and validation datasets. We consolidated duplicate allergens and those with the same ingredients or allergen groups. We created a reaction value set via expert review of a previously developed value set and then applied natural language processing to reconcile reactions from structured and free-text entries. Three association rule-mining measures were used to develop a comprehensive reaction picklist dynamically ranked by allergen. The dynamic picklist was assessed using recall at the top k suggested reactions, comparing performance to the static picklist. Results The modified reaction value set contained 490 reaction concepts. Among 4 234 327 allergy entries collected, 7463 unique consolidated allergens and 469 unique reactions were identified. Of the 3 dynamic reaction picklists developed, the one with the optimal ranking achieved recalls of 0.632, 0.763, and 0.822 at the top 5, 10, and 15 suggestions, respectively, significantly outperforming the static reaction picklist ranked by reaction frequency. Conclusion The dynamic reaction picklist developed using EHR data and a statistical measure was superior to the static picklist and suggested proper reactions for allergy documentation. Further studies might evaluate the usability and impact on allergy documentation in the EHR.
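The recall-at-top-k metric used to compare the picklists can be sketched as below. The reaction lists are toy values, not study data: recall@k is simply the fraction of a patient's documented reactions that appear among the first k suggestions the picklist ranks for that allergen.

```python
def recall_at_k(suggested, actual, k):
    """Fraction of documented reactions found in the top-k suggestions.

    suggested: reactions ranked by the picklist for a given allergen.
    actual:    reactions actually documented for the entry.
    """
    top_k = set(suggested[:k])
    return len(top_k & set(actual)) / len(actual)

# Toy example (invented reactions, not study data):
suggested = ["hives", "rash", "anaphylaxis", "nausea", "itching", "swelling"]
actual = ["rash", "swelling", "dyspnea"]
print(round(recall_at_k(suggested, actual, 5), 3))  # only "rash" is in the top 5 -> 0.333
```

Averaging this quantity over all validation entries yields the 0.632 / 0.763 / 0.822 figures reported at k = 5, 10, and 15.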


2015, Vol 22 (e1), pp. e141-e150
Author(s): Riccardo Miotto, Chunhua Weng

Abstract Objective To develop a cost-effective, case-based reasoning framework for clinical research eligibility screening by reusing only the electronic health records (EHRs) of a minimal set of enrolled participants to represent the target patient for each trial under consideration. Materials and Methods The EHR data—specifically diagnoses, medications, laboratory results, and clinical notes—of known clinical trial participants were aggregated to profile the “target patient” for a trial, which was used to discover new eligible patients for that trial. The EHR data of unseen patients were matched to this “target patient” to determine their relevance to the trial; the higher the relevance, the more likely the patient was eligible. Relevance scores were a weighted linear combination of cosine similarities computed over individual EHR data types. For evaluation, we identified 262 participants of 13 diversified clinical trials conducted at Columbia University as our gold standard. We ran a 2-fold cross-validation with half of the participants used for training and the other half used for testing, along with another 30 000 patients selected at random from our clinical database. We performed binary classification and ranking experiments. Results The overall area under the ROC curve for classification was 0.95, enabling eligible patients to be highlighted with good precision. Ranking showed satisfactory results, especially at the top of the recommended list, with each trial having at least one eligible patient in the top five positions. Conclusions This relevance-based method can potentially be used to identify eligible patients for clinical trials by processing patient EHR data alone without parsing free-text eligibility criteria, and shows promise of efficient “case-based reasoning” modeled only on minimal trial participants.
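The relevance score described above—a weighted linear combination of per-data-type cosine similarities—can be sketched as follows. The weights and feature vectors are illustrative assumptions, not the study's learned values.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def relevance(target, patient, weights):
    """Weighted sum of cosine similarities over EHR data types.

    target/patient: dict mapping data type -> feature vector.
    weights:        dict mapping data type -> weight (illustrative values).
    """
    return sum(w * cosine(target[t], patient[t]) for t, w in weights.items())

weights = {"diagnoses": 0.4, "medications": 0.3, "labs": 0.2, "notes": 0.1}
target = {"diagnoses": [1, 0, 1], "medications": [1, 1], "labs": [0.5, 0.2], "notes": [1, 0]}
patient = dict(target)  # a patient identical to the trial's "target patient"
print(round(relevance(target, patient, weights), 3))  # identical profiles -> 1.0
```

Ranking unseen patients by this score and thresholding it yields the binary classification and ranking experiments the abstract reports.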


2019, Vol 26 (4), pp. 364-379
Author(s): Theresa A Koleck, Caitlin Dreisbach, Philip E Bourne, Suzanne Bakken

Abstract Objective Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives. Materials and Methods Our search of 1964 records from PubMed and EMBASE was narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study. Results Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics. Discussion NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves. Conclusion Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.


2020
Author(s): Ignacio Hernández-Medrano, Marisa Serrano, Sergio Collazo, Ana López-Ballesteros, Blai Coll, ...

BACKGROUND Research efforts to develop strategies to effectively identify patients and to reduce the burden of cardiovascular disease are essential for the future of the health system. Most research studies have used only the coded parts of electronic health records (EHRs) for case detection, missing cases and reducing study quality. Incorporating information from free text into case detection through natural language processing (NLP) techniques improves research quality. SAVANA was created as an innovative data-driven system, based on NLP and big data techniques, designed to retrieve prominent biomedical information from narrative clinical notes and to maximize the huge amount of information contained in Spanish EHRs. OBJECTIVE The aim of this work is to assess the performance of SAVANA in identifying concepts within the cardiovascular domain in Spanish EHRs. METHODS SAVANA is a platform for the acceleration of clinical research, based on real-time dynamic exploitation of all the information contained in EHR corpora. It uses its own technology (EHRead) to analyse the unstructured information contained in EHRs and express it by means of the medical concepts that carry the most significant information in the text. RESULTS The evaluation corpus consisted of a stratified random sample of patients from 3 Spanish sites. For site 01, the corpus contained a total of 280 mentions of cardiovascular clinical entities, of which 249 were correctly identified, obtaining a P=0.93. At site 02, SAVANA correctly detected 53 mentions of cardiovascular entities among 57 annotations, achieving a P=0.98; and at site 03, among 165 manual annotations, 75 were correctly identified, yielding a P=0.99. CONCLUSIONS This research demonstrates the ability of SAVANA to identify mentions of the atherosclerotic/cardiovascular clinical phenotype in Spanish EHRs, as well as to retrieve patients and records related to this pathology.


2016, Vol 23 (4), pp. 291-303
Author(s): Kostas Pantazos, Soren Lauesen, Soren Lippert

A health record database contains structured data fields that identify the patient, such as patient ID, patient name, e-mail and phone number. These data are fairly easy to de-identify, that is, replace with other identifiers. However, these data also occur in fields with doctors’ free-text notes written in an abbreviated style that cannot be analyzed grammatically. If we replace a word that looks like a name, but isn’t, we degrade readability and medical correctness. If we fail to replace it when we should, we degrade confidentiality. We de-identified an existing Danish electronic health record database, ending up with 323,122 patient health records. We had to invent many methods for de-identifying potential identifiers in the free-text notes. The de-identified health records should be used with caution for statistical purposes because we removed health records that were so special that they couldn’t be de-identified. Furthermore, we distorted geography by replacing zip codes with random zip codes.
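The core trade-off described above—replacing words that look like names without degrading readability—can be illustrated with a toy pseudonymization pass. This is a minimal sketch under invented assumptions (the name list, surrogates, and note text are all made up); the actual study required many additional heuristics for the abbreviated free-text style.

```python
import re

def deidentify(text, known_names, surrogates):
    """Replace capitalized tokens found in a known-names list with stable
    surrogates, so the same name maps to the same replacement throughout."""
    mapping = {}
    def repl(m):
        word = m.group(0)
        if word in known_names:
            if word not in mapping:
                mapping[word] = surrogates[len(mapping) % len(surrogates)]
            return mapping[word]
        return word  # not a known name: leave intact, preserving readability
    # Candidate tokens: capitalized words (incl. Danish letters).
    return re.sub(r"[A-ZÆØÅ][a-zæøå]+", repl, text)

note = "Hansen seen today; spoke with Jensen about discharge."
print(deidentify(note, {"Hansen", "Jensen"}, ["NameA", "NameB"]))
# -> "NameA seen today; spoke with NameB about discharge."
```

A token that merely looks like a name but is absent from the list passes through unchanged—exactly the false-negative risk to confidentiality the abstract describes, while over-matching would cause the readability loss it warns about.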

