Using natural language processing to optimize case ascertainment of acute otitis media in a large, state-wide pediatric practice network

2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S690-S691
Author(s):  
Joshua C Herigon ◽  
Amir Kimia ◽  
Marvin Harper

Abstract Background Antibiotics are the most commonly prescribed drugs for children and are frequently prescribed inappropriately. Outpatient antimicrobial stewardship interventions aim to reduce inappropriate antibiotic use. Previous work has relied on diagnosis coding for case identification, which may be inaccurate. In this study, we sought to develop automated methods for analyzing note text to identify cases of acute otitis media (AOM) based on clinical documentation. Methods We conducted a cross-sectional retrospective chart review and sampled encounters from 7/1/2018 – 6/30/2019 for patients < 5 years old presenting for a problem-focused visit. Complete note text and limited structured data were extracted for 12 randomly selected weekdays (one from each month during the study period). An additional weekday was randomly selected for validation. The primary outcome was correctly identifying encounters where AOM was present. Human review was considered the “gold standard” and was compared to ICD codes, a natural language processing (NLP) model, and a recursive partitioning (RP) model. Results A total of 2,724 encounters were included in the training cohort and 793 in the validation cohort. ICD codes and NLP had good performance overall, with sensitivities of 91.2% and 93.1%, respectively, in the training cohort. However, NLP had a significant drop-off in performance in the validation cohort (sensitivity: 83.4%). The RP model had the highest sensitivity (97.2% training cohort; 94.1% validation cohort) of the three methods.
Figure 1. Details of encounters included in the training and validation cohorts.
Table 1. Performance of ICD coding, a natural language processing (NLP) model, and a recursive partitioning (RP) model for identifying cases of acute otitis media (AOM)
Conclusion Natural language processing of outpatient pediatric visit documentation can be used to build models that accurately identify cases of AOM from clinical documentation.
Combining NLP and structured data can improve automated case detection, leading to more accurate assessment of antibiotic prescribing practices. These techniques may be valuable in optimizing outpatient antimicrobial stewardship efforts. Disclosures All Authors: No reported disclosures
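The case-detection task above amounts to flagging encounters from note text and scoring the flags against gold-standard chart review. As a minimal illustration (not the authors' actual model), the sketch below flags notes with a hypothetical AOM term list and computes sensitivity; all notes, terms, and labels are invented.

```python
# Hypothetical sketch: flag notes mentioning AOM, then score sensitivity
# against gold-standard chart review. Substring matching is naive (it
# would also hit e.g. "naomi"); shown for illustration only.
AOM_TERMS = ("acute otitis media", "aom")

def flags_aom(note_text: str) -> bool:
    """True if the note mentions any AOM term (case-insensitive)."""
    text = note_text.lower()
    return any(term in text for term in AOM_TERMS)

def sensitivity(predictions, gold):
    """True-positive rate: flagged cases among gold-standard positives."""
    true_pos = sum(1 for p, g in zip(predictions, gold) if p and g)
    return true_pos / sum(gold)

notes = [
    "Assessment: acute otitis media, right ear. Start amoxicillin.",
    "Well child check, no acute complaints.",
    "TM erythematous and bulging; dx AOM.",
]
gold = [True, False, True]
print(sensitivity([flags_aom(n) for n in notes], gold))  # 1.0 on this toy sample
```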

2021 ◽  
Vol 2 ◽  
Author(s):  
Denis Newman-Griffis ◽  
Jonathan Camacho Maldonado ◽  
Pei-Shu Ho ◽  
Maryanne Sacco ◽  
Rafael Jimenez Silva ◽  
...  

Background: Invaluable information on patient functioning and the complex interactions that define it is recorded in free text portions of the Electronic Health Record (EHR). Leveraging this information to improve clinical decision-making and conduct research requires natural language processing (NLP) technologies to identify and organize the information recorded in clinical documentation. Methods: We used natural language processing methods to analyze information about patient functioning recorded in two collections of clinical documents pertaining to claims for federal disability benefits from the U.S. Social Security Administration (SSA). We grounded our analysis in the International Classification of Functioning, Disability, and Health (ICF), and used the Activities and Participation domain of the ICF to classify information about functioning in three key areas: mobility, self-care, and domestic life. After annotating functional status information in our datasets through expert clinical review, we trained machine learning-based NLP models to automatically assign ICF categories to mentions of functional activity. Results: We found that rich and diverse information on patient functioning was documented in the free text records. Annotation of 289 documents for Mobility information yielded 2,455 mentions of Mobility activities and 3,176 specific actions corresponding to 13 ICF-based categories. Annotation of 329 documents for Self-Care and Domestic Life information yielded 3,990 activity mentions and 4,665 specific actions corresponding to 16 ICF-based categories. NLP systems for automated ICF coding achieved over 80% macro-averaged F-measure on both datasets, indicating strong performance across all ICF categories used. Conclusions: Natural language processing can help to navigate the tradeoff between flexible and expressive clinical documentation of functioning and standardizable data for comparability and learning. 
The ICF has practical limitations for classifying functional status information in clinical documentation but presents a valuable framework for organizing the information recorded in health records about patient functioning. This study advances the development of robust, ICF-based NLP technologies to analyze information on patient functioning and has significant implications for NLP-powered analysis of functional status information in disability benefits management, clinical care, and research.
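The evaluation metric reported above, macro-averaged F-measure, averages per-category F1 with equal weight, so rare ICF categories count as much as common ones. A stdlib sketch with invented per-category counts (the ICF code IDs are real examples from the Activities and Participation domain, but the counts are not the study's):

```python
# Macro-averaged F-measure: per-category F1, averaged with equal weight.
def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Invented (tp, fp, fn) counts per ICF category, e.g. d450 Walking,
# d510 Washing oneself, d640 Doing housework.
per_category = {"d450": (90, 10, 10), "d510": (40, 5, 15), "d640": (8, 2, 2)}
macro_f1 = sum(f1(*c) for c in per_category.values()) / len(per_category)
print(round(macro_f1, 3))  # 0.833
```

Because each category contributes equally, a model that only performs well on frequent categories cannot reach a high macro average; this is why the >80% figure indicates performance across all categories.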



2019 ◽  
Author(s):  
Jason Ken Hou ◽  
Christopher C. Taylor ◽  
Ergin Soysal ◽  
Shubhada Sansgiry ◽  
Peter Richardson ◽  
...  

Abstract Background: Although practice guidelines recommend colorectal cancer surveillance for inflammatory bowel disease (IBD) patients, the natural history of patients with dysplasia is poorly described. Assembling large cohorts of IBD patients with dysplasia is difficult, as administrative codes are lacking. The aim of this study was to use natural language processing (NLP) in a large electronic health record (EHR) system to identify IBD patients with colonic dysplasia. Methods: We conducted a retrospective cohort study using administrative data from the national Veterans Health Administration (VHA) Corporate Data Warehouse for patients with IBD. Full-text histopathology reports from patients who underwent colonoscopy in the VHA were obtained, and a validation cohort was created using a random sample of 2,000 reports. An NLP algorithm to identify the presence and grade of dysplasia was developed and its performance tested in the validation cohort. The final NLP algorithm was applied to the entire IBD cohort to identify all cases of colonic dysplasia. Results: We identified a total of 44,099 Veterans with IBD, with 22,431 colonoscopy-related histopathology reports. NLP had an accuracy of 97.1% for detection of low-grade dysplasia, with a precision of 87%, recall of 96.6%, and F-measure of 91.5%. When applied to the entire cohort, a total of 1,762 cases of colonic dysplasia were identified. Conclusions: NLP accurately identifies colonic low-grade dysplasia in IBD patients from a national EHR. NLP can be used to identify large cohorts of IBD patients with dysplasia to further study the natural history and outcomes of colonic dysplasia in patients with IBD.
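The abstract does not publish the algorithm's rules, so the following is a hypothetical regex-based sketch in the same spirit: detect dysplasia mentions in histopathology text, assign a grade, and skip simple negations. All patterns and report snippets are invented for illustration.

```python
import re

# Hypothetical rule-based sketch (not the study's actual algorithm):
# grade dysplasia mentions in histopathology report text, skipping
# simple negations such as "negative for dysplasia".
NEGATION = re.compile(r"\b(?:no|negative for|without)\s+(?:\w+\s+){0,2}dysplasia", re.I)
HIGH = re.compile(r"\bhigh[- ]grade dysplasia\b", re.I)
LOW = re.compile(r"\blow[- ]grade dysplasia\b", re.I)

def grade_dysplasia(report: str) -> str:
    if NEGATION.search(report):
        return "none"
    if HIGH.search(report):
        return "high"
    if LOW.search(report):
        return "low"
    return "indefinite" if "dysplasia" in report.lower() else "none"

print(grade_dysplasia("Colon biopsy: low-grade dysplasia in a tubular adenoma."))  # low
print(grade_dysplasia("Chronic colitis, negative for dysplasia."))                 # none
```

Real negation handling in clinical NLP is considerably more involved (scoped negation, uncertainty, historical mentions); this sketch only shows the overall shape of a rule-based grader.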


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Avery Chadd ◽  
Rebecca Silvola ◽  
Yana Vorontsova ◽  
Andrea Broyles ◽  
Jonathan Cummins ◽  
...  

Background/Objective: Real-world data, including electronic health records (EHRs), has shown tremendous utility in research relating to opioid use disorder (OUD). Traditional analysis of EHR data relies on explicit diagnostic codes and results in incomplete capture of cases and therefore underrepresentation of OUD rates. Machine learning can rectify this by surveying free clinical notes in addition to structured codes. This study aimed to address disparities between true OUD rates and cases identified using traditional ICD codes by developing a natural language processing (NLP) machine for identifying affected patients from EHRs. Methods: Patients (≥12 years old) who had received an opioid prescription from IU Health or Eskenazi Health between 1/1/2009 and 12/31/2015 were identified by the Regenstrief Institute. Exclusion criteria included any cancer, sickle cell anemia, or palliative care diagnoses. Cases of OUD were identified through ICD codes and NLP. The NLP machine was developed using a dictionary of key OUD terms and a training corpus of 300 patient notes. A testing corpus of 148 patient notes was constructed and validated by manual review. The NLP machine and ICD 9/10 codes were independently tested against this corpus. Results: Although ICD codes identified OUD cases with high specificity (98.08%), this method demonstrated moderate sensitivity (53.13%), accuracy (68.92%), and F1 score (68.92%). Testing using the NLP method demonstrated increased sensitivity (93.75%), increased accuracy (89.19%), and increased F1 score (91.84%); specificity mildly decreased (80.77%). Conclusion: Our revised NLP machine was more effective at capturing OUD cases in EHRs than traditional identification using ICD codes. This illustrates NLP’s enhanced capability of identifying OUD cases compared to structured data. Potential Impacts: These findings establish a role for NLP in OUD research involving large datasets. 
Ultimately, this is intended to improve identification of risk factors for OUD, which is of significant clinical importance during a public health crisis. 
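The four metrics compared above (sensitivity, specificity, accuracy, F1) all derive from a single confusion matrix. A stdlib sketch with invented counts that happen to sum to a 148-note corpus like the study's:

```python
# Sensitivity, specificity, accuracy, and F1 from one confusion matrix.
# Counts are invented; they are not the study's results.
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    sens = tp / (tp + fn)                  # recall / true-positive rate
    spec = tn / (tn + fp)                  # true-negative rate
    acc = (tp + tn) / (tp + fp + tn + fn)
    prec = tp / (tp + fp)
    return {"sensitivity": sens, "specificity": spec,
            "accuracy": acc, "f1": 2 * prec * sens / (prec + sens)}

m = metrics(tp=45, fp=5, tn=80, fn=18)
print({k: round(v, 3) for k, v in m.items()})
```

The tradeoff in the abstract (NLP gains sensitivity but loses some specificity versus ICD codes) corresponds to moving true cases from `fn` to `tp` at the cost of moving some non-cases from `tn` to `fp`.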


2020 ◽  
Author(s):  
David Chang ◽  
Eric Lin ◽  
Cynthia Brandt ◽  
Richard Andrew Taylor

BACKGROUND While electronic health record systems have facilitated clinical documentation in healthcare, they also introduce new challenges, such as the proliferation of redundant information through copy-and-paste commands or templates. One approach to trim down bloated clinical documentation and improve clinical summarization is to identify highly similar text snippets with the goal of removing such text. OBJECTIVE We develop a natural language processing system for the task of clinical semantic textual similarity that assigns scores to pairs of clinical text snippets based on their clinical semantic similarity. METHODS We leverage recent advances in natural language processing and graph representation learning to create a model that combines linguistic and domain knowledge information from the MedSTS dataset to assess clinical semantic textual similarity. We use Bidirectional Encoder Representations from Transformers (BERT)-based models as text encoders for the sentence pairs in the dataset and graph convolutional networks (GCNs) as graph encoders for corresponding concept graphs constructed based on the sentences. We also explore techniques including data augmentation, ensembling, and knowledge distillation to improve the performance as measured by Pearson correlation. RESULTS Fine-tuning BERT-base and ClinicalBERT on the MedSTS dataset provided a strong baseline (0.842 and 0.848 Pearson correlation, respectively) compared to the previous year’s submissions. Our data augmentation techniques yielded moderate gains in performance, and adding a GCN-based graph encoder to incorporate the concept graphs also boosted performance, especially when the node features were initialized with pretrained knowledge graph embeddings of the concepts (0.868). 
As expected, ensembling improved performance, and multi-source ensembling using different language model variants, conducting knowledge distillation on the multi-source ensemble model, and taking a final ensemble of the distilled models further improved the system’s performance (0.875, 0.878, and 0.882, respectively). CONCLUSIONS We develop a system for the MedSTS clinical semantic textual similarity benchmark task by combining BERT-based text encoders and GCN-based graph encoders in order to incorporate domain knowledge into the natural language processing pipeline. We also experiment with other techniques involving data augmentation, pretrained concept embeddings, ensembling, and knowledge distillation to further increase our performance.
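The benchmark metric quoted throughout this abstract is Pearson correlation between system similarity scores and gold annotations. A minimal stdlib sketch with invented score pairs on the 0-5 similarity scale MedSTS uses:

```python
import math

# Pearson correlation between system similarity scores and gold
# annotations. The score pairs below are invented for illustration.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gold = [0.0, 1.5, 2.0, 3.5, 5.0]   # annotator similarity on a 0-5 scale
pred = [0.4, 1.2, 2.5, 3.0, 4.8]   # hypothetical system outputs
print(round(pearson(gold, pred), 3))  # 0.977
```

Because Pearson correlation is invariant to linear rescaling of the scores, a system is rewarded for ranking and spacing pairs consistently with the annotators rather than for matching their absolute values.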


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution toward closing this gap, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and generally correspond to the source-language input.
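BLEU, the automatic metric cited above, rewards n-gram overlap between a candidate translation and a reference. The sketch below shows only the clipped unigram-precision component (full BLEU combines higher-order n-gram precisions with a brevity penalty); the English sentence pair is invented.

```python
from collections import Counter

# Clipped unigram precision, the first-order component of BLEU: each
# candidate word's count is clipped by its count in the reference, so
# repeating a reference word cannot inflate the score.
def clipped_unigram_precision(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    clipped = sum(min(n, ref[w]) for w, n in cand.items())
    return clipped / sum(cand.values())

print(clipped_unigram_precision(
    "the children walked to the market",
    "the children went to the market",
))  # 0.8333... (5 of 6 candidate words matched)
```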

