Automating Clinical Chart Review: An Open-Source Natural Language Processing Pipeline Developed on Free-Text Radiology Reports From Patients With Glioblastoma

2020 ◽  
pp. 25-34
Author(s):  
Joeky T. Senders ◽  
Logan D. Cho ◽  
Paola Calvachi ◽  
John J. McNulty ◽  
Joanna L. Ashby ◽  
...  

PURPOSE The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from clinical reports. We also aimed to provide insight into why certain variables or reports are more suitable for clinical text mining than others.
MATERIALS AND METHODS Various NLP models were developed to extract 15 radiologic characteristics from free-text radiology reports for patients with glioblastoma. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. We examined how model performance was associated with quantitative attributes of the radiologic characteristics and reports.
RESULTS In total, 562 unique brain magnetic resonance imaging reports were retrieved. NLP extracted 15 radiologic characteristics with high to excellent discrimination (area under the curve [AUC], 0.82 to 0.98) and accuracy (78.6% to 96.6%). Model performance was correlated with the inter-rater agreement of the manually provided labels (ρ = 0.904; P < .001) but not with the frequency distribution of the variables of interest (ρ = 0.179; P = .52). All variables labeled with near-perfect inter-rater agreement were classified with excellent performance (AUC > 0.95). Excellent performance could be achieved for variables with only 50 to 100 observations in the minority group and class imbalances of up to a 9:1 ratio. Report-level classification accuracy was not associated with the number of words or the vocabulary size in the distinct text documents.
CONCLUSION This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered absolute contraindications for text mining in clinical research. However, future studies should report measures of inter-rater agreement whenever ground truth is based on a consensus label and use this measure to identify clinical variables eligible for text mining.
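To make the evaluation setup concrete, the following is a minimal sketch of how 10-fold cross-validation can serve double duty for hyperparameter tuning and performance estimation on report text. The feature extractor and classifier (TF-IDF plus logistic regression) and the placeholder corpus are illustrative assumptions, not the components of the pipeline described above.

```python
# Hedged sketch: 10-fold cross-validation of a bag-of-words text classifier,
# as one might set up for extracting a binary radiologic characteristic from
# free-text reports. Model and features are illustrative assumptions.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Placeholder corpus: replace with the study's 562 MRI reports and labels.
reports = [f"mass with surrounding edema case {i}" for i in range(20)] + \
          [f"no edema identified case {i}" for i in range(20)]
labels = [1] * 20 + [0] * 20   # e.g., presence of edema

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 10-fold CV tunes hyperparameters and estimates discrimination (AUC),
# mirroring the study's dual use of cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(reports, labels)
print(search.best_params_, search.best_score_)
```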

2019 ◽  
Author(s):  
Daniel M. Bean ◽  
James Teo ◽  
Honghan Wu ◽  
Ricardo Oliveira ◽  
Raj Patel ◽  
...  

Abstract Atrial fibrillation (AF) is the most common arrhythmia and significantly increases stroke risk. This risk is effectively managed by oral anticoagulation. Recent studies using national registry data indicate increased use of anticoagulation resulting from changes in guidelines and the availability of newer drugs.
The aim of this study is to develop and validate an open-source risk scoring pipeline for free-text electronic health record data using natural language processing.
AF patients discharged from 1st January 2011 to 1st October 2017 were identified from discharge summaries (N=10,030, 64.6% male, average age 75.3 ± 12.3 years). A natural language processing pipeline was developed to identify risk factors in clinical text and calculate risk for ischaemic stroke (CHA2DS2-VASc) and bleeding (HAS-BLED). Scores were validated vs two independent experts for 40 patients.
Automatic risk scores were in strong agreement with the two independent experts for CHA2DS2-VASc (average kappa 0.78 vs experts, compared to 0.85 between experts). Agreement was lower for HAS-BLED (average kappa 0.54 vs experts, compared to 0.74 between experts).
In high-risk patients (CHA2DS2-VASc ≥2), OAC use has increased significantly over the last 7 years, driven by the availability of DOACs and the transitioning of patients from antiplatelet (AP) medication alone to OAC. Factors independently associated with OAC use included components of the CHA2DS2-VASc and HAS-BLED scores as well as discharging specialty and frailty. OAC use was highest in patients discharged under cardiology (69%).
Electronic health record text can be used for automatic calculation of clinical risk scores at scale. Open-source tools are available today for this task but require further validation. Analysis of routinely-collected EHR data can replicate findings from large-scale curated registries.
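Once the NLP step has flagged each risk factor, the scoring itself is deterministic arithmetic. Below is a minimal sketch of the CHA2DS2-VASc calculation, assuming the upstream extraction returns a dictionary of boolean risk factors; the extraction itself, which is the hard part, is not shown.

```python
# Sketch of the downstream scoring step. Risk-factor extraction from the
# discharge summary is assumed to have happened upstream and is represented
# here by a dict of booleans; the scoring weights are the standard
# CHA2DS2-VASc rules.
def cha2ds2_vasc(age: int, female: bool, factors: dict) -> int:
    score = 0
    score += 1 if factors.get("congestive_heart_failure") else 0
    score += 1 if factors.get("hypertension") else 0
    score += 2 if age >= 75 else (1 if age >= 65 else 0)  # A2 / A components
    score += 1 if factors.get("diabetes") else 0
    score += 2 if factors.get("stroke_or_tia") else 0     # S2 component
    score += 1 if factors.get("vascular_disease") else 0
    score += 1 if female else 0                           # Sc component
    return score

# Example: a 76-year-old woman with hypertension and diabetes scores 5,
# well above the >=2 threshold the study uses to define high risk.
print(cha2ds2_vasc(76, True, {"hypertension": True, "diabetes": True}))
```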


2019 ◽  
Vol 10 (04) ◽  
pp. 655-669
Author(s):  
Gaurav Trivedi ◽  
Esmaeel R. Dadashzadeh ◽  
Robert M. Handzel ◽  
Wendy W. Chapman ◽  
Shyam Visweswaran ◽  
...  

Abstract Background Despite advances in natural language processing (NLP), extracting information from clinical text is expensive. Interactive tools that are capable of easing the construction, review, and revision of NLP models can reduce this cost and improve the utility of clinical reports for clinical and secondary use.
Objectives We present the design and implementation of an interactive NLP tool for identifying incidental findings in radiology reports, along with a user study evaluating the performance and usability of the tool.
Methods Expert reviewers provided gold standard annotations for 130 patient encounters (694 reports) at sentence, section, and report levels. We performed a user study with 15 physicians to evaluate the accuracy and usability of our tool. Participants reviewed encounters split into intervention (with predictions) and control (no predictions) conditions. We measured changes in model performance, the time spent, and the number of user actions needed. The System Usability Scale (SUS) and an open-ended questionnaire were used to assess usability.
Results Starting from bootstrapped models trained on 6 patient encounters, we observed an average increase in F1 score from 0.31 to 0.75 for reports, from 0.32 to 0.68 for sections, and from 0.22 to 0.60 for sentences on a held-out test data set over an hour-long study session. We found that the tool helped significantly reduce the time spent reviewing encounters (134.30 vs. 148.44 seconds in the intervention and control conditions, respectively) while maintaining the overall quality of labels as measured against the gold standard. The tool was well received by the study participants, with a very good overall SUS score of 78.67.
Conclusion The user study demonstrated successful use of the tool by physicians for identifying incidental findings. These results support the viability of adopting interactive NLP tools in clinical care settings for a wider range of clinical applications.
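For context on the usability result, the SUS score quoted above is derived mechanically from ten 1-to-5 Likert responses; a minimal sketch of the standard SUS scoring rule follows, with a hypothetical participant's responses.

```python
# Standard SUS scoring: odd-numbered (positively worded) items contribute
# (rating - 1), even-numbered (negatively worded) items contribute
# (5 - rating); the sum is scaled by 2.5 to give a 0-100 score.
def sus_score(responses):
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 5, 1]))  # hypothetical participant: 87.5
```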


2015 ◽  
Vol 22 (5) ◽  
pp. 962-966 ◽  
Author(s):  
Erel Joffe ◽  
Emily J Pettigrew ◽  
Jorge R Herskovic ◽  
Charles F Bearden ◽  
Elmer V Bernstam

Abstract Introduction Automatically identifying specific phenotypes in free-text clinical notes is critically important for the reuse of clinical data. In this study, the authors combine expert-guided feature (text) selection with one-class classification for text processing.
Objectives To compare the performance of one-class classification to traditional binary classification; to evaluate the utility of feature selection based on expert-selected salient text (snippets); and to determine the robustness of these models with respect to irrelevant surrounding text.
Methods The authors trained one-class support vector machines (1C-SVMs) and two-class SVMs (2C-SVMs) to identify notes discussing breast cancer. Manually annotated visit summary notes (88 positive and 88 negative for breast cancer) were used to compare the performance of models trained on whole notes labeled as positive or negative to models trained on expert-selected text sections (snippets) relevant to breast cancer status. Model performance was evaluated using a 70:30 split for 20 iterations and on a realistic dataset of 10 000 records with a breast cancer prevalence of 1.4%.
Results When tested on a balanced experimental dataset, 1C-SVMs trained on snippets had results comparable to 2C-SVMs trained on whole notes (F = 0.92 for both approaches). When evaluated on a realistic imbalanced dataset, 1C-SVMs had considerably superior performance (F = 0.61 vs. F = 0.17 for the best performing model), attributable mainly to improved precision (precision = 0.88 vs. 0.09 for the best performing model).
Conclusions 1C-SVMs trained on expert-selected relevant text sections perform better than 2C-SVM classifiers trained on either snippets or whole notes when applied to realistically imbalanced data with a low prevalence of the positive class.
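A minimal sketch of the one-class setup described above, assuming scikit-learn's OneClassSVM over TF-IDF features; the snippets and parameters are illustrative placeholders, not the study's data or configuration.

```python
# One-class classification: train only on expert-selected snippets from
# positive (breast cancer) notes, then flag new notes the model accepts
# as in-class. Feature choices here are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

positive_snippets = [
    "history of left breast cancer status post lumpectomy",
    "invasive ductal carcinoma of the breast",
    "breast cancer followed by oncology",
]  # placeholders for the expert-selected text sections

vectorizer = TfidfVectorizer().fit(positive_snippets)
X_pos = vectorizer.transform(positive_snippets)

# nu bounds the fraction of training points treated as outliers;
# its value here is a tunable assumption.
model = OneClassSVM(kernel="linear", nu=0.1).fit(X_pos)

new_notes = ["patient seen for diabetes follow-up",
             "recurrent breast cancer discussed with patient"]
print(model.predict(vectorizer.transform(new_notes)))  # +1 in-class, -1 outlier
```

The design appeal is that no negative examples are needed at training time, which is what makes the approach robust to the extreme class imbalance of realistic note collections.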


10.2196/24381 ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. e24381
Author(s):  
Amy Y X Yu ◽  
Zhongyu A Liu ◽  
Chloe Pou-Prom ◽  
Kaitlyn Lopes ◽  
Moira K Kapral ◽  
...  

Background Diagnostic neurovascular imaging data are important in stroke research, but obtaining these data typically requires laborious manual chart reviews.
Objective We aimed to determine the accuracy of a natural language processing (NLP) approach to extract information on the presence and location of vascular occlusions as well as other stroke-related attributes based on free-text reports.
Methods From the full reports of 1320 consecutive computed tomography (CT), CT angiography, and CT perfusion scans of the head and neck performed at a tertiary stroke center between October 2017 and January 2019, we manually extracted data on the presence of proximal large vessel occlusion (primary outcome), as well as distal vessel occlusion, ischemia, hemorrhage, Alberta stroke program early CT score (ASPECTS), and collateral status (secondary outcomes). Reports were randomly split into training (n=921) and validation (n=399) sets, and attributes were extracted using rule-based NLP. We reported the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the overall accuracy of the NLP approach relative to the manually extracted data.
Results The overall prevalence of large vessel occlusion was 12.2%. In the training sample, the NLP approach identified this attribute with an overall accuracy of 97.3% (95.5% sensitivity, 98.1% specificity, 84.1% PPV, and 99.4% NPV). In the validation set, the overall accuracy was 95.2% (90.0% sensitivity, 97.4% specificity, 76.3% PPV, and 98.5% NPV). The accuracy of identifying distal or basilar occlusion as well as hemorrhage was also high, but there were limitations in identifying cerebral ischemia, ASPECTS, and collateral status.
Conclusions NLP may improve the efficiency of large-scale imaging data collection for stroke surveillance and research.
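A minimal sketch of what a rule-based extractor for the primary outcome might look like: a keyword pattern for proximal large vessel occlusion combined with a crude negation check. The patterns are illustrative assumptions, not the rules used in the study.

```python
# Rule-based extraction sketch: match occlusion terms near proximal-vessel
# names, and suppress matches preceded by common negation phrases.
import re

LVO_PATTERN = re.compile(
    r"\b(occlusion|occluded)\b.{0,40}\b(M1|ICA|proximal MCA)\b"
    r"|\b(M1|ICA|proximal MCA)\b.{0,40}\b(occlusion|occluded)\b",
    re.IGNORECASE)
NEGATION = re.compile(r"\b(no|without|negative for)\b.{0,30}\bocclu",
                      re.IGNORECASE)

def has_lvo(report_text: str) -> bool:
    return bool(LVO_PATTERN.search(report_text)) and not NEGATION.search(report_text)

print(has_lvo("Abrupt occlusion of the left M1 segment."))  # True
print(has_lvo("No occlusion of the ICA or proximal MCA."))  # False
```

Real pipelines need a far richer vocabulary and negation scope handling, which is consistent with the lower performance the study reports for subtler attributes such as ischemia and collateral status.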


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 1555-1555
Author(s):  
Eric J. Clayton ◽  
Imon Banerjee ◽  
Patrick J. Ward ◽  
Maggie D Howell ◽  
Beth Lohmueller ◽  
...  

Background: Screening every patient for clinical trials is time-consuming, costly, and inefficient. Developing an automated method for identifying patients who have potential disease progression, at the point where the practice first receives their radiology reports but prior to the patient's office visit, would greatly increase the efficiency of clinical trial operations and likely result in more patients being offered trial opportunities.
Methods: Using natural language processing (NLP) methodology, we developed a text parsing algorithm to automatically extract information about potential new disease or disease progression from multi-institutional, free-text radiology reports (CT, PET, bone scan, MRI, or x-ray). We combined semantic dictionary mapping and machine learning techniques to normalize the linguistic and formatting variations in the text, training the XGBoost model particularly to achieve the high precision and accuracy required by clinical trial screening. To be comprehensive, we enhanced the model vocabulary using a multi-institutional dataset that includes reports from two academic institutions.
Results: A dataset of 732 de-identified radiology reports was curated (two MDs agreed on potential new disease/disease progression vs stable), and the model was retrained for each fold of a 5-fold cross-validation with randomly selected folds. The final model achieved consistent precision (>0.87) and accuracy (>0.87). See the table for a summary of the results by radiology report type. We are continuing work on the model to validate accuracy and precision on a new and unique set of reports.
Conclusions: NLP systems can be used to identify patients who have potentially suffered new disease or disease progression and to reduce the human effort in screening for clinical trials. Efforts are ongoing to integrate the NLP process into existing EHR reporting. New imaging reports sent via interface to the EHR will be extracted daily using a database query and provided via secure electronic transport to the NLP system. Patients with a higher likelihood of disease progression will be automatically identified, and their reports routed to the clinical trials office for clinical trial screening in parallel with physician EHR mailbox reporting. The overarching goal of the project is to increase clinical trial enrollment.
[Table omitted: 5-fold cross-validation performance of the NLP model in terms of accuracy, precision, and recall, averaged across all folds.]
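A minimal sketch of the classification step under stated assumptions: TF-IDF features feeding an XGBoost classifier evaluated with 5-fold cross-validation on precision, since trial screening favors precise alerts. The semantic dictionary mapping described above is omitted, and the corpus is a placeholder.

```python
# Hedged sketch: progression-vs-stable report classification with XGBoost.
# Features and corpus are illustrative, not the study's pipeline.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, StratifiedKFold
from xgboost import XGBClassifier

reports = ["interval increase in hepatic metastases"] * 10 + \
          ["stable postsurgical changes, no new lesion"] * 10  # placeholders
labels = np.array([1] * 10 + [0] * 10)

X = TfidfVectorizer().fit_transform(reports)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")

# 5-fold cross-validation with randomly selected folds, as in the abstract,
# scored on precision because false alerts are the costly error here.
scores = cross_val_score(model, X, labels, scoring="precision",
                         cv=StratifiedKFold(n_splits=5, shuffle=True,
                                            random_state=0))
print(scores.mean())
```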


2022 ◽  
Vol 3 (1) ◽  
pp. 1-20
Author(s):  
Amara Tariq ◽  
Marly Van Assen ◽  
Carlo N. De Cecco ◽  
Imon Banerjee

Free-form radiology reports associated with coronary computed tomography angiography (CCTA) include nuanced and complicated linguistics to report cardiovascular disease. Standardization and interpretation of such reports is crucial for the clinical use of CCTA. The Coronary Artery Disease Reporting and Data System (CAD-RADS) has been proposed to achieve such standardization by implementing strict template-based report writing and the assignment of a score between 0 and 5 indicating the severity of coronary artery lesions. Even after its introduction, free-form unstructured report writing remains popular among radiologists. In this work, we present our attempts at bridging the gap between structured and unstructured reporting by natural language processing. We present machine learning models that, while trained only on structured reports, can predict CAD-RADS scores by analyzing the free text of unstructured radiology reports. The best model achieves 98% accuracy on structured reports and 92% 1-margin accuracy (a difference of ≤1 between the predicted and actual scores) on free-form unstructured reports. Our model also performs well under very difficult circumstances, including the nuanced and widely varying terminology used for reporting cardiovascular functions and diseases, scarcity of labeled data for training, and an uneven class label distribution.
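The 1-margin accuracy metric quoted above is simple to state precisely; a minimal sketch:

```python
# 1-margin accuracy: a prediction counts as correct if it is within one
# CAD-RADS grade (0-5 scale) of the true score.
import numpy as np

def one_margin_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred) <= 1))

print(one_margin_accuracy([0, 3, 5, 2], [1, 3, 4, 0]))  # 0.75: last one is off by 2
```

This relaxed metric is a reasonable fit for an ordinal scale, where an off-by-one grade is clinically far less consequential than a larger miss.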


2020 ◽  
Author(s):  
Jacob Johnson ◽  
Grace Qiu ◽  
Christine Lamoureux ◽  
Jennifer Ngo ◽  
Lawrence Ngo

Abstract Though sophisticated algorithms have been developed for the classification of free-text radiology reports for pulmonary embolism (PE), their overall generalizability remains unvalidated given limitations in sample size and data homogeneity. We developed and validated a highly generalizable deep-learning-based NLP algorithm for this purpose with data sourced from over 2,000 hospital sites and 500 radiologists. The algorithm achieved an AUROC of 0.995 on chest angiography studies and 0.994 on non-angiography studies for the presence or absence of PE. The high accuracy achieved on this large and heterogeneous dataset allows for the possibility of application in large multi-center radiology practices as well as deployment at novel sites without significant degradation in performance.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yuta Nakamura ◽  
Shouhei Hanaoka ◽  
Yukihiro Nomura ◽  
Takahiro Nakao ◽  
Soichiro Miki ◽  
...  

Abstract Background It is essential for radiologists to communicate actionable findings to the referring clinicians reliably. Natural language processing (NLP) has been shown to help identify free-text radiology reports including actionable findings. However, the application of recent deep learning techniques to radiology reports, which can improve detection performance, has not been thoroughly examined. Moreover, free text that clinicians enter in the ordering form (order information) has seldom been used to identify actionable reports. This study aims to evaluate the benefits of two new approaches: (1) bidirectional encoder representations from transformers (BERT), a recent deep learning architecture in NLP, and (2) using order information in addition to radiology reports.
Methods We performed a binary classification to distinguish actionable reports (i.e., radiology reports tagged as actionable in actual radiological practice) from non-actionable ones (those without an actionable tag). A total of 90,923 Japanese radiology reports from our hospital were used, of which 788 (0.87%) were actionable. We evaluated four methods: statistical machine learning with logistic regression (LR) and with a gradient-boosting decision tree (GBDT), and deep learning with a bidirectional long short-term memory (LSTM) model and with a publicly available Japanese BERT model. Each method was used with two different inputs, radiology reports alone and pairs of order information and radiology reports. Thus, eight experiments were conducted to examine the performance.
Results Without order information, BERT achieved the highest area under the precision-recall curve (AUPRC) of 0.5138, a statistically significant improvement over LR, GBDT, and LSTM, and the highest area under the receiver operating characteristic curve (AUROC) of 0.9516. Simply coupling the order information with the radiology reports slightly increased the AUPRC of BERT but did not lead to a statistically significant improvement. This may be due to the complexity of clinical decisions made by radiologists.
Conclusions BERT was shown to be useful for detecting actionable reports. More sophisticated methods are required to use order information effectively.
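A brief sketch of why AUPRC is the headline metric here: with only 0.87% of reports actionable, a random classifier's AUPRC baseline is roughly the positive prevalence (about 0.0087) while its AUROC is still 0.5, so BERT's AUPRC of 0.5138 represents a large lift. The simulation below uses synthetic labels at the study's prevalence, not the study's data.

```python
# Demonstrate the AUPRC baseline under heavy class imbalance using an
# uninformative classifier; average_precision_score is the standard
# AUPRC estimate in scikit-learn.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.random(90_000) < 0.0087  # ~0.87% positives, as in the study
scores = rng.random(90_000)           # random scores: no real signal

print(average_precision_score(y_true, scores))  # ~0.009, near prevalence
print(roc_auc_score(y_true, scores))            # ~0.5
```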

