Automated Identification and Measurement Extraction of Pancreatic Cystic Lesions from Free-Text Radiology Reports Using Natural Language Processing



Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study

JMIR Medical Informatics, 2021, Vol 9 (5), pp. e24381. DOI: 10.2196/24381

Author(s): Amy Y X Yu, Zhongyu A Liu, Chloe Pou-Prom, Kaitlyn Lopes, Moira K Kapral, ...

Keyword(s): Natural Language Processing, Natural Language, Language Processing, Predictive Value, Large Vessel, Free Text, Imaging Data, Large Vessel Occlusion, Vessel Occlusion, Radiology Reports

Background: Diagnostic neurovascular imaging data are important in stroke research, but obtaining these data typically requires laborious manual chart reviews. Objective: We aimed to determine the accuracy of a natural language processing (NLP) approach to extract information on the presence and location of vascular occlusions as well as other stroke-related attributes based on free-text reports. Methods: From the full reports of 1320 consecutive computed tomography (CT), CT angiography, and CT perfusion scans of the head and neck performed at a tertiary stroke center between October 2017 and January 2019, we manually extracted data on the presence of proximal large vessel occlusion (primary outcome), as well as distal vessel occlusion, ischemia, hemorrhage, Alberta Stroke Program Early CT Score (ASPECTS), and collateral status (secondary outcomes). Reports were randomly split into training (n=921) and validation (n=399) sets, and attributes were extracted using rule-based NLP. We reported the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the overall accuracy of the NLP approach relative to the manually extracted data. Results: The overall prevalence of large vessel occlusion was 12.2%. In the training sample, the NLP approach identified this attribute with an overall accuracy of 97.3% (95.5% sensitivity, 98.1% specificity, 84.1% PPV, and 99.4% NPV). In the validation set, the overall accuracy was 95.2% (90.0% sensitivity, 97.4% specificity, 76.3% PPV, and 98.5% NPV). The accuracy of identifying distal or basilar occlusion as well as hemorrhage was also high, but there were limitations in identifying cerebral ischemia, ASPECTS, and collateral status. Conclusions: NLP may improve the efficiency of large-scale imaging data collection for stroke surveillance and research.
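The five metrics reported in this abstract all derive from the four cells of a 2x2 table that compares the NLP output with the manually extracted reference. The following is a minimal Python sketch of that arithmetic. The function name `diagnostic_metrics` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives of the NLP output when judged
    against the manual reference standard.
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives in the reference that NLP found
        "specificity": tn / (tn + fp),  # negatives in the reference that NLP cleared
        "ppv": tp / (tp + fp),          # NLP positives that are truly positive
        "npv": tn / (tn + fn),          # NLP negatives that are truly negative
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = diagnostic_metrics(tp=40, fp=10, tn=140, fn=10)
print(metrics)
```

PPV and NPV depend on prevalence, so with a low-prevalence attribute a PPV noticeably below the sensitivity and specificity is not unusual.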


A natural language processing tool for automatic identification of new disease and disease progression: Parsing text in multi-institutional radiology reports to facilitate clinical trial eligibility screening.

Journal of Clinical Oncology, 2021, Vol 39 (15_suppl), pp. 1555-1555. DOI: 10.1200/jco.2021.39.15_suppl.1555

Author(s): Eric J. Clayton, Imon Banerjee, Patrick J. Ward, Maggie D Howell, Beth Lohmueller, ...

Keyword(s): Clinical Trial, Clinical Trials, Natural Language Processing, Natural Language, Disease Progression, Language Processing, Free Text, New Disease, Radiology Reports, Precision And Accuracy

Background: Screening every patient for clinical trials is time-consuming, costly and inefficient. Developing an automated method for identifying patients who have potential disease progression, at the point where the practice first receives their radiology reports but prior to the patient's office visit, would greatly increase the efficiency of clinical trial operations and likely result in more patients being offered trial opportunities. Methods: Using Natural Language Processing (NLP) methodology, we developed a text parsing algorithm to automatically extract information about potential new disease or disease progression from multi-institutional, free-text radiology reports (CT, PET, bone scan, MRI or x-ray). We combined semantic dictionary mapping and machine learning techniques to normalize the linguistic and formatting variations in the text, training an XGBoost model specifically to achieve the high precision and accuracy that clinical trial screening requires. To be comprehensive, we enhanced the model vocabulary using a multi-institutional dataset which includes reports from two academic institutions. Results: A dataset of 732 de-identified radiology reports was curated (two MDs agreed on potential new disease/disease progression vs stable), and the model was re-trained for each fold, where the folds were randomly selected. The final model achieved consistent precision (>0.87) and accuracy (>0.87). See the table for a summary of the results by radiology report type. We are continuing work on the model to validate accuracy and precision using a new and unique set of reports. Conclusions: NLP systems can be used to identify patients who potentially have suffered new disease or disease progression and to reduce the human effort in screening for clinical trials. Efforts are ongoing to integrate the NLP process into existing EHR reporting. New imaging reports sent via interface to the EHR will be extracted daily using a database query and will be provided via secure electronic transport to the NLP system. Patients with a higher likelihood of disease progression will be automatically identified, and their reports routed to the clinical trials office for clinical trial screening in parallel with physician EHR mailbox reporting. The overarching goal of the project is to increase clinical trial enrollment.

Table: 5-fold cross-validation performance of the NLP model in terms of accuracy, precision and recall, averaged across all folds. [Table: see text]
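The evaluation scheme described here, re-training on randomly selected folds and averaging accuracy, precision and recall across them, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: `train_keyword_model` is a toy stand-in and not the XGBoost model from the abstract, all function names are hypothetical, and the example reports are made up.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_keyword_model(texts, labels):
    """Toy stand-in classifier: flag words seen only in positive training reports."""
    pos_words, neg_words = set(), set()
    for text, label in zip(texts, labels):
        (pos_words if label else neg_words).update(text.lower().split())
    cues = pos_words - neg_words
    return lambda text: int(any(w in cues for w in text.lower().split()))


def ratio(num, den):
    """Return num/den, or 0.0 when the denominator is zero (a convention chosen here)."""
    return num / den if den else 0.0


def cross_validate(texts, labels, train_fn, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the fold scores."""
    fold_scores = []
    for held_out in k_fold_indices(len(texts), k, seed):
        held = set(held_out)
        train = [i for i in range(len(texts)) if i not in held]
        predict = train_fn([texts[i] for i in train], [labels[i] for i in train])
        pairs = [(predict(texts[i]), labels[i]) for i in held_out]
        tp = sum(1 for p, y in pairs if p == 1 and y == 1)
        fp = sum(1 for p, y in pairs if p == 1 and y == 0)
        fn = sum(1 for p, y in pairs if p == 0 and y == 1)
        correct = sum(1 for p, y in pairs if p == y)
        fold_scores.append({
            "accuracy": ratio(correct, len(pairs)),
            "precision": ratio(tp, tp + fp),
            "recall": ratio(tp, tp + fn),
        })
    return {m: mean(s[m] for s in fold_scores) for m in fold_scores[0]}


# Made-up example reports: 1 = possible progression, 0 = stable.
texts = ["interval progression of disease"] * 5 + ["stable disease no change"] * 5
labels = [1] * 5 + [0] * 5
scores = cross_validate(texts, labels, train_keyword_model, k=5)
print(scores)
```

Averaging per-fold scores, rather than pooling all predictions, matches the wording of the table caption. A fold with no predicted positives has an undefined precision, which this sketch scores as 0.0.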


A Natural Language Processing Pipeline of Chinese Free-Text Radiology Reports for Liver Cancer Diagnosis

IEEE Access, 2020, Vol 8, pp. 159110-159119. DOI: 10.1109/access.2020.3020138

Author(s): Honglei Liu, Yan Xu, Zhiqiang Zhang, Ni Wang, Yanqun Huang, ...

Keyword(s): Natural Language Processing, Liver Cancer, Natural Language, Language Processing, Cancer Diagnosis, Free Text, Processing Pipeline, Radiology Reports


Automated Identification of Patients With Pulmonary Nodules in an Integrated Health System Using Administrative Health Plan Data, Radiology Reports, and Natural Language Processing

Journal of Thoracic Oncology, 2012, Vol 7 (8), pp. 1257-1262. DOI: 10.1097/jto.0b013e31825bd9f5. Cited By ~ 30

Author(s): Kim N. Danforth, Megan I. Early, Sharon Ngan, Anne E. Kosco, Chengyi Zheng, ...

Keyword(s): Natural Language Processing, Natural Language, Health System, Health Plan, Language Processing, Pulmonary Nodules, Automated Identification, Radiology Reports, Integrated Health


Automate incidental findings in radiology reports using natural language processing and machine learning to identify and classify lung nodules.

Journal of Global Oncology, 2019, Vol 5 (suppl), pp. 49-49. DOI: 10.1200/jgo.2019.5.suppl.49

Author(s): Christi French, Dax Kurbegov, David R. Spigel, Maciek Makowski, Samantha Terker, ...

Keyword(s): Machine Learning, Natural Language Processing, Natural Language, Language Processing, Pulmonary Nodule, Incidental Findings, Free Text, Radiology Reports

Background: Pulmonary nodule incidental findings challenge providers to balance resource efficiency and high clinical quality. Incidental findings tend to be underevaluated, with studies reporting appropriate follow-up rates as low as 29%. The efficient identification of patients with high-risk nodules is foundational to ensuring appropriate follow-up and requires the clinical reading and classification of radiology reports. We tested the feasibility of automating this process with natural language processing (NLP) and machine learning (ML). Methods: In cooperation with Sarah Cannon, the Cancer Institute of HCA Healthcare, we conducted a series of experiments on 8,879 free-text, narrative CT radiology reports. A representative sample of health system ED, IP, and OP reports dated from December 2015 to April 2017 was divided into a development set for model training and validation and a test set to evaluate model performance. A "Nodule Model" was trained to detect the reported presence of a pulmonary nodule, and a rules-based "Size Model" was developed to extract the size of the nodule in millimeters. Reports were bucketed into three prediction groups: ≥ 6 mm, < 6 mm, and no size indicated. Nodules were placed in a queue for follow-up if the nodule was predicted to be ≥ 6 mm, or if the nodule had no size indicated and the report contained the word "mass." The Fleischner Society Guidelines and clinical review informed these definitions. Results: Precision and recall metrics were calculated for multiple model thresholds. A threshold was selected based on the validation set calculations, and a success criterion of 90% queue precision was selected to minimize false positives. On the test dataset, the F1 measure of the entire pipeline was 72.9%, recall was 60.3%, and queue precision was 90.2%, exceeding the success criterion. Conclusions: The experiments demonstrate the feasibility of technology to automate the detection and classification of pulmonary nodule incidental findings in radiology reports. This approach promises to improve healthcare quality by increasing the rate of appropriate lung nodule incidental finding follow-up and treatment without excessive labor or risking overutilization.
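The bucketing and queueing rules described in the Methods can be illustrated with a small rule-based sketch. The abstract does not describe how the "Size Model" itself works, so the regular expression, the choice of the largest mentioned measurement, and all function names below are assumptions made for illustration only.

```python
import re

# Assumed pattern: a number directly followed by a "mm" or "cm" unit.
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def largest_size_mm(report):
    """Return the largest size mentioned in the report, in millimeters, or None."""
    sizes = [
        float(value) * (10.0 if unit.lower() == "cm" else 1.0)
        for value, unit in SIZE_PATTERN.findall(report)
    ]
    return max(sizes) if sizes else None


def size_bucket(report):
    """Assign one of the three prediction groups named in the abstract."""
    size = largest_size_mm(report)
    if size is None:
        return "no size indicated"
    return ">=6 mm" if size >= 6.0 else "<6 mm"


def needs_follow_up(report):
    """Queue rule: >=6 mm, or no size indicated and the word 'mass' is present."""
    bucket = size_bucket(report)
    if bucket == ">=6 mm":
        return True
    has_mass = re.search(r"\bmass\b", report, re.IGNORECASE) is not None
    return bucket == "no size indicated" and has_mass


# Made-up example sentences, not taken from the study's reports.
for text in [
    "8 mm nodule in the right upper lobe.",
    "0.4 cm nodule in the left lower lobe.",
    "Spiculated mass in the left lower lobe.",
    "Small nodule, size not given.",
]:
    print(size_bucket(text), needs_follow_up(text), "|", text)
```

A pattern this simple reads only the unit-bearing number in a measurement such as "1.2 x 0.8 cm" and cannot tell nodule sizes from other measurements, so a production rule set would need more context handling.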


Using Natural Language Processing of Free-Text Radiology Reports to Identify Type 1 Modic Endplate Changes

Journal of Digital Imaging, 2017, Vol 31 (1), pp. 84-90. DOI: 10.1007/s10278-017-0013-3. Cited By ~ 10

Author(s): Hannu T. Huhdanpaa, W. Katherine Tan, Sean D. Rundell, Pradeep Suri, Falgun H. Chokshi, ...

Keyword(s): Natural Language Processing, Natural Language, Language Processing, Free Text, Radiology Reports, Endplate Changes, Modic Endplate Changes


Natural Language Processing in Dutch Free Text Radiology Reports: Challenges in a Small Language Area Staging Pulmonary Oncology

Journal of Digital Imaging, 2020, Vol 33 (4), pp. 1002-1008. DOI: 10.1007/s10278-020-00327-z

Author(s): J. Martijn Nobel, Sander Puts, Frans C. H. Bakers, Simon G. F. Robben, André L. A. J. Dekker

Keyword(s): Natural Language Processing, Natural Language, Language Processing, Free Text, Radiology Reports, Language Area


ToKSA - Tokenized Key Sentence Annotation - a Novel Method for Rapid Approximation of Ground Truth for Natural Language Processing

2021. DOI: 10.1101/2021.10.06.21264629

Author(s): Cameron J Fairfield, William A Cambridge, Lydia Cullen, Thomas M Drake, Stephen R Knight, ...

Keyword(s): Natural Language Processing, Natural Language, Language Processing, Clinical Data, Ground Truth, Clinical Work, Abdominal Ultrasound, Free Text, Radiology Reports, Symptom Status

Objective: Identifying phenotypes and pathology from free text is an essential task for clinical work and research. Natural language processing (NLP) is a key tool for processing free text at scale. Developing and validating NLP models requires labelled data. Labels are generated through time-consuming and repetitive manual annotation and are hard to obtain for sensitive clinical data. The objective of this paper is to describe a novel approach for annotating radiology reports. Materials and Methods: We implemented tokenized key sentence-specific annotation (ToKSA) for annotating clinical data. We demonstrate ToKSA using 180,050 abdominal ultrasound reports with labels generated for symptom status, gallstone status and cholecystectomy status. Firstly, individual sentences are grouped together into a term-frequency matrix. Annotation of key (i.e. the most frequently occurring) sentences is then used to generate labels for multiple reports simultaneously. We compared ToKSA-derived labels to those generated by annotating full reports. We used ToKSA-derived labels to train a document classifier using convolutional neural networks. We compared performance of the classifier to a separate classifier trained on labels based on the full reports. Results: By annotating only 2,000 frequent sentences, we were able to generate labels for symptom status for 70,000 reports (accuracy 98.4%), gallstone status for 85,177 reports (accuracy 99.2%) and cholecystectomy status for 85,177 reports (accuracy 100%). The accuracy of the document classifier trained on ToKSA labels was similar (0.1-1.1% more accurate) to the document classifier trained on full report labels. Conclusion: ToKSA offers an accurate and efficient method for annotating free text clinical data.
