Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study

10.2196/24381 ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. e24381
Author(s):  
Amy Y X Yu ◽  
Zhongyu A Liu ◽  
Chloe Pou-Prom ◽  
Kaitlyn Lopes ◽  
Moira K Kapral ◽  
...  

Background: Diagnostic neurovascular imaging data are important in stroke research, but obtaining these data typically requires laborious manual chart reviews. Objective: We aimed to determine the accuracy of a natural language processing (NLP) approach to extract information on the presence and location of vascular occlusions, as well as other stroke-related attributes, based on free-text reports. Methods: From the full reports of 1320 consecutive computed tomography (CT), CT angiography, and CT perfusion scans of the head and neck performed at a tertiary stroke center between October 2017 and January 2019, we manually extracted data on the presence of proximal large vessel occlusion (primary outcome), as well as distal vessel occlusion, ischemia, hemorrhage, Alberta Stroke Program Early CT Score (ASPECTS), and collateral status (secondary outcomes). Reports were randomly split into training (n=921) and validation (n=399) sets, and attributes were extracted using rule-based NLP. We report the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy of the NLP approach relative to the manually extracted data. Results: The overall prevalence of large vessel occlusion was 12.2%. In the training sample, the NLP approach identified this attribute with an overall accuracy of 97.3% (95.5% sensitivity, 98.1% specificity, 84.1% PPV, and 99.4% NPV). In the validation set, the overall accuracy was 95.2% (90.0% sensitivity, 97.4% specificity, 76.3% PPV, and 98.5% NPV). The accuracy of identifying distal or basilar occlusion, as well as hemorrhage, was also high, but there were limitations in identifying cerebral ischemia, ASPECTS, and collateral status. Conclusions: NLP may improve the efficiency of large-scale imaging data collection for stroke surveillance and research.
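The abstract does not reproduce the rule set itself, but a minimal sketch of the rule-based idea, in Python with illustrative (assumed) keyword and negation lists, might look like this:

```python
import re

# Illustrative vocabularies only; the study's actual rule set is not given in
# the abstract, so these patterns are assumptions for demonstration.
PROXIMAL_VESSELS = re.compile(r"\b(ica|internal carotid|m1|m2|mca|middle cerebral|basilar)\b")
OCCLUSION_TERMS = re.compile(r"\b(occlusion|occluded|thrombus|cut[- ]?off)\b")
NEGATION_TERMS = re.compile(r"\b(no|without|negative for|patent)\b")

def flag_large_vessel_occlusion(report_text: str) -> bool:
    """Return True if any sentence asserts a proximal occlusion without negation."""
    for sentence in re.split(r"[.\n]", report_text.lower()):
        if OCCLUSION_TERMS.search(sentence) and PROXIMAL_VESSELS.search(sentence):
            # Skip findings that are negated, e.g. "no occlusion of the M1 segment"
            if not NEGATION_TERMS.search(sentence):
                return True
    return False

print(flag_large_vessel_occlusion("Acute occlusion of the left M1 segment."))   # True
print(flag_large_vessel_occlusion("The ICA is patent; no occlusion is seen."))  # False
```

Sentence-level negation handling of this kind is one plausible way to reach the high NPV reported above, since radiology reports state most findings in the negative.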


Stroke ◽  
2021 ◽  
Author(s):  
Anoop Mayampurath ◽  
Zahra Parnianpour ◽  
Christopher T. Richards ◽  
William J. Meurer ◽  
Jungwha Lee ◽  
...  

Background and Purpose: Accurate prehospital diagnosis of stroke by emergency medical services (EMS) can increase treatment rates, mitigate disability, and reduce stroke deaths. We aimed to develop a model that uses natural language processing of EMS reports and machine learning to improve prehospital stroke identification. Methods: We conducted a retrospective study of patients transported by the Chicago EMS to 17 regional primary and comprehensive stroke centers. Patients who were suspected of stroke by the EMS or had a hospital-diagnosed stroke were included in our cohort. Text within EMS reports was converted to unigram features, which were given as input to a support-vector machine classifier trained on 70% of the cohort and tested on the remaining 30%. Outcomes included final diagnosis of stroke versus nonstroke, large vessel occlusion, severe stroke (National Institutes of Health Stroke Scale score >5), and comprehensive stroke center-eligible stroke (large vessel occlusion or hemorrhagic stroke). Results: Of 965 patients, 580 (60%) had confirmed acute stroke. In a test set of 289 patients, the text-based model predicted stroke nominally better than a model based on the Cincinnati Prehospital Stroke Scale score (c-statistic: 0.73 versus 0.67, P=0.165) and was superior to a model based on the 3-Item Stroke Scale score (c-statistic: 0.73 versus 0.53, P<0.001). Improvements in discrimination were also observed for the other outcomes. Conclusions: We derived a model that uses clinical text from paramedic reports to identify stroke. Our results require validation but have the potential to improve prehospital routing protocols.
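A minimal sketch of this unigram-plus-SVM pipeline, assuming scikit-learn and toy data (the abstract does not specify the authors' preprocessing, kernel, or class weighting), might look like this:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy EMS narratives; 1 = hospital-confirmed stroke, 0 = nonstroke.
reports = [
    "facial droop and slurred speech, onset 30 minutes ago",
    "sudden left sided weakness, unable to raise arm",
    "aphasia on scene, blood glucose normal",
    "ground level fall, no neurological deficits",
    "chest pain radiating to left arm",
    "intoxicated, steady gait, speech clear",
]
labels = [1, 1, 1, 0, 0, 0]

# 70/30 split, as in the study
X_train, X_test, y_train, y_test = train_test_split(
    reports, labels, train_size=0.7, stratify=labels, random_state=0)

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),  # unigram features
    SVC(kernel="linear"),                 # support-vector machine classifier
)
model.fit(X_train, y_train)

# c-statistic (area under the ROC curve) on the held-out 30%
print(roc_auc_score(y_test, model.decision_function(X_test)))
```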


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 1555-1555
Author(s):  
Eric J. Clayton ◽  
Imon Banerjee ◽  
Patrick J. Ward ◽  
Maggie D Howell ◽  
Beth Lohmueller ◽  
...  

1555 Background: Screening every patient for clinical trials is time-consuming, costly, and inefficient. Developing an automated method for identifying patients with potential disease progression at the point where the practice first receives their radiology reports, but prior to the patient’s office visit, would greatly increase the efficiency of clinical trial operations and likely result in more patients being offered trial opportunities. Methods: Using natural language processing (NLP) methodology, we developed a text parsing algorithm to automatically extract information about potential new disease or disease progression from multi-institutional, free-text radiology reports (CT, PET, bone scan, MRI, or x-ray). We combined semantic dictionary mapping and machine learning techniques to normalize the linguistic and formatting variations in the text, training an XGBoost model specifically to achieve the high precision and accuracy required to satisfy clinical trial screening requirements. To be comprehensive, we enhanced the model vocabulary using a multi-institutional dataset that includes reports from two academic institutions. Results: A dataset of 732 de-identified radiology reports was curated (two physicians agreed on each label: potential new disease/disease progression versus stable), and the model was retrained and evaluated on each of five randomly selected cross-validation folds. The final model achieved consistently high precision (>0.87) and accuracy (>0.87). See the table for a summary of the results by radiology report type. We are continuing work on the model to validate accuracy and precision using a new and unique set of reports. Conclusions: NLP systems can be used to identify patients who have potentially suffered new disease or disease progression, reducing the human effort required for clinical trial screening. Efforts are ongoing to integrate the NLP process into existing EHR reporting: new imaging reports sent via interface to the EHR will be extracted daily using a database query and provided via secure electronic transport to the NLP system. Patients with a higher likelihood of disease progression will be automatically identified, and their reports routed to the clinical trials office for screening in parallel with physician EHR mailbox reporting. The overarching goal of the project is to increase clinical trial enrollment. 5-fold cross-validation performance of the NLP model in terms of accuracy, precision, and recall, averaged across all folds. [Table: see text]
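The pipeline itself is not published in the abstract; a rough sketch of the two-stage idea, with an assumed toy synonym dictionary and the xgboost and scikit-learn packages, could look like the following:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Toy semantic dictionary: map linguistic variants onto canonical terms.
# These entries are assumptions; the study's dictionary is not reproduced here.
SYNONYMS = {
    r"\bmets\b|\bmetastases\b|\bmetastatic\b": "metastasis",
    r"\benlarging\b|\binterval growth\b": "progression",
}

def normalize(text: str) -> str:
    """Lowercase the report and collapse synonym variants to canonical terms."""
    text = text.lower()
    for pattern, canonical in SYNONYMS.items():
        text = re.sub(pattern, canonical, text)
    return text

# Toy curated reports; 1 = potential new disease/progression, 0 = stable.
reports = [
    "new hepatic mets with interval growth of the lung nodule",
    "enlarging adrenal mass, suspicious for metastases",
    "new osseous metastatic lesions in the spine",
    "interval growth of the dominant pulmonary nodule",
    "new peritoneal metastases with ascites",
    "stable postsurgical changes, no new lesions",
    "no evidence of metastatic disease",
    "unchanged pulmonary nodule, stable examination",
    "stable appearance of the liver, no new findings",
    "no interval change since the prior study",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

X = TfidfVectorizer().fit_transform(normalize(r) for r in reports)
clf = XGBClassifier(n_estimators=200, max_depth=4)

# 5-fold cross-validation, mirroring the evaluation reported above
print(cross_val_score(clf, X, labels, cv=5, scoring="precision").mean())
print(cross_val_score(clf, X, labels, cv=5, scoring="accuracy").mean())
```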


2021 ◽  
Vol 45 (10) ◽  
Author(s):  
A. W. Olthof ◽  
P. M. A. van Ooijen ◽  
L. J. Cornelissen

Abstract: In radiology, natural language processing (NLP) allows the extraction of valuable information from radiology reports. It can be used for various downstream tasks such as quality improvement, epidemiological research, and monitoring guideline adherence. Class imbalance, variation in dataset size, variation in report complexity, and algorithm type all influence NLP performance but have not yet been systematically and interrelatedly evaluated. In this study, we investigate the influence of these factors on the performance of four types of deep learning-based NLP models: a fully connected neural network (Dense), a long short-term memory recurrent neural network (LSTM), a convolutional neural network (CNN), and Bidirectional Encoder Representations from Transformers (BERT). Two datasets consisting of radiologist-annotated reports of both trauma radiographs (n = 2469) and chest radiographs and computed tomography (CT) studies (n = 2255) were split into training sets (80%) and testing sets (20%). The training data were used to train all four model types in 84 experiments (Fracture-data) and 45 experiments (Chest-data), with variation in training size and prevalence. Performance was evaluated using sensitivity, specificity, positive predictive value, negative predictive value, area under the curve, and F score. On the radiology reports, all four model architectures demonstrated high performance, with the best metrics exceeding 0.90. The BERT algorithm outperformed the CNN, LSTM, and Dense models owing to its stable results despite variation in training size and prevalence. Awareness of variation in prevalence is warranted because it impacts sensitivity and specificity in opposite directions.
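The abstract gives no hyperparameters, but as a rough illustration, one of the four compared architectures (the CNN) might be sketched in Keras as follows; the vocabulary size, embedding width, and filter settings are assumptions:

```python
import tensorflow as tf

# Assumed text-processing settings; the study's configuration is not published
# in the abstract.
VOCAB_SIZE, MAX_LEN = 20_000, 200

vectorize = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_sequence_length=MAX_LEN)

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # finding present / absent
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])

# With report texts and binary labels split 80/20 as in the study:
# vectorize.adapt(train_texts)
# model.fit(train_texts, train_labels, epochs=5,
#           validation_data=(test_texts, test_labels))
```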


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 159110-159119
Author(s):  
Honglei Liu ◽  
Yan Xu ◽  
Zhiqiang Zhang ◽  
Ni Wang ◽  
Yanqun Huang ◽  
...  

2019 ◽  
Vol 5 (suppl) ◽  
pp. 49-49
Author(s):  
Christi French ◽  
Dax Kurbegov ◽  
David R. Spigel ◽  
Maciek Makowski ◽  
Samantha Terker ◽  
...  

49 Background: Pulmonary nodule incidental findings challenge providers to balance resource efficiency and high clinical quality. Incidental findings tend to be underevaluated, with studies reporting appropriate follow-up rates as low as 29%. The efficient identification of patients with high-risk nodules is foundational to ensuring appropriate follow-up and requires the clinical reading and classification of radiology reports. We tested the feasibility of automating this process with natural language processing (NLP) and machine learning (ML). Methods: In cooperation with Sarah Cannon, the Cancer Institute of HCA Healthcare, we conducted a series of experiments on 8,879 free-text, narrative CT radiology reports. A representative sample of health system emergency department, inpatient, and outpatient reports dated December 2015 to April 2017 was divided into a development set for model training and validation, and a test set to evaluate model performance. A “Nodule Model” was trained to detect the reported presence of a pulmonary nodule, and a rules-based “Size Model” was developed to extract the size of the nodule in millimeters. Reports were bucketed into three prediction groups: ≥6 mm, <6 mm, and no size indicated. Reports were placed in a follow-up queue if the nodule was predicted to be ≥6 mm, or if no size was indicated and the report contained the word “mass.” The Fleischner Society Guidelines and clinical review informed these definitions. Results: Precision and recall metrics were calculated for multiple model thresholds. A threshold was selected based on the validation set calculations, with a success criterion of 90% queue precision chosen to minimize false positives. On the test dataset, the F1 measure of the entire pipeline was 72.9%, recall was 60.3%, and queue precision was 90.2%, exceeding the success criteria. Conclusions: The experiments demonstrate the feasibility of using NLP and ML to automate the detection and classification of pulmonary nodule incidental findings in radiology reports. This approach promises to improve healthcare quality by increasing the rate of appropriate lung nodule incidental finding follow-up and treatment without excessive labor or the risk of overutilization.
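A minimal sketch of the rules-based “Size Model” and the queueing logic described above (the regex and helper names are illustrative assumptions; only the bucketing rules come from the abstract) might look like this:

```python
import re

# Matches sizes such as "8 mm", "0.6 cm"; a real system would need to tie the
# size to the nodule mention rather than any measurement in the report.
SIZE_MM = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)

def nodule_size_mm(report_text: str) -> float | None:
    """Return the largest reported size in millimeters, or None if unstated."""
    sizes = [float(value) * (10 if unit.lower() == "cm" else 1)
             for value, unit in SIZE_MM.findall(report_text)]
    return max(sizes) if sizes else None

def route_to_followup_queue(report_text: str, nodule_predicted: bool) -> bool:
    """Queue if nodule >= 6 mm, or size unstated but the report mentions 'mass'."""
    if not nodule_predicted:  # output of the ML "Nodule Model"
        return False
    size = nodule_size_mm(report_text)
    if size is None:
        return "mass" in report_text.lower()
    return size >= 6

print(route_to_followup_queue("8 mm nodule in the right upper lobe", True))  # True
print(route_to_followup_queue("tiny 3 mm nodule, likely benign", True))      # False
```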


2017 ◽  
Vol 31 (1) ◽  
pp. 84-90 ◽  
Author(s):  
Hannu T. Huhdanpaa ◽  
W. Katherine Tan ◽  
Sean D. Rundell ◽  
Pradeep Suri ◽  
Falgun H. Chokshi ◽  
...  

2020 ◽  
Vol 33 (4) ◽  
pp. 1002-1008
Author(s):  
J. Martijn Nobel ◽  
Sander Puts ◽  
Frans C. H. Bakers ◽  
Simon G. F. Robben ◽  
André L. A. J. Dekker

2021 ◽  
Author(s):  
Cameron J Fairfield ◽  
William A Cambridge ◽  
Lydia Cullen ◽  
Thomas M Drake ◽  
Stephen R Knight ◽  
...  

Objective: Identifying phenotypes and pathology from free text is an essential task for clinical work and research. Natural language processing (NLP) is a key tool for processing free text at scale. Developing and validating NLP models requires labelled data. Labels are generated through time-consuming and repetitive manual annotation and are hard to obtain for sensitive clinical data. The objective of this paper is to describe a novel approach for annotating radiology reports. Materials and Methods: We implemented tokenized key sentence-specific annotation (ToKSA) for annotating clinical data. We demonstrate ToKSA using 180,050 abdominal ultrasound reports with labels generated for symptom status, gallstone status, and cholecystectomy status. First, individual sentences are grouped together into a term-frequency matrix. Annotation of key (i.e., the most frequently occurring) sentences is then used to generate labels for multiple reports simultaneously. We compared ToKSA-derived labels to those generated by annotating full reports. We then used the ToKSA-derived labels to train a document classifier based on convolutional neural networks and compared its performance to that of a separate classifier trained on labels from the full reports. Results: By annotating only 2,000 frequent sentences, we were able to generate labels for symptom status for 70,000 reports (accuracy 98.4%), gallstone status for 85,177 reports (accuracy 99.2%), and cholecystectomy status for 85,177 reports (accuracy 100%). The accuracy of the document classifier trained on ToKSA labels was similar to (0.1%-1.1% more accurate than) that of the document classifier trained on full-report labels. Conclusion: ToKSA offers an accurate and efficient method for annotating free-text clinical data.
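A minimal sketch of the ToKSA idea, with assumed function names and toy reports (the published implementation is not reproduced here), could look like the following:

```python
from collections import Counter

def toksa_labels(reports: list[str], key_sentence_labels: dict[str, int], default=None):
    """Label each report from its annotated key sentences (any positive wins)."""
    labels = []
    for report in reports:
        sentences = [s.strip().lower() for s in report.split(".") if s.strip()]
        hits = [key_sentence_labels[s] for s in sentences if s in key_sentence_labels]
        labels.append(max(hits) if hits else default)  # None = needs manual review
    return labels

reports = [
    "Gallbladder contains multiple calculi. No biliary dilatation.",
    "No biliary dilatation. Normal gallbladder.",
]

# Step 1: rank sentences by frequency across the corpus; only the most frequent
# ones are sent for manual annotation.
freq = Counter(s.strip().lower() for r in reports for s in r.split(".") if s.strip())
print(freq.most_common(3))

# Step 2: labels a human annotator assigned to the key sentences (toy example,
# 1 = gallstones present, 0 = absent), propagated to every matching report.
key_sentence_labels = {"gallbladder contains multiple calculi": 1, "normal gallbladder": 0}
print(toksa_labels(reports, key_sentence_labels))  # [1, 0]
```

Because a small set of sentences recurs across many reports, labelling those sentences once propagates labels to thousands of documents, which is the efficiency gain the abstract reports.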

