Abstract 165: Automated Stroke-Related Information Extraction From Diagnostic Imaging Reports Using Natural Language Processing
Introduction: Diagnostic imaging reports contain important data for stroke surveillance and clinical research, but converting large volumes of free text into structured data through manual chart abstraction is resource-intensive. We determined the accuracy of CHARTextract, a natural language processing (NLP) tool, in extracting relevant stroke-related attributes from full reports of computed tomography (CT), CT angiography (CTA), and CT perfusion (CTP) studies performed at a tertiary stroke centre.
Methods: We manually extracted data from full reports of 1,320 consecutive CT/CTA/CTP studies performed between October 2017 and January 2019 in patients presenting with acute stroke. Trained chart abstractors collected data on the presence of anterior proximal occlusion, basilar occlusion, distal intracranial occlusion, established ischemia, and haemorrhage, the laterality of these lesions, and ASPECT scores; these manually abstracted data served as the reference standard. Reports were then randomly split into a training set (n=921) and a validation set (n=399). We used CHARTextract to extract the same attributes by creating rule-based information extraction pipelines. The rules were human-defined, developed iteratively on the training set, and then evaluated on the validation set.
Results: The prevalence of anterior proximal occlusion was 12.3% in the full dataset (n=86 left, n=72 right, and n=4 bilateral). In the training set, CHARTextract identified this attribute with an overall accuracy of 97.3% (PPV 84.1% and NPV 99.4%, sensitivity 95.5% and specificity 97.5%). In the validation set, the overall accuracy was 95.2% (PPV 76.3% and NPV 98.5%, sensitivity 90.0% and specificity 96.0%).
Conclusions: We showed that CHARTextract can identify the presence of anterior proximal vessel occlusion with high accuracy, suggesting that NLP can be used to automate data collection for stroke research. We will present the accuracy of CHARTextract for the remaining neurological attributes at ISC 2020.
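The accuracy measures reported above follow from a 2x2 comparison of the NLP output against the manually abstracted reference standard. The Python sketch below shows one way such measures could be computed. It is illustrative only and is not part of CHARTextract. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical, and the counts are not reported in the abstract: they were back-calculated as one set of whole numbers that sums to the validation set size (n=399) and reproduces the reported validation percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of NLP output versus the manual reference standard."""
    tp: int  # NLP positive, reference positive
    fp: int  # NLP positive, reference negative
    fn: int  # NLP negative, reference positive
    tn: int  # NLP negative, reference negative


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, PPV, NPV, sensitivity and specificity as fractions."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# ASSUMPTION: hypothetical counts, back-calculated to be consistent with the
# reported validation percentages; the abstract does not report raw counts.
validation = ConfusionCounts(tp=45, fp=14, fn=5, tn=335)

if __name__ == "__main__":
    for name, value in diagnostic_metrics(validation).items():
        print(f"{name}: {value:.1%}")
```

With these assumed counts, the sketch yields an accuracy of 95.2%, PPV 76.3%, NPV 98.5%, sensitivity 90.0%, and specificity 96.0%. The PPV is well below the NPV because anterior proximal occlusion is uncommon (12.3% prevalence), so a small number of false positives weighs heavily against a small number of true positives.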