Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach

2021, Vol 116, pp. 103712
Author(s): Linda Hammami, Alessia Paglialonga, Giancarlo Pruneri, Michele Torresani, Milena Sant, ...

2021, Vol 39 (15_suppl), pp. 1557-1557
Author(s): Risa Liang Wong, Medha Sagar, Jacob Hoffman, Claire Huang, Angelica Lerma, ...

Background: Patients with prostate cancer are diagnosed through a prostate needle biopsy (PNB). The information in PNB pathology reports is critical for clinical risk stratification and treatment decisions; however, patient comprehension of these reports is low, and formats vary widely by institution. Natural language processing (NLP) models trained to extract key information automatically from unstructured PNB pathology reports could be used to generate personalized educational materials for patients at scale, and to speed up registry data collection and screening of patients for clinical trials. As a proof of concept, we trained four NLP models and tested their accuracy in extracting this information.

Methods: Using 403 positive PNB pathology reports from more than 80 institutions, we converted portable document format (PDF) files to text with the Tesseract optical character recognition (OCR) engine, removed protected health information with the open-source Philter tool, cleaned the text with rule-based methods, and annotated clinically relevant attributes, as well as structural attributes relevant to information extraction, with the Brat Rapid Annotation Tool. Text pre-processing for classification and extraction was done with scispaCy and rule-based methods. Using a 75:25 train:test split (N = 302 and 101, respectively), we tested conditional random field (CRF), support vector machine (SVM), bidirectional long short-term memory network (Bi-LSTM), and Bi-LSTM-CRF models, reserving 46 of the training reports as a validation subset for the latter two models. To assess clinical accuracy, model-extracted variables were compared with values obtained manually from the unprocessed PDF reports.

Results: Clinical accuracy of the model-extracted variables is reported in the Table. CRF was the highest-performing model, with accuracies of 97% for Gleason grade, 82% for percentage of positive cores (<50% vs. ≥50%), 90% for perineural or lymphovascular invasion, and 100% for presence of non-acinar carcinoma histology. Manual review of the inaccurate results showed that model performance was limited by PDF image quality, OCR errors on tables and columns, and variability in how institutions report the number of biopsy cores.

Conclusions: These results provide proof of concept that NLP models can accurately extract information from PNB pathology reports, although further optimization is needed before use in clinical practice. [Table: see text]
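To make the evaluation concrete, here is a small rule-based sketch for one variable. It assumes, hypothetically, that reports state core counts in the form "3 of 12 cores"; the regular expression, function names, and sample sentences are illustrative only and are not taken from the study, which used trained CRF, SVM, and Bi-LSTM extractors. The sketch bins the share of positive cores at the 50% threshold used in the Results. It then scores clinical accuracy as the fraction of reports whose extracted value matches the manually obtained value. Finally, it reproduces the split arithmetic: 403 reports at 75:25 gives 302 training and 101 test reports, and 46 of the training reports are held out for validation.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical phrasing rule: assumes core counts are written like "3 of 12 cores".
# Real PNB reports vary by institution, so this pattern is illustrative only.
CORE_PATTERN = re.compile(r"(\d+)\s+of\s+(\d+)\s+cores", re.IGNORECASE)


def positive_core_category(report_text: str) -> Optional[str]:
    """Bin the share of positive cores as '<50%' or '>=50%'; None if no count is found."""
    positive = total = 0
    # Sum counts over every match, assuming each match describes a distinct biopsy site.
    for pos, tot in CORE_PATTERN.findall(report_text):
        positive += int(pos)
        total += int(tot)
    if total == 0:
        return None
    return "<50%" if positive / total < 0.5 else ">=50%"


def clinical_accuracy(extracted: List[Optional[str]], manual: List[str]) -> float:
    """Fraction of reports whose extracted value equals the manually obtained value."""
    if not manual or len(extracted) != len(manual):
        raise ValueError("need equal-length, non-empty lists")
    matches = sum(e == m for e, m in zip(extracted, manual))
    return matches / len(manual)


def split_sizes(n_reports: int, train_frac: float = 0.75, n_val: int = 46) -> Tuple[int, int, int]:
    """Return (train, test, train minus validation) report counts."""
    n_train = round(n_reports * train_frac)
    n_test = n_reports - n_train
    return n_train, n_test, n_train - n_val


if __name__ == "__main__":
    reports = [
        "Left base: 2 of 6 cores positive. Right apex: 1 of 6 cores positive.",
        "Adenocarcinoma in 5 of 8 cores.",
        "No core count stated.",
    ]
    extracted = [positive_core_category(r) for r in reports]
    print(extracted)  # ['<50%', '>=50%', None]
    print(clinical_accuracy(extracted, ["<50%", ">=50%", "<50%"]))
    print(split_sizes(403))  # (302, 101, 256)
```

A fixed pattern like this misses any phrasing it was not written for, which mirrors the reporting variability the authors cite as a source of error and is one reason to prefer trained extractors.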


AERA Open, 2021, Vol 7, pp. 233285842110286
Author(s): Kylie L. Anglin, Vivian C. Wong, Arielle Boguslav

Although the importance of implementation research is widely recognized, evaluators often face serious logistical, budgetary, and methodological challenges when assessing intervention implementation in the field. This article proposes semantic similarity, a family of natural language processing techniques, as an innovative and scalable method for measuring implementation constructs. Semantic similarity methods are automated approaches to quantifying the similarity between texts. By applying semantic similarity to transcripts of intervention sessions, researchers can determine whether an intervention was delivered with adherence to a structured protocol, and the extent to which it was replicated consistently across sessions, sites, and studies. This article provides an overview of semantic similarity methods, describes their application in educational evaluations, and presents a proof of concept using an experimental study of the impact of a standardized teacher coaching intervention.
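As a minimal illustration of the two uses described above, the sketch scores adherence as the cosine similarity between bag-of-words vectors of a protocol and each session transcript. It scores consistency as the mean pairwise similarity across sessions. This lexical baseline is an assumption chosen for brevity and is not necessarily one of the methods the article evaluates; embedding-based similarity would also credit paraphrases that share few words. The function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a deliberately simple text representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 means no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def adherence_scores(protocol: str, transcripts: Dict[str, str]) -> Dict[str, float]:
    """Similarity of each session transcript to the structured protocol text."""
    protocol_vec = bag_of_words(protocol)
    return {name: cosine_similarity(protocol_vec, bag_of_words(text))
            for name, text in transcripts.items()}


def mean_pairwise_similarity(transcripts: List[str]) -> float:
    """Average similarity over all pairs of sessions, a rough consistency measure."""
    if len(transcripts) < 2:
        raise ValueError("need at least two transcripts")
    vectors = [bag_of_words(t) for t in transcripts]
    pairs = list(combinations(vectors, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    protocol = "Model the strategy, give specific feedback, and set a goal for the next lesson."
    sessions = {
        "site_a": "The coach modeled the strategy and gave specific feedback on the lesson.",
        "site_b": "We talked about the weekend and the schedule.",
    }
    print(adherence_scores(protocol, sessions))
    print(mean_pairwise_similarity(list(sessions.values())))
```

Note that "modeled" and "model" count as different words here, which illustrates why richer semantic representations are attractive for this task.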
