Derivation of a natural language processing algorithm to identify febrile infants

BACKGROUND Accurate identification of new diagnoses of human papillomavirus–associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research.

OBJECTIVE This study's objective was to develop and assess the accuracy of a natural language processing algorithm for the identification of individuals with cancer or precancer of the cervix and anus.

METHODS A pipeline-based natural language processing algorithm was developed, which incorporated machine learning and rule-based methods to extract diagnostic elements from the narrative pathology reports. To test the algorithm’s classification accuracy, we used a split-validation study design. Full-length cervical and anal pathology reports were randomly selected from 4 clinical pathology laboratories. Two study team members, blinded to the classifications produced by the natural language processing algorithm, manually and independently reviewed all reports and classified them at the document level according to 2 domains (diagnosis and human papillomavirus testing results). Using the manual review as the gold standard, the algorithm’s performance was evaluated using standard measurements of accuracy, recall, precision, and F-measure.

RESULTS The natural language processing algorithm’s performance was validated on 949 pathology reports. The algorithm demonstrated accurate identification of abnormal cytology, histology, and positive human papillomavirus tests with accuracies greater than 0.91. Precision was lowest for anal histology reports (0.87, 95% CI 0.59-0.98) and highest for cervical cytology (0.98, 95% CI 0.95-0.99). The natural language processing algorithm missed 2 out of the 15 abnormal anal histology reports, which led to a relatively low recall (0.68, 95% CI 0.43-0.87).

CONCLUSIONS This study outlines the development and validation of a freely available and easily implementable natural language processing algorithm that can automate the extraction and classification of clinical data from cervical and anal cytology and histology.
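
The evaluation measures named in METHODS (accuracy, recall, precision, and F-measure) can be sketched as follows. This is a minimal illustration, not the study's code: the `Counts` class, the `evaluate` function, and the example counts are hypothetical and are not taken from the reported results.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Document-level agreement between algorithm and manual review (hypothetical)."""
    tp: int  # algorithm says abnormal, reviewers say abnormal
    fp: int  # algorithm says abnormal, reviewers say normal
    fn: int  # algorithm says normal, reviewers say abnormal
    tn: int  # algorithm says normal, reviewers say normal


def evaluate(c: Counts) -> dict[str, float]:
    """Compute accuracy, precision, recall and F-measure from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn
    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    denom = precision + recall
    # F-measure here is the balanced F1: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Illustrative counts only; they do not correspond to any report type in the study.
    print(evaluate(Counts(tp=8, fp=2, fn=2, tn=88)))
```

With a small number of gold-standard positives, as for the anal histology reports, a handful of missed documents moves recall substantially, which is why the confidence intervals for that category are wide.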

Automated Detection of Measurements and Their Descriptors in Radiology Reports Using a Hybrid Natural Language Processing Algorithm

Journal of Digital Imaging ◽ 10.1007/s10278-019-00237-9 ◽ 2019 ◽ Vol 32 (4) ◽ pp. 544-553 ◽ Cited By ~ 5

Author(s): Selen Bozkurt ◽ Emel Alkim ◽ Imon Banerjee ◽ Daniel L. Rubin

Keyword(s): Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Automated Detection ◽ Processing Algorithm ◽ Radiology Reports ◽ Natural Language Processing Algorithm

279 – Validation of a Natural Language Processing Algorithm to Identify Colonic Adenomas Across a Health System

Gastroenterology ◽ 10.1016/s0016-5085(19)36923-9 ◽ 2019 ◽ Vol 156 (6) ◽ pp. S-56

Author(s): David G. Morgan ◽ Kathy Chorneyko ◽ Deepak Swain ◽ Barbara Bowes ◽ Vicki Lee ◽ ...

Keyword(s): Natural Language Processing ◽ Natural Language ◽ Health System ◽ Language Processing ◽ Processing Algorithm ◽ Colonic Adenomas ◽ Natural Language Processing Algorithm

SentiMental: An emotional profiling algorithm for identifying affect patterns in text

10.31219/osf.io/cun5x ◽ 2018

Author(s): Massimo Stella

Keyword(s): Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Processing Algorithm ◽ Copyright Holder ◽ The Novel ◽ Novel Approach ◽ Technical Report ◽ Potential Applications ◽ Natural Language Processing Algorithm

This technical report outlines the mechanisms and potential applications of SentiMental, a suite of natural language processing algorithms designed and implemented by Massimo Stella, Complex Science Consulting. The report briefly outlines the novel approach of SentiMental in performing sentiment and emotional analysis by directly harnessing the whole structure of the mental lexicon rather than by using affect norms. It also outlines the direct emotional profiling and the visualisations currently implemented in version 0.1 of SentiMental. Features under development and current limitations are also outlined and discussed. This technical report is not meant as a publication. The author holds full copyright and any reproduction of parts of this report must be authorised by the copyright holder. SentiMental represents a work in progress, so do not hesitate to get in touch with the author for any potential feedback.

Development of a Natural Language Processing Algorithm for the Classification of Suspicious Liver Lesions from Radiology Reports

10.1101/2021.12.15.21267875 ◽ 2021

Author(s): Jacob Johnson ◽ Kaneel Senevirathne ◽ Lawrence Ngo

Keyword(s): Deep Learning ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Processing Algorithm ◽ Liver Lesions ◽ Radiology Reports ◽ Natural Language Processing Algorithm

Here, we developed and validated a highly generalizable natural language processing algorithm based on deep learning. Our algorithm was trained and tested on a highly diverse dataset from over 2,000 hospital sites and 500 radiologists. The resulting algorithm achieved an AUROC of 0.96 for detecting the presence or absence of liver lesions, with a specificity of 0.99 and a sensitivity of 0.6.
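
To make the reported measures concrete, the sketch below shows how AUROC differs from sensitivity and specificity: AUROC summarizes ranking quality over all thresholds, while sensitivity and specificity describe one chosen operating point. The function names, labels, scores, and threshold are hypothetical and are not drawn from the paper's model or data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of reports; fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical report labels (1 = lesion present) and model scores.
    y = [1, 1, 1, 0, 0, 0]
    p = [0.9, 0.4, 0.8, 0.3, 0.5, 0.1]
    print(auroc(y, p))
    print(sensitivity_specificity(y, p, threshold=0.6))
```

A high threshold favors specificity at the cost of sensitivity, which is one way a model with a high AUROC can still show a modest sensitivity at its chosen operating point.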
