Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon

2005 ◽  
Vol 12 (3) ◽  
pp. 275-285 ◽  
Author(s):  
Y. Huang
Radiology ◽  
2018 ◽  
Vol 287 (2) ◽  
pp. 570-580 ◽  
Author(s):  
John Zech ◽  
Margaret Pain ◽  
Joseph Titano ◽  
Marcus Badgeley ◽  
Javin Schefflein ◽  
...  

2011 ◽  
Vol 37 (4) ◽  
pp. 753-809 ◽  
Author(s):  
David Vadas ◽  
James R. Curran

Noun phrases (NPs) are a crucial part of natural language and can have very complex structure. However, this NP structure is largely ignored in statistical parsing, as the most widely used corpus is not annotated with it. This lack of gold-standard data has restricted previous efforts to parse NPs, making it impossible to perform the supervised experiments that have achieved high performance in so many Natural Language Processing (NLP) tasks. We comprehensively solve this problem by manually annotating NP structure for the entire Wall Street Journal section of the Penn Treebank. The inter-annotator agreement scores that we attain dispel the belief that the task is too difficult, and demonstrate that consistent NP annotation is possible. Our gold-standard NP data is now available for use in all parsers. We experiment with this new data, applying the Collins (2003) parsing model, and find that its recovery of NP structure is significantly worse than its overall performance. The parser's F-score is up to 5.69% lower than a baseline that uses deterministic rules. Through extensive experimentation, we determine that this result is primarily caused by a lack of lexical information. To solve this problem we construct a wide-coverage, large-scale NP bracketing system. With our Penn Treebank data set, which is orders of magnitude larger than those used previously, we build a supervised model that achieves excellent results. Our model performs at 93.8% F-score on the simple task that most previous work has undertaken, and extends to bracket longer, more complex NPs that are rarely dealt with in the literature. We attain 89.14% F-score on this much more difficult task. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser.
Our NP bracketing model includes a wide variety of features that provide the lexical information that was missing during the parser experiments, and as a result, we outperform the parser's F-score by 9.04%. These experiments demonstrate the utility of the corpus, and show that many NLP applications can now make use of NP structure.
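The bracketing task described above is easy to illustrate. The sketch below is an illustration only, not the paper's actual rule set: it applies simple left-branching attachment, the conventional deterministic baseline for English noun compounds, to a flat NP.

```python
def left_branching(tokens):
    """Bracket a flat noun phrase with the left-branching default:
    each new word attaches to the bracket built so far."""
    if len(tokens) <= 2:
        return tuple(tokens)
    return (left_branching(tokens[:-1]), tokens[-1])

# "lung cancer deaths" = ((lung cancer) deaths): deaths from lung cancer
print(left_branching(["lung", "cancer", "deaths"]))
```

Right-branching compounds such as (plastic (water bottle)) are exactly the cases a deterministic rule gets wrong, which is why the lexical features in a supervised bracketing model matter.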


2020 ◽  
Author(s):  
Shintaro Tsuji ◽  
Andrew Wen ◽  
Naoki Takahashi ◽  
Hongjian Zhang ◽  
Katsuhiko Ogasawara ◽  
...  

BACKGROUND Named entity recognition (NER) plays an important role in extracting descriptive features when mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of recognizable entities depends on dictionary lookup. In particular, recognizing compound terms is difficult because they follow a wide variety of patterns. OBJECTIVE The objective of this study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (CTs) in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). Additionally, we created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but the pipeline still failed to capture many CTs. The MR showed that 71.9% of stem terms matched those in the ontologies, and that RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies.
CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem-term analysis have the potential to improve dictionary-based NER performance by expanding vocabularies.
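The reported F-measures follow from the standard harmonic mean of precision and recall; a quick check using only the precision/recall pairs reported in the abstract:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, in percent."""
    return 2 * precision * recall / (precision + recall)

# cTAKES+RadLex+GPD: precision 92.1%, recall 19.6%
print(round(f_measure(92.1, 19.6), 1))  # ~32.3, close to the reported 32.2%
# With the CtED: precision 98.1%, recall 51.0%
print(round(f_measure(98.1, 51.0), 1))  # 67.1, matching the reported 67.1%
```

The jump from 32.2% to 67.1% comes almost entirely from recall (19.6% to 51.0%), which is what one would expect from enlarging the lookup dictionary rather than changing the matching logic.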


CHEST Journal ◽  
2021 ◽  
Author(s):  
Chengyi Zheng ◽  
Brian Z. Huang ◽  
Andranik A. Agazaryan ◽  
Beth Creekmur ◽  
Thearis Osuj ◽  
...  

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Nithin Kolanu ◽  
A Shane Brown ◽  
Amanda Beech ◽  
Jacqueline R. Center ◽  
Christopher P. White

Author(s):  
Friederike Moltmann

Natural language, it appears, reflects in part our conception of the world. Natural language displays a great range of types of referential noun phrases that seem to stand for objects of various ontological categories and types, and it also involves constructions, categories, and expressions that appear to convey ontological or metaphysical notions. Natural language thus reflects its own ontology, an ontology that may differ from the one a philosopher, or even a nonphilosopher reflecting on what there is, may be willing to accept, and of course it may differ from the ontology of what there really is. This chapter characterizes the ontology implicit in natural language and the entities it involves, situates natural language ontology within metaphysics, discusses what sorts of data may be considered reflective of the ontology of natural language, and addresses Chomsky's dismissal of externalist semantics.


2020 ◽  
Vol 33 (5) ◽  
pp. 1194-1201
Author(s):  
Andrew L. Callen ◽  
Sara M. Dupont ◽  
Adi Price ◽  
Ben Laguna ◽  
David McCoy ◽  
...  
