scholarly journals Development and multicenter validation of chest X-ray radiography interpretations based on natural language processing

2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Yaping Zhang ◽  
Mingqian Liu ◽  
Shundong Hu ◽  
Yao Shen ◽  
Jun Lan ◽  
...  

Abstract Background Artificial intelligence can assist in interpreting chest X-ray radiography (CXR) data, but large datasets require efficient image annotation. The purpose of this study is to extract CXR labels from diagnostic reports based on natural language processing, train convolutional neural networks (CNNs), and evaluate the classification performance of CNN using CXR data from multiple centers Methods We collected the CXR images and corresponding radiology reports of 74,082 subjects as the training dataset. The linguistic entities and relationships from unstructured radiology reports were extracted by the bidirectional encoder representations from transformers (BERT) model, and a knowledge graph was constructed to represent the association between image labels of abnormal signs and the report text of CXR. Then, a 25-label classification system were built to train and test the CNN models with weakly supervised labeling. Results In three external test cohorts of 5,996 symptomatic patients, 2,130 screening examinees, and 1,804 community clinic patients, the mean AUC of identifying 25 abnormal signs by CNN reaches 0.866 ± 0.110, 0.891 ± 0.147, and 0.796 ± 0.157, respectively. In symptomatic patients, CNN shows no significant difference with local radiologists in identifying 21 signs (p > 0.05), but is poorer for 4 signs (p < 0.05). In screening examinees, CNN shows no significant difference for 17 signs (p > 0.05), but is poorer at classifying nodules (p = 0.013). In community clinic patients, CNN shows no significant difference for 12 signs (p > 0.05), but performs better for 6 signs (p < 0.001). Conclusion We construct and validate an effective CXR interpretation system based on natural language processing.

CHEST Journal ◽  
2021 ◽  
Author(s):  
Chengyi Zheng ◽  
Brian Z. Huang ◽  
Andranik A. Agazaryan ◽  
Beth Creekmur ◽  
Thearis Osuj ◽  
...  

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Nithin Kolanu ◽  
A Shane Brown ◽  
Amanda Beech ◽  
Jacqueline R. Center ◽  
Christopher P. White

2020 ◽  
Vol 33 (5) ◽  
pp. 1194-1201
Author(s):  
Andrew L. Callen ◽  
Sara M. Dupont ◽  
Adi Price ◽  
Ben Laguna ◽  
David McCoy ◽  
...  

2020 ◽  
pp. 016555152096278
Author(s):  
Rouzbeh Ghasemi ◽  
Seyed Arad Ashrafi Asli ◽  
Saeedeh Momtazi

With the advent of deep neural models in natural language processing tasks, having a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is a challenging issue in many low-resource languages. This problem results in a significant difference between the accuracy of available natural language processing tools for low-resource languages compared with rich languages. To address this problem in the sentiment analysis task in the Persian language, we propose a cross-lingual deep learning framework to benefit from available training data of English. We deployed cross-lingual embedding to model sentiment analysis as a transfer learning model which transfers a model from a rich-resource language to low-resource ones. Our model is flexible to use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on English Amazon dataset and Persian Digikala dataset using two different embedding models and four different classification networks show the superiority of the proposed model compared with the state-of-the-art monolingual techniques. Based on our experiment, the performance of Persian sentiment analysis improves 22% in static embedding and 9% in dynamic embedding. Our proposed model is general and language-independent; that is, it can be used for any low-resource language, once a cross-lingual embedding is available for the source–target language pair. Moreover, by benefitting from word-aligned cross-lingual embedding, the only required data for a reliable cross-lingual embedding is a bilingual dictionary that is available between almost all languages and the English language, as a potential source language.


2008 ◽  
Vol 5 (3) ◽  
pp. 197-204 ◽  
Author(s):  
Pragya A. Dang ◽  
Mannudeep K. Kalra ◽  
Michael A. Blake ◽  
Thomas J. Schultz ◽  
Markus Stout ◽  
...  

2021 ◽  
Author(s):  
Babak Afshin-Pour ◽  
Michael Qiu ◽  
Shahrzad Hosseini ◽  
Molly Stewart ◽  
Jan Horsky ◽  
...  

ABSTRACTDespite the high morbidity and mortality associated with Acute Respiratory Distress Syndrome (ARDS), discrimination of ARDS from other causes of acute respiratory failure remains challenging, particularly in the first 24 hours of mechanical ventilation. Delay in ARDS identification prevents lung protective strategies from being initiated and delays clinical trial enrolment and quality improvement interventions. Medical records from 1,263 ICU-admitted, mechanically ventilated patients at Northwell Health were retrospectively examined by a clinical team who assigned each patient a diagnosis of “ARDS” or “non-ARDS” (e.g., pulmonary edema). We then applied an iterative pre-processing and machine learning framework to construct a model that would discriminate ARDS versus non-ARDS, and examined features informative in the patient classification process. Data made available to the model included patient demographics, laboratory test results from before the initiation of mechanical ventilation, and features extracted by natural language processing of radiology reports. The resulting model discriminated well between ARDS and non-ARDS causes of respiratory failure (AUC=0.85, 89% precision at 20% recall), and highlighted features unique among ARDS patients, and among and the subset of ARDS patients who would not recover. Importantly, models built using both clinical notes and laboratory test results out-performed models built using either data source alone, akin to the retrospective clinician-based diagnostic process. This work demonstrates the feasibility of using readily available EHR data to discriminate ARDS patients prospectively in a real-world setting at a critical time in their care and highlights novel patient characteristics indicative of ARDS.


Sign in / Sign up

Export Citation Format

Share Document