Deep Learning–Based COVID-19 Pneumonia Classification Using Chest CT Images: Model Generalizability

2021 ◽  
Vol 4 ◽  
Author(s):  
Dan Nguyen ◽  
Fernando Kay ◽  
Jun Tan ◽  
Yulong Yan ◽  
Yee Seng Ng ◽  
...  

Since the outbreak of the COVID-19 pandemic, worldwide research efforts have focused on using artificial intelligence (AI) technologies on various medical data of COVID-19–positive patients in order to identify or classify various aspects of the disease, with promising reported results. However, concerns have been raised over their generalizability, given the heterogeneous factors in training datasets. This study aims to examine the severity of this problem by evaluating deep learning (DL) classification models trained to identify COVID-19–positive patients on 3D computed tomography (CT) datasets from different countries. We collected one dataset at UT Southwestern (UTSW) and three external datasets from different countries: CC-CCII Dataset (China), COVID-CTset (Iran), and MosMedData (Russia). We divided the data into two classes: COVID-19–positive and COVID-19–negative patients. We trained nine identical DL-based classification models by using combinations of datasets with a 72% train, 8% validation, and 20% test data split. The models trained on a single dataset achieved accuracy/area under the receiver operating characteristic curve (AUC) values of 0.87/0.826 (UTSW), 0.97/0.988 (CC-CCII), and 0.86/0.873 (COVID-CTset) when evaluated on their own dataset. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better. However, the performance dropped close to an AUC of 0.5 (random guess) for all models when evaluated on a dataset outside their training datasets. Including MosMedData, which only contained positive labels, in the training datasets did not necessarily help the performance on other datasets. Multiple factors likely contributed to these results, such as patient demographics and differences in image acquisition or reconstruction, causing a data shift among different study cohorts.
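The 72%/8%/20% split mentioned above can be sketched as follows. This is only an illustration, not the authors' procedure: it assumes the 20% test set is held out first and that 10% of the remaining data is then used for validation (which yields 8% of the total). The function name `split_indices` and the sample count of 1000 are hypothetical, and the abstract does not say whether the split was made per patient or per scan.

```python
import random


def split_indices(n, test_frac=0.20, val_frac_of_rest=0.10, seed=0):
    """Shuffle range(n) and split it into train/validation/test index lists.

    Assumption: the test set is carved out first, then a fraction of the
    remainder becomes the validation set (0.10 * 0.80 = 0.08 of the total).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_test = round(n * test_frac)
    n_val = round((n - n_test) * val_frac_of_rest)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 1000 samples.
train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

In practice, a split by patient rather than by scan would avoid the same patient appearing in both the training and test sets.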

Author(s):  
Zhang Li ◽  
Zheng Zhong ◽  
Yang Li ◽  
Tianyu Zhang ◽  
Liangxin Gao ◽  
...  

Abstract Background Thick-section CT scanners are more affordable for developing countries. Given how widely COVID-19 has spread, it is of great benefit to develop an automated and accurate system for quantification of COVID-19 associated lung abnormalities using thick-section chest CT images. Purpose To develop a fully automated AI system to quantitatively assess disease severity and disease progression using thick-section chest CT images. Materials and Methods In this retrospective study, a deep learning based system was developed to automatically segment and quantify the COVID-19 infected lung regions on thick-section chest CT images. 531 thick-section CT scans from 204 patients diagnosed with COVID-19 were collected from one appointed COVID-19 hospital from 23 January 2020 to 12 February 2020. The lung abnormalities were first segmented by a deep learning model. To assess the disease severity (non-severe or severe) and the progression, two imaging bio-markers were automatically computed, i.e., the portion of infection (POI) and the average infection HU (iHU). The performance of lung abnormality segmentation was examined using the Dice coefficient, while the assessments of disease severity and disease progression were evaluated using the area under the receiver operating characteristic curve (AUC) and Cohen’s kappa statistic, respectively. Results The Dice coefficients between the segmentation of the AI system and the manual delineations of two experienced radiologists for the COVID-19 infected lung abnormalities were 0.74±0.28 and 0.76±0.29, respectively, which were close to the inter-observer agreement, i.e., 0.79±0.25. The two computed imaging bio-markers can distinguish between the severe and non-severe stages with an AUC of 0.9680 (p-value < 0.001). Very good agreement (κ = 0.8220) between the AI system and the radiologists was achieved on evaluating the changes of infection volumes. Conclusions A deep learning based AI system built on thick-section CT imaging can accurately quantify the COVID-19 associated lung abnormalities and assess the disease severity and its progression. Key Results A deep learning based AI system was able to accurately segment the lung regions infected by COVID-19 using thick-section CT scans (Dice coefficient ≥ 0.74). The computed imaging bio-markers were able to distinguish between the non-severe and severe COVID-19 stages (area under the receiver operating characteristic curve 0.968). The infection volume changes computed by the AI system were able to assess the COVID-19 progression (Cohen’s kappa 0.8220). Summary Statement A deep learning based AI system built on thick-section CT imaging can accurately quantify the COVID-19 infected lung regions and assess patients’ disease severity and disease progression.
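For readers unfamiliar with the Dice coefficient used to score the segmentation, a minimal sketch follows. It uses the standard definition, Dice = 2|A∩B| / (|A| + |B|), on flattened binary masks. The function name `dice_coefficient` and the two toy masks are hypothetical and are not taken from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two equal-length binary masks (1 = infected voxel)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        # Convention chosen here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy flattened masks: one from a model, one from a manual delineation.
predicted = [0, 1, 1, 1, 0, 0]
manual = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(predicted, manual)
print(round(dice, 3))
```

Here the masks share 2 voxels and contain 3 voxels each, so the Dice coefficient is 2·2/6 ≈ 0.667.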


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1127
Author(s):  
Ji Hyung Nam ◽  
Dong Jun Oh ◽  
Sumin Lee ◽  
Hyun Joo Song ◽  
Yun Jeong Lim

Capsule endoscopy (CE) quality control requires an objective scoring system to evaluate the preparation of the small bowel (SB). We propose a deep learning algorithm to calculate SB cleansing scores and verify the algorithm’s performance. A 5-point scoring system based on clarity of mucosal visualization was used to develop the deep learning algorithm (400,000 frames; 280,000 for training and 120,000 for testing). External validation was performed using additional CE cases (n = 50), and average cleansing scores (1.0 to 5.0) calculated using the algorithm were compared to clinical grades (A to C) assigned by clinicians. Test results obtained using 120,000 frames exhibited 93% accuracy. The separate CE case exhibited substantial agreement between the deep learning algorithm scores and clinicians’ assessments (Cohen’s kappa: 0.672). In the external validation, the cleansing score decreased with worsening clinical grade (scores of 3.9, 3.2, and 2.5 for grades A, B, and C, respectively, p < 0.001). Receiver operating characteristic curve analysis revealed that a cleansing score cut-off of 2.95 indicated clinically adequate preparation. This algorithm provides an objective and automated cleansing score for evaluating SB preparation for CE. The results of this study will serve as clinical evidence supporting the practical use of deep learning algorithms for evaluating SB preparation quality.
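Cohen's kappa, the agreement statistic reported above, corrects the observed agreement between two raters for the agreement expected by chance. The sketch below uses the standard unweighted formula, κ = (p_o − p_e) / (1 − p_e). It assumes both ratings have already been mapped onto the same categorical scale; the grade lists and the function name `cohens_kappa` are hypothetical, and the abstract does not say whether a weighted variant was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # Undefined (division by zero) if both raters always use one identical label.
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades for eight cases.
algorithm = ["A", "A", "B", "B", "C", "C", "A", "B"]
clinician = ["A", "A", "B", "C", "C", "C", "A", "A"]
kappa = cohens_kappa(algorithm, clinician)
print(round(kappa, 3))
```

In this toy case the raters agree on 6 of 8 cases (p_o = 0.75) with chance agreement p_e = 21/64, giving κ = 27/43 ≈ 0.628.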


2020 ◽  
Vol 10 (4) ◽  
pp. 211 ◽  
Author(s):  
Yong Joon Suh ◽  
Jaewon Jung ◽  
Bum-Joo Cho

Mammography plays an important role in screening breast cancer among females, and artificial intelligence has enabled the automated detection of diseases on medical images. This study aimed to develop a deep learning model detecting breast cancer in digital mammograms of various densities and to evaluate the model performance compared to previous studies. From 1501 subjects who underwent digital mammography between February 2007 and May 2015, craniocaudal and mediolateral view mammograms were included and concatenated for each breast, ultimately producing 3002 merged images. Two convolutional neural networks were trained to detect any malignant lesion on the merged images. The performances were tested using 301 merged images from 284 subjects and compared to a meta-analysis including 12 previous deep learning studies. The mean area under the receiver-operating characteristic curve (AUC) for detecting breast cancer in each merged mammogram was 0.952 ± 0.005 by DenseNet-169 and 0.954 ± 0.020 by EfficientNet-B5, respectively. The performance for malignancy detection decreased as breast density increased (density A, mean AUC = 0.984 vs. density D, mean AUC = 0.902 by DenseNet-169). When patients’ age was used as a covariate for malignancy detection, the performance showed little change (mean AUC, 0.953 ± 0.005). The mean sensitivity and specificity of the DenseNet-169 (87 and 88%, respectively) surpassed the mean values (81 and 82%, respectively) obtained in a meta-analysis. Deep learning would work efficiently in screening breast cancer in digital mammograms of various densities, which could be maximized in breasts with lower parenchyma density.


2020 ◽  
Vol 34 (7) ◽  
pp. 717-730 ◽  
Author(s):  
Matthew C. Robinson ◽  
Robert C. Glen ◽  
Alpha A. Lee

Abstract Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that area under the precision–recall curve should be used in conjunction with the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.
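The difference between the two metrics can be seen on a toy, heavily imbalanced ranking. The sketch below is not the authors' experiment: the 100 hypothetical compounds, the positions of the two actives, and the function names are illustrative assumptions. ROC AUC is computed as the fraction of (active, inactive) pairs ranked correctly, and the area under the precision–recall curve is approximated by average precision.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.

    Tied scores are not handled specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# 100 hypothetical compounds ranked by score; only ranks 5 and 10 are active.
scores = [100 - i for i in range(100)]
labels = [1 if i in (4, 9) else 0 for i in range(100)]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(round(auc, 3), round(ap, 3))
```

The ROC AUC here is about 0.94 while the average precision is only 0.2, which illustrates why a high ROC AUC alone can overstate how useful a ranking is when actives are rare.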


2019 ◽  
Author(s):  
Hongyang Li ◽  
Yuanfang Guan

Abstract Sleep arousals are transient periods of wakefulness punctuated into sleep. Excessive sleep arousals are associated with many negative effects including daytime sleepiness and sleep disorders. High-quality annotation of polysomnographic recordings is crucial for the diagnosis of sleep arousal disorders. Currently, sleep arousals are mainly annotated by human experts through looking at millions of data points manually, which requires considerable time and effort. Here we present a deep learning approach, DeepSleep, which ranked first in the 2018 PhysioNet Challenge for automatically segmenting sleep arousal regions based on polysomnographic recordings. DeepSleep features accurate (area under receiver operating characteristic curve of 0.93), high-resolution (5-millisecond resolution), and fast (10 seconds per sleep record) delineation of sleep arousals.


2020 ◽  
Author(s):  
Atsushi Togawa ◽  
Michinobu Yoshimura ◽  
Chiemi Tokushige ◽  
Akira Matsunaga ◽  
Tohru Takata ◽  
...  

Abstract Background Hypervirulent Klebsiella pneumoniae (HVKp) infections have distinct clinical manifestations from classical K. pneumoniae infections. The hallmarks of HVKp infections are liver abscess formation and metastatic infections. Due to the severe sequelae of these complications, a method to identify patients at risk of HVKp infections should be developed. Results A retrospective cohort study of 222 patients with K. pneumoniae bloodstream infections (BSIs) was performed. Patient demographics, clinical manifestations, and bacterial characteristics were investigated. Ten cases of liver abscesses were identified. Characteristics such as community-onset BSIs, hypermucoviscosity phenotype, and capsular serotype K1 were identified as risk factors for HVKp infections. A scoring system was developed based on the risk factors. The area under the receiver operating characteristic curve for the scoring system was 0.90. A score of >2 points provided sensitivity and specificity of 0.70 and 0.94, respectively. Conclusions A simple scoring system was developed for the diagnosis of HVKp infections. The system allows early identification of patients with K. pneumoniae BSIs in whom hypervirulent infections should be evaluated. Prospective evaluation is expected.
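Sensitivity and specificity at a score cut-off, as reported above, can be computed from a simple confusion count. The sketch below is illustrative only: the scores and outcomes are invented, the function name is hypothetical, and the cut-off rule (here "score at or above the cut-off counts as positive") is an assumption passed in as a parameter rather than the study's definition.

```python
def sensitivity_specificity(scores, has_disease, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff is called positive."""
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_disease):
        predicted_positive = score >= cutoff
        if diseased and predicted_positive:
            tp += 1
        elif diseased:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    # Sensitivity: detected fraction of true cases; specificity: cleared fraction of non-cases.
    return tp / (tp + fn), tn / (tn + fp)


# Invented risk scores and outcomes for ten patients (1 = disease present).
scores = [0, 1, 1, 2, 3, 3, 4, 0, 2, 1]
has_disease = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
sens, spec = sensitivity_specificity(scores, has_disease, cutoff=2)
print(sens, spec)
```

Sweeping `cutoff` over all observed score values and plotting sensitivity against 1 − specificity traces out the receiver operating characteristic curve.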


2019 ◽  
Author(s):  
J. Kubach ◽  
A. Muehlebner-Farngruber ◽  
F. Soylemezoglu ◽  
H. Miyata ◽  
P. Niehusmann ◽  
...  

Abstract We trained a convolutional neural network (CNN) to classify H.E. stained microscopic images of focal cortical dysplasia type IIb (FCD IIb) and cortical tuber of tuberous sclerosis complex (TSC). Both entities are distinct subtypes of human malformations of cortical development that share histopathological features consisting of neuronal dyslamination with dysmorphic neurons and balloon cells. The microscopic review of routine stainings of such surgical specimens remains challenging. A digital processing pipeline was developed for a series of 56 FCD IIb and TSC cases to obtain 4000 regions of interest and 200,000 sub-samples with different zoom and rotation angles to train a CNN. Our best performing network achieved 91% accuracy and 0.88 AUCROC (area under the receiver operating characteristic curve) on a hold-out test set. Guided gradient-weighted class activation maps visualized morphological features used by the CNN to distinguish both entities. We then developed a web application, which combined the visualization of whole slide images (WSI) with the possibility for classification between FCD IIb and TSC on demand by our pretrained and built-in CNN classifier. This approach might help to introduce deep learning applications for the histopathologic diagnosis of rare and difficult-to-classify brain lesions.


2020 ◽  
Author(s):  
Atsushi Togawa ◽  
Michinobu Yoshimura ◽  
Chiemi Tokushige ◽  
Akira Matsunaga ◽  
Tohru Takata ◽  
...  

Abstract Background Hypervirulent Klebsiella pneumoniae (HVKp) infections have distinct clinical manifestations from classical K. pneumoniae infections. The hallmarks of HVKp infections are liver abscess formation and metastatic infections. Due to the severe sequelae of these complications, a method to identify patients at risk of HVKp infections should be developed. Results A retrospective cohort study of 222 patients with K. pneumoniae bloodstream infections (BSIs) was performed. Patient demographics, clinical manifestations, and bacterial characteristics were investigated. Ten cases of liver abscesses were identified. Characteristics such as community-onset BSIs, hypermucoviscosity phenotype, and capsular serotype K1 were identified as risk factors for HVKp infections. A scoring system was developed based on the risk factors. The area under the receiver operating characteristic curve for the scoring system was 0.90. A score of ≥ 2 points provided sensitivity and specificity of 0.70 and 0.94, respectively. Conclusions A simple scoring system was developed for the diagnosis of HVKp infections. The system allows early identification of patients with K. pneumoniae BSIs in whom hypervirulent infections should be evaluated. Prospective evaluation is expected.


2020 ◽  
pp. 221-233
Author(s):  
Yijiang Chen ◽  
Andrew Janowczyk ◽  
Anant Madabhushi

PURPOSE Deep learning (DL), a class of approaches involving self-learned discriminative features, is increasingly being applied to digital pathology (DP) images for tasks such as disease identification and segmentation of tissue primitives (eg, nuclei, glands, lymphocytes). One application of DP is in telepathology, which involves digitally transmitting DP slides over the Internet for secondary diagnosis by an expert at a remote location. Unfortunately, the places benefiting most from telepathology often have poor Internet quality, resulting in prohibitive transmission times of DP images. Image compression may help, but the degree to which image compression affects performance of DL algorithms has been largely unexplored. METHODS We investigated the effects of image compression on the performance of DL strategies in the context of 3 representative use cases involving segmentation of nuclei (n = 137), segmentation of lymph node metastasis (n = 380), and lymphocyte detection (n = 100). For each use case, test images at various levels of compression (JPEG compression quality score ranging from 1-100 and JPEG2000 compression peak signal-to-noise ratio ranging from 18-100 dB) were evaluated by a DL classifier. Performance metrics including F1 score and area under the receiver operating characteristic curve were computed at the various compression levels. RESULTS Our results suggest that DP images can be compressed by 85% while still maintaining the performance of the DL algorithms at 95% of what is achievable without any compression. Interestingly, the maximum compression level sustainable by DL algorithms is similar to where pathologists also reported difficulties in providing accurate interpretations. CONCLUSION Our findings seem to suggest that in low-resource settings, DP images can be significantly compressed before transmission for DL-based telepathology applications.


Author(s):  
Yu Zhang ◽  
Cangzhi Jia ◽  
Chee Keong Kwoh

Abstract Long noncoding RNAs (lncRNAs) play significant roles in various physiological and pathological processes via their interactions with biomolecules like DNA, RNA and protein. The existing in silico methods used for predicting the functions of lncRNA mainly rely on calculating the similarity of lncRNA or investigating whether an lncRNA can interact with a specific biomolecule or disease. In this work, we explored the functions of lncRNA from a different perspective: we presented a tool for predicting the interaction biomolecule type for a given lncRNA. For this purpose, we first investigated the main molecular mechanisms of the interactions of lncRNA–RNA, lncRNA–protein and lncRNA–DNA. Then, we developed an ensemble deep learning model: lncIBTP (lncRNA Interaction Biomolecule Type Prediction). This model predicted the interactions between lncRNA and different types of biomolecules. On the 5-fold cross-validation, the lncIBTP achieves average values of 0.7042 in accuracy, 0.7903 and 0.6421 in macro-average area under receiver operating characteristic curve and precision–recall curve, respectively, which illustrates the model effectiveness. Besides, based on the analysis of the collected published data and prediction results, we hypothesized that the characteristics of lncRNAs that interacted with DNA may be different from those that interacted with only RNA.

