Weakly-supervised deep learning for ultrasound diagnosis of breast cancer

Abstract: Conventional deep learning (DL) algorithms require full supervision in the form of region of interest (ROI) annotation, which is laborious and often biased. We aimed to develop a weakly-supervised DL algorithm that diagnoses breast cancer at ultrasound (US) without image annotation. Weakly-supervised DL algorithms were implemented with three networks (VGG16, ResNet34, and GoogLeNet) and trained using 1000 unannotated US images (500 benign and 500 malignant masses). Two sets of 200 images (100 benign and 100 malignant masses) were used as the internal and external validation sets. For comparison with fully-supervised algorithms, ROI annotation was performed manually and automatically. Diagnostic performance was calculated as the area under the receiver operating characteristic curve (AUC). Using the class activation map, we determined how accurately the weakly-supervised DL algorithms localized the breast masses. For the internal validation set, the weakly-supervised DL algorithms achieved excellent diagnostic performance, with AUC values of 0.92–0.96, which were not statistically different (all Ps > 0.05) from those of fully-supervised DL algorithms with either manual or automated ROI annotation (AUC, 0.92–0.96). For the external validation set, the weakly-supervised DL algorithms achieved AUC values of 0.86–0.90, which were either not statistically different from (Ps > 0.05) or higher than (P = 0.04, VGG16 with automated ROI annotation) those of fully-supervised DL algorithms (AUC, 0.84–0.92). In the internal and external validation sets, the weakly-supervised algorithms localized 100% of malignant masses, except for ResNet34 (98%). The weakly-supervised DL algorithms developed in the present study were feasible for US diagnosis of breast cancer, performing well in both localization and differential diagnosis.
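AUC is the headline metric in this abstract and in most of the abstracts listed on this page. As a rough illustration only, the sketch below computes AUC from labels and scores using the pairwise (Mann-Whitney) formulation. The function name `roc_auc` and the toy data are hypothetical; the abstract does not state how its AUC values were computed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 0 (benign) or 1 (malignant); scores: model outputs, where a
    higher score means more suspicious. Tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0  # malignant case ranked above benign case
            elif p == n:
                concordant += 0.5  # tie
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of cases, which is acceptable for validation sets of a few hundred images; a sort-based rank computation would scale better.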

Deep Learning-Based Breast Cancer Diagnosis at Ultrasound: Initial Application of Weakly-Supervised Algorithm Without Image Annotation Original Research

10.21203/rs.3.rs-579221/v1 ◽ 2021

Author(s): Jaeil Kim ◽ Hye Jung Kim ◽ Chanho Kim ◽ Jin Hwa Lee ◽ Keum Won Kim ◽ ...

Keyword(s): Breast Cancer ◽ Deep Learning ◽ Image Annotation ◽ Characteristic Curve ◽ External Validation ◽ Region Of Interest ◽ Breast Cancer Diagnosis ◽ Original Research ◽ Internal Validation ◽ Weakly Supervised

Development and Verification of a Deep Learning Algorithm to Evaluate Small-Bowel Preparation Quality

Diagnostics ◽ 10.3390/diagnostics11061127 ◽ 2021 ◽ Vol 11 (6) ◽ pp. 1127

Author(s): Ji Hyung Nam ◽ Dong Jun Oh ◽ Sumin Lee ◽ Hyun Joo Song ◽ Yun Jeong Lim

Keyword(s): Deep Learning ◽ Small Bowel ◽ Scoring System ◽ Operating Characteristic ◽ Clinical Evidence ◽ Learning Algorithm ◽ Characteristic Curve ◽ External Validation ◽ Test Results ◽ Deep Learning Algorithm

Capsule endoscopy (CE) quality control requires an objective scoring system to evaluate the preparation of the small bowel (SB). We propose a deep learning algorithm to calculate SB cleansing scores and verify the algorithm’s performance. A 5-point scoring system based on clarity of mucosal visualization was used to develop the deep learning algorithm (400,000 frames; 280,000 for training and 120,000 for testing). External validation was performed using additional CE cases (n = 50), and average cleansing scores (1.0 to 5.0) calculated using the algorithm were compared to clinical grades (A to C) assigned by clinicians. Test results obtained using 120,000 frames exhibited 93% accuracy. The separate CE case exhibited substantial agreement between the deep learning algorithm scores and clinicians’ assessments (Cohen’s kappa: 0.672). In the external validation, the cleansing score decreased with worsening clinical grade (scores of 3.9, 3.2, and 2.5 for grades A, B, and C, respectively, p < 0.001). Receiver operating characteristic curve analysis revealed that a cleansing score cut-off of 2.95 indicated clinically adequate preparation. This algorithm provides an objective and automated cleansing score for evaluating SB preparation for CE. The results of this study will serve as clinical evidence supporting the practical use of deep learning algorithms for evaluating SB preparation quality.
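The agreement statistic quoted above, Cohen's kappa, corrects raw agreement for the agreement expected by chance. The sketch below is a minimal, unweighted version. The function name `cohens_kappa` and the toy grades are hypothetical, and the abstract does not say whether the study used a weighted or unweighted kappa or how algorithm scores were mapped to grades.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades (A to C) from an algorithm and a clinician.
algorithm_grades = ["A", "A", "B", "B", "C", "C"]
clinician_grades = ["A", "A", "B", "C", "C", "C"]
print(round(cohens_kappa(algorithm_grades, clinician_grades), 3))
```

Because the grades are ordinal, a weighted kappa that penalizes A-versus-C disagreements more than A-versus-B ones may be more appropriate in practice.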

Automated Breast Cancer Detection in Digital Mammograms of Various Densities via Deep Learning

Journal of Personalized Medicine ◽ 10.3390/jpm10040211 ◽ 2020 ◽ Vol 10 (4) ◽ pp. 211 ◽ Cited By ~ 1

Author(s): Yong Joon Suh ◽ Jaewon Jung ◽ Bum-Joo Cho

Keyword(s): Breast Cancer ◽ Deep Learning ◽ Operating Characteristic ◽ Meta Analysis ◽ Characteristic Curve ◽ Malignant Lesion ◽ Model Performance ◽ Mean Values ◽ The Mean ◽ Deep Learning Model

Mammography plays an important role in screening for breast cancer in women, and artificial intelligence has enabled the automated detection of diseases on medical images. This study aimed to develop a deep learning model for detecting breast cancer in digital mammograms of various densities and to evaluate the model's performance compared to previous studies. From 1501 subjects who underwent digital mammography between February 2007 and May 2015, craniocaudal and mediolateral view mammograms were included and concatenated for each breast, ultimately producing 3002 merged images. Two convolutional neural networks were trained to detect any malignant lesion on the merged images. Performance was tested using 301 merged images from 284 subjects and compared to a meta-analysis including 12 previous deep learning studies. The mean area under the receiver operating characteristic curve (AUC) for detecting breast cancer in each merged mammogram was 0.952 ± 0.005 for DenseNet-169 and 0.954 ± 0.020 for EfficientNet-B5. The performance for malignancy detection decreased as breast density increased (density A, mean AUC = 0.984 vs. density D, mean AUC = 0.902 by DenseNet-169). When patients' age was used as a covariate for malignancy detection, the performance showed little change (mean AUC, 0.953 ± 0.005). The mean sensitivity and specificity of DenseNet-169 (87% and 88%, respectively) surpassed the mean values (81% and 82%, respectively) obtained in the meta-analysis. Deep learning could work efficiently for breast cancer screening in digital mammograms of various densities, with the greatest benefit in breasts with lower parenchymal density.
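Sensitivity and specificity, as compared against the meta-analysis above, follow directly from the confusion counts once a decision threshold is fixed. The following is a minimal sketch; the function name, the 0.5 threshold, and the toy scores are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions.

    Both inputs use 1 for malignant and 0 for non-malignant.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(labels, predictions):
        if truth == 1 and predicted == 1:
            tp += 1  # malignant, correctly flagged
        elif truth == 1:
            fn += 1  # malignant, missed
        elif predicted == 0:
            tn += 1  # non-malignant, correctly cleared
        else:
            fp += 1  # non-malignant, wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model scores, thresholded at an assumed cut-off of 0.5.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1, 0.4, 0.7]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]
print(sensitivity_specificity(toy_labels, toy_predictions))
```

Unlike AUC, both values depend on the chosen threshold, so comparisons across studies are only meaningful when the operating points are comparable.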

Deep Learning Model for Screening Sepsis Using Electrocardiography

10.21203/rs.3.rs-186976/v1 ◽ 2021

Author(s): Joon-myoung Kwon ◽ Ye Rang Lee ◽ Min-Seung Jung ◽ Yoon-Ji Lee ◽ Yong-Yeon Jo ◽ ...

Keyword(s): Septic Shock ◽ Deep Learning ◽ Characteristic Curve ◽ External Validation ◽ Medical Emergency ◽ Validation Dataset ◽ Internal Validation ◽ Significant Difference ◽ Life Threatening ◽ Sepsis And Septic Shock

Abstract Background: Sepsis is a life-threatening organ dysfunction and a major healthcare burden worldwide. Although sepsis is a medical emergency that requires immediate management, it is difficult to screen for. In this study, we propose a deep learning-based model (DLM) for screening sepsis using electrocardiography (ECG). Methods: This retrospective cohort study included 46,017 patients who were admitted to two hospitals. A total of 1,548 and 639 patients had sepsis and septic shock, respectively. The DLM was developed using 73,727 ECGs of 18,142 patients, and internal validation was conducted using 7,774 ECGs of 7,774 patients. Furthermore, we conducted an external validation with 20,101 ECGs of 20,101 patients from another hospital to verify the applicability of the DLM across centers. Results: During the internal and external validation, the areas under the receiver operating characteristic curve (AUCs) of the DLM using 12-lead ECG for screening sepsis were 0.901 (95% confidence interval [CI], 0.882–0.920) and 0.863 (0.846–0.879), respectively. During internal and external validation, the AUCs of the DLM for detecting septic shock were 0.906 (95% CI, 0.877–0.936) and 0.899 (95% CI, 0.872–0.925), respectively. The AUCs of the DLM for detecting sepsis using 6-lead and single-lead ECGs were 0.845–0.882. A sensitivity map showed that the QRS complex and T wave were associated with sepsis. Subgroup analysis was conducted using ECGs from 4,609 patients who were admitted with infectious disease; the AUC of the DLM for predicting in-hospital mortality was 0.817 (0.793–0.840). There was a significant difference in the prediction score of the DLM using ECG according to the presence of infection in the validation dataset (0.277 vs. 0.574, p < 0.001), including severe acute respiratory syndrome coronavirus 2 (0.260 vs. 0.725, p = 0.018). Conclusions: The DLM demonstrated reasonable performance for screening sepsis using 12-, 6-, and single-lead ECGs. The results suggest that sepsis can be screened using not only conventional ECG devices but also diverse life-type ECG machines employing the DLM, thereby preventing irreversible disease progression and mortality.

A new deep learning algorithm of 12-lead electrocardiogram for identifying atrial fibrillation during sinus rhythm

Scientific Reports ◽ 10.1038/s41598-021-92172-5 ◽ 2021 ◽ Vol 11 (1)

Author(s): Yong-Soo Baek ◽ Sang-Chul Lee ◽ Wonik Choi ◽ Dae-Hyeok Kim

Keyword(s): Atrial Fibrillation ◽ Deep Learning ◽ Sinus Rhythm ◽ Learning Algorithm ◽ Normal Sinus Rhythm ◽ Characteristic Curve ◽ External Validation ◽ Digital Data ◽ Internal Validation ◽ Optimal Interval

Abstract: Atrial fibrillation (AF) is the most prevalent arrhythmia and is associated with increased morbidity and mortality. Its early detection is challenging because of the low detection yield of conventional methods. We aimed to develop a deep learning-based algorithm to identify AF during normal sinus rhythm (NSR) using 12-lead electrocardiogram (ECG) findings. We developed a new deep neural network to detect subtle differences in paroxysmal AF (PAF) during NSR using digital data from standard 12-lead ECGs. Raw digital data of 2,412 12-lead ECGs were analyzed. The artificial intelligence (AI) model showed that the optimal interval to detect subtle changes in PAF was within 0.24 s before the QRS complex in the 12-lead ECG. We allocated the enrolled ECGs to the training, internal validation, and testing datasets in a 7:1:2 ratio. Regarding AF identification, the AI-based algorithm showed the following values in the internal and external validation datasets: area under the receiver operating characteristic curve, 0.79 and 0.75; recall, 82% and 77%; specificity, 78% and 72%; F1 score, 75% and 74%; and overall accuracy, 72.8% and 71.2%, respectively. The deep learning-based algorithm using 12-lead ECG demonstrated high accuracy for detecting AF during NSR.
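The 7:1:2 allocation mentioned above can be sketched as a shuffled split. This is only an illustration under stated assumptions: the function name `split_7_1_2`, the fixed seed, and the record count are hypothetical, and the abstract does not say whether the split was made per ECG or per patient.

```python
import random


def split_7_1_2(items, seed=0):
    """Shuffle items and split them into training, validation, and test lists.

    The split is approximately 70% / 10% / 20%; any remainder from integer
    division goes to the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = (7 * n) // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical record identifiers; the count is illustrative only.
train, val, test = split_7_1_2(range(1000))
print(len(train), len(val), len(test))
```

When several ECGs come from the same patient, splitting by patient identifier rather than by recording avoids leaking a patient's data across the three sets.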

Automated differentiation of malignant and benign primary solid liver lesions on MRI: an externally validated radiomics model

10.1101/2021.08.10.21261827 ◽ 2021

Author(s): Martijn P.A. Starmans ◽ Razvan L. Miclea ◽ Valerie Vilgrain ◽ Maxime Ronot ◽ Yvonne Purcell ◽ ...

Keyword(s): Operating Characteristic ◽ Cross Validation ◽ Characteristic Curve ◽ External Validation ◽ Hepatocellular Adenoma ◽ Learning Approaches ◽ Liver Lesions ◽ Internal Validation ◽ Nodular Hyperplasia ◽ Magnetic Resonance Imaging Mri

Background & Aims: Distinguishing malignant from benign primary solid liver lesions is highly important for treatment planning. However, diagnosis on radiological imaging is challenging. In this study, we developed a radiomics model based on magnetic resonance imaging (MRI) to distinguish the most common malignant and benign primary solid liver lesions, and externally validated the model in two centers. Approach & Results: Datasets were retrospectively collected from three tertiary referral centers (A, B and C), including data from affiliated hospitals sent for revision. Patients with malignant (hepatocellular carcinoma and intrahepatic cholangiocarcinoma) and benign (hepatocellular adenoma and focal nodular hyperplasia) lesions were included. For each patient, only a T2-weighted MRI was included. A radiomics model was developed on dataset A using a combination of machine learning approaches, and internally evaluated on dataset A through cross-validation. Next, the model was externally validated on datasets B and C, and compared to scoring by two experienced abdominal radiologists on dataset C. In total, 486 patients were included (A: 187, B: 98 and C: 201). Despite substantial MRI acquisition heterogeneity, the radiomics model developed on dataset A had a mean area under the receiver operating characteristic curve (AUC) of 0.78 in the internal validation on dataset A, and a similar AUC in the external validations (B: 0.74, C: 0.76). In dataset C, the two radiologists showed moderate agreement (Cohen's κ: 0.61) and achieved AUCs of 0.86 and 0.82, respectively. Conclusions: Our radiomics model, using T2-weighted MRI only, can non-invasively distinguish malignant from benign primary solid liver lesions. External validation indicated that our model is generalizable despite substantial differences in the acquisition protocols.

Predicting Sex from Retinal Fundus Photographs Using Automated Deep Learning

10.21203/rs.3.rs-402433/v1 ◽ 2021

Author(s): Edward Korot ◽ Nikolas Pontikos ◽ Xiaoxuan Liu ◽ Siegfried K Wagner ◽ Livia Faes ◽ ...

Keyword(s): Deep Learning ◽ Characteristic Curve ◽ External Validation ◽ Model Development ◽ Model Performance ◽ Validation Dataset ◽ Internal Validation ◽ Fundus Photographs ◽ The Uk ◽ Retinal Fundus

Abstract Deep learning may transform health care, but model development has largely been dependent on the availability of advanced technical expertise. Herein we present the development of a deep learning model by clinicians without coding, which predicts reported sex from retinal fundus photographs. A model was trained on 84,743 retinal fundus photos from the UK Biobank dataset. External validation was performed on 252 fundus photos from a tertiary ophthalmic referral center. For internal validation, the area under the receiver operating characteristic curve (AUROC) of the code-free deep learning (CFDL) model was 0.93, and sensitivity, specificity, positive predictive value (PPV) and accuracy (ACC) were 88.8%, 83.6%, 87.3% and 86.5%, respectively; for external validation, they were 83.9%, 72.2%, 78.2% and 78.6%, respectively. Clinicians are currently unaware of distinct retinal feature variations between males and females, highlighting the importance of model explainability for this task. The model performed significantly worse when foveal pathology was present in the external validation dataset (ACC, 69.4%, compared to 85.4% in healthy eyes; OR [95% CI], 0.36 [0.19, 0.70]; p = 0.0022), suggesting that the fovea is a salient region for model performance. Automated machine learning (AutoML) may enable clinician-driven automated discovery of novel insights and disease biomarkers.

Deep-learning model for screening sepsis using electrocardiography

Scandinavian Journal of Trauma Resuscitation and Emergency Medicine ◽ 10.1186/s13049-021-00953-8 ◽ 2021 ◽ Vol 29 (1)

Author(s): Joon-myoung Kwon ◽ Ye Rang Lee ◽ Min-Seung Jung ◽ Yoon-Ji Lee ◽ Yong-Yeon Jo ◽ ...

Keyword(s): Septic Shock ◽ Deep Learning ◽ Confidence Interval ◽ Characteristic Curve ◽ External Validation ◽ Medical Emergency ◽ Validation Dataset ◽ Internal Validation ◽ Significant Difference ◽ Sepsis And Septic Shock

Abstract Background Sepsis is a life-threatening organ dysfunction and a major healthcare burden worldwide. Although sepsis is a medical emergency that requires immediate management, screening for the occurrence of sepsis is difficult. Herein, we propose a deep learning-based model (DLM) for screening sepsis using electrocardiography (ECG). Methods This retrospective cohort study included 46,017 patients who were admitted to two hospitals. A total of 1,548 and 639 patients had sepsis and septic shock, respectively. The DLM was developed using 73,727 ECGs from 18,142 patients, and internal validation was conducted using 7,774 ECGs from 7,774 patients. Furthermore, we conducted an external validation with 20,101 ECGs from 20,101 patients from another hospital to verify the applicability of the DLM across centers. Results During the internal and external validations, the area under the receiver operating characteristic curve (AUC) of the DLM using 12-lead ECG was 0.901 (95% confidence interval, 0.882–0.920) and 0.863 (0.846–0.879), respectively, for screening sepsis and 0.906 (95% confidence interval (CI), 0.877–0.936) and 0.899 (95% CI, 0.872–0.925), respectively, for detecting septic shock. The AUC of the DLM for detecting sepsis using 6-lead and single-lead ECGs was 0.845–0.882. A sensitivity map revealed that the QRS complex and T waves were associated with sepsis. Subgroup analysis was conducted using ECGs from 4,609 patients who were admitted with an infectious disease, and the AUC of the DLM for predicting in-hospital mortality was 0.817 (0.793–0.840). There was a significant difference in the prediction score of DLM using ECG according to the presence of infection in the validation dataset (0.277 vs. 0.574, p < 0.001), including severe acute respiratory syndrome coronavirus 2 (0.260 vs. 0.725, p = 0.018). Conclusions The DLM delivered reasonable performance for sepsis screening using 12-, 6-, and single-lead ECGs. The results suggest that sepsis can be screened using not only conventional ECG devices but also diverse life-type ECG machines employing the DLM, thereby preventing irreversible disease progression and mortality.

Deep Learning Model for Detection of Hypoalbuminemia Using Electrocardiography

10.20944/preprints202101.0408.v1 ◽ 2021

Author(s): Joon-myoung Kwon ◽ Soo Youn Lee ◽ Yoon-Ji Lee ◽ Yong-Yeon Jo ◽ Min-Seung Jung ◽ ...

Keyword(s): Deep Learning ◽ Characteristic Curve ◽ External Validation ◽ Patient Data ◽ Albumin Concentration ◽ Historical Cohort ◽ Internal Validation ◽ Patient Deterioration ◽ Risk Patients ◽ Deep Learning Model

Background: Albumin has a pivotal role in the homeostasis of osmotic pressure and is associated with cardiovascular, nephrotic, hepatic, and nutritional diseases. The detection of hypoalbuminemia is a cornerstone for the diagnosis of hidden diseases and patient deterioration. We developed and validated a deep-learning-based model (DLM) for detection of hypoalbuminemia using electrocardiography (ECG). Methods: This historical cohort study included data from consecutive patients from two hospitals. The patient data from one hospital were divided into development (82,499 ECGs from 54,248 patients) and internal validation (20,664 ECGs from 20,664 patients) datasets, whereas the patient data from the other hospital were used only as an external validation dataset (37,421 ECGs from 37,421 patients). A DLM was developed using a 12-lead ECG signal, age, and sex from the development dataset. The endpoint was hypoalbuminemia, defined as a serum albumin concentration below 3.5 g/dL. Results: During the internal and external validations, the areas under the receiver operating characteristic curve of the DLM for the detection of hypoalbuminemia were 0.887 (0.877–0.897) and 0.888 (0.880–0.896), respectively. Among the 27,400 individuals without hypoalbuminemia at the initial laboratory exam, those identified by the DLM as higher-risk patients had a significantly larger chance of developing hypoalbuminemia than those in the low-risk group (7.09% vs. 1.01%, p < 0.001) during 24 months. The sensitivity map showed that the DLM focused on the T wave and QRS complex for the detection of hypoalbuminemia. Conclusions: The DLM exhibited high accuracy for hypoalbuminemia detection and prediction using 12-, 6-, and single-lead ECGs.

Ensembled deep learning model outperforms human experts in diagnosing biliary atresia from sonographic gallbladder images

Nature Communications ◽ 10.1038/s41467-021-21466-z ◽ 2021 ◽ Vol 12 (1)

Author(s): Wenying Zhou ◽ Yang Yang ◽ Cheng Yu ◽ Juxian Liu ◽ Xingxing Duan ◽ ...

Keyword(s): Deep Learning ◽ Biliary Atresia ◽ Operating Characteristic ◽ Characteristic Curve ◽ External Validation ◽ Learning Model ◽ Video Sequences ◽ Validation Dataset ◽ Patient Level ◽ Deep Learning Model

Abstract: It is still challenging to make an accurate diagnosis of biliary atresia (BA) from sonographic gallbladder images, particularly in rural areas without relevant expertise. To help diagnose BA based on sonographic gallbladder images, an ensembled deep learning model was developed. The model yields a patient-level sensitivity of 93.1% and specificity of 93.9% [with an area under the receiver operating characteristic curve of 0.956 (95% confidence interval: 0.928–0.977)] on the multi-center external validation dataset, superior to those of human experts. With the help of the model, the performance of human experts at various levels of experience is improved. Moreover, diagnosis by the model based on smartphone photos of sonographic gallbladder images taken through a smartphone app, and based on video sequences, still yields expert-level performance. The ensembled deep learning model in this study provides a solution to help radiologists improve the diagnosis of BA in various clinical application scenarios, particularly in rural and undeveloped regions with limited expertise.
