A new deep learning algorithm of 12-lead electrocardiogram for identifying atrial fibrillation during sinus rhythm

Abstract Atrial fibrillation (AF) is the most prevalent arrhythmia and is associated with increased morbidity and mortality. Its early detection is challenging because of the low detection yield of conventional methods. We aimed to develop a deep learning-based algorithm to identify AF during normal sinus rhythm (NSR) using 12-lead electrocardiogram (ECG) findings. We developed a new deep neural network to detect subtle differences in paroxysmal AF (PAF) during NSR using digital data from standard 12-lead ECGs. Raw digital data of 2,412 12-lead ECGs were analyzed. The artificial intelligence (AI) model showed that the optimal interval to detect subtle changes in PAF was within 0.24 s before the QRS complex in the 12-lead ECG. We allocated the enrolled ECGs to the training, internal validation, and testing datasets in a 7:1:2 ratio. Regarding AF identification, the AI-based algorithm showed the following values in the internal and external validation datasets: area under the receiver operating characteristic curve, 0.79 and 0.75; recall, 82% and 77%; specificity, 78% and 72%; F1 score, 75% and 74%; and overall accuracy, 72.8% and 71.2%, respectively. The deep learning-based algorithm using 12-lead ECG demonstrated high accuracy for detecting AF during NSR.
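The 7:1:2 allocation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain seeded random shuffle of record identifiers; the abstract does not say whether the split was stratified by label or grouped by patient, and the function name `split_7_1_2` and the seed are hypothetical.

```python
import random


def split_7_1_2(record_ids, seed=0):
    """Shuffle record IDs and split them roughly 70% / 10% / 20%.

    Assumption: a plain random split. The source does not state whether
    the split was stratified or grouped by patient.
    """
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 7 // 10  # integer arithmetic avoids float rounding
    n_val = len(ids) // 10
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder, roughly 20%
    return train, val, test


# 2,412 is the number of ECGs reported in the abstract; the IDs are placeholders.
train, val, test = split_7_1_2(range(2412))
print(len(train), len(val), len(test))
```

In practice, when one patient contributes several ECGs, splitting by patient rather than by recording avoids leakage between the datasets; whether the authors did so is not stated.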

Development and Verification of a Deep Learning Algorithm to Evaluate Small-Bowel Preparation Quality

Diagnostics ◽

10.3390/diagnostics11061127 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1127

Author(s):

Ji Hyung Nam ◽

Dong Jun Oh ◽

Sumin Lee ◽

Hyun Joo Song ◽

Yun Jeong Lim

Keyword(s):

Deep Learning ◽

Small Bowel ◽

Scoring System ◽

Operating Characteristic ◽

Clinical Evidence ◽

Learning Algorithm ◽

Characteristic Curve ◽

External Validation ◽

Test Results ◽

Deep Learning Algorithm

Capsule endoscopy (CE) quality control requires an objective scoring system to evaluate the preparation of the small bowel (SB). We propose a deep learning algorithm to calculate SB cleansing scores and verify the algorithm’s performance. A 5-point scoring system based on clarity of mucosal visualization was used to develop the deep learning algorithm (400,000 frames; 280,000 for training and 120,000 for testing). External validation was performed using additional CE cases (n = 50), and average cleansing scores (1.0 to 5.0) calculated using the algorithm were compared to clinical grades (A to C) assigned by clinicians. Test results obtained using 120,000 frames exhibited 93% accuracy. The separate CE case exhibited substantial agreement between the deep learning algorithm scores and clinicians’ assessments (Cohen’s kappa: 0.672). In the external validation, the cleansing score decreased with worsening clinical grade (scores of 3.9, 3.2, and 2.5 for grades A, B, and C, respectively, p < 0.001). Receiver operating characteristic curve analysis revealed that a cleansing score cut-off of 2.95 indicated clinically adequate preparation. This algorithm provides an objective and automated cleansing score for evaluating SB preparation for CE. The results of this study will serve as clinical evidence supporting the practical use of deep learning algorithms for evaluating SB preparation quality.
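Agreement between the algorithm and the clinicians is reported as Cohen's kappa. A minimal sketch of the unweighted statistic follows; the abstract does not say whether a weighted variant was used, and the grade lists below are hypothetical, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades for illustration only.
algorithm = ["A", "A", "B", "B", "C", "C", "A", "B"]
clinician = ["A", "A", "B", "C", "C", "C", "A", "A"]
print(round(cohens_kappa(algorithm, clinician), 3))
```

By the commonly used Landis and Koch convention, values between 0.61 and 0.80 are read as substantial agreement, which matches the wording used for the reported 0.672.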

Deep Learning Model for Screening Sepsis Using Electrocardiography

10.21203/rs.3.rs-186976/v1 ◽

2021 ◽

Author(s):

Joon-myoung Kwon ◽

Ye Rang Lee ◽

Min-Seung Jung ◽

Yoon-Ji Lee ◽

Yong-Yeon Jo ◽

...

Keyword(s):

Septic Shock ◽

Deep Learning ◽

Characteristic Curve ◽

External Validation ◽

Medical Emergency ◽

Validation Dataset ◽

Internal Validation ◽

Significant Difference ◽

Life Threatening ◽

Sepsis And Septic Shock

Abstract Background: Sepsis is a life-threatening organ dysfunction and a major healthcare burden worldwide. Although sepsis is a medical emergency that requires immediate management, screening for sepsis is difficult. In this study, we propose a deep learning-based model (DLM) for screening sepsis using electrocardiography (ECG). Methods: This retrospective cohort study included 46,017 patients who were admitted to two hospitals. A total of 1,548 and 639 patients had sepsis and septic shock, respectively. The DLM was developed using 73,727 ECGs of 18,142 patients, and internal validation was conducted using 7,774 ECGs of 7,774 patients. Furthermore, we conducted an external validation with 20,101 ECGs of 20,101 patients from another hospital to verify the applicability of the DLM across centers. Results: During the internal and external validation, the areas under the receiver operating characteristic curve (AUCs) of the DLM using 12-lead ECG for screening sepsis were 0.901 (95% confidence interval [CI], 0.882–0.920) and 0.863 (0.846–0.879), respectively. During internal and external validation, the AUCs of the DLM for detecting septic shock were 0.906 (95% CI, 0.877–0.936) and 0.899 (95% CI, 0.872–0.925), respectively. The AUCs of the DLM for detecting sepsis using 6-lead and single-lead ECGs were 0.845–0.882. A sensitivity map showed that the QRS complex and T wave were associated with sepsis. Subgroup analysis was conducted using ECGs from 4,609 patients who were admitted with infectious disease; the AUC of the DLM for predicting in-hospital mortality was 0.817 (0.793–0.840). There was a significant difference in the prediction score of the DLM using ECG according to the presence of infection in the validation dataset (0.277 vs 0.574, p<0.001), including severe acute respiratory syndrome coronavirus 2 (0.260 vs 0.725, p=0.018). Conclusions: The DLM demonstrated reasonable performance for screening sepsis using 12-, 6-, and single-lead ECGs. The results suggest that sepsis can be screened using not only conventional ECG devices but also diverse life-type ECG machines employing the DLM, thereby preventing irreversible disease progression and mortality.
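The AUC values reported above have a direct probabilistic reading: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical, and the authors' confidence interval method is not stated in the abstract, so none is shown.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. The double loop is
    O(n_pos * n_neg), which is fine for small illustrations.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for illustration only (1 = sepsis, 0 = no sepsis).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.4]
print(roc_auc(labels, scores))
```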

Deep Learning-Based Breast Cancer Diagnosis at Ultrasound: Initial Application of Weakly-Supervised Algorithm Without Image Annotation Original Research

10.21203/rs.3.rs-579221/v1 ◽

2021 ◽

Author(s):

Jaeil Kim ◽

Hye Jung Kim ◽

Chanho Kim ◽

Jin Hwa Lee ◽

Keum Won Kim ◽

...

Keyword(s):

Breast Cancer ◽

Deep Learning ◽

Image Annotation ◽

Characteristic Curve ◽

External Validation ◽

Region Of Interest ◽

Breast Cancer Diagnosis ◽

Original Research ◽

Internal Validation ◽

Weakly Supervised

Abstract Conventional deep learning (DL) algorithms require full supervision through annotation of the region of interest (ROI), which is laborious and often biased. We aimed to develop a weakly-supervised DL algorithm that diagnoses breast cancer at ultrasound (US) without image annotation. Weakly-supervised DL algorithms were implemented with three networks (VGG16, ResNet34, and GoogLeNet) and trained using 1000 unannotated US images (500 benign and 500 malignant masses). Two sets of 200 images (100 benign and 100 malignant masses) were used as internal and external validation sets. For comparison with fully-supervised algorithms, ROI annotation was performed manually and automatically. Diagnostic performances were calculated as the area under the receiver operating characteristic curve (AUC). Using the class activation map, we determined how accurately the weakly-supervised DL algorithms localized the breast masses. For internal validation sets, the weakly-supervised DL algorithms achieved excellent diagnostic performances, with AUC values of 0.92–0.96, which were not statistically different (all Ps > 0.05) from those of fully-supervised DL algorithms with either manual or automated ROI annotation (AUC, 0.92–0.96). For external validation sets, the weakly-supervised DL algorithms achieved AUC values of 0.86–0.90, which were not statistically different from (Ps > 0.05) or were higher than (P = 0.04, VGG16 with automated ROI annotation) those of fully-supervised DL algorithms (AUC, 0.84–0.92). In internal and external validation sets, weakly-supervised algorithms could localize 100% of malignant masses, except for ResNet34 (98%). The weakly-supervised DL algorithms developed in the present study were feasible for US diagnosis of breast cancer with well-performing localization and differential diagnosis.

Prediction of atrial fibrillation from normal ECG using artificial intelligence in patients with unexplained stroke

European Heart Journal ◽

10.1093/ehjci/ehaa946.0348 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

Y.S Baek ◽

S.C Lee ◽

W.I Choi ◽

D.H Kim

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Atrial Fibrillation ◽

Sinus Rhythm ◽

Normal Sinus Rhythm ◽

Statistical Significance ◽

Validation Dataset ◽

Internal Validation ◽

Normal Sinus ◽

Group 2

Abstract Background Embolic strokes of undetermined source constitute 20 to 30% of ischemic strokes. Many of these strokes are related to atrial fibrillation (AF), which might be underdetected due to its paroxysmal and silent nature. Purpose The aim of our study was to train a deep neural network to predict AF during normal sinus rhythm from a standard 12-lead ECG in patients with unexplained stroke (embolic stroke of undetermined source; ESUS). Methods We analyzed digital raw data of standard 12-lead ECGs using an artificial intelligence (AI) recurrent neural network (RNN) to detect the electrocardiographic signature of atrial fibrillation present during normal sinus rhythm (NSR). We included 2,585 cases aged 18 years or older with multiple ECGs at our university hospital between 2005 and 2017, validated by crossover analysis of two electrophysiologists. We defined the first recorded AF ECG as the index ECG and the first day of the window of interest as 14 days before the date of the index ECG. We allocated ECGs to the training, internal validation, and testing datasets in a 7:1:2 ratio. We calculated recall, F1 score, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve for the internal validation dataset to select a probability threshold. We applied the developed AI program to 169 patients who had been diagnosed with ESUS and had standard 12-lead ECGs in our hospital. Results We acquired 1,266 NSR ECGs from real normal subjects and 1,319 NSR ECGs from paroxysmal AF patients. The RNN AI-enabled ECG identified atrial fibrillation with an AUC of 0.79, recall of 82%, specificity of 78%, F1 score of 75% and overall accuracy of 72.8% (Figure). ESUS patients were divided into three groups according to the probability of AF calculated by the AI-guided RNN program: group 1 (35 patients with a probability of 0–25% of paroxysmal AF), group 2 (86 patients with a probability of 25–75% of paroxysmal AF) and group 3 (48 patients with a probability of 75–100% of paroxysmal AF). In Kaplan-Meier estimates, groups 2 and 3 (PAF probability above 25%) tended to have a higher AF incidence, although this did not reach statistical significance (log-rank p = 0.678) (Figure). Conclusion AI may discriminate subtle changes between NSR in normal subjects and NSR in patients with paroxysmal AF, and may also help identify whether AF is the underlying cause of stroke in patients with ESUS. Further studies are needed to evaluate its possible use in future prognostic models. Funding Acknowledgement Type of funding source: None
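The three-way grouping of ESUS patients by predicted probability can be sketched as a simple binning function. The abstract gives the ranges (0–25%, 25–75%, 75–100%) but not how values exactly on a boundary were assigned, so the handling below is an assumption, as is the function name.

```python
def af_probability_group(p):
    """Map a predicted paroxysmal AF probability in [0, 1] to group 1, 2, or 3.

    Assumption: a value exactly at 0.25 or 0.75 goes to the higher group;
    the source does not say how boundary values were handled.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p < 0.25:
        return 1
    if p < 0.75:
        return 2
    return 3


# Hypothetical model outputs for illustration only.
for p in (0.10, 0.25, 0.60, 0.90):
    print(p, af_probability_group(p))
```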

Predicting Sex from Retinal Fundus Photographs Using Automated Deep Learning

10.21203/rs.3.rs-402433/v1 ◽

2021 ◽

Author(s):

Edward Korot ◽

Nikolas Pontikos ◽

Xiaoxuan Liu ◽

Siegfried K Wagner ◽

Livia Faes ◽

...

Keyword(s):

Deep Learning ◽

Characteristic Curve ◽

External Validation ◽

Model Development ◽

Model Performance ◽

Validation Dataset ◽

Internal Validation ◽

Fundus Photographs ◽

The UK ◽

Retinal Fundus

Abstract Deep learning may transform health care, but model development has largely been dependent on availability of advanced technical expertise. Herein we present the development of a deep learning model by clinicians without coding, which predicts reported sex from retinal fundus photographs. A model was trained on 84,743 retinal fundus photos from the UK Biobank dataset. External validation was performed on 252 fundus photos from a tertiary ophthalmic referral center. For internal validation, the area under the receiver operating characteristic curve (AUROC) of the code-free deep learning (CFDL) model was 0.93. Sensitivity, specificity, positive predictive value (PPV) and accuracy (ACC) were 88.8%, 83.6%, 87.3% and 86.5%, and for external validation were 83.9%, 72.2%, 78.2% and 78.6%, respectively. Clinicians are currently unaware of distinct retinal feature variations between males and females, highlighting the importance of model explainability for this task. The model performed significantly worse when foveal pathology was present in the external validation dataset (ACC: 69.4%, compared to 85.4% in healthy eyes), suggesting the fovea is a salient region for model performance (OR [95% CI]: 0.36 [0.19, 0.70], p = 0.0022). Automated machine learning (AutoML) may enable clinician-driven automated discovery of novel insights and disease biomarkers.
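The four reported measures all derive from a 2×2 confusion matrix. A minimal sketch follows; the counts are hypothetical (the abstract reports percentages only), and the abstract does not state which reported sex was treated as the positive class.

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, and accuracy from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each denominator must be non-zero")
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "accuracy": (tp + tn) / total,  # all correct predictions
    }


# Hypothetical counts for illustration only (not study data).
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV depends on the class balance of the evaluated sample, which is one reason it can shift between internal and external validation sets.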

5105 A deep neural network predicts atrial fibrillation from normal ECGs recorded on a smartphone-enabled device

European Heart Journal ◽

10.1093/eurheartj/ehz746.0041 ◽

2019 ◽

Vol 40 (Supplement_1) ◽

Cited By ~ 1

Author(s):

C Galloway ◽

D Treiman ◽

J Shreibati ◽

M Schram ◽

Z Karbaschi ◽

...

Keyword(s):

Neural Network ◽

Atrial Fibrillation ◽

Deep Learning ◽

Sinus Rhythm ◽

Sensitivity And Specificity ◽

Normal Sinus Rhythm ◽

Learning Model ◽

Normal Sinus ◽

Normal Ecgs ◽

Deep Learning Model

Abstract Background Electrocardiographic predictors of atrial fibrillation (AF) from a non-AF ECG, such as P wave abnormalities and supraventricular ectopy, have been extensively documented. However, risk prediction tools for AF utilize little if any of the wealth of information available from the ECG. Better AF prediction from the ECG may improve efficiency of screening and performance of AF risk tools. Deep learning methods have the potential to extract an unlimited number of features from the ECG to improve prediction of AF. Purpose We hypothesize that a deep learning model can identify patterns predictive of AF during normal sinus rhythm. To test the hypothesis, we trained and tested a neural network to predict AF from normal sinus rhythm ambulatory ECG data. Methods We trained a deep convolutional neural network to detect features of AF that are present in single-lead ECGs with normal sinus rhythm, recorded using a Food and Drug Administration (FDA)-cleared, smartphone-enabled device. A cohort of 27,526 patients with at least 50 ECGs recorded between January 7, 2013, and September 19, 2018, and the FDA-cleared automated findings of Normal and Atrial Fibrillation associated with those ECGs, were used for model development. Specifically, we trained the deep learning model on 1,984,581 Normal ECGs from 19,267 patients with 1) only Normal ECG recordings, or 2) at least 30% ECGs with AF. Of the 27,526 patients, an internal set of 8,259 patients with 841,776 Normal ECGs was saved for testing (validation). Results Among 8,259 patients in the test set, 3,467 patients had at least 30% of their ECGs with an automated finding of AF. When the deep learning model was run on 841,776 Normal ECGs, it was able to predict whether the ECG was from a patient with no AF or with 30% or more AF, with an area under the curve (AUC) of 0.80. Using an operating point with equal sensitivity and specificity, the model's sensitivity and specificity were 73.1%. Using an operating point with high specificity (90.0%), the model's sensitivity was 48.0%. When the model was applied to a randomly selected, broader cohort of 15,000 patients (at least 50 ECGs recorded, any amount of AF), a positive, non-linear relationship between neural network output and AF burden per patient was observed (Figure: Model Output vs AF Burden Per Patient). Conclusions A deep learning model was able to predict AF from ECGs in normal sinus rhythm that were recorded on a smartphone-enabled device. The use of deep learning, if prospectively validated, may facilitate AF screening in patients with paroxysmal disease or warn patients who are at high risk for developing AF. Acknowledgement/Funding AliveCor
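The two operating points in the results (equal sensitivity and specificity, and a fixed high specificity) correspond to two ways of choosing a score threshold. A minimal sketch follows, assuming a score at or above the threshold is called positive and that candidate thresholds are the observed scores; the function names and data are hypothetical, and this is not the authors' procedure.

```python
def sens_spec_at(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_operating_point(labels, scores):
    """Observed threshold that minimizes |sensitivity - specificity|."""
    def gap(t):
        sens, spec = sens_spec_at(labels, scores, t)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


def operating_point_for_specificity(labels, scores, target):
    """Lowest observed threshold with specificity >= target, or None.

    Specificity never decreases as the threshold rises, so the lowest
    qualifying threshold keeps sensitivity as high as possible.
    """
    for t in sorted(set(scores)):
        if sens_spec_at(labels, scores, t)[1] >= target:
            return t
    return None


# Hypothetical scores for illustration only (1 = patient with AF history).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.6, 0.3, 0.5, 0.7, 0.9]
t_equal = balanced_operating_point(labels, scores)
t_high = operating_point_for_specificity(labels, scores, 0.90)
print(t_equal, sens_spec_at(labels, scores, t_equal))
print(t_high, sens_spec_at(labels, scores, t_high))
```

The trade-off visible in the abstract (sensitivity falling from 73.1% to 48.0% as specificity rises to 90.0%) is the same one this sketch shows when moving from the balanced threshold to the high-specificity one.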

Deep-learning model for screening sepsis using electrocardiography

Scandinavian Journal of Trauma Resuscitation and Emergency Medicine ◽

10.1186/s13049-021-00953-8 ◽

2021 ◽

Vol 29 (1) ◽

Author(s):

Joon-myoung Kwon ◽

Ye Rang Lee ◽

Min-Seung Jung ◽

Yoon-Ji Lee ◽

Yong-Yeon Jo ◽

...

Keyword(s):

Septic Shock ◽

Deep Learning ◽

Confidence Interval ◽

Characteristic Curve ◽

External Validation ◽

Medical Emergency ◽

Validation Dataset ◽

Internal Validation ◽

Significant Difference ◽

Sepsis And Septic Shock

Abstract Background Sepsis is a life-threatening organ dysfunction and a major healthcare burden worldwide. Although sepsis is a medical emergency that requires immediate management, screening for the occurrence of sepsis is difficult. Herein, we propose a deep learning-based model (DLM) for screening sepsis using electrocardiography (ECG). Methods This retrospective cohort study included 46,017 patients who were admitted to two hospitals. A total of 1,548 and 639 patients had sepsis and septic shock, respectively. The DLM was developed using 73,727 ECGs from 18,142 patients, and internal validation was conducted using 7,774 ECGs from 7,774 patients. Furthermore, we conducted an external validation with 20,101 ECGs from 20,101 patients from another hospital to verify the applicability of the DLM across centers. Results During the internal and external validations, the area under the receiver operating characteristic curve (AUC) of the DLM using 12-lead ECG was 0.901 (95% confidence interval, 0.882–0.920) and 0.863 (0.846–0.879), respectively, for screening sepsis and 0.906 (95% confidence interval (CI), 0.877–0.936) and 0.899 (95% CI, 0.872–0.925), respectively, for detecting septic shock. The AUC of the DLM for detecting sepsis using 6-lead and single-lead ECGs was 0.845–0.882. A sensitivity map revealed that the QRS complex and T waves were associated with sepsis. Subgroup analysis was conducted using ECGs from 4,609 patients who were admitted with an infectious disease, and the AUC of the DLM for predicting in-hospital mortality was 0.817 (0.793–0.840). There was a significant difference in the prediction score of DLM using ECG according to the presence of infection in the validation dataset (0.277 vs. 0.574, p < 0.001), including severe acute respiratory syndrome coronavirus 2 (0.260 vs. 0.725, p = 0.018). Conclusions The DLM delivered reasonable performance for sepsis screening using 12-, 6-, and single-lead ECGs. The results suggest that sepsis can be screened using not only conventional ECG devices but also diverse life-type ECG machines employing the DLM, thereby preventing irreversible disease progression and mortality.

Development and Prospective Validation of a Transparent Deep Learning Algorithm for Predicting Need for Mechanical Ventilation

10.1101/2020.05.30.20118109 ◽

2020 ◽

Cited By ~ 1

Author(s):

Supreeth P. Shashikumar ◽

Gabriel Wardi ◽

Paulina Paul ◽

Morgan Carlile ◽

...

Keyword(s):

Mechanical Ventilation ◽

Deep Learning ◽

Learning Algorithm ◽

Characteristic Curve ◽

External Validation ◽

Improve Patient Care ◽

Hospitalized Patients ◽

Clinical Criteria ◽

Prediction Horizon ◽

Deep Learning Algorithm

Abstract Importance: Objective and early identification of hospitalized patients, and particularly those with novel coronavirus disease 2019 (COVID-19), who may require mechanical ventilation is of great importance and may aid in delivering timely treatment. Objective: To develop, externally validate, and prospectively test a transparent deep learning algorithm for predicting, 24 hours in advance, the need for mechanical ventilation in hospitalized patients and those with COVID-19. Design: Observational cohort study. Setting: Two academic medical centers from January 1, 2016 to December 31, 2019 (retrospective cohorts) and February 10, 2020 to May 4, 2020 (prospective cohorts). Participants: Over 31,000 admissions to the intensive care units (ICUs) at two hospitals. Additionally, 777 patients with COVID-19 were used for prospective validation. Patients who were placed on mechanical ventilation within four hours of their admission were excluded. Main Outcome(s) and Measure(s): Electronic health record (EHR) data were extracted on an hourly basis, and a set of 40 features was calculated and passed to an interpretable deep-learning algorithm to predict the future need for mechanical ventilation 24 hours in advance. Additionally, commonly used clinical criteria (based on heart rate, oxygen saturation, respiratory rate, FiO2 and pH) were used to assess future need for mechanical ventilation. Performance of the algorithms was evaluated using the area under the receiver-operating characteristic curve (AUC), sensitivity, specificity and positive predictive value. Results: After applying exclusion criteria, the external validation cohort included 3,888 general ICU and 402 COVID-19 patients. The performance of the model (AUC) with a 24-hour prediction horizon at the validation site was 0.882 for the general ICU population and 0.918 for patients with COVID-19. In comparison, commonly used clinical criteria and the ROX score achieved AUCs in the range of 0.773–0.782 and 0.768–0.810 for the general ICU population and patients with COVID-19, respectively. Conclusions and Relevance: A generalizable and transparent deep-learning algorithm improves on traditional clinical criteria to predict the need for mechanical ventilation in hospitalized patients, including those with COVID-19. Such an algorithm may help clinicians optimize the timing of tracheal intubation, better allocate mechanical ventilation resources and staff, and improve patient care.
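A 24-hour prediction horizon over hourly EHR data implies that each hourly time step is labeled by whether ventilation begins within the following 24 hours. The sketch below shows one plausible labeling scheme; it is an assumption for illustration, not the authors' published pipeline, and the function name and the treatment of hours at or after the event are hypothetical.

```python
def horizon_labels(n_hours, event_hour, horizon=24):
    """Label each hourly step 1 if the event starts within `horizon` hours.

    n_hours: number of hourly observations (hours 0 .. n_hours - 1).
    event_hour: hour at which mechanical ventilation started, or None.
    Assumption: hours at or after the event get None, because there is
    nothing left to predict for them.
    """
    labels = []
    for t in range(n_hours):
        if event_hour is None:
            labels.append(0)     # never ventilated: always negative
        elif t >= event_hour:
            labels.append(None)  # at or after the event: not a target
        elif event_hour - t <= horizon:
            labels.append(1)     # event falls inside the horizon
        else:
            labels.append(0)     # event is still more than `horizon` hours away
    return labels


# Hypothetical stay of 30 hours with ventilation starting at hour 28.
labels = horizon_labels(30, event_hour=28)
print(labels[:6], labels[26:])
```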

Deep Learning Model for Detection of Hypoalbuminemia Using Electrocardiography

10.20944/preprints202101.0408.v1 ◽

2021 ◽

Author(s):

Joon-myoung Kwon ◽

Soo Youn Lee ◽

Yoon-Ji Lee ◽

Yong-Yeon Jo ◽

Min-Seung Jung ◽

...

Keyword(s):

Deep Learning ◽

Characteristic Curve ◽

External Validation ◽

Patient Data ◽

Albumin Concentration ◽

Historical Cohort ◽

Internal Validation ◽

Patient Deterioration ◽

Risk Patients ◽

Deep Learning Model

Background: Albumin has a pivotal role in the homeostasis of osmotic pressure and is associated with cardiovascular, nephrotic, hepatic, and nutritional diseases. The detection of hypoalbuminemia is a cornerstone for diagnosis of hidden diseases and patient deterioration. We developed and validated a deep-learning-based model (DLM) for detection of hypoalbuminemia using electrocardiography (ECG). Methods: This historical cohort study included data from consecutive patients from two hospitals. The patient data in one hospital were divided into development (82,499 ECGs from 54,248 patients) and internal validation (20,664 ECGs from 20,664 patients) datasets, whereas the patient data in the other hospital were included in only an external validation (37,421 ECGs from 37,421 patients) dataset. A DLM was developed using a 12-lead ECG signal, age, and sex from the development dataset. The endpoint was hypoalbuminemia, defined by serum albumin concentration below 3.5 g/dL. Results: During the internal and external validations, the areas under the receiver operating characteristic curve of the DLM for the detection of hypoalbuminemia were 0.887 (0.877–0.897) and 0.888 (0.880–0.896), respectively. Among the 27,400 individuals without hypoalbuminemia at the initial laboratory exam, those identified by the DLM as higher-risk patients had a significantly higher chance of developing hypoalbuminemia than those in the low-risk group (7.09% vs. 1.01%, p < 0.001) during 24 months. The sensitivity map showed that the DLM focused on the T wave and QRS complex for the detection of hypoalbuminemia. Conclusions: The DLM exhibited a high accuracy for hypoalbuminemia detection and prediction using 12-, 6-, and single-lead ECGs.

Weakly-supervised deep learning for ultrasound diagnosis of breast cancer

Scientific Reports ◽

10.1038/s41598-021-03806-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jaeil Kim ◽

Hye Jung Kim ◽

Chanho Kim ◽

Jin Hwa Lee ◽

Keum Won Kim ◽

...

Keyword(s):

Breast Cancer ◽

Deep Learning ◽

Operating Characteristic ◽

Image Annotation ◽

Characteristic Curve ◽

External Validation ◽

Region Of Interest ◽

Internal Validation ◽

Breast Masses ◽

Weakly Supervised

Abstract Conventional deep learning (DL) algorithms require full supervision through annotation of the region of interest (ROI), which is laborious and often biased. We aimed to develop a weakly-supervised DL algorithm that diagnoses breast cancer at ultrasound (US) without image annotation. Weakly-supervised DL algorithms were implemented with three networks (VGG16, ResNet34, and GoogLeNet) and trained using 1000 unannotated US images (500 benign and 500 malignant masses). Two sets of 200 images (100 benign and 100 malignant masses) were used as internal and external validation sets. For comparison with fully-supervised algorithms, ROI annotation was performed manually and automatically. Diagnostic performances were calculated as the area under the receiver operating characteristic curve (AUC). Using the class activation map, we determined how accurately the weakly-supervised DL algorithms localized the breast masses. For internal validation sets, the weakly-supervised DL algorithms achieved excellent diagnostic performances, with AUC values of 0.92–0.96, which were not statistically different (all Ps > 0.05) from those of fully-supervised DL algorithms with either manual or automated ROI annotation (AUC, 0.92–0.96). For external validation sets, the weakly-supervised DL algorithms achieved AUC values of 0.86–0.90, which were not statistically different from (Ps > 0.05) or were higher than (P = 0.04, VGG16 with automated ROI annotation) those of fully-supervised DL algorithms (AUC, 0.84–0.92). In internal and external validation sets, weakly-supervised algorithms could localize 100% of malignant masses, except for ResNet34 (98%). The weakly-supervised DL algorithms developed in the present study were feasible for US diagnosis of breast cancer with well-performing localization and differential diagnosis.
