Expert-level classification of gastritis by endoscopy using deep learning: a multicenter diagnostic trial

2021 ◽  
Vol 09 (06) ◽  
pp. E955-E964
Author(s):  
Ganggang Mu ◽  
Yijie Zhu ◽  
Zhanyue Niu ◽  
Shigang Ding ◽  
Honggang Yu ◽  
...  

Abstract Background and study aims Endoscopy plays a crucial role in the diagnosis of gastritis. Endoscopists have low accuracy in diagnosing atrophic gastritis with white-light endoscopy (WLE). High-risk factors for carcinogenesis, such as atrophic gastritis (AG), demand early detection. Deep learning (DL)-based gastritis classification with WLE has rarely been reported. We built a system to improve the accuracy of AG diagnosis with WLE, assist with this common gastritis diagnosis, and help lessen endoscopist fatigue. Methods We collected a total of 8,141 endoscopic images of common gastritis, other gastritis, and non-gastritis in 4,587 cases and built a DL-based system constructed with UNet++ and ResNet-50. The system sorts common gastritis images layer by layer: the first layer distinguishes non-gastritis/common gastritis/other gastritis, the second layer distinguishes AG/non-atrophic gastritis, and the third layer distinguishes atrophy/intestinal metaplasia and erosion/hemorrhage. The convolutional neural networks were tested with three separate test sets. Results Rates of accuracy for classifying non-atrophic gastritis/AG, atrophy/intestinal metaplasia, and erosion/hemorrhage were 88.78 %, 87.40 %, and 93.67 % in the internal test set; 91.23 %, 85.81 %, and 92.70 % in the external test set; and 95.00 %, 92.86 %, and 94.74 % in the video set, respectively. The hit ratio with the segmentation model was 99.29 %. The accuracy for detection of non-gastritis/common gastritis/other gastritis was 93.6 %. Conclusions The system had decent specificity and accuracy in classification of gastritis lesions. DL has great potential in WLE gastritis classification for assisting with accurate diagnoses after endoscopic procedures.
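The layer-by-layer sorting described above can be viewed as a cascade of classifiers, where each stage only sees the images its parent stage routes to it. Below is a minimal sketch of that routing logic; the stage classifiers are hypothetical callables standing in for the study's trained CNNs (UNet++ segmentation plus ResNet-50 classification heads), and the label strings are illustrative.

```python
# Hierarchical ("layer-by-layer") routing of gastritis images:
#   stage 1: non-gastritis / common gastritis / other gastritis
#   stage 2 (common gastritis only): atrophic (AG) / non-atrophic
#   stage 3a (AG only): atrophy / intestinal metaplasia
#   stage 3b (non-atrophic only): erosion / hemorrhage

def classify_hierarchically(image, stage1, stage2, stage3a, stage3b):
    top = stage1(image)                  # "non", "common", or "other"
    if top != "common":
        return (top,)                    # cascade stops at the first layer
    mid = stage2(image)                  # "AG" or "non-atrophic"
    if mid == "AG":
        return ("common", "AG", stage3a(image))            # "atrophy" / "IM"
    return ("common", "non-atrophic", stage3b(image))      # "erosion" / "hemorrhage"
```

A key property of this design is that each downstream classifier is trained and evaluated only on the subpopulation its parent admits, which matches the paper's separate accuracy figures per layer.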

Cancers ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1615
Author(s):  
Ines P. Nearchou ◽  
Hideki Ueno ◽  
Yoshiki Kajiwara ◽  
Kate Lillard ◽  
Satsuki Mochizuki ◽  
...  

The categorisation of desmoplastic reaction (DR) present at the colorectal cancer (CRC) invasive front into mature, intermediate or immature type has previously been shown to have high prognostic significance. However, the lack of an objective and reproducible methodology for the assessment of DR has been a major hurdle to its clinical translation. In this study, a deep learning algorithm was trained to automatically classify immature DR on haematoxylin and eosin digitised slides of stage II and III CRC cases (n = 41). When assessing the classifier’s performance on a test set of patient samples (n = 40), a Dice score of 0.87 for the segmentation of myxoid stroma was reported. The classifier was then applied to the full cohort of 528 stage II and III CRC cases, which was divided into a training (n = 396) and a test set (n = 132). Automatically classified DR was shown to have superior prognostic significance over manually classified DR in both the training and test cohorts. The findings demonstrated that deep learning algorithms could be applied to assist pathologists in the detection and classification of DR in CRC in an objective, standardised and reproducible manner.
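The Dice score of 0.87 reported for myxoid-stroma segmentation measures overlap between the predicted mask and the ground-truth annotation: Dice = 2|A∩B| / (|A| + |B|). A minimal sketch over flat binary masks (the study computed this on digitised whole-slide regions; the function below is a generic illustration):

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks (flat sequences of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * intersection / total
```

A Dice of 1.0 means the masks coincide exactly; 0.0 means no overlap at all.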


2021 ◽  
Vol 8 ◽  
Author(s):  
Wei Shan ◽  
Yunyun Duan ◽  
Yu Zheng ◽  
Zhenzhou Wu ◽  
Shang Wei Chan ◽  
...  

Objective: Reliable quantification of white matter hyperintensities (WMHs) resulting from cerebral small vessel disease (CSVD) is essential for understanding their clinical impact. We aim to develop and clinically validate a deep learning system for automatic segmentation of CSVD-WMH from fluid-attenuated inversion recovery (FLAIR) imaging using large multicenter data. Method: A FLAIR imaging dataset of 1,156 patients diagnosed with CSVD-associated WMH (median age, 54 years; 653 males) obtained between September 2018 and September 2019 from Beijing Tiantan Hospital was retrospectively analyzed in this study. Locations of CSVD-WMH on the FLAIR scans were manually marked by two experienced neurologists. Using the manually labeled data of 996 patients (development set), a novel U-shaped 2D convolutional neural network (CNN) architecture was trained for automatic segmentation of CSVD-WMH. The segmentation performance of the network was evaluated with per-pixel and lesion-level Dice scores using an independent internal test set (n = 160) and a multicenter external test set (n = 90, three medical centers). The clinical suitability of the segmentation results, classified as acceptable, acceptable with minor revision, acceptable with major revision, and not acceptable, was analyzed by three independent neuroradiologists. The inter-neuroradiologist agreement rate was assessed by the Kendall W test. Results: On the internal and external test sets, the proposed CNN architecture achieved per-pixel and lesion-level Dice scores of 0.72 (external test set), which were significantly better than those of state-of-the-art deep learning architectures proposed for WMH segmentation. In the clinical evaluation, neuroradiologists judged the segmentation results for 95% of patients to be acceptable or acceptable with minor revision. Conclusions: A deep learning system can be used for automated, objective, and clinically meaningful segmentation of CSVD-WMH with high accuracy.
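The Kendall W test used for inter-neuroradiologist agreement measures concordance among m raters ranking n items: W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of each item's rank total from the mean rank total. A minimal sketch assuming each rater assigns a complete ranking with no ties (the study's actual ratings were ordinal acceptability categories, so a tie-corrected variant would be used in practice):

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m raters ranking n items.

    `rankings` is a list of m rank lists (values 1..n, no ties).
    Returns W in [0, 1]; 1 means all raters rank the items identically.
    """
    m = len(rankings)
    n = len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = sum(totals) / n
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))
```

Identical rankings give W = 1; rankings that cancel each other out give W near 0.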


2021 ◽  
pp. bjophthalmol-2020-316290
Author(s):  
Bing Li ◽  
Huan Chen ◽  
Bilei Zhang ◽  
Mingzhen Yuan ◽  
Xuemin Jin ◽  
...  

Aim To explore and evaluate an appropriate deep learning system (DLS) for the detection of 12 major fundus diseases using colour fundus photography. Methods Diagnostic performance of a DLS was tested on the detection of normal fundus and 12 major fundus diseases including referable diabetic retinopathy, pathologic myopic retinal degeneration, retinal vein occlusion, retinitis pigmentosa, retinal detachment, wet and dry age-related macular degeneration, epiretinal membrane, macular hole, possible glaucomatous optic neuropathy, papilledema and optic nerve atrophy. The DLS was developed with 56 738 images and tested with 8176 images from one internal test set and two external test sets. A comparison with human doctors was also conducted. Results The areas under the receiver operating characteristic curves of the DLS on the internal test set and the two external test sets were 0.950 (95% CI 0.942 to 0.957) to 0.996 (95% CI 0.994 to 0.998), 0.931 (95% CI 0.923 to 0.939) to 1.000 (95% CI 0.999 to 1.000) and 0.934 (95% CI 0.929 to 0.938) to 1.000 (95% CI 0.999 to 1.000), with sensitivities of 80.4% (95% CI 79.1% to 81.6%) to 97.3% (95% CI 96.7% to 97.8%), 64.6% (95% CI 63.0% to 66.1%) to 100% (95% CI 100% to 100%) and 68.0% (95% CI 67.1% to 68.9%) to 100% (95% CI 100% to 100%), respectively, and specificities of 89.7% (95% CI 88.8% to 90.7%) to 98.1% (95% CI 97.7% to 98.6%), 78.7% (95% CI 77.4% to 80.0%) to 99.6% (95% CI 99.4% to 99.8%) and 88.1% (95% CI 87.4% to 88.7%) to 98.7% (95% CI 98.5% to 99.0%), respectively. When compared with human doctors, the DLS obtained a higher diagnostic sensitivity but lower specificity. Conclusion The proposed DLS is effective in diagnosing normal fundus and 12 major fundus diseases, and thus has much potential for fundus disease screening in the real world.
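The area under the ROC curve reported throughout these studies is equivalent to the probability that a randomly chosen diseased case receives a higher score than a randomly chosen normal case (the Mann-Whitney formulation). A minimal sketch of that computation, independent of the actual DLS:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the fraction of (positive, negative) pairs ranked correctly,
    with ties between a positive and a negative counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every diseased eye outscores every normal eye; 0.5 is chance-level discrimination.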


2021 ◽  
Vol 11 ◽  
Author(s):  
Xianyu Zhang ◽  
Hui Li ◽  
Chaoyun Wang ◽  
Wen Cheng ◽  
Yuntao Zhu ◽  
...  

Background: Breast ultrasound is the first choice for breast tumor diagnosis in China, but the Breast Imaging Reporting and Data System (BI-RADS) categorization routinely used in the clinic often leads to unnecessary biopsy. Radiologists also cannot predict molecular subtypes, which carry important pathological information that can guide clinical treatment. Materials and Methods: This retrospective study collected breast ultrasound images from two hospitals and formed training, test, and external test sets after strict selection, comprising 2,822, 707, and 210 ultrasound images, respectively. An optimized deep learning model (DLM) was constructed with the training set, and its performance was verified in both the test set and the external test set. Diagnostic results were compared with the BI-RADS categorization determined by radiologists. We divided breast cancer into different molecular subtypes according to hormone receptor (HR) and human epidermal growth factor receptor 2 (HER2) expression. The ability to predict molecular subtypes using the DLM was confirmed in the test set. Results: In the test set, with pathological results as the gold standard, the accuracy, sensitivity, and specificity were 85.6%, 98.7%, and 63.1%, respectively, according to the BI-RADS categorization. The same set achieved an accuracy, sensitivity, and specificity of 89.7%, 91.3%, and 86.9%, respectively, when using the DLM. For the test set, the area under the curve (AUC) was 0.96. For the external test set, the AUC was 0.90. The diagnostic accuracy was 92.86% with the DLM in BI-RADS 4a patients. Approximately 70.76% of the cases were judged as benign tumors, theoretically reducing unnecessary biopsy by 67.86%; however, the false negative rate was 10.4%. The DLM also showed a good prediction effect for the molecular subtypes of breast cancer: the AUCs were 0.864, 0.811, and 0.837 for the triple-negative, HER2 (+), and HR (+) subtype predictions, respectively. Conclusion: This study showed that the DLM was highly accurate in recognizing breast tumors from ultrasound images. Thus, the DLM can greatly reduce the incidence of unnecessary biopsy, especially for patients with BI-RADS 4a. In addition, the predictive ability of this model for molecular subtypes was satisfactory, which has specific clinical application value.
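The accuracy, sensitivity, and specificity figures above all derive from the 2x2 confusion matrix against the pathological gold standard. A minimal sketch of those definitions (the counts passed in below are illustrative, not the study's):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix,
    with pathology as the gold standard (positive = malignant)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of malignant cases flagged
    specificity = tn / (tn + fp)   # fraction of benign cases cleared
    return accuracy, sensitivity, specificity
```

The trade-off the abstract describes is visible here: BI-RADS categorization maximizes sensitivity (98.7%) at the cost of specificity (63.1%), which is what drives unnecessary biopsies; the DLM rebalances the two.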


2017 ◽  
Author(s):  
Ariel Rokem ◽  
Yue Wu ◽  
Aaron Lee

Abstract Deep learning algorithms have tremendous potential utility in the classification of biomedical images. For example, images acquired with retinal optical coherence tomography (OCT) can be used to accurately classify patients with age-related macular degeneration (AMD) and distinguish them from healthy control patients. However, previous research has suggested that large amounts of data are required to train deep learning algorithms, because of the large number of parameters that need to be fit. Here, we show that a moderate amount of data (from approximately 1,800 patients) may be enough to reach close-to-maximal performance in the classification of AMD patients from OCT images. These results suggest that deep learning algorithms can be trained on moderate amounts of data, provided that images are relatively homogeneous and the effective number of parameters is sufficiently small. Furthermore, we demonstrate that in this application, cross-validation with a separate test set that is not used in any part of the training does not differ substantially from cross-validation with a validation dataset used to determine the optimal stopping point for training.
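The distinction the abstract draws — a validation set used to pick the stopping point versus a test set never touched during training — rests on a three-way patient-level split. A minimal sketch of such a split; the fractions and seed are illustrative, not from the paper:

```python
import random

def split_patients(ids, val_frac=0.1, test_frac=0.2, seed=0):
    """Shuffle patient IDs and split into train / validation / held-out test.
    The validation set steers early stopping during training; the test set
    is reserved entirely for the final evaluation."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)     # deterministic, reproducible shuffle
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return ids[n_val + n_test:], ids[:n_val], ids[n_val:n_val + n_test]
```

Splitting by patient rather than by image prevents scans from the same eye leaking across partitions, which would inflate the measured performance.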


2018 ◽  
pp. 1-8 ◽  
Author(s):  
Okyaz Eminaga ◽  
Nurettin Eminaga ◽  
Axel Semjonow ◽  
Bernhard Breil

Purpose The recognition of cystoscopic findings remains challenging for young colleagues and depends on the examiner’s skills. Computer-aided diagnosis tools using feature extraction and deep learning show promise as instruments to perform diagnostic classification. Materials and Methods Our study considered 479 patient cases that represented 44 urologic findings. Image color was linearly normalized and was equalized by applying contrast-limited adaptive histogram equalization. Because these findings can be viewed via cystoscopy from every possible angle and side, we ultimately generated images rotated in 10-degree steps and flipped them vertically or horizontally, which resulted in 18,681 images. After image preprocessing, we developed deep convolutional neural network (CNN) models (ResNet50, VGG-19, VGG-16, InceptionV3, and Xception) and evaluated these models using F1 scores. Furthermore, we proposed two CNN concepts: 90%-previous-layer filter size and harmonic-series filter size. A training set (60%), a validation set (10%), and a test set (30%) were randomly generated from the study data set. All models were trained on the training set, validated on the validation set, and evaluated on the test set. Results The Xception-based model achieved the highest F1 score (99.52%), followed by models that were based on ResNet50 (99.48%) and the harmonic-series concept (99.45%). All images with cancer lesions were correctly determined by these models. When the focus was on the images misclassified by the model with the best performance, 7.86% of images that showed bladder stones with indwelling catheter and 1.43% of images that showed bladder diverticulum were falsely classified.
Future work will focus on integration of artificial intelligence–aided cystoscopy into clinical routines and possibly expansion to other clinical endoscopy applications.
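The rotation-and-flip augmentation described above can be thought of as enumerating a grid of transform settings applied to each source image. A minimal sketch of that grid; the flip labels are illustrative, and the exact multiplicity per image in the study also depended on the finding, so the count here is not the paper's 18,681:

```python
def augmentation_grid(angle_step=10, flips=("none", "horizontal", "vertical")):
    """Enumerate (rotation angle, flip) augmentation settings:
    rotations over the full circle in `angle_step`-degree increments,
    each combined with every flip option."""
    return [(angle, flip)
            for angle in range(0, 360, angle_step)
            for flip in flips]
```

Such geometric augmentation is well suited to cystoscopy precisely for the reason the abstract gives: a finding can appear at any orientation, so rotated and mirrored copies are all plausible views.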


Author(s):  
Chi-Chih Wang ◽  
Yu-Ching Chiu ◽  
Wei-Liang Chen ◽  
Tzu-Wei Yang ◽  
Ming-Chang Tsai ◽  
...  

Gastroesophageal reflux disease (GERD) is a common disease with high prevalence, and its endoscopic severity can be evaluated using the Los Angeles classification (LA grade). This paper proposes a deep learning model (i.e., GERD-VGGNet) that employs convolutional neural networks for automatic classification and interpretation of routine GERD LA grade. The proposed model employs a data augmentation technique, a two-stage no-freezing fine-tuning policy, and an early stopping criterion. As a result, the proposed model exhibits high generalizability. A dataset of images from 464 patients was used for model training and validation. An additional 32 patients served as a test set to evaluate the accuracy of both the model and our trainees. Experimental results demonstrate that the best model for the development set exhibited an overall accuracy of 99.2% (grade A–B), 100% (grade C–D), and 100% (normal group) using narrow-band image (NBI) endoscopy. On the test set, the proposed model resulted in an accuracy of 87.9%, which was significantly higher than the results of the trainees (75.0% and 65.6%). The proposed GERD-VGGNet model can assist automatic classification of GERD in conventional and NBI environments and thereby increase the accuracy of interpretation of the results by inexperienced endoscopists.
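The early stopping criterion mentioned above halts training once the validation loss stops improving for a set number of epochs, keeping the best checkpoint seen. A minimal sketch of that logic; the patience value is illustrative, not the GERD-VGGNet setting:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the index of the best epoch under an early stopping rule:
    training halts once the validation loss has failed to improve
    for `patience` consecutive epochs."""
    best = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break                      # stop training, keep best checkpoint
    return best_epoch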


2018 ◽  
Vol 1 (2) ◽  
pp. 82-85
Author(s):  
Geetika KC ◽  
Shiva Raj KC ◽  
Purnima Gyawali

Introduction: Gastric carcinoma is a leading cause of death worldwide, including in Nepal. The 5-year survival rate of gastric carcinoma (25%) is drastically lower than that of early gastric cancer (90%), implying the need for early detection. Atrophic gastritis and intestinal metaplasia are considered major high-risk factors and precancerous lesions, along with Helicobacter pylori infection. This study examines the distribution of atrophy and intestinal metaplasia across age and gender and their occurrence in Helicobacter pylori-positive cases. Materials and methods: This is a cross-sectional study of retrospectively collected data at KIST Medical College and GRP Polyclinic Private Limited from April 2008 to March 2018. A total of 10,683 cases were included. The slides were stained with Hematoxylin and Eosin stain and Giemsa stain and evaluated by two pathologists. Statistical analysis was done using SPSS v21. Results: A total of 10,683 cases were studied, with a male-to-female ratio of 1.04:1. The most common age group in the study was 18-40 years (n=6,206; 58.8%). Atrophy was seen in 81 (0.8%) cases, intestinal metaplasia in 298 (2.8%) cases, and Helicobacter pylori was positive in 4,459 (42.2%) cases. The incidence of atrophic gastritis was higher in the H. pylori-positive group (54; 0.5%), whereas intestinal metaplasia was more common in the H. pylori-negative group (190; 1.8%). Conclusion: Atrophic gastritis and intestinal metaplasia, high-risk factors for gastric carcinoma, were not common findings. Atrophic gastritis was seen in 0.8% and intestinal metaplasia in 2.8% of the total study population.


2020 ◽  
Author(s):  
Rikiya Yamashita ◽  
Jin Long ◽  
Atif Saleem ◽  
Daniel L Rubin ◽  
Jeanne Shen

Recurrence risk stratification of patients undergoing primary surgical resection for hepatocellular carcinoma (HCC) is an area of active investigation, and several staging systems have been proposed to optimize treatment strategies. However, as many as 70% of patients still have tumor recurrence at 5 years post-surgery. Routine hematoxylin and eosin (H&E)-stained histopathology slides may contain morphologic features associated with tumor recurrence. In this study, we developed and independently validated a deep learning-based system (HCC-SurvNet) that provides risk scores for disease recurrence after primary surgical resection, directly from H&E-stained digital whole-slide images of formalin-fixed, paraffin-embedded liver resections. Our model achieved a concordance index of 0.724 on a held-out internal test set of 53 patients, and 0.683 on an external test set of 198 patients, exceeding the performance of standard staging using the American Joint Committee on Cancer (AJCC)/International Union Against Cancer (UICC) Tumor-Node-Metastasis (TNM) classification system on both the internal and external test cohorts (p=0.018 and 0.025, respectively). We observed statistically significant differences in the survival distributions between low- and high-risk subgroups, as stratified by the risk scores predicted by HCC-SurvNet on both the internal and external test sets (log-rank p-value: 0.0013 and <0.0001, respectively). On multivariable Cox proportional hazards analysis, the risk score was an independent risk factor for post-surgical recurrence, on both the internal (hazard ratio (HR)=7.44 (95% CI: 1.60, 34.6), p=0.0105) and external (HR=2.37 (95% CI: 1.27, 4.43), p=0.0069) test sets. Our results suggest that deep learning-based models can provide recurrence risk scores which may augment current patient stratification methods, and help refine the clinical management of patients undergoing primary surgical resection for HCC.
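The concordance index (C-index) reported for HCC-SurvNet generalizes AUC to time-to-event data: among comparable patient pairs, it is the fraction in which the patient who recurred earlier was assigned the higher risk score. A minimal sketch of Harrell's C over right-censored data (quadratic in the number of patients, fine for cohorts of this size):

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index. A pair (i, j) is comparable when patient i has an
    observed event (events[i] == 1) strictly before time j; the pair is
    concordant when i also has the higher risk score (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A C-index of 0.5 is chance-level ranking and 1.0 is perfect ranking, so the model's 0.724 (internal) and 0.683 (external) indicate moderate but real discriminative ability beyond TNM staging.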

