Explainability of Convolutional Neural Networks for Dermatological Diagnosis


Iproceedings · 10.2196/35437 · 2021 · Vol 6 (1) · pp. e35437
Author(s): Raluca Jalaboi, Mauricio Orbes Arteaga, Dan Richter Jørgensen, Ionela Manole, Oana Ionescu Bozdog, et al.

Background: Convolutional neural networks (CNNs) are regarded as state-of-the-art artificial intelligence (AI) tools for dermatological diagnosis, and they have been shown to achieve expert-level performance when trained on a representative dataset. CNN explainability is a key factor in adopting such techniques in practice and can be achieved using attention maps of the network. However, evaluation of CNN explainability has been limited to visual assessment and remains qualitative, subjective, and time-consuming.

Objective: This study aimed to provide a framework for an objective quantitative assessment of the explainability of CNNs for dermatological diagnosis benchmarks.

Methods: We sourced 566 images available under the Creative Commons license from two public datasets, DermNet NZ and SD-260, with reference diagnoses of acne, actinic keratosis, psoriasis, seborrheic dermatitis, viral warts, and vitiligo. Eight dermatologists with teledermatology expertise annotated each clinical image with a diagnosis, as well as diagnosis-supporting characteristics and their localization. A total of 16 supporting visual characteristics were selected, including basic terms such as macule, nodule, papule, patch, plaque, pustule, and scale, and additional terms such as closed comedo, cyst, dermatoglyphic disruption, leukotrichia, open comedo, scar, sun damage, telangiectasia, and thrombosed capillary. The resulting dataset consisted of 525 images with three rater annotations each. Explainability of two fine-tuned CNN models, ResNet-50 and EfficientNet-B4, was analyzed with respect to the reference explanations provided by the dermatologists. Both models were pretrained on the ImageNet natural image recognition dataset and fine-tuned using 3214 images of the six target skin conditions obtained from an internal clinical dataset. CNN explanations were obtained as activation maps of the models through gradient-weighted class activation mapping (Grad-CAM). We computed the fuzzy sensitivity and specificity of each characteristic attention map with regard to both the fuzzy gold standard characteristic attention fusion masks and the fuzzy union of all characteristics.

Results: On average, explainability of EfficientNet-B4 was higher than that of ResNet-50 in terms of sensitivity for 13 of 16 supporting characteristics, with mean values of 0.24 (SD 0.07) and 0.16 (SD 0.05), respectively. However, explainability was lower in terms of specificity, with mean values of 0.82 (SD 0.03) and 0.90 (SD 0.00) for EfficientNet-B4 and ResNet-50, respectively. All measures were within the range of the corresponding interrater metrics.

Conclusions: We objectively benchmarked the explainability of dermatological diagnosis models through the use of expert-defined supporting characteristics for diagnosis.

Acknowledgments: This work was supported in part by the Danish Innovation Fund under Grant 0153-00154A.

Conflicts of Interest: None declared.
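As a rough illustration of the quantitative assessment described above, the sketch below scores a Grad-CAM attention map against a fuzzy expert annotation mask. The min-based fuzzy intersection and the helper name are assumptions made for illustration; the abstract does not spell out the exact formulas.

```python
import numpy as np

def fuzzy_sensitivity_specificity(attention, mask, eps=1e-8):
    """Score a model attention map against an expert annotation mask.

    Both inputs are 2D arrays scaled to [0, 1]; the elementwise minimum
    serves as the fuzzy intersection (an assumption, see above).
    """
    attention = np.clip(attention, 0.0, 1.0)
    mask = np.clip(mask, 0.0, 1.0)
    tp = np.minimum(attention, mask).sum()          # fuzzy true positives
    tn = np.minimum(1 - attention, 1 - mask).sum()  # fuzzy true negatives
    sensitivity = tp / (mask.sum() + eps)           # coverage of the annotated region
    specificity = tn / ((1 - mask).sum() + eps)     # rejection of the background
    return sensitivity, specificity

# e.g., compare a Grad-CAM map to the fused characteristic mask of one image
cam = np.random.rand(224, 224)                      # placeholder Grad-CAM map
gold = np.zeros((224, 224)); gold[60:120, 60:120] = 1.0
sens, spec = fuzzy_sensitivity_specificity(cam, gold)
```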


Author(s): Sebastian Nowak, Narine Mesropyan, Anton Faron, Wolfgang Block, Martin Reuter, et al.

Objectives: To investigate the diagnostic performance of deep transfer learning (DTL) for the detection of liver cirrhosis from clinical MRI.

Methods: The dataset for this retrospective analysis consisted of 713 patients (343 female) who underwent liver MRI between 2017 and 2019. In total, 553 of these subjects had a confirmed diagnosis of liver cirrhosis, while the remainder had no history of liver disease. T2-weighted MRI slices at the level of the caudate lobe were manually exported for DTL analysis. Data were randomly split into training, validation, and test sets (70%/15%/15%). A ResNet50 convolutional neural network (CNN) pre-trained on the ImageNet archive was used for cirrhosis detection with and without upstream liver segmentation. Classification performance was compared to that of two radiologists with different levels of experience (a fourth-year resident and a board-certified radiologist). Segmentation was performed with a U-Net architecture built on a pre-trained ResNet34 encoder. Differences in classification accuracy were assessed by the χ² test.

Results: Dice coefficients for automatic segmentation were above 0.98 for both validation and test data. The classification accuracy for liver cirrhosis on validation (vACC) and test (tACC) data for the DTL pipeline with upstream liver segmentation (vACC = 0.99, tACC = 0.96) was significantly higher than that of the resident (vACC = 0.88, p < 0.01; tACC = 0.91, p = 0.01) and of the board-certified radiologist (vACC = 0.96, p < 0.01; tACC = 0.90, p < 0.01).

Conclusion: This proof-of-principle study demonstrates the potential of DTL for detecting cirrhosis from standard T2-weighted MRI. The presented method for image-based diagnosis of liver cirrhosis achieved expert-level classification accuracy.

Key Points:
• A pipeline of two convolutional neural networks (CNNs) pre-trained on an extensive natural image database (the ImageNet archive) enables detection of liver cirrhosis on standard T2-weighted MRI.
• High classification accuracy can be achieved even without altering the pre-trained parameters of the CNNs.
• When the network was trained on unsegmented images, abdominal structures other than the liver were also relevant for detection.
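A minimal sketch of the two-stage pipeline described above, assuming PyTorch, torchvision, and the segmentation_models_pytorch package; the freezing scheme and the soft-masking step are illustrative choices, not the authors' released code.

```python
import torch
import torch.nn as nn
import segmentation_models_pytorch as smp
from torchvision import models

# Stage 1: liver segmentation with a U-Net built on a pre-trained ResNet34 encoder.
seg_net = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                   in_channels=3, classes=1)

# Stage 2: cirrhosis classifier on a pre-trained ResNet50 backbone.
clf = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in clf.parameters():
    p.requires_grad = False                 # keep pre-trained features unchanged,
                                            # as the key points suggest is sufficient
clf.fc = nn.Linear(clf.fc.in_features, 2)   # new trainable classification head

def classify_slice(x: torch.Tensor) -> torch.Tensor:
    """x: (N, 3, H, W) T2-weighted slice; returns cirrhosis logits."""
    with torch.no_grad():
        mask = torch.sigmoid(seg_net(x))    # soft liver mask in [0, 1]
    return clf(x * mask)                    # classify the liver-masked slice
```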


2020 · Vol 10 (2) · pp. 391-400
Author(s): Ying Chen, Xiaomin Qin, Jingyu Xiong, Shugong Xu, Jun Shi, et al.

This study aimed to propose a deep transfer learning framework for histopathological image analysis using convolutional neural networks (CNNs) with visualization schemes, and to evaluate its use for automated and interpretable diagnosis of cervical cancer. First, to examine the potential of transfer learning for classifying cervical histopathological images, we pre-trained three state-of-the-art CNN architectures on large natural image datasets and then fine-tuned them on small histopathological datasets. Second, we investigated the impact of three learning strategies on classification accuracy. Third, we visualized both the multi-layer convolutional kernels of the CNNs and the regions of interest to increase the clinical interpretability of the networks. Our method was evaluated on a database of 4993 cervical histological images (2503 benign and 2490 malignant). The experimental results demonstrated that our method achieved 95.88% sensitivity, 98.93% specificity, 97.42% accuracy, a 94.81% Youden's index, and a 99.71% area under the receiver operating characteristic curve. Our method can reduce the cognitive burden on pathologists in cervical disease classification and improve their diagnostic efficiency and accuracy. It could potentially be used in routine clinical practice for the histopathological diagnosis of cervical cancer.
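The abstract names three learning strategies but not their details, so the sketch below shows three common transfer-learning regimes (pure feature extraction, partial fine-tuning, full fine-tuning) on a VGG16 backbone; the architecture choice and the layer split are assumptions on our part.

```python
import torch.nn as nn
from torchvision import models

def build_model(strategy: str, num_classes: int = 2) -> nn.Module:
    """Hypothetical stand-ins for the paper's three learning strategies."""
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    if strategy == "feature_extractor":        # train only the new head
        for p in model.features.parameters():
            p.requires_grad = False
    elif strategy == "partial":                # fine-tune the last conv block only
        for p in model.features[:24].parameters():
            p.requires_grad = False
    elif strategy == "full":                   # fine-tune every layer
        pass
    else:
        raise ValueError(strategy)
    # replace the 1000-way ImageNet head with a benign/malignant head
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
    return model
```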


Author(s): Yang Yi, Feng Ni, Yuexin Ma, Xinge Zhu, Yuankai Qi, et al.

State-of-the-art hand gesture recognition methods have investigated spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on 1D convolutional neural networks and propose a simple and efficient architectural unit, the Multi-Kernel Temporal Block (MKTB), which models multi-scale temporal responses by explicitly applying different temporal kernels. We then present a Global Refinement Block (GRB), an attention module that shapes the global temporal features based on cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore spatiotemporal features at a tolerable computational cost. Extensive experiments on public datasets demonstrate that our proposed model achieves state-of-the-art accuracy with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules, and experiments on other tasks, such as video understanding and video-based person re-identification, also demonstrate their efficiency and generalization ability.
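A minimal sketch of the multi-kernel idea: parallel 1D convolutions with different temporal kernel sizes over a frame-feature sequence, fused by summation with a residual connection. The kernel sizes and the fusion rule are assumptions; the paper defines the actual block.

```python
import torch
import torch.nn as nn

class MultiKernelTemporalBlock(nn.Module):
    """Parallel temporal convolutions with different receptive fields."""
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); each branch models a different
        # temporal scale, and the residual sum fuses them.
        return self.relu(x + sum(branch(x) for branch in self.branches))

feats = torch.randn(2, 64, 32)             # 32 time steps of 64-d frame features
out = MultiKernelTemporalBlock(64)(feats)  # same shape: (2, 64, 32)
```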


2017 · Vol 2017 · pp. 1-16
Author(s): Adrián Núñez-Marcos, Gorka Azkune, Ignacio Arganda-Carreras

One of the biggest challenges in modern societies is improving healthy aging and supporting older persons in their daily activities. In particular, given its social and economic impact, the automatic detection of falls has attracted considerable attention in the computer vision and pattern recognition communities. Although approaches based on wearable sensors have provided high detection rates, some potential users are reluctant to wear them, so their use has not become widespread. As a consequence, alternative approaches such as vision-based methods have emerged. We firmly believe that the emergence of the Smart Environment and Internet of Things paradigms, together with the increasing number of cameras in our daily environments, forms an optimal context for vision-based systems. Consequently, we propose a vision-based solution using convolutional neural networks to decide whether a sequence of frames contains a person falling. To model the video motion and make the system scenario-independent, we use optical flow images as input to the networks, followed by a novel three-step training phase. Furthermore, our method is evaluated on three public datasets, achieving state-of-the-art results on all three.
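To make the input representation concrete, here is a minimal sketch of building a stacked optical-flow volume from consecutive grayscale frames with OpenCV; the stack size of 10 frame pairs and the Farneback parameters are common choices, not values confirmed by this abstract.

```python
import cv2
import numpy as np

def flow_stack(frames, stack_size=10):
    """frames: list of grayscale uint8 images; returns a (2*stack_size, H, W)
    array of horizontal/vertical flow channels to feed the CNN."""
    channels = []
    for prev, nxt in zip(frames[:stack_size], frames[1:stack_size + 1]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None, pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        channels.extend([flow[..., 0], flow[..., 1]])  # x and y displacement
    return np.stack(channels, axis=0)
```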


2018 · Vol 2018 · pp. 1-9
Author(s): Zenghai Chen, Hong Fu, Wai-Lun Lo, Zheru Chi

Strabismus is one of the most common vision disorders and can cause amblyopia and even permanent vision loss. Timely diagnosis is crucial for the effective treatment of strabismus. In contrast to manual diagnosis, automatic recognition can significantly reduce labor costs and increase diagnostic efficiency. In this paper, we propose to recognize strabismus using eye-tracking data and convolutional neural networks. In particular, an eye tracker is first used to record a subject's eye movements. A gaze deviation (GaDe) image is then proposed to characterize the subject's eye-tracking data according to the accuracy of the gaze points. The GaDe image is fed to a convolutional neural network (CNN) that has been trained on a large image database called ImageNet. The outputs of the fully connected layers of the CNN are used as the GaDe image's features for strabismus recognition. A dataset containing eye-tracking data from both strabismic and normal subjects was established for the experiments. Experimental results demonstrate that natural image features transfer well to the representation of eye-tracking data, and that strabismus can be effectively recognized by our proposed method.
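A minimal sketch of the feature-extraction step: a pre-trained ImageNet CNN applied to a GaDe image, with activations from the fully connected layers taken as features. The VGG16 backbone and the penultimate-layer choice are assumptions; the abstract does not name the architecture.

```python
import torch
from torchvision import models

cnn = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
# drop the final 1000-way ImageNet layer, keeping the 4096-d activations
fc_head = torch.nn.Sequential(*list(cnn.classifier.children())[:-1])

def gade_features(gade_image: torch.Tensor) -> torch.Tensor:
    """gade_image: (1, 3, 224, 224) tensor rendered from eye-tracking data."""
    with torch.no_grad():
        x = cnn.features(gade_image)    # convolutional feature maps
        x = cnn.avgpool(x).flatten(1)   # pooled and flattened
        return fc_head(x)               # 4096-d feature vector for a downstream classifier
```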


10.2196/18089 · 2020 · Vol 8 (8) · pp. e18089
Author(s): Ryoungwoo Jang, Namkug Kim, Miso Jang, Kyung Hwa Lee, Sang Min Lee, et al.

Background: Computer-aided diagnosis of chest x-ray images using deep learning is a widely studied modality in medicine. Many studies are based on public datasets, such as the National Institutes of Health (NIH) dataset and the Stanford CheXpert dataset. However, the labels in these datasets were extracted with classical natural language processing, which may introduce a certain degree of label error.

Objective: This study aimed to investigate the robustness of deep convolutional neural networks (CNNs) for binary classification of posteroanterior chest x-rays under random incorrect labeling.

Methods: We trained and validated the CNN architecture with different levels of label noise in 3 datasets, namely, Asan Medical Center-Seoul National University Bundang Hospital (AMC-SNUBH), NIH, and CheXpert, and tested the models with each test set. The disease in each chest x-ray in our dataset was confirmed by a thoracic radiologist using computed tomography (CT). Receiver operating characteristic (ROC) curves and areas under the curve (AUCs) were evaluated for each test. Randomly chosen chest x-rays from the public datasets were evaluated by 3 physicians and 1 thoracic radiologist.

Results: In comparison with the public NIH and CheXpert datasets, where AUCs did not drop significantly until the label noise reached 16%, the AUC of the AMC-SNUBH dataset decreased significantly starting from 2% label noise. Evaluation of the public datasets by 3 physicians and 1 thoracic radiologist showed an accuracy of 65%-80%.

Conclusions: The deep learning–based computer-aided diagnosis model is sensitive to label noise, and computer-aided diagnosis with inaccurate labels is not credible. Furthermore, open datasets such as NIH and CheXpert need to be distilled before being used for deep learning–based computer-aided diagnosis.
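The noise-injection setup lends itself to a short sketch: flip a fixed fraction of binary labels before training and track the test AUC as the fraction grows. The helper below is illustrative; the function name and the noise grid are our assumptions, not the study's code.

```python
import numpy as np

def corrupt_labels(labels: np.ndarray, noise_level: float, seed: int = 0) -> np.ndarray:
    """Randomly flip a fraction `noise_level` of binary labels."""
    rng = np.random.default_rng(seed)
    flipped = labels.copy()
    n_flip = int(round(noise_level * len(labels)))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    flipped[idx] = 1 - flipped[idx]   # 0 = normal, 1 = abnormal
    return flipped

# e.g., train one model per noise level and compare test AUCs
train_labels = np.random.default_rng(1).integers(0, 2, size=1000)  # placeholder labels
for noise in (0.0, 0.02, 0.04, 0.08, 0.16):
    noisy = corrupt_labels(train_labels, noise)
    # ... train the CNN on (train_images, noisy) and evaluate AUC on the test set
```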

