Evaluating reproducibility of AI algorithms in digital pathology with DAPPER

2018 ◽  
Author(s):  
Andrea Bizzego ◽  
Nicole Bussola ◽  
Marco Chierici ◽  
Marco Cristoforetti ◽  
Margherita Francescatto ◽  
...  

Artificial Intelligence is exponentially increasing its impact on healthcare. As deep learning is mastering computer vision tasks, its application to digital pathology is natural, with the promise of aiding in routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve validity and robustness of current clinico-pathological features, up to identifying novel histological patterns, e.g. from tumor infiltrating lymphocytes. In this study, we examine the issue of evaluating accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. We introduce the DAPPER framework for validation based on a rigorous Data Analysis Plan derived from the FDA’s MAQC project, designed to analyse causes of variability in predictive biomarkers. We apply the framework to models that identify tissue of origin on 787 Whole Slide Images from the Genotype-Tissue Expression (GTEx) project. We test three deep learning architectures (VGG, ResNet, Inception) as feature extractors and three classifiers (a fully connected multilayer network, Support Vector Machine and Random Forests), and work with four datasets (5, 10, 20 or 30 classes), for a total of 53000 tiles at 512 × 512 resolution. We analyze accuracy and feature stability of the machine learning classifiers, also demonstrating the need for random features and random labels diagnostic tests to identify selection bias and risks for reproducibility. Further, we apply the deep features from the VGG model trained on GTEx to the KIMIA24 dataset for identification of slide of origin (24 classes), training a classifier on 1060 annotated tiles and validating it on 265 unseen ones. The DAPPER software, including its deep learning backbone pipeline and the HINT (Histological Imaging - Newsy Tiles) benchmark dataset derived from GTEx, is released as a basis for standardization and validation initiatives in AI for Digital Pathology.
Author summary: In this study, we examine the issue of evaluating accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. It is indeed a top priority that reproducibility-by-design gets adopted as standard practice in building and validating AI methods in the healthcare domain. Here we introduce DAPPER, a first framework to evaluate deep features and classifiers in digital pathology, based on a rigorous data analysis plan originally developed in the FDA’s MAQC initiative for predictive biomarkers from massive omics data. We apply DAPPER to models trained to identify tissue of origin from the HINT benchmark dataset of 53000 tiles from 787 Whole Slide Images in the Genotype-Tissue Expression (GTEx) project. We analyze accuracy and feature stability of different deep learning architectures (VGG, ResNet and Inception) as feature extractors and classifiers (a fully connected multilayer network, SVMs and Random Forests) on up to 20 classes. Further, we use the deep features from the VGG model (trained on HINT) on the 1300 annotated tiles of the KIMIA24 dataset for identification of slide of origin (24 classes). The DAPPER software is available together with the HINT benchmark dataset.
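The random labels diagnostic mentioned in the abstract can be illustrated with a minimal sketch. It is not the DAPPER implementation: the synthetic one-dimensional "features", the nearest-centroid classifier, and the function name `nearest_centroid_accuracy` are all assumptions chosen to keep the example self-contained. The idea is that the same pipeline, rerun on permuted labels, should score near chance; a clearly higher score would point to selection bias or leakage.

```python
import random


def nearest_centroid_accuracy(features, labels, train_fraction=0.67):
    """Fit a 1-D nearest-centroid classifier on a train split; return test accuracy."""
    n_train = int(len(features) * train_fraction)
    train_x, test_x = features[:n_train], features[n_train:]
    train_y, test_y = labels[:n_train], labels[n_train:]

    # One centroid (mean feature value) per class seen in training.
    centroids = {}
    for cls in set(train_y):
        values = [x for x, y in zip(train_x, train_y) if y == cls]
        centroids[cls] = sum(values) / len(values)

    # Predict the class whose centroid is closest to each test value.
    predictions = [min(centroids, key=lambda c: abs(x - centroids[c])) for x in test_x]
    return sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)


rng = random.Random(0)

# Synthetic stand-in for deep features: one informative value per tile, two classes.
labels = [i % 2 for i in range(600)]
features = [rng.gauss(3.0 * y, 1.0) for y in labels]

real_accuracy = nearest_centroid_accuracy(features, labels)

# Diagnostic: permute all labels, then rerun the identical pipeline.
shuffled = labels[:]
rng.shuffle(shuffled)
random_label_accuracy = nearest_centroid_accuracy(features, shuffled)

print(f"true labels:   {real_accuracy:.2f}")
print(f"random labels: {random_label_accuracy:.2f}")
```

With two balanced classes, the random-label run is expected to land close to 0.5, while the true-label run should be well above it.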

2021 ◽  
Vol 7 (3) ◽  
pp. 51
Author(s):  
Emanuela Paladini ◽  
Edoardo Vantaggiato ◽  
Fares Bougourzi ◽  
Cosimo Distante ◽  
Abdenour Hadid ◽  
...  

In recent years, automatic tissue phenotyping has attracted increasing interest in the Digital Pathology (DP) field. For Colorectal Cancer (CRC), tissue phenotyping can diagnose the cancer and differentiate between different cancer grades. The development of Whole Slide Images (WSIs) has provided the required data for creating automatic tissue phenotyping systems. In this paper, we study different hand-crafted feature-based and deep learning methods using two popular multi-class CRC-tissue-type databases: Kather-CRC-2016 and CRC-TP. For the hand-crafted features, we use two texture descriptors (LPQ and BSIF) and their combination. In addition, two classifiers (SVM and NN) are used to classify the texture features into distinct CRC tissue types. For the deep learning methods, we evaluate four Convolutional Neural Network (CNN) architectures (ResNet-101, ResNeXt-50, Inception-v3, and DenseNet-161). Moreover, we propose two Ensemble CNN approaches: Mean-Ensemble-CNN and NN-Ensemble-CNN. The experimental results show that the proposed approaches outperformed the hand-crafted feature-based methods, the individual CNN architectures and the state-of-the-art methods on both databases.
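As a rough illustration of a mean ensemble, the sketch below averages per-class probabilities from several models and picks the class with the highest mean. The abstract does not spell out how Mean-Ensemble-CNN combines its members, so the averaging rule, the function name `mean_ensemble`, and the toy probability vectors are assumptions.

```python
def mean_ensemble(probabilities_per_model):
    """Average class-probability vectors across models; return (argmax class, mean vector)."""
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    mean = [
        sum(model[k] for model in probabilities_per_model) / n_models
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda k: mean[k])
    return predicted, mean


# Hypothetical softmax outputs of three CNNs for one tile (three tissue classes).
model_outputs = [
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.25, 0.65, 0.10],
]

predicted_class, mean_probs = mean_ensemble(model_outputs)
print(predicted_class, [round(p, 2) for p in mean_probs])
```

Here the first model alone would choose class 0, but the averaged probabilities favour class 1, which is the kind of disagreement an ensemble is meant to smooth out.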


2020 ◽  
Vol 12 ◽  
pp. 175883592097141
Author(s):  
Fan Zhang ◽  
Lian-Zhen Zhong ◽  
Xun Zhao ◽  
Di Dong ◽  
Ji-Jin Yao ◽  
...  

Background: To explore the prognostic value of radiomics-based and digital pathology-based imaging biomarkers from macroscopic magnetic resonance imaging (MRI) and microscopic whole-slide images for patients with nasopharyngeal carcinoma (NPC). Methods: We recruited 220 NPC patients and divided them into training (n = 132), internal test (n = 44), and external test (n = 44) cohorts. The primary endpoint was failure-free survival (FFS). Radiomic features were extracted from pretreatment MRI, then selected and integrated into a radiomic signature. The histopathological signature was extracted from whole-slide images of biopsy specimens using an end-to-end deep-learning method. Incorporating the two signatures and independent clinical factors, a multi-scale nomogram was constructed. We also tested the correlation between the key imaging features and genetic alterations in an independent cohort of 16 patients (biological test cohort). Results: Both radiomic and histopathologic signatures presented significant associations with treatment failure in the three cohorts (C-index: 0.689–0.779, all p < 0.050). The multi-scale nomogram showed a consistent significant improvement for predicting treatment failure compared with the clinical model in the training (C-index: 0.817 versus 0.730, p < 0.050), internal test (C-index: 0.828 versus 0.602, p < 0.050) and external test (C-index: 0.834 versus 0.679, p < 0.050) cohorts. Furthermore, patients were stratified successfully into two groups with distinguishable prognosis (log-rank p < 0.0010) using our nomogram. We also found that two texture features were related to the genetic alterations of chromatin remodeling pathways in another independent cohort. Conclusion: The multi-scale imaging features showed a complementary value in prognostic prediction and may improve individualized treatment in NPC.
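For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is a simplified version (pairs with tied times are skipped, tied risk scores count as half), and the toy follow-up times, event flags, and risk scores are invented for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk score.

    A pair (i, j) is comparable when subject i has the shorter time and
    experienced the event. It is concordant when i also has the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable


# Toy cohort: follow-up time, event indicator (1 = failure, 0 = censored).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

c_perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
c_reversed = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
c_constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
print(c_perfect, c_reversed, c_constant)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to an uninformative score, and values below 0.5 indicate inverted ranking.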


2020 ◽  
pp. 1-9
Author(s):  
Ewen David McAlpine ◽  
Liron Pantanowitz ◽  
Pamela M. Michelow

Background: The incorporation of digital pathology into routine pathology practice is becoming more widespread. Definite advantages exist with respect to the implementation of artificial intelligence (AI) and deep learning in pathology, including cytopathology. However, there are also unique challenges in this regard. Summary: This review discusses cytology-specific challenges, including the need to implement digital cytology prior to AI; the large file sizes and increased acquisition times for whole slide images in cytology; the routine use of multiple stains, such as Papanicolaou and Romanowsky stains; the lack of high-quality annotated datasets on which to train algorithms; and the considerable computer resources required, in terms of both computer infrastructure and skilled personnel, for computing and storage of data. Global concerns regarding AI that are certainly applicable to cytology include the need for model validation and continued quality assurance; ethical issues such as the use of patient data in developing algorithms; and the need to develop regulatory frameworks regarding what type of data can be utilized and to ensure cybersecurity during data collection, storage, and algorithm development. Key Messages: While AI will likely play a role in cytology practice in the future, applying this technology to cytology poses a unique set of challenges. A broad understanding of digital pathology and algorithm development is desirable to guide the development of algorithms, as is an awareness of the potential pitfalls to avoid when incorporating the technology in practice.


2019 ◽  
Vol 26 (11) ◽  
pp. 1181-1188 ◽  
Author(s):  
Isabel Segura-Bedmar ◽  
Pablo Raez

Objective: The goal of the 2018 n2c2 shared task on cohort selection for clinical trials (track 1) is to identify which patients meet the selection criteria for clinical trials. Cohort selection is a particularly demanding task to which natural language processing and deep learning can make a valuable contribution. Our goal is to evaluate several deep learning architectures for this task. Materials and Methods: Cohort selection can be formulated as a multilabel classification problem whose goal is to determine which criteria are met for each patient record. We explore several deep learning architectures: a simple convolutional neural network (CNN), a deep CNN, a recurrent neural network (RNN), and a CNN-RNN hybrid architecture. Although our architectures are similar to those proposed in existing deep learning systems for text classification, our research also studies the impact of using a fully connected feedforward layer on the performance of these architectures. Results: The RNN and hybrid models provide the best results, though without statistical significance. The use of the fully connected feedforward layer improves the results for all the architectures except the hybrid architecture. Conclusions: Despite the limited size of the dataset, deep learning methods show promising results in learning useful features for the task of cohort selection. They can therefore be used as a preliminary filter for cohort selection for any clinical trial with a minimum of human intervention, thus significantly reducing the cost and time of clinical trials.
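The multilabel formulation can be sketched as one independent yes/no decision per criterion, in contrast to a single softmax over mutually exclusive classes. The snippet below is an illustrative assumption, not the authors' system: the criterion names are placeholders, the logits are made up, and the 0.5 threshold is a conventional default.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_criteria(logits, criteria, threshold=0.5):
    """Decide each criterion independently from its own sigmoid probability."""
    return {name: sigmoid(z) >= threshold for name, z in zip(criteria, logits)}


# Placeholder criterion names and hypothetical network outputs for one patient record.
criteria = ["criterion_a", "criterion_b", "criterion_c"]
logits = [2.0, -1.0, 0.3]

met = predict_criteria(logits, criteria)
print(met)
```

Because each criterion has its own output, a record can satisfy any number of criteria at once, including none.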


Author(s):  
Byron Smith ◽  
Meyke Hermsen ◽  
Elizabeth Lesser ◽  
Deepak Ravichandar ◽  
Walter Kremers

Deep learning has pushed the scope of digital pathology beyond simple digitization and telemedicine. The incorporation of these algorithms in routine workflow is on the horizon and may be a disruptive technology, reducing processing time and increasing detection of anomalies. While the newest computational methods enjoy much of the press, incorporating deep learning into standard laboratory workflow requires many more steps than simply training and testing a model. Image analysis using deep learning methods often requires substantial pre- and post-processing in order to improve interpretation and prediction. As in any data processing pipeline, images must be prepared for modeling and the resultant predictions need further processing for interpretation. Examples include artifact detection, color normalization, image subsampling or tiling, and removal of errant predictions. Processing is further complicated by image file size, typically several gigabytes when unpacked. This forces images to be tiled, meaning that a series of subsamples from the whole-slide image (WSI) are used in modeling. Herein, we review many of these methods as they pertain to the analysis of biopsy slides and discuss the multitude of unique issues that are part of the analysis of very large images.
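The tiling step can be sketched as generating the top-left coordinates of fixed-size tiles over the slide. This is a minimal sketch under simple assumptions: the function name `tile_origins` is hypothetical, partial tiles at the right and bottom edges are dropped, and no tissue or background filtering is applied.

```python
def tile_origins(width, height, tile_size, stride=None):
    """Return (x, y) top-left corners of full tiles covering a width x height image.

    With stride == tile_size the tiles do not overlap; a smaller stride
    produces overlapping tiles. Partial edge tiles are skipped.
    """
    stride = stride or tile_size
    xs = range(0, width - tile_size + 1, stride)
    ys = range(0, height - tile_size + 1, stride)
    return [(x, y) for y in ys for x in xs]


# A small 2048 x 1024 region cut into 512 x 512 tiles.
tiles = tile_origins(2048, 1024, 512)
overlapping = tile_origins(2048, 1024, 512, stride=256)
print(len(tiles), len(overlapping))
```

In practice each coordinate would be passed to a slide reader to extract the pixel data for that tile, so only one tile needs to be held in memory at a time.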


2019 ◽  
Vol 2 (1) ◽  
pp. 182-186
Author(s):  
Santosh Giri

Deep learning is one of the essential parts of machine learning. Applications such as image classification, text recognition, and object detection use deep learning architectures. In this paper, a neural network model was designed for image classification. An NN classifier with one fully connected layer and one softmax layer was designed, and the feature extraction part of the Inception v3 model was reused to calculate the feature values of each image. The NN classifier was then trained on these feature values. Using this transfer learning mechanism, the NN classifier was trained on the 17 classes of the Oxford 17 Flower image dataset. The system reached a final training accuracy of 99%. After training, the system was evaluated on the test images; the mean testing accuracy was 86.4%.
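The transfer learning setup described here, a frozen feature extractor followed by a trainable fully connected layer with softmax, can be sketched in plain Python. Everything below is an illustrative assumption: `frozen_extractor` is a stand-in for a pretrained network such as Inception v3, the two-dimensional toy features replace real image features, and the learning rate and epoch count are arbitrary.

```python
import math


def frozen_extractor(image):
    """Hypothetical stand-in for a pretrained network; its weights are never updated."""
    return image  # the toy 'images' below are already small feature vectors


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(x, weights, biases):
    """Fully connected layer: one linear score per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def train_head(features, labels, n_classes, epochs=200, lr=0.1):
    """Train only the classifier head with per-sample gradient descent on cross-entropy."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(scores_for(x, weights, biases))
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == y else 0.0)
                for d in range(dim):
                    weights[k][d] -= lr * grad * x[d]
                biases[k] -= lr * grad
    return weights, biases


def predict(x, weights, biases):
    """Return the class with the highest score."""
    s = scores_for(x, weights, biases)
    return max(range(len(s)), key=lambda k: s[k])


# Toy data: three well-separated classes, two samples each.
images = [[3.0, 0.0], [3.2, 0.1], [-3.0, 0.0], [-3.1, 0.2], [0.0, 3.0], [0.1, 3.3]]
labels = [0, 0, 1, 1, 2, 2]

features = [frozen_extractor(img) for img in images]  # computed once, then reused
weights, biases = train_head(features, labels, n_classes=3)
predictions = [predict(f, weights, biases) for f in features]
print(predictions)
```

The design point is that features are computed once by the fixed extractor, so training touches only the small classifier head.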


2021 ◽  
Author(s):  
Adyn Miles ◽  
Mahdi S. Hosseini ◽  
Sheyang Tang ◽  
Zhou Wang ◽  
Savvas Damaskinos ◽  
...  

Out-of-focus sections of whole slide images are a significant source of false positives and other systematic errors in clinical diagnoses. As a result, focus quality assessment (FQA) methods must be able to quickly and accurately differentiate between focus levels in a scan. Recently, deep learning methods using convolutional neural networks (CNNs) have been adopted for FQA. However, the biggest obstacles impeding their wide usage in clinical workflows are their generalizability across different test conditions and their potentially high computational cost. In this study, we focus on the transferability and scalability of CNN-based FQA approaches. We carry out an investigation on ten architecturally diverse networks using five datasets with stain and tissue diversity. We evaluate the computational complexity of each network and scale this to realistic applications involving hundreds of whole slide images. We assess how well each full model transfers to a separate, unseen dataset without fine-tuning. We show that shallower networks transfer well when used on small input patch sizes, while deeper networks work more effectively on larger inputs. Furthermore, we introduce neural architecture search (NAS) to the field and learn an automatically designed low-complexity CNN architecture using differentiable architecture search, which achieved competitive performance relative to established CNNs.


Cancers ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 797 ◽  
Author(s):  
Hanadi El Achi ◽  
Joseph D. Khoury

Digital Pathology is the process of converting histology glass slides to digital images using sophisticated computerized technology to facilitate acquisition, evaluation, storage, and portability of histologic information. By its nature, digitization of analog histology data renders it amenable to analysis using deep learning/artificial intelligence (DL/AI) techniques. The application of DL/AI to digital pathology data holds promise, even if the scope of use cases and regulatory framework for deploying such applications in the clinical environment remains in the early stages. Recent studies using whole-slide images and DL/AI to detect histologic abnormalities in general and cancer in particular have shown encouraging results. In this review, we focus on these emerging technologies intended for use in diagnostic hematology and the evaluation of lymphoproliferative diseases.


2021 ◽  
Author(s):  
Alef Iury S. Ferreira ◽  
Frederico S. Oliveira ◽  
Nádia F. Felipe da Silva ◽  
Anderson S. Soares

Gender recognition from speech is a problem related to the analysis of human speech, with applications ranging from personalized product recommendation to forensic science. Identifying the efficiency and costs of the different approaches to this problem is essential. This work investigates and compares the efficiency and costs of different deep learning architectures for gender recognition from speech. The results show that the one-dimensional convolutional model achieves the best results. However, the fully connected model achieved comparable results at lower cost, both in memory usage and in training time.


2019 ◽  
Author(s):  
Mohammad Rezaei ◽  
Yanjun Li ◽  
Xiaolin Li ◽  
Chenglong Li

Introduction: The ability to discriminate among ligands binding to the same protein target in terms of their relative binding affinity lies at the heart of structure-based drug design. Any improvement in the accuracy and reliability of binding affinity prediction methods decreases the discrepancy between experimental and computational results. Objectives: The primary objectives were to find the most relevant features affecting binding affinity prediction, to minimize manual feature engineering, and to improve the reliability of binding affinity prediction using efficient deep learning models with tuned hyperparameters. Methods: The binding site of each target protein was represented as a grid box around its bound ligand. Both binary and distance-dependent occupancies were examined for how an atom affects its neighboring voxels in this grid. A combination of different features, including ANOLEA, ligand elements, and Arpeggio atom types, was used to represent the input. An efficient convolutional neural network (CNN) architecture, DeepAtom, was developed, trained and tested on the PDBbind v2016 dataset. Additionally, an extended benchmark dataset was compiled to train and evaluate the models. Results: The best DeepAtom model showed improved accuracy in binding affinity prediction on the PDBbind core subset (Pearson’s R = 0.83), outperforming recent state-of-the-art models in this field. In addition, when the DeepAtom model was trained on our proposed benchmark dataset, it yielded a higher correlation than the baseline, which confirms the value of our model. Conclusions: The promising results for the predicted binding affinities are expected to pave the way for embedding deep learning models in virtual screening and rational drug design.

