The Impact of Digital Histopathology Batch Effect on Deep Learning Model Accuracy and Bias

Abstract: The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. This site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the digital image characteristics constituting this histologic batch effect. As an example, we show that patient ethnicity within the TCGA breast cancer cohort can be inferred from histology due to site-level batch effect, which must be accounted for to ensure equitable application of DL. Batch effect also leads to overoptimistic estimates of model performance, and we propose a quadratic programming method to guide validation that abrogates this bias.

The impact of site-specific digital histology signatures on deep learning model accuracy and bias

Nature Communications ◽

10.1038/s41467-021-24698-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Frederick M. Howard ◽

James Dolezal ◽

Sara Kochanny ◽

Jefree Schulte ◽

Heather Chen ◽

...

Keyword(s):

Deep Learning ◽

Expression Patterns ◽

Model Performance ◽

Tumor Stage ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Site Specific ◽

Cancer Subtypes ◽

Quadratic Programming Method ◽

The Impact

Abstract: The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the image characteristics constituting this site-specific digital histology signature. We demonstrate that these site-specific signatures lead to biased accuracy for prediction of features including survival, genomic mutations, and tumor stage. Furthermore, ethnicity can also be inferred from site-specific signatures, which must be accounted for to ensure equitable application of DL. These site-specific signatures can lead to overoptimistic estimates of model performance, and we propose a quadratic programming method that abrogates this bias by ensuring models are not trained and validated on samples from the same site.
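
The constraint named at the end of this abstract, that no submitting site contributes samples to both training and validation, can be illustrated with a simple greedy fold assignment. This is only a sketch of the constraint, not the authors' quadratic programming method, whose details the abstract does not give. The function name `site_preserved_folds` and the toy data are hypothetical.

```python
from collections import defaultdict


def site_preserved_folds(sample_sites, n_folds=3):
    """Assign every sample to a fold so that all samples from one site share a fold.

    sample_sites: dict mapping a sample ID to its submitting-site label.
    Returns a dict mapping each sample ID to a fold index in range(n_folds).
    Greedy heuristic (an assumption of this sketch): place the largest sites
    first, each into the currently smallest fold. It balances fold sizes only.
    """
    by_site = defaultdict(list)
    for sample, site in sample_sites.items():
        by_site[site].append(sample)

    fold_sizes = [0] * n_folds
    assignment = {}
    # Sort by descending site size; ties broken by site name for determinism.
    for site in sorted(by_site, key=lambda s: (-len(by_site[s]), s)):
        fold = fold_sizes.index(min(fold_sizes))
        for sample in by_site[site]:
            assignment[sample] = fold
        fold_sizes[fold] += len(by_site[site])
    return assignment


# Hypothetical toy data: six slides from three sites.
sites = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B", "s6": "C"}
folds = site_preserved_folds(sites, n_folds=2)
```

Because a whole site is always placed in a single fold, holding out any one fold for validation leaves no site represented on both sides of the split.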

Effect of Sequence Padding on the Performance of Protein-Based Deep Learning Models

10.21203/rs.2.21336/v1 ◽

2020 ◽

Author(s):

Angela Lopez-del Rio ◽

Maria Martin ◽

Alexandre Perera-Lluna ◽

Rabie Saidi

Keyword(s):

Deep Learning ◽

Amino Acid ◽

Enzyme Commission ◽

Model Performance ◽

Enzyme Commission Number ◽

Amino Acid Sequences ◽

Learning Models ◽

Zero Padding ◽

The One ◽

The Impact

Abstract: Background. The use of raw amino acid sequences as input for protein-based deep learning models has gained popularity in recent years. This scheme requires handling proteins of different lengths, while deep learning models require input of a fixed shape. To accomplish this, zeros are usually added to each sequence up to an established common length, a process called zero-padding. However, the effect of different padding strategies on model performance and data structure is not yet known. Results. We analysed the impact of different ways of padding the amino acid sequences in a hierarchical Enzyme Commission number prediction problem. Our results show that padding affects model performance even when convolutional layers are involved. We propose and implement four novel types of padding for amino acid sequences. Conclusions. The present study highlights the relevance of the padding step for one-hot encoded amino acid sequences when building deep learning-based models for Enzyme Commission number prediction. The fact that this step affects model performance should raise awareness of the need to justify its details in future work. The code for this analysis is available at https://github.com/b2slab/padding_benchmark.
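
The zero-padding step described in this abstract can be sketched as follows. This is a minimal illustration of padding at the end of a one-hot encoded sequence. It does not reproduce the four novel padding types the authors propose, which the abstract does not describe. The function name, the alphabet ordering, and the handling of overlong sequences and unknown residues are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot_zero_pad(sequence, length):
    """One-hot encode a protein sequence and zero-pad it at the end.

    Returns a list of `length` rows, each a list of 20 ints.
    Assumptions of this sketch: sequences longer than `length` are truncated,
    and residues outside AMINO_ACIDS become all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence[:length]:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        rows.append(row)
    # Post-padding: append all-zero rows until the common length is reached.
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(length - len(rows)))
    return rows


# A three-residue toy sequence padded to a common length of five.
encoded = one_hot_zero_pad("ACD", length=5)
```

Every encoded sequence then has the same shape (`length` rows by 20 columns), which is what allows sequences of different lengths to be batched together.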

A Deep Learning Approach for Rapid Mutational Screening in Melanoma

10.1101/610311 ◽

2019 ◽

Cited By ~ 10

Author(s):

Randie H. Kim ◽

Sofia Nomikou ◽

Nicolas Coudray ◽

George Jour ◽

Zarmeena Dawood ◽

...

Keyword(s):

Deep Learning ◽

Structural Characteristics ◽

Tumor Biology ◽

Model Performance ◽

Area Under The Curve ◽

The Cancer Genome Atlas ◽

Clinical Settings ◽

Mutational Screening ◽

Cancer Genome Atlas ◽

Whole Slide Images

Abstract: Image-based analysis as a rapid method for mutation detection can be advantageous in research or clinical settings when tumor tissue is limited or unavailable for direct testing. Here, we applied a deep convolutional neural network (CNN) to whole slide images of melanomas from 256 patients and developed a fully automated model that first selects tumor-rich areas (area under the curve, AUC = 0.96) and then predicts the presence of mutated BRAF in our test set (AUC = 0.72). Model performance was cross-validated on melanoma images from The Cancer Genome Atlas (AUC = 0.75). We confirm that the mutated BRAF genotype is linked to phenotypic alterations at the level of the nucleus through saliency mapping and pathomics analysis, which reveal that cells with mutated BRAF exhibit larger and rounder nuclei. Not only do these findings provide additional insight into how BRAF mutations affect tumor structural characteristics, but deep learning-based analysis of histopathology images also has the potential to be integrated into higher order models for understanding tumor biology, developing biomarkers, and predicting clinical outcomes.

Identifying cancer type specific oncogenes and tumor suppressors using limited size data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500311 ◽

2016 ◽

Vol 14 (06) ◽

pp. 1650031 ◽

Cited By ~ 4

Author(s):

Ana B. Pavel ◽

Cristian I. Vasile

Keyword(s):

Tumor Suppressors ◽

Molecular Mechanisms ◽

Lung Squamous Cell Carcinoma ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Type ◽

Multiple Cancer ◽

Driver Genes ◽

Cancer Subtypes ◽

Cancer Types

Cancer is a complex and heterogeneous genetic disease. Different mutations and dysregulated molecular mechanisms alter the pathways that lead to cell proliferation. In this paper, we explore a method that classifies genes into oncogenes (ONGs) and tumor suppressors. We optimize this method to identify specific ONGs and tumor suppressors for breast cancer, lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and colon adenocarcinoma (COAD), using data from The Cancer Genome Atlas (TCGA). A set of genes was previously classified as ONGs and tumor suppressors across multiple cancer types (Science 2013). Each gene was assigned an ONG score and a tumor suppressor score based on the frequency of its driver mutations across all variants from the Catalogue of Somatic Mutations in Cancer (COSMIC). We evaluate and optimize this approach within different cancer types from TCGA. We are able to determine known driver genes for each of the four cancer types. After establishing the baseline parameters for each cancer type, we identify new driver genes for each cancer type, and the molecular pathways that are highly affected by them. Our methodology is general and can be applied to different cancer subtypes to identify specific driver genes and improve personalized therapy.

Generalizability of deep learning models for dental image analysis

Scientific Reports ◽

10.1038/s41598-021-85454-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Joachim Krois ◽

Anselmo Garcia Cantu ◽

Akhilanand Chaurasia ◽

Ranjitkumar Patil ◽

Prabhat Kumar Chaudhari ◽

...

Keyword(s):

Deep Learning ◽

Root Canal ◽

Model Performance ◽

Image Data ◽

Dental Status ◽

Learning Models ◽

Panoramic Radiographs ◽

Image Characteristics ◽

Pixel Value ◽

The Impact

Abstract: We assessed the generalizability of deep learning models and how to improve it. Our exemplary use-case was the detection of apical lesions on panoramic radiographs. We employed two datasets of panoramic radiographs from two centers, one in Germany (Charité, Berlin, n = 650) and one in India (KGMU, Lucknow, n = 650). First, U-Net type models were trained on images from Charité (n = 500) and assessed on test sets from Charité and KGMU (each n = 150). Second, the relevance of image characteristics was explored using pixel-value transformations, aligning the image characteristics in the datasets. Third, cross-center training effects on generalizability were evaluated by stepwise replacing Charité with KGMU images. Last, we assessed the impact of the dental status (presence of root-canal fillings or restorations). Models trained only on Charité images showed a (mean ± SD) F1-score of 54.1 ± 0.8% on Charité and 32.7 ± 0.8% on KGMU data (p < 0.001, t-test). Alignment of image data characteristics between the centers did not improve generalizability. However, by gradually increasing the fraction of KGMU images in the training set (from 0 to 100%), the F1-score on KGMU images improved (46.1 ± 0.9%) at a moderate decrease on Charité images (50.9 ± 0.9%, p < 0.01). Model performance was good on KGMU images showing root-canal fillings and/or restorations, but much lower on KGMU images without root-canal fillings and/or restorations. Our deep learning models were not generalizable across centers. Cross-center training improved generalizability. Notably, dental status, but not image characteristics, was relevant. Understanding the reasons behind limits in generalizability helps to mitigate generalizability problems.

Integrating multi-omics data with deep learning for predicting cancer prognosis

10.1101/807214 ◽

2019 ◽

Cited By ~ 3

Author(s):

Hua Chai ◽

Xiang Zhou ◽

Zifeng Cui ◽

Jiahua Rao ◽

Zheng Hu ◽

...

Keyword(s):

Deep Learning ◽

Proportional Hazards ◽

Proportional Hazards Model ◽

Small Sample ◽

The Cancer Genome Atlas ◽

Cancer Prognosis ◽

Omics Data ◽

Hazards Model ◽

Small Sample Sizes ◽

The Impact

Abstract: Motivation. Accurately predicting cancer prognosis is necessary to choose precise treatment strategies for patients. One effective approach is the integration of multi-omics data, which reduces the impact of noise within single-omics data. However, integrating multi-omics data introduces a large number of redundant variables and relatively small sample sizes. In this study, we employed autoencoder networks to extract important features that were then input to the proportional hazards model to predict cancer prognosis. Results. The method was applied to 12 common cancers from The Cancer Genome Atlas. The results show that multi-omics data improve the C-index for prognosis prediction by 4.1% on average over single mRNA data, and our method outperforms previous approaches by at least 7.4%. A comparison of the contribution of single-omics data shows that mRNA contributes the most, followed by DNA methylation, miRNA, and copy number variation. In the case study for differential gene expression analysis, we identified 161 differentially expressed genes in cervical cancer, among which 77 genes (65.8%) have been proven to be associated with cancer. In addition, we performed a cross-cancer test in which the model trained on one cancer was used to predict the prognosis of another cancer, and found that 23 pairs of cancers have a C-index larger than 0.5, with the largest value of 0.68. Thus, this study provides a deep learning framework to effectively integrate multiple omics data to predict cancer prognosis.
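
The C-index reported in this abstract measures how well predicted risks rank patients by survival time, and a value of 0.5 corresponds to chance-level ranking. The following minimal sketch computes Harrell's C-index for right-censored data. Treating this as the variant the authors used is an assumption, and the function name and toy cohort are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    risks:  predicted risk score; higher should mean shorter survival.
    A pair is comparable when the patient with the shorter time had an event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have an observed event strictly before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort of four patients (the third is censored).
c = concordance_index(
    times=[2, 4, 6, 8], events=[1, 1, 0, 1], risks=[0.9, 0.4, 0.5, 0.1]
)
```

In this toy cohort five pairs are comparable and four of them are ranked correctly, so `c` is 0.8.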

xDEEP-MSI: Explainable Bias-Rejecting Microsatellite Instability Deep Learning System in Colorectal Cancer

Biomolecules ◽

10.3390/biom11121786 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1786

Author(s):

Aurelia Bustos ◽

Artemio Payá ◽

Andrés Torrubia ◽

Rodrigo Jover ◽

Xavier Llor ◽

...

Keyword(s):

Colorectal Cancer ◽

Deep Learning ◽

Microsatellite Instability ◽

Digital Pathology ◽

Model Performance ◽

Tissue Microarrays ◽

Learning System ◽

Batch Effects ◽

Patient Level ◽

The Impact

The prediction of microsatellite instability (MSI) using deep learning (DL) techniques could have significant benefits, including reducing cost and increasing MSI testing of colorectal cancer (CRC) patients. Nonetheless, batch effects or systematic biases are not well characterized in digital histology models and lead to overoptimistic estimates of model performance. Methods are needed that not only palliate but directly abrogate these biases. We present a multiple-bias-rejecting DL system based on adversarial networks for the prediction of MSI in CRC from tissue microarrays (TMAs), trained and validated on 1788 patients from EPICOLON and HGUA. The system consists of an end-to-end image preprocessing module that tiles samples at multiple magnifications and a tissue classification module linked to the bias-rejecting MSI predictor. We detected three biases associated with the learned representations of a baseline model: the project of origin of samples, the patient’s spot and the TMA glass where each spot was placed. The system was trained to directly avoid learning the batch effects of those variables. The learned features from the bias-ablated model achieved maximum discriminative power with respect to the task and minimal statistical mean dependence with the biases. The impact of different magnifications, types of tissues and the model performance at tile vs patient level is analyzed. The AUC at tile level, including all three selected tissues (tumor epithelium, mucin and lymphocytic regions) and 4 magnifications, was 0.87 ± 0.03 and increased to 0.9 ± 0.03 at patient level. To the best of our knowledge, this is the first work that incorporates a multiple bias ablation technique in the DL architecture in digital pathology, and the first using TMAs for the MSI prediction task.

Short-Term Daily Prediction of Sea Ice Concentration Based on Deep Learning of Gradient Loss Function

Frontiers in Marine Science ◽

10.3389/fmars.2021.736429 ◽

2021 ◽

Vol 8 ◽

Author(s):

Quanhong Liu ◽

Ren Zhang ◽

Yangjun Wang ◽

Hengqian Yan ◽

Mei Hong

Keyword(s):

Deep Learning ◽

Sea Ice ◽

Loss Function ◽

Model Performance ◽

Arctic Sea Ice ◽

The Arctic ◽

Short Term ◽

Sea Ice Concentration ◽

Ice Concentration ◽

The Impact

The navigability potential of the Northeast Passage has gradually emerged with the melting of Arctic sea ice. For navigation safety in the Arctic, reliable daily sea ice concentration (SIC) prediction is required. Building on the mature application of deep learning techniques to short-term prediction in other fields (atmosphere, ocean, hurricanes, etc.), a new model is proposed for daily SIC prediction that selects multiple factors, adopts a gradient loss function (Grad-loss) and incorporates an improved predictive recurrent neural network (PredRNN++). Three control experiments are designed to test the impact of these three improvements on model performance using multiple indicators. Results show that the proposed model has the best prediction skill in our experiments because it takes physical processes and local SIC variation into consideration, and that it can continuously predict daily SIC for up to 9 days.

Impact of image compression on deep learning-based mammogram classification

Scientific Reports ◽

10.1038/s41598-021-86726-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yong-Yeon Jo ◽

Young Sang Choi ◽

Hyun Woo Park ◽

Jae Hyeok Lee ◽

Hyojung Jung ◽

...

Keyword(s):

Deep Learning ◽

Image Compression ◽

Prediction Models ◽

Characteristic Curve ◽

Model Performance ◽

Ground Truth ◽

Medical Intervention ◽

Cancer Center ◽

Compact Representation ◽

The Impact

Abstract: Image compression is used in several clinical organizations to help address the overhead associated with medical imaging. These methods reduce file size by using a compact representation of the original image. This study aimed to analyze the impact of image compression on the performance of deep learning-based models in classifying mammograms as “malignant” (cases that lead to a cancer diagnosis and treatment) or “normal” and “benign” (non-malignant cases that do not require immediate medical intervention). In this retrospective study, 9111 unique mammograms (5672 normal, 1686 benign, and 1754 malignant cases) were collected from the National Cancer Center in the Republic of Korea. Image compression was applied to mammograms with compression ratios (CRs) ranging from 15 to 11 K. Convolutional neural networks (CNNs) with three convolutional layers and three fully-connected layers were trained using these images to classify a mammogram as malignant or not malignant across a range of CRs using five-fold cross-validation. Models trained on images with maximum CRs of 5 K had an average area under the receiver operating characteristic curve (AUROC) of 0.87 and area under the precision-recall curve (AUPRC) of 0.75 across the five folds and compression ratios. For images compressed with CRs of 10 K and 11 K, model performance decreased (average 0.79 in AUROC and 0.49 in AUPRC). Upon generating saliency maps that visualize the areas each model views as significant for prediction, models trained on less compressed (CR ≤ 5 K) images had maps encapsulating a radiologist’s label, while models trained on images with higher amounts of compression had maps that missed the ground truth completely. In addition, base ResNet18 models pre-trained on ImageNet and trained using compressed mammograms did not show performance improvements over our CNN model, with AUROC and AUPRC values ranging from 0.77 to 0.87 and 0.52 to 0.71 respectively when trained and tested on images with maximum CRs of 5 K. This paper finds that while training models on images with compression increased the robustness of the models when tested on compressed data, moderate image compression did not substantially impact the classification performance of DL-based models.
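
The AUROC values quoted in this abstract can be read as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen non-malignant case. The sketch below computes AUROC directly from that pairwise definition. It is illustrative only, is not the study's evaluation code, and uses a hypothetical function name and made-up scores.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    labels: 1 for the positive class (e.g. malignant), 0 otherwise.
    scores: model outputs; higher means more likely positive.
    Ties between a positive and a negative score count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for five mammograms (not data from the study).
value = auroc(labels=[1, 1, 0, 0, 0], scores=[0.9, 0.4, 0.5, 0.3, 0.1])
```

Here five of the six positive-negative pairs are ordered correctly, so `value` is 5/6. The pairwise loop is quadratic in the number of cases, which is fine for a small illustration but slow for large test sets.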

Development of Biologically Interpretable Multimodal Deep Learning Model for Cancer Prognosis Prediction

10.1101/2021.10.30.466610 ◽

2021 ◽

Author(s):

Zarif L Azher ◽

Louis J Vaickus ◽

Lucas A Salas ◽

Brock Christensen ◽

Joshua Levy

Keyword(s):

Deep Learning ◽

Clinical Features ◽

Preliminary Analysis ◽

Clinical Information ◽

The Cancer Genome Atlas ◽

Cancer Prognosis ◽

Biomedical Data ◽

Cancer Subtypes ◽

Prognosis Prediction ◽

Modeling Approach

Robust cancer prognostication can enable more effective patient care and management, which may potentially improve health outcomes. Deep learning has proven to be a powerful tool to extract meaningful information from cancer patient data. In recent years it has displayed promise in quantifying prognostication by predicting patient risk. However, most current deep learning-based cancer prognosis prediction methods use only a single data source and miss out on learning from potentially rich relationships across modalities. Existing multimodal approaches are challenging to interpret in a biological or medical context, limiting real-world clinical integration as a trustworthy prognostic decision aid. Here, we developed a multimodal modeling approach that can integrate information from the central modalities of gene expression, DNA methylation, and histopathological imaging with clinical information for cancer prognosis prediction. Our multimodal modeling approach combines pathway and gene-based sparsely coded layers with patch-based graph convolutional networks to facilitate biological interpretation of the model results. We present a preliminary analysis that compares the potential applicability of combining all modalities to uni- or bi-modal approaches. Leveraging data from four cancer subtypes from the Cancer Genome Atlas, results demonstrate the encouraging performance of our multimodal approach (C-index=0.660 without clinical features; C-index=0.665 with clinical features) across four cancer subtypes versus unimodal approaches and existing state-of-the-art approaches. This work brings insight to the development of interpretable multimodal methods of applying AI to biomedical data and can potentially serve as a foundation for clinical implementations of such software. We plan to follow up this preliminary analysis with an in-depth exploration of factors to improve multimodal modeling approaches on an in-house dataset.
