Extendable and explainable deep learning for pan-cancer radiogenomics research

2022 ◽  
Vol 66 ◽  
pp. 102111
Qian Liu ◽  
Pingzhao Hu
2018 ◽  
Yeping Lina Qiu ◽  
Hong Zheng ◽  
Olivier Gevaert

Abstract
Motivation: The presence of missing values is a frequent problem encountered in genomic data analysis. Missing data can be an obstacle to downstream analyses that require complete data matrices. State-of-the-art imputation techniques, including Singular Value Decomposition (SVD)- and K-Nearest Neighbors (KNN)-based methods, usually achieve good performance but are computationally expensive, especially for large datasets such as those involved in pan-cancer analysis.
Results: This study describes a new method, a denoising autoencoder with partial loss (DAPL), as a deep-learning-based alternative for data imputation. Results on pan-cancer gene expression data and DNA methylation data from over 11,000 samples demonstrate significant improvement over a standard denoising autoencoder, both for data missing-at-random cases across a range of missing percentages and for missing-not-at-random cases based on expression level and GC-content. We discuss the advantages of DAPL over traditional imputation methods and show that it achieves comparable or better performance with less computational burden.
Availability: [email protected]
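The "partial loss" idea behind DAPL is that reconstruction error is computed only over observed entries, so missing positions cannot distort training. A minimal sketch of such a loss (the function name and list-based representation are our own illustration, not code from the paper):

```python
def partial_mse(x_true, x_recon, observed):
    """Mean squared error restricted to observed entries.

    x_true   -- original values (missing positions hold placeholders)
    x_recon  -- the autoencoder's reconstruction
    observed -- 1 where the entry was observed, 0 where it was missing
    """
    errors = [(t - r) ** 2 for t, r, o in zip(x_true, x_recon, observed) if o]
    return sum(errors) / len(errors)
```

During training, the autoencoder sees a corrupted copy of the input while this loss ignores the missing positions; at inference time, the reconstruction supplies imputed values for those positions.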

Xiangtao Li ◽  
Shaochuan Li ◽  
Yunhe Wang ◽  
Shixiong Zhang ◽  
Ka-Chun Wong

Abstract The identification of hidden responders is often an essential challenge in precision oncology. A recent machine learning approach has been proposed for classifying aberrant pathway activity from multiomic cancer data. However, we note several critical limitations of that approach, such as high dimensionality, data sparsity and model performance. Given the central importance and broad impact of precision oncology, we propose nature-inspired deep Ras activation pan-cancer (NatDRAP), a deep neural network (DNN) model, to address these limitations in the identification of hidden responders. In this study, we develop a nature-inspired deep learning model that integrates bulk RNA sequencing, copy number and mutation data from PanCanAtlas to detect pan-cancer Ras pathway activation. In NatDRAP, we propose to synergize the nature-inspired artificial bee colony algorithm with different gradient-based optimizers in one framework, optimizing DNNs in a collaborative manner. Multiple experiments were conducted on 33 different cancer types across PanCanAtlas. The experimental results demonstrate that the proposed NatDRAP can provide superior performance over other benchmark methods, with strong robustness in diagnosing Ras aberrant pathway activity across different cancer types. In addition, gene ontology enrichment and pathological analyses are conducted to reveal novel insights into the identification and characterization of Ras aberrant pathway activity. NatDRAP is written in Python and available at
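The collaborative scheme the abstract describes, an artificial bee colony (ABC) search synergized with gradient-based optimizers, can be sketched on a toy objective as follows. This is a heavily simplified illustration under our own assumptions (quadratic objective, plain gradient steps, a single employed-bee move), not the NatDRAP implementation:

```python
import random

def hybrid_optimize(f, grad, dim=3, colony=6, iters=50, lr=0.1, seed=0):
    """Toy hybrid of artificial-bee-colony search and gradient descent.

    Each candidate ('food source') is first refined by a gradient step,
    then perturbed by an ABC-style employed-bee move toward a random peer;
    in both cases the move is kept only if it improves the objective.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(colony)]
    for _ in range(iters):
        for i, w in enumerate(pop):
            # gradient-based refinement of this candidate
            g = grad(w)
            w2 = [wj - lr * gj for wj, gj in zip(w, g)]
            if f(w2) < f(w):
                pop[i] = w = w2
            # ABC-style employed-bee move toward a random peer
            k = rng.randrange(colony)
            phi = rng.uniform(-1, 1)
            w3 = [wj + phi * (wj - pk) for wj, pk in zip(w, pop[k])]
            if f(w3) < f(w):
                pop[i] = w3
    return min(pop, key=f)

# quadratic toy objective standing in for a network loss
f = lambda w: sum(x * x for x in w)
grad = lambda w: [2 * x for x in w]
best = hybrid_optimize(f, grad)
```

The greedy accept-if-better rule mirrors ABC's food-source update, while the gradient step supplies the local refinement that population-based search alone lacks.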

2021 ◽  
Canbiao Wu ◽  
Xiaofang Guo ◽  
Mengyuan Li ◽  
Xiayu Fu ◽  
Zeliang Hou ◽  

Hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. Previous studies showed that HBV can integrate into the host genome and further promote malignant transformation. In this study, we developed an attention-based deep learning model, DeepHBV, to predict HBV integration sites by learning local genomic features automatically. We trained and tested DeepHBV using HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 on this dataset. Adding repeat peaks and TCGA Pan Cancer peaks significantly improved the model performance, yielding AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. On an independent validation dataset of HBV integration sites from VISDB, DeepHBV with HBV integration sequences plus TCGA Pan Cancer peaks (AUROC of 0.7603 and AUPR of 0.6189) performed better than with HBV integration sequences plus repeat peaks (AUROC of 0.6657 and AUPR of 0.5737). Next, we found that transcription factor binding sites (TFBS) were significantly enriched near the genomic positions attended to by the convolutional neural network. The binding sites of AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra and Foxo3 were highlighted by the DeepHBV attention mechanism in both the dsVIS and VISDB datasets, revealing the HBV integration preference. In summary, DeepHBV is a robust and explainable deep learning model, useful not only for predicting HBV integration sites but also for further mechanistic study of HBV-induced cancer.
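The explainability step described above rests on attention weights: a softmax over per-position scores identifies the genomic positions the network attends to, and those positions are then inspected for TFBS enrichment. A minimal sketch of that weighting (function names and the top-k selection are our own illustration):

```python
import math

def softmax_attention(scores):
    """Turn raw per-position attention scores into weights summing to 1."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_positions(scores, k=3):
    """Indices of the k most-attended positions, e.g. for motif enrichment."""
    weights = softmax_attention(scores)
    return sorted(range(len(weights)), key=lambda i: -weights[i])[:k]
```

In an analysis like the one in the abstract, the highlighted positions would be cross-referenced against known TFBS motifs to test for enrichment.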

2021 ◽  
Marie PAVAGEAU ◽  
Louis REBAUD ◽  
Daphne MOREL ◽  
Eric DEUTSCH ◽  

RNA sequencing (RNAseq) analysis offers a tumor-centered approach of growing interest for personalizing cancer care. However, existing methods, including deep learning models, struggle to reach satisfactory performance on survival prediction from pan-cancer RNAseq data. Here, we present DeepOS, a novel deep learning model that predicts overall survival (OS) from pan-cancer RNAseq with a concordance index of 0.715 and a survival AUC of 0.752 across 33 TCGA tumor types when tested on an unseen test cohort. DeepOS notably uses (i) prior biological knowledge to condense input dimensionality, (ii) transfer learning to enlarge its training capacity through pretraining on organ prediction, and (iii) a mean squared error loss function adapted to survival prediction; all of which contributed to improving model performance. Interpretation showed that DeepOS learned biologically relevant prognostic biomarkers. Altogether, DeepOS achieved unprecedented and consistent performance on pan-cancer prognosis estimation from individual RNAseq data.
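One common way to adapt mean squared error to survival data, in the spirit of point (iii) above, is to penalize censored patients only when the model predicts a survival time shorter than their last follow-up, since for censored cases the true time is only known to exceed the observed one. A hedged sketch of such a loss; this is our own simplification, not necessarily the exact DeepOS formulation:

```python
def censored_mse(pred, time, event):
    """MSE over (prediction, follow-up time) pairs with right censoring.

    event == 1: death observed, apply the usual squared error.
    event == 0: censored, penalize only if the prediction undershoots
                the observed follow-up time.
    """
    total = 0.0
    for p, t, e in zip(pred, time, event):
        if e == 1 or p < t:
            total += (p - t) ** 2
    return total / len(pred)
```

This keeps the convenient regression form of MSE while respecting that a censored observation is a lower bound, not an exact target.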

2021 ◽  
Javad Noorbakhsh ◽  
Saman Farahmand ◽  
Ali Foroughi pour ◽  
Sandeep Namburi ◽  
Dennis Caruana ◽  

2020 ◽  
Vol 10 (1) ◽  
Bo Yuan ◽  
Dong Yang ◽  
Bonnie E. G. Rothberg ◽  
Hao Chang ◽  
Tian Xu

Abstract Deep learning analysis of images and text unfolds new horizons in medicine. However, analysis of transcriptomic data, the cause of biological and pathological changes, is hampered by a structural complexity distinct from that of images and text. Here we conduct unsupervised training on more than 20,000 human normal and tumor transcriptomic profiles and show that the resulting deep autoencoder, DeepT2Vec, successfully extracts informative features and embeds transcriptomes into 30-dimensional Transcriptomic Feature Vectors (TFVs). We demonstrate that the TFVs can recapitulate expression patterns and be used to track tissue origins. Trained on these extracted features only, a supervised classifier, DeepC, can effectively distinguish tumors from normal samples with an accuracy of 90% for Pan-Cancer and an average of 94% for specific cancers. When trained as a connected network, the accuracy further increases to 96% for Pan-Cancer. Together, our study shows that deep learning with autoencoders is suitable for transcriptomic analysis, and DeepT2Vec can be successfully applied to distinguish cancers, normal tissues, and other potential traits with limited samples.
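The core mechanism here, an autoencoder squeezing high-dimensional profiles through a narrow bottleneck whose code becomes the feature vector, can be sketched at toy scale. This is a 1-dimensional linear bottleneck trained by gradient descent, a deliberately tiny stand-in for DeepT2Vec's much deeper 30-dimensional embedding (all names and data are ours):

```python
def train_autoencoder(data, dim=2, lr=0.01, epochs=200):
    """Train a minimal linear autoencoder with a 1-d bottleneck.

    Returns (encoder weights, decoder weights, per-epoch losses).
    """
    we = [0.1] * dim            # encoder weights: x -> scalar code h
    wd = [0.1] * dim            # decoder weights: h -> reconstruction
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = sum(w * xi for w, xi in zip(we, x))      # encode
            xr = [w * h for w in wd]                     # decode
            err = [r - xi for r, xi in zip(xr, x)]
            total += sum(e * e for e in err)
            # gradient steps on the squared reconstruction error
            for j in range(dim):
                wd[j] -= lr * 2 * err[j] * h
            ge = sum(2 * e * w for e, w in zip(err, wd))
            for j in range(dim):
                we[j] -= lr * ge * x[j]
        losses.append(total)
    return we, wd, losses

# toy samples lying near the line x2 = 2 * x1, so a 1-d code suffices
data = [(1.0, 2.0), (0.5, 1.0), (-1.0, -2.0), (2.0, 4.0)]
we, wd, losses = train_autoencoder(data)
```

Once trained, the scalar code h plays the role of a TFV entry: a compressed feature on which a downstream classifier such as DeepC could be trained.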

2021 ◽  
Vol 11 (1) ◽  
Luís A. Vale-Silva ◽  
Karl Rohr

Abstract The age of precision medicine demands powerful computational techniques to handle high-dimensional patient data. We present MultiSurv, a multimodal deep learning method for long-term pan-cancer survival prediction. MultiSurv uses dedicated submodels to establish feature representations of clinical, imaging, and different high-dimensional omics data modalities. A data fusion layer aggregates the multimodal representations, and a prediction submodel generates conditional survival probabilities for follow-up time intervals spanning several decades. MultiSurv is the first non-linear and non-proportional survival prediction method that leverages multimodal data. In addition, MultiSurv can handle missing data, including single values and complete data modalities. MultiSurv was applied to data from 33 different cancer types and yields accurate pan-cancer patient survival curves. A quantitative comparison with previous methods showed that MultiSurv achieves the best results according to different time-dependent metrics. We also generated visualizations of MultiSurv's learned multimodal representation, which revealed insights into cancer characteristics and heterogeneity.
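A prediction head that outputs conditional survival probabilities per follow-up interval, as described above, yields a survival curve by cumulative multiplication, as in standard discrete-time survival models. A minimal sketch of that conversion (our own illustration):

```python
def survival_curve(cond_probs):
    """Cumulative survival S(t_k) from per-interval conditional probabilities.

    cond_probs[k] is P(survive interval k | survived intervals 0..k-1),
    so S(t_k) is the running product of the first k+1 entries.
    """
    curve, s = [], 1.0
    for p in cond_probs:
        s *= p
        curve.append(s)
    return curve
```

Because each conditional probability lies in [0, 1], the resulting curve is automatically non-increasing, which is one reason discrete-time heads are convenient for long follow-up horizons.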

Kamil Wnuk ◽  
Jeremi Sudol ◽  
Shahrooz Rabizadeh ◽  
Patrick Soon-Shiong ◽  
Christopher Szeto ◽  
