A Pan-Cancer and Polygenic Bayesian Hierarchical Model for the Effect of Somatic Mutations on Survival

2020 ◽  
Vol 19 ◽  
pp. 117693512090739
Author(s):  
Sarah Samorodnitsky ◽  
Katherine A Hoadley ◽  
Eric F Lock

We built a novel Bayesian hierarchical survival model based on the somatic mutation profile of patients across 50 genes and 27 cancer types. The pan-cancer quality allows the model to “borrow” information across cancer types, motivated by the assumption that similar mutation profiles may have similar (but not necessarily identical) effects on survival across different tissues of origin or tumor types. The effect of a mutation at each gene was allowed to vary by cancer type, whereas the mean effect of each gene was shared across cancers. Within this framework, we considered 4 parametric survival models (normal, log-normal, exponential, and Weibull), and we compared their performance via a cross-validation approach in which we fit each model on training data and estimated the log-posterior predictive likelihood on test data. The log-normal model gave the best fit, and we investigated the partial effect of each gene on survival via a forward selection procedure. Through this we determined that mutations at TP53 and FAT4 were together the most useful for predicting patient survival. We validated the model via simulation to ensure that our algorithm for posterior computation gave nominal coverage rates. The code used for this analysis can be found at https://github.com/sarahsamorodnitsky/Pan-Cancer-Survival-Modeling.git, and the results are summarized at http://ericfrazerlock.com/surv_figs/SurvivalDisplay.html.
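The structure described in this abstract (a log-normal survival likelihood with gene effects that vary by cancer type around a shared per-gene mean) can be illustrated with a minimal Python sketch. This is not the authors' code; their implementation is in the repository linked above. The treatment of right-censored patients, the Normal prior on the effects, and every name here (`lognormal_loglik`, `hierarchical_logprior`, `tau`) are assumptions made for illustration.

```python
import math

def normal_logpdf(z):
    """Log density of a standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def normal_logsf(z):
    """Log of P(Z > z) for a standard normal, floored to avoid log(0)."""
    return math.log(max(0.5 * math.erfc(z / math.sqrt(2.0)), 1e-300))

def lognormal_loglik(times, events, mutations, beta, intercept, sigma):
    """Log-likelihood of survival times for ONE cancer type (hypothetical helper).

    times     : observed follow-up times (> 0)
    events    : 1 if death was observed, 0 if the patient was right-censored
    mutations : one 0/1 list per patient, one entry per gene
    beta      : per-gene effects for this cancer type
    """
    total = 0.0
    for t, d, x in zip(times, events, mutations):
        mu = intercept + sum(b * xi for b, xi in zip(beta, x))
        z = (math.log(t) - mu) / sigma
        if d:
            # Observed event: log-normal density phi(z) / (sigma * t).
            total += normal_logpdf(z) - math.log(sigma) - math.log(t)
        else:
            # Censored: only know that survival exceeded t.
            total += normal_logsf(z)
    return total

def hierarchical_logprior(betas_by_cancer, gene_means, tau):
    """Assumed prior: beta[c][g] ~ Normal(gene_means[g], tau^2).

    Sharing gene_means across cancer types is what lets the model
    borrow information between them.
    """
    total = 0.0
    for beta in betas_by_cancer:
        for b, m in zip(beta, gene_means):
            total += normal_logpdf((b - m) / tau) - math.log(tau)
    return total
```

Summing `lognormal_loglik` over cancer types and adding `hierarchical_logprior` gives an unnormalized log-posterior for the effects, which a sampler could then explore. Priors on the gene means and variances are omitted from this sketch.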

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Erik van Dijk ◽  
Tom van den Bosch ◽  
Kristiaan J. Lenos ◽  
Khalid El Makrini ◽  
Lisanne E. Nijman ◽  
...  

Survival rates of cancer patients vary widely within and between malignancies. While genetic aberrations are at the root of all cancers, individual genomic features cannot explain these distinct disease outcomes. In contrast, intra-tumour heterogeneity (ITH) has the potential to elucidate pan-cancer survival rates and the biology that drives cancer prognosis. Unfortunately, a comprehensive and effective framework to measure ITH across cancers is missing. Here, we introduce a scalable measure of chromosomal copy number heterogeneity (CNH) that predicts patient survival across cancers. We show that the level of ITH can be derived from a single-sample copy number profile. Using gene-expression data and live cell imaging we demonstrate that ongoing chromosomal instability underlies the observed heterogeneity. Analysing 11,534 primary cancer samples from 37 different malignancies, we find that copy number heterogeneity can be accurately deduced and predicts cancer survival across tissues of origin and stages of disease. Our results provide a unifying molecular explanation for the different survival rates observed between cancer types.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e14576-e14576
Author(s):  
Xinlu Liu ◽  
Jiasheng Xu ◽  
Jian Sun ◽  
Deng Wei ◽  
Xinsheng Zhang ◽  
...  

e14576 Background: Clinically, MSI has been used as an important molecular marker for the prognosis of colorectal cancer and other solid tumors and for the formulation of adjuvant treatment plans, and it has been used to assist in screening for Lynch syndrome. However, there are currently few reports on the incidence of MSI-H in Chinese pan-cancer patients. This study describes the occurrence of MSI in a large multi-center pan-cancer cohort in China and explores the correlation between MSI and patients' TMB, age, PD-L1 expression, and other indicators. Methods: The study included 8361 patients with 8 cancer types from multiple tumor centers. Immunohistochemistry was used to detect the expression of MMR proteins (MLH1, MSH2, MSH6, and PMS2) in patients with each cancer type to determine MSI status, and to detect PD-L1 expression. Using NGS, 831 genes were sequenced in the 8361 Chinese cancer patients and each patient's tumor mutation load (TMB) was calculated. MSI status was analyzed across the 8 cancer types, along with its correlation with patient age, TMB, and PD-L1 expression. Results: MSI patients accounted for 1.66% of the pan-cancer cohort. MSI-H patients accounted for the highest proportion in intestinal cancer, reaching 7.2%. In the correlation analysis between MSI and TMB, MSI-H patients in each cancer type had TMB greater than 10, and 26.83% of MSI-H colorectal cancer patients had TMB greater than 100. There was no significant correlation between patient age and the risk of MSI (P > 0.05). With the exception of PAAD and LUAD, PD-L1 expression in MSI-H patients was higher than in MSS patients in the other cancer types (P < 0.05). The correlation analysis between PD-L1 expression and TMB found that, in colorectal cancer, higher PD-L1 expression was associated with higher TMB (P < 0.05). Conclusions: In this study, we explored the incidence of MSI-H in pan-cancer patients in China and found that TMB was greater than 10 in patients with MSI-H. Compared with MSS patients, MSI-H patients have higher PD-L1 expression, and in colorectal cancer, higher PD-L1 expression is associated with a higher TMB value.


2022 ◽  
Author(s):  
James W. Webber ◽  
Kevin M. Elias

Background: Cancer identification is generally framed as binary classification, normally discrimination of a control group from a single cancer group. However, such models lack any cancer-specific information, as they are only trained on one cancer type. The models fail to account for competing cancer risks. For example, an ostensibly healthy individual may have any number of different cancer types, and a tumor may originate from one of several primary sites. Pan-cancer evaluation requires a model trained on multiple cancer types, and controls, simultaneously, so that a physician can be directed to the correct area of the body for further testing. Methods: We introduce novel neural network models to address multi-cancer classification problems across several data types commonly applied in cancer prediction, including circulating miRNA expression, protein, and mRNA. In particular, we present an analysis of neural network depth and complexity, and investigate how this relates to classification performance. Comparisons of our models with state-of-the-art neural networks from the literature are also presented. Results: Our analysis evidences that shallow, feed-forward neural net architectures offer greater performance when compared to more complex deep feed-forward, Convolutional Neural Network (CNN), and Graph CNN (GCNN) architectures considered in the literature. Conclusion: The results show that multiple cancers and controls can be classified accurately using the proposed models, across a range of expression technologies in cancer prediction. Impact: This study addresses the important problem of pan-cancer classification, which is often overlooked in the literature. The promising results highlight the urgency for further research.


2021 ◽  
Vol 17 (2) ◽  
pp. e1008720
Author(s):  
John P. Lloyd ◽  
Matthew B. Soellner ◽  
Sofia D. Merajver ◽  
Jun Z. Li

Increased availability of drug response and genomics data for many tumor cell lines has accelerated the development of pan-cancer prediction models of drug response. However, it is unclear how much between-tissue differences in drug response and molecular characteristics may contribute to pan-cancer predictions. Also unknown is whether the performance of pan-cancer models could vary by cancer type. Here, we built a series of pan-cancer models using two datasets containing 346 and 504 cell lines, each with MEK inhibitor (MEKi) response and mRNA expression, point mutation, and copy number variation data, and found that, while the tissue-level drug responses are accurately predicted (between-tissue ρ = 0.88–0.98), only 5 of 10 cancer types showed successful within-tissue prediction performance (within-tissue ρ = 0.11–0.64). Between-tissue differences make substantial contributions to the performance of pan-cancer MEKi response predictions, as exclusion of between-tissue signals leads to a decrease in Spearman’s ρ from a range of 0.43–0.62 to 0.30–0.51. In practice, joint analysis of multiple cancer types usually has a larger sample size, hence greater power, than for one cancer type; and we observe that higher accuracy of pan-cancer prediction of MEKi response is almost entirely due to the sample size advantage. Success of pan-cancer prediction reveals how drug response in different cancers may invoke shared regulatory mechanisms despite tissue-specific routes of oncogenesis, yet predictions in different cancer types require flexible incorporation of between-cancer and within-cancer signals. As most datasets in genome sciences contain multiple levels of heterogeneity, careful parsing of group characteristics and within-group, individual variation is essential when making robust inference.
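The distinction between between-tissue and within-tissue Spearman's ρ can be illustrated with a small, self-contained Python sketch. It is not the authors' pipeline. Taking the between-tissue value as the rank correlation of per-tissue means is an assumption about how such a summary might be formed, and all function names and the toy data are hypothetical.

```python
from collections import defaultdict

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def between_and_within(tissues, predicted, observed):
    """Return (rho of per-tissue means, {tissue: rho within that tissue})."""
    groups = defaultdict(list)
    for t, p, o in zip(tissues, predicted, observed):
        groups[t].append((p, o))
    names = sorted(groups)
    mean_pred = [sum(p for p, _ in groups[t]) / len(groups[t]) for t in names]
    mean_obs = [sum(o for _, o in groups[t]) / len(groups[t]) for t in names]
    between = spearman(mean_pred, mean_obs)
    within = {t: spearman([p for p, _ in groups[t]],
                          [o for _, o in groups[t]]) for t in names}
    return between, within

# Toy data: tissue-level means are ordered correctly, but the ordering
# of cell lines inside each tissue is reversed.
tissues = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9]
observed = [3, 2, 1, 6, 5, 4, 9, 8, 7]
overall = spearman(predicted, observed)
between, within = between_and_within(tissues, predicted, observed)
```

In this toy case the pooled correlation is high even though every within-tissue correlation is negative. It is a deliberately extreme version of the pattern the abstract describes, in which between-tissue signal accounts for much of the apparent pan-cancer performance.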


Cancers ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 1816
Author(s):  
Xiaoli Zhang ◽  
Shuai Shao ◽  
Lang Li

Class-3 semaphorins (SEMA3s), initially characterized as axon guidance cues, have been recognized as key regulators for immune responses, angiogenesis, tumorigenesis and drug responses. The functions of SEMA3s are attributed to the activation of downstream signaling cascades mainly mediated by cell surface receptors neuropilins (NRPs) and plexins (PLXNs), yet their roles in human cancers are not completely understood. Here, we provided a detailed pan-cancer analysis of NRPs and PLXNs in their expression, and association with key signal transducers, patient survival, tumor microenvironment (TME), and drug responses. The expression of NRPs and PLXNs was dysregulated in many cancer types, and the majority of them were further dysregulated in metastatic tumors, indicating a role in metastatic progression. Importantly, the expression of these genes was frequently associated with key transducers, patient survival, TME, and drug responses; however, the direction of the association varied for the particular gene queried and the specific cancer type/subtype tested. Specifically, NRP1, NRP2, PLXNA1, PLXNA3, PLXNB3, PLXNC1, and PLXND1 were primarily associated with aggressive phenotypes, whereas the rest were more associated with favorable prognosis. These data highlighted the need to study each as a separate entity in a cancer type- and subtype-dependent manner.


2019 ◽  
Author(s):  
John P. Lloyd ◽  
Matthew Soellner ◽  
Sofia D. Merajver ◽  
Jun Z. Li

Increased availability of drug response and genomics data for many tumor cell lines has accelerated the development of pan-cancer prediction models of drug response. However, it is unclear how much between-tissue differences in drug response and molecular characteristics may contribute to pan-cancer predictions. Also unknown is whether the performance of pan-cancer models could vary by cancer type. Here, we built a series of pan-cancer models using two datasets containing 346 and 504 cell lines with MEK inhibitor (MEKi) response and RNA, SNP, and CNV data, and found that, while the tissue-level drug responses are accurately predicted (between-tissue ρ=0.88-0.98), only 5 of 10 cancer types showed successful within-tissue prediction performance (within-tissue ρ=0.11-0.64). Between-tissue differences make substantial contributions to the performance of pan-cancer MEKi response predictions, as we estimate that exclusion of between-tissue signals leads to a 22% decrease in performance metrics. In practice, joint analysis of multiple cancer types usually has a larger sample size, hence greater power, than for one cancer type; and we observe that the higher accuracy of pan-cancer prediction of MEKi response is almost entirely due to the sample size advantage. Success of pan-cancer prediction reveals how drug response in different cancers may invoke shared regulatory mechanisms despite tissue-specific routes of oncogenesis, yet predictions in different cancer types require flexible incorporation of between-cancer and within-cancer signals. As most datasets in genome sciences contain multiple levels of heterogeneity, careful parsing of group characteristics and within-group, individual variation is essential when making robust inference.


2021 ◽  
Vol 12 ◽  
Author(s):  
Rongchuan Zhao ◽  
Xiaohan Sa ◽  
Nan Ouyang ◽  
Hong Zhang ◽  
Jiao Yang ◽  
...  

Numerous studies have identified prognostic long non-coding RNAs (LncRNAs) in specific cancer types, but a comprehensive pan-cancer analysis to predict LncRNAs that may serve as prognostic biomarkers would be of great value. Glioblastoma multiforme (GBM) is the most common and aggressive malignant adult primary brain tumor. There is an urgent need to identify novel therapies for GBM due to its poor prognosis and universal recurrence. Using available LncRNA expression data of 12 cancer types and survival data of 30 cancer types from online databases, we identified 48 differentially expressed LncRNAs in cancers as potential pan-cancer prognostic biomarkers. Two candidate LncRNAs were selected for validation in GBM. Through expression detection in GBM cell lines and survival analysis in GBM patients, we demonstrated the reliability of the list of pan-cancer prognostic LncRNAs obtained above. By constructing a LncRNA-mRNA-drug network in GBM, we predicted novel drug-target interactions for GBM-correlated LncRNAs. This analysis has revealed common prognostic LncRNAs among cancers, which may provide insights into cancer pathogenesis and novel drug targets in GBM.


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Haley Hieronymus ◽  
Rajmohan Murali ◽  
Amy Tin ◽  
Kamlesh Yadav ◽  
Wassim Abida ◽  
...  

The level of copy number alteration (CNA), termed CNA burden, in the tumor genome is associated with recurrence of primary prostate cancer. Whether CNA burden is associated with prostate cancer survival or outcomes in other cancers is unknown. We analyzed the CNA landscape of conservatively treated prostate cancer in a biopsy and transurethral resection cohort, reflecting an increasingly common treatment approach. We find that CNA burden is prognostic for cancer-specific death, independent of standard clinical prognosticators. More broadly, we find CNA burden is significantly associated with disease-free and overall survival in primary breast, endometrial, renal clear cell, thyroid, and colorectal cancer in TCGA cohorts. To assess clinical applicability, we validated these findings in an independent pan-cancer cohort of patients whose tumors were sequenced using a clinically-certified next generation sequencing assay (MSK-IMPACT), where prognostic value varied based on cancer type. This prognostic association was affected by incorporating tumor purity in some cohorts. Overall, CNA burden of primary and metastatic tumors is a prognostic factor, potentially modulated by sample purity and measurable by current clinical sequencing.
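As a rough illustration of how a CNA burden score might be computed, the sketch below takes it to be the fraction of the profiled genome covered by altered segments. The abstract does not give the exact definition, so this definition, the `(start, end, log2_ratio)` segment format, the `threshold` of 0.2, and the function name `cna_burden` are all assumptions made for illustration.

```python
def cna_burden(segments, threshold=0.2):
    """Fraction of profiled bases lying in copy-number-altered segments.

    segments  : iterable of (start, end, log2_ratio) tuples (assumed format)
    threshold : absolute log2 ratio at or above which a segment counts as
                altered (an illustrative value, not taken from the paper)
    """
    total = 0
    altered = 0
    for start, end, log2_ratio in segments:
        length = end - start
        if length <= 0:
            raise ValueError("segment end must be greater than start")
        total += length
        if abs(log2_ratio) >= threshold:
            altered += length
    if total == 0:
        raise ValueError("no segments supplied")
    return altered / total

# Hypothetical profile: 100 neutral bases, 50 gained, 50 lost.
example = [(0, 100, 0.0), (100, 150, 0.5), (150, 200, -0.4)]
burden = cna_burden(example)
```

A score of this kind depends on how the profile is segmented and on tumor purity, which fits the abstract's note that the prognostic association was affected by incorporating purity in some cohorts.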


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Luís A. Vale-Silva ◽  
Karl Rohr

The age of precision medicine demands powerful computational techniques to handle high-dimensional patient data. We present MultiSurv, a multimodal deep learning method for long-term pan-cancer survival prediction. MultiSurv uses dedicated submodels to establish feature representations of clinical, imaging, and different high-dimensional omics data modalities. A data fusion layer aggregates the multimodal representations, and a prediction submodel generates conditional survival probabilities for follow-up time intervals spanning several decades. MultiSurv is the first non-linear and non-proportional survival prediction method that leverages multimodal data. In addition, MultiSurv can handle missing data, including single values and complete data modalities. MultiSurv was applied to data from 33 different cancer types and yields accurate pan-cancer patient survival curves. A quantitative comparison with previous methods showed that MultiSurv achieves the best results according to different time-dependent metrics. We also generated visualizations of the learned multimodal representation of MultiSurv, which revealed insights on cancer characteristics and heterogeneity.
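The step from per-interval conditional survival probabilities to a patient survival curve can be sketched in a few lines of Python. This shows only the standard discrete-time identity, in which survival to the end of interval k is the product of the conditional probabilities for intervals 1 through k. It is not MultiSurv's code, and the function name and example values are hypothetical.

```python
def survival_curve(conditional_probs):
    """Cumulative survival from per-interval conditional survival probabilities.

    conditional_probs[k] is the probability of surviving interval k given
    survival up to its start; the returned list holds the probability of
    surviving through the end of each interval.
    """
    curve = []
    surviving = 1.0
    for p in conditional_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        surviving *= p
        curve.append(surviving)
    return curve

# Hypothetical model output for three consecutive follow-up intervals.
curve = survival_curve([0.9, 0.8, 0.5])
```

Because every factor is at most 1, the resulting curve is non-increasing by construction, without a proportional-hazards assumption.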


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Joel Nulsen ◽  
Hrvoje Misetic ◽  
Christopher Yau ◽  
Francesca D. Ciccarelli

Background: Identifying the complete repertoire of genes that drive cancer in individual patients is crucial for precision oncology. Most established methods identify driver genes that are recurrently altered across patient cohorts. However, mapping these genes back to patients leaves a sizeable fraction with few or no drivers, hindering our understanding of cancer mechanisms and limiting the choice of therapeutic interventions. Results: We present sysSVM2, a machine learning software that integrates cancer genetic alterations with gene systems-level properties to predict drivers in individual patients. Using simulated pan-cancer data, we optimise sysSVM2 for application to any cancer type. We benchmark its performance on real cancer data and validate its applicability to a rare cancer type with few known driver genes. We show that drivers predicted by sysSVM2 have a low false-positive rate, are stable and disrupt well-known cancer-related pathways. Conclusions: sysSVM2 can be used to identify driver alterations in patients lacking sufficient canonical drivers or belonging to rare cancer types for which assembling a large enough cohort is challenging, furthering the goals of precision oncology. As resources for the community, we provide the code to implement sysSVM2 and the pre-trained models in all TCGA cancer types (https://github.com/ciccalab/sysSVM2).

