Modeling clinical and molecular covariates of mutational process activity in cancer

2019 ◽  
Vol 35 (14) ◽  
pp. i492-i500
Author(s):  
Welles Robinson ◽  
Roded Sharan ◽  
Mark D M Leiserson

Abstract Motivation Somatic mutations result from processes related to DNA replication or environmental/lifestyle exposures. Knowing the activity of mutational processes in a tumor can inform personalized therapies, early detection, and understanding of tumorigenesis. Computational methods have revealed 30 validated signatures of mutational processes active in human cancers, where each signature is a pattern of single base substitutions. However, half of these signatures have no known etiology, and some similar signatures have distinct etiologies, making patterns of mutation signature activity hard to interpret. Existing mutation signature detection methods do not consider tumor-level clinical/demographic (e.g. smoking history) or molecular features (e.g. inactivations to DNA damage repair genes). Results To begin to address these challenges, we present the Tumor Covariate Signature Model (TCSM), the first method to directly model the effect of observed tumor-level covariates on mutation signatures. To this end, our model uses methods from Bayesian topic modeling to change the prior distribution on signature exposure conditioned on a tumor’s observed covariates. We also introduce methods for imputing covariates in held-out data and for evaluating the statistical significance of signature-covariate associations. On simulated and real data, we find that TCSM outperforms both non-negative matrix factorization and topic modeling-based approaches, particularly in recovering the ground truth exposure to similar signatures. We then use TCSM to discover five mutation signatures in breast cancer and predict homologous recombination repair deficiency in held-out tumors. We also discover four signatures in a combined melanoma and lung cancer cohort—using cancer type as a covariate—and provide statistical evidence to support earlier claims that three lung cancers from The Cancer Genome Atlas are misdiagnosed metastatic melanomas. 
Availability and implementation TCSM is implemented in Python 3 and available at https://github.com/lrgr/tcsm, along with a data workflow for reproducing the experiments in the paper. Supplementary information Supplementary data are available at Bioinformatics online.
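The idea of changing the prior on signature exposure according to a tumor's covariates can be sketched in a few lines. The example below is a minimal illustration, not the TCSM implementation: it assumes a logistic-normal style prior, as is common in covariate-aware topic models, whose location is a linear function of the covariates. The abstract does not state this functional form, and the weight matrix `GAMMA`, the covariate encoding, and the function names are all hypothetical.

```python
import math

def softmax(scores):
    """Map real-valued scores to a probability vector."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def prior_center_exposure(covariates, coefficients):
    """Exposure vector at the center of a covariate-dependent prior.

    covariates:   list of P covariate values (hypothetical encoding, 1.0 = present)
    coefficients: P x K nested list of hypothetical covariate-to-signature weights
    """
    n_signatures = len(coefficients[0])
    scores = [
        sum(x * row[k] for x, row in zip(covariates, coefficients))
        for k in range(n_signatures)
    ]
    return softmax(scores)

# Hypothetical weights: an intercept row and one binary covariate, three signatures.
GAMMA = [
    [0.0, 0.0, 0.0],   # intercept: no preference among signatures
    [2.0, 0.0, -1.0],  # covariate shifts prior mass toward signature 0
]

without_covariate = prior_center_exposure([1.0, 0.0], GAMMA)
with_covariate = prior_center_exposure([1.0, 1.0], GAMMA)
```

Under these made-up weights, a tumor without the covariate has a uniform prior center, while a tumor with the covariate has more prior mass on the first signature. This is the sense in which a covariate-aware prior can help separate similar signatures.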

2021 ◽  
Vol 8 ◽  
Author(s):  
Rui Wang ◽  
Shanshan Li ◽  
Wen Wen ◽  
Jianquan Zhang

Comprehensive studies on cancer patients with different smoking histories, including non-smokers, former smokers, and current smokers, remain scarce. Therefore, we conducted a multi-omics analysis to explore the effect of smoking history on cancer patients. Patients with a documented smoking history were selected from The Cancer Genome Atlas database, and their multi-omics data and clinical information were downloaded. A total of 2,317 patients were included in this study. Current smokers had the worst prognosis, followed by former smokers, while non-smokers had the best prognosis. More importantly, smoking history was an independent prognostic factor. Patients with different smoking histories exhibited different immune content, and former smokers had the highest immune cell and tumor immune microenvironment levels. Smokers had a higher incidence of genomic instability, some of which can be reversed following smoking cessation. We also noted that smoking reduced the sensitivity of patients to chemotherapeutic drugs, whereas smoking cessation can reverse this effect. A competing endogenous RNA network revealed that mir-193b-3p, mir-301b, mir-205-5p, mir-132-3p, mir-212-3p, mir-1271-5p, and mir-137 may contribute significantly to tobacco-mediated tumor formation. We identified 11 methylation driver genes (EIF5A2, GBP6, HGD, HS6ST1, ITGA5, NR2F2, PLS1, PPP1R18, PTHLH, SLC6A15, and YEATS2), and methylation modifications of some of these genes have not previously been reported to be associated with tumors. We constructed a 46-gene model that predicted overall survival with good predictive power. We next drew nomograms for each cancer type. Calibration diagrams and concordance indexes verified that the nomograms were highly accurate for patient prognosis. We also found that the 46-gene model applies well to overall survival as well as to disease-specific survival and progression-free interval.
The results of this research provide new and valuable insights for the diagnosis, treatment, and follow-up of cancer patients with different smoking histories.


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i154-i160 ◽  
Author(s):  
Xinrui Lyu ◽  
Jean Garret ◽  
Gunnar Rätsch ◽  
Kjong-Van Lehmann

Abstract Motivation Understanding the underlying mutational processes of cancer patients has been a long-standing goal in the community and promises to provide new insights that could improve cancer diagnoses and treatments. Mutational signatures are summaries of the mutational processes, and improving the derivation of mutational signatures can yield new discoveries previously obscured by technical and biological confounders. Results from existing mutational signature extraction methods depend on the size of the available patient cohort and focus solely on the analysis of mutation count data, without exploiting metadata. Results Here we present a supervised method that utilizes cancer type as metadata to extract more distinctive signatures. More specifically, we use a negative binomial non-negative matrix factorization and add a support vector machine loss. We show that mutational signatures extracted by our proposed method have a lower reconstruction error and are designed to be more predictive of cancer type than those generated by unsupervised methods. Unlike existing unsupervised signature extraction methods, this design reduces the need for elaborate post-processing strategies to recover most of the known signatures. Signatures extracted by a supervised model used in conjunction with cancer-type labels are also more robust, especially when using small and potentially cancer-type limited patient cohorts. Finally, we adapted our model such that molecular features can be used to derive a corresponding mutational signature. We used APOBEC expression and MUTYH mutation status to demonstrate the possibilities that arise from this ability. We conclude that our method, which exploits available metadata, improves the quality of mutational signatures and helps derive more interpretable representations. Availability and implementation https://github.com/ratschlab/SNBNMF-mutsig-public.
Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Yashoda Ghanekar ◽  
Subhashini Sadasivam

Abstract Background Sequencing studies across multiple cancers continue to reveal the spectrum of mutations and genes involved in the pathobiology of these cancers. Exome sequencing of oral cancers, a subset of Head and Neck Squamous cell Carcinomas (HNSCs) common among tobacco-chewing populations, revealed that ~34% of the affected patients harbor mutations in the CASP8 gene. Uterine Corpus Endometrial Carcinoma (UCEC) is another cancer type where about 10% of cases harbor CASP8 mutations. Caspase-8, the protease encoded by the CASP8 gene, plays a dual role in programmed cell death, which in turn has an important role in tumor cell death and drug resistance. CASP8 is a protease required for the extrinsic pathway of apoptosis and is also a negative regulator of necroptosis. Using bioinformatics approaches to mine data in The Cancer Genome Atlas, we compared the molecular features and survival of these carcinomas with and without CASP8 mutations. Results Our in silico analyses showed that HNSCs with CASP8 mutations displayed a prominent signature of genes involved in immune response and inflammation, and were rich in immune cell infiltrates. However, in contrast to Human Papilloma Virus-positive HNSCs, a subtype that exhibits high immune cell infiltration and better overall survival, HNSC patients with mutant-CASP8 tumors did not display any survival advantage. A similar bioinformatic analysis in UCECs revealed that while UCECs with CASP8 mutations also displayed an immune signature, they had better overall survival, in contrast to the HNSC scenario. On further examination, we found significant up-regulation of neutrophils as well as the cytokine IL33 in mutant-CASP8 HNSCs, neither of which was observed in mutant-CASP8 UCECs. Conclusions These results suggested that carcinomas with mutant CASP8 have broadly similar immune signatures, albeit with different effects on survival.
We hypothesize that subtle tissue-dependent differences could influence survival by modifying the micro-environment of mutant-CASP8 carcinomas. High neutrophil numbers, which are a well-known negative prognosticator in HNSCs, and/or high IL33 levels may be some of the factors affecting survival of mutant-CASP8 cases.


2019 ◽  
Vol 35 (19) ◽  
pp. 3635-3641 ◽  
Author(s):  
Yue Wang ◽  
Jennifer M Franks ◽  
Michael L Whitfield ◽  
Chao Cheng

Abstract Motivation The accumulation of publicly available DNA methylation datasets has resulted in the need for tools to interpret the specific cellular phenotypes in bulk tissue data. Current approaches use either single differentially methylated CpG sites or differentially methylated regions that map to genes. However, these approaches may introduce biases in downstream analyses of biological interpretation, because of the variability in gene length. There is a lack of approaches to interpret DNA methylation effectively. Therefore, we have developed computational models to provide biological interpretation of relevant gene sets using DNA methylation data in the context of The Cancer Genome Atlas. Results We illustrate that Biological interpretation of DNA Methylation (BioMethyl) utilizes the complete DNA methylation data for a given cancer type to reflect corresponding gene expression profiles and performs pathway enrichment analyses, providing unique biological insight. Using breast cancer as an example, BioMethyl shows high consistency in the identification of enriched biological pathways from DNA methylation data compared to the results calculated from RNA sequencing data. We find that 12 out of 14 pathways identified by BioMethyl are shared with those identified using RNA-seq data, with a Jaccard score of 0.8 for estrogen receptor (ER) positive samples. For ER negative samples, three pathways are shared between the two enrichments, with a slightly lower similarity (Jaccard score = 0.6). Using BioMethyl, we can successfully identify those hidden biological pathways in DNA methylation data when gene expression profiles are lacking. Availability and implementation BioMethyl R package is freely available in the GitHub repository (https://github.com/yuewangpanda/BioMethyl). Supplementary information Supplementary data are available at Bioinformatics online.
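The Jaccard score quoted in the abstract above is the size of the intersection of two sets divided by the size of their union. The sketch below reproduces the ER-positive figure with hypothetical pathway identifiers. The size of the RNA-seq pathway set (13) is an assumption chosen so that 12 shared pathways out of 14 give a score of 0.8; the abstract does not state it.

```python
def jaccard(a, b):
    """Jaccard score: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)

# Hypothetical identifiers: 14 methylation-derived pathways, 13 RNA-seq-derived
# pathways (assumed size), 12 of which are shared.
methylation_pathways = {f"pathway_{i}" for i in range(14)}    # pathway_0..13
rnaseq_pathways = {f"pathway_{i}" for i in range(2, 15)}      # pathway_2..14

shared = methylation_pathways & rnaseq_pathways
score = jaccard(methylation_pathways, rnaseq_pathways)        # 12 / 15
```

Because the union counts pathways found by either method, the score penalizes pathways unique to each side, which is why 12 of 14 shared pathways does not translate to 12/14.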


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Robert L. Hollis ◽  
Barbara Stanley ◽  
John P. Thomson ◽  
Michael Churchman ◽  
Ian Croy ◽  
...  

Abstract Endometrioid ovarian carcinoma (EnOC) is an under-investigated ovarian cancer type. Recent studies have described disease subtypes defined by genomics and hormone receptor expression patterns; here, we determine the relationship between these subtyping layers to define the molecular landscape of EnOC with high granularity and identify therapeutic vulnerabilities in high-risk cases. Whole exome sequencing data were integrated with progesterone and oestrogen receptor (PR and ER) expression-defined subtypes in 90 EnOC cases following robust pathological assessment, revealing dominant clinical and molecular features in the resulting integrated subtypes. We demonstrate significant correlation between subtyping approaches: PR-high (PR+/ER+, PR+/ER−) cases were predominantly CTNNB1-mutant (73.2% vs 18.4%, P < 0.001), while PR-low (PR−/ER+, PR−/ER−) cases displayed higher TP53 mutation frequency (38.8% vs 7.3%, P = 0.001), greater genomic complexity (P = 0.007) and more frequent copy number alterations (P = 0.001). PR-high EnOC patients experience favourable disease-specific survival independent of clinicopathological and genomic features (HR = 0.16, 95% CI 0.04–0.71). TP53 mutation further delineates the outcome of patients with PR-low tumours (HR = 2.56, 95% CI 1.14–5.75). A simple, routinely applicable classification algorithm utilising immunohistochemistry for PR and p53 recapitulated these subtypes and their survival profiles. The genomic profile of high-risk EnOC subtypes suggests that inhibitors of the MAPK and PI3K-AKT pathways, alongside PARP inhibitors, represent promising candidate agents for improving patient survival. Patients with PR-low TP53-mutant EnOC have the greatest unmet clinical need, while patients with PR-high tumours (which are typically CTNNB1-mutant and TP53 wild-type) experience excellent survival and may represent candidates for trials investigating de-escalation of adjuvant chemotherapy to agents such as endocrine therapy.


Cancers ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 2013
Author(s):  
Edian F. Franco ◽  
Pratip Rana ◽  
Aline Cruz ◽  
Víctor V. Calderón ◽  
Vasco Azevedo ◽  
...  

A heterogeneous disease such as cancer is activated through multiple pathways and different perturbations. Depending upon the activated pathway(s), patient survival varies significantly, as does the efficacy of various drugs. Therefore, cancer subtype detection using genomics-level data is a significant research problem. Subtype detection is often a complex problem and, in most cases, needs multi-omics data fusion to achieve accurate subtyping. Different data fusion and subtyping approaches have been proposed over the years, such as kernel-based fusion, matrix factorization, and deep learning autoencoders. In this paper, we compared the performance of different deep learning autoencoders for cancer subtype detection. We performed cancer subtype detection on four different cancer types from The Cancer Genome Atlas (TCGA) datasets using four autoencoder implementations. We also predicted the optimal number of subtypes in a cancer type using the silhouette score and found that the detected subtypes exhibit significant differences in survival profiles. Furthermore, we compared the effect of feature selection and similarity measures for subtype detection. For further evaluation, we used the Glioblastoma multiforme (GBM) dataset and identified the differentially expressed genes in each of the subtypes. The results obtained are consistent with other genomic studies and can be corroborated with the involved pathways and biological functions. Thus, the results from the autoencoders, obtained through the interaction of different data types of cancer, can be used for the prediction and characterization of patient subgroups and survival profiles.
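Choosing the number of subtypes with the silhouette score, as mentioned in the abstract above, amounts to scoring each candidate clustering and keeping the best one. The sketch below is a plain-Python illustration under stated assumptions: it uses Euclidean distance, the toy two-dimensional points stand in for autoencoder latent features, and the candidate labelings are written by hand rather than produced by a clustering algorithm. None of these details come from the paper.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the members of a cluster, excluding i.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        a = mean_dist(i, own)  # cohesion: mean distance within the own cluster
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy stand-in for latent features: two well-separated groups of three samples.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Hand-written candidate labelings for k = 2 and k = 3 (hypothetical).
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 2, 1, 1, 1]}
scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

On this toy data the two-cluster labeling scores higher, so `best_k` is 2. With real data, each candidate labeling would come from clustering the learned features at a given number of clusters.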


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Steven F. Gameiro ◽  
Farhad Ghasemi ◽  
Peter Y. F. Zeng ◽  
Neil Mundi ◽  
Christopher J. Howlett ◽  
...  

Abstract Background Frequent mutations in the nuclear receptor binding SET domain protein 1 (NSD1) gene have been observed in head and neck squamous cell carcinomas (HNSCC). NSD1 encodes a histone 3 lysine-36 methyltransferase. NSD1 mutations are correlated with improved clinical outcomes and increased sensitivity to platinum-based chemotherapy agents in human papillomavirus-negative (HPV-) tumors, despite weak T-cell infiltration. However, the role of NSD1 and related family members NSD2 and NSD3 in human papillomavirus-positive (HPV+) HNSCC is unclear. Methods Using data from over 500 HNSCC patients from The Cancer Genome Atlas (TCGA), we compared the relative level of mRNA expression of NSD1, NSD2, and NSD3 in HPV+ and HPV- HNSCC. Correlation analyses were performed between T-cell infiltration and the relative level of expression of NSD1, NSD2, and NSD3 mRNA in HPV+ and HPV- HNSCC. In addition, overall survival outcomes were compared for both the HPV+ and HPV- subsets of patients based on stratification by NSD1, NSD2, and NSD3 expression levels. Results Expression levels of NSD1, NSD2 or NSD3 were not correlated with altered lymphocyte infiltration in HPV+ HNSCC. More importantly, low expression of NSD1, NSD2, or NSD3 correlated with significantly reduced overall patient survival in HPV+, but not HPV- HNSCC. Conclusion These results starkly illustrate the contrast in molecular features between HPV+ and HPV- HNSCC tumors and suggest that NSD1, NSD2, and NSD3 expression levels should be further investigated as novel clinical metrics for improved prognostication and patient stratification in HPV+ HNSCC.


Author(s):  
Martin Pirkl ◽  
Niko Beerenwinkel

Abstract Motivation Cancer is one of the most prevalent diseases in the world. Tumors arise due to important genes changing their activity, e.g. when inhibited or over-expressed. But these gene perturbations are difficult to observe directly. Molecular profiles of tumors can provide indirect evidence of gene perturbations. However, inferring perturbation profiles from molecular alterations is challenging due to error-prone molecular measurements and incomplete coverage of all possible molecular causes of gene perturbations. Results We have developed a novel mathematical method to analyze cancer driver genes and their patient-specific perturbation profiles. We combine genetic aberrations with gene expression data in a causal network derived across patients to infer unobserved perturbations. We show that our method can predict perturbations in simulations, CRISPR perturbation screens and breast cancer samples from The Cancer Genome Atlas. Availability and implementation The method is available as the R-package nempi at https://github.com/cbg-ethz/nempi and http://bioconductor.org/packages/nempi. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 13 (5) ◽  
pp. 2876
Author(s):  
Anne Parlina ◽  
Kalamullah Ramli ◽  
Hendri Murfi

The literature discussing the concepts, technologies, and ICT-based urban innovation approaches of smart cities has been growing, along with initiatives from cities all over the world that are competing to improve their services and become smart and sustainable. However, current studies that provide a comprehensive understanding and reveal smart and sustainable city research trends and characteristics are still lacking. Meanwhile, policymakers and practitioners alike need to pursue progressive development. In response to this shortcoming, this research offers content analysis studies based on topic modeling approaches to capture the evolution and characteristics of topics in the scientific literature on smart and sustainable city research. More importantly, a novel topic-detecting algorithm based on the deep learning and clustering techniques, namely deep autoencoders-based fuzzy C-means (DFCM), is introduced for analyzing the research topic trend. The topics generated by this proposed algorithm have relatively higher coherence values than those generated by previously used topic detection methods, namely non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and eigenspace-based fuzzy C-means (EFCM). The 30 main topics that appeared in topic modeling with the DFCM algorithm were classified into six groups (technology, energy, environment, transportation, e-governance, and human capital and welfare) that characterize the six dimensions of smart, sustainable city research.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Yang-Hong Dai ◽  
Ying-Fu Wang ◽  
Po-Chien Shen ◽  
Cheng-Hsiang Lo ◽  
Jen-Fu Yang ◽  
...  

Abstract In the era of immunotherapy, a reliable genomic predictor for identifying optimal patient populations for combined radiotherapy and immunotherapy (CRI) is lacking. The purpose of this study is to investigate whether genomic scores defining radiosensitivity are associated with immune response. A Merged Microarray-Acquired dataset (MMD) was established, and genomic data from The Cancer Genome Atlas (TCGA) were obtained. The radiosensitivity index (RSI) was calculated from a rank-based regression model comprising 10 genes. A total of 12832 primary tumours across 11 major cancer types were analysed for the association with DNA repair, cellular stemness, macrophage polarisation, and immune subtypes. An additional 585 metastatic tissues were extracted from MET500. RSI was stratified into RSI-Low and RSI-High by a cutpoint of 0.46. Proteomic differential analysis was used to identify significant proteins according to RSI categories. Gene Set Variation Analysis (GSVA) was applied to measure the genomic pathway activity (18 genes for T-cell inflamed activity). Kaplan-Meier analysis was performed for survival analysis. RSI was significantly associated with homologous DNA repair, cancer stemness and immune-related molecular features. Lower RSI was associated with a higher fraction of M1 macrophages. Differential proteomic analysis identified significantly higher TAP2 expression in RSI-Low colorectal tumours. In the TCGA cohort, dominant interferon-γ (IFN-γ) response was characterised by low RSI and predicted better response to programmed cell death 1 (PD-1) blockade. In conclusion, in addition to radiation response, our study identified RSI to be associated with various immune-related features and predicted response to PD-1 blockade, thus highlighting its potential as a candidate biomarker for CRI.
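The two RSI steps described in the abstract above, a rank-based linear score followed by stratification at 0.46, can be sketched as follows. This is an illustration only. The gene names and weights are hypothetical placeholders (the study's model uses 10 genes whose coefficients are not given in the abstract), ranking from lowest (1) to highest expression is an assumption, and so is the treatment of a score exactly equal to the cutpoint.

```python
RSI_CUTPOINT = 0.46  # cutpoint reported in the abstract

def rank_within_sample(expression):
    """Rank genes within one sample: 1 = lowest expression (assumed direction)."""
    ordered = sorted(expression, key=expression.get)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}

def rank_based_score(expression, weights):
    """Weighted sum of within-sample gene ranks."""
    ranks = rank_within_sample(expression)
    return sum(weights[gene] * ranks[gene] for gene in weights)

def stratify_rsi(rsi, cutpoint=RSI_CUTPOINT):
    """Assign an RSI category; treating rsi == cutpoint as RSI-High is an assumption."""
    return "RSI-Low" if rsi < cutpoint else "RSI-High"

# Hypothetical three-gene example; names, values, and weights are placeholders.
example_expression = {"gene_a": 5.2, "gene_b": 1.1, "gene_c": 3.3}
example_weights = {"gene_a": 0.10, "gene_b": 0.05, "gene_c": -0.02}

example_ranks = rank_within_sample(example_expression)
example_rsi = rank_based_score(example_expression, example_weights)
example_category = stratify_rsi(example_rsi)
```

Working on ranks rather than raw expression values makes the score insensitive to the measurement scale of a given platform, which is one plausible reason to use a rank-based model when merging microarray and sequencing cohorts.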

