CHASMplus reveals the scope of somatic missense mutations driving human cancers

Summary: Large-scale cancer sequencing studies of patient cohorts have statistically implicated many genes driving cancer growth and progression, and their identification has yielded substantial translational impact. However, a remaining challenge is to increase the resolution of driver prediction from the gene level to the mutation level, because mutation-level predictions are more closely aligned with the goal of precision cancer medicine. Here we present CHASMplus, a computational method that is uniquely capable of identifying driver missense mutations, including those specific to a cancer type, as evidenced by significantly superior performance on diverse benchmarks. Applied to 8,657 tumor samples across 32 cancer types in The Cancer Genome Atlas (TCGA), CHASMplus identifies over 4,000 unique driver missense mutations in 240 genes, supporting a prominent role for rare driver mutations. We show which TCGA cancer types are likely to yield discovery of new driver missense mutations by additional sequencing, which has important implications for public policy.

Significance: Missense mutations are the most frequent mutation type in cancers and the most difficult to interpret. While many computational methods have been developed to predict whether genes are cancer drivers or whether missense mutations are generally deleterious or pathogenic, there has not previously been a method to score the oncogenic impact of a missense mutation specifically by cancer type, limiting adoption of computational missense mutation predictors in the clinic. Cancer patients are routinely sequenced with targeted panels of cancer driver genes, but such genes contain a mixture of driver and passenger missense mutations which differ by cancer type. A patient’s therapeutic response to drugs and optimal assignment to a clinical trial depend on both the specific mutation in the gene of interest and the cancer type. We present a new machine learning method honed for each TCGA cancer type, and a resource for fast lookup of the cancer-specific driver propensity of every possible missense mutation in the human exome.


Identifying cancer type specific oncogenes and tumor suppressors using limited size data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500311 ◽

2016 ◽

Vol 14 (06) ◽

pp. 1650031 ◽

Cited By ~ 4

Author(s):

Ana B. Pavel ◽

Cristian I. Vasile

Keyword(s):

Tumor Suppressors ◽

Molecular Mechanisms ◽

Lung Squamous Cell Carcinoma ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Type ◽

Multiple Cancer ◽

Driver Genes ◽

Cancer Subtypes ◽

Cancer Types

Cancer is a complex and heterogeneous genetic disease. Different mutations and dysregulated molecular mechanisms alter the pathways that lead to cell proliferation. In this paper, we explore a method which classifies genes into oncogenes (ONGs) and tumor suppressors. We optimize this method to identify specific ONGs and tumor suppressors for breast cancer, lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and colon adenocarcinoma (COAD), using data from The Cancer Genome Atlas (TCGA). A set of genes was previously classified as ONGs and tumor suppressors across multiple cancer types (Science 2013). Each gene was assigned an ONG score and a tumor suppressor score based on the frequency of its driver mutations across all variants from the Catalogue of Somatic Mutations in Cancer (COSMIC). We evaluate and optimize this approach within different cancer types from TCGA. We are able to determine known driver genes for each of the four cancer types. After establishing the baseline parameters for each cancer type, we identify new driver genes for each cancer type, and the molecular pathways that are highly affected by them. Our methodology is general and can be applied to different cancer subtypes to identify specific driver genes and improve personalized therapy.
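As a rough illustration of frequency-based scoring of this kind, the sketch below computes an ONG score as the fraction of a gene's mutations that are missense at recurrently hit positions, and a tumor suppressor score as the fraction that are inactivating. The score definitions, the 20% threshold, the mutation class names, the input layout and the tie-breaking rule are all assumptions made for this example (loosely modelled on the commonly cited "20/20 rule"); they are not taken from the abstract and should not be read as the authors' implementation.

```python
from collections import Counter

# Mutation classes counted as inactivating (an assumption for this sketch).
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def gene_scores(mutations):
    """Return (ong_score, tsg_score) for one gene.

    `mutations` is a list of (protein_position, mutation_class) tuples,
    one per observed variant (hypothetical input layout).
    """
    total = len(mutations)
    if total == 0:
        return 0.0, 0.0
    # Count missense hits per position; positions hit more than once are "recurrent".
    missense_hits = Counter(pos for pos, cls in mutations if cls == "missense")
    recurrent = sum(n for n in missense_hits.values() if n > 1)
    inactivating = sum(1 for _, cls in mutations if cls in INACTIVATING)
    return recurrent / total, inactivating / total


def classify(mutations, threshold=0.20):
    """Label a gene from its two scores; threshold and tie-breaking are assumed."""
    ong, tsg = gene_scores(mutations)
    if ong > threshold and ong >= tsg:
        return "ONG"
    if tsg > threshold:
        return "TSG"
    return "unclassified"


# Toy gene: three missense hits at position 12, one at 61, one nonsense at 100.
example = [(12, "missense")] * 3 + [(61, "missense"), (100, "nonsense")]
print(gene_scores(example), classify(example))
```

Tuning the threshold separately per cancer type would be one way to mirror the per-cancer "baseline parameters" the abstract mentions, though the abstract does not say which parameters were optimized.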


Pathway-based dissection of the genomic heterogeneity of cancer hallmarks’ acquisition with SLAPenrich

10.1101/077701 ◽

2016 ◽

Cited By ~ 6

Author(s):

Francesco Iorio ◽

Luz Garcia-Alonso ◽

Jonathan S. Brammeld ◽

Iñigo Martincorena ◽

David R. Wille ◽

...

Keyword(s):

Somatic Mutations ◽

Population Level ◽

Computational Method ◽

Driver Mutations ◽

Cancer Type ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Hallmarks ◽

Pathway Gene ◽

Cancer Types

Abstract: Cancer hallmarks are evolutionary traits required by a tumour to develop. While extensively characterised, the way these traits are achieved through the accumulation of somatic mutations in key biological pathways is not fully understood. To shed light on this subject, we characterised the landscape of pathway alterations associated with somatic mutations observed in 4,415 patients across ten cancer types, using 374 orthogonal pathway gene-sets mapped onto canonical cancer hallmarks. Towards this end, we developed SLAPenrich: a computational method based on population-level statistics, freely available as an open source R package. Assembling the identified pathway alterations into sets of hallmark signatures allowed us to connect somatic mutations to clinically interpretable cancer mechanisms. Further, we explored the heterogeneity of these signatures, in terms of the ratio of altered pathways associated with each individual hallmark, assuming that this is reflective of the extent of selective advantage provided to the cancer type under consideration. Our analysis revealed the predominance of certain hallmarks in specific cancer types, thus suggesting different evolutionary trajectories across cancer lineages. Finally, although many pathway alteration enrichments are guided by somatic mutations in frequently altered high-confidence cancer genes, excluding these driver mutations preserves the hallmark heterogeneity signatures, and thus the detected predominance of hallmarks across cancer types. As a consequence, we propose the hallmark signatures as a ground truth to characterise tails of infrequent genomic alterations and identify potential novel cancer driver genes and networks.


Multi-omic data helps improve prediction of personalised tumor suppressors and oncogenes

10.1101/2022.01.13.476163 ◽

2022 ◽

Author(s):

Malvika Sudhakar ◽

Raghunathan Rengaswamy ◽

Karthik Raman

Keyword(s):

Tumour Progression ◽

Suppressor Gene ◽

Tumour Suppressor Gene ◽

Driver Mutations ◽

Cancer Type ◽

Expression Data ◽

Multiple Cancer ◽

Driver Genes ◽

Cancer Types ◽

Omic Data

The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes but require a large number of samples to produce reliable results. Many studies use different methods to distinguish driver mutations/genes from mutations that have no impact on tumour progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify the perturbation in the network. Our method is the first machine learning model to classify genes as tumour suppressor genes (TSGs), oncogenes (OGs) or neutral, thus assigning the functional impact of the gene in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalised Identification of driVer OGs and TSGs), to train on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalised driver genes, we label the data using four strategies and, based on classification metrics, show that gene-based labelling strategies perform best. We build different models using SNV, RNA, and multi-omic features to be used based on the data available. Our models trained on multi-omic data improved predictions compared to mutation and expression data, achieving an accuracy >0.99 for BRCA, LUAD and COAD datasets. We show that network and expression-based features contribute the most to PIVOT. Our predictions on BRCA, COAD and LUAD cancer types reveal commonly altered genes such as TP53 and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9 and PSMD4. Our multi-omic model labels both CNV and mutations, with a larger contribution from CNV alterations. While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalised driver genes as TSGs and OGs and also identifies rare driver genes. PIVOT is available at https://github.com/RamanLab/PIVOT.


OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers

Nucleic Acids Research ◽

10.1093/nar/gkaa1033 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1289-D1301 ◽

Cited By ~ 2

Author(s):

Tao Wang ◽

Shasha Ruan ◽

Xiaolu Zhao ◽

Xiaohui Shi ◽

Huajing Teng ◽

...

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Cell Population ◽

Cancer Types ◽

Neutral Mutations ◽

Analysis Platform

Abstract: The prevalence of neutral mutations in cancer cell populations makes it difficult to distinguish cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic ability of somatic mutations and cancer genes, we constructed a platform, OncoVar (https://oncovar.org/), which employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. We identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence by reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC). OncoVar provides four points of view, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, identification of actionable driver alterations provides promising druggable targets and repurposing opportunities for combinational therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.


Contextual Classifications of Cancer Driver Genes

10.1101/715508 ◽

2019 ◽

Author(s):

Pramod Chandrashekar ◽

Navid Ahmadinejad ◽

Junwen Wang ◽

Aleksandar Sekulic ◽

Jan B. Egan ◽

...

Keyword(s):

Computational Method ◽

Cancer Type ◽

Sequencing Data ◽

Multiple Cancer ◽

Driver Genes ◽

Cancer Driver ◽

Link Type ◽

Mutational Hotspots ◽

Cancer Types ◽

Cancer Driver Genes

Abstract: Functions of cancer driver genes depend on cellular contexts that vary substantially across tissues and organs. Distinguishing oncogenes (OGs) and tumor suppressor genes (TSGs) for each cancer type is critical to identifying clinically actionable targets. However, current resources for context-aware classifications of cancer drivers are limited. In this study, we show that the direction and magnitude of somatic selection of missense and truncating mutations of a gene are suggestive of its contextual activities. By integrating these features with ratiometric and conservation measures, we developed a computational method to categorize OGs and TSGs using exome sequencing data. This new method, named genes under selection in tumors (GUST), shows an overall accuracy of 0.94 when tested on manually curated benchmarks. Application of GUST to 10,172 tumor exomes of 33 cancer types identified 98 OGs and 179 TSGs, >70% of which promote tumorigenesis in only one cancer type. In broad-spectrum drivers shared across multiple cancer types, we found heterogeneous mutational hotspots modifying distinct functional domains, implicating the synchrony of convergent and divergent disease mechanisms. We further discovered two novel OGs and 28 novel TSGs with high confidence. The GUST program is available at https://github.com/liliulab/gust. A database with pre-computed classifications is available at https://liliulab.shinyapps.io/gust


MEGSA: A powerful and flexible framework for analyzing mutual exclusivity of tumor mutations

10.1101/017731 ◽

2015 ◽

Author(s):

Xing Hua ◽

Paula L. Hyland ◽

Jing Huang ◽

Bin Zhu ◽

Neil E. Caporaso ◽

...

Keyword(s):

Statistical Power ◽

De Novo ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Ratio Test ◽

Mutual Exclusivity ◽

Driver Genes ◽

Functional Relationships ◽

Cancer Types ◽

Tumor Sequencing

The central challenge in tumor sequencing studies is to identify driver genes and pathways, investigate their functional relationships and nominate drug targets. The efficiency of these analyses, particularly for infrequently mutated genes, is compromised when patients carry different combinations of driver mutations. Mutual exclusivity analysis helps address these challenges. To identify mutually exclusive gene sets (MEGS), we developed a powerful and flexible analytic framework based on a likelihood ratio test and a model selection procedure. Extensive simulations demonstrated that our method outperformed existing methods for both statistical power and the capability of identifying the exact MEGS, particularly for highly imbalanced MEGS. Our method can be used for de novo discovery, pathway-guided searches or for expanding established small MEGS. We applied our method to the whole exome sequencing data for fourteen cancer types from The Cancer Genome Atlas (TCGA). We identified multiple previously unreported non-pairwise MEGS in multiple cancer types. For acute myeloid leukemia, we identified a novel MEGS with five genes (FLT3, IDH2, NRAS, KIT and TP53) and a MEGS (NPM1, TP53 and RUNX1) whose mutation status was strongly associated with survival (P = 6.7×10⁻⁴). For breast cancer, we identified a significant MEGS consisting of TP53 and four infrequently mutated genes (ARID1A, AKT1, MED23 and TBL1XR1), providing support for their role as cancer drivers.
Keywords: Mutual exclusivity, oncogenic pathways, driver genes, tumor sequencing
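The idea of a mutually exclusive gene set can be made concrete with two descriptive counts: how many samples carry a mutation in at least one gene of the set (coverage), and how many of those carry a mutation in exactly one gene (exclusivity). The sketch below computes only these summaries; it is not the likelihood ratio test or model selection procedure described in the abstract, and the function name, input layout and toy data are hypothetical.

```python
from collections import Counter


def exclusivity_summary(mutated_samples_by_gene, n_samples):
    """Summarise coverage and exclusivity for one candidate gene set.

    `mutated_samples_by_gene` maps gene name -> set of sample IDs mutated
    in that gene (hypothetical layout); `n_samples` is the cohort size.
    """
    hits = Counter()
    for samples in mutated_samples_by_gene.values():
        hits.update(samples)  # each gene adds 1 to every sample it is mutated in
    covered = len(hits)  # samples with at least one mutated gene in the set
    exclusive = sum(1 for n in hits.values() if n == 1)  # exactly one mutated gene
    return {
        "coverage": covered / n_samples,
        "exclusive_fraction": exclusive / covered if covered else 0.0,
    }


# Toy cohort of 5 samples; sample s2 is mutated in two genes of the set.
toy = {"GENE_A": {"s1", "s2"}, "GENE_B": {"s3"}, "GENE_C": {"s2", "s4"}}
summary = exclusivity_summary(toy, n_samples=5)
print(summary)
```

A set with high coverage and an exclusive fraction close to 1 is the pattern that a formal test would then assess against the per-gene mutation frequencies expected by chance.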


Investigating structure function relationships in the NOTCH family through large-scale somatic DNA sequencing studies

10.1101/2020.03.31.018325 ◽

2020 ◽

Author(s):

Michael W J Hall ◽

David Shorthouse ◽

Philip H Jones ◽

Benjamin A Hall

Keyword(s):

Dna Sequencing ◽

Structure Function ◽

Calcium Binding ◽

Large Scale ◽

Driver Mutations ◽

Missense Mutations ◽

Mutant Selection ◽

Ligand Interaction ◽

Binding Interface

Abstract: The recent development of highly sensitive DNA sequencing techniques has detected large numbers of missense mutations of genes, including NOTCH1 and NOTCH2, in ageing normal tissues. Driver mutations persist and propagate in the tissue through a selective advantage over both wild-type cells and alternative mutations. This process of selection can be considered as a large scale, in vivo screen for mutations that increase clone fitness. It follows that the specific missense mutations that are observed in individual genes may offer us insights into the structure-function relationships. Here we show that the positively selected missense mutations in NOTCH1 and NOTCH2 in human oesophageal epithelium cause inactivation predominantly through protein misfolding. Once these mutations are excluded, we further find statistically significant evidence for selection at the ligand binding interface and calcium binding sites. In this, we observe stronger evidence of selection at the ligand interface on EGF12 over EGF11, suggesting that in this tissue EGF12 may play a more important role in ligand interaction. Finally, we show how a mutation hotspot in the NOTCH1 transmembrane helix arises through the intersection of both a high mutation rate and residue conservation. Together these insights offer a route to understanding the mechanism of protein function through in vivo mutant selection.


Additive effects of variants of unknown significance in replication repair-associated DNA polymerase genes on mutational burden and prognosis across diverse cancers

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2021-002336 ◽

2021 ◽

Vol 9 (9) ◽

pp. e002336

Author(s):

Jieer Ying ◽

Lin Yang ◽

Jiani C Yin ◽

Guojie Xia ◽

Minyan Xing ◽

...

Keyword(s):

Hot Spot ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Type ◽

Additive Effects ◽

Variants Of Unknown Significance ◽

Mutational Burden ◽

Pathway Gene ◽

Exonuclease Domain ◽

Unknown Significance

Background: Defects in replication repair-associated DNA polymerases often manifest an ultra-high tumor mutational burden (TMB), which is associated with higher probabilities of response to immunotherapies. The functional and clinical implications of different polymerase variants remain unclear.

Methods: Targeted next-generation sequencing using a 425-cancer gene panel, which covers all exonic regions of three polymerase genes (POLE, POLD1, and POLH), was conducted in a cohort of 12,266 patients across 16 different tumor types from January 2017 to January 2019. Prognostication of POL variant-positive patients was performed using a cohort of 4679 patients from The Cancer Genome Atlas (TCGA) datasets.

Results: The overall prevalence of somatic and germline polymerase variants was 4.2% (95% CI 3.8% to 4.5%) and 0.7% (95% CI 0.5% to 0.8%), respectively, with highest frequencies in endometrial, urinary, prostate, and colorectal cancers (CRCs). While most germline polymerase variants showed no clear functional consequences, we identified a candidate p.T466A affecting the exonuclease domain of POLE, which might underlie the early onset in a case with childhood CRC. Low frequencies of known hot-spot somatic mutations in POLE were detected and were associated with younger age, male sex, and microsatellite stability. In both the panel and TCGA cohorts, POLE drivers exhibited high frequencies of alterations in genes in the DNA damage and repair (DDR) pathways, including BRCA2, ATM, MSH6, and ATR. Variants of unknown significance (VUS) of different polymerase domains showed variable penetrance, with those in the exonuclease domain of POLE and POLD1 displaying high TMB. VUS in POL genes exhibited an additive effect, as carriers of multiple VUS had exponentially increased TMB and prolonged overall survival. Similar to cases with driver mutations, the TMB-high POL VUS samples showed DDR pathway involvement and polymerase hypermutation signatures. Combinatorial analysis of POL and DDR pathway status further supported the potential additive effects of POL VUS and DDR pathway genes and revealed distinct prognostic subclasses that were independent of cancer type and TMB.

Conclusions: Our results demonstrate the pathogenicity and additive prognostic value of POL VUS and DDR pathway gene alterations and suggest that genetic testing may be warranted in patients with diverse solid tumors.


ASF1B Promotes Oncogenesis in Lung Adenocarcinoma and Other Cancer Types

Frontiers in Oncology ◽

10.3389/fonc.2021.731547 ◽

2021 ◽

Vol 11 ◽

Author(s):

Wencheng Zhang ◽

Zhouyong Gao ◽

Mingxiu Guan ◽

Ning Liu ◽

Fanjie Meng ◽

...

Keyword(s):

Lung Adenocarcinoma ◽

Immune Cell ◽

Gene Set Enrichment Analysis ◽

The Cancer Genome Atlas ◽

Tumor Promoter ◽

Regulatory Subunit ◽

Cell Cycle Distribution ◽

Cancer Type ◽

Dependent Manner ◽

Cancer Types

Anti-silencing function 1B histone chaperone (ASF1B) is known to be an important modulator of oncogenic processes, yet its role in lung adenocarcinoma (LUAD) remains to be defined. In this study, an integrated assessment of The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) datasets revealed the overexpression of ASF1B in all analyzed cancer types other than LAML. Genetic, epigenetic, microsatellite instability (MSI), and tumor mutational burden (TMB) analysis showed that ASF1B was regulated by single or multiple factors. Kaplan-Meier survival curves suggested that elevated ASF1B expression was associated with better or worse survival in a cancer type-dependent manner. The CIBERSORT algorithm was used to evaluate immune microenvironment composition, and distinct correlations between ASF1B expression and immune cell infiltration were evident when comparing tumor and normal tissue samples. Gene set enrichment analysis (GSEA) indicated that ASF1B was associated with proliferation- and immunity-related pathways. Knocking down ASF1B impaired proliferation, affected cell cycle distribution, and induced cell apoptosis in LUAD cell lines. In contrast, ASF1B overexpression had no impact on the malignant characteristics of LUAD cells. At the mechanistic level, ASF1B served as an indirect regulator of DNA Polymerase Epsilon 3, Accessory Subunit (POLE3), CDC28 protein kinase regulatory subunit 1 (CKS1B), and dihydrofolate reductase (DHFR), as established through proteomic profiling and Immunoprecipitation-Mass Spectrometry (IP-MS) analyses. Overall, these data suggested that ASF1B serves as a tumor promoter and potential target for cancer therapy and provided us with clues to better understand the importance of ASF1B in many types of cancer.


Leveraging protein dynamics to identify cancer mutational hotspots using 3D structures

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1901156116 ◽

2019 ◽

Vol 116 (38) ◽

pp. 18962-18970 ◽

Cited By ~ 5

Author(s):

Sushant Kumar ◽

Declan Clarke ◽

Mark B. Gerstein

Keyword(s):

Protein Dynamics ◽

Protein Function ◽

Large Scale ◽

Protein Structures ◽

Structural Data ◽

The Cancer Genome Atlas ◽

Detection Methods ◽

Hotspot Detection ◽

Driver Genes ◽

Mutational Hotspots

Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue–residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, while applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict 1 or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.
