The cancer-mutation network and the number and specificity of driver mutations

Abstract Cancer genomics has produced extensive information on cancer-associated genes, but the number and specificity of cancer driver mutations remain a matter of debate. We constructed a bipartite network in which 7665 tumors from 30 cancer types are connected via shared mutations in 198 previously identified cancer-associated genes. We show that 27% of the tumors can be assigned to statistically supported modules, most of which encompass 1-2 cancer types. The rest of the tumors belong to a diffuse network component, suggesting lower gene specificity of driver mutations. Linear regression of the mutational loads in cancer-associated genes was used to estimate the number of drivers required for the onset of different cancers. The mean number of drivers is ~2, with a range of 1 to 5. Cancers that are associated with modules had more drivers than those from the diffuse network component, suggesting that unidentified and/or interchangeable drivers exist in the latter.
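
The bipartite construction described in the abstract can be illustrated with a minimal sketch. It links tumors to the genes mutated in them and then counts, for each pair of tumors, how many mutated genes they share. The tumor IDs and the gene assignments are hypothetical toy data, the function names are made up for illustration, and the module detection and regression steps of the paper are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: tumor ID -> set of mutated cancer-associated genes.
tumor_mutations = {
    "tumor_A": {"TP53", "KRAS"},
    "tumor_B": {"KRAS", "APC"},
    "tumor_C": {"TP53"},
    "tumor_D": {"IDH1"},
}

def build_bipartite(tumor_to_genes):
    """Return the gene -> tumors side of the tumor-gene bipartite graph."""
    gene_to_tumors = defaultdict(set)
    for tumor, genes in tumor_to_genes.items():
        for gene in genes:
            gene_to_tumors[gene].add(tumor)
    return dict(gene_to_tumors)

def shared_gene_counts(gene_to_tumors):
    """Project onto tumors: count the mutated genes shared by each tumor pair."""
    counts = defaultdict(int)
    for tumors in gene_to_tumors.values():
        for a, b in combinations(sorted(tumors), 2):
            counts[(a, b)] += 1
    return dict(counts)

gene_to_tumors = build_bipartite(tumor_mutations)
tumor_links = shared_gene_counts(gene_to_tumors)
print(tumor_links)
```

In this toy input, tumor_D shares no mutated gene with any other tumor, so it has no links in the projection.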

Cancer-mutation network and the number and specificity of driver mutations

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1803155115 ◽

2018 ◽

Vol 115 (26) ◽

pp. E6010-E6019 ◽

Cited By ~ 24

Author(s):

Jaime Iranzo ◽

Iñigo Martincorena ◽

Eugene V. Koonin

Keyword(s):

Cancer Genomics ◽

Driver Mutations ◽

Bipartite Network ◽

Cancer Genes ◽

Cancer Driver ◽

Cancer Mutation ◽

Cancer Types ◽

Extensive Information ◽

The Mean ◽

Network Component

Cancer genomics has produced extensive information on cancer-associated genes, but the number and specificity of cancer-driver mutations remain a matter of debate. We constructed a bipartite network in which 7,665 tumors from 30 cancer types are connected via shared mutations in 198 previously identified cancer genes. We show that about 27% of the tumors can be assigned to statistically supported modules, most of which encompass one or two cancer types. The rest of the tumors belong to a diffuse network component, suggesting lower gene specificity of driver mutations. Linear regression of the mutational loads in cancer genes was used to estimate the number of drivers required for the onset of different cancers. The mean number of drivers in known cancer genes is approximately two, with a range of one to five. Cancers that are associated with modules had more drivers than those from the diffuse network component, suggesting that unidentified and/or interchangeable drivers exist in the latter.

OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers

Nucleic Acids Research ◽

10.1093/nar/gkaa1033 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1289-D1301 ◽

Cited By ~ 2

Author(s):

Tao Wang ◽

Shasha Ruan ◽

Xiaolu Zhao ◽

Xiaohui Shi ◽

Huajing Teng ◽

...

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Cell Population ◽

Cancer Types ◽

Neutral Mutations ◽

Analysis Platform

Abstract The prevalence of neutral mutations in cancer cell populations makes it difficult to distinguish cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic ability of somatic mutations and cancer genes, we constructed a platform, OncoVar (https://oncovar.org/), which employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. We identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence by reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC). OncoVar provides four points of view, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, identification of actionable driver alterations provides promising druggable targets and opportunities for repurposing combination therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help to rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.

Learning the mutational landscape of the cancer genome

10.1101/2021.08.03.454669 ◽

2021 ◽

Author(s):

Maxwell A Sherman ◽

Adam Yaari ◽

Oliver Priebe ◽

Felix Dietlein ◽

Po-Ru Loh ◽

...

Keyword(s):

Deep Neural Networks ◽

Mutation Rates ◽

Untranslated Regions ◽

Driver Mutations ◽

Web Interface ◽

Cancer Driver ◽

Genome Wide ◽

Cancer Types ◽

Neutral Mutations ◽

Proliferative Advantage

An ongoing challenge in understanding and treating cancer is to distinguish neutral mutations, which do not affect tumor fitness, from those that provide a proliferative advantage. However, the variability of mutation rates has limited our ability to model patterns of neutral mutations and therefore to identify cancer driver mutations. Here, we predict cancer-specific mutation rates genome-wide by leveraging deep neural networks to learn mutation rates within kilobase-scale regions and then refining these estimates to test for evidence of selection on combinations of mutations by comparing observed to expected mutation counts. We mapped mutation rates for 37 cancer types and used these maps to identify new putative drivers in understudied regions of the genome, including cryptic alternative-splice sites, 5′ untranslated regions and infrequently mutated genes. These results, available for exploration via a web interface, indicate the potential for high-resolution neutral mutation models to empower further driver discovery as cancer sequencing cohorts grow.
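
The comparison of observed to expected mutation counts can be illustrated with a small sketch. It assumes, purely for illustration, that neutral mutation counts in a region follow a Poisson distribution whose mean is the expected count; the abstract does not state which test the authors use, and the region names and numbers below are hypothetical.

```python
import math

def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # 1 - P(X <= observed - 1), accumulating the pmf term by term.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

# Hypothetical regions: name -> (observed count, expected count under neutrality).
regions = {
    "region_1": (12, 3.5),
    "region_2": (4, 3.8),
}

for name, (observed, expected) in regions.items():
    p_value = poisson_upper_tail(observed, expected)
    print(f"{name}: observed={observed}, expected={expected}, p={p_value:.3g}")
```

A small tail probability flags a region with more mutations than the neutral model predicts, which is the kind of excess that would be examined further as a candidate signal of selection.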

ConsensusDriver improves upon individual algorithms for predicting driver alterations in different cancer types and individual patients – a toolbox for precision oncology

10.1101/127985 ◽

2017 ◽

Cited By ~ 2

Author(s):

Denis Bertrand ◽

Sibyl Drissler ◽

Burton Chia ◽

Jia Yu Koh ◽

Li Chenhao ◽

...

Keyword(s):

Cancer Genomics ◽

Clinical Phenotype ◽

Genetic Alterations ◽

Driver Gene ◽

Prediction Methods ◽

Precision Oncology ◽

Cancer Driver ◽

Recurrent Mutations ◽

Cancer Types ◽

Patient Subgroups

Abstract Background: In recent years, several large-scale cancer genomics studies have helped generate detailed molecular profiling datasets for many cancer types and thousands of patients. These datasets provide a unique resource for studying cancer driver prediction methods and their utility for precision oncology, both to predict driver genetic alterations in patient subgroups (e.g. defined by histology or clinical phenotype) or even individual patients. Methods: We performed the most comprehensive assessment to date of 18 driver gene prediction methods, on more than 3,400 tumour samples, from 15 cancer types, to determine their suitability in guiding precision medicine efforts. These methods have diverse approaches, which can be classified into five categories: functional impact on proteins in general (FI) or specific to cancer (FIC), cohort-based analysis for recurrent mutations (CBA), mutations with expression correlation (MEC) and methods that use gene interaction network-based analysis (INA). Results: The performance of driver prediction methods varies considerably, with concordance with a gold-standard varying from 9% to 68%. FI methods show relatively poor performance (concordance <22%) while CBA methods provide conservative results, but require large sample sizes for high sensitivity. INA methods, through the integration of genomic and transcriptomic data, and FIC methods, by training cancer-specific models, provide the best trade-off between sensitivity and specificity. As the methods were found to predict different subsets of drivers, we propose a novel consensus-based approach, ConsensusDriver, which significantly improves the quality of predictions (20% increase in sensitivity). This tool can be applied to predict driver alterations in patient subgroups (e.g. defined by histology or clinical phenotype) or even individual patients. Conclusion: Existing cancer driver prediction methods are based on very different assumptions and each of them can only detect a particular subset of driver events. Consensus-based methods, like ConsensusDriver, are thus a promising approach to harness the strengths of different driver prediction paradigms.
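
The general idea of combining predictors that detect different subsets of drivers can be sketched with simple vote counting. This is not the ConsensusDriver algorithm, whose details the abstract does not give; it is a generic majority-vote illustration in which the method labels, the placeholder genes GENE_X and GENE_Y, and the `min_votes` threshold are all assumptions.

```python
from collections import Counter

def consensus_drivers(predictions, min_votes):
    """Return genes called as drivers by at least `min_votes` methods, sorted by name."""
    votes = Counter(gene for genes in predictions.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_votes)

# Hypothetical calls from three prediction methods.
predictions = {
    "method_FI": {"TP53", "GENE_X"},
    "method_CBA": {"TP53", "KRAS"},
    "method_INA": {"KRAS", "TP53", "GENE_Y"},
}

consensus = consensus_drivers(predictions, min_votes=2)
print(consensus)
```

Raising `min_votes` trades sensitivity for specificity: fewer genes pass, but each is supported by more independent methods.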

CanDriS: posterior profiling of cancer-driving sites based on two-component evolutionary model

Briefings in Bioinformatics ◽

10.1093/bib/bbab131 ◽

2021 ◽

Author(s):

Wenyi Zhao ◽

Jingwen Yang ◽

Jingcheng Wu ◽

Guoxing Cai ◽

Yao Zhang ◽

...

Keyword(s):

Somatic Mutations ◽

Cancer Genomics ◽

Evolutionary Model ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Component Mixture ◽

Cancer Driver ◽

Two Component ◽

Potential Cancer

Abstract Current cancer genomics databases have accumulated millions of somatic mutations that remain to be further explored. Because mutations unrelated to cancer are in great excess, the main challenge is to identify the somatic mutations that drive cancer. Under the notion that carcinogenesis is a form of somatic-cell evolution, we developed a two-component mixture model: the ground component corresponds to passenger mutations, while the rapidly evolving component corresponds to driver mutations. We then implemented an empirical Bayesian procedure to calculate the posterior probability of a site being cancer-driving. Based on these, we developed a software tool, CanDriS (Cancer Driver Sites), to profile the potential cancer-driving sites for thousands of tumor samples from The Cancer Genome Atlas and the International Cancer Genome Consortium across tumor types and at the pan-cancer level. As a result, we found that approximately 1% of the sites have posterior probabilities larger than 0.90 and listed potential cancer-wide and cancer-specific driver mutations. By comprehensively profiling all potential cancer-driving sites, CanDriS greatly enhances our ability to refine our knowledge of the genetic basis of cancer and might guide clinical medication in the upcoming era of precision medicine. The results are displayed in a database, CandrisDB (http://biopharm.zju.edu.cn/candrisdb/).
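
The posterior probability in a two-component mixture follows from Bayes' rule, and a minimal sketch may make that concrete. It assumes, for illustration only, that the mutation count at a site is Poisson-distributed with a lower rate for the passenger component and a higher rate for the driver component; the rates, the prior, and the function names are hypothetical and are not taken from CanDriS, where the parameters would be estimated from the data by the empirical Bayesian procedure.

```python
import math

def poisson_logpmf(k, rate):
    """Log of the Poisson probability of observing k events at the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def driver_posterior(count, passenger_rate, driver_rate, driver_prior):
    """Posterior probability that a site belongs to the driver component."""
    log_driver = math.log(driver_prior) + poisson_logpmf(count, driver_rate)
    log_passenger = math.log(1.0 - driver_prior) + poisson_logpmf(count, passenger_rate)
    # Subtract the larger log term before exponentiating, for numerical stability.
    shift = max(log_driver, log_passenger)
    weight_driver = math.exp(log_driver - shift)
    weight_passenger = math.exp(log_passenger - shift)
    return weight_driver / (weight_driver + weight_passenger)

# Hypothetical parameters: 1% of sites are drivers, mutating 10x faster.
for count in (0, 2, 5, 10):
    p = driver_posterior(count, passenger_rate=0.5, driver_rate=5.0, driver_prior=0.01)
    print(f"count={count}: posterior={p:.4f}")
```

Sites would then be reported as putative drivers when the posterior exceeds a chosen cutoff, such as the 0.90 threshold mentioned in the abstract.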

CpG methylation accounts for genome-wide C>T mutation variation and cancer driver formation across cancer types

10.1101/106872 ◽

2017 ◽

Author(s):

Rebecca C. Poulos ◽

Jake Olivier ◽

Jason W. H. Wong

Keyword(s):

Cytosine Methylation ◽

Excision Repair ◽

Replication Timing ◽

Cpg Methylation ◽

Tissue Type ◽

Driver Mutations ◽

Cancer Subtypes ◽

Cancer Driver ◽

Timing Data ◽

Cancer Types

Abstract Cytosine methylation (5mC) is vital for cellular function, and yet 5mC sites are also commonly mutated in the genome. In this study, we analyse the genomes of over 900 cancer samples, together with tissue type-specific methylation and replication timing data. We describe a strong mutation-methylation association in colorectal cancers with microsatellite instability (MSI) or with Polymerase epsilon (POLE) exonuclease domain mutation. We describe a potential role for mismatch repair in the correction of mismatches resulting from deamination of 5mC, and propose a mutator phenotype to exist in POLE-mutant cancers specifically at 5mC sites. We also associate POLE-mutant hotspot coding mutations in APC and TP53 with CpG methylation. Analysing mutations across additional cancer types, we identify nucleotide excision repair- and AID/APOBEC-induced processes to underlie differential mutation-methylation associations in certain cancer subtypes. This study reveals differential associations vital for accurately mapping regional variation in mutation density and pinpointing driver mutations in cancer.

Glioblastoma signature in the DNA of blood-derived cells

PLoS ONE ◽

10.1371/journal.pone.0256831 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0256831

Author(s):

Siddharth Jain ◽

Bijan Mazaheri ◽

Netanel Raviv ◽

Jehoshua Bruck

Keyword(s):

Cancer Detection ◽

Tandem Repeat ◽

Early Cancer ◽

Current Approach ◽

Driver Mutations ◽

Genetic Mutations ◽

Cancer Driver ◽

Cancer Prediction ◽

Cancer Types ◽

Continuous Accumulation

The current approach to cancer detection is based on identifying genetic mutations typical of tumor cells. This approach is effective only once cancer has already emerged, by which point it may be too advanced for effective treatment. Cancer is caused by the continuous accumulation of mutations; is it possible to measure the time-dependent information of mutation accumulation and predict the emergence of cancer? We hypothesize that the mutation history derived from the tandem repeat regions in blood-derived DNA carries information about the accumulation of cancer driver mutations in other tissues. To validate our hypothesis, we computed the mutation histories from the tandem repeat regions in blood-derived exomic DNA of 3874 TCGA patients with different cancer types and found a statistically significant signal, with specificity ranging from 66% to 93%, differentiating glioblastoma patients from other cancer patients. Our approach and findings offer a new direction for future cancer prediction and early cancer detection based on information derived from blood-derived DNA.

Novel ratio-metric features enable the identification of new driver genes across cancer types

Scientific Reports ◽

10.1038/s41598-021-04015-y ◽

2022 ◽

Vol 12 (1) ◽

Author(s):

Malvika Sudhakar ◽

Raghunathan Rengaswamy ◽

Karthik Raman

Keyword(s):

Cancer Genomics ◽

Low Frequency ◽

Tumour Suppressor Genes ◽

Functional Categories ◽

Driver Genes ◽

Nonsense Mutations ◽

Cancer Driver ◽

Selective Growth Advantage ◽

Cancer Types ◽

Pan Cancer

Abstract An emergent area of cancer genomics is the identification of driver genes. Driver genes confer a selective growth advantage to the cell. While several driver genes have been discovered, many remain undiscovered, especially those mutated at a low frequency across samples. This study defines new features and builds a pan-cancer model, cTaG, to identify new driver genes. The features capture the functional impact of the mutations as well as their recurrence across samples, which helps build a model unbiased to genes with low frequency. The model classifies genes into the functional categories of driver genes, tumour suppressor genes (TSGs) and oncogenes (OGs), having distinct mutation type profiles. We overcome overfitting and show that certain mutation types, such as nonsense mutations, are more important for classification. Further, cTaG was employed to identify tissue-specific driver genes. Some known cancer driver genes predicted by cTaG as TSGs with high probability are ARID1A, TP53, and RB1. In addition to these known genes, potential driver genes predicted are CD36, ZNF750 and ARHGAP35 as TSGs and TAB3 as an oncogene. Overall, our approach surmounts the issue of low recall and bias towards genes with high mutation rates and predicts potential new driver genes for further experimental screening. cTaG is available at https://github.com/RamanLab/cTaG.

Novel ratio-metric features enable the identification of new driver genes across cancer types

10.1101/2020.01.17.910075 ◽

2020 ◽

Author(s):

Malvika Sudhakar ◽

Raghunathan Rengaswamy ◽

Karthik Raman

Keyword(s):

High Probability ◽

Cancer Genomics ◽

Low Frequency ◽

Mutation Rates ◽

Tumour Suppressor Genes ◽

Driver Genes ◽

Nonsense Mutations ◽

Cancer Driver ◽

Selective Growth Advantage ◽

Cancer Types

Abstract An emergent area of cancer genomics has been the identification of driver genes. Driver genes confer a selective growth advantage to the cell and push it towards tumorigenesis. Functionally, driver genes can be divided into two categories, tumour suppressor genes (TSGs) and oncogenes (OGs), which have distinct mutation type profiles. While several driver genes have been discovered, many remain undiscovered, especially those that are mutated at a low frequency across samples. The current methods are not sufficient to predict all driver genes because the underlying characteristics of these genes are not yet well understood. Thus, to predict novel genes, we need to define new features and models that are not biased and identify genes that might otherwise be overshadowed by mutation profiles of recurrent driver genes. In this study, we define new features and build a model to identify novel driver genes. We overcome overfitting and show that certain mutation types, such as nonsense mutations, are more important for classification. Some known cancer driver genes predicted by the model as TSGs with high probability are ARID1A, TP53, and RB1. In addition to these known genes, potential driver genes predicted are CD36, ZNF750 and ARHGAP35 as TSGs and TAB3 as an oncogene. Overall, our approach surmounts the issue of low recall and bias towards genes with high mutation rates and predicts potential novel driver genes for further experimental screening.

Pervasive conditional selection of driver mutations and modular epistasis networks in cancer

10.1101/2022.01.10.475617 ◽

2022 ◽

Author(s):

Jaime Iranzo ◽

George Gruenhagen ◽

Jorge Calle-Espinosa ◽

Eugene V. Koonin

Keyword(s):

Cancer Genomics ◽

Mutual Exclusion ◽

Driver Mutations ◽

Tumor Evolution ◽

Cancer Genes ◽

Cancer Subtypes ◽

Cancer Driver ◽

Conditional Selection ◽

Tcga Dataset ◽

Complex Scenario

Cancer driver mutations often display mutual exclusion or co-occurrence, underscoring the key role of epistasis in carcinogenesis. However, estimating the magnitude of epistatic interactions and their quantitative effect on tumor evolution remains a challenge. We developed a method to quantify COnditional SELection on the Excess of Nonsynonymous Substitutions (Coselens) in cancer genes. Coselens infers the number of drivers per gene in different partitions of a cancer genomics dataset using covariance-based mutation models and determines whether coding mutations in a gene affect selection for drivers in any other gene. Using Coselens, we identified 296 conditionally selected gene pairs across 16 cancer types in the TCGA dataset. Conditional selection accounts for 25-50% of driver substitutions in tumors with >2 drivers. Conditionally co-selected genes form modular networks, whose structures challenge the traditional interpretation of within-pathway mutual exclusivity and across-pathway synergy, suggesting a more complex scenario, where gene-specific across-pathway interactions shape differentiated cancer subtypes.
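
Mutual exclusion and co-occurrence between two genes can be illustrated with a simple 2x2 contingency summary. This is not the Coselens method, which works with the excess of nonsynonymous substitutions; it is a generic log odds ratio with a Haldane-Anscombe correction, and the tumor IDs and function name are hypothetical.

```python
import math

def log_odds_ratio(mutated_a, mutated_b, all_tumors):
    """Log odds ratio of co-mutation; negative suggests exclusion, positive co-occurrence."""
    a, b = set(mutated_a), set(mutated_b)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(set(all_tumors) - a - b)
    # Haldane-Anscombe correction: add 0.5 to every cell to avoid division by zero.
    return math.log(((both + 0.5) * (neither + 0.5)) / ((only_a + 0.5) * (only_b + 0.5)))

# Hypothetical cohort of eight tumors.
tumors = [f"t{i}" for i in range(1, 9)]
exclusive = log_odds_ratio(tumors[:4], tumors[4:], tumors)    # no overlap
cooccurring = log_odds_ratio(tumors[:4], tumors[:4], tumors)  # full overlap
print(exclusive, cooccurring)
```

A summary like this ignores differences in mutation rate between tumors and genes, which is one reason selection-aware models of the kind the abstract describes are needed.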
