Novel ratio-metric features enable the identification of new driver genes across cancer types

Abstract An emergent area of cancer genomics has been the identification of driver genes. Driver genes confer a selective growth advantage to the cell and push it towards tumorigenesis. Functionally, driver genes can be divided into two categories, tumour suppressor genes (TSGs) and oncogenes (OGs), which have distinct mutation type profiles. While several driver genes have been discovered, many remain undiscovered, especially those that are mutated at a low frequency across samples. The current methods are not sufficient to predict all driver genes because the underlying characteristics of these genes are not yet well understood. Thus, to predict novel genes, we need to define new features and models that are not biased and identify genes that might otherwise be overshadowed by mutation profiles of recurrent driver genes. In this study, we define new features and build a model to identify novel driver genes. We overcome overfitting and show that certain mutation types, such as nonsense mutations, are more important for classification. Some known cancer driver genes predicted by the model as TSGs with high probability are ARID1A, TP53, and RB1. In addition to these known genes, potential driver genes predicted are CD36, ZNF750 and ARHGAP35 as TSGs and TAB3 as an oncogene. Overall, our approach surmounts the issue of low recall and bias towards genes with high mutation rates and predicts potential novel driver genes for further experimental screening.

Novel ratio-metric features enable the identification of new driver genes across cancer types

Scientific Reports ◽

10.1038/s41598-021-04015-y ◽

2022 ◽

Vol 12 (1) ◽

Author(s):

Malvika Sudhakar ◽

Raghunathan Rengaswamy ◽

Karthik Raman

Keyword(s):

Cancer Genomics ◽

Low Frequency ◽

Tumour Suppressor Genes ◽

Functional Categories ◽

Driver Genes ◽

Nonsense Mutations ◽

Cancer Driver ◽

Selective Growth Advantage ◽

Cancer Types ◽

Pan Cancer

Abstract An emergent area of cancer genomics is the identification of driver genes. Driver genes confer a selective growth advantage to the cell. While several driver genes have been discovered, many remain undiscovered, especially those mutated at a low frequency across samples. This study defines new features and builds a pan-cancer model, cTaG, to identify new driver genes. The features capture the functional impact of the mutations as well as their recurrence across samples, which helps build a model unbiased to genes with low frequency. The model classifies genes into the functional categories of driver genes, tumour suppressor genes (TSGs) and oncogenes (OGs), having distinct mutation type profiles. We overcome overfitting and show that certain mutation types, such as nonsense mutations, are more important for classification. Further, cTaG was employed to identify tissue-specific driver genes. Some known cancer driver genes predicted by cTaG as TSGs with high probability are ARID1A, TP53, and RB1. In addition to these known genes, potential driver genes predicted are CD36, ZNF750 and ARHGAP35 as TSGs and TAB3 as an oncogene. Overall, our approach surmounts the issue of low recall and bias towards genes with high mutation rates and predicts potential new driver genes for further experimental screening. cTaG is available at https://github.com/RamanLab/cTaG.
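
The abstract does not spell out how the ratio-metric features are defined, so the following is only a minimal sketch of what such a feature could look like: the share of a gene's mutations that fall in each mutation type, which does not grow with how often the gene is mutated overall. The function name `ratio_features`, the mutation-type labels, and the toy input are illustrative assumptions, not cTaG's actual feature set.

```python
from collections import Counter

# Hypothetical mutation-type labels; not taken from cTaG.
MUTATION_TYPES = ("missense", "nonsense", "silent", "frameshift")


def ratio_features(mutation_types):
    """Return the fraction of a gene's mutations in each mutation type.

    Dividing by the gene's own mutation count keeps the features
    comparable between frequently and rarely mutated genes.
    """
    counts = Counter(mutation_types)
    total = sum(counts[t] for t in MUTATION_TYPES)
    if total == 0:
        # A gene with no recorded mutations gets an all-zero profile.
        return {t: 0.0 for t in MUTATION_TYPES}
    return {t: counts[t] / total for t in MUTATION_TYPES}


# Toy input: a rarely mutated gene dominated by truncating mutations.
rare_gene = ["nonsense", "nonsense", "frameshift", "missense"]
features = ratio_features(rare_gene)
print(features)
```

A classifier trained on such fractions sees the same profile for a gene with 4 mutations as for one with 400 mutations in the same proportions, which is one plausible way to reduce bias towards highly mutated genes.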

Ranking cancer drivers via betweenness-based outlier detection and random walks

BMC Bioinformatics ◽

10.1186/s12859-021-03989-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Cesim Erten ◽

Aissa Houdjedj ◽

Hilal Kazan

Keyword(s):

Cancer Genomics ◽

Interaction Network ◽

Molecular Data ◽

Alternative Methods ◽

Patient Specific ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Protein Protein Interaction ◽

Genomic Studies

Abstract Background: Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results: We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient-specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions: Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet-ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.
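
The random-walk step on a bipartite graph of mutated genes and outlier genes can be illustrated with a generic random walk with restart. This is a sketch under stated assumptions rather than BetweenNet's implementation: the betweenness-based outlier detection is skipped, and the node names, edges, and restart probability `alpha` are made up for illustration.

```python
def random_walk_with_restart(edges, restart, alpha=0.3, iterations=100):
    """Power-iterate a random walk with restart on an undirected graph.

    edges   -- iterable of (u, v) node pairs
    restart -- dict mapping node -> restart probability (should sum to 1,
               and every restart node should appear in at least one edge)
    alpha   -- probability of jumping back to the restart distribution
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    nodes = list(neighbours)
    score = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        # Start each round with the restart mass, then spread the rest.
        nxt = {n: alpha * restart.get(n, 0.0) for n in nodes}
        for n in nodes:
            share = (1 - alpha) * score[n] / len(neighbours[n])
            for m in neighbours[n]:
                nxt[m] += share
        score = nxt
    return score


# Hypothetical bipartite graph: mutated genes ("mut:") vs outlier genes ("out:").
edges = [("mut:A", "out:X"), ("mut:A", "out:Y"), ("mut:B", "out:Y")]
restart = {"mut:A": 0.5, "mut:B": 0.5}
scores = random_walk_with_restart(edges, restart)

# Rank only the mutated genes by their visiting probability.
ranking = sorted((n for n in scores if n.startswith("mut:")),
                 key=lambda n: -scores[n])
print(ranking)
```

In this toy graph the mutated gene connected to more outliers accumulates more probability mass, which matches the intuition that mutations explaining more dysregulation should be prioritized.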

OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers

Nucleic Acids Research ◽

10.1093/nar/gkaa1033 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1289-D1301 ◽

Cited By ~ 2

Author(s):

Tao Wang ◽

Shasha Ruan ◽

Xiaolu Zhao ◽

Xiaohui Shi ◽

Huajing Teng ◽

...

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Cell Population ◽

Cancer Types ◽

Neutral Mutations ◽

Analysis Platform

Abstract The prevalence of neutral mutations in cancer cell populations impedes the distinguishing of cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic ability of somatic mutations and cancer genes, we constructed a useful platform, OncoVar (https://oncovar.org/), which employed published bioinformatics algorithms and incorporated known driver events to identify driver mutations and driver genes. We identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence by reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC). OncoVar provides four points of view, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, identification of actionable driver alterations provides promising druggable targets and repurposing opportunities of combinational therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help to rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.

Bayesian inference of cancer driver genes using signatures of positive selection

10.1101/059360 ◽

2017 ◽

Author(s):

Luis Zapata ◽

Hana Susak ◽

Oliver Drechsel ◽

Marc R. Friedländer ◽

Xavier Estivill ◽

...

Keyword(s):

Bayesian Inference ◽

Large Fraction ◽

Driver Gene ◽

Tumor Type ◽

Sequencing Data ◽

Cancer Etiology ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Types ◽

Cancer Cell Fraction

Abstract Tumors are composed of an evolving population of cells subjected to tissue-specific selection, which fuels tumor heterogeneity and ultimately complicates cancer driver gene identification. Here, we integrate cancer cell fraction, population recurrence, and functional impact of somatic mutations as signatures of selection into a Bayesian inference model for driver prediction. In an in-depth benchmark, we demonstrate that our model, cDriver, outperforms competing methods when analyzing solid tumors, hematological malignancies, and pan-cancer datasets. Applying cDriver to exome sequencing data of 21 cancer types from 6,870 individuals revealed 98 unreported tumor type-driver gene connections. These novel connections are highly enriched for chromatin-modifying proteins, hinting at a universal role of chromatin regulation in cancer etiology. Although infrequently mutated as single genes, we show that chromatin modifiers are altered in a large fraction of cancer patients. In summary, we demonstrate that integration of evolutionary signatures is key for identifying mutational driver genes, thereby facilitating the discovery of novel therapeutic targets for cancer treatment.

The cancer-mutation network and the number and specificity of driver mutations

10.1101/237016 ◽

2017 ◽

Author(s):

Jaime Iranzo ◽

Iñigo Martincorena ◽

Eugene V. Koonin

Keyword(s):

Linear Regression ◽

Cancer Genomics ◽

Driver Mutations ◽

Bipartite Network ◽

Cancer Driver ◽

Cancer Mutation ◽

Cancer Types ◽

Extensive Information ◽

The Mean ◽

Network Component

Abstract Cancer genomics has produced extensive information on cancer-associated genes, but the number and specificity of cancer driver mutations remain a matter of debate. We constructed a bipartite network in which 7665 tumors from 30 cancer types are connected via shared mutations in 198 previously identified cancer-associated genes. We show that 27% of the tumors can be assigned to statistically supported modules, most of which encompass 1-2 cancer types. The rest of the tumors belong to a diffuse network component, suggesting lower gene-specificity of driver mutations. Linear regression of the mutational loads in cancer-associated genes was used to estimate the number of drivers required for the onset of different cancers. The mean number of drivers is ~2, with a range of 1 to 5. Cancers that are associated with modules had more drivers than those from the diffuse network component, suggesting that unidentified and/or interchangeable drivers exist in the latter.

Whole exome precision oncology targeting synthetic lethal vulnerabilities across the tumor transcriptome

10.1101/2020.02.16.951699 ◽

2020 ◽

Cited By ~ 1

Author(s):

Joo Sang Lee ◽

Nishanth Ulhas Nair ◽

Lesley Chapman ◽

Sanju Sinha ◽

Kun Wang ◽

...

Keyword(s):

Predictive Performance ◽

Patient Treatment ◽

Precision Oncology ◽

Driver Genes ◽

Patient Response ◽

Synthetic Lethal ◽

Cancer Driver ◽

Whole Exome ◽

Transcriptomics Data ◽

Cancer Types

Abstract Precision oncology has made significant advances in the last few years, mainly by targeting actionable mutations in cancer driver genes. However, the proportion of patients whose tumors can be targeted therapeutically remains limited. Recent studies have begun to explore the benefit of analyzing tumor transcriptomics data to guide patient treatment, raising the need for new approaches for systematically accomplishing that. Here we show that computationally derived genetic interactions can successfully predict patient response. Assembling a broad repertoire of 32 datasets spanning more than 1,500 patients and including both tumor transcriptomics and response data, we predicted the response in 17 out of 21 targeted and 8 out of 11 checkpoint therapy datasets across 8 different cancer types with considerable accuracy, without ever training on these datasets. Analyzing the recently published multi-arm WINTHER trial, we show that the fraction of patients benefitting from transcriptomic-based treatments could potentially be markedly increased from 15% to about 85% by targeting synthetic lethal vulnerabilities in their tumors. In summary, this is the first computational approach to obtain considerable predictive performance across many different targeted and immunotherapy datasets, providing a promising new way for guiding cancer treatment based on the tumor transcriptomics of cancer patients.

TC3A: The Cancer 3′ UTR Atlas

Nucleic Acids Research ◽

10.1093/nar/gkx892 ◽

2017 ◽

Vol 46 (D1) ◽

pp. D1027-D1030 ◽

Cited By ~ 8

Author(s):

Xin Feng ◽

Lei Li ◽

Eric J Wagner ◽

Wei Li

Keyword(s):

Tumor Growth ◽

Cellular Proliferation ◽

Alternative Polyadenylation ◽

Prognostic Biomarkers ◽

Driver Genes ◽

Cancer Driver ◽

Large Community ◽

Cancer Types

Abstract Widespread alternative polyadenylation (APA) occurs during enhanced cellular proliferation and transformation. Recently, we demonstrated that CFIm25-mediated 3′ UTR shortening through APA promotes glioblastoma tumor growth in vitro and in vivo, further underscoring its significance to tumorigenesis. Here, we report The Cancer 3′ UTR Atlas (TC3A), a comprehensive resource of APA usage for 10,537 tumors across 32 cancer types. These APA events represent potentially novel prognostic biomarkers and may uncover novel mechanisms for the regulation of cancer driver genes. TC3A is built on top of the now de facto standard cBioPortal. Therefore, the large community of existing cBioPortal users and clinical researchers will find TC3A familiar and immediately usable. TC3A is currently fully functional and freely available at http://tc3a.org.

Learning the mutational landscape of the cancer genome

10.1101/2021.08.03.454669 ◽

2021 ◽

Author(s):

Maxwell A Sherman ◽

Adam Yaari ◽

Oliver Priebe ◽

Felix Dietlein ◽

Po-Ru Loh ◽

...

Keyword(s):

Deep Neural Networks ◽

Mutation Rates ◽

Untranslated Regions ◽

Driver Mutations ◽

Web Interface ◽

Cancer Driver ◽

Genome Wide ◽

Cancer Types ◽

Neutral Mutations ◽

Proliferative Advantage

An ongoing challenge to better understand and treat cancer is to distinguish neutral mutations, which do not affect tumor fitness, from those that provide a proliferative advantage. However, the variability of mutation rates has limited our ability to model patterns of neutral mutations and therefore identify cancer driver mutations. Here, we predict cancer-specific mutation rates genome-wide by leveraging deep neural networks to learn mutation rates within kilobase-scale regions and then refining these estimates to test for evidence of selection on combinations of mutations by comparing observed to expected mutation counts. We mapped mutation rates for 37 cancer types and used these maps to identify new putative drivers in understudied regions of the genome including cryptic alternative-splice sites, 5 prime untranslated regions and infrequently mutated genes. These results, available for exploration via web interface, indicate the potential for high-resolution neutral mutation models to empower further driver discovery as cancer sequencing cohorts grow.
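
Comparing observed to expected mutation counts can be illustrated with a simple one-sided Poisson test. This is a hedged sketch only: the abstract does not state which count distribution the authors use, so the Poisson model, the function name `poisson_upper_tail`, and the example numbers are assumptions, and the expected count is taken as given rather than predicted by a neural network.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected).

    A small value means the region carries more mutations than the
    neutral model predicts, which is the pattern expected under
    positive selection.
    """
    if observed <= 0:
        return 1.0
    # Accumulate P(X <= observed - 1) term by term.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical region: the neutral model expects 1 mutation, 10 are observed.
p_value = poisson_upper_tail(10, 1.0)
print(p_value)
```

In practice such a test would be repeated over many regions, so the resulting p-values would need a multiple-testing correction before any region is called a putative driver.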

ConsensusDriver improves upon individual algorithms for predicting driver alterations in different cancer types and individual patients – a toolbox for precision oncology

10.1101/127985 ◽

2017 ◽

Cited By ~ 2

Author(s):

Denis Bertrand ◽

Sibyl Drissler ◽

Burton Chia ◽

Jia Yu Koh ◽

Li Chenhao ◽

...

Keyword(s):

Cancer Genomics ◽

Clinical Phenotype ◽

Genetic Alterations ◽

Driver Gene ◽

Prediction Methods ◽

Precision Oncology ◽

Cancer Driver ◽

Recurrent Mutations ◽

Cancer Types ◽

Patient Subgroups

Abstract Background: In recent years, several large-scale cancer genomics studies have helped generate detailed molecular profiling datasets for many cancer types and thousands of patients. These datasets provide a unique resource for studying cancer driver prediction methods and their utility for precision oncology, both to predict driver genetic alterations in patient subgroups (e.g. defined by histology or clinical phenotype) or even individual patients. Methods: We performed the most comprehensive assessment to date of 18 driver gene prediction methods, on more than 3,400 tumour samples, from 15 cancer types, to determine their suitability in guiding precision medicine efforts. These methods have diverse approaches, which can be classified into five categories: functional impact on proteins in general (FI) or specific to cancer (FIC), cohort-based analysis for recurrent mutations (CBA), mutations with expression correlation (MEC) and methods that use gene interaction network-based analysis (INA). Results: The performance of driver prediction methods varies considerably, with concordance with a gold-standard varying from 9% to 68%. FI methods show relatively poor performance (concordance <22%) while CBA methods provide conservative results, but require large sample sizes for high sensitivity. INA methods, through the integration of genomic and transcriptomic data, and FIC methods, by training cancer-specific models, provide the best trade-off between sensitivity and specificity. As the methods were found to predict different subsets of drivers, we propose a novel consensus-based approach, ConsensusDriver, which significantly improves the quality of predictions (20% increase in sensitivity). This tool can be applied to predict driver alterations in patient subgroups (e.g. defined by histology or clinical phenotype) or even individual patients. Conclusion: Existing cancer driver prediction methods are based on very different assumptions and each of them can only detect a particular subset of driver events. Consensus-based methods, like ConsensusDriver, are thus a promising approach to harness the strengths of different driver prediction paradigms.
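
The idea of combining several driver prediction methods can be sketched with a Borda count, a simple rank-aggregation scheme. This is a stand-in chosen for illustration and not the procedure ConsensusDriver uses, which the abstract does not describe; the function name `borda_consensus`, the placeholder gene names, and the three toy rankings are all assumptions.

```python
def borda_consensus(rankings):
    """Merge ranked gene lists into one consensus ranking.

    Each list awards n points to its top gene, n - 1 to the next, and so
    on, where n is the number of distinct genes overall. Genes missing
    from a list receive no points from that list.
    """
    genes = {g for ranking in rankings for g in ranking}
    n = len(genes)
    scores = dict.fromkeys(genes, 0)
    for ranking in rankings:
        for position, gene in enumerate(ranking):
            scores[gene] += n - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(genes, key=lambda g: (-scores[g], g))


# Hypothetical outputs of three prediction methods (best candidate first).
method_rankings = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_B", "GENE_A", "GENE_D"],
    ["GENE_A", "GENE_C", "GENE_B"],
]
consensus = borda_consensus(method_rankings)
print(consensus)
```

Because each method contributes only ranks, the scheme needs no calibration of the methods' raw scores against each other, at the cost of ignoring how confident each method is.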

DriverRWH: Discovering Cancer Driver Genes By Random Walk On a Gene Mutation Hypergraph

10.21203/rs.3.rs-1192205/v1 ◽

2021 ◽

Author(s):

Chenye Wang ◽

Junhan Shi ◽

Jiansheng Cai ◽

Yusen Zhang ◽

Xiaoqi Zheng ◽

...

Keyword(s):

Random Walk ◽

Candidate Genes ◽

Gene Mutation ◽

Network Data ◽

Cumulative Number ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Types ◽

Mutation Data ◽

Cancer Driver Genes

Abstract Background: Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is the identification of a few driver mutation genes from a much larger number of passenger mutation genes. However, the majority of existing computational approaches underuse the co-occurrence information of the individuals, which is deemed to be important in tumorigenesis and tumor progression. Driver gene lists predicted by these tools are prone to false positives, and recent research is far from achieving the ultimate goal of discovering a complete catalog of driver genes. Results: To make full use of co-mutation information, we present a random walk algorithm, referred to as DriverRWH, on a weighted gene mutation hypergraph model, using somatic mutation data and molecular interaction network data to prioritize candidate driver genes. Applied to tumor samples of different cancer types from The Cancer Genome Atlas (TCGA), DriverRWH shows significantly better performance than state-of-the-art prioritization methods in terms of the area under the curve (AUC) scores and the cumulative number of known driver genes recovered in top-ranked candidate genes. DriverRWH recovers approximately 50% of known driver genes in the top 30 ranked candidate genes for more than half of the cancer types. In addition, DriverRWH is highly robust to perturbations in the mutation data and gene functional network data. Conclusion: DriverRWH is effective across various cancer types in prioritizing cancer driver genes and provides considerable improvement over other tools with a better balance of precision and sensitivity. It can be a useful tool for detecting potential driver genes and can facilitate targeted cancer therapies.
