Evaluating the Evaluation of Cancer Driver Genes

Abstract: Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, i.e., bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied when a gold standard is not available. We used this framework to compare the performance of eight driver gene prediction methods. One of these methods, newly described here, incorporated a machine learning-based ratiometric approach. We show that the driver genes predicted by each of these eight methods vary widely. Moreover, the p-values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.

Significance: Modern large-scale sequencing of human cancers seeks to comprehensively discover mutated genes that confer a selective advantage to cancer cells. Key to this effort has been development of computational algorithms to find genes that drive cancer, based on their patterns of mutation in large patient cohorts. However, since there is no generally accepted gold standard of driver genes, it has been difficult to quantitatively compare these methods. We present a new machine learning method for driver gene prediction and a rigorous protocol to evaluate and compare prediction methods. Our results suggest that most current methods do not adequately account for heterogeneity in the number of mutations expected by chance and consequently have many false positive calls. The problem is most acute for cancers with high mutation rates, and comprehensive discovery of drivers in these cancers may be more difficult than currently anticipated.
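The point about heterogeneity in the number of mutations expected by chance can be illustrated with a small simulation. The sketch below is not the paper's protocol: it assumes, purely for illustration, that every gene is a passenger, that mutation counts are Poisson, and that unexplained rate variation is gamma-distributed around a common mean. The function names and parameter values are hypothetical. Each gene is tested against a null that uses the single mean rate, and the fraction called significant is the false positive rate.

```python
import math
import random


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam) by summing the lower tail."""
    term = math.exp(-lam)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def false_positive_rate(n_genes, mean_rate, shape, alpha, seed=0):
    """Fraction of passenger-only genes called significant.

    shape=None gives every gene the same rate (the null is correct).
    Otherwise each gene's rate is gamma-distributed with mean mean_rate,
    while the test still assumes the single mean rate (assumption).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        if shape is None:
            lam = mean_rate
        else:
            lam = rng.gammavariate(shape, mean_rate / shape)
        count = sample_poisson(rng, lam)
        if poisson_sf(count, mean_rate) < alpha:
            hits += 1
    return hits / n_genes


homogeneous = false_positive_rate(5000, 5.0, None, 0.01, seed=1)
heterogeneous = false_positive_rate(5000, 5.0, 2.0, 0.01, seed=1)
print("false positive rate, uniform rates:", homogeneous)
print("false positive rate, variable rates:", heterogeneous)
```

Under these assumptions the uniform-rate run stays near or below the nominal level, while the variable-rate run exceeds it several-fold, because genes with a high background rate look significantly mutated under a null that ignores the variation.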

Evaluating the evaluation of cancer driver genes

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.1616440113 ◽ 2016 ◽ Vol 113 (50) ◽ pp. 14330-14335 ◽ Cited By ~ 160

Author(s): Collin J. Tokheim ◽ Nickolas Papadopoulos ◽ Kenneth W. Kinzler ◽ Bert Vogelstein ◽ Rachel Karchin

Keyword(s): Somatic Mutations ◽ Gene Prediction ◽ Gene Mutations ◽ Evaluation Framework ◽ Mutation Rates ◽ Driver Gene ◽ Driver Genes ◽ Cancer Driver ◽ Bona Fide ◽ Cancer Driver Genes

Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning–based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.
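The observation that reported P values were inconsistent with the expected uniform distribution suggests a simple diagnostic. The sketch below computes the Kolmogorov-Smirnov distance between a set of P values and Uniform(0, 1). It is an illustration only: the paper's own measure of P-value calibration may differ, and the function name and the synthetic inputs are assumptions.

```python
def ks_uniform_statistic(p_values):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    given p-values and the Uniform(0, 1) CDF."""
    ps = sorted(p_values)
    n = len(ps)
    if n == 0:
        raise ValueError("need at least one p-value")
    d = 0.0
    for i, p in enumerate(ps, start=1):
        # Compare the uniform CDF (p itself) with the empirical CDF
        # just after and just before the i-th sorted value.
        d = max(d, i / n - p, p - (i - 1) / n)
    return d


# Synthetic inputs: evenly spread p-values versus p-values skewed toward zero.
n = 1000
well_calibrated = [(i + 0.5) / n for i in range(n)]
inflated = [u ** 3 for u in well_calibrated]

print("calibrated:", ks_uniform_statistic(well_calibrated))
print("inflated:  ", ks_uniform_statistic(inflated))
```

A distance near zero is what a well-calibrated method should produce on genes that are mostly passengers; a large distance indicates that the null model behind the P values does not match the data.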

Combining Mutation and Gene Network Data in a Machine Learning Approach for False-Positive Cancer Driver Gene Discovery

Advances in Bioinformatics and Computational Biology - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-65775-8_8 ◽ 2020 ◽ pp. 81-92

Author(s): Jorge Francisco Cutigi ◽ Renato Feijo Evangelista ◽ Rodrigo Henrique Ramos ◽ Cynthia de Oliveira Lage Ferreira ◽ Adriane Feijo Evangelista ◽ ...

Keyword(s): Machine Learning ◽ False Positive ◽ Gene Network ◽ Gene Discovery ◽ Driver Gene ◽ Network Data ◽ Learning Approach ◽ Cancer Driver ◽ Cancer Driver Gene ◽ Machine Learning Approach

LOTUS: a Single- and Multitask Machine Learning Algorithm for the Prediction of Cancer Driver Genes

10.1101/398537 ◽ 2018 ◽ Cited By ~ 1

Author(s): Olivier Collier ◽ Véronique Stoven ◽ Jean-Philippe Vert

Keyword(s): Machine Learning ◽ Biological Networks ◽ Learning Strategy ◽ Gene Prediction ◽ Scoring Function ◽ Cancer Genes ◽ Driver Genes ◽ Cancer Driver ◽ Cancer Types ◽ Cancer Driver Genes

Abstract: Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning-based approach that integrates various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms three other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.

Author summary: Cancer development is driven by mutations and dysfunction of important, so-called cancer driver genes, which are candidates for targeted therapies. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential. LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it allows heterogeneous information to be combined easily into the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction tools.

Machine learning algorithm improved automated droplet classification of ddPCR for detection of BRAF V600E in paraffin-embedded samples

Scientific Reports ◽ 10.1038/s41598-021-92014-4 ◽ 2021 ◽ Vol 11 (1)

Author(s): Gabriel A. Colozza-Gama ◽ Fabiano Callegari ◽ Nikola Bešič ◽ Ana C. de J. Paviza ◽ Janete M. Cerutti

Keyword(s): Machine Learning ◽ Sanger Sequencing ◽ Learning Algorithm ◽ Absolute Quantification ◽ BRAF V600E Mutation ◽ BRAF V600E ◽ Driver Genes ◽ Quantitative Classification ◽ Cancer Driver

Abstract: Somatic mutations in cancer driver genes can inform diagnosis, prognosis and treatment decisions. Formalin-fixed paraffin-embedded (FFPE) specimens are the main source of DNA for somatic mutation detection. To overcome constraints of DNA isolated from FFPE, we compared pyrosequencing and ddPCR analysis for absolute quantification of the BRAF V600E mutation in DNA extracted from FFPE specimens and compared the results to the qualitative detection information obtained by Sanger sequencing. Sanger sequencing was able to detect the BRAF V600E mutation only when it was present in more than 15% of total alleles. Although the sensitivity of ddPCR is higher than that observed for Sanger sequencing, it was less consistent than pyrosequencing, likely due to droplet classification bias of FFPE-derived DNA. To address the droplet allocation bias in ddPCR analysis, we compared different algorithms for automated droplet classification and then correlated these findings with those obtained from pyrosequencing. By including non-classifiable droplets (rain) in the ddPCR analysis, it was possible to obtain better qualitative and quantitative classification of droplets than when rain droplets were excluded, using pyrosequencing results as the reference. Notably, only the machine learning k-NN algorithm was able to automatically classify the samples, surpassing manual classification based on no-template controls, which shows promise in clinical practice.
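The k-NN idea mentioned in the abstract can be sketched in a few lines. The example below is a generic k-nearest-neighbours classifier, not the authors' pipeline: the two-channel amplitudes, the cluster labels, and the choice of k = 3 are all made-up illustrations. A droplet that falls between clusters (rain) is assigned the majority label of its nearest labelled neighbours.

```python
import math
from collections import Counter

# Hypothetical labelled droplets: ((mutant_channel, wildtype_channel), label).
TRAINING_DROPLETS = [
    ((1000, 900), "negative"),
    ((1100, 1000), "negative"),
    ((950, 1050), "negative"),
    ((5000, 1000), "mutant"),
    ((5200, 1100), "mutant"),
    ((4900, 950), "mutant"),
    ((1000, 4000), "wild_type"),
    ((1100, 4200), "wild_type"),
    ((900, 3900), "wild_type"),
]


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points closest
    (Euclidean distance) to the query point."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# A rain droplet sitting between the negative and mutant clusters.
rain_droplet = (3800, 1100)
print(knn_classify(TRAINING_DROPLETS, rain_droplet))
```

In this toy data the rain droplet is closer to the mutant cluster, so it is counted as mutant rather than discarded, which is the kind of reallocation the abstract describes.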

driveR: a novel method for prioritizing cancer driver genes using somatic genomics data

BMC Bioinformatics ◽ 10.1186/s12859-021-04203-7 ◽ 2021 ◽ Vol 22 (1)

Author(s): Ege Ülgen ◽ O. Uğur Sezerman

Keyword(s): Biological Knowledge ◽ Driver Gene ◽ Driver Genes ◽ Cancer Driver ◽ Prior Biological Knowledge ◽ Wilcoxon Rank Sum Test ◽ Cancer Genomes ◽ Novel Method ◽ Cancer Driver Genes ◽ Batch Analysis

Abstract

Background: Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR.

Results: Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets.

Conclusions: This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR.
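The AUC figures quoted above summarize how well a gene ranking separates known drivers from other genes. As a reminder of what the number means, the sketch below computes AUC as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), then takes the median over several datasets. The scores and labels are invented for illustration; this is not driveR's evaluation code.

```python
from statistics import median


def auc_from_scores(scores, labels):
    """AUC as P(positive score > negative score) + 0.5 * P(tie),
    computed by comparing every positive with every negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (scores, labels) pairs, one per dataset; label 1 = known driver.
datasets = [
    ([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]),
    ([0.7, 0.6, 0.5, 0.1], [1, 1, 0, 0]),
    ([0.2, 0.9, 0.8, 0.7], [1, 0, 0, 0]),
]
per_dataset_auc = [auc_from_scores(s, y) for s, y in datasets]
print("per-dataset AUC:", per_dataset_auc)
print("median AUC:", median(per_dataset_auc))
```

The third toy dataset has an AUC of 0 because its single positive is ranked last, which shows how a per-sample range of 0 to 1 can arise when each sample contains very few known drivers.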

DriverDBv3: a multi-omics database for cancer driver gene research

Nucleic Acids Research ◽ 10.1093/nar/gkz964 ◽ 2019 ◽ Cited By ~ 9

Author(s): Shu-Hsuan Liu ◽ Pei-Chun Shen ◽ Chen-Yang Chen ◽ An-Ni Hsu ◽ Yi-Chun Cho ◽ ...

Keyword(s): Cancer Biology ◽ Synergistic Effects ◽ Driver Gene ◽ Driver Genes ◽ Specific Patient ◽ Mutation Status ◽ Cancer Driver ◽ Cancer Driver Gene ◽ Significant Survival ◽ Gene Database

Abstract: An integrative multi-omics database is needed urgently, because focusing only on analysis of one-dimensional data falls far short of providing an understanding of cancer. Previously, we presented DriverDB, a cancer driver gene database that applies published bioinformatics algorithms to identify driver genes/mutations. The updated DriverDBv3 database (http://ngs.ym.edu.tw/driverdb) is designed to interpret cancer omics’ sophisticated information with concise data visualization. To offer diverse insights into molecular dysregulation/dysfunction events, we incorporated computational tools to define CNV and methylation drivers. Further, four new features, CNV, Methylation, Survival, and miRNA, allow users to explore the relations from two perspectives in the ‘Cancer’ and ‘Gene’ sections. The ‘Survival’ panel offers not only genes significantly associated with survival, but also gene pairs with synergistic effects. A new function, ‘Survival Analysis’ in ‘Customized-analysis,’ allows users to investigate the co-occurring events in user-defined gene(s) by mutation status or by expression in a specific patient group. Moreover, we redesigned the web interface and provided interactive figures to interpret cancer omics’ sophisticated information, and also constructed a Summary panel in the ‘Cancer’ and ‘Gene’ sections to visualize the features on multi-omics levels concisely. DriverDBv3 seeks to improve the study of integrative cancer omics data by identifying driver genes and contributes to cancer biology.

Interpreting pathways to discover cancer driver genes with Moonlight

Nature Communications ◽ 10.1038/s41467-019-13803-0 ◽ 2020 ◽ Vol 11 (1) ◽ Cited By ~ 9

Author(s): Antonio Colaprico ◽ Catharina Olsen ◽ Matthew H. Bailey ◽ Gabriel J. Odom ◽ Thilde Terkelsen ◽ ...

Keyword(s): Tumor Suppressors ◽ Molecular Mechanisms ◽ Dual Role ◽ Tissue Type ◽ Driver Gene ◽ Cancer Genes ◽ Driver Genes ◽ Cancer Driver ◽ Therapeutic Decisions ◽ Cancer Driver Genes

Abstract: Cancer driver gene alterations influence cancer development, occurring in oncogenes, tumor suppressors, and dual role genes. Discovering dual role cancer genes is difficult because of their elusive context-dependent behavior. We define oncogenic mediators as genes controlling biological processes. With them, we classify cancer driver genes, unveiling their roles in cancer mechanisms. To this end, we present Moonlight, a tool that incorporates multiple -omics data to identify critical cancer driver genes. With Moonlight, we analyze 8000+ tumor samples from 18 cancer types, discovering 3310 oncogenic mediators, 151 having dual roles. By incorporating additional data (amplification, mutation, DNA methylation, chromatin accessibility), we reveal 1000+ cancer driver genes, corroborating known molecular mechanisms. Additionally, we confirm critical cancer driver genes by analysing cell-line datasets. We discover inactivation of tumor suppressors in intron regions and that tissue type and subtype indicate dual role status. These findings help explain tumor heterogeneity and could guide therapeutic decisions.

Diversity spectrum analysis identifies mutation-specific effects of cancer driver genes

Communications Biology ◽ 10.1038/s42003-019-0736-4 ◽ 2020 ◽ Vol 3 (1) ◽ Cited By ~ 2

Author(s): Xiaobao Dong ◽ Dandan Huang ◽ Xianfu Yi ◽ Shijie Zhang ◽ Zhao Wang ◽ ...

Keyword(s): Clinical Trials ◽ Spectrum Analysis ◽ Driver Mutations ◽ Driver Gene ◽ Cancer Type ◽ Driver Genes ◽ Drug Responses ◽ Cancer Driver ◽ Specific Effects ◽ Cancer Driver Genes

Abstract: Mutation-specific effects of cancer driver genes influence drug responses and the success of clinical trials. We reasoned that these effects could unbalance the distribution of each mutation across different cancer types; as a result, the cancer preference can be used to distinguish the effects of the causal mutation. Here, we developed a network-based framework to systematically measure cancer diversity for each driver mutation. We found that half of the driver genes harbor cancer type-specific and pancancer mutations simultaneously, suggesting pervasive functional heterogeneity among mutations, even within the same driver gene. We further demonstrated that the specificity of the mutations could influence patient drug responses. Moreover, we observed that diversity was generally increased in advanced tumors. Finally, we scanned for potentially novel cancer driver genes based on the diversity spectrum. Diversity spectrum analysis provides a new approach to define driver mutations and optimize off-label clinical trials.

Bayesian inference of cancer driver genes using signatures of positive selection

10.1101/059360 ◽ 2017

Author(s): Luis Zapata ◽ Hana Susak ◽ Oliver Drechsel ◽ Marc R. Friedländer ◽ Xavier Estivill ◽ ...

Keyword(s): Bayesian Inference ◽ Large Fraction ◽ Driver Gene ◽ Tumor Type ◽ Sequencing Data ◽ Cancer Etiology ◽ Driver Genes ◽ Cancer Driver ◽ Cancer Types ◽ Cancer Cell Fraction

Abstract: Tumors are composed of an evolving population of cells subjected to tissue-specific selection, which fuels tumor heterogeneity and ultimately complicates cancer driver gene identification. Here, we integrate cancer cell fraction, population recurrence, and functional impact of somatic mutations as signatures of selection into a Bayesian inference model for driver prediction. In an in-depth benchmark, we demonstrate that our model, cDriver, outperforms competing methods when analyzing solid tumors, hematological malignancies, and pan-cancer datasets. Applying cDriver to exome sequencing data of 21 cancer types from 6,870 individuals revealed 98 unreported tumor type-driver gene connections. These novel connections are highly enriched for chromatin-modifying proteins, hinting at a universal role of chromatin regulation in cancer etiology. Although infrequently mutated as single genes, we show that chromatin modifiers are altered in a large fraction of cancer patients. In summary, we demonstrate that integration of evolutionary signatures is key for identifying mutational driver genes, thereby facilitating the discovery of novel therapeutic targets for cancer treatment.
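The general idea of combining several signatures of selection in a Bayesian way can be shown with a toy calculation. The sketch below is not the cDriver model, whose details are not given in this abstract. It simply applies Bayes' rule in odds form, assuming (for illustration only) that each signature contributes an independent likelihood ratio; the prior and the three ratios are invented numbers.

```python
def posterior_driver_probability(prior, likelihood_ratios):
    """Update a prior driver probability with independent likelihood
    ratios: posterior odds = prior odds * product of the ratios."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical evidence for one gene, as P(evidence | driver) / P(evidence | passenger).
evidence = {
    "cancer_cell_fraction": 10.0,  # mutation present in most tumor cells
    "population_recurrence": 5.0,  # mutated in many patients
    "functional_impact": 2.0,      # predicted damaging
}
posterior = posterior_driver_probability(0.01, evidence.values())
print("posterior driver probability:", round(posterior, 4))
```

With a prior of 1% and a combined likelihood ratio of 100, the posterior is close to one half, which illustrates how several moderately informative signals can jointly outweigh a low prior.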

Model-based analysis of positive selection significantly expands the list of cancer driver genes, including RNA methyltransferases

10.1101/366823 ◽ 2018

Author(s): Siming Zhao ◽ Jun Liu ◽ Pranav Nanga ◽ Yuwen Liu ◽ A. Ercument Cicek ◽ ...

Keyword(s): Positive Selection ◽ Cancer Biology ◽ Spatial Clustering ◽ Strong Support ◽ Driver Gene ◽ Driver Genes ◽ Cancer Driver ◽ Model Based ◽ Cancer Driver Genes ◽ Model Based Analysis

Abstract: Identifying driver genes is a central problem in cancer biology, and many methods have been developed to identify driver genes from somatic mutation data. However, existing methods either lack explicit statistical models, or rely on very simple models that do not capture complex features in somatic mutations of driver genes. Here, we present driverMAPS (Model-based Analysis of Positive Selection), a more comprehensive model-based approach to driver gene identification. This new method explicitly models, at the single-base level, the effects of positive selection in cancer driver genes as well as the highly heterogeneous background mutational process. Its selection model captures elevated mutation rates in functionally important sites using multiple external annotations, as well as spatial clustering of mutations. Its background mutation model accounts for both known covariates and unexplained local variation. Simulations under realistic evolutionary models demonstrate that driverMAPS greatly improves the power of driver gene detection over state-of-the-art approaches. Applying driverMAPS to TCGA data across 20 tumor types identified 159 new potential driver genes. Cross-referencing this list with data from external sources strongly supports these findings. The novel genes include the mRNA methyltransferases METTL3-METTL14, and we experimentally validated METTL3 as a potential tumor suppressor gene in bladder cancer. Our results thus provide strong support to the emerging hypothesis that mRNA modification is an important biological process underlying tumorigenesis.
