Evaluating machine learning methodologies for identification of cancer driver genes

Abstract Cancer is driven by distinct types of mutations and structural variations in genes. Identifying cancer driver genes is essential for accurate oncological analysis. Numerous methods for identifying drivers exist, but efficient tools to combine and optimize them on large datasets are few. Most strategies for prioritizing mutations rely primarily on frequency-based criteria. Methods are needed to reliably prioritize biologically active driver mutations over inert passengers in high-throughput cancer sequencing datasets. This study proposes a model, PCDG-Pred, which serves as a utility for distinguishing cancer driver and passenger attributes of genes based on sequencing data. Various validation techniques are applied at different levels to establish the effectiveness of the model and to obtain metrics such as accuracy, Matthews correlation coefficient, sensitivity, and specificity. The results strongly indicate that the proposed strategy offers a substantial advantage over existing strategies for cancer driver gene identification. The accuracy obtained for the self-consistency, independent set, and cross-validation tests is 91.08%, 87.26%, and 92.48%, respectively.
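
The four metrics named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the PCDG-Pred evaluation code: the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts.

    tp/fn count true drivers predicted as driver/passenger;
    tn/fp count true passengers predicted as passenger/driver.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true drivers that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true passengers that were rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the study.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when drivers are far rarer than passengers.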

Erratum: Corrigendum: Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

Scientific Reports ◽ 10.1038/srep32906 ◽ 2016 ◽ Vol 6 (1) ◽ Cited By ~ 1

Author(s): Ho Jang ◽ Youngmi Hur ◽ Hyunju Lee

Keyword(s): Whole Genome Sequencing ◽ Genome Sequencing ◽ Whole Genome Sequencing Data ◽ Whole Genome ◽ Sequencing Data ◽ Genomic Alterations ◽ Driver Genes ◽ Cancer Driver ◽ Cancer Driver Genes

Identification of cancer driver genes in focal genomic aberrations from whole-exome sequencing data

Bioinformatics ◽ 10.1093/bioinformatics/btx620 ◽ 2017 ◽ Vol 34 (3) ◽ pp. 519-521 ◽ Cited By ~ 1

Author(s): Ho Jang ◽ Hyunju Lee

Keyword(s): Exome Sequencing ◽ Whole Exome Sequencing ◽ Sequencing Data ◽ Driver Genes ◽ Cancer Driver ◽ Exome Sequencing Data ◽ Whole Exome ◽ Whole Exome Sequencing Data ◽ Genomic Aberrations ◽ Cancer Driver Genes

Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

Scientific Reports ◽ 10.1038/srep25582 ◽ 2016 ◽ Vol 6 (1) ◽ Cited By ~ 3

Author(s): Ho Jang ◽ Youngmi Hur ◽ Hyunju Lee

Keyword(s): Whole Genome Sequencing ◽ Genome Sequencing ◽ Snp Array ◽ Whole Genome Sequencing Data ◽ Whole Genome ◽ Sequencing Data ◽ Genomic Alterations ◽ Driver Genes ◽ Cancer Driver ◽ Cancer Driver Genes

Abstract DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes.

Contextual Classifications of Cancer Driver Genes

10.1101/715508 ◽ 2019

Author(s): Pramod Chandrashekar ◽ Navid Ahmadinejad ◽ Junwen Wang ◽ Aleksandar Sekulic ◽ Jan B. Egan ◽ ...

Keyword(s): Computational Method ◽ Cancer Type ◽ Sequencing Data ◽ Multiple Cancer ◽ Driver Genes ◽ Cancer Driver ◽ Link Type ◽ Mutational Hotspots ◽ Cancer Types ◽ Cancer Driver Genes

Abstract Functions of cancer driver genes depend on cellular contexts that vary substantially across tissues and organs. Distinguishing oncogenes (OGs) and tumor suppressor genes (TSGs) for each cancer type is critical to identifying clinically actionable targets. However, current resources for context-aware classifications of cancer drivers are limited. In this study, we show that the direction and magnitude of somatic selection of missense and truncating mutations of a gene are suggestive of its contextual activities. By integrating these features with ratiometric and conservation measures, we developed a computational method to categorize OGs and TSGs using exome sequencing data. This new method, named genes under selection in tumors (GUST), shows an overall accuracy of 0.94 when tested on manually curated benchmarks. Application of GUST to 10,172 tumor exomes of 33 cancer types identified 98 OGs and 179 TSGs, >70% of which promote tumorigenesis in only one cancer type. In broad-spectrum drivers shared across multiple cancer types, we found heterogeneous mutational hotspots modifying distinct functional domains, implicating the synchrony of convergent and divergent disease mechanisms. We further discovered two novel OGs and 28 novel TSGs with high confidence. The GUST program is available at https://github.com/liliulab/gust. A database with pre-computed classifications is available at https://liliulab.shinyapps.io/gust

Identification of Cancer Driver Genes from a Custom Set of Next Generation Sequencing Data

Methods in Molecular Biology - Cancer Driver Genes ◽ 10.1007/978-1-4939-8967-6_2 ◽ 2018 ◽ pp. 19-36

Author(s): Shu-Hsuan Liu ◽ Wei-Chung Cheng

Keyword(s): Next Generation Sequencing ◽ Next Generation Sequencing Data ◽ Next Generation ◽ Sequencing Data ◽ Driver Genes ◽ Cancer Driver ◽ Cancer Driver Genes ◽ Generation Sequencing

The consequences of variant calling decisions in secondary analyses of cancer sequencing data

10.1101/2020.01.29.924860 ◽ 2020 ◽ Cited By ~ 1

Author(s): Carlos Garcia-Prieto ◽ Francisco Martinez Jimenez ◽ Alfonso Valencia ◽ Eduard Porta-Pardo

Keyword(s): Cancer Genomics ◽ Cell Transformation ◽ Variant Calling ◽ The Cancer Genome Atlas ◽ Sequencing Data ◽ Driver Genes ◽ Somatic Variant ◽ Cancer Driver ◽ Wide Range ◽ Cancer Driver Genes

The analysis of cancer genomes provides fundamental information about its aetiology, the processes driving cell transformation or potential treatments. The first crucial step in the analysis of any tumor genome is the identification of somatic genetic variants that cancer cells have acquired during their evolution. For that purpose, a wide range of somatic variant callers have been developed in recent years. While there have been some efforts to benchmark somatic variant calling tools and strategies, the extent to which variant calling decisions impact the results of downstream analyses of tumor genomes remains unknown. Here we present a study to elucidate whether different variant callers (MuSE, MuTect2, SomaticSniper, VarScan2) and strategies to combine them (Consensus and Union) lead to different results in three important downstream analyses of cancer genomics data: identification of cancer driver genes, quantification of mutational signatures and detection of clinically actionable variants. To this end, we tested how the results of these three analyses varied depending on the somatic mutation caller in five different projects from The Cancer Genome Atlas (TCGA). Our results show that variant calling decisions have a significant impact on these downstream analyses, creating important differences in driver gene identification and mutational process attribution among variant call sets, as well as in the detection of clinically actionable targets. More importantly, it seems that Consensus, a strategy very widely used by the research community, is not the optimal strategy, as it can lead to the loss of some cancer driver genes and actionable mutations. On the other hand, the Union seems to be a legitimate strategy for some downstream analyses, with a robust performance overall.
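
The difference between the Consensus and Union strategies can be illustrated with plain set operations. This is a rough sketch under stated assumptions, not the procedure used in the study: the variant tuples are invented, the helper names `union_calls` and `consensus_calls` are hypothetical, and the `min_callers` threshold is an assumption, since the abstract does not say how many callers must agree.

```python
from collections import Counter

# Hypothetical call sets keyed by caller; each variant is (chrom, pos, ref, alt).
calls = {
    "MuSE": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "MuTect2": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "T")},
    "SomaticSniper": {("chr1", 100, "A", "T")},
    "VarScan2": {("chr2", 200, "G", "C"), ("chr4", 400, "T", "G")},
}


def union_calls(call_sets: dict) -> set:
    """Union: keep every variant reported by at least one caller."""
    return set().union(*call_sets.values())


def consensus_calls(call_sets: dict, min_callers: int = 2) -> set:
    """Consensus: keep variants reported by at least `min_callers` callers.

    The threshold is an illustrative assumption, not a value from the study.
    """
    counts = Counter(v for variants in call_sets.values() for v in set(variants))
    return {v for v, n in counts.items() if n >= min_callers}


print(len(union_calls(calls)), "variants in the union")
print(len(consensus_calls(calls)), "variants in the consensus")
```

Any variant seen by a single caller, such as the `chr3` and `chr4` calls here, survives the Union but is dropped by the Consensus, which is how a consensus filter can lose true driver mutations.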

Faculty Opinions recommendation of Evaluating the evaluation of cancer driver genes.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.727060594.793535346 ◽ 2017

Author(s): Ron Shamir

Keyword(s): Driver Genes ◽ Cancer Driver ◽ Cancer Driver Genes

driveR: a novel method for prioritizing cancer driver genes using somatic genomics data

BMC Bioinformatics ◽ 10.1186/s12859-021-04203-7 ◽ 2021 ◽ Vol 22 (1)

Author(s): Ege Ülgen ◽ O. Uğur Sezerman

Keyword(s): Biological Knowledge ◽ Driver Gene ◽ Driver Genes ◽ Cancer Driver ◽ Prior Biological Knowledge ◽ Wilcoxon Rank Sum Test ◽ Cancer Genomes ◽ Novel Method ◽ Cancer Driver Genes ◽ Batch Analysis

Abstract

Background: Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR.

Results: Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets.

Conclusions: This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR.
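
The median AUC reported here summarizes one AUC value per dataset or sample. The sketch below shows how such a summary could be computed. It is written in Python for illustration and is unrelated to the driveR R package: the `auc` helper and the toy scores are hypothetical, and AUC is computed through its pairwise-ranking definition rather than by any method the authors state.

```python
from statistics import median


def auc(scores: list, labels: list) -> float:
    """Area under the ROC curve via its pairwise-ranking definition.

    Equals the probability that a randomly chosen driver (label 1) scores
    higher than a randomly chosen non-driver (label 0), counting ties as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two hypothetical per-sample prioritizations (scores and known driver labels).
samples = [
    ([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]),
    ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),
]
per_sample_auc = [auc(scores, labels) for scores, labels in samples]
print("per-sample AUC:", per_sample_auc)
print("median AUC:", median(per_sample_auc))
```

With only a handful of known drivers per sample, a single AUC can land anywhere between 0 and 1, which is consistent with the wide range reported for the personalized analysis and is one reason to summarize with the median.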
