Evaluating machine learning methodologies for identification of cancer driver genes

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sharaf J. Malebary ◽  
Yaser Daanial Khan

Abstract Cancer is driven by distinctive kinds of changes and variations in genes. Recognizing cancer driver genes is essential for accurate oncological analysis. Numerous methodologies to identify drivers presently exist, but efficient tools to combine and optimize them on huge datasets are few. Most strategies for prioritizing mutations depend primarily on frequency-based criteria. Strategies are required to dependably prioritize biologically active driver mutations over inactive passengers in high-throughput sequencing cancer datasets. This study proposes a model, namely PCDG-Pred, which works as a utility capable of distinguishing cancer driver and passenger attributes of genes based on sequencing data. Keeping in view the significance of cancer driver genes, an efficient method is proposed to identify them. Further, various validation techniques are applied at different levels to establish the effectiveness of the model and to obtain metrics like accuracy, Matthews correlation coefficient, sensitivity, and specificity. The results of the study strongly indicate that the proposed strategy provides a fundamental functional advantage over other existing strategies for cancer driver gene identification. Experiments show that the accuracy metrics obtained for self-consistency, independent set, and cross-validation tests are 91.08%, 87.26%, and 92.48%, respectively.
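For readers unfamiliar with the four metrics named in this abstract, the following is a minimal sketch of how they are conventionally computed from binary confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the PCDG-Pred study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and Matthews correlation
    coefficient (MCC) from confusion-matrix counts (driver = positive class)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true drivers that were predicted as drivers.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true passengers that were predicted as passengers.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not results from the paper).
m = binary_metrics(tp=45, fp=5, tn=40, fn=10)
print(m)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when drivers are much rarer than passengers.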

2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Ho Jang ◽  
Youngmi Hur ◽  
Hyunju Lee

Abstract DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes.
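As a rough illustration of why wavelets suit copy-number signals, the sketch below applies one level of the orthonormal Haar transform to a toy log-ratio profile. It is not the WIFA-Seq algorithm, whose details are not given in this abstract; the function name `haar_step` and the input values are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficient lists, each half the input
    length. The input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


# Toy copy-number log ratios with a short gained segment (hypothetical data).
profile = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
approx, detail = haar_step(profile)
print(approx)
print(detail)
```

A nonzero detail coefficient marks a change inside a pair of adjacent bins. A change that falls exactly on a pair boundary only shows up at coarser levels, which is why multi-level decompositions are normally used.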


2019 ◽  
Author(s):  
Pramod Chandrashekar ◽  
Navid Ahmadinejad ◽  
Junwen Wang ◽  
Aleksandar Sekulic ◽  
Jan B. Egan ◽  
...  

ABSTRACT Functions of cancer driver genes depend on cellular contexts that vary substantially across tissues and organs. Distinguishing oncogenes (OGs) and tumor suppressor genes (TSGs) for each cancer type is critical to identifying clinically actionable targets. However, current resources for context-aware classifications of cancer drivers are limited. In this study, we show that the direction and magnitude of somatic selection of missense and truncating mutations of a gene are suggestive of its contextual activities. By integrating these features with ratiometric and conservation measures, we developed a computational method to categorize OGs and TSGs using exome sequencing data. This new method, named genes under selection in tumors (GUST), shows an overall accuracy of 0.94 when tested on manually curated benchmarks. Application of GUST to 10,172 tumor exomes of 33 cancer types identified 98 OGs and 179 TSGs, >70% of which promote tumorigenesis in only one cancer type. In broad-spectrum drivers shared across multiple cancer types, we found heterogeneous mutational hotspots modifying distinct functional domains, implicating the synchrony of convergent and divergent disease mechanisms. We further discovered two novel OGs and 28 novel TSGs with high confidence. The GUST program is available at https://github.com/liliulab/gust. A database with pre-computed classifications is available at https://liliulab.shinyapps.io/gust


Author(s):  
Carlos Garcia-Prieto ◽  
Francisco Martinez Jimenez ◽  
Alfonso Valencia ◽  
Eduard Porta-Pardo

The analysis of cancer genomes provides fundamental information about cancer aetiology, the processes driving cell transformation and potential treatments. The first crucial step in the analysis of any tumor genome is the identification of somatic genetic variants that cancer cells have acquired during their evolution. For that purpose, a wide range of somatic variant callers have been developed in recent years. While there have been some efforts to benchmark somatic variant calling tools and strategies, the extent to which variant calling decisions impact the results of downstream analyses of tumor genomes remains unknown. Here we present a study to elucidate whether different variant callers (MuSE, MuTect2, SomaticSniper, VarScan2) and strategies to combine them (Consensus and Union) lead to different results in three important downstream analyses of cancer genomics data: identification of cancer driver genes, quantification of mutational signatures and detection of clinically actionable variants. To this end, we tested how the results of these three analyses varied depending on the somatic mutation caller in five different projects from The Cancer Genome Atlas (TCGA). Our results show that variant calling decisions have a significant impact on these downstream analyses, creating important differences in driver gene identification and mutational process attribution among variant call sets, as well as in the detection of clinically actionable targets. More importantly, it seems that Consensus, a strategy very widely used by the research community, is not the optimal strategy, as it can lead to the loss of some cancer driver genes and actionable mutations. On the other hand, the Union seems to be a legitimate strategy for some downstream analyses, with a robust performance overall.
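The Consensus and Union strategies can be pictured as set operations over per-caller variant lists. The sketch below is an assumption about how such a combination might look, since the abstract does not state the exact Consensus rule. Here, consensus means "reported by at least `min_callers` callers", defaulting to all of them. The function name `combine_calls` and the variants are hypothetical.

```python
from collections import Counter


def combine_calls(call_sets, min_callers=None):
    """Combine per-caller variant sets into (union, consensus).

    call_sets maps a caller name to a set of variant keys, e.g.
    (chrom, pos, ref, alt). A variant is in the consensus if at least
    min_callers callers report it (default: every caller).
    """
    # Each caller contributes a set, so a variant is counted once per caller.
    counts = Counter(v for calls in call_sets.values() for v in calls)
    threshold = len(call_sets) if min_callers is None else min_callers
    union = set(counts)
    consensus = {v for v, n in counts.items() if n >= threshold}
    return union, consensus


# Hypothetical variant calls, for illustration only.
a = ("chr17", 7577120, "C", "T")
b = ("chr12", 25398284, "C", "A")
c = ("chr7", 140453136, "A", "T")
d = ("chr3", 178936091, "G", "A")
calls = {
    "MuSE": {a, b},
    "MuTect2": {a, b, c},
    "SomaticSniper": {a},
    "VarScan2": {a, c, d},
}
union, strict = combine_calls(calls)
_, majority = combine_calls(calls, min_callers=2)
print(len(union), len(strict), len(majority))  # prints: 4 1 3
```

The example makes the trade-off concrete: the strict consensus keeps only the variant every caller agrees on, while the union keeps every call, including those reported by a single caller.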


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ege Ülgen ◽  
O. Uğur Sezerman

Abstract Background Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR.
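The median AUC figures reported here summarize per-dataset AUC values. As a minimal sketch, AUC can be computed as the probability that a randomly chosen true driver is scored above a randomly chosen non-driver, which is the Mann-Whitney formulation. The function name `auc_from_scores` and all numbers below are hypothetical and are unrelated to the driveR package or its results.

```python
from statistics import median


def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 for known drivers and 0 for non-drivers; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# One hypothetical dataset: gene scores and known-driver labels.
auc = auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(auc)  # prints: 0.75

# Summarizing several hypothetical per-dataset AUCs by their median.
print(median([0.62, 0.70, 0.81]))  # prints: 0.7
```

The pairwise form is quadratic in the number of genes, which is fine for illustration; rank-based implementations are preferable for large gene lists.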


EBioMedicine ◽  
2018 ◽  
Vol 27 ◽  
pp. 156-166 ◽  
Author(s):  
Magali Champion ◽  
Kevin Brennan ◽  
Tom Croonenborghs ◽  
Andrew J. Gentles ◽  
Nathalie Pochet ◽  
...  
