The consequences of variant calling decisions in secondary analyses of cancer sequencing data

The analysis of cancer genomes provides fundamental information about their aetiology, the processes driving cell transformation and potential treatments. The first crucial step in the analysis of any tumor genome is the identification of the somatic genetic variants that cancer cells have acquired during their evolution. For that purpose, a wide range of somatic variant callers has been developed in recent years. While there have been some efforts to benchmark somatic variant calling tools and strategies, the extent to which variant calling decisions affect the results of downstream analyses of tumor genomes remains unknown. Here we present a study to elucidate whether different variant callers (MuSE, MuTect2, SomaticSniper, VarScan2) and strategies to combine them (Consensus and Union) lead to different results in three important downstream analyses of cancer genomics data: identification of cancer driver genes, quantification of mutational signatures and detection of clinically actionable variants. To this end, we tested how the results of these three analyses varied depending on the somatic mutation caller in five different projects from The Cancer Genome Atlas (TCGA). Our results show that variant calling decisions have a significant impact on these downstream analyses, creating important differences among variant call sets in driver gene identification and mutational process attribution, as well as in the detection of clinically actionable targets. More importantly, Consensus, a strategy widely used by the research community, appears not to be optimal, as it can lead to the loss of some cancer driver genes and actionable mutations. On the other hand, the Union appears to be a legitimate strategy for some downstream analyses, with robust overall performance.
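
The two combination strategies compared in this study can be sketched as set operations over per-caller call sets. This is a minimal sketch, assuming each call is keyed by (chromosome, position, reference allele, alternate allele) and that Consensus means "reported by at least `min_callers` callers". The threshold of 2, the function names and the toy variants are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A somatic SNV keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def union_calls(call_sets: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Union: keep every variant reported by at least one caller."""
    merged: Set[Variant] = set()
    for calls in call_sets.values():
        merged |= calls
    return merged


def consensus_calls(call_sets: Dict[str, Set[Variant]], min_callers: int = 2) -> Set[Variant]:
    """Consensus (assumed definition): keep variants reported by >= min_callers callers."""
    support = Counter(v for calls in call_sets.values() for v in calls)
    return {v for v, n in support.items() if n >= min_callers}


# Hypothetical toy call sets, one per caller named in the study.
example_calls: Dict[str, Set[Variant]] = {
    "MuSE": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A")},
    "MuTect2": {("chr1", 100, "C", "T"), ("chr3", 300, "A", "T")},
    "SomaticSniper": {("chr1", 100, "C", "T")},
    "VarScan2": {("chr2", 200, "G", "A")},
}

if __name__ == "__main__":
    union = union_calls(example_calls)
    consensus = consensus_calls(example_calls)
    # Variants that Consensus drops but Union keeps (here, the MuTect2-only call).
    print(sorted(union - consensus))
```

With these toy sets, Union keeps three variants while Consensus keeps two, discarding the call that only MuTect2 made. This illustrates how a Consensus rule can lose variants detected by a single caller.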

Erratum: Corrigendum: Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

Scientific Reports, Vol 6 (1), 2016. DOI: 10.1038/srep32906

Author(s): Ho Jang, Youngmi Hur, Hyunju Lee

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Genomic Alterations, Driver Genes, Cancer Driver, Cancer Driver Genes

Identification of cancer driver genes in focal genomic aberrations from whole-exome sequencing data

Bioinformatics, Vol 34 (3), pp. 519-521, 2017. DOI: 10.1093/bioinformatics/btx620

Author(s): Ho Jang, Hyunju Lee

Keyword(s): Exome Sequencing, Whole Exome Sequencing, Sequencing Data, Driver Genes, Cancer Driver, Exome Sequencing Data, Whole Exome, Whole Exome Sequencing Data, Genomic Aberrations, Cancer Driver Genes

Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

Scientific Reports, Vol 6 (1), 2016. DOI: 10.1038/srep25582

Author(s): Ho Jang, Youngmi Hur, Hyunju Lee

Keyword(s): Whole Genome Sequencing, Genome Sequencing, SNP Array, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Genomic Alterations, Driver Genes, Cancer Driver, Cancer Driver Genes

DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions, which might contain candidate target genes for cancer therapies, from passenger regions is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, methods designed for SNP arrays cannot be applied directly to next-generation sequencing (NGS) data, because NGS data have different characteristics, such as higher resolution without predefined probes and reads incorrectly mapped to reference genomes. In this study, we developed a wavelet-based method for the identification of focal genomic alterations in sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations that contain candidate cancer-related genes as well as previously known cancer-driver genes.
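
The WIFA-Seq algorithm itself is not described in this abstract. As a generic illustration of the wavelet idea only, the sketch below applies a multi-level Haar wavelet transform to a binned copy-number signal (for example, log2 tumor/normal read-depth ratios per bin), zeroes small detail coefficients as noise and reconstructs a smoothed profile. The function names, the hard-thresholding rule, the gain cutoff and the requirement that the number of bins be a power of two are assumptions of this sketch, not features of WIFA-Seq.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: List[float]) -> Tuple[List[float], List[List[float]]]:
    """Full multi-level Haar decomposition; len(signal) must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details: List[List[float]] = []  # finest level first
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_inverse(approx: List[float], details: List[List[float]]) -> List[float]:
    """Invert haar_forward, rebuilding from the coarsest level to the finest."""
    current = list(approx)
    for level in reversed(details):
        rebuilt: List[float] = []
        for a, d in zip(current, level):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        current = rebuilt
    return current


def haar_denoise(signal: List[float], threshold: float) -> List[float]:
    """Hard thresholding: detail coefficients with |d| < threshold are set to zero."""
    approx, details = haar_forward(signal)
    kept = [[d if abs(d) >= threshold else 0.0 for d in level] for level in details]
    return haar_inverse(approx, kept)


def gained_bins(profile: List[float], cutoff: float) -> List[int]:
    """Indices of bins whose smoothed value exceeds a (hypothetical) gain cutoff."""
    return [i for i, v in enumerate(profile) if v > cutoff]


if __name__ == "__main__":
    # Toy 16-bin profile: a gained segment in bins 8-15 plus alternating +/-0.05 noise.
    noisy = [(1.0 if i >= 8 else 0.0) + (0.05 if i % 2 == 0 else -0.05) for i in range(16)]
    smoothed = haar_denoise(noisy, threshold=0.2)
    print([round(v, 3) for v in smoothed])
    print(gained_bins(smoothed, cutoff=0.5))
```

Because the step sits on a dyadic boundary, the bin-to-bin noise lands entirely in small fine-scale coefficients and is removed, while the single large coarse-scale coefficient that encodes the step is kept. Real read-depth profiles are not this clean, which is one reason dedicated methods add further modeling.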

Evaluating machine learning methodologies for identification of cancer driver genes

Scientific Reports, Vol 11 (1), 2021. DOI: 10.1038/s41598-021-91656-8

Author(s): Sharaf J. Malebary, Yaser Daanial Khan

Keyword(s): High Throughput Sequencing, Independent Set, Cancer Information, Sequencing Data, Driver Genes, Cancer Driver, Information Sets, Functional Advantage, Cancer Driver Genes, Validation Tests

Cancer is driven by distinct types of genetic alterations and variations in genes. Recognizing cancer driver genes is essential for accurate oncological analysis. Numerous methods to distinguish and identify drivers currently exist, but efficient tools to combine and optimize them on large datasets are few. Most strategies for prioritizing mutations depend mainly on frequency-based criteria. Strategies are needed to reliably prioritize biologically active driver mutations over neutral passengers in high-throughput cancer sequencing datasets. This study proposes a model, PCDG-Pred, that distinguishes the cancer driver and passenger attributes of genes based on sequencing data. Given the importance of cancer driver genes, an efficient method is proposed to identify them. Further, various validation techniques are applied at different levels to establish the effectiveness of the model and to obtain metrics such as accuracy, Matthews correlation coefficient, sensitivity and specificity. The results strongly indicate that the proposed strategy provides a substantial advantage over other existing strategies for cancer driver gene identification. Experiments show that the accuracies obtained for the self-consistency, independent set and cross-validation tests are 91.08%, 87.26% and 92.48%, respectively.
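
The four evaluation metrics named above have standard definitions in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN); for example, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes them, treating driver genes as the positive class. It is not PCDG-Pred's code, and the counts in the usage line are hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (a common convention).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for a driver (positive) vs passenger (negative) classifier.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when driver and passenger genes are heavily imbalanced.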

Contextual Classifications of Cancer Driver Genes

2019. DOI: 10.1101/715508

Author(s): Pramod Chandrashekar, Navid Ahmadinejad, Junwen Wang, Aleksandar Sekulic, Jan B. Egan, ...

Keyword(s): Computational Method, Cancer Type, Sequencing Data, Multiple Cancer, Driver Genes, Cancer Driver, Link Type, Mutational Hotspots, Cancer Types, Cancer Driver Genes

Functions of cancer driver genes depend on cellular contexts that vary substantially across tissues and organs. Distinguishing oncogenes (OGs) from tumor suppressor genes (TSGs) for each cancer type is critical to identifying clinically actionable targets. However, current resources for context-aware classification of cancer drivers are limited. In this study, we show that the direction and magnitude of somatic selection on a gene's missense and truncating mutations are suggestive of its contextual activities. By integrating these features with ratiometric and conservation measures, we developed a computational method to categorize OGs and TSGs using exome sequencing data. This new method, named genes under selection in tumors (GUST), shows an overall accuracy of 0.94 when tested on manually curated benchmarks. Application of GUST to 10,172 tumor exomes of 33 cancer types identified 98 OGs and 179 TSGs, >70% of which promote tumorigenesis in only one cancer type. In broad-spectrum drivers shared across multiple cancer types, we found heterogeneous mutational hotspots modifying distinct functional domains, implicating the synchrony of convergent and divergent disease mechanisms. We further discovered two novel OGs and 28 novel TSGs with high confidence. The GUST program is available at https://github.com/liliulab/gust. A database with pre-computed classifications is available at https://liliulab.shinyapps.io/gust
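
GUST combines selection, ratiometric and conservation features whose details are not given here. For intuition about the ratiometric part only, the sketch below implements a simpler, well-known heuristic, the "20/20 rule": a gene is oncogene-like if more than 20% of its mutations are missense at recurrently hit positions, and tumor-suppressor-like if more than 20% are inactivating. This is a swapped-in technique, not GUST; the `Mutation` record, the recurrence cutoff of two hits at the same protein position and the output labels are simplifying assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Mutation:
    protein_position: int
    consequence: str  # "missense", "truncating" or another class such as "silent"


def classify_20_20(mutations: List[Mutation], threshold: float = 0.2,
                   min_recurrence: int = 2) -> str:
    """Label a gene OG-like, TSG-like, ambiguous or unclassified (simplified 20/20 rule)."""
    n = len(mutations)
    if n == 0:
        return "unclassified"
    # Fraction of all mutations that are truncating (inactivating).
    truncating_frac = sum(m.consequence == "truncating" for m in mutations) / n
    # Fraction of all mutations that are missense at a recurrently hit position.
    missense_counts = Counter(m.protein_position for m in mutations
                              if m.consequence == "missense")
    recurrent = sum(c for c in missense_counts.values() if c >= min_recurrence)
    hotspot_frac = recurrent / n
    og_like = hotspot_frac > threshold
    tsg_like = truncating_frac > threshold
    if og_like and tsg_like:
        return "ambiguous"
    if og_like:
        return "OG-like"
    if tsg_like:
        return "TSG-like"
    return "unclassified"


if __name__ == "__main__":
    # Hypothetical genes: one with a missense hotspot, one dominated by truncations.
    hotspot_gene = [Mutation(12, "missense")] * 8 + [Mutation(p, "missense") for p in (30, 45)]
    truncated_gene = ([Mutation(p, "truncating") for p in range(10, 60, 10)]
                      + [Mutation(p, "missense") for p in (7, 21, 33, 48, 90)])
    print(classify_20_20(hotspot_gene), classify_20_20(truncated_gene))
```

A fixed, pan-cancer rule like this cannot capture the context dependence the GUST abstract emphasizes, since the same gene can look oncogene-like in one cancer type and tumor-suppressor-like in another.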

Identification of Cancer Driver Genes from a Custom Set of Next Generation Sequencing Data

Methods in Molecular Biology - Cancer Driver Genes, pp. 19-36, 2018. DOI: 10.1007/978-1-4939-8967-6_2

Author(s): Shu-Hsuan Liu, Wei-Chung Cheng

Keyword(s): Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, Driver Genes, Cancer Driver, Cancer Driver Genes, Generation Sequencing

Utilizing patient information to identify subtype heterogeneity of cancer driver genes

Statistical Methods in Medical Research, pp. 096228022110558, 2021. DOI: 10.1177/09622802211055854

Author(s): Ho-Hsiang Wu, Xing Hua, Jianxin Shi, Nilanjan Chatterjee, Bin Zhu

Keyword(s): Type I Error, Smoking Status, The Cancer Genome Atlas, Type I, Driver Genes, Cancer Subtypes, Cancer Driver, The Status, Cancer Genome Atlas, Cancer Driver Genes

Identifying cancer driver genes is essential for understanding the mechanisms of carcinogenesis and designing therapeutic strategies. Although driver genes have been identified for many cancer types, it is still unclear whether the selection pressure on driver genes is homogeneous across cancer subtypes. We propose a statistical framework, MutScot, to improve the identification of driver genes and to investigate the heterogeneity of driver genes across cancer subtypes. Through simulation studies, we show that MutScot properly controls the type I error in detecting driver genes. In addition, we demonstrate that MutScot can identify subtype heterogeneity of driver genes. Applications to three studies in The Cancer Genome Atlas (TCGA) project show that MutScot has desirable sensitivity for detecting driver genes and that it identifies subtype heterogeneity of driver genes in breast cancer with regard to hormone receptor status and in lung cancer with regard to smoking status.

Large-Scale Transcriptome Analysis Identified a Novel Cancer Driver Genes Signature for Predicting the Prognostic of Patients With Hepatocellular Carcinoma

Frontiers in Pharmacology, Vol 12, 2021. DOI: 10.3389/fphar.2021.638622

Author(s): Gao Li, Xiaowei Du, Xiaoxiong Wu, Shen Wu, Yufei Zhang, ...

Keyword(s): Hepatocellular Carcinoma, Critical Role, Risk Groups, Cancer Genome, The Cancer Genome Atlas, Functional Enrichment, Risk Scores, Driver Genes, Cancer Driver, Cancer Driver Genes

Background: Hepatocellular carcinoma (HCC) is a common malignant tumor with high mortality and heterogeneity. Mutations in driver genes are important contributors to the formation of the tumor microenvironment. The purpose of this study is to examine the expression of cancer driver genes in tumor tissues and their clinical value in predicting the prognosis of HCC.

Methods: All data were sourced from the public databases The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC) and Gene Expression Omnibus (GEO). Differentially expressed and prognostic genes were screened based on the expression distribution of the cancer driver genes and their relationship with survival. Candidate genes were subjected to functional enrichment and transcription factor regulatory network analyses. We further constructed a prognostic signature and analyzed survival outcomes and immune status between risk groups.

Results: Most cancer driver genes are specifically expressed in cancer tissues. Driver genes may influence HCC progression through processes such as transcription, the cell cycle and T-cell receptor-related pathways. Patients in different risk groups had significant survival differences (p < 0.05), and risk scores showed high predictive efficacy (AUC > 0.69). In addition, risk subgroups were associated with multiple immune functions and immune cell content.

Conclusion: We confirmed the critical role of cancer driver genes in mediating HCC progression and the immune microenvironment. Risk subgroups help assess prognosis in different patients and help explain the heterogeneity of HCC.
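
The abstract does not give the formula behind the prognostic signature. One common construction, assumed here, is a linear risk score (a weighted sum of signature-gene expression values, with weights typically taken from a fitted Cox model) followed by a median split into high- and low-risk groups. The gene names, coefficients and expression values below are hypothetical, and this is not the authors' signature.

```python
from statistics import median
from typing import Dict


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Weighted sum of signature-gene expression (assumed linear form)."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Patients above the median score are 'high' risk; the rest, including ties, are 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical two-gene signature and expression values for four patients.
COEFFICIENTS = {"GENE_A": 0.5, "GENE_B": -0.2}
PATIENTS = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 1.0},
    "P3": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P4": {"GENE_A": 0.0, "GENE_B": 2.0},
}

if __name__ == "__main__":
    scores = {pid: risk_score(expr, COEFFICIENTS) for pid, expr in PATIENTS.items()}
    print(split_by_median(scores))
```

A positive coefficient marks a gene whose higher expression raises the score (and, under this assumed model, the predicted risk); the resulting groups would then be compared with survival analysis, as the Results paragraph describes.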
