MEXCOWalk: Mutual Exclusion and Coverage Based Random Walk to Identify Cancer Modules

Abstract Motivation Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem, which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction networks together with mutation frequencies of genes and the mutual exclusivity of cancer mutations can be utilized to increase the accuracy of identifying cancer driver modules. Results We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions, mutual exclusion, and coverage to identify cancer driver modules. MEXCOwalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples, and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/[email protected]


MEXCOwalk: mutual exclusion and coverage based random walk to identify cancer modules

Bioinformatics ◽

10.1093/bioinformatics/btz655 ◽

2019 ◽

Cited By ~ 2

Author(s):

Rafsan Ahmed ◽

Ilyes Baali ◽

Cesim Erten ◽

Evis Hoxha ◽

Hilal Kazan

Keyword(s):

Random Walk ◽

Supplementary Information ◽

Risk Scores ◽

Cancer Genes ◽

Mutual Exclusivity ◽

Cancer Driver ◽

Cancer Data ◽

Ppi Networks ◽

Cancer Types ◽

Pan Cancer

Abstract Motivation Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein–protein interaction (PPI) networks together with mutation frequencies of genes and the mutual exclusivity of cancer mutations can be utilized to increase the accuracy of identifying cancer driver modules. Results We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein–protein interactions (PPIs), mutual exclusivity and coverage to identify cancer driver modules. MEXCOwalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/MEXCOwalk. Supplementary information Supplementary data are available at Bioinformatics online.
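
The edge-weighted random walk idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `random_walk_with_restart`, the restart probability of 0.4, the toy gene labels, and the way edge weights and starting heat are supplied are all assumptions made for illustration. In a setting like the one described, edge weights might encode mutual exclusivity and coverage of neighbouring genes, and the starting heat might come from mutation frequencies.

```python
def random_walk_with_restart(weights, seed_heat, restart=0.4, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on an undirected, edge-weighted graph.

    weights:   dict mapping (u, v) -> non-negative edge weight (hypothetical input format)
    seed_heat: dict mapping node -> non-negative starting heat; every edge endpoint
               is assumed to appear here
    Returns a dict mapping node -> stationary visiting probability.
    """
    nodes = sorted(seed_heat)
    # Build a symmetric weighted adjacency structure.
    adj = {u: {} for u in nodes}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    out_weight = {u: sum(adj[u].values()) for u in nodes}

    # Normalise the starting heat into a probability distribution.
    total = sum(seed_heat.values())
    start = {u: seed_heat[u] / total for u in nodes}

    p = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the start distribution.
        nxt = {u: restart * start[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: the walker has nowhere to go and stays put.
                nxt[u] += (1 - restart) * p[u]
                continue
            # Otherwise it moves to a neighbour in proportion to the edge weight.
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy example with made-up labels and numbers.
toy_weights = {("A", "B"): 0.9, ("B", "C"): 0.1}
toy_heat = {"A": 0.5, "B": 0.3, "C": 0.2}
scores = random_walk_with_restart(toy_weights, toy_heat)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy graph the heavily weighted A-B edge keeps most of the probability mass on A and B, which is the kind of effect an edge-weighting scheme is meant to produce.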


OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers

Nucleic Acids Research ◽

10.1093/nar/gkaa1033 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1289-D1301 ◽

Cited By ~ 2

Author(s):

Tao Wang ◽

Shasha Ruan ◽

Xiaolu Zhao ◽

Xiaohui Shi ◽

Huajing Teng ◽

...

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Cell Population ◽

Cancer Types ◽

Neutral Mutations ◽

Analysis Platform

Abstract The prevalence of neutral mutations in cancer cell populations impedes the distinguishing of cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic ability of somatic mutations and cancer genes, we constructed a useful platform, OncoVar (https://oncovar.org/), which employed published bioinformatics algorithms and incorporated known driver events to identify driver mutations and driver genes. We identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence by reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in International Cancer Genome Consortium (ICGC). OncoVar provides four points of view, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, identification of actionable driver alterations provides promising druggable targets and repurposing opportunities of combinational therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.


Pan-cancer driver copy number alterations identified by joint expression/CNA data analysis

Scientific Reports ◽

10.1038/s41598-020-74276-6 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Gaojianyong Wang ◽

Dimitris Anastassiou

Keyword(s):

Gene Expression ◽

Copy Number ◽

The Cancer Genome Atlas ◽

Copy Number Alterations ◽

Multiple Cancer ◽

Cancer Driver ◽

Large Gene ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Pan Cancer

Abstract Analysis of large gene expression datasets from biopsies of cancer patients can identify co-expression signatures representing particular biomolecular events in cancer. Some of these signatures involve genomically co-localized genes resulting from the presence of copy number alterations (CNAs), for which analysis of the expression of the underlying genes provides valuable information about their combined role as oncogenes or tumor suppressor genes. Here we focus on the discovery and interpretation of such signatures that are present in multiple cancer types due to driver amplifications and deletions in particular regions of the genome after doing a comprehensive analysis combining both gene expression and CNA data from The Cancer Genome Atlas.


Reconstructing and characterizing focal amplifications in cancer using AmpliconArchitect

10.1101/457333 ◽

2018 ◽

Cited By ~ 1

Author(s):

Viraj Deshpande ◽

Jens Luebeck ◽

Mehrdad Bakhtiari ◽

Nam-Phuong D Nguyen ◽

Kristen M Turner ◽

...

Keyword(s):

Cervical Cancer ◽

Fine Structure ◽

Tumor Growth ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Multiple Cancer ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract Focal oncogene amplification and rearrangements drive tumor growth and evolution in multiple cancer types. We developed a tool, AmpliconArchitect (AA), which can robustly reconstruct the fine structure of focally amplified regions using whole genome sequencing. AA-reconstructed amplicons in pan-cancer data and in virus-driven cervical cancer samples revealed many novel insights about focal amplifications. Specifically, the findings lend support to extrachromosomally mediated mechanisms for copy number expansion, and oncoviral pathogenesis.


DriverRWH: Discovering Cancer Driver Genes By Random Walk On a Gene Mutation Hypergraph

10.21203/rs.3.rs-1192205/v1 ◽

2021 ◽

Author(s):

Chenye Wang ◽

Junhan Shi ◽

Jiansheng Cai ◽

Yusen Zhang ◽

Xiaoqi Zheng ◽

...

Keyword(s):

Random Walk ◽

Candidate Genes ◽

Gene Mutation ◽

Network Data ◽

Cumulative Number ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Types ◽

Mutation Data ◽

Cancer Driver Genes

Abstract Background: Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is the identification of a few driver mutation genes from a much larger number of passenger mutation genes. However, the majority of existing computational approaches underuse the co-occurrence information of the individuals, which is deemed to be important in tumorigenesis and tumor progression. Driver gene lists predicted by these tools are prone to false positives, and recent research is far from achieving the ultimate goal of discovering a complete catalog of driver genes. Results: To make full use of co-mutation information, we present a random walk algorithm referred to as DriverRWH on a weighted gene mutation hypergraph model, using somatic mutation data and molecular interaction network data to prioritize candidate driver genes. Applied to tumor samples of different cancer types from The Cancer Genome Atlas (TCGA), DriverRWH shows significantly better performance than state-of-the-art prioritization methods in terms of the area under the curve (AUC) scores and the cumulative number of known driver genes recovered in top-ranked candidate genes. DriverRWH recovers approximately 50% of known driver genes in the top 30 ranked candidate genes for more than half of the cancer types. In addition, DriverRWH is also highly robust to perturbations in the mutation data and gene functional network data. Conclusion: DriverRWH is effective across various cancer types in prioritizing cancer driver genes and provides considerable improvement over other tools, with a better balance of precision and sensitivity. It can be a useful tool for detecting potential driver genes and can facilitate targeted cancer therapies.
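
The evaluation criterion named in the abstract, the cumulative number of known driver genes recovered among top-ranked candidates, can be sketched in a few lines. The function name `cumulative_recovery` and the toy ranking below are assumptions for illustration only; the placeholder entries `GENE_X` and `GENE_Y` are made up, and nothing here reproduces results from the paper.

```python
def cumulative_recovery(ranked_genes, known_drivers):
    """For each rank k, count how many known drivers appear in the top k candidates."""
    known = set(known_drivers)
    hits = 0
    counts = []
    for gene in ranked_genes:
        if gene in known:
            hits += 1
        counts.append(hits)
    return counts


# Toy ranking; a real evaluation would use a method's full output and a curated reference list.
toy_ranking = ["TP53", "GENE_X", "KRAS", "GENE_Y"]
toy_reference = {"TP53", "KRAS", "PTEN"}
curve = cumulative_recovery(toy_ranking, toy_reference)
print(curve)
```

Plotting such a curve for several methods on the same reference list gives a direct view of which ranking places known drivers earlier.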


Pan-cancer detection of driver genes at the single-patient resolution

Genome Medicine ◽

10.1186/s13073-021-00830-0 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Joel Nulsen ◽

Hrvoje Misetic ◽

Christopher Yau ◽

Francesca D. Ciccarelli

Keyword(s):

False Positive Rate ◽

Genetic Alterations ◽

Therapeutic Interventions ◽

Precision Oncology ◽

Cancer Type ◽

Rare Cancer ◽

Driver Genes ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract Background Identifying the complete repertoire of genes that drive cancer in individual patients is crucial for precision oncology. Most established methods identify driver genes that are recurrently altered across patient cohorts. However, mapping these genes back to patients leaves a sizeable fraction with few or no drivers, hindering our understanding of cancer mechanisms and limiting the choice of therapeutic interventions. Results We present sysSVM2, a machine learning software that integrates cancer genetic alterations with gene systems-level properties to predict drivers in individual patients. Using simulated pan-cancer data, we optimise sysSVM2 for application to any cancer type. We benchmark its performance on real cancer data and validate its applicability to a rare cancer type with few known driver genes. We show that drivers predicted by sysSVM2 have a low false-positive rate, are stable and disrupt well-known cancer-related pathways. Conclusions sysSVM2 can be used to identify driver alterations in patients lacking sufficient canonical drivers or belonging to rare cancer types for which assembling a large enough cohort is challenging, furthering the goals of precision oncology. As resources for the community, we provide the code to implement sysSVM2 and the pre-trained models in all TCGA cancer types (https://github.com/ciccalab/sysSVM2).


Contextual Classifications of Cancer Driver Genes

10.1101/715508 ◽

2019 ◽

Author(s):

Pramod Chandrashekar ◽

Navid Ahmadinejad ◽

Junwen Wang ◽

Aleksandar Sekulic ◽

Jan B. Egan ◽

...

Keyword(s):

Computational Method ◽

Cancer Type ◽

Sequencing Data ◽

Multiple Cancer ◽

Driver Genes ◽

Cancer Driver ◽

Link Type ◽

Mutational Hotspots ◽

Cancer Types ◽

Cancer Driver Genes

Abstract Functions of cancer driver genes depend on cellular contexts that vary substantially across tissues and organs. Distinguishing oncogenes (OGs) and tumor suppressor genes (TSGs) for each cancer type is critical to identifying clinically actionable targets. However, current resources for context-aware classifications of cancer drivers are limited. In this study, we show that the direction and magnitude of somatic selection of missense and truncating mutations of a gene are suggestive of its contextual activities. By integrating these features with ratiometric and conservation measures, we developed a computational method to categorize OGs and TSGs using exome sequencing data. This new method, named genes under selection in tumors (GUST), shows an overall accuracy of 0.94 when tested on manually curated benchmarks. Application of GUST to 10,172 tumor exomes of 33 cancer types identified 98 OGs and 179 TSGs, >70% of which promote tumorigenesis in only one cancer type. In broad-spectrum drivers shared across multiple cancer types, we found heterogeneous mutational hotspots modifying distinct functional domains, implicating the synchrony of convergent and divergent disease mechanisms. We further discovered two novel OGs and 28 novel TSGs with high confidence. The GUST program is available at https://github.com/liliulab/gust. A database with pre-computed classifications is available at https://liliulab.shinyapps.io/gust


LOTUS: a Single- and Multitask Machine Learning Algorithm for the Prediction of Cancer Driver Genes

10.1101/398537 ◽

2018 ◽

Cited By ~ 1

Author(s):

Olivier Collier ◽

Véronique Stoven ◽

Jean-Philippe Vert

Keyword(s):

Machine Learning ◽

Biological Networks ◽

Learning Strategy ◽

Gene Prediction ◽

Scoring Function ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Types ◽

Cancer Driver Genes

Abstract Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning-based approach that integrates various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms three other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types. Author summary Cancer development is driven by mutations and dysfunction of important, so-called cancer driver genes, that could be targeted by targeted therapies. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential. LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it makes it easy to combine heterogeneous information in the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction tools.


DriveWays: A Method for Identifying Possibly Overlapping Driver Pathways in Cancer

10.1101/2020.04.01.015388 ◽

2020 ◽

Author(s):

Ilyes Baali ◽

Cesim Erten ◽

Hilal Kazan

Keyword(s):

Optimization Problem ◽

Network Connectivity ◽

Supplementary Information ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Data ◽

Cancer Drivers ◽

Definition Of ◽

Almost All ◽

Pan Cancer

Abstract Motivation The majority of the previous methods for identifying cancer driver modules output non-overlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for its solution. Results We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms the state-of-the-art methods in recovering well-known cancer driver genes on TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways. Availability The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/DriveWays Supplementary information Supplementary data are available at bioRxiv.


DriveWays: a method for identifying possibly overlapping driver pathways in cancer

Scientific Reports ◽

10.1038/s41598-020-78852-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Ilyes Baali ◽

Cesim Erten ◽

Hilal Kazan

Keyword(s):

Optimization Problem ◽

State Of The Art ◽

Network Connectivity ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Data ◽

Cancer Drivers ◽

Definition Of ◽

Almost All ◽

Pan Cancer

Abstract The majority of the previous methods for identifying cancer driver modules output nonoverlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for its solution. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms the state-of-the-art methods in recovering well-known cancer driver genes on TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways.
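
Mutual exclusivity and coverage, the two mutation-based signals named in the abstract, can be made concrete with a short sketch. The definitions used here are one common formulation and are assumptions, not necessarily those of the paper: coverage is taken as the fraction of patients with a mutation in at least one gene of the module, and mutual exclusivity as the number of covered patients divided by the total number of per-gene mutation events. The function names, gene labels and patient labels are hypothetical.

```python
def coverage(module, mutated_in, n_patients):
    """Fraction of patients with a mutation in at least one gene of the module."""
    covered = set().union(*(mutated_in.get(g, set()) for g in module))
    return len(covered) / n_patients


def mutual_exclusivity(module, mutated_in):
    """Covered patients divided by the summed per-gene patient counts.

    Equals 1.0 when no patient carries mutations in two genes of the module,
    and decreases as mutations co-occur in the same patients.
    """
    patient_sets = [mutated_in.get(g, set()) for g in module]
    total = sum(len(s) for s in patient_sets)
    if total == 0:
        return 0.0  # no mutations observed in this module
    return len(set().union(*patient_sets)) / total


# Toy mutation data: gene -> set of patients in which it is mutated.
toy_mutations = {
    "G1": {"p1", "p2"},
    "G2": {"p3"},
    "G3": {"p2", "p3"},
}
print(coverage(["G1", "G2"], toy_mutations, n_patients=4))
print(mutual_exclusivity(["G1", "G2"], toy_mutations))
print(mutual_exclusivity(["G1", "G3"], toy_mutations))
```

A seed-and-extend heuristic could use scores of this kind to decide whether adding a neighbouring gene improves a growing module, though the exact objective is defined in the paper itself.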
