A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations

Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams), structurally and functionally coherent sets of evolutionarily related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a very significant tendency to lie close to known functional sites or conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules confirmed by literature analysis to be cancer associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers, we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources and include recently validated cancer genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
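
The abstract does not say which statistical test underlies the primary filter. As a minimal sketch of what "enrichment of mutations in a domain" can mean, the example below assumes a simple one-sided binomial test in which a mutation lands in the domain with probability equal to the domain's share of the protein length. The function names and the length-proportional null model are illustrative assumptions, not the authors' method, which pools mutations across the relatives of a CATH-FunFam.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def domain_enrichment_p(mutations_in_domain: int, total_mutations: int,
                        domain_length: int, protein_length: int) -> float:
    """One-sided p-value for an excess of mutations inside a domain.

    Assumption (hypothetical null model): each mutation falls in the
    domain with probability domain_length / protein_length.
    """
    if not 0 < domain_length <= protein_length:
        raise ValueError("domain_length must be in (0, protein_length]")
    p_domain = domain_length / protein_length
    return binomial_sf(mutations_in_domain, total_mutations, p_domain)


# Example: 10 of 10 mutations fall in a domain covering half the protein.
p_value = domain_enrichment_p(10, 10, 50, 100)
```

A small p-value here only says the domain carries more mutations than its length predicts; a real pipeline would also need a background mutation model and multiple-testing correction.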

Scientific Reports ◽

10.1038/s41598-018-36401-4 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 4

Author(s):

Paul Ashford ◽

Camilla S. M. Pang ◽

Aurelio A. Moya-García ◽

Tolulope Adeyelu ◽

Christine A. Orengo

Keyword(s):

Driver Mutations ◽

Driver Genes ◽

Cancer Driver ◽

Functional Family ◽

Cancer Driver Genes ◽

Domain Functional ◽

Family Based

Discovery of cancer driver genes based on nucleotide context

10.1101/485292 ◽

2018 ◽

Author(s):

Felix Dietlein ◽

Donate Weghorn ◽

Amaro Taylor-Weiner ◽

André Richters ◽

Brendan Reardon ◽

...

Keyword(s):

Mutation Rates ◽

Driver Mutations ◽

Supporting Evidence ◽

Cancer Genes ◽

Driver Genes ◽

High Background ◽

Cancer Driver ◽

Nucleotide Context ◽

Cancer Driver Genes ◽

Passenger Mutations

Many cancer genomes contain large numbers of somatic mutations, but few of these mutations drive tumor development. Current approaches to identify cancer driver genes are largely based on mutational recurrence, i.e. they search for genes with an increased number of nonsynonymous mutations relative to the local background mutation rate. Multiple studies have noted that the sensitivity of recurrence-based methods is limited in tumors with high background mutation rates, because passenger mutations dilute their statistical power. Here, we observe that passenger mutations tend to occur in characteristic nucleotide sequence contexts, while driver mutations follow a different distribution pattern determined by the location of functionally relevant genomic positions along the protein-coding sequence. To discover new cancer genes, we searched for genes with an excess of mutations in unusual nucleotide contexts that deviate from the characteristic context around passenger mutations. By applying this statistical framework to whole-exome sequencing data from 12,004 tumors, we discovered a long tail of novel candidate cancer genes with mutation frequencies as low as 1% and functional supporting evidence. Our results show that considering both the number and the nucleotide context around mutations helps identify novel cancer driver genes, particularly in tumors with high background mutation rates.
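
To make "nucleotide context" concrete, the sketch below tallies the trinucleotide context (the base before, the substitution, the base after) of each point mutation in a reference sequence. It illustrates context counting only; the input format, the `A[C>T]G` label style, and the function name are assumptions made here, and the statistical framework described in the abstract is not reproduced.

```python
from collections import Counter
from typing import Iterable, Tuple


def trinucleotide_contexts(reference: str,
                           mutations: Iterable[Tuple[int, str]]) -> Counter:
    """Count mutations per trinucleotide context.

    `mutations` holds (0-based position, alternate base) pairs
    (a hypothetical input format chosen for this illustration).
    Mutations without a flanking base on both sides are skipped.
    """
    counts: Counter = Counter()
    for pos, alt in mutations:
        if pos < 1 or pos > len(reference) - 2:
            continue  # no complete trinucleotide around this position
        ref = reference[pos]
        if alt == ref:
            continue  # not a substitution
        label = f"{reference[pos - 1]}[{ref}>{alt}]{reference[pos + 1]}"
        counts[label] += 1
    return counts


contexts = trinucleotide_contexts("ACGTACGT", [(1, "T"), (5, "T"), (2, "A"), (0, "G")])
```

Comparing such per-gene context counts with the counts expected from passenger mutations is the kind of signal the abstract describes, though the comparison itself is not shown here.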

OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers

Nucleic Acids Research ◽

10.1093/nar/gkaa1033 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1289-D1301 ◽

Cited By ~ 2

Author(s):

Tao Wang ◽

Shasha Ruan ◽

Xiaolu Zhao ◽

Xiaohui Shi ◽

Huajing Teng ◽

...

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Cell Population ◽

Cancer Types ◽

Neutral Mutations ◽

Analysis Platform

The prevalence of neutral mutations in cancer cell populations makes it difficult to distinguish cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic ability of somatic mutations and cancer genes, we constructed a platform, OncoVar (https://oncovar.org/), which employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. We identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence by reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC). OncoVar provides four points of view, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, identification of actionable driver alterations provides promising druggable targets and repurposing opportunities for combination therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.

Interpreting pathways to discover cancer driver genes with Moonlight

Nature Communications ◽

10.1038/s41467-019-13803-0 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 9

Author(s):

Antonio Colaprico ◽

Catharina Olsen ◽

Matthew H. Bailey ◽

Gabriel J. Odom ◽

Thilde Terkelsen ◽

...

Keyword(s):

Tumor Suppressors ◽

Molecular Mechanisms ◽

Dual Role ◽

Tissue Type ◽

Driver Gene ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Therapeutic Decisions ◽

Cancer Driver Genes

Cancer driver gene alterations influence cancer development, occurring in oncogenes, tumor suppressors, and dual role genes. Discovering dual role cancer genes is difficult because of their elusive context-dependent behavior. We define oncogenic mediators as genes controlling biological processes. With them, we classify cancer driver genes, unveiling their roles in cancer mechanisms. To this end, we present Moonlight, a tool that incorporates multiple -omics data to identify critical cancer driver genes. With Moonlight, we analyze 8000+ tumor samples from 18 cancer types, discovering 3310 oncogenic mediators, 151 having dual roles. By incorporating additional data (amplification, mutation, DNA methylation, chromatin accessibility), we reveal 1000+ cancer driver genes, corroborating known molecular mechanisms. Additionally, we confirm critical cancer driver genes by analysing cell-line datasets. We discover inactivation of tumor suppressors in intron regions and that tissue type and subtype indicate dual role status. These findings help explain tumor heterogeneity and could guide therapeutic decisions.

Diversity spectrum analysis identifies mutation-specific effects of cancer driver genes

Communications Biology ◽

10.1038/s42003-019-0736-4 ◽

2020 ◽

Vol 3 (1) ◽

Cited By ~ 2

Author(s):

Xiaobao Dong ◽

Dandan Huang ◽

Xianfu Yi ◽

Shijie Zhang ◽

Zhao Wang ◽

...

Keyword(s):

Clinical Trials ◽

Spectrum Analysis ◽

Driver Mutations ◽

Driver Gene ◽

Cancer Type ◽

Driver Genes ◽

Drug Responses ◽

Cancer Driver ◽

Specific Effects ◽

Cancer Driver Genes

Mutation-specific effects of cancer driver genes influence drug responses and the success of clinical trials. We reasoned that these effects could unbalance the distribution of each mutation across different cancer types; as a result, the cancer preference can be used to distinguish the effects of the causal mutation. Here, we developed a network-based framework to systematically measure cancer diversity for each driver mutation. We found that half of the driver genes harbor cancer type-specific and pancancer mutations simultaneously, suggesting pervasive functional heterogeneity among mutations, even within the same driver gene. We further demonstrated that the specificity of the mutations could influence patient drug responses. Moreover, we observed that diversity was generally increased in advanced tumors. Finally, we scanned for potentially novel cancer driver genes based on the diversity spectrum. Diversity spectrum analysis provides a new approach to define driver mutations and optimize off-label clinical trials.

WITER: A powerful method for the estimation of cancer-driver genes using a weighted iterative regression accurately modelling background mutation rate

10.1101/437061 ◽

2018 ◽

Author(s):

Lin Jiang ◽

Jingjing Zheng ◽

Johnny Sheung Him Kwan ◽

Sheng Dai ◽

Cong Li ◽

...

Keyword(s):

Negative Binomial ◽

Negative Binomial Regression ◽

Alternative Methods ◽

Driver Mutations ◽

Driver Genes ◽

Cancer Driver ◽

Binomial Regression ◽

Background Mutation Rate ◽

Technical Advances ◽

Cancer Driver Genes

Genomic identification of driver mutations and genes in cancer cells is critical for precision medicine. Due to the difficulty of modeling the distribution of background mutations, existing statistical methods are often underpowered to discriminate driver genes from passenger genes. Here we propose a novel statistical approach, weighted iterative zero-truncated negative-binomial regression (WITER), to detect cancer-driver genes showing an excess of somatic mutations. By solving the problem of inaccurately modeling background mutations, this approach works even in small or moderate samples. Compared to alternative methods, it detected more significant and cancer-consensus genes in all tested cancers. Applying this approach, we estimated 178 driver genes in 26 different cancer types. In silico validation confirmed 90.5% of predicted genes as likely known drivers and 7 genes unique for individual cancers as likely new drivers. The technical advances of WITER enable the detection of driver genes in TCGA datasets as small as 30 subjects, rescuing more genes missed by alternative tools.

WITER: a powerful method for estimation of cancer-driver genes using a weighted iterative regression modelling background mutation counts

Nucleic Acids Research ◽

10.1093/nar/gkz566 ◽

2019 ◽

Vol 47 (16) ◽

pp. e96-e96 ◽

Cited By ~ 6

Author(s):

Lin Jiang ◽

Jingjing Zheng ◽

Johnny S H Kwan ◽

Sheng Dai ◽

Cong Li ◽

...

Keyword(s):

Negative Binomial ◽

Negative Binomial Regression ◽

Alternative Methods ◽

Small Samples ◽

Driver Mutations ◽

Driver Genes ◽

Cancer Driver ◽

Binomial Regression ◽

Technical Advances ◽

Cancer Driver Genes

Genomic identification of driver mutations and genes in cancer cells is critical for precision medicine. Due to the difficulty of modelling the distribution of background mutation counts, existing statistical methods are often underpowered to discriminate cancer-driver genes from passenger genes. Here we propose a novel statistical approach, weighted iterative zero-truncated negative-binomial regression (WITER, http://grass.cgs.hku.hk/limx/witer or KGGSeq, http://grass.cgs.hku.hk/limx/kggseq/), to detect cancer-driver genes showing an excess of somatic mutations. By fitting the distribution of background mutation counts properly, this approach works well even in small or moderate samples. Compared to alternative methods, it detected more significant and cancer-consensus genes in most tested cancers. Applying this approach, we estimated 229 driver genes in 26 different types of cancers. In silico validation confirmed 78% of predicted genes as likely known drivers and many other genes as very likely new drivers for corresponding cancers. The technical advances of WITER enable the detection of driver genes in TCGA datasets as small as 30 subjects and the rescue of more genes missed by alternative tools in moderate or small samples.
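
For readers unfamiliar with the term, a zero-truncated negative binomial is a negative binomial distribution conditioned on the count being at least one, which suits data where genes with zero mutations are not observed. The sketch below evaluates that probability mass function under the common mean/size parameterization. It shows the distribution only; the weighting, iteration, and regression on covariates that WITER adds are not reproduced, and the function names are illustrative.

```python
from math import exp, lgamma, log


def nb_log_pmf(k: int, mu: float, size: float) -> float:
    """Log-probability of k under a negative binomial with mean `mu`
    and dispersion parameter `size` (variance = mu + mu**2 / size)."""
    return (lgamma(k + size) - lgamma(size) - lgamma(k + 1)
            + size * log(size / (size + mu))
            + k * log(mu / (size + mu)))


def zero_truncated_nb_pmf(k: int, mu: float, size: float) -> float:
    """P(Y = k | Y >= 1): the negative binomial renormalised after
    removing the probability mass at zero."""
    if k < 1:
        return 0.0
    p_zero = exp(nb_log_pmf(0, mu, size))
    return exp(nb_log_pmf(k, mu, size)) / (1.0 - p_zero)


# Example: probability of exactly one mutation, given at least one.
p_one = zero_truncated_nb_pmf(1, mu=1.0, size=1.0)
```

Under this parameterization, a gene whose observed count sits far in the upper tail of the fitted distribution would be a candidate for an excess of somatic mutations.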

In silico saturation mutagenesis of cancer genes

10.1101/2020.06.03.130211 ◽

2020 ◽

Author(s):

Ferran Muiños ◽

Francisco Martinez-Jimenez ◽

Oriol Pich ◽

Abel Gonzalez-Perez ◽

Nuria Lopez-Bigas

Keyword(s):

In Silico ◽

Tumor Development ◽

Saturation Mutagenesis ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Mutation Probability ◽

Tumor Types ◽

Tumor Somatic Mutations

Extensive bioinformatics analysis of tumor somatic mutation datasets has revealed the presence of some 500-600 cancer driver genes. The identification of all potential driver mutations affecting cancer genes is essential to implement precision cancer medicine and to understand the interplay of mutation probability and selection in tumor development. Here, we present an in silico saturation mutagenesis approach to identify all driver mutations in 568 cancer genes across 66 tumor types. For most cancer genes the mutation probability across tissues (underpinned by active mutational processes) influences which driver variants have been observed, although this differs significantly between tumor suppressors and oncogenes. The role of selection is apparent in some of the latter, the observed and unobserved driver mutations of which are equally likely to occur. The number of potential driver mutations in a cancer gene roughly determines how many mutations are available for detection across newly sequenced tumors.
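
The enumeration step of saturation mutagenesis can be sketched as follows: list every possible single-nucleotide substitution in a coding sequence and label its consequence with the standard genetic code. This covers only the enumeration; how each candidate is scored as driver or passenger is not described in the abstract and is not attempted here. The function name and output tuple layout are assumptions made for illustration, and the input is assumed to be an uppercase, in-frame DNA sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def saturate(cds: str):
    """Return (position, ref, alt, ref_aa, alt_aa, consequence) for every
    possible single-nucleotide substitution in an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    variants = []
    for pos, ref in enumerate(cds):
        offset = pos % 3
        codon = cds[pos - offset:pos - offset + 3]
        ref_aa = CODON_TABLE[codon]
        for alt in BASES:
            if alt == ref:
                continue
            alt_aa = CODON_TABLE[codon[:offset] + alt + codon[offset + 1:]]
            if alt_aa == ref_aa:
                consequence = "synonymous"
            elif alt_aa == "*":
                consequence = "nonsense"
            elif ref_aa == "*":
                consequence = "stop_lost"
            else:
                consequence = "missense"
            variants.append((pos, ref, alt, ref_aa, alt_aa, consequence))
    return variants


variants = saturate("ATGTGG")  # Met-Trp: 6 bases x 3 alternatives each
```

Each base has three alternatives, so a coding sequence of length L yields 3L candidate variants to evaluate.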

LOTUS: a Single- and Multitask Machine Learning Algorithm for the Prediction of Cancer Driver Genes

10.1101/398537 ◽

2018 ◽

Cited By ~ 1

Author(s):

Olivier Collier ◽

Véronique Stoven ◽

Jean-Philippe Vert

Keyword(s):

Machine Learning ◽

Biological Networks ◽

Learning Strategy ◽

Gene Prediction ◽

Scoring Function ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Types ◽

Cancer Driver Genes

Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning based approach that integrates various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms three other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.

Author summary: Cancer development is driven by mutations and dysfunction of important, so-called cancer driver genes, which could be the focus of targeted therapies. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential. LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it makes it easy to combine heterogeneous information into the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction tools.

Directional association test reveals high-quality putative cancer driver biomarkers including noncoding RNAs

BMC Medical Genomics ◽

10.1186/s12920-019-0565-9 ◽

2019 ◽

Vol 12 (S7) ◽

Cited By ~ 2

Author(s):

Hua Zhong ◽

Mingzhou Song

Keyword(s):

Myeloid Leukemia ◽

Noncoding RNAs ◽

Causative Role ◽

Cancer Genes ◽

High Quality ◽

Driver Genes ◽

Cancer Driver ◽

Functional Relationships ◽

Model Free ◽

Cancer Driver Genes

Background: Most statistical methods used to identify cancer driver genes are either biased due to the choice of assumed parametric models or insensitive to directional relationships important for causal inference. To overcome modeling biases and directional insensitivity, a recent statistical functional chi-squared test (FunChisq) detects directional association via model-free functional dependency. FunChisq examines patterns pointing from independent to dependent variables arising from linear, non-linear, or many-to-one functional relationships. Meanwhile, the Functional Annotation of Mammalian Genome 5 (FANTOM5) project surveyed gene expression at over 200,000 transcription start sites (TSSs) in nearly all human tissue types, primary cell types, and cancer cell lines. The data cover TSSs originating from both coding and noncoding genes. For the vast number of uncharacterized human TSSs that may exhibit complex patterns in cancer versus normal tissues, the model-free property of FunChisq provides us with an unprecedented opportunity to assess the evidence for a gene’s directional effect on human cancer.

Results: We first evaluated FunChisq and six other methods using 719 curated cancer genes on the FANTOM5 data. FunChisq performed best in distinguishing known cancer driver genes from non-cancer genes. We also show the capacity of FunChisq to reveal non-monotonic patterns of functional association, to which typical differential analysis methods such as the t-test are insensitive. Further applying FunChisq to screen unannotated TSSs in FANTOM5, we predicted 1108 putative cancer driver noncoding RNAs, stronger than 90% of curated cancer driver genes. Next, we compared leukemia samples against other samples in FANTOM5 and FunChisq predicted 332/79 potential biomarkers for lymphoid/myeloid leukemia, stronger than the TSSs of all 87/100 known driver genes in lymphoid/myeloid leukemia.

Conclusions: This study demonstrated the advantage of FunChisq in revealing directional association, especially in detecting non-monotonic patterns. Here, we also provide the most comprehensive catalog of high-quality biomarkers that may play a causative role in human cancers, including putative cancer driver noncoding RNAs and lymphoid/myeloid leukemia specific biomarkers.
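
The abstract does not give the test statistic. As a rough sketch, the example below assumes the commonly described form of the functional chi-squared statistic for a contingency table with X in rows and Y in columns: the summed deviation of each row from a uniform distribution over Y, minus the deviation of the column totals from uniform. The function name, the handling of empty rows, and the degrees of freedom (r - 1)(s - 1) are assumptions made here; the published software should be consulted for the exact definition.

```python
from typing import List, Sequence, Tuple


def fun_chisq(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Functional chi-squared statistic for the direction X -> Y.

    table[i][j] is the number of samples with X = i and Y = j.
    Returns (statistic, degrees of freedom). Assumed formula; empty
    rows are skipped, which is a simplification chosen for this sketch.
    """
    n_cols = len(table[0])
    col_sums: List[float] = [sum(row[j] for row in table) for j in range(n_cols)]
    n = sum(col_sums)
    if n == 0:
        raise ValueError("contingency table is empty")

    stat = 0.0
    for row in table:
        row_total = sum(row)
        if row_total == 0:
            continue
        expected = row_total / n_cols  # uniform over Y within this row
        stat += sum((count - expected) ** 2 / expected for count in row)

    expected_col = n / n_cols  # uniform over Y for the column totals
    stat -= sum((c - expected_col) ** 2 / expected_col for c in col_sums)

    dof = (len(table) - 1) * (n_cols - 1)
    return stat, dof


# Y is fully determined by X and varies with it: a large statistic.
stat_functional, dof = fun_chisq([[5, 0], [0, 5]])
# Y is constant regardless of X: the two terms cancel.
stat_constant, _ = fun_chisq([[5, 0], [5, 0]])
```

Subtracting the column-total term is what keeps a constant Y from scoring highly: every row is then far from uniform, but X carries no information about Y.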
