Discovery and characterization of coding and non-coding driver mutations in more than 2,500 whole cancer genomes

Mapping Intimacies ◽

10.1101/237313 ◽

2017 ◽

Cited By ~ 10

Author(s):

Esther Rheinbay ◽

Morten Muhlig Nielsen ◽

Federico Abascal ◽

Grace Tiao ◽

Henrik Hornshøj ◽

...

Keyword(s):

Recurrent Events ◽

Point Mutations ◽

Regulatory Elements ◽

Driver Mutations ◽

Cancer Genes ◽

Protein Coding ◽

Cancer Driver ◽

Protein Coding Genes ◽

Cancer Genomes ◽

Non Coding RNAs

Abstract:

Discovery of cancer drivers has traditionally focused on the identification of protein-coding genes. Here we present a comprehensive analysis of putative cancer driver mutations in both protein-coding and non-coding genomic regions across >2,500 whole cancer genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. We developed a statistically rigorous strategy for combining significance levels from multiple driver discovery methods and demonstrate that the integrated results overcome limitations of individual methods. We combined this strategy with careful filtering and applied it to protein-coding genes, promoters, untranslated regions (UTRs), distal enhancers and non-coding RNAs. These analyses redefine the landscape of non-coding driver mutations in cancer genomes, confirming a few previously reported elements and raising doubts about others, while identifying novel candidate elements across 27 cancer types. Novel recurrent events were found in the promoters or 5’UTRs of TP53, RFTN1, RNF34, and MTG2, in the 3’UTRs of NFKBIZ and TOB1, and in the non-coding RNA RMRP. We provide evidence that the previously reported non-coding RNAs NEAT1 and MALAT1 may be subject to a localized mutational process. Perhaps the most striking finding is the relative paucity of point mutations driving cancer in non-coding genes and regulatory elements. Though we have limited power to discover infrequent non-coding drivers in individual cohorts, combined analysis of promoters of known cancer genes shows little excess of mutations beyond TERT.
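The abstract mentions combining significance levels from several driver discovery methods but does not spell out the procedure. Fisher's method is a classic way to combine p-values, and the sketch below uses it for illustration only. It assumes the tests are independent. Methods run on the same mutations are not independent, so the consortium's actual strategy may well differ; correlation-aware variants such as Brown's method exist for this reason. The function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine p-values with Fisher's method (assumes independent tests).

    Under the null, X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom. For even degrees of freedom the survival function has the
    closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    for p in pvalues:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range (0, 1]: {p}")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0   # (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)
```

With a single input the function returns that p-value unchanged, which is a quick sanity check of the closed form. Two p-values of 0.01 combine to roughly 0.001. That figure is meaningful only if the two tests really are independent.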

Pathway and network analysis of more than 2,500 whole cancer genomes

10.1101/385294 ◽

2018 ◽

Cited By ~ 4

Author(s):

Matthew A. Reyna ◽

David Haan ◽

Marta Paczkowska ◽

Lieven P.C. Verbeke ◽

Miguel Vazquez ◽

...

Keyword(s):

RNA Splicing ◽

Driver Mutations ◽

Cancer Genes ◽

Protein Coding ◽

Cancer Driver ◽

Network Analyses ◽

Protein Coding Genes ◽

Cancer Genomes ◽

Tumor Types ◽

Promoter Mutations

Abstract:

The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well characterized, and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes, we performed multi-faceted pathway and network analyses of non-coding mutations across 2,583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project. While few non-coding genomic elements were recurrently mutated in this cohort, we identified 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We found that biological processes had variable proportions of coding and non-coding mutations. Chromatin remodeling and proliferation pathways were altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, were altered by both coding and non-coding mutations. RNA splicing was primarily targeted by non-coding mutations in this cohort. Samples containing these non-coding mutations exhibited gene expression signatures similar to those of samples with coding mutations in well-known RNA splicing factors. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations, and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.
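Pathway-level analyses often begin with an over-representation test. This test asks whether genes flagged as mutated fall into a given pathway more often than chance would predict. The abstract does not name the tests the authors used, and network-module methods go well beyond this one. The function below is therefore a generic hypergeometric sketch with hypothetical names, not the paper's pipeline.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genome, n_pathway, n_hits, n_hits_in_pathway):
    """One-sided P(X >= n_hits_in_pathway) for X ~ Hypergeometric.

    n_genome:          genes in the background universe
    n_pathway:         genes annotated to the pathway
    n_hits:            genes flagged as mutated (e.g. by non-coding mutations)
    n_hits_in_pathway: flagged genes that fall inside the pathway
    """
    denom = comb(n_genome, n_hits)
    upper = min(n_pathway, n_hits)
    # Sum the hypergeometric pmf over the upper tail; math.comb returns 0
    # when asked to choose more items than are available.
    tail = sum(
        comb(n_pathway, k) * comb(n_genome - n_pathway, n_hits - k)
        for k in range(n_hits_in_pathway, upper + 1)
    )
    return tail / denom
```

Exact integer arithmetic through `math.comb` avoids underflow in the intermediate terms. For genome-scale universes with many pathways, a library implementation plus multiple-testing correction would be the practical choice.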

A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations

10.1101/399014 ◽

2018 ◽

Author(s):

Paul Ashford ◽

Camilla S.M. Pang ◽

Aurelio A. Moya-García ◽

Tolulope Adeyelu ◽

Christine A. Orengo

Keyword(s):

Zinc Finger Protein ◽

Point Mutations ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Recurrent Point ◽

Cancer Driver ◽

Functional Sites ◽

Cancer Driver Genes ◽

Family Based

Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams), which are structurally and functionally coherent sets of evolutionarily related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a highly significant tendency to lie close to known functional sites or to conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules that literature analysis confirms to be cancer-associated. Our approach is complementary to other domain enrichment approaches that exploit Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers, we detect 151 putative cancer drivers. Of these, 79 are not listed in cancer resources, and they include the recently validated cancer genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
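The first filter looks for enrichment of cancer mutations in CATH-FunFam domains, but the abstract does not give the statistic used. One simple way to frame the question is to ask whether more mutations fall inside a domain than its share of the sequence predicts. The binomial sketch below assumes every position is equally likely to mutate, which ignores sequence context and gene-specific mutation rates. The function name is hypothetical.

```python
from math import comb


def domain_enrichment_pvalue(n_mutations, n_in_domain, domain_len, protein_len):
    """One-sided binomial tail P(X >= n_in_domain).

    Null model (an assumption): each of n_mutations lands uniformly along
    the protein, so it falls in the domain with probability
    domain_len / protein_len.
    """
    if not 0 < domain_len <= protein_len:
        raise ValueError("domain length must be in (0, protein_len]")
    p = domain_len / protein_len
    return sum(
        comb(n_mutations, k) * p**k * (1 - p) ** (n_mutations - k)
        for k in range(n_in_domain, n_mutations + 1)
    )
```

For example, `domain_enrichment_pvalue(20, 12, 60, 400)` asks how surprising it is for 12 of 20 mutations to fall in a 60-residue domain of a 400-residue protein. FunFam-level analyses pool such counts across family members, which this single-protein sketch does not attempt.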

Coordinate regulation of long non-coding RNAs and protein-coding genes in germ-free mice

BMC Genomics ◽

10.1186/s12864-018-5235-3 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 15

Author(s):

Joseph Dempsey ◽

Angela Zhang ◽

Julia Yue Cui

Keyword(s):

Protein Coding ◽

Coordinate Regulation ◽

Germ Free ◽

Protein Coding Genes ◽

Non Coding RNAs

Identification and characterization of cis-regulatory elements for photoreceptor type-specific transcription in zebrafish

10.1101/683284 ◽

2019 ◽

Author(s):

Wei Fang ◽

Yi Wen ◽

Xiangyun Wei

Keyword(s):

Core Promoter ◽

Regulatory Elements ◽

Specific Gene ◽

Protein Coding ◽

Core Promoters ◽

Protein Coding Genes ◽

The Core ◽

Cell Type Specific ◽

Identification And Characterization

Abstract:

Tissue-specific or cell type-specific transcription of protein-coding genes is controlled by both trans-regulatory elements (TREs) and cis-regulatory elements (CREs). However, TREs and CREs are challenging to identify and remain unknown for most genes. Here, we describe a protocol for identifying two types of transcription-activating CREs, core promoters and enhancers, of zebrafish photoreceptor type-specific genes. This protocol is composed of three phases: bioinformatic prediction, experimental validation, and characterization of the CREs. To illustrate the principles and logic of this protocol, we exemplify it with the discovery of the core promoter and enhancer of the mpp5b apical polarity gene (also known as ponli). The red, green, and blue (RGB) cone-specific transcription of mpp5b requires its enhancer, a member of the rainbow enhancer family. Although exemplified with an RGB cone-specific gene, this protocol is general and can be used to identify the core promoters and enhancers of other protein-coding genes.
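The abstract does not say which tools the bioinformatic-prediction phase uses. A common first pass is to scan candidate upstream sequence for transcription-factor binding motifs on both strands. The sketch below does exact matching only, with no mismatches or position weight matrices. The motif in the usage example is an illustrative placeholder, not a CRE reported by the authors.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return (position, strand) hits of an exact motif on both strands.

    Positions are 0-based offsets on the forward strand. A reverse-strand
    hit is reported where the forward sequence equals the motif's reverse
    complement. Palindromic motifs are reported once, on '+'.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    m = len(motif)
    hits = []
    for i in range(len(seq) - m + 1):
        window = seq[i:i + m]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:
            hits.append((i, "-"))
    return hits
```

For instance, `find_motif("ggattaTAATCC", "TAATCC")` returns `[(0, '-'), (6, '+')]`, because the first six bases are the motif's reverse complement. Any such hit is only a candidate until it passes the experimental-validation phase the protocol describes.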

Co‐regulation of long non‐coding RNAs and protein‐coding genes during cell quiescence

The FASEB Journal ◽

10.1096/fasebj.2021.35.s1.03263 ◽

2021 ◽

Vol 35 (S1) ◽

Author(s):

Hilary Coller ◽

Huiling Huang ◽

Mithun Mitra ◽

Kaiser Atai ◽

Kirthana Sarathy

Keyword(s):

Protein Coding ◽

Protein Coding Genes ◽

Cell Quiescence ◽

Non Coding RNAs

DNA methylation patterns of protein-coding genes and long non-coding RNAs in males with schizophrenia

Molecular Medicine Reports ◽

10.3892/mmr.2015.4249 ◽

2015 ◽

Vol 12 (5) ◽

pp. 6568-6576 ◽

Cited By ~ 6

Author(s):

Qi Liao ◽

Yunliang Wang ◽

Jia Cheng ◽

Dongjun Dai ◽

Xingyu Zhou ◽

...

Keyword(s):

DNA Methylation ◽

Protein Coding ◽

Protein Coding Genes ◽

Non Coding RNAs ◽

Methylation Patterns

Expression changes in protein-coding genes and long non-coding RNAs in denatured dermis following thermal injury

Burns ◽

10.1016/j.burns.2019.11.016 ◽

2020 ◽

Vol 46 (5) ◽

pp. 1128-1135 ◽

Cited By ~ 1

Author(s):

Wenchang Yu ◽

Zaiwen Guo ◽

Pengfei Liang ◽

Bimei Jiang ◽

Le Guo ◽

...

Keyword(s):

Thermal Injury ◽

Protein Coding ◽

Protein Coding Genes ◽

Non Coding RNAs

LncExpDB: an expression database of human long non-coding RNAs

Nucleic Acids Research ◽

10.1093/nar/gkaa850 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D962-D968 ◽

Cited By ~ 2

Author(s):

Zhao Li ◽

Lin Liu ◽

Shuai Jiang ◽

Qianpeng Li ◽

Changrui Feng ◽

...

Keyword(s):

Expression Profiles ◽

Biological Functions ◽

Protein Coding ◽

Web Interfaces ◽

Functional Studies ◽

Protein Coding Genes ◽

Genes Expression ◽

Wide Range ◽

Non Coding RNAs ◽

User Friendly

Abstract:

Expression profiles of long non-coding RNAs (lncRNAs) across diverse biological conditions provide significant insights into their biological functions, interacting targets and transcriptional reliability. However, no comprehensive resource has systematically characterized the expression landscape of human lncRNAs by integrating their expression profiles across a wide range of biological conditions. Here, we present LncExpDB (https://bigd.big.ac.cn/lncexpdb), an expression database of human lncRNAs. It is devoted to providing comprehensive expression profiles of lncRNA genes, exploring their expression features and capacities, identifying featured genes with potentially important functions, and identifying their interactions with protein-coding genes across various biological contexts and conditions. Based on comprehensive integration and stringent curation, LncExpDB currently houses expression profiles of 101 293 high-quality human lncRNA genes derived from 1977 samples of 337 biological conditions across nine biological contexts. From these profiles, LncExpDB estimates the expression reliability and capacities of lncRNA genes, identifies 25 191 featured genes, and obtains 28 443 865 lncRNA-mRNA interactions. Moreover, user-friendly web interfaces enable interactive visualization of expression profiles across various conditions and easy exploration of featured lncRNAs and their interacting partners in specific contexts. Collectively, LncExpDB provides comprehensive integration and curation of lncRNA expression profiles and will thus serve as a fundamental resource for functional studies on human lncRNAs.
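LncExpDB reports lncRNA-mRNA interactions, but the abstract does not say how they were derived. Co-expression across samples is one common criterion. The sketch below shows that idea only and should not be read as the database's actual method. The 0.8 cutoff and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(lnc_profiles, mrna_profiles, threshold=0.8):
    """Yield (lncRNA, mRNA, r) for pairs with |r| >= threshold.

    lnc_profiles / mrna_profiles: dicts mapping gene IDs to expression
    values measured over the same ordered list of samples.
    """
    for lnc, xs in lnc_profiles.items():
        for gene, ys in mrna_profiles.items():
            r = pearson(xs, ys)
            if abs(r) >= threshold:
                yield lnc, gene, r
```

This brute-force pairing is quadratic in the number of genes. At the scale of roughly 100 000 lncRNA genes, a vectorized or sparse approach would be needed, along with a correlation measure robust to outlier samples.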

OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers

Nucleic Acids Research ◽

10.1093/nar/gkaa1033 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1289-D1301 ◽

Cited By ~ 2

Author(s):

Tao Wang ◽

Shasha Ruan ◽

Xiaolu Zhao ◽

Xiaohui Shi ◽

Huajing Teng ◽

...

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Cell Population ◽

Cancer Types ◽

Neutral Mutations ◽

Analysis Platform

Abstract:

The prevalence of neutral mutations in cancer cell populations makes it difficult to distinguish cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic potential of somatic mutations and cancer genes, we constructed OncoVar (https://oncovar.org/), a platform that employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. By reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC), we identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence. OncoVar provides four views, ‘Mutation’, ‘Gene’, ‘Pathway’ and ‘Cancer’, to help researchers visualize the relationships between cancers and driver variants. Importantly, identification of actionable driver alterations points to promising druggable targets and repurposing opportunities for combination therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.

Quantifying gene selection in cancer through protein functional alteration bias

Nucleic Acids Research ◽

10.1093/nar/gkz546 ◽

2019 ◽

Vol 47 (13) ◽

pp. 6642-6655 ◽

Cited By ~ 7

Author(s):

Nadav Brandes ◽

Nathan Linial ◽

Michal Linial

Keyword(s):

Somatic Mutations ◽

Gene Selection ◽

De Novo ◽

Cancer Genes ◽

Driver Genes ◽

Protein Coding ◽

Protein Coding Genes ◽

Machine Learning Model ◽

Implicit And Explicit ◽

False Discoveries

Abstract:

Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. Many computational methods have been developed to screen the genome for candidate driver genes based on somatic mutation data from tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancer samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing a statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, overlap overwhelmingly with known cancer genes but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
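The core idea described here is to compare each gene's observed mutations against that gene's own background of all possible single-nucleotide substitutions. The sketch below is a deliberately simplified stand-in, not FABRIC's actual statistics, which the abstract does not describe in enough detail to reproduce. It weights every possible substitution equally, whereas a real model may weight them by mutation probability. It also assumes that higher scores mean more damaging effects. The function name is hypothetical.

```python
import math


def effect_bias_zscore(observed_scores, background_scores):
    """z-score of the mean observed effect score vs. a per-gene background.

    background_scores: predicted effect scores for every possible
    single-nucleotide substitution in the gene, treated as equally likely
    (an assumption). Under the null, the mean of n observed scores drawn
    with replacement from this background has mean mu and variance
    sigma^2 / n. A positive z suggests a bias towards damaging mutations
    (assuming higher score = more damaging).
    """
    n = len(observed_scores)
    if n == 0 or len(background_scores) < 2:
        raise ValueError("need observed scores and a non-trivial background")
    mu = sum(background_scores) / len(background_scores)
    var = sum((s - mu) ** 2 for s in background_scores) / len(background_scores)
    if var == 0:
        raise ValueError("background scores have zero variance")
    obs_mean = sum(observed_scores) / n
    return (obs_mean - mu) / math.sqrt(var / n)
```

Using each gene's own background means long or unusually constrained genes are not judged against a genome-wide average, which echoes the design choice the abstract emphasizes. The normal approximation behind the z-score is weak for genes with only a handful of mutations, where an exact or permutation-based test would be safer.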