DriverPower: Combined burden and functional impact tests for cancer driver discovery

10.1101/215244 ◽

2017 ◽

Cited By ~ 4

Author(s):

Shimin Shuai ◽

Steven Gallinger ◽

Lincoln Stein ◽

Keyword(s):

Driver Mutations ◽

Functional Interpretation ◽

Functional Impact ◽

Genomic Features ◽

Cancer Driver ◽

Mutational Burden ◽

Mutation Model ◽

Whole Genomes ◽

Cancer Genomes ◽

Pan Cancer

Abstract: We describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify cancer driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1,373 genomic features derived from public sources, DriverPower’s background mutation model explains up to 93% of the regional variance in the mutation rate across a variety of tumour types. By incorporating functional impact scores, we are able to further increase the accuracy of driver discovery. Testing across a collection of 2,583 cancer genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Compared with six published methods used by the PCAWG Drivers and Functional Interpretation Group, DriverPower has the highest F1-score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.
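The two ideas named in this abstract, a burden test against a background mutation rate and an F1-score comparison of candidate lists, can be illustrated with a minimal sketch. This is not DriverPower's implementation: the Poisson model, the function names `poisson_sf` and `f1_score`, and the example numbers are all assumptions chosen for illustration.

```python
import math


def poisson_sf(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected).

    Hypothetical burden test: `expected` stands in for the mutation count
    predicted by some background model for one genomic element.
    """
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for i in range(observed):
        cdf += term                 # accumulate P(X = i)
        term *= expected / (i + 1)  # step to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def f1_score(predicted, truth):
    """Harmonic mean of precision and recall for two sets of element names."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Illustrative numbers only: 12 mutations observed where 3.5 are expected.
p_value = poisson_sf(12, 3.5)
score = f1_score({"ELEMENT_A", "ELEMENT_B"}, {"ELEMENT_B", "ELEMENT_C"})
```

A small p-value here would flag an element as mutated more often than the assumed background predicts; real methods also have to correct for multiple testing and for overdispersion, which this sketch ignores.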

Pan-cancer analysis of whole genomes

10.1101/162784 ◽

2017 ◽

Cited By ~ 75

Author(s):

Peter J Campbell ◽

Gaddy Getz ◽

Joshua M Stuart ◽

Jan O Korbel ◽

Lincoln D Stein

Keyword(s):

Structural Variation ◽

Integrative Analysis ◽

Background Information ◽

Cancer Driver ◽

Normal Tissues ◽

Whole Genomes ◽

Cancer Genomes ◽

Pan Cancer ◽

Events Study

We report the integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types. By studying whole genomes we have been able to catalogue non-coding cancer driver events, study patterns of structural variation, infer tumour evolution, probe the interactions among variants in the germline genome, the tumour genome and the transcriptome, and derive an understanding of how coding and non-coding variations together contribute to driving individual patients' tumours. This work represents the most comprehensive look at cancer whole genomes to date. NOTE TO READERS: This is an incomplete draft of the marker paper for the Pan-Cancer Analysis of Whole Genomes Project, and is intended to provide the background information for a series of in-depth papers that will be posted to bioRxiv during the summer of 2017.

Pathway and network analysis of more than 2,500 whole cancer genomes

10.1101/385294 ◽

2018 ◽

Cited By ~ 4

Author(s):

Matthew A. Reyna ◽

David Haan ◽

Marta Paczkowska ◽

Lieven P.C. Verbeke ◽

Miguel Vazquez ◽

...

Keyword(s):

Rna Splicing ◽

Driver Mutations ◽

Cancer Genes ◽

Protein Coding ◽

Cancer Driver ◽

Network Analyses ◽

Protein Coding Genes ◽

Cancer Genomes ◽

Tumor Types ◽

Promoter Mutations

Abstract: The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes, we performed multi-faceted pathway and network analyses of non-coding mutations across 2,583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project. While few non-coding genomic elements were recurrently mutated in this cohort, we identified 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We found that biological processes had variable proportions of coding and non-coding mutations, with chromatin remodeling and proliferation pathways altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, were altered by both coding and non-coding mutations. RNA splicing was primarily targeted by non-coding mutations in this cohort, with samples containing non-coding mutations exhibiting similar gene expression signatures as coding mutations in well-known RNA splicing factors. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.

Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes

Cancers ◽

10.3390/cancers13102366 ◽

2021 ◽

Vol 13 (10) ◽

pp. 2366

Author(s):

Shayantan Banerjee ◽

Karthik Raman ◽

Balaraman Ravindran

Keyword(s):

Cancer Progression ◽

Nucleotide Sequences ◽

Feature Representation ◽

Estimation Methods ◽

Driver Mutations ◽

Cancer Mutation ◽

Evolutionary Features ◽

Cancer Genomes ◽

Passenger Mutations ◽

Pan Cancer

Identifying cancer-causing mutations from sequenced cancer genomes holds much promise for targeted therapy and precision medicine. “Driver” mutations are primarily responsible for cancer progression, while “passengers” are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on using the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5′ and 3′ from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments provided comparable performances across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural, and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with three other commonly used driver prediction tools (FATHMM (cancer), CONDEL, and MutationTaster) significantly outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of using raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.

Sequence neighborhoods enable reliable prediction of pathogenic mutations in cancer genomes

10.1101/2021.02.09.430460 ◽

2021 ◽

Author(s):

Shayantan Banerjee ◽

Karthik Raman ◽

Balaraman Ravindran

Keyword(s):

Cancer Progression ◽

Nucleotide Sequences ◽

Feature Representation ◽

Estimation Methods ◽

Driver Mutations ◽

Cancer Mutation ◽

Evolutionary Features ◽

Cancer Genomes ◽

Passenger Mutations ◽

Pan Cancer

Abstract: Identifying cancer-causing mutations from sequenced cancer genomes holds much promise for targeted therapy and precision medicine. “Driver” mutations are primarily responsible for cancer progression, while “passengers” are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on utilizing the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5′ and 3′ from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments gave comparable performances across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with two other commonly used driver prediction tools (CONDEL and Mutation Taster) outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of utilizing raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.

Parallelized Latent Dirichlet Allocation Provides a Novel Interpretability of Mutation Signatures in Cancer Genomes

Genes ◽

10.3390/genes11101127 ◽

2020 ◽

Vol 11 (10) ◽

pp. 1127

Author(s):

Taro Matsutani ◽

Michiaki Hamada

Keyword(s):

Latent Dirichlet Allocation ◽

Extended Model ◽

Tumor Type ◽

Whole Genomes ◽

Cancer Genomes ◽

The One ◽

Tumor Types ◽

Apobec Family ◽

Pan Cancer ◽

Dirichlet Allocation

Mutation signatures are defined as the characteristic distributions of mutations left by specific mutational processes, such as the activity of AID/APOBEC family proteins. Previous studies have reported numerous signatures, using matrix factorization methods for mutation catalogs. Different mutation signatures are active in different tumor types; hence, signature activity varies greatly among tumor types and becomes sparse. Because of this, many previous methods require dividing mutation catalogs for each tumor type. Here, we propose parallelized latent Dirichlet allocation (PLDA), a novel Bayesian model to simultaneously predict mutation signatures with all mutation catalogs. PLDA is an extended model of latent Dirichlet allocation (LDA), which is one of the methods used for signature prediction. It has parallelized hyperparameters of Dirichlet distributions for LDA, and they represent the sparsity of signature activities for each tumor type, thus facilitating simultaneous analyses. First, we conducted a simulation experiment to compare PLDA with previous methods (including SigProfiler and SignatureAnalyzer) using artificial data and confirmed that PLDA could predict signature structures as accurately as previous methods without searching for the optimal hyperparameters. Next, we applied PLDA to PCAWG (Pan-Cancer Analysis of Whole Genomes) mutation catalogs and obtained a signature set different from the one predicted by SigProfiler. Further, we have shown that the mutation spectrum represented by the predicted signature with PLDA provides a novel interpretability through post-analyses.

Framework for quality assessment of whole genome cancer sequences

Nature Communications ◽

10.1038/s41467-020-18688-y ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Justin P. Whalley ◽

Ivo Buchhalter ◽

Esther Rheinbay ◽

Keiran M. Raine ◽

Miranda D. Stobbe ◽

...

Keyword(s):

Molecular Mechanisms ◽

Quality Measures ◽

Poor Quality ◽

Rating System ◽

Test Case ◽

Whole Genome ◽

Whole Genomes ◽

Cancer Genomes ◽

Pan Cancer

Abstract: Bringing together cancer genomes from different projects increases power and allows the investigation of pan-cancer molecular mechanisms. However, working with whole genomes sequenced over several years in different sequencing centres requires a framework to compare the quality of these sequences. We used the Pan-Cancer Analysis of Whole Genomes cohort as a test case to construct such a framework. This cohort contains whole cancer genomes of 2832 donors from 18 sequencing centres. We developed a non-redundant set of five quality control (QC) measurements to establish a star rating system. These QC measures reflect known differences in sequencing protocol and provide a guide to downstream analyses and allow for exclusion of samples of poor quality. We have found that this is an effective framework of quality measures. The implementation of the framework is available at: https://dockstore.org/containers/quay.io/jwerner_dkfz/pancanqc:1.2.2.

Pan-cancer analysis of whole genomes reveals driver rearrangements promoted by LINE-1 retrotransposition in human tumours

10.1101/179705 ◽

2017 ◽

Cited By ~ 10

Author(s):

Bernardo Rodriguez-Martin ◽

Eva G. Alvarez ◽

Adrian Baez-Ortega ◽

Jorge Zamora ◽

Fran Supek ◽

...

Keyword(s):

Large Scale ◽

Tumour Suppressor Genes ◽

Cancer Subtypes ◽

Human Tumours ◽

Whole Genomes ◽

Cancer Genomes ◽

Relevant Role ◽

High Level ◽

Pan Cancer

Abstract: About half of all cancers have somatic integrations of retrotransposons. To characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 37 histological cancer subtypes. We identified 19,166 somatically acquired retrotransposition events, affecting 35% of samples, and spanning a range of event types. L1 insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, sometimes removing tumour suppressor genes, as well as inducing complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications in the development of human tumours.

Mutational likeliness and entropy help to identify driver mutations and their functional role in cancer

10.1101/354324 ◽

2018 ◽

Author(s):

Giorgio Mattiuz ◽

Salvatore Di Giorgio ◽

Lorenzo Tofani ◽

Antonio Frandi ◽

Francesco Donati ◽

...

Keyword(s):

Cancer Progression ◽

Somatic Mutations ◽

Driver Mutations ◽

Cancer Evolution ◽

Loss Of Function ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Genomes ◽

Passenger Mutations ◽

Mutational Processes

Abstract: Alterations in cancer genomes originate from mutational processes taking place throughout oncogenesis and cancer progression. We show that likeliness and entropy are two properties of somatic mutations crucial in cancer evolution, as cancer-driver mutations stand out, with respect to both of these properties, as being distinct from the bulk of passenger mutations. Our analysis can identify novel cancer driver genes and differentiate between gain and loss of function mutations.

Biallelic mutations in cancer genomes reveal local mutational determinants

10.1101/2021.03.29.437407 ◽

2021 ◽

Author(s):

Jonas Demeulemeester ◽

Stefan C Dentro ◽

Moritz Gerstung ◽

Peter Van Loo

Keyword(s):

Binding Sites ◽

Variant Calling ◽

Mutation Rates ◽

Uv Damage ◽

Sequencing Data ◽

Whole Genomes ◽

Cancer Genomes ◽

Infinite Sites Model ◽

Biallelic Mutations ◽

Pan Cancer

The infinite sites model of molecular evolution requires that every position in the genome is mutated at most once. It is a cornerstone of tumour phylogenetic analysis, and is often implied when calling, phasing and interpreting variants or studying the mutational landscape as a whole. Here we identify 20,555 biallelic mutations, where the same base is mutated independently on both parental copies, in 722 (26.0%) bulk sequencing samples from the Pan-Cancer Analysis of Whole Genomes study (PCAWG). Biallelic mutations reveal UV damage hotspots at ETS and NFAT binding sites, and hypermutable motifs in POLE-mutant and other cancers. We formulate recommendations for variant calling and provide frameworks to model and detect biallelic mutations. These results highlight the need for accurate models of mutation rates and tumour evolution, as well as their inference from sequencing data.
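A violation of the infinite sites assumption can be pictured with a small sketch that flags positions carrying more than one distinct alternate allele within a sample. This is a simplification made for illustration, not the detection framework the authors provide: it assumes variant calls are available as plain tuples, it only catches cases where the two mutations produce different alleles, and the function name `divergent_sites` is hypothetical.

```python
from collections import defaultdict


def divergent_sites(calls):
    """Return sites where one sample has two or more distinct alt alleles.

    `calls` is an iterable of (sample, chrom, pos, ref, alt) tuples, an
    assumed input layout rather than any standard file format.
    """
    alts_by_site = defaultdict(set)
    for sample, chrom, pos, ref, alt in calls:
        alts_by_site[(sample, chrom, pos)].add(alt)
    return {
        site: sorted(alts)
        for site, alts in alts_by_site.items()
        if len(alts) > 1
    }


# Toy calls: sample S1 has both C>T and C>A at chr1:100.
calls = [
    ("S1", "chr1", 100, "C", "T"),
    ("S1", "chr1", 100, "C", "A"),
    ("S1", "chr2", 200, "G", "A"),
    ("S2", "chr1", 100, "C", "T"),
]
flagged = divergent_sites(calls)
```

Cases where the same base change arises independently on both parental copies would not appear as two alleles and would need allele-frequency and copy-number information to detect, which this sketch does not model.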

Passenger mutations in 2500 cancer genomes: Overall molecular functional impact and consequences

10.1101/280446 ◽

2018 ◽

Cited By ~ 5

Author(s):

Sushant Kumar ◽

Jonathan Warrell ◽

Shantao Li ◽

Patrick D. McGillivray ◽

William Meyerson ◽

...

Keyword(s):

Cancer Progression ◽

Complex Trait ◽

Driver Mutations ◽

Additive Variance ◽

Functional Impact ◽

Cancer Genomes ◽

Classical Models ◽

Cancer Phenotypes ◽

Passenger Mutations ◽

The Impact

Abstract: The Pan-cancer Analysis of Whole Genomes (PCAWG) project provides an unprecedented opportunity to comprehensively characterize a vast set of uniformly annotated coding and non-coding mutations present in thousands of cancer genomes. Classical models of cancer progression posit that only a small number of these mutations strongly drive tumor progression and that the remaining ones (termed “putative passengers”) are inconsequential for tumorigenesis. In this study, we leveraged the comprehensive variant data from PCAWG to ascertain the molecular functional impact of each variant. The impact distribution of PCAWG mutations shows that, in addition to high- and low-impact mutations, there is a group of medium-impact putative passengers predicted to influence gene activity. Moreover, the predicted impact relates to the underlying mutational signature: different signatures confer divergent impact, differentially affecting distinct regulatory subsystems and gene categories. We also find that impact varies based on subclonal architecture (i.e., early vs. late mutations) and can be related to patient survival. Finally, we note that insufficient power due to limited cohort sizes precludes identification of weak drivers using standard recurrence-based approaches. To address this, we adapted an additive effects model derived from complex trait studies to show that aggregating the impact of putative passenger variants (i.e. including yet undetected weak drivers) provides significant predictability for cancer phenotypes beyond the PCAWG identified driver mutations (12.5% additive variance). Furthermore, this framework allowed us to estimate the frequency of potential weak driver mutations in the subset of PCAWG samples lacking well-characterized driver alterations.