Pan-cancer analysis of whole genomes

We report the integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types. By studying whole genomes we have been able to catalogue non-coding cancer driver events, study patterns of structural variation, infer tumour evolution, probe the interactions among variants in the germline genome, the tumour genome and the transcriptome, and derive an understanding of how coding and non-coding variations together contribute to driving individual patients' tumours. This work represents the most comprehensive look at cancer whole genomes to date. NOTE TO READERS: This is an incomplete draft of the marker paper for the Pan-Cancer Analysis of Whole Genomes Project, and is intended to provide the background information for a series of in-depth papers that will be posted to bioRxiv during the summer of 2017.

DriverPower: Combined burden and functional impact tests for cancer driver discovery

10.1101/215244 ◽

2017 ◽

Cited By ~ 4

Author(s):

Shimin Shuai ◽

Steven Gallinger ◽

Lincoln Stein ◽

Keyword(s):

Driver Mutations ◽

Functional Interpretation ◽

Functional Impact ◽

Genomic Features ◽

Cancer Driver ◽

Mutational Burden ◽

Mutation Model ◽

Whole Genomes ◽

Cancer Genomes ◽

Pan Cancer

Abstract: We describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify cancer driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1,373 genomic features derived from public sources, DriverPower’s background mutation model explains up to 93% of the regional variance in the mutation rate across a variety of tumour types. By incorporating functional impact scores, we are able to further increase the accuracy of driver discovery. Testing across a collection of 2,583 cancer genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Compared with six published methods used by the PCAWG Drivers and Functional Interpretation Group, DriverPower has the highest F1-score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.
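The two ideas in this abstract, a burden test against a background mutation rate and an F1-score for comparing driver calls, can be illustrated with a minimal sketch. The abstract does not specify DriverPower's actual test statistic, so the Poisson upper-tail test below is a generic stand-in, and the function names, the expected rate `mu`, and the gene sets are hypothetical.

```python
import math

def poisson_upper_tail(k, mu):
    """Return P(X >= k) for X ~ Poisson(mu).

    Generic burden test (an assumption, not DriverPower's method):
    k is the observed mutation count in an element and mu is the count
    expected under a background model. Suitable for modest mu only,
    because exp(-mu) underflows for very large mu.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-mu)      # P(X = 0)
    cdf = term
    for i in range(1, k):     # accumulate P(X <= k - 1)
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def f1_score(predicted, truth):
    """Harmonic mean of precision and recall for a set of driver calls."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical element: 10 mutations observed where 2 are expected.
p_value = poisson_upper_tail(10, 2.0)

# Hypothetical call set scored against a hypothetical reference set.
score = f1_score({"TP53", "KRAS", "GENE_X"}, {"TP53", "KRAS", "PTEN", "APC"})
```

A small p-value means the element carries more mutations than the background model predicts; the F1-score then summarises how well a list of such calls matches a reference list.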

Parallelized Latent Dirichlet Allocation Provides a Novel Interpretability of Mutation Signatures in Cancer Genomes

Genes ◽

10.3390/genes11101127 ◽

2020 ◽

Vol 11 (10) ◽

pp. 1127

Author(s):

Taro Matsutani ◽

Michiaki Hamada

Keyword(s):

Latent Dirichlet Allocation ◽

Extended Model ◽

Tumor Type ◽

Whole Genomes ◽

Cancer Genomes ◽

The One ◽

Tumor Types ◽

Apobec Family ◽

Pan Cancer ◽

Dirichlet Allocation

Mutation signatures are defined as the distribution of specific mutations such as activity of AID/APOBEC family proteins. Previous studies have reported numerous signatures, using matrix factorization methods for mutation catalogs. Different mutation signatures are active in different tumor types; hence, signature activity varies greatly among tumor types and becomes sparse. Because of this, many previous methods require dividing mutation catalogs for each tumor type. Here, we propose parallelized latent Dirichlet allocation (PLDA), a novel Bayesian model to simultaneously predict mutation signatures with all mutation catalogs. PLDA is an extended model of latent Dirichlet allocation (LDA), which is one of the methods used for signature prediction. It has parallelized hyperparameters of Dirichlet distributions for LDA, and they represent the sparsity of signature activities for each tumor type, thus facilitating simultaneous analyses. First, we conducted a simulation experiment to compare PLDA with previous methods (including SigProfiler and SignatureAnalyzer) using artificial data and confirmed that PLDA could predict signature structures as accurately as previous methods without searching for the optimal hyperparameters. Next, we applied PLDA to PCAWG (Pan-Cancer Analysis of Whole Genomes) mutation catalogs and obtained a signature set different from the one predicted by SigProfiler. Further, we have shown that the mutation spectrum represented by the predicted signature with PLDA provides a novel interpretability through post-analyses.
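The generative assumption behind LDA-style signature models can be sketched as follows: each sample draws signature activities from a Dirichlet distribution, and each mutation is drawn from one signature chosen according to those activities. The sketch only simulates data; it does not implement PLDA's inference. The toy four-category signatures and the per-tumour-type hyperparameters are hypothetical, chosen to show how a small Dirichlet parameter yields sparse activities.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_catalog(signatures, alpha, n_mutations, rng):
    """Simulate one sample's mutation catalog under an LDA-like model.

    signatures: list of probability vectors over mutation categories.
    alpha: Dirichlet hyperparameters, one per signature (hypothetical
           values; in the abstract these are tumour-type specific).
    """
    activities = sample_dirichlet(alpha, rng)
    n_categories = len(signatures[0])
    catalog = [0] * n_categories
    for _ in range(n_mutations):
        # Pick a signature by activity, then a category from that signature.
        k = rng.choices(range(len(signatures)), weights=activities)[0]
        m = rng.choices(range(n_categories), weights=signatures[k])[0]
        catalog[m] += 1
    return activities, catalog

# Toy signatures over 4 mutation categories (real catalogs typically use 96).
toy_signatures = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
# Hypothetical hyperparameters for two tumour types: type A favours
# signature 0, type B favours signature 1.
alpha_by_type = {"type_A": [2.0, 0.1], "type_B": [0.1, 2.0]}

rng = random.Random(0)
activities_a, catalog_a = sample_catalog(toy_signatures, alpha_by_type["type_A"], 500, rng)
```

Giving each tumour type its own hyperparameter vector is what lets a single model cover all catalogs at once while keeping activities sparse within a type.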

Framework for quality assessment of whole genome cancer sequences

Nature Communications ◽

10.1038/s41467-020-18688-y ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Justin P. Whalley ◽

Ivo Buchhalter ◽

Esther Rheinbay ◽

Keiran M. Raine ◽

Miranda D. Stobbe ◽

...

Keyword(s):

Molecular Mechanisms ◽

Quality Measures ◽

Poor Quality ◽

Rating System ◽

Test Case ◽

Whole Genome ◽

Whole Genomes ◽

Cancer Genomes ◽

Pan Cancer

Abstract: Bringing together cancer genomes from different projects increases power and allows the investigation of pan-cancer molecular mechanisms. However, working with whole genomes sequenced over several years in different sequencing centres requires a framework to compare the quality of these sequences. We used the Pan-Cancer Analysis of Whole Genomes cohort as a test case to construct such a framework. This cohort contains whole cancer genomes of 2832 donors from 18 sequencing centres. We developed a non-redundant set of five quality control (QC) measurements to establish a star rating system. These QC measures reflect known differences in sequencing protocol, provide a guide to downstream analyses, and allow for exclusion of samples of poor quality. We have found that this is an effective framework of quality measures. The implementation of the framework is available at: https://dockstore.org/containers/quay.io/jwerner_dkfz/pancanqc:1.2.2.

Pan-cancer analysis of whole genomes reveals driver rearrangements promoted by LINE-1 retrotransposition in human tumours

10.1101/179705 ◽

2017 ◽

Cited By ~ 10

Author(s):

Bernardo Rodriguez-Martin ◽

Eva G. Alvarez ◽

Adrian Baez-Ortega ◽

Jorge Zamora ◽

Fran Supek ◽

...

Keyword(s):

Large Scale ◽

Tumour Suppressor Genes ◽

Cancer Subtypes ◽

Human Tumours ◽

Whole Genomes ◽

Cancer Genomes ◽

Relevant Role ◽

High Level ◽

Pan Cancer

Abstract: About half of all cancers have somatic integrations of retrotransposons. To characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 37 histological cancer subtypes. We identified 19,166 somatically acquired retrotransposition events, affecting 35% of samples, and spanning a range of event types. L1 insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, sometimes removing tumour suppressor genes, as well as inducing complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications in the development of human tumours.

Biallelic mutations in cancer genomes reveal local mutational determinants

10.1101/2021.03.29.437407 ◽

2021 ◽

Author(s):

Jonas Demeulemeester ◽

Stefan C Dentro ◽

Moritz Gerstung ◽

Peter Van Loo

Keyword(s):

Binding Sites ◽

Variant Calling ◽

Mutation Rates ◽

Uv Damage ◽

Sequencing Data ◽

Whole Genomes ◽

Cancer Genomes ◽

Infinite Sites Model ◽

Biallelic Mutations ◽

Pan Cancer

The infinite sites model of molecular evolution requires that every position in the genome is mutated at most once. It is a cornerstone of tumour phylogenetic analysis, and is often implied when calling, phasing and interpreting variants or studying the mutational landscape as a whole. Here we identify 20,555 biallelic mutations, where the same base is mutated independently on both parental copies, in 722 (26.0%) bulk sequencing samples from the Pan-Cancer Analysis of Whole Genomes study (PCAWG). Biallelic mutations reveal UV damage hotspots at ETS and NFAT binding sites, and hypermutable motifs in POLE-mutant and other cancers. We formulate recommendations for variant calling and provide frameworks to model and detect biallelic mutations. These results highlight the need for accurate models of mutation rates and tumour evolution, as well as their inference from sequencing data.
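One simple way to look for violations of the infinite sites assumption in a call set is to flag positions where a single sample carries more than one alternate allele. The sketch below does only that; it is not the detection framework of the paper, and it cannot see cases where both parental copies acquire the same substitution, which would need phasing or allele-frequency evidence. The tuple layout and the example calls are hypothetical.

```python
from collections import defaultdict

def multi_alt_sites(calls):
    """Return sites where one sample has two or more distinct alt alleles.

    calls: iterable of (sample, chrom, pos, ref, alt) tuples
           (a hypothetical, simplified variant-call layout).
    Result maps (sample, chrom, pos, ref) to the sorted list of alts.
    """
    alts_by_site = defaultdict(set)
    for sample, chrom, pos, ref, alt in calls:
        alts_by_site[(sample, chrom, pos, ref)].add(alt)
    return {
        site: sorted(alts)
        for site, alts in alts_by_site.items()
        if len(alts) > 1
    }

# Hypothetical calls: sample S1 has C>T and C>A at the same position.
example_calls = [
    ("S1", "chr1", 100, "C", "T"),
    ("S1", "chr1", 100, "C", "A"),
    ("S1", "chr2", 5, "G", "A"),
    ("S2", "chr1", 100, "C", "T"),
]
candidates = multi_alt_sites(example_calls)
```

Sites returned here are only candidates; a variant caller that assumes at most one mutation per position may never emit the second allele in the first place, which is part of why the abstract calls for revised calling recommendations.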

Integrative Analysis Identifies Potential DNA Methylation Biomarkers for Pan-Cancer Diagnosis and Prognosis

SSRN Electronic Journal ◽

10.2139/ssrn.3244894 ◽

2018 ◽

Author(s):

Wubin Ding ◽

Geng Chen ◽

Tieliu Shi

Keyword(s):

Dna Methylation ◽

Cancer Diagnosis ◽

Integrative Analysis ◽

Diagnosis And Prognosis ◽

Cancer Diagnosis And Prognosis ◽

Pan Cancer

Predicting pathogenic non-coding SVs disrupting the 3D genome in 1646 whole cancer genomes using multiple instance learning

Scientific Reports ◽

10.1038/s41598-021-93917-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Marleen M. Nieboer ◽

Luan Nguyen ◽

Jeroen de Ridder

Keyword(s):

Multiple Instance Learning ◽

Cancer Diagnostics ◽

Common Mechanism ◽

Open Chromatin ◽

Driver Genes ◽

3D Genome ◽

Whole Genomes ◽

Cancer Genomes ◽

Cancer Types ◽

The Impact

Abstract: Over the past years, large consortia have been established to fuel the sequencing of whole genomes of many cancer patients. Despite the increased abundance of tools to study the impact of SNVs, non-coding SVs have been largely ignored in these data. Here, we introduce svMIL2, an improved version of our Multiple Instance Learning-based method to study the effect of somatic non-coding SVs disrupting boundaries of TADs and CTCF loops in 1646 cancer genomes. We demonstrate that svMIL2 predicts pathogenic non-coding SVs with an average AUC of 0.86 across 12 cancer types, and identifies non-coding SVs affecting well-known driver genes. The disruption of active (super) enhancers in open chromatin regions appears to be a common mechanism by which non-coding SVs exert their pathogenicity. Finally, our results reveal that the contribution of pathogenic non-coding SVs as opposed to driver SNVs may highly vary between cancers, with notably high numbers of genes being disrupted by pathogenic non-coding SVs in ovarian and pancreatic cancer. Taken together, our machine learning method offers a potent way to prioritize putatively pathogenic non-coding SVs and leverage non-coding SVs to identify driver genes. Moreover, our analysis of 1646 cancer genomes demonstrates the importance of including non-coding SVs in cancer diagnostics.

driveR: a novel method for prioritizing cancer driver genes using somatic genomics data

BMC Bioinformatics ◽

10.1186/s12859-021-04203-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ege Ülgen ◽

O. Uğur Sezerman

Keyword(s):

Biological Knowledge ◽

Driver Gene ◽

Driver Genes ◽

Cancer Driver ◽

Prior Biological Knowledge ◽

Wilcoxon Rank Sum Test ◽

Cancer Genomes ◽

Novel Method ◽

Cancer Driver Genes ◽

Batch Analysis

Abstract: Background: Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results: Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions: This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR.
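The AUC values reported here can be read as the probability that a randomly chosen true driver gene receives a higher score than a randomly chosen non-driver. A minimal sketch of that computation follows; it is a generic pairwise definition, not driveR's evaluation code, and the scores and labels are hypothetical. The same quantity equals the Mann-Whitney U statistic divided by the number of positive-negative pairs, which is the statistic behind the rank-sum test that the abstract applies separately to compare methods.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: prioritization scores, higher meaning more driver-like.
    labels: 1 for a known driver, 0 otherwise.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for four genes, two of them known drivers.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

With very few positives per sample, as in a personalized analysis, this quantity can take only a handful of values, which is consistent with the 0 to 1 range the abstract reports for individual samples.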

Deleting in vivo β-catenin degradation domain in mouse hepatocytes drives hepatocellular carcinoma or hepatoblastoma-like tumors

10.1101/2021.07.04.450836 ◽

2021 ◽

Author(s):

Robin Loesch ◽

Stefano Caruso ◽

Valerie Paradis ◽

Cecile Godard ◽

Angelique Gougelet ◽

...

Keyword(s):

Mouse Models ◽

Liver Tumors ◽

Integrative Analysis ◽

Hepatic Tumors ◽

Normal Tissues ◽

Mouse Tumors ◽

Rnaseq Data ◽

Well Differentiated ◽

Low Prevalence

Background and aims: One-third of hepatocellular carcinomas (HCCs) have mutations that activate the β-catenin pathway with mostly CTNNB1 mutations. Mouse models using Adenomatous polyposis coli (Apc) loss-of-functions (LOF) are widely used to mimic β-catenin-dependent tumorigenesis. Considering the low prevalence of APC mutations in human HCCs we aimed to generate hepatic tumors through CTNNB1 exon 3 deletion (βcatΔex3) and to compare them to hepatic tumors with Apc LOF engineered through a frameshift in exon 15 (Apcfs-ex15). Methods: We used hepatic-specific and inducible Cre-lox mouse models as well as a hepatic-specific in vivo CRISPR/Cas9 approach using AAV vectors, to generate Apcfs-ex15 and βcatΔex3 hepatic tumors harboring activation of the β-catenin pathway. Tumors generated by the Cre-lox models were analyzed phenotypically using immunohistochemistry and were selected for transcriptomic analysis using RNA-sequencing. Mouse RNAseq data were compared to human RNAseq data (normal tissues (8), HCCs (48) and hepatoblastomas (9)) in an integrative analysis. Tumors generated via CRISPR were analyzed using DNA sequencing and immunohistochemistry. Results: Mice with βcatΔex3 alteration in hepatocytes developed liver tumors. Generated tumors were indistinguishable from those arising in Apcfs-ex15 mice. Both Apcfs-ex15 and βcatΔex3 mouse models induced two phenotypically distinct tumors (differentiated or undifferentiated). Integrative analysis of human and mouse tumors showed that mouse differentiated tumors are close to human well differentiated CTNNB1-mutated tumors, while undifferentiated ones are closer to human mesenchymal hepatoblastomas, and are activated for YAP signaling. Conclusion: Apcfs-ex15 and βcatΔex3 mouse models similarly induce tumors transcriptionally close to either well differentiated β-Catenin activated human HCCs or mesenchymal hepatoblastomas.
