Pan-cancer analysis of whole genomes

2017 ◽  
Author(s):  
Peter J Campbell ◽  
Gaddy Getz ◽  
Joshua M Stuart ◽  
Jan O Korbel ◽  
Lincoln D Stein

We report the integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types. By studying whole genomes we have been able to catalogue non-coding cancer driver events, study patterns of structural variation, infer tumour evolution, probe the interactions among variants in the germline genome, the tumour genome and the transcriptome, and derive an understanding of how coding and non-coding variations together contribute to driving individual patients' tumours. This work represents the most comprehensive look at cancer whole genomes to date. NOTE TO READERS: This is an incomplete draft of the marker paper for the Pan-Cancer Analysis of Whole Genomes Project, and is intended to provide the background information for a series of in-depth papers that will be posted to bioRxiv during the summer of 2017.

2017 ◽  
Author(s):  
Shimin Shuai ◽  
Steven Gallinger ◽  
Lincoln Stein ◽  

We describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify cancer driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1,373 genomic features derived from public sources, DriverPower’s background mutation model explains up to 93% of the regional variance in the mutation rate across a variety of tumour types. By incorporating functional impact scores, we are able to further increase the accuracy of driver discovery. Testing across a collection of 2,583 cancer genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Compared with six published methods used by the PCAWG Drivers and Functional Interpretation Group, DriverPower has the highest F1-score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.
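The F1-score used for the method comparison above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the PCAWG evaluation itself: the function name `f1_score` and the placeholder gene identifiers are assumptions made for illustration only.

```python
def f1_score(predicted, reference):
    """Harmonic mean of precision and recall for two sets of driver calls."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # No overlap: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Placeholder identifiers, not real results from any method.
predicted_drivers = {"GENE_A", "GENE_B", "GENE_X"}
reference_drivers = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
print(f1_score(predicted_drivers, reference_drivers))
```

A method that calls many candidates gains recall but loses precision, so the F1-score rewards a balance between the two.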


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1127
Author(s):  
Taro Matsutani ◽  
Michiaki Hamada

Mutation signatures are defined as the distributions of specific mutations produced by mutational processes, such as the activity of AID/APOBEC family proteins. Previous studies have reported numerous signatures, using matrix factorization methods for mutation catalogs. Different mutation signatures are active in different tumor types; hence, signature activity varies greatly among tumor types and becomes sparse. Because of this, many previous methods require dividing mutation catalogs for each tumor type. Here, we propose parallelized latent Dirichlet allocation (PLDA), a novel Bayesian model to simultaneously predict mutation signatures with all mutation catalogs. PLDA is an extended model of latent Dirichlet allocation (LDA), which is one of the methods used for signature prediction. It has parallelized hyperparameters of Dirichlet distributions for LDA, and they represent the sparsity of signature activities for each tumor type, thus facilitating simultaneous analyses. First, we conducted a simulation experiment to compare PLDA with previous methods (including SigProfiler and SignatureAnalyzer) using artificial data and confirmed that PLDA could predict signature structures as accurately as previous methods without searching for the optimal hyperparameters. Next, we applied PLDA to PCAWG (Pan-Cancer Analysis of Whole Genomes) mutation catalogs and obtained a signature set different from the one predicted by SigProfiler. Further, we have shown that the mutation spectrum represented by the predicted signature with PLDA provides novel interpretability through post-analyses.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Justin P. Whalley ◽  
Ivo Buchhalter ◽  
Esther Rheinbay ◽  
Keiran M. Raine ◽  
Miranda D. Stobbe ◽  
...  

Bringing together cancer genomes from different projects increases power and allows the investigation of pan-cancer molecular mechanisms. However, working with whole genomes sequenced over several years in different sequencing centres requires a framework to compare the quality of these sequences. We used the Pan-Cancer Analysis of Whole Genomes cohort as a test case to construct such a framework. This cohort contains whole cancer genomes of 2832 donors from 18 sequencing centres. We developed a non-redundant set of five quality control (QC) measurements to establish a star rating system. These QC measures reflect known differences in sequencing protocol, provide a guide to downstream analyses, and allow for exclusion of samples of poor quality. We have found that this is an effective framework of quality measures. The implementation of the framework is available at: https://dockstore.org/containers/quay.io/jwerner_dkfz/pancanqc:1.2.2.


2017 ◽  
Author(s):  
Bernardo Rodriguez-Martin ◽  
Eva G. Alvarez ◽  
Adrian Baez-Ortega ◽  
Jorge Zamora ◽  
Fran Supek ◽  
...  

About half of all cancers have somatic integrations of retrotransposons. To characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 37 histological cancer subtypes. We identified 19,166 somatically acquired retrotransposition events, affecting 35% of samples, and spanning a range of event types. L1 insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, sometimes removing tumour suppressor genes, as well as inducing complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications in the development of human tumours.


2021 ◽  
Author(s):  
Jonas Demeulemeester ◽  
Stefan C Dentro ◽  
Moritz Gerstung ◽  
Peter Van Loo

The infinite sites model of molecular evolution requires that every position in the genome is mutated at most once. It is a cornerstone of tumour phylogenetic analysis, and is often implied when calling, phasing and interpreting variants or studying the mutational landscape as a whole. Here we identify 20,555 biallelic mutations, where the same base is mutated independently on both parental copies, in 722 (26.0%) bulk sequencing samples from the Pan-Cancer Analysis of Whole Genomes study (PCAWG). Biallelic mutations reveal UV damage hotspots at ETS and NFAT binding sites, and hypermutable motifs in POLE-mutant and other cancers. We formulate recommendations for variant calling and provide frameworks to model and detect biallelic mutations. These results highlight the need for accurate models of mutation rates and tumour evolution, as well as their inference from sequencing data.
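To make the notion of a biallelic mutation concrete, the following toy sketch flags positions that violate the infinite sites assumption by being mutated on both parental copies. It assumes mutations have already been assigned to a parental copy, which bulk sequencing does not provide directly; the function name `biallelic_positions` and the input tuple layout are hypothetical and are not the detection framework described in the abstract.

```python
from collections import defaultdict


def biallelic_positions(mutations):
    """Return sites mutated on more than one parental copy.

    `mutations` is an iterable of (chromosome, position, parental_copy)
    tuples; this layout is an assumption made for illustration.
    """
    copies_hit = defaultdict(set)
    for chromosome, position, parental_copy in mutations:
        copies_hit[(chromosome, position)].add(parental_copy)
    # Under the infinite sites model every site is hit at most once,
    # so any site seen on two distinct copies is a violation.
    return sorted(site for site, copies in copies_hit.items() if len(copies) > 1)


example_calls = [
    ("chr1", 100, "maternal"),
    ("chr1", 100, "paternal"),  # same base hit on the other copy
    ("chr2", 5, "maternal"),
    ("chr2", 5, "maternal"),    # duplicate record, same copy
]
print(biallelic_positions(example_calls))
```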


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marleen M. Nieboer ◽  
Luan Nguyen ◽  
Jeroen de Ridder

Over the past years, large consortia have been established to fuel the sequencing of whole genomes of many cancer patients. Despite the increased abundance of tools to study the impact of SNVs, non-coding SVs have been largely ignored in these data. Here, we introduce svMIL2, an improved version of our Multiple Instance Learning-based method to study the effect of somatic non-coding SVs disrupting boundaries of TADs and CTCF loops in 1646 cancer genomes. We demonstrate that svMIL2 predicts pathogenic non-coding SVs with an average AUC of 0.86 across 12 cancer types, and identifies non-coding SVs affecting well-known driver genes. The disruption of active (super) enhancers in open chromatin regions appears to be a common mechanism by which non-coding SVs exert their pathogenicity. Finally, our results reveal that the contribution of pathogenic non-coding SVs as opposed to driver SNVs may vary greatly between cancers, with notably high numbers of genes being disrupted by pathogenic non-coding SVs in ovarian and pancreatic cancer. Taken together, our machine learning method offers a potent way to prioritize putatively pathogenic non-coding SVs and leverage non-coding SVs to identify driver genes. Moreover, our analysis of 1646 cancer genomes demonstrates the importance of including non-coding SVs in cancer diagnostics.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ege Ülgen ◽  
O. Uğur Sezerman

Background: Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results: Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions: This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR.
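The AUC values reported above can be read as the probability that a randomly chosen true driver gene is scored higher than a randomly chosen non-driver. The sketch below computes that quantity directly from pairwise comparisons; it is an illustration in Python rather than the driveR package's own code, and the function name `auc` and the example scores are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` are truthy for known drivers and falsy otherwise.
    Ties between a driver and a non-driver count as half a win.
    """
    positives = [s for s, is_driver in zip(scores, labels) if is_driver]
    negatives = [s for s, is_driver in zip(scores, labels) if not is_driver]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one driver and one non-driver")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up scores for four genes, two of which are labelled as drivers.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise form is quadratic in the number of genes, which is acceptable for a small illustration; a rank-based formula gives the same value more efficiently on large inputs.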


2021 ◽  
Author(s):  
Robin Loesch ◽  
Stefano Caruso ◽  
Valerie Paradis ◽  
Cecile Godard ◽  
Angelique Gougelet ◽  
...  

Background and aims: One-third of hepatocellular carcinomas (HCCs) have mutations that activate the β-catenin pathway, mostly CTNNB1 mutations. Mouse models using Adenomatous polyposis coli (Apc) loss-of-function (LOF) are widely used to mimic β-catenin-dependent tumorigenesis. Considering the low prevalence of APC mutations in human HCCs, we aimed to generate hepatic tumors through CTNNB1 exon 3 deletion (βcatΔex3) and to compare them to hepatic tumors with Apc LOF engineered through a frameshift in exon 15 (Apcfs-ex15). Methods: We used hepatic-specific and inducible Cre-lox mouse models as well as a hepatic-specific in vivo CRISPR/Cas9 approach using AAV vectors, to generate Apcfs-ex15 and βcatΔex3 hepatic tumors harboring activation of the β-catenin pathway. Tumors generated by the Cre-lox models were analyzed phenotypically using immunohistochemistry and were selected for transcriptomic analysis using RNA-sequencing. Mouse RNAseq data were compared to human RNAseq data (normal tissues (8), HCCs (48) and hepatoblastomas (9)) in an integrative analysis. Tumors generated via CRISPR were analyzed using DNA sequencing and immunohistochemistry. Results: Mice with βcatΔex3 alteration in hepatocytes developed liver tumors. Generated tumors were indistinguishable from those arising in Apcfs-ex15 mice. Both Apcfs-ex15 and βcatΔex3 mouse models induced two phenotypically distinct tumors (differentiated or undifferentiated). Integrative analysis of human and mouse tumors showed that mouse differentiated tumors are close to human well-differentiated CTNNB1-mutated tumors, while undifferentiated ones are closer to human mesenchymal hepatoblastomas, and are activated for YAP signaling. Conclusion: Apcfs-ex15 and βcatΔex3 mouse models similarly induce tumors transcriptionally close to either well-differentiated β-catenin-activated human HCCs or mesenchymal hepatoblastomas.


2019 ◽  
Vol 15 (2) ◽  
pp. e1006799 ◽  
Author(s):  
Tyler Funnell ◽  
Allen W. Zhang ◽  
Diljot Grewal ◽  
Steven McKinney ◽  
Ali Bashashati ◽  
...  
