Generating realistic null hypothesis of cancer mutational landscapes using SigProfilerSimulator

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Erik N. Bergstrom ◽  
Mark Barnes ◽  
Iñigo Martincorena ◽  
Ludmil B. Alexandrov

Abstract. Background: Performing a statistical test requires a null hypothesis. In cancer genomics, a key challenge is the fast generation of accurate somatic mutational landscapes that can be used as a realistic null hypothesis for making biological discoveries. Results: Here we present SigProfilerSimulator, a powerful tool that is capable of simulating the mutational landscapes of thousands of cancer genomes at different resolutions within seconds. Applying SigProfilerSimulator to 2144 whole-genome sequenced cancers reveals: (i) that most doublet base substitutions are not due to two adjacent single base substitutions but likely occur as single genomic events; (ii) that an extended sequencing context of ±2 bp is required to more completely capture the patterns of substitution mutational signatures in human cancer; (iii) information on the false-positive discovery rate of commonly used bioinformatics tools for detecting driver genes. Conclusions: SigProfilerSimulator’s breadth of features allows one to construct a tailored null hypothesis and use it for evaluating the accuracy of other bioinformatics tools or for downstream statistical analysis for biological discoveries. SigProfilerSimulator is freely available at https://github.com/AlexandrovLab/SigProfilerSimulator with extensive documentation at https://osf.io/usxjz/wiki/home/.
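
To illustrate the kind of null hypothesis behind finding (i), the following is a minimal sketch of a much simpler null model than the one the abstract describes. It assumes mutations are placed uniformly at random on a linear genome and counts how often two single base substitutions land on adjacent positions by chance. The function names `count_adjacent_pairs` and `simulate_null_adjacent` are hypothetical and are not part of SigProfilerSimulator, and uniform placement is an assumption made here for brevity.

```python
import random


def count_adjacent_pairs(positions):
    """Count pairs of mutations that sit on consecutive genomic coordinates."""
    ordered = sorted(positions)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)


def simulate_null_adjacent(n_mutations, genome_length, n_simulations, seed=0):
    """Toy null model: place mutations uniformly at random (an assumption)
    and record the number of chance-adjacent pairs in each simulated genome."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_simulations):
        # Sample distinct positions without replacement.
        positions = rng.sample(range(genome_length), n_mutations)
        counts.append(count_adjacent_pairs(positions))
    return counts


if __name__ == "__main__":
    null_counts = simulate_null_adjacent(
        n_mutations=1_000, genome_length=10_000_000, n_simulations=20
    )
    print("chance-adjacent pairs per simulated genome:", null_counts)
```

Under this toy model the expected number of chance-adjacent pairs is n(n-1)/G for n mutations on a genome of length G, which is roughly 0.03 for 10,000 substitutions on a 3-billion-base genome. An observed doublet count far above the simulated distribution would argue against the "two independent adjacent substitutions" explanation.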

Author(s):  
Erik N. Bergstrom ◽  
Mark Barnes ◽  
Iñigo Martincorena ◽  
Ludmil B. Alexandrov

Abstract. Performing a statistical test requires a null hypothesis. In cancer genomics, a key challenge is the fast generation of accurate somatic mutational landscapes that can be used as a realistic null hypothesis for making biological discoveries. Here we present SigProfilerSimulator, a powerful tool that is capable of simulating the mutational landscapes of thousands of cancer genomes at different resolutions within seconds. Applying SigProfilerSimulator to 2,144 whole-genome sequenced cancers reveals: (i) that most doublet base substitutions are not due to two adjacent single base substitutions but likely occur as single genomic events; (ii) that an extended sequencing context of ±2 bp is required to more completely capture the patterns of substitution mutational signatures in human cancer; (iii) information on false-positive discovery rate of commonly used bioinformatics tools for detecting driver genes. SigProfilerSimulator’s breadth of features allows one to construct a tailored null hypothesis and use it for evaluating the accuracy of other bioinformatics tools or for downstream statistical analysis for biological discoveries.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 586
Author(s):  
Zhi Yang ◽  
Priyatama Pandey ◽  
Paul Marjoram ◽  
Kimberly D. Siegmund

There are two frameworks for characterizing mutational signatures which are commonly used to describe the nucleotide patterns that arise from mutational processes. Estimated mutational signatures from fitting these two methods in human cancer can be found online, in the Catalogue Of Somatic Mutations In Cancer (COSMIC) website or a GitHub repository. The two frameworks make differing assumptions regarding independence of base pairs and for that reason may produce different results. Consequently, there is a need to compare and contrast the results of the two methods, but no such tool currently exists. In this paper, we provide a simple and intuitive interface that allows such comparisons to be easily performed. When using our software, the user may download published mutational signatures of either type. Mutational signatures from the pmsignature data source are expanded to probabilistic vectors of 96-possible mutation types, the same model specification used by COSMIC, and then compared to COSMIC signatures. Cosine similarity measures the extent of signature similarity. iMutSig provides a simple and user-friendly web application allowing researchers to compare signatures from COSMIC to those from pmsignature, and vice versa. Furthermore, iMutSig allows users to input a self-defined mutational signature and examine its similarity to published signatures from both data sources. iMutSig is accessible online and source code is available for download on GitHub.
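
The expansion step described above can be sketched as follows. This is a minimal illustration, assuming a signature in the independent-features style with one probability distribution over the six substitution types and one over each flanking base; under that assumption each of the 96 mutation-type probabilities is a product of three factors. The function `expand_to_96` and the label format are hypothetical and are not taken from iMutSig or pmsignature.

```python
from itertools import product

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"


def expand_to_96(sub_probs, five_prime_probs, three_prime_probs):
    """Expand independent per-feature probabilities into a 96-type vector.

    Each argument maps a category (substitution or base) to its probability.
    Independence between features is the modelling assumption here.
    """
    vector = {}
    for sub, left, right in product(SUBSTITUTIONS, BASES, BASES):
        label = f"{left}[{sub}]{right}"  # e.g. "A[C>T]G"
        vector[label] = (
            sub_probs[sub] * five_prime_probs[left] * three_prime_probs[right]
        )
    return vector


if __name__ == "__main__":
    uniform_sub = {s: 1 / 6 for s in SUBSTITUTIONS}
    uniform_base = {b: 1 / 4 for b in BASES}
    expanded = expand_to_96(uniform_sub, uniform_base, uniform_base)
    print(len(expanded), "mutation types; total probability:", sum(expanded.values()))
```

Because each input distribution sums to one, the expanded vector also sums to one, so it can be compared directly with a 96-type signature.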


2021 ◽  
Author(s):  
Erik N Bergstrom ◽  
Jens-Christian Luebeck ◽  
Mia Petljak ◽  
Vineet Bafna ◽  
Paul S. Mischel ◽  
...  

Clustered somatic mutations are common in cancer genomes with prior analyses revealing several types of clustered single-base substitutions, including doublet- and multi-base substitutions, diffuse hypermutation termed omikli, and longer strand-coordinated events termed kataegis. Here, we provide a comprehensive characterization of clustered substitutions and clustered small insertions and deletions (indels) across 2,583 whole-genome sequenced cancers from 30 cancer types. While only 3.7% of substitutions and 0.9% of indels were found to be clustered, they contributed 8.4% and 6.9% of substitution and indel drivers, respectively. Multiple distinct mutational processes gave rise to clustered indels including signatures enriched in tobacco smokers and homologous-recombination deficient cancers. Doublet-base substitutions were caused by at least 12 mutational processes, while the majority of multi-base substitutions were generated by either tobacco smoking or exposure to ultraviolet light. Omikli events, previously attributed to the activity of APOBEC3 deaminases, accounted for a large proportion of clustered substitutions. However, only 16.2% of omikli matched APOBEC3 patterns with experimental validation confirming additional mutational processes giving rise to omikli. Kataegis was generated by multiple mutational processes with 76.1% of all kataegic events exhibiting AID/APOBEC3-associated mutational patterns. Co-occurrence of APOBEC3 kataegis and extrachromosomal-DNA (ecDNA) was observed in 31% of samples with ecDNA. Multiple distinct APOBEC3 kataegic events were observed on most mutated ecDNA. ecDNA containing known cancer genes exhibited both positive selection and kataegic hypermutation. Our results reveal the diversity of clustered mutational processes in human cancer and the role of APOBEC3 in recurrently mutating and fueling the evolution of ecDNA.


2021 ◽  
Author(s):  
Mia Petljak ◽  
Kevan Chu ◽  
Alexandra Dananberg ◽  
Erik N. Bergstrom ◽  
Patrick von Morgen ◽  
...  

Abstract. The APOBEC3 family of cytidine deaminases is widely speculated to be a major source of somatic mutations in cancer [1–3]. However, causal links between APOBEC3 enzymes and mutations in human cancer cells have not been established. The identity of the APOBEC3 paralog(s) that may act as prime drivers of mutagenesis and the mechanisms underlying different APOBEC3-associated mutational signatures are unknown. To directly investigate the roles of APOBEC3 enzymes in cancer mutagenesis, candidate APOBEC3 genes were deleted from cancer cell lines recently found to naturally generate APOBEC3-associated mutations in episodic bursts [4]. Deletion of the APOBEC3A paralog severely diminished the acquisition of mutations of speculative APOBEC3 origins in breast cancer and lymphoma cell lines. APOBEC3 mutational burdens were undiminished in APOBEC3B knockout cell lines. APOBEC3A deletion reduced the appearance of the clustered mutation types kataegis and omikli, which are frequently found in cancer genomes. The uracil glycosylase UNG and the translesion polymerase REV1 were found to play critical roles in the generation of mutations induced by APOBEC3A. These data represent the first evidence for a long-postulated hypothesis that APOBEC3 deaminases generate prevalent clustered and non-clustered mutational signatures in human cancer cells, identify APOBEC3A as a driver of episodic mutational bursts, and dissect the roles of the relevant enzymes in generating the associated mutations in breast cancer and B cell lymphoma cell lines.


2021 ◽  
Author(s):  
John Maciejowski ◽  
Mia Petljak ◽  
Kevan Chu ◽  
Alexandra Dananberg ◽  
Erik Bergstrom ◽  
...  

Abstract. The APOBEC3 family of cytidine deaminases is widely speculated to be a major source of somatic mutations in cancer [1–3]. However, causal links between APOBEC3 enzymes and mutations in human cancer cells have not been established. The identity of the APOBEC3 paralog(s) that may act as prime drivers of mutagenesis and the mechanisms underlying different APOBEC3-associated mutational signatures are unknown. To directly investigate the roles of APOBEC3 enzymes in cancer mutagenesis, candidate APOBEC3 genes were deleted from cancer cell lines recently found to naturally generate APOBEC3-associated mutations in episodic bursts [4]. Deletion of the APOBEC3A paralog severely diminished the acquisition of mutations of speculative APOBEC3 origins in breast cancer and lymphoma cell lines. APOBEC3 mutational burdens were undiminished in APOBEC3B knockout cell lines. APOBEC3A deletion reduced the appearance of the clustered mutation types kataegis and omikli, which are frequently found in cancer genomes. The uracil glycosylase UNG and the translesion polymerase REV1 were found to play critical roles in the generation of mutations induced by APOBEC3A. These data represent the first evidence for a long-postulated hypothesis that APOBEC3 deaminases generate prevalent clustered and non-clustered mutational signatures in human cancer cells, identify APOBEC3A as a driver of episodic mutational bursts, and dissect the roles of the relevant enzymes in generating the associated mutations in breast cancer and B cell lymphoma cell lines.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 586
Author(s):  
Zhi Yang ◽  
Priyatama Pandey ◽  
Paul Marjoram ◽  
Kimberly D. Siegmund

There are two frameworks for characterizing mutational signatures which are commonly used to describe the nucleotide patterns that arise from mutational processes. Estimated mutational signatures from fitting these two methods in human cancer can be found online, in the Catalogue Of Somatic Mutations In Cancer (COSMIC) website or a GitHub repository. The two frameworks make differing assumptions regarding independence of base pairs and for that reason may produce different results. Consequently, there is a need to compare and contrast the results of the two methods, but no such tool currently exists. In this paper, we provide a simple and intuitive interface that allows comparisons of pairs of mutational signatures to be easily performed. Cosine similarity measures the extent of signature similarity. To compare mutational signatures of different formats, one signature type (COSMIC or pmsignature) is converted to the format of the other before the signatures are compared. iMutSig provides a simple and user-friendly web application allowing researchers to download published mutational signatures of either type and to compare signatures from COSMIC to those from pmsignature, and vice versa. Furthermore, iMutSig allows users to input a self-defined mutational signature and examine its similarity to published signatures from both data sources. iMutSig is accessible online and source code is available for download from GitHub.
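
The cosine similarity comparison mentioned above can be sketched in a few lines. This is a minimal illustration that assumes both signatures have already been converted to the same format and are given as equal-length probability vectors (for example, 96 mutation types); the function name `cosine_similarity` is chosen here for illustration and is not taken from the iMutSig source code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same number of mutation types")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two toy 4-type "signatures" used only to demonstrate the call.
    sig_a = [0.7, 0.1, 0.1, 0.1]
    sig_b = [0.6, 0.2, 0.1, 0.1]
    print(round(cosine_similarity(sig_a, sig_b), 3))
```

For non-negative vectors such as mutational signatures the value lies between 0 (no shared mutation types) and 1 (identical profiles up to scaling).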


2018 ◽  
Author(s):  
Ludmil B Alexandrov ◽  
Jaegil Kim ◽  
Nicholas J Haradhvala ◽  
Mi Ni Huang ◽  
Alvin WT Ng ◽  
...  

Abstract. Somatic mutations in cancer genomes are caused by multiple mutational processes each of which generates a characteristic mutational signature. Using 84,729,690 somatic mutations from 4,645 whole cancer genome and 19,184 exome sequences encompassing most cancer types we characterised 49 single base substitution, 11 doublet base substitution, four clustered base substitution, and 17 small insertion and deletion mutational signatures. The substantial dataset size compared to previous analyses enabled discovery of new signatures, separation of overlapping signatures and decomposition of signatures into components that may represent associated, but distinct, DNA damage, repair and/or replication mechanisms. Estimation of the contribution of each signature to the mutational catalogues of individual cancer genomes revealed associations with exogenous and endogenous exposures and defective DNA maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes contributing to the development of human cancer including a comprehensive reference set of mutational signatures in human cancer.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marleen M. Nieboer ◽  
Luan Nguyen ◽  
Jeroen de Ridder

Abstract. Over the past years, large consortia have been established to fuel the sequencing of whole genomes of many cancer patients. Despite the increased abundance of tools to study the impact of SNVs, non-coding SVs have been largely ignored in these data. Here, we introduce svMIL2, an improved version of our Multiple Instance Learning-based method to study the effect of somatic non-coding SVs disrupting boundaries of TADs and CTCF loops in 1646 cancer genomes. We demonstrate that svMIL2 predicts pathogenic non-coding SVs with an average AUC of 0.86 across 12 cancer types, and identifies non-coding SVs affecting well-known driver genes. The disruption of active (super) enhancers in open chromatin regions appears to be a common mechanism by which non-coding SVs exert their pathogenicity. Finally, our results reveal that the contribution of pathogenic non-coding SVs as opposed to driver SNVs may vary widely between cancers, with notably high numbers of genes being disrupted by pathogenic non-coding SVs in ovarian and pancreatic cancer. Taken together, our machine learning method offers a potent way to prioritize putatively pathogenic non-coding SVs and leverage non-coding SVs to identify driver genes. Moreover, our analysis of 1646 cancer genomes demonstrates the importance of including non-coding SVs in cancer diagnostics.

