CANCERSIGN: a user-friendly and robust tool for identification and classification of mutational signatures and patterns in cancer genomes

2018 ◽  
Author(s):  
Masroor Bayati ◽  
Hamid Reza Rabiee ◽  
Mehrdad Mehrbod ◽  
Fatemeh Vafaee ◽  
Diako Ebrahimi ◽  
...  

Analyses of large somatic mutation datasets using advanced computational algorithms have revealed at least 30 independent mutational signatures in tumor samples. These studies have been instrumental in the identification and quantification of the responsible endogenous and exogenous molecular processes in cancer. The quantitative approach used to deconvolute mutational signatures is becoming an integral part of cancer research. A stand-alone tool with a user-friendly graphical interface for analysis of cancer mutational signatures is therefore needed. In this manuscript, we introduce CANCERSIGN, an open access bioinformatics tool that takes raw mutation data (BED files) as input and identifies 3-mer and 5-mer mutational signatures. CANCERSIGN enables users to identify signatures within whole genome, whole exome or pooled samples. It can also identify signatures in specific, user-defined regions of the genome. Additionally, the tool enables users to cluster tumor samples based on either the raw mutation counts or the proportion of mutational signatures in each sample. Using this tool, we analysed all the whole genome somatic mutation datasets profiled by the International Cancer Genome Consortium (ICGC) and identified a number of novel signatures. By examining signatures found in exonic and non-exonic regions of the genome using WGS, and comparing these to signatures found in WES data, we observe that WGS can identify additional signatures that are enriched in the non-coding regions of the genome, while the deeper sequencing of WES may help identify weak signatures that are otherwise missed in shallower WGS data.
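To make the idea of a 3-mer mutational catalog concrete, the following is a minimal sketch of how single-base substitutions are commonly folded into 96 pyrimidine-centred trinucleotide channels before signature extraction. It is not CANCERSIGN's code: the function names (`channel`, `all_channels`, `catalog`) and the input shape (a list of `(3-mer context, alternate base)` pairs) are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def channel(context, alt):
    """Map a substitution to its pyrimidine-centred 3-mer channel.

    context: reference 3-mer with the mutated base in the middle.
    alt:     the alternate (mutated) base.
    Purine references (A/G) are reported on the opposite strand, so every
    channel has C or T in the middle, e.g. 'A[C>T]G'.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        context = revcomp(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def all_channels():
    """Enumerate the 96 channels: 6 substitution types x 16 flanking pairs."""
    channels = []
    for ref in "CT":
        for alt in "ACGT":
            if alt == ref:
                continue
            for five, three in product("ACGT", repeat=2):
                channels.append(f"{five}[{ref}>{alt}]{three}")
    return channels


def catalog(mutations):
    """Count mutations per channel, returned in all_channels() order."""
    counts = Counter(channel(ctx, alt) for ctx, alt in mutations)
    return [counts.get(ch, 0) for ch in all_channels()]


# Hypothetical input: (reference 3-mer, alternate base) pairs for one sample.
sample = [("ACG", "T"), ("TGA", "A"), ("CGT", "A")]
vector = catalog(sample)
```

A vector like `vector`, stacked across samples, gives the samples-by-channels count matrix that signature deconvolution methods take as input. A 5-mer catalog follows the same pattern with two flanking bases on each side.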

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Masroor Bayati ◽  
Hamid R. Rabiee ◽  
Mehrdad Mehrbod ◽  
Fatemeh Vafaee ◽  
Diako Ebrahimi ◽  
...  

Author(s):  
Elana J. Fertig ◽  
Robbert Slebos ◽  
Christine H. Chung

Overview: Sequencing of the human genome was completed in 2001. Building on the technology and experience of whole-exome sequencing, numerous cancer genomes have been sequenced, including head and neck squamous cell carcinoma (HNSCC) in 2011. Although DNA sequencing data reveals a complex genome with numerous mutations, the biologic interaction and clinical significance of the overall genetic aberrations are largely unknown. Comprehensive analyses of the tumors using genomics and proteomics beyond sequencing data can potentially accelerate the rate and number of biomarker discoveries to improve biology-driven classification of tumors for prognosis and patient selection for a specific therapy. In this review, we will summarize the current genomic and proteomic technologies, general biomarker-discovery paradigms using the technology and published data in HNSCC—including potential clinical applications and limitations.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jing Chen ◽  
Jun-tao Guo

Abstract: Insertions and deletions (indels) represent one of the major variation types in the human genome and have been implicated in diseases including cancer. To study the features of somatic indels in different cancer genomes, we investigated the indels from two large samples of cancer types: invasive breast carcinoma (BRCA) and lung adenocarcinoma (LUAD). Besides mapping somatic indels in both coding and untranslated regions (UTRs) from the cancer whole exome sequences, we investigated the overlap between these indels and transcription factor binding sites (TFBSs), the key elements for regulation of gene expression that have been found in both coding and non-coding sequences. Compared to the germline indels in healthy genomes, somatic indels in cancer genomes contain more coding indels, with a higher than expected proportion of frame-shift (FS) indels. LUAD has a higher ratio of deletions and higher coding and FS indel rates than BRCA. More importantly, these somatic indels in cancer genomes tend to be located in sequences with important functions: they can affect the core secondary structures of proteins and have a larger overlap with predicted TFBSs in coding regions than the germline indels. The somatic CDS indels are also enriched in highly conserved nucleotides when compared with germline CDS indels.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 5072-5072
Author(s):  
Simon Yuen Fai Fu ◽  
Elie Ritch ◽  
Cameron Herberts ◽  
Steven Yip ◽  
Daniel Khalaf ◽  
...  

5072 Background: A small proportion of metastatic PC exhibit outlier somatic mutation (mut) rates exceeding the average of 4.4 mut/Mb. The incidence, clinical course and treatment response of pts with hypermutation (HM) is poorly characterised. Methods: We performed targeted sequencing from a panel of PC genes using plasma cell-free DNA samples collected from metastatic castration-resistant prostate cancer (mCRPC) pts and calculated somatic mutation burden. HM samples were additionally subjected to whole exome sequencing to determine trinucleotide mutational signatures and microsatellite instability (MSI). Clinical data was retrospectively collected and compared to a control cohort of 199 mCRPC pts. Results: 671 samples from 434 pts had ctDNA > 2% and were evaluable. 32 samples from 24 pts had > 11 mut/Mb and fell above the 95th percentile for mutation burden with a median mutation burden of 34 mut/Mb. 11 pts had deleterious mutations or homozygous deletions in mismatch repair (MMR) genes and 4 further pts had evidence of MMR deficiency (MMRd) from mutational signatures and MSI status. The remaining 9 pts had either BRCA2 mutations (n = 4), Kataegis (localized hypermutation, n = 3), or undefined causes for HM (n = 2). The incidence of MMRd was 3.5% (15/434), and germline MMRd was 0.2% (1/434). For MMRd pts with available clinical data (10/15) at diagnosis, the median age was 73.6 y, 70% had Gleason score ≥8, and 50% presented with M1 disease. Comparing the MMRd with the control cohort, median time from ADT to CRPC was 9.1 m (95% CI 6.9–11.4) vs. 18.2 m (95% CI 15.1–21.3), p = 0.001; median time from CRPC to death was 13.1 m (95% CI 0.3–25.9) vs. 40.1 m (95% CI 32.4–47.8), p < 0.001. Conclusions: HM and MMRd can be identified using liquid biopsy and could help to select pts for immunotherapy.
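The burden and incidence figures above follow from simple arithmetic, sketched below. The helper names and the covered-territory size used in the example are hypothetical; the abstract does not state the panel footprint, so only the incidence counts (15/434 and 1/434) come from the text.

```python
def mutations_per_mb(n_mutations, covered_bases):
    """Somatic mutation burden in mutations per megabase of covered sequence."""
    return n_mutations / (covered_bases / 1_000_000)


def incidence_percent(cases, total):
    """Incidence as a percentage, rounded to one decimal place."""
    return round(100 * cases / total, 1)


# Counts reported in the abstract.
mmrd_incidence = incidence_percent(15, 434)      # MMRd patients
germline_incidence = incidence_percent(1, 434)   # germline MMRd patients

# Hypothetical example: 17 somatic mutations over an assumed 0.5 Mb of
# covered sequence (the real panel size is not given in the abstract).
example_burden = mutations_per_mb(17, 500_000)
```

Because the burden is normalised by covered sequence, values from a small targeted panel and from whole exome data are only comparable when each uses its own covered territory as the denominator.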


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 2022-2022
Author(s):  
Pauline Robbe ◽  
Kate Ridout ◽  
Jennifer Becq ◽  
Miao He ◽  
Ruth Clifford ◽  
...  

Abstract. Background: Chemo-immunotherapy (CIT) with fludarabine, cyclophosphamide, and rituximab (FCR) is the standard of care in frontline treatment of CLL. With this approach, 25% of patients relapse within 24 months, whereas approximately one third of patients with hypermutated immunoglobulin heavy chains (IgHV) achieve a functional cure (Hallek et al., Lancet, 2010; Tam et al., Blood, 2014; Fischer et al., Blood, 2015; Thompson et al., Blood, 2016). So far, mutations and/or deletions of TP53 remain the only predictive marker screened for in routine clinical practice, accounting for only one third of patients relapsing early after CIT. Recent next-generation sequencing (NGS) studies have revealed novel candidate predictors of early relapse, including somatic mutations in RPS15 (Landau et al., Nature, 2015) and SAMHD1 (Clifford et al., submitted). Taken together with TP53 disruption, these only occur in a subset of high-risk patients. Here, we present a comprehensive analysis of high-risk patients using Whole Genome Sequencing (WGS). Patients and Methods: Using WGS we investigated 149 CLL patients from 5 national UK clinical trials: CLEAR (n=8), RIAltO (n=45), CLL 210 (n=22), ARCTIC (n=32) and AdMIRe (n=42). The two first-line FCR-based clinical trials (ARCTIC and AdMIRe) were studied in most detail: 56 patients relapsed within 24 months; this group will be referred to as high-risk patients. Leukemia samples (peripheral blood) and germline samples (saliva) were collected for each patient. We performed WGS on the HiSeqX (Illumina). After read alignment, we detected somatic variants using Strelka 2.4.7 for small variant detection (SNVs and InDels), Manta 0.28.0 for structural variant (SV) detection, and Canvas 1.3.1 for copy number variant (CNV) detection (Illumina). Non-coding regions were annotated with information from primary CLL, CLL cell lines and B-cell ENCODE databases.
We interrogated the data at a gene scale and global level in order to identify patterns of early relapsing patients. Operative mutational signatures were analysed according to Alexandrov et al. (Nature, 2013). Putative regions of kataegis were calculated based on Lawrence et al. (Nature, 2013) and Alexandrov et al. (Nature, 2013). Results: The mean coverage for CLL tumour and germline samples was 105.2X and 33.7X, respectively. The analysis of the whole cohort highlighted 1,723,603 somatic SNVs (mean= 11,570/sample) and 555,179 InDels (mean= 3,726/sample). The somatic SNV spectrum consisted mainly of C>T/G>A mutations (30% of total SNVs reported), as previously described. The analysis of 13,490 somatic functional SNVs and InDels revealed novel candidate genes among the most commonly mutated in the cohort. In high-risk patients, we noticed an enrichment of mutations in known genes such as TP53, genes of the NF-κB pathway and novel candidate genes previously reported in other cancers. A specific analysis of the functional coding mutations of known CLL driver genes revealed ATM, SF3B1 and IGLL5 as the most commonly mutated genes in FCR responders, compared to TP53, RPS15 and EGR2 in high-risk patients. In-depth analysis of somatic non-coding regions also identified potential new candidate regions associated with early relapse. Next, we investigated 52,871 CNAs (mean= 380/sample) and 29,080 SVs (mean= 195/sample) and identified, as expected, del13q, del17p, del11q and tri12 as the most frequent aberrations. In addition, we identified SVs across genes of interest in CLL, for instance TP53, ATM and BIRC3. Finally, we performed global genome analyses, with investigation of mutational signatures and kataegis analyses highlighting hypermutated candidate regions, including the previously described IGLL5 gene. Conclusion: Here we present initial analysis of WGS data on 149 CLL patients from 5 UK clinical trials.
Different patterns of mutations between low- and high-risk clinical groups are suggested. More detailed analysis with greater numbers of samples is ongoing and will determine the true clinical significance of these preliminary findings. The possibility of using WGS to aid clinical decision-making is becoming a realistic goal. Disclosures: Becq: Illumina: Employment. He: Illumina: Employment. Pettitt: Celgene: Speakers Bureau; Gilead: Research Funding, Speakers Bureau; Roche: Research Funding, Speakers Bureau; Infinity: Research Funding. Hillmen: Pharmacyclics: Research Funding; Janssen: Honoraria, Research Funding; Roche: Honoraria, Research Funding; Gilead: Honoraria, Research Funding; Abbvie: Research Funding. Bentley: Illumina: Employment. Schuh: Gilead: Consultancy, Honoraria, Research Funding; Roche, Janssen, Novartis, Celgene, Abbvie: Consultancy, Honoraria.
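The kataegis analysis mentioned in this abstract can be illustrated with a small sketch based on inter-mutation distance. The thresholds below (at least six consecutive mutations with a mean spacing of at most 1,000 bp) are a commonly quoted heuristic and are treated here as assumptions; the abstract does not state the exact parameters used in the study. The function name `kataegis_regions` is likewise hypothetical.

```python
def kataegis_regions(positions, min_mutations=6, max_mean_distance=1000):
    """Flag putative localized hypermutation on a single chromosome.

    positions: genomic coordinates of somatic mutations on one chromosome.
    A window of `min_mutations` consecutive mutations is flagged when the
    mean distance between neighbours is <= `max_mean_distance` (both
    thresholds are assumed defaults). Overlapping windows are merged.
    Returns a list of (start, end) tuples.
    """
    pos = sorted(positions)
    k = min_mutations
    regions = []
    for i in range(len(pos) - k + 1):
        start, end = pos[i], pos[i + k - 1]
        if (end - start) / (k - 1) <= max_mean_distance:
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of starting a new one.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Hypothetical coordinates: one dense cluster followed by sparse mutations.
clustered = [100, 300, 500, 700, 900, 1100, 50_000, 120_000]
found = kataegis_regions(clustered)
```

In practice the function would be run per chromosome and per sample, and candidate regions would then be reviewed against known hypermutation targets.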


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Itay Sason ◽  
Yuexi Chen ◽  
Mark D.M. Leiserson ◽  
Roded Sharan

Abstract: Mutational signatures are key to understanding the processes that shape cancer genomes, yet their analysis requires relatively rich whole-genome or whole-exome mutation data. Recently, orders-of-magnitude sparser gene-panel-sequencing data have become increasingly available in the clinic. To deal with such sparse data, we suggest a novel mixture model. In application to simulated and real gene-panel sequences, it is shown to outperform current approaches and yield mutational signatures and patient stratifications that are in higher agreement with the literature. We further demonstrate its utility in several clinical settings, successfully predicting therapy benefit and patient groupings from MSK-IMPACT pan-cancer data. Availability: https://github.com/itaysason/Mix-MMM.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Flavia Mascagni ◽  
Gabriele Usai ◽  
Andrea Cavallini ◽  
Andrea Porceddu

Abstract: We identified and characterized the pseudogene complements of five plant species: four dicots (Arabidopsis thaliana, Vitis vinifera, Populus trichocarpa and Phaseolus vulgaris) and one monocot (Oryza sativa). Retroposition was considered of modest importance for pseudogene formation in all investigated species except V. vinifera, which showed an unusually high number of retro-pseudogenes in non-coding genic regions. By using a pipeline for the classification of sequence duplicates in plant genomes, we compared the relative importance of whole genome, tandem, proximal, transposed and dispersed duplication modes in the pseudogene and functional gene complements. Pseudogenes showed a higher tendency than functional genes towards genomic dispersion. Dispersed pseudogenes were predominantly fragmented and showed high sequence divergence at flanking regions. In contrast, those deriving from whole genome duplication were proportionally fewer than expected based on observations on functional loci and showed higher levels of flanking sequence conservation than dispersed pseudogenes. Pseudogenes deriving from tandem and proximal duplications were in excess compared to functional loci, probably reflecting the high evolutionary rate associated with these duplication modes in plant genomes. These data are compatible with high rates of sequence turnover at neutral sites and with duplication mechanisms mediated by double-strand break repair.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Xuan Zong ◽  
Ying Zhang ◽  
Xinxin Peng ◽  
Dongyan Cao ◽  
Mei Yu ◽  
...  

Abstract: Yolk sac tumors (YSTs) are a major histological subtype of malignant ovarian germ cell tumors with a relatively poor prognosis. The molecular basis of this disease has not been thoroughly characterized at the genomic level. Here we perform whole-exome and RNA sequencing on 41 clinical tumor samples from 30 YST patients, with distinct responses to cisplatin-based chemotherapy. We show that microsatellite instability status and mutational signatures are informative of chemoresistance. We identify somatic driver candidates, including significantly mutated genes KRAS and KIT and copy-number alteration drivers, including deleted ARID1A and PARK2, and amplified ZNF217, CDKN1B, and KRAS. YSTs have very infrequent TP53 mutations, whereas the tumors from patients with abnormal gonadal development contain both KRAS and TP53 mutations. We further reveal a role of OVOL2 overexpression in YST resistance to cisplatin. This study lays a critical foundation for understanding key molecular aberrations in YSTs and developing related therapeutic strategies.

