FIREVAT: finding reliable variants without artifacts in human cancer samples using etiologically relevant mutational signatures

2019, Vol 11 (1)
Author(s): Hyunbin Kim, Andy Jinseok Lee, Jongkeun Lee, Hyonho Chun, Young Seok Ju, ...

Abstract Background: Accurate identification of real somatic variants is a primary part of cancer genome studies and precision oncology. However, artifacts introduced in various steps of sequencing undermine confidence in variant calling. Current computational approaches to variant filtering involve intensive interrogation of Binary Alignment Map (BAM) files and require massive computing power, data storage, and manual labor. Recently, mutational signatures associated with sequencing artifacts have been extracted by the Pan-cancer Analysis of Whole Genomes (PCAWG) study. These spectra can be used to evaluate the refinement quality of a given set of somatic mutations. Results: Here we introduce a novel variant refinement software, FIREVAT (FInding REliable Variants without ArTifacts), which uses known spectra of sequencing artifacts extracted from one of the largest publicly available catalogs of human tumor samples. FIREVAT performs quick and efficient variant refinement that accurately removes artifacts and greatly improves the precision and specificity of somatic calls. We validated FIREVAT refinement performance using orthogonal sequencing datasets totaling 384 tumor samples with respect to ground truth. Our novel method achieved the highest level of performance compared to existing filtering approaches. Application of FIREVAT to an additional 308 The Cancer Genome Atlas (TCGA) samples demonstrated that FIREVAT refinement leads to identification of more biologically and clinically relevant mutational signatures as well as enrichment of sequence contexts associated with experimental errors. FIREVAT requires only a Variant Call Format (VCF) file and generates a comprehensive report of the variant refinement processes and outcomes for the user. Conclusions: In summary, FIREVAT facilitates a novel refinement strategy using mutational signatures to distinguish artifactual point mutations called in human cancer samples. 
We anticipate that FIREVAT results will further contribute to precision oncology efforts that rely on accurate identification of variants, especially in the context of analyzing mutational signatures that bear prognostic and therapeutic significance. FIREVAT is freely available at https://github.com/cgab-ncc/FIREVAT
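Signature-based refinement of this kind starts from the SNV spectrum of a sample. The sketch below shows only that input step under stated assumptions: it classifies each SNV into the standard 96-channel single-base-substitution (SBS-96) scheme, where purine references are folded onto the pyrimidine strand. The function names `sbs96_channel` and `sbs96_spectrum` are hypothetical, and the fitting of the spectrum to known artifact signatures that FIREVAT performs is not shown.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Map a trinucleotide reference context plus alt base to an SBS-96 label.

    The mutated base is the middle base of `context`. Purine (A/G) references
    are converted to the pyrimidine strand by reverse-complementing both the
    context and the alt base, so e.g. C[G>A]T and A[C>T]G share one channel.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or not set(context) <= set("ACGT"):
        raise ValueError("expected a 3-base A/C/G/T context")
    if len(alt) != 1 or alt not in "ACGT" or alt == context[1]:
        raise ValueError("alt must be a single base different from the reference")
    if context[1] in "AG":
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def sbs96_spectrum(snvs):
    """Count SNVs given as (trinucleotide_context, alt) pairs into SBS-96 channels."""
    return Counter(sbs96_channel(ctx, alt) for ctx, alt in snvs)


# Example: a G>A in context CGT is the same channel as a C>T in context ACG.
print(sbs96_spectrum([("ACG", "T"), ("CGT", "A"), ("TTA", "G")]))
```

Folding onto the pyrimidine strand is what reduces 192 strand-specific combinations to 96 channels, which is the resolution at which published SBS signatures are usually expressed.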

2022, Vol 13 (1)
Author(s): John K. L. Wong, Christian Aichmüller, Markus Schulze, Mario Hlevnjak, Shaymaa Elgaafary, ...

Abstract Cancer driver mutations are difficult to identify, especially in the non-coding part of the genome. Here, we present sigDriver, an algorithm dedicated to calling driver mutations. Using 3813 whole-genome sequenced tumors from the International Cancer Genome Consortium, The Cancer Genome Atlas Program, and a childhood pan-cancer cohort, we employ mutational signatures based on single-base substitutions in tri- and pentanucleotide contexts for hotspot discovery. Knowledge-based annotations of mutational hotspots reveal enrichment in coding regions and regulatory elements for 6 mutational signatures, including APOBEC and somatic hypermutation signatures. APOBEC activity is associated with 32 hotspots, of which 11 are known and 11 are putative regulatory drivers. Clusters of somatic single-nucleotide variants detected at hypermutation-associated hotspots are distinct from translocations or gene amplifications. Patients carrying APOBEC-induced PIK3CA driver mutations show lower occurrence of signature SBS39. In summary, sigDriver uncovers mutational processes associated with known and putative tumor drivers and hotspots, particularly in the non-coding regions of the genome.
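Hotspot discovery rests on two simple primitives: the sequence motif around a mutated base and how many tumors share a mutated position. The sketch below illustrates only those primitives as a plain recurrence count; it is not sigDriver's statistical model, and the names `pentanucleotide`, `recurrent_sites`, and the `min_samples` cutoff are hypothetical choices for illustration.

```python
from collections import defaultdict


def pentanucleotide(reference: str, pos: int) -> str:
    """Return the 5-base motif centred on 0-based position `pos` of `reference`."""
    if pos < 2 or pos + 3 > len(reference):
        raise ValueError("position too close to the sequence edge for a 5-mer")
    return reference[pos - 2 : pos + 3].upper()


def recurrent_sites(mutations, min_samples=3):
    """Keep (chrom, pos) sites mutated in at least `min_samples` distinct samples.

    `mutations` is an iterable of (sample_id, chrom, pos) tuples. A sample that
    reports the same site twice is counted once, so the count is the number of
    distinct tumors carrying a mutation at that position.
    """
    samples_at = defaultdict(set)
    for sample_id, chrom, pos in mutations:
        samples_at[(chrom, pos)].add(sample_id)
    return {site: len(ids) for site, ids in samples_at.items() if len(ids) >= min_samples}


calls = [("s1", "chr3", 100), ("s2", "chr3", 100), ("s2", "chr3", 100),
         ("s3", "chr3", 100), ("s1", "chr1", 5), ("s2", "chr1", 5)]
print(recurrent_sites(calls))                   # sites shared by >= 3 tumors
print(pentanucleotide("ACGTACGTAC", 4))         # 5-mer centred on index 4
```

A real hotspot caller would additionally model the expected background mutation rate per motif and signature, so that recurrence at intrinsically mutable contexts is not mistaken for selection.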


Author(s): Pieter-Jan van Dam, Steven Van Laere

Recent efforts by worldwide consortia such as The Cancer Genome Atlas and the International Cancer Genome Consortium have greatly advanced our knowledge of human cancer biology. Complete sets of human tumours characterized at the genomic, epigenomic, transcriptomic, or proteomic level are now available to the research community. These data were generated using high-throughput molecular profiling techniques such as microarrays and next-generation sequencing. The primary conclusion from current profiling experiments is that human cancer is a complex disease characterized by extreme molecular heterogeneity, both between and within the classical, tissue-defined cancer types. This molecular variety necessitates a paradigm shift in patient management, away from generalized therapy schemes and towards more personalized treatments. This chapter provides an overview of how molecular cancer profiling can help facilitate this transition. First, the state of the art of molecular breast cancer profiling is reviewed to provide a general background. Then, the most pertinent high-throughput molecular profiling techniques, along with various data mining techniques (e.g. unsupervised clustering, statistical learning), are discussed. Finally, the challenges and perspectives of molecular cancer profiling, including from the perspective of personalized medicine, are summarized.


PeerJ, 2016, Vol 4, pp. e1508
Author(s): Dakota Z. Derryberry, Matthew C. Cowperthwaite, Claus O. Wilke

We examined 55 technical sequencing replicates of Glioblastoma multiforme (GBM) tumors from The Cancer Genome Atlas (TCGA) to ascertain the degree of repeatability in calling single-nucleotide variants (SNVs). We used the same mutation-calling pipeline on all pairs of samples, and we measured the extent of the overlap between two replicates; that is, how many specific point mutations were found in both replicates. We further tested whether additional filtering increased or decreased the size of the overlap. We found that about half of the putative mutations identified in one sequencing run of a given sample were also identified in the second, and that this percentage remained steady throughout orders of magnitude of variation in the total number of mutations identified (from 23 to 10,966). We further found that using filtering after SNV-calling removed the overlap completely. We concluded that there is variation in the frequency of mutations in GBMs, and that while some filtering approaches preferentially removed putative mutations found in only one replicate, others removed a large fraction of putative mutations found in both.
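The overlap measure described above (the share of putative mutations from one run that reappear in the other) and the question of whether a filter removes shared or replicate-unique calls can be expressed as set operations. This is a minimal sketch assuming each variant is a hashable `(chrom, pos, ref, alt)` tuple; `replicate_overlap`, `filter_effect`, and the example filter are hypothetical, not the study's pipeline.

```python
def replicate_overlap(run_a, run_b):
    """Return (n_shared, fraction of run_a found in run_b, fraction of run_b found in run_a)."""
    a, b = set(run_a), set(run_b)
    shared = a & b
    frac_a = len(shared) / len(a) if a else 0.0
    frac_b = len(shared) / len(b) if b else 0.0
    return len(shared), frac_a, frac_b


def filter_effect(run_a, run_b, keep):
    """Apply the predicate `keep` to both runs and report what it removes.

    Returns (shared calls removed, replicate-unique calls removed). A filter
    that mostly removes replicate-unique calls is behaving as an artifact
    filter; one that removes many shared calls is discarding reproducible ones.
    """
    a, b = set(run_a), set(run_b)
    shared, unique = a & b, a ^ b
    removed_shared = sum(1 for v in shared if not keep(v))
    removed_unique = sum(1 for v in unique if not keep(v))
    return removed_shared, removed_unique


run1 = {("chr1", 10, "C", "T"), ("chr1", 20, "G", "A"), ("chr2", 5, "A", "G"), ("chr2", 9, "T", "C")}
run2 = {("chr1", 10, "C", "T"), ("chr1", 20, "G", "A"), ("chr3", 1, "C", "A")}
print(replicate_overlap(run1, run2))
# Hypothetical filter: drop all calls on chr2.
print(filter_effect(run1, run2, keep=lambda v: v[0] != "chr2"))
```

Reporting the overlap from both sides matters because the two replicates can differ widely in call count, so the fraction is asymmetric.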


2021, Vol 12
Author(s): Dongjie Shi, Lei Ao, Hua Yu, Yongzhi Xia, Juan Li, ...

Emerging studies suggest that chromobox homolog 8 (CBX8) may play a critical role in carcinogenesis and prognosis in human cancer. Using data available from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) database, we conducted a systematic analysis of the carcinogenic effects of the CBX8 gene. We analyzed these data with the TIMER2, GEPIA2, UALCAN, cBioPortal, Kaplan-Meier plotter, OncoLnc, STRING, HPA, and Oncomine web tools and the R software environment. CBX8 expression differed significantly between tumors and adjacent normal tissues in 27 tumor types. Moreover, CBX8 expression was closely associated with prognosis in some cancer types. Phosphorylation at some protein sites (such as S256) was significantly increased in tumors. Infiltration levels of CD8+ T cells, B cells, and cancer-associated fibroblasts were associated with CBX8 expression. Enrichment analysis indicated that the main biological activities of CBX8 relate to gene transcription and DNA damage repair. In conclusion, CBX8 expression was closely related to carcinogenesis and prognosis in some tumor types, a relationship that requires further experimental verification.


2020
Author(s): Christoffer Flensburg, Alicia Oshlack, Ian J. Majewski

Abstract Calling copy number alterations (CNAs) from RNA-Seq is challenging, because differences in gene expression mean that read depth across genes varies by several orders of magnitude and there is a paucity of informative single nucleotide polymorphisms (SNPs). We previously developed SuperFreq to analyse exome data of tumours by combining variant calling and copy number estimation in an integrated pipeline. Here we have applied the SuperFreq framework to the analysis of RNA sequencing (RNA-Seq) data, which allows for the detection of absolute and allele-sensitive CNAs. SuperFreq uses an error-propagation framework to combine and maximise the information available in the read depth and B-allele frequencies of SNPs (BAFs) to make CNA calls on RNA-Seq data. We used data from The Cancer Genome Atlas (TCGA) to compare the CNAs called from RNA-Seq with those generated from SNP arrays. When ploidy estimates were consistent, we found excellent agreement with CNAs called from DNA across over 98% of the genome for acute myeloid leukaemia (TCGA-AML, n=116) and 87% for colorectal cancer (TCGA-CRC, n=377), which has a much higher CNA burden. As expected, the sensitivity of CNA calling from RNA-Seq was dependent on gene density. Nonetheless, using RNA-Seq, SuperFreq detected 78% of CNA calls covering 100 or more genes with a precision of 94%. Recall dropped markedly for focal events, but this also depended on the signal intensity. For example, in the CRC cohort SuperFreq identified 100% (7/7) of cases with high-level amplification of ERBB2, where the copy number was typically >20, but identified only 6% (1/17) of cases with moderate amplification of IGF2, typically 4 or 5 copies over a smaller region (median 5 flanking genes for IGF2, compared to 20 for ERBB2). We were able to reproduce the relationship between mutational load and CNA profile in CRC using RNA-Seq alone. 
SuperFreq offers an integrated platform for identification of CNAs and point mutations from RNA-Seq in cancer transcriptomes. The software is implemented in R and is available through GitHub: https://github.com/ChristofferFlensburg/SuperFreq.
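Combining read depth and BAFs works because each allele-specific copy-number state predicts both quantities. The sketch below uses the textbook tumour/normal mixture model (assumed here, not taken from SuperFreq's R code) to compute the expected BAF and expected log2 depth ratio for a state with `major` and `minor` tumour copies at tumour purity `purity`, plus an inverse-variance weighted mean as one simple form of error propagation when pooling noisy per-gene estimates. All function names are hypothetical, and for RNA-Seq the depth ratio would additionally need normalising against expression in reference samples.

```python
from math import log2, sqrt


def _check(purity: float, major: int, minor: int) -> None:
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if purity == 1.0 and major + minor == 0:
        raise ValueError("no copies present in a pure sample")


def expected_baf(purity: float, major: int, minor: int) -> float:
    """Expected BAF of a germline het SNP whose B allele lies on the minor tumour
    haplotype, assuming the admixed normal cells are diploid and heterozygous."""
    _check(purity, major, minor)
    b_copies = purity * minor + (1.0 - purity) * 1
    total = purity * (major + minor) + (1.0 - purity) * 2
    return b_copies / total


def expected_log2_ratio(purity: float, major: int, minor: int) -> float:
    """Expected log2 read-depth ratio versus a diploid normal (copy-number effect only)."""
    _check(purity, major, minor)
    return log2((purity * (major + minor) + (1.0 - purity) * 2) / 2)


def inverse_variance_mean(estimates, std_errors):
    """Pool independent estimates, weighting each by 1/SE^2; return (mean, pooled SE)."""
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se**2 for se in std_errors]
    mean = sum(w * x for w, x in zip(weights, estimates)) / sum(weights)
    return mean, sqrt(1.0 / sum(weights))


# Copy-neutral LOH (2 major, 0 minor) at 50% purity shifts BAF but not depth.
print(expected_baf(0.5, 2, 0), expected_log2_ratio(0.5, 2, 0))
print(inverse_variance_mean([1.0, 3.0], [1.0, 1.0]))
```

The copy-neutral loss-of-heterozygosity example shows why both signals are needed: depth alone cannot see it, while the BAF moves from 0.5 to 0.25.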


Author(s): Xiaoyu He, Shanyu Chen, Ruilin Li, Xinyin Han, Zhipeng He, ...

Abstract Next-generation sequencing (NGS) technology has revolutionised human cancer research, particularly via detection of genomic variants with its ultra-high-throughput sequencing and increasing affordability. However, the flood of rich cancer genomics data has created significant challenges for its exploration and translation into biological insights. One of the difficulties in cancer genome sequencing is software selection. Currently, multiple tools are widely used to process NGS data in four stages: (1) raw sequence data pre-processing and quality control (QC), (2) sequence alignment, (3) variant calling, and (4) annotation and visualisation. However, the differences between these NGS tools, including their installation, merits, drawbacks and application, have not been fully appreciated. Therefore, a systematic review of the functionality and performance of NGS tools is required to provide cancer researchers with guidance on software and strategy selection. Another challenge is multidimensional QC of sequencing data: QC not only reports varied sequence data characteristics but also reveals deviations in diverse features, and it is essential for a meaningful and successful study. However, monitoring of QC metrics at specific steps, including alignment and variant calling, is neglected in certain pipelines such as the ‘Best Practices Workflows’ in GATK. In this review, we investigated the most widely used software for the fundamental analysis and QC of cancer genome sequencing data and provided instructions for selecting the most appropriate software and pipelines to ensure precise and efficient conclusions. We further discussed the prospects and new research directions for cancer genomics.
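As a concrete example of the first stage (raw-data QC), base qualities in FASTQ files are commonly stored as Phred+33 characters, so a quality score is the character code minus 33. The sketch below parses 4-line FASTQ records and keeps reads whose mean base quality meets a cutoff. It is a deliberately simplified illustration, not the behaviour of any specific QC tool reviewed here; `parse_fastq`, `mean_phred33`, `filter_reads`, and the default cutoff of 20 are hypothetical choices.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def mean_phred33(qual: str) -> float:
    """Mean Phred score of a Phred+33-encoded quality string ('I' -> 40, '#' -> 2)."""
    if not qual:
        raise ValueError("empty quality string")
    return sum(ord(c) - 33 for c in qual) / len(qual)


def filter_reads(lines, min_mean_q=20.0):
    """Return IDs of reads whose mean base quality is at least `min_mean_q`."""
    return [rid for rid, _seq, qual in parse_fastq(lines) if mean_phred33(qual) >= min_mean_q]


fastq = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACGT\n", "+\n", "####\n"]
print(filter_reads(fastq))
```

Mean quality is only one of many per-read metrics; production QC also inspects per-position quality, adapter content, and duplication, which is part of the "multidimensional" QC the review argues for.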


2015
Author(s): Lihua Zou

Despite large-scale efforts to systematically map the cancer genome, little is known about how the interplay of genetic and epigenetic alterations shapes the architecture of the transcriptome of human cancer. With the goal of constructing a system-level view of the deregulated pathways in cancer cells, we systematically investigated the functional organization of the transcriptomes of 10 tumor types using data sets generated by The Cancer Genome Atlas project (TCGA). Our analysis indicates that the human cancer transcriptome is organized into well-conserved modules of co-expressed genes. In particular, our analysis identified a set of conserved gene modules with distinct cancer hallmark themes involving cell cycle regulation, angiogenesis, innate and adaptive immune response, differentiation, metabolism and regulation of protein phosphorylation. Our analysis provides global views of the convergent transcriptome architecture of human cancer. The results of our analysis can serve as a foundation to link diverse genomic alterations to common transcriptomic features in human cancer.
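One simple way to turn "modules of co-expressed genes" into something computable is a correlation graph: connect two genes when their expression profiles across samples are strongly correlated, then take connected components as modules. The sketch below does exactly that with Pearson correlation and a hard threshold. It is a stand-in for illustration only (the abstract does not say which module-detection method was used), and `coexpression_modules`, the `threshold` of 0.8, and the toy gene profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_modules(expr, threshold=0.8):
    """Group genes into modules: link pairs with Pearson r >= threshold, return connected components.

    `expr` maps gene -> expression values across the same samples. Modules are
    returned as sorted gene lists, largest first; unlinked genes are singletons.
    """
    genes = list(expr)
    neighbours = {g: set() for g in genes}
    for g1, g2 in combinations(genes, 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            neighbours[g1].add(g2)
            neighbours[g2].add(g1)
    seen, modules = set(), []
    for start in genes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one connected component
            gene = stack.pop()
            component.append(gene)
            for nb in neighbours[gene]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(component))
    modules.sort(key=lambda m: (-len(m), m))
    return modules


toy = {"CDK1": [1, 2, 3, 4], "CCNB1": [2, 4, 6, 8], "VEGFA": [4, 3, 2, 1],
       "KDR": [8, 6, 4, 2], "Z": [5, 5, 5, 5]}
print(coexpression_modules(toy))
```

Hard thresholds are sensitive to the cutoff and to sample size; weighted network approaches and hierarchical clustering are common alternatives in practice.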


2020, Vol 36 (12), pp. 3637-3644
Author(s): Mark F Rogers, Tom R Gaunt, Colin Campbell

Abstract Motivation: Next-generation sequencing technologies have accelerated the discovery of single nucleotide variants in the human genome, stimulating the development of predictors that classify which of these variants are likely to be functional in disease and which are neutral. Recently, we proposed CScape, a method for discriminating between cancer driver mutations and presumed benign variants. For the neutral class, this method relied on benign germline variants found in the 1000 Genomes Project database. Discrimination could, therefore, be influenced by the distinction between germline and somatic variants, rather than between neutral variants and disease drivers. This motivates the present article, in which we consider predictive discrimination between recurrent and rare somatic single point mutations based solely on cancer data, as well as the distinction between these two somatic classes and germline single point mutations. Results: For somatic point mutations in coding and non-coding regions of the genome, we propose CScape-somatic, an integrative classifier for predictively discriminating between recurrent and rare variants in the human cancer genome. In this study, we use purely cancer genome data and investigate the distinction between minimally occurring and significantly recurrent somatic single point mutations in the human cancer genome. We show that this type of predictive distinction can give novel insight and may deliver more meaningful predictions in both coding and non-coding regions of the cancer genome. Tested on somatic mutations, CScape-somatic outperforms alternative methods, reaching 74% balanced accuracy in coding regions and 69% in non-coding regions; even higher accuracy may be achieved by using thresholds to isolate high-confidence predictions. Availability and implementation: Predictions and software are available at http://CScape-somatic.biocompute.org.uk/. 
Supplementary information: Supplementary data are available at Bioinformatics online.
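Balanced accuracy is the mean of sensitivity and specificity, which keeps a classifier from looking good simply by favouring the larger class. The sketch below computes it and shows one plausible reading of "using thresholds to isolate high-confidence predictions": only scores at or below a lower cutoff (called negative) or at or above an upper cutoff (called positive) are scored, at the cost of coverage. This two-cutoff scheme, the names `balanced_accuracy` and `high_confidence`, and the toy scores are assumptions for illustration, not CScape-somatic's published procedure.

```python
def balanced_accuracy(y_true, y_pred):
    """(sensitivity + specificity) / 2 for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def high_confidence(y_true, scores, low, high):
    """Score only confident predictions; return (balanced accuracy, fraction of cases kept).

    Scores <= `low` are called negative, scores >= `high` positive, and the
    uncertain band in between is left unpredicted.
    """
    if not low < high:
        raise ValueError("low must be below high")
    kept = [(t, 1 if s >= high else 0) for t, s in zip(y_true, scores) if s <= low or s >= high]
    if not kept:
        raise ValueError("no predictions pass the thresholds")
    truths, preds = zip(*kept)
    return balanced_accuracy(truths, preds), len(kept) / len(y_true)


y = [1, 1, 1, 0, 0, 0]          # 1 = recurrent, 0 = rare (toy labels)
s = [0.9, 0.6, 0.2, 0.1, 0.4, 0.7]
print(balanced_accuracy(y, [1 if v >= 0.5 else 0 for v in s]))
print(high_confidence(y, s, low=0.3, high=0.8))
```

Any accuracy gain from thresholding should always be reported together with the fraction of cases retained, since the two trade off directly.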


2018, Vol 36 (4_suppl), pp. 50-50
Author(s): Ari Rosenberg, Derek Wainwright, Alfred Rademaker, Carlos Galvez, Matthew Genet, ...

Background: Immune checkpoint inhibition of PD-L1 is emerging as an important therapeutic strategy for patients with advanced esophageal cancer. However, response rates to therapy remain low. IDO1 is a rate-limiting immunosuppressive enzyme that has emerged as an important immunotherapeutic target in human cancer. The role, expression pattern, and relevance of IDO1 in esophageal cancer are currently unknown. Here, we use gene expression analysis of The Cancer Genome Atlas and quantitative immunohistochemistry (IHC) to determine whether IDO1 contributes to poor prognosis in esophageal cancer. Methods: mRNA expression was assessed using Hi-RNA sequencing in an esophageal squamous cell carcinoma (SCC) cohort of 87 patients and an adenocarcinoma (AC) cohort of 97 patients. Survival data were obtained from The Cancer Genome Atlas. Patient survival was analyzed by the Kaplan-Meier method. For IHC, a second cohort of 93 esophageal SCC cases was stained for IDO1, PD-L1, and CD3ε, followed by light microscopic immunoscoring analysis. Correlation between markers was analyzed using Fisher’s exact test. Results: In the SCC cohort, median OS was 15.9 months for high versus 41.5 months for low IDO1 mRNA levels (p = 0.02). In the esophageal AC cohort, median OS was 20.1 months for high versus 58.6 months for low IDO1 mRNA levels (p = 0.036). For high versus low co-expression of IDO1 and PD-L1, median OS was 15.1 months versus 41.5 months in the SCC cohort and 13.7 months versus 41.5 months in the AC cohort. In the SCC IHC cohort, IDO1 staining correlated significantly with PD-L1 (p < 0.0001) and CD3ε (p < 0.0001), and PD-L1 expression also correlated significantly with CD3ε expression (p < 0.0001). Conclusions: Esophageal cancer with high IDO1 and PD-L1 levels is associated with significantly decreased survival. 
High expression of IDO1 and PD-L1 is significantly associated with a coincident intratumoral increase in T cells. These data suggest that combination therapies that simultaneously inhibit IDO1 and PD-(L)1 may enhance T-cell-mediated control of esophageal cancer in patients.
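The two statistics used above can be illustrated with small, self-contained sketches: the Kaplan-Meier product-limit estimator (median OS is the first time the survival curve drops to 0.5 or below) and a two-sided Fisher's exact test for a 2x2 marker table, summing the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed one. These are generic textbook implementations written for illustration; the function names and toy data are hypothetical and are not the authors' analysis code or patient data.

```python
from math import comb


def kaplan_meier(times, events):
    """Product-limit survival curve: list of (t, S(t)) at each distinct death time.

    `events` uses 1 for death and 0 for censoring; subjects censored at time t
    are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1] == 1
            censored += data[i][1] == 0
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


def km_median(times, events):
    """Smallest time with S(t) <= 0.5, or None if the curve never reaches 0.5."""
    return next((t for t, s in kaplan_meier(times, events) if s <= 0.5), None)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of top-left cell = x
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


months = [5, 8, 8, 12, 15, 20]
died = [1, 1, 0, 1, 0, 1]
print(km_median(months, died))
print(fisher_exact_2x2(3, 1, 1, 3))
```

Comparing median OS between high and low expression groups, as in the abstract, additionally needs a between-group test such as the log-rank test, which is not shown here.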

