A mixture model for signature discovery from sparse mutation data

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Itay Sason ◽  
Yuexi Chen ◽  
Mark D.M. Leiserson ◽  
Roded Sharan

Abstract: Mutational signatures are key to understanding the processes that shape cancer genomes, yet their analysis requires relatively rich whole-genome or whole-exome mutation data. Recently, orders-of-magnitude sparser gene-panel-sequencing data have become increasingly available in the clinic. To deal with such sparse data, we suggest a novel mixture model. In application to simulated and real gene-panel sequences, the model is shown to outperform current approaches and yield mutational signatures and patient stratifications that are in higher agreement with the literature. We further demonstrate its utility in several clinical settings, successfully predicting therapy benefit and patient groupings from MSK-IMPACT pan-cancer data. Availability: https://github.com/itaysason/Mix-MMM.

Author(s):  
Elana J. Fertig ◽  
Robbert Slebos ◽  
Christine H. Chung

Overview: Sequencing of the human genome was completed in 2001. Building on the technology and experience of whole-exome sequencing, numerous cancer genomes have been sequenced, including head and neck squamous cell carcinoma (HNSCC) in 2011. Although DNA sequencing data reveals a complex genome with numerous mutations, the biologic interaction and clinical significance of the overall genetic aberrations are largely unknown. Comprehensive analyses of the tumors using genomics and proteomics beyond sequencing data can potentially accelerate the rate and number of biomarker discoveries to improve biology-driven classification of tumors for prognosis and patient selection for a specific therapy. In this review, we will summarize the current genomic and proteomic technologies, general biomarker-discovery paradigms using the technology and published data in HNSCC—including potential clinical applications and limitations.


2020 ◽  
Author(s):  
Chao Chen ◽  
Songming Liu ◽  
Heng Xiong ◽  
Xi Zhang ◽  
Bo Li

Abstract: This study aimed to investigate the mutations in Esophageal Carcinoma (EC) for recurrent neoantigen identification. A total of 733 samples with whole exome sequencing (WES) mutation data and 1153 samples with target region sequencing data were obtained from 7 published studies and the GENIE database. Common HLA-I and HLA-II genotypes in both the TCGA cohort and the Chinese population were used to predict the probability of ‘public’ neoantigens in the dataset. Based on the integrated data, we not only obtained the most comprehensive EC mutation landscape so far, but also found 253 mutation sites that could be identified in at least 3 patients, including TP53 p.R248Q, PIK3CA p.E545K, PIK3CA p.E542K, KRAS p.G12D, PIK3CA p.H1047R and TP53 p.C83F. These mutations can be recognized by multiple common HLA molecules (HLA-A11:01, HLA-B57:01, HLA-A03:01, DRB1-0301, DRB1-1202, etc.) in the Chinese population and the TCGA cohort as potential public neoantigens. Overall, our analysis provides some potential targets for EC immunotherapy.


2018 ◽  
Author(s):  
Masroor Bayati ◽  
Hamid Reza Rabiee ◽  
Mehrdad Mehrbod ◽  
Fatemeh Vafaee ◽  
Diako Ebrahimi ◽  
...  

Analyses of large somatic mutation datasets, using advanced computational algorithms, have revealed at least 30 independent mutational signatures in tumor samples. These studies have been instrumental in identification and quantification of responsible endogenous and exogenous molecular processes in cancer. The quantitative approach used to deconvolute mutational signatures is becoming an integral part of cancer research. Therefore, development of a stand-alone tool with a user-friendly graphical interface for analysis of cancer mutational signatures is necessary. In this manuscript, we introduce CANCERSIGN as an open access bioinformatics tool that uses raw mutation data (BED files) as input, and identifies 3-mer and 5-mer mutational signatures. CANCERSIGN enables users to identify signatures within whole genome, whole exome or pooled samples. It can also identify signatures in specific regions of the genome (defined by user). Additionally, this tool enables users to perform clustering on tumor samples based on the raw mutation counts as well as using the proportion of mutational signatures in each sample. Using this tool, we analysed all the whole genome somatic mutation datasets profiled by the International Cancer Genome Consortium (ICGC) and identified a number of novel signatures. By examining signatures found in exonic and non-exonic regions of the genome using WGS and comparing this to signatures found in WES data we observe that WGS can identify additional non-exonic signatures that are enriched in the non-coding regions of the genome while the deeper sequencing of WES may help identify weak signatures that are otherwise missed in shallower WGS data.
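As an illustration of the "3-mer" input that signature tools of this kind work from, the sketch below maps a single-base substitution and its flanking bases to the conventional pyrimidine-centred channel label and tallies the counts per channel. It is a minimal sketch and not CANCERSIGN's code: the function names `mutation_channel` and `count_channels` and the input shape (a reference 3-mer plus the alternate base) are assumptions made for the example.

```python
from collections import Counter

# Illustrative sketch only; names and input format are hypothetical,
# not taken from CANCERSIGN.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_channel(context, alt):
    """Map a reference 3-mer and an alternate base to a label such as 'A[C>T]G'.

    Mutations whose reference base is a purine (A or G) are reported on the
    opposite strand, so the mutated base is always C or T.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3:
        raise ValueError("context must be exactly three bases")
    ref = context[1]
    if ref == alt:
        raise ValueError("alternate base equals reference base")
    if ref in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def count_channels(mutations):
    """Tally (context, alt) pairs into per-channel counts for one sample."""
    return Counter(mutation_channel(context, alt) for context, alt in mutations)
```

A count vector of this kind, one per sample, is the usual starting point for signature deconvolution; 5-mer contexts would extend the same idea with two flanking bases on each side.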


2018 ◽  
Vol 39 (3) ◽  
pp. 159-167 ◽  
Author(s):  
Keiichi HATAKEYAMA ◽  
Takeshi NAGASHIMA ◽  
Kenichi URAKAMI ◽  
Keiichi OHSHIMA ◽  
Masakuni SERIZAWA ◽  
...  

2015 ◽  
Vol 32 (6) ◽  
pp. 926-928 ◽  
Author(s):  
Xuefeng Wang ◽  
Mengjie Chen ◽  
Xiaoqing Yu ◽  
Natapol Pornputtapong ◽  
Hao Chen ◽  
...  

Abstract Summary: In this article, we introduce a robust and efficient strategy for deriving global and allele-specific copy number alterations (CNA) from cancer whole exome sequencing data based on Log R ratios and B-allele frequencies. Applying the approach to the analysis of over 200 skin cancer samples, we demonstrate its utility for discovering distinct CNA events and for deriving ancillary information such as tumor purity. Availability and implementation: https://github.com/xfwang/CLOSE Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Jong Seop Kim ◽  
Hyoungseok Jeon ◽  
Hyeran Lee ◽  
Jung Min Ko ◽  
Yonghwan Kim ◽  
...  

Abstract: An 11-year-old Korean boy presented with short stature, hip dysplasia, radial head dislocation, carpal coalition, genu valgum, and fixed patellar dislocation and was clinically diagnosed with Steel syndrome. Scrutinizing the trio whole-exome sequencing data revealed novel compound heterozygous mutations of COL27A1 (c.[4229_4233dup]; [3718_5436del], p.[Gly1412Argfs*157];[Gly1240_Lys1812del]) in the proband, which were inherited from heterozygous parents. The maternal mutation was a large deletion encompassing exons 38–60, which was challenging to detect.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Xuan Zong ◽  
Ying Zhang ◽  
Xinxin Peng ◽  
Dongyan Cao ◽  
Mei Yu ◽  
...  

Abstract: Yolk sac tumors (YSTs) are a major histological subtype of malignant ovarian germ cell tumors with a relatively poor prognosis. The molecular basis of this disease has not been thoroughly characterized at the genomic level. Here we perform whole-exome and RNA sequencing on 41 clinical tumor samples from 30 YST patients, with distinct responses to cisplatin-based chemotherapy. We show that microsatellite instability status and mutational signatures are informative of chemoresistance. We identify somatic driver candidates, including significantly mutated genes KRAS and KIT and copy-number alteration drivers, including deleted ARID1A and PARK2, and amplified ZNF217, CDKN1B, and KRAS. YSTs have very infrequent TP53 mutations, whereas the tumors from patients with abnormal gonadal development contain both KRAS and TP53 mutations. We further reveal a role of OVOL2 overexpression in YST resistance to cisplatin. This study lays a critical foundation for understanding key molecular aberrations in YSTs and developing related therapeutic strategies.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Kelley Paskov ◽  
Jae-Yoon Jung ◽  
Brianna Chrisman ◽  
Nate T. Stockham ◽  
Peter Washington ◽  
...  

Abstract Background: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results: We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.
Conclusion: Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
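To make the notion of a Mendelian error concrete, the sketch below flags trio genotypes in which the child's two alleles cannot be explained by one allele from each parent. It is only a minimal illustration of the observable quantity the abstract refers to, not the authors' estimation method: it assumes autosomal sites, unphased genotypes given as allele pairs, and no de novo mutation, and the function names are hypothetical.

```python
# Illustrative sketch only; not the published method. Genotypes are
# unphased pairs of allele codes, e.g. (0, 1) for a heterozygous call.

def is_mendelian_consistent(child, mother, father):
    """Return True if the child could inherit one allele from each parent."""
    first, second = child
    return (first in mother and second in father) or (
        second in mother and first in father
    )


def mendelian_error_rate(trios):
    """Fraction of (child, mother, father) genotype triples that are inconsistent.

    This counts only errors that happen to break inheritance rules, so it is
    a lower bound on the true genotyping error rate, not an estimate of it.
    """
    trios = list(trios)
    if not trios:
        raise ValueError("no trio genotypes supplied")
    errors = sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trios
    )
    return errors / len(trios)
```

Turning such counts into per-sample precision and recall requires a statistical model of which errors are visible as Mendelian errors, which this sketch does not attempt.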


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Takumi Miura ◽  
Satoshi Yasuda ◽  
Yoji Sato

Abstract Background: Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency. Results: Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 giga base pairs (Gbp) of sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, and the p-value decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with sequencing data of 15 Gbp or more was estimated to be between 5 and 10%. Conclusions: For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold for low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
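The LOD definition above (the allele frequency at which RSD reaches 30% on a moving-average curve) can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' procedure: the window size, the use of the sample standard deviation, the linear interpolation at the crossing point, and the function names are all choices made for the example.

```python
import statistics

# Illustrative sketch only; window size and interpolation are assumptions.

def rsd_percent(values):
    """Relative standard deviation (sample SD divided by mean), in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def moving_average(values, window):
    """Simple unweighted moving average over consecutive windows."""
    return [
        statistics.mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


def estimate_lod(points, window=3, threshold=30.0):
    """Estimate the allele frequency at which smoothed RSD falls to threshold.

    points: iterable of (mean allele frequency, RSD in percent), one pair per
    mutation. Returns None if the smoothed curve never crosses the threshold.
    """
    ordered = sorted(points)
    freqs = moving_average([af for af, _ in ordered], window)
    rsds = moving_average([rsd for _, rsd in ordered], window)
    for i in range(len(freqs) - 1):
        if rsds[i] > threshold >= rsds[i + 1]:
            # Linear interpolation between the two smoothed points.
            fraction = (rsds[i] - threshold) / (rsds[i] - rsds[i + 1])
            return freqs[i] + fraction * (freqs[i + 1] - freqs[i])
    return None
```

In use, `rsd_percent` would be applied to the repeated allele-frequency measurements of each mutation, and the resulting (mean frequency, RSD) pairs passed to `estimate_lod`.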

