SMuRF: Portable and accurate ensemble-based somatic variant calling

2018 ◽  
Author(s):  
Weitai Huang ◽  
Yu Amanda Guo ◽  
Karthik Muthukumar ◽  
Probhonjon Baruah ◽  
Meimei Chang ◽  
...  

Abstract Summary SMuRF is an ensemble method for predicting somatic point mutations (single nucleotide variants, SNVs) and small insertions/deletions (indels) in cancer genomes. The method integrates predictions and auxiliary features from different somatic mutation callers using a Random Forest machine learning approach. SMuRF is trained on community-curated tumor whole-genome sequencing data, is robust across cancer types, and achieves improved accuracy for both SNV and indel predictions in genome- and exome-level data. The software is user-friendly and portable by design, operating as an add-on to the community-developed bcbio-nextgen somatic variant calling pipeline.

2019 ◽  
Vol 35 (17) ◽  
pp. 3157-3159 ◽  
Author(s):  
Weitai Huang ◽  
Yu Amanda Guo ◽  
Karthik Muthukumar ◽  
Probhonjon Baruah ◽  
Mei Mei Chang ◽  
...  

Abstract Summary Somatic Mutation calling method using a Random Forest (SMuRF) integrates predictions and auxiliary features from multiple somatic mutation callers using a supervised machine learning approach. SMuRF is trained on community-curated matched tumor and normal whole-genome sequencing data. SMuRF predicts both SNVs and indels with high accuracy in genome- or exome-level sequencing data. Furthermore, the method is robust across multiple tested cancer types and predicts low allele frequency variants with high accuracy. In contrast to existing ensemble-based somatic mutation calling approaches, SMuRF works out-of-the-box and is orders of magnitude faster. Availability and implementation The method is implemented in R and available at https://github.com/skandlab/SMuRF. SMuRF operates as an add-on to the community-developed bcbio-nextgen somatic variant calling pipeline. Supplementary information Supplementary data are available at Bioinformatics online.
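SMuRF's Random Forest works on per-variant features drawn from several callers, so each caller's output first has to be merged into one row per candidate site. The Python sketch below illustrates only that merging step under stated assumptions: it is not SMuRF's implementation (which is written in R), and the caller names in `CALLERS` and the use of allele frequency as the single auxiliary feature are hypothetical choices made for illustration.

```python
# Hypothetical caller set; these names are placeholders, not SMuRF's callers.
CALLERS = ("caller_a", "caller_b", "caller_c")

# A candidate variant keyed as (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def build_feature_rows(
    calls: dict[str, dict[Variant, float]],
) -> dict[Variant, list[float]]:
    """Merge per-caller call sets into one feature row per candidate variant.

    `calls` maps a caller name to {variant: allele frequency reported by it}.
    Each row holds, for every caller in CALLERS, a 0/1 "called" flag followed
    by the reported allele frequency (0.0 when the caller did not report it).
    """
    candidates = set()
    for per_caller in calls.values():
        candidates.update(per_caller)

    rows: dict[Variant, list[float]] = {}
    for variant in sorted(candidates):
        row = []
        for caller in CALLERS:
            af = calls.get(caller, {}).get(variant)
            row.append(0.0 if af is None else 1.0)
            row.append(0.0 if af is None else af)
        rows[variant] = row
    return rows


if __name__ == "__main__":
    example = {
        "caller_a": {("chr1", 1000, "C", "T"): 0.21},
        "caller_b": {("chr1", 1000, "C", "T"): 0.19, ("chr2", 500, "G", "A"): 0.04},
    }
    for variant, row in build_feature_rows(example).items():
        print(variant, row)
```

Callers that did not report a site contribute zeros rather than missing values, so every row has the same length; in a supervised ensemble, rows labeled against a curated truth set would then train the classifier.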


2019 ◽  
Author(s):  
Tingting Gong ◽  
Vanessa M Hayes ◽  
Eva KF Chan

Abstract Somatic structural variants are an important contributor to cancer development and evolution. Accurate detection of these complex variants from whole-genome sequencing data is influenced by a multitude of parameters. However, there are currently no tools to guide study design, nor any applications that predict the performance of somatic structural variant detection. To address this gap, we developed Shiny-SoSV, a user-friendly web-based calculator for determining the impact of common variables on the sensitivity and precision of somatic structural variant detection, including choice of variant detection tool, sequencing depth of coverage, variant allele fraction, and variant breakpoint resolution. Using simulation studies, we determined the individual and combined effects of these variables and modelled the results with a generalised additive model, allowing structural variant detection performance to be predicted for any combination of predictors. Shiny-SoSV provides an interactive, visual platform for users to easily compare the individual and combined impact of different parameters. It predicts how well a proposed study design will detect somatic structural variants before benchwork begins. Shiny-SoSV is freely available at https://hcpcg.shinyapps.io/Shiny-SoSV with an accompanying user’s guide and example use-cases.
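Shiny-SoSV reports sensitivity and precision, which in a simulation study come from matching called breakpoints against the simulated truth set. The Python sketch below shows that bookkeeping, assuming that breakpoint resolution acts as a matching tolerance in base pairs; the greedy matching rule and the function names are illustrative assumptions, not Shiny-SoSV's code, and the generalised additive model itself is not reproduced.

```python
Breakpoint = tuple[str, int]  # (chromosome, position)


def match_breakpoints(
    truth: list[Breakpoint], calls: list[Breakpoint], tolerance: int
) -> tuple[int, int, int]:
    """Greedily pair each call with the first unmatched truth breakpoint on the
    same chromosome within `tolerance` bp. Returns (TP, FP, FN)."""
    unmatched = list(truth)
    tp = 0
    for chrom, pos in calls:
        for i, (t_chrom, t_pos) in enumerate(unmatched):
            if t_chrom == chrom and abs(t_pos - pos) <= tolerance:
                tp += 1
                del unmatched[i]  # each truth breakpoint can be matched only once
                break
    fp = len(calls) - tp
    fn = len(truth) - tp
    return tp, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN): fraction of true variants that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): fraction of reported variants that are true."""
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    truth = [("chr1", 1000), ("chr1", 5000), ("chr3", 200)]
    calls = [("chr1", 1050), ("chr1", 9000), ("chr3", 180)]
    for tol in (10, 100):
        tp, fp, fn = match_breakpoints(truth, calls, tol)
        print(tol, sensitivity(tp, fn), precision(tp, fp))
```

In this toy example, loosening the tolerance from 10 bp to 100 bp turns two of the three calls into true positives, which is the kind of resolution effect the calculator is described as exposing.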


2020 ◽  
Vol 10 (9) ◽  
pp. 3009-3014 ◽  
Author(s):  
Mitchell A Ellison ◽  
Jennifer L Walker ◽  
Patrick J Ropp ◽  
Jacob D Durrant ◽  
Karen M Arndt

Abstract MutantHuntWGS is a user-friendly pipeline for analyzing Saccharomyces cerevisiae whole-genome sequencing data. It uses available open-source programs to: (1) perform sequence alignments for paired-end and single-end reads, (2) call variants, and (3) predict variant effect and severity. MutantHuntWGS outputs a shortlist of variants while also enabling access to all intermediate files. To demonstrate its utility, we use MutantHuntWGS to assess multiple published datasets; in all cases, it detects the same causal variants reported in the literature. To encourage broad adoption and promote reproducibility, we distribute a containerized version of the MutantHuntWGS pipeline that allows users to install it and analyze data with only two commands. The MutantHuntWGS software and documentation can be downloaded free of charge from https://github.com/mae92/MutantHuntWGS.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Marwan A. Hawari ◽  
Celine S. Hong ◽  
Leslie G. Biesecker

Abstract Background Somatic single nucleotide variants have gained increased attention because of their role in cancer development and the widespread use of high-throughput sequencing techniques. The need to accurately identify these variants in sequencing data has led to a proliferation of somatic variant calling tools. Additionally, the use of simulated data to assess the performance of these tools has become common practice, as there is no gold standard dataset for benchmarking performance. However, many existing somatic variant simulation tools are limited, either because they rely on generating entirely synthetic reads derived from a reference genome or because they do not allow the precise customization that would enable a more focused understanding of single nucleotide variant calling performance. Results SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control of the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, in which candidate positions are selected for simulation; variant simulation, in which reads are selected and mutated; and variant evaluation, in which SomatoSim summarizes the simulation results. Conclusions SomatoSim is a user-friendly tool that offers a high level of customizability for simulating somatic single nucleotide variants. SomatoSim is available at https://github.com/BieseckerLab/SomatoSim.
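The variant simulation stage, in which reads are selected and mutated, can be illustrated on a single pileup column. The Python sketch below represents each read only by its base at the target position and mutates a random subset so that the alternate-allele fraction approximates a requested VAF. It does not read or write SAM/BAM files, and `spike_in_snv` is a hypothetical name, not part of SomatoSim's interface.

```python
import random


def spike_in_snv(
    bases: list[str], alt: str, target_vaf: float, rng: random.Random
) -> tuple[list[str], float]:
    """Mutate a random subset of reads at one position toward `target_vaf`.

    `bases` holds the base each read carries at the position. Reads that
    already carry `alt` count toward the target. Returns the new bases (the
    input list is left unchanged) and the VAF actually achieved, which is
    limited by depth because only whole reads can be mutated.
    """
    depth = len(bases)
    if depth == 0:
        return [], 0.0
    non_alt = [i for i, b in enumerate(bases) if b != alt]
    already_alt = depth - len(non_alt)
    # Extra reads to mutate, clamped to the range that is actually possible.
    needed = round(target_vaf * depth) - already_alt
    needed = max(0, min(needed, len(non_alt)))
    mutated = list(bases)
    for i in rng.sample(non_alt, needed):
        mutated[i] = alt
    return mutated, (already_alt + needed) / depth


if __name__ == "__main__":
    column = ["C"] * 40
    new_column, vaf = spike_in_snv(column, "T", 0.25, random.Random(42))
    print(new_column.count("T"), vaf)  # 10 0.25
```

A seeded `random.Random` keeps the simulation reproducible. Because only whole reads can be changed, low VAFs at shallow depth are approximated: a 3% target at 20× depth rounds to one read, or 5%.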


2018 ◽  
Author(s):  
Chang Xu ◽  
Xiujing Gu ◽  
Raghavendra Padmanabhan ◽  
Zhong Wu ◽  
Quan Peng ◽  
...  

Abstract Motivation Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most sequencing errors can be corrected. However, errors introduced before UMI tagging, such as DNA polymerase errors during end-repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits on UMI-based variant calling. Results We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade of the current version of smCounter. Compared to smCounter, smCounter2 features a lower detection limit of 0.5%, better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2’s superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test to determine whether the allele frequency of the putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved using novel repetitive region filters specifically designed for UMI data. Availability The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under the MIT license.
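The core question smCounter2 asks, whether a putative variant's allele frequency is significantly above the background error rate, can be illustrated with a plain binomial upper-tail test on UMI counts. This is a simplified stand-in: smCounter2 models the background error rate from an independent dataset, so the fixed per-site error rate, the binomial model, the `alpha` cutoff, and the function names below are assumptions rather than smCounter2's actual statistics.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1.

    Terms are computed in log space so large UMI depths do not overflow.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (
            lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
            + j * log_p + (n - j) * log_q
        )
        total += exp(log_pmf)
    return min(total, 1.0)


def above_background(alt_umis: int, total_umis: int, error_rate: float,
                     alpha: float = 1e-6) -> bool:
    """Flag a site whose alt-UMI count is unlikely under background error alone."""
    return binomial_upper_tail(alt_umis, total_umis, error_rate) < alpha


if __name__ == "__main__":
    # 1,000 UMIs at a site with an assumed 0.1% background error rate.
    for alt in (2, 5, 10):
        p_value = binomial_upper_tail(alt, 1000, 0.001)
        print(alt, p_value, above_background(alt, 1000, 0.001))
```

Counting UMIs (molecules) rather than reads matters for this kind of test: PCR duplicates of one erroneous molecule would otherwise inflate the alternate count and make the p-value far too small.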


2018 ◽  
Author(s):  
Erica K. Barnell ◽  
Peter Ronning ◽  
Katie M. Campbell ◽  
Kilannin Krysiak ◽  
Benjamin J. Ainscough ◽  
...  

Abstract Purpose Manual review of aligned sequencing reads is required to develop a high-quality list of somatic variants from massively parallel sequencing (MPS) data. Despite widespread use in analyzing MPS data, there has been little attempt to describe methods for manual review, resulting in high inter- and intra-lab variability in somatic variant detection and characterization of tumors. Methods Open-source software was used to develop an optimal method for manual review setup. We also developed a systematic approach to visually inspect each variant during manual review. Results We present a standard operating procedure for somatic variant refinement for use by manual reviewers. The approach is enhanced through representative examples of 4 different manual review categories that indicate a reviewer’s confidence in the somatic variant call and 19 annotation tags that contextualize commonly observed sequencing patterns during manual review. Representative examples provide detailed instructions on how to classify variants during manual review to resolve uncertainty in automated somatic variant detection. Conclusion Standardization of somatic variant refinement through systematization of manual review will improve the consistency and reproducibility of identifying true somatic variants after automated variant calling.


2018 ◽  
Author(s):  
Isidro Cortés-Ciriano ◽  
June-Koo Lee ◽  
Ruibin Xi ◽  
Dhawal Jain ◽  
Youngsook L. Jung ◽  
...  

Summary Chromothripsis, a newly discovered mutational phenomenon involving massive, clustered genomic rearrangements, occurs in cancer and other diseases. Recent studies in cancer suggest that chromothripsis may be far more common than initially inferred from low-resolution DNA copy number data. Here, we analyze the patterns of chromothripsis across 2,658 tumors spanning 39 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of >50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy number states, a considerable fraction of the events involves multiple chromosomes as well as additional structural alterations. In addition to non-homologous end-joining, we detect signatures of replicative processes and templated insertions. Chromothripsis contributes to oncogene amplification as well as to the inactivation of genes, including mismatch-repair-related genes. These findings show that chromothripsis is a major process driving genome evolution in human cancer.


2021 ◽  
Author(s):  
Hanna Sigeman ◽  
Bella Sinclair ◽  
Bengt Hansson

Sex chromosomes have evolved numerous times, as revealed by recent genomic studies. However, large gaps in our knowledge of sex chromosome diversity across the tree of life remain. Filling these gaps through the study of novel species is crucial for a better understanding of why and how sex chromosomes evolve. Characterization of sex chromosomes in already well-studied organisms is also important to avoid misinterpreting population genomic patterns caused by undetected sex chromosome variation. Here we present findZX, an automated Snakemake-based computational pipeline for detecting and visualizing sex chromosomes through differences in genome coverage and heterozygosity between males and females. findZX is user-friendly, scales to different computational platforms, and works with any number of male and female samples. An option to perform a genome coordinate lift-over to a reference genome of another species allows users to inspect sex-linked regions over larger contiguous chromosome regions, while also providing important between-species synteny information. To demonstrate its effectiveness, we applied findZX to publicly available genomic data from species belonging to widely different taxonomic groups (mammals, birds, reptiles, fish, and insects), with sex chromosome systems of different ages, sizes, and levels of differentiation. We also demonstrate that the lift-over method is robust over large phylogenetic distances (>80 million years of evolution).
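The coverage half of the signal described above can be sketched in a few lines. Assuming the heterogametic sex carries a single copy of the X or Z, X-linked windows should show roughly half the male depth (log2 male/female near -1) and Z-linked windows roughly half the female depth (near +1). The Python sketch below normalises each sex by its own median depth and labels windows accordingly; the threshold, pseudocount, and labels are illustrative assumptions rather than findZX's parameters, and the heterozygosity signal findZX also uses is omitted.

```python
from math import log2
from statistics import median


def coverage_log2_ratios(male_cov: list[float], female_cov: list[float],
                         pseudocount: float = 0.01) -> list[float]:
    """Per-window log2(male/female) after scaling each sex by its median depth.

    Both lists must describe the same windows in the same order and have a
    non-zero median. The pseudocount avoids log2(0) in sex-specific windows.
    """
    m_med, f_med = median(male_cov), median(female_cov)
    return [
        log2((m / m_med + pseudocount) / (f / f_med + pseudocount))
        for m, f in zip(male_cov, female_cov)
    ]


def label_windows(ratios: list[float], threshold: float = 0.5) -> list[str]:
    """Flag windows whose ratio departs from the autosomal expectation of 0."""
    labels = []
    for r in ratios:
        if r <= -threshold:
            labels.append("lower in males (candidate X-linked)")
        elif r >= threshold:
            labels.append("lower in females (candidate Z-linked)")
        else:
            labels.append("autosome-like")
    return labels


if __name__ == "__main__":
    male = [30, 31, 29, 15, 30]  # the fourth window has half the male depth
    female = [30, 30, 30, 30, 30]
    print(label_windows(coverage_log2_ratios(male, female)))
```

Normalising each sex separately matters because the male and female libraries are rarely sequenced to the same total depth; without it, every window would be shifted by the library-size ratio.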


2018 ◽  
Author(s):  
Ke Yuan ◽  
Geoff Macintyre ◽  
Wei Liu ◽  
Florian Markowetz ◽  

Abstract Estimating and clustering the cancer cell fractions of genomic alterations are central tasks for studying intratumour heterogeneity. We present Ccube, a probabilistic framework for inferring the cancer cell fraction of somatic point mutations and the subclonal composition from whole-genome sequencing data. We develop a variational inference method for model fitting, which allows us to handle samples with a large number of variants (more than 2 million) while quantifying uncertainty in a Bayesian fashion. Ccube is available at https://github.com/keyuan/ccube.
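Cancer cell fraction (CCF) methods generally rest on a standard relationship between observed allele frequency and CCF. With tumour purity p, total tumour copy number Ct, normal copy number Cn (usually 2), and multiplicity m (mutated copies per cancer cell), a mutation present in a fraction f of cancer cells has expected VAF = p·m·f / (p·Ct + (1 - p)·Cn). The Python sketch below inverts this for a point estimate of f from read counts. It is not Ccube's model: it ignores sampling noise, uncertainty, and clustering, which are what a Bayesian framework such as Ccube adds, and `cancer_cell_fraction` is a hypothetical helper name.

```python
def cancer_cell_fraction(alt_reads: int, depth: int, purity: float,
                         tumour_cn: int, multiplicity: int = 1,
                         normal_cn: int = 2) -> float:
    """Point estimate of the cancer cell fraction (CCF) of a somatic SNV.

    Inverts  VAF = purity * multiplicity * CCF
                   / (purity * tumour_cn + (1 - purity) * normal_cn)
    using the observed VAF = alt_reads / depth. Estimates above 1 can occur
    with noisy counts or a wrong multiplicity and are returned unclipped.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    vaf = alt_reads / depth
    # Average number of copies of the locus per cell in the tumour sample.
    average_cn = purity * tumour_cn + (1.0 - purity) * normal_cn
    return vaf * average_cn / (purity * multiplicity)


if __name__ == "__main__":
    # 50% purity, diploid tumour, one mutated copy:
    # a clonal SNV is expected at VAF 0.25, a subclonal one lower.
    print(cancer_cell_fraction(25, 100, purity=0.5, tumour_cn=2))  # 1.0
    print(cancer_cell_fraction(10, 100, purity=0.5, tumour_cn=2))
```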

