Calling Somatic SNVs and Indels with Mutect2

2019 ◽  
Author(s):  
David Benjamin ◽  
Takuto Sato ◽  
Kristian Cibulskis ◽  
Gad Getz ◽  
Chip Stewart ◽  
...  

Abstract: Mutect2 is a somatic variant caller that uses local assembly and realignment to detect SNVs and indels. Assembly implies whole haplotypes and read pairs, rather than single bases, as the atomic units of biological variation and sequencing evidence, improving variant calling. Beyond local assembly and alignment, Mutect2 is based on several probabilistic models for genotyping and filtering that work well with and without a matched normal sample and for all sequencing depths.


2017 ◽  
Author(s):  
Giuseppe Narzisi ◽  
André Corvelo ◽  
Kanika Arora ◽  
Ewa A. Bergmann ◽  
Minita Shah ◽  
...  

Reliable detection of somatic variations is of critical importance in cancer research. Lancet is an accurate and sensitive somatic variant caller which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored DeBruijn graphs. Extensive experimental comparison on synthetic and real whole-genome sequencing datasets demonstrates that Lancet has better accuracy, especially for indel detection, than widely used somatic callers, such as MuTect, MuTect2, LoFreq, Strelka, and Strelka2. Lancet features a reliable variant scoring system which is essential for variant prioritization and detects low frequency mutations without sacrificing the sensitivity to call longer insertions and deletions empowered by the local assembly engine. In addition to genome-wide analysis, Lancet allows inspection of somatic variants in graph space, which augments the traditional read alignment visualization to help confirm a variant of interest. Lancet is available as an open-source program at https://github.com/nygenome/lancet.
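The colored de Bruijn graph idea mentioned in this abstract can be illustrated with a toy sketch. This is not Lancet's implementation: the function names, the k-mer size, and the two toy reads are assumptions made purely for illustration. Each k-mer is a node labeled with the set of samples ("colors") whose reads contain it, and k-mers seen only in the tumor mark a candidate somatic difference.

```python
from collections import defaultdict


def build_colored_graph(reads_by_sample, k):
    """Build a toy colored de Bruijn graph (hypothetical helper, not Lancet's API).

    Returns a dict mapping each k-mer to the set of sample labels that
    contain it, and a set of edges between consecutive k-mers in a read.
    """
    colors = defaultdict(set)
    edges = set()
    for sample, reads in reads_by_sample.items():
        for read in reads:
            kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
            for kmer in kmers:
                colors[kmer].add(sample)
            edges.update(zip(kmers, kmers[1:]))
    return colors, edges


def tumor_only_kmers(colors, tumor_label="tumor"):
    """Return the sorted k-mers whose only color is the tumor sample."""
    return sorted(kmer for kmer, c in colors.items() if c == {tumor_label})


# Toy reads (assumed data): the tumor read carries a single A>T substitution.
reads = {
    "normal": ["ACGTACGA"],
    "tumor": ["ACGTTCGA"],
}
colors, edges = build_colored_graph(reads, k=4)
candidates = tumor_only_kmers(colors)
print(candidates)
```

In this toy case the tumor-only k-mers are exactly those that overlap the substituted base. A real caller would additionally handle sequencing errors, coverage thresholds, and path enumeration through the graph, none of which is modeled here.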



Author(s):  
S Hollizeck ◽  
S Q Wong ◽  
B Solomon ◽  
D Chandranada ◽  
S-J Dawson

Abstract
Summary: This work describes two novel workflows for variant calling that extend the widely used algorithms of Strelka2 and FreeBayes to call somatic mutations from multiple related tumour samples and one matched normal sample. We show that these workflows offer higher precision and recall than their single tumour-normal pair equivalents in both simulated and clinical sequencing data.
Availability and Implementation: Source code is freely available at https://atlassian.petermac.org.au/bitbucket/projects/DAW/repos/multisamplevariantcalling and executable through Janis (https://github.com/PMCC-BioinformaticsCore/janis) under the GPLv3 licence.
Supplementary information: Supplementary data are available at Bioinformatics online.



2018 ◽  
Author(s):  
Rebecca F. Halperin ◽  
Winnie S. Liang ◽  
Sidharth Kulkarni ◽  
Erica E. Tassone ◽  
Jonathan Adkins ◽  
...  

Abstract: Archival tumor samples represent a potentially rich resource of annotated specimens for translational genomics research. However, standard variant calling approaches require a matched normal sample from the same individual, which is often not available in the retrospective setting, making it difficult to distinguish between true somatic variants and germline variants that are private to the individual. Archival sections often contain adjacent normal tissue, but this normal tissue can include infiltrating tumor cells. Comparative somatic variant callers are designed to exclude variants present in the normal sample, so a novel approach is required to leverage sequencing of adjacent normal tissue for somatic variant calling. Here we present LumosVar 2.0, a software package designed to jointly analyze multiple samples from the same patient. The approach is based on the concept that the allelic fraction of somatic variants, but not germline variants, would be reduced in samples with low tumor content. LumosVar 2.0 estimates allele specific copy number and tumor sample fractions from the data, and uses the model to determine expected allelic fractions for somatic and germline variants and classify variants accordingly. To evaluate joint somatic variant calling with LumosVar 2.0 on tumor and adjacent normal samples, we used a glioblastoma dataset with matched high tumor content, low tumor content, and germline exome sequencing data (to define true somatic variants) available for each patient. We show that both sensitivity and positive predictive value are improved by analyzing the high tumor and low tumor samples jointly compared to analyzing the samples individually or compared to in-silico pooling of the two samples. Finally, we applied this approach to a set of breast and prostate archival tumor samples for which normal samples were not available for germline sequencing, but tumor blocks containing adjacent normal tissue were available for sequencing. Joint analysis using LumosVar 2.0 detected several variants, including known cancer hotspot mutations that were not detected by standard somatic variant calling tools using the adjacent normal as a reference. Together, these results demonstrate the potential utility of leveraging paired tissue samples to improve somatic variant calling when a constitutional DNA sample is not available.
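The central idea of this abstract, that somatic allelic fractions shrink with tumor content while germline fractions do not, can be made concrete with a standard mixture formula. The sketch below is a simplification and not LumosVar 2.0's actual model: the function name, its parameters, and the copy-neutral example values are all assumptions for illustration.

```python
def expected_allelic_fraction(tumor_fraction, tumor_copies,
                              variant_copies_tumor, variant_copies_normal,
                              normal_copies=2):
    """Expected variant allele fraction in a tumor/normal cell mixture.

    Hypothetical helper: weights the variant-carrying copies and the total
    copies by the fraction of tumor and normal cells in the sample.
    """
    variant = (tumor_fraction * variant_copies_tumor
               + (1 - tumor_fraction) * variant_copies_normal)
    total = (tumor_fraction * tumor_copies
             + (1 - tumor_fraction) * normal_copies)
    return variant / total


# Assumed example: copy-neutral locus (2 copies), variant on 1 copy.
for f in (0.8, 0.2):  # high and low tumor content samples
    somatic = expected_allelic_fraction(f, 2, 1, 0)   # absent in normal cells
    germline = expected_allelic_fraction(f, 2, 1, 1)  # heterozygous everywhere
    print(f"tumor fraction {f}: somatic {somatic:.2f}, germline {germline:.2f}")
```

Under these assumptions a heterozygous germline variant stays at 0.5 in both samples, while a somatic variant drops from 0.4 to 0.1 as tumor content falls from 80% to 20%. That contrast is the signal a joint analysis of high and low tumor content samples can exploit.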



GigaScience ◽  
2022 ◽  
Vol 11 (1) ◽  
Author(s):  
Dries Decap ◽  
Louise de Schaetzen van Brienen ◽  
Maarten Larmuseau ◽  
Pascal Costanza ◽  
Charlotte Herzeel ◽  
...  

Abstract
Background: The accurate detection of somatic variants from sequencing data is of key importance for cancer treatment and research. Somatic variant calling requires a high sequencing depth of the tumor sample, especially when the detection of low-frequency variants is also desired. In turn, this leads to large volumes of raw sequencing data to process and hence, large computational requirements. For example, calling the somatic variants according to the GATK best practices guidelines requires days of computing time for a typical whole-genome sequencing sample.
Findings: We introduce Halvade Somatic, a framework for somatic variant calling from DNA sequencing data that takes advantage of multi-node and/or multi-core compute platforms to reduce runtime. It relies on Apache Spark to provide scalable I/O and to create and manage data streams that are processed on different CPU cores in parallel. Halvade Somatic contains all required steps to process the tumor and matched normal sample according to the GATK best practices recommendations: read alignment (BWA), sorting of reads, preprocessing steps such as marking duplicate reads and base quality score recalibration (GATK), and, finally, calling the somatic variants (Mutect2). Our approach reduces the runtime on a single 36-core node to 19.5 h compared to a runtime of 84.5 h for the original pipeline, a speedup of 4.3 times. Runtime can be further decreased by scaling to multiple nodes, e.g., we observe a runtime of 1.36 h using 16 nodes, an additional speedup of 14.4 times. Halvade Somatic supports variant calling from both whole-genome sequencing and whole-exome sequencing data and also supports Strelka2 as an alternative or complementary variant calling tool. We provide a Docker image to facilitate single-node deployment. Halvade Somatic can be executed on a variety of compute platforms, including Amazon EC2 and Google Cloud.
Conclusions: To our knowledge, Halvade Somatic is the first somatic variant calling pipeline that leverages Big Data processing platforms and provides reliable, scalable performance. Source code is freely available.
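The speedup figures quoted in this abstract follow from simple ratios of the reported runtimes. The short sketch below recomputes them; the variable names are illustrative, and the parallel efficiency is a derived quantity that the abstract itself does not state. Recomputing the multi-node ratio from the rounded runtimes gives roughly 14.3 rather than the reported 14.4, presumably because the published runtimes are rounded.

```python
# Runtimes in hours, as reported in the abstract.
original_pipeline_h = 84.5   # original pipeline
single_node_h = 19.5         # Halvade Somatic on one 36-core node
sixteen_nodes_h = 1.36       # Halvade Somatic on 16 nodes
nodes = 16

# Speedup = baseline runtime / improved runtime.
single_node_speedup = original_pipeline_h / single_node_h
multi_node_speedup = single_node_h / sixteen_nodes_h
overall_speedup = original_pipeline_h / sixteen_nodes_h

# Parallel efficiency of scaling out (derived here, not in the abstract).
parallel_efficiency = multi_node_speedup / nodes

print(f"single-node speedup: {single_node_speedup:.1f}x")
print(f"additional multi-node speedup: {multi_node_speedup:.1f}x")
print(f"overall speedup: {overall_speedup:.1f}x")
print(f"parallel efficiency on {nodes} nodes: {parallel_efficiency:.0%}")
```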



2018 ◽  
Author(s):  
Daniel P Cooke ◽  
David C Wedge ◽  
Gerton Lunter

Haplotype-based variant callers, which consider physical linkage between variant sites, are currently among the best tools for germline variation discovery and genotyping from short-read sequencing data. However, almost all such tools were designed specifically for detecting common germline variation in diploid populations, and give sub-optimal results in other scenarios. Here we present Octopus, a versatile haplotype-based variant caller that uses a polymorphic Bayesian genotyping model capable of modeling sequencing data from a range of experimental designs within a unified haplotype-aware framework. We show that Octopus accurately calls de novo mutations in parent-offspring trios and germline variants in individuals, including SNVs, indels, and small complex replacements such as microinversions. In addition, using a carefully designed synthetic-tumour data set derived from clean sequencing data from a sample with known germline haplotypes, and observed mutations in a large cohort of tumour samples, we show that Octopus accurately characterizes germline and somatic variation in tumours, both with and without a paired normal sample. Sequencing reads and prior information are combined to phase called genotypes of arbitrary ploidy, including those with somatic mutations. Octopus also outputs realigned evidence BAMs to aid validation and interpretation.



2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Mingyi Wang ◽  
Wen Luo ◽  
Kristine Jones ◽  
Xiaopeng Bian ◽  
Russell Williams ◽  
...  


Author(s):  
Rajeeva Musunuri ◽  
Kanika Arora ◽  
André Corvelo ◽  
Minita Shah ◽  
Jennifer Shelton ◽  
...  

Abstract
Summary: We present a new version of the popular somatic variant caller, Lancet, that supports the analysis of linked-reads sequencing data. By seamlessly integrating barcodes and haplotype read assignments within the colored De Bruijn graph local-assembly framework, Lancet computes a barcode-aware coverage and identifies variants that disagree with the local haplotype structure.
Availability and implementation: Lancet is implemented in C++ and available for academic and non-commercial research purposes as an open-source package at https://github.com/nygenome/lancet.
Supplementary information: Supplementary data are available at Bioinformatics online.



2020 ◽  
Author(s):  
Nicholas Phillips ◽  
Patrick Jongeneel ◽  
John West ◽  
Richard Chen ◽  
Jason Harris


2017 ◽  
Author(s):  
Jeremiah Wala ◽  
Pratiti Bandopadhayay ◽  
Noah Greenwald ◽  
Ryan O’Rourke ◽  
Ted Sharpe ◽  
...  

Abstract: Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at-scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA’s performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs, and substantially improved detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (< 1,000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types, and found that templated-sequence insertions occur in ~4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized SVs.


