Lancet: genome-wide somatic variant calling using localized colored DeBruijn graphs

2017 ◽  
Author(s):  
Giuseppe Narzisi ◽  
André Corvelo ◽  
Kanika Arora ◽  
Ewa A. Bergmann ◽  
Minita Shah ◽  
...  

Reliable detection of somatic variations is of critical importance in cancer research. Lancet is an accurate and sensitive somatic variant caller which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored DeBruijn graphs. Extensive experimental comparison on synthetic and real whole-genome sequencing datasets demonstrates that Lancet has better accuracy, especially for indel detection, than widely used somatic callers, such as MuTect, MuTect2, LoFreq, Strelka, and Strelka2. Lancet features a reliable variant scoring system which is essential for variant prioritization and detects low frequency mutations without sacrificing the sensitivity to call longer insertions and deletions empowered by the local assembly engine. In addition to genome-wide analysis, Lancet allows inspection of somatic variants in graph space, which augments the traditional read alignment visualization to help confirm a variant of interest. Lancet is available as an open-source program at https://github.com/nygenome/lancet.
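The idea of jointly analyzing tumor and normal reads in a colored de Bruijn graph can be illustrated with a toy sketch. This is not Lancet's implementation: the function names, the `min_tumor` threshold, and the example reads are assumptions made for illustration, and real callers also handle reverse complements, base qualities, graph pruning, and path enumeration.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_colored_graph(tumor_reads, normal_reads, k):
    """Return per-sample k-mer counts (the 'colors') and k-mer adjacency.

    Nodes are k-mers; an edge joins two k-mers that are consecutive
    in at least one read (they overlap by k-1 bases).
    """
    counts = defaultdict(lambda: {"tumor": 0, "normal": 0})
    edges = defaultdict(set)
    for color, reads in (("tumor", tumor_reads), ("normal", normal_reads)):
        for read in reads:
            prev = None
            for km in kmers(read, k):
                counts[km][color] += 1
                if prev is not None:
                    edges[prev].add(km)
                prev = km
    return counts, edges

def tumor_only_kmers(counts, min_tumor=2):
    """K-mers seen at least min_tumor times in tumor and never in normal.

    min_tumor is a hypothetical support threshold, not a Lancet parameter.
    """
    return sorted(
        km for km, c in counts.items()
        if c["tumor"] >= min_tumor and c["normal"] == 0
    )

# Toy data: the tumor carries a T>A substitution at the fourth base.
normal = ["ACGTTGCA", "ACGTTGCA"]
tumor = ["ACGATGCA", "ACGATGCA", "ACGTTGCA"]
counts, edges = build_colored_graph(tumor, normal, k=4)
candidates = tumor_only_kmers(counts)
```

In this toy case the tumor-only k-mers form a small bubble that leaves the shared path and rejoins it, which is the kind of graph structure a local assembly caller would inspect for a candidate somatic variant.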


2017 ◽  
Author(s):  
Jeremiah Wala ◽  
Pratiti Bandopadhayay ◽  
Noah Greenwald ◽  
Ryan O’Rourke ◽  
Ted Sharpe ◽  
...  

Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA’s performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs, and substantially improved detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (< 1,000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types, and found that templated-sequence insertions occur in ~4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized SVs.



2016 ◽  
Author(s):  
Jeran Stratford ◽  
Gunjan Hariani ◽  
Jeff Jasper ◽  
Chad Brown ◽  
Wendell Jones ◽  
...  


2018 ◽  
Vol 1 (1) ◽  
Author(s):  
Giuseppe Narzisi ◽  
André Corvelo ◽  
Kanika Arora ◽  
Ewa A. Bergmann ◽  
Minita Shah ◽  
...  


2019 ◽  
Author(s):  
David Benjamin ◽  
Takuto Sato ◽  
Kristian Cibulskis ◽  
Gad Getz ◽  
Chip Stewart ◽  
...  

Mutect2 is a somatic variant caller that uses local assembly and realignment to detect SNVs and indels. Assembly implies whole haplotypes and read pairs, rather than single bases, as the atomic units of biological variation and sequencing evidence, improving variant calling. Beyond local assembly and alignment, Mutect2 is based on several probabilistic models for genotyping and filtering that work well with and without a matched normal sample and for all sequencing depths.



Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 507
Author(s):  
Bernd Timo Hermann ◽  
Sebastian Pfeil ◽  
Nicole Groenke ◽  
Samuel Schaible ◽  
Robert Kunze ◽  
...  

Detection of genetic variants in clinically relevant genomic hot-spot regions has become a promising application of next-generation sequencing technology in precision oncology. Effective personalized diagnostics requires the detection of variants with often very low frequencies. This can be achieved by targeted, short-read sequencing that provides high sequencing depths. Although rare genetic variants can contain crucial information for early cancer detection and subsequent treatment success, an inevitable level of background noise usually limits the accuracy of low-frequency variant calling assays. To address this challenge, we developed DEEPGEN™, a variant calling assay intended for the detection of low-frequency variants within liquid biopsy samples. We processed reference samples with validated mutations of known frequencies (0%–0.5%) to determine DEEPGEN™’s performance and minimal input requirements. Our findings confirm DEEPGEN™’s effectiveness in discriminating between signal and noise down to 0.09% variant allele frequency and an LOD(90) at 0.18%. Superior sensitivity was also confirmed by orthogonal comparison to a commercially available liquid biopsy-based assay for cancer detection.
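To make the signal-versus-noise question concrete, the sketch below computes a variant allele frequency and asks how likely a given number of alternate reads would be under a simple binomial background-error model. The binomial model, the function names, and the example depth and error rate are assumptions for illustration only; the abstract does not describe the statistical model the assay actually uses.

```python
import math

def variant_allele_frequency(alt_reads, depth):
    """Fraction of reads at a site that support the alternate allele."""
    return alt_reads / depth

def binom_pmf(i, n, p):
    """Binomial probability mass, computed in log space for large n."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )
    return math.exp(log_pmf)

def noise_tail_probability(alt_reads, depth, error_rate):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    error_rate is a hypothetical per-read background error rate.
    """
    below = sum(binom_pmf(i, depth, error_rate) for i in range(alt_reads))
    return max(0.0, 1.0 - below)

# Hypothetical site: 18 alternate reads at 20,000x depth is a VAF of 0.09%.
vaf = variant_allele_frequency(18, 20_000)
p_noise = noise_tail_probability(18, 20_000, error_rate=1e-4)
```

A small tail probability suggests the alternate reads are unlikely to be background noise alone under the assumed error rate; with a higher assumed error rate the same read count would be much less convincing.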



2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Gundula Povysil ◽  
Monika Heinzl ◽  
Renato Salazar ◽  
Nicholas Stoler ◽  
Anton Nekrutenko ◽  
...  

Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we developed a tool that re-examines variant calls from raw reads and provides summary data that categorize the confidence level of a variant call using a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
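The grouping of reads into tag families and the construction of duplex consensus sequences can be sketched as follows. This is a simplified illustration, not the authors' toolset: the `(tag, strand, sequence)` read layout, the `"ab"`/`"ba"` strand labels, and the `min_family_size` threshold are assumptions, and the reads are assumed to be equal-length and already oriented to the same reference strand.

```python
from collections import Counter, defaultdict

def consensus(seqs):
    """Majority base per column; 'N' when the top two bases tie."""
    out = []
    for column in zip(*seqs):
        ranked = Counter(column).most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            out.append("N")
        else:
            out.append(ranked[0][0])
    return "".join(out)

def duplex_consensus(reads, min_family_size=2):
    """Group reads into families and build strand and duplex consensuses.

    reads: iterable of (tag, strand, sequence) with strand 'ab' or 'ba'.
    Returns (families, sscs, dcs). Families smaller than min_family_size
    yield no single-strand consensus (SSCS), so their reads are lost,
    which is the kind of data loss the abstract describes.
    """
    families = defaultdict(list)
    for tag, strand, seq in reads:
        families[(tag, strand)].append(seq)

    sscs = {
        key: consensus(seqs)
        for key, seqs in families.items()
        if len(seqs) >= min_family_size
    }

    dcs = {}
    for (tag, strand), seq in sscs.items():
        if strand == "ab" and (tag, "ba") in sscs:
            other = sscs[(tag, "ba")]
            # Keep a base only where both strands agree.
            dcs[tag] = "".join(a if a == b else "N" for a, b in zip(seq, other))
    return families, sscs, dcs

reads = [
    ("T1", "ab", "ACGT"), ("T1", "ab", "ACGT"), ("T1", "ab", "ACTT"),
    ("T1", "ba", "ACGT"), ("T1", "ba", "ACGT"),
    ("T2", "ab", "GGCA"),  # singleton: no family partner, no SSCS
    ("T3", "ab", "ACGT"), ("T3", "ab", "ACGT"),
    ("T3", "ba", "ACTT"), ("T3", "ba", "ACTT"),
]
families, sscs, dcs = duplex_consensus(reads)
```

Here the single discordant read in family T1 is outvoted, the T2 singleton produces no consensus at all, and the strand disagreement in T3 is masked with `N` in the duplex consensus.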



2011 ◽  
Vol 89 (6) ◽  
pp. 751-759 ◽  
Author(s):  
Andrew Dauber ◽  
Yongguo Yu ◽  
Michael C. Turchin ◽  
Charleston W. Chiang ◽  
Yan A. Meng ◽  
...  


2019 ◽  
Author(s):  
Xinyue You ◽  
Suresh Thiruppathi ◽  
Weiying Liu ◽  
Yiyi Cao ◽  
Mikihiko Naito ◽  
...  

To improve the accuracy and the cost-efficiency of next-generation sequencing in ultralow-frequency mutation detection, we developed the Paired-End and Complementary Consensus Sequencing (PECC-Seq), a PCR-free duplex consensus sequencing approach. PECC-Seq employed shear points as endogenous barcodes to identify consensus sequences from the overlap in the shortened, complementary DNA strands-derived paired-end reads for sequencing error correction. With the high accuracy of PECC-Seq, we identified the characteristic base substitution errors introduced by the end-repair process of mechanical fragmentation-based library preparations, which were prominent at the terminal 6 bp of the library fragments in the 5’-NpCpA-3’ or 5’-NpCpT-3’ trinucleotide context. As demonstrated at the human genome scale (TK6 cells), after removing these potential end-repair artifacts from the terminal 6 bp, PECC-Seq could reduce the sequencing error frequency to mid-10⁻⁷ with a relatively low sequencing depth. For TA base pairs, the background error rate could be suppressed to mid-10⁻⁸. In mutagen-treated TK6, slight increases in mutagen treatment-related mutant frequencies could be detected, indicating the potential of PECC-Seq in detecting genome-wide ultra-rare mutations. In addition, our finding on the patterns of end-repair artifacts may provide new insights in further reducing technical errors not only for PECC-Seq, but also for other next-generation sequencing techniques.



2019 ◽  
Author(s):  
Olivera Grujic ◽  
Tanya N. Phung ◽  
Soo Bin Kwon ◽  
Adriana Arneson ◽  
Yuju Lee ◽  
...  

Annotations of evolutionary constraint provide important information for variant prioritization. Genome-wide maps of epigenomic marks and transcription factor binding provide complementary information for interpreting a subset of such prioritized variants. Here we developed the Constrained Non-Exonic Predictor (CNEP) to quantify the evidence of each base in the human genome being in a constrained non-exonic element from over 60,000 epigenomic and transcription factor binding features. We find that the CNEP score outperforms baseline and related existing scores at predicting constrained non-exonic bases from such data. However, a subset of such bases are still not well predicted by CNEP. We developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) using conservation state and constrained element annotations that is predictive of those bases. Using human genetic variation, regulatory sequence motifs, mouse epigenomic data, and retrospectively considered additional human data, we further characterize the nature of constrained non-exonic bases with low CNEP scores.


