SimFFPE and FilterFFPE: improving structural variant calling in FFPE samples

GigaScience, 2021, Vol 10 (9)
Author(s): Lanying Wei, Martin Dugas, Sarah Sandmann

Abstract. Background: Artifact chimeric reads are enriched in next-generation sequencing data generated from formalin-fixed paraffin-embedded (FFPE) samples. Previous work indicated that these reads are characterized by erroneous split-read support that is interpreted as evidence of structural variants. Thus, a large number of false-positive structural variants are detected. To our knowledge, no tool is currently available to specifically call or filter structural variants in FFPE samples. To overcome this gap, we developed 2 R packages: SimFFPE and FilterFFPE. Results: SimFFPE is a read simulator, specifically designed for next-generation sequencing data from FFPE samples. A mixture of characteristic artifact chimeric reads, as well as normal reads, is generated. FilterFFPE is a filtration algorithm, removing artifact chimeric reads from sequencing data while keeping real chimeric reads. To evaluate the performance of FilterFFPE, we performed structural variant calling with 3 common tools (Delly, Lumpy, and Manta) with and without prior filtration with FilterFFPE. After applying FilterFFPE, the mean positive predictive value improved from 0.27 to 0.48 in simulated samples and from 0.11 to 0.27 in real samples, while sensitivity remained essentially unchanged or even increased slightly. Conclusions: FilterFFPE improves the performance of SV calling in FFPE samples. It was validated by analysis of simulated and real data.
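
The two metrics reported above, positive predictive value (PPV) and sensitivity, follow their standard definitions: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The sketch below is only an illustration of how removing false-positive calls raises PPV while leaving sensitivity untouched. The function names and all counts are hypothetical and are not taken from the paper.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no variants were called, PPV is undefined")
    return true_positives / called


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity (recall): TP / (TP + FN)."""
    real = true_positives + false_negatives
    if real == 0:
        raise ValueError("no true variants exist, sensitivity is undefined")
    return true_positives / real


# Hypothetical counts for one sample (NOT data from the paper).
# Filtering is assumed to remove only false-positive calls here.
unfiltered = {"tp": 30, "fp": 70, "fn": 10}
filtered = {"tp": 30, "fp": 30, "fn": 10}

for label, c in (("unfiltered", unfiltered), ("filtered", filtered)):
    print(
        label,
        "PPV =", round(ppv(c["tp"], c["fp"]), 2),
        "sensitivity =", round(sensitivity(c["tp"], c["fn"]), 2),
    )
```

Because sensitivity depends only on true positives and false negatives, a filter that discards artifact-driven calls without touching real ones changes PPV alone, which matches the pattern the abstract describes.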

2017
Author(s): Jade C.S. Chung, Swaine L. Chen

Abstract. Next-generation sequencing data is accompanied by quality scores that quantify sequencing error. Inaccuracies in these quality scores propagate through all subsequent analyses; thus base quality score recalibration is a standard step in many next-generation sequencing workflows, resulting in improved variant calls. Current base quality score recalibration algorithms rely on the assumption that sequencing errors are already known; for human resequencing data, relatively complete variant databases facilitate this. However, because existing databases are still incomplete, recalibration is still inaccurate; and most organisms do not have variant databases, exacerbating inaccuracy for non-human data. To overcome these logical and practical problems, we introduce Lacer, which recalibrates base quality scores without assuming knowledge of correct and incorrect bases and without requiring knowledge of common variants. Lacer is the first logically sound, fully general, and truly accurate base recalibrator. Lacer enhances variant identification accuracy for resequencing data of human as well as other organisms (which are not accessible to current recalibrators), simultaneously improving and extending the benefits of base quality score recalibration to nearly all ongoing sequencing projects. Lacer is available at: https://github.com/swainechen/lacer.
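
As background for the quality scores discussed above: base qualities are conventionally expressed on the Phred scale, where Q = -10 · log10(p) and p is the estimated probability that the base call is wrong. The sketch below shows only that conversion and the common Phred+33 FASTQ encoding. It does not reproduce Lacer's recalibration method, and the function names are illustrative.

```python
import math
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_fastq_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding by default."""
    return [ord(char) - offset for char in quality_string]


# Example: Q30 corresponds to an expected error rate of 1 in 1000.
print(phred_to_error_prob(30))
print(decode_fastq_qualities("I!5"))
```

Recalibration, in general terms, replaces the reported Q values with values that better match the empirically observed error rate; how that rate is estimated is exactly where tools differ.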


2018
Author(s): Tamsen Dunn, Gwenn Berry, Dorothea Emig-Agius, Yu Jiang, Serena Lei, ...

Abstract. Motivation: Next-Generation Sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants, suitable for the clinic. Results: We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces accuracy is achieved by four distinct modules, the Pisces Read Stitcher, Pisces Variant Caller, the Pisces Variant Quality Recalibrator, and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant. Availability: Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test which was recently approved by the FDA for the detection of multiple RAS gene mutations. Supplementary information: Supplementary data are available online.


2017
Author(s): Merly Escalona, Sara Rocha, David Posada

Abstract. Motivation: Advances in sequencing technologies have made it feasible to obtain massive datasets for phylogenomic inference, often consisting of large numbers of loci from multiple species and individuals. The phylogenomic analysis of next-generation sequencing (NGS) data implies a complex computational pipeline where multiple technical and methodological decisions are necessary that can influence the final tree obtained, like those related to coverage, assembly, mapping, variant calling and/or phasing. Results: To assess the influence of these variables we introduce NGSphy, an open-source tool for the simulation of Illumina reads/read counts obtained from haploid/diploid individual genomes with thousands of independent gene families evolving under a common species tree. In order to resemble real NGS experiments, NGSphy includes multiple options to model sequencing coverage (depth) heterogeneity across species, individuals and loci, including off-target or uncaptured loci. For comprehensive simulations covering multiple evolutionary scenarios, parameter values for the different replicates can be sampled from user-defined statistical distributions. Availability: Source code, full documentation and tutorials including a quick start guide are available at http://github.com/merlyescalona/.


2017, Vol 7 (1)
Author(s): Sarah Sandmann, Aniek O. de Graaf, Mohsen Karimi, Bert A. van der Reijden, Eva Hellström-Lindberg, ...

PLoS ONE, 2013, Vol 8 (10), pp. e75402
Author(s): Shunichi Kosugi, Satoshi Natsume, Kentaro Yoshida, Daniel MacLean, Liliana Cano, ...

2019
Author(s): Tingting Gong, Vanessa M Hayes, Eva KF Chan

Abstract. Somatic structural variants (SVs) play a significant role in cancer development and evolution, but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS, and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of eight commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the eight SV callers examined in this paper. As the importance of large structural variants becomes increasingly recognised in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection and guidance on selecting an appropriate SV caller.


2019, Vol 2 (4), pp. e201900336
Author(s): Eric Olivier Audemard, Patrick Gendron, Albert Feghaly, Vincent-Philippe Lavallée, Josée Hébert, ...

Mutations identified in acute myeloid leukemia patients are useful for prognosis and for selecting targeted therapies. Detection of such mutations using next-generation sequencing data requires a computationally intensive read mapping step followed by several variant calling methods. Targeted mutation identification drastically shifts the usual tradeoff between accuracy and performance by concentrating all computations over a small portion of sequence space. Here, we present km, an efficient approach leveraging k-mer decomposition of reads to identify targeted mutations. Our approach is versatile, as it can detect single-base mutations, several types of insertions and deletions, as well as fusions. We used two independent cohorts (The Cancer Genome Atlas and Leucegene) to show that mutation detection by km is fast, accurate, and mainly limited by sequencing depth. Therefore, km allows the establishment of fast diagnostics from next-generation sequencing data and could be suitable for clinical applications.
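
The k-mer decomposition mentioned above can be illustrated generically: a read of length n yields n - k + 1 overlapping substrings of length k, and a single-base substitution changes every k-mer that overlaps the mutated position. The sketch below shows only this general idea with made-up sequences. It is not the km algorithm, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k from the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count k-mer occurrences across a collection of reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# Made-up sequences: the mutant differs from the reference by one base.
reference = "ACGTACGGTC"
mutant = "ACGTATGGTC"
k = 4

reference_kmers = set(kmers(reference, k))
mutant_kmers = set(kmers(mutant, k))

# k-mers present only in the mutant are the ones spanning the substitution.
novel_kmers = mutant_kmers - reference_kmers
print(sorted(novel_kmers))
```

Restricting the comparison to k-mers from a short target region is what keeps this kind of search cheap relative to mapping every read against a whole genome.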


2021
Author(s): King Wai Lau, Michelle Kleeman, Caroline Reuter, Attila Lorincz

Abstract. Summary: Extremely large datasets are impossible or very difficult for humans to comprehend by standard mental approaches. Intuitive visualization of genetic variants in genomic sequencing data could help in the review and confirmation process of variants called by automated variant calling programs. To help facilitate interpretation of genetic variant next-generation sequencing (NGS) data we developed VisVariant, a customizable visualization tool that creates a figure showing the overlapping sequence information of thousands of individual reads including the variant and flanking regions. Availability and implementation: Detailed information on how to download, install and run VisVariant together with an example is available on our github website [https://github.com/hugging-biorxiv/visvariant].

