Next-generation Sequence-analysis Toolkit (NeST): A standardized bioinformatics framework for analyzing Single Nucleotide Polymorphisms in next-generation sequencing data

Abstract: Rapid advancements in next-generation sequencing (NGS) technologies have led to the development of numerous bioinformatics tools and pipelines. Because these tools vary in output, function, and complexity, and some are not well standardized, it is hard to choose a suitable pipeline for identifying variants in NGS data. Here, we present NeST (NGS-analysis Toolkit), a modular consensus-based variant calling framework. NeST uses a combination of variant callers to overcome the potential biases of any individual method used alone. NeST consists of four modules that integrate open-source bioinformatics tools, a custom Variant Call Format (VCF) parser, and a summarization utility to generate high-quality consensus variant calls. NeST was validated using targeted-amplicon deep sequencing data from 245 Plasmodium falciparum isolates to identify single-nucleotide polymorphisms conferring drug resistance. The results were verified using Sanger sequencing data for the same dataset in a supporting publication [28]. NeST offers a user-friendly pipeline for variant calling with standardized outputs and minimal computational demands, allowing easy deployment across various organisms and applications.
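
The consensus idea in this abstract can be illustrated with a minimal sketch. It assumes a simple voting rule (keep a variant if at least `min_callers` callers report it); the abstract does not state which rule NeST applies, so the function name, the threshold, and the example calls below are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(calls_by_caller: Dict[str, Set[Variant]],
                    min_callers: int = 2) -> List[Variant]:
    """Return variants reported by at least `min_callers` callers, sorted by locus.

    Assumption: a plain vote count stands in for whatever consensus rule
    a real pipeline would use.
    """
    # Each caller contributes a set, so a variant is counted once per caller.
    support = Counter(v for calls in calls_by_caller.values() for v in calls)
    return sorted(v for v, n in support.items() if n >= min_callers)


# Hypothetical calls from three unnamed callers (illustrative data only).
example_calls = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
    "caller_b": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A")},
    "caller_c": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
}
consensus = consensus_calls(example_calls, min_callers=2)
```

Raising `min_callers` trades sensitivity for specificity: a variant seen by only one caller is dropped, which is how a consensus step can dampen caller-specific bias.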

A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btt172 ◽

2013 ◽

Vol 29 (11) ◽

pp. 1361-1366 ◽

Cited By ~ 26

Author(s):

B. D. O'Fallon ◽

W. Wooderchak-Donahue ◽

D. K. Crockett

Keyword(s):

Support Vector Machine ◽

Next Generation Sequencing ◽

Single Nucleotide Polymorphisms ◽

Next Generation Sequencing Data ◽

Support Vector ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Sequencing Data ◽

Single Nucleotide ◽

Generation Sequencing

A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data

Computational and Structural Biotechnology Journal ◽

10.1016/j.csbj.2018.01.003 ◽

2018 ◽

Vol 16 ◽

pp. 15-24 ◽

Cited By ~ 96

Author(s):

Chang Xu

Keyword(s):

Next Generation Sequencing ◽

Variant Calling ◽

Single Nucleotide Variant ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Single Nucleotide ◽

Generation Sequencing

Quantification of fetal DNA in the plasma of pregnant women using next generation sequencing of frequent single nucleotide polymorphisms

Bulletin of Russian State Medical University ◽

10.24075/brsmu.2018.031 ◽

2018 ◽

pp. 29-33

Author(s):

J. Shubina ◽

T. Jankevic ◽

A. Yu. Goltsov ◽

I. S. Mukosey ◽

...

Keyword(s):

Next Generation Sequencing ◽

Single Nucleotide Polymorphisms ◽

Pregnant Women ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Single Nucleotide ◽

Fetal DNA ◽

Generation Sequencing

Simultaneous human platelet antigen genotyping and detection of novel single nucleotide polymorphisms by targeted next-generation sequencing

Transfusion ◽

10.1111/trf.14092 ◽

2017 ◽

Vol 57 (6) ◽

pp. 1497-1504 ◽

Cited By ~ 4

Author(s):

Sue Davey ◽

Cristina Navarrete ◽

Colin Brown

Keyword(s):

Next Generation Sequencing ◽

Single Nucleotide Polymorphisms ◽

Human Platelet ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Single Nucleotide ◽

Targeted Next Generation Sequencing ◽

Platelet Antigen ◽

Human Platelet Antigen ◽

Generation Sequencing

Lacer: accurate base quality score recalibration for improving variant calling from next-generation sequencing data in any organism

10.1101/130732 ◽

2017 ◽

Author(s):

Jade C.S. Chung ◽

Swaine L. Chen

Keyword(s):

Next Generation Sequencing ◽

Variant Calling ◽

Quality Score ◽

Identification Accuracy ◽

Next Generation Sequencing Data ◽

Sequencing Error ◽

Next Generation ◽

Sequencing Data ◽

Base Quality Score ◽

Generation Sequencing

Abstract: Next-generation sequencing data is accompanied by quality scores that quantify sequencing error. Inaccuracies in these quality scores propagate through all subsequent analyses; thus base quality score recalibration is a standard step in many next-generation sequencing workflows, resulting in improved variant calls. Current base quality score recalibration algorithms rely on the assumption that sequencing errors are already known; for human resequencing data, relatively complete variant databases facilitate this. However, because existing databases are still incomplete, recalibration is still inaccurate; and most organisms do not have variant databases, exacerbating inaccuracy for non-human data. To overcome these logical and practical problems, we introduce Lacer, which recalibrates base quality scores without assuming knowledge of correct and incorrect bases and without requiring knowledge of common variants. Lacer is the first logically sound, fully general, and truly accurate base recalibrator. Lacer enhances variant identification accuracy for resequencing data of human as well as other organisms (which are not accessible to current recalibrators), simultaneously improving and extending the benefits of base quality score recalibration to nearly all ongoing sequencing projects. Lacer is available at: https://github.com/swainechen/lacer.
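
For readers unfamiliar with the quality scores this abstract refers to, the sketch below shows the standard Phred relationship, Q = -10 log10(p), between a base quality score Q and the estimated error probability p, together with decoding of a FASTQ quality string. It assumes the common Phred+33 ASCII encoding and does not represent Lacer's recalibration algorithm; the function names are illustrative.

```python
import math
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) to a Phred score Q = -10*log10(p)."""
    return -10 * math.log10(p)


def decode_fastq_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
    """
    return [ord(c) - offset for c in quality_string]


# 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0 under Phred+33.
scores = decode_fastq_qualities("I5!")
error_probs = [phred_to_error_prob(q) for q in scores]
```

A recalibrator adjusts the reported Q values so that these implied error probabilities match the error rates actually observed in the data.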

Pisces: An Accurate and Versatile Variant Caller for Somatic and Germline Next-Generation Sequencing Data

10.1101/291641 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tamsen Dunn ◽

Gwenn Berry ◽

Dorothea Emig-Agius ◽

Yu Jiang ◽

Serena Lei ◽

...

Keyword(s):

Next Generation Sequencing ◽

Gene Mutations ◽

Variant Calling ◽

Amplicon Sequencing ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

RAS Gene ◽

Generation Sequencing

Abstract: Motivation: Next-Generation Sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants, suitable for the clinic. Results: We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces achieves its accuracy through four distinct modules: the Pisces Read Stitcher, the Pisces Variant Caller, the Pisces Variant Quality Recalibrator, and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant. Availability: Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test, which was recently approved by the FDA for the detection of multiple RAS gene mutations. Contact: [email protected] Supplementary information: Supplementary data are available online.

NGSphy: phylogenomic simulation of next-generation sequencing data

10.1101/197715 ◽

2017 ◽

Author(s):

Merly Escalona ◽

Sara Rocha ◽

David Posada

Keyword(s):

Next Generation Sequencing ◽

Variant Calling ◽

Gene Families ◽

Common Species ◽

Next Generation Sequencing Data ◽

Phylogenomic Analysis ◽

Next Generation ◽

Sequencing Data ◽

Sequencing Technologies ◽

Generation Sequencing

Abstract: Motivation: Advances in sequencing technologies have made it feasible to obtain massive datasets for phylogenomic inference, often consisting of large numbers of loci from multiple species and individuals. The phylogenomic analysis of next-generation sequencing (NGS) data implies a complex computational pipeline where multiple technical and methodological decisions are necessary that can influence the final tree obtained, like those related to coverage, assembly, mapping, variant calling and/or phasing. Results: To assess the influence of these variables we introduce NGSphy, an open-source tool for the simulation of Illumina reads/read counts obtained from haploid/diploid individual genomes with thousands of independent gene families evolving under a common species tree. In order to resemble real NGS experiments, NGSphy includes multiple options to model sequencing coverage (depth) heterogeneity across species, individuals and loci, including off-target or uncaptured loci. For comprehensive simulations covering multiple evolutionary scenarios, parameter values for the different replicates can be sampled from user-defined statistical distributions. Availability: Source code, full documentation and tutorials including a quick start guide are available at http://github.com/merlyescalona/[email protected]. [email protected]
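
As a rough illustration of coverage heterogeneity across loci, the sketch below draws a per-locus expected depth by scaling a target depth with a random multiplier and marks some loci as uncaptured. It is not NGSphy's model: the gamma-distributed multiplier, the parameter names, and the off-target rule are assumptions chosen only to make the idea concrete.

```python
import random
from typing import List


def sample_locus_depths(n_loci: int,
                        mean_depth: float,
                        locus_shape: float,
                        off_target_fraction: float,
                        seed: int = 0) -> List[float]:
    """Sample an expected sequencing depth for each locus.

    Assumptions (illustrative only):
    - each captured locus gets mean_depth times a gamma multiplier with mean 1;
    - each locus is independently uncaptured (depth 0) with
      probability off_target_fraction.
    """
    rng = random.Random(seed)  # seeded so that replicates are reproducible
    depths = []
    for _ in range(n_loci):
        if rng.random() < off_target_fraction:
            depths.append(0.0)  # uncaptured locus: no reads expected
            continue
        # gammavariate(alpha, beta) has mean alpha * beta, so scale 1/alpha gives mean 1.
        multiplier = rng.gammavariate(locus_shape, 1.0 / locus_shape)
        depths.append(mean_depth * multiplier)
    return depths


# Hypothetical settings: 1000 loci, 30x target depth, 10% uncaptured loci.
depths = sample_locus_depths(n_loci=1000, mean_depth=30.0,
                             locus_shape=2.0, off_target_fraction=0.1, seed=42)
```

A smaller `locus_shape` produces more uneven coverage across loci, while a larger one concentrates depths around `mean_depth`.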
