VikNGS: A C++ Variant Integration Kit for Next Generation Sequencing Association Analysis

Author(s):  
Zeynep Baskurt ◽  
Scott Mastromatteo ◽  
Jiafen Gong ◽  
Richard F Wintle ◽  
Stephen W Scherer ◽  
...  

Abstract Integration of next generation sequencing (NGS) data across different research studies can improve the power of genetic association testing by increasing sample size and can obviate the need for sequencing controls. If differential genotype uncertainty across studies is not accounted for, combining data sets can produce spurious association results. We developed the Variant Integration Kit for NGS (VikNGS), a fast cross-platform software package, to enable aggregation of several data sets for rare and common variant genetic association analysis of quantitative and binary traits with covariate adjustment. VikNGS also includes a graphical user interface, power simulation functionality and data visualization tools. Availability The VikNGS package can be downloaded at http://www.tcag.ca/tools/index.html. Supplementary information Supplementary data are available at Bioinformatics online.

2018 ◽  
Author(s):  
Zeynep Baskurt ◽  
Scott Mastromatteo ◽  
Jiafen Gong ◽  
Richard F. Wintle ◽  
Stephen W. Scherer ◽  
...  

Abstract Motivation Integration of next generation sequencing (NGS) data across different research studies can improve the power of genetic association testing by increasing sample size and can obviate the need for sequencing controls. Unfortunately, if differential genotype uncertainty across studies is not accounted for, combining data sets can also produce spurious association results. The robust variance score statistic (RVS) for genetic association of rare and common variants has been shown to effectively adjust for bias caused by the differences in read depth in case-control genetic association studies when the two groups were sequenced using different experimental designs. To enable consortium research, the aggregation of several data sets for genetic association analysis of quantitative and binary traits with covariate adjustment is required, and we developed the Variant Integration Kit for NGS (VikNGS) that expands the functionality of RVS (vRVS) for this purpose. Results VikNGS is a fast and computationally efficient cross-platform software package that provides an implementation for vRVS, as well as conventional rare and common variant genotype-based association analysis approaches. The package includes a graphical user interface that contains power simulation functionality and data visualization tools. Availability and Implementation The VikNGS package can be downloaded at http://www.tcag.ca/tools/index.html. Documentation can be found at https://VikNGSdocs.readthedocs.io/en/latest/. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Tamsen Dunn ◽  
Gwenn Berry ◽  
Dorothea Emig-Agius ◽  
Yu Jiang ◽  
Serena Lei ◽  
...  

Abstract Motivation Next-Generation Sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants, suitable for the clinic. Results We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces accuracy is achieved by four distinct modules, the Pisces Read Stitcher, Pisces Variant Caller, the Pisces Variant Quality Recalibrator, and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant. Availability Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test which was recently approved by the FDA for the detection of multiple RAS gene mutations. Supplementary information Supplementary data are available online.


2017 ◽  
Author(s):  
Sungsoo Park ◽  
Bonggun Shin ◽  
Yoonjung Choi ◽  
Kilsoo Kang ◽  
Keunsoo Kang

Abstract Motivation Next-generation sequencing (NGS), which allows the simultaneous sequencing of billions of DNA fragments, has revolutionized how we study genomics and molecular biology by generating genome-wide molecular maps of molecules of interest. For example, an NGS-based transcriptomic assay called RNA-seq can be used to estimate the abundance of approximately 190,000 transcripts together. As the cost of next-generation sequencing sharply declines, researchers in many fields have been conducting research using NGS. The amount of information produced by NGS has made it difficult for researchers to choose the optimal set of target genes (or genomic loci). Results We have sought to resolve this issue by developing a neural network-based feature (gene) selection algorithm called Wx. The Wx algorithm ranks genes based on the discriminative index (DI) score that represents the classification power for distinguishing given groups. With a gene list ranked by DI score, researchers can intuitively select the optimal set of genes from the highest-ranking ones. We applied the Wx algorithm to a TCGA pan-cancer gene-expression cohort to identify an optimal set of gene-expression biomarker (universal gene-expression biomarkers) candidates that can distinguish cancer samples from normal samples for 12 different types of cancer. The 14 gene-expression biomarker candidates identified by Wx were comparable to or outperformed previously reported universal gene expression biomarkers, highlighting the usefulness of the Wx algorithm for next-generation sequencing data. Thus, we anticipate that the Wx algorithm can complement current state-of-the-art analytical applications for the identification of biomarker candidates as an alternative method. Availability https://github.com/deargen/... Supplementary information Supplementary data are available online.


2016 ◽  
Vol 79 (4) ◽  
pp. 574-581 ◽  
Author(s):  
TRENNA BLAGDEN ◽  
WILLIAM SCHNEIDER ◽  
ULRICH MELCHER ◽  
JON DANIELS ◽  
JACQUELINE FLETCHER

ABSTRACT The Centers for Disease Control and Prevention recently emphasized the need for enhanced technologies to use in investigations of outbreaks of foodborne illnesses. To address this need, e-probe diagnostic nucleic acid analysis (EDNA) was adapted and validated as a tool for the rapid, effective identification and characterization of multiple pathogens in a food matrix. In EDNA, unassembled next generation sequencing data sets from food sample metagenomes are queried using pathogen-specific sequences known as electronic probes (e-probes). In this study, the query of mock sequence databases demonstrated the potential of EDNA for the detection of foodborne pathogens. The method was then validated using next generation sequencing data sets created by sequencing the metagenome of alfalfa sprouts inoculated with Escherichia coli O157:H7. Nonspecific hits in the negative control sample indicated the need for additional filtration of the e-probes to enhance specificity. There was no significant difference in the ability of an e-probe to detect the target pathogen based upon the length of the probe set oligonucleotides. The results from the queries of the sample database using E. coli e-probe sets were significantly different from those obtained using random decoy probe sets and exhibited 100% precision. The results support the use of EDNA as a rapid response methodology in foodborne outbreaks and investigations for establishing comprehensive microbial profiles of complex food samples.
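The e-probe query described in this abstract can be illustrated with a minimal sketch. It assumes exact substring matching on either strand, which is a deliberate simplification and not the search or scoring used by EDNA itself; the names `reverse_complement`, `count_probe_hits`, and the toy sequences are hypothetical.

```python
# Toy illustration: count unassembled reads that contain each e-probe.
# Assumption: exact matching on either strand stands in for the real search step.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_probe_hits(reads, probes):
    """Count, for each named probe, the reads containing it on either strand."""
    hits = {name: 0 for name in probes}
    for read in reads:
        for name, probe in probes.items():
            if probe in read or reverse_complement(probe) in read:
                hits[name] += 1
    return hits


# Hypothetical probes and reads, not taken from the study.
probes = {"p1": "ACGTTTGA", "p2": "GGGCCC"}
reads = ["TTACGTTTGACC", "TCAAACGTAA", "AAAAAAAA"]
hits = count_probe_hits(reads, probes)
```

Comparing hit counts for target probes against counts for random decoy probes, as the abstract describes, would be a separate statistical step on top of a tally like this one.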


BMC Genomics ◽  
2019 ◽  
Vol 20 (S12) ◽  
Author(s):  
Maximillian Westphal ◽  
David Frankhouser ◽  
Carmine Sonzone ◽  
Peter G. Shields ◽  
Pearlly Yan ◽  
...  

Abstract Background Inadvertent sample swaps are a real threat to data quality in any medium to large scale omics studies. While matches between samples from the same individual can in principle be identified from a few well characterized single nucleotide polymorphisms (SNPs), omics data types often only provide low to moderate coverage, thus requiring integration of evidence from a large number of SNPs to determine if two samples derive from the same individual or not. Methods We select about six thousand SNPs in the human genome and develop a Bayesian framework that is able to robustly identify sample matches between next generation sequencing data sets. Results We validate our approach on a variety of data sets. Most importantly, we show that our approach can establish identity between different omics data types such as Exome, RNA-Seq, and MethylCap-Seq. We demonstrate how identity detection degrades with sample quality and read coverage, but show that twenty million reads of a fairly low quality RNA-Seq sample are still sufficient for reliable sample identification. Conclusion Our tool, SMASH, is able to identify sample mismatches in next generation sequencing data sets between different sequencing modalities and for low quality sequencing data.
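The idea of integrating weak evidence from many SNPs into one identity decision can be sketched as follows. This is not the SMASH model: it assumes hard genotype calls (0, 1 or 2 alternate alleles), Hardy-Weinberg genotype priors, a symmetric per-call error rate, and independent SNPs. The function names and parameters are hypothetical.

```python
import math


def genotype_prior(af):
    """Hardy-Weinberg prior over genotypes 0, 1, 2 for alternate allele frequency af."""
    return [(1 - af) ** 2, 2 * af * (1 - af), af ** 2]


def obs_prob(observed, true, err):
    """Probability of an observed call given the true genotype (assumed error model)."""
    return 1.0 - err if observed == true else err / 2.0


def posterior_same(calls_a, calls_b, afs, err=0.05, prior_same=0.5):
    """Posterior probability that two samples derive from the same individual."""
    log_odds = math.log(prior_same / (1.0 - prior_same))
    for ga, gb, af in zip(calls_a, calls_b, afs):
        prior = genotype_prior(af)
        # Same individual: both calls are noisy views of one shared genotype.
        p_same = sum(prior[g] * obs_prob(ga, g, err) * obs_prob(gb, g, err)
                     for g in range(3))
        # Different individuals: the two calls are independent draws.
        p_a = sum(prior[g] * obs_prob(ga, g, err) for g in range(3))
        p_b = sum(prior[g] * obs_prob(gb, g, err) for g in range(3))
        log_odds += math.log(p_same / (p_a * p_b))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical calls at 20 SNPs, each with allele frequency 0.5.
afs = [0.5] * 20
sample_a = [0, 2] * 10
sample_b = [2, 0] * 10
match = posterior_same(sample_a, sample_a, afs)
mismatch = posterior_same(sample_a, sample_b, afs)
```

Each SNP contributes only a small log-likelihood ratio, so the posterior becomes decisive only once many SNPs are combined, which matches the abstract's motivation for using a large SNP panel at low coverage.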


2011 ◽  
Vol 40 (D1) ◽  
pp. D720-D728 ◽  
Author(s):  
J. Martin ◽  
S. Abubucker ◽  
E. Heizer ◽  
C. M. Taylor ◽  
M. Mitreva

2018 ◽  
Vol 35 (14) ◽  
pp. 2521-2522 ◽  
Author(s):  
Zheng Kuang ◽  
Ying Wang ◽  
Lei Li ◽  
Xiaozeng Yang

Abstract Motivation Two major challenges arise when employing next-generation sequencing methods to comprehensively identify microRNAs (miRNAs) in plants: (i) how to minimize the false positives inherent in computational predictions and (ii) how to minimize the computational time required for analyzing the miRNA transcriptome in plants with complex and large genomes. Results We updated miRDeep-P to miRDeep-P2 (miRDP2) by employing a new filtering strategy and overhauling the algorithm. miRDP2 has been tested against miRNA transcriptomes in plants with increasing genome sizes that included Arabidopsis, rice, tomato, maize and wheat. Compared with miRDeep-P and several other computational tools, miRDP2 processes next-generation sequencing data with superior speed. By incorporating newly updated plant miRNA annotation criteria and developing a new scoring system, miRDP2 outperformed other programs in accuracy. Taken together, our results demonstrate miRDP2 as a fast and accurate tool for analyzing the miRNA transcriptome in plants. Availability and implementation miRDP2 is freely available from https://sourceforge.net/projects/mirdp2/. Supplementary information Supplementary data are available at Bioinformatics online.

