annotatr: Genomic regions in context

2016
Author(s):
Raymond G. Cavalcante
Maureen A. Sartor

Abstract Motivation: Analysis of next-generation sequencing data often results in a list of genomic regions. These may include differentially methylated CpGs/regions, transcription factor binding sites, interacting chromatin regions, or GWAS-associated SNPs, among others. A common analysis step is to annotate such genomic regions with genomic annotations (promoters, exons, enhancers, etc.). Existing tools are limited by a lack of annotation sources and flexible options, the time it takes to annotate regions, an artificial one-to-one region-to-annotation mapping, a lack of visualization options to easily summarize data, or some combination thereof. Results: We developed the annotatr Bioconductor package to flexibly and quickly summarize and plot annotations of genomic regions. The annotatr package reports all intersections of regions and annotations, giving a better understanding of the genomic context of the regions. A variety of graphics functions are implemented to easily plot numerical or categorical data associated with the regions across the annotations, and across annotation intersections, providing insight into how characteristics of the regions differ across the annotations. We demonstrate that annotatr is up to 27x faster than comparable R packages. Overall, annotatr enables a richer biological interpretation of experiments. Availability: http://bioconductor.org/packages/annotatr/ Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
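The abstract contrasts reporting all region-annotation intersections with an artificial one-to-one mapping. annotatr itself is an R/Bioconductor package; the Python sketch that follows only illustrates the many-to-many idea and is not annotatr's implementation. It assumes BED-style half-open coordinates, and the `Interval` type, the `annotate_all` function and all region and annotation names are hypothetical toy data.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open, BED-style)
    name: str


def annotate_all(regions, annotations):
    """Return every (region name, annotation name) pair that overlaps.

    A region overlapping several annotations is reported once per
    annotation instead of being forced onto a single "best" annotation.
    Regions that overlap nothing do not appear in the output.
    """
    by_chrom = defaultdict(list)
    for ann in annotations:
        by_chrom[ann.chrom].append(ann)

    hits = []
    for reg in regions:
        # Naive scan of same-chromosome annotations; real tools use interval
        # trees or sorted sweeps to scale to genome-wide inputs.
        for ann in by_chrom.get(reg.chrom, []):
            # Half-open intervals overlap iff each starts before the other ends.
            if ann.start < reg.end and reg.start < ann.end:
                hits.append((reg.name, ann.name))
    return hits


# Hypothetical toy data: two regions, three annotations.
regions = [
    Interval("chr1", 100, 200, "region1"),
    Interval("chr1", 5000, 5100, "region2"),
]
annotations = [
    Interval("chr1", 0, 150, "promoter:GENE_A"),
    Interval("chr1", 120, 400, "exon:GENE_A"),
    Interval("chr2", 100, 200, "enhancer:X"),
]
print(annotate_all(regions, annotations))
# [('region1', 'promoter:GENE_A'), ('region1', 'exon:GENE_A')]
```

region1 appears twice because it overlaps both the promoter and the exon; a one-to-one scheme would keep only one of these, hiding part of the region's genomic context.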

Author(s):
Zeynep Baskurt
Scott Mastromatteo
Jiafen Gong
Richard F Wintle
Stephen W Scherer
...

Abstract Integration of next-generation sequencing (NGS) data across different research studies can improve the power of genetic association testing by increasing sample size and can obviate the need for sequencing controls. If differential genotype uncertainty across studies is not accounted for, combining data sets can produce spurious association results. We developed the Variant Integration Kit for NGS (VikNGS), a fast cross-platform software package, to enable aggregation of several data sets for rare and common variant genetic association analysis of quantitative and binary traits with covariate adjustment. VikNGS also includes a graphical user interface, power simulation functionality and data visualization tools. Availability The VikNGS package can be downloaded at http://www.tcag.ca/tools/index.html. Supplementary information Supplementary data are available at Bioinformatics online.
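The abstract does not describe how VikNGS models genotype uncertainty. One standard way to carry read-level uncertainty into association testing, assumed here purely for illustration, is to replace hard genotype calls with expected genotype dosages computed from genotype likelihoods and a Hardy-Weinberg prior. A low-coverage sample then contributes a dosage pulled toward the population mean instead of a possibly wrong hard call, which is one reason studies sequenced at different depths cannot be naively pooled. The `expected_dosage` function, the likelihood values and the allele frequency are hypothetical.

```python
def expected_dosage(likelihoods, alt_freq):
    """Posterior mean alternative-allele count E[G | reads] at one site.

    likelihoods: P(reads | G=0), P(reads | G=1), P(reads | G=2) on a
        linear (not log or Phred) scale.
    alt_freq: alternative allele frequency for a Hardy-Weinberg prior
        (an assumption of this sketch).
    """
    p = alt_freq
    q = 1.0 - p
    prior = (q * q, 2.0 * p * q, p * p)
    joint = [lik * pr for lik, pr in zip(likelihoods, prior)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("zero total probability; check likelihoods and alt_freq")
    posterior = [j / total for j in joint]
    # Expected genotype = sum over g in {0, 1, 2} of g * P(G = g | reads).
    return sum(g * post for g, post in enumerate(posterior))


# Hypothetical samples at a site with alternative allele frequency 0.1.
deep = expected_dosage((1e-6, 1.0, 1e-6), alt_freq=0.1)     # confident heterozygote
shallow = expected_dosage((0.4, 0.5, 0.1), alt_freq=0.1)    # ambiguous reads
print(round(deep, 3), round(shallow, 3))
```

The confident heterozygote receives a dosage very close to 1, while the ambiguous sample is pulled toward the prior mean of 2 × 0.1 = 0.2, so its uncertainty is carried into any downstream test rather than hidden behind a hard call.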


2018
Author(s):
Tamsen Dunn
Gwenn Berry
Dorothea Emig-Agius
Yu Jiang
Serena Lei
...

Abstract Motivation Next-generation sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a clinically suitable method for accurately detecting somatic and germline variants. Results We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces achieves its accuracy through four distinct modules: the Pisces Read Stitcher, the Pisces Variant Caller, the Pisces Variant Quality Recalibrator and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant. Availability Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test, which was recently approved by the FDA for the detection of multiple RAS gene mutations. Supplementary information Supplementary data are available online.


2017
Author(s):
Sungsoo Park
Bonggun Shin
Yoonjung Choi
Kilsoo Kang
Keunsoo Kang

Abstract Motivation Next-generation sequencing (NGS), which allows billions of DNA fragments to be sequenced simultaneously, has revolutionized how we study genomics and molecular biology by generating genome-wide maps of molecules of interest. For example, an NGS-based transcriptomic assay called RNA-seq can be used to estimate the abundance of approximately 190,000 transcripts at once. As the cost of next-generation sequencing sharply declines, researchers in many fields have been conducting research using NGS. The amount of information produced by NGS has made it difficult for researchers to choose the optimal set of target genes (or genomic loci). Results We have sought to resolve this issue by developing a neural network-based feature (gene) selection algorithm called Wx. The Wx algorithm ranks genes based on the discriminative index (DI) score, which represents the classification power for distinguishing given groups. With a gene list ranked by DI score, researchers can intuitively select the optimal set of genes from the highest-ranking ones. We applied the Wx algorithm to a TCGA pan-cancer gene-expression cohort to identify an optimal set of universal gene-expression biomarker candidates that can distinguish cancer samples from normal samples for 12 different types of cancer. The 14 gene-expression biomarker candidates identified by Wx were comparable to or outperformed previously reported universal gene-expression biomarkers, highlighting the usefulness of the Wx algorithm for next-generation sequencing data. Thus, we anticipate that the Wx algorithm can complement current state-of-the-art analytical applications as an alternative method for identifying biomarker candidates. Availability https://github.com/deargen/[email protected] Supplementary information Supplementary data are available online.
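The abstract describes a rank-then-select workflow: score each gene by how well it separates the groups, sort the genes by that score, and take genes from the top. Wx computes its discriminative index with a neural network that the abstract does not specify, so the sketch that follows swaps in a simple absolute Welch t statistic as a stand-in score. It illustrates only the ranking and selection steps, not Wx itself; the function names, gene names, expression values and `top_k` are hypothetical.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t statistic between two groups (stand-in score, not Wx's DI)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    diff = abs(mean(a) - mean(b))
    if se == 0.0:
        # No within-group spread: separation is perfect if the means differ.
        return float("inf") if diff > 0 else 0.0
    return diff / se


def rank_genes(expr, labels):
    """Return (gene, score) pairs sorted from most to least discriminative.

    expr: dict mapping gene -> expression values, one per sample.
    labels: 0/1 group label per sample (e.g. normal vs cancer), same order.
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = welch_t(group0, group1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: 3 normal (0) and 3 cancer (1) samples.
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
    "GENE_B": [2.0, 2.5, 1.5, 2.2, 1.8, 2.4],
    "GENE_C": [3.0, 3.1, 2.9, 4.0, 4.2, 3.9],
}
labels = [0, 0, 0, 1, 1, 1]
top_k = 2
selected = [gene for gene, _ in rank_genes(expr, labels)[:top_k]]
print(selected)  # ['GENE_A', 'GENE_C']
```

Keeping scoring separate from selection means the stand-in score could be replaced by any other per-gene discriminative score without changing the selection step; how many top genes to keep is a separate choice, fixed at 2 here only for the toy example.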


2018
Vol 35 (14)
pp. 2521-2522
Author(s):
Zheng Kuang
Ying Wang
Lei Li
Xiaozeng Yang

Abstract Motivation Two major challenges arise when employing next-generation sequencing methods to comprehensively identify microRNAs (miRNAs) in plants: (i) how to minimize the false positives inherent to computational predictions and (ii) how to minimize the computational time required for analyzing the miRNA transcriptome in plants with complex and large genomes. Results We updated miRDeep-P to miRDeep-P2 (miRDP2) by employing a new filtering strategy and overhauling the algorithm. miRDP2 has been tested against miRNA transcriptomes in plants with increasing genome sizes, including Arabidopsis, rice, tomato, maize and wheat. Compared with miRDeep-P and several other computational tools, miRDP2 processes next-generation sequencing data with superior speed. By incorporating newly updated plant miRNA annotation criteria and developing a new scoring system, miRDP2 outperformed other programs in accuracy. Taken together, our results demonstrate that miRDP2 is a fast and accurate tool for analyzing the miRNA transcriptome in plants. Availability and implementation miRDP2 is freely available from https://sourceforge.net/projects/mirdp2/. Supplementary information Supplementary data are available at Bioinformatics online.


2020
Vol 36 (11)
pp. 3607-3609
Author(s):
Louis J Taylor
Arwa Abbas
Frederic D Bushman

Abstract Summary High-throughput sequencing is a powerful technique for addressing biological questions. Grabseqs streamlines access to publicly available metagenomic data by providing a single, easy-to-use interface to download data and metadata from multiple repositories, including the Sequence Read Archive, the Metagenomics Rapid Annotation through Subsystems Technology server and iMicrobe. Users can download data and metadata in a standardized format from any number of samples or projects from a given repository with a single grabseqs command. Availability and implementation Grabseqs is an open-source tool implemented in Python and licensed under the MIT license. It is freely available from https://github.com/louiejtaylor/grabseqs, the Python Package Index and the Anaconda Cloud repository. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019
Vol 35 (19)
pp. 3815-3817
Author(s):
Alejandra Cervera
Ville Rantanen
Kristian Ovaska
Marko Laakso
Javier Nuñez-Fontarnau
...

Abstract Summary Anduril is an analysis and integration framework that facilitates the design, use, parallelization and reproducibility of bioinformatics workflows. Anduril has been upgraded to use Scala for pipeline construction, which simplifies software maintenance and facilitates the design of complex pipelines. Additionally, Anduril’s bioinformatics repository has been expanded with multiple components and tutorial pipelines for next-generation sequencing data analysis. Availability and implementation Freely available at http://anduril.org. Supplementary information Supplementary data are available at Bioinformatics online.

