TRTools: a toolkit for genome-wide analysis of tandem repeats

Bioinformatics ◽

10.1093/bioinformatics/btaa736 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nima Mousavi ◽

Jonathan Margoliash ◽

Neha Pusarla ◽

Shubham Saini ◽

Richard Yanicky ◽

...

Keyword(s):

Quality Control ◽

Tandem Repeats ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Genome Wide Analysis ◽

Genome Wide ◽

Wide Range ◽

Downstream Analysis

Abstract Summary A rich set of tools have recently been developed for performing genome-wide genotyping of tandem repeats (TRs). However, standardized tools for downstream analysis of these results are lacking. To facilitate TR analysis applications, we present TRTools, a Python library and suite of command line tools for filtering, merging and quality control of TR genotype files. TRTools utilizes an internal harmonization module, making it compatible with outputs from a wide range of TR genotypers. Availability and implementation TRTools is freely available at https://github.com/gymreklab/TRTools. Detailed documentation is available at https://trtools.readthedocs.io. Supplementary information Supplementary data are available at Bioinformatics online.

TRTools: a toolkit for genome-wide analysis of tandem repeats

10.1101/2020.03.17.996033 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nima Mousavi ◽

Jonathan Margoliash ◽

Neha Pusarla ◽

Shubham Saini ◽

Richard Yanicky ◽

...

Keyword(s):

Quality Control ◽

Tandem Repeats ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Genome Wide Analysis ◽

Link Type ◽

Genome Wide ◽

Wide Range ◽

Downstream Analysis

Abstract Summary A rich set of tools have recently been developed for performing genome-wide genotyping of tandem repeats (TRs). However, standardized tools for downstream analysis of these results are lacking. To facilitate TR analysis applications, we present TRTools, a Python library and a suite of command-line tools for filtering, merging, and quality control of TR genotype files. TRTools utilizes an internal harmonization module making it compatible with outputs from a wide range of TR genotypers. Availability TRTools is freely available at https://github.com/gymreklab/[email protected] Supplementary information Supplementary data are available at bioRxiv.

snakePipes: facilitating flexible, scalable and integrative epigenomic analysis

Bioinformatics ◽

10.1093/bioinformatics/btz436 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4757-4759 ◽

Cited By ~ 18

Author(s):

Vivek Bhardwaj ◽

Steffen Heyne ◽

Katarzyna Sikora ◽

Leily Rabbani ◽

Michael Rauer ◽

...

Keyword(s):

Single Cell ◽

Source Code ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Rna Seq ◽

Downstream Analysis ◽

Scalable Analysis

Abstract Summary Due to the rapidly increasing scale and diversity of epigenomic data, modular and scalable analysis workflows are of wide interest. Here we present snakePipes, a workflow package for processing and downstream analysis of data from common epigenomic assays: ChIP-seq, RNA-seq, Bisulfite-seq, ATAC-seq, Hi-C and single-cell RNA-seq. snakePipes enables users to assemble variants of each workflow and to easily install and upgrade the underlying tools, via its simple command-line wrappers and yaml files. Availability and implementation snakePipes can be installed via conda: ‘conda install -c mpi-ie -c bioconda -c conda-forge snakePipes’. Source code (https://github.com/maxplanck-ie/snakepipes) and documentation (https://snakepipes.readthedocs.io/en/latest/) are available online. Supplementary information Supplementary data are available at Bioinformatics online.

capC-MAP: software for analysis of Capture-C data

Bioinformatics ◽

10.1093/bioinformatics/btz480 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4773-4775 ◽

Cited By ~ 1

Author(s):

Adam Buckle ◽

Nick Gilbert ◽

Davide Marenduzzo ◽

Chris A Brackley

Keyword(s):

Software Package ◽

Experimental Methods ◽

Ease Of Use ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Chromosome Conformation ◽

Chromatin Interactions ◽

Genome Wide ◽

Genomic Locations

Abstract Summary Capture-C is a member of the chromosome-conformation-capture family of experimental methods which probes the 3D organization of chromosomes within the cell nucleus. It provides high-resolution information on the genome-wide chromatin interactions from a set of ‘target’ genomic locations, and is growing in popularity as a tool for improving our understanding of cis-regulation and gene function. Yet, analysis of the data is complicated, and to date there has been no dedicated or easy-to-use software to automate the process. We present capC-MAP, a software package for the analysis of Capture-C data. Availability and implementation Implemented with both ease of use and flexibility in mind, capC-MAP is a suite of programs written in C++ and Python, where each program can be run separately, or an entire analysis can be performed with a single command line. It is available under an open-source licence at https://github.com/cbrackley/capC-MAP, as well as via the conda package manager, and should run on any standard Unix-style system. Supplementary information Supplementary data are available at Bioinformatics online.

RICOPILI: Rapid Imputation for COnsortias PIpeLIne

10.1101/587196 ◽

2019 ◽

Cited By ~ 7

Author(s):

Max Lam ◽

Swapnil Awasthi ◽

Hunna J. Watson ◽

Jackie Goldstein ◽

Georgia Panagiotaropoulou ◽

...

Keyword(s):

Quality Control ◽

Complex Traits ◽

High Performance ◽

Large Scale ◽

Genome Wide Association Study ◽

Meta Analysis ◽

Supplementary Information ◽

Manuscript Preparation ◽

Genome Wide ◽

Wide Range

Abstract Motivation Genome-wide association study (GWAS) analyses, at sufficient sample sizes and power, have successfully revealed biological insights for several complex traits. RICOPILI, an open sourced Perl-based pipeline, was developed to address the challenges of rapidly processing large scale multi-cohort GWAS studies including quality control, imputation and downstream analyses. The pipeline is computationally efficient with portability to a wide range of high-performance computing (HPC) environments. Summary RICOPILI was created as the Psychiatric Genomics Consortium (PGC) pipeline for GWAS and has been adopted by other users. The pipeline features i) technical and genomic quality control in case-control and trio cohorts ii) genome-wide phasing and imputation iv) association analysis v) meta-analysis vi) polygenic risk scoring and vii) replication analysis. Notably, a major differentiator from other GWAS pipelines, RICOPILI leverages on automated parallelization and cluster job management approaches for rapid production of imputed genome-wide data. A comprehensive meta-analysis of simulated GWAS data has been incorporated demonstrating each step of the pipeline. This includes all of the associated visualization plots, to allow ease of data interpretation and manuscript preparation. Simulated GWAS datasets are also packaged with the pipeline for user training tutorials and developer work. Availability and Implementation RICOPILI has a flexible architecture to allow for ongoing development and incorporation of newer available algorithms and is adaptable to various HPC environments (QSUB, BSUB, SLURM and others). Specific links for genomic resources are either directly provided in this paper or via tutorials and external links. The central location hosting scripts and tutorials is found at this URL: https://sites.google.com/a/broadinstitute.org/RICOPILI/[email protected] Supplementary information Supplementary data are available.

Epidemiological modeling in StochSS Live!

Bioinformatics ◽

10.1093/bioinformatics/btab061 ◽

2021 ◽

Author(s):

Richard Jiang ◽

Bruno Jacob ◽

Matthew Geiger ◽

Sean Matthew ◽

Bryan Rumsey ◽

...

Keyword(s):

Stochastic Model ◽

Epidemiological Model ◽

Supplementary Information ◽

Supplementary Data ◽

Web Based ◽

Epidemiological Modeling ◽

Modeling Simulation ◽

Wide Range ◽

Biochemical Systems

Abstract Summary We present StochSS Live!, a web-based service for modeling, simulation and analysis of a wide range of mathematical, biological and biochemical systems. Using an epidemiological model of COVID-19, we demonstrate the power of StochSS Live! to enable researchers to quickly develop a deterministic or a discrete stochastic model, infer its parameters and analyze the results. Availability and implementation StochSS Live! is freely available at https://live.stochss.org/ Supplementary information Supplementary data are available at Bioinformatics online.

BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract Motivation We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results We have developed and describe here an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module-level, and to display the results as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability The BloodGen3Module package and documentation are freely available from Github: https://github.com/Drinchai/BloodGen3Module Supplementary information Supplementary data are available at Bioinformatics online.

ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files

Bioinformatics ◽

10.1093/bioinformatics/btz937 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2587-2588 ◽

Cited By ~ 10

Author(s):

Christopher M Ward ◽

Thu-Hien To ◽

Stephen M Pederson

Keyword(s):

Quality Control ◽

R Package ◽

Supplementary Information ◽

Bioconductor Package ◽

Supplementary Data ◽

Large Sample ◽

Log Files ◽

Shiny App ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Motivation High throughput next generation sequencing (NGS) has become exceedingly cheap, facilitating studies to be undertaken containing large sample numbers. Quality control (QC) is an essential stage during analytic pipelines and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information Supplementary data are available at Bioinformatics online.

Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data

Bioinformatics ◽

10.1093/bioinformatics/btaa070 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3263-3265 ◽

Cited By ~ 14

Author(s):

Lucas Czech ◽

Pierre Barbera ◽

Alexandros Stamatakis

Keyword(s):

Phylogenetic Trees ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Computationally Efficient ◽

Data Types ◽

Low Level ◽

Phylogenetic Placement ◽

Command Line Tool ◽

High Level

Abstract Summary We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven. Availability and implementation Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa. Supplementary information Supplementary data are available at Bioinformatics online.

Spliceogen: an integrative, scalable tool for the discovery of splice-altering variants

Bioinformatics ◽

10.1093/bioinformatics/btz263 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4405-4407 ◽

Cited By ~ 1

Author(s):

Steven Monger ◽

Michael Troup ◽

Eddie Ip ◽

Sally L Dunwoodie ◽

Eleni Giannoulatou

Keyword(s):

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

In Silico Prediction ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Prediction Tools ◽

Motif Prediction ◽

Command Line Tool ◽

Genome Scale

Abstract Motivation In silico prediction tools are essential for identifying variants which create or disrupt cis-splicing motifs. However, there are limited options for genome-scale discovery of splice-altering variants. Results We have developed Spliceogen, a highly scalable pipeline integrating predictions from some of the individually best performing models for splice motif prediction: MaxEntScan, GeneSplicer, ESRseq and Branchpointer. Availability and implementation Spliceogen is available as a command line tool which accepts VCF/BED inputs and handles both single nucleotide variants (SNVs) and indels (https://github.com/VCCRI/Spliceogen). SNV databases with prediction scores are also available, covering all possible SNVs at all genomic positions within all Gencode-annotated multi-exon transcripts. Supplementary information Supplementary data are available at Bioinformatics online.

aCLImatise: automated generation of tool definitions for bioinformatics workflows

Bioinformatics ◽

10.1093/bioinformatics/btaa1033 ◽

2020 ◽

Author(s):

Michael Milton ◽

Natalie Thorne

Keyword(s):

Source Code ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Automated Generation ◽

Base Camp ◽

Python Package ◽

Bioinformatics Workflow ◽

Bioinformatics Workflows

Abstract Summary aCLImatise is a utility for automatically generating tool definitions compatible with bioinformatics workflow languages, by parsing command-line help output. aCLImatise also has an associated database called the aCLImatise Base Camp, which provides thousands of pre-computed tool definitions. Availability and implementation The latest aCLImatise source code is available within a GitHub organisation, under the GPL-3.0 license: https://github.com/aCLImatise. In particular, documentation for the aCLImatise Python package is available at https://aclimatise.github.io/CliHelpParser/, and the aCLImatise Base Camp is available at https://aclimatise.github.io/BaseCamp/. Supplementary information Supplementary data are available at Bioinformatics online.
