capC-MAP: software for analysis of Capture-C data

Adam Buckle; Nick Gilbert; Davide Marenduzzo; Chris A Brackley

doi:10.1093/bioinformatics/btz480

Bioinformatics, 2019, Vol 35 (22), pp. 4773-4775

Keyword(s): Software Package, Experimental Methods, Ease Of Use, Supplementary Information, Command Line, Supplementary Data, Chromosome Conformation, Chromatin Interactions, Genome Wide, Genomic Locations

Abstract

Summary: Capture-C is a member of the chromosome-conformation-capture family of experimental methods, which probes the 3D organization of chromosomes within the cell nucleus. It provides high-resolution information on genome-wide chromatin interactions from a set of ‘target’ genomic locations, and is growing in popularity as a tool for improving our understanding of cis-regulation and gene function. Yet analysis of the data is complicated, and to date there has been no dedicated or easy-to-use software to automate the process. We present capC-MAP, a software package for the analysis of Capture-C data.

Availability and implementation: Implemented with both ease of use and flexibility in mind, capC-MAP is a suite of programs written in C++ and Python; each program can be run separately, or an entire analysis can be performed with a single command line. It is available under an open-source licence at https://github.com/cbrackley/capC-MAP, as well as via the conda package manager, and should run on any standard Unix-style system.

Supplementary information: Supplementary data are available at Bioinformatics online.
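Capture-C pipelines generally assign each mapped read end to a restriction fragment of the reference before counting interactions with the capture target. The following is a minimal sketch of that single step, assuming a DpnII-style enzyme that cuts immediately before GATC and one chromosome held in memory as a string. The function names `digest` and `fragment_of` are hypothetical illustrations, not part of capC-MAP's programs or command line.

```python
import bisect
from typing import List


def digest(sequence: str, site: str = "GATC") -> List[int]:
    """Return sorted fragment boundaries [0, cut1, cut2, ..., len(sequence)].

    Assumption: a DpnII-like enzyme that cuts immediately before each site.
    Fragment i is the half-open interval [bounds[i], bounds[i + 1]).
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        if pos > 0:  # a site at position 0 would create an empty fragment
            cuts.append(pos)
        pos = seq.find(site, pos + 1)
    return [0] + cuts + [len(seq)]


def fragment_of(bounds: List[int], position: int) -> int:
    """Index of the fragment containing a 0-based position (binary search)."""
    if not 0 <= position < bounds[-1]:
        raise ValueError(f"position {position} lies outside the sequence")
    return bisect.bisect_right(bounds, position) - 1
```

For example, `digest("AAGATCTTTGATCCC")` gives the boundaries `[0, 2, 9, 15]`, i.e. fragments `[0, 2)`, `[2, 9)` and `[9, 15)`, and `fragment_of` places position 5 in fragment 1. Each lookup is a binary search over the sorted boundaries, so it costs O(log n) for n fragments.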

TRTools: a toolkit for genome-wide analysis of tandem repeats

Bioinformatics, 2020, doi:10.1093/bioinformatics/btaa736

Author(s): Nima Mousavi, Jonathan Margoliash, Neha Pusarla, Shubham Saini, Richard Yanicky, ...

Keyword(s): Quality Control, Tandem Repeats, Supplementary Information, Command Line, Supplementary Data, Genome Wide Analysis, Genome Wide, Wide Range, Downstream Analysis

Abstract

Summary: A rich set of tools has recently been developed for genome-wide genotyping of tandem repeats (TRs). However, standardized tools for downstream analysis of these results are lacking. To facilitate TR analysis applications, we present TRTools, a Python library and suite of command-line tools for filtering, merging and quality control of TR genotype files. TRTools uses an internal harmonization module, making it compatible with outputs from a wide range of TR genotypers.

Availability and implementation: TRTools is freely available at https://github.com/gymreklab/TRTools. Detailed documentation is available at https://trtools.readthedocs.io.

Supplementary information: Supplementary data are available at Bioinformatics online.
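Harmonization matters because different TR genotypers report alleles in different units. As a hypothetical illustration of the idea (not TRTools' API or internal module), the sketch converts two assumed representations, per-allele length differences in base pairs relative to the reference and full allele sequences consisting only of the repeat, into repeat-unit copy numbers. The `Locus` record and both function names are invented for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Locus:
    # Hypothetical locus description: repeat motif and its copy number in the reference.
    motif: str
    ref_copies: float


def copies_from_bp_diff(locus: Locus, bp_diffs: Tuple[int, int]) -> Tuple[float, ...]:
    """Convert per-allele length differences (bp, relative to the reference) to copy numbers."""
    unit = len(locus.motif)
    return tuple(locus.ref_copies + d / unit for d in bp_diffs)


def copies_from_sequences(locus: Locus, alleles: Tuple[str, str]) -> Tuple[float, ...]:
    """Convert allele sequences to copy numbers (assumes no flanking sequence is included)."""
    unit = len(locus.motif)
    return tuple(len(seq) / unit for seq in alleles)
```

With a CAG locus of 10 reference copies, a caller reporting `(0, +6)` bp and a caller reporting the sequences `"CAG" * 10` and `"CAG" * 12` both harmonize to `(10.0, 12.0)`, which is what makes their outputs comparable for filtering and merging.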

TRTools: a toolkit for genome-wide analysis of tandem repeats

2020, doi:10.1101/2020.03.17.996033

Author(s): Nima Mousavi, Jonathan Margoliash, Neha Pusarla, Shubham Saini, Richard Yanicky, ...

Keyword(s): Quality Control, Tandem Repeats, Supplementary Information, Command Line, Supplementary Data, Genome Wide Analysis, Link Type, Genome Wide, Wide Range, Downstream Analysis

Abstract

Summary: A rich set of tools has recently been developed for genome-wide genotyping of tandem repeats (TRs). However, standardized tools for downstream analysis of these results are lacking. To facilitate TR analysis applications, we present TRTools, a Python library and a suite of command-line tools for filtering, merging, and quality control of TR genotype files. TRTools utilizes an internal harmonization module, making it compatible with outputs from a wide range of TR genotypers.

Availability: TRTools is freely available at https://github.com/gymreklab/TRTools.

Supplementary information: Supplementary data are available at bioRxiv.

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Bioinformatics, 2021, doi:10.1093/bioinformatics/btab083

Author(s): Yanrong Ji, Zhihan Zhou, Han Liu, Ramana V Davuluri

Keyword(s): DNA Sequences, Regulatory Elements, Ease Of Use, Fine Tuning, Supplementary Information, Sequence Motifs, Semantic Relationship, Accurate Identification, Conserved Sequence, Genome Wide

Abstract

Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. The gene regulatory code is highly complex due to polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios.

Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferrable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory element prediction and demonstrate its ease of use, accuracy and efficiency. We show that a single pre-trained transformer model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences, for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks.

Availability and implementation: The source code and the pre-trained and fine-tuned models for DNABERT are available on GitHub (https://github.com/jerryji1993/DNABERT).

Supplementary information: Supplementary data are available at Bioinformatics online.
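A DNA language model first has to turn raw sequence into tokens. As we understand the DNABERT paper, it represents sequences as overlapping k-mers (with k from 3 to 6); the sketch below shows that style of tokenization in plain Python. The vocabulary construction and the function names are hypothetical, and the sketch omits the special tokens a BERT-style model also needs, so it is not the DNABERT codebase.

```python
from itertools import product
from typing import Dict, List


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = sequence.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int) -> Dict[str, int]:
    """Map every k-mer over ACGT to an integer id (hypothetical; no special tokens)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def encode(sequence: str, vocab: Dict[str, int], k: int = 6) -> List[int]:
    """Token ids for a sequence; raises KeyError on non-ACGT characters such as N."""
    return [vocab[t] for t in kmer_tokens(sequence, k)]
```

A sequence of length L yields L - k + 1 tokens, so `kmer_tokens("ACGTAC", k=3)` gives `["ACG", "CGT", "GTA", "TAC"]`. Overlapping tokens keep every nucleotide's up- and downstream context inside neighbouring tokens, at the cost of a vocabulary of 4^k entries.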

An integrative approach for fine-mapping chromatin interactions

Bioinformatics, 2019, Vol 36 (6), pp. 1704-1711, doi:10.1093/bioinformatics/btz843

Author(s): Artur Jaroszewicz, Jason Ernst

Keyword(s): Gene Regulation, High Resolution, Biological Significance, Computational Method, Supplementary Information, Integrative Approach, Genome Architecture, Open Chromatin, Chromatin Interactions, Genome Wide

Abstract

Motivation: Chromatin interactions play an important role in genome architecture and gene regulation. The Hi-C assay generates maps of such interactions genome-wide, but at relatively low resolutions (e.g. 5-25 kb), substantially coarser than the resolution of transcription factor binding sites or open chromatin sites that are potential sources of these interactions.

Results: To predict the sources of Hi-C-identified interactions at high resolution (e.g. 100 bp), we developed a computational method that integrates data from DNase-seq and from ChIP-seq of TFs and histone marks. Our method, χ-CNN, uses these data to first train a convolutional neural network (CNN) to discriminate between called Hi-C interactions and non-interactions. χ-CNN then predicts the high-resolution source of each Hi-C interaction using a feature attribution method. We show that these predictions recover the original Hi-C peaks after extending them to be coarser. We also show that χ-CNN predictions are enriched for evolutionarily conserved bases, eQTLs and CTCF motifs, supporting their biological significance. χ-CNN provides an approach for analyzing important aspects of genome architecture and gene regulation at a higher resolution than previously possible.

Availability and implementation: χ-CNN software is available on GitHub (https://github.com/ernstlab/X-CNN).

Supplementary information: Supplementary data are available at Bioinformatics online.
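One simple way to turn per-base attribution scores into a single high-resolution call is to slide a fixed window across the coarse Hi-C bin and keep the window with the largest summed attribution. The sketch below does exactly that; the scoring rule, the 100 bp default and the name `best_window` are assumptions for illustration, not χ-CNN's published procedure, and the scores themselves would come from whatever attribution method is applied to the trained CNN.

```python
from typing import List, Tuple


def best_window(attributions: List[float], window: int = 100) -> Tuple[int, float]:
    """Return (start offset, summed score) of the highest-scoring window.

    attributions: one score per base pair of a coarse Hi-C bin (hypothetical input).
    Ties are resolved in favour of the leftmost window.
    """
    if window <= 0 or window > len(attributions):
        raise ValueError("window must be between 1 and the bin length")
    current = sum(attributions[:window])
    best_start, best_score = 0, current
    for start in range(1, len(attributions) - window + 1):
        # Slide by one base: add the base entering the window, drop the one leaving it.
        current += attributions[start + window - 1] - attributions[start - 1]
        if current > best_score:
            best_start, best_score = start, current
    return best_start, best_score
```

The returned offset is relative to the bin start, so the genomic coordinate of the call would be the bin start plus the offset. The running sum keeps the scan linear in the bin length (e.g. 5,000 positions for a 5 kb bin) instead of re-summing every window.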

Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data

Bioinformatics, 2020, Vol 36 (10), pp. 3263-3265, doi:10.1093/bioinformatics/btaa070

Author(s): Lucas Czech, Pierre Barbera, Alexandros Stamatakis

Keyword(s): Phylogenetic Trees, Supplementary Information, Command Line, Supplementary Data, Computationally Efficient, Data Types, Low Level, Phylogenetic Placement, Command Line Tool, High Level

Abstract

Summary: We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven.

Availability and implementation: Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa.

Supplementary information: Supplementary data are available at Bioinformatics online.
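Phylogenetic placements are commonly exchanged in the jplace format, in which each query lists one or more candidate edges together with a likelihood weight ratio. A typical downstream summary is the placement mass per edge. The sketch below computes it with the Python standard library; it illustrates the idea only and is neither the genesis API nor a gappa command. It assumes the document's `fields` list names the columns `edge_num` and `like_weight_ratio`, and it treats every query as one unit of mass (name multiplicities are ignored).

```python
import json
from collections import defaultdict
from typing import Dict


def edge_masses(jplace_text: str) -> Dict[int, float]:
    """Sum like_weight_ratio per edge_num over all queries in a jplace document."""
    doc = json.loads(jplace_text)
    fields = doc["fields"]
    edge_col = fields.index("edge_num")
    lwr_col = fields.index("like_weight_ratio")
    masses: Dict[int, float] = defaultdict(float)
    for query in doc["placements"]:
        for row in query["p"]:
            masses[int(row[edge_col])] += float(row[lwr_col])
    return dict(masses)
```

Looking columns up by name rather than by position keeps the function working when a file orders its `fields` differently. A query placed on edge 0 with weight 0.75 and on edge 1 with weight 0.25, plus a second query placed only on edge 0, gives `{0: 1.75, 1: 0.25}`.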

Spliceogen: an integrative, scalable tool for the discovery of splice-altering variants

Bioinformatics, 2019, Vol 35 (21), pp. 4405-4407, doi:10.1093/bioinformatics/btz263

Author(s): Steven Monger, Michael Troup, Eddie Ip, Sally L Dunwoodie, Eleni Giannoulatou

Keyword(s): Supplementary Information, Command Line, Supplementary Data, In Silico Prediction, Single Nucleotide Variants, Single Nucleotide, Prediction Tools, Motif Prediction, Command Line Tool, Genome Scale

Abstract

Motivation: In silico prediction tools are essential for identifying variants that create or disrupt cis-splicing motifs. However, there are limited options for genome-scale discovery of splice-altering variants.

Results: We have developed Spliceogen, a highly scalable pipeline integrating predictions from some of the individually best-performing models for splice motif prediction: MaxEntScan, GeneSplicer, ESRseq and Branchpointer.

Availability and implementation: Spliceogen is available as a command-line tool that accepts VCF/BED inputs and handles both single nucleotide variants (SNVs) and indels (https://github.com/VCCRI/Spliceogen). SNV databases with prediction scores are also available, covering all possible SNVs at all genomic positions within all Gencode-annotated multi-exon transcripts.

Supplementary information: Supplementary data are available at Bioinformatics online.
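Because the pipeline accepts VCF input and handles both SNVs and indels, an early step in any comparable workflow is telling the two apart. The sketch below classifies minimal VCF data lines by comparing REF and ALT allele lengths. The parsing is deliberately simplified (meta and header lines are skipped, symbolic alleles such as `<DEL>` are not handled), and the function names are hypothetical rather than taken from Spliceogen.

```python
from typing import Iterable, List, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Label a REF/ALT pair as 'SNV', 'indel' or 'other' (e.g. multi-nucleotide substitutions)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"


def classify_vcf(lines: Iterable[str]) -> List[Tuple[str, int, str, str, str]]:
    """Return (chrom, pos, ref, alt, class) for each ALT allele in tab-separated VCF data lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta (##) and header (#CHROM) lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list ALTs separated by commas
            records.append((chrom, int(pos), ref, alt, classify_variant(ref, alt)))
    return records
```

Splitting multi-allelic records into one row per ALT allele matters here, since a single site such as `C` to `CTT,T` carries both an insertion and an SNV that would be scored by different motif models.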

aCLImatise: automated generation of tool definitions for bioinformatics workflows

Bioinformatics, 2020, doi:10.1093/bioinformatics/btaa1033

Author(s): Michael Milton, Natalie Thorne

Keyword(s): Source Code, Supplementary Information, Command Line, Supplementary Data, Automated Generation, Base Camp, Python Package, Bioinformatics Workflow, Bioinformatics Workflows

Abstract

Summary: aCLImatise is a utility for automatically generating tool definitions compatible with bioinformatics workflow languages, by parsing command-line help output. aCLImatise also has an associated database called the aCLImatise Base Camp, which provides thousands of pre-computed tool definitions.

Availability and implementation: The latest aCLImatise source code is available within a GitHub organisation, under the GPL-3.0 license: https://github.com/aCLImatise. In particular, documentation for the aCLImatise Python package is available at https://aclimatise.github.io/CliHelpParser/, and the aCLImatise Base Camp is available at https://aclimatise.github.io/BaseCamp/.

Supplementary information: Supplementary data are available at Bioinformatics online.
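The core idea, recovering a tool's flags from its help text, can be shown with a toy parser. The sketch below assumes a simple, common help layout (`-s, --long VALUE` followed by at least two spaces and a description); real help output is far more varied. The `Flag` record and the regular expression are hypothetical illustrations, not aCLImatise's data model or parser.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches lines such as "  -o, --output FILE     Output path" (assumed layout).
FLAG_LINE = re.compile(
    r"^\s*(?:(?P<short>-\w),\s*)?(?P<long>--[\w-]+)(?:[ =](?P<arg>[A-Z_]+))?\s{2,}(?P<desc>.+)$"
)


@dataclass
class Flag:
    short: Optional[str]
    long: str
    takes_value: bool  # True when an upper-case metavariable (e.g. FILE) follows the flag
    description: str


def parse_help(help_text: str) -> List[Flag]:
    """Extract flag definitions from help output; lines that do not match are ignored."""
    flags = []
    for line in help_text.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            flags.append(
                Flag(
                    short=match.group("short"),
                    long=match.group("long"),
                    takes_value=match.group("arg") is not None,
                    description=match.group("desc").strip(),
                )
            )
    return flags
```

From records like these, a generator could emit one input per flag in a workflow language, typing flags without a metavariable as booleans and the rest as values.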

Visualization of circular RNAs and their internal splicing events from transcriptomic data

Bioinformatics, 2020, Vol 36 (9), pp. 2934-2935, doi:10.1093/bioinformatics/btaa033

Author(s): Yi Zheng, Fangqing Zhao

Keyword(s): Supplementary Information, Circular RNAs, Visualization Tool, Command Line, Supplementary Data, Transcriptomic Data, Command Line Tool, Transcriptome Comparison, Multiple Samples, Splicing Patterns

Abstract

Summary: Circular RNAs (circRNAs) have been shown to have unique compositions and splicing events distinct from those of canonical mRNAs. However, there is no visualization tool designed for exploring the complex splicing patterns in circRNA transcriptomes. Here, we present CIRI-vis, a Java command-line tool for quantifying and visualizing circRNAs by integrating the alignments and junctions of circular transcripts. CIRI-vis can be applied to visualize the internal structure and isoform abundance of circRNAs and to perform circRNA transcriptome comparison across multiple samples.

Availability and implementation: https://sourceforge.net/projects/ciri/files/CIRI-vis.

Supplementary information: Supplementary data are available at Bioinformatics online.
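The junctions a circRNA viewer integrates are back-splice junctions, and their alignment signature is simple to state: the two segments of a read map to the same chromosome and strand, but in the reverse of their order in the read, because the read runs off the 3' end of the circle and continues at its 5' start. The sketch below checks that condition for a read split into two aligned segments. The `Segment` record is hypothetical, only the forward strand is handled, and filters such as minimum span or splice-signal checks are omitted, so this is not CIRI-vis's (or CIRI's) detection code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    # Hypothetical aligned read segment: 0-based, half-open genomic interval.
    chrom: str
    strand: str
    start: int
    end: int


def back_splice_junction(first: Segment, second: Segment) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, circle_start, circle_end) if two segments, in read order, form a back-splice.

    Forward strand only: the second segment must begin upstream of the first.
    """
    if first.chrom != second.chrom or first.strand != second.strand or first.strand != "+":
        return None
    if second.start < first.start:
        # The read leaves the circle's 3' end (first.end) and re-enters at its 5' start (second.start).
        return (first.chrom, second.start, first.end)
    return None  # collinear order: consistent with ordinary linear splicing
```

For segments at chr1:1900-2000 followed by chr1:1000-1080 the function reports a circle spanning chr1:1000-2000, whereas the same segments in the opposite read order are consistent with a linear transcript and return `None`.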

CAFE 5 models variation in evolutionary rates among gene families

Bioinformatics, 2020, doi:10.1093/bioinformatics/btaa1022

Author(s): Fábio K Mendes, Dan Vanderpool, Ben Fulton, Matthew W Hahn

Keyword(s): Software Package, Computational Analysis, Source Code, Gene Families, Gene Family Evolution, Supplementary Information, Rate Variation, Gene Gain, Command Line, Gains And Losses

Abstract

Motivation: Genome sequencing projects have revealed frequent gains and losses of genes between species. Previous versions of our software, Computational Analysis of gene Family Evolution (CAFE), have allowed researchers to estimate parameters of gene gain and loss across a phylogenetic tree. However, the underlying model assumed that all gene families had the same rate of evolution, despite evidence suggesting a large amount of variation in rates among families.

Results: Here, we present CAFE 5, a completely rewritten software package with numerous performance and user-interface enhancements over previous versions. These include improved support for multithreading, explicit modeling of rate variation among families using gamma-distributed rate categories, and command-line arguments that remove the need for accessory scripts.

Availability and implementation: CAFE 5 source code, documentation, test data and a detailed manual with examples are freely available at https://github.com/hahnlab/CAFE5/releases.

Supplementary information: Supplementary data are available at Bioinformatics online.
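Gamma-distributed rate categories can be built in several ways. The sketch below uses the median variant of Yang's discrete-gamma approach as an assumption, not as a description of CAFE 5's internals: take K category medians of a Gamma(α, α) distribution (mean 1), rescale them to average exactly 1, and average a family's likelihood over the categories, each multiplying a base gain/loss rate λ. Only the standard library is used (a series for the regularized incomplete gamma plus bisection, adequate for moderate α); `family_likelihood` is a hypothetical placeholder for a per-family likelihood under a given rate.

```python
import math
from typing import Callable, List


def gamma_cdf(x: float, shape: float, rate: float) -> float:
    """CDF of Gamma(shape, rate) via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (shape + n)
        total += term
        n += 1
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total


def gamma_quantile(p: float, shape: float, rate: float) -> float:
    """Invert gamma_cdf by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape, rate) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> List[float]:
    """Median-based rate multipliers for k equally weighted categories, rescaled to mean 1."""
    medians = [gamma_quantile((2 * i + 1) / (2 * k), alpha, alpha) for i in range(k)]
    mean = sum(medians) / k
    return [m / mean for m in medians]


def mixture_likelihood(
    family_likelihood: Callable[[float], float], base_rate: float, rates: List[float]
) -> float:
    """Average a family's likelihood over equally weighted rate categories."""
    return sum(family_likelihood(base_rate * r) for r in rates) / len(rates)
```

Small α gives strongly skewed multipliers (most families evolve slowly, a few quickly), while large α pushes every multiplier towards 1 and recovers the single-rate model the earlier CAFE versions assumed.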

Knot_pull—python package for biopolymer smoothing and knot detection

Bioinformatics, 2019, doi:10.1093/bioinformatics/btz644

Author(s): Aleksandra I Jarmolinska, Anna Gambin, Joanna I Sulkowska

Keyword(s): Learning Curve, Source Code, Supplementary Information, Command Line, Supplementary Data, Steep Learning Curve, Independent Source, Python Package

Abstract

Summary: The biggest hurdle in studying topology in biopolymers is the steep learning curve for actually seeing the knots in structure visualization. Knot_pull is a command-line utility designed to simplify this process: it presents the user with a smoothing trajectory for the provided structures (any number and length of protein, RNA or chromatin chains in PDB, CIF or XYZ format), and calculates the knot type (including the presence of any links, and slipknots when a subchain is specified).

Availability and implementation: Knot_pull works under Python >=2.7 and is system independent. Source code and documentation are available at http://github.com/dzarmola/knot_pull under the GNU GPL license, and also include a wrapper script for PyMOL for easier visualization. Examples of smoothing trajectories can be found at: https://www.youtube.com/watch?v=IzSGDfc1vAY.

Supplementary information: Supplementary data are available at Bioinformatics online.
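A classic way to smooth a chain without changing its topology is triangle reduction in the spirit of the KMT algorithm: an interior bead is deleted whenever the triangle it forms with its two neighbours is not pierced by any other segment of the chain, and passes repeat until nothing more can be removed. The sketch below applies that simplified rule to a single open chain and records the chain after each pass as a trajectory frame. It is an illustration under stated assumptions, not Knot_pull's algorithm or output format: piercing is tested with the Möller-Trumbore ray-triangle test restricted to segments, degenerate coplanar crossings are ignored, and no knot type is computed.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def segment_hits_triangle(p0: Point, p1: Point, a: Point, b: Point, c: Point, eps: float = 1e-12) -> bool:
    """Moller-Trumbore test restricted to the segment p0-p1 (parameter t in [0, 1])."""
    direction = _sub(p1, p0)
    edge1, edge2 = _sub(b, a), _sub(c, a)
    h = _cross(direction, edge2)
    det = _dot(edge1, h)
    if abs(det) < eps:
        return False  # parallel or coplanar: treated as no crossing (simplification)
    inv = 1.0 / det
    s = _sub(p0, a)
    u = inv * _dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    q = _cross(s, edge1)
    v = inv * _dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return False
    t = inv * _dot(edge2, q)
    return 0.0 <= t <= 1.0


def smooth(chain: List[Point], max_passes: int = 1000) -> List[List[Point]]:
    """Return a trajectory of chain snapshots, one per pass that removed beads (endpoints fixed)."""
    current = list(chain)
    trajectory = [list(current)]
    for _ in range(max_passes):
        removed = False
        i = 1
        while i < len(current) - 1:
            a, b, c = current[i - 1], current[i], current[i + 1]
            blocked = False
            for j in range(len(current) - 1):
                if i - 2 <= j <= i + 1:
                    continue  # segments sharing a vertex with the triangle
                if segment_hits_triangle(current[j], current[j + 1], a, b, c):
                    blocked = True
                    break
            if blocked:
                i += 1
            else:
                del current[i]  # the next bead now sits at index i
                removed = True
        if not removed:
            break
        trajectory.append(list(current))
    return trajectory
```

Each frame of the returned trajectory is a shorter chain, which is what makes such output suitable for animating how a structure collapses onto its essential crossings.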
