Automatic curation of large comparative animal MicroRNA datasets

Abstract. Motivation: MicroRNAs form an important class of RNA regulators that has been studied extensively. The miRBase and Rfam databases provide rich, frequently updated information on both pre-miRNAs and their mature forms. These data sources, however, rely on individual data submission and thus are neither complete nor consistent in their coverage across different miRNA families. Quantitative studies of miRNA evolution are therefore difficult or impossible on this basis. Results: We present here a workflow and a corresponding implementation, MIRfix, that automatically curates miRNA datasets by improving alignments of their precursors, the consistency of the annotation of mature miR and miR* sequences, and the phylogenetic coverage. MIRfix produces alignments that are comparable across families and sets the stage for improved homology search as well as quantitative analyses. Availability and implementation: MIRfix can be downloaded from https://github.com/Bierinformatik/MIRfix. Supplementary information: Supplementary data are available at Bioinformatics online.

ccNetViz: a WebGL-based JavaScript library for visualization of large networks

Bioinformatics ◽

10.1093/bioinformatics/btaa559 ◽

2020 ◽

Vol 36 (16) ◽

pp. 4527-4529

Author(s):

Ales Saska ◽

David Tichy ◽

Robert Moore ◽

Achilles Rasquinha ◽

Caner Akdas ◽

...

Keyword(s):

Systems Biology ◽

Complex Networks ◽

Open Source ◽

High Speed ◽

A Priori ◽

Supplementary Information ◽

Network Visualization ◽

Supplementary Data ◽

Web Based ◽

Flow Of Information

Abstract. Summary: Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation of biologically based networks and their use. ccNetViz is an open-source, high-speed and lightweight JavaScript library for visualization of large and complex networks. It implements customization and analytical features for easy network interpretation. These features include edge and node animations, which illustrate the flow of information through a network, as well as node statistics. Properties can be defined a priori or dynamically imported from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation: The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information: Supplementary data are available at Bioinformatics online.

Epidemiological modeling in StochSS Live!

Bioinformatics ◽

10.1093/bioinformatics/btab061 ◽

2021 ◽

Author(s):

Richard Jiang ◽

Bruno Jacob ◽

Matthew Geiger ◽

Sean Matthew ◽

Bryan Rumsey ◽

...

Keyword(s):

Stochastic Model ◽

Epidemiological Model ◽

Supplementary Information ◽

Supplementary Data ◽

Web Based ◽

Epidemiological Modeling ◽

Modeling Simulation ◽

Wide Range ◽

Biochemical Systems

Abstract. Summary: We present StochSS Live!, a web-based service for modeling, simulation and analysis of a wide range of mathematical, biological and biochemical systems. Using an epidemiological model of COVID-19, we demonstrate the power of StochSS Live! to enable researchers to quickly develop a deterministic or a discrete stochastic model, infer its parameters and analyze the results. Availability and implementation: StochSS Live! is freely available at https://live.stochss.org/. Supplementary information: Supplementary data are available at Bioinformatics online.

KEC: unique sequence search by K-mer exclusion

Bioinformatics ◽

10.1093/bioinformatics/btab196 ◽

2021 ◽

Author(s):

Pavel Beran ◽

Dagmar Stehlíková ◽

Stephen P Cohen ◽

Vladislav Čurn

Keyword(s):

Amino Acid ◽

Nucleic Acid ◽

Source Code ◽

Unique Sequence ◽

Supplementary Information ◽

Supplementary Data ◽

Laptop Computers ◽

Sequence Search ◽

Target Sequences ◽

Cross Reference

Abstract. Summary: Searching for amino acid or nucleic acid sequences unique to one organism may be challenging, depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) allows users to quickly and easily find unique sequences by providing target and non-target sequences. Due to its speed, it can be used for datasets of genomic size and can be run on desktop or laptop computers with modest specifications. Availability and implementation: KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC. Supplementary information: Supplementary data are available at Bioinformatics online.
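
The general idea of finding unique sequences by excluding k-mers shared with non-target sequences can be illustrated with a minimal Python sketch. This is not KEC's implementation; the function names `kmers` and `unique_regions`, the choice of `k`, and the rule for merging adjacent unique k-mers into regions are assumptions made for illustration only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_regions(target, non_targets, k):
    """Return (start, end, subsequence) for each stretch of target that is
    covered by consecutive k-mers absent from every non-target sequence.
    Hypothetical helper, not part of KEC."""
    # Collect every k-mer that occurs in any non-target sequence.
    excluded = set()
    for seq in non_targets:
        excluded |= kmers(seq, k)

    regions = []
    start = None  # start index of the current run of unique k-mers
    end = None    # end index (exclusive) of the current run
    for i in range(len(target) - k + 1):
        if target[i:i + k] not in excluded:
            if start is None:
                start = i
            end = i + k
        elif start is not None:
            # A shared k-mer closes the current unique region.
            regions.append((start, end, target[start:end]))
            start = None
    if start is not None:
        regions.append((start, end, target[start:end]))
    return regions


if __name__ == "__main__":
    print(unique_regions("AAACCCGGGTTT", ["AAACCC", "GGGTTT"], 4))
```

In this toy input, only the k-mers spanning the junction between the two non-target sequences are absent from both of them, so the sketch reports a single region around that junction.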

CorGAT: a tool for the functional annotation of SARS-CoV-2 genomes

Bioinformatics ◽

10.1093/bioinformatics/btaa1047 ◽

2020 ◽

Author(s):

Matteo Chiara ◽

Federico Zambelli ◽

Marco Antonio Tangaro ◽

Pietro Mandreoli ◽

David S Horner ◽

...

Keyword(s):

Functional Annotation ◽

Ad Hoc ◽

State Of The Art ◽

Supplementary Information ◽

Genomic Sequences ◽

Supplementary Data ◽

Evolutionary Patterns ◽

Genomic Variants ◽

Art Methods ◽

Available Resources

Abstract. Summary: While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparison with other state-of-the-art methods we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. Availability and implementation: Galaxy: http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. Supplementary information: Supplementary data are available at Bioinformatics online.

UniBioDicts: Unified access to Biological Dictionaries

Bioinformatics ◽

10.1093/bioinformatics/btaa1065 ◽

2020 ◽

Author(s):

John Zobolas ◽

Vasundra Touré ◽

Martin Kuiper ◽

Steven Vercruysse

Keyword(s):

User Interface ◽

Life Science ◽

Biological Data ◽

Supplementary Information ◽

Supplementary Data ◽

Query Interface ◽

Controlled Vocabularies ◽

Search String ◽

Software Packages ◽

The Right

Abstract. Summary: We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching term, identifier and metadata units from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers), and helping data users search and retrieve the right query terms. Availability and implementation: The UBDs are available through npm and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license. Supplementary information: Supplementary data are available at Bioinformatics online.

CONCUR: quick and robust calculation of codon usage from ribosome profiling data

Bioinformatics ◽

10.1093/bioinformatics/btaa733 ◽

2020 ◽

Author(s):

Michaela Frye ◽

Susanne Bornelöv

Keyword(s):

Codon Usage ◽

Ribosome Profiling ◽

Supplementary Information ◽

Supplementary Data ◽

Usage Analysis

Abstract. Summary: CONCUR is a standalone tool for codon usage analysis in ribosome profiling experiments. CONCUR uses the aligned reads in BAM format to estimate codon counts at the ribosome E-, P- and A-sites and at flanking positions. Availability and implementation: CONCUR is written in Perl and is freely available at https://github.com/susbo/concur. Supplementary information: Supplementary data are available at Bioinformatics online.
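
How codon counts at the E-, P- and A-sites can be derived from read positions is sketched below in Python. This is not CONCUR's code (CONCUR is written in Perl and reads BAM files); the sketch assumes reads are already reduced to the 0-based position of their 5' end within a coding sequence, and it assumes a fixed offset of 12 nt from the 5' end to the first base of the P-site codon. Both the offset and the names `site_codons` and `count_codons` are illustrative assumptions, and off-frame reads are simply skipped here.

```python
from collections import Counter

# Assumed distance (nt) from a read's 5' end to the start of the P-site codon.
P_SITE_OFFSET = 12


def site_codons(cds, read_start, offset=P_SITE_OFFSET):
    """Return the E-, P- and A-site codons for a read whose 5' end maps to
    index read_start in cds, or None if the P-site is off-frame or any of
    the three codons falls outside the sequence. Hypothetical helper."""
    p = read_start + offset
    if p % 3 != 0 or p - 3 < 0 or p + 6 > len(cds):
        return None
    return {"E": cds[p - 3:p], "P": cds[p:p + 3], "A": cds[p + 3:p + 6]}


def count_codons(cds, read_starts, offset=P_SITE_OFFSET):
    """Tally codon occurrences at each ribosomal site over all reads."""
    counts = {"E": Counter(), "P": Counter(), "A": Counter()}
    for start in read_starts:
        sites = site_codons(cds, start, offset)
        if sites is None:
            continue  # skip off-frame or out-of-range reads
        for site, codon in sites.items():
            counts[site][codon] += 1
    return counts


if __name__ == "__main__":
    cds = "ATGGCTAAAGGTCCCGATTTTTGA"
    print(count_codons(cds, [0, 1, 3]))
```

A real analysis would typically choose the offset per read length and handle reads from many transcripts; the fixed offset above only serves to show how the three sites relate to one another.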

MorphOT: transport-based interpolation between EM maps with UCSF ChimeraX

Bioinformatics ◽

10.1093/bioinformatics/btaa1019 ◽

2020 ◽

Author(s):

Arthur Ecoffet ◽

Frédéric Poitevin ◽

Khanh Dao Duc

Keyword(s):

Optimal Transport ◽

Three Dimensional ◽

Linear Interpolation ◽

The Other ◽

Supplementary Information ◽

Conformational Heterogeneity ◽

Supplementary Data ◽

Image Dataset ◽

Standard Linear ◽

Unique Potential

Abstract. Motivation: Cryogenic electron microscopy (cryo-EM) offers the unique potential to capture conformational heterogeneity, by solving multiple three-dimensional classes that co-exist within a single cryo-EM image dataset. To investigate the extent and implications of such heterogeneity, we propose to use an optimal-transport-based metric to interpolate barycenters between EM maps and produce morphing trajectories. Results: While standard linear interpolation mostly fails to produce realistic transitions, our method yields continuous trajectories that displace densities to morph one map into the other, instead of blending them. Availability and implementation: Our method is implemented as a plug-in for ChimeraX called MorphOT, which allows the use of either CPU or GPU resources. The code is publicly available on GitHub (https://github.com/kdd-ubc/MorphOT.git), with documentation containing a tutorial and datasets. Supplementary information: Supplementary data are available at Bioinformatics online.
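
The contrast between blending and displacing densities can be seen in a one-dimensional toy case, sketched below in Python. This is not MorphOT's algorithm, which operates on three-dimensional EM maps; the sketch assumes densities represented as equal numbers of unit-mass points on a line, for which matching sorted positions gives the optimal transport plan under squared-distance cost. The function names are hypothetical.

```python
def linear_blend(a, b, t):
    """Grid-wise linear interpolation of two density profiles a and b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def displacement_interpolation(xs, ys, t):
    """1D optimal-transport interpolation between two equal-size sets of
    unit-mass point positions: sort both sets, pair them in order, and
    move each point a fraction t of the way to its partner."""
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


if __name__ == "__main__":
    # A single peak at grid index 0 versus a single peak at grid index 4.
    a = [1.0, 0.0, 0.0, 0.0, 0.0]
    b = [0.0, 0.0, 0.0, 0.0, 1.0]
    # Blending keeps two half-height peaks at the end points.
    print(linear_blend(a, b, 0.5))
    # Transport moves the single peak to the midpoint.
    print(displacement_interpolation([0.0], [4.0], 0.5))
```

At the halfway point the blend still shows mass only at the two original locations, whereas the transport-based interpolation places the mass in between, which is the behaviour the abstract describes as displacing rather than blending.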

BioCommons: a robust java library for RNA structural bioinformatics

Bioinformatics ◽

10.1093/bioinformatics/btab069 ◽

2021 ◽

Author(s):

Tomasz Zok

Keyword(s):

Source Code ◽

Structural Bioinformatics ◽

Supplementary Information ◽

Supplementary Data ◽

Bioinformatic Tools ◽

Data Formats ◽

Central Repository ◽

Diverse Data ◽

2D And 3D ◽

Java Library

Abstract. Motivation: Biomolecular structures come in multiple representations and diverse data formats. Their incompatibility with the requirements of data analysis programs significantly hinders the analytics and the creation of new structure-oriented bioinformatic tools. Therefore, the need for robust libraries of data processing functions is still growing. Results: BioCommons is an open-source Java library for structural bioinformatics. It contains many functions working with the 2D and 3D structures of biomolecules, with a particular emphasis on RNA. Availability and implementation: The library is available in Maven Central Repository and its source code is hosted on GitHub: https://github.com/tzok/BioCommons. Supplementary information: Supplementary data are available at Bioinformatics online.

EARRINGS: an efficient and accurate adapter trimmer entails no a priori adapter sequences

Bioinformatics ◽

10.1093/bioinformatics/btab025 ◽

2021 ◽

Author(s):

Ting-Hsuan Wang ◽

Cheng-Ching Huang ◽

Jui-Hung Hung

Keyword(s):

Open Source Software ◽

Large Scale ◽

A Priori ◽

Supplementary Information ◽

Supplementary Data ◽

Comparable Accuracy ◽

Meta Analyses ◽

Next Generation Sequencing Ngs ◽

Adapter Trimming ◽

Generation Sequencing

Abstract. Motivation: Cross-sample comparisons or large-scale meta-analyses based on next-generation sequencing (NGS) involve replicable and universal data preprocessing, including removing adapter fragments in contaminated reads (i.e. adapter trimming). Modern adapter trimmers require users to provide candidate adapter sequences for each sample, but these are sometimes unavailable or falsely documented in the repositories (such as GEO or SRA); large-scale meta-analyses are therefore jeopardized by suboptimal adapter trimming. Results: Here we introduce a set of fast and accurate adapter detection and trimming algorithms that entail no a priori adapter sequences. These algorithms were implemented in modern C++ with SIMD and multithreading to improve speed. Our experiments and benchmarks show that the implementation (i.e. EARRINGS), without being given any hint of adapter sequences, can reach accuracy comparable to, and throughput higher than, that of existing adapter trimmers. EARRINGS is particularly useful in meta-analyses of a large batch of datasets and can be incorporated into sequence analysis pipelines at any scale. Availability and implementation: EARRINGS is open-source software and is available at https://github.com/jhhung/EARRINGS. Supplementary information: Supplementary data are available at Bioinformatics online.

PINSPlus: a tool for tumor subtype discovery in integrated genomic data

Bioinformatics ◽

10.1093/bioinformatics/bty1049 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2843-2846 ◽

Cited By ~ 15

Author(s):

Hung Nguyen ◽

Sangam Shrestha ◽

Sorin Draghici ◽

Tin Nguyen

Keyword(s):

Personal Computer ◽

Genomic Data ◽

Supplementary Information ◽

Omics Data ◽

Tumor Subtype ◽

Supplementary Data ◽

Significant Survival ◽

Survival Differences

Abstract. Summary: Since cancer is a heterogeneous disease, tumor subtyping is crucial for improved treatment and prognosis. We have developed a subtype discovery tool, called PINSPlus, that is: (i) robust against noise and unstable quantitative assays, (ii) able to integrate multiple types of omics data in a single analysis and (iii) dramatically superior to established approaches in identifying known subtypes and novel subgroups with significant survival differences. Our validation on 12,158 samples from 44 datasets shows that PINSPlus vastly outperforms other approaches. The software is easy to use and can partition hundreds of patients in a few minutes on a personal computer. Availability and implementation: The package is available at https://cran.r-project.org/package=PINSPlus. Data and R script used in this manuscript are available at https://bioinformatics.cse.unr.edu/software/PINSPlus/. Supplementary information: Supplementary data are available at Bioinformatics online.
