Automatic curation of large comparative animal MicroRNA datasets

2019, Vol 35 (22), pp. 4553-4559
Author(s): Ali M Yazbeck, Peter F Stadler, Kifah Tout, Jörg Fallmann

Abstract Motivation MicroRNAs form an important class of RNA regulators that has been studied extensively. The miRBase and Rfam databases provide rich, frequently updated information on both pre-miRNAs and their mature forms. These data sources, however, rely on individual data submissions and thus are neither complete nor consistent in their coverage across different miRNA families. Quantitative studies of miRNA evolution are therefore difficult or impossible on this basis. Results We present a workflow and a corresponding implementation, MIRfix, that automatically curates miRNA datasets by improving the alignments of their precursors, the consistency of the annotation of mature miR and miR* sequences, and the phylogenetic coverage. MIRfix produces alignments that are comparable across families and sets the stage for improved homology search as well as quantitative analyses. Availability and implementation MIRfix can be downloaded from https://github.com/Bierinformatik/MIRfix. Supplementary information Supplementary data are available at Bioinformatics online.

2020, Vol 36 (16), pp. 4527-4529
Author(s): Ales Saska, David Tichy, Robert Moore, Achilles Rasquinha, Caner Akdas, ...

Abstract Summary Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation and use of biologically based networks. ccNetViz is an open-source, high-speed, lightweight JavaScript library for the visualization of large and complex networks. It implements customization and analytical features that ease network interpretation, including edge and node animations, which illustrate the flow of information through a network, and node statistics. Properties can be defined a priori or imported dynamically from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s): Richard Jiang, Bruno Jacob, Matthew Geiger, Sean Matthew, Bryan Rumsey, ...

Abstract Summary We present StochSS Live!, a web-based service for modeling, simulation and analysis of a wide range of mathematical, biological and biochemical systems. Using an epidemiological model of COVID-19, we demonstrate the power of StochSS Live! to enable researchers to quickly develop a deterministic or a discrete stochastic model, infer its parameters and analyze the results. Availability and implementation StochSS Live! is freely available at https://live.stochss.org/. Supplementary information Supplementary data are available at Bioinformatics online.
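To make "discrete stochastic model" concrete, here is a minimal sketch of Gillespie's direct method applied to a simple SIR epidemic. It is not StochSS code or its API; the function name `gillespie_sir`, the frequency-dependent infection term and the parameter values `beta` and `gamma` are illustrative assumptions. The deterministic alternative would integrate the corresponding ODEs instead of sampling individual reaction events.

```python
import random


def gillespie_sir(s0, i0, r0, beta, gamma, t_max, seed=0):
    """Simulate one trajectory of a stochastic SIR model with Gillespie's direct method.

    Hypothetical reactions (mass action, frequency-dependent infection):
      S + I -> 2I  with propensity beta * S * I / N
      I -> R       with propensity gamma * I
    Returns a list of (time, S, I, R) tuples, one per reaction event.
    """
    rng = random.Random(seed)
    s, i, r = s0, i0, r0
    n = s0 + i0 + r0
    t = 0.0
    trajectory = [(t, s, i, r)]
    while i > 0:
        a_inf = beta * s * i / n
        a_rec = gamma * i
        a_total = a_inf + a_rec
        if a_total == 0:
            break  # no reaction can fire any more
        # Time to the next event is exponentially distributed with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_max:
            break
        # Pick which reaction fires with probability proportional to its propensity.
        if rng.random() * a_total < a_inf:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        trajectory.append((t, s, i, r))
    return trajectory


traj = gillespie_sir(990, 10, 0, beta=0.3, gamma=0.1, t_max=160.0, seed=1)
print(f"{len(traj) - 1} events; final state S, I, R = {traj[-1][1:]}")
```

Running the function with different seeds gives different epidemic trajectories, which is exactly the variability a deterministic ODE model averages away.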


Author(s): Pavel Beran, Dagmar Stehlíková, Stephen P Cohen, Vladislav Čurn

Abstract Summary Searching for amino acid or nucleic acid sequences unique to one organism may be challenging, depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) lets users quickly and easily find unique sequences by supplying target and non-target sequences. Because it is fast, it can handle genome-sized datasets and can be run on desktop or laptop computers with modest specifications. Availability and implementation KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC. Supplementary information Supplementary data are available at Bioinformatics online.
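The elimination idea can be illustrated with a minimal sketch, assuming exact matching on one strand only: collect every k-mer of the non-target sequences, drop target k-mers found among them, and merge the surviving overlapping k-mers back into maximal unique stretches. This is a simplified illustration, not KEC's actual implementation; `kmers` and `unique_regions` are hypothetical names, and reverse complements are deliberately ignored.

```python
def kmers(seq, k):
    """All distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_regions(target, non_targets, k):
    """Return maximal stretches of target covered only by k-mers absent from all non-targets."""
    background = set()
    for seq in non_targets:
        background |= kmers(seq, k)
    # keep[i] is True when the k-mer starting at position i survives elimination.
    keep = [target[i:i + k] not in background for i in range(len(target) - k + 1)]
    regions, i = [], 0
    while i < len(keep):
        if not keep[i]:
            i += 1
            continue
        j = i
        while j + 1 < len(keep) and keep[j + 1]:
            j += 1
        # Consecutive surviving k-mers i..j together cover target[i : j + k].
        regions.append(target[i:j + k])
        i = j + 1
    return regions


print(unique_regions("ACGTACGTTTGCA", ["ACGTACG"], k=4))  # ['CGTTTGCA']
```

Using a set for the background makes each membership test constant time on average, which is what keeps this kind of cross-referencing cheap even for large inputs.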


Author(s): Matteo Chiara, Federico Zambelli, Marco Antonio Tangaro, Pietro Mandreoli, David S Horner, ...

Abstract Summary While over 200 000 genomic sequences are currently available through dedicated repositories, existing ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not exploit all currently available resources for annotating functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparison with other state-of-the-art methods, we demonstrate that our method, by providing a more comprehensive and richer annotation, can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. Availability and implementation Galaxy: http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. Supplementary information Supplementary data are available at Bioinformatics online.
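A basic step in annotating any SARS-CoV-2 variant is mapping its genomic position to a gene and codon. The sketch below does only that, for the spike (S) gene at positions 21563 to 25384 of the Wuhan-Hu-1 reference (NC_045512.2). It is an illustration under that assumption, not CorGAT's code; `GENES` and `locate` are hypothetical names, and ORF1ab is left out because its ribosomal frameshift needs special handling.

```python
# Gene coordinates (1-based, inclusive) on the SARS-CoV-2 reference NC_045512.2 (Wuhan-Hu-1).
# Only the spike (S) gene is listed; ORF1ab is omitted because its ribosomal
# frameshift would need special handling.
GENES = {"S": (21563, 25384)}


def locate(pos):
    """Map a 1-based genomic position to (gene, codon number, position within codon)."""
    for gene, (start, end) in GENES.items():
        if start <= pos <= end:
            offset = pos - start
            return gene, offset // 3 + 1, offset % 3 + 1
    return None  # outside every gene listed above


print(locate(23403))  # ('S', 614, 2)
```

Position 23403 falls on the second base of spike codon 614, the site of the well-known A23403G (D614G) substitution; a full annotator would then translate the reference and alternative codons and attach functional information for that site.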


Author(s): John Zobolas, Vasundra Touré, Martin Kuiper, Steven Vercruysse

Abstract Summary We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query-interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching term, identifier and metadata units from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers), and helping data users search and retrieve the right query terms. Availability and implementation The UBDs are available through npm and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license. Supplementary information Supplementary data are available at Bioinformatics online.
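The "single query interface" idea can be sketched as a common search contract that every dictionary implements, plus a function that fans a query out to all of them and merges the results. This is a hypothetical Python illustration of the design, not the UBDs' JavaScript API: `Match`, `Dictionary`, `InMemoryDictionary` and `unified_search` are invented names, and the in-memory entries with identifiers such as `A:1` stand in for remote API calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Match:
    id: str       # unambiguous identifier within its source
    term: str     # human-readable label
    source: str   # which dictionary returned the match


class Dictionary(Protocol):
    """Anything with a search(text) method returning Match objects."""

    def search(self, text: str) -> list[Match]: ...


class InMemoryDictionary:
    """Hypothetical stand-in for a dictionary that would normally call a remote API."""

    def __init__(self, source: str, entries: dict[str, str]):
        self.source = source
        self.entries = entries  # identifier -> label

    def search(self, text: str) -> list[Match]:
        q = text.lower()
        return [Match(i, t, self.source) for i, t in self.entries.items() if q in t.lower()]


def unified_search(dictionaries: list[Dictionary], text: str) -> list[Match]:
    """Query every dictionary and merge results: prefix matches first, then alphabetical."""
    q = text.lower()
    results = [m for d in dictionaries for m in d.search(text)]
    results.sort(key=lambda m: (not m.term.lower().startswith(q), m.term.lower()))
    return results


a = InMemoryDictionary("demoA", {"A:1": "apoptotic process", "A:2": "cell cycle"})
b = InMemoryDictionary("demoB", {"B:1": "cell fractionation", "B:2": "two hybrid"})
print([m.id for m in unified_search([a, b], "cell")])  # ['A:2', 'B:1']
```

Because every result carries its source and identifier, an autocomplete field can show one merged list while still letting the curator pick an unambiguous term.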


Author(s): Michaela Frye, Susanne Bornelöv

Abstract Summary CONCUR is a standalone tool for codon usage analysis in ribosome profiling experiments. CONCUR uses the aligned reads in BAM format to estimate codon counts at the ribosome E-, P- and A-sites and at flanking positions. Availability and implementation CONCUR is written in Perl and is freely available at https://github.com/susbo/concur. Supplementary information Supplementary data are available at Bioinformatics online.
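The core bookkeeping behind site-specific codon counts can be sketched as follows: place the P-site a fixed offset downstream of each read's 5' end, take the codons immediately upstream (E-site) and downstream (A-site), and tally them. This is an assumption-laden illustration, not CONCUR's Perl implementation. The single 12-nt P-site offset is a hypothetical default (real analyses usually calibrate offsets per read length), BAM parsing is omitted, read starts are given directly as 0-based transcript coordinates, and out-of-frame reads are simply discarded.

```python
from collections import Counter


def site_codons(read_start, cds_start, transcript, p_offset=12):
    """Return the (E, P, A) codons under one read, or None if it cannot be assigned.

    read_start: 0-based transcript coordinate of the read's 5' end.
    p_offset:   assumed distance from the 5' end to the first base of the P-site codon.
    """
    p = read_start + p_offset
    if (p - cds_start) % 3 != 0:
        return None  # P-site does not fall on a codon boundary of this CDS
    e, a = p - 3, p + 3
    if e < cds_start or a + 3 > len(transcript):
        return None  # E- or A-site would fall before the CDS start or past the transcript end
    return tuple(transcript[x:x + 3] for x in (e, p, a))


def count_site_codons(read_starts, cds_start, transcript, p_offset=12):
    """Tally codon occurrences at the E-, P- and A-sites over many reads."""
    counts = {site: Counter() for site in "EPA"}
    for start in read_starts:
        codons = site_codons(start, cds_start, transcript, p_offset)
        if codons is None:
            continue
        for site, codon in zip("EPA", codons):
            counts[site][codon] += 1
    return counts


tx = "GGG" + "ATGAAACCCTTTGGGTAA"  # 3-nt 5' UTR, then a toy CDS starting at index 3
print(count_site_codons([0, 0, 1], cds_start=3, transcript=tx)["P"])  # Counter({'TTT': 2})
```

Flanking positions would follow the same pattern, using further multiples of 3 upstream of the E-site or downstream of the A-site.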


Author(s): Arthur Ecoffet, Frédéric Poitevin, Khanh Dao Duc

Abstract Motivation Cryogenic electron microscopy (cryo-EM) offers the unique potential to capture conformational heterogeneity by solving multiple three-dimensional classes that co-exist within a single cryo-EM image dataset. To investigate the extent and implications of such heterogeneity, we propose to use an optimal-transport-based metric to interpolate barycenters between EM maps and produce morphing trajectories. Results While standard linear interpolation mostly fails to produce realistic transitions, our method yields continuous trajectories that displace densities to morph one map into the other, instead of blending them. Availability and implementation Our method is implemented as a plug-in for ChimeraX called MorphOT, which can use either CPU or GPU resources. The code is publicly available on GitHub (https://github.com/kdd-ubc/MorphOT.git), with documentation that includes a tutorial and datasets. Supplementary information Supplementary data are available at Bioinformatics online.
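The difference between displacing and blending densities is easiest to see in one dimension, where the optimal (W2) transport plan between two equal-size sample sets simply pairs sorted samples. The sketch below is for intuition only and assumes 1D empirical distributions; MorphOT works on 3D EM maps, which require a different and more expensive transport solver. `ot_interpolate` and `linear_blend` are hypothetical names.

```python
import numpy as np


def ot_interpolate(x, y, t):
    """Displacement (optimal-transport) interpolation between two 1D sample sets of equal size.

    In 1D the W2-optimal transport plan pairs the i-th smallest sample of x with the
    i-th smallest sample of y, so every unit of mass moves along a straight line.
    """
    return (1 - t) * np.sort(x) + t * np.sort(y)


def linear_blend(x, y, t, rng):
    """Linear interpolation of densities, i.e. the mixture (1 - t) * mu + t * nu:
    each output sample is taken unchanged from x or from y."""
    take_y = rng.random(len(x)) < t
    return np.where(take_y, y, x)


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 1000)   # a "density" centred at 0
b = rng.normal(10.0, 1.0, 1000)  # the same shape centred at 10
for name, mid in [("OT", ot_interpolate(a, b, 0.5)), ("linear", linear_blend(a, b, 0.5, rng))]:
    near_middle = np.mean(np.abs(mid - 5.0) < 2.0)
    print(f"{name}: fraction of mass within 2 of the midpoint = {near_middle:.2f}")
```

At t = 0.5 the transport midpoint is a single bump near 5, the density having moved, while the linear blend is two half-weight bumps at 0 and 10 with almost no mass in between.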


Author(s): Tomasz Zok

Abstract Motivation Biomolecular structures come in multiple representations and diverse data formats. Their incompatibility with the requirements of data analysis programs significantly hinders analysis and the creation of new structure-oriented bioinformatics tools. Therefore, the need for robust libraries of data processing functions continues to grow. Results BioCommons is an open-source Java library for structural bioinformatics. It contains many functions for working with the 2D and 3D structures of biomolecules, with a particular emphasis on RNA. Availability and implementation The library is available in the Maven Central Repository and its source code is hosted on GitHub: https://github.com/tzok/BioCommons. Supplementary information Supplementary data are available at Bioinformatics online.
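A typical 2D-structure routine in such a library converts dot-bracket notation into a list of base pairs. The following is a minimal Python sketch of that conversion, not BioCommons' Java API; `parse_dot_bracket` is a hypothetical name. It uses one stack per bracket type so that simple pseudoknots written with `[]`, `{}` or `<>` are handled.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) base pairs from dot-bracket notation.

    Each bracket type has its own stack, so pseudoknots encoded with
    (), [], {} and <> are paired independently.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(j)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.append((stack.pop(), j))
        elif ch not in ".-":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


print(parse_dot_bracket("((..[[..))..]]"))  # [(0, 9), (1, 8), (4, 13), (5, 12)]
```

Raising on unbalanced input, rather than silently dropping brackets, is the kind of robustness the abstract argues data-processing libraries should provide.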


Author(s): Ting-Hsuan Wang, Cheng-Ching Huang, Jui-Hung Hung

Abstract Motivation Cross-sample comparisons or large-scale meta-analyses based on next-generation sequencing (NGS) involve replicable and universal data preprocessing, including the removal of adapter fragments from contaminated reads (i.e. adapter trimming). Modern adapter trimmers require users to provide candidate adapter sequences for each sample, but these are sometimes unavailable or incorrectly documented in repositories such as GEO or SRA, so large-scale meta-analyses are jeopardized by suboptimal adapter trimming. Results Here we introduce a set of fast and accurate adapter detection and trimming algorithms that require no a priori adapter sequences. These algorithms were implemented in modern C++ with SIMD and multithreading for speed. Our experiments and benchmarks show that the implementation (i.e. EARRINGS), without being given any hint of the adapter sequences, reaches accuracy comparable to that of existing adapter trimmers with higher throughput. EARRINGS is particularly useful in meta-analyses of large batches of datasets and can be incorporated into sequence analysis pipelines at any scale. Availability and implementation EARRINGS is open-source software and is available at https://github.com/jhhung/EARRINGS. Supplementary information Supplementary data are available at Bioinformatics online.
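To show why adapters can be found without being supplied, here is a naive k-mer frequency heuristic: reads with short inserts run into the adapter, so the adapter's first k-mer recurs across many reads and tends to be the most frequent k-mer; the seed is then extended base by base. This is a hypothetical sketch, not the EARRINGS algorithm; `detect_adapter`, `trim` and the defaults `k=8`, `max_len=20` and `min_overlap=3` are assumptions, and the heuristic can be misled by low-complexity sequence or overextend into insert sequence when the true adapter is shorter than `max_len`.

```python
from collections import Counter


def detect_adapter(reads, k=8, max_len=20):
    """Guess an adapter sequence from the reads alone (naive heuristic).

    Reads with short inserts run into the adapter, and the adapter's first k-mer
    occurs in every such read, so it tends to be the most frequent k-mer. The seed
    is then extended one base at a time with the most common following base.
    """
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    if not counts:
        return ""
    adapter = counts.most_common(1)[0][0]
    while len(adapter) < max_len:
        next_base = Counter()
        for r in reads:
            pos = r.find(adapter)
            if pos != -1 and pos + len(adapter) < len(r):
                next_base[r[pos + len(adapter)]] += 1
        if not next_base:
            break
        adapter += next_base.most_common(1)[0][0]
    return adapter


def trim(read, adapter, min_overlap=3):
    """Cut the read at a full adapter match, else at a 3' partial match of >= min_overlap bases."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Try the longest possible partial adapter at the 3' end first.
    for n in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:n]):
            return read[:-n]
    return read


print(trim("ACGTACGTAGATCGG", "AGATCGGAAGAGC"))  # ACGTACGT
```

Checking the longest partial overlap first ensures that a read ending in a long adapter fragment is not trimmed by a shorter, coincidental match.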


2018, Vol 35 (16), pp. 2843-2846
Author(s): Hung Nguyen, Sangam Shrestha, Sorin Draghici, Tin Nguyen

Abstract Summary Since cancer is a heterogeneous disease, tumor subtyping is crucial for improved treatment and prognosis. We have developed a subtype discovery tool, called PINSPlus, that is: (i) robust against noise and unstable quantitative assays, (ii) able to integrate multiple types of omics data in a single analysis and (iii) dramatically superior to established approaches in identifying known subtypes and novel subgroups with significant survival differences. Our validation on 12,158 samples from 44 datasets shows that PINSPlus vastly outperforms other approaches. The software is easy to use and can partition hundreds of patients in a few minutes on a personal computer. Availability and implementation The package is available at https://cran.r-project.org/package=PINSPlus. Data and R script used in this manuscript are available at https://bioinformatics.cse.unr.edu/software/PINSPlus/. Supplementary information Supplementary data are available at Bioinformatics online.

