PHIST: fast and accurate prediction of prokaryotic hosts from metagenomic viral sequences

Abstract Summary PHIST (Phage-Host Interaction Search Tool) predicts prokaryotic hosts of viruses from their genomic sequences. It improves host prediction accuracy at species level over current alignment-based tools (on average by 3 percentage points) as well as alignment-free and CRISPR-based tools (by 14–20 percentage points). PHIST is also two orders of magnitude faster than alignment-based tools, making it suitable for metagenomics studies. Availability and implementation GNU-licensed C++ code wrapped in Python API available at: https://github.com/refresh-bio/[email protected], [email protected] Supplementary information Supplementary data are available at publisher Web site.

CorGAT: a tool for the functional annotation of SARS-CoV-2 genomes

Bioinformatics ◽

10.1093/bioinformatics/btaa1047 ◽

2020 ◽

Author(s):

Matteo Chiara ◽

Federico Zambelli ◽

Marco Antonio Tangaro ◽

Pietro Mandreoli ◽

David S Horner ◽

...

Keyword(s):

Functional Annotation ◽

Ad Hoc ◽

State Of The Art ◽

Supplementary Information ◽

Genomic Sequences ◽

Supplementary Data ◽

Evolutionary Patterns ◽

Genomic Variants ◽

Art Methods ◽

Available Resources

Abstract Summary While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparison with other state-of-the-art methods, we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. Availability and implementation Galaxy http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. Supplementary information Supplementary data are available at Bioinformatics online.

PhyloFold: Precise and Swift Prediction of RNA Secondary Structures to Incorporate Phylogeny among Homologs

10.1101/2020.03.05.975797 ◽

2020 ◽

Author(s):

Masaki Tagashira

Keyword(s):

Secondary Structure ◽

Rna Secondary Structure ◽

Prediction Accuracy ◽

Structural Alignment ◽

Source Code ◽

Secondary Structures ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Structural Alignments

Abstract Motivation The simultaneous consideration of sequence alignment and RNA secondary structure, or structural alignment, is known to help predict more accurate secondary structures of homologs. However, this consideration is computationally heavy and can be done only roughly to decompose structural alignments. Results The PhyloFold method, which predicts secondary structures of homologs considering likely pairwise structural alignments, was developed in this study. The method shows the best prediction accuracy while demanding a running time comparable to that of conventional methods. Availability The source code of the programs implemented in this study is available at https://github.com/heartsh/phylofold and https://github.com/heartsh/phyloalifold. Contact [email protected]. Supplementary information Supplementary data are available.

Taxonomy-aware, sequence similarity ranking reliably predicts phage-host relationships

10.1101/2021.01.05.425417 ◽

2021 ◽

Author(s):

Andrzej Zielezinski ◽

Jakub Barylski ◽

Wojciech M. Karlowski

Keyword(s):

Prediction Accuracy ◽

Phage Genome ◽

Sequence Similarity ◽

Area Under The Curve ◽

Genome Sequences ◽

Host Interactions ◽

Host Interaction ◽

Percentage Points ◽

Host Relationships ◽

Blast Alignment

Abstract Motivation Similar regions in virus and host genomes provide strong evidence for phage-host interaction, and BLAST is one of the leading tools to predict hosts from phage sequences. However, BLAST-based host prediction has three limitations: (i) top-scoring prokaryotic sequences do not always point to the actual host, (ii) mosaic phage genomes may produce matches to many, typically related, bacteria, and (iii) phage and host sequences may diverge beyond the point where their relationship can be detected by a BLAST alignment. Results We created an extension to BLAST, named Phirbo, that improves host prediction quality beyond what is obtainable from standard BLAST searches. The tool harnesses information concerning sequence similarity and bacteria relatedness to predict phage-host interactions. Phirbo was evaluated on two benchmark sets of known phage-host pairs, and it improved precision and recall by 25 percentage points, as well as the discriminatory power for the recognition of phage-host relationships by 10 percentage points (Area Under the Curve = 0.95). Phirbo also yielded a mean host prediction accuracy of 60% and 70% at the genus and family levels, respectively, representing a 5% improvement over BLAST. When using only a fraction of phage genome sequences (3 kb), the prediction accuracy of Phirbo was 5-11% higher than BLAST at all taxonomic levels. Conclusion Our results suggest that Phirbo is an effective, unsupervised tool for predicting phage-host relationships. Availability Phirbo is available at https://github.com/aziele/phirbo.

Shark: fishing relevant reads in an RNA-Seq sample

Bioinformatics ◽

10.1093/bioinformatics/btaa779 ◽

2020 ◽

Author(s):

Luca Denti ◽

Yuri Pirola ◽

Marco Previtali ◽

Tamara Ceccato ◽

Gianluca Della Vedova ◽

...

Keyword(s):

Alternative Splicing ◽

High Throughput ◽

Supplementary Information ◽

Supplementary Data ◽

Rna Seq ◽

Differential Splicing ◽

Massive Datasets ◽

Gene Assignment ◽

Alignment Free ◽

Input Dataset

Abstract Motivation Recent advances in high-throughput RNA-Seq technologies make it possible to produce massive datasets. When a study focuses only on a handful of genes, most reads are not relevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset leads to improved efficiency without compromising the results of the study. Results We introduce a novel computational problem, called gene assignment, and we propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists of extracting from the sample the reads that were most probably sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark is able to significantly improve the performance of RNA-Seq analysis tools without having any impact on the final results. Availability and implementation The tool is distributed as a stand-alone module and the software is freely available at https://github.com/AlgoLab/shark. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
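
The gene assignment problem described in this abstract can be illustrated with a small alignment-free sketch. The code below is not Shark's published algorithm: it simply indexes the k-mers of each gene and keeps a read when a large enough fraction of its k-mers occurs in a gene. The function names, the value of k, and the `min_shared` threshold are all hypothetical choices made for illustration.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, genes, k=5, min_shared=0.6):
    """Map each read id to the genes sharing enough of its k-mers.

    reads and genes are dicts of name -> sequence. k and min_shared
    are illustrative defaults, not values taken from the paper.
    """
    # Index: k-mer -> names of the genes that contain it.
    index = {}
    for name, seq in genes.items():
        for kmer in kmer_set(seq, k):
            index.setdefault(kmer, set()).add(name)

    assignment = {}
    for read_id, read in reads.items():
        read_kmers = kmer_set(read, k)
        if not read_kmers:
            continue  # read shorter than k
        # Count how many of the read's k-mers occur in each gene.
        hits = {}
        for kmer in read_kmers:
            for name in index.get(kmer, ()):
                hits[name] = hits.get(name, 0) + 1
        matched = sorted(
            g for g, c in hits.items() if c / len(read_kmers) >= min_shared
        )
        if matched:
            assignment[read_id] = matched
    return assignment


genes = {"g1": "ACGTACGTGGCCTTAA", "g2": "TTTTGGGGCCCCAAAA"}
reads = {"r1": "ACGTACGTGG", "r2": "TTGGGGCCCC", "r3": "AGAGAGAGAG"}
result = assign_reads(reads, genes)
```

In this toy input, `r3` shares no k-mer with either gene and is therefore dropped, which is the filtering effect the abstract attributes to removing irrelevant reads.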

ccNetViz: a WebGL-based JavaScript library for visualization of large networks

Bioinformatics ◽

10.1093/bioinformatics/btaa559 ◽

2020 ◽

Vol 36 (16) ◽

pp. 4527-4529

Author(s):

Ales Saska ◽

David Tichy ◽

Robert Moore ◽

Achilles Rasquinha ◽

Caner Akdas ◽

...

Keyword(s):

Systems Biology ◽

Complex Networks ◽

Open Source ◽

High Speed ◽

A Priori ◽

Supplementary Information ◽

Network Visualization ◽

Supplementary Data ◽

Web Based ◽

Flow Of Information

Abstract Summary Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation of biologically based networks and their use. ccNetViz is an open-source, high-speed and lightweight JavaScript library for visualization of large and complex networks. It implements customization and analytical features for easy network interpretation. These features include edge and node animations, which illustrate the flow of information through a network, as well as node statistics. Properties can be defined a priori or dynamically imported from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information Supplementary data are available at Bioinformatics online.

Epidemiological modeling in StochSS Live!

Bioinformatics ◽

10.1093/bioinformatics/btab061 ◽

2021 ◽

Author(s):

Richard Jiang ◽

Bruno Jacob ◽

Matthew Geiger ◽

Sean Matthew ◽

Bryan Rumsey ◽

...

Keyword(s):

Stochastic Model ◽

Epidemiological Model ◽

Supplementary Information ◽

Supplementary Data ◽

Web Based ◽

Epidemiological Modeling ◽

Modeling Simulation ◽

Wide Range ◽

Biochemical Systems

Abstract Summary We present StochSS Live!, a web-based service for modeling, simulation and analysis of a wide range of mathematical, biological and biochemical systems. Using an epidemiological model of COVID-19, we demonstrate the power of StochSS Live! to enable researchers to quickly develop a deterministic or a discrete stochastic model, infer its parameters and analyze the results. Availability and implementation StochSS Live! is freely available at https://live.stochss.org/ Supplementary information Supplementary data are available at Bioinformatics online.
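
To make the idea of a discrete stochastic epidemiological model concrete, here is a minimal sketch of a Gillespie-style simulation of an SIR model in plain Python. It does not use the StochSS Live! service or its API, and the SIR structure, the function name `simulate_sir`, and the parameter values are assumptions chosen for illustration; the abstract does not specify the COVID-19 model it uses. The sketch assumes `gamma > 0` so that the total event rate stays positive while infected individuals remain.

```python
import random


def simulate_sir(s, i, r, beta, gamma, t_end, seed=0):
    """Simulate an SIR epidemic with the Gillespie direct method.

    s, i, r: initial counts of susceptible, infected, recovered.
    beta: infection rate, gamma: recovery rate (assumed > 0).
    Returns a list of (time, s, i, r) tuples, one per event.
    """
    rng = random.Random(seed)
    n = s + i + r
    t = 0.0
    trajectory = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        total = infection_rate + recovery_rate
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which event fires, in proportion to its rate.
        if rng.random() * total < infection_rate:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        trajectory.append((t, s, i, r))
    return trajectory


trajectory = simulate_sir(s=99, i=1, r=0, beta=0.3, gamma=0.1, t_end=50.0)
```

Each run with a different `seed` gives a different trajectory; averaging many runs is one way to compare the stochastic model against its deterministic counterpart.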

KEC: unique sequence search by K-mer exclusion

Bioinformatics ◽

10.1093/bioinformatics/btab196 ◽

2021 ◽

Author(s):

Pavel Beran ◽

Dagmar Stehlíková ◽

Stephen P Cohen ◽

Vladislav Čurn

Keyword(s):

Amino Acid ◽

Nucleic Acid ◽

Source Code ◽

Unique Sequence ◽

Supplementary Information ◽

Supplementary Data ◽

Laptop Computers ◽

Sequence Search ◽

Target Sequences ◽

Cross Reference

Abstract Summary Searching for amino acid or nucleic acid sequences unique to one organism may be challenging, depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) allows users to quickly and easily find unique sequences by providing target and non-target sequences. Due to its speed, it can be used for datasets of genomic size and can be run on desktop or laptop computers with modest specifications. Availability and implementation KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC. Supplementary information Supplementary data are available at Bioinformatics online.
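
The general idea of finding unique sequences by k-mer exclusion can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not KEC's actual implementation: sequences are plain strings, reverse complements are ignored, and the function names and the default k are hypothetical.

```python
def kmers(seq, k):
    """Yield (position, k-mer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def unique_regions(target, non_targets, k=12):
    """Return maximal stretches of target built only from k-mers
    that occur in none of the non-target sequences."""
    excluded = set()
    for seq in non_targets:
        excluded.update(kmer for _, kmer in kmers(seq, k))

    regions = []
    start = prev = None  # first and last k-mer position of the open run
    for i, kmer in kmers(target, k):
        if kmer in excluded:
            continue
        if prev is not None and i == prev + 1:
            prev = i  # consecutive unique k-mer: extend the open run
            continue
        if start is not None:
            regions.append(target[start:prev + k])
        start = prev = i
    if start is not None:
        regions.append(target[start:prev + k])
    return regions


regions = unique_regions("AAAACCCCGGGG", ["AAAACC", "GGGG"], k=4)
```

Only runs of consecutive unique k-mers are merged, so a reported region never contains a k-mer that also appears in a non-target sequence.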

UniBioDicts: Unified access to Biological Dictionaries

Bioinformatics ◽

10.1093/bioinformatics/btaa1065 ◽

2020 ◽

Author(s):

John Zobolas ◽

Vasundra Touré ◽

Martin Kuiper ◽

Steven Vercruysse

Keyword(s):

User Interface ◽

Life Science ◽

Biological Data ◽

Supplementary Information ◽

Supplementary Data ◽

Query Interface ◽

Controlled Vocabularies ◽

Search String ◽

Software Packages ◽

The Right

Abstract Summary We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query-interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching term, identifier and metadata units from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers), and helping data users search and retrieve the right query terms. Availability and implementation The UBDs are available through npm and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license. Supplementary information Supplementary data are available at Bioinformatics online.

Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics

BMC Biology ◽

10.1186/s12915-020-00938-6 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Congyu Lu ◽

Zheng Zhang ◽

Zena Cai ◽

Zhaozhong Zhu ◽

Ye Qiu ◽

...

Keyword(s):

Prediction Accuracy ◽

Functional Characterization ◽

Gaussian Model ◽

Software Tool ◽

Biological Properties ◽

Rapid Identification ◽

Taxonomic Assignment ◽

Genus Level ◽

Alignment Free ◽

Archaeal Viruses

Abstract Background Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools to determine the taxonomic assignment, host range, and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the method for determining the viral host is either to culture the virus, which is low-throughput, time-consuming, and expensive, or to computationally predict the viral hosts, which needs improvement in both accuracy and usability. Here we develop a Gaussian model to predict hosts for prokaryotic viruses with better performance than previous computational methods. Results We present here Prokaryotic virus Host Predictor (PHP), a software tool using a Gaussian model, to predict hosts for prokaryotic viruses using the differences of k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and a host prediction accuracy of 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28–34%, genus level). PHP also substantially outperformed these two alignment-free methods (24–38% vs 18–20%, genus level) when predicting hosts for prokaryotic viruses which cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
Conclusions The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
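
The feature described in this abstract, the difference of k-mer frequencies between a viral and a host sequence, can be sketched as follows. This covers only the feature computation; the Gaussian model itself is not reproduced here. The function names and the default k are hypothetical and are not taken from the PHP software.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in seq,
    ordered alphabetically. Windows with non-ACGT letters are skipped."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[key] / total if total else 0.0 for key in sorted(counts)]


def frequency_difference(virus, host, k=4):
    """Element-wise difference of the two k-mer frequency vectors."""
    v = kmer_frequencies(virus, k)
    h = kmer_frequencies(host, k)
    return [a - b for a, b in zip(v, h)]


features = frequency_difference("AAAA", "CCCC", k=1)
```

A virus-host pair with similar k-mer composition yields a vector close to zero, which is the kind of signal a downstream model could score.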

CONCUR: quick and robust calculation of codon usage from ribosome profiling data

Bioinformatics ◽

10.1093/bioinformatics/btaa733 ◽

2020 ◽

Author(s):

Michaela Frye ◽

Susanne Bornelöv

Keyword(s):

Codon Usage ◽

Ribosome Profiling ◽

Supplementary Information ◽

Supplementary Data ◽

Usage Analysis

Abstract Summary CONCUR is a standalone tool for codon usage analysis in ribosome profiling experiments. CONCUR uses the aligned reads in BAM format to estimate codon counts at the ribosome E-, P- and A-sites and at flanking positions. Availability and implementation CONCUR is written in Perl and is freely available at https://github.com/susbo/concur. Supplementary information Supplementary data are available at Bioinformatics online.
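
Counting codons at the ribosome E-, P- and A-sites can be illustrated with a small Python sketch. It is not CONCUR's implementation (CONCUR is written in Perl and reads BAM files); here the reads are reduced to 5' end positions relative to the start of an in-frame coding sequence, and a fixed 12-nucleotide offset from the read start to the P-site is assumed. The function name and that offset are illustrative assumptions.

```python
from collections import Counter


def site_codon_counts(cds, read_starts, p_offset=12):
    """Count codons at the E-, P- and A-sites for each read.

    cds: coding sequence starting at the first base of the start codon.
    read_starts: 0-based 5' end positions of reads within cds.
    p_offset: assumed distance from the read 5' end to the P-site.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = {"E": Counter(), "P": Counter(), "A": Counter()}
    for start in read_starts:
        p_index = (start + p_offset) // 3
        # The E-site is one codon upstream of the P-site, the A-site one downstream.
        for site, shift in (("E", -1), ("P", 0), ("A", 1)):
            idx = p_index + shift
            if 0 <= idx < len(codons):
                counts[site][codons[idx]] += 1
    return counts


counts = site_codon_counts("ATGGCTAAAGGGTTTCCCTAA", [0, 3])
```

In practice the offset depends on read length and library preparation, so a real analysis would calibrate it rather than fix it at one value.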
