A network-based integrated framework for predicting virus–prokaryote interactions

2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Weili Wang ◽  
Jie Ren ◽  
Kujin Tang ◽  
Emily Dart ◽  
Julio Cesar Ignacio-Espinoza ◽  
...  

Abstract Metagenomic sequencing has greatly enhanced the discovery of viral genomic sequences; however, it remains challenging to identify the host(s) of these new viruses. We developed VirHostMatcher-Net, a flexible, network-based, Markov random field framework for predicting virus–prokaryote interactions using multiple, integrated features: CRISPR sequences and alignment-free similarity measures ($s_2^*$ and WIsH). Evaluation of this method on a benchmark set of 1462 known virus–prokaryote pairs yielded host prediction accuracy of 59% and 86% at the genus and phylum levels, representing 16–27% and 6–10% improvement, respectively, over previous single-feature prediction approaches. We applied our host prediction tool to crAssphage, a human gut phage, and two metagenomic virus datasets: marine viruses and viral contigs recovered from globally distributed, diverse habitats. Host predictions were frequently consistent with those of previous studies, but more importantly, this new tool made many more confident predictions than previous tools, up to nearly 3-fold more (n > 27 000), greatly expanding the diversity of known virus–host interactions.

2018 ◽  
Author(s):  
Weili Wang ◽  
Jie Ren ◽  
Kujin Tang ◽  
Emily Dart ◽  
Julio Cesar Ignacio-Espinoza ◽  
...  

Abstract Metagenomic sequencing has greatly enhanced the discovery of viral genomic sequences; however, it remains challenging to identify the host(s) of these new viruses. We developed VirHostMatcher-Net, a flexible, network-based, Markov random field framework for predicting virus-host interactions using multiple, integrated features: CRISPR sequences, sequence homology, and alignment-free similarity measures ($s_2^*$ and WIsH). Evaluation of this method on a benchmark set of 1,075 known virus-host pairs yielded host prediction accuracy of 62% and 85% at the genus and phylum levels, representing 12-27% and 10-18% improvement, respectively, over previous single-feature prediction approaches. We applied our host-prediction tool to three metagenomic virus datasets: human gut crAss-like phages, marine viruses, and viruses recovered from globally distributed, diverse habitats. Host predictions were frequently consistent with those of previous studies, but more importantly, this new tool made many more confident predictions than previous tools, up to 6-fold more (n > 60,000), greatly expanding the diversity of known virus-host interactions.


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Congyu Lu ◽  
Zheng Zhang ◽  
Zena Cai ◽  
Zhaozhong Zhu ◽  
Ye Qiu ◽  
...  

Abstract Background Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools for taxonomic assignment and for determining the host range and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming, and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model to predict hosts for prokaryotic viruses with better performance than previous computational methods. Results We present here Prokaryotic virus Host Predictor (PHP), a software tool using a Gaussian model, to predict hosts for prokaryotic viruses using the differences of k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and a host prediction accuracy of 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28–34%, genus level). PHP also substantially outperformed these two alignment-free methods (24–38% vs 18–20%, genus level) when predicting hosts for prokaryotic viruses whose hosts cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP. Conclusions The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
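The feature described in this abstract, the difference of k-mer frequencies between a viral and a host sequence, can be illustrated with a minimal Python sketch. This is not the PHP implementation: the function names `kmer_frequencies` and `frequency_difference`, the default `k=2`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration, and the Gaussian model itself is not reproduced here.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies, ordered over all 4**k DNA k-mers."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # assumption: skip windows with N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def frequency_difference(virus_seq, host_seq, k=2):
    """Element-wise difference of virus and host k-mer frequency vectors."""
    v = kmer_frequencies(virus_seq, k)
    h = kmer_frequencies(host_seq, k)
    return [a - b for a, b in zip(v, h)]
```

A vector produced this way has 4**k entries per virus-host pair and could serve as input to a downstream scoring model; how PHP actually scores such vectors is described only in the paper itself.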


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
David Pellow ◽  
Alvah Zorea ◽  
Maraike Probst ◽  
Ori Furman ◽  
Arik Segal ◽  
...  

Abstract Background Metagenomic sequencing has led to the identification and assembly of many new bacterial genome sequences. These bacteria often contain plasmids: usually small, circular double-stranded DNA molecules that may transfer across bacterial species and confer antibiotic resistance. These plasmids are generally less studied and understood than their bacterial hosts. One reason is the lack of computational tools for analyzing plasmids in metagenomic samples. Results We developed SCAPP (Sequence Contents-Aware Plasmid Peeler), an algorithm and tool to assemble plasmid sequences from metagenomic sequencing. SCAPP builds on some key ideas from the Recycler algorithm while improving plasmid assemblies by integrating biological knowledge about plasmids. We compared the performance of SCAPP to Recycler and metaplasmidSPAdes on simulated metagenomes, real human gut microbiome samples, and a human gut plasmidome dataset that we generated. We also created plasmidome and metagenome data from the same cow rumen sample and used the parallel sequencing data to create a novel assessment procedure. Overall, SCAPP outperformed Recycler and metaplasmidSPAdes across this wide range of datasets. Conclusions SCAPP is an easy-to-use Python package that enables the assembly of full plasmid sequences from metagenomic samples. It outperformed existing metagenomic plasmid assemblers in most cases and assembled novel and clinically relevant plasmids in samples we generated, such as a human gut plasmidome. SCAPP is open-source software available from: https://github.com/Shamir-Lab/SCAPP.


2016 ◽  
Vol 82 (9) ◽  
pp. 2854-2861 ◽  
Author(s):  
Omri M. Finkel ◽  
Tom O. Delmont ◽  
Anton F. Post ◽  
Shimshon Belkin

ABSTRACT The leaves of Tamarix aphylla, a globally distributed, salt-secreting desert tree, are dotted with alkaline droplets of high salinity. To successfully inhabit these organic carbon-rich droplets, bacteria need to be adapted to multiple stress factors, including high salinity, high alkalinity, high UV radiation, and periodic desiccation. To identify genes that are important for survival in this harsh habitat, microbial community DNA was extracted from the leaf surfaces of 10 Tamarix aphylla trees along a 350-km longitudinal gradient. Shotgun metagenomic sequencing, contig assembly, and binning yielded 17 genome bins, six of which were >80% complete. These genomic bins, representing three phyla (Proteobacteria, Bacteroidetes, and Firmicutes), were closely related to halophilic and alkaliphilic taxa isolated from aquatic and soil environments. Comparison of these genomic bins to the genomes of their closest relatives revealed functional traits characteristic of bacterial populations inhabiting the Tamarix phyllosphere, independent of their taxonomic affiliation. These functions, most notably light-sensing genes, are postulated to represent important adaptations toward colonization of this habitat. IMPORTANCE Plant leaves are an extensive and diverse microbial habitat, forming the main interface between solar energy and the terrestrial biosphere. There are hundreds of thousands of plant species in the world, exhibiting a wide range of morphologies, leaf surface chemistries, and ecological ranges. In order to understand the core adaptations of microorganisms to this habitat, it is important to diversify the type of leaves that are studied. This study provides an analysis of the genomic content of the most abundant bacterial inhabitants of the globally distributed, salt-secreting desert tree Tamarix aphylla. Draft genomes of these bacteria were assembled, using the culture-independent technique of assembly and binning of metagenomic data. Analysis of the genomes reveals traits that are important for survival in this habitat, most notably, light-sensing and light utilization genes.


Viruses ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 667 ◽  
Author(s):  
Ling Deng ◽  
Ronalds Silins ◽  
Josué L. Castro-Mejía ◽  
Witold Kot ◽  
Leon Jessen ◽  
...  

The human gut microbiome (GM) plays an important role in human health and diseases. However, while substantial progress has been made in understanding the role of bacterial inhabitants of the gut, much less is known regarding the viral component of the GM. Bacteriophages (phages) are viruses attacking specific host bacteria and likely play important roles in shaping the GM. Although metagenomic approaches have led to the discoveries of many new viruses, they remain largely uncultured as their hosts have not been identified, which hampers our understanding of their biological roles. Existing protocols for isolation of viromes generally require relatively high input volumes and focus more on extracting nucleic acids of good quality and purity for downstream analysis than on purifying viruses with infective capacity. In this study, we report the development of an efficient protocol that requires low sample input and yields purified viromes containing phages that are still infective and are of sufficient purity for genome sequencing. We validated the method through spiking known phages followed by plaque assays, qPCR, and metagenomic sequencing. The protocol should facilitate the process of culturing novel viruses from the gut as well as large-scale studies on gut viromes.


2020 ◽  
Vol 9 (8) ◽  
Author(s):  
Marcia F. Marston ◽  
Shawn W. Polson

Synechococcus spp. are unicellular cyanobacteria that are globally distributed and are important primary producers in marine coastal environments. Here, we report the complete genome sequence of Synechococcus sp. strain WH 8101 and identify genomic islands that may play a role in virus-host interactions.


Electronics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 427 ◽  
Author(s):  
Zahir ◽  
Yuan ◽  
Moniz

Recommendation systems alleviate the problem of information overload by helping users find information relevant to their preference. Memory-based recommender systems use correlation-based similarity to measure the common interest among users. The trust between users is often used to address the issues associated with correlation-based similarity measures. However, in most applications, the trust relationships between users are not available. A popular method to extract the implicit trust relationship between users employs prediction accuracy. This method has several problems such as high computational cost and data sparsity. In this paper, addressing the problems associated with prediction accuracy-based trust extraction methods, we proposed a novel trust-based method called AgreeRelTrust. Unlike accuracy-based methods, this method does not require the calculation of initial prediction and the trust relationship is more meaningful. The collective agreements between any two users and their relative activities are fused to obtain the trust relationship. To evaluate the usefulness of our method, we applied it to three public data sets and compared the prediction accuracy with well-known collaborative filtering methods. The experimental results show our method has large improvements over the other methods.
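The correlation-based similarity that this abstract names as the baseline for memory-based recommenders is commonly computed as a Pearson correlation over the items two users have both rated. The sketch below illustrates that baseline only; it is not the AgreeRelTrust method, and the function name `pearson_similarity`, the dictionary-based rating format, and the convention of returning 0.0 when the correlation is undefined are assumptions made for illustration.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items.

    ratings_a and ratings_b map item identifiers to numeric ratings.
    Returns 0.0 when fewer than two items are shared or a variance is zero
    (an assumed convention; the correlation is undefined in those cases).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    a = [ratings_a[item] for item in common]
    b = [ratings_b[item] for item in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)
```

The early return for fewer than two shared items shows one facet of the data-sparsity problem the abstract mentions: with few co-rated items, this similarity is either undefined or unreliable.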


2020 ◽  
Vol 21 (15) ◽  
pp. 5222 ◽  
Author(s):  
Xiao-Nan Fan ◽  
Shao-Wu Zhang ◽  
Song-Yao Zhang ◽  
Jin-Jie Ni

Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and human complex diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step for analyzing the lncRNA functional mechanism. However, the experimental identification of lncRNAs is expensive and time-consuming. In this study, we presented an alignment-free multimodal deep learning framework (namely lncRNA_Mdeep) to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporated three different input modalities, then a multimodal deep learning framework was built for learning the high-level abstract representations and predicting the probability that a transcript was a lncRNA. LncRNA_Mdeep achieved 98.73% prediction accuracy in a 10-fold cross-validation test on humans. In an independent test on humans against eight other state-of-the-art methods, lncRNA_Mdeep showed 93.12% prediction accuracy, which was 0.94–15.41% higher than that of the other eight methods. In addition, the results on 11 cross-species datasets showed that lncRNA_Mdeep was a powerful predictor of lncRNAs.


2020 ◽  
Vol 21 (S6) ◽  
Author(s):  
Sriram P. Chockalingam ◽  
Jodh Pannu ◽  
Sahar Hooshmand ◽  
Sharma V. Thankachan ◽  
Srinivas Aluru

Abstract Background Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, $\mathrm{ACS}_k$, have been shown to produce results as effective as multiple-sequence alignment based methods for reconstruction of phylogeny trees. Since computing $\mathrm{ACS}_k$ takes $O(n \log^k n)$ time and is hence impractical for large datasets, multiple heuristics that can approximate $\mathrm{ACS}_k$ have been introduced. Results In this paper, we present a novel linear-time heuristic to approximate $\mathrm{ACS}_k$, which is faster than computing the exact $\mathrm{ACS}_k$ while being closer to the exact $\mathrm{ACS}_k$ values compared to previously published linear-time greedy heuristics. Using four real datasets, containing both DNA and protein sequences, we evaluate our algorithm in terms of accuracy and runtime and demonstrate its applicability for phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods, while being comparable in its applications to phylogeny reconstruction. Conclusions Our method produces a better approximation for $\mathrm{ACS}_k$ and is applicable for the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in the Rust programming language and the source code is available at https://github.com/srirampc/adyar-rs.
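For orientation, the exact average common substring measure (the zero-mismatch case) can be computed naively as follows. This is a brute-force Python sketch for small inputs, not the linear-time heuristic of the paper and not the k-mismatch variant; the function names `longest_match_lengths` and `acs` are hypothetical.

```python
def longest_match_lengths(x, y):
    """For each start i in x, length of the longest prefix of x[i:] occurring in y."""
    lengths = []
    for i in range(len(x)):
        best = 0
        # Extend the match one character at a time while it still occurs in y.
        # If a longer prefix occurs in y, every shorter prefix does too,
        # so stopping at the first failure yields the maximum length.
        while i + best < len(x) and x[i:i + best + 1] in y:
            best += 1
        lengths.append(best)
    return lengths


def acs(x, y):
    """Average, over all start positions in x, of the longest match length in y."""
    lengths = longest_match_lengths(x, y)
    return sum(lengths) / len(lengths) if lengths else 0.0
```

The measure is asymmetric (`acs(x, y)` generally differs from `acs(y, x)`), and larger values indicate more shared substring content. Practical implementations use suffix trees or suffix arrays rather than repeated substring search.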

