OrtSuite – a flexible pipeline for annotation of ecosystem processes and prediction of putative microbial interactions

2020 ◽  
Author(s):  
João Pedro Saraiva ◽  
Marta Gomes ◽  
René Kallies ◽  
Carsten Vogt ◽  
Antonis Chatzinotas ◽  
...  

Abstract Background: The exponential increase in high-throughput sequencing data and the development of computational sciences and bioinformatics pipelines have advanced our understanding of microbial community composition and distribution in complex ecosystems. Despite these advances, the identification of microbial interactions from genomic data remains a major bottleneck. To address this challenge, we present OrtSuite, a flexible workflow to predict putative microbial interactions based on genomic content. Results: OrtSuite combines ortholog clustering strategies with genome annotation based on a user-defined set of functions, allowing for hypothesis-driven data analysis. OrtSuite allows users to install and run all workflow components and analyze the generated outputs using a simple pipeline consisting of 23 bash commands and one R command. Annotation is based on a two-stage process. First, only a subset of sequences from each ortholog cluster is aligned to all sequences in the Ortholog-Reaction Association database (ORAdb). Next, all sequences from clusters that meet a user-defined identity threshold are aligned to all sequence sets in ORAdb to which they had a hit. This approach decreases the time needed for functional annotation. Further, OrtSuite identifies putative interspecies interactions from the genomic content of each species, subject to constraints given by the user. Additional control is afforded to the user at several stages of the workflow: 1) The construction of ORAdb only needs to be performed once for each specific process, which also allows manual curation; 2) The identity and sequence similarity thresholds used during the annotation stage can be adjusted; and 3) Constraints related to pathway reaction composition and known species contributions to ecosystem processes can be defined. Conclusions: OrtSuite is an easy-to-use workflow that allows for rapid functional annotation based on a user-curated database. Further, this novel workflow allows the identification of interspecies interactions through user-defined constraints. Due to its low computational demands, OrtSuite can run on a personal computer for small datasets (e.g. a maximum of 100 genomes). For larger datasets (> 100 genomes), we suggest the use of computer clusters. OrtSuite is open-source software available at https://github.com/mdsufz/OrtSuit.
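
The two-stage annotation described in this abstract can be illustrated with a minimal sketch. It is not OrtSuite's implementation: the function names, the data layout, and the toy identity measure (which stands in for a real sequence aligner) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set


def toy_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the longer length (placeholder, not a real aligner)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def two_stage_annotation(
    clusters: Dict[str, Dict[str, str]],    # cluster id -> {sequence id -> sequence}
    reference_sets: Dict[str, List[str]],   # hypothetical ORAdb-like sets: function id -> sequences
    identity: Callable[[str, str], float],
    threshold: float,
    subset_size: int = 1,
) -> Dict[str, Set[str]]:
    """Return, for every sequence, the reference sets it matches at or above the threshold."""
    hits: Dict[str, Set[str]] = {}
    for members in clusters.values():
        # Stage 1: compare only a subset of the cluster against every reference set.
        probes = list(members.values())[:subset_size]
        candidates = [
            ref_id
            for ref_id, refs in reference_sets.items()
            if any(identity(p, r) >= threshold for p in probes for r in refs)
        ]
        # Stage 2: compare all members, but only against the sets that had a stage-1 hit.
        for seq_id, seq in members.items():
            hits[seq_id] = {
                ref_id
                for ref_id in candidates
                if any(identity(seq, r) >= threshold for r in reference_sets[ref_id])
            }
    return hits
```

The saving comes from stage 1: a cluster whose probes match no reference set triggers no further comparisons, so its remaining members are never aligned.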

2018 ◽  
Author(s):  
Chenhao Li ◽  
Lisa Tucker-Kellogg ◽  
Niranjan Nagarajan

Abstract A growing body of literature points to the important roles that different microbial communities play in diverse natural environments and the human body. The dynamics of these communities are driven by a range of microbial interactions from symbiosis to predator-prey relationships, the majority of which are poorly understood, making it hard to predict the response of the community to different perturbations. With the increasing availability of high-throughput sequencing based community composition data, it is now conceivable to directly learn models that explicitly define microbial interactions and explain community dynamics. The applicability of these approaches is, however, affected by several experimental limitations, particularly the compositional nature of sequencing data. We present a new computational approach (BEEM) that addresses this key limitation in the inference of generalised Lotka-Volterra models (gLVMs) by coupling biomass estimation and model inference in an expectation-maximization-like algorithm. Surprisingly, BEEM outperforms state-of-the-art methods for inferring gLVMs, while simultaneously eliminating the need for additional experimental biomass data as input. BEEM’s application to previously inaccessible public datasets (due to the lack of biomass data) allowed us for the first time to analyse microbial communities in the human gut on a per individual basis, revealing personalised dynamics and keystone species.
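
For orientation, the generalised Lotka-Volterra model mentioned above is usually written in the following textbook form. The notation is an assumption, since the abstract defines no symbols: $x_i$ is the absolute abundance of taxon $i$, $\mu_i$ its intrinsic growth rate, and $\beta_{ij}$ the effect of taxon $j$ on taxon $i$.

```latex
\frac{dx_i(t)}{dt} = x_i(t)\left(\mu_i + \sum_{j=1}^{N} \beta_{ij}\, x_j(t)\right),
\qquad
\tilde{x}_i(t) = \frac{x_i(t)}{m(t)},
\qquad
m(t) = \sum_{j=1}^{N} x_j(t)
```

Sequencing yields only the relative abundances $\tilde{x}_i$, so the total biomass $m(t)$ needed to recover $x_i = m\,\tilde{x}_i$ is unknown. This is the compositional limitation that the abstract says BEEM addresses by estimating biomass jointly with the model parameters.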


2021 ◽  
Author(s):  
João Pedro Saraiva ◽  
Alexandre Bartholomäus ◽  
René Kallies ◽  
Marta Gomes ◽  
Marcos Bicalho ◽  
...  

Abstract The high complexity found in microbial communities makes the identification of microbial interactions challenging. To address this challenge, we present OrtSuite, a flexible workflow to predict putative microbial interactions based on genomic content of microbial communities and targeted to specific ecosystem processes. The pipeline is composed of three user-friendly bash commands. OrtSuite combines ortholog clustering with genome annotation strategies limited to user-defined sets of functions allowing for hypothesis-driven data analysis such as assessing microbial interactions in specific ecosystems. OrtSuite matched, on average, 96 % of experimentally verified KEGG orthologs involved in benzoate degradation in a known group of benzoate degraders. We evaluated the identification of putative synergistic species interactions using the sequenced genomes of an independent study that had previously proposed potential species interactions in benzoate degradation. OrtSuite is an easy-to-use workflow that allows for rapid functional annotation based on a user-curated database and can easily be extended to ecosystem processes where connections between genes and reactions are known. OrtSuite is an open-source software available at https://github.com/mdsufz/OrtSuite.


Agronomy ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1428
Author(s):  
Rosalie B. Calderon ◽  
Chang Yoon Jeong ◽  
Hyun-Hwoi Ku ◽  
Lyndon M. Coghill ◽  
Young Jeong Ju ◽  
...  

The application of organic materials that promote beneficial microbial activity is vital to maintaining soil health and crop productivity. We investigated the effect on the soil microbiome of applying biochar (BC), poultry litter (PL), and a combination of biochar and poultry litter (BC/PL) in soybean cultivation at the Red River Research Station (Bossier City, Louisiana, USA). We characterized the microbial profiles, community structure, and co-occurrence network from sequencing data to infer microbial interactions in the soil samples collected in the first and second years of each soil treatment (2016 and 2017, respectively). Our results showed that soil treatments with BC, PL, and a combination of both moderately changed the microbial community composition and structure. In particular, genera significantly affected by the different soil treatments were identified via differential abundance analysis. In addition, canonical correspondence analysis revealed that soil chemical properties, total N in the first year, and total C and pH in the second year influenced the community variability. The differentially enriched bacterial ASVs and co-occurring taxa were linked to nutrient cycling. This study provides insights into the impact of soil carbon amendment on the soil microbiome, a process which favors beneficial bacteria and promotes soybean growth.


2021 ◽  
Author(s):  
Joao Pedro Saraiva ◽  
Alexandre Bartholomäus ◽  
René Kallies ◽  
Marta Gomes ◽  
Marcos Vinicios Fleming Bicalho ◽  
...  

The high complexity found in microbial communities makes the identification of microbial interactions challenging. To address this challenge, we present OrtSuite, a flexible workflow to predict putative microbial interactions based on genomic content of microbial communities and targeted to specific ecosystem processes. The pipeline is composed of three user-friendly bash commands. OrtSuite combines ortholog clustering with genome annotation strategies limited to user-defined sets of functions allowing for hypothesis-driven data analysis such as assessing microbial interactions in specific ecosystems. OrtSuite matched, on average, 96 % of experimentally verified KEGG orthologs involved in benzoate degradation in a known group of benzoate degraders. Identification of putative synergistic species interactions was evaluated using the sequenced genomes of an independent study that had previously proposed potential species interactions in benzoate degradation. OrtSuite is an easy-to-use workflow that allows for rapid functional annotation based on a user-curated database and can easily be extended to ecosystem processes where connections between genes and reactions are known. OrtSuite is an open-source software available at https://github.com/mdsufz/OrtSuite.


Author(s):  
Stefanie Peschel ◽  
Christian L. Müller ◽  
Erika von Mutius ◽  
Anne-Laure Boulesteix ◽  
Martin Depner

Abstract Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization, and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analyzing, and comparing microbial association networks from high-throughput sequencing data. Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analyzing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa, or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing users to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi’s wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children’s rooms between samples from two study centers (Ulm and Munich).
Availability: A script with R code used for producing the examples shown in this manuscript is provided as Supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi.


2019 ◽  
Vol 3 (4) ◽  
pp. 256-259 ◽  
Author(s):  
Marie Lefebvre ◽  
Sébastien Theil ◽  
Yuxin Ma ◽  
Thierry Candresse

Viral metagenomics relies on high-throughput sequencing and on bioinformatic analyses to access the genetic content and diversity of entire viral communities. No universally accepted strategy or tool currently exists to define operational taxonomic units (OTUs) and evaluate viral alpha or beta diversity from virome data. Here we present a new bioinformatic resource, the VirAnnot (automated viral diversity estimation) pipeline, which performs the automated identification of OTUs. Reverse-position-specific BLAST (RPS-Blastn) is used to detect conserved viral protein motifs. The corresponding contigs are then aligned and a clustering approach is used to group in the same OTU contigs sharing more than a set identity threshold. A 10% threshold has been validated as producing OTUs that reasonably approach, in many families, the International Committee on Taxonomy of Viruses taxonomy and can therefore be used as a proxy for viral species.
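
The threshold-based grouping step can be sketched as follows. This is an illustration only, not VirAnnot's code: the abstract does not name the linkage method, so single linkage via union-find is assumed, and the function name and input layout are hypothetical. The abstract also does not say whether its 10% figure is a divergence, so the cutoff is left as a parameter.

```python
from typing import Dict, List, Tuple


def cluster_otus(
    names: List[str],
    identity: Dict[Tuple[str, str], float],  # pairwise identities from an alignment (hypothetical input)
    min_identity: float,
) -> List[List[str]]:
    """Group contigs into OTUs: any pair sharing more than min_identity is linked (single linkage)."""
    parent = {n: n for n in names}

    def find(n: str) -> str:
        # Follow parent pointers to the set representative, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), value in identity.items():
        if value > min_identity:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

With single linkage, two contigs can end up in the same OTU through a chain of intermediates even if their direct identity is below the cutoff. A complete- or average-linkage method would behave differently.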


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Yue-Dong Gao ◽  
Yuqi Zhao ◽  
Jingfei Huang

Recent high-throughput sequencing has enabled the composition of Escherichia coli strains in the human microbial community to be profiled en masse. However, there are two challenges to address: (1) exploring the genetic differences between E. coli strains in human gut and (2) dynamic responses of E. coli to diverse stress conditions. As a result, we investigated the E. coli strains in human gut microbiome using deep sequencing data and reconstructed genome-wide metabolic networks for the three most common E. coli strains, including E. coli HS, UTI89, and CFT073. The metabolic models show obvious strain-specific characteristics, both in network contents and in behaviors. We predicted optimal biomass production for three models on four different carbon sources (acetate, ethanol, glucose, and succinate) and found that these stress-associated genes were involved in host-microbial interactions and increased in human obesity. Besides, it shows that the growth rates are similar among the models, but the flux distributions are different, even in E. coli core reactions. The correlations between human diabetes-associated metabolic reactions in the E. coli models were also predicted. The study provides a systems perspective on E. coli strains in human gut microbiome and will be helpful in integrating diverse data sources in the following study.


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Jiayu Shang ◽  
Yanni Sun

Abstract Background: Prokaryotic viruses, which infect bacteria and archaea, are the most abundant and diverse biological entities in the biosphere. To understand their regulatory roles in various ecosystems and to harness the potential of bacteriophages for use in therapy, more knowledge of viral-host relationships is required. High-throughput sequencing and its application to the microbiome have offered new opportunities for computational approaches for predicting which hosts particular viruses can infect. However, there are two main challenges for computational host prediction. First, the empirically known virus-host relationships are very limited. Second, although sequence similarity between viruses and their prokaryote hosts has been used as a major feature for host prediction, the alignment is either missing or ambiguous in many cases. Thus, there is still a need to improve the accuracy of host prediction. Results: In this work, we present a semi-supervised learning model, named HostG, to conduct host prediction for novel viruses. We construct a knowledge graph by utilizing both virus-virus protein similarity and virus-host DNA sequence similarity. Then a graph convolutional network (GCN) is adopted to exploit viruses with or without known hosts in training to enhance the learning ability. During the GCN training, we minimize the expected calibrated error (ECE) to ensure the confidence of the predictions. We tested HostG on both simulated and real sequencing data and compared its performance with other state-of-the-art methods specifically designed for virus host classification (VHM-net, WIsH, PHP, HoPhage, RaFAH, vHULK, and VPF-Class). Conclusion: HostG outperforms other popular methods, demonstrating the efficacy of using a GCN-based semi-supervised learning approach. A particular advantage of HostG is its ability to predict hosts from new taxa.
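
The ECE mentioned above is more commonly called the expected calibration error. A minimal sketch of how it is typically computed with equal-width confidence bins follows. The abstract does not say how HostG computes or minimizes it during training, so the binning scheme and function name here are assumptions, and the code only evaluates the metric.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],  # predicted probability of the chosen class, in [0, 1]
    correct: Sequence[bool],       # whether each prediction was right
    n_bins: int = 10,
) -> float:
    """Weighted average, over confidence bins, of |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign every prediction to an equal-width bin; confidence 1.0 goes to the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A value near zero means that predictions made with, say, 90% confidence are right about 90% of the time, which is the sense in which calibration supports the confidence of the predictions.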


2021 ◽  
Vol 9 (4) ◽  
pp. 840
Author(s):  
Joao Pedro Saraiva ◽  
Anja Worrich ◽  
Canan Karakoç ◽  
Rene Kallies ◽  
Antonis Chatzinotas ◽  
...  

Mining interspecies interactions remains a challenge due to the complex nature of microbial communities and the need for computational power to handle big data. Our meta-analysis indicates that genetic potential alone does not resolve all issues involving mining of microbial interactions. Nevertheless, it can be used as the starting point to infer synergistic interspecies interactions and to limit the search space (i.e., number of species and metabolic reactions) to a manageable size. A reduced search space decreases the number of additional experiments necessary to validate the inferred putative interactions. As validation experiments, we examine how multi-omics and state-of-the-art imaging techniques may further improve our understanding of species interactions’ role in ecosystem processes. Finally, we analyze pros and cons of the current methods to infer microbial interactions from genetic potential and propose a new theoretical framework based on: (i) genomic information of key members of a community; (ii) information on ecosystem processes involved with a specific hypothesis or research question; (iii) the ability to identify putative species’ contributions to ecosystem processes of interest; and (iv) validation of putative microbial interactions through integration of other data sources.


2017 ◽  
Author(s):  
Even Sannes Riiser ◽  
Thomas H.A. Haverkamp ◽  
Ørnulf Borgan ◽  
Kjetill S. Jakobsen ◽  
Sissel Jentoft ◽  
...  

Abstract Background: Host-microbe interactions are particularly intriguing in Atlantic cod (Gadus morhua), as it lacks the MHC II complex involved in presentation of extracellular pathogens. Nonetheless, little is known about the diversity of its microbiome in natural populations. Here, we use 16S rRNA high-throughput sequencing to investigate the microbial community composition in gut content and mucosa of 22 adult individuals from two coastal populations in Norway, located 470 km apart. Results: We identify a core microbiome of 23 OTUs (97% sequence similarity) in all individuals that comprises 93% of the total number of reads. The most abundant orders are classified as Vibrionales, Fusobacteriales, Clostridiales and Bacteroidales. While mucosal samples show significantly lower diversity than gut content samples, no differences in OTU community composition are observed between the two populations. The differential abundance of oligotypes within two common OTUs does reveal limited spatial segregation. Remarkably, the most abundant OTU consists of a single oligotype (order Vibrionales, genus Photobacterium) that represents nearly 50% of the reads in both locations. Conclusions: Our results show that the intestinal bacterial community of two geographically separated coastal populations of Atlantic cod is dominated by a limited number of highly abundant 16S rRNA oligotypes shared by all specimens examined. The ubiquity of these oligotypes suggests that the northern coastal Atlantic cod gut microbiome is colonized by a limited number of species with excellent dispersal capabilities that are well suited to thrive in their host environment.
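
A core microbiome of the kind reported above (OTUs found in all individuals, plus their share of total reads) can be computed from a count table along these lines. The sketch assumes a simple presence criterion (at least one read) and a hypothetical nested-dictionary layout. The study's actual filtering rules are not given in the abstract.

```python
from typing import Dict, Set, Tuple


def core_otus(counts: Dict[str, Dict[str, int]]) -> Tuple[Set[str], float]:
    """Return the OTUs present in every sample and the fraction of all reads they account for.

    counts maps sample id -> {OTU id -> read count}; an OTU counts as present if it has >= 1 read.
    """
    samples = list(counts.values())
    if not samples:
        return set(), 0.0

    # Intersect the sets of OTUs observed in each sample.
    core = {otu for otu, n in samples[0].items() if n > 0}
    for sample in samples[1:]:
        core &= {otu for otu, n in sample.items() if n > 0}

    total_reads = sum(n for sample in samples for n in sample.values())
    core_reads = sum(n for sample in samples for otu, n in sample.items() if otu in core)
    fraction = core_reads / total_reads if total_reads else 0.0
    return core, fraction
```

For example, with two samples `{"otu1": 90, "otu2": 10}` and `{"otu1": 80, "otu2": 0, "otu3": 20}`, only `otu1` is present in both, and it accounts for 170 of 200 reads.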

