BiomeSeq: A Tool for the Characterization of Animal Microbiomes from Metagenomic Data

2019 ◽  
Author(s):  
Kelly A. Mulholland ◽  
Calvin L. Keeler

Abstract The complete characterization of a microbiome is critical in elucidating the complex ecology of the microbial composition within healthy and diseased animals. Many microbiome studies characterize only the bacterial component, for which there are several well-developed sequencing methods, bioinformatics tools and databases available. The lack of comprehensive bioinformatics workflows and databases has limited efforts to characterize the other components of a microbiome. BiomeSeq is a tool for the analysis of the complete animal microbiome using metagenomic sequencing data. With its comprehensive workflow, customizable parameters and microbial databases, BiomeSeq can rapidly quantify the viral, fungal, bacteriophage and bacterial components of a sample and produce informative tables for analysis. BiomeSeq was employed in detecting and quantifying the respiratory microbiome of a commercial poultry broiler flock throughout its grow-out cycle from hatching to processing. It successfully processed 780 million reads, of which 5,163 aligned to avian DNA viral genomes, 71,936 aligned to avian RNA viral genomes, 469,937 aligned to bacterial genomes, 504,682 aligned to bacteriophage genomes and 1,964 aligned to fungal genomes. For each microbial species detected, BiomeSeq calculated the normalized abundance, percent relative abundance and coverage, as well as the diversity of each sample. BiomeSeq provides for the detection and quantification of the microbiome from next-generation metagenomic sequencing data. The tool is implemented in a user-friendly container that requires one command and generates a table of taxonomic information for each microbe detected, along with normalized abundance, percent relative abundance, coverage and diversity calculations.
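To put the aligned-read counts above in proportion, the following minimal sketch computes each category's share of all aligned reads. It uses only the five counts quoted in the abstract and assumes the categories do not overlap. It is not BiomeSeq's own percent relative abundance, which the abstract reports per species and does not define here; the function name `percent_shares` is illustrative.

```python
# Aligned read counts quoted in the abstract, grouped by reference category.
aligned_reads = {
    "avian DNA viruses": 5_163,
    "avian RNA viruses": 71_936,
    "bacteria": 469_937,
    "bacteriophages": 504_682,
    "fungi": 1_964,
}


def percent_shares(counts):
    """Return each category's share of the summed counts, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Assumption: the five categories are disjoint, so their counts can be summed.
total_aligned = sum(aligned_reads.values())
shares = percent_shares(aligned_reads)

if __name__ == "__main__":
    print(f"total aligned reads: {total_aligned:,}")
    for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:>18}: {pct:6.2f}%")
```

The summed count is a small fraction of the 780 million reads processed, which is consistent with most reads in such a sample not aligning to the microbial references.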

2021 ◽  
Author(s):  
Kelly A. Mulholland ◽  
Calvin L. Keeler

Abstract Background The complete characterization of a microbiome is critical in elucidating the complex ecology of the microbial composition within healthy and diseased animals. Many microbiome studies characterize only the bacterial component, for which there are several well-developed sequencing methods, bioinformatics tools and databases available. The lack of comprehensive bioinformatics workflows and databases has limited efforts to characterize the other components of a microbiome. BiomeSeq is a tool for the analysis of the complete animal microbiome using metagenomic sequencing data. With its comprehensive workflow, customizable parameters and microbial databases, BiomeSeq can rapidly quantify the viral, fungal, bacteriophage and bacterial components of a sample and produce informative tables for analysis. Results Simulated datasets containing known abundances of microbial sequences were constructed, and several performance metrics were analyzed, including the correlation of predicted abundance with known abundance, root mean square error and processing speed. BiomeSeq demonstrated high precision (average of 99.52%) and sensitivity (average of 93.01%). BiomeSeq was employed in detecting and quantifying the respiratory microbiome of a commercial poultry broiler flock throughout its grow-out cycle from hatching to processing and successfully processed 780 million reads. For each microbial species detected, BiomeSeq calculated the normalized abundance, percent relative abundance and coverage, as well as the diversity of each sample. The processing speed of each step in the pipeline, precision and accuracy were calculated to examine BiomeSeq’s performance using in silico sequencing datasets. When compared to bacterial results generated by the commonly used 16S rRNA sequencing method, BiomeSeq detected the same most abundant bacteria, including Gallibacterium, Corynebacterium and Staphylococcus, as well as several additional species.
Conclusions BiomeSeq provides for the detection and quantification of the microbiome from next-generation metagenomic sequencing data. The tool is implemented in a user-friendly container that requires one command and generates a table containing taxonomic information for each microbe detected. It also calculates normalized abundance, percent relative abundance, genome coverage and sample diversity for each sample.
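The evaluation metrics named above can be illustrated with a short sketch. It uses the textbook definitions of precision and sensitivity, applied to sets of species as an assumption (the abstract does not say at which level they were computed). It uses the Shannon index for diversity, also an assumption, since the abstract does not name the index BiomeSeq reports. All species names and counts are hypothetical.

```python
import math


def precision(true_pos, false_pos):
    """Fraction of reported detections that are truly present."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Fraction of truly present items that were detected (recall)."""
    return true_pos / (true_pos + false_neg)


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over non-zero counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical simulated sample: species known to be present vs. detected.
truth = {"sp_a", "sp_b", "sp_c", "sp_d"}
detected = {"sp_a", "sp_b", "sp_c", "sp_x"}

tp = len(truth & detected)   # present and detected
fp = len(detected - truth)   # detected but absent
fn = len(truth - detected)   # present but missed

if __name__ == "__main__":
    print("precision:", precision(tp, fp))
    print("sensitivity:", sensitivity(tp, fn))
    print("Shannon H:", shannon_diversity([120, 30, 30, 20]))
```

Under these definitions a tool can have high precision and lower sensitivity, as reported above, when it rarely reports organisms that are absent but misses some that are present.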


Viruses ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1338
Author(s):  
Morgan E. Meissner ◽  
Emily J. Julik ◽  
Jonathan P. Badalamenti ◽  
William G. Arndt ◽  
Lauren J. Mills ◽  
...  

Human immunodeficiency virus type 2 (HIV-2) accumulates fewer mutations during replication than HIV type 1 (HIV-1). Advanced studies of HIV-2 mutagenesis, however, have historically been confounded by high background error rates in traditional next-generation sequencing techniques. In this study, we describe the adaptation of the previously described maximum-depth sequencing (MDS) technique to studies of both HIV-1 and HIV-2 for the ultra-accurate characterization of viral mutagenesis. We also present the development of a user-friendly Galaxy workflow for the bioinformatic analyses of sequencing data generated using the MDS technique, designed to improve replicability and accessibility to molecular virologists. This adapted MDS technique and analysis pipeline were validated by comparisons with previously published analyses of the frequency and spectra of mutations in HIV-1 and HIV-2 and are readily expandable to studies of viral mutation across the genomes of both viruses. Using this novel sequencing pipeline, we observed that the background error rate was reduced 100-fold over standard Illumina error rates, and 10-fold over traditional unique molecular identifier (UMI)-based sequencing. This technical advancement will allow for the exploration of novel and previously unrecognized sources of viral mutagenesis in both HIV-1 and HIV-2, which will expand our understanding of retroviral diversity and evolution.


2018 ◽  
Vol 57 (2) ◽  
Author(s):  
Qun Yan ◽  
Yu Mi Wi ◽  
Matthew J. Thoendel ◽  
Yash S. Raval ◽  
Kerryl E. Greenwood-Quaintance ◽  
...  

ABSTRACT We previously demonstrated that shotgun metagenomic sequencing can detect bacteria in sonicate fluid, providing a diagnosis of prosthetic joint infection (PJI). A limitation of the approach that we used is that data analysis was time-consuming and specialized bioinformatics expertise was required, both of which are barriers to routine clinical use. Fortunately, automated commercial analytic platforms that can interpret shotgun metagenomic data are emerging. In this study, we evaluated the CosmosID bioinformatics platform using shotgun metagenomic sequencing data derived from 408 sonicate fluid samples from our prior study with the goal of evaluating the platform vis-à-vis bacterial detection and antibiotic resistance gene detection for predicting staphylococcal antibacterial susceptibility. Samples were divided into a derivation set and a validation set, each consisting of 204 samples; results from the derivation set were used to establish cutoffs, which were then tested in the validation set for identifying pathogens and predicting staphylococcal antibacterial resistance. Metagenomic analysis detected bacteria in 94.8% (109/115) of sonicate fluid culture-positive PJIs and 37.8% (37/98) of sonicate fluid culture-negative PJIs. Metagenomic analysis showed sensitivities ranging from 65.7 to 85.0% for predicting staphylococcal antibacterial resistance. In conclusion, the CosmosID platform has the potential to provide fast, reliable bacterial detection and identification from metagenomic shotgun sequencing data derived from sonicate fluid for the diagnosis of PJI. Strategies for metagenomic detection of antibiotic resistance genes for predicting staphylococcal antibacterial resistance need further development.
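The detection rates quoted above follow directly from the reported counts. A minimal check, using only figures from the abstract (the helper name `percent` is illustrative):

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100.0 * numerator / denominator, 1)


# Counts quoted in the abstract.
culture_positive_detected = percent(109, 115)  # culture-positive PJIs
culture_negative_detected = percent(37, 98)    # culture-negative PJIs

# The derivation and validation sets together make up the full sample set.
total_samples = 204 + 204

if __name__ == "__main__":
    print(culture_positive_detected, culture_negative_detected, total_samples)
```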


2020 ◽  
Author(s):  
Stevenn Volant ◽  
Pierre Lechat ◽  
Perrine Woringer ◽  
Laurence Motreff ◽  
Christophe Malabat ◽  
...  

Abstract Background Comparing the composition of microbial communities among groups of interest (e.g., patients vs healthy individuals) is a central aspect of microbiome research. It typically involves sequencing, data processing, statistical analysis and graphical representation of the detected signatures. Such an analysis is normally performed with a set of different applications that require specific expertise for installation and data processing and, in some cases, programming skills. Results Here, we present SHAMAN, an interactive web application we developed to provide (i) a bioinformatic workflow for metataxonomic analysis, (ii) reliable statistical modelling and (iii) one of the largest panels of interactive visualizations among the options currently available. SHAMAN is specifically designed for non-expert users who may benefit from using an integrated version of the different analytic steps underlying a proper metagenomic analysis. The application is freely accessible at http://shaman.pasteur.fr/, and may also work as a standalone application with a Docker container (aghozlane/shaman), conda and R. The source code is written in R and is available at https://github.com/aghozlane/shaman. Using two datasets (a mock community sequencing and published 16S rRNA metagenomic data), we illustrate the strengths of SHAMAN in quickly performing a complete metataxonomic analysis. Conclusions We aim with SHAMAN to provide the scientific community with a platform that simplifies reproducible quantitative analysis of metagenomic data.


Viruses ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2006
Author(s):  
Anna Y Budkina ◽  
Elena V Korneenko ◽  
Ivan A Kotov ◽  
Daniil A Kiselev ◽  
Ilya V Artyushin ◽  
...  

According to various estimates, only a small percentage of existing viruses have been discovered, and even fewer are represented in the genomic databases. High-throughput sequencing technologies are developing rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This problem particularly impedes viral screening, due to vast heterogeneity in viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha and beta coronavirus reads, including the MERS-like bat virus, deserves a special mention, as it once again indicates that bats are indeed reservoirs for many viral pathogens. In addition, alignment-based methods were unable to identify the taxon for a large proportion of reads, so we applied other approaches as well and showed that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in studies of viral diversity, and therefore necessitates the use of combined approaches, including those based on machine learning methods.


2018 ◽  
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance, by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method can vastly outperform the others. Author summary Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
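The general idea of classifying count profiles with a conjugate Dirichlet prior can be sketched in a much simpler setting than the paper's. The code below is not the authors' Poisson-Dirichlet-Multinomial model or their OBC. It is a plain Dirichlet-multinomial classifier with a symmetric prior and equal class priors, both assumptions, and every name and count in it is hypothetical. Each class's Dirichlet parameters are the prior plus the summed training counts, and a new profile is assigned to the class with the highest predictive log-probability.

```python
import math


def log_dirichlet_multinomial(x, alpha):
    """Log-probability of counts x under a Dirichlet-multinomial(alpha).

    The multinomial coefficient is omitted because it depends only on x
    and is therefore identical for every class being compared.
    """
    alpha_sum = sum(alpha)
    n = sum(x)
    score = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + n)
    for x_k, a_k in zip(x, alpha):
        score += math.lgamma(a_k + x_k) - math.lgamma(a_k)
    return score


def fit_class_alphas(profiles_by_class, prior=1.0):
    """Posterior Dirichlet parameters per class: prior plus summed counts."""
    alphas = {}
    for label, profiles in profiles_by_class.items():
        alpha = [prior] * len(profiles[0])
        for profile in profiles:
            for k, count in enumerate(profile):
                alpha[k] += count
        alphas[label] = alpha
    return alphas


def classify(x, alphas):
    """Pick the class with the highest predictive score (equal class priors)."""
    return max(alphas, key=lambda lab: log_dirichlet_multinomial(x, alphas[lab]))


# Hypothetical OTU count profiles (3 OTUs) for two outcome groups.
training = {
    "case": [[30, 5, 1], [25, 8, 2]],
    "control": [[2, 6, 40], [1, 9, 35]],
}
class_alphas = fit_class_alphas(training)

if __name__ == "__main__":
    print(classify([20, 4, 1], class_alphas))
    print(classify([1, 5, 30], class_alphas))
```

Because the posterior has a closed form, training reduces to adding counts, which is one reason conjugate Bayesian models remain usable when there are few samples relative to the number of OTUs.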


2020 ◽  
Author(s):  
Maxence Queyrel ◽  
Edi Prifti ◽  
Jean-Daniel Zucker

Abstract Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and are stored as fastq files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time-consuming and rely on a large number of parameters that often introduce variability and affect the estimation of the microbiome elements. Recent studies have demonstrated that training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning each genome a weight that reflects its influence on the prediction. 
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach achieved very high performance, comparable with state-of-the-art methods applied directly to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
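The first half of step (i), building a k-mer vocabulary from raw reads, can be sketched as follows. This is a minimal illustration and not the metagenome2vec implementation: the function names, the overlapping-window tokenization and the frequency-based ordering of the vocabulary are all assumptions, and learning the embeddings themselves is not shown.

```python
from collections import Counter


def kmers(read, k):
    """Split a read into its overlapping k-mers (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocabulary(reads, k):
    """Map each observed k-mer to an integer id, most frequent first.

    Ties in frequency are broken alphabetically so the ids are reproducible.
    """
    counts = Counter()
    for read in reads:
        counts.update(kmers(read.upper(), k))
    ordered = sorted(counts, key=lambda km: (-counts[km], km))
    return {km: idx for idx, km in enumerate(ordered)}


def encode(read, vocab, k):
    """Turn a read into a sequence of k-mer ids, skipping unknown k-mers."""
    return [vocab[km] for km in kmers(read.upper(), k) if km in vocab]


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA"]  # hypothetical reads
    vocab = build_vocabulary(toy_reads, k=2)
    print(vocab)
    print(encode("ACGT", vocab, k=2))
```

The integer sequences produced by `encode` are the kind of input a word-embedding model would consume, treating each k-mer as a word and each read as a sentence.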


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Kory J Dees ◽  
Hyunmin Koo ◽  
J Fraser Humphreys ◽  
Joseph A Hakim ◽  
David K Crossman ◽  
...  

Abstract Background Although immunotherapy works well in glioblastoma (GBM) preclinical mouse models, the therapy has not demonstrated efficacy in humans. To address this anomaly, we developed a novel humanized microbiome (HuM) model to study the response to immunotherapy in a preclinical mouse model of GBM. Methods We used 5 healthy human donors for fecal transplantation of gnotobiotic mice. After the transplanted microbiomes stabilized, the mice were bred to generate 5 independent humanized mouse lines (HuM1-HuM5). Results Analysis of shotgun metagenomic sequencing data from fecal samples revealed a unique microbiome with significant differences in diversity and microbial composition among HuM1-HuM5 lines. All HuM mouse lines were susceptible to GBM transplantation, and exhibited similar median survival ranging from 19 to 26 days. Interestingly, we found that HuM lines responded differently to the immune checkpoint inhibitor anti-PD-1. Specifically, we demonstrate that HuM1, HuM4, and HuM5 mice are nonresponders to anti-PD-1, while HuM2 and HuM3 mice are responsive to anti-PD-1 and displayed significantly increased survival compared to isotype controls. Bray-Curtis cluster analysis of the 5 HuM gut microbial communities revealed that responders HuM2 and HuM3 were closely related, and detailed taxonomic comparison analysis revealed that Bacteroides cellulosilyticus was commonly found in HuM2 and HuM3 with high abundances. Conclusions The results of our study establish the utility of humanized microbiome mice as avatars to delineate features of the host interaction with gut microbial communities needed for effective immunotherapy against GBM.
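The Bray-Curtis analysis mentioned above rests on a pairwise dissimilarity between abundance profiles: the sum of absolute differences divided by the sum of all abundances, giving 0 for identical communities and 1 for communities with no shared taxa. A minimal sketch of that quantity follows; the abundance vectors are hypothetical and are not the study's data.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same taxa in the same order")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one profile must contain a non-zero abundance")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical taxon counts for three mouse lines (same taxon order in each).
profiles = {
    "line_1": [50, 30, 20, 0],
    "line_2": [45, 35, 15, 5],
    "line_3": [5, 10, 25, 60],
}

if __name__ == "__main__":
    names = sorted(profiles)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = bray_curtis(profiles[first], profiles[second])
            print(f"{first} vs {second}: {d:.3f}")
```

A clustering step would then group lines whose pairwise dissimilarities are small, which is how two lines can be described as closely related.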


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi93-vi94
Author(s):  
Kory Dees ◽  
Hyunmin Koo ◽  
James Humphreys ◽  
Joseph Hakim ◽  
David Crossman ◽  
...  

Abstract Although immunotherapy works well in glioblastoma (GBM) pre-clinical mouse models, the therapy has unfortunately not demonstrated efficacy in humans. In melanoma and other cancers, the composition of the gut microbiome has been shown to determine responsiveness or resistance to immune checkpoint inhibitors (anti-PD-1). Most pre-clinical cancer studies have been done in mouse models using mouse gut microbiomes, but there are significant differences between mouse and human microbial gut compositions. To address this inconsistency, we developed a novel humanized microbiome (HuM) model to study the response to immunotherapy in a pre-clinical mouse model of GBM. We used five healthy human donors for fecal transplantation of gnotobiotic mice. After the transplanted microbiomes stabilized, the mice were bred to generate five independent humanized mouse lines (HuM1-HuM5). Analysis of shotgun metagenomic sequencing data from fecal samples revealed a unique microbiome with significant differences in diversity and microbial composition among HuM1-HuM5 lines. Interestingly, we found that the HuM lines responded differently to anti-PD-1. Specifically, we demonstrate that HuM2 and HuM3 mice are responsive to anti-PD-1 and displayed significantly increased survival compared to isotype controls, while HuM1, HuM4, and HuM5 mice are resistant to anti-PD-1. These mice are genetically identical, and only differ in the composition of the gut microbiome. In a correlative experiment, we found that disrupting the responder HuM2 microbiome with antibiotics abrogated the positive response to anti-PD-1, indicating that HuM2 microbiota must be present in the mice to elicit the positive response to anti-PD-1 in the GBM model. The question remains whether the “responsive” microbial communities in HuM2 and HuM3 can be therapeutically exploited and applied in other tumor models, or whether the “resistant” microbial communities in HuM1, HuM4, and HuM5 can be depleted and/or replaced. 
Future studies will assess responder microbial transplants as a method of enhancing immunotherapy.


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii232-ii232
Author(s):  
Kory Dees ◽  
Hyunmin Koo ◽  
J Fraser Humphreys ◽  
Joseph Hakim ◽  
David Crossman ◽  
...  

Abstract Although immunotherapy works well in glioblastoma (GBM) pre-clinical mouse models, the therapy has not demonstrated efficacy in GBM patients. Since recent studies have linked gut microbial composition to the success of immunotherapy for other cancers, we utilized a novel humanized microbiome (HuM) model to study the response to immunotherapy in a pre-clinical mouse model of GBM. We used five healthy human donors for fecal transplantation of gnotobiotic mice, since it is now recognized that strain-level microbial differences give each individual a unique microbial community composition. After the transplanted microbiomes stabilized, the mice were bred to generate 5 independent humanized mouse lines (humanized microbiome HuM1-HuM5). Analysis of shotgun metagenomic sequencing data from fecal samples revealed a unique microbiome composition with significant differences in diversity and microbial composition among HuM1-HuM5 lines. We next analyzed the growth of intracranial glioma cells in the HuM lines. All HuM mouse lines were susceptible to GBM transplantation, and exhibited similar median survival ranging from 19 to 26 days. Interestingly, we found that HuM lines responded differently to the immune checkpoint inhibitor anti-PD-1. Specifically, we demonstrate that HuM1, HuM4, and HuM5 mice are non-responders to anti-PD-1, resulting in the death of the mice from the intracranial tumors, while HuM2 and HuM3 mice are responsive to anti-PD-1 and displayed significantly increased survival compared to isotype controls. Bray-Curtis cluster analysis of the 5 HuM gut microbial communities revealed that HuM2 and HuM3 were closely related. Detailed taxonomic comparison analysis at the top 5 across all HuM mouse lines revealed that Bacteroides cellulosilyticus was found at high abundance in both HuM2 and HuM3. 
The results of our study establish the utility of humanized microbiome mice as avatars to delineate features of the host interaction with gut microbe communities needed for effective immunotherapy against GBM.

