Profiling microbial strains in urban environments using metagenomic sequencing data

Abstract Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying these reads into taxonomic groups. Current methods compare these sequencing data with existing biased and limited reference databases. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both are especially problematic for the identification of low-abundance microbial species, e.g., detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤ 0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.


PStrain: an iterative microbial strains profiling algorithm for shotgun metagenomic sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa1056 ◽

2020 ◽

Author(s):

Shuai Wang ◽

Yiqi Jiang ◽

Shuaicheng Li

Keyword(s):

Optimization Method ◽

Supplementary Information ◽

Marker Genes ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Shotgun Metagenomic Sequencing ◽

Genotype Frequencies ◽

Microbial Strains ◽

And Control ◽

First Time

Abstract Motivation The microbial community plays an essential role in human diseases and physiological activities. The functions of microbes can differ due to strain-level differences in the genome sequences. Shotgun metagenomic sequencing allows us to profile the strains in microbial communities practically. However, current methods are underdeveloped due to the highly similar sequences among strains. We observe that strain genotypes at the same single nucleotide variant (SNV) locus can be inferred from the genotype frequencies. Also, the variants in different loci covered by the same reads can provide evidence that they reside on the same strain. Results These insights inspire us to design PStrain, an optimization method that utilizes genotype frequencies and the reads which cover multiple SNV loci to profile strains iteratively based on SNVs in a set of MetaPhlAn2 marker genes. Compared to the state-of-the-art methods, PStrain, on average, improved the performance of inferring strain abundances and genotypes by 87.75% and 59.45%, respectively. We have applied the PStrain package to a dataset with two cohorts of colorectal cancer (CRC) and found that the sequences of Bacteroides coprocola strains are significantly different between CRC and control samples, which is the first report of a potential role of B. coprocola in the gut microbiota of CRC. Availability and implementation https://github.com/wshuai294/PStrain. Supplementary information Supplementary data are available at Bioinformatics online.


Characterizing and Evaluating the Zoonotic Potential of Novel Viruses Discovered in Vampire Bats

Viruses ◽

10.3390/v13020252 ◽

2021 ◽

Vol 13 (2) ◽

pp. 252

Author(s):

Laura M. Bergner ◽

Nardus Mollentze ◽

Richard J. Orton ◽

Carlos Tello ◽

Alice Broos ◽

...

Keyword(s):

Machine Learning ◽

Phylogenetic Analyses ◽

Human Infection ◽

Machine Learning Algorithms ◽

Zoonotic Potential ◽

Metagenomic Sequencing ◽

Learning Models ◽

Sequencing Data ◽

Vampire Bats ◽

Machine Learning Models

The contemporary surge in metagenomic sequencing has transformed knowledge of viral diversity in wildlife. However, evaluating which newly discovered viruses pose sufficient risk of infecting humans to merit detailed laboratory characterization and surveillance remains largely speculative. Machine learning algorithms have been developed to address this imbalance by ranking the relative likelihood of human infection based on viral genome sequences, but are not yet routinely applied to viruses at the time of their discovery. Here, we characterized viral genomes detected through metagenomic sequencing of feces and saliva from common vampire bats (Desmodus rotundus) and used these data as a case study in evaluating zoonotic potential using molecular sequencing data. Of 58 detected viral families, including 17 which infect mammals, the only known zoonosis detected was rabies virus; however, additional genomes were detected from the families Hepeviridae, Coronaviridae, Reoviridae, Astroviridae and Picornaviridae, all of which contain human-infecting species. In phylogenetic analyses, novel vampire bat viruses most frequently grouped with other bat viruses that are not currently known to infect humans. In agreement, machine learning models built from only phylogenetic information ranked all novel viruses similarly, yielding little insight into zoonotic potential. In contrast, genome composition-based machine learning models estimated different levels of zoonotic potential, even for closely related viruses, categorizing one out of four detected hepeviruses and two out of three picornaviruses as having high priority for further research. We highlight the value of evaluating zoonotic potential beyond ad hoc consideration of phylogeny and provide surveillance recommendations for novel viruses in a wildlife host which has frequent contact with humans and domestic animals.


METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs

BMC Bioinformatics ◽

10.1186/s12859-021-04284-4 ◽

2021 ◽

Vol 22 (S10) ◽

Author(s):

Zhenmiao Zhang ◽

Lu Zhang

Keyword(s):

De Novo ◽

Label Propagation ◽

Next Generation Sequencing Data ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Fecal Samples ◽

Microbial Genomes ◽

Metagenome Assembly ◽

High Chance ◽

Mock Communities

Abstract Background Due to the complexity of microbial communities, de novo assembly of next-generation sequencing data is commonly unable to produce complete microbial genomes. Metagenome assembly binning becomes an essential step that groups the fragmented contigs into clusters representing microbial genomes, based on the contigs’ nucleotide compositions and read depths. These features work well on long contigs, but are not stable for short ones. Contigs can be linked by sequence overlap (assembly graph) or by the paired-end reads aligned to them (PE graph), where linked contigs have a high chance of being derived from the same cluster. Results We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates both assembly and PE graphs. It could strikingly rescue the short contigs and correct the binning errors from dead ends. METAMVGL learns the two graphs’ weights automatically and predicts the contig labels in a uniform multi-view label propagation framework. In experiments, we observed that METAMVGL made use of significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin, on metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples. Conclusions Our findings demonstrate that METAMVGL outstandingly improves short contig binning and outperforms the other existing contig binning tools on metagenomic sequencing data from simulation, mock communities and infant fecal samples.
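
The idea of propagating bin labels over a graph that combines assembly and paired-end links can be illustrated with a small sketch. This is not the METAMVGL implementation: the function names, the toy contig identifiers, and the fixed graph weights (0.5 each) are assumptions made for illustration, whereas the abstract states that METAMVGL learns the two graphs' weights automatically.

```python
from collections import defaultdict


def combine_graphs(assembly_edges, pe_edges, w_assembly=0.5, w_pe=0.5):
    """Merge two undirected edge lists into one weighted adjacency map.

    The weights are fixed here (an assumption); they are not learned.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for edges, weight in ((assembly_edges, w_assembly), (pe_edges, w_pe)):
        for a, b in edges:
            adj[a][b] += weight
            adj[b][a] += weight
    return adj


def propagate_labels(adj, seeds, max_rounds=100):
    """Spread bin labels from seed contigs to unlabeled neighbours.

    Each round, every unlabeled contig takes the label with the largest
    summed edge weight among its already-labeled neighbours.
    """
    labels = dict(seeds)
    for _ in range(max_rounds):
        updates = {}
        for node in adj:
            if node in labels:
                continue
            votes = defaultdict(float)
            for neighbour, weight in adj[node].items():
                if neighbour in labels:
                    votes[labels[neighbour]] += weight
            if votes:
                # Ties are broken by label name to keep the result deterministic.
                updates[node] = max(sorted(votes), key=votes.get)
        if not updates:
            break
        labels.update(updates)
    return labels


# Toy example: c3 is supported by both graphs towards c2,
# but only by the paired-end graph towards c4.
assembly = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
paired_end = [("c2", "c3"), ("c3", "c4")]
graph = combine_graphs(assembly, paired_end)
result = propagate_labels(graph, {"c1": "bin_A", "c5": "bin_B"})
print(result)
```

In this toy graph the contig c3 ends up in bin_A because its link to c2 is supported by both views, which is the kind of extra evidence a combined graph can provide for short contigs.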


SCAPP: an algorithm for improved plasmid assembly in metagenomes

Microbiome ◽

10.1186/s40168-021-01068-z ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

David Pellow ◽

Alvah Zorea ◽

Maraike Probst ◽

Ori Furman ◽

Arik Segal ◽

...

Keyword(s):

Bacterial Species ◽

Bacterial Genome ◽

Biological Knowledge ◽

Assessment Procedure ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Human Gut ◽

Double Stranded Dna ◽

Wide Range ◽

Python Package

Abstract Background Metagenomic sequencing has led to the identification and assembly of many new bacterial genome sequences. These bacteria often contain plasmids: usually small, circular double-stranded DNA molecules that may transfer across bacterial species and confer antibiotic resistance. These plasmids are generally less studied and understood than their bacterial hosts. Part of the reason for this is insufficient computational tools enabling the analysis of plasmids in metagenomic samples. Results We developed SCAPP (Sequence Contents-Aware Plasmid Peeler)—an algorithm and tool to assemble plasmid sequences from metagenomic sequencing. SCAPP builds on some key ideas from the Recycler algorithm while improving plasmid assemblies by integrating biological knowledge about plasmids. We compared the performance of SCAPP to Recycler and metaplasmidSPAdes on simulated metagenomes, real human gut microbiome samples, and a human gut plasmidome dataset that we generated. We also created plasmidome and metagenome data from the same cow rumen sample and used the parallel sequencing data to create a novel assessment procedure. Overall, SCAPP outperformed Recycler and metaplasmidSPAdes across this wide range of datasets. Conclusions SCAPP is an easy to use Python package that enables the assembly of full plasmid sequences from metagenomic samples. It outperformed existing metagenomic plasmid assemblers in most cases and assembled novel and clinically relevant plasmids in samples we generated such as a human gut plasmidome. SCAPP is open-source software available from: https://github.com/Shamir-Lab/SCAPP.


Evaluation of the CosmosID Bioinformatics Platform for Prosthetic Joint-Associated Sonicate Fluid Shotgun Metagenomic Data Analysis

Journal of Clinical Microbiology ◽

10.1128/jcm.01182-18 ◽

2018 ◽

Vol 57 (2) ◽

Cited By ~ 8

Author(s):

Qun Yan ◽

Yu Mi Wi ◽

Matthew J. Thoendel ◽

Yash S. Raval ◽

Kerryl E. Greenwood-Quaintance ◽

...

Keyword(s):

Antibiotic Resistance ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Antibacterial Resistance ◽

Sequencing Data ◽

Bacterial Detection ◽

Shotgun Metagenomic Sequencing ◽

Prosthetic Joint ◽

Validation Set ◽

Fluid Culture

ABSTRACT We previously demonstrated that shotgun metagenomic sequencing can detect bacteria in sonicate fluid, providing a diagnosis of prosthetic joint infection (PJI). A limitation of the approach that we used is that data analysis was time-consuming and specialized bioinformatics expertise was required, both of which are barriers to routine clinical use. Fortunately, automated commercial analytic platforms that can interpret shotgun metagenomic data are emerging. In this study, we evaluated the CosmosID bioinformatics platform using shotgun metagenomic sequencing data derived from 408 sonicate fluid samples from our prior study with the goal of evaluating the platform vis-à-vis bacterial detection and antibiotic resistance gene detection for predicting staphylococcal antibacterial susceptibility. Samples were divided into a derivation set and a validation set, each consisting of 204 samples; results from the derivation set were used to establish cutoffs, which were then tested in the validation set for identifying pathogens and predicting staphylococcal antibacterial resistance. Metagenomic analysis detected bacteria in 94.8% (109/115) of sonicate fluid culture-positive PJIs and 37.8% (37/98) of sonicate fluid culture-negative PJIs. Metagenomic analysis showed sensitivities ranging from 65.7 to 85.0% for predicting staphylococcal antibacterial resistance. In conclusion, the CosmosID platform has the potential to provide fast, reliable bacterial detection and identification from metagenomic shotgun sequencing data derived from sonicate fluid for the diagnosis of PJI. Strategies for metagenomic detection of antibiotic resistance genes for predicting staphylococcal antibacterial resistance need further development.
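
The detection percentages quoted above follow directly from the reported counts. The short check below recomputes them; the helper name `detection_rate` is hypothetical, and only the counts 109/115 and 37/98 come from the abstract.

```python
def detection_rate(detected, total):
    """Return the percentage of samples in which bacteria were detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * detected / total


# Counts as reported in the abstract.
culture_positive = detection_rate(109, 115)
culture_negative = detection_rate(37, 98)

print(f"culture-positive PJIs: {culture_positive:.1f}%")  # 94.8%
print(f"culture-negative PJIs: {culture_negative:.1f}%")  # 37.8%
```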


Towards end-to-end disease prediction from raw metagenomic data

10.1101/2020.10.29.360297 ◽

2020 ◽

Author(s):

Maxence Queyrel ◽

Edi Prifti ◽

Jean-Daniel Zucker

Keyword(s):

Dna Sequences ◽

Real Life ◽

Multiple Instance Learning ◽

Disease Classification ◽

Metagenomic Data ◽

Numerical Representation ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

End To End ◽

Bioinformatics Workflows

Abstract Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and are stored as fastq files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time-consuming and rely on a large number of parameters that often introduce variability and affect the estimation of the microbiome elements. Recent studies have demonstrated that training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning each genome a weight that reflects its influence on the prediction.
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reached very high performance, comparable with that of state-of-the-art methods applied to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
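
The first half of step (i), building a k-mer vocabulary from reads, can be sketched as follows. This is an illustrative simplification and not code from metagenome2vec: the function names, the toy reads, the choice of k, and the rule of skipping k-mers that contain non-ACGT characters are all assumptions. Learning the embeddings themselves is not shown.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def kmers(read, k):
    """Return the overlapping k-mers of a read, skipping ambiguous bases."""
    read = read.upper()
    return [
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= DNA_ALPHABET
    ]


def build_vocabulary(reads, k):
    """Map each observed k-mer to an integer id, most frequent first."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    # Sort by decreasing count, then alphabetically, so ids are deterministic.
    ordered = sorted(counts, key=lambda km: (-counts[km], km))
    return {km: idx for idx, km in enumerate(ordered)}


def encode(read, vocab, k):
    """Turn a read into the list of ids of its k-mers found in the vocabulary."""
    return [vocab[km] for km in kmers(read, k) if km in vocab]


# Toy reads (made up for illustration).
reads = ["ACGT", "CGTA"]
vocab = build_vocabulary(reads, k=2)
print(vocab)
print(encode("ACGT", vocab, k=2))
```

The integer sequences produced by `encode` are the kind of input an embedding model would consume, in the same way that word ids feed a word-embedding model.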


Human gut microbial communities dictate efficacy of anti-PD-1 therapy in a humanized microbiome mouse model of glioma

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab023 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Kory J Dees ◽

Hyunmin Koo ◽

J Fraser Humphreys ◽

Joseph A Hakim ◽

David K Crossman ◽

...

Keyword(s):

Mouse Model ◽

Microbial Communities ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Microbial Composition ◽

Healthy Human ◽

Fecal Transplantation ◽

Host Interaction ◽

Shotgun Metagenomic Sequencing ◽

Mouse Lines

Abstract Background Although immunotherapy works well in glioblastoma (GBM) preclinical mouse models, the therapy has not demonstrated efficacy in humans. To address this anomaly, we developed a novel humanized microbiome (HuM) model to study the response to immunotherapy in a preclinical mouse model of GBM. Methods We used 5 healthy human donors for fecal transplantation of gnotobiotic mice. After the transplanted microbiomes stabilized, the mice were bred to generate 5 independent humanized mouse lines (HuM1-HuM5). Results Analysis of shotgun metagenomic sequencing data from fecal samples revealed a unique microbiome with significant differences in diversity and microbial composition among HuM1-HuM5 lines. All HuM mouse lines were susceptible to GBM transplantation, and exhibited similar median survival ranging from 19 to 26 days. Interestingly, we found that HuM lines responded differently to the immune checkpoint inhibitor anti-PD-1. Specifically, we demonstrate that HuM1, HuM4, and HuM5 mice are nonresponders to anti-PD-1, while HuM2 and HuM3 mice are responsive to anti-PD-1 and displayed significantly increased survival compared to isotype controls. Bray-Curtis cluster analysis of the 5 HuM gut microbial communities revealed that responders HuM2 and HuM3 were closely related, and detailed taxonomic comparison analysis revealed that Bacteroides cellulosilyticus was commonly found in HuM2 and HuM3 with high abundances. Conclusions The results of our study establish the utility of humanized microbiome mice as avatars to delineate features of the host interaction with gut microbial communities needed for effective immunotherapy against GBM.
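
The Bray-Curtis comparison mentioned above rests on a simple dissimilarity between abundance profiles: the sum of absolute differences divided by the sum of all abundances. The sketch below shows that calculation only; the sample names and abundance values are invented for illustration and are not data from the study, and the clustering step is not shown.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors.

    Returns 0.0 for identical profiles and 1.0 for profiles with no shared taxa.
    """
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Invented relative abundances over four taxa (not from the study).
profiles = {
    "sample_a": [0.50, 0.30, 0.20, 0.00],
    "sample_b": [0.45, 0.35, 0.15, 0.05],
    "sample_c": [0.05, 0.10, 0.25, 0.60],
}

names = sorted(profiles)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = bray_curtis(profiles[first], profiles[second])
        print(f"{first} vs {second}: {d:.2f}")
```

Samples with a small pairwise dissimilarity would be grouped together by a subsequent cluster analysis.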


Yoghurt Consumption is Associated With Transient Changes in the Composition of the Human Gut Microbiome

10.21203/rs.3.rs-99718/v1 ◽

2020 ◽

Author(s):

Caroline Ivanne Le Roy ◽

Alexander Kurilshikov ◽

Emily Leeming ◽

Alessia Visconti ◽

Ruth Bowyer ◽

...

Keyword(s):

16S Rrna ◽

Gut Microbiota ◽

Visceral Fat ◽

Gut Microbiome ◽

Body Weight Gain ◽

Healthy Eating Index ◽

Rrna Gene ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Shotgun Metagenomic Sequencing

Abstract Background: Yoghurt contains live bacteria that could contribute via modulation of the gut microbiota to its reported beneficial effects such as reduced body weight gain and lower incidence of type 2 diabetes. To date, the association between yoghurt consumption and the composition of the gut microbiota is underexplored. Here we used clinical variables, metabolomics, 16S rRNA and shotgun metagenomic sequencing data collected on over 1000 predominantly female UK twins to define the link between the gut microbiota and yoghurt-associated health benefits. Results: According to food frequency questionnaires (FFQ), 73% of subjects consumed yoghurt. Consumers presented a healthier diet pattern (healthy eating index: beta = 2.17±0.34; P = 2.72×10⁻¹⁰) and improved metabolic health characterised by reduced visceral fat (beta = -28.18±11.71 g; P = 0.01). According to 16S rRNA gene analyses and the whole shotgun metagenomic sequencing approach, consistent taxonomic variations were observed with yoghurt consumption. More specifically, we identified higher abundance of species used as yoghurt starters, Streptococcus thermophilus (beta = 0.41±0.051; P = 6.14×10⁻¹²), and of the sometimes added Bifidobacterium animalis subsp. lactis (beta = 0.30±0.052; P = 1.49×10⁻⁸) in the gut of yoghurt consumers. Replication in 1103 volunteers from the LifeLines-DEEP cohort confirmed the increase of S. thermophilus among yoghurt consumers. Using food records collected the day prior to faecal sampling, we showed that the increase in these two yoghurt bacteria could be transient. Metabolomics analysis revealed that B. animalis subsp. lactis was associated with 13 faecal metabolites including a 3-hydroxyoctanoic acid, known to be involved in the regulation of gut inflammation. Conclusions: Yoghurt consumption is associated with reduced visceral fat mass and changes in gut microbiome including transient increase of yoghurt-contained species (i.e. S. thermophilus and B. lactis).


deFUME: Dynamic exploration of functional metagenomic sequencing data

BMC Research Notes ◽

10.1186/s13104-015-1281-y ◽

2015 ◽

Vol 8 (1) ◽

Cited By ~ 5

Author(s):

Eric van der Helm ◽

Henrik Marcus Geertz-Hansen ◽

Hans Jasper Genee ◽

Sailesh Malla ◽

Morten Otto Alexander Sommer

Keyword(s):

Metagenomic Sequencing ◽

Sequencing Data
