Metagenome Mining Reveals Hidden Genomic Diversity of Pelagimyophages in Aquatic Environments

ABSTRACT The SAR11 clade is one of the most abundant bacterioplankton groups in surface waters of most of the oceans and lakes. However, only 15 SAR11 phages have been isolated thus far, and only one of them belongs to the Myoviridae family (pelagimyophages). Here, we have analyzed 26 sequences of myophages that putatively infect the SAR11 clade. They have been retrieved by mining ca. 45 Gbp aquatic assembled cellular metagenomes and viromes. Most of the myophages were obtained from the cellular fraction (0.2 μm), indicating a bias against this type of virus in viromes. We have found the first myophages that putatively infect Candidatus Fonsibacter (freshwater SAR11) and another group putatively infecting bathypelagic SAR11 phylogroup Ic. The genomes have similar sizes and maintain overall synteny in spite of low average nucleotide identity values, revealing high similarity to marine cyanomyophages. Pelagimyophages recruited metagenomic reads widely from several locations but always much more from cellular metagenomes than from viromes, opposite to what happens with pelagipodophages. Comparing the genomes resulted in the identification of a hypervariable island that is related to host recognition. Interestingly, some genes in these islands could be related to host cell wall synthesis and coinfection avoidance. A cluster of curli-related proteins was widespread among the genomes, although its function is unclear. IMPORTANCE SAR11 clade members are among the most abundant bacteria on Earth. Their study is complicated by their great diversity and difficulties in being grown and manipulated in the laboratory. On the other hand, and due to their extraordinary abundance, metagenomic data sets provide enormous richness of information about these microbes. Given the major role played by phages in the lifestyle and evolution of prokaryotic cells, the contribution of several new bacteriophage genomes preying on this clade opens windows into the infection strategies and life cycle of its viruses. Such strategies could provide models of attack of large-genome phages preying on streamlined aquatic microbes.

Population genomics and antimicrobial resistance dynamics of Escherichia coli in wastewater and river environments

Communications Biology ◽

10.1038/s42003-021-01949-x ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Jose F. Delgado-Blas ◽

Cristina M. Ovejero ◽

Sophia David ◽

Natalia Montero ◽

William Calero-Caceres ◽

...

Keyword(s):

Escherichia Coli ◽

Antimicrobial Resistance ◽

Population Genomics ◽

Population Diversity ◽

Genomic Diversity ◽

Resistant Bacteria ◽

Aquatic Environments ◽

E Coli ◽

Water Ecosystems ◽

Sequence Types

Abstract Aquatic environments are key niches for the emergence, evolution and dissemination of antimicrobial resistance. However, the population diversity and the genetic elements that drive the dynamics of resistant bacteria in different aquatic environments are still largely unknown. The aim of this study was to understand the population genomics and evolutionary events of Escherichia coli resistant to clinically important antibiotics, including aminoglycosides, in anthropogenic and natural water ecosystems. Here we show that fewer distinct E. coli sequence types (STs) are identified in wastewater than in rivers, although the wastewater STs are more resistant to antibiotics and carry significantly more plasmids per cell (6.36 vs 3.72). However, the genomic diversity within E. coli STs in both aquatic environments is similar. Wastewater environments favor the selection of conserved chromosomal structures associated with diverse flexible plasmids, revealing a promiscuous interplasmidic flux of resistance genes. In contrast, the key driver for river E. coli adaptation is a mutable chromosome along with few plasmid types, shared between diverse STs, that harbor a limited resistance gene content.

Screening Metagenomic Data for Viruses Using the E-Probe Diagnostic Nucleic Acid Assay

Phytopathology ◽

10.1094/phyto-11-13-0310-r ◽

2014 ◽

Vol 104 (10) ◽

pp. 1125-1129 ◽

Cited By ~ 11

Author(s):

A. H. Stobbe ◽

W. L. Schneider ◽

P. R. Hoyt ◽

U. Melcher

Keyword(s):

Nucleic Acid ◽

Mosaic Virus ◽

Yellow Mosaic Virus ◽

Metagenomic Data ◽

Data Sets ◽

Virus Species ◽

Data Set ◽

Golden Yellow ◽

Nucleic Acid Assay ◽

Ngs Data

Next generation sequencing (NGS) is not used commonly in diagnostics, in part due to the large amount of time and computational power needed to identify the taxonomic origin of each sequence in an NGS data set. By using the unassembled NGS data sets as the target for searches, pathogen-specific sequences, termed e-probes, could be used as queries to enable detection of specific viruses or organisms in plant sample metagenomes. This method, designated e-probe diagnostic nucleic acid assay, was first tested with mock sequence databases and then with NGS data sets generated from plants infected with a DNA virus (Bean golden yellow mosaic virus, BGYMV) or an RNA virus (Plum pox virus, PPV). In addition, the ability to detect and differentiate among strains of a single virus species, PPV, was examined by using probe sets that were specific to strains. The use of probe sets for multiple viruses determined that one sample was dually infected with BGYMV and Bean golden mosaic virus.
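
As a rough illustration of the search direction described in this abstract (short pathogen-specific probes used as queries against unassembled reads), the following Python sketch counts reads that contain each probe. It is a simplification under stated assumptions: exact substring matching on either strand stands in for whatever sequence search the authors used, and the names `count_probe_hits`, `virus_A`, and `virus_B`, along with all sequences, are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def count_probe_hits(reads, probes):
    """Count, per probe, the reads containing it exactly on either strand.

    Exact matching is an assumption made for brevity; it is not the
    search method of the published assay.
    """
    hits = {name: 0 for name in probes}
    for read in reads:
        read = read.upper()
        for name, probe in probes.items():
            probe = probe.upper()
            if probe in read or reverse_complement(probe) in read:
                hits[name] += 1
    return hits


# Toy unassembled reads and hypothetical e-probes.
reads = ["ACGTTGCATGCA", "TTTTGGGGCCCC", "TGCATGCAACGT"]
probes = {"virus_A": "GTTGCATG", "virus_B": "AAAAAAAA"}
hits = count_probe_hits(reads, probes)
print(hits)
```

A sample would then be called positive for a pathogen when its probes collect clearly more hits than expected by chance; the threshold is left open here because the abstract does not state one.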

Microbiome Search Engine 2: a Platform for Taxonomic and Functional Search of Global Microbiomes on the Whole-Microbiome Level

mSystems ◽

10.1128/msystems.00943-20 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Gongchao Jing ◽

Lu Liu ◽

Zengbin Wang ◽

Yufeng Zhang ◽

Li Qian ◽

...

Keyword(s):

Big Data ◽

User Interface ◽

Search Engine ◽

Functional Similarity ◽

Metagenomic Data ◽

Data Sets ◽

Data Space ◽

Link Type ◽

Database Platform ◽

Microbiome Data

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.
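
The idea of searching a database for the best-matched microbiomes by whole-profile similarity can be sketched in a few lines of Python. This is only an illustration: Bray-Curtis similarity over relative-abundance profiles is swapped in for the MSE 2 scoring, which the abstract does not specify, and the sample names, taxa, and the `search` function are hypothetical.

```python
def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity between two abundance profiles (dict taxon -> abundance)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


def search(query, database, top_k=2):
    """Return the top_k database samples most similar to the query profile."""
    scored = [(bray_curtis_similarity(query, profile), sample_id)
              for sample_id, profile in database.items()]
    # Highest similarity first; ties broken by sample id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(sample_id, score) for score, sample_id in scored[:top_k]]


# Hypothetical taxonomic profiles (relative abundances summing to 1).
database = {
    "gut_1": {"Bacteroides": 0.6, "Faecalibacterium": 0.4},
    "soil_1": {"Streptomyces": 0.7, "Bacillus": 0.3},
    "gut_2": {"Bacteroides": 0.5, "Prevotella": 0.5},
}
query = {"Bacteroides": 0.55, "Faecalibacterium": 0.45}
results = search(query, database)
print(results)
```

A linear scan like this would not reach the sub-second query times reported for a database of this size; it only shows what "best-matched by overall profile" means.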

Bayesian Classification of Microbial Communities Based on 16S rRNA Metagenomic Data

10.1101/340653 ◽

2018 ◽

Cited By ~ 1

Author(s):

Arghavan Bahadorinejad ◽

Ivan Ivanov ◽

Johanna W Lampe ◽

Meredith AJ Hullar ◽

Robert S Chapkin ◽

...

Keyword(s):

16S Rrna ◽

Sample Size ◽

Microbial Communities ◽

State Of The Art ◽

Metagenomic Data ◽

Data Sets ◽

Sequencing Data ◽

Sample Data

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method was observed to vastly outperform the others.

Author summary Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
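
To make the closed-form posterior idea concrete, here is a minimal Python sketch of a Dirichlet-multinomial classifier for OTU count vectors. It is not the authors' Poisson-Dirichlet-Multinomial model or their OBC: it assumes a plain symmetric Dirichlet prior, pools training counts per class, assumes equal class priors, and uses hypothetical labels and counts throughout.

```python
from math import lgamma


def log_dirichlet_multinomial(x, alpha):
    """Log probability of count vector x under a Dirichlet-multinomial with parameters alpha."""
    n = sum(x)
    a = sum(alpha)
    out = lgamma(n + 1) - sum(lgamma(xi + 1) for xi in x)
    out += lgamma(a) - lgamma(a + n)
    out += sum(lgamma(ai + xi) - lgamma(ai) for ai, xi in zip(alpha, x))
    return out


def fit(training, prior=1.0):
    """Posterior Dirichlet parameters per class: prior plus pooled OTU counts.

    training maps a class label to a list of OTU count vectors.
    The symmetric prior value is an assumption for illustration.
    """
    posteriors = {}
    for label, samples in training.items():
        totals = [sum(column) for column in zip(*samples)]
        posteriors[label] = [prior + t for t in totals]
    return posteriors


def classify(x, posteriors):
    """Pick the class with the highest posterior predictive probability (equal class priors)."""
    return max(posteriors,
               key=lambda label: log_dirichlet_multinomial(x, posteriors[label]))


# Hypothetical training data: three OTUs, two classes.
training = {
    "healthy": [[30, 5, 1], [25, 8, 2]],
    "tumor": [[4, 6, 28], [2, 9, 31]],
}
posteriors = fit(training)
print(classify([20, 4, 3], posteriors), classify([3, 5, 22], posteriors))
```

Because the Dirichlet prior is conjugate to the multinomial, both the posterior and the predictive probability are available in closed form, which is what keeps a classifier of this kind usable when samples are few relative to the number of OTUs.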

METAnnotatorX2: a Comprehensive Tool for Deep and Shallow Metagenomic Data Set Analyses

mSystems ◽

10.1128/msystems.00583-21 ◽

2021 ◽

Vol 6 (3) ◽

Author(s):

Christian Milani ◽

Gabriele Andrea Lugli ◽

Federico Fontana ◽

Leonardo Mancabelli ◽

Giulia Alessandri ◽

...

Keyword(s):

Sensitivity And Specificity ◽

Metagenomic Data ◽

Data Sets ◽

Data Set

We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets.

Mumame: a software tool for quantifying gene-specific point-mutations in shotgun metagenomic data

Metabarcoding and Metagenomics ◽

10.3897/mbmg.3.36236 ◽

2019 ◽

Vol 3 ◽

Cited By ~ 1

Author(s):

Shruthi Magesh ◽

Viktor Jonsson ◽

Johan Bengtsson-Palme

Keyword(s):

Microbial Communities ◽

Point Mutations ◽

Software Tool ◽

Metagenomic Data ◽

Data Sets ◽

Resistance Mutations ◽

Shotgun Metagenomics ◽

Key Factor ◽

Detection Of Mutations ◽

And Function

Metagenomics has emerged as a central technique for studying the structure and function of microbial communities. Often the functional analysis is restricted to classification into broad functional categories. However, important phenotypic differences, such as resistance to antibiotics, are often the result of just one or a few point mutations in otherwise identical sequences. Bioinformatic methods for metagenomic analysis have generally been poor at accounting for this fact, resulting in a somewhat limited picture of important aspects of microbial communities. Here, we address this problem by providing a software tool called Mumame, which can distinguish between wildtype and mutated sequences in shotgun metagenomic data and quantify their relative abundances. We demonstrate the utility of the tool by quantifying antibiotic resistance mutations in several publicly available metagenomic data sets. We also identified that sequencing depth is a key factor to detect rare mutations. Therefore, much larger numbers of sequences may be required for reliable detection of mutations than for most other applications of shotgun metagenomics. Mumame is freely available online (http://microbiology.se/software/mumame).
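
The core task described here, telling wildtype from mutated reads at a known site and reporting their relative abundance, can be sketched as follows in Python. This is a toy under stated assumptions and not Mumame's implementation: exact matching of a short window around the site is substituted for real read mapping, only the forward strand is checked, and the function name, gene fragment, and reads are hypothetical.

```python
def quantify_point_mutation(reads, wildtype, position, mutant_base, flank=4):
    """Count reads supporting the wildtype or mutant base at a 0-based position.

    A window of `flank` bases on each side of the site is matched exactly;
    this is an illustrative shortcut, not how a production tool maps reads.
    """
    start = max(0, position - flank)
    end = min(len(wildtype), position + flank + 1)
    mutated = wildtype[:position] + mutant_base + wildtype[position + 1:]
    wt_window = wildtype[start:end]
    mut_window = mutated[start:end]
    wt_count = sum(wt_window in read for read in reads)
    mut_count = sum(mut_window in read for read in reads)
    total = wt_count + mut_count
    return {
        "wildtype": wt_count,
        "mutant": mut_count,
        "mutant_fraction": mut_count / total if total else None,
    }


# Hypothetical gene fragment with a C-to-T change at position 8.
wildtype = "ATGGCTAGCTTACGGA"
reads = ["GGCTAGCTTACGG", "GCTAGTTTACGG", "TTTTTTTTTTTT", "ATGGCTAGCTTAC"]
result = quantify_point_mutation(reads, wildtype, position=8, mutant_base="T")
print(result)
```

With only three informative reads the mutant fraction is very coarse, which echoes the abstract's point that sequencing depth limits the detection of rare mutations.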

Diversity of Virophages in Metagenomic Data Sets

Journal of Virology ◽

10.1128/jvi.03398-12 ◽

2013 ◽

Vol 87 (8) ◽

pp. 4225-4236 ◽

Cited By ~ 58

Author(s):

J. Zhou ◽

W. Zhang ◽

S. Yan ◽

J. Xiao ◽

Y. Zhang ◽

...

Keyword(s):

Metagenomic Data ◽

Data Sets

New insights on cytostatic drug risk assessment in aquatic environments based on measured concentrations in surface waters

Environment International ◽

10.1016/j.envint.2019.105236 ◽

2019 ◽

Vol 133 ◽

pp. 105236 ◽

Cited By ~ 3

Author(s):

Teresa I.A. Gouveia ◽

Arminda Alves ◽

Mónica S.F. Santos

Keyword(s):

Risk Assessment ◽

Surface Waters ◽

Cytostatic Drug ◽

Aquatic Environments ◽

Drug Risk

Novel Cell-Virus-Virophage Tripartite Infection Systems Discovered in the Freshwater Lake Dishui Lake in Shanghai, China

Journal of Virology ◽

10.1128/jvi.00149-20 ◽

2020 ◽

Vol 94 (11) ◽

Cited By ~ 1

Author(s):

Shengzhong Xu ◽

Liang Zhou ◽

Xiaosha Liang ◽

Yifan Zhou ◽

Hao Chen ◽

...

Keyword(s):

Green Algae ◽

Green Alga ◽

Phylogenetic Analyses ◽

Freshwater Lake ◽

Metagenomic Data ◽

Data Sets ◽

Data Set ◽

Dna Viruses ◽

Oligonucleotide Frequency ◽

Dsdna Viruses

ABSTRACT Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses infecting unicellular eukaryotes. Except for a few isolated virophages characterized by parasitization mechanisms, features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, have been identified based on environmental metagenomic investigation. Both genomic and phylogenetic analyses indicate that Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and Dishui Lake large DNA viruses are affiliated with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) and protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with closer homology to that of large alga viruses than to that of giant protozoan viruses. Furthermore, the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1, based on codon usage as well as oligonucleotide frequency and correlation analyses. Surprisingly, a nonhomologous CRISPR-Cas like system is found in DSLLAV1, which appears to protect DSLLAV1 from the parasitization of DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems of green algae, large green alga virus (Phycodnaviridae- and Mimiviridae-related), and virophage exist in Dishui Lake, which will contribute to further deep investigations of the evolutionary interaction of virophages and large alga viruses as well as of the essential roles that the CVv plays in the ecology of algae. IMPORTANCE Virophages are small parasitizing viruses of large/giant viruses. To our knowledge, the few isolated virophages all parasitize giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with hosts, here named the cell-virus-virophage (CVv) system. However, the CVv system remains largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China. Consequently, four novel large alga viruses and seven virophages were discovered to coexist in Dishui Lake. Surprisingly, a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages was identified based on genetic link, genomic signature, and CRISPR system analyses. Meanwhile, a nonhomologous CRISPR-like system was found in Dishui Lake large alga viruses, which appears to protect the virus host from the infection of Dishui Lake virophages (DSLVs). These findings are critical to give insight into the potential significance of CVv in global evolution and ecology.
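
The oligonucleotide frequency and correlation analysis mentioned in this abstract can be illustrated with a small Python sketch that compares the k-mer composition of two sequences. The details are assumptions made for illustration: the abstract does not give the oligonucleotide length or correlation measure, so dinucleotides and Pearson correlation are used here, and both sequences are invented toy strings.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=2):
    """Relative frequency of every DNA k-mer in seq, in a fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Toy AT-rich and GC-rich sequences standing in for viral genomes.
at_rich = "ATATATTTAAATATTA"
gc_rich = "GCGCGGCCGCGCGGCC"
self_corr = pearson(kmer_frequencies(at_rich), kmer_frequencies(at_rich))
cross_corr = pearson(kmer_frequencies(at_rich), kmer_frequencies(gc_rich))
print(self_corr, cross_corr)
```

A high correlation between a virophage and a candidate host virus would be read as a shared genomic signature; on real genomes longer oligonucleotides are typically used, since dinucleotides carry little signal.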

Estimating coverage in metagenomic data sets and why it matters

The ISME Journal ◽

10.1038/ismej.2014.76 ◽

2014 ◽

Vol 8 (11) ◽

pp. 2349-2351 ◽

Cited By ~ 69

Author(s):

Luis M Rodriguez-R ◽

Konstantinos T Konstantinidis

Keyword(s):

Metagenomic Data ◽

Data Sets
