Diversity of Virophages in Metagenomic Data Sets

Next generation sequencing (NGS) is not commonly used in diagnostics, in part because of the large amount of time and computational power needed to identify the taxonomic origin of each sequence in an NGS data set. By using the unassembled NGS data sets as the target for searches, pathogen-specific sequences, termed e-probes, could be used as queries to enable detection of specific viruses or organisms in plant sample metagenomes. This method, designated e-probe diagnostic nucleic acid assay, was first tested with mock sequence databases and then with NGS data sets generated from plants infected with a DNA virus (Bean golden yellow mosaic virus, BGYMV) or an RNA virus (Plum pox virus, PPV). In addition, the ability to detect and differentiate among strains of a single virus species, PPV, was examined by using strain-specific probe sets. The use of probe sets for multiple viruses revealed that one sample was dually infected with BGYMV and Bean golden mosaic virus.
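
The idea of querying unassembled reads with short pathogen-specific probes can be illustrated with a minimal Python sketch. It assumes exact substring matching on either strand, which is a simplification chosen for illustration and not the search method used by the authors; the function names, the probe and read sequences, and the `min_fraction` threshold are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def probe_hits(probes, reads):
    """For each probe name, count the reads that contain the probe on either strand.

    probes: dict mapping a probe name to its sequence (hypothetical e-probes).
    reads:  list of unassembled read sequences.
    """
    upper_reads = [r.upper() for r in reads]
    hits = {}
    for name, probe in probes.items():
        forward = probe.upper()
        reverse = reverse_complement(forward)
        hits[name] = sum(1 for r in upper_reads if forward in r or reverse in r)
    return hits


def detected(probes, reads, min_fraction=0.5):
    """Call a target present when at least min_fraction of its probes are hit.

    The threshold is an illustrative assumption, not a published cutoff.
    """
    hits = probe_hits(probes, reads)
    matched = sum(1 for n in hits.values() if n > 0)
    return matched / len(probes) >= min_fraction


probes = {"p1": "ACGTTG", "p2": "TTTCCA"}
reads = ["AAACGTTGAA", "CCTGGAAAGG"]  # the second read carries p2 on the reverse strand
print(probe_hits(probes, reads))
print(detected(probes, reads))
```

Running one probe set per virus or strain against the same reads is what would allow a mixed infection to show up as two positive calls in a single sample.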

Microbiome Search Engine 2: a Platform for Taxonomic and Functional Search of Global Microbiomes on the Whole-Microbiome Level

mSystems, 10.1128/msystems.00943-20, 2021, Vol 6 (1)

Author(s): Gongchao Jing, Lu Liu, Zengbin Wang, Yufeng Zhang, Li Qian, ...

Keyword(s): Big Data, User Interface, Search Engine, Functional Similarity, Metagenomic Data, Data Sets, Data Space, Link Type, Database Platform, Microbiome Data

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms).

IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.

Bayesian Classification of Microbial Communities Based on 16S rRNA Metagenomic Data

10.1101/340653, 2018, Cited By ~ 1

Author(s): Arghavan Bahadorinejad, Ivan Ivanov, Johanna W Lampe, Meredith AJ Hullar, Robert S Chapkin, ...

Keyword(s): 16S rRNA, Sample Size, Microbial Communities, State Of The Art, Metagenomic Data, Data Sets, Sequencing Data, Sample Data

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method was observed to vastly outperform the others.

Author summary Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
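
To make the closed-form posterior idea concrete, here is a minimal Python sketch of a plain Dirichlet-multinomial classifier over OTU count vectors. It is a deliberately simpler stand-in and not the authors' Poisson-Dirichlet-Multinomial model or their OBC: it assumes a symmetric Dirichlet prior, pools the training counts of each class, and scores a new sample by its posterior predictive likelihood. All function names and the toy counts are hypothetical.

```python
from math import lgamma, log


def fit(train, prior=1.0):
    """Return posterior Dirichlet parameters per class.

    train: dict mapping a class label to a list of OTU count vectors.
    prior: symmetric Dirichlet pseudo-count (an assumed, noninformative choice).
    """
    model = {}
    for label, samples in train.items():
        pooled = [sum(column) for column in zip(*samples)]  # per-OTU totals
        model[label] = [prior + count for count in pooled]
    return model


def log_predictive(x, post_alpha):
    """Log Dirichlet-multinomial predictive of counts x.

    The multinomial coefficient is dropped because it is the same for every class.
    """
    total_alpha = sum(post_alpha)
    n = sum(x)
    score = lgamma(total_alpha) - lgamma(total_alpha + n)
    for alpha_k, x_k in zip(post_alpha, x):
        score += lgamma(alpha_k + x_k) - lgamma(alpha_k)
    return score


def classify(x, model, class_prior=None):
    """Pick the class with the highest log prior plus log predictive score."""
    scores = {}
    for label, post_alpha in model.items():
        log_prior = log(class_prior[label]) if class_prior else 0.0
        scores[label] = log_prior + log_predictive(x, post_alpha)
    return max(scores, key=scores.get)


train = {
    "A": [[30, 5, 1], [25, 8, 2]],
    "B": [[2, 6, 28], [1, 9, 30]],
}
model = fit(train)
print(classify([20, 3, 1], model))
print(classify([1, 4, 22], model))
```

Because the prior pseudo-counts keep every posterior parameter positive, a sketch like this still produces a score when a class has very few training samples relative to the number of OTUs, which is the regime the abstract highlights.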

METAnnotatorX2: a Comprehensive Tool for Deep and Shallow Metagenomic Data Set Analyses

mSystems, 10.1128/msystems.00583-21, 2021, Vol 6 (3)

Author(s): Christian Milani, Gabriele Andrea Lugli, Federico Fontana, Leonardo Mancabelli, Giulia Alessandri, ...

Keyword(s): Sensitivity And Specificity, Metagenomic Data, Data Sets, Data Set

We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets.

Mumame: a software tool for quantifying gene-specific point-mutations in shotgun metagenomic data

Metabarcoding and Metagenomics, 10.3897/mbmg.3.36236, 2019, Vol 3, Cited By ~ 1

Author(s): Shruthi Magesh, Viktor Jonsson, Johan Bengtsson-Palme

Keyword(s): Microbial Communities, Point Mutations, Software Tool, Metagenomic Data, Data Sets, Resistance Mutations, Shotgun Metagenomics, Key Factor, Detection Of Mutations, And Function

Metagenomics has emerged as a central technique for studying the structure and function of microbial communities. Often the functional analysis is restricted to classification into broad functional categories. However, important phenotypic differences, such as resistance to antibiotics, are often the result of just one or a few point mutations in otherwise identical sequences. Bioinformatic methods for metagenomic analysis have generally been poor at accounting for this fact, resulting in a somewhat limited picture of important aspects of microbial communities. Here, we address this problem by providing a software tool called Mumame, which can distinguish between wildtype and mutated sequences in shotgun metagenomic data and quantify their relative abundances. We demonstrate the utility of the tool by quantifying antibiotic resistance mutations in several publicly available metagenomic data sets. We also found that sequencing depth is a key factor in detecting rare mutations. Therefore, much larger numbers of sequences may be required for reliable detection of mutations than for most other applications of shotgun metagenomics. Mumame is freely available online (http://microbiology.se/software/mumame).
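
The distinction between wildtype and mutated reads can be sketched in a few lines of Python. This is an illustration only and does not reproduce how Mumame maps reads: it assumes each allele is represented by a short sequence spanning the mutated position and counts reads by exact substring match. The function names and sequences are hypothetical.

```python
def allele_counts(reads, wildtype, mutant):
    """Count reads containing the wildtype or the mutant allele sequence.

    wildtype and mutant are short sequences covering the mutated position
    plus flanking bases (hypothetical inputs for this sketch).
    """
    wt = sum(1 for r in reads if wildtype in r)
    mut = sum(1 for r in reads if mutant in r)
    return wt, mut


def mutant_fraction(reads, wildtype, mutant):
    """Return the mutant share among reads covering the site, or None if uncovered."""
    wt, mut = allele_counts(reads, wildtype, mutant)
    covered = wt + mut
    return mut / covered if covered else None


reads = ["GGACGTCAGTAA", "TTACGTTAGTCC", "ACGTCAGTGG", "CCCCCCCC"]
wildtype = "ACGTCAGT"
mutant = "ACGTTAGT"  # single C-to-T change at the fifth base
print(allele_counts(reads, wildtype, mutant))
print(mutant_fraction(reads, wildtype, mutant))
```

Only reads that span the mutated position contribute to either count, so a rare mutant allele needs many covering reads before it is observed at all; this is one way to see why sequencing depth matters for this kind of analysis.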

Novel Cell-Virus-Virophage Tripartite Infection Systems Discovered in the Freshwater Lake Dishui Lake in Shanghai, China

Journal of Virology, 10.1128/jvi.00149-20, 2020, Vol 94 (11), Cited By ~ 1

Author(s): Shengzhong Xu, Liang Zhou, Xiaosha Liang, Yifan Zhou, Hao Chen, ...

Keyword(s): Green Algae, Green Alga, Phylogenetic Analyses, Freshwater Lake, Metagenomic Data, Data Sets, Data Set, DNA Viruses, Oligonucleotide Frequency, dsDNA Viruses

ABSTRACT Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses infecting unicellular eukaryotes. Except for a few isolated virophages characterized by parasitization mechanisms, features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, have been identified based on environmental metagenomic investigation. Both genomic and phylogenetic analyses indicate that Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and Dishui Lake large DNA viruses are affiliated with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) and protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with closer homology to that of large alga viruses than to that of giant protozoan viruses. Furthermore, the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1, based on codon usage as well as oligonucleotide frequency and correlation analyses. Surprisingly, a nonhomologous CRISPR-Cas like system is found in DSLLAV1, which appears to protect DSLLAV1 from the parasitization of DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems of green algae, large green alga virus (Phycodnaviridae- and Mimiviridae-related), and virophage exist in Dishui Lake, which will contribute to further deep investigations of the evolutionary interaction of virophages and large alga viruses as well as of the essential roles that the CVv plays in the ecology of algae.

IMPORTANCE Virophages are small parasitizing viruses of large/giant viruses. To our knowledge, the few isolated virophages all parasitize giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with hosts, here named the cell-virus-virophage (CVv) system. However, the CVv system remains largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China. Consequently, four novel large alga viruses and seven virophages were discovered to coexist in Dishui Lake. Surprisingly, a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages was identified based on genetic link, genomic signature, and CRISPR system analyses. Meanwhile, a nonhomologous CRISPR-like system was found in Dishui Lake large alga viruses, which appears to protect the virus host from the infection of Dishui Lake virophages (DSLVs). These findings are critical to give insight into the potential significance of CVv in global evolution and ecology.
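
An oligonucleotide frequency comparison of the kind mentioned above can be sketched as follows. This minimal Python example assumes tetranucleotide frequencies and a Pearson correlation between two genomes; the word length, the correlation measure, and all names are assumptions made for illustration and may differ from the analysis actually performed in the study.

```python
from itertools import product
from math import sqrt


def kmer_freqs(seq, k=4):
    """Return the relative frequency of every k-mer over ACGT, in a fixed order.

    Windows containing other characters (for example N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v) if sd_u and sd_v else 0.0


# Toy sequences standing in for a virophage genome and a candidate host virus genome.
virophage = "ACGTACGGTCAACGTTGCA"
large_virus = "ACGTTCGGTCAACGATGCA"
print(pearson(kmer_freqs(virophage, k=2), kmer_freqs(large_virus, k=2)))
```

A high correlation between the frequency profiles of a virophage and a large virus would be read as a shared genomic signature, which is one line of evidence, alongside others, for associating the two.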

Estimating coverage in metagenomic data sets and why it matters

The ISME Journal, 10.1038/ismej.2014.76, 2014, Vol 8 (11), pp. 2349-2351, Cited By ~ 69

Author(s): Luis M Rodriguez-R, Konstantinos T Konstantinidis

Keyword(s): Metagenomic Data, Data Sets

Deep Sequencing of a Dimethylsulfoniopropionate-Degrading Gene (dmdA) by Using PCR Primer Pairs Designed on the Basis of Marine Metagenomic Data

Applied and Environmental Microbiology, 10.1128/aem.01258-09, 2009, Vol 76 (2), pp. 609-617, Cited By ~ 41

Author(s): Vanessa A. Varaljay, Erinn C. Howard, Shulei Sun, Mary Ann Moran

Keyword(s): Sequence Data, Gene Clusters, Amino Acid Identity, Metagenomic Data, Data Sets, Free Living, Metagenomic Sequence, Primer Sets, Design And Testing, PCR Primer

ABSTRACT In silico design and testing of environmental primer pairs with metagenomic data are beneficial for capturing a greater proportion of the natural sequence heterogeneity in microbial functional genes, as well as for understanding limitations of existing primer sets that were designed from more restricted sequence data. PCR primer pairs targeting 10 environmental clades and subclades of the dimethylsulfoniopropionate (DMSP) demethylase protein, DmdA, were designed using an iterative bioinformatic approach that took advantage of thousands of dmdA sequences captured in marine metagenomic data sets. Using the bioinformatically optimized primers, dmdA genes were amplified from composite free-living coastal bacterioplankton DNA (from 38 samples over 5 years and two locations) and sequenced using 454 technology. An average of 6,400 amplicons per primer pair represented more than 700 clusters of environmental dmdA sequences across all primers, with clusters defined conservatively at >90% nucleotide sequence identity (∼95% amino acid identity). Degenerate and inosine-based primers did not perform better than specific primer pairs in determining dmdA richness and sometimes captured a lower degree of richness of sequences from the same DNA sample. A comparison of dmdA sequences in free-living versus particle-associated bacteria in southeastern U.S. coastal waters showed that sequence richness in some dmdA subgroups differed significantly between size fractions, though most gene clusters were shared (52 to 91%) and most sequences were affiliated with the shared clusters (∼90%). The availability of metagenomic sequence data has significantly enhanced the design of quantitative PCR primer pairs for this key functional gene, providing robust access to the capabilities and activities of DMSP demethylating bacteria in situ.
