Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity

ABSTRACT Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (N_d) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that N_d additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we describe Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.

Screening Metagenomic Data for Viruses Using the E-Probe Diagnostic Nucleic Acid Assay

Phytopathology ◽

10.1094/phyto-11-13-0310-r ◽

2014 ◽

Vol 104 (10) ◽

pp. 1125-1129 ◽

Cited By ~ 11

Author(s):

A. H. Stobbe ◽

W. L. Schneider ◽

P. R. Hoyt ◽

U. Melcher

Keyword(s):

Nucleic Acid ◽

Mosaic Virus ◽

Yellow Mosaic Virus ◽

Metagenomic Data ◽

Data Sets ◽

Virus Species ◽

Data Set ◽

Golden Yellow ◽

Nucleic Acid Assay ◽

Ngs Data

Next generation sequencing (NGS) is not used commonly in diagnostics, in part due to the large amount of time and computational power needed to identify the taxonomic origin of each sequence in a NGS data set. By using the unassembled NGS data sets as the target for searches, pathogen-specific sequences, termed e-probes, could be used as queries to enable detection of specific viruses or organisms in plant sample metagenomes. This method, designated e-probe diagnostic nucleic acid assay, first tested with mock sequence databases, was tested with NGS data sets generated from plants infected with a DNA (Bean golden yellow mosaic virus, BGYMV) or an RNA (Plum pox virus, PPV) virus. In addition, the ability to detect and differentiate among strains of a single virus species, PPV, was examined by using probe sets that were specific to strains. The use of probe sets for multiple viruses determined that one sample was dually infected with BGYMV and Bean golden mosaic virus.
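
The query-against-raw-reads idea described above can be illustrated with a minimal Python sketch. It is not the published e-probe pipeline: exact substring matching stands in for whatever similarity search the authors used, and the function names, probe names, and toy sequences are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_probe_hits(reads, probes):
    """Count, per probe, how many unassembled reads contain it on either strand.

    Assumption: exact matching is a simplification chosen for illustration;
    a real screen would tolerate mismatches via an alignment-based search.
    """
    hits = {name: 0 for name in probes}
    for read in reads:
        read = read.upper()
        for name, probe in probes.items():
            probe = probe.upper()
            if probe in read or reverse_complement(probe) in read:
                hits[name] += 1
    return hits


# Toy data (hypothetical): two reads carry probe_1, one on each strand.
reads = ["TTACGTACGGATCCA", "GGGTGGATCCGTACGTAA", "AAAAAAAAAAAA"]
probes = {"probe_1": "ACGTACGGATCC", "probe_2": "CCCCCCCC"}
print(count_probe_hits(reads, probes))
```

Using the reads as the search target and the short probes as queries means the cost grows with the number of probes rather than with the need to classify every read, which is the motivation the abstract gives.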

METAnnotatorX2: a Comprehensive Tool for Deep and Shallow Metagenomic Data Set Analyses

mSystems ◽

10.1128/msystems.00583-21 ◽

2021 ◽

Vol 6 (3) ◽

Author(s):

Christian Milani ◽

Gabriele Andrea Lugli ◽

Federico Fontana ◽

Leonardo Mancabelli ◽

Giulia Alessandri ◽

...

Keyword(s):

Sensitivity And Specificity ◽

Metagenomic Data ◽

Data Sets ◽

Data Set

We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets.

PSII-B-28 Investigation of the presence of bacterial microbiota in 12-week-old bovine fetuses

Journal of Animal Science ◽

10.1093/jas/skab235.646 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 352-352

Author(s):

Samat Amat ◽

Devin B Holman ◽

Kaycie Schmidt ◽

Kacie L L McCarthy ◽

Sheri T T Dorsam ◽

...

Keyword(s):

Microbial Community ◽

Amniotic Fluid ◽

Shannon Index ◽

Rrna Gene ◽

Sample Type ◽

Microscopy Imaging ◽

Tissue Samples ◽

Early Gestation ◽

Bacterial Phyla ◽

Bovine Fetuses

Abstract A recent study reported the existence of a diverse microbiota in 5-to-7-month-old calf fetuses, suggesting that colonization of the bovine gut with so-called “pioneer” microbiota may begin during mid-gestation. In the present study, we investigated the microbiota in bovine fetuses at early gestation. Amniotic and allantoic fluids, and intestinal and placental (cotyledon) tissue samples harvested from fetuses (n = 33) on day 83 of gestation were processed for the assessment of fetal microbiota using 16S rRNA gene sequencing. The sequencing results revealed that a diverse and complex microbial community was present in allantoic and amniotic fluids, and fetal intestine and placenta on day 83 of gestation in beef cattle. Microbial community structure was significantly different between allantoic and amniotic fluid, and intestinal and placental microbiota (0.047 ≥ R2 ≥ 0.019, P ≤ 0.031). Allantoic fluid had a greater (P < 0.05) microbial richness (number of OTUs) (122 ± 10) compared to amniotic fluid (84 ± 6), intestine (63 ± 7) and placenta (66 ± 6). Microbial diversity (Shannon index) was similar for the intestinal and placental samples, and both were less diverse compared to the fetal fluid microbiota (P < 0.05). At the phylum level, 39 different archaeal and bacterial phyla were detected across all fetal samples, with Proteobacteria (55%), Firmicutes (16.2%), Actinobacteria (13.6%) and Bacteroidetes (5%) predominating. Among the 20 most relatively abundant bacterial genera, Acidovorax, Acinetobacter, Brucella, Corynebacterium, Enterococcus, Exiguobacterium and Stenotrophomonas differed by fetal sample type (P < 0.05). A total of 55 taxa were shared among the four different microbial communities. qPCR of bacteria in the intestine and placenta samples as well as scanning electron microscopy imaging of fetal fluids provided additional evidence for the presence of a microbiota in these samples. 
Overall, the results of this study indicate that colonization with pioneer microbiota may occur during early gestation in bovine fetuses.

Novel Cell-Virus-Virophage Tripartite Infection Systems Discovered in the Freshwater Lake Dishui Lake in Shanghai, China

Journal of Virology ◽

10.1128/jvi.00149-20 ◽

2020 ◽

Vol 94 (11) ◽

Cited By ~ 1

Author(s):

Shengzhong Xu ◽

Liang Zhou ◽

Xiaosha Liang ◽

Yifan Zhou ◽

Hao Chen ◽

...

Keyword(s):

Green Algae ◽

Green Alga ◽

Phylogenetic Analyses ◽

Freshwater Lake ◽

Metagenomic Data ◽

Data Sets ◽

Data Set ◽

Dna Viruses ◽

Oligonucleotide Frequency ◽

Dsdna Viruses

ABSTRACT Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses infecting unicellular eukaryotes. Except for a few isolated virophages characterized by parasitization mechanisms, features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, have been identified based on environmental metagenomic investigation. Both genomic and phylogenetic analyses indicate that Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and Dishui Lake large DNA viruses are affiliated with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) and protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with closer homology to that of large alga viruses than to that of giant protozoan viruses. Furthermore, the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1, based on codon usage as well as oligonucleotide frequency and correlation analyses. Surprisingly, a nonhomologous CRISPR-Cas like system is found in DSLLAV1, which appears to protect DSLLAV1 from the parasitization of DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems of green algae, large green alga virus (Phycodnaviridae- and Mimiviridae-related), and virophage exist in Dishui Lake, which will contribute to further deep investigations of the evolutionary interaction of virophages and large alga viruses as well as of the essential roles that the CVv plays in the ecology of algae. IMPORTANCE Virophages are small parasitizing viruses of large/giant viruses. 
To our knowledge, the few isolated virophages all parasitize giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with hosts, here named the cell-virus-virophage (CVv) system. However, the CVv system remains largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China. Consequently, four novel large alga viruses and seven virophages were discovered to coexist in Dishui Lake. Surprisingly, a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages was identified based on genetic link, genomic signature, and CRISPR system analyses. Meanwhile, a nonhomologous CRISPR-like system was found in Dishui Lake large alga viruses, which appears to protect the virus host from the infection of Dishui Lake virophages (DSLVs). These findings are critical to give insight into the potential significance of CVv in global evolution and ecology.

Crosstalk Between Culturomics and Microbial Profiling of Egyptian Mongoose (Herpestes ichneumon) Gut Microbiome

Microorganisms ◽

10.3390/microorganisms8060808 ◽

2020 ◽

Vol 8 (6) ◽

pp. 808 ◽

Cited By ~ 1

Author(s):

André C. Pereira ◽

Victor Bandeira ◽

Carlos Fonseca ◽

Mónica V. Cunha

Keyword(s):

Microbial Community ◽

Gut Microbiota ◽

Microbial Community Composition ◽

Shannon Index ◽

Data Series ◽

Independent Sequence ◽

Microbial Profiling ◽

Culture Independent ◽

Egyptian Mongoose ◽

Herpestes Ichneumon

Recently, we unveiled taxonomical and functional differences in Egyptian mongoose (Herpestes ichneumon) gut microbiota across sex and age classes by microbial profiling. In this study, we generate, through culturomics, extended baseline information on the culturable bacterial and fungal microbiome of the species using the same specimens as models. Firstly, this strategy enabled us to explore cultivable microbial community differences across sexes and to ascertain the influence exerted by biological and environmental contexts of each host in its microbiota signature. Secondly, it permitted us to compare the culturomics and microbial profiling approaches and their ability to provide information on mongoose gut microbiota. In agreement with microbial profiling, culturomics showed that the core gut cultivable microbiota of the mongoose is dominated by Firmicutes and, as previously found, is able to distinguish sex- and age class-specific genera. Additional information could be obtained by culturomics, with six new genera unveiled. Richness indices and the Shannon index were concordant between culture-dependent and culture-independent approaches, highlighting significantly higher values when using microbial profiling. However, the Simpson index underlined higher values for the culturomics-generated data. These contrasting results were due to a differential influence of dominant and rare taxa on those indices. Beta diversity analyses of culturable microbiota showed similarities between adults and juveniles, but not in the data series originated from microbial profiling. Additionally, whereas the microbial profiling indicated that there were several bioenvironmental features related to the bacterial gut microbiota of the Egyptian mongoose, a clear association between microbiota and bioenvironmental features could not be established through culturomics. 
The discrepancies found between the data generated by the two methodologies and the underlying inferences, both in terms of β-diversity and role of bioenvironmental features, confirm that culture-independent, sequence-based methods have a higher ability to assess, at a fine scale, the influence of abiotic and biotic factors on the microbial community composition of the mongoose gut. However, when used in a complementary perspective, this knowledge can be expanded by culturomics.
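
The observation that the Shannon and Simpson indices can rank two data sets in opposite order, because they weight dominant and rare taxa differently, can be reproduced with a small Python sketch. The counts below are invented for illustration and are not taken from the study; the Simpson index is computed in its Gini-Simpson form (1 minus the sum of squared proportions), which is an assumption since the abstract does not state the variant used.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2); the variant is assumed, not sourced."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical communities: one dominant taxon plus many rare taxa,
# versus a few evenly abundant taxa.
many_rare = [50] + [1] * 50
few_even = [20] * 5

for name, counts in (("many_rare", many_rare), ("few_even", few_even)):
    print(name, round(shannon_index(counts), 3), round(simpson_index(counts), 3))
```

In this toy case the community with many rare taxa has the higher Shannon index, while the even community has the higher Simpson index, since the squared proportions in the Simpson index are dominated by the most abundant taxon.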

Vertebrate Decomposition Is Accelerated by Soil Microbes

Applied and Environmental Microbiology ◽

10.1128/aem.00957-14 ◽

2014 ◽

Vol 80 (16) ◽

pp. 4920-4929 ◽

Cited By ~ 53

Author(s):

Christian L. Lauber ◽

Jessica L. Metcalf ◽

Kyle Keepers ◽

Gail Ackermann ◽

David O. Carter ◽

...

Keyword(s):

Microbial Community ◽

Microbial Communities ◽

Soil Microbes ◽

Soil Microbial Communities ◽

Natural Phenomenon ◽

Rrna Gene ◽

Data Set ◽

Soil Microbial ◽

Decomposer Community ◽

Carrion Decomposition

ABSTRACT Carrion decomposition is an ecologically important natural phenomenon influenced by a complex set of factors, including temperature, moisture, and the activity of microorganisms, invertebrates, and scavengers. The role of soil microbes as decomposers in this process is essential but not well understood and represents a knowledge gap in carrion ecology. To better define the role and sources of microbes in carrion decomposition, lab-reared mice were decomposed on either (i) soil with an intact microbial community or (ii) soil that was sterilized. We characterized the microbial community (16S rRNA gene for bacteria and archaea, and the 18S rRNA gene for fungi and microbial eukaryotes) for three body sites along with the underlying soil (i.e., gravesoils) at time intervals coinciding with visible changes in carrion morphology. Our results indicate that mice placed on soil with intact microbial communities reach advanced stages of decomposition 2 to 3 times faster than those placed on sterile soil. Microbial communities associated with skin and gravesoils of carrion in stages of active and advanced decay were significantly different between soil types (sterile versus untreated), suggesting that substrates on which carrion decompose may partially determine the microbial decomposer community. However, the source of the decomposer community (soil- versus carcass-associated microbes) was not clear in our data set, suggesting that greater sequencing depth needs to be employed to identify the origin of the decomposer communities in carrion decomposition. Overall, our data show that soil microbial communities have a significant impact on the rate at which carrion decomposes and have important implications for understanding carrion ecology.

The International Soil Moisture Network: serving Earth system science for over a decade

10.5194/hess-2021-2 ◽

2021 ◽

Author(s):

Wouter Dorigo ◽

Irene Himmelbauer ◽

Daniel Aberer ◽

Lukas Schremmer ◽

Ivana Petrakovic ◽

...

Keyword(s):

Quality Control ◽

Soil Moisture ◽

European Space Agency ◽

Data Repository ◽

Reference Database ◽

Data Sets ◽

Scientific Publications ◽

Data Set ◽

Space Agency

Abstract. In 2009, the International Soil Moisture Network (ISMN) was initiated as a community effort, funded by the European Space Agency, to serve as a centralised data hosting facility for globally available in situ soil moisture measurements (Dorigo et al., 2011a, b). The ISMN brings together in situ soil moisture measurements collected and freely shared by a multitude of organisations, harmonizes them in terms of units and sampling rates, applies advanced quality control, and stores them in a database. Users can freely retrieve the data from this database through an online web portal (https://ismn.earth). Meanwhile, the ISMN has evolved into the primary in situ soil moisture reference database worldwide, as evidenced by more than 3000 active users and over 1000 scientific publications referencing the data sets provided by the network. As of December 2020, the ISMN now contains data of 65 networks and 2678 stations located all over the globe, with a time period spanning from 1952 to present. The number of networks and stations covered by the ISMN is still growing and many of the data sets contained in the database continue to be updated. The main scope of this paper is to inform readers about the evolution of the ISMN over the past decade, including a description of network and data set updates and quality control procedures. A comprehensive review of existing literature making use of ISMN data is also provided in order to identify current limitations in functionality and data usage, and to shape priorities for the next decade of operations of this unique community-based data repository.

TaxAss: Leveraging a Custom Freshwater Database Achieves Fine-Scale Taxonomic Resolution

mSphere ◽

10.1128/msphere.00327-18 ◽

2018 ◽

Vol 3 (5) ◽

Cited By ~ 25

Author(s):

Robin R. Rohwer ◽

Joshua J. Hamilton ◽

Ryan J. Newton ◽

Katherine D. McMahon

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Microbial Communities ◽

Microbial Community Composition ◽

Taxonomic Resolution ◽

Rrna Gene ◽

Data Sets ◽

Data Set ◽

Comprehensive Database ◽

Fine Resolution

ABSTRACT Taxonomy assignment of freshwater microbial communities is limited by the minimally curated phylogenies used for large taxonomy databases. Here we introduce TaxAss, a taxonomy assignment workflow that classifies 16S rRNA gene amplicon data using two taxonomy reference databases: a large comprehensive database and a small ecosystem-specific database rigorously curated by scientists within a field. We applied TaxAss to five different freshwater data sets using the comprehensive SILVA database and the freshwater-specific FreshTrain database. TaxAss increased the percentage of the data set classified compared to using only SILVA, especially at fine-resolution family to species taxon levels, while across the freshwater test data sets classifications increased by as much as 11 to 40% of total reads. A similar increase in classifications was not observed in a control mouse gut data set, which was not expected to contain freshwater bacteria. TaxAss also maintained taxonomic richness compared to using only the FreshTrain across all taxon levels from phylum to species. Without TaxAss, most organisms not represented in the FreshTrain were unclassified, but at fine taxon levels, incorrect classifications became significant. We validated TaxAss using simulated amplicon data derived from full-length clone libraries and found that 96 to 99% of test sequences were correctly classified at fine resolution. TaxAss splits a data set’s sequences into two groups based on their percent identity to reference sequences in the ecosystem-specific database. Sequences with high similarity to sequences in the ecosystem-specific database are classified using that database, and the others are classified using the comprehensive database. TaxAss is free and open source and is available at https://www.github.com/McMahonLab/TaxAss. 
IMPORTANCE Microbial communities drive ecosystem processes, but microbial community composition analyses using 16S rRNA gene amplicon data sets are limited by the lack of fine-resolution taxonomy classifications. Coarse taxonomic groupings at the phylum, class, and order levels lump ecologically distinct organisms together. To avoid this, many researchers define operational taxonomic units (OTUs) based on clustered sequences, sequence variants, or unique sequences. These fine-resolution groupings are more ecologically relevant, but OTU definitions are data set dependent and cannot be compared between data sets. Microbial ecologists studying freshwater have curated a small, ecosystem-specific taxonomy database to provide consistent and up-to-date terminology. We created TaxAss, a workflow that leverages this database to assign taxonomy. We found that TaxAss improves fine-resolution taxonomic classifications (family, genus, and species). Fine taxonomic groupings are more ecologically relevant, so they provide an alternative to OTU-based analyses that is consistent and comparable between data sets.
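
The splitting step described in the abstract (sequences with high percent identity to the ecosystem-specific database go to that database, all others to the comprehensive one) can be sketched in Python as follows. This is an illustration of the routing logic only, not the TaxAss code; the function name, the input format, and the cutoff value are hypothetical.

```python
def split_by_identity(best_identity, cutoff):
    """Route sequence IDs by their best percent identity to the small database.

    best_identity maps a sequence ID to its best percent identity against the
    ecosystem-specific database, or None when no hit was found (assumed format).
    Returns (ids for the ecosystem-specific database,
             ids for the comprehensive database).
    """
    ecosystem_ids, comprehensive_ids = [], []
    for seq_id, identity in best_identity.items():
        if identity is not None and identity >= cutoff:
            ecosystem_ids.append(seq_id)
        else:
            comprehensive_ids.append(seq_id)
    return ecosystem_ids, comprehensive_ids


# Hypothetical identities; the 98.0 cutoff is illustrative, not a recommendation.
best_identity = {"seq1": 99.6, "seq2": 91.2, "seq3": None, "seq4": 98.0}
to_ecosystem, to_comprehensive = split_by_identity(best_identity, cutoff=98.0)
print(to_ecosystem, to_comprehensive)
```

Routing on identity before classification keeps sequences that are absent from the small curated database from being forced into it, which is the failure mode the abstract describes at fine taxon levels.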

An Extensive Meta-Metagenomic Search Identifies SARS-CoV-2-Homologous Sequences in Pangolin Lung Viromes

mSphere ◽

10.1128/msphere.00160-20 ◽

2020 ◽

Vol 5 (3) ◽

Cited By ~ 9

Author(s):

Lamia Wahba ◽

Nimit Jain ◽

Andrew Z. Fire ◽

Massa J. Shoura ◽

Karen L. Artiles ◽

...

Keyword(s):

Nucleic Acid ◽

High Speed ◽

High Throughput Sequencing ◽

Biological Significance ◽

Metagenomic Data ◽

Data Sets ◽

Sequencing Data ◽

Data Set ◽

Link Type ◽

Recent Emergence

ABSTRACT In numerous instances, tracking the biological significance of a nucleic acid sequence can be augmented through the identification of environmental niches in which the sequence of interest is present. Many metagenomic data sets are now available, with deep sequencing of samples from diverse biological niches. While any individual metagenomic data set can be readily queried using web-based tools, meta-searches through all such data sets are less accessible. In this brief communication, we demonstrate such a meta-metagenomic approach, examining close matches to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in all high-throughput sequencing data sets in the NCBI Sequence Read Archive accessible with the “virome” keyword. In addition to the homology to bat coronaviruses observed in descriptions of the SARS-CoV-2 sequence (F. Wu, S. Zhao, B. Yu, Y. M. Chen, et al., Nature 579:265–269, 2020, https://doi.org/10.1038/s41586-020-2008-3; P. Zhou, X. L. Yang, X. G. Wang, B. Hu, et al., Nature 579:270–273, 2020, https://doi.org/10.1038/s41586-020-2012-7), we note a strong homology to numerous sequence reads in metavirome data sets generated from the lungs of deceased pangolins reported by Liu et al. (P. Liu, W. Chen, and J. P. Chen, Viruses 11:979, 2019, https://doi.org/10.3390/v11110979). While analysis of these reads indicates the presence of a similar viral sequence in pangolin lung, the similarity is not sufficient to either confirm or rule out a role for pangolins as an intermediate host in the recent emergence of SARS-CoV-2. In addition to the implications for SARS-CoV-2 emergence, this study illustrates the utility and limitations of meta-metagenomic search tools in effective and rapid characterization of potentially significant nucleic acid sequences. IMPORTANCE Meta-metagenomic searches allow for high-speed, low-cost identification of potentially significant biological niches for sequences of interest.

Monochloramine Disinfection Kinetics of Nitrosomonas europaea by Propidium Monoazide Quantitative PCR and Live/Dead BacLight Methods

Applied and Environmental Microbiology ◽

10.1128/aem.00407-09 ◽

2009 ◽

Vol 75 (17) ◽

pp. 5555-5562 ◽

Cited By ~ 42

Author(s):

David G. Wahman ◽

Karen A. Wulfeck-Kleier ◽

Jonathan G. Pressman

Keyword(s):

Drinking Water ◽

Distribution Systems ◽

Water Distribution ◽

Data Sets ◽

Propidium Monoazide ◽

Data Set ◽

Drinking Water Distribution Systems ◽

Qpcr Method ◽

Drinking Water Distribution ◽

Culture Independent

ABSTRACT Monochloramine disinfection kinetics were determined for the pure-culture ammonia-oxidizing bacterium Nitrosomonas europaea (ATCC 19718) by two culture-independent methods, namely, Live/Dead BacLight (LD) and propidium monoazide quantitative PCR (PMA-qPCR). Both methods were first verified with mixtures of heat-killed (nonviable) and non-heat-killed (viable) cells before a series of batch disinfection experiments with stationary-phase cultures (batch grown for 7 days) at pH 8.0, 25°C, and 5, 10, and 20 mg Cl2/liter monochloramine. Two data sets were generated based on the viability method used, either (i) LD or (ii) PMA-qPCR. These two data sets were used to estimate kinetic parameters for the delayed Chick-Watson disinfection model through a Bayesian analysis implemented in WinBUGS. This analysis provided parameter estimates of 490 mg Cl2-min/liter for the lag coefficient (b) and 1.6 × 10^-3 to 4.0 × 10^-3 liter/mg Cl2-min for the Chick-Watson disinfection rate constant (k). While estimates of b were similar for both data sets, the LD data set resulted in a greater k estimate than that obtained with the PMA-qPCR data set, implying that the PMA-qPCR viability measure was more conservative than LD. For N. europaea, the lag phase was not previously reported for culture-independent methods and may have implications for nitrification in drinking water distribution systems. This is the first published application of a PMA-qPCR method for disinfection kinetic model parameter estimation as well as its application to N. europaea or monochloramine. Ultimately, this PMA-qPCR method will allow evaluation of monochloramine disinfection kinetics for mixed-culture bacteria in drinking water distribution systems.
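
To make the roles of the lag coefficient b and the rate constant k concrete, here is a minimal Python sketch of a delayed Chick-Watson curve evaluated at the parameter values quoted in the abstract. The piecewise form (no inactivation until the disinfectant exposure Ct exceeds b, then first-order decay) and the use of the natural logarithm are assumptions about the model's exact formulation; the function name is hypothetical.

```python
def log_survival(ct, k, b):
    """ln(N/N0) under an assumed delayed Chick-Watson form.

    ct: disinfectant exposure, concentration x time (mg Cl2-min/liter)
    k:  disinfection rate constant (liter/mg Cl2-min)
    b:  lag coefficient (mg Cl2-min/liter)
    """
    if ct <= b:
        return 0.0  # lag phase: no loss of viability
    return -k * (ct - b)  # first-order inactivation after the lag


b = 490.0  # lag coefficient from the abstract
for k in (1.6e-3, 4.0e-3):  # range of k estimates from the abstract
    print(k, [round(log_survival(ct, k, b), 3) for ct in (250.0, 490.0, 1490.0)])
```

Under these assumptions both k values give zero inactivation up to an exposure of 490 mg Cl2-min/liter, and the larger k produces a steeper decline afterward, which matches the abstract's reading that the smaller k estimate is the more conservative viability measure.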
