BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities

ABSTRACT High-throughput 16S rRNA gene sequencing technologies have robust potential to improve our understanding of bee (Hymenoptera: Apoidea)-associated microbial communities and their impact on hive health and disease. Although recent computational algorithms now permit exact inference of high-resolution amplicon sequence variants (ASVs), the taxonomic classification of these ASVs remains a challenge due to inadequate reference databases. To address this, we assemble a comprehensive data set of all publicly available bee-associated 16S rRNA gene sequences, systematically annotate poorly resolved identities by including 618 placeholder labels for uncultivated microbial dark matter, and correct for phylogenetic inconsistencies using a complementary set of distance-based and maximum likelihood correction strategies. To benchmark the resultant database (BEExact), we compare its performance in silico against all existing reference databases, using a variety of classifier algorithms to produce probabilistic confidence scores. We also validate realistic classification rates on an independent set of ∼234 million short-read sequences derived from 32 studies encompassing 50 different bee types (36 eusocial and 14 solitary). Species-level classification rates on short-read ASVs range from 80 to 90% using BEExact (with ∼20% attributable to “bxid” placeholder names), whereas only ∼30% at best can be resolved with current universal databases. A series of data-driven recommendations are developed for future studies. We conclude that BEExact (https://github.com/bdaisley/BEExact) enables accurate and standardized microbiota profiling across a broad range of bee species. These two factors are of key importance to reproducibility and meaningful knowledge exchange within the scientific community and, together, can enhance the overall utility and ecological relevance of routine 16S rRNA gene-based sequencing endeavors. 
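To make the reported rates concrete, the following Python sketch computes a species-level classification rate and the share of placeholder labels from per-ASV classifier output. It assumes a hypothetical input format (one species label and one species-rank confidence per ASV), a hypothetical confidence cut-off of 0.7, and that placeholder names can be recognized by the substring "bxid"; BEExact's actual label syntax and recommended thresholds may differ, and rates could also be weighted by read counts rather than counted per ASV.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AsvCall:
    asv_id: str
    species: Optional[str]  # None when no species-level label was returned
    confidence: float       # classifier confidence at the species rank (0 to 1)


def species_level_rates(calls: List[AsvCall], min_conf: float = 0.7) -> Tuple[float, float]:
    """Return (fraction of ASVs classified to species, fraction carrying a placeholder).

    Both fractions are relative to all ASVs. The 0.7 cut-off and the "bxid"
    substring test are illustrative assumptions, not BEExact defaults.
    """
    if not calls:
        return 0.0, 0.0
    classified = [c for c in calls if c.species is not None and c.confidence >= min_conf]
    placeholders = [c for c in classified if "bxid" in c.species]
    return len(classified) / len(calls), len(placeholders) / len(calls)


# Hypothetical classifier output; the "bxid" names below are invented examples.
calls = [
    AsvCall("ASV1", "Gilliamella apicola", 0.97),
    AsvCall("ASV2", "Snodgrassella alvi", 0.95),
    AsvCall("ASV3", "Bifidobacterium asteroides", 0.92),
    AsvCall("ASV4", "Bartonella apis", 0.88),
    AsvCall("ASV5", "Frischella perrara", 0.85),
    AsvCall("ASV6", "Apilactobacillus kunkeei", 0.81),
    AsvCall("ASV7", "Lactobacillus bxid01", 0.90),
    AsvCall("ASV8", "Gilliamella bxid02", 0.76),
    AsvCall("ASV9", "Bombilactobacillus sp.", 0.42),  # below the cut-off
    AsvCall("ASV10", None, 0.0),                      # no species call at all
]
species_rate, placeholder_rate = species_level_rates(calls)
print(f"species-level: {species_rate:.0%}, placeholders: {placeholder_rate:.0%}")
# prints: species-level: 80%, placeholders: 20%
```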
IMPORTANCE The failure of current universal taxonomic databases to support the rapidly expanding field of bee microbiota research has led many investigators to rely on “in-house” reference sets or manual classification of sequence reads (usually based on BLAST searches), often with vague identity thresholds and subjective taxonomy choices. This time-consuming, error- and bias-prone process lacks standardization, undermines comparative cross-study analysis, and in many cases is likely to skew study conclusions. BEExact builds on several complementary bioinformatic techniques to enable refined inference of bee host-associated microbial communities without requiring any other methodological modifications. It also bridges the gap between current practical outcomes (i.e., phylotype-to-genus level constraints with 97% operational taxonomic units [OTUs]) and the theoretical resolution (i.e., species-to-strain level classification with 100% ASVs) attainable in future microbiota investigations. Other niche habitats could likely also benefit from customized database curation using the novel approaches introduced in this study.
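The contrast between 97% OTUs and 100% ASVs can be illustrated with a toy example. The Python sketch below uses a simplified greedy, abundance-ordered centroid clustering on equal-length, pre-aligned reads, scoring identity as the fraction of matching positions. Real OTU pickers align sequences and differ in detail, so treat this only as an illustration of why several distinct ASVs can collapse into a single 97% OTU.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads_by_abundance: List[str], threshold: float = 0.97) -> Tuple[List[str], Dict[str, str]]:
    """Assign each read to the first centroid within `threshold` identity, else start a new OTU.

    Reads are expected in order from most to least abundant, so abundant
    sequences become centroids first.
    """
    centroids: List[str] = []
    assignment: Dict[str, str] = {}
    for read in reads_by_abundance:
        for centroid in centroids:
            if identity(read, centroid) >= threshold:
                assignment[read] = centroid
                break
        else:  # no existing centroid was close enough
            centroids.append(read)
            assignment[read] = read
    return centroids, assignment


base = "ACGT" * 25           # 100 bp toy read
variant_1 = "T" + base[1:]   # 1 mismatch: 99% identity to base
variant_2 = "TT" + base[2:]  # 2 mismatches: 98% identity to base
distant = "TGCA" * 25        # differs from base at every position

reads = [base, variant_1, variant_2, distant]
otus, _ = greedy_otus(reads)
asvs = set(reads)            # exact variants: every distinct sequence counts
print(len(asvs), "ASVs ->", len(otus), "OTUs at 97%")
# prints: 4 ASVs -> 2 OTUs at 97%
```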

TaxAss: Leveraging a Custom Freshwater Database Achieves Fine-Scale Taxonomic Resolution

mSphere ◽

10.1128/msphere.00327-18 ◽

2018 ◽

Vol 3 (5) ◽

Cited By ~ 25

Author(s):

Robin R. Rohwer ◽

Joshua J. Hamilton ◽

Ryan J. Newton ◽

Katherine D. McMahon

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Microbial Communities ◽

Microbial Community Composition ◽

Taxonomic Resolution ◽

rRNA Gene ◽

Data Sets ◽

Data Set ◽

Comprehensive Database ◽

Fine Resolution

ABSTRACT Taxonomy assignment of freshwater microbial communities is limited by the minimally curated phylogenies used for large taxonomy databases. Here we introduce TaxAss, a taxonomy assignment workflow that classifies 16S rRNA gene amplicon data using two taxonomy reference databases: a large comprehensive database and a small ecosystem-specific database rigorously curated by scientists within a field. We applied TaxAss to five different freshwater data sets using the comprehensive SILVA database and the freshwater-specific FreshTrain database. TaxAss increased the percentage of the data set classified compared to using only SILVA, especially at fine-resolution family to species taxon levels, while across the freshwater test data sets classifications increased by as much as 11 to 40% of total reads. A similar increase in classifications was not observed in a control mouse gut data set, which was not expected to contain freshwater bacteria. TaxAss also maintained taxonomic richness compared to using only the FreshTrain across all taxon levels from phylum to species. Without TaxAss, most organisms not represented in the FreshTrain were unclassified, but at fine taxon levels, incorrect classifications became significant. We validated TaxAss using simulated amplicon data derived from full-length clone libraries and found that 96 to 99% of test sequences were correctly classified at fine resolution. TaxAss splits a data set’s sequences into two groups based on their percent identity to reference sequences in the ecosystem-specific database. Sequences with high similarity to sequences in the ecosystem-specific database are classified using that database, and the others are classified using the comprehensive database. TaxAss is free and open source and is available at https://www.github.com/McMahonLab/TaxAss. 
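The routing step described above can be sketched in Python as follows. This is not TaxAss's code: it assumes BLAST-style hits of each query against the ecosystem-specific database (percent identity, alignment length, query length), rescales identity to the full query length so that partial alignments are not over-credited, and uses a hypothetical cut-off of 98%. The workflow's own scripts and default threshold may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Hit:
    query_id: str
    pident: float      # percent identity over the aligned region (BLAST-style)
    aln_length: int    # length of the alignment
    query_length: int  # full length of the query sequence


def full_length_pident(hit: Hit) -> float:
    """Rescale identity to the whole query so short partial hits are penalized (sketch assumption)."""
    covered = min(hit.aln_length, hit.query_length)
    return hit.pident * covered / hit.query_length


def split_queries(hits: Iterable[Hit], query_ids: Iterable[str],
                  cutoff: float = 98.0) -> Tuple[Set[str], Set[str]]:
    """Return (queries for the ecosystem-specific DB, queries for the comprehensive DB)."""
    best: Dict[str, float] = {}
    for h in hits:
        best[h.query_id] = max(best.get(h.query_id, 0.0), full_length_pident(h))
    queries = set(query_ids)
    ecosystem = {q for q in queries if best.get(q, 0.0) >= cutoff}
    return ecosystem, queries - ecosystem


hits = [
    Hit("q1", 100.0, 250, 250),  # full-length exact match
    Hit("q2", 100.0, 200, 250),  # exact but partial: 80% over the full query
    Hit("q3", 95.0, 250, 250),   # full length but too divergent
]
eco, comp = split_queries(hits, ["q1", "q2", "q3", "q4"])  # q4 had no hit at all
print(sorted(eco), sorted(comp))
# prints: ['q1'] ['q2', 'q3', 'q4']
```

Queries in the first set would then be classified against the ecosystem-specific database (FreshTrain in the study) and the remainder against the comprehensive one (SILVA).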
IMPORTANCE Microbial communities drive ecosystem processes, but microbial community composition analyses using 16S rRNA gene amplicon data sets are limited by the lack of fine-resolution taxonomy classifications. Coarse taxonomic groupings at the phylum, class, and order levels lump ecologically distinct organisms together. To avoid this, many researchers define operational taxonomic units (OTUs) based on clustered sequences, sequence variants, or unique sequences. These fine-resolution groupings are more ecologically relevant, but OTU definitions are data set dependent and cannot be compared between data sets. Microbial ecologists studying freshwater have curated a small, ecosystem-specific taxonomy database to provide consistent and up-to-date terminology. We created TaxAss, a workflow that leverages this database to assign taxonomy. We found that TaxAss improves fine-resolution taxonomic classifications (family, genus, and species). Fine taxonomic groupings are more ecologically relevant, so they provide an alternative to OTU-based analyses that is consistent and comparable between data sets.

High-resolution ISR amplicon sequencing reveals personalized oral microbiome

10.1101/320564 ◽

2018 ◽

Author(s):

Chiranjit Mukherjee ◽

Clifford J. Beall ◽

Ann L. Griffen ◽

Eugene J. Leys

Keyword(s):

High Resolution ◽

16S rRNA ◽

16S rRNA Gene ◽

Microbial Communities ◽

Cost Effective ◽

Amplicon Sequencing ◽

Species Level ◽

rRNA Gene ◽

Oral Microbiota ◽

Community Fingerprinting

Abstract
Background: Sequencing of the 16S rRNA gene has been the standard for studying the composition of microbial communities. While it allows identification of bacteria at the species level, it does not usually provide sufficient information to resolve them at the subspecies level. Species-level resolution is not adequate for studies of transmission or stability, or for exploring subspecies variation in disease association. Current approaches using whole metagenome shotgun sequencing require very high coverage that can be cost-prohibitive and computationally challenging for diverse communities. Thus, there is a need for high-resolution yet cost-effective high-throughput methods for characterizing microbial communities.
Results: A significant improvement in resolution for amplicon-based bacterial community analysis was achieved by combining amplicon sequencing of a high-diversity marker, the ribosomal operon ISR, with a probabilistic error-modeling algorithm, DADA2. The resolving power of this new approach was compared to that of both standard and high-resolution 16S-based approaches using a set of longitudinal subgingival plaque samples. The ISR strategy achieved a 5.2-fold increase in community richness compared to reference-based 16S rRNA gene analysis and showed 100% accuracy in predicting the correct source of a clinical sample. Individuals’ microbial communities were highly personalized, and although they exhibited some drift in membership and levels over time, this drift was always smaller than the differences between any two subjects, even after one year. Constructing an ISR database from publicly available genomic sequences allowed us to explore genomic variation within species, resulting in the identification of multiple ISR variants for most species.
Conclusions: The ISR approach resulted in significantly improved resolution of communities and revealed a highly personalized, stable human oral microbiota. 
Multiple ISR types were observed for all species examined, demonstrating a high level of subspecies variation in the oral microbiota. The approach is high-throughput and high-resolution yet cost-effective, allowing subspecies-level community fingerprinting at a cost comparable to that of 16S rRNA gene amplicon sequencing. It will be useful for a range of applications that require high-resolution identification of organisms, including microbial tracking, community fingerprinting, and potentially the identification of virulence-associated strains.
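Predicting the source of a sample from a personalized community profile can be framed as nearest-reference attribution. The abstract does not specify the distance measure or procedure used, so the Python sketch below assumes Bray-Curtis dissimilarity on ISR variant counts and hypothetical subject profiles; it simply assigns a sample to the subject whose reference profile is least dissimilar.

```python
from typing import Dict, Mapping

Profile = Mapping[str, float]  # variant ID -> read count (or relative abundance)


def bray_curtis(a: Profile, b: Profile) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared variants."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
    den = sum(a.get(k, 0) + b.get(k, 0) for k in keys)
    return num / den if den else 0.0


def predict_source(sample: Profile, references: Dict[str, Profile]) -> str:
    """Return the reference subject whose profile is least dissimilar to the sample."""
    return min(references, key=lambda subject: bray_curtis(sample, references[subject]))


# Hypothetical baseline profiles for two subjects.
references = {
    "subject_A": {"isr1": 50, "isr2": 30, "isr3": 20},
    "subject_B": {"isr4": 60, "isr5": 40},
}
follow_up = {"isr1": 45, "isr2": 35, "isr3": 15, "isr4": 5}  # hypothetical later visit
print(predict_source(follow_up, references))
# prints: subject_A
```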

TaxAss: Leveraging a Custom Freshwater Database Achieves Fine-Scale Taxonomic Resolution

10.1101/214288 ◽

2017 ◽

Cited By ~ 6

Author(s):

Robin R. Rohwer ◽

Joshua J. Hamilton ◽

Ryan J. Newton ◽

Katherine D. McMahon

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Microbial Communities ◽

Microbial Community Composition ◽

Taxonomic Resolution ◽

rRNA Gene ◽

Operational Taxonomic Units ◽

Comprehensive Database ◽

Reference Databases ◽

Fine Resolution

ABSTRACT Taxonomy assignment of freshwater microbial communities is limited by the minimally curated phylogenies used for large taxonomy databases. Here we introduce TaxAss, a taxonomy assignment workflow that classifies 16S rRNA gene amplicon data using two taxonomy reference databases: a large comprehensive database and a small ecosystem-specific database rigorously curated by scientists within a field. We applied TaxAss to five different freshwater datasets using the comprehensive Silva database and the freshwater-specific FreshTrain database. TaxAss increased the percentage of the dataset classified compared to using only Silva, especially at the fine-resolution family-to-species taxon levels, while across the freshwater test datasets classifications increased by as much as 11-40% of total reads. A similar increase in classifications was not observed in a control mouse gut dataset, which was not expected to contain freshwater bacteria. TaxAss also maintained taxonomic richness compared to using only the FreshTrain across all taxon levels from phylum to species. Without TaxAss, most organisms not represented in the FreshTrain were unclassified, but at fine taxon levels, incorrect classifications became significant. We validated TaxAss using simulated amplicon data with known taxonomy and found that 96-99% of test sequences were correctly classified at fine resolution. TaxAss splits a dataset’s sequences into two groups based on their percent identity to reference sequences in the ecosystem-specific database. Sequences with high similarity to sequences in the ecosystem-specific database are classified using that database, and the others are classified using the comprehensive database. TaxAss is free and open source and is available at www.github.com/McMahonLab/TaxAss.
IMPORTANCE Microbial communities drive ecosystem processes, but microbial community composition analyses using 16S rRNA gene amplicon datasets are limited by the lack of fine-resolution taxonomy classifications. Coarse taxonomic groupings at the phylum, class, and order levels lump ecologically distinct organisms together. To avoid this, many researchers define operational taxonomic units (OTUs) based on clustered sequences, sequence variants, or unique sequences. These fine-resolution groupings are more ecologically relevant, but OTU definitions are dataset-dependent and cannot be compared between datasets. Microbial ecologists studying freshwater have curated a small, ecosystem-specific taxonomy database to provide consistent and up-to-date terminology. We created TaxAss, a workflow that leverages this database to assign taxonomy. We found that TaxAss improves fine-resolution taxonomic classifications (family, genus, and species). Fine taxonomic groupings are more ecologically relevant, so they provide an alternative to OTU-based analyses that is consistent and comparable between datasets.

The Termite Group I Phylum Is Highly Diverse and Widespread in the Environment

Applied and Environmental Microbiology ◽

10.1128/aem.00712-07 ◽

2007 ◽

Vol 73 (20) ◽

pp. 6682-6685 ◽

Cited By ~ 32

Author(s):

Daniel P. R. Herlemann ◽

Oliver Geissinger ◽

Andreas Brune

Keyword(s):

Phylogenetic Analysis ◽

16S rRNA ◽

16S rRNA Gene ◽

rRNA Gene ◽

Gene Sequences ◽

16S rRNA Gene Sequences ◽

Specific Primers ◽

Environmental Distribution ◽

Data Set ◽

Group I

ABSTRACT The bacterial candidate phylum Termite Group I (TG-1) presently consists mostly of “Endomicrobia,” which are endosymbionts of flagellate protists occurring exclusively in the hindguts of termites and wood-feeding cockroaches. Here, we show that public databases contain many, mostly undocumented 16S rRNA gene sequences from other habitats that are affiliated with the TG-1 phylum but are only distantly related to “Endomicrobia.” Phylogenetic analysis of the expanded data set revealed several diverse and deeply branching lineages comprising clones from many different habitats. In addition, we designed specific primers to explore the diversity and environmental distribution of bacteria in the TG-1 phylum.
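Checking whether a candidate primer matches target sequences is a routine step when designing group-specific primers. The Python sketch below scans a sequence for matches to a degenerate primer using the standard IUPAC nucleotide codes. The primer and target shown are made-up toy sequences, not the TG-1 primers from the study, and the scan covers only the given strand; a real check would also test the reverse complement.

```python
from typing import List

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_sites(primer: str, target: str, max_mismatches: int = 0) -> List[int]:
    """Return 0-based start positions where `primer` matches `target` on the same strand."""
    primer, target = primer.upper(), target.upper()
    sites = []
    for start in range(len(target) - len(primer) + 1):
        window = target[start:start + len(primer)]
        # A position mismatches when the target base is not allowed by the primer's code.
        mismatches = sum(base not in IUPAC[code] for code, base in zip(primer, window))
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


toy_primer = "GTRCA"            # hypothetical degenerate primer (R = A or G)
toy_target = "AAGTGCAAGTACATT"  # hypothetical target fragment
print(primer_sites(toy_primer, toy_target))
# prints: [2, 8]
```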

Diversity of microbial communities in hot springs of Sri Lanka as revealed by 16S rRNA gene high-throughput sequencing analysis

Gene ◽

10.1016/j.gene.2021.146103 ◽

2021 ◽

pp. 146103

Author(s):

Dilini Sadeepa ◽

Kosala Sirisena ◽

Pathmalal M. Manage

Keyword(s):

Sri Lanka ◽

16S rRNA ◽

16S rRNA Gene ◽

Microbial Communities ◽

High Throughput ◽

High Throughput Sequencing ◽

Hot Springs ◽

rRNA Gene ◽

Sequencing Analysis

Pseudomonas taeanensis sp. nov., isolated from a crude oil-contaminated seashore

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY ◽

10.1099/ijs.0.018093-0 ◽

2010 ◽

Vol 60 (12) ◽

pp. 2719-2723 ◽

Cited By ~ 14

Author(s):

Dong-Heon Lee ◽

Sung-Ran Moon ◽

Young-Hyun Park ◽

Jung-Ho Kim ◽

Hoon Kim ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Crude Oil ◽

Gene Sequence ◽

Novel Species ◽

Sequence Similarity ◽

rRNA Gene ◽

16S rRNA Gene Sequences ◽

Low Levels

A novel Gram-negative, aerobic, motile, short rod-shaped bacterium, designated MS-3T, was isolated from a crude oil-contaminated seashore in Taean, Korea. Strain MS-3T grew at 4–30 °C, at pH 6.0–9.5 and with 0–5 % NaCl and was oxidase- and catalase-positive. Phylogenetic analysis based on 16S rRNA gene sequences revealed that strain MS-3T was most similar to Pseudomonas marincola KMM 3042T (97.9 % 16S rRNA gene sequence similarity), P. cuatrocienegasensis 1NT (97.8 %), P. borbori R-20821T (97.3 %) and P. lundensis ATCC 49968T (97.1 %). Relatively low levels of DNA–DNA relatedness were found between strain MS-3T and P. cuatrocienegasensis LMG 24676T (57.2 %), P. borbori LMG 23199T (39.7 %), P. marincola KMM 3042T (32.2 %) and P. lundensis KACC 10832T (32.1 %), which support the classification of strain MS-3T within a novel species of the genus Pseudomonas. The G+C content of the genomic DNA of strain MS-3T was 57.6 mol% and the major isoprenoid quinone was Q-9. Strain MS-3T contained summed feature 3 (iso-C15 : 0 2-OH and/or C16 : 1 ω7c; 38.0 %), C16 : 0 (24.4 %), C18 : 1 ω7c (12.8 %), C12 : 0 (9.6 %) and C10 : 0 3-OH (4.9 %) as the major cellular fatty acids. On the basis of the phenotypic, genotypic and phylogenetic data, strain MS-3T represents a novel species of the genus Pseudomonas, for which the name Pseudomonas taeanensis sp. nov. is proposed. The type strain is MS-3T (=KCTC 22612T =KACC 14032T =JCM 16046T =NBRL 105641T).
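The species-level argument rests on comparing pairwise DNA-DNA relatedness with the conventional 70% cut-off for delineating species. The Python sketch below encodes the values reported above and checks them against that cut-off. The 70% figure is the widely used convention rather than a number stated in this abstract, and the 97% 16S rRNA gene level used here to flag which relatives warrant DDH is the classical guideline, which later work has argued should be set higher.

```python
from typing import Dict

DDH_SPECIES_CUTOFF = 70.0    # conventional DNA-DNA relatedness cut-off (%)
CLASSICAL_16S_CUTOFF = 97.0  # classical 16S rRNA gene similarity guideline (%)

# Values reported for strain MS-3T against the closest type strains.
similarity_16s: Dict[str, float] = {
    "P. marincola": 97.9,
    "P. cuatrocienegasensis": 97.8,
    "P. borbori": 97.3,
    "P. lundensis": 97.1,
}
ddh: Dict[str, float] = {
    "P. cuatrocienegasensis": 57.2,
    "P. borbori": 39.7,
    "P. marincola": 32.2,
    "P. lundensis": 32.1,
}

# Relatives too similar by 16S alone to rule out conspecificity.
needs_ddh = sorted(name for name, s in similarity_16s.items() if s >= CLASSICAL_16S_CUTOFF)
# A novel species is supported only if every such relative falls below the DDH cut-off.
novel = all(ddh[name] < DDH_SPECIES_CUTOFF for name in needs_ddh)
print(len(needs_ddh), "relatives checked by DDH; novel species supported:", novel)
# prints: 4 relatives checked by DDH; novel species supported: True
```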

Dynamics of Soil Microbial Communities During Diazepam and Oxazepam Biodegradation in Soil Flooded by Water From a WWTP

Frontiers in Microbiology ◽

10.3389/fmicb.2021.742000 ◽

2021 ◽

Vol 12 ◽

Author(s):

Marc Crampon ◽

Coralie Soulier ◽

Pauline Sidoli ◽

Jennifer Hellal ◽

Catherine Joulian ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Microbial Communities ◽

Soil Microbial Communities ◽

Gene Sequencing ◽

16S rRNA Gene Sequencing ◽

Treated Wastewater ◽

rRNA Gene ◽

Sequencing Data ◽

rRNA Gene Sequencing

The demand for energy and chemicals is constantly growing, leading to an increase in the amounts of contaminants discharged into the environment. Among these, pharmaceutical molecules are frequently found in treated wastewater that is discharged into surface waters. Indeed, wastewater treatment plants (WWTPs) are designed to remove organic pollution from urban effluents but are not selective toward contaminants of emerging concern (CECs), which therefore reach the natural environment. In this context, it is important to study the fate of micropollutants, especially in a soil aquifer treatment (SAT) context for water from WWTPs, and for the most persistent molecules such as benzodiazepines. In the present study, soils sampled in a reed bed frequently flooded by water from a WWTP were spiked with diazepam and oxazepam in microcosms, and their concentrations were monitored for 97 days. Both molecules appeared to be completely degraded after 15 days of incubation. Samples were collected during the experiment to follow the dynamics of the microbial communities, based on 16S rRNA gene sequencing for Archaea and Bacteria and ITS2 region sequencing for Fungi. Changes in diversity and in specific operational taxonomic units (OTUs) revealed an effect of benzodiazepine addition, rapid resilience of the fungal community, and a shift in the bacterial community. OTUs from the genus Brevibacillus appeared more abundant at the beginning of the biodegradation process under both the diazepam and oxazepam conditions. Additionally, the Tax4Fun tool was applied to the 16S rRNA gene sequencing data to infer changes in specific metabolic functions during biodegradation. Finally, the results suggest that the microbial community in soils frequently exposed to WWTP water, which potentially contains CECs such as diazepam and oxazepam, may be adapted to the degradation of persistent contaminants.
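Following how community diversity changes over an incubation usually means computing an alpha-diversity index per sample and tracking it over time. The abstract does not name the index used, so the Python sketch below assumes the Shannon index (natural logarithm) applied to hypothetical OTU counts from two time points.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical OTU counts for one microcosm before and after spiking.
day_0 = [25, 25, 25, 25]  # four equally abundant OTUs
day_15 = [85, 5, 5, 5]    # one OTU has become dominant
print(round(shannon(day_0), 3), round(shannon(day_15), 3))
```

Evenly distributed counts give the maximum value for a given number of OTUs (ln 4 here), so a drop in H' signals that a few taxa are becoming dominant.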

Differentiation of Staphylococcus spp. by high-resolution melting analysis

Canadian Journal of Microbiology ◽

10.1139/w10-091 ◽

2010 ◽

Vol 56 (12) ◽

pp. 1040-1049 ◽

Cited By ~ 10

Author(s):

Michal Slany ◽

Martina Vanerkova ◽

Eva Nemcova ◽

Barbora Zaloudikova ◽

Filip Ruzicka ◽

...

Keyword(s):

High Resolution ◽

16S rRNA ◽

16S rRNA Gene ◽

High Resolution Melting ◽

Bacterial Species ◽

High Resolution Melting Analysis ◽

rRNA Gene ◽

Staphylococcus capitis ◽

Melting Analysis ◽

The 16S rRNA Gene

High-resolution melting analysis (HRMA) is a fast, post-PCR, high-throughput method for scanning sequence variations in a target gene. The aim of this study was to test the potential of HRMA to distinguish particular bacterial species of the genus Staphylococcus even when using broad-range PCR within the 16S rRNA gene, where sequence differences are minimal. Genomic DNA samples isolated from 12 reference staphylococcal strains (Staphylococcus aureus, Staphylococcus capitis, Staphylococcus caprae, Staphylococcus epidermidis, Staphylococcus haemolyticus, Staphylococcus hominis, Staphylococcus intermedius, Staphylococcus saprophyticus, Staphylococcus sciuri, Staphylococcus simulans, Staphylococcus warneri, and Staphylococcus xylosus) were subjected to real-time PCR amplification of the 16S rRNA gene in the presence of the fluorescent dye EvaGreen™, followed by HRMA. Melting profiles were used as molecular fingerprints for bacterial species differentiation. HRMA of S. saprophyticus and S. xylosus resulted in indistinguishable profiles because their sequences in the analyzed 16S rRNA region are identical. The remaining reference strains were fully differentiated either directly or via high-resolution plots obtained by heteroduplex formation between coamplified PCR products of the tested staphylococcal strain and a phylogenetically unrelated strain.
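Melting profiles are commonly summarized by the negative derivative of fluorescence with respect to temperature (-dF/dT), whose peak gives the apparent melting temperature (Tm). The Python sketch below simulates idealized sigmoid melt curves and recovers Tm from central differences. The curve shape, temperatures, and step size are assumptions for illustration; real HRMA software also normalizes curves and compares their shapes (for example with difference plots), which this sketch omits.

```python
import math
from typing import List, Tuple


def simulated_melt_curve(tm: float, width: float = 0.5, start: float = 75.0,
                         end: float = 90.0, step: float = 0.1) -> Tuple[List[float], List[float]]:
    """Idealized melt curve: fluorescence falls sigmoidally as the duplex melts around `tm`."""
    n = int(round((end - start) / step)) + 1
    temps = [start + i * step for i in range(n)]
    fluor = [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]
    return temps, fluor


def apparent_tm(temps: List[float], fluor: List[float]) -> float:
    """Temperature at the peak of -dF/dT, estimated by central differences."""
    best_t, best_rate = temps[1], -math.inf
    for i in range(1, len(temps) - 1):
        rate = -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        if rate > best_rate:
            best_t, best_rate = temps[i], rate
    return best_t


# Two hypothetical amplicons whose Tm differs by 0.6 degrees C.
for label, tm in [("strain_X", 82.3), ("strain_Y", 82.9)]:
    temps, fluor = simulated_melt_curve(tm)
    print(label, round(apparent_tm(temps, fluor), 1))
```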

Characterization of the depth-related changes in the microbial communities in Lake Hovsgol sediment by 16S rRNA gene-based approaches

The Journal of Microbiology ◽

10.1007/s12275-007-0189-1 ◽

2008 ◽

Vol 46 (2) ◽

pp. 125-136 ◽

Cited By ~ 20

Author(s):

Young-Do Nam ◽

Youlboong Sung ◽

Ho-Won Chang ◽

Seong Woon Roh ◽

Kyoung-Ho Kim ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Microbial Communities ◽

rRNA Gene ◽

Lake Hovsgol

Metagenome assembled genomes of novel taxa from an acid mine drainage environment

Applied and Environmental Microbiology ◽

10.1128/aem.00772-21 ◽

2021 ◽

Author(s):

Christen L. Grettenberger ◽

Trinity L. Hamilton

Keyword(s):

Acid Mine Drainage ◽

16S rRNA ◽

16S rRNA Gene ◽

Microbial Communities ◽

Mine Drainage ◽

Biogeochemical Cycling ◽

rRNA Gene ◽

Metabolic Potential ◽

Carbon And Nitrogen ◽

Acid Mine

Acid mine drainage (AMD) is a global problem in which iron sulfide minerals oxidize and generate acidic, metal-rich water. Bioremediation relies on understanding how microbial communities inhabiting an AMD site contribute to biogeochemical cycling. A number of studies have reported community composition in AMD sites from 16S rRNA gene amplicons, but it remains difficult to link taxa to function, especially in the absence of closely related cultured species or those with published genomes. Unfortunately, there is a paucity of genomes and cultured taxa from AMD environments. Here, we report 29 novel metagenome assembled genomes from Cabin Branch, an AMD site in the Daniel Boone National Forest, KY, USA. The genomes span 11 bacterial phyla and one archaeal lineage and include taxa that contribute to carbon, nitrogen, sulfur, and iron cycling. These data reveal overlooked taxa that contribute to carbon fixation in AMD sites as well as uncharacterized Fe(II)-oxidizing bacteria. These data provide additional context for 16S rRNA gene studies, add to our understanding of the taxa involved in biogeochemical cycling in AMD environments, and can inform bioremediation strategies. IMPORTANCE Bioremediating acid mine drainage requires understanding how microbial communities influence the geochemical cycling of iron and sulfur and of biologically important elements such as carbon and nitrogen. Research in this area has provided an abundance of 16S rRNA gene amplicon data. However, linking these data to metabolisms is difficult because many AMD taxa are uncultured or lack published genomes. Here, we present metagenome assembled genomes from 29 novel AMD taxa and detail their metabolic potential. These data provide information on AMD taxa that could be important for bioremediation strategies, including taxa involved in cycling iron, sulfur, carbon, and nitrogen.
