TaxAss: Leveraging a Custom Freshwater Database Achieves Fine-Scale Taxonomic Resolution

mSphere ◽  
2018 ◽  
Vol 3 (5) ◽  
Author(s):  
Robin R. Rohwer ◽  
Joshua J. Hamilton ◽  
Ryan J. Newton ◽  
Katherine D. McMahon

ABSTRACT Taxonomy assignment of freshwater microbial communities is limited by the minimally curated phylogenies used for large taxonomy databases. Here we introduce TaxAss, a taxonomy assignment workflow that classifies 16S rRNA gene amplicon data using two taxonomy reference databases: a large comprehensive database and a small ecosystem-specific database rigorously curated by scientists within a field. We applied TaxAss to five different freshwater data sets using the comprehensive SILVA database and the freshwater-specific FreshTrain database. TaxAss increased the percentage of the data set classified compared to using only SILVA, especially at the fine-resolution family-to-species taxon levels; across the freshwater test data sets, classifications increased by as much as 11 to 40% of total reads. A similar increase in classifications was not observed in a control mouse gut data set, which was not expected to contain freshwater bacteria. TaxAss also maintained taxonomic richness compared to using only the FreshTrain across all taxon levels from phylum to species. Without TaxAss, most organisms not represented in the FreshTrain were unclassified, but at fine taxon levels, incorrect classifications became significant. We validated TaxAss using simulated amplicon data derived from full-length clone libraries and found that 96 to 99% of test sequences were correctly classified at fine resolution. TaxAss splits a data set’s sequences into two groups based on their percent identity to reference sequences in the ecosystem-specific database. Sequences with high similarity to sequences in the ecosystem-specific database are classified using that database, and the others are classified using the comprehensive database. TaxAss is free and open source and is available at https://www.github.com/McMahonLab/TaxAss. 
IMPORTANCE Microbial communities drive ecosystem processes, but microbial community composition analyses using 16S rRNA gene amplicon data sets are limited by the lack of fine-resolution taxonomy classifications. Coarse taxonomic groupings at the phylum, class, and order levels lump ecologically distinct organisms together. To avoid this, many researchers define operational taxonomic units (OTUs) based on clustered sequences, sequence variants, or unique sequences. These fine-resolution groupings are more ecologically relevant, but OTU definitions are data set dependent and cannot be compared between data sets. Microbial ecologists studying freshwater have curated a small, ecosystem-specific taxonomy database to provide consistent and up-to-date terminology. We created TaxAss, a workflow that leverages this database to assign taxonomy. We found that TaxAss improves fine-resolution taxonomic classifications (family, genus, and species). Fine taxonomic groupings are more ecologically relevant, so they provide an alternative to OTU-based analyses that is consistent and comparable between data sets.
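The split step described in the TaxAss abstract above can be illustrated with a minimal Python sketch. It is not the TaxAss implementation: it assumes the query and reference sequences are already aligned to equal length, computes percent identity directly instead of obtaining it from a search tool, and uses a hypothetical cutoff (`CUTOFF`) chosen only for illustration. The function names `percent_identity` and `split_by_identity` are likewise hypothetical.

```python
"""Sketch of routing amplicon sequences between two reference databases
by percent identity, in the spirit of the split described above.

Assumptions (not taken from TaxAss): sequences are pre-aligned to equal
length, identity is computed directly, and CUTOFF is hypothetical.
"""

CUTOFF = 98.0  # hypothetical percent-identity cutoff, illustration only


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = columns = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def split_by_identity(queries: dict[str, str],
                      ecosystem_refs: dict[str, str],
                      cutoff: float = CUTOFF) -> tuple[list[str], list[str]]:
    """Return (names to classify with the ecosystem-specific database,
    names to classify with the comprehensive database)."""
    ecosystem: list[str] = []
    comprehensive: list[str] = []
    for name, seq in queries.items():
        # Best identity against any ecosystem-specific reference (0 if none).
        best = max((percent_identity(seq, ref) for ref in ecosystem_refs.values()),
                   default=0.0)
        if best >= cutoff:
            ecosystem.append(name)
        else:
            comprehensive.append(name)
    return ecosystem, comprehensive


if __name__ == "__main__":
    refs = {"ref1": "ACGTACGTAC"}                     # toy reference
    reads = {"q1": "ACGTACGTAC", "q2": "ACGTACGTTT"}  # toy queries
    print(split_by_identity(reads, refs))  # (['q1'], ['q2'])
```

Each returned name list would then be passed to a classifier trained on the matching database; that classification step is not sketched here.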

2017 ◽  
Author(s):  
Robin R. Rohwer ◽  
Joshua J. Hamilton ◽  
Ryan J. Newton ◽  
Katherine D. McMahon

ABSTRACT Taxonomy assignment of freshwater microbial communities is limited by the minimally curated phylogenies used for large taxonomy databases. Here we introduce TaxAss, a taxonomy assignment workflow that classifies 16S rRNA gene amplicon data using two taxonomy reference databases: a large comprehensive database and a small ecosystem-specific database rigorously curated by scientists within a field. We applied TaxAss to five different freshwater datasets using the comprehensive Silva database and the freshwater-specific FreshTrain database. TaxAss increased the percentage of the dataset classified compared to using only Silva, especially at the fine-resolution family-to-species taxon levels; across the freshwater test datasets, classifications increased by as much as 11 to 40% of total reads. A similar increase in classifications was not observed in a control mouse gut dataset, which was not expected to contain freshwater bacteria. TaxAss also maintained taxonomic richness compared to using only the FreshTrain across all taxon levels from phylum to species. Without TaxAss, most organisms not represented in the FreshTrain were unclassified, but at fine taxon levels, incorrect classifications became significant. We validated TaxAss using simulated amplicon data with known taxonomy and found that 96 to 99% of test sequences were correctly classified at fine resolution. TaxAss splits a dataset’s sequences into two groups based on their percent identity to reference sequences in the ecosystem-specific database. Sequences with high similarity to sequences in the ecosystem-specific database are classified using that database, and the others are classified using the comprehensive database. TaxAss is free and open source and is available at www.github.com/McMahonLab/TaxAss.
IMPORTANCE Microbial communities drive ecosystem processes, but microbial community composition analyses using 16S rRNA gene amplicon datasets are limited by the lack of fine-resolution taxonomy classifications. Coarse taxonomic groupings at the phylum, class, and order levels lump ecologically distinct organisms together. To avoid this, many researchers define operational taxonomic units (OTUs) based on clustered sequences, sequence variants, or unique sequences. These fine-resolution groupings are more ecologically relevant, but OTU definitions are dataset-dependent and cannot be compared between datasets. Microbial ecologists studying freshwater have curated a small, ecosystem-specific taxonomy database to provide consistent and up-to-date terminology. We created TaxAss, a workflow that leverages this database to assign taxonomy. We found that TaxAss improves fine-resolution taxonomic classifications (family, genus, and species). Fine taxonomic groupings are more ecologically relevant, so they provide an alternative to OTU-based analyses that is consistent and comparable between datasets.


2019 ◽  
Vol 85 (7) ◽  
Author(s):  
Alexander Burkert ◽  
Thomas A. Douglas ◽  
Mark P. Waldrop ◽  
Rachel Mackelprang

ABSTRACT Permafrost hosts a community of microorganisms that survive and reproduce for millennia despite extreme environmental conditions, such as water stress, subzero temperatures, high salinity, and low nutrient availability. Many studies focused on permafrost microbial community composition use DNA-based methods, such as metagenomics and 16S rRNA gene sequencing. However, these methods do not distinguish among active, dead, and dormant cells. This is of particular concern in ancient permafrost, where constant subzero temperatures preserve DNA from dead organisms and dormancy may be a common survival strategy. To circumvent this, we applied (i) LIVE/DEAD differential staining coupled with microscopy, (ii) endospore enrichment, and (iii) selective depletion of DNA from dead cells to permafrost microbial communities across a Pleistocene permafrost chronosequence (19,000, 27,000, and 33,000 years old). Cell counts and analysis of 16S rRNA gene amplicons from live, dead, and dormant cells revealed how communities differ between these pools, how they are influenced by soil physicochemical properties, and whether they change over geologic time. We found evidence that cells capable of forming endospores are not necessarily dormant and that members of the class Bacilli were more likely to form endospores in response to long-term stressors associated with permafrost environmental conditions than members of the Clostridia, which were more likely to persist as vegetative cells in our older samples. We also found that removing exogenous “relic” DNA preserved within permafrost did not significantly alter microbial community composition. These results link the live, dead, and dormant microbial communities to physicochemical characteristics and provide insights into the survival of microbial communities in ancient permafrost.
IMPORTANCE Permafrost soils store more than half of Earth’s soil carbon despite covering ∼15% of the land area (C. Tarnocai et al., Global Biogeochem Cycles 23:GB2023, 2009, https://doi.org/10.1029/2008GB003327). This permafrost carbon is rapidly degraded following thaw (E. A. G. Schuur et al., Nature 520:171–179, 2015, https://doi.org/10.1038/nature14338). Understanding microbial communities in permafrost will contribute to the knowledge base necessary to understand the rates and forms of permafrost C and N cycling postthaw. Permafrost is also an analog for frozen extraterrestrial environments, and evidence of viable organisms in ancient permafrost is of interest to those searching for potential life on distant worlds. Identifying the strategies microbial communities use to survive in permafrost may yield insights into how life (if it exists) survives in frozen environments beyond Earth. Our work is significant because it contributes to an understanding of how microbial life adapts and survives in the extreme environmental conditions in permafrost terrains.


2018 ◽  
Author(s):  
Alex Burkert ◽  
Thomas A. Douglas ◽  
Mark P. Waldrop ◽  
Rachel Mackelprang

Abstract Permafrost hosts a community of microorganisms that survive and reproduce for millennia despite extreme environmental conditions such as water stress, subzero temperatures, high salinity, and low nutrient availability. Many studies focused on permafrost microbial community composition use DNA-based methods such as metagenomic and 16S rRNA gene sequencing. However, these methods do not distinguish among active, dead, and dormant cells. This is of particular concern in ancient permafrost, where constant subzero temperatures preserve DNA from dead organisms and dormancy may be a common survival strategy. To circumvent this, we applied (i) live/dead differential staining coupled with microscopy, (ii) endospore enrichment, and (iii) selective depletion of DNA from dead cells to permafrost microbial communities across a Pleistocene permafrost chronosequence (19,000, 27,000, and 33,000 years old). Cell counts and analysis of 16S rRNA gene amplicons from live, dead, and dormant cells revealed how communities differ between these pools and how they change over geologic time. We found clear evidence that cells capable of forming endospores are not necessarily dormant and that the propensity to form endospores differed among taxa. Specifically, Bacilli are more likely to form endospores in response to long-term stressors associated with permafrost environmental conditions than members of Clostridia, which are more likely to persist as vegetative cells over geologic timescales. We also found that exogenous DNA preserved within permafrost does not bias DNA sequencing results, since its removal did not significantly alter the microbial community composition. These results extend the findings of a previous study that showed permafrost age and ice content largely control microbial community diversity and cell abundances.
Importance The study of permafrost transcends the study of climate change and exobiology. Permafrost soils store more than half of Earth’s soil carbon despite covering ∼15% of the land area (Tarnocai et al. 2009). This permafrost carbon is rapidly degraded following thaw (Tarnocai et al. 2009; Schuur et al. 2015). Understanding microbial communities in permafrost will contribute to the knowledge base necessary to understand the rates and forms of permafrost C and N cycling post-thaw. Permafrost is also an analog for frozen extraterrestrial environments, and evidence of viable organisms in ancient permafrost is of interest to those searching for potential life on distant worlds. If we can identify the strategies microbial communities use to survive in permafrost, we can focus efforts to search for evidence of life on cryogenic cosmic bodies. Our work is significant because it contributes to an understanding of how microbial life adapts and survives in the extreme environmental conditions in permafrost terrains across geologic timescales.


mSystems ◽  
2021 ◽  
Vol 6 (2) ◽  
Author(s):  
Brendan A. Daisley ◽  
Gregor Reid

ABSTRACT High-throughput 16S rRNA gene sequencing technologies have strong potential to improve our understanding of bee (Hymenoptera: Apoidea)-associated microbial communities and their impact on hive health and disease. Although recent computational algorithms now permit exact inference of high-resolution amplicon sequence variants (ASVs), the taxonomic classification of these ASVs remains a challenge due to inadequate reference databases. To address this, we assemble a comprehensive data set of all publicly available bee-associated 16S rRNA gene sequences, systematically annotate poorly resolved identities via inclusion of 618 placeholder labels for uncultivated microbial dark matter, and correct for phylogenetic inconsistencies using a complementary set of distance-based and maximum likelihood correction strategies. To benchmark the resultant database (BEExact), we compare its performance against all existing reference databases in silico, using a variety of classifier algorithms to produce probabilistic confidence scores. We also validate realistic classification rates on an independent set of ∼234 million short-read sequences derived from 32 studies encompassing 50 different bee types (36 eusocial and 14 solitary). Species-level classification rates on short-read ASVs range from 80 to 90% using BEExact (with ∼20% due to “bxid” placeholder names), whereas at best only ∼30% can be resolved with current universal databases. We also develop a series of data-driven recommendations for future studies. We conclude that BEExact (https://github.com/bdaisley/BEExact) enables accurate and standardized microbiota profiling across a broad range of bee species, two factors of key importance to reproducibility and meaningful knowledge exchange within the scientific community that, together, can enhance the overall utility and ecological relevance of routine 16S rRNA gene-based sequencing endeavors. 
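The classification rates quoted in the BEExact abstract are percentages of sequences assigned a name at a given taxon level. A minimal Python sketch of that arithmetic is shown below, under stated assumptions that are not BEExact's documented output format: each row is a hypothetical `(label, confidence, read_count)` tuple, a call counts as classified only if its confidence meets a hypothetical 80% cutoff, and placeholder names are recognized by a hypothetical `bxid` prefix test. The toy labels are invented for illustration.

```python
"""Sketch: percentage of reads classified at one taxon level, and the
share of reads that land on placeholder labels.

Assumptions: rows are (label, confidence, read_count); the 80% cutoff,
the 'unclassified' label, and the 'bxid' prefix test are illustrative.
"""

Row = tuple[str, float, int]


def classification_rates(rows: list[Row], cutoff: float = 80.0,
                         placeholder_prefix: str = "bxid") -> tuple[float, float]:
    """Return (% of reads classified, % of reads classified to a placeholder)."""
    total = classified = placeholder = 0
    for label, confidence, reads in rows:
        total += reads
        is_named = bool(label) and label.lower() != "unclassified"
        if is_named and confidence >= cutoff:
            classified += reads
            if label.startswith(placeholder_prefix):
                placeholder += reads
    if total == 0:
        return 0.0, 0.0
    return 100.0 * classified / total, 100.0 * placeholder / total


if __name__ == "__main__":
    toy = [
        ("taxon_A", 95.0, 50),       # confident named call
        ("bxid_example", 90.0, 20),  # confident placeholder call
        ("taxon_B", 60.0, 20),       # below the confidence cutoff
        ("unclassified", 0.0, 10),   # no assignment
    ]
    print(classification_rates(toy))  # (70.0, 20.0)
```

Weighting by read count rather than by number of ASVs is a design choice in this sketch; passing a count of 1 per row gives per-ASV rates instead.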
IMPORTANCE The failure of current universal taxonomic databases to support the rapidly expanding field of bee microbiota research has led many investigators to rely on “in-house” reference sets or manual classification of sequence reads (usually based on BLAST searches), often with vague identity thresholds and subjective taxonomy choices. This time-consuming, error-prone, and bias-prone process lacks standardization, cripples the potential for comparative cross-study analysis, and in many cases is likely to sway study conclusions incorrectly. BEExact builds on several complementary bioinformatic techniques to enable refined inference of bee host-associated microbial communities without requiring any other methodological modifications. It also bridges the gap between current practical outcomes (i.e., phylotype-to-genus level constraints with 97% operational taxonomic units [OTUs]) and the theoretical resolution (i.e., species-to-strain level classification with 100% ASVs) attainable in future microbiota investigations. Other niche habitats could also likely benefit from customized database curation via implementation of the novel approaches introduced in this study.
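The contrast between 97% OTUs and 100% ASVs drawn above can be illustrated with a minimal greedy centroid clustering sketch in Python. This is a simplified stand-in for OTU picking, not the method used to build BEExact: it assumes the input sequences are unique, pre-aligned, equal in length, and sorted by decreasing abundance. Real ASVs are produced by denoising algorithms rather than by clustering at 100%; here a 100% threshold simply gives each unique sequence its own group. The function names `identity` and `greedy_cluster` are hypothetical.

```python
"""Sketch: how a 97% identity threshold can merge sequences that an
exact-sequence (100%) grouping keeps apart.

Assumptions: sequences are unique, pre-aligned, equal in length, and
sorted by decreasing abundance; identity is the fraction of matching
columns. Greedy centroid clustering is a simplified stand-in for OTU
picking.
"""


def identity(a: str, b: str) -> float:
    """Percent of aligned columns where the two sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float) -> list[str]:
    """Return cluster centroids. A sequence matching an existing centroid
    at >= threshold is absorbed; otherwise it founds a new cluster."""
    centroids: list[str] = []
    for s in seqs:
        if not any(identity(s, c) >= threshold for c in centroids):
            centroids.append(s)
    return centroids


if __name__ == "__main__":
    base = "A" * 100
    v1 = "C" + "A" * 99      # 99% identical to base
    v2 = "C" * 5 + "A" * 95  # 95% identical to base, 96% to v1
    seqs = [base, v1, v2]
    print(len(greedy_cluster(seqs, 97.0)))   # 2 clusters ("97% OTUs")
    print(len(greedy_cluster(seqs, 100.0)))  # 3 clusters (one per unique sequence)
```

The single-mismatch variant `v1` is absorbed into the first 97% cluster, which is the kind of lumping that limits 97% OTUs to coarser resolution.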


2010 ◽  
Vol 76 (9) ◽  
pp. 2968-2979 ◽  
Author(s):  
Shingo Kato ◽  
Yoshinori Takano ◽  
Takeshi Kakegawa ◽  
Hironori Oba ◽  
Kazuhiko Inoue ◽  
...  

ABSTRACT The abundance, diversity, activity, and composition of microbial communities in sulfide structures of both active and inactive vents were investigated by culture-independent methods. These sulfide structures were collected at four hydrothermal fields, both on- and off-axis of the back-arc spreading center of the Southern Mariana Trough. The microbial abundance and activity in the samples were determined by analyzing total organic content, enzymatic activity, and copy number of the 16S rRNA gene. To assess the diversity and composition of the microbial communities, 16S rRNA gene clone libraries including bacterial and archaeal phylotypes were constructed from the sulfide structures. Despite the differences in geological settings among the sampling points, phylotypes related to the Epsilonproteobacteria and cultured hyperthermophilic archaea were abundant in the libraries from the active-vent samples. In contrast, the relative abundance of these phylotypes was extremely low in the libraries from the inactive-vent samples. These results, supported by statistical analyses, suggest that the composition of microbial communities within sulfide structures changes dramatically depending on the degree of hydrothermal activity. Comparative analyses suggest that the abundance, activity, and diversity of microbial communities within sulfide structures of inactive vents are likely to be comparable to or higher than those in active vent structures, even though community composition differs between the two types of vents. The microbial community compositions in the sulfide structures of inactive vents were similar to those in seafloor basaltic rocks rather than to those in marine sediments or the sulfide structures of active vents, suggesting that microbial community composition on the seafloor may be constrained by the available energy sources. Our findings provide helpful information for understanding the biogeography, biodiversity, and microbial ecosystems of marine environments.


2007 ◽  
Vol 73 (20) ◽  
pp. 6682-6685 ◽  
Author(s):  
Daniel P. R. Herlemann ◽  
Oliver Geissinger ◽  
Andreas Brune

ABSTRACT The bacterial candidate phylum Termite Group I (TG-1) presently consists mostly of “Endomicrobia,” which are endosymbionts of flagellate protists occurring exclusively in the hindguts of termites and wood-feeding cockroaches. Here, we show that public databases contain many, mostly undocumented 16S rRNA gene sequences from other habitats that are affiliated with the TG-1 phylum but are only distantly related to “Endomicrobia.” Phylogenetic analysis of the expanded data set revealed several diverse and deeply branching lineages comprising clones from many different habitats. In addition, we designed specific primers to explore the diversity and environmental distribution of bacteria in the TG-1 phylum.


2021 ◽  
Vol 12 ◽  
Author(s):  
Marc Crampon ◽  
Coralie Soulier ◽  
Pauline Sidoli ◽  
Jennifer Hellal ◽  
Catherine Joulian ◽  
...  

The demand for energy and chemicals is constantly growing, leading to an increase in the amount of contaminants discharged into the environment. Among these, pharmaceutical molecules are frequently found in treated wastewater that is discharged into surface waters. Wastewater treatment plants (WWTPs) are designed to remove organic pollution from urban effluents, but they do not specifically target contaminants of emerging concern (CECs), which therefore reach the natural environment. In this context, it is important to study the fate of micropollutants, particularly the most persistent molecules such as benzodiazepines, during soil aquifer treatment (SAT) of WWTP effluent. In the present study, soils sampled in a reed bed frequently flooded by water from a WWTP were spiked with diazepam and oxazepam in microcosms, and their concentrations were monitored for 97 days. Both molecules appeared to be completely degraded after 15 days of incubation. Samples were collected during the experiment to follow the dynamics of the microbial communities, based on 16S rRNA gene sequencing for Archaea and Bacteria and ITS2 region sequencing for Fungi. Changes in diversity and in specific operational taxonomic units (OTUs) highlighted an impact of benzodiazepine addition, a rapid resilience of the fungal community, and a shift in the bacterial community. OTUs from the genus Brevibacillus were more abundant at the beginning of the biodegradation process under both the diazepam and oxazepam conditions. Additionally, the Tax4Fun tool was applied to the 16S rRNA gene sequencing data to infer changes in specific metabolic functions during biodegradation. Finally, the results suggest that the microbial community in soils frequently exposed to WWTP water, which potentially contains CECs such as diazepam and oxazepam, may be adapted to degrading persistent contaminants.


2008 ◽  
Vol 46 (2) ◽  
pp. 125-136 ◽  
Author(s):  
Young-Do Nam ◽  
Youlboong Sung ◽  
Ho-Won Chang ◽  
Seong Woon Roh ◽  
Kyoung-Ho Kim ◽  
...  

Author(s):  
Christen L. Grettenberger ◽  
Trinity L. Hamilton

Acid mine drainage (AMD) is a global problem in which iron sulfide minerals oxidize and generate acidic, metal-rich water. Bioremediation relies on understanding how microbial communities inhabiting an AMD site contribute to biogeochemical cycling. A number of studies have reported community composition in AMD sites from 16S rRNA gene amplicons, but it remains difficult to link taxa to function, especially in the absence of closely related cultured species or published genomes. Unfortunately, genomes and cultured taxa from AMD environments are scarce. Here, we report 29 novel metagenome-assembled genomes from Cabin Branch, an AMD site in the Daniel Boone National Forest, KY, USA. The genomes span 11 bacterial phyla and one archaeal phylum and include taxa that contribute to carbon, nitrogen, sulfur, and iron cycling. These data reveal overlooked taxa that contribute to carbon fixation in AMD sites as well as uncharacterized Fe(II)-oxidizing bacteria. These data provide additional context for 16S rRNA gene studies, add to our understanding of the taxa involved in biogeochemical cycling in AMD environments, and can inform bioremediation strategies.
IMPORTANCE Bioremediating acid mine drainage requires understanding how microbial communities influence the geochemical cycling of iron and sulfur and of biologically important elements such as carbon and nitrogen. Research in this area has produced an abundance of 16S rRNA gene amplicon data. However, linking these data to metabolisms is difficult because many AMD taxa are uncultured or lack published genomes. Here, we present metagenome-assembled genomes from 29 novel AMD taxa and detail their metabolic potential. These data provide information on AMD taxa that could be important for bioremediation strategies, including taxa involved in cycling iron, sulfur, carbon, and nitrogen.

