NG-Tax, a highly accurate and validated pipeline for analysis of 16S rRNA amplicons from complex biomes

Background: Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA (rRNA) gene has transformed the methodological landscape describing microbial diversity within and across complex biomes. However, several studies have shown that the methodology rather than the biological variation is responsible for the observed sample composition and distribution. This compromises true meta-analyses, although this fact is often disregarded. Results: To facilitate true meta-analysis of microbiome studies, we developed NG-Tax, a pipeline for 16S rRNA gene amplicon sequence analysis that was validated with different mock communities and benchmarked against QIIME as the currently most frequently used pipeline. The microbial composition of 49 independently amplified mock samples was characterized by sequencing two variable 16S rRNA gene regions, V4 and V5-V6, in three separate sequencing runs on Illumina’s HiSeq2000 platform. This allowed evaluating important factors of technical bias in taxonomic classification: 1) run-to-run sequencing variation, 2) PCR error, and 3) region/primer specific amplification bias. Despite the short read length (~140 nt) and all technical biases, the average specificity of the taxonomic assignment for the phylotypes included in the mock communities was 96%. On average 99.94% and 92.02% of the reads could be assigned to at least family or genus level, respectively, while assignment to ‘spurious genera’ represented on average only 0.02% of the reads per sample. Analysis of α- and β-diversity confirmed conclusions guided by biology rather than the aforementioned methodological aspects, which was not the case when samples were analysed using QIIME. Conclusions: Different biological outcomes are commonly observed due to 16S rRNA region-specific performance. NG-Tax demonstrated high robustness against choice of region and other technical biases associated with 16S rRNA gene amplicon sequencing studies, diminishing their impact and providing accurate qualitative and quantitative representation of the true sample composition. This will improve comparability between studies and facilitate efforts towards standardization.


NG-Tax, a highly accurate and validated pipeline for analysis of 16S rRNA amplicons from complex biomes

F1000Research ◽

10.12688/f1000research.9227.2 ◽

2018 ◽

Vol 5 ◽

pp. 1791 ◽

Cited By ~ 14

Author(s):

Javier Ramiro-Garcia ◽

Gerben D. A. Hermes ◽

Christos Giatsis ◽

Detmer Sipkema ◽

Erwin G. Zoetendal ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

High Throughput Sequencing ◽

Meta Analysis ◽

Read Length ◽

Rrna Gene ◽

Taxonomic Assignment ◽

Microbial Composition ◽

Sample Composition ◽

Mock Communities

Background: Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA (rRNA) gene has transformed the methodological landscape describing microbial diversity within and across complex biomes. However, several studies have shown that the methodology rather than the biological variation is responsible for the observed sample composition and distribution. This compromises meta-analyses, although this fact is often disregarded. Results: To facilitate true meta-analysis of microbiome studies, we developed NG-Tax, a pipeline for 16S rRNA gene amplicon sequence analysis that was validated with different mock communities and benchmarked against QIIME as a frequently used pipeline. The microbial composition of 49 independently amplified mock samples was characterized by sequencing two variable 16S rRNA gene regions, V4 and V5-V6, in three separate sequencing runs on Illumina’s HiSeq2000 platform. This allowed for the evaluation of important causes of technical bias in taxonomic classification: 1) run-to-run sequencing variation, 2) PCR error, and 3) region/primer specific amplification bias. Despite the short read length (~140 nt) and all technical biases, the average specificity of the taxonomic assignment for the phylotypes included in the mock communities was 97.78%. On average 99.95% and 88.43% of the reads could be assigned to at least family or genus level, respectively, while assignment to ‘spurious genera’ represented on average only 0.21% of the reads per sample. Analysis of α- and β-diversity confirmed conclusions guided by biology rather than the aforementioned methodological aspects, which was not achieved with QIIME. Conclusions: Different biological outcomes are commonly observed due to 16S rRNA region-specific performance. NG-Tax demonstrated high robustness against choice of region and other technical biases associated with 16S rRNA gene amplicon sequencing studies, diminishing their impact and providing accurate qualitative and quantitative representation of the true sample composition. This will improve comparability between studies and facilitate efforts towards standardization.


Soybean Roots and Soil From High- and Low-Yielding Field Sites Have Different Microbiome Composition

Frontiers in Microbiology ◽

10.3389/fmicb.2021.675352 ◽

2021 ◽

Vol 12 ◽

Author(s):

Ananda Y. Bandara ◽

Dilooshi K. Weerasooriya ◽

Ryan V. Trexler ◽

Terrence H. Bell ◽

Paul D. Esker

Keyword(s):

Network Analysis ◽

16S Rrna ◽

16S Rrna Gene ◽

High Throughput Sequencing ◽

Microbial Interactions ◽

Growth Stages ◽

Rrna Gene ◽

Plant Growth Promoting Bacteria ◽

Microbial Composition ◽

Site Type

The occurrence of high- (H) and low- (L) yielding field sites within a farm is a commonly observed phenomenon in soybean cultivation. Site topography, soil physical and chemical attributes, and soil/root-associated microbial composition can contribute to this phenomenon. In order to better understand the microbial dynamics associated with each site type (H/L), we collected bulk soil (BS), rhizosphere soil (RS), and soybean root (R) samples from historically high and low yield sites across eight Pennsylvania farms at V1 (first trifoliate) and R8 (maturity) soybean growth stages (SGS). We extracted DNA from the collected samples and performed high-throughput sequencing of PCR amplicons from both the fungal ITS and prokaryotic 16S rRNA gene regions. Sequences were then grouped into amplicon sequence variants (ASVs) and subjected to network analysis. Based on both ITS and 16S rRNA gene data, a greater network size and more edges were observed for all sample types from H-sites compared to L-sites at both SGS. Network analysis suggested that the number of potential microbial interactions/associations was greater in samples from H-sites compared to L-sites. Diversity analyses indicated that site type was not a main driver of alpha and beta diversity in soybean-associated microbial communities. L-sites contained a greater percentage of fungal phytopathogens (e.g., Fusarium, Macrophomina, Septoria), while H-sites contained a greater percentage of mycoparasitic (e.g., Trichoderma) and entomopathogenic (e.g., Metarhizium) fungal genera. Furthermore, roots from H-sites possessed a greater percentage of Bradyrhizobium and genera known to contain plant growth promoting bacteria (e.g., Flavobacterium, Duganella). Overall, our results revealed that there were differences in microbial composition in soil and roots from H- and L-sites across a variety of soybean farms. Based on our findings, we hypothesize that differences in microbial composition could have a causative relationship with observed within-farm variability in soybean yield.


Accurate Microbiome Sequencing with Synthetic Long Read Sequencing

10.1101/2020.10.02.324038 ◽

2020 ◽

Author(s):

Nico Chung ◽

Marc W. Van Goethem ◽

Melanie A. Preston ◽

Filip Lhota ◽

Leona Cerna ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequence Data ◽

Rrna Gene ◽

Microbial Composition ◽

Short Read ◽

Long Read ◽

Phylogenetic Resolution

The microbiome plays a central role in biochemical cycling and nutrient turnover of most ecosystems. Because it can comprise myriad microbial prokaryotes, eukaryotes and viruses, microbiome characterization requires high-throughput sequencing to attain an accurate identification and quantification of such co-existing microbial populations. Short-read next-generation sequencing (srNGS) revolutionized the study of microbiomes and remains the most widely used approach, yet read lengths spanning only a few of the nine hypervariable regions of the 16S rRNA gene limit phylogenetic resolution, leading to misclassification or failure to classify in a high percentage of cases. Here we evaluate a synthetic long-read (SLR) NGS approach for full-length 16S rRNA gene sequencing that is high-throughput, highly accurate and low-cost. The sequencing approach is amenable to highly multiplexed sequencing and provides microbiome sequence data that surpasses existing short- and long-read modalities in terms of accuracy and phylogenetic resolution. We validated this commercially available technology, termed LoopSeq, by characterizing the microbial composition of well-established mock microbiome communities and diverse real-world samples. SLR sequencing revealed differences in aquatic community complexity associated with environmental gradients, resolved species-level community composition of uterine lavage from subjects with histories of misconception and accurately detected strain differences, multiple copies of the 16S rRNA in a single strain’s genome, as well as low-level contamination in soil cyanobacterial cultures. This approach has implications for widespread adoption of high-resolution, accurate long-read microbiome sequencing as it is generated on popular short-read sequencing platforms without the need for additional infrastructure.


Benchmarking of 16S rRNA gene databases using known strain sequences

Bioinformation ◽

10.6026/97320630017377 ◽

2021 ◽

Vol 17 (3) ◽

pp. 377-391

Author(s):

Kunal Dixit ◽

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Full Length ◽

Rrna Gene ◽

Gene Sequences ◽

16S Rrna Gene Sequences ◽

Taxonomic Assignment ◽

16S Rrna Gene Analysis ◽

Data Set ◽

Mock Communities

16S rRNA gene analysis is the most convenient and robust method for microbiome studies. Inaccurate taxonomic assignment of bacterial strains could have deleterious effects, as all downstream analyses rely heavily on the accurate assessment of microbial taxonomy. The use of mock communities to check the reliability of the results has been suggested. However, the mock communities used in most studies often represent only a small fraction of taxa and are used mostly to validate the sequencing run and estimate sequencing artifacts. Moreover, the large number of databases and tools available for classification and taxonomic assignment of the 16S rRNA gene makes it challenging to select the best-suited method for a particular dataset. In the present study, we used authentic and validly published 16S rRNA gene type strain sequences (full length, V3-V4 region) and analyzed them using the widely used QIIME pipeline along with different parameters of OTU clustering and QIIME-compatible databases. Data Analysis Measures (DAM) revealed a high discrepancy in ratifying the taxonomy at different taxonomic hierarchies. Beta diversity analysis showed clear segregation of different DAMs. Limited differences were observed in reference data set analysis using partial (V3-V4) and full-length 16S rRNA gene sequences, which signifies the reliability of partial 16S rRNA gene sequences in microbiome studies. Our analysis also highlights common discrepancies observed at various taxonomic levels using various methods and databases.


Handling of spurious sequences affects the outcome of high-throughput 16S rRNA gene amplicon profiling

ISME Communications ◽

10.1038/s43705-021-00033-z ◽

2021 ◽

Vol 1 (1) ◽

Author(s):

Sandra Reitmeier ◽

Thomas C. A. Hitch ◽

Nicole Treichel ◽

Nikolaos Fikas ◽

Bela Hausmann ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Amplicon Sequencing ◽

Rrna Gene ◽

Diversity Analysis ◽

Careful Attention ◽

Operational Taxonomic Units ◽

Basic Concepts ◽

Gnotobiotic Mice ◽

Mock Communities

16S rRNA gene amplicon sequencing is a popular approach for studying microbiomes. However, some basic concepts have still not been investigated comprehensively. We studied the occurrence of spurious sequences using defined microbial communities based on data either from the literature or generated in three sequencing facilities and analyzed via both operational taxonomic unit (OTU) and amplicon sequence variant (ASV) approaches. OTU clustering and singleton removal, a commonly used approach, delivered approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower based on ASV analysis, but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was found as an effective threshold below which the analysis of spurious taxa can be prevented to a large extent in both OTU- and ASV-based analysis approaches. Using this cutoff improved the reproducibility of analysis, i.e., variation in richness estimates was reduced by 38% compared with singleton filtering using six human fecal samples across seven sequencing runs. Beta-diversity analysis of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparison, highlighting the importance of carefully analyzing data before drawing conclusions on microbiome changes. In summary, handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid the generation of misleading findings. We propose the concept of effective richness to facilitate the comparison of alpha-diversity across studies.
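As an illustration of the 0.25% relative-abundance cutoff described in this abstract, the following is a minimal sketch, not the authors' implementation. The function name `filter_by_relative_abundance`, the per-sample dictionary of read counts, and the choice to keep taxa lying exactly at the cutoff are all assumptions made for the example.

```python
def filter_by_relative_abundance(counts, cutoff=0.0025):
    """Keep taxa whose within-sample relative abundance is at least `cutoff`.

    `counts` maps a taxon label to its read count in one sample (hypothetical
    input layout). The default cutoff of 0.0025 corresponds to 0.25%.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n for taxon, n in counts.items() if n / total >= cutoff}


# Hypothetical sample with 10,000 reads; taxon_C sits at 0.2% and is dropped.
sample = {"taxon_A": 9000, "taxon_B": 980, "taxon_C": 20}
kept = filter_by_relative_abundance(sample)
print(kept)
```

Unlike singleton removal, a relative cutoff of this kind scales with sequencing depth, which is one plausible reason it behaves more consistently across runs.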


Characterization of Bacteria in Biopsies of Colon and Stools by High Throughput Sequencing of the V2 Region of Bacterial 16S rRNA Gene in Human

PLoS ONE ◽

10.1371/journal.pone.0016952 ◽

2011 ◽

Vol 6 (2) ◽

pp. e16952 ◽

Cited By ~ 76

Author(s):

Yukihide Momozawa ◽

Valérie Deffontaine ◽

Edouard Louis ◽

Juan F. Medrano

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

High Throughput ◽

High Throughput Sequencing ◽

Rrna Gene ◽

Bacterial 16S Rrna Gene


Diversity of microbial communities in hot springs of Sri Lanka as revealed by 16S rRNA gene high-throughput sequencing analysis

Gene ◽

10.1016/j.gene.2021.146103 ◽

2021 ◽

pp. 146103

Author(s):

Dilini Sadeepa ◽

Kosala Sirisena ◽

Pathmalal M. Manage

Keyword(s):

Sri Lanka ◽

16S Rrna ◽

16S Rrna Gene ◽

Microbial Communities ◽

High Throughput ◽

High Throughput Sequencing ◽

Hot Springs ◽

Rrna Gene ◽

Sequencing Analysis


Deep insights into the green nitrogen removal by anammox in four full-scale WWTPs treating landfill leachate based on 16S rRNA gene and transcripts by 16S rRNA high-throughput sequencing

Journal of Cleaner Production ◽

10.1016/j.jclepro.2020.124176 ◽

2020 ◽

Vol 276 ◽

pp. 124176 ◽

Cited By ~ 2

Author(s):

Yuchun Yang ◽

Meng Li ◽

Zhong Hu ◽

Hojae Shim ◽

Jih-Gaw Lin ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

High Throughput ◽

Nitrogen Removal ◽

Landfill Leachate ◽

High Throughput Sequencing ◽

Full Scale ◽

Rrna Gene


Wild specimens of sand fly phlebotomine Lutzomyia evansi, vector of leishmaniasis, show high abundance of Methylobacterium and natural carriage of Wolbachia and Cardinium types in the midgut microbiome

Scientific Reports ◽

10.1038/s41598-019-53769-z ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Rafael J. Vivero ◽

Marcela Villegas-Plazas ◽

Gloria E. Cadavid-Restrepo ◽

Claudia Ximena Moreno Herrera ◽

Sandra I. Uribe ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

High Throughput Sequencing ◽

Sand Fly ◽

Human Populations ◽

Rrna Gene ◽

Remarkable Feature ◽

Core Microbiome ◽

Bacterial Phyla ◽

Wide Range

Phlebotomine sand flies are remarkable vectors of several etiologic agents (viruses, bacteria, trypanosomatid Leishmania), posing a heavy health burden for human populations mainly located in developing countries. Their intestinal microbiota is involved in a wide range of biological and physiological processes, and could exclude or facilitate such transmission of pathogens. In this study, we investigated the structure of the eubacterial microbiome from digestive tracts of Lu. evansi adults using 16S rRNA gene amplicon high-throughput sequencing (Illumina MiSeq). The samples were collected at two locations with high incidence of the disease in humans: peri-urban and forest ecosystems from the department of Sucre, Colombia. A total of 289,068 quality-filtered reads of the V4 region of the 16S rRNA gene were obtained and clustered into 1,762 operational taxonomic units (OTUs) with 97% similarity. Regarding eubacterial diversity, 14 bacterial phyla and 2 new candidate phyla were found to be consistently associated with the gut microbiome content. Proteobacteria, Firmicutes, and Bacteroidetes were the most abundant phyla in all the samples, and the core microbiome was particularly dominated by the genus Methylobacterium. Methylobacterium species are known to have mutualistic relationships with some plants and are involved in shaping the microbial community in the phyllosphere. As a remarkable feature, OTUs classified as Wolbachia spp. were found to be abundant in peri-urban ecosystem samples, in adult males (OTUs n = 776) and unfed females (OTUs n = 324). Furthermore, our results provide evidence of OTUs classified as Cardinium endosymbiont at a relative abundance notably higher than that of Wolbachia. The variation in insect gut microbiota may be determined by the environment as well as by the type of feeding. Our findings increase the richness of the microbiota associated with Lu. evansi. In this study, OTUs of Methylobacterium found in Lu. evansi were higher in engorged females, suggesting that there are interactions between microbes from plant sources, blood nutrients and the parasites they transmit during the blood intake.


Soehngenia longivitae sp. nov., a Fermenting Bacterium Isolated from a Petroleum Reservoir in Azerbaijan, and Emended Description of the Genus Soehngenia

Microorganisms ◽

10.3390/microorganisms8121967 ◽

2020 ◽

Vol 8 (12) ◽

pp. 1967

Author(s):

Tamara N. Nazina ◽

Salimat K. Bidzhieva ◽

Denis S. Grouzdev ◽

Diyana S. Sokolova ◽

Tatyana P. Tourova ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

High Throughput Sequencing ◽

Biochemical Characterization ◽

Sequence Similarity ◽

Phylogenomic Analysis ◽

Rrna Gene ◽

Petroleum Reservoir ◽

Ph Range ◽

The 16S Rrna Gene

A methanogenic enrichment growing on a medium with methanol was obtained from a petroleum reservoir (Republic of Azerbaijan) and stored for 33 years without transfers to fresh medium. High-throughput sequencing of the V4 region of the 16S rRNA gene revealed members of the genera Desulfovibrio, Soehngenia, Thermovirga, Petrimonas, Methanosarcina, and Methanomethylovorans. A novel gram-positive, rod-shaped, anaerobic fermentative bacterium, strain 1933PT, was isolated from this enrichment and characterized. The strain grew at 13–55 °C (optimum 35 °C), with 0–3.0% (w/v) NaCl (optimum 0–2.0%) and in the pH range of 6.7–8.0 (optimum pH 7.0). The 16S rRNA gene sequence similarity, the average nucleotide identity (ANI) and in silico DNA–DNA hybridization (dDDH) values between strain 1933PT and the type strain of the most closely related species Soehngenia saccharolytica DSM 12858T were 98.5%, 70.5%, and 22.6%, respectively, and were below the threshold accepted for species demarcation. Genome-based phylogenomic analysis and physiological and biochemical characterization of the strain 1933PT (VKM B-3382T = KCTC 15984T) confirmed its affiliation to a novel species of the genus Soehngenia, for which the name Soehngenia longivitae sp. nov. is proposed. Genome analysis suggests that the new strain has potential in the degradation of proteinaceous components.
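The species-demarcation comparison reported in this abstract can be sketched as a simple threshold check. The abstract does not state which cutoffs were applied; the values below (about 98.7% for 16S rRNA gene similarity, 95% for ANI, 70% for dDDH) are commonly cited figures assumed here for illustration only, and the names `ASSUMED_THRESHOLDS` and `below_species_thresholds` are hypothetical.

```python
# Assumed, commonly cited species-demarcation cutoffs (percent); the abstract
# itself does not list the thresholds it used.
ASSUMED_THRESHOLDS = {"16S": 98.7, "ANI": 95.0, "dDDH": 70.0}


def below_species_thresholds(values, thresholds=ASSUMED_THRESHOLDS):
    """Return, per metric, whether the observed value is below its cutoff."""
    return {metric: values[metric] < cutoff for metric, cutoff in thresholds.items()}


# Values reported in the abstract for strain 1933PT versus
# Soehngenia saccharolytica DSM 12858T.
observed = {"16S": 98.5, "ANI": 70.5, "dDDH": 22.6}
result = below_species_thresholds(observed)
print(result)
```

Under these assumed cutoffs all three metrics fall below the threshold, which is consistent with the abstract's conclusion that the strain represents a novel species.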
