Handling of spurious sequences affects the outcome of high-throughput 16S rRNA gene amplicon profiling

2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Sandra Reitmeier ◽  
Thomas C. A. Hitch ◽  
Nicole Treichel ◽  
Nikolaos Fikas ◽  
Bela Hausmann ◽  
...  

Abstract 16S rRNA gene amplicon sequencing is a popular approach for studying microbiomes. However, some basic concepts have still not been investigated comprehensively. We studied the occurrence of spurious sequences using defined microbial communities based on data either from the literature or generated in three sequencing facilities and analyzed via both operational taxonomic unit (OTU) and amplicon sequence variant (ASV) approaches. OTU clustering and singleton removal, a commonly used approach, delivered approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower based on ASV analysis, but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was found as an effective threshold below which the analysis of spurious taxa can be prevented to a large extent in both OTU- and ASV-based analysis approaches. Using this cutoff improved the reproducibility of analysis, i.e., variation in richness estimates was reduced by 38% compared with singleton filtering using six human fecal samples across seven sequencing runs. Beta-diversity analysis of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparison, highlighting the importance of carefully analyzing data before drawing conclusions on microbiome changes. In summary, handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid the generation of misleading findings. We propose the concept of effective richness to facilitate the comparison of alpha-diversity across studies.
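To make the contrast between the two filtering strategies concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes that the 0.25% cutoff is applied to relative abundance within a single sample, that taxa exactly at the cutoff are kept, and that a singleton is a taxon with exactly one read in that sample; the abstract does not specify any of these details. The function names and the read counts in `sample` are hypothetical and chosen only for illustration.

```python
from typing import Dict


def filter_by_relative_abundance(counts: Dict[str, int],
                                 cutoff: float = 0.0025) -> Dict[str, int]:
    """Keep taxa whose within-sample relative abundance is >= cutoff.

    The default of 0.0025 corresponds to the 0.25% threshold in the abstract.
    Applying it per sample is an assumption of this sketch.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n for taxon, n in counts.items() if n / total >= cutoff}


def remove_singletons(counts: Dict[str, int]) -> Dict[str, int]:
    """Drop taxa represented by exactly one read (assumed singleton definition)."""
    return {taxon: n for taxon, n in counts.items() if n > 1}


def richness(counts: Dict[str, int]) -> int:
    """Number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical read counts for one sample (10,000 reads in total).
sample = {
    "otu_a": 6000,
    "otu_b": 3000,
    "otu_c": 970,
    "otu_d": 20,   # 0.20% of reads, below the 0.25% cutoff
    "otu_e": 9,    # 0.09% of reads
    "otu_f": 1,    # singleton
}

print(richness(remove_singletons(sample)))             # 5
print(richness(filter_by_relative_abundance(sample)))  # 3
```

In this toy sample, singleton removal discards only `otu_f`, whereas the relative-abundance cutoff also discards the two low-abundance taxa, which is why the two strategies can yield different richness estimates for the same data.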

2019 ◽  
Author(s):  
Jean-Claude OGIER ◽  
Sylvie Pagès ◽  
Maxime Galan ◽  
Matthieu Barret ◽  
Sophie Gaudriault

Abstract Background Microbiome composition is frequently studied by the amplification and high-throughput sequencing of specific molecular markers (metabarcoding). Various hypervariable regions of the 16S rRNA gene are classically used to estimate bacterial diversity, but other universal bacterial markers with a finer taxonomic resolution could be employed. We compared specificity and sensitivity between a portion of the rpoB gene and the V3V4 hypervariable region of the 16S rRNA gene. Results We first designed universal primers for rpoB suitable for use with Illumina sequencing-based technology and constructed a reference rpoB database of 45,000 sequences. The rpoB and V3V4 markers were amplified and sequenced from (i) a mock community of 19 bacterial strains from both Gram-negative and Gram-positive lineages; (ii) bacterial assemblages associated with entomopathogenic nematodes. In metabarcoding analyses of mock communities with two analytical pipelines (FROGS and DADA2), the estimated diversity captured with the rpoB marker resembled the expected composition of these mock communities more closely than that captured with V3V4. The rpoB marker had a higher level of taxonomic affiliation, a higher sensitivity (detection of all the species present in the mock communities), and a higher specificity (low rates of spurious OTU detection) than V3V4. We applied both primers to infective juveniles of the nematode Steinernema glaseri. Both markers showed the bacterial community associated with this nematode to be of low diversity (< 50 OTUs), but only rpoB reliably detected the symbiotic bacterium Xenorhabdus poinarii. Conclusions Our results confirm that different microbiota composition data may be obtained with different markers. We found that rpoB was a highly appropriate marker for assessing the taxonomic structure of mock communities and the nematode microbiota. 
Further studies on other ecosystems should be considered to evaluate the universal usefulness of the rpoB marker. Our data highlight two crucial elements that should be taken into account to ensure more reliable and accurate descriptions of microbial diversity in high-throughput amplicon sequencing analyses: i) the need to include mock communities as controls; ii) the advantages of using a multigenic approach including at least one housekeeping gene (rpoB is a good candidate) and one variable region of the 16S rRNA gene.


2021 ◽  
Vol 10 (2) ◽  
Author(s):  
Junho Lee ◽  
Ilwon Jeong ◽  
Jong-Oh Kim ◽  
Kyunghoi Kim

ABSTRACT The benthic environment of Yeosu New Harbor, South Korea, is mesotrophic and is affected by the Tsushima Current and the Seomjin River. Here, we report microbial diversity in sediments of Yeosu New Harbor based on 16S rRNA gene amplicon sequencing. The dominant bacterial phylum was Proteobacteria (relative abundance, 72.5 to 78.1%).


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Janis R. Bedarf ◽  
Naiara Beraza ◽  
Hassan Khazneh ◽  
Ezgi Özkurt ◽  
David Baker ◽  
...  

Abstract Background Recent studies suggested the existence of (poly-)microbial infections in human brains. These have been described either as putative pathogens linked to the neuro-inflammatory changes seen in Parkinson’s disease (PD) and Alzheimer’s disease (AD) or as a “brain microbiome” in the context of healthy patients’ brain samples. Methods Using 16S rRNA gene sequencing, we tested the hypothesis that there is a bacterial brain microbiome. We evaluated brain samples from healthy human subjects and individuals suffering from PD (olfactory bulb and pre-frontal cortex), as well as murine brains. In line with state-of-the-art recommendations, we included several negative and positive controls in our analysis and estimated total bacterial biomass by 16S rRNA gene qPCR. Results Amplicon sequencing did detect bacterial signals in both human and murine samples, but estimated bacterial biomass was extremely low in all samples. Stringent reanalyses implied bacterial signals being explained by a combination of exogenous DNA contamination (54.8%) and false positive amplification of host DNA (34.2%, off-target amplicons). Several seemingly brain-enriched microbes in our dataset turned out to be false-positive signals upon closer examination. We identified off-target amplification as a major confounding factor in low-bacterial/high-host-DNA scenarios. These amplified human or mouse DNA sequences were clustered and falsely assigned to bacterial taxa in the majority of tested amplicon sequencing pipelines. Off-target amplicons seemed to be related to the tissue’s sterility and could also be found in independent brain 16S rRNA gene sequences. Conclusions Taxonomic signals obtained from (extremely) low biomass samples by 16S rRNA gene sequencing must be scrutinized closely to exclude the possibility of off-target amplifications, amplicons that can only appear enriched in biological samples, but are sometimes assigned to bacterial taxa. 
Sequences must be explicitly matched against any possible background genomes present in large quantities (i.e., the host genome). Using close scrutiny in our approach, we find no evidence supporting the hypothetical presence of either a brain microbiome or a bacterial infection in PD brains.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jun-ichi Kanatani ◽  
Masanori Watahiki ◽  
Keiko Kimata ◽  
Tomoko Kato ◽  
Kaoru Uchida ◽  
...  

Abstract Background Legionellosis is caused by the inhalation of aerosolized water contaminated with Legionella bacteria. In this study, we investigated the prevalence of Legionella species in aerosols collected from outdoor sites near asphalt roads, bathrooms in public bath facilities, and other indoor sites, such as buildings and private homes, using amoebic co-culture, quantitative PCR, and 16S rRNA gene amplicon sequencing. Results Legionella species were not detected by amoebic co-culture. However, Legionella DNA was detected in 114/151 (75.5%) air samples collected near roads (geometric mean ± standard deviation: 1.80 ± 0.52 log10 copies/m³), which was comparable to the numbers collected from bathrooms [15/21 (71.4%), 1.82 ± 0.50] but higher than those collected from other indoor sites [11/30 (36.7%), 0.88 ± 0.56] (P < 0.05). The amount of Legionella DNA was correlated with the monthly total precipitation (r = 0.56, P < 0.01). It was also directly and inversely correlated with the daily total precipitation for seven days (r = 0.21, P = 0.01) and one day (r = −0.29, P < 0.01) before the sampling day, respectively. 16S rRNA gene amplicon sequencing revealed that Legionella species were detected in 9/30 samples collected near roads (mean proportion of reads, 0.11%). At the species level, L. pneumophila was detected in 2/30 samples collected near roads (the proportion of reads, 0.09 and 0.11% of the total reads number in each positive sample). The three most abundant bacterial genera in the samples collected near roads were Sphingomonas, Streptococcus, and Methylobacterium (mean proportion of reads; 21.1%, 14.6%, and 1.6%, respectively). In addition, the bacterial diversity in outdoor environment was comparable to that in indoor environment which contains aerosol-generating features and higher than that in indoor environment without the features. 
Conclusions DNA from Legionella species was widely present in aerosols collected from outdoor sites near asphalt roads, especially during the rainy season. Our findings suggest that there may be a risk of exposure to Legionella species not only in bathrooms but also in the areas surrounding asphalt roads. Therefore, the possibility of contracting legionellosis in daily life should be considered.


2021 ◽  
Author(s):  
Seppo Virtanen ◽  
Schahzad Saqib ◽  
Tinja Kanerva ◽  
Pekka Nieminen ◽  
Ilkka Kalliala ◽  
...  

Abstract Background: Amplicon sequencing of kingdom-specific tags such as the 16S rRNA gene for bacteria and the internal transcribed spacer (ITS) region for fungi is widely used for investigating microbial populations. So far most human studies have focused on bacteria while studies on host-associated fungi in health and disease have only recently started to accumulate. To enable cost-effective parallel analysis of bacterial and fungal communities in human and environmental samples, we developed a method where 16S rRNA gene and ITS-1 amplicons were pooled together for a single Illumina MiSeq or HiSeq run and analysed after primer-based segregation. Taxonomic assignments were performed with BLAST in combination with an iterative text-extraction based filtration approach, which uses extensive literature records from public databases to select the most probable hits that were further validated by shotgun metagenomic sequencing. Results: Using 50 vaginal samples, we show that the combined run provides comparable results on bacterial composition and diversity to conventional 16S rRNA gene amplicon sequencing. The text-extraction-based taxonomic assignment guided tool provided ecosystem specific annotations that were confirmed by Metagenomic Phylogenetic Analysis (MetaPhlAn). The metagenome analysis revealed distinct functional differences between the bacterial community types while fungi were undetected, despite being identified in all samples based on ITS amplicons. Co-abundance analysis of bacteria and fungi did not show strong between-kingdom correlations within the vaginal ecosystem of healthy women. Conclusion: Combined amplicon sequencing for bacteria and fungi provides a simple and cost-effective method for simultaneous analysis of microbiota and mycobiota within the same samples. 
The text extraction-based annotation tool facilitates the characterization and interpretation of defined microbial communities from rapidly accumulating sequencing and metadata readily available through public databases.


2019 ◽  
Vol 47 (18) ◽  
pp. e103-e103 ◽  
Author(s):  
Benjamin J Callahan ◽  
Joan Wong ◽  
Cheryl Heiner ◽  
Steve Oh ◽  
Casey M Theriot ◽  
...  

Abstract Targeted PCR amplification and high-throughput sequencing (amplicon sequencing) of 16S rRNA gene fragments is widely used to profile microbial communities. New long-read sequencing technologies can sequence the entire 16S rRNA gene, but higher error rates have limited their attractiveness when accuracy is important. Here we present a high-throughput amplicon sequencing methodology based on PacBio circular consensus sequencing and the DADA2 sample inference method that measures the full-length 16S rRNA gene with single-nucleotide resolution and a near-zero error rate. In two artificial communities of known composition, our method recovered the full complement of full-length 16S sequence variants from expected community members without residual errors. The measured abundances of intra-genomic sequence variants were in the integral ratios expected from the genuine allelic variants within a genome. The full-length 16S gene sequences recovered by our approach allowed Escherichia coli strains to be correctly classified to the O157:H7 and K12 sub-species clades. In human fecal samples, our method showed strong technical replication and was able to recover the full complement of 16S rRNA alleles in several E. coli strains. There are likely many applications beyond microbial profiling for which high-throughput amplicon sequencing of complete genes with single-nucleotide resolution will be of use.


2020 ◽  
Vol 11 ◽  
Author(s):  
Daniel Straub ◽  
Nia Blackwell ◽  
Adrian Langarica-Fuentes ◽  
Alexander Peltzer ◽  
Sven Nahnsen ◽  
...  
