Handling of spurious sequences affects the outcome of high-throughput 16S rRNA gene amplicon profiling

Author(s):  
Sandra Reitmeier ◽  
Thomas CA Hitch ◽  
Nikolaos Fikas ◽  
Bela Hausmann ◽  
Amanda E Ramer-Tait ◽  
...  

Abstract Background: 16S rRNA gene amplicon sequencing is a very popular approach for studying microbiomes. However, standards for sample and data processing vary, and some basic concepts, such as the occurrence of spurious sequences, have not been investigated comprehensively; the present study addresses this gap. Methods: Using defined communities of bacteria in vitro and in vivo, we searched for sequences not matching the expected species (i.e., spurious taxa) and determined a threshold of occurrence relevant for adequate data analysis. The origin of spurious taxa was then investigated via large-scale amplicon queries. We also assessed the impact of varying sequence-filtering stringency on diversity readouts in human fecal and peat soil communities. Results: 16S rRNA gene amplicon data processing based on operational taxonomic unit (OTU) clustering and singleton removal, a commonly used approach that discards any taxa represented by only one sequence across all samples, yielded on average approx. 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was lower with amplicon sequence variant (ASV) analysis but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was identified as a threshold below which the analysis of spurious taxa can largely be prevented. Most spurious taxa (approx. 70%) detected in simplified communities occurred in samples multiplexed in the same sequencing run and were present in only one of ten runs. Use of the 0.25% relative abundance threshold decreased the coefficient of variation of richness in the same six human fecal samples across seven sequencing runs by 38% compared with singleton filtering. The output of beta-diversity analyses of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparing samples.
Importantly, major findings were confirmed using data generated in a second sequencing facility. Conclusions: Handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid misleading findings. A relative abundance threshold of 0.25% is more appropriate than singleton removal, although study-specific analysis strategies are mandatory. We propose the concept of effective richness to help compare results across studies.
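To make the contrast between the two filtering rules concrete, here is a minimal Python sketch on a toy count table. It assumes that the 0.25% cutoff is applied within each sample (counts below it are set to zero) and that "effective richness" means the number of taxa at or above the cutoff in one sample. The function names, the table layout and the toy counts are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Hypothetical layout: taxon name -> read counts, one entry per sample.
CountTable = Dict[str, List[int]]


def remove_singletons(table: CountTable) -> CountTable:
    """Drop taxa represented by only one sequence across all samples."""
    return {taxon: counts for taxon, counts in table.items() if sum(counts) > 1}


def apply_relative_abundance_cutoff(table: CountTable, cutoff: float = 0.0025) -> CountTable:
    """Set counts below `cutoff` (0.0025 = 0.25%) to zero within each sample.

    Assumption: the cutoff is evaluated per sample, and taxa with zero counts
    in every sample afterwards are dropped from the table.
    """
    if not table:
        return {}
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[s] for counts in table.values()) for s in range(n_samples)]
    filtered: CountTable = {}
    for taxon, counts in table.items():
        kept = [
            c if totals[s] > 0 and c / totals[s] >= cutoff else 0
            for s, c in enumerate(counts)
        ]
        if any(kept):
            filtered[taxon] = kept
    return filtered


def effective_richness(table: CountTable, sample: int, cutoff: float = 0.0025) -> int:
    """Count taxa at or above `cutoff` relative abundance in one sample (assumed definition)."""
    total = sum(counts[sample] for counts in table.values())
    if total == 0:
        return 0
    return sum(1 for counts in table.values() if counts[sample] / total >= cutoff)


# Toy mock community: two expected taxa plus low-level "spurious" reads.
# Each sample totals 10,000 reads; the numbers are invented for illustration.
toy: CountTable = {
    "expected_A": [6000, 5500],
    "expected_B": [3950, 4399],
    "spurious_1": [30, 0],    # 0.30% in sample 0
    "spurious_2": [15, 100],  # 0.15% in sample 0, 1.00% in sample 1
    "spurious_3": [5, 0],     # 0.05% in sample 0
    "spurious_4": [0, 1],     # a singleton
}

print(sorted(remove_singletons(toy)))
print(apply_relative_abundance_cutoff(toy))
print([effective_richness(toy, s) for s in range(2)])
```

On this toy table, singleton removal keeps three of the four spurious taxa, while the cutoff removes two of them entirely and leaves an effective richness of 3 in each sample. A cutoff cannot, however, remove a spurious taxon whose abundance exceeds it, such as `spurious_2` in the second sample.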

2020 ◽  
Author(s):  
Thomas Clavel ◽  
Sandra Reitmeier ◽  
Thomas CA Hitch ◽  
Nicole Treichel ◽  
Nikolaos Fikas ◽  
...  

Abstract Background: 16S rRNA gene amplicon sequencing is a very popular approach for studying microbiomes. However, standards for sample and data processing vary, and some basic concepts, such as the occurrence of spurious sequences, have not been investigated comprehensively. Methods: Using defined communities of bacteria in vitro and in vivo, we searched for sequences not matching the expected species (i.e., spurious taxa) and determined a minimum threshold of occurrence suitable for robust data analysis. The presence and origin of spurious taxa were investigated via large-scale amplicon queries and gut samples from germfree mice spiked with target mock DNA. We also assessed the effect of varying sequence-filtering stringency on diversity readouts in human fecal and peat soil communities. Our findings are based on data generated in three sequencing facilities and analyzed via both operational taxonomic unit (OTU) and amplicon sequence variant (ASV) approaches. Results: 16S rRNA gene amplicon data processing based on OTU clustering and singleton removal, a commonly used approach that discards any taxa represented by only one sequence across all samples, yielded on average approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower with ASV analysis but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was found to be an effective threshold below which the analysis of spurious taxa can largely be prevented in both OTU- and ASV-based approaches. Most spurious taxa (approx. 70%) detected in simplified communities occurred in samples multiplexed in the same sequencing run and were present in only one of ten runs.
DNase treatment of gut content from germfree mice partly helped to exclude spurious taxa from the analysis of spiked mock DNA, but was not necessary when applying the 0.25% relative abundance threshold. Using this cut-off improved the reproducibility of the analysis, specifically by reducing variation in richness estimates by 38% compared with singleton filtering in a benchmarking experiment using six human fecal samples across seven sequencing runs. Beta-diversity analyses of human fecal communities were markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparing samples, highlighting the importance of careful data analysis before drawing conclusions. Conclusions: Handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid misleading findings. Applying a minimum relative abundance threshold between 0.10 and 0.30% is superior to singleton removal, although study-specific analysis strategies may be needed depending on, for instance, the type of samples analyzed and the sequencing depth achieved. Additionally, we propose the concept of effective richness to facilitate the comparison of results across studies.
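The reproducibility readout can be illustrated with a short sketch: for each sample sequenced in several runs, compute the coefficient of variation (CV) of richness across runs, average it over samples, and compare filtering strategies. This assumes CV = sample standard deviation / mean and that per-sample CVs are simply averaged; the study's exact aggregation is not stated here, and the sample names and richness values below are invented.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    return stdev(values) / mean(values)


def mean_cv(richness_by_sample: Dict[str, Sequence[float]]) -> float:
    """Average the across-run CV of richness over all samples."""
    return mean(coefficient_of_variation(runs) for runs in richness_by_sample.values())


def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Hypothetical richness of two fecal samples, each re-sequenced in four runs.
singleton_filtered = {"sample_1": [120, 150, 135, 175], "sample_2": [98, 130, 110, 141]}
cutoff_filtered = {"sample_1": [60, 64, 62, 66], "sample_2": [51, 55, 50, 57]}

cv_before = mean_cv(singleton_filtered)
cv_after = mean_cv(cutoff_filtered)
print(f"mean CV, singleton removal: {cv_before:.3f}")
print(f"mean CV, 0.25% cutoff:      {cv_after:.3f}")
print(f"reduction: {percent_reduction(cv_before, cv_after):.1f}%")
```

Using a relative (CV-based) measure rather than the raw standard deviation keeps the comparison fair, because cutoff filtering also lowers the richness values themselves.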


1989 ◽  
Vol 9 (12) ◽  
pp. 5650-5659 ◽  
Author(s):  
E Sun ◽  
B W Wu ◽  
K K Tewari

A cloned pea chloroplast 16S rRNA gene promoter has been characterized in detail by use of a homologous in vitro transcription system that contains a highly purified chloroplast RNA polymerase. The in vivo and in vitro 16S rRNA transcriptional start site was identified as a T on the plus strand, 158 bases upstream of the mature 5' end of the gene. BAL 31 deletions of the 16S rRNA leader region demonstrated that the bases between -66 and +30 relative to the transcriptional start site (+1) are necessary for specific 16S transcription. Disruption of the canonical TTGACA or TATAAT elements within this region caused complete transcriptional inactivation and prevented protein binding. The topological requirement for 16S transcription was examined by using a construct that synthesized a transcript from the 16S promoter and released it from a putative pea plastid terminator sequence. This minigene was relaxed in vitro with a topoisomerase I from pea chloroplast. The 16S promoter was most active when the minigene plasmid was supercoiled.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10372
Author(s):  
Jose F. Garcia-Mazcorro ◽  
Jorge R. Kawas ◽  
Cuauhtemoc Licona Cassani ◽  
Susanne Mertens-Talcott ◽  
Giuliana Noratto

Background: One of the main functions of diet is to nurture the gut microbiota, and this relationship affects the health of the host. However, different analysis strategies can generate different views of the relative abundance of each microbial taxon, which can affect our conclusions about the significance of diet for gut health in lean and obese subjects. Here we explored the impact of using different analysis strategies to study the gut microbiota in the context of diet, health and obesity. Methods: Over 15 million 16S rRNA gene sequences from published studies involving dietary interventions in obese laboratory rodents were analyzed. Three strategies were used to assign the 16S sequences to operational taxonomic units (OTUs) based on the Greengenes reference OTU sequence files clustered at 97% and 99% similarity. Results: Different OTU-selection strategies influenced the relative abundance of all bacterial taxa, but the magnitude of this effect depended strongly on the study. Within the same study, the relative abundance of a given taxon differed by up to 20% depending on the analysis strategy. Very few OTUs were shared among the samples. ANOSIM tests on unweighted UniFrac distances showed that study, sequencing technique, animal model, and dietary treatment (in that order) were the most important factors explaining the differences in bacterial communities. Except for obesity status, the contribution of diet and other factors to explaining the variability in bacterial communities was lower when using weighted UniFrac distances. Predicted functional profiles and high-level phenotypes of the microbiota showed that each study was associated with unique features and patterns. Conclusions: The results confirm previous findings showing a strong study effect on gut microbial composition and raise concerns about the impact of analysis strategies on the membership and composition of the gut microbiota.
This study may be helpful to guide future research aiming to investigate the relationship between diet, health, and the gut microbiota.


Author(s):  
Chloé Le Roy ◽  
Arabella Touati ◽  
Carla Balcon ◽  
Justine Garraud ◽  
Jean-Michel Molina ◽  
...  

Abstract Objectives: Tetracyclines are widely used for the treatment of bacterial sexually transmitted infections (STIs) and have recently been used successfully for post-exposure prophylaxis of STIs in men who have sex with men (MSM). We investigated the in vitro and in vivo development of tetracycline resistance in Chlamydia trachomatis and Mycoplasma genitalium and evaluated 16S rRNA mutations associated with acquired resistance in other bacteria. Methods: In vitro selection of resistant mutants of reference strains of C. trachomatis and M. genitalium was undertaken by serial passage in medium containing subinhibitory concentrations of tetracycline or doxycycline, respectively. The 16S rRNA gene of the two microorganisms was amplified and sequenced at different passages, as were those of 43 C. trachomatis- and 106 M. genitalium-positive specimens collected in France from 2013 to 2019. Results: No tetracycline- or doxycycline-resistant strains of C. trachomatis and M. genitalium, respectively, were obtained after 30 serial passages. The tetracycline and doxycycline MICs were unchanged, and analysis of the 16S rRNA gene, the molecular target of tetracyclines, revealed no mutation in either C. trachomatis or M. genitalium. No mutation in the 16S rRNA gene was detected in C. trachomatis-positive specimens. However, six M. genitalium-positive specimens harboured a mutation potentially associated with tetracycline resistance, although the patients had no known prior tetracycline treatment. Conclusions: Tetracyclines did not select resistant mutants of C. trachomatis or M. genitalium in vitro. However, 16S rRNA mutations either responsible for or associated with tetracycline resistance in other bacteria, including mycoplasma species, were identified in several M. genitalium-positive specimens.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Denise M. O’Sullivan ◽  
Ronan M. Doyle ◽  
Sasithon Temisak ◽  
Nicholas Redshaw ◽  
Alexandra S. Whale ◽  
...  

Abstract Despite the advent of whole-genome metagenomics, targeted approaches (such as 16S rRNA gene amplicon sequencing) continue to be valuable for determining the microbial composition of samples. Amplicon microbiome sequencing can be performed on clinical samples from a normally sterile site to determine the aetiology of an infection (usually single-pathogen identification) or on samples from more complex niches, such as human mucosa or environmental samples, where multiple microorganisms need to be identified. The methodologies are frequently applied to determine both the presence of microorganisms and their quantity or relative abundance. Microbial community profiling requires a number of technical steps, many of which may introduce appreciable imprecision and bias that affect the final results. For these methods to be applied with the greatest accuracy, comparative studies across different laboratories are warranted. In this study we explored the impact of the bioinformatic approaches taken in different laboratories on microbiome assessment using 16S rRNA gene amplicon sequencing results. Data were generated from two mock microbial community samples, which were amplified using primer sets spanning five different variable regions of 16S rRNA genes. The PCR-sequencing analysis included three technical repeats of the process to determine its repeatability. Thirteen laboratories participated in the study, and each analysed the same FASTQ files using its pipeline of choice. This study captured the methods used and the resulting sequence annotation and relative abundance output from the bioinformatic analyses. Results were compared with digital PCR assessment of the absolute abundance of each target representing each organism in the mock microbial community samples, and also with analyses of shotgun metagenome sequence data.
This ring trial demonstrates that the choice of bioinformatic analysis pipeline alone can result in different estimations of the composition of the microbiome when using 16S rRNA gene amplicon sequencing data. The study observed differences in terms of both presence and abundance of organisms and provides a resource for ensuring reproducible pipeline development and application. The observed differences were especially prevalent when using custom databases and applying high stringency operational taxonomic unit (OTU) cut-off limits. In order to apply sequencing approaches with greater accuracy, the impact of different analytical steps needs to be clearly delineated and solutions devised to harmonise microbiome analysis results.
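One simple way to express such a comparison is a per-organism log2 ratio between the relative abundance reported by a pipeline and the relative abundance implied by digital PCR copy numbers, together with a list of taxa that a pipeline reports but that are not in the mock community. This is a minimal sketch that assumes dPCR copies can be normalised directly to expected proportions; the organism names and numbers are placeholders, and the ring trial's own comparison metrics may differ.

```python
import math
from typing import Dict, List


def to_relative(abundance: Dict[str, float]) -> Dict[str, float]:
    """Normalise counts or copy numbers so that they sum to 1."""
    total = sum(abundance.values())
    return {taxon: value / total for taxon, value in abundance.items()}


def log2_bias(observed_reads: Dict[str, float], dpcr_copies: Dict[str, float]) -> Dict[str, float]:
    """log2(observed / expected) for each organism in the mock community.

    Returns -inf for an expected organism the pipeline failed to report.
    """
    observed = to_relative(observed_reads)  # unexpected taxa count towards the total
    expected = to_relative(dpcr_copies)
    return {
        taxon: math.log2(observed[taxon] / share) if observed.get(taxon, 0) > 0 else -math.inf
        for taxon, share in expected.items()
    }


def unexpected_taxa(observed_reads: Dict[str, float], dpcr_copies: Dict[str, float]) -> List[str]:
    """Taxa reported by the pipeline that are absent from the mock community."""
    return sorted(set(observed_reads) - set(dpcr_copies))


# Placeholder numbers: dPCR target copies and one pipeline's read counts.
dpcr = {"organism_A": 5000, "organism_B": 3000, "organism_C": 2000}
pipeline_reads = {"organism_A": 600, "organism_B": 300, "contaminant_X": 100}

print(log2_bias(pipeline_reads, dpcr))
print(unexpected_taxa(pipeline_reads, dpcr))
```

Separating abundance bias (the log2 ratios) from presence errors (missed organisms at -inf, extra taxa in the second list) mirrors the two kinds of differences the ring trial reports: presence and abundance.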


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Christine Drengenes ◽  
Tomas M. L. Eagan ◽  
Ingvild Haaland ◽  
Harald G. Wiker ◽  
Rune Nielsen

Abstract Background: Studies on the airway microbiome have been performed using a wide range of laboratory protocols for high-throughput sequencing of the bacterial 16S ribosomal RNA (16S rRNA) gene. We sought to determine the impact of the number of polymerase chain reaction (PCR) steps (1- or 2-step) and the choice of target marker gene region (V3-V4 and V4) on the presentation of the upper and lower airway microbiome. Our analyses included Illumina MiSeq sequencing following three setups: Setup 1 (2-step PCR; V3-V4 region), Setup 2 (2-step PCR; V4 region), and Setup 3 (1-step PCR; V4 region). Samples included oral wash, protected specimen brushes and protected bronchoalveolar lavage (from healthy individuals and individuals with obstructive lung disease), and negative controls. Results: The number of sequences and amplicon sequence variants (ASVs) decreased in the order Setup 1 > Setup 2 > Setup 3. This trend appeared to be associated with increased taxonomic resolution when sequencing the V3-V4 region (Setup 1) and an increased number of small ASVs in Setups 1 and 2. The latter was considered a result of contamination in the two-step PCR protocols as well as of sequencing across multiple runs (Setup 1). Although the genera Streptococcus, Prevotella, Veillonella and Rothia dominated, differences in relative abundance were observed across all setups. Analyses of beta-diversity revealed that while oral wash samples (high biomass) clustered together regardless of the number of PCR steps, samples from the lungs (low biomass) separated. The removal of contaminants identified using the Decontam package in R did not resolve differences in results between sequencing setups. Conclusions: Differences in the number of PCR steps will have an impact on final bacterial community descriptions, more so for samples with a low bacterial load.
Our findings could not be explained by differences in contamination levels alone, and more research is needed to understand how variations in PCR-setups and reagents may be contributing to the observed protocol bias.


2013 ◽  
Vol 167 (4) ◽  
pp. 393-403 ◽  
Author(s):  
Jung Soh ◽  
Xiaoli Dong ◽  
Sean M. Caffrey ◽  
Gerrit Voordouw ◽  
Christoph W. Sensen

2018 ◽  
Vol 7 (14) ◽  
Author(s):  
Kyunghoi Kim

Deterioration of sediment quality has been found in the Nakdong River Estuary after large-scale reclamations. Here, I report microbial diversity in sediments of the Nakdong River Estuary in the Republic of Korea based on 16S rRNA gene sequencing using next-generation sequencing (NGS) techniques.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12097
Author(s):  
Yaowanoot Promnuan ◽  
Saran Promsai ◽  
Wasu Pathom-aree ◽  
Sujinan Meelai

This study aimed to investigate cultivable actinomycetes associated with rare honey bee species in Thailand and their antagonistic activity against plant pathogenic bacteria. Actinomycetes were selectively isolated from the black dwarf honey bee (Apis andreniformis). A total of 64 actinomycete isolates were obtained, with Streptomyces as the predominant genus (84.4%), followed by Micromonospora (7.8%), Nonomuraea (4.7%) and Actinomadura (3.1%). All isolates were screened for antimicrobial activity against Xanthomonas campestris pv. campestris, Pectobacterium carotovorum and Pseudomonas syringae pv. sesami. Three isolates inhibited the growth of X. campestris pv. campestris during in vitro screening. The crude extracts of two isolates (ASC3-2 and ASC5-7P) had a minimum inhibitory concentration (MIC) of 128 mg L−1 against X. campestris pv. campestris. The crude extract of isolate ACZ2-27 showed a stronger inhibitory effect, with a lower MIC of 64 mg L−1 against X. campestris pv. campestris. These three active isolates were identified as members of the genus Streptomyces based on their 16S rRNA gene sequences. Phylogenetic analysis based on the maximum likelihood algorithm showed that isolates ACZ2-27, ASC3-2 and ASC5-7P were closely related to Streptomyces misionensis NBRC 13063T (99.71%), Streptomyces cacaoi subsp. cacaoi NBRC 12748T (100%) and Streptomyces puniceus NBRC 12811T (100%), respectively. In addition, representative isolates from non-Streptomyces groups were identified by 16S rRNA gene sequence analysis and showed high similarity to members of the genera Actinomadura, Micromonospora and Nonomuraea. Our study provides evidence of actinomycetes, including members of rare genera, associated with the black dwarf honey bee. The antimicrobial potential of these insect-associated Streptomyces was also demonstrated, especially their antibacterial activity against phytopathogenic bacteria.

