Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Jethro S. Johnson ◽  
Daniel J. Spakowicz ◽  
Bo-Young Hong ◽  
Lauren M. Petersen ◽  
Patrick Demkowicz ◽  
...  

Abstract The 16S rRNA gene has been a mainstay of sequence-based bacterial analysis for decades. However, high-throughput sequencing of the full gene has only recently become a realistic prospect. Here, we use in silico and sequence-based experiments to critically re-evaluate the potential of the 16S gene to provide taxonomic resolution at species and strain level. We demonstrate that targeting of 16S variable regions with short-read sequencing platforms cannot achieve the taxonomic resolution afforded by sequencing the entire (~1500 bp) gene. We further demonstrate that full-length sequencing platforms are sufficiently accurate to resolve subtle nucleotide substitutions (but not insertions/deletions) that exist between intragenomic copies of the 16S gene. In consequence, we argue that modern analysis approaches must necessarily account for intragenomic variation between 16S gene copies. In particular, we demonstrate that appropriate treatment of full-length 16S intragenomic copy variants has the potential to provide taxonomic resolution of bacterial communities at species and strain level.
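The resolution argument in this abstract can be illustrated with a toy in silico comparison. The sketch below is not the authors' analysis: the sequences, taxon names, and region coordinates are invented for illustration, and `distinct_groups` is a hypothetical helper. It shows only how taxa that differ outside a targeted sub-region collapse into one group when that sub-region alone is compared.

```python
def distinct_groups(seqs, start=None, end=None):
    """Group taxon names whose (sub)sequences are identical.

    seqs  : dict mapping taxon name -> sequence string
    start : slice start of the targeted region (None = whole sequence)
    end   : slice end of the targeted region (None = whole sequence)
    """
    groups = {}
    for name, seq in seqs.items():
        key = seq[start:end]  # the part of the gene that was "sequenced"
        groups.setdefault(key, []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy sequences (hypothetical): they differ only at the first or last base.
toy = {
    "taxon_A": "ACGTACGTAAGGCCTT",
    "taxon_B": "ACGTACGTAAGGCCTA",
    "taxon_C": "TCGTACGTAAGGCCTT",
}

full = distinct_groups(toy)           # compare the whole sequence
region = distinct_groups(toy, 4, 12)  # compare an internal sub-region only

print(len(full), "groups from the full sequence")
print(len(region), "group(s) from the sub-region")
```

With real data the sub-region would be defined by primer binding sites rather than fixed slice positions, and near-identical sequences would usually be clustered at an identity threshold instead of being compared for exact equality.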

2021 ◽  
Author(s):  
Yuta Kinoshita ◽  
Hidekazu Niwa ◽  
Eri Uchida-Fujii ◽  
Toshio Nukada

Abstract Microbial communities are commonly studied by using amplicon sequencing of part of the 16S rRNA gene. Sequencing of the full-length 16S rRNA gene can provide higher taxonomic resolution and accuracy. To obtain even higher taxonomic resolution, with as few false-positives as possible, we assessed a method using long amplicon sequencing targeting the rRNA operon combined with a CCMetagen pipeline. Taxonomic assignment had >90% accuracy at the species level in a mock sample and at the family level in equine fecal samples, generating similar taxonomic composition as shotgun sequencing. The rRNA operon amplicon sequencing of equine fecal samples underestimated compositional percentages of bacterial strains containing unlinked rRNA genes by a third to almost a half, but unlinked rRNA genes had a limited effect on the overall results. The rRNA operon amplicon sequencing with the A519F + U2428R primer set was able to reflect archaeal genomes, whereas full-length 16S rRNA with 27F + 1492R could not. Therefore, we conclude that amplicon sequencing targeting the rRNA operon captures more detailed variations of bacterial and archaeal microbiota.


2014 ◽  
Author(s):  
Catherine Burke ◽  
Aaron E Darling

We describe a method for sequencing full-length 16S rRNA gene amplicons using the high throughput Illumina MiSeq platform. The resulting sequences have about 100-fold higher accuracy than standard Illumina reads and are chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. We demonstrate that the data provides fine scale phylogenetic resolution not available from Illumina amplicon methods targeting smaller variable regions of the 16S rRNA gene.


Author(s):  
Patrick D Schloss ◽  
Matthew L Jenior ◽  
Charles C. Koumpouras ◽  
Sarah L Westcott ◽  
Sarah K Highlander

Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality, but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.
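To make the error-rate figures concrete, here is a minimal sketch of how a per-base error rate can be estimated from mock-community reads when the true reference sequence for each read is known. It assumes one common definition (substitutions, insertions, and deletions summed over all reads, divided by the total reference length) and uses a plain edit-distance routine; the abstract does not spell out the authors' exact calculation, so `edit_distance` and `error_rate` are illustrative names and not their pipeline.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion from a
                cur[j - 1] + 1,            # insertion into a
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def error_rate(pairs):
    """pairs: iterable of (read, reference) strings. Returns errors per base."""
    pairs = list(pairs)
    errors = sum(edit_distance(read, ref) for read, ref in pairs)
    bases = sum(len(ref) for _, ref in pairs)
    return errors / bases


# Toy example (invented reads): one substitution across 20 reference bases.
toy_pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),
    ("ACGTTCGTAC", "ACGTACGTAC"),
]
print(f"{100 * error_rate(toy_pairs):.2f}% error")

# The reduction reported above, 0.69% to 0.027%, is roughly a 25-fold drop.
print(f"{0.69 / 0.027:.1f}-fold reduction")
```

The quadratic edit-distance loop is fine for a handful of ~1500 bp reads but too slow for full datasets, where an alignment library would normally be used.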



2018 ◽  
Vol 84 (7) ◽  
Author(s):  
Jolinda Pollock ◽  
Laura Glendinning ◽  
Trong Wisedchanwet ◽  
Mick Watson

Abstract The development and continuous improvement of high-throughput sequencing platforms have stimulated interest in the study of complex microbial communities. Currently, the most popular sequencing approach to study microbial community composition and dynamics is targeted 16S rRNA gene metabarcoding. To prepare samples for sequencing, there are a variety of processing steps, each with the potential to introduce bias at the data analysis stage. In this short review, key information from the literature pertaining to each processing step is described, and consequently, general recommendations for future 16S rRNA gene metabarcoding experiments are made.


2018 ◽  
Author(s):  
Szymon T Calus ◽  
Umer Z Ijaz ◽  
Ameet J Pinto

Abstract Background: Amplicon sequencing on Illumina sequencing platforms leverages their deep sequencing and multiplexing capacity, but is limited in genetic resolution due to short read lengths. While Oxford Nanopore or Pacific Biosciences platforms overcome this limitation, their application has been limited due to higher error rates or smaller data output. Results: In this study, we introduce an amplicon sequencing workflow, NanoAmpli-Seq, that builds on the Intramolecular-ligated Nanopore Consensus Sequencing (INC-Seq) approach, and demonstrate its application for full-length 16S rRNA gene sequencing. NanoAmpli-Seq includes vital improvements to the aforementioned protocol that reduce sample-processing time while significantly improving sequence accuracy. The developed protocol includes the chopSeq software for fragmentation and read orientation correction of INC-Seq consensus reads, while the nanoClust algorithm was designed for read partitioning-based de novo clustering and within-cluster consensus calling to obtain full-length 16S rRNA gene sequences. Conclusions: NanoAmpli-Seq accurately estimates the diversity of tested mock communities with an average sequence accuracy of 99.5% for 2D and 1D2 sequencing on the nanopore sequencing platform. Nearly all residual errors in NanoAmpli-Seq sequences originate from deletions in homopolymer regions, indicating that homopolymer-aware basecalling or error correction may allow for sequencing accuracy comparable to short-read sequencing platforms.
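The idea behind within-cluster consensus calling can be sketched with a simple per-column majority vote. This is a generic illustration and not the nanoClust implementation: it assumes the reads in a cluster have already been aligned to equal length with `-` marking gaps, and `majority_consensus` is a hypothetical helper name.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the majority-vote consensus of pre-aligned, equal-length reads.

    Columns where the most frequent symbol is a gap ('-') are dropped.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):  # iterate over alignment columns
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Toy cluster (invented reads): one read has a deletion, one a substitution.
cluster = ["ACGTTA", "ACG-TA", "ACGTTA", "TCGTTA"]
print(majority_consensus(cluster))
```

A plain majority vote corrects random errors but not systematic ones: if most reads in a cluster share the same homopolymer deletion, the consensus inherits it, which is consistent with the residual error pattern the abstract reports.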


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1869 ◽  
Author(s):  
Patrick D. Schloss ◽  
Matthew L. Jenior ◽  
Charles C. Koumpouras ◽  
Sarah L. Westcott ◽  
Sarah K. Highlander

Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3–V5, V1–V3, V1–V5, V1–V6, and V1–V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1–V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina’s MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.


2017 ◽  
Author(s):  
Anna Cuscó ◽  
Joaquim Viñes ◽  
Sara D’Andreano ◽  
Francesca Riva ◽  
Joaquim Casellas ◽  
...  

Abstract The most common strategy to assess microbiota is sequencing specific hypervariable regions of the 16S rRNA gene using 2nd generation platforms (such as MiSeq or Ion Torrent PGM). Despite obtaining high-quality reads, many sequences fail to be classified at the genus or species levels due to their short length. This pitfall can be overcome by sequencing the full-length 16S rRNA gene (1,500 bp) with 3rd generation sequencers. We aimed to assess the performance of nanopore sequencing using MinION™ in characterizing complex microbiota samples. A first set-up step was performed using a staggered mock community (HM-783D). Then, we sequenced a pool of several dog skin microbiota samples previously sequenced by Ion Torrent PGM. Sequences obtained for full-length 16S rRNA with degenerate primers retrieved increased richness estimates at high taxonomic level (Bacteria and Archaea) that were missed with short reads. In addition, we were able to obtain taxonomic assignments down to species level, although this was not always feasible due to: i) an incomplete database; ii) the primer set chosen; iii) low taxonomic resolution of the 16S rRNA gene within some genera; and/or iv) sequencing errors. Nanopore sequencing of the full-length 16S rRNA gene using MinION™ with the 1D sequencing kit allowed us to infer the microbiota composition of a complex microbial community to lower taxonomic levels than short reads from 2nd generation sequencers.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yuta Kinoshita ◽  
Hidekazu Niwa ◽  
Eri Uchida-Fujii ◽  
Toshio Nukada

Abstract Microbial communities are commonly studied by using amplicon sequencing of part of the 16S rRNA gene. Sequencing of the full-length 16S rRNA gene can provide higher taxonomic resolution and accuracy. To obtain even higher taxonomic resolution, with as few false-positives as possible, we assessed a method using long amplicon sequencing targeting the rRNA operon combined with a CCMetagen pipeline. Taxonomic assignment had > 90% accuracy at the species level in a mock sample and at the family level in equine fecal samples, generating similar taxonomic composition as shotgun sequencing. The rRNA operon amplicon sequencing of equine fecal samples underestimated compositional percentages of bacterial strains containing unlinked rRNA genes by a fourth to a third, but unlinked rRNA genes had a limited effect on the overall results. The rRNA operon amplicon sequencing with the A519F + U2428R primer set was able to detect some archaeal genomes, such as Methanobacteriales and Methanomicrobiales, whereas full-length 16S rRNA with 27F + 1492R could not. Therefore, we conclude that amplicon sequencing targeting the rRNA operon captures more detailed variations of equine microbiota.

