Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high-quality but short sequences. The added sequencing depth has brought significant improvements in experimental design. The tradeoff has been a decline in the number of full-length reference sequences deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3–V5, V1–V3, V1–V5, V1–V6, and V1–V9 variable regions of the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1–V9 region from 0.69% to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by the 454 and Illumina MiSeq sequencing platforms. Although the per-base sequencing cost is still significantly higher than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection of the Sanger approach is exciting.
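The abstract reports an observed error rate measured against a mock community whose true sequences are known. As a rough illustration of that idea, the sketch below counts edits between each read and its reference and divides by the total number of reference bases. The abstract does not give the authors' exact definition, so the use of Levenshtein distance, the choice of denominator, the function names, and the toy sequences are all assumptions made for illustration.

```python
def edit_distance(read: str, reference: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions each cost 1."""
    previous = list(range(len(reference) + 1))
    for i, read_base in enumerate(read, start=1):
        current = [i]
        for j, ref_base in enumerate(reference, start=1):
            substitution = previous[j - 1] + (read_base != ref_base)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def error_rate(reads_and_refs):
    """Total edits divided by total reference bases (an assumed definition)."""
    total_errors = sum(edit_distance(read, ref) for read, ref in reads_and_refs)
    total_bases = sum(len(ref) for _, ref in reads_and_refs)
    return total_errors / total_bases


# Hypothetical toy data: each read is paired with its known mock-community reference.
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # error-free read
    ("ACGTTCGTAC", "ACGTACGTAC"),  # read with one substitution
]
rate = error_rate(pairs)
print(f"{rate:.2%}")  # 1 edit over 20 reference bases -> 5.00%
```

Curation steps such as quality-score filtering would be evaluated by recomputing this rate on the reads that survive each step.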

Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

10.7287/peerj.preprints.778v2 ◽

2016 ◽

Cited By ~ 2

Author(s):

Patrick D Schloss ◽

Matthew L Jenior ◽

Charles C. Koumpouras ◽

Sarah L Westcott ◽

Sarah K Highlander

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

DNA Sequencing ◽

Error Rate ◽

Full Length ◽

rRNA Genes ◽

rRNA Gene ◽

Mock Community ◽

Sequencing Platforms ◽

The 16S rRNA Gene

Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

10.7287/peerj.preprints.778 ◽

2016 ◽

Author(s):

Patrick D Schloss ◽

Matthew L Jenior ◽

Charles C. Koumpouras ◽

Sarah L Westcott ◽

Sarah K Highlander

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

DNA Sequencing ◽

Error Rate ◽

Full Length ◽

rRNA Genes ◽

rRNA Gene ◽

Mock Community ◽

Sequencing Platforms ◽

The 16S rRNA Gene

Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

10.7287/peerj.preprints.778v1 ◽

2015 ◽

Cited By ~ 5

Author(s):

Patrick D Schloss ◽

Sarah L Westcott ◽

Matthew L Jenior ◽

Sarah K Highlander

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

DNA Sequencing ◽

Error Rate ◽

Sequencing Error ◽

rRNA Gene ◽

Sequencing Data ◽

Mock Community ◽

Sequencing Platforms ◽

The 16S rRNA Gene

Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high-quality but short sequences and to significantly improve the design of their experiments. The tradeoff has been a decline in the number of full-length reference sequences deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V6, and V1-V9 variable regions of the 16S rRNA gene from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The synthetic mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 2.16% to 0.32%. Unfortunately, this error rate was still 16 times higher than the error rate that has been observed for the shorter reads generated by the 454 and Illumina MiSeq sequencing platforms. Although the longer reads frequently provided better classification, wider adoption of this approach for 16S rRNA gene sequencing is likely limited by its high sequencing error rate and low yield of sequencing data relative to the other available platforms.

Species-level bacterial community profiling of the healthy sinonasal microbiome using Pacific Biosciences sequencing of full-length 16S rRNA genes

10.1101/338731 ◽

2018 ◽

Cited By ~ 2

Author(s):

Joshua P. Earl ◽

Nithin D. Adappa ◽

Jaroslaw Krol ◽

Archana S. Bhat ◽

Sergey Balashov ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Species Level ◽

Full Length ◽

16S rRNA Genes ◽

rRNA Genes ◽

rRNA Gene ◽

Mock Community ◽

Analysis Pipeline ◽

Pacific Biosciences

Background: Pan-bacterial 16S rRNA microbiome surveys performed with massively parallel DNA sequencing technologies have transformed community microbiological studies. Current 16S profiling methods, however, fail to provide sufficient taxonomic resolution and accuracy to adequately perform species-level associative studies for specific conditions. This is due to the amplification and sequencing of only short 16S rRNA gene regions, typically providing for only family- or genus-level taxonomy. Moreover, sequencing errors often inflate the number of taxa present. Pacific Biosciences’ (PacBio’s) long-read technology in particular suffers from high error rates per base. Herein we present a microbiome analysis pipeline that takes advantage of PacBio circular consensus sequencing (CCS) technology to sequence and error correct full-length bacterial 16S rRNA genes, which provides high-fidelity species-level microbiome data. Results: Analysis of a mock community with 20 bacterial species demonstrated 100% specificity and sensitivity. Examination of a 250-plus species mock community demonstrated correct species-level classification of >90% of taxa, and relative abundances were accurately captured. The majority of the remaining taxa were demonstrated to be multiply, incorrectly, or incompletely classified. Using this methodology, we examined the microgeographic variation present among the microbiomes of six sinonasal sites, by both swab and biopsy, from the anterior nasal cavity to the sphenoid sinus from 12 subjects undergoing trans-sphenoidal hypophysectomy. We found greater variation among subjects than among sites within a subject, although significant within-individual differences were also observed. Propionibacterium acnes (recently renamed Cutibacterium acnes [1]) was the predominant species throughout, but was found at distinct relative abundances by site. Conclusions: Our microbial composition analysis pipeline for single-molecule real-time 16S rRNA gene sequencing (MCSMRT, https://github.com/jpearl01/mcsmrt) overcomes deficits of standard marker gene based microbiome analyses by using CCS of entire 16S rRNA genes to provide increased taxonomic and phylogenetic resolution. Extensions of this approach to other marker genes could help refine taxonomic assignments of microbial species and improve reference databases, as well as strengthen the specificity of associations between microbial communities and dysbiotic states.
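The sensitivity and specificity figures reported for the 20-species mock community can be illustrated with a small set-based calculation. This is only a sketch of the standard definitions, not the MCSMRT pipeline: the abstract does not say how the authors defined the pool of possible negatives, so the `candidates` argument, the function name, and the species lists below are hypothetical.

```python
def sensitivity_specificity(expected, detected, candidates):
    """Compare detected species with the known mock-community composition.

    expected   -- species truly present in the mock community
    detected   -- species reported by the analysis
    candidates -- every species that could have been reported (assumed pool)
    Assumes at least one expected species and at least one truly absent candidate.
    """
    expected, detected, candidates = set(expected), set(detected), set(candidates)
    true_pos = len(expected & detected)
    false_neg = len(expected - detected)
    false_pos = len(detected - expected)
    true_neg = len(candidates - expected - detected)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical example: three species in the mock community, five candidates overall.
candidates = ["Escherichia coli", "Staphylococcus aureus", "Bacillus cereus",
              "Listeria monocytogenes", "Pseudomonas aeruginosa"]
expected = ["Escherichia coli", "Staphylococcus aureus", "Bacillus cereus"]

perfect = sensitivity_specificity(expected, expected, candidates)
imperfect = sensitivity_specificity(
    expected,
    ["Escherichia coli", "Staphylococcus aureus", "Listeria monocytogenes"],
    candidates,
)
print(perfect)    # every expected species found, nothing extra reported
print(imperfect)  # one species missed and one false positive
```

A result of 100% for both measures, as in the abstract, corresponds to the `perfect` case: the detected set equals the expected set.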

Accurate Determination of Bacterial Abundances in Human Metagenomes Using Full-length 16S Sequencing Reads

10.1101/228619 ◽

2017 ◽

Cited By ~ 2

Author(s):

Fanny Perraudeau ◽

Sandrine Dudoit ◽

James H. Bullard

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

DNA Sequencing ◽

Accurate Determination ◽

Full Length ◽

Sequencing Error ◽

Marker Genes ◽

rRNA Gene ◽

Chain Model ◽

The 16S rRNA Gene

DNA sequencing of PCR-amplified marker genes, especially but not limited to the 16S rRNA gene, is perhaps the most common approach for profiling microbial communities. Due to technological constraints of commonly available DNA sequencing, these approaches usually take the form of short reads sequenced from a narrow, targeted variable region, with a corresponding loss of taxonomic resolution relative to the full-length marker gene. We use Pacific Biosciences single-molecule, real-time circular consensus sequencing to sequence amplicons spanning the entire length of the 16S rRNA gene. However, this sequencing technology suffers from a high sequencing error rate that needs to be addressed in order to take full advantage of the longer sequence. Here, we present a method to model the sequencing error process using a generalized pair hidden Markov chain model and estimate bacterial abundances in microbial samples. We demonstrate, with simulated and real data, that our model and its associated estimation procedure give accurate estimates at the species (or subspecies) level and are more flexible than existing methods like SImple Non-Bayesian TAXonomy (SINTAX).

Amplicon sequence variants artificially split bacterial genomes into separate clusters

10.1101/2021.02.26.433139 ◽

2021 ◽

Author(s):

Patrick D. Schloss

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Bacterial Genome ◽

Full Length ◽

16S rRNA Genes ◽

rRNA Genes ◽

rRNA Gene ◽

Bacterial Genomes ◽

A Genome ◽

The 16S rRNA Gene

Amplicon sequencing variants (ASVs) have been proposed as an alternative to operational taxonomic units (OTUs) for analyzing microbial communities. ASVs have grown in popularity, in part, because of a desire to reflect a more refined level of taxonomy since they do not cluster sequences based on a distance-based threshold. However, ASVs and the use of overly narrow thresholds to identify OTUs increase the risk of splitting a single genome into separate clusters. To assess this risk, I analyzed the intragenomic variation of 16S rRNA genes from the bacterial genomes represented in an rrn copy number database, which contained 20,427 genomes from 5,972 species. As the number of copies of the 16S rRNA gene increased in a genome, the number of ASVs also increased. There was an average of 0.58 ASVs per copy of the 16S rRNA gene for full-length 16S rRNA genes. It was necessary to use a distance threshold of 5.25% to cluster full-length ASVs from the same genome into a single OTU with 95% confidence for genomes with 7 copies of the 16S rRNA gene, such as E. coli. This research highlights the risk of splitting a single bacterial genome into separate clusters when ASVs are used to analyze 16S rRNA gene sequence data. Although there is also a risk of clustering ASVs from different species into the same OTU when using broad distance thresholds, those risks are of less concern than artificially splitting a genome into separate ASVs and OTUs.
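The splitting risk described in this abstract can be illustrated on a single genome: count the distinct 16S rRNA gene copies (its ASVs) and find the smallest distance threshold that would still place them all in one OTU. The sketch below is not the author's analysis. It assumes pre-aligned copies of equal length, uses the fraction of mismatched positions as the distance, and assumes a furthest-neighbor style rule in which every pair of ASVs must fall within the threshold. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def summarize_genome(copies):
    """Count ASVs in one genome and the threshold needed to merge them into one OTU."""
    asvs = sorted(set(copies))  # identical copies collapse into a single ASV
    max_distance = max(
        (mismatch_fraction(a, b) for a, b in combinations(asvs, 2)),
        default=0.0,  # a single ASV needs no merging
    )
    return {
        "copies": len(copies),
        "asvs": len(asvs),
        "asvs_per_copy": len(asvs) / len(copies),
        "min_threshold": max_distance,
    }


# Hypothetical genome with 7 copies: five identical and two single-base variants.
base = "ACGTACGTACGTACGTACGT"
variant_1 = "ACGTACGTACGTACGTACGA"  # differs from base at the last position
variant_2 = "TCGTACGTACGTACGTACGT"  # differs from base at the first position
copies = [base] * 5 + [variant_1, variant_2]

result = summarize_genome(copies)
print(result)  # 3 ASVs from 7 copies; the two variants differ at 2 of 20 positions
```

Any threshold below `min_threshold` would leave this toy genome split across more than one OTU, and treating each ASV separately splits it into three.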

Clover proliferation phytoplasma: ‘Candidatus Phytoplasma trifolii’

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY ◽

10.1099/ijs.0.02842-0 ◽

2004 ◽

Vol 54 (4) ◽

pp. 1349-1353 ◽

Cited By ~ 51

Author(s):

Chuji Hiruki ◽

Keri Wang

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Reference Strain ◽

16S rRNA Genes ◽

rRNA Genes ◽

rRNA Gene ◽

Beet Leafhopper ◽

Signature Sequences ◽

Elm Yellows ◽

The 16S rRNA Gene

Clover proliferation phytoplasma (CPR) is designated as the reference strain for the CP phylogenetic group or subclade, on the basis of molecular analyses of genomic DNA, the 16S rRNA gene and the 16S–23S spacer region. Other strains related to CPR include alfalfa witches'-broom (AWB), brinjal little leaf (BLL), beet leafhopper-transmitted virescence (BLTV), Illinois elm yellows (ILEY), potato witches'-broom (PWB), potato yellows (PY), tomato big bud in California (TBBc) and phytoplasmas from Fragaria multicipita (FM). Phylogenetic analysis of the 16S rRNA gene sequences of BLL, CPR, FM and ILEY, together with sequences from 16 other phytoplasmas that belong to the ash yellows (AshY), jujube witches'-broom (JWB) and elm yellows (EY) groups that were available in GenBank, produced a tree on which these phytoplasmas clearly clustered as a discrete group. Three subgroups have been classified on the basis of sequence homology and the collective RFLP patterns of amplified 16S rRNA genes. AWB, BLTV, PWB and TBBc are assigned to taxonomic subgroup CP-A, FM belongs to subgroup CP-B and BLL and ILEY are assigned to subgroup CP-C. Genetic heterogeneity between different isolates of AWB, CPR and PWB has been observed from heteroduplex mobility assay analysis of amplified 16S rRNA genes and the 16S–23S spacer region. Two unique signature sequences that can be utilized to distinguish the CP group from others were present. On the basis of unique properties of the DNA from clover proliferation phytoplasma, the name ‘Candidatus Phytoplasma trifolii’ is proposed for the CP group.

Characterization of polybacterial clinical samples using a set of group-specific broad-range primers targeting the 16S rRNA gene followed by DNA sequencing and RipSeq analysis

Journal of Medical Microbiology ◽

10.1099/jmm.0.028373-0 ◽

2011 ◽

Vol 60 (7) ◽

pp. 927-936 ◽

Cited By ~ 26

Author(s):

Øyvind Kommedal ◽

Katrine Lekang ◽

Nina Langeland ◽

Harald G. Wiker

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

DNA Sequencing ◽

Clinical Samples ◽

rRNA Gene ◽

The 16S rRNA Gene

A method for high precision sequencing of near full-length 16S rRNA genes on an Illumina MiSeq

PeerJ ◽

10.7717/peerj.2492 ◽

2016 ◽

Vol 4 ◽

pp. e2492 ◽

Cited By ~ 29

Author(s):

Catherine M. Burke ◽

Aaron E. Darling

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

High Throughput ◽

Single Molecule ◽

Illumina MiSeq ◽

Full Length ◽

16S rRNA Genes ◽

rRNA Genes ◽

rRNA Gene ◽

Bacterial Taxonomy

Background: The bacterial 16S rRNA gene has historically been used in defining bacterial taxonomy and phylogeny. However, there are currently no high-throughput methods to sequence full-length 16S rRNA genes present in a sample with precision. Results: We describe a method for sequencing near full-length 16S rRNA gene amplicons using the high-throughput Illumina MiSeq platform and test it using DNA from human skin swab samples. Proof of principle of the approach is demonstrated, with the generation of 1,604 sequences greater than 1,300 nt from a single Nano MiSeq run, with accuracy estimated to be 100-fold higher than standard Illumina reads. The reads were chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. Conclusions: This method could be scaled up to generate many thousands of sequences per MiSeq run and could be applied to other sequencing platforms. This has great potential for populating databases with high-quality, near full-length 16S rRNA gene sequences from under-represented taxa and environments and facilitates analyses of microbial communities at higher resolution.

Overestimation of Streptococcus mutans prevalence by nested PCR detection of the 16S rRNA gene

Journal of Medical Microbiology ◽

10.1099/jmm.0.46280-0 ◽

2006 ◽

Vol 55 (1) ◽

pp. 109-113 ◽

Cited By ~ 14

Author(s):

Ali Al-Ahmad ◽

Thorsten Mathias Auschill ◽

Gabriele Braun ◽

Elmar Hellwig ◽

Nicole Birgit Arweiler

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Streptococcus mutans ◽

Nested PCR ◽

16S rRNA Genes ◽

rRNA Genes ◽

rRNA Gene ◽

Direct PCR ◽

Specific Primers ◽

The 16S rRNA Gene

This study compared two PCR-based methods for the detection of Streptococcus mutans. The first PCR method was based on primers for the 16S rRNA gene and the second method was based on specific primers that targeted the glucosyltransferase gene (gtfB). Each PCR was performed with eight different streptococci from the viridans group, five other streptococci and 17 different non-streptococcal bacterial strains. In direct PCR, the S. mutans 16S rRNA gene-specific primers also detected Streptococcus gordonii and Streptococcus infantis. After amplifying the 16S rRNA gene with universal primers and subsequently performing nested PCR, the S. mutans-specific nested primers based on the 16S rRNA gene detected all tested streptococci. There was no cross-reaction of the gtfB primers after direct PCR. Our results indicate that direct PCR and nested PCR based on 16S rRNA genes can yield false-positive results for oral streptococci and lead to an overestimation of the prevalence of S. mutans with regard to its role as the most prevalent causative agent of dental caries.
