Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform

2013 ◽  
Vol 79 (17) ◽  
pp. 5112-5120 ◽  
Author(s):  
James J. Kozich ◽  
Sarah L. Westcott ◽  
Nielson T. Baxter ◽  
Sarah K. Highlander ◽  
Patrick D. Schloss

ABSTRACT Rapid advances in sequencing technology have changed the experimental landscape of microbial ecology. In the last 10 years, the field has moved from sequencing hundreds of 16S rRNA gene fragments per study using clone libraries to the sequencing of millions of fragments per study using next-generation sequencing technologies from 454 and Illumina. As these technologies advance, it is critical to assess the strengths, weaknesses, and overall suitability of these platforms for the interrogation of microbial communities. Here, we present an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads. We evaluated three overlapping regions of the 16S rRNA gene that vary in length (i.e., V34, V4, and V45) by resequencing a mock community and natural samples from human feces, mouse feces, and soil. By titrating the concentration of 16S rRNA gene amplicons applied to the flow cell and using a quality score-based approach to correct discrepancies between reads used to construct contigs, we were able to reduce error rates by as much as two orders of magnitude. Finally, we reprocessed samples from a previous study to demonstrate that large numbers of samples could be multiplexed and sequenced in parallel with shotgun metagenomes. These analyses demonstrate that our approach can provide data that are at least as good as those generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost.
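
The quality score-based correction mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the forward read and the reverse-complemented reverse read have already been aligned over their overlap and are supplied as equal-length strings with per-base Phred scores. The function name `merge_overlap` and the threshold `min_delta=6` are illustrative assumptions, not details stated in the abstract.

```python
def merge_overlap(fwd, fwd_q, rev, rev_q, min_delta=6):
    """Build a consensus over the overlap of two pre-aligned reads.

    fwd, rev     : equal-length base strings covering the same positions
    fwd_q, rev_q : per-base Phred quality scores (lists of ints)
    min_delta    : hypothetical minimum quality gap needed to trust one
                   base over the other when the two reads disagree
    """
    if not (len(fwd) == len(fwd_q) == len(rev) == len(rev_q)):
        raise ValueError("reads and quality lists must have equal length")

    consensus = []
    for b1, q1, b2, q2 in zip(fwd, fwd_q, rev, rev_q):
        if b1 == b2:
            # Both reads agree, so keep the shared base.
            consensus.append(b1)
        elif q1 - q2 >= min_delta:
            # Disagreement: the forward base is clearly better supported.
            consensus.append(b1)
        elif q2 - q1 >= min_delta:
            # Disagreement: the reverse base is clearly better supported.
            consensus.append(b2)
        else:
            # Neither base is clearly better, so mark the position ambiguous.
            consensus.append("N")
    return "".join(consensus)


confident = merge_overlap("ACGT", [30, 30, 30, 30], "ACTT", [30, 30, 10, 30])
ambiguous = merge_overlap("ACGT", [30, 30, 30, 30], "ACTT", [30, 30, 28, 30])
```

In this sketch a disagreement is resolved only when one read is clearly more reliable at that position. Otherwise the base is recorded as ambiguous, so that a later filtering step could discard the contig.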

2017 ◽  
Author(s):  
Garold Fuks ◽  
Michael Elgart ◽  
Amnon Amir ◽  
Amit Zeisel ◽  
Peter J. Turnbaugh ◽  
...  

Abstract
Background: Most of our knowledge about the remarkable microbial diversity on Earth comes from sequencing the 16S rRNA gene. The use of next-generation sequencing methods has increased sample number and sequencing depth, but the read length of the most widely used sequencing platforms today is quite short, requiring the researcher to choose a subset of the gene to sequence (typically 16-33% of the total length). Thus, many bacteria may share the same amplified region and the resolution of profiling is inherently limited. Platforms that offer ultra long read lengths, whole genome shotgun sequencing approaches, and computational frameworks formerly suggested by us and by others all offer ways to circumvent this problem, yet each suffers from various shortcomings. There is a need for a simple and low-cost 16S rRNA gene-based profiling approach that harnesses the short read length to provide much larger coverage of the gene and so allow high resolution, even in harsh conditions of low bacterial biomass and fragmented DNA.
Results: This manuscript suggests Short MUltiple Regions Framework (SMURF), a method to combine sequencing results from different PCR-amplified regions to provide one coherent profiling. The de facto amplicon length is the total length of all amplified regions, thus providing much higher resolution compared to current techniques. Computationally, the method solves a convex optimization problem that allows extremely fast reconstruction and requires only moderate memory. We demonstrate the increase in resolution by in silico simulations and by profiling two mock mixtures and real-world biological samples. Reanalyzing a mock mixture from the Human Microbiome Project achieved about a two-fold improvement in resolution when combining two independent regions. Using a custom set of six primer pairs spanning about 1200 bp (80%) of the 16S rRNA gene, we were able to achieve a ~100-fold improvement in resolution compared to a single region, over a mock mixture of common human gut bacterial isolates. Finally, profiling of a Drosophila melanogaster microbiome using the set of six primer pairs provided a ~100-fold increase in resolution, thus enabling efficient downstream analysis.
Conclusions: SMURF enables identification of near full-length 16S rRNA gene sequences in microbial communities, with resolution superior to current techniques. It may be applied to standard sample preparation protocols with very few modifications. SMURF also paves the way to high-resolution profiling of low-biomass and fragmented DNA, e.g., in the case of formalin-fixed and paraffin-embedded samples, fossil-derived DNA or DNA exposed to other degrading conditions. The approach is not restricted to combining amplicons of the 16S rRNA gene and may be applied to any set of amplicons, e.g., in Multilocus Sequence Typing (MLST).


2014 ◽  
Author(s):  
Catherine Burke ◽  
Aaron E Darling

We describe a method for sequencing full-length 16S rRNA gene amplicons using the high throughput Illumina MiSeq platform. The resulting sequences have about 100-fold higher accuracy than standard Illumina reads and are chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. We demonstrate that the data provides fine scale phylogenetic resolution not available from Illumina amplicon methods targeting smaller variable regions of the 16S rRNA gene.


2013 ◽  
Vol 80 (4) ◽  
pp. 1403-1410 ◽  
Author(s):  
Clare A. Anstead ◽  
Neil B. Chilton

ABSTRACT The genomic DNA from four species of ixodid ticks in western Canada was tested for the presence of Rickettsiella by PCR analyses targeting the 16S rRNA gene. Eighty-eight percent of the Ixodes angustus (n = 270), 43% of the I. sculptus (n = 61), and 4% of the I. kingi (n = 93) individuals examined were PCR positive for Rickettsiella, whereas there was no evidence for the presence of Rickettsiella in Dermacentor andersoni (n = 45). Three different single-strand conformation polymorphism profiles of the 16S rRNA gene were detected among amplicons derived from Rickettsiella-positive ticks, each corresponding to a different sequence type. Furthermore, each sequence type was associated with a different tick species. Phylogenetic analyses of sequence data of the 16S rRNA gene and three other genes (rpsA, gidA, and sucB) revealed that all three sequence types were placed in a clade that contained species and pathotypes of the genus Rickettsiella. The bacterium in I. kingi represented the sister taxon to the Rickettsiella in I. sculptus, and both formed a clade with Rickettsiella grylli from crickets (Gryllus bimaculatus) and “R. ixodidis” from I. woodi. In contrast, the Rickettsiella in I. angustus was not a member of this clade but was placed external to the clade comprising the pathotypes of R. popilliae. The results indicate the existence of at least two new species of Rickettsiella: one in I. angustus and another in I. kingi and I. sculptus. However, the Rickettsiella strains in I. kingi and I. sculptus may also represent different species because each had unique sequences for all four genes.


2021 ◽  
Vol 12 ◽  
Author(s):  
Hannah E. Epstein ◽  
Alejandra Hernandez-Agreda ◽  
Samuel Starko ◽  
Julia K. Baum ◽  
Rebecca Vega Thurber

16S rRNA gene profiling (amplicon sequencing) is a popular technique for understanding host-associated and environmental microbial communities. Most protocols for sequencing amplicon libraries follow a standardized pipeline that can differ slightly depending on laboratory facility and user. Given that the same variable region of the 16S gene is targeted, it is generally accepted that sequencing output from differing protocols is comparable, and this assumption underlies our ability to identify universal patterns in microbial dynamics through meta-analyses. However, discrepant results from a combined 16S rRNA gene dataset prepared by two labs whose protocols differed only in DNA polymerase and sequencing platform led us to scrutinize the outputs and challenge the idea of confidently combining them for standard microbiome analysis. Using technical replicates of reef-building coral samples from two species, Montipora aequituberculata and Porites lobata, we evaluated the consistency of alpha and beta diversity metrics between data resulting from these highly similar protocols. While we found minimal variation in alpha diversity between platforms, significant differences were revealed with most beta diversity metrics, dependent on host species. These inconsistencies persisted following removal of low abundance taxa and when comparing across higher taxonomic levels, suggesting that bacterial community differences associated with sequencing protocol are likely to be context dependent and difficult to correct without extensive validation work. The results of this study encourage caution in the statistical comparison and interpretation of studies that combine rRNA gene sequence data from distinct protocols and point to a need for further work identifying mechanistic causes of these observed differences.
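
The abstract above compares alpha and beta diversity between protocols without naming the metrics used. As a rough illustration only, the sketch below computes one common alpha diversity metric (the Shannon index) and one common beta diversity metric (Bray-Curtis dissimilarity) from taxon count vectors. The choice of these two metrics, the function names, and the example counts are all assumptions made for illustration.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples.

    a and b are count vectors over the same ordered list of taxa.
    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa")
    denominator = sum(a) + sum(b)
    if denominator <= 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical counts for the same three taxa under two protocols.
protocol_a = [50, 30, 20]
protocol_b = [20, 30, 50]

alpha_a = shannon(protocol_a)
alpha_b = shannon(protocol_b)
beta_ab = bray_curtis(protocol_a, protocol_b)
```

In this made-up example the two samples have the same Shannon index because the abundances are only reshuffled among taxa, yet their Bray-Curtis dissimilarity is non-zero. Alpha diversity can therefore agree between two datasets while a beta diversity metric still separates them.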


2007 ◽  
Vol 73 (16) ◽  
pp. 5261-5267 ◽  
Author(s):  
Qiong Wang ◽  
George M. Garrity ◽  
James M. Tiedje ◽  
James R. Cole

ABSTRACT The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes (2nd ed., release 5.0, Springer-Verlag, New York, NY, 2004). It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. The majority of classifications (98%) were of high estimated confidence (≥95%) and high accuracy (98%). In addition to being tested with the corpus of 5,014 type strain sequences from Bergey's outline, the RDP Classifier was tested with a corpus of 23,095 rRNA sequences as assigned by the NCBI into their alternative higher-order taxonomy. The results from leave-one-out testing on both corpora show that the overall accuracies at all levels of confidence for near-full-length and 400-base segments were 89% or above down to the genus level, and the majority of the classification errors appear to be due to anomalies in the current taxonomies. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene, with segments around the V2 and V4 variable regions giving the lowest error rates. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences. Another related tool, RDP Library Compare, was developed to facilitate microbial-community comparison based on 16S rRNA gene sequence libraries. It combines the RDP Classifier with a statistical test to flag taxa differentially represented between samples. The RDP Classifier and RDP Library Compare are available online at http://rdp.cme.msu.edu/ .
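
The idea behind a naïve Bayesian sequence classifier can be sketched in a few lines. This toy version is not the RDP Classifier itself. The word size `k=3`, the smoothing formula, the function names, and the training sequences are assumptions chosen for illustration, and the confidence estimates mentioned in the abstract are omitted.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(labelled, k=3):
    """Count word occurrences from a list of (genus, sequence) pairs."""
    total = len(labelled)
    word_count = defaultdict(int)   # training sequences containing each word
    genus_size = defaultdict(int)   # training sequences per genus
    genus_word = defaultdict(lambda: defaultdict(int))
    for genus, seq in labelled:
        genus_size[genus] += 1
        for w in kmers(seq, k):
            word_count[w] += 1
            genus_word[genus][w] += 1
    return total, word_count, genus_size, genus_word


def classify(seq, model, k=3):
    """Return the genus with the highest summed log word probability."""
    total, word_count, genus_size, genus_word = model
    best, best_score = None, -math.inf
    for genus, m_total in genus_size.items():
        score = 0.0
        for w in kmers(seq, k):
            # Assumed smoothing: an overall word prior keeps the
            # per-genus probability above zero for unseen words.
            prior = (word_count.get(w, 0) + 0.5) / (total + 1)
            in_genus = genus_word[genus].get(w, 0)
            score += math.log((in_genus + prior) / (m_total + 1))
        if score > best_score:
            best, best_score = genus, score
    return best


# Hypothetical training data: two made-up genera with two sequences each.
training = [
    ("Alpha", "AAAAAAAC"),
    ("Alpha", "AAAAAACC"),
    ("Beta", "GGGGGGGT"),
    ("Beta", "GGGGGGTT"),
]
model = train(training)
call_one = classify("AAAAAA", model)
call_two = classify("GGGGTT", model)
```

The "naïve" assumption is that words occur independently, which is why the per-word log probabilities are simply summed for each genus.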


Author(s):  
Jessica L. O’Callaghan ◽  
Dana Willner ◽  
Melissa Buttini ◽  
Flavia Huygens ◽  
Elise S. Pelzer

The endometrial cavity is an upper genital tract site previously thought to be sterile; however, advances in culture-independent, next-generation sequencing technology have revealed that this low-biomass site harbors a rich microbial community which includes multiple Lactobacillus species. These bacteria are considered to be the most abundant non-pathogenic genital tract commensals. Next-generation sequencing of the female lower genital tract has revealed significant variation in microbial community composition with respect to Lactobacillus sp. in samples collected from healthy women and women with urogenital conditions. The aim of this study was to evaluate our ability to characterize members of the genital tract microbial community to species-level taxonomy using variable regions of the 16S rRNA gene. Samples were interrogated for the presence of microbial DNA using next-generation sequencing technology that targets the V5–V8 regions of the 16S rRNA gene and compared to speciation using qPCR. We also performed re-analysis of published data using alternate variable regions of the 16S rRNA gene. In this analysis, we explore next-generation sequencing of clinical genital tract isolates as a method for high-throughput identification to species level of key Lactobacillus sp. Data revealed that characterization of genital tract taxa is hindered by the lack of a consensus protocol and 16S rRNA gene region target allowing comparison between studies.


1998 ◽  
Vol 36 (2) ◽  
pp. 462-466 ◽  
Author(s):  
Joanne B. Messick ◽  
Linda M. Berent ◽  
Sandra K. Cooper

The 16S rRNA gene of Haemobartonella felis was amplified by using universal eubacterial primers and was subsequently cloned and sequenced. Based on this sequence data, we designed a set of H. felis-specific primers. These primers selectively amplified a 1,316-bp DNA fragment of the 16S rRNA gene of H. felis from each of four experimentally infected cats at peak parasitemia. No PCR product was amplified from purified DNA of Eperythrozoon suis, Mycoplasma genitalium, and Bartonella bacilliformis. Blood from the experimental cats was negative for PCR products prior to infection, and the PCR product was greatly diminished or absent 1 month after doxycycline treatment. The overall sequence identity of this fragment varied by less than 1.0% among experimentally infected cats. By taking into consideration the secondary structure of the 16S rRNA molecule, we were able to further verify the alignment of nucleotides and quality of our sequence data. In this PCR assay, the minimum detectable number of H. felis organisms was determined to be between 50 and 704. The potential usefulness of restriction enzymes DdeI and MnlI for distinguishing H. felis from closely related bacteria was examined. This is the first report of the utility of PCR-facilitated diagnosis and discrimination of H. felis infection in cats.


2000 ◽  
Vol 38 (3) ◽  
pp. 953-959 ◽  
Author(s):  
M. S. Hughes ◽  
G. James ◽  
N. Ball ◽  
M. Scally ◽  
R. Malik ◽  
...  

PCR amplifications of the 16S rRNA gene were performed on 46 specimens obtained from 43 dogs with canine leproid granuloma syndrome to help determine its etiology. Sequence capture PCR was applied to 37 paraffin-embedded specimens from 37 dogs, and nested PCR was attempted on DNA from 9 fresh tissue specimens derived from 3 of the 37 aforementioned dogs and from an additional 6 dogs. Molecular analyses of the paraffin-embedded tissues and fresh tissue specimen analyses were performed at separate institutions. PCR products with identical sequences over a 350-bp region encompassing variable regions 2 and 3 of the 16S rRNA gene were obtained from 4 of 37 paraffin-embedded specimens and from all 9 specimens of fresh tissue originating from 12 of the 43 dogs. Identical sequences were determined from amplicons obtained from paraffin-embedded and fresh specimens from one dog. The consensus DNA sequence, amplified from paraffin-embedded tissue and represented by GenBank accession no. AF144747, shared highest nucleotide identity (99.4% over 519 bp) with mycobacterial strain IWGMT 90413 but did not correspond exactly to any EMBL or GenBank database sequence. With a probe derived from the V2 region of the novel canine sequence, reverse cross blot hybridization identified an additional four paraffin-embedded specimens containing the same novel sequence. In total, molecular methodologies identified the proposed novel mycobacterial sequence in 16 of 43 dogs with canine leproid granuloma syndrome, indicating that the species represented by this sequence may be the principal etiological agent of canine leproid granuloma syndrome.

