Comprehensive single‐PCR 16S and 18S rRNA community analysis validated with mock communities, and estimation of sequencing bias against 18S

Author(s):  
Yi‐Chun Yeh ◽  
Jesse C. McNichol ◽  
David M. Needham ◽  
Erin B. Fichot ◽  
Lyria Berdjeb ◽  
...  
2019 ◽  
Author(s):  
Yi-Chun Yeh ◽  
Jesse C. McNichol ◽  
David M. Needham ◽  
Erin B. Fichot ◽  
Jed A. Fuhrman

Abstract: Universal SSU rRNA primers allow comprehensive quantitative profiling of natural communities by simultaneously amplifying templates from Bacteria, Archaea, and Eukaryota in a single PCR reaction. Despite their potential to show the relative abundances of all rRNA genes, they are rarely used due to concerns about length bias against 18S amplicons and bioinformatic challenges converting mixed 16S/18S sequences into amplicon sequence variants. We thus developed 16S and 18S rRNA mock communities and a bioinformatic pipeline to validate this three-domain approach. To test for length biases, we mixed eukaryotic and prokaryotic mocks before PCR, and found consistent two-fold underestimation of longer 18S sequences due to sequencing but not PCR bias. Using these mocks, we show that universal V4-V5 primers (515Y/926R) outperformed eukaryote-specific V4 primers in observed vs. expected abundance correlations, and that sequences with single mismatches to the primer were strongly underestimated (3-8 fold). A year of monthly time-series data from a protist-enriched 1.2-80 μm size fraction yielded an average of 9% 18S, 17% chloroplast 16S, and 74% prokaryote 16S rRNA gene amplicons. These data demonstrate the potential for universal primers to generate quantitative and comprehensive microbiome profiles, although gene copy and genome size variability should be considered, as for any quantitative genetic analysis.
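To make the reported two-fold underestimation concrete, the following minimal Python sketch rescales the observed 18S fraction and renormalizes. It is an illustration only: the abstract does not state that the authors applied such a correction, the function name `correct_length_bias` is hypothetical, and the sketch assumes the factor of 2 applies uniformly to all 18S reads and that the 16S categories are unbiased.

```python
def correct_length_bias(observed, bias_factor=2.0, biased_key="18S"):
    """Rescale one category by an assumed underestimation factor,
    then renormalize so that the fractions sum to 1.

    bias_factor=2.0 is an assumption taken from the "two-fold
    underestimation" wording in the abstract, not a fitted value.
    """
    adjusted = dict(observed)
    adjusted[biased_key] = observed[biased_key] * bias_factor
    total = sum(adjusted.values())
    return {name: value / total for name, value in adjusted.items()}


# Average amplicon fractions reported for the 1.2-80 um size fraction.
observed = {"18S": 0.09, "chloroplast_16S": 0.17, "prokaryote_16S": 0.74}
corrected = correct_length_bias(observed)

for name, fraction in corrected.items():
    print(f"{name}: {fraction:.3f}")
```

Under these assumptions the 18S share rises from 9% to roughly 16.5% (0.18 / 1.09), which shows that a two-fold sequencing bias does not simply double the reported proportion once the other categories are renormalized.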


2016 ◽  
Vol 82 (19) ◽  
pp. 5878-5891 ◽  
Author(s):  
Ian M. Bradley ◽  
Ameet J. Pinto ◽  
Jeremy S. Guest

ABSTRACT The use of high-throughput sequencing technologies with the 16S rRNA gene for characterization of bacterial and archaeal communities has become routine. However, the adoption of sequencing methods for eukaryotes has been slow, despite their significance to natural and engineered systems. There are large variations among the target genes used for amplicon sequencing, and for the 18S rRNA gene, there is no consensus on which hypervariable region provides the most suitable representation of diversity. Additionally, it is unclear how much PCR/sequencing bias affects the depiction of community structure using current primers. The present study amplified the V4 and V8-V9 regions from seven microalgal mock communities as well as eukaryotic communities from freshwater, coastal, and wastewater samples to examine the effect of PCR/sequencing bias on community structure and membership. We found that degeneracies on the 3′ end of the current V4-specific primers impact read length and mean relative abundance. Furthermore, the PCR/sequencing error is markedly higher for GC-rich members than for communities with balanced GC content. Importantly, the V4 region failed to reliably capture 2 of the 12 mock community members, and the V8-V9 hypervariable region more accurately represents mean relative abundance and alpha and beta diversity. Overall, the V4 and V8-V9 regions show similar community representations over freshwater, coastal, and wastewater environments, but specific samples show markedly different communities. These results indicate that multiple primer sets may be advantageous for gaining a more complete understanding of community structure and highlight the importance of including mock communities composed of species of interest.

IMPORTANCE The quantification of error associated with community representation by amplicon sequencing is a critical challenge that is often ignored. When target genes are amplified using currently available primers, differential amplification efficiencies result in inaccurate estimates of community structure. The extent to which amplification bias affects community representation and the accuracy with which different gene targets represent community structure are not known. As a result, there is no consensus on which region provides the most suitable representation of diversity for eukaryotes. This study determined the accuracy with which commonly used 18S rRNA gene primer sets represent community structure and identified particular biases related to PCR amplification and Illumina MiSeq sequencing in order to more accurately study eukaryotic microbial communities.


mSystems ◽  
2018 ◽  
Vol 3 (3) ◽  
Author(s):  
Yi-Chun Yeh ◽  
David M. Needham ◽  
Ella T. Sieradzki ◽  
Jed A. Fuhrman

ABSTRACT Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing and to optimize pipeline outputs. Nevertheless, the strong value of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed >2-fold high or low abundance, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded “normal” results. Although obvious from the strange mock community results, we could have easily missed the problem had we not used the mock communities because of natural variation of microbiomes at our site. The “normal” results were validated over four MiSeqPE300 runs and three HiSeqPE250 runs, and run-to-run variation was usually low. While validating these “normal” results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We strongly advise the use of mock communities in every sequencing run to distinguish potentially serious aberrations from natural variations. The mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed to detect problems that show up only in some taxa and also to help validate clustering.

IMPORTANCE Despite the routine use of standards and blanks in virtually all chemical or physical assays and most biological studies (a kind of “control”), microbiome analysis has traditionally lacked such standards. Here we show that unexpected problems of unknown origin can occur in such sequencing runs and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be detected by the use of mock communities.


2017 ◽  
Author(s):  
Yi-Chun Yeh ◽  
David M. Needham ◽  
Ella T. Sieradzki ◽  
Jed A. Fuhrman

Abstract: Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing, and to optimize pipeline outputs. Nevertheless, the necessity of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed >2-fold high or low abundance, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded “normal” results. Although the problem was obvious from the strange mock community results, we could easily have missed it without the mock communities because of the natural variation of microbiomes at our site. The “normal” results were validated over 4 MiSeqPE300 runs and 3 HiSeqPE250 runs, and run-to-run variation was usually low (Bray-Curtis distance was 0.12±0.04). While validating these “normal” results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We suggest that using mock communities in every sequencing run is essential to distinguish potentially serious aberrations from natural variations. Such mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed, to detect problems that show up only in some taxa, as we observed.

Importance: Despite the routine use of standards and blanks in virtually all chemical or physical assays and most biological studies (a kind of “control”), microbiome analysis has traditionally lacked such standards. Here we show that unexpected problems of unknown origin can occur in such sequencing runs, and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be avoided by the use of suitable mock communities.
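The two checks described above, run-to-run Bray-Curtis distance and detection of taxa that are more than 2-fold high or low, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the taxon labels, and all abundance values are hypothetical, and only the 2-fold threshold is taken from the abstract.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles
    given as {taxon: abundance} dicts (missing taxa count as 0)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


def flag_aberrant_taxa(expected, observed, max_fold=2.0):
    """Return the sorted taxa whose observed abundance is more than
    max_fold above or below the expected abundance.
    A taxon that is absent from `observed` is always flagged."""
    flagged = []
    for taxon, exp in expected.items():
        obs = observed.get(taxon, 0.0)
        if obs == 0.0 or obs / exp > max_fold or exp / obs > max_fold:
            flagged.append(taxon)
    return sorted(flagged)


# Hypothetical mock community and two hypothetical sequencing runs.
expected = {"taxon_A": 0.25, "taxon_B": 0.25, "taxon_C": 0.50}
normal_run = {"taxon_A": 0.22, "taxon_B": 0.28, "taxon_C": 0.50}
aberrant_run = {"taxon_A": 0.01, "taxon_B": 0.30, "taxon_C": 0.69}

print(bray_curtis(expected, normal_run))
print(flag_aberrant_taxa(expected, normal_run))
print(flag_aberrant_taxa(expected, aberrant_run))
```

Running both checks on the mock community in every sequencing run gives a per-run summary (the distance) and a per-taxon diagnosis (the flagged list), so a run in which a single taxon nearly vanishes stands out even when the overall distance looks moderate.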


2021 ◽  
Vol 12 ◽  
Author(s):  
Changwoo Park ◽  
Seung Bum Kim ◽  
Sang Ho Choi ◽  
Seil Kim

Microbial community analysis based on the 16S rRNA gene is used to investigate both beneficial and harmful microorganisms in various fields and environments. Recently, next-generation sequencing (NGS) technology has enabled rapid and accurate microbial community analysis. Despite these advantages of NGS-based metagenomic studies, sample transport, storage conditions, amplification, library preparation kits, sequencing, and bioinformatics procedures can bias microbial community analysis results. In this study, eight mock communities were pooled from genomic DNA of Lactobacillus acidophilus KCTC 3164T, Limosilactobacillus fermentum KCTC 3112T, Lactobacillus gasseri KCTC 3163T, Lacticaseibacillus paracasei subsp. paracasei KCTC 3510T, Limosilactobacillus reuteri KCTC 3594T, Lactococcus lactis subsp. lactis KCTC 3769T, Bifidobacterium animalis subsp. lactis KCTC 5854T, and Bifidobacterium breve KCTC 3220T. The genomic DNAs were quantified by droplet digital PCR (ddPCR) and were mixed as mock communities. The mock communities were amplified with various 16S rRNA gene universal primer pairs and sequenced on the MiSeq, IonTorrent, MGIseq-2000, Sequel II, and MinION NGS platforms. In a comparison of primer-dependent bias, the microbial profiles of the V1-V2 and V3 regions were similar to the original ratio of the mock communities, while the microbial profiles of the V1-V3 region were relatively biased. In a comparison of platform-dependent bias, the sequence reads from short-read platforms (MiSeq, IonTorrent, and MGIseq-2000) showed lower bias than those from long-read platforms (Sequel II and MinION). Meanwhile, the sequence reads from the Sequel II and MinION platforms were relatively biased in some mock communities. In the data of all NGS platforms and regions, L. acidophilus was greatly underrepresented while Lactococcus lactis subsp. lactis was generally overrepresented. In all samples of this study, the bias index (BI) was calculated and PCA was performed for comparison. The samples with biased relative abundance showed high BI values and were separated in the PCA results. In particular, analysis of regions rich in AT and GC poses problems for genome assembly, which can lead to sequencing bias. According to this comparative analysis, the development of reference material (RM) has been proposed to calibrate the bias in microbiome analysis.


2020 ◽  
Author(s):  
Md. Maniruzzaman Sikder ◽  
Mette Vestergård ◽  
Rumakanta Sapkota ◽  
Tina Kyndt ◽  
Mogens Nicolaisen

Abstract: Nematodes are widely abundant soil metazoa and often referred to as indicators of soil health. While recent advances in next-generation sequencing technologies have accelerated research in microbial ecology, the ecology of nematodes remains poorly elucidated, partly due to the lack of reliable and validated sequencing strategies. The objectives of the present study were (i) to compare commonly used primer sets and to identify the most suitable primer set for metabarcoding of nematodes; (ii) to establish and validate a high-throughput sequencing strategy for nematodes using Illumina paired-end sequencing. In this study, we tested four primer sets for amplicon sequencing: JB3/JB5 (mitochondrial, I3-M11 partition); SSU_04F/SSU_22R (18S rRNA, V1-V2 region); Nemf/18Sr2b (18S rRNA, V6-V8 region) from earlier studies; and MMSF/MMSR (18S rRNA, V4-V5 region), a newly developed primer set from this study. In order to test the primer sets, we used 22 samples of individual nematode species, 20 mock communities, 20 soil samples, 20 spiked soil samples (mock communities in soil), and 4 root/rhizosphere soil samples. We successfully amplified the target regions (I3-M11 partition of the COI gene; V1-V2, V4-V8 region of 18S rRNA gene) from these 86 DNA samples with the four different primer combinations and sequenced the amplicons on an Illumina MiSeq sequencing platform. We found that the MMSF/MMSR and Nemf/18Sr2b primer sets were more efficient in detecting nematodes than the JB and SSU primer sets, based on annotation of sequence reads at genus and in some cases at species level. Therefore, these primer sets are suggested for studies of nematode communities in agricultural environments.


2016 ◽  
Author(s):  
Nicholas A Bokulich ◽  
Jai Ram Rideout ◽  
William G Mercurio ◽  
Benjamin Wolfe ◽  
Corinne F Maurice ◽  
...  

Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at https://github.com/caporaso-lab/mockrobiota. The materials contained in mockrobiota include dataset and sample metadata, expected composition data, which are annotated based on one or more reference taxonomies, links to raw data (e.g., raw sequence data) for each mock community dataset, and optional reference sequences for mock community members. mockrobiota does not supply physical sample materials directly, but the dataset metadata included for each mock community indicate whether physical sample materials are available (and associated contact information). At the time of this writing, mockrobiota contains 11 mock community datasets with known species compositions (including bacterial, archaeal, and eukaryotic mock communities), analyzed by high-throughput marker-gene sequencing. The availability of standard, public mock community data will facilitate ongoing methods optimizations; comparisons across studies that share source data; greater transparency and access; and eliminate redundancy. This dynamic resource is intended to expand and evolve to meet the changing needs of the ‘omics community.


Diversity ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 388
Author(s):  
Md. Maniruzzaman Sikder ◽  
Mette Vestergård ◽  
Rumakanta Sapkota ◽  
Tina Kyndt ◽  
Mogens Nicolaisen

While recent advances in next-generation sequencing technologies have accelerated research in microbial ecology, the application of high throughput approaches to study the ecology of nematodes remains unresolved due to several issues, e.g., whether to include an initial nematode extraction step or not, the lack of consensus on the best performing primer combination, and the absence of a curated nematode reference database. The objective of this method development study was to compare different primer sets to identify the most suitable primer set for the metabarcoding of nematodes without initial nematode extraction. We tested four primer sets for amplicon sequencing: JB3/JB5 (mitochondrial, I3-M11 partition of COI gene), SSU_04F/SSU_22R (18S rRNA, V1-V2 regions), and Nemf/18Sr2b (18S rRNA, V6-V8 regions) from earlier studies, as well as MMSF/MMSR (18S rRNA, V4-V5 regions), a newly developed primer set. We used DNA from 22 nematode taxa, 10 mock communities, 20 soil samples, 4 root samples, and one bulk soil. We amplified the target regions from the DNA samples with the four different primer combinations and sequenced the amplicons on an Illumina MiSeq sequencing platform. We found that the Nemf/18Sr2b primer set was superior for detecting soil nematodes compared to the other primer sets based on our sequencing results and on the annotation of our sequence reads at the genus and species ranks. This primer set generated 74% reads of Nematoda origin in the soil samples. Additionally, this primer set did well with the mock communities, detecting all the included specimens. It also worked better in the root samples than the other primer set that was tested. Therefore, we suggest that the Nemf/18Sr2b primer set could be used to study rhizosphere soil and root associated nematodes, and this can be done without an initial nematode extraction step.

