Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data

PeerJ, 2018, Vol 6, pp. e4925
Author(s): Jonathan M. Palmer, Michelle A. Jusino, Mark T. Banik, Daniel L. Lindner

High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure the accuracy of sequencing platforms and data analysis pipelines. To assess the performance of sequencing platforms and data-processing pipelines on fungal internal transcribed spacer (ITS) amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community, consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: (1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, (2) pre-clustering steps for variable-length amplicons are critically important, and (3) a major source of bias is attributed to the initial polymerase chain reaction (PCR), and thus HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to handle variable-length amplicons and to quality-filter HTAS data based on spike-in controls. While we describe herein a non-biological SynMock community for ITS sequences, the concept and AMPtk software can be widely applied to any HTAS dataset to improve data quality.
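
The spike-in filtering idea above can be illustrated with a small sketch. It assumes an OTU table held as a dict of sample → {OTU: read count} and a hypothetical rule: the fraction of reads in the SynMock sample that belong to non-mock OTUs is taken as an index-bleed rate, and any count below that rate times the OTU's total across all samples is dropped. The function names, the rule, and the toy data are illustrative assumptions, not AMPtk's actual code or command-line interface.

```python
def index_bleed_rate(table, mock_sample, mock_otus):
    """Fraction of reads in the synthetic mock sample assigned to non-mock OTUs.

    A non-biological mock contains only known synthetic sequences, so any other
    OTU observed in it is assumed here to have bled in from other samples.
    """
    counts = table[mock_sample]
    total = sum(counts.values())
    foreign = sum(n for otu, n in counts.items() if otu not in mock_otus)
    return foreign / total if total else 0.0


def filter_index_bleed(table, rate):
    """Drop counts smaller than `rate` times that OTU's total across all samples.

    This per-OTU threshold is a hypothetical rule chosen for illustration.
    """
    otu_totals = {}
    for counts in table.values():
        for otu, n in counts.items():
            otu_totals[otu] = otu_totals.get(otu, 0) + n
    return {
        sample: {otu: n for otu, n in counts.items() if n >= rate * otu_totals[otu]}
        for sample, counts in table.items()
    }


# Invented toy data: two synthetic mock members plus environmental OTUs.
table = {
    "SynMock": {"SynMock1": 500, "SynMock2": 495, "OTU7": 5},
    "S1": {"OTU7": 900, "SynMock1": 2},
    "S2": {"OTU7": 400, "OTU9": 300},
}
rate = index_bleed_rate(table, "SynMock", {"SynMock1", "SynMock2"})
filtered = filter_index_bleed(table, rate)
print(rate)  # 0.005
print(filtered)
```

Here the stray OTU7 reads in the mock sample and the stray SynMock1 reads in S1 are both removed. One reason a non-biological control helps with this kind of estimate is that a synthetic sequence cannot be a genuine member of an environmental sample, so its presence elsewhere is unambiguous evidence of cross-talk.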

2017
Author(s): Jonathan M Palmer, Michelle A Jusino, Mark T Banik, Daniel L Lindner

High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure the accuracy of sequencing platforms and data analysis pipelines. To assess the performance of sequencing platforms and data-processing pipelines on fungal ITS amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community (BioMock), consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: 1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, 2) pre-clustering steps for variable-length amplicons are critically important, and 3) a major source of bias is attributed to the initial PCR, and thus HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to handle variable-length amplicons and to quality-filter HTAS data based on spike-in controls. While we describe herein a non-biological synthetic mock community for ITS sequences, the concept and AMPtk software can be widely applied to any HTAS dataset to improve data quality.


Author(s): Andrew Krohn, Bo Stevens, Adam Robbins-Pianka, Matthew Belus, Gerard J Allan, ...

Diversity of complex microbial communities can be rapidly assessed by community amplicon sequencing of marker genes (e.g., 16S), often yielding many thousands of DNA sequences per sample. However, analysis of community amplicon sequencing data requires multiple computational steps that affect the outcome of a final data set. Here we use mock communities to describe the effects of parameter adjustments for raw sequence quality filtering, picking operational taxonomic units (OTUs), taxonomic assignment, and OTU table filtering as implemented in QIIME 1.9.1. We demonstrate a workflow optimization based upon this exploration, which we also apply to environmental samples. We found that quality filtering of raw data and filtering of OTU tables had large effects on observed OTU diversity. While all taxonomy assigners performed with similar accuracy, an appropriate choice of similarity threshold for defining OTUs depended on the method used for OTU picking. Our “default” analysis in QIIME overestimated mock community diversity by at least a factor of ten, compared to the optimized analysis, which correctly characterized the taxonomic composition of the mock communities while still overestimating OTU diversity by about a factor of two. Though observed relative abundances of mock community member taxa were approximately correct, most were still represented by multiple OTUs. Low-frequency OTUs conspecific to constituent mock community taxa were characterized by multiple substitution and indel errors and the presence of a low-quality base call resulting in sequence truncation during quality filtering. Low-quality base calls were observed at “G” positions most of the time, and were also associated with a preceding “TTT” trinucleotide motif. Environmental diversity estimates were reduced by about 40% (from 2508 to 1533 OTUs) when comparing output from the default and optimized workflows.
We attribute this reduction in observed diversity to the removal of erroneous sequences from the data set. Our results indicate that both strict quality filtering of raw sequencing data and careful filtering of raw OTU tables are important steps for accurate estimation of microbial community diversity.
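
The truncation effect described above can be sketched as follows. This is a simplified rule assumed for illustration, not QIIME 1.9.1's exact quality-filtering algorithm: a read is cut at the first base whose Phred score (Phred+33 encoding assumed) falls below a threshold, and each low-quality position is reported with its base and whether a “TTT” motif precedes it. The threshold of 20 and the toy read are hypothetical.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def truncate_at_low_quality(seq, qual, min_q=20):
    """Cut the read at the first base whose Phred score is below min_q."""
    for i, q in enumerate(phred_scores(qual)):
        if q < min_q:
            return seq[:i]
    return seq


def low_quality_contexts(seq, qual, min_q=20):
    """List (position, base, preceded_by_TTT) for every base below min_q."""
    hits = []
    for i, q in enumerate(phred_scores(qual)):
        if q < min_q:
            hits.append((i, seq[i], seq[max(0, i - 3):i] == "TTT"))
    return hits


read = "ACGTTTGCA"
qual = "IIIIII#II"  # 'I' = Q40, '#' = Q2 in Phred+33
print(truncate_at_low_quality(read, qual))  # ACGTTT
print(low_quality_contexts(read, qual))     # [(6, 'G', True)]
```

A single low-quality G after a TTT motif shortens this otherwise clean read to six bases, which illustrates how one poor base call can produce a truncated sequence during quality filtering.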


mSystems, 2018, Vol 3 (3)
Author(s): Yi-Chun Yeh, David M. Needham, Ella T. Sieradzki, Jed A. Fuhrman

ABSTRACT Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing and to optimize pipeline outputs. Nevertheless, the strong value of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed >2-fold higher or lower abundances, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded “normal” results. Although the problem was obvious from the anomalous mock community results, natural variation of microbiomes at our site meant that we could easily have missed it had we not used the mock communities. The “normal” results were validated over four MiSeqPE300 runs and three HiSeqPE250 runs, and run-to-run variation was usually low. While validating these “normal” results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We strongly advise the use of mock communities in every sequencing run to distinguish potentially serious aberrations from natural variations. The mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed, both to detect problems that show up only in some taxa and to help validate clustering. IMPORTANCE Despite the routine use of standards and blanks in virtually all chemical or physical assays and most biological studies (a kind of “control”), microbiome analysis has traditionally lacked such standards.
Here we show that unexpected problems of unknown origin can occur in such sequencing runs and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be detected by the use of mock communities.
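
A routine per-run check of the kind advocated above can be sketched by comparing each mock member's observed share with its expected share and flagging members that are off by at least twofold, the size of deviation the abstract reports. The taxon names, read counts, and the even expected composition are invented for illustration, and the sketch assumes the mock sample has at least one read.

```python
def mock_deviations(expected, observed, fold=2.0):
    """Flag mock members whose observed share deviates from expectation by >= `fold`.

    expected: taxon -> expected relative abundance (need not sum to 1).
    observed: taxon -> read count in this run's mock sample.
    Returns taxon -> observed/expected ratio for flagged taxa; a taxon with no
    reads gets ratio 0.0 and is always flagged.
    """
    exp_total = sum(expected.values())
    obs_total = sum(observed.values())
    flagged = {}
    for taxon, exp in expected.items():
        obs_share = observed.get(taxon, 0) / obs_total
        ratio = obs_share / (exp / exp_total)
        if ratio >= fold or ratio <= 1 / fold:
            flagged[taxon] = ratio
    return flagged


expected = {"Archaeon_A": 0.25, "Bacterium_B": 0.25, "Bacterium_C": 0.25, "Bacterium_D": 0.25}
observed = {"Archaeon_A": 1, "Bacterium_B": 600, "Bacterium_C": 250, "Bacterium_D": 149}
result = mock_deviations(expected, observed)
print(result)  # {'Archaeon_A': 0.004, 'Bacterium_B': 2.4}
```

The near-absent archaeon and the over-represented Bacterium_B are flagged, while Bacterium_D (about 0.6 of expected) stays below the twofold cutoff. The fold threshold is a tunable assumption, not a value prescribed by the authors.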


2017
Author(s): Yi-Chun Yeh, David M. Needham, Ella T. Sieradzki, Jed A. Fuhrman

Abstract Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing and to optimize pipeline outputs. Nevertheless, the necessity of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed >2-fold higher or lower abundances, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded “normal” results. Although the problem was obvious from the anomalous mock community results, natural variation of microbiomes at our site meant that we could easily have missed it had we not used the mock communities. The “normal” results were validated over 4 MiSeqPE300 runs and 3 HiSeqPE250 runs, and run-to-run variation was usually low (Bray-Curtis distance was 0.12±0.04). While validating these “normal” results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We suggest that using mock communities in every sequencing run is essential to distinguish potentially serious aberrations from natural variations. Such mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed, to detect problems that show up only in some taxa, as we observed. Importance: Despite the routine use of standards and blanks in virtually all chemical or physical assays and most biological studies (a kind of “control”), microbiome analysis has traditionally lacked such standards.
Here we show that unexpected problems of unknown origin can occur in such sequencing runs and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be avoided by the use of suitable mock communities.
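
The run-to-run variation quoted above (Bray-Curtis distance 0.12±0.04) is a summary of pairwise dissimilarities between repeated mock profiles, which can be sketched as below. The formula is the standard Bray-Curtis dissimilarity, 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b)), applied here to relative abundances. The three toy runs are invented and do not reproduce the paper's data.

```python
from itertools import combinations
from statistics import mean, stdev


def to_proportions(counts):
    """Convert taxon -> read count into taxon -> relative abundance."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


# Invented mock-community counts from three sequencing runs.
runs = [
    {"A": 50, "B": 30, "C": 20},
    {"A": 40, "B": 30, "C": 30},
    {"A": 50, "B": 25, "C": 25},
]
profiles = [to_proportions(r) for r in runs]
distances = [bray_curtis(x, y) for x, y in combinations(profiles, 2)]
print(f"{mean(distances):.3f} ± {stdev(distances):.3f}")  # 0.083 ± 0.029
```

A value of 0 means identical profiles and 1 means no shared taxa, so a sudden jump for one run's mock sample relative to its usual spread is the kind of signal the authors describe.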


2016
Author(s): Andrew Krohn, Bo Stevens, Adam Robbins-Pianka, Matthew Belus, Gerard J Allan, ...

Diversity of complex microbial communities can be rapidly assessed by community amplicon sequencing of marker genes (e.g., 16S), often yielding many thousands of DNA sequences per sample. However, analysis of community amplicon sequencing data requires multiple computational steps that affect the outcome of a final data set. Here we use mock communities to describe the effects of parameter adjustments for raw sequence quality filtering, picking operational taxonomic units (OTUs), taxonomic assignment, and OTU table filtering as implemented in QIIME 1.9.1. We demonstrate a workflow optimization based upon this exploration, which we also apply to environmental samples. We found that quality filtering of raw data and filtering of OTU tables had large effects on observed OTU diversity. While all taxonomy assigners performed with similar accuracy, an appropriate choice of similarity threshold for defining OTUs depended on the method used for OTU picking. Our “default” analysis in QIIME overestimated mock community diversity by at least a factor of ten, compared to the optimized analysis, which correctly characterized the taxonomic composition of the mock communities while still overestimating OTU diversity by about a factor of two. Though observed relative abundances of mock community member taxa were approximately correct, most were still represented by multiple OTUs. Low-frequency OTUs conspecific to constituent mock community taxa were characterized by multiple substitution and indel errors and the presence of a low-quality base call resulting in sequence truncation during quality filtering. Low-quality base calls were observed at “G” positions most of the time, and were also associated with a preceding “TTT” trinucleotide motif. Environmental diversity estimates were reduced by about 40% (from 2508 to 1533 OTUs) when comparing output from the default and optimized workflows.
We attribute this reduction in observed diversity to the removal of erroneous sequences from the data set. Our results indicate that both strict quality filtering of raw sequencing data and careful filtering of raw OTU tables are important steps for accurate estimation of microbial community diversity.
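
Two steps above lend themselves to a quick sketch: filtering low-abundance OTUs from a table, and checking the reported reduction in environmental diversity. The filter drops any OTU whose summed count falls below a chosen fraction of all reads; that rule, the 1% cutoff, and the toy table are illustrative assumptions, not the specific QIIME 1.9.1 settings the authors used. The percent-reduction helper shows that going from 2508 to 1533 OTUs is a drop of about 38.9%, which the abstract rounds to about 40%.

```python
def filter_rare_otus(table, min_fraction):
    """Drop OTUs whose total count across samples is below min_fraction of all reads."""
    totals = {}
    for counts in table.values():
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0) + n
    grand_total = sum(totals.values())
    keep = {otu for otu, n in totals.items() if n >= min_fraction * grand_total}
    return {s: {o: n for o, n in c.items() if o in keep} for s, c in table.items()}


def observed_otus(table):
    """Number of distinct OTUs with at least one read anywhere in the table."""
    return len({o for c in table.values() for o, n in c.items() if n > 0})


def percent_reduction(before, after):
    return 100 * (before - after) / before


table = {
    "S1": {"OTU1": 900, "OTU2": 95, "OTU3": 5},
    "S2": {"OTU1": 500, "OTU3": 2, "OTU4": 498},
}
filtered = filter_rare_otus(table, min_fraction=0.01)  # hypothetical 1% cutoff
print(observed_otus(table), observed_otus(filtered))   # 4 3
print(round(percent_reduction(2508, 1533), 1))          # 38.9
```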


2021, Vol 1 (1)
Author(s): Sandra Reitmeier, Thomas C. A. Hitch, Nicole Treichel, Nikolaos Fikas, Bela Hausmann, ...

Abstract 16S rRNA gene amplicon sequencing is a popular approach for studying microbiomes. However, some basic concepts have still not been investigated comprehensively. We studied the occurrence of spurious sequences using defined microbial communities, with data either taken from the literature or generated in three sequencing facilities, and analyzed them with both operational taxonomic unit (OTU) and amplicon sequence variant (ASV) approaches. OTU clustering and singleton removal, a commonly used approach, delivered approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower with ASV analysis, but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was found to be an effective threshold below which spurious taxa can largely be excluded from analysis in both OTU- and ASV-based approaches. Using this cutoff improved the reproducibility of analysis, i.e., variation in richness estimates was reduced by 38% compared with singleton filtering using six human fecal samples across seven sequencing runs. Beta-diversity analysis of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparison, highlighting the importance of carefully analyzing data before drawing conclusions on microbiome changes. In summary, handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid the generation of misleading findings. We propose the concept of effective richness to facilitate the comparison of alpha-diversity across studies.
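
The 0.25% cutoff behind the proposed effective richness can be sketched as counting the taxa at or above that relative abundance in a sample, shown next to the singleton-filtering count it is compared against. Treating the threshold as inclusive, and the toy counts, are assumptions made for illustration.

```python
def effective_richness(counts, threshold=0.0025):
    """Count taxa whose relative abundance is at least `threshold` (0.25% by default)."""
    total = sum(counts.values())
    return sum(1 for n in counts.values() if n / total >= threshold)


def richness_without_singletons(counts):
    """Count taxa observed more than once (the singleton-filtering baseline)."""
    return sum(1 for n in counts.values() if n > 1)


sample = {"taxon1": 9000, "taxon2": 900, "taxon3": 50, "taxon4": 30, "taxon5": 20}
print(richness_without_singletons(sample))  # 5
print(effective_richness(sample))           # 4
```

Because the cutoff is relative rather than an absolute read count, the same rule can be applied to samples sequenced at different depths.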


2017, Vol 83 (17)
Author(s): Francesca De Filippis, Manolo Laiola, Giuseppe Blaiotta, Danilo Ercolini

ABSTRACT Target-gene amplicon sequencing is the most widely used high-throughput sequencing application in microbial ecology. The targets are taxonomically relevant genes, with 16S rRNA being the gold standard for bacteria. For fungi, the most commonly used target is the internal transcribed spacer (ITS). However, the uneven ITS length among species may promote preferential amplification and sequencing and incorrect estimation of their abundance. Therefore, the use of different targets is desirable. We evaluated the use of three different target amplicons for the characterization of fungal diversity. After an in silico primer evaluation, we compared three amplicons (the ITS1-ITS2 region [ITS1-2], 18S ribosomal small subunit RNA, and the D1/D2 domain of the 26S ribosomal large subunit RNA), using biological samples and a mock community of common fungal species. All three targets allowed for accurate identification of the species present. Nevertheless, high heterogeneity in ITS1-2 length was found, and this caused an overestimation of the abundance of species with a shorter ITS, while both 18S and 26S amplicons allowed for more reliable quantification. We demonstrated that ITS1-2 amplicon sequencing, although widely used, may lead to an incorrect evaluation of fungal communities, and efforts should be made to promote the use of different targets in sequencing-based microbial ecology studies. IMPORTANCE Amplicon-sequencing approaches for fungi may rely on different targets, affecting the observed diversity and abundance of fungal species. An increasing number of studies will address fungal diversity by high-throughput amplicon sequencing. The description of the communities must be accurate and reliable in order to draw useful insights and to address both ecological and biological questions.
By analyzing a mock community and several biological samples, we demonstrate that using different amplicon targets may change the results of fungal microbiota analysis, and we highlight how a careful choice of the target is fundamental for a thorough description of the fungal communities.
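
The in silico primer evaluation and the ITS1-2 length heterogeneity mentioned above can be illustrated with a minimal exact-match "virtual PCR" that reports the product length spanned by a primer pair. The primers and templates are toy sequences, not the primers used in the study, and exact matching on a single strand is a simplifying assumption; practical in silico PCR tools tolerate mismatches and search both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template, fwd, rev):
    """Length from the forward primer to the reverse-complemented reverse primer.

    Only exact matches on the given strand are considered; returns None if the
    pair does not bind in the expected orientation.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


fwd, rev = "AAGG", "TTGC"  # toy primers; the reverse primer's site reads GCAA
short_its = "TTAAGGACGTACGTGCAAGG"
long_its = "AAGG" + "T" * 30 + "GCAA"
print(amplicon_length(short_its, fwd, rev), amplicon_length(long_its, fwd, rev))  # 16 38
```

The sketch only computes product lengths; it does not model the preferential amplification of shorter templates that the authors report, which is what turns length differences like 16 vs 38 bp into abundance bias.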


1995, Vol 41 (1), pp. 69-72
Author(s): M Guida, R S Marger, A C Papp, P J Snyder, M S Sedra, ...

Abstract Myotonic dystrophy (DM) is an autosomal dominant genetic disease caused by an unstable CTG repeat sequence in the 3' untranslated region of the myotonin protein kinase gene. The CTG repeat is present 5-30 times in the normal population, whereas DM patients have CTG expansions of 50 to several thousand repeats. The age of onset of the disorder and the severity of the phenotype are roughly correlated with the size of the CTG expansion. We developed a molecular protocol for the diagnosis of DM based on an initial polymerase chain reaction screen to detect normal-sized alleles and small expansions, followed by an improved Southern blot protocol to detect larger expansions.
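
The repeat-size ranges quoted above can be expressed as a small classifier over a sequenced allele. This only illustrates the stated ranges; it is not the diagnostic protocol described, which relies on a PCR screen followed by Southern blotting. Counts below 5 and between 31 and 49 fall outside the ranges the abstract gives and are deliberately left unclassified, and the example alleles are invented.

```python
import re


def longest_ctg_run(seq):
    """Length, in repeat units, of the longest uninterrupted CTG tract in seq."""
    runs = re.findall(r"(?:CTG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_ctg_repeats(n):
    """Map a CTG repeat count onto the ranges stated in the abstract."""
    if 5 <= n <= 30:
        return "normal range (5-30)"
    if n >= 50:
        return "DM expansion range (50+)"
    return "outside the stated ranges"


for allele in ["GGA" + "CTG" * 12 + "TTA", "CTG" * 75, "CTG" * 40]:
    n = longest_ctg_run(allele)
    print(n, classify_ctg_repeats(n))
```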


2020
Author(s): Sara D’Andreano, Anna Cuscó, Olga Francino

ABSTRACT The availability of long-read technologies, like Oxford Nanopore Technologies, provides the opportunity to sequence longer fragments of the fungal ribosomal operon, up to 6 kb (18S-ITS1-5.8S-ITS2-28S), and to improve the taxonomic assignment of communities to the species level in real time. We assess the taxonomic performance of amplicons targeting a 3.5 kb region (V3 18S-ITS1-5.8S-ITS2-28S D2) and a 6 kb region (V1 18S-ITS1-5.8S-ITS2-28S D12) with the What’s in my pot (WIMP) classifier. We used the ZymoBIOMICS™ mock community and different microbiological fungal cultures as positive controls. Long amplicon sequencing correctly identified Saccharomyces cerevisiae and Cryptococcus neoformans from the mock community and Malassezia pachydermatis, Microsporum canis, and Aspergillus fumigatus from the microbiological cultures. In addition, we identified Rhodotorula graminis in a culture mislabeled as Candida spp. We applied the same approach to external otitis in dogs. Malassezia was the dominant fungal genus on dogs’ ear skin, and M. pachydermatis was the main species in the healthy sample. Conversely, we identified a higher representation of M. globosa and M. sympodialis in otitis-affected samples. We demonstrate the suitability of long ribosomal amplicons to characterize the fungal community of complex samples, either healthy or with clinical signs of infection.


2021
Author(s): Christoph M. Deeg, Ben J. G. Sutherland, Tobi J. Ming, Colin Wallace, Kim Jonsen, ...

Genetic stock identification (GSI) by single nucleotide polymorphism (SNP) sequencing has become the gold standard for stock identification in Pacific salmon, which are found in mixed stocks during the oceanic phase of their life cycle. Sequencing platforms currently in use require large batch sizes and multi-day processing in specialized facilities to perform genotyping by the thousands. However, recent advances in third-generation single-molecule sequencing platforms, such as the Oxford Nanopore minION, provide base calling on portable, pocket-sized sequencers and hold promise for real-time, in-field stock identification on variable batch sizes. Here we report and evaluate the utility and comparability of at-sea stock identification of coho salmon Oncorhynchus kisutch based on targeted SNP amplicon sequencing on the minION platform during the International Year of the Salmon Signature Expedition to the Gulf of Alaska in the winter of 2019. Long-read sequencers are not optimized for short amplicons; therefore, we concatenated amplicons to increase coverage and throughput. Nanopore sequencing at sea yielded stock assignment for 50 of the 80 assessed individuals. Nanopore-based SNP calls agreed with Ion Torrent-based genotypes in 83.25% of calls, but assignment of individuals to stock of origin agreed for only 61.5% of individuals, highlighting inherent challenges of Nanopore sequencing, such as resolution of homopolymer tracts and indels. However, poor representation of the assayed coho salmon in the queried baseline dataset contributed to poor assignment confidence on both platforms. Future improvements will focus on reducing turnaround time and cost, improving accuracy and throughput, and augmenting the existing baselines, specifically for stocks from coastal northern BC and Alaska. If successfully implemented, Nanopore sequencing will provide an alternative to the large-scale laboratory approach.
Genotyping by amplicon sequencing in the hands of diverse stakeholders could inform management decisions over a broad expanse of the coast by allowing the analysis of small batches in remote areas in near real-time.
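
The cross-platform agreement figure above (83.25% of SNP calls) is a genotype concordance, which can be sketched as the share of SNPs called on both platforms whose genotypes match. The genotype encoding (unphased "A/G" strings, with allele order ignored), the SNP names, and the calls are invented for illustration.

```python
def normalize(genotype):
    """Treat 'A/G' and 'G/A' as the same unphased genotype."""
    return "/".join(sorted(genotype.split("/")))


def genotype_concordance(calls_a, calls_b):
    """Percent of SNPs called on both platforms whose genotypes agree (None if no overlap)."""
    shared = [snp for snp in calls_a if snp in calls_b]
    if not shared:
        return None
    agree = sum(normalize(calls_a[snp]) == normalize(calls_b[snp]) for snp in shared)
    return 100 * agree / len(shared)


nanopore = {"snp1": "A/A", "snp2": "A/G", "snp3": "G/G", "snp4": "C/T"}
ion_torrent = {"snp1": "A/A", "snp2": "G/A", "snp3": "G/A", "snp4": "C/T", "snp5": "T/T"}
print(genotype_concordance(nanopore, ion_torrent))  # 75.0
```

SNPs called on only one platform (snp5 here) are excluded from the denominator, which is one of several reasonable conventions; counting them as disagreements would lower the figure.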

