Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data

PeerJ

10.7717/peerj.4925

2018

Vol 6

pp. e4925

Cited By ~ 57

Author(s):

Jonathan M. Palmer

Michelle A. Jusino

Mark T. Banik

Daniel L. Lindner

Keyword(s):

Amplicon Sequencing

Variable Length

ITS Sequences

Synthetic Control

Mock Community

Software Pipeline

Initial Polymerase Chain Reaction

Polymerase Chain

Sequencing Platforms

Mock Communities

High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique for characterizing microbial communities. Recently, spike-in mock communities have been used to measure the accuracy of sequencing platforms and data analysis pipelines. To assess the performance of sequencing platforms and data processing pipelines on fungal internal transcribed spacer (ITS) amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community, consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: (1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, (2) pre-clustering steps for variable-length amplicons are critically important, and (3) a major source of bias can be attributed to the initial polymerase chain reaction (PCR), so HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution that handles variable-length amplicons and quality-filters HTAS data based on spike-in controls. While we describe herein a non-biological SynMock community for ITS sequences, the concept and the AMPtk software can be applied widely to any HTAS dataset to improve data quality.
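AMPtk quality-filters HTAS data based on spike-in controls. One plausible way such a control can be used, sketched below under stated assumptions, is to estimate "index bleed" (reads misassigned between multiplexed samples) as the fraction of non-mock reads observed in the SynMock sample, and then drop any OTU count that falls below that fraction of its sample's total. The helper names, the toy OTU table, and this exact rule are hypothetical illustrations, not AMPtk's implementation; consult the AMPtk documentation for its actual filtering options.

```python
# Minimal sketch (hypothetical helper names, NOT AMPtk's code): estimate index
# bleed from a spike-in mock sample and apply it as a per-sample count cutoff.

def index_bleed_rate(otu_table, mock_sample, mock_otus):
    """Fraction of reads in the mock sample assigned to non-mock OTUs."""
    counts = otu_table[mock_sample]
    total = sum(counts.values())
    foreign = sum(n for otu, n in counts.items() if otu not in mock_otus)
    return foreign / total if total else 0.0


def filter_by_bleed(otu_table, rate):
    """Keep only OTU counts strictly above rate * (sample read total)."""
    filtered = {}
    for sample, counts in otu_table.items():
        cutoff = rate * sum(counts.values())
        filtered[sample] = {otu: n for otu, n in counts.items() if n > cutoff}
    return filtered


# Toy OTU table: sample -> {OTU: read count}; all values are invented.
otu_table = {
    "SynMock": {"SynMock1": 4990, "SynMock2": 5000, "OTU7": 10},
    "soil1": {"OTU7": 3000, "OTU8": 1200, "SynMock1": 4, "OTU9": 2},
}
rate = index_bleed_rate(otu_table, "SynMock", {"SynMock1", "SynMock2"})
filtered = filter_by_bleed(otu_table, rate)
print(rate, filtered["soil1"])
```

Here 10 of 10,000 mock reads are foreign, so the assumed bleed rate is 0.1%, and in `soil1` the 4 SynMock reads and the 2-read OTU fall below the resulting cutoff and are removed.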

Optimization of 16S amplicon analysis using mock communities: implications for estimating community diversity

10.7287/peerj.preprints.2196v2

2016

Cited By ~ 1

Author(s):

Andrew Krohn

Bo Stevens

Adam Robbins-Pianka

Matthew Belus

Gerard J Allan

...

Keyword(s):

Amplicon Sequencing

Community Diversity

Accurate Estimation

Marker Genes

Sequencing Data

Mock Community

Data Set

Environmental Diversity

Quality Filtering

Mock Communities

Diversity of complex microbial communities can be rapidly assessed by community amplicon sequencing of marker genes (e.g., 16S), often yielding many thousands of DNA sequences per sample. However, analysis of community amplicon sequencing data requires multiple computational steps that affect the outcome of the final data set. Here we use mock communities to describe the effects of parameter adjustments for raw sequence quality filtering, picking operational taxonomic units (OTUs), taxonomic assignment, and OTU table filtering as implemented in QIIME 1.9.1. Based on this exploration, we demonstrate an optimized workflow, which we also apply to environmental samples. We found that quality filtering of raw data and filtering of OTU tables had large effects on observed OTU diversity. While all taxonomy assigners performed with similar accuracy, the appropriate similarity threshold for defining OTUs depended on the method used for OTU picking. Our “default” analysis in QIIME overestimated mock community diversity by at least a factor of ten, whereas the optimized analysis correctly characterized the taxonomic composition of the mock communities while still overestimating OTU diversity by about a factor of two. Although the observed relative abundances of mock community member taxa were approximately correct, most taxa were still represented by multiple OTUs. Low-frequency OTUs conspecific to constituent mock community taxa were characterized by multiple substitution and indel errors and by the presence of a low-quality base call that resulted in sequence truncation during quality filtering. Low-quality base calls occurred most often at “G” positions and were also associated with a preceding “TTT” trinucleotide motif. Environmental diversity estimates were reduced by about 40%, from 2508 to 1533 OTUs, when comparing output from the default and optimized workflows.
We attribute this reduction in observed diversity to the removal of erroneous sequences from the data set. Our results indicate that both strict quality filtering of raw sequencing data and careful filtering of raw OTU tables are important steps for accurate estimation of microbial community diversity.
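The abstract reports that OTU-table filtering strongly affected observed diversity and that environmental estimates fell by "about 40%", from 2508 to 1533 OTUs. The sketch below shows one simple kind of OTU-table filter (a minimum fraction of all reads in the table) and checks the reported percentage. The threshold rule, the toy counts, and the helper names are assumptions for illustration; they are not the QIIME 1.9.1 settings the authors used.

```python
# Illustrative sketch (hypothetical helpers): drop OTUs whose share of all
# reads in the table falls below a minimum fraction, and compute how much
# the OTU count shrank between two workflows.

def filter_rare_otus(otu_totals, min_fraction):
    """otu_totals maps OTU id -> total reads across all samples."""
    total = sum(otu_totals.values())
    return {otu: n for otu, n in otu_totals.items() if n / total >= min_fraction}


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


kept = filter_rare_otus({"OTU1": 900, "OTU2": 95, "OTU3": 5}, min_fraction=0.01)
print(sorted(kept))  # ['OTU1', 'OTU2']

# Environmental OTU counts from the abstract: 2508 (default) vs 1533 (optimized).
print(round(percent_reduction(2508, 1533), 1))  # 38.9
```

The arithmetic (975 / 2508 ≈ 38.9%) is consistent with the abstract's "about 40%".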

Taxon Disappearance from Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every Sequencing Run

mSystems

10.1128/msystems.00023-18

2018

Vol 3 (3)

Cited By ~ 21

Author(s):

Yi-Chun Yeh

David M. Needham

Ella T. Sieradzki

Jed A. Fuhrman

Keyword(s):

Method Development

PCR Amplification

Unknown Origin

Community Analysis

Sequencing Analysis

Mock Community

Microbiome Analysis

Biological Studies

Sequencing Platforms

Mock Communities

ABSTRACT Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing and to optimize pipeline outputs. Nevertheless, the substantial value of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed abundances more than 2-fold too high or too low, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded “normal” results. Although the problem was obvious from the anomalous mock community results, natural variation of the microbiomes at our site means we could easily have missed it had we not used mock communities. The “normal” results were validated over four MiSeqPE300 runs and three HiSeqPE250 runs, and run-to-run variation was usually low. While validating these “normal” results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We strongly advise the use of mock communities in every sequencing run to distinguish potentially serious aberrations from natural variations. The mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed, both to detect problems that show up only in some taxa and to help validate clustering. IMPORTANCE Despite the routine use of standards and blanks (a kind of “control”) in virtually all chemical or physical assays and most biological studies, microbiome analysis has traditionally lacked such standards.
Here we show that unexpected problems of unknown origin can occur in such sequencing runs and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be detected by the use of mock communities.
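The check that exposed the vanished archaeal taxon can be automated. The sketch below is a hypothetical helper, not code from the paper: it compares each mock member's observed relative abundance in a run with its expected fraction and flags members that are at least twofold too high or too low, including members that have disappeared entirely. The even four-member mock, the read counts, and the zero-read floor are invented for illustration.

```python
import math

# Hypothetical QC check: compare each mock taxon's observed relative abundance
# in a run with its expected fraction and flag deviations of at least `fold`.

def flag_aberrant_taxa(expected, observed_counts, fold=2.0, floor=1e-6):
    """Return {taxon: log2(observed/expected)} for taxa outside the fold band.

    A taxon with zero reads is given a small `floor` fraction so that its
    disappearance shows up as a large negative log2 ratio instead of an error.
    """
    total = sum(observed_counts.values())
    flags = {}
    for taxon, exp_frac in expected.items():
        obs_frac = observed_counts.get(taxon, 0) / total if total else 0.0
        ratio = max(obs_frac, floor) / exp_frac
        if ratio >= fold or ratio <= 1.0 / fold:
            flags[taxon] = round(math.log2(ratio), 2)
    return flags


# Invented even mock of four taxa; "Archaeon" vanished in this run.
expected = {"A": 0.25, "B": 0.25, "C": 0.25, "Archaeon": 0.25}
observed = {"A": 500, "B": 300, "C": 200}
print(flag_aberrant_taxa(expected, observed))
```

Taxon A (twice its expected share) and the missing "Archaeon" are flagged; B and C stay within the twofold band. A rerun whose mock passes this check would be a candidate "normal" run.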

Taxon disappearance from microbiome analysis indicates need for mock communities as a standard in every sequencing run

10.1101/206219

2017

Cited By ~ 3

Author(s):

Yi-Chun Yeh

David M. Needham

Ella T. Sieradzki

Jed A. Fuhrman

Keyword(s):

Method Development

PCR Amplification

Unknown Origin

Community Analysis

Sequencing Analysis

Mock Community

Microbiome Analysis

Biological Studies

Sequencing Platforms

Mock Communities

ABSTRACT Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing, and to optimize pipeline outputs. Nevertheless, the necessity of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed abundances more than 2-fold too high or too low, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded “normal” results. Although the problem was obvious from the anomalous mock community results, natural variation of the microbiomes at our site means we could easily have missed it had we not used mock communities. The “normal” results were validated over 4 MiSeqPE300 runs and 3 HiSeqPE250 runs, and run-to-run variation was usually low (Bray-Curtis distance 0.12 ± 0.04). While validating these “normal” results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We suggest that using mock communities in every sequencing run is essential to distinguish potentially serious aberrations from natural variations. Such mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed, to detect problems that show up only in some taxa, as we observed. IMPORTANCE Despite the routine use of standards and blanks (a kind of “control”) in virtually all chemical or physical assays and most biological studies, microbiome analysis has traditionally lacked such standards.
Here we show that unexpected problems of unknown origin can occur in such sequencing runs and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be avoided by the use of suitable mock communities.
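Run-to-run variation is summarized above as a Bray-Curtis distance. For reference, the Bray-Curtis dissimilarity between two composition vectors is the sum of absolute per-taxon differences divided by the sum of all per-taxon values; below is a minimal sketch computed on relative abundances, using hypothetical helper names and invented counts for two runs of the same mock. For NumPy arrays, SciPy's `scipy.spatial.distance.braycurtis` computes the same quantity.

```python
# Sketch of the Bray-Curtis dissimilarity between two runs of the same mock
# community, computed on relative abundances (hypothetical helper names).

def to_proportions(counts):
    """Convert raw read counts to relative abundances."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(u, v):
    """Sum of |u_i - v_i| divided by sum of (u_i + v_i) over all taxa."""
    taxa = set(u) | set(v)
    num = sum(abs(u.get(t, 0.0) - v.get(t, 0.0)) for t in taxa)
    den = sum(u.get(t, 0.0) + v.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


run1 = to_proportions({"A": 500, "B": 300, "C": 200})
run2 = to_proportions({"A": 800, "B": 700, "C": 500})
print(round(bray_curtis(run1, run2), 3))  # 0.1
```

A value of 0 means identical composition and 1 means no shared taxa; computing it on proportions rather than raw counts keeps differences in sequencing depth between runs from inflating the distance.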

CoMA – an intuitive and user-friendly pipeline for amplicon-sequencing data analysis

PLoS ONE

10.1371/journal.pone.0243241

2020

Vol 15 (12)

pp. e0243241

Author(s):

Sebastian Hupfauf

Mohammad Etemadi

Marina Fernández-Delgado Juárez

María Gómez-Brandón

Heribert Insam

...

Keyword(s):

Operating System

Data Analysis

Amplicon Sequencing

Sequencing Data

Taxonomic Assignment

Benchmark Test

Next-Generation Sequencing (NGS)

User Friendly

NGS Data

Mock Communities

In recent years, there has been a marked increase in next-generation sequencing (NGS) of gene amplicons in biological and medical studies. Huge amounts of data are produced and need to be analyzed adequately. Various online and offline analysis tools are available; however, most of them require extensive expertise in computer science or bioinformatics, and often a Linux-based operating system. Here, we introduce “CoMA–Comparative Microbiome Analysis” as a free and intuitive analysis pipeline for amplicon-sequencing data, compatible with any common operating system. The tool covers data pre-processing, quality checking, clustering to operational taxonomic units (OTUs), taxonomic assignment, data post-processing, data visualization, and statistical appraisal. The workflow produces aesthetically pleasing, publication-ready graphics, as well as output files in standardized formats (e.g., tab-delimited OTU table, BIOM, NEWICK tree) that can be used for more sophisticated analyses. The CoMA output was validated by a benchmark test using three mock communities with different sample characteristics (primer set, amplicon length, diversity), and its performance was compared with that of Mothur, QIIME and QIIME2-DADA2, popular packages for NGS data analysis. Furthermore, the functionality of CoMA is demonstrated on a practical example, investigating microbial communities from three different soils (grassland, forest, swamp). All tools performed well in the benchmark test and detected the majority of the genera in the mock communities. For the soil samples, too, the results of CoMA were congruent with those of the other pipelines, in particular for the key microbial players.
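The benchmark summarized above asks whether each pipeline recovered the genera known to be present in a mock community. A minimal scoring function for that comparison is sketched below; it is a hypothetical helper, not part of CoMA, and the genus lists are invented and do not reflect the composition of the three mock communities used in the study.

```python
# Hypothetical benchmark helper: how many expected mock genera a pipeline
# detected, which it missed, and which detected genera are not in the mock.

def genus_recovery(expected_genera, detected_genera):
    expected = set(expected_genera)
    detected = set(detected_genera)
    found = expected & detected
    recall = len(found) / len(expected) if expected else 0.0
    return recall, sorted(expected - detected), sorted(detected - expected)


expected = ["Bacillus", "Escherichia", "Lactobacillus", "Listeria", "Pseudomonas"]
detected = ["Bacillus", "Escherichia", "Listeria", "Pseudomonas", "Ralstonia"]
recall, missing, unexpected = genus_recovery(expected, detected)
print(recall, missing, unexpected)
# 0.8 ['Lactobacillus'] ['Ralstonia']
```

Running the same function on each pipeline's genus list gives directly comparable recall values, while the "unexpected" list points to possible contaminants or misassignments.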

Optimization of 16S amplicon analysis using mock communities: implications for estimating community diversity

10.7287/peerj.preprints.2196v1

2016

Author(s):

Andrew Krohn

Bo Stevens

Adam Robbins-Pianka

Matthew Belus

Gerard J Allan

...

A software pipeline for processing and identification of fungal ITS sequences

Source Code for Biology and Medicine

10.1186/1751-0473-4-1

2009

Vol 4 (1)

Cited By ~ 75

Author(s):

R Henrik Nilsson

Gunilla Bok

Martin Ryberg

Erik Kristiansson

Nils Hallenberg

Keyword(s):

ITS Sequences

Software Pipeline

Fungal ITS

Handling of spurious sequences affects the outcome of high-throughput 16S rRNA gene amplicon profiling

ISME Communications

10.1038/s43705-021-00033-z

2021

Vol 1 (1)

Author(s):

Sandra Reitmeier

Thomas C. A. Hitch

Nicole Treichel

Nikolaos Fikas

Bela Hausmann

...

Keyword(s):

16S rRNA

16S rRNA Gene

Amplicon Sequencing

rRNA Gene

Diversity Analysis

Careful Attention

Operational Taxonomic Units

Basic Concepts

Gnotobiotic Mice

Mock Communities

16S rRNA gene amplicon sequencing is a popular approach for studying microbiomes. However, some basic concepts have still not been investigated comprehensively. We studied the occurrence of spurious sequences in defined microbial communities, using data either from the literature or generated in three sequencing facilities, analyzed with both operational taxonomic unit (OTU) and amplicon sequence variant (ASV) approaches. OTU clustering with singleton removal, a commonly used approach, yielded approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower with ASV analysis, but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% proved an effective threshold: excluding taxa below it largely prevented the analysis of spurious taxa in both OTU- and ASV-based approaches. Using this cutoff improved the reproducibility of analysis: across six human fecal samples sequenced in seven runs, variation in richness estimates was reduced by 38% compared with singleton filtering. Beta-diversity analysis of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparison, highlighting the importance of carefully analyzing data before drawing conclusions on microbiome changes. In summary, handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid misleading findings. We propose the concept of effective richness to facilitate the comparison of alpha-diversity across studies.
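The 0.25% cutoff described above is a per-sample relative-abundance filter. The sketch below applies it alongside plain singleton removal for comparison. Treating the number of taxa that survive the cutoff as "effective richness" is one reading of the abstract and is an assumption here, so check the paper for the authors' exact definition; the helper names, taxon names, and counts are invented.

```python
# Sketch of per-sample relative-abundance filtering at 0.25%, compared with
# plain singleton removal. Helper names and counts are hypothetical.

THRESHOLD = 0.0025  # 0.25% relative abundance, the cutoff reported above


def filter_relative_abundance(counts, threshold=THRESHOLD):
    """Keep taxa whose share of the sample's reads is >= threshold."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n for taxon, n in counts.items() if n / total >= threshold}


def remove_singletons(counts):
    """Drop taxa observed only once in the sample."""
    return {taxon: n for taxon, n in counts.items() if n > 1}


sample = {"ASV1": 6000, "ASV2": 3000, "ASV3": 950, "ASV4": 30, "ASV5": 20, "ASV6": 1}
print(len(remove_singletons(sample)))          # 5
print(len(filter_relative_abundance(sample)))  # 4
```

Singleton removal keeps the 20-read ASV5, whereas the relative cutoff (about 25 reads in a 10,001-read sample) removes it; because the cutoff scales with sequencing depth, richness counts are less sensitive to how deeply each sample was sequenced.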

Mycofier: a new machine learning-based classifier for fungal ITS sequences

BMC Research Notes

10.1186/s13104-016-2203-3

2016

Vol 9 (1)

Cited By ~ 3

Author(s):

Luisa Delgado-Serrano

Silvia Restrepo

Jose Ricardo Bustos

Maria Mercedes Zambrano

Juan Manuel Anzola

Keyword(s):

Machine Learning

ITS Sequences

Fungal ITS

New Machine

Different Amplicon Targets for Sequencing-Based Studies of Fungal Diversity

Applied and Environmental Microbiology

10.1128/aem.00905-17

2017

Vol 83 (17)

Cited By ~ 40

Author(s):

Francesca De Filippis

Manolo Laiola

Giuseppe Blaiotta

Danilo Ercolini

Keyword(s):

Microbial Ecology

High Throughput

Biological Samples

Fungal Diversity

High Throughput Sequencing

Amplicon Sequencing

Fungal Species

Fungal Communities

ITS2 Region

Mock Community

ABSTRACT Target-gene amplicon sequencing is the most widely used high-throughput sequencing application in microbial ecology. The targets are taxonomically relevant genes, with 16S rRNA being the gold standard for bacteria. For fungi, the most commonly used target is the internal transcribed spacer (ITS). However, ITS length varies among species, which may promote preferential amplification and sequencing and lead to incorrect estimates of their abundance. Therefore, the use of different targets is desirable. We evaluated three different target amplicons for the characterization of fungal diversity. After an in silico primer evaluation, we compared three amplicons (the ITS1-ITS2 region [ITS1-2], the 18S ribosomal small subunit RNA, and the D1/D2 domain of the 26S ribosomal large subunit RNA), using biological samples and a mock community of common fungal species. All three targets allowed accurate identification of the species present. Nevertheless, ITS1-2 length was highly heterogeneous, which caused overestimation of the abundance of species with a shorter ITS, whereas both 18S and 26S amplicons allowed more reliable quantification. We demonstrated that ITS1-2 amplicon sequencing, although widely used, may lead to an incorrect evaluation of fungal communities, and that efforts should be made to promote the use of different targets in sequencing-based microbial ecology studies. IMPORTANCE Amplicon-sequencing approaches for fungi may rely on different targets, and the choice of target affects the observed diversity and abundance of fungal species. An increasing number of studies will address fungal diversity by high-throughput amplicon sequencing. The description of the communities must be accurate and reliable in order to draw useful insights and to address both ecological and biological questions.
By analyzing a mock community and several biological samples, we demonstrate that using different amplicon targets may change the results of fungal microbiota analysis, and we highlight that a careful choice of target is fundamental for a thorough description of fungal communities.
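The length bias described above (overestimation of species with a shorter ITS) can be probed with a mock community of known composition. One simple approach, sketched below as an illustration and not as the authors' analysis, is to correlate each species' amplicon length with the log2 ratio of observed to expected relative abundance; a strongly negative Pearson correlation is consistent with shorter amplicons being favored. The species names, lengths, and counts are invented, and the helpers are hypothetical.

```python
import math

# Hypothetical check for amplicon-length bias in a mock community: correlate
# each species' ITS length with log2(observed / expected) relative abundance.

def log2_ratios(expected, observed_counts):
    """log2 of observed over expected relative abundance for each species."""
    total = sum(observed_counts.values())
    return {sp: math.log2((observed_counts[sp] / total) / frac)
            for sp, frac in expected.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented data: four species mixed evenly, shorter amplicons over-represented.
its_length = {"SpA": 400, "SpB": 500, "SpC": 600, "SpD": 700}
expected = {sp: 0.25 for sp in its_length}
observed = {"SpA": 400, "SpB": 300, "SpC": 200, "SpD": 100}

ratios = log2_ratios(expected, observed)
species = sorted(its_length)
r = pearson([its_length[s] for s in species], [ratios[s] for s in species])
print(round(r, 2))
```

With only a handful of mock members such a correlation is descriptive rather than a formal test; comparing the same mock across ITS, 18S, and 26S targets, as the study did, is the more direct check.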