scholarly journals Pitfalls and pointers: An accessible guide to marker gene amplicon sequencing in ecological applications

Author(s):  
Anita Porath‐Krause ◽  
Alexander T. Strauss ◽  
Jeremiah A. Henning ◽  
Eric W. Seabloom ◽  
Elizabeth T. Borer
Diagnostics ◽  
2020 ◽  
Vol 10 (3) ◽  
pp. 163 ◽  
Author(s):  
Camila Hernandes ◽  
Paola Silveira ◽  
Aline Fernanda Rodrigues Sereia ◽  
Ana Paula Christoff ◽  
Helen Mendes ◽  
...  

This work aimed to identify and compare the bacterial patterns present in endometriotic lesions, eutopic endometrium and vaginal fluid from endometriosis patients with those found in the vaginal fluid and eutopic endometrium of control patients. Vaginal fluid, eutopic endometrium and endometriotic lesions were collected. DNA was extracted and the samples were analyzed to identify microbiome by high-throughput DNA sequencing of the 16S rRNA marker gene. Amplicon sequencing from vaginal fluid, eutopic endometrium and endometriotic lesion resulted in similar profiles of microorganisms, composed most abundantly by the genus Lactobacillus, Gardnerella, Streptococcus and Prevotella. No significant differences were found in the diversity analysis of microbiome profiles between control and endometriotic patients; however deep endometriotic lesions seems to present different bacterial composition, less predominant of Lactobacillus and with more abundant Alishewanella, Enterococcus and Pseudomonas.


2017 ◽  
Author(s):  
Stephen Woloszynek ◽  
Joshua Chang Mell ◽  
Gideon Simpson ◽  
Michael P. O’Connor ◽  
Gail L. Rosen

ABSTRACTBackgroundAnalysis of microbiome data involves identifying co-occurring groups of taxa associated with sample features of interest (e.g., disease state). But elucidating key associations is often difficult since microbiome data are compositional, high dimensional, and sparse. Also, the configuration of co-occurring taxa may represent overlapping subcommunities that contribute to, for example, host status. Preserving the configuration of co-occurring microbes rather than detecting specific indicator species is more likely to facilitate biologically meaningful interpretations. In addition, analyses that utilize both taxonomic and predicted functional abundances typically independently characterize the taxonomic and functional profiles before linking them to sample information. This prevents investigators from identifying the specific functional components associate with which subsets of co-occurring taxa.ResultsWe provide an approach to explore co-occurring taxa using “topics” generated via a topic model and then link these topics to specific sample classes (e.g., diseased versus healthy). Rather than inferring predicted functional content independently from taxonomic abundances, we instead focus on inference of functional content within topics, which we parse by estimating pathway-topic interactions through a multilevel, fully Bayesian regression model. We apply our methods to two large publically available 16S amplicon sequencing datasets: an inflammatory bowel disease (IBD) dataset from Gevers et al. and data from the American Gut (AG) project. When applied to the Gevers et al. IBD study, we demonstrate that a topic highly associated with Crohn’s disease (CD) diagnosis is (1) dominated by a cluster of bacteria known to be linked with CD and (2) uniquely enriched for a subset of lipopolysaccharide (LPS) synthesis genes. In the AG data, our approach found that individuals with plant-based diets were enriched with Lachnospiraceae, Roseburia, Blautia, and Ruminococcaceae, as well as fluorobenzoate degradation pathways, whereas pathways involved in LPS biosynthesis were depleted.ConclusionsWe introduce an approach for uncovering latent thematic structure in the context of sample features for 16S rRNA surveys. Using our topic-model approach, investigators can (1) capture groups of co-occurring taxa termed topics, (2) uncover within-topic functional potential, and (3) identify gene sets that may guide future inquiry. These methods have been implemented in a freely available R package https://github.com/EESI/themetagenomics.


Author(s):  
Nicholas A Bokulich ◽  
Jai Ram Rideout ◽  
Evguenia Kopylova ◽  
Evan Bolyen ◽  
Jessica Patnode ◽  
...  

Background: Taxonomic classification of marker-gene (i.e., amplicon) sequences represents an important step for molecular identification of microorganisms. Results: We present three advances in our ability to assign and interpret taxonomic classifications of short marker gene sequences: two new methods for taxonomy assignment, which reduce runtime up to two-fold and achieve high precision genus-level assignments; an evaluation of classification methods that highlights differences in performance with different marker genes and at different levels of taxonomic resolution; and an extensible framework for evaluating and optimizing new classification methods, which we hope will serve as a model for standardized and reproducible bioinformatics methods evaluations. Conclusions: Our new methods are accessible in QIIME 1.9.0, and our evaluation framework will support ongoing optimization of classification methods to complement rapidly evolving short-amplicon sequencing and bioinformatics technologies. Static versions of all of the analysis notebooks generated with this framework, which contain all code and analysis results, can be viewed at http://bit.ly/srta-010.


PeerJ ◽  
2019 ◽  
Vol 6 ◽  
pp. e6174 ◽  
Author(s):  
Paul Greenfield ◽  
Nai Tran-Dinh ◽  
David Midgley

Introduction Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging. Analysis of these datasets is founded on the mapping of sequencing reads onto known genomic regions from known organisms, but short reads will often map equally well to multiple regions, and to multiple reference organisms. Assembling metagenomic datasets prior to mapping can generate much longer and more precisely mappable sequences but the presence of closely related organisms and highly conserved regions makes metagenomic assembly challenging, and some regions of particular interest can assemble poorly. One solution to these problems is to use specialised tools, such as Kelpie, that can accurately extract and assemble full-length sequences for defined genomic regions from whole-metagenome datasets. Methods Kelpie is a kMer-based tool that generates full-length amplicon-like sequences from whole-metagenome datasets. It takes a pair of primer sequences and a set of metagenomic reads, and uses a combination of kMer filtering, error correction and assembly techniques to construct sets of full-length inter-primer sequences. Results The effectiveness of Kelpie is demonstrated here through the extraction and assembly of full-length ribosomal marker gene regions, as this allows comparisons with conventional amplicon sequencing and published metagenomic benchmarks. The results show that the Kelpie-generated sequences and community profiles closely match those produced by amplicon sequencing, down to low abundance levels, and running Kelpie on the synthetic CAMI metagenomic benchmarking datasets shows similar high levels of both precision and recall. Conclusions Kelpie can be thought of as being somewhat like an in-silico PCR tool, taking a primer pair and producing the resulting ‘amplicons’ from a whole-metagenome dataset. Marker regions from the 16S rRNA gene were used here as an example because this allowed the overall accuracy of Kelpie to be evaluated through comparisons with other datasets, approaches and benchmarks. Kelpie is not limited to this application though, and can be used to extract and assemble any genomic region present in a whole metagenome dataset, as long as it is bound by a pairs of highly conserved primer sequences.


Author(s):  
Nicholas A Bokulich ◽  
Jai Ram Rideout ◽  
Evguenia Kopylova ◽  
Evan Bolyen ◽  
Jessica Patnode ◽  
...  

Background: Taxonomic classification of marker-gene (i.e., amplicon) sequences represents an important step for molecular identification of microorganisms. Results: We present three advances in our ability to assign and interpret taxonomic classifications of short marker gene sequences: two new methods for taxonomy assignment, which reduce runtime up to two-fold and achieve high-precision genus-level assignments; an evaluation of classification methods that highlights differences in performance with different marker genes and at different levels of taxonomic resolution; and an extensible framework for evaluating and optimizing new classification methods, which we hope will serve as a model for standardized and reproducible bioinformatics methods evaluations. Conclusions: Our new methods are accessible in QIIME 1.9.0, and our evaluation framework will support ongoing optimization of classification methods to complement rapidly evolving short-amplicon sequencing and bioinformatics technologies. Static versions of all of the analysis notebooks generated with this framework, which contain all code and analysis results, can be viewed at http://bit.ly/srta-012 .


Author(s):  
Nicholas A Bokulich ◽  
Jai Ram Rideout ◽  
Evguenia Kopylova ◽  
Evan Bolyen ◽  
Jessica Patnode ◽  
...  

Background: Taxonomic classification of marker-gene (i.e., amplicon) sequences represents an important step for molecular identification of microorganisms. Results: We present three advances in our ability to assign and interpret taxonomic classifications of short marker gene sequences: two new methods for taxonomy assignment, which reduce runtime up to two-fold and achieve high-precision genus-level assignments; an evaluation of classification methods that highlights differences in performance with different marker genes and at different levels of taxonomic resolution; and an extensible framework for evaluating and optimizing new classification methods, which we hope will serve as a model for standardized and reproducible bioinformatics methods evaluations. Conclusions: Our new methods are accessible in QIIME 1.9.0, and our evaluation framework will support ongoing optimization of classification methods to complement rapidly evolving short-amplicon sequencing and bioinformatics technologies. Static versions of all of the analysis notebooks generated with this framework, which contain all code and analysis results, can be viewed at http://bit.ly/srta-012 .


2021 ◽  
Vol 12 ◽  
Author(s):  
Petra Pjevac ◽  
Bela Hausmann ◽  
Jasmin Schwarz ◽  
Gudrun Kohl ◽  
Craig W. Herbold ◽  
...  

In microbiome research, phylogenetic and functional marker gene amplicon sequencing is the most commonly-used community profiling approach. Consequently, a plethora of protocols for the preparation and multiplexing of samples for amplicon sequencing have been developed. Here, we present two economical high-throughput gene amplification and sequencing workflows that are implemented as standard operating procedures at the Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna. These workflows are based on a previously-published two-step PCR approach, but have been updated to either increase the accuracy of results, or alternatively to achieve orders of magnitude higher numbers of samples to be multiplexed in a single sequencing run. The high-accuracy workflow relies on unique dual sample barcoding. It allows the same level of sample multiplexing as the previously-published two-step PCR approach, but effectively eliminates residual read missasignments between samples (crosstalk) which are inherent to single barcoding approaches. The high-multiplexing workflow is based on combinatorial dual sample barcoding, which theoretically allows for multiplexing up to 299,756 amplicon libraries of the same target gene in a single massively-parallelized amplicon sequencing run. Both workflows presented here are highly economical, easy to implement, and can, without significant modifications or cost, be applied to any target gene of interest.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1612 ◽  
Author(s):  
Zachery T. Lewis ◽  
Jasmine C.C. Davis ◽  
Jennifer T. Smilowitz ◽  
J. Bruce German ◽  
Carlito B. Lebrilla ◽  
...  

Infant fecal samples are commonly studied to investigate the impacts of breastfeeding on the development of the microbiota and subsequent health effects. Comparisons of infants living in different geographic regions and environmental contexts are needed to aid our understanding of evolutionarily-selected milk adaptations. However, the preservation of fecal samples from individuals in remote locales until they can be processed can be a challenge. Freeze-drying (lyophilization) offers a cost-effective way to preserve some biological samples for transport and analysis at a later date. Currently, it is unknown what, if any, biases are introduced into various analyses by the freeze-drying process. Here, we investigated how freeze-drying affected analysis of two relevant and intertwined aspects of infant fecal samples, marker gene amplicon sequencing of the bacterial community and the fecal oligosaccharide profile (undigested human milk oligosaccharides). No differences were discovered between the fecal oligosaccharide profiles of wet and freeze-dried samples. The marker gene sequencing data showed an increase in proportional representation ofBacteriodesand a decrease in detection of bifidobacteria and members of class Bacilli after freeze-drying. This sample treatment bias may possibly be related to the cell morphology of these different taxa (Gram status). However, these effects did not overwhelm the natural variation among individuals, as the community data still strongly grouped by subject and not by freeze-drying status. We also found that compensating for sample concentration during freeze-drying, while not necessary, was also not detrimental. Freeze-drying may therefore be an acceptable method of sample preservation and mass reduction for some studies of microbial ecology and milk glycan analysis.


Author(s):  
Lauren V. Alteio ◽  
Joana Séneca ◽  
Alberto Canarini ◽  
Roey Angel ◽  
Ksenia Guseva ◽  
...  

Microbial community analysis via marker gene amplicon sequencing has become a routine method in the field of soil research. In this perspective, we discuss technical challenges and limitations of amplicon sequencing studies in soil and present statistical and experimental approaches that can help addressing the spatio-temporal complexity of soil and the high diversity of organisms therein. We illustrate the impact of compositionality on the interpretation of relative abundance data and discuss effects of sample replication on the statistical power in soil community analysis. Additionally, we argue for the need of increased study reproducibility and data availability, as well as complementary techniques for generating deeper ecological insights into microbial roles and our understanding thereof in soil ecosystems. At this stage, we call upon researchers and specialized soil journals to consider the current state of data analysis, interpretation and availability to improve the rigor of future studies.


2018 ◽  
Author(s):  
Paul Greenfield ◽  
Nai Tran-Dinh ◽  
David Midgley

Introduction. Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging. Analysis of these datasets is founded on the mapping of sequencing reads onto known genomic regions from known organisms, but short reads will often map equally well to multiple regions, and to multiple reference organisms. Assembling metagenomic datasets prior to mapping can generate much longer and more precisely mappable sequences but the presence of closely related organisms and highly conserved regions makes metagenomic assembly challenging, and some regions of particular interest can assemble poorly. One solution to these problems is to use specialised tools, such as Kelpie, that can accurately extract and assemble full-length sequences for defined genomic regions from whole-metagenome datasets. Methods. Kelpie is a kMer-based tool that generates full-length amplicon-like sequences from whole-metagenome datasets. It takes a pair of primer sequences and a set of metagenomic reads, and uses a combination of kMer filtering, error correction and assembly techniques to construct sets of full-length inter-primer sequences. Results. The effectiveness of Kelpie is demonstrated here through the extraction and assembly of full-length ribosomal marker gene regions, as this allows comparisons with conventional amplicon sequencing and published metagenomic benchmarks. The results show that the Kelpie-generated sequences and community profiles closely match those produced by amplicon sequencing, down to low abundance levels, and running Kelpie on the synthetic CAMI metagenomic benchmarking datasets shows similar high levels of both precision and recall. Conclusions. Kelpie can be thought of as being somewhat like an in-silico PCR tool, taking a primer pair and producing the resulting ‘amplicons’ from a whole-metagenome dataset. Marker regions from the 16S rRNA gene were used here as an example because this allowed the overall accuracy of Kelpie to be evaluated through comparisons with other datasets, approaches and benchmarks. Kelpie is not limited to this application though, and can be used to extract and assemble any genomic region present in a whole metagenome dataset, as long as it is bound by a pairs of highly conserved primer sequences.


Sign in / Sign up

Export Citation Format

Share Document