Genome-based targeted sequencing as a reproducible microbial community profiling assay

2020 ◽  
Author(s):  
Jacquelynn Benjamino ◽  
Benjamin Leopold ◽  
Daniel Phillips ◽  
Mark D. Adams

Abstract Current sequencing-based methods for profiling microbial communities rely on marker gene (e.g. 16S rRNA) or metagenome shotgun sequencing (mWGS) analysis. We present a new approach based on highly multiplexed oligonucleotide probes designed from reference genomes in a pooled primer-extension reaction during library construction to derive relative abundance data. This approach, termed MA-GenTA: Microbial Abundances from Genome Tagged Analysis, enables quantitative, straightforward, cost-effective microbiome profiling that combines desirable features of both 16S rRNA and mWGS strategies. To test the utility of the MA-GenTA assay, probes were designed for 830 genome sequences representing bacteria present in mouse stool specimens. Comparison of the MA-GenTA data with mWGS data demonstrated excellent correlation down to 0.01% relative abundance and a similar number of organisms detected per sample. Despite the incompleteness of the reference database, NMDS clustering based on the Bray-Curtis dissimilarity metric of sample groups was consistent between MA-GenTA, mWGS and 16S rRNA datasets. MA-GenTA represents a potentially useful new method for microbiome community profiling based on reference genomes.

mSphere ◽  
2021 ◽  
Vol 6 (2) ◽  
Author(s):  
Jacquelynn Benjamino ◽  
Benjamin Leopold ◽  
Daniel Phillips ◽  
Mark D. Adams

ABSTRACT Current sequencing-based methods for profiling microbial communities rely on marker gene (e.g., 16S rRNA) or metagenome shotgun sequencing (mWGS) analysis. We present an approach based on a single-primer extension reaction using a highly multiplexed oligonucleotide probe pool. This approach, termed MA-GenTA (microbial abundances from genome tagged analysis), enables quantitative, straightforward, cost-effective microbiome profiling that combines desirable features of both 16S rRNA and mWGS strategies. The use of multiple probes per target genome and rigorous probe design criteria enabled robust determination of relative abundance. To test the utility of the MA-GenTA assay, probes were designed for 830 genome sequences representing bacteria present in mouse stool specimens. Comparison of the MA-GenTA data with mWGS data demonstrated excellent correlation down to 0.01% relative abundance and a similar number of organisms detected per sample. Despite the incompleteness of the reference database, nonmetric multidimensional scaling (NMDS) clustering based on the Bray-Curtis dissimilarity metric of sample groups was consistent between MA-GenTA, mWGS, and 16S rRNA data sets. MA-GenTA represents a potentially useful new method for microbiome community profiling based on reference genomes.

IMPORTANCE New methods for profiling the microbial communities can create new approaches to understanding the composition and function of those communities. In this study, we combined bacterial genome-specific probe design with a highly multiplexed single primer extension reaction as a new method to profile microbial communities, using stool from various mouse strains as a test case. This method, termed MA-GenTA, was benchmarked against 16S rRNA gene sequencing and metagenome sequencing methods and delivered similar relative abundance and clustering data. Since the probes were generated from reference genomes, MA-GenTA was also able to provide functional pathway data for the stool microbiome in the assayed samples. The method is more informative than 16S rRNA analysis while being less costly than metagenome shotgun sequencing.
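
The Bray-Curtis dissimilarity that underlies the NMDS clustering mentioned above can be illustrated with a minimal sketch. This is not code from the MA-GenTA study; the function name `bray_curtis` and the example abundance vectors are hypothetical, and the formula used is the standard one, BC = Σ|a_i − b_i| / Σ(a_i + b_i), computed over per-taxon abundances of two samples.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    Both profiles must list the same taxa in the same order.
    Returns 0.0 for identical profiles and 1.0 for profiles
    that share no taxa.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of taxa")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one profile must be non-empty")
    # Sum of absolute per-taxon differences, normalised by total abundance.
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical relative abundances for three taxa in two samples.
sample_1 = [0.5, 0.5, 0.0]
sample_2 = [0.5, 0.0, 0.5]
print(bray_curtis(sample_1, sample_2))
```

A pairwise matrix of such values across all samples is the kind of input an NMDS ordination would take.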


2014 ◽  
Author(s):  
Jai Ram Rideout ◽  
Yan He ◽  
Jose Antonio Navas-Molina ◽  
William A Walters ◽  
Luke K Ursell ◽  
...  

We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to “classic” open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, “classic” open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than one-fifth of the time that would be required by “classic” open-reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by “classic” open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME’s uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. Finally, we present a comparison of parameter settings in QIIME’s OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
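
The idea described in the abstract above can be sketched as a toy example. This is an illustration only, not QIIME's implementation and not uclust: the abstract does not spell out the individual steps, so the four-step structure below (closed-reference pass, de novo clustering of a subsample of failures, second closed-reference pass, final de novo clean-up) is an assumption, as are all function names, the `New.` and `CleanUp.` prefixes, and the position-wise `identity` function, which stands in for a real alignment-based similarity.

```python
import random


def identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions (equal-length reads only)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads, refs, threshold):
    """Assign each read to the first reference it matches; collect failures.

    Each read is handled independently, so this step could run in parallel.
    """
    otus, failures = {}, []
    for read in reads:
        hit = next((name for name, seq in refs.items()
                    if identity(read, seq) >= threshold), None)
        if hit is None:
            failures.append(read)
        else:
            otus.setdefault(hit, []).append(read)
    return otus, failures


def de_novo(reads, threshold, prefix):
    """Greedy centroid clustering; inherently serial."""
    centroids, otus = {}, {}
    for read in reads:
        hit = next((name for name, seq in centroids.items()
                    if identity(read, seq) >= threshold), None)
        if hit is None:
            hit = f"{prefix}{len(centroids)}"
            centroids[hit] = read  # the read seeds a new cluster
        otus.setdefault(hit, []).append(read)
    return centroids, otus


def subsampled_open_reference(reads, refs, threshold=0.97, fraction=0.1, seed=0):
    # Step 1: closed-reference pass against the reference database.
    otus, failures = closed_reference(reads, refs, threshold)
    # Step 2: de novo cluster only a random subsample of the failures.
    rng = random.Random(seed)
    k = min(len(failures), max(1, int(len(failures) * fraction))) if failures else 0
    new_refs, _ = de_novo(rng.sample(failures, k), threshold, "New.")
    # Step 3: closed-reference pass of all failures against the new centroids.
    new_otus, leftovers = closed_reference(failures, new_refs, threshold)
    otus.update(new_otus)
    # Step 4: de novo cluster whatever is still unassigned.
    _, clean_up = de_novo(leftovers, threshold, "CleanUp.")
    otus.update(clean_up)
    return otus


refs = {"ref1": "AAAAAAAAAA"}
reads = ["AAAAAAAAAA", "CCCCCCCCCC", "CCCCCCCCCC", "GGGGGGGGGG"]
result = subsampled_open_reference(reads, refs, threshold=0.9, fraction=0.5)
print(sorted(result))
```

In this sketch only the small subsample and the final leftovers go through the serial de novo step, which is the property the abstract credits for the runtime gain; every read still ends up in some OTU.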


2014 ◽  
Author(s):  
Jai Ram Rideout ◽  
Yan He ◽  
Jose Antonio Navas-Molina ◽  
William A Walters ◽  
Luke K Ursell ◽  
...  

We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because parts of our algorithm can be run in parallel, it makes open-reference OTU picking tractable on massive amplicon sequence data sets. We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by “legacy” open-reference OTU picking, where less of the process can be parallelized, through comparisons on three well-studied datasets. We therefore recommend that subsampled open-reference OTU picking always be applied in favor of “legacy” open-reference OTU picking. An implementation of this algorithm is provided in the popular QIIME software package. Finally, we present a comparison of parameter settings in QIIME’s OTU picking workflows and make recommendations on settings for these free parameters.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4652 ◽  
Author(s):  
Robert C. Edgar

Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences are diverged from reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently-popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods were found to have V4 genus prediction accuracy of ∼100% at 100% identity but ∼50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pair-wise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal.


2019 ◽  
Vol 11 (3) ◽  
pp. 228-234 ◽  
Author(s):  
Lawrence Gray ◽  
Kyoko Hasebe ◽  
Martin O’Hely ◽  
Anne-Louise Ponsonby ◽  
Peter Vuillermin ◽  
...  

Abstract Gut bacteria from the genus Prevotella are found in high abundance in faeces of non-industrialised communities but low abundance in industrialised, Westernised communities. Prevotella copri is one of the principal Prevotella species within the human gut. As it has been associated with developmental health and disease states, we sought to (i) develop a real-time polymerase chain reaction (PCR) to rapidly determine P. copri abundance and (ii) investigate its abundance in a large group of Australian pregnant mothers. The Barwon Infant Study is a pre-birth cohort study (n = 1074). Faecal samples were collected from mothers at 36 weeks gestation. Primers with a probe specific to the V3 region of P. copri 16S rRNA gene were designed and optimised for real-time PCR. Universal 16S rRNA gene primers amplified pan-bacterial DNA in parallel. Relative abundance of P. copri was calculated using a 2^(−ΔCt) method. Relative abundance of P. copri by PCR was observed in 165/605 (27.3%) women. The distribution was distinctly bimodal, defining women with substantial (n = 115/165, 69.7%) versus very low P. copri expression (n = 50/165, 30.3%). In addition, abundance of P. copri by PCR correlated with 16S rRNA gene MiSeq sequencing data (r² = 0.67, P < 0.0001, n = 61). We have developed a rapid and cost-effective technique for identifying the relative abundance of P. copri using real-time PCR. The expression of P. copri was evident in only a quarter of the mothers, and either at substantial or very low levels. PCR detection of P. copri may facilitate assessment of this species in large, longitudinal studies across multiple populations and in various clinical settings.
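
The ΔCt-based relative abundance calculation named in the abstract above can be illustrated with a short sketch. It is not taken from the study: the function name and the Ct values are hypothetical, ΔCt is assumed to be the target (P. copri) Ct minus the universal 16S rRNA Ct, and the base of 2 assumes that both reactions double the product every cycle (100% amplification efficiency).

```python
def relative_abundance(ct_target: float, ct_universal: float) -> float:
    """Relative abundance as 2^(-ΔCt), with ΔCt = Ct(target) - Ct(universal).

    Assumes perfect doubling per cycle in both reactions.
    """
    delta_ct = ct_target - ct_universal
    return 2.0 ** (-delta_ct)


# Hypothetical example: the target crosses threshold 5 cycles after the
# universal 16S assay, i.e. it is 2^5 = 32-fold less abundant.
print(relative_abundance(ct_target=25.0, ct_universal=20.0))  # 0.03125
```

A later Ct for the target means fewer starting copies, so each extra cycle of delay halves the estimated relative abundance.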


2019 ◽  
Author(s):  
Gavin M. Douglas ◽  
Vincent J. Maffei ◽  
Jesse Zaneveld ◽  
Svetlana N. Yurgel ◽  
James R. Brown ◽  
...  

One major limitation of microbial community marker gene sequencing is that it does not provide direct information on the functional composition of sampled communities. Here, we present PICRUSt2 (https://github.com/picrust/picrust2), which expands the capabilities of the original PICRUSt method (ref. 1) to predict the functional potential of a community based on marker gene sequencing profiles. This updated method and implementation includes several improvements over the previous algorithm: an expanded database of gene families and reference genomes, a new approach now compatible with any OTU-picking or denoising algorithm, and novel phenotype predictions. Upon evaluation, PICRUSt2 was more accurate than PICRUSt1 and other current approaches overall. PICRUSt2 is also now more flexible and allows the addition of custom reference databases. We highlight these improvements and also important caveats regarding the use of predicted metagenomes, which are related to the inherent challenges of analyzing metagenome data in general.


2019 ◽  
Vol 2019 (4) ◽  
pp. 7-22
Author(s):  
Georges Bridel ◽  
Zdobyslaw Goraj ◽  
Lukasz Kiszkowiak ◽  
Jean-Georges Brévot ◽  
Jean-Pierre Devaux ◽  
...  

Abstract Advanced jet training still relies on old concepts and solutions that are no longer efficient when considering the current and forthcoming changes in air combat. The cost of using those old solutions to develop and maintain combat pilot skills is substantial, adding even more constraints to the training limitations. The requirement of having a trainer aircraft that is also able to perform light combat aircraft operational missions adds unnecessary complexity and cost without any real operational advantages to air combat mission training. Thanks to emerging technologies, the JANUS project will study the feasibility of a brand-new concept of agile manoeuvrable training aircraft and an integrated training system, able to provide a live, virtual and constructive environment. The JANUS concept is based on a lightweight, low-cost, high-energy aircraft associated with a ground-based Integrated Training System providing simulated and emulated signals, simulated and real opponents, combined with real-time feedback on the pilot’s physiological characteristics: traditionally embedded sensors are replaced with emulated signals, and simulated opponents are proposed to the pilot, enabling out-of-sight engagement. JANUS also provides new cost-effective and more realistic solutions for “Red air aircraft” missions, organised in so-called “Aggressor Squadrons”.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Eric J. Raes ◽  
Kristen Karsh ◽  
Swan L. S. Sow ◽  
Martin Ostrowski ◽  
Mark V. Brown ◽  
...  

Abstract Global oceanographic monitoring initiatives originally measured abiotic essential ocean variables but are currently incorporating biological and metagenomic sampling programs. There is, however, a large knowledge gap on how to infer bacterial functions, the information sought by biogeochemists, ecologists, and modelers, from the bacterial taxonomic information (produced by bacterial marker gene surveys). Here, we provide a correlative understanding of how a bacterial marker gene (16S rRNA) can be used to infer latitudinal trends for metabolic pathways in global monitoring campaigns. From a transect spanning 7000 km in the South Pacific Ocean we infer ten metabolic pathways from 16S rRNA gene sequences and 11 corresponding metagenome samples, which relate to metabolic processes of primary productivity, temperature-regulated thermodynamic effects, coping strategies for nutrient limitation, energy metabolism, and organic matter degradation. This study demonstrates that low-cost, high-throughput bacterial marker gene data can be used to infer shifts in the metabolic strategies at the community scale.

