Consistent and correctable bias in metagenomic sequencing experiments

2019 ◽  
Author(s):  
Michael R. McLaren ◽  
Amy D. Willis ◽  
Benjamin J. Callahan

Abstract Measurements of biological communities by marker-gene and metagenomic sequencing are biased: The measured relative abundances of taxa or their genes are systematically distorted from their true values because each step in the experimental workflow preferentially detects some taxa over others. Bias can lead to qualitatively incorrect conclusions and makes measurements from different protocols quantitatively incomparable. A rigorous understanding of bias is therefore essential. Here we propose, test, and apply a simple mathematical model of how bias distorts marker-gene and metagenomics measurements: Bias multiplies the true relative abundances within each sample by taxon- and protocol-specific factors that describe the different efficiencies with which taxa are detected by the workflow. Critically, these factors are consistent across samples with different compositions, allowing bias to be estimated and corrected. We validate this model in 16S rRNA gene and shotgun metagenomics data from bacterial communities with defined compositions. We use it to reason about the effects of bias on downstream statistical analyses, finding that analyses based on taxon ratios are less sensitive to bias than analyses based on taxon proportions. Finally, we demonstrate how this model can be used to quantify bias from samples of defined composition, to partition bias into steps such as DNA extraction and PCR amplification, and to correct biased measurements. Our model improves on previous models by providing a better fit to experimental data and by providing a composition-independent approach to analyzing, measuring, and correcting bias.
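The multiplicative model described in this abstract can be illustrated with a minimal Python sketch. The efficiency values and sample compositions below are hypothetical, chosen only for illustration, and the function names are not taken from the authors' software. The sketch shows that the fold error in a taxon ratio is the same in every sample, while the fold error in a taxon proportion depends on the rest of the community.

```python
def close(values):
    """Normalize non-negative values so they sum to 1 (relative abundances)."""
    total = sum(values)
    return [v / total for v in values]


def apply_bias(true_props, efficiencies):
    """Multiply true proportions by per-taxon efficiencies, then renormalize."""
    return close([p * e for p, e in zip(true_props, efficiencies)])


def correct_bias(observed_props, efficiencies):
    """Invert the model: divide by the efficiencies, then renormalize."""
    return close([o / e for o, e in zip(observed_props, efficiencies)])


# Hypothetical detection efficiencies for three taxa (illustrative only).
efficiencies = [1.0, 4.0, 0.5]

# Two hypothetical samples with different true compositions.
sample_a = [0.2, 0.3, 0.5]
sample_b = [0.7, 0.2, 0.1]

observed_a = apply_bias(sample_a, efficiencies)
observed_b = apply_bias(sample_b, efficiencies)

# Fold error in the ratio (taxon 2 : taxon 1) equals 4.0 / 1.0 in both samples.
ratio_error_a = (observed_a[1] / observed_a[0]) / (sample_a[1] / sample_a[0])
ratio_error_b = (observed_b[1] / observed_b[0]) / (sample_b[1] / sample_b[0])

# Fold error in the proportion of taxon 2 differs between the samples.
prop_error_a = observed_a[1] / sample_a[1]
prop_error_b = observed_b[1] / sample_b[1]

# With known efficiencies, the true composition is recovered exactly.
corrected_a = correct_bias(observed_a, efficiencies)

print(ratio_error_a, ratio_error_b)
print(prop_error_a, prop_error_b)
print(corrected_a)
```

Because only relative abundances are observed, the efficiencies matter only up to a common scale factor: multiplying all three by the same constant leaves `observed_a` unchanged.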

eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Michael R McLaren ◽  
Amy D Willis ◽  
Benjamin J Callahan

Marker-gene and metagenomic sequencing have profoundly expanded our ability to measure biological communities. But the measurements they provide differ from the truth, often dramatically, because these experiments are biased toward detecting some taxa over others. This experimental bias makes the taxon or gene abundances measured by different protocols quantitatively incomparable and can lead to spurious biological conclusions. We propose a mathematical model for how bias distorts community measurements based on the properties of real experiments. We validate this model with 16S rRNA gene and shotgun metagenomics data from defined bacterial communities. Our model better fits the experimental data despite being simpler than previous models. We illustrate how our model can be used to evaluate protocols, to understand the effect of bias on downstream statistical analyses, and to measure and correct bias given suitable calibration controls. These results illuminate new avenues toward truly quantitative and reproducible metagenomics measurements.
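As a rough illustration of measuring bias from calibration controls, the following Python sketch estimates per-taxon efficiencies relative to a reference taxon by averaging log fold errors across controls of known composition. It is a simplified estimator written for this example, not necessarily the authors' procedure. It assumes every taxon has a nonzero actual and observed abundance in every control, and all numbers are hypothetical.

```python
import math


def estimate_relative_efficiencies(controls, reference=0):
    """Estimate efficiencies relative to a reference taxon.

    controls: list of (actual_props, observed_props) pairs, one per control
    sample. All abundances must be strictly positive (an assumption of this
    sketch).
    """
    n_taxa = len(controls[0][0])
    log_sums = [0.0] * n_taxa
    for actual, observed in controls:
        # Log fold error of the reference taxon in this control.
        ref_log_error = math.log(observed[reference] / actual[reference])
        for i in range(n_taxa):
            # Log fold error of taxon i relative to the reference taxon.
            log_sums[i] += math.log(observed[i] / actual[i]) - ref_log_error
    # Geometric mean across controls.
    return [math.exp(s / len(controls)) for s in log_sums]


def simulate_measurement(actual, efficiencies):
    """Apply multiplicative bias and renormalize (used to build toy controls)."""
    weighted = [a * e for a, e in zip(actual, efficiencies)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical "true" efficiencies used only to simulate the controls.
hidden_efficiencies = [1.0, 4.0, 0.5]
actual_compositions = [[1 / 3, 1 / 3, 1 / 3], [0.6, 0.1, 0.3]]
controls = [
    (actual, simulate_measurement(actual, hidden_efficiencies))
    for actual in actual_compositions
]

estimated = estimate_relative_efficiencies(controls, reference=0)
print(estimated)
```

In this noise-free toy case the estimate matches the hidden efficiencies, with the reference taxon fixed at 1. Real controls would add measurement noise, so the estimate would only approximate the underlying bias.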


mSystems ◽  
2020 ◽  
Vol 5 (4) ◽  
Author(s):  
Ganesh Babu Malli Mohan ◽  
Ceth W. Parker ◽  
Camilla Urbaniak ◽  
Nitin K. Singh ◽  
Anthony Hood ◽  
...  

ABSTRACT Microbial contamination during long-term confinements of space exploration presents potential risks for both crew members and spacecraft life support systems. A novel swab kit was used to sample various surfaces from a submerged, closed, analog habitat to characterize the microbial populations. Samples were collected from various locations across the habitat which were constructed from various surface materials (linoleum, dry wall, particle board, glass, and metal), and microbial populations were examined by culture, quantitative PCR (qPCR), microbiome 16S rRNA gene sequencing, and shotgun metagenomics. Propidium monoazide (PMA)-treated samples identified the viable/intact microbial population of the habitat. The cultivable microbial population ranged from below the detection limit to 10^6 CFU/sample, and their identity was characterized using Sanger sequencing. Both 16S rRNA amplicon and shotgun sequencing were used to characterize the microbial dynamics, community profiles, and functional attributes (metabolism, virulence, and antimicrobial resistance). The 16S rRNA amplicon sequencing revealed an abundance of viable (after PMA treatment) Actinobacteria (Brevibacterium, Nesterenkonia, Mycobacterium, Pseudonocardia, and Corynebacterium), Firmicutes (Virgibacillus, Staphylococcus, and Oceanobacillus), and Proteobacteria (especially Acinetobacter) on linoleum, dry wall, and particle board (LDP) surfaces, while members of Firmicutes (Leuconostocaceae) and Proteobacteria (Enterobacteriaceae) were high on the glass/metal surfaces. Nonmetric multidimensional scaling determined from both 16S rRNA and metagenomic analyses revealed differential microbial species on LDP surfaces and glass/metal surfaces.
The shotgun metagenomic sequencing of samples after PMA treatment showed bacterial predominance of viable Brevibacterium (53.6%), Brachybacterium (7.8%), Pseudonocardia (9.9%), Mycobacterium (3.7%), and Staphylococcus (2.1%), while fungal analyses revealed Aspergillus and Penicillium dominance. IMPORTANCE This study provides the first assessment of monitoring cultivable and viable microorganisms on surfaces within a submerged, closed, analog habitat. The results of the analyses presented herein suggest that the surface material plays a role in microbial community structure, as the microbial populations differed between LDP and metal/glass surfaces. The metal/glass surfaces had a less-complex community and lower bioburden, and more closely resembled the controls. These results indicated that material choice is crucial when building closed habitats, even if they are simply analogs. Finally, while a few species were associated with previously cultivated isolates from the International Space Station and MIR spacecraft, the majority of the microbial ecology of the submerged analog habitat differs greatly from that of previously studied analog habitats.


Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 331
Author(s):  
Nachon Raethong ◽  
Massalin Nakphaichit ◽  
Narissara Suratannon ◽  
Witida Sathitkowitchai ◽  
Wanlapa Weerapakorn ◽  
...  

The gut microbiome plays a major role in the maintenance of human health. Characterizing the taxonomy and metabolic functions of the human gut microbiome is necessary for enhancing health. Here, we analyzed the metagenomic sequencing, assembly and construction of a meta-gene catalogue of the human gut microbiome with the overall aim of investigating the taxonomy and metabolic functions of the gut microbiome in Thai adults. As a result, the integrative analysis of 16S rRNA gene and whole metagenome shotgun (WMGS) sequencing data revealed that the dominant gut bacterial families were Lachnospiraceae and Ruminococcaceae of the Firmicutes phylum. Consistently, across 3.8 million (M) genes annotated from 163.5 gigabases (Gb) of WMGS sequencing data, a significant number of genes associated with carbohydrate metabolism of the dominant bacterial families were identified. Further identification of bacterial community-wide metabolic functions promisingly highlighted the importance of Roseburia and Faecalibacterium involvement in central carbon metabolism, sugar utilization and metabolism towards butyrate biosynthesis. This work presents an initial study of shotgun metagenomics in a Thai population-based cohort in a developing Southeast Asian country.


2020 ◽  
Author(s):  
Megan Sarah Beaudry ◽  
Jincheng Wang ◽  
Troy Kieran ◽  
Jesse Thomas ◽  
Natalia Juliana Bayona-Vasquez ◽  
...  

Environmental microbial diversity is often investigated from a molecular perspective using 16S ribosomal RNA (rRNA) gene amplicons and shotgun metagenomics. While amplicon methods are fast, low-cost, and have curated reference databases, they can suffer from amplification bias and are limited in genomic scope. In contrast, shotgun metagenomic methods sample more genomic regions with fewer sequence acquisition biases. However, shotgun metagenomic sequencing is much more expensive (even with moderate sequencing depth) and computationally challenging. Here, we develop a set of 16S rRNA sequence capture baits that offer a potential middle ground with the advantages from both approaches for investigating microbial communities. These baits cover the diversity of all 16S rRNA sequences available in the Greengenes (v. 13.5) database, with no sequence having < 80% sequence similarity to at least one bait for all segments of 16S. The use of our baits provides comparable results to 16S amplicon libraries and shotgun metagenomic libraries when assigning taxonomic units from 16S sequences within the metagenomic reads. We demonstrate that 16S rRNA capture baits can be used on a range of microbial samples (i.e., mock communities and rodent fecal samples) to increase the proportion of 16S rRNA sequences (average >400-fold) and decrease analysis time to obtain consistent community assessments. Furthermore, our study reveals that bioinformatic methods used to analyze sequencing data may have a greater influence on estimates of community composition than the library preparation method used, likely due in part to the extent and curation of the reference databases considered.


2020 ◽  
Author(s):  
Po-E Li ◽  
Joseph A. Russell ◽  
David Yarmosh ◽  
Alan G. Shteyman ◽  
Kyle Parker ◽  
...  

ABSTRACT Metagenomics is emerging as an important tool in biosurveillance, public health, and clinical applications. However, ease-of-use for execution and data analysis remains a barrier-of-entry to the adoption of metagenomics in applied health and forensics settings. In addition, these venues often have more stringent requirements for reporting, accuracy, and precision than the traditional ecological research role of the technology. Here, we present PanGIA (Pan-Genomics for Infectious Agents), a novel bioinformatics analysis platform for hosting, processing, analyzing, and reporting shotgun metagenomics data of complex samples suspected of containing one or more pathogens. PanGIA was developed to address gaps that often preclude clinicians, medical technicians, forensics personnel, or other non-expert end-users from the routine application of metagenomics for pathogen identification. Though primarily designed to detect pathogenic microorganisms within clinical and environmental metagenomics data, PanGIA also serves as an analytical framework for microbial community profiling and comparative metagenomics. To provide statistical confidence in PanGIA’s taxonomic assignments, the system provides two independent estimations of probability for species and strain level detection. First, PanGIA integrates coverage data with ‘uniqueness’ information mapped across each reference genome for a stand-alone determination of confidence for each query sequence at each taxonomy level. Second, if a negative-control sample is provided, PanGIA compares this sample with a corresponding experimental unknown sample and determines a measure of confidence associated with ‘detection above background’.
An integrated graphical user interface allows interactive interrogation and enables users to summarize multiple sample results by confidence score, normalized read abundance, reference genome linear coverage, depth-of-coverage, RPKM, and other metrics to detect specific organisms-of-interest. Comparison testing of the PanGIA algorithm against a number of recent k-mer, read-mapping, and marker-gene based taxonomy classifiers across various real-world datasets with spiked targets shows superior mean positive predictive value, sensitivity, and specificity. PanGIA can process a five million paired-end read dataset in under 1 hour on commodity computational hardware. The source code and documentation are publicly available at https://github.com/LANL-Bioinformatics/PanGIA or https://github.com/mriglobal/PanGIA. The database for PanGIA can be downloaded from ftp://bioinformatics.mriglobal.org/. The full GUI-based PanGIA analysis environment is available in a Docker container and can be installed from https://hub.docker.com/r/poeli/pangia/.
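Several of the summary metrics named in this abstract have common textbook definitions, which the short Python sketch below illustrates. These are the generic formulas for RPKM, linear coverage, and mean depth of coverage, and PanGIA's own definitions may differ in detail. The function names and the toy alignment intervals are assumptions made for this example.

```python
def rpkm(mapped_reads, reference_length_bp, total_reads):
    """Reads per kilobase of reference per million sequenced reads."""
    return mapped_reads * 1e9 / (reference_length_bp * total_reads)


def coverage_stats(alignments, reference_length_bp):
    """Return (linear coverage, mean depth) for one reference genome.

    alignments: list of (start, end) half-open, 0-based intervals giving
    where each read aligns on the reference.
    """
    depth = [0] * reference_length_bp
    for start, end in alignments:
        # Clip each interval to the reference before counting.
        for pos in range(max(start, 0), min(end, reference_length_bp)):
            depth[pos] += 1
    covered_positions = sum(1 for d in depth if d > 0)
    linear_coverage = covered_positions / reference_length_bp
    mean_depth = sum(depth) / reference_length_bp
    return linear_coverage, mean_depth


# Toy example: three 100 bp reads on a 1,000 bp reference,
# drawn from a hypothetical run of five million reads.
toy_alignments = [(0, 100), (50, 150), (900, 1000)]
lin, depth = coverage_stats(toy_alignments, 1000)
toy_rpkm = rpkm(len(toy_alignments), 1000, 5_000_000)

print(lin, depth, toy_rpkm)
```

Linear coverage and mean depth respond differently to the same reads: overlapping reads raise depth without raising linear coverage, which is one reason the two are reported separately.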


2020 ◽  
Vol 48 (16) ◽  
pp. e93-e93
Author(s):  
Anna Tovo ◽  
Peter Menzel ◽  
Anders Krogh ◽  
Marco Cosentino Lagomarsino ◽  
Samir Suweis

Abstract Characterizing species diversity and composition of bacteria hosted by biota is revolutionizing our understanding of the role of symbiotic interactions in ecosystems. Determining microbiome diversity implies the assignment of individual reads to taxa by comparison to reference databases. Although computational methods aimed at identifying the microbial taxa are available, it is well known that inferences using different methods can vary widely depending on various biases. In this study, we first apply and compare different bioinformatics methods based on 16S ribosomal RNA gene and shotgun sequencing to three mock communities of bacteria, of which the compositions are known. We show that none of these methods can infer both the true number of taxa and their abundances. We thus propose a novel approach, named Core-Kaiju, which combines the power of shotgun metagenomics data with a more focused marker gene classification method similar to 16S, but based on emergent statistics of core protein domain families. We then test the proposed method on various mock communities and we show that Core-Kaiju reliably predicts both number of taxa and abundances. Finally, we apply our method on human gut samples, showing how Core-Kaiju may give more accurate ecological characterization and a fresh view on real microbiomes.


2014 ◽  
Vol 80 (16) ◽  
pp. 5116-5123 ◽  
Author(s):  
Luisa W. Hugerth ◽  
Hugo A. Wefer ◽  
Sverker Lundin ◽  
Hedvig E. Jakobsson ◽  
Mathilda Lindberg ◽  
...  

ABSTRACT The taxonomic composition of a microbial community can be deduced by analyzing its rRNA gene content by, e.g., high-throughput DNA sequencing or DNA chips. Such methods typically are based on PCR amplification of rRNA gene sequences using broad-taxonomic-range PCR primers. In these analyses, the use of optimal primers is crucial for achieving an unbiased representation of community composition. Here, we present the computer program DegePrime that, for each position of a multiple sequence alignment, finds a degenerate oligomer of as high coverage as possible and outputs its coverage among taxonomic divisions. We show that our novel heuristic, which we call weighted randomized combination, performs better than previously described algorithms for solving the maximum coverage degenerate primer design problem. We previously used DegePrime to design a broad-taxonomic-range primer pair that targets the bacterial V3-V4 region (341F-805R) (D. P. Herlemann, M. Labrenz, K. Jurgens, S. Bertilsson, J. J. Waniek, and A. F. Andersson, ISME J. 5:1571–1579, 2011, http://dx.doi.org/10.1038/ismej.2011.41), and here we use the program to significantly increase the coverage of a primer pair (515F-806R) widely used for Illumina-based surveys of bacterial and archaeal diversity. By comparison with shotgun metagenomics, we show that the primers give an accurate representation of microbial diversity in natural samples.
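The notions of degeneracy and coverage used in this abstract can be made concrete with a small Python sketch. It only scores a given degenerate oligomer against a set of aligned target sites using the standard IUPAC nucleotide codes. It does not implement DegePrime's weighted randomized combination heuristic, and the function names and toy sequences are assumptions for illustration.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct non-degenerate oligomers the primer expands to."""
    count = 1
    for code in primer:
        count *= len(IUPAC[code])
    return count


def matches(primer, site):
    """True if every base of the site is allowed by the primer at that position."""
    if len(primer) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, site))


def coverage(primer, sites):
    """Fraction of target sites matched perfectly by the degenerate primer."""
    hits = sum(1 for site in sites if matches(primer, site))
    return hits / len(sites)


# Toy target sites taken from one window of a hypothetical alignment.
toy_sites = ["ACAT", "ACGT", "ACCT", "ACGT"]

print(degeneracy("ACRT"), coverage("ACRT", toy_sites))
print(degeneracy("ACVT"), coverage("ACVT", toy_sites))
```

The example shows the trade-off the design problem has to balance: raising the degeneracy from 2 (`ACRT`) to 3 (`ACVT`) lifts coverage of the toy sites from 0.75 to 1.0.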


2020 ◽  
Author(s):  
Anna Tovo ◽  
Peter Menzel ◽  
Anders Krogh ◽  
Marco Cosentino Lagomarsino ◽  
Samir Suweis

ABSTRACT Characterizing species diversity and composition of bacteria hosted by biota is revolutionizing our understanding of the role of symbiotic interactions in ecosystems. However, determining microbiome diversity implies the classification of taxa composition within the sampled community, which is often done via the assignment of individual reads to taxa by comparison to reference databases. Although computational methods aimed at identifying the microbial taxa are available, it is well known that inferences using different methods can vary widely depending on various biases. In this study, we first apply and compare different bioinformatics methods based on 16S ribosomal RNA gene and whole genome shotgun sequencing for taxonomic classification to three small mock communities of bacteria, of which the compositions are known. We show that none of these methods can infer both the true number of taxa and their abundances. We thus propose a novel approach, named Core-Kaiju, which combines the power of shotgun metagenomics data with a more focused marker gene classification method similar to 16S, but based on emergent statistics of core protein domain families. We then test the proposed method on the three small mock communities and also on medium- and highly complex mock community datasets taken from the Critical Assessment of Metagenome Interpretation challenge. We show that Core-Kaiju reliably predicts both number of taxa and abundance of the analysed mock bacterial communities. Finally, we apply our method on human gut samples, showing how Core-Kaiju may give more accurate ecological characterization and a fresh view on real microbiomes.


2021 ◽  
Author(s):  
Qiyun Zhu ◽  
Shi Huang ◽  
Antonio Gonzalez ◽  
Imran McGrath ◽  
Daniel McDonald ◽  
...  

We introduce Operational Genomic Unit (OGU), a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in one synthetic and two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome datasets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project dataset, and more accurate prediction of human age by the gut microbiomes in the Finnish population. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate OGU adoption in future metagenomics studies.


2017 ◽  
Author(s):  
Nicole M. Davis ◽  
Diana M. Proctor ◽  
Susan P. Holmes ◽  
David A. Relman ◽  
Benjamin J. Callahan

Abstract Background: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants: DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package that implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples, and are often found in negative controls. Results: decontam classified amplicon sequence variants (ASVs) in a human oral dataset consistently with prior microscopic observations of the microbial taxa inhabiting that environment and previous reports of contaminant taxa. In metagenomics and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome, and that some low-frequency taxa seemingly associated with preterm birth were contaminants. Conclusions: decontam improves the quality of metagenomic and marker-gene sequencing by identifying and removing contaminant DNA sequences. decontam integrates easily with existing MGS workflows, and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost.
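The first pattern mentioned in this abstract, that contaminants appear at higher frequencies in low-concentration samples, can be sketched in Python as a comparison of two simple models on a log scale. This is an illustrative simplification and not the decontam implementation (which is an R package). The two fixed-slope models, the decision rule, the function names, and the toy numbers are all assumptions made for this example, and every frequency is assumed to be nonzero.

```python
import math


def residual_ss(log_freq, log_conc, slope):
    """Residual sum of squares for log_freq = intercept + slope * log_conc.

    With the slope fixed, the least-squares intercept is the mean offset.
    """
    offsets = [f - slope * c for f, c in zip(log_freq, log_conc)]
    intercept = sum(offsets) / len(offsets)
    return sum((o - intercept) ** 2 for o in offsets)


def looks_like_contaminant(frequencies, concentrations):
    """Compare a contaminant-like model with a sample-derived-like model.

    Assumed contaminant model: frequency is inversely proportional to total
    DNA concentration (slope -1 on the log-log scale).
    Assumed sample-derived model: frequency is independent of concentration
    (slope 0).
    """
    log_freq = [math.log(f) for f in frequencies]
    log_conc = [math.log(c) for c in concentrations]
    rss_contaminant = residual_ss(log_freq, log_conc, slope=-1.0)
    rss_sample = residual_ss(log_freq, log_conc, slope=0.0)
    return rss_contaminant < rss_sample


# Hypothetical DNA concentrations for four samples (arbitrary units).
concentrations = [1.0, 2.0, 4.0, 8.0]

# A feature whose frequency halves each time the concentration doubles.
reagent_like = [0.08, 0.04, 0.02, 0.01]
# A feature whose frequency is roughly constant across samples.
resident_like = [0.10, 0.11, 0.09, 0.10]

print(looks_like_contaminant(reagent_like, concentrations))
print(looks_like_contaminant(resident_like, concentrations))
```

A practical classifier would turn the comparison into a score with a tunable threshold rather than a hard yes/no, and would also use the second pattern from the abstract (presence in negative controls) when such controls are available.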

