Meta-Apo improves accuracy of 16S-amplicon-based prediction of microbiome function

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gongchao Jing ◽  
Yufeng Zhang ◽  
Wenzhi Cui ◽  
Lu Liu ◽  
Jian Xu ◽  
...  

Abstract. Background: Due to their much lower experimental and computational costs compared with metagenomic whole-genome sequencing (WGS), 16S rRNA gene amplicons have been widely used for predicting the functional profiles of microbiomes, via software tools such as PICRUSt 2. However, owing to potential PCR bias and gene profile variation among phylogenetically related genomes, functional profiles predicted from 16S amplicons may deviate from WGS-derived ones, leading to misleading results. Results: Here we present Meta-Apo, which greatly reduces or even eliminates such deviation and thus yields much more consistent diversity patterns between the two approaches. Tests of Meta-Apo on > 5000 16S rRNA amplicon human microbiome samples from 4 body sites showed that the deviation between the two strategies is significantly reduced using only 15 WGS-amplicon training sample pairs. Moreover, Meta-Apo enables cross-platform functional comparison between WGS and amplicon samples, thus greatly improving 16S-based microbiome diagnosis; e.g., the accuracy of gingivitis diagnosis via 16S-derived functional profiles was elevated from 65 to 95% by WGS-based classification. Therefore, at the low cost of 16S-amplicon sequencing, Meta-Apo can produce a reliable, high-resolution view of microbiome function equivalent to that offered by shotgun WGS. Conclusions: These results suggest that large-scale, function-oriented microbiome sequencing projects can probably benefit from the lower cost of the 16S-amplicon strategy without sacrificing the precision in functional reconstruction that otherwise requires WGS. An optimized C++ implementation of Meta-Apo is available on GitHub (https://github.com/qibebt-bioinfo/meta-apo) under a GNU GPL license. It takes the functional profiles of a few paired WGS:16S-amplicon samples as training data and outputs calibrated functional profiles for the much larger number of 16S-amplicon samples.
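
The idea of training on a few paired samples and then calibrating many amplicon-only samples can be illustrated with a minimal Python sketch. This is not Meta-Apo's actual algorithm, which the abstract does not describe: the per-function ratio model, the function names `fit_scaling` and `calibrate`, and the toy profiles are all assumptions made for illustration.

```python
import numpy as np

def fit_scaling(wgs_train, amp_train, eps=1e-12):
    """Learn one scaling factor per function from paired profiles.

    Both inputs are arrays of shape (samples, functions) holding relative
    abundances. ASSUMPTION: a simple ratio of mean abundances stands in
    for whatever model the real tool fits.
    """
    wgs_mean = np.asarray(wgs_train, dtype=float).mean(axis=0)
    amp_mean = np.asarray(amp_train, dtype=float).mean(axis=0)
    # Functions with (near-)zero amplicon abundance cannot be rescaled; keep factor 1.
    return np.where(amp_mean > eps, wgs_mean / np.maximum(amp_mean, eps), 1.0)

def calibrate(amp_profiles, factors):
    """Apply the learned factors and renormalize each sample to sum to 1."""
    scaled = np.asarray(amp_profiles, dtype=float) * factors
    totals = scaled.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero samples unchanged
    return scaled / totals

# Toy paired training data (hypothetical values): 2 samples x 3 functions.
wgs_train = np.array([[0.5, 0.3, 0.2], [0.5, 0.3, 0.2]])
amp_train = np.array([[0.25, 0.5, 0.25], [0.25, 0.5, 0.25]])

factors = fit_scaling(wgs_train, amp_train)
calibrated = calibrate(np.array([[0.25, 0.5, 0.25]]), factors)
print(calibrated)
```

In this toy case the amplicon profile is pulled exactly onto the WGS profile because the training pairs are identical; with real data the correction would only be approximate.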

2021 ◽  
Vol 12 ◽  
Author(s):  
Wenyi Xu ◽  
Tianda Chen ◽  
Yuwei Pei ◽  
Hao Guo ◽  
Zhuanyu Li ◽  
...  

Characterization of the bacterial composition and functional repertoires of microbiome samples is the most common application of metagenomics. Although deep whole-metagenome shotgun sequencing (WMS) provides high taxonomic resolution, it is generally cost-prohibitive for large longitudinal investigations. Until now, 16S rRNA gene amplicon sequencing (16S) has been the most widely used approach and is often combined with WMS to achieve cost-efficiency. However, the accuracy of 16S results and their consistency with WMS data have not been fully evaluated, especially for complex microbiomes with defined compositional information. Here, we constructed two complex artificial microbiomes, which comprised more than 60 human gut bacterial species with even or varied abundance. Using real fecal samples and mock communities, we provide solid evidence that 16S results were poorly consistent with WMS data and that their accuracy was unsatisfactory. In contrast, shallow whole-metagenome shotgun sequencing (shallow WMS, S-WMS) at a sequencing depth of 1 Gb provided outputs that closely resembled WMS data at both genus and species levels and yielded much more accurate taxonomic assignments and functional predictions than 16S, thereby representing a better and cost-efficient alternative to 16S for large-scale microbiome studies.


2021 ◽  
Author(s):  
Jonas Greve Lauritsen ◽  
Morten Lindqvist Hansen ◽  
Pernille Kjersgaard Bech ◽  
Lars Jelsbak ◽  
Lone Gram ◽  
...  

Species of the genus Pseudomonas are used for several biotechnological purposes, including plant biocontrol and bioremediation. To exploit the Pseudomonas genus in environmental, agricultural or industrial settings, the organisms must be profiled at species level, as their bioactivity potential differs markedly between species. Standard 16S rRNA gene amplicon profiling does not allow for accurate species differentiation. Thus, the purpose of this study was to develop an amplicon-based high-resolution method targeting a 760 nt region of the rpoD gene, enabling taxonomic differentiation of Pseudomonas species in soil samples. The method was benchmarked on a sixteen-member Pseudomonas species mock community. All 16 species were correctly and semi-quantitatively identified using rpoD gene amplicons, whereas 16S rRNA V3V4 amplicon sequencing correctly identified only one species. We analysed the Pseudomonas profile in thirteen soil samples from northern Zealand, Denmark, collected from grassland (3 samples) and agricultural soil (10 samples). Pseudomonas species represented up to 0.7% of the microbial community, and each sampling site contained a unique Pseudomonas composition. Thirty culturable Pseudomonas strains were isolated from each grassland site and ten from each agricultural site and identified by Sanger sequencing of the rpoD gene. In all cases, the rpoD-amplicon approach identified more species than were found by cultivation, including hard-to-culture non-fluorescent pseudomonads, as well as more than were found by 16S rRNA V3V4 amplicon sequencing. Thus, rpoD profiling can be used for species profiling of Pseudomonas, and large-scale prospecting of bioactive Pseudomonas may be guided by initial screening using this method.


2020 ◽  
Author(s):  
Tung Dang ◽  
Hirohisa Kishino

Abstract. A central focus of microbiome studies is the characterization of differences in the microbiome composition across groups of samples. A major challenge is the high dimensionality of microbiome datasets, which significantly reduces the power of current approaches for identifying true differences and increases the chance of false discoveries. We have developed a new framework to address these issues by combining (i) identifying a few significant features by a massively parallel forward variable selection procedure, (ii) mapping the selected species on a phylogenetic tree, and (iii) predicting functional profiles by functional gene enrichment analysis from metagenomic 16S rRNA data. We demonstrated the performance of the proposed approach by analyzing two published datasets from large-scale case-control studies: (i) 16S rRNA gene amplicon data for Clostridioides difficile infection (CDI) and (ii) shotgun metagenomics data for human colorectal cancer (CRC). The proposed approach improved the accuracy from 81% to 99.01% for CDI and from 75.14% to 90.17% for CRC. We identified a core set of 96 species that were significantly enriched in CDI and a core set of 75 species that were enriched in CRC. Moreover, although the quality of the data differed for the functional profiles predicted from the 16S rRNA dataset and functional metagenome profiling, our approach performed well for both databases and detected main functions that can be used to diagnose and further study the growth stage of diseases.
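
Step (i), forward variable selection, can be sketched generically in Python. This is a plain serial greedy search scored by least-squares residuals on a numeric response, not the authors' massively parallel procedure or their scoring criterion; the function name `forward_select` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedily pick k columns of X that most reduce the residual sum of squares.

    X has shape (samples, features); y has shape (samples,).
    ASSUMPTION: ordinary least squares with an intercept is the scoring model.
    """
    n, p = X.shape
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            # Fit intercept + already-selected features + candidate j.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

# Synthetic example: only features 2 and 4 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 3.0 * X[:, 2] - 2.0 * X[:, 4]

chosen = forward_select(X, y, k=2)
print(sorted(chosen))
```

Each candidate in the inner loop is scored independently of the others, which is the part of such a procedure that lends itself to parallel evaluation.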


Author(s):  
Yoshiyuki Matsuo ◽  
Shinnosuke Komiya ◽  
Yoshiaki Yasumizu ◽  
Yuki Yasuoka ◽  
Katsura Mizushima ◽  
...  

Abstract. Background: Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results: We modified our existing protocol for full-length 16S amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions: Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene, which provided the requisite species-level resolution and accuracy in clinical settings.


2021 ◽  
Vol 12 ◽  
Author(s):  
Suhyun Kim ◽  
Md. Rashedul Islam ◽  
Ilnam Kang ◽  
Jang-Cheon Cho

Although many culture-independent molecular analyses have elucidated a great diversity of freshwater bacterioplankton, the ecophysiological characteristics of several abundant freshwater bacterial groups are largely unknown due to the scarcity of cultured representatives. Therefore, a high-throughput dilution-to-extinction culturing (HTC) approach was implemented herein to enable the culture of these bacterioplankton lineages using water samples collected at various seasons and depths from Lake Soyang, an oligotrophic reservoir located in South Korea. Some predominant freshwater bacteria have been isolated from Lake Soyang via HTC (e.g., the acI lineage); however, large-scale HTC studies encompassing different seasons and water depths have not been documented yet. In this HTC approach, bacterial growth was detected in 14% of 5,376 inoculated wells. Further, phylogenetic analyses of 16S rRNA genes from a total of 605 putatively axenic bacterial cultures indicated that the HTC isolates were largely composed of Actinobacteria, Bacteroidetes, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, and Verrucomicrobia. Importantly, the isolates were distributed across diverse taxa including phylogenetic lineages that are widely known cosmopolitan and representative freshwater bacterial groups such as the acI, acIV, LD28, FukuN57, MNG9, and TRA3–20 lineages. However, some abundant bacterial groups including the LD12 lineage, Chloroflexi, and Acidobacteria could not be domesticated. Among the 71 taxonomic groups in the HTC isolates, representative strains of 47 groups could either form colonies on agar plates or be revived from frozen glycerol stocks. Additionally, season and water depth significantly affected bacterial community structure, as demonstrated by 16S rRNA gene amplicon sequencing analyses. 
Therefore, our study successfully implemented a dilution-to-extinction cultivation strategy to cultivate previously uncultured or underrepresented freshwater bacterial groups, thus expanding the basis for future multi-omic studies.


2017 ◽  
Author(s):  
Eric J. de Muinck ◽  
Pål Trosvik ◽  
Gregor D. Gilfillan ◽  
Arvind Y. M. Sundaram

Abstract. Background: Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost, and benchmarking the techniques so that potential sources of bias can be better characterized. Results: We present a triple-index amplicon sequencing strategy that uses a two-stage PCR protocol. The strategy was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Conclusions: Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Janis R. Bedarf ◽  
Naiara Beraza ◽  
Hassan Khazneh ◽  
Ezgi Özkurt ◽  
David Baker ◽  
...  

Abstract. Background: Recent studies suggested the existence of (poly-)microbial infections in human brains. These have been described either as putative pathogens linked to the neuro-inflammatory changes seen in Parkinson’s disease (PD) and Alzheimer’s disease (AD) or as a “brain microbiome” in the context of healthy patients’ brain samples. Methods: Using 16S rRNA gene sequencing, we tested the hypothesis that there is a bacterial brain microbiome. We evaluated brain samples from healthy human subjects and individuals suffering from PD (olfactory bulb and pre-frontal cortex), as well as murine brains. In line with state-of-the-art recommendations, we included several negative and positive controls in our analysis and estimated total bacterial biomass by 16S rRNA gene qPCR. Results: Amplicon sequencing did detect bacterial signals in both human and murine samples, but estimated bacterial biomass was extremely low in all samples. Stringent reanalyses implied bacterial signals being explained by a combination of exogenous DNA contamination (54.8%) and false positive amplification of host DNA (34.2%, off-target amplicons). Several seemingly brain-enriched microbes in our dataset turned out to be false-positive signals upon closer examination. We identified off-target amplification as a major confounding factor in low-bacterial/high-host-DNA scenarios. These amplified human or mouse DNA sequences were clustered and falsely assigned to bacterial taxa in the majority of tested amplicon sequencing pipelines. Off-target amplicons seemed to be related to the tissue’s sterility and could also be found in independent brain 16S rRNA gene sequences. Conclusions: Taxonomic signals obtained from (extremely) low biomass samples by 16S rRNA gene sequencing must be scrutinized closely to exclude the possibility of off-target amplifications, amplicons that can only appear enriched in biological samples, but are sometimes assigned to bacterial taxa. Sequences must be explicitly matched against any possible background genomes present in large quantities (i.e., the host genome). Using close scrutiny in our approach, we find no evidence supporting the hypothetical presence of either a brain microbiome or a bacterial infection in PD brains.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Eric J. Raes ◽  
Kristen Karsh ◽  
Swan L. S. Sow ◽  
Martin Ostrowski ◽  
Mark V. Brown ◽  
...  

Abstract. Global oceanographic monitoring initiatives originally measured abiotic essential ocean variables but are currently incorporating biological and metagenomic sampling programs. There is, however, a large knowledge gap on how to infer bacterial functions, the information sought by biogeochemists, ecologists, and modelers, from the bacterial taxonomic information produced by bacterial marker gene surveys. Here, we provide a correlative understanding of how a bacterial marker gene (16S rRNA) can be used to infer latitudinal trends for metabolic pathways in global monitoring campaigns. From a transect spanning 7000 km in the South Pacific Ocean, we infer ten metabolic pathways from 16S rRNA gene sequences and 11 corresponding metagenome samples, which relate to metabolic processes of primary productivity, temperature-regulated thermodynamic effects, coping strategies for nutrient limitation, energy metabolism, and organic matter degradation. This study demonstrates that low-cost, high-throughput bacterial marker gene data can be used to infer shifts in metabolic strategies at the community scale.


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Sandra Reitmeier ◽  
Thomas C. A. Hitch ◽  
Nicole Treichel ◽  
Nikolaos Fikas ◽  
Bela Hausmann ◽  
...  

Abstract. 16S rRNA gene amplicon sequencing is a popular approach for studying microbiomes. However, some basic concepts have still not been investigated comprehensively. We studied the occurrence of spurious sequences using defined microbial communities based on data either from the literature or generated in three sequencing facilities and analyzed via both operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) approaches. OTU clustering and singleton removal, a commonly used approach, delivered approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower based on ASV analysis, but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was found as an effective threshold below which the analysis of spurious taxa can be prevented to a large extent in both OTU- and ASV-based analysis approaches. Using this cutoff improved the reproducibility of analysis, i.e., variation in richness estimates was reduced by 38% compared with singleton filtering using six human fecal samples across seven sequencing runs. Beta-diversity analysis of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparison, highlighting the importance of carefully analyzing data before drawing conclusions on microbiome changes. In summary, handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid the generation of misleading findings. We propose the concept of effective richness to facilitate the comparison of alpha-diversity across studies.
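
A relative-abundance cutoff of the kind described above might be applied to a count table roughly as follows. This is a minimal Python sketch, assuming the 0.25% threshold is evaluated per sample; the abstract does not specify this detail, and the function names and toy counts are hypothetical.

```python
import numpy as np

def filter_low_abundance(counts, threshold=0.0025):
    """Zero out taxa whose within-sample relative abundance is below threshold.

    counts has shape (samples, taxa). ASSUMPTION: the cutoff is applied
    separately within each sample rather than across the whole dataset.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid dividing by zero
    rel = counts / safe_totals
    return np.where(rel >= threshold, counts, 0.0)

def richness(counts):
    """Number of taxa with non-zero counts in each sample."""
    return (np.asarray(counts) > 0).sum(axis=1)

# Toy table (hypothetical): sample 1 has two rare taxa at 0.15% and 0.05%.
counts = np.array([[9980, 15, 5],
                   [500, 500, 0]])

filtered = filter_low_abundance(counts)
print(richness(counts), richness(filtered))
```

Unlike singleton removal, which depends on absolute read counts, a relative cutoff scales with sequencing depth, so the same taxon is treated the same way in shallow and deep runs.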


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Yusuke Okazaki ◽  
Shohei Fujinaga ◽  
Michaela M. Salcher ◽  
Cristiana Callieri ◽  
Atsushi Tanaka ◽  
...  

Abstract. Background: Freshwater ecosystems are inhabited by members of cosmopolitan bacterioplankton lineages despite the disconnected nature of these habitats. The lineages are delineated based on > 97% 16S rRNA gene sequence similarity, but their intra-lineage microdiversity and phylogeography, which are key to understanding the eco-evolutionary processes behind their ubiquity, remain unresolved. Here, we applied long-read amplicon sequencing targeting nearly full-length 16S rRNA genes and the adjacent ribosomal internal transcribed spacer sequences to reveal the intra-lineage diversities of pelagic bacterioplankton assemblages in 11 deep freshwater lakes in Japan and Europe. Results: Our single nucleotide-resolved analysis, which was validated using shotgun metagenomic sequencing, uncovered 7–101 amplicon sequence variants for each of the 11 predominant bacterial lineages and demonstrated sympatric, allopatric, and temporal microdiversities that could not be resolved through conventional approaches. Clusters of samples with similar intra-lineage population compositions were identified, which consistently supported genetic isolation between Japan and Europe. At a regional scale (up to hundreds of kilometers), dispersal between lakes was unlikely to be a limiting factor, and environmental factors or genetic drift were potential determinants of population composition. The extent of microdiversification varied among lineages, suggesting that highly diversified lineages (e.g., Iluma-A2 and acI-A1) achieve their ubiquity by containing a consortium of genotypes specific to each habitat, while less diversified lineages (e.g., CL500-11) may be ubiquitous due to a small number of widespread genotypes. The lowest extent of intra-lineage diversification was observed among the dominant hypolimnion-specific lineage (CL500-11), suggesting that their dispersal among lakes is not limited despite the hypolimnion being a more isolated habitat than the epilimnion. Conclusions: Our novel approach complemented the limited resolution of short-read amplicon sequencing and limited sensitivity of the metagenome assembly-based approach, and highlighted the complex ecological processes underlying the ubiquity of freshwater bacterioplankton lineages. To fully exploit the performance of the method, its relatively low read throughput is the major bottleneck to be overcome in the future.

