Ultrafast and accurate 16S microbial community analysis using Kraken 2

Author(s):  
Jennifer Lu ◽  
Steven L Salzberg

Abstract: For decades, 16S ribosomal RNA sequencing has been the primary means for identifying the bacterial species present in a sample with unknown composition. One of the most widely used tools for this purpose today is the QIIME (Quantitative Insights Into Microbial Ecology) package. Recent results have shown that the newest release, QIIME 2, has higher accuracy than QIIME, MAPseq, and mothur when classifying bacterial genera from simulated human gut, ocean, and soil metagenomes, although QIIME 2 also proved to be the most computationally expensive method. Kraken, first released in 2014, has been shown to provide exceptionally fast and accurate classification for shotgun metagenomics sequencing projects. Bracken, released in 2016, then provided users with the ability to accurately estimate species or genus abundances using Kraken classification results. Kraken 2, which matches the accuracy and speed of Kraken 1, now supports 16S rRNA databases, allowing for direct comparisons to QIIME and similar systems. Here we show that, using the same simulated 16S rRNA metagenomic data as previous studies, Kraken 2 and Bracken are up to 300 times faster and also more accurate at 16S profiling than QIIME 2.

2013 ◽  
Vol 80 (1) ◽  
pp. 177-183 ◽  
Author(s):  
Lavane Kim ◽  
Eulyn Pagaling ◽  
Yi Y. Zuo ◽  
Tao Yan

ABSTRACT The impact of substratum surface property change on biofilm community structure was investigated using laboratory biological aerated filter (BAF) reactors and molecular microbial community analysis. Two substratum surfaces that differed in surface properties were created via surface coating and used to develop biofilms in test (modified surface) and control (original surface) BAF reactors. Microbial community analysis by 16S rRNA gene-based PCR-denaturing gradient gel electrophoresis (DGGE) showed that the surface property change consistently resulted in distinct profiles of microbial populations during replicate reactor start-ups. Pyrosequencing of the bar-coded 16S rRNA gene amplicons surveyed more than 90% of the microbial diversity in the microbial communities and identified 72 unique bacterial species within 19 bacterial orders. Among the 19 orders of bacteria detected, Burkholderiales and Rhodocyclales of the Betaproteobacteria class were numerically dominant and accounted for 90.5 to 97.4% of the sequence reads, and their relative abundances in the test and control BAF reactors were different in consistent patterns during the two reactor start-ups. Three of the five dominant bacterial species also showed consistent relative abundance changes between the test and control BAF reactors. The different biofilm microbial communities led to different treatment efficiencies, with consistently higher total organic carbon (TOC) removal in the test reactor than in the control reactor. Further understanding of how surface properties affect biofilm microbial communities and functional performance would enable the rational design of new generations of substrata for the improvement of biofilm-based biological treatment processes.


2010 ◽  
Vol 76 (17) ◽  
pp. 5902-5910 ◽  
Author(s):  
D. S. Jones ◽  
D. J. Tobler ◽  
I. Schaperdoth ◽  
M. Mainiero ◽  
J. L. Macalady

ABSTRACT We performed a microbial community analysis of biofilms inhabiting thermal (35 to 50°C) waters more than 60 m below the ground surface near Acquasanta Terme, Italy. The groundwater hosting the biofilms has 400 to 830 μM sulfide, <10 μM O2, pH of 6.3 to 6.7, and specific conductivity of 8,500 to 10,500 μS/cm. Based on the results of 16S rRNA gene cloning and fluorescent in situ hybridization (FISH), the biofilms have low species richness, and lithoautotrophic (or possibly mixotrophic) Gamma- and Epsilonproteobacteria are the principal biofilm architects. Deltaproteobacteria sequences retrieved from the biofilms have <90% 16S rRNA similarity to their closest relatives in public databases and may represent novel sulfate-reducing bacteria. The Acquasanta biofilms share few species with Frasassi cave biofilms (13°C, 80 km distant) but have a similar community structure, with representatives in the same major clades. The ecological success of Sulfurovumales-group Epsilonproteobacteria in the Acquasanta biofilms is consistent with previous observations of their dominance in sulfidic cave waters with turbulent water flow and high dissolved sulfide/oxygen ratios.


2013 ◽  
Vol 2013 ◽  
pp. 1-5 ◽  
Author(s):  
Isamu Maeda ◽  
Mohammad Shohel Rana Siddiki ◽  
Tsutomu Nozawa-Takeda ◽  
Naoki Tsukahara ◽  
Yuri Tani ◽  
...  

Jungle crows (Corvus macrorhynchos) prefer human habitats because their versatile feeding habits allow them to exploit food associated with human consumption. It is therefore important from a public health viewpoint to characterize their intestinal microbiota. However, no study has characterized this microbiota at the molecular level on the basis of a large and reliable body of data. In this study, 16S rRNA gene-based microbial community analysis coupled with next-generation DNA sequencing techniques was applied to the taxonomic classification of the intestinal microbiome of three jungle crows. Clustering of the reads into 130 operational taxonomic units showed that at least 70% of the analyzed sequences for each crow were highly homologous to Eimeria sp., which belongs to the protozoan phylum Apicomplexa. The microbiotas of the three crows also contained significant percentages of potentially pathogenic bacteria, such as the genera Campylobacter and Brachyspira. Thus, the profiling of a large number of 16S rRNA gene sequences in crow intestinal microbiomes revealed the high-frequency existence or vestige of potentially pathogenic microorganisms.


2006 ◽  
Vol 4 (4) ◽  
pp. 32-37
Author(s):  
Elisaveta V Korostik ◽  
Alexander G Pinaev ◽  
Gulnar A Akhtemova ◽  
Evgeniy E Andronov

New universal 16S rRNA primers were constructed and tested. These primers allow the correct taxonomic position of bacterial isolates to be identified and were shown to be useful in microbial community studies. The primers enable detection of the vast majority of unique 16S rRNA gene sequences. In this study, 160 restriction types were found in a 16S rRNA clone library of 190 clones.


2021 ◽  
Vol 18 (4) ◽  
pp. 733-743
Author(s):  
Doan Thi Nhung ◽  
Bui Van Ngoc

Recent advances in metagenomics and bioinformatics allow robust analysis of the composition and abundance of microbial communities, their functional genes, and their metabolic pathways. A variety of computational and statistical tools exist for analyzing microbiomes; a common problem in using them, however, is the lack of synchronization and compatibility between the input and output data formats of different software. To overcome these challenges, in this study we apply the DADA2 pipeline (written in the R programming language), rather than a set of separate bioinformatics tools, to create our own workflow for microbial community analysis in a continuous and synchronous manner. As a first effort, we investigated the composition and abundance of coral-associated bacteria using their 16S rRNA gene amplicon sequences. The workflow includes the following steps: data processing, sequence clustering, taxonomic assignment, and data visualization. We also wish to draw readers’ attention to bacterial communities living in the ocean, since most marine microorganisms are unculturable, especially those residing in coral reefs; in this case, the bacteria are associated with the coral Acropora tenuis. The outcomes of this study suggest that the DADA2 pipeline is a promising bioinformatics approach to microbiome analysis and an alternative to combining various software packages. In addition, our modifications to the workflow execution help researchers to illustrate metagenomic data more easily and systematically, to elucidate the composition, abundance, diversity, and relationships of microbial communities, and to develop other bioinformatics tools more effectively.


2019 ◽  
Author(s):  
Vanessa R. Marcelino ◽  
Philip T.L.C. Clausen ◽  
Jan P. Buchmann ◽  
Michelle Wille ◽  
Jonathan R. Iredell ◽  
...  

Abstract: High-throughput sequencing of DNA and RNA from environmental and host-associated samples (metagenomics and metatranscriptomics) is a powerful tool to assess which organisms are present in a sample. Taxonomic identification software usually aligns individual short sequence reads to a reference database, sometimes containing taxa with complete genomes only. This is a challenging task given that different species can share identical sequence regions and complete genome sequences are only available for a fraction of organisms. A recently developed approach to map sequence reads to reference databases involves weighing all high-scoring read mappings to the database as a whole to produce better-informed alignments. We used this novel concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. Using simulated fungal and bacterial metagenomes, we demonstrate that CCMetagen substantially outperforms other commonly used metagenome classifiers, attaining a 3- to 1580-fold increase in precision and a 2- to 922-fold increase in F1 scores for species-level classifications when compared to Kraken2, Centrifuge and KrakenUniq. CCMetagen is sufficiently fast and memory efficient to use the entire NCBI nucleotide collection (nt) as reference, enabling the assessment of species with incomplete genome sequence data from all biological kingdoms. Our pipeline efficiently produced a comprehensive overview of the microbiome of two biological data sets, including both eukaryotes and prokaryotes. CCMetagen is user-friendly and the results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.
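
For readers unfamiliar with the metrics quoted in the abstract above, the following is a minimal sketch of how precision, recall, and F1 are conventionally computed for a taxonomic classifier. It assumes a simple presence/absence comparison between the set of species a tool reports and the set known to be in a simulated sample; the abstract does not state how the authors computed their scores, and the function name and taxon labels here are hypothetical.

```python
def precision_recall_f1(predicted, expected):
    """Score a set of reported taxa against the known composition.

    Assumption: taxa are compared by presence/absence only, ignoring
    read counts and abundances.
    """
    predicted, expected = set(predicted), set(expected)
    true_pos = len(predicted & expected)   # reported and really present
    false_pos = len(predicted - expected)  # reported but absent
    false_neg = len(expected - predicted)  # present but missed

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four taxa reported, three truly present.
reported = {"species_A", "species_B", "species_C", "species_D"}
truth = {"species_A", "species_B", "species_E"}
precision, recall, f1 = precision_recall_f1(reported, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, a classifier that reports many spurious species is penalized even when it misses nothing, which is why large differences in precision tend to carry over into F1.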


2014 ◽  
Author(s):  
Jai Ram Rideout ◽  
Yan He ◽  
Jose Antonio Navas-Molina ◽  
William A Walters ◽  
Luke K Ursell ◽  
...  

We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to “classic” open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, “classic” open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 of the time that would be required for “classic” open-reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by “classic” open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME’s uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms.
Finally, we present a comparison of parameter settings in QIIME’s OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.

