NanoCLUST: a species-level analysis of 16S rRNA nanopore sequencing data

Author(s):  
Héctor Rodríguez-Pérez ◽  
Laura Ciuffreda ◽  
Carlos Flores

Abstract Summary NanoCLUST is an analysis pipeline for the classification of amplicon-based full-length 16S rRNA nanopore reads. It is characterized by an unsupervised read clustering step, based on Uniform Manifold Approximation and Projection (UMAP), followed by the construction of a polished read and subsequent Blast classification. Here, we demonstrate that NanoCLUST performs better than other state-of-the-art software in the characterization of two commercial mock communities, enabling accurate bacterial identification and abundance profile estimation at species-level resolution. Availability and implementation Source code, test data and documentation of NanoCLUST are freely available at https://github.com/genomicsITER/NanoCLUST under MIT License. Supplementary information Supplementary data are available at Bioinformatics online.
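As a rough illustration of the final abundance-estimation step described above, the following minimal sketch turns per-cluster read counts and per-cluster species labels into a relative abundance profile. It is not NanoCLUST's own code: the function name `abundance_profile`, the input dictionaries, and the example species labels are all hypothetical, and it assumes that each cluster has already received a single species assignment from its polished read.

```python
from collections import defaultdict


def abundance_profile(cluster_sizes, cluster_species):
    """Return the relative abundance per species from clustered reads.

    cluster_sizes:   dict mapping cluster id -> number of reads in the cluster
    cluster_species: dict mapping cluster id -> species label (hypothetical input)
    """
    totals = defaultdict(int)
    for cluster_id, n_reads in cluster_sizes.items():
        # Clusters without a classification are pooled under one label.
        species = cluster_species.get(cluster_id, "unclassified")
        totals[species] += n_reads

    total_reads = sum(totals.values())
    if total_reads == 0:
        return {}
    # Normalise read counts so the profile sums to 1.
    return {species: n / total_reads for species, n in totals.items()}


# Hypothetical example: two clusters resolve to the same species.
cluster_sizes = {0: 600, 1: 300, 2: 100}
cluster_species = {
    0: "Escherichia coli",
    1: "Bacillus subtilis",
    2: "Escherichia coli",
}
profile = abundance_profile(cluster_sizes, cluster_species)
```

Merging clusters that share a label before normalising keeps a species that was split across several clusters from being reported more than once.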


2018 ◽  
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method can vastly outperform the others. Author summary Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
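To indicate why a closed-form posterior is available in this kind of model, the block below writes out the standard Dirichlet-multinomial conjugacy result. It is only a sketch of the textbook building block, not the authors' full hierarchy: the Poisson layer on total counts is omitted, and the notation (x for OTU counts, n for the total count, K for the number of OTUs, α for the prior hyperparameters) is assumed here rather than taken from the paper.

```latex
% Illustrative only: textbook Dirichlet-multinomial conjugacy (assumed notation).
% x = (x_1, ..., x_K): OTU counts; n = \sum_i x_i; \alpha = (\alpha_1, ..., \alpha_K).
\begin{gather*}
p \sim \mathrm{Dirichlet}(\alpha), \qquad
x \mid p \sim \mathrm{Multinomial}(n, p)
\;\Longrightarrow\;
p \mid x \sim \mathrm{Dirichlet}(\alpha + x), \\
P(x \mid \alpha) = \frac{n!}{\prod_{i=1}^{K} x_i!}\,
\frac{\Gamma(\alpha_0)}{\Gamma(n + \alpha_0)}
\prod_{i=1}^{K} \frac{\Gamma(x_i + \alpha_i)}{\Gamma(\alpha_i)},
\qquad \alpha_0 = \sum_{i=1}^{K} \alpha_i .
\end{gather*}
```

Under these assumptions, a classifier can compare the marginal likelihood P(x | α) computed with each class's hyperparameters, which is the general idea behind classifying a profile without sampling the posterior.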


2021 ◽  
Author(s):  
Andrew E. Schriefer ◽  
Brajendra Kumar ◽  
Avihai Zolty ◽  
Preetam R ◽  
Adam Didier ◽  
...  

The M-CAMP™ (Microbiome Computational Analysis for Multiomic Profiling) Cloud Platform was designed to provide users with an easy-to-use web interface to access best-in-class microbiome analysis tools. This interface allows bench scientists to conduct bioinformatic analysis on their samples and then download publication-ready graphics and reports. The core pipeline of the platform is the 16S-seq taxonomic classification algorithm, which provides species-level classification of Illumina 16S sequencing data. This algorithm uses a novel approach combining alignment and k-mer-based taxonomic classification methodologies to produce a highly accurate and comprehensive profile. Additionally, a comprehensive proprietary database combining reference sequences from multiple sources was curated; it contains 18056 unique V3-V4 sequences covering 11527 species. The M-CAMP™ 16S taxonomic classification algorithm was validated on 52 sequencing samples from both public and in-house standard sample mixtures with known fractions. Compared to current popular public classification algorithms, our classification algorithm provides the most accurate species-level classification of 16S rRNA sequencing data.


2019 ◽  
Author(s):  
Yu Liu ◽  
Paul W Bible ◽  
Bin Zou ◽  
Qiaoxing Liang ◽  
Cong Dong ◽  
...  

Abstract Motivation Microbiome analyses of clinical samples with low microbial biomass are challenging because of the very small quantities of microbial DNA relative to the human host, ubiquitous contaminating DNA in sequencing experiments and the large and rapidly growing microbial reference databases. Results We present computational subtraction-based microbiome discovery (CSMD), a bioinformatics pipeline specifically developed to generate accurate species-level microbiome profiles for clinical samples with low microbial loads. CSMD applies strategies for the maximal elimination of host sequences with minimal loss of microbial signal and effectively detects microorganisms present in the sample with minimal false positives using a stepwise convergent solution. CSMD was benchmarked in a comparative evaluation with other classic tools on previously published well-characterized datasets. It showed higher sensitivity and specificity in host sequence removal and higher specificity in microbial identification, which led to more accurate abundance estimation. All these features are integrated into a free and easy-to-use tool. Additionally, CSMD applied to cell-free plasma DNA showed that microbial diversity within these samples is substantially broader than previously believed. Availability and implementation CSMD is freely available at https://github.com/liuyu8721/csmd. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Sebastian Deorowicz

Abstract Motivation The amount of genomic data that needs to be stored is huge. Therefore, it is not surprising that a lot of work has been done in the field of specialized data compression of FASTQ files. The existing algorithms are, however, still imperfect and the best tools produce quite large archives. Results We present FQSqueezer, a novel compression algorithm for sequencing data able to process single- and paired-end reads of variable lengths. It is based on the ideas from the famous prediction by partial matching and dynamic Markov coder algorithms known from the general-purpose-compressors world. The compression ratios are often tens of percent better than offered by the state-of-the-art tools. Availability and Implementation https://github.com/refresh-bio/[email protected] Supplementary information Supplementary data are available at publisher’s Web site.


2018 ◽  
Author(s):  
Anna Cuscó ◽  
Carlotta Catozzi ◽  
Joaquim Viñes ◽  
Armand Sanchez ◽  
Olga Francino

Background: Profiling the microbiome of low-biomass samples is challenging for metagenomics since these samples are prone to contain DNA from other sources, such as the host or the environment. The usual approach is sequencing specific hypervariable regions of the 16S rRNA gene, which fails to assign taxonomy to genus and species level. Here, we aim to assess long-amplicon PCR-based approaches for assigning taxonomy at the genus and species level. We use Nanopore sequencing with two different markers: full-length 16S rRNA (~1,500 bp) and the whole rrn operon (16S rRNA gene - ITS - 23S rRNA gene; 4,500 bp). Methods: We sequenced a clinical isolate of Staphylococcus pseudintermedius, two mock communities (HM-783D, BEI Resources; D6306, ZymoBIOMICS) and two pools of low-biomass samples (dog skin). Nanopore sequencing was performed on MinION (Oxford Nanopore Technologies) using the 1D PCR barcoding kit. Sequences were pre-processed, and data were analyzed using the WIMP workflow on EPI2ME (ONT) or Minimap2 software with the rrn database. Results: Full-length 16S rRNA and the rrn operon retrieved the microbiota composition from the bacterial isolate, the mock communities and the complex skin samples, even at the genus and species level. For the Staphylococcus pseudintermedius isolate, when using EPI2ME, the amplicons were assigned to the correct bacterial species in ~98% of the cases with the rrn operon as the marker, and in ~68% of the cases with the 16S rRNA gene. In both skin microbiota samples, we detected many species with an environmental origin. In the chin, we found different Pseudomonas species in high abundance, whereas in the dorsal skin there were more taxa with lower abundances. Conclusions: Both full-length 16S rRNA and the rrn operon retrieved the microbiota composition of simple and complex microbial communities, even from low-biomass samples such as dog skin. For an increased resolution at the species level, the rrn operon would be the best choice.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yoshiyuki Matsuo ◽  
Shinnosuke Komiya ◽  
Yoshiaki Yasumizu ◽  
Yuki Yasuoka ◽  
Katsura Mizushima ◽  
...  

Abstract Background Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results We modified our existing protocol for full-length 16S rRNA gene amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S rRNA gene amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene.


Author(s):  
Bo Zhang ◽  
Matthew Brock ◽  
Carlos Arana ◽  
Chaitanya Dende ◽  
Nicolai Stanislas van Oers ◽  
...  

Bead-beating within a DNA extraction protocol is critical for complete microbial cell lysis and accurate assessment of the abundance and composition of the microbiome. While the impact of bead-beating on the recovery of OTUs at the phylum and class level has been studied, its influence on species-level microbiome recovery is not clear. Recent advances in sequencing technology have allowed species-level resolution of the microbiome using full-length 16S rRNA gene sequencing instead of smaller amplicons that only capture a few hypervariable regions of the gene. We sequenced the V3-V4 hypervariable region as well as the full-length 16S rRNA gene in mouse and human stool samples and discovered major clusters of gut bacteria that exhibit different levels of sensitivity to bead-beating treatment. Full-length 16S rRNA gene sequencing unraveled vast species diversity in the mouse and human gut microbiome and enabled characterization of several OTUs that were unclassified in the amplicon data. Many species of major gut commensals such as Bacteroides, Lactobacillus, Blautia, Clostridium, Escherichia, Roseburia, Helicobacter, and Ruminococcus were identified. Interestingly, V3-V4 amplicon data classified about 50% of Ruminococcus reads as Ruminococcus gnavus, which showed maximum abundance in a sample beaten for 9 min. However, the remaining 50% of reads could not be assigned to any species. Full-length 16S rRNA gene sequencing data showed that the majority of the unclassified reads were Ruminococcus albus, which, unlike R. gnavus, showed maximum recovery in the unbeaten sample. Furthermore, we found that Blautia hominis and Streptococcus parasanguinis differed in their sensitivity to bead-beating treatment from the rest of the species in these genera. Thus, the present study demonstrates species-level variations in sensitivity to bead-beating treatment that could only be resolved with full-length 16S rRNA sequencing. 
This study identifies species of common gut commensals and potential pathogens that require minimum (0-1 min) or extensive (4-9 min) bead-beating for their maximal recovery.


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1755 ◽  
Author(s):  
Anna Cuscó ◽  
Carlotta Catozzi ◽  
Joaquim Viñes ◽  
Armand Sanchez ◽  
Olga Francino

Background: Profiling the microbiome of low-biomass samples is challenging for metagenomics since these samples are prone to contain DNA from other sources (e.g. host or environment). The usual approach is sequencing short regions of the 16S rRNA gene, which fails to assign taxonomy to genus and species level. To achieve an increased taxonomic resolution, we aim to develop long-amplicon PCR-based approaches using Nanopore sequencing. We assessed two different genetic markers: the full-length 16S rRNA (~1,500 bp) and the 16S-ITS-23S region from the rrn operon (4,300 bp). Methods: We sequenced a clinical isolate of Staphylococcus pseudintermedius, two mock communities and two pools of low-biomass samples (dog skin). Nanopore sequencing was performed on MinION™ using the 1D PCR barcoding kit. Sequences were pre-processed, and data were analyzed using EPI2ME or Minimap2 with the rrn database. Consensus sequences of the 16S-ITS-23S genetic marker were obtained using canu. Results: The full-length 16S rRNA and the 16S-ITS-23S region of the rrn operon were used to retrieve the microbiota composition of the samples at the genus and species level. For the Staphylococcus pseudintermedius isolate, when using EPI2ME, the amplicons were assigned to the correct bacterial species in ~98% of the cases with the 16S-ITS-23S genetic marker, and in ~68% with the 16S rRNA gene. Using mock communities, we found that the full-length 16S rRNA gene better represented the abundances of a microbial community, whereas 16S-ITS-23S obtained better resolution at the species level. Finally, we characterized low-biomass skin microbiota samples and detected species with an environmental origin. Conclusions: Both full-length 16S rRNA and the 16S-ITS-23S of the rrn operon retrieved the microbiota composition of simple and complex microbial communities, even from low-biomass samples such as dog skin. 
For an increased resolution at the species level, targeting the 16S-ITS-23S of the rrn operon would be the best choice.

