BugSeq 16S: NanoCLUST with Improved Consensus Sequence Classification

2021 ◽  
Author(s):  
Ana Jung ◽  
Samuel D Chorlton

NanoCLUST has enabled species-level taxonomic classification from noisy nanopore 16S sequencing data for BugSeq's users and the broader nanopore sequencing community. We noticed a high misclassification rate of NanoCLUST-derived consensus 16S sequences due to its use of BLAST top hit taxonomy assignment. We replaced the consensus sequence classifier of NanoCLUST with QIIME2's VSEARCH-based classifier to enable greater accuracy. We use mock microbial community and clinical 16S sequencing data to show that this replacement results in significantly improved nanopore 16S accuracy (improvements of over 5% in recall and 19% in precision), and make this new tool (BugSeq 16S) freely available for academic use at BugSeq.com/free.
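The precision and recall figures quoted above can be made concrete with a minimal sketch. It assumes classification is scored as presence or absence of species names against a mock community of known composition; the function name `precision_recall` and the species lists are hypothetical illustrations, not BugSeq's evaluation code or data.

```python
def precision_recall(predicted, expected):
    """Return (precision, recall) for predicted taxa against a known truth set."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Precision: fraction of reported species that are really in the sample.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of species in the sample that were reported.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical mock community and classifier output (not from the paper).
mock_truth = {
    "Escherichia coli",
    "Bacillus subtilis",
    "Listeria monocytogenes",
    "Staphylococcus aureus",
}
calls = {"Escherichia coli", "Bacillus subtilis", "Listeria innocua"}

p, r = precision_recall(calls, mock_truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case a single wrong species call (a close relative reported instead of the true species) lowers precision, while each missed species lowers recall, which is the kind of error a top-hit assignment can introduce.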

2021 ◽  
Author(s):  
Andrew E. Schriefer ◽  
Brajendra Kumar ◽  
Avihai Zolty ◽  
Preetam R ◽  
Adam Didier ◽  
...  

The M-CAMP™ (Microbiome Computational Analysis for Multiomic Profiling) Cloud Platform was designed to provide users with an easy-to-use web interface to access best-in-class microbiome analysis tools. This interface allows bench scientists to conduct bioinformatic analysis on their samples and then download publication-ready graphics and reports. The core pipeline of the platform is the 16S-seq taxonomic classification algorithm, which provides species-level classification of Illumina 16S sequencing data. This algorithm uses a novel approach combining alignment- and k-mer-based taxonomic classification methodologies to produce a highly accurate and comprehensive profile. Additionally, a comprehensive proprietary database combining reference sequences from multiple sources was curated; it contains 18056 unique V3-V4 sequences covering 11527 species. The M-CAMP™ 16S taxonomic classification algorithm was validated on 52 sequencing samples from both public and in-house standard sample mixtures with known fractions. Compared to current popular public classification algorithms, our classification algorithm provides the most accurate species-level classification of 16S rRNA sequencing data.


2016 ◽  
Author(s):  
Chenhao Li ◽  
Kern Rei Chng ◽  
Jia Hui Esther Boey ◽  
Hui Qi Amanda Ng ◽  
Andreas Wilm ◽  
...  

Nanopore sequencing provides a rapid, cheap and portable real-time sequencing platform with the potential to revolutionize genomics. Several applications, including RNA-seq, haplotype sequencing and 16S sequencing, are however limited by its relatively high single read error rate (>10%). We present INC-Seq (Intramolecular-ligated Nanopore Consensus Sequencing) as a strategy for obtaining long and accurate nanopore reads starting with low input DNA. Applying INC-Seq for 16S rRNA based bacterial profiling generated full-length amplicon sequences with median accuracy >97%. INC-Seq reads enable accurate species-level classification, identification of species at 0.1% abundance and robust quantification of relative abundances, providing a cheap and effective approach for pathogen detection and microbiome profiling on the MinION system.
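The abstract does not describe how INC-Seq builds its consensus, so the following is only a toy sketch of why combining repeated copies of one molecule can lift accuracy above the single-read error rate. It assumes the copies are already aligned and of equal length and uses a simple per-position majority vote; the function name `majority_consensus` and the example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(copies):
    """Per-position majority vote over pre-aligned, equal-length copies."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must be aligned to the same length")
    # zip(*copies) yields one column of bases per position.
    # On a tie, Counter.most_common keeps the base seen first in the column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


# Hypothetical noisy copies of the same template, each with at most one error.
copies = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC", "ACGTAG"]
consensus = majority_consensus(copies)
print(consensus)
```

Because the errors in this toy input fall at different positions in different copies, each is outvoted by the correct base in the remaining copies.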


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Benjamin J. Callahan ◽  
Dmitry Grinevich ◽  
Siddhartha Thakur ◽  
Michael A. Balamotis ◽  
Tuval Ben Yehezkel

Abstract. Background: Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. Methods: Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads. Results: LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq perfectly recovered the full diversity of 16S rRNA genes from known strains in a synthetic microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kb in length. LoopSeq full-length 16S rRNA reads could accurately classify organisms down to the species level in rinsate from retail meat samples, and could differentiate strains within species identified by the CDC as potential foodborne pathogens. Conclusions: The order-of-magnitude improvement in length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex- to low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ratanond Koonchanok ◽  
Swapna Vidhur Daulatabad ◽  
Quoseena Mir ◽  
Khairi Reda ◽  
Sarath Chandra Janga

Abstract. Background: Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in a Fast5 format, cluster sequences based on electric-current similarities, and drill down onto signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that characterize these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.


2018 ◽  
Vol 6 (7) ◽  
Author(s):  
Annette Fagerlund ◽  
Solveig Langsrud ◽  
Birgitte Moen ◽  
Even Heir ◽  
Trond Møretrø

ABSTRACT Listeria monocytogenes is a foodborne pathogen that causes the often-fatal disease listeriosis. We present here the complete genome sequences of six L. monocytogenes isolates of sequence type 9 (ST9) collected from two different meat processing facilities in Norway. The genomes were assembled using Illumina and Nanopore sequencing data.


Author(s):  
Hannah Bolinger ◽  
David Tran ◽  
Kenneth Harary ◽  
George C. Paoli ◽  
Giselle Guron ◽  
...  

Traditional microbiological testing methods are slow, and many molecular-based techniques rely on culture-based enrichment to overcome low limits of detection. Recent advancements in sequencing technologies may make it possible to utilize machine learning (ML) to identify patterns in microbiome data to potentially predict the presence or absence of pathogens. In this study, 299 poultry rinsate samples from various points in the processing chain were analyzed to determine if microbiota could inform about a sample’s risk for containing Salmonella. Samples were culture confirmed as Salmonella-positive or -negative following modified USDA MLG protocols. The culture confirmation result was used as a reference to compare with 16S sequencing data. Pre-chill samples tested positive (71/82) at a higher frequency than post-chill samples (30/217) and contained greater microbial diversity. Due to their larger sample size, post-chill samples were analyzed more deeply. Analysis of variance (ANOVA) identified a significant effect of chilling on the number of genera (p<0.001), but analysis of similarities (ANOSIM) failed to provide evidence for microbial dissimilarity between pre- and post-chill samples (p=0.001, R=0.443). Various ML models were trained using post-chill samples to predict if a sample contained Salmonella based on the samples’ microbiota pre-enrichment. The optimal model was a Random Forest-based model with a performance as follows: accuracy (88%), sensitivity (85%), specificity (90%). While the algorithms described in this paper are prototypes, these risk-based algorithms demonstrate the potential and need for further studies to provide insight alongside diagnostic tests. Combining risk-based information with diagnostic tools can help poultry processors make informed decisions to help identify and prevent the spread of Salmonella. These data add to the growing body of literature exploring novel ways to utilize microbiome data for predictive food safety.
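For readers less familiar with the metrics reported for the Random Forest model, the sketch below shows how accuracy, sensitivity, and specificity follow from confusion-matrix counts, with culture confirmation as the reference. The culture-positive fractions (71/82 and 30/217) are taken from the abstract; the confusion-matrix counts and the function name `binary_metrics` are hypothetical and are not the paper's test set.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # all correct calls / all samples
        "sensitivity": tp / (tp + fn),      # positives correctly flagged
        "specificity": tn / (tn + fp),      # negatives correctly cleared
    }


# Culture-positive fractions reported in the abstract.
pre_chill_rate = 71 / 82
post_chill_rate = 30 / 217

# Hypothetical held-out counts, chosen only for illustration (NOT from the paper).
example = binary_metrics(tp=17, fn=3, tn=72, fp=8)

print(f"pre-chill positive rate:  {pre_chill_rate:.1%}")
print(f"post-chill positive rate: {post_chill_rate:.1%}")
print({name: round(value, 2) for name, value in example.items()})
```

Reporting sensitivity and specificity alongside accuracy matters here because only a minority of post-chill samples are positive, so a model that always predicts "negative" would still score a high accuracy.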


2020 ◽  
Vol 367 (11) ◽  
Author(s):  
Andrea Fasolo ◽  
Laura Treu ◽  
Piergiorgio Stevanato ◽  
Giuseppe Concheri ◽  
Stefano Campanaro ◽  
...  

ABSTRACT Microbial metabarcoding is the standard approach to assess communities’ diversity. However, reports are often limited to simple OTU abundances for each phylum, giving rather one-dimensional views of microbial assemblages and overlooking other accessible aspects. The first is masked by database incompleteness; OTU picking involves clustering at 97% (near-species) sequence identity, but different OTUs regularly end up under the same taxon name. When diversity is expressed as the number of taxonomic names obtained, a large portion of the real diversity lying within the data remains underestimated. Using the 16S sequencing results of an environmental transect across a gradient of 17 coastal habitats, we first extracted the number of OTUs hidden under the same name. Further, we observed which was the deepest rank yielded by annotation, revealing for which microbial groups we are missing the most knowledge. Data were then used to infer an evolutionary aspect: what is, in each phylum, the success of present-day individuals (abundances for each OTU) in relation to their prior evolutionary success in differentiation (number of OTUs). This information reveals whether the past speciation/diversification force is matched by the present competitiveness in reproduction/persistence. The final layer explored is functional diversity, i.e. abundances of groups involved in specific environmental processes.
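The idea of OTUs "hidden under the same name" can be sketched as a simple grouping step. This is a minimal illustration that assumes each OTU has already been assigned one taxon label; the OTU identifiers, taxon names, and the function name `otus_per_taxon` are hypothetical and do not come from the study.

```python
from collections import defaultdict


def otus_per_taxon(annotations):
    """Count how many distinct OTUs share each assigned taxon name."""
    groups = defaultdict(list)
    for otu_id, taxon in annotations.items():
        groups[taxon].append(otu_id)
    return {taxon: len(otu_ids) for taxon, otu_ids in groups.items()}


# Hypothetical annotation table: OTU identifier -> deepest assigned name.
annotations = {
    "OTU_1": "Pseudomonas",
    "OTU_2": "Pseudomonas",
    "OTU_3": "Pseudomonas",
    "OTU_4": "Bacillus",
    "OTU_5": "Flavobacteriaceae",
}

counts = otus_per_taxon(annotations)
print(f"{len(annotations)} OTUs collapse to {len(counts)} names: {counts}")
```

Counting names alone would report three taxa in this toy table, while the clustering distinguishes five OTUs, which is the underestimation the abstract describes.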


2017 ◽  
Author(s):  
Zhemin Zhou ◽  
Nina Luhmann ◽  
Nabil-Fareed Alikhan ◽  
Christopher Quince ◽  
Mark Achtman

Abstract Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying these reads into taxonomic groups. Current methods compare these sequencing data with existing biased and limited reference databases. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both are especially problematic for the identification of low-abundance microbial species, e.g. detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤ 0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.


2020 ◽  
Author(s):  
Timour Baslan ◽  
Sam Kovaka ◽  
Fritz J. Sedlazeck ◽  
Yanming Zhang ◽  
Robert Wappel ◽  
...  

ABSTRACT Genome copy number is an important source of genetic variation in health and disease. In cancer, clinically actionable Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in terms of the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms. This represents a challenge for applications that require high read counts such as CNA inference. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that sequencing short DNA molecules reproducibly returns high read counts and allows high-quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data show that, compared to traditional approaches such as chromosome analysis/cytogenetics, short molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost-effective and expeditious manner, including for multiplex samples. Our results provide a framework for the sequencing of relatively short DNA molecules on nanopore devices with applications in research and medicine that include, but are not limited to, CNAs.

