NanoRTax, a real-time pipeline for taxonomic and diversity analysis of nanopore 16S rRNA amplicon sequencing data

Abstract The study of microbial communities and their applications has been propelled by advances in sequencing techniques and bioinformatics tools. Long-read nanopore sequencing from Oxford Nanopore Technologies provides a portable and cost-efficient platform for sequencing assays, opening the possibility of applying it outside specialized environments and of analyzing data in real time. To complement the existing efficient library preparation protocol with a streamlined analytic workflow, here we present NanoRTax, a Nextflow pipeline for nanopore 16S rRNA amplicon data that features state-of-the-art taxonomic classification tools and real-time capability. The pipeline is paired with a web-based visual interface that enables user-friendly inspection of the experiment in progress.
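Real-time mode implies that per-read taxonomic assignments are merged into a running community profile as each new batch of basecalled reads arrives, with diversity recomputed on every update. The Python sketch below illustrates that accumulation pattern under stated assumptions. `RealTimeProfile` and `shannon_index` are hypothetical names, not part of NanoRTax. The Shannon index stands in for whichever diversity metrics the pipeline actually reports.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


class RealTimeProfile:
    """Hypothetical accumulator that merges per-batch read classifications."""

    def __init__(self):
        self.counts = Counter()
        self.batches = 0

    def add_batch(self, assignments):
        # `assignments` holds one taxon label per classified read in the batch.
        self.counts.update(assignments)
        self.batches += 1
        return self.snapshot()

    def snapshot(self):
        total = sum(self.counts.values())
        relative = {t: n / total for t, n in self.counts.items()} if total else {}
        return {
            "batch": self.batches,
            "reads": total,
            "relative_abundance": relative,
            "shannon": shannon_index(self.counts),
        }


profile = RealTimeProfile()
for batch in (["Escherichia", "Bacillus"], ["Escherichia", "Escherichia", "Listeria"]):
    snap = profile.add_batch(batch)
    print(snap["batch"], snap["reads"], round(snap["shannon"], 3))
```

Keeping cumulative counts means each update costs time proportional to the batch size plus the number of taxa, not to the total number of reads seen so far.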

Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing

Nature Communications

10.1038/s41467-021-22203-2

2021

Vol 12 (1)

Cited By ~ 2

Author(s): Caitlin M. Singleton, Francesca Petriglieri, Jannie M. Kristensen, Rasmus H. Kirkegaard, Thomas Y. Michaelsen, ...

Keyword(s): 16S rRNA, Wastewater Treatment Plants, In Situ Hybridisation, Amplicon Sequencing, rRNA Genes, Fluorescence In Situ Hybridisation, Sequencing Data, High Quality, 16S rRNA Amplicon Sequencing, Long Read

Abstract Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.
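The MIMAG high-quality draft standard is a set of threshold checks. Screening assembled bins against it is therefore mechanical once completeness, contamination and rRNA/tRNA annotations are available. Under MIMAG, a high-quality draft needs:
- more than 90% completeness;
- less than 5% contamination;
- the 23S, 16S and 5S rRNA genes;
- at least 18 tRNAs.

The Python sketch below assumes these values have already been estimated by upstream tools. The `MAG` record, its field names and the example bins are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MAG:
    """Hypothetical summary record for one metagenome-assembled genome."""
    name: str
    completeness: float    # percent, estimated by an upstream tool
    contamination: float   # percent
    has_23s: bool
    has_16s: bool
    has_5s: bool
    trna_types: int        # number of distinct tRNAs detected


def is_mimag_high_quality(mag: MAG) -> bool:
    """MIMAG high-quality draft: >90% complete, <5% contaminated,
    23S/16S/5S rRNA genes present and at least 18 tRNAs."""
    return (
        mag.completeness > 90.0
        and mag.contamination < 5.0
        and mag.has_23s
        and mag.has_16s
        and mag.has_5s
        and mag.trna_types >= 18
    )


bins = [
    MAG("bin_001", 97.2, 1.1, True, True, True, 20),
    MAG("bin_002", 95.0, 0.8, True, False, True, 19),  # missing 16S rRNA gene
    MAG("bin_003", 88.5, 2.0, True, True, True, 21),   # too incomplete
]
high_quality = [m.name for m in bins if is_mimag_high_quality(m)]
print(high_quality)
```

The rRNA requirement is where long reads help. Repetitive rRNA operons are often difficult to assemble from short reads alone, so many otherwise complete bins fail that check.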

Dramatic differences in gut bacterial densities help to explain the relationship between diet and habitat in rainforest ants

10.1101/114512

2017

Cited By ~ 4

Author(s): Jon G Sanders, Piotr Lukasik, Megan E Frederickson, Jacob A Russell, Ryuichi Koga, ...

Keyword(s): 16S rRNA, Microbial Diversity, Tropical Rainforest, Amplicon Sequencing, Sequencing Data, Lowland Tropical Forest, 16S rRNA Amplicon Sequencing, Microbial Symbionts, Microbial Symbiosis, Diversity Profiles

Abstract Abundance is a key parameter in microbial ecology, and is important for estimates of potential metabolite flux, impacts of dispersal, and sensitivity of samples to technical biases such as laboratory contamination. However, modern amplicon-based sequencing techniques by themselves typically provide no information about the absolute abundance of microbes. Here, we use fluorescence microscopy and quantitative PCR as independent estimates of microbial abundance to test the hypothesis that microbial symbionts have enabled ants to dominate tropical rainforest canopies by facilitating herbivorous diets, and compare these methods to microbial diversity profiles from 16S rRNA amplicon sequencing. Through a systematic survey of ants from a lowland tropical forest, we show that the density of gut microbiota varies across several orders of magnitude among ant lineages, with median individuals from many genera only marginally above detection limits. Supporting the hypothesis that microbial symbiosis is important to dominance in the canopy, we find that the abundance of gut bacteria is positively correlated with stable isotope proxies of herbivory among canopy-dwelling ants, but not among ground-dwelling ants. Notably, these broad findings are much more evident in the quantitative data than in the 16S rRNA sequencing data. Our results help to resolve a longstanding question in tropical rainforest ecology, and have broad implications for the interpretation of sequence-based surveys of microbial diversity.
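The gap between the quantitative data and the 16S data follows from how relative profiles are normalized: every sample sums to 1, however many bacteria it contains. The Python sketch below rescales a relative profile by a qPCR total. It rests on two simplifying assumptions: the qPCR assay reports total 16S rRNA gene copies per sample, and 16S copy-number variation between taxa can be ignored. The function name `absolute_abundance` and the taxon labels are hypothetical.

```python
def absolute_abundance(relative, total_16s_copies):
    """Scale a relative profile by a qPCR estimate of total 16S copies.

    `relative` maps taxon -> fraction; it is renormalized to sum to 1 first,
    so filtered or rounded profiles are handled consistently.
    """
    s = sum(relative.values())
    if s <= 0:
        raise ValueError("relative profile must contain positive values")
    return {taxon: (frac / s) * total_16s_copies for taxon, frac in relative.items()}


# Two hypothetical ant workers with identical relative profiles but
# gut bacterial loads that differ by two orders of magnitude.
profile = {"taxon_A": 0.6, "taxon_B": 0.4}
dense = absolute_abundance(profile, 1e7)
sparse = absolute_abundance(profile, 1e5)
for taxon in profile:
    print(taxon, dense[taxon], sparse[taxon], dense[taxon] / sparse[taxon])
```

From the 16S profile alone the two samples are indistinguishable. Only the qPCR-scaled values reveal the 100-fold difference in density.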

Evaluation of the microbial community structure of potable water samples from occupied and unoccupied buildings using 16S rRNA amplicon sequencing

10.1101/2020.07.17.209346

2020

Author(s): Kimothy L Smith, Howard A Shuman, Douglas Findeisen

Keyword(s): Microbial Community, 16S rRNA, Water Samples, Ad Hoc, Microbial Community Composition, Amplicon Sequencing, Water Usage, 16S rRNA Amplicon Sequencing, Oxford Nanopore

Abstract We conducted two studies comparing water samples from buildings with normal occupancy and water usage to water from buildings that were unoccupied, with little or no water usage, due to the COVID-19 shutdown. Study 1 had 52 water samples obtained ad hoc from a range of building types in four metropolitan locations in different US states. Study 2 had 36 water samples of matched sample types obtained from two buildings in one metropolitan location. One of these buildings had been continuously occupied, and the other substantially vacant for approximately 3 months. All water samples were analyzed using 16S rRNA amplicon sequencing with a MinION from Oxford Nanopore Technologies. More than 127 genera of bacteria were identified, including genera whose members are known to include more than 50 putative frank and opportunistic pathogens. While specific results varied among sample locations, 16S rRNA amplicon abundance and bacterial diversity were higher in water samples from unoccupied buildings than in those from normally occupied buildings, as was the abundance of sequenced amplicons from genera known to include pathogenic members. In both studies, Legionella amplicon abundance was small relative to that of the other bacteria in the samples. Indeed, when present, the relative abundance of Legionella amplicons was lower in samples from unoccupied buildings. Legionella did not predominate in any of the water samples and was found, on average, in 9.6% of samples in Study 1 and 8.3% of samples in Study 2.

Synopsis: Comparison of microbial community composition in the plumbing of occupied and unoccupied buildings during the COVID-19 pandemic shutdown.
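The abstract reports two different quantities for Legionella. One is its relative abundance within a sample. The other is the share of samples in which it was detected at all (prevalence). The Python sketch below separates the two, assuming per-sample genus-level read counts are available. The function names and the toy counts are hypothetical, not data from the study.

```python
def relative_abundance(counts, genus):
    """Fraction of a sample's classified reads assigned to `genus`."""
    total = sum(counts.values())
    return counts.get(genus, 0) / total if total else 0.0


def prevalence(samples, genus, min_reads=1):
    """Percentage of samples in which `genus` has at least `min_reads` reads."""
    if not samples:
        return 0.0
    hits = sum(1 for counts in samples if counts.get(genus, 0) >= min_reads)
    return 100.0 * hits / len(samples)


# Hypothetical genus-level read counts for four water samples.
samples = [
    {"Legionella": 3, "Sphingomonas": 150, "Pseudomonas": 47},
    {"Sphingomonas": 90, "Methylobacterium": 60},
    {"Legionella": 1, "Pseudomonas": 99},
    {"Mycobacterium": 40, "Sphingomonas": 60},
]
print(prevalence(samples, "Legionella"))
print([round(relative_abundance(s, "Legionella"), 3) for s in samples])
```

The `min_reads` cutoff matters in practice. Single-read detections may reflect barcode crosstalk between multiplexed samples, so a stricter threshold lowers the reported prevalence.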

A Bioinformatics Analysis workflow for 16S rRNA Amplicon Sequencing data v1 (protocols.io.bntpmemn)

protocols.io

10.17504/protocols.io.bntpmemn

2020

Cited By ~ 1

Author(s): Lilan Hao

Keyword(s): 16S rRNA, Bioinformatics Analysis, Amplicon Sequencing, Sequencing Data, 16S rRNA Amplicon Sequencing, Analysis Workflow

Real-time monitoring and analysis of SARS-CoV-2 nanopore sequencing with minoTour.

10.1101/2021.09.13.459777

2021

Author(s): Rory James Munro, Nadine Holmes, Christopher Moore, Matthew Carlile, Alex Payne, ...

Keyword(s): Real Time, Phylogenetic Trees, Sequencing Data, Time Analysis, Real Time Analysis, Oxford Nanopore, Individual SNPs, Wide Range, Time Required, Viral Sequencing

Motivation: The ongoing SARS-CoV-2 pandemic has demonstrated the utility of real-time analysis of sequencing data, with a wide range of databases and resources for analysis now available. Here we show how the real-time nature of Oxford Nanopore Technologies sequencers can accelerate consensus generation and the assignment of lineage and variant status. We exploit the fact that multiplexed viral sequencing libraries quickly generate sufficient data for the majority of samples, with diminishing returns on the remaining samples as the sequencing run progresses. We demonstrate methods to determine when a sequencing run has passed this point, in order to reduce sequencing time and cost. Results: We extended minoTour, our real-time analysis and monitoring platform for nanopore sequencers, to provide SARS-CoV-2 analysis using ARTIC network pipelines. We additionally developed an algorithm to predict which samples will achieve sufficient coverage, automatically running the ARTIC medaka informatics pipeline on these samples once specific coverage thresholds have been reached. Testing on run data shows that significant run-time savings are possible, enabling flow cells to be used more efficiently and enabling higher-throughput data analysis. The resulting consensus genomes are assigned both a PANGO lineage and a variant status as defined by Public Health England. Samples from within individual runs are used to generate phylogenetic trees, optionally incorporating background samples, as well as summaries of individual SNPs. Because minoTour uses ARTIC pipelines, new primer schemes and pathogens can be added, allowing minoTour to aid real-time analysis of pathogens in the future.
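The abstract describes launching analysis per sample once coverage thresholds are met, rather than waiting for the whole run to finish. The Python sketch below shows that triggering logic, assuming reads have already been demultiplexed and mapped to tiled amplicons. `SampleCoverage`, `ready_samples` and the default thresholds are hypothetical illustrations, not minoTour's algorithm or settings.

```python
from dataclasses import dataclass, field


@dataclass
class SampleCoverage:
    """Hypothetical per-sample record of reads mapped to each amplicon."""
    n_amplicons: int
    depth: dict = field(default_factory=dict)  # amplicon id -> read count

    def add_reads(self, amplicon_ids):
        for amplicon in amplicon_ids:
            self.depth[amplicon] = self.depth.get(amplicon, 0) + 1

    def fraction_covered(self, min_depth):
        covered = sum(1 for d in self.depth.values() if d >= min_depth)
        return covered / self.n_amplicons


def ready_samples(samples, min_depth=20, min_fraction=0.9, already_run=()):
    """Samples that have crossed the coverage threshold and have not yet
    been handed to the consensus pipeline."""
    return sorted(
        name
        for name, cov in samples.items()
        if name not in already_run and cov.fraction_covered(min_depth) >= min_fraction
    )


samples = {"barcode01": SampleCoverage(2), "barcode02": SampleCoverage(2)}
samples["barcode01"].add_reads(["amp1"] * 25 + ["amp2"] * 22)
samples["barcode02"].add_reads(["amp1"] * 30 + ["amp2"] * 5)
print(ready_samples(samples))
```

This gives the early-start behaviour if two things are done on each batch:
- call `ready_samples`;
- record the samples already launched in `already_run`.

Predicting whether a lagging sample will ever reach the threshold would need an additional model of read accumulation, which this sketch omits.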

NanoR: a user-friendly R package to analyze and compare nanopore sequencing data

10.1101/514232

2019

Author(s): Davide Bolognini, Niccolò Bartalucci, Alessandra Mingrino, Alessandro Maria Vannucchi, Alberto Magi

Keyword(s): Real Time, Low Cost, R Package, Sequencing Data, High Performing, DNA and RNA, Oxford Nanopore, User Friendly, Oxford Nanopore Technologies

Abstract MinION and GridION X5 from Oxford Nanopore Technologies are devices for real-time DNA and RNA sequencing. On the one hand, MinION is the only real-time, low-cost and portable sequencing device and, thanks to its unique properties, is becoming increasingly popular among biologists. On the other hand, GridION X5, mainly because of its cost, is less widespread but highly suitable for researchers with large sequencing projects. Although Oxford Nanopore Technologies' devices have been increasingly used in the last few years, there is a lack of high-performing and user-friendly tools to handle the data output by both MinION and GridION X5 platforms. Here we present NanoR, a cross-platform R package designed to simplify and improve nanopore data visualization. NanoR is built on a few functions but exceeds the capabilities of existing tools in extracting meaningful information from MinION sequencing data. In addition, as exclusive features, NanoR can handle GridION X5 sequencing outputs and allows comparison of MinION and GridION X5 sequencing data in one command. NanoR is released as a free R package at https://github.com/davidebolo1993/NanoR.
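Summary statistics of the kind nanopore QC tools report are simple to compute but easy to get subtly wrong. Two common examples are read-length N50 and mean read quality. The sketch below is written in Python for consistency with the other examples here; it is not NanoR's R API, and the function names are hypothetical. A mean Phred score is more meaningfully computed by averaging per-base error probabilities than by averaging Q values directly.

```python
import math


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def mean_read_quality(phred_scores):
    """Mean quality of one read: average the per-base error probabilities
    10^(-Q/10), then convert back to the Phred scale."""
    if not phred_scores:
        raise ValueError("read has no quality scores")
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    return -10 * math.log10(mean_error)


read_lengths = [1500, 1450, 1520, 800, 3000, 1480]
print(n50(read_lengths))
print(round(mean_read_quality([10, 30]), 2))
```

For a read with base qualities 10 and 30, averaging Q values would report 20. The error-probability mean is about 13, because the single low-quality base dominates the expected number of errors.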

LotuS2: An ultrafast and highly accurate tool for amplicon sequencing analysis

10.1101/2021.12.24.474111

2021

Author(s): Ezgi Ozkurt, Joachim Fritscher, Nicola Soranzo, Duncan Y.K. Ng, Robert P. Davey, ...

Keyword(s): Data Analysis, Clustering Algorithms, Amplicon Sequencing, Sequencing Analysis, Alpha and Beta Diversity, High Data, Data Usage, Long Read, Cost Efficient, User Friendly

Background: Amplicon sequencing is an established and cost-efficient method for profiling microbiomes. However, many available tools for processing these data require both bioinformatics skills and high computational power to handle big datasets. Furthermore, only a few tools allow long-read amplicon data analysis. To bridge this gap, we developed the LotuS2 (Less OTU Scripts 2) pipeline, enabling user-friendly, resource-friendly and versatile analysis of raw amplicon sequences. Results: In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow flexible data analysis both by experts, who can fully adjust parameters, and by novices, for whom defaults are provided for different scenarios. On three independent gut and soil benchmark datasets, LotuS2 was on average 29 times faster than other pipelines, yet better reproduced the alpha- and beta-diversity of technical replicate samples. Further benchmarking on a mock community with known taxonomic composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified genera and species (98% and 57%, respectively). At the ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reconstructed 16S sequences. Conclusion: LotuS2 is a lightweight and user-friendly pipeline that is fast, precise and streamlined. High data usage rates and reliability enable high-throughput microbiome analysis in minutes. Availability: LotuS2 is available from GitHub, conda or via a Galaxy web interface, documented at http://lotus2.earlham.ac.uk/.
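Mock-community benchmarks score a pipeline by comparing the taxa it reports with the taxa known to be present. The Python sketch below computes set-based (presence/absence) precision, recall and F-score at a single taxonomic rank. This is a simplification: the exact scoring used in the LotuS2 benchmark, including at the ASV/OTU level, may differ. The function name and the genus lists are hypothetical.

```python
def classification_scores(expected, detected):
    """Presence/absence precision, recall and F1 for one taxonomic rank."""
    expected, detected = set(expected), set(detected)
    true_pos = len(expected & detected)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


mock_genera = {"Bacillus", "Enterococcus", "Listeria", "Staphylococcus"}
reported = {"Bacillus", "Enterococcus", "Listeria", "Lactobacillus"}
print(classification_scores(mock_genera, reported))
```

Recall here roughly corresponds to a "fraction of correctly identified genera". Precision additionally penalizes spurious taxa, which is why the two are usually reported together as an F-score.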

Machine Learning-assisted Identification of Bioindicators Predicts Medium-chain Carboxylate Production Performance of an Anaerobic Mixed Culture

10.21203/rs.3.rs-78714/v1

2020

Author(s): Bin Liu, Heike Sträuber, Joao Saraiva, Hauke Harms, Sandra Godinho Silva, ...

Keyword(s): Machine Learning, 16S rRNA, Retention Time, Hydraulic Retention Time, Amplicon Sequencing, Production Performance, Chain Elongation, Sequencing Data, Medium Chain, 16S rRNA Amplicon Sequencing

Abstract Background: The ability to quantitatively predict ecophysiological functions of microbial communities is an important step toward engineering microbiota for desired functions related to specific biochemical conversions. Here, we present the quantitative prediction of medium-chain carboxylate production in two continuous anaerobic bioreactors from 16S rRNA gene dynamics in enrichment cultures. Results: Both bioreactors were operated for 211 days. By progressively shortening the hydraulic retention time from 8 days to 2 days with different temporal schemes in each, we achieved higher productivities and yields of the target products n-caproate and n-caprylate. The datasets generated from each bioreactor were used independently for training and testing in machine learning. A predictive model was generated with the random forest algorithm using 16S rRNA amplicon sequencing data, achieving more than 90% accuracy in predicting n-caproate and n-caprylate productivities. Four inferred bioindicators belonging to the genera Olsenella, Lactobacillus, Syntrophococcus and Clostridium IV suggest their relevance to higher carboxylate productivity at shorter hydraulic retention times. The recovery of metagenome-assembled genomes of these bioindicators confirmed their genetic potential to perform key steps of medium-chain carboxylate production. Conclusions: Shortening the hydraulic retention time of the continuous bioreactor systems allows the communities to be shaped toward desired chain elongation functions. Using machine learning, we demonstrated that 16S rRNA amplicon sequencing data can be used to predict bioreactor process performance quantitatively and accurately. Characterising and harnessing bioindicators holds promise for managing reactor microbiota toward selection of the target processes. Our mathematical framework is transferable to other ecosystem processes and microbial systems where community dynamics are linked to key functions. The general methodology can be adapted to data types of other functional categories such as genes, transcripts, proteins or metabolites.

NGSpeciesID: DNA barcode and amplicon consensus generation from long-read sequencing data

10.22541/au.160262406.62842291/v2

2020

Author(s): Kristoffer Sahlin, Marisa Lim, Stefan Prost

Keyword(s): High Throughput Sequencing, DNA Barcode, Amplicon Sequencing, Error Rates, Sequencing Data, Sequencing Platform, Consensus Sequences, Sequencing Technologies, Oxford Nanopore, Long Read

Third-generation sequencing technologies, such as Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), have gained popularity in recent years. These platforms can generate millions of long-read sequences. This is advantageous not only for genome sequencing projects but also for amplicon-based high-throughput sequencing experiments, such as DNA barcoding. However, the relatively high error rates associated with these technologies still pose challenges for generating high-quality consensus sequences. Here we present NGSpeciesID, a program that can generate highly accurate consensus sequences from long-read amplicon sequencing technologies, including ONT and PacBio. The tool clusters the reads to help filter out contaminants or reads with high error rates, and employs polishing strategies specific to the appropriate sequencing platform. We show that NGSpeciesID improves usability, by minimizing preprocessing and software installation, and scalability, by enabling rapid processing of hundreds to thousands of samples. It does so while maintaining consensus accuracy similar to that of current pipelines.
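At its core, consensus generation collapses a cluster of noisy reads from the same amplicon into one sequence. NGSpeciesID's own method adds read clustering and platform-specific polishing, as the abstract notes. The Python sketch below shows only the simplest form of the consensus step: a column-wise majority vote over reads assumed to be already aligned to a common length, with gaps marked `-`. The function name is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over equal-length aligned reads.

    Columns whose most common symbol is a gap are dropped, so an
    insertion seen in only a minority of reads does not enter the
    consensus. Ties go to the symbol seen first in the column.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


reads = ["ACGT-A", "ACGTTA", "AC-T-A"]
print(majority_consensus(reads))
```

A majority vote suppresses independent, random errors more effectively as cluster size grows. Systematic errors, such as miscalled homopolymer lengths shared across reads, are not removed this way; this is one motivation for platform-specific polishing.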

Bacterial Diversity of Breast Milk in Healthy Spanish Women: Evolution from Birth to Five Years Postpartum

Nutrients

10.3390/nu13072414

2021

Vol 13 (7)

pp. 2414

Author(s): Laura Sanjulián, Alexandre Lamas, Rocío Barreiro, Alberto Cepeda, Cristina A. Fente, ...

Keyword(s): Breast Milk, 16S rRNA, Human Milk, Alpha Diversity, Amplicon Sequencing, Maternal Body Mass Index, 16S rRNA Amplicon Sequencing, Spanish Women, Calcium Magnesium, Abundant Genus

The objective of this work was to characterize the microbiota of breast milk in healthy Spanish mothers and to investigate the effects of lactation time on its diversity. A total of ninety-nine human milk samples were collected from healthy Spanish women and were assessed by next-generation sequencing of 16S rRNA amplicons and by qPCR. Firmicutes was the most abundant phylum, followed by Bacteroidetes, Actinobacteria, and Proteobacteria. Accordingly, Streptococcus was the most abundant genus. Lactation time strongly influenced milk microbiota, correlating positively with Actinobacteria and Bacteroidetes, while Firmicutes remained relatively constant over lactation. 16S rRNA amplicon sequencing showed that the highest alpha-diversity was found in samples from prolonged lactation, along with wider differences between individuals. As for milk nutrients, calcium, magnesium, and selenium levels were potentially associated with Streptococcus and Staphylococcus abundance. Additionally, Proteobacteria was positively correlated with docosahexaenoic acid (DHA) levels in breast milk, and Staphylococcus with conjugated linoleic acid. Conversely, Streptococcus and trans-palmitoleic acid showed a negative association. Other factors, such as maternal body mass index or diet, also influenced the structure of these microbial communities. Overall, human milk in Spanish mothers appeared to be a complex niche shaped by host factors and by its own nutrients, increasing in diversity over time.
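"Wider differences between individuals" is a statement about beta diversity. The abstract does not name the metric used. As one common choice, the Python sketch below computes Bray-Curtis dissimilarity between two count profiles. The function name and the genus counts are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon-count profiles.

    BC = 1 - 2 * C / (S_a + S_b), where C sums the smaller count of each
    shared taxon and S_a, S_b are the sample totals. 0 means identical
    composition; 1 means no taxa in common.
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[t], b[t]) for t in set(a) & set(b))
    return 1.0 - 2.0 * shared / total


early = {"Streptococcus": 60, "Staphylococcus": 30, "Cutibacterium": 10}
late = {"Streptococcus": 30, "Staphylococcus": 20, "Bacteroides": 50}
print(round(bray_curtis(early, late), 3))
```

Counts are usually rarefied or converted to relative abundances before comparison, so that differences in sequencing depth do not drive the dissimilarity.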
