microeco: An R package for data mining in microbial community ecology

Abstract A large amount of sequencing data is produced in microbial community ecology studies using high-throughput sequencing techniques, especially amplicon-sequencing-based community data. After the initial bioinformatic analysis of amplicon sequencing data, performing the subsequent statistics and data mining based on the operational taxonomic unit and taxonomic assignment tables is still complicated and time-consuming. To address this problem, we present an integrated R package, ‘microeco’, as an analysis pipeline for treating microbial community and environmental data. This package was developed based on the R6 class system and combines a series of commonly used and advanced approaches in microbial community ecology research. The package includes classes for data preprocessing, taxa abundance plotting, Venn diagrams, alpha diversity analysis, beta diversity analysis, differential abundance testing and indicator taxon analysis, environmental data analysis, null model analysis, network analysis and functional analysis. Each class is designed to provide a set of approaches that are easily accessible to users. Compared with other R packages in the microbial ecology field, the microeco package is fast, flexible and modular, and provides powerful and convenient tools for researchers. The microeco package can be installed from CRAN (The Comprehensive R Archive Network) or GitHub (https://github.com/ChiLiubio/microeco).

Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities

Scientific Reports ◽

10.1038/s41598-021-01636-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ellen S. Cameron ◽

Philip J. Schmidt ◽

Benjamin J.-M. Tremblay ◽

Monica B. Emelko ◽

Kirsten M. Müller

Keyword(s):

Microbial Community ◽

Pathogenic Bacteria ◽

Graphical Representation ◽

Community Analysis ◽

Amplicon Sequencing ◽

Microbial Community Analysis ◽

Next Generation Sequencing Data ◽

Diversity Analysis ◽

Sequencing Data ◽

Library Size

Abstract Amplicon sequencing has revolutionized our ability to study DNA collected from environmental samples by providing a rapid and sensitive technique for microbial community analysis that eliminates the challenges associated with lab cultivation and taxonomic identification through microscopy. In water resources management, it can be especially useful to evaluate ecosystem shifts in response to natural and anthropogenic landscape disturbances to signal potential water quality concerns, such as the detection of toxic cyanobacteria or pathogenic bacteria. Amplicon sequencing data consist of discrete counts of sequence reads, the sum of which is the library size. Groups of samples typically have different library sizes that are not representative of biological variation; library size normalization is required to meaningfully compare diversity between them. Rarefaction is a widely used normalization technique that involves the random subsampling of sequences from the initial sample library to a selected normalized library size. This process is often dismissed as statistically invalid because subsampling effectively discards a portion of the observed sequences, yet it remains prevalent in practice and the suitability of rarefying, relative to many other normalization approaches, for diversity analysis has been argued. Here, repeated rarefying is proposed as a tool to normalize library sizes for diversity analyses. This enables (i) proportionate representation of all observed sequences and (ii) characterization of the random variation introduced to diversity analyses by rarefying to a smaller library size shared by all samples. While many deterministic data transformations are not tailored to produce equal library sizes, repeatedly rarefying reflects the probabilistic process by which amplicon sequencing data are obtained as a representation of the amplified source microbial community. Specifically, it evaluates which data might have been obtained if a particular sample’s library size had been smaller and allows graphical representation of the effects of this library size normalization process upon diversity analysis results.
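
The repeated rarefying idea described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`rarefy`, `shannon`, `repeated_rarefy_shannon`), the toy read counts, the normalized library size of 200 and the choice of the Shannon index as the diversity metric are all hypothetical. Each repeat subsamples reads without replacement, and the spread of the resulting index values gives a rough picture of the variation introduced by subsampling.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a dict of taxon -> read count."""
    # Expand the count table into one entry per read, then draw reads at random.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the library size of this sample")
    return Counter(rng.sample(pool, depth))


def shannon(counts):
    """Plug-in Shannon index (natural log) computed from a dict of counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def repeated_rarefy_shannon(counts, depth, repeats, seed=0):
    """Rarefy the same sample `repeats` times and return one Shannon value per repeat."""
    rng = random.Random(seed)
    return [shannon(rarefy(counts, depth, rng)) for _ in range(repeats)]


# Hypothetical sample with a library size of 1000 reads.
sample = {"taxon_a": 500, "taxon_b": 300, "taxon_c": 150, "taxon_d": 45, "taxon_e": 5}
values = repeated_rarefy_shannon(sample, depth=200, repeats=100)
mean = sum(values) / len(values)
spread = max(values) - min(values)
print(f"mean Shannon = {mean:.3f}, range across repeats = {spread:.3f}")
```

A single rarefied table would yield one value with no indication of this spread; keeping all repeats is what allows the variation to be plotted or summarized.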

Microbial (Community) Ecology

Oxford Bibliographies Online Datasets ◽

10.1093/obo/9780199830060-0153 ◽

2016 ◽

Author(s):

Sang-Hoon Lee ◽

Ashley Shade

Keyword(s):

Microbial Community ◽

Community Ecology ◽

Microbial Community Ecology

A high-resolution pipeline for 16S-sequencing identifies bacterial strains in human microbiome

10.1101/565572 ◽

2019 ◽

Cited By ~ 1

Author(s):

Igor Segota ◽

Tao Long

Keyword(s):

Bacterial Species ◽

Human Microbiome ◽

Amplicon Sequencing ◽

R Package ◽

Strain Level ◽

Sequencing Data ◽

Bacterial Strains ◽

16S Sequencing ◽

16S Amplicon Sequencing ◽

Sequencing Data Analysis

We developed a High-resolution Microbial Analysis Pipeline (HiMAP) for 16S amplicon sequencing data analysis, aiming at bacterial species or strain-level identification from the human microbiome to enable experimental validation of the causal effects of the associated bacterial strains on health and disease. HiMAP achieved higher accuracy in identifying species in a human microbiome mock community than other pipelines. HiMAP identified the majority of the species detected by whole genome shotgun sequencing using MetaPhlAn2, with strain-level resolution wherever possible, and reported comparable relative abundances. HiMAP is an open-source R package available at https://github.com/taolonglab/himap.

Ensuring that fundamentals of quantitative microbiology are reflected in microbial diversity analyses based on next-generation sequencing

10.1101/2021.06.19.449110 ◽

2021 ◽

Author(s):

Philip J Schmidt ◽

Ellen S Cameron ◽

Kirsten M Müller ◽

Monica B Emelko

Keyword(s):

Single Point ◽

Amplicon Sequencing ◽

Shannon Index ◽

Most Probable Number ◽

Diversity Analysis ◽

Sequencing Data ◽

Sample Collection ◽

Library Size ◽

Probable Number ◽

Microbiological Methods

Diversity analysis of amplicon sequencing data is mainly limited to plug-in estimates calculated using normalized data to obtain a single value of an alpha diversity metric or a single point on a beta diversity ordination plot for each sample. As recognized for count data generated using classical microbiological methods, read counts obtained from a sample are random data linked to source properties by a probabilistic process. Thus, diversity analysis has focused on diversity of (normalized) samples rather than probabilistic inference about source diversity. This study applies fundamentals of statistical analysis for quantitative microbiology (e.g., microscopy, plating, most probable number methods) to sample collection and processing procedures of amplicon sequencing methods to facilitate inference reflecting the probabilistic nature of such data and evaluation of uncertainty in diversity metrics. Types of random error are described and clustering of microorganisms in the source, differential analytical recovery during sample processing, and amplification are found to invalidate a multinomial relative abundance model. The zeros often abounding in amplicon sequencing data and their implications are addressed, and Bayesian analysis is applied to estimate the source Shannon index given unnormalized data (both simulated and real). Inference about source diversity is found to require knowledge of the exact number of unique variants in the source, which is practically unknowable due to library size limitations and the inability to differentiate zeros corresponding to variants that are actually absent in the source from zeros corresponding to variants that were merely not detected. Given these problems with estimation of diversity in the source even when the basic multinomial model is valid, sample-level diversity analysis approaches are discussed.
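
The contrast between a plug-in estimate and probabilistic inference about the source can be illustrated with a small Python sketch. It is not the authors' model: it assumes the basic multinomial relative abundance model holds (which the abstract says is often invalidated), uses a symmetric Dirichlet prior chosen only for illustration, and treats the total number of variants in the source (`n_variants`) as a known input. All function names and the toy counts are hypothetical.

```python
import math
import random


def shannon_from_props(props):
    """Shannon index (natural log) from a list of proportions."""
    return -sum(p * math.log(p) for p in props if p > 0)


def plug_in_shannon(counts):
    """Plug-in estimate: treat observed proportions as if they were the source proportions."""
    total = sum(counts)
    return shannon_from_props([c / total for c in counts])


def posterior_shannon(counts, n_variants, draws=1000, prior=1.0, seed=0):
    """Posterior draws of the source Shannon index.

    Assumes a multinomial likelihood and a symmetric Dirichlet(prior) prior.
    Variants assumed present in the source but not observed are padded as zeros.
    """
    if n_variants < len(counts):
        raise ValueError("n_variants must be at least the number of observed variants")
    rng = random.Random(seed)
    padded = list(counts) + [0] * (n_variants - len(counts))
    results = []
    for _ in range(draws):
        # A Dirichlet draw is obtained by normalizing independent gamma draws.
        gammas = [rng.gammavariate(c + prior, 1.0) for c in padded]
        total = sum(gammas)
        results.append(shannon_from_props([g / total for g in gammas]))
    return results


counts = [60, 25, 10, 5]  # hypothetical read counts for four observed variants
print(f"plug-in Shannon: {plug_in_shannon(counts):.3f}")
means = {}
for k in (4, 8, 16):
    post = posterior_shannon(counts, n_variants=k)
    means[k] = sum(post) / len(post)
    print(f"assumed variants in source = {k:2d}, posterior mean Shannon = {means[k]:.3f}")
```

Under these assumptions the posterior mean rises as more unobserved variants are assumed to exist, which mirrors the point made in the abstract: the inference depends on a number of source variants that the data alone cannot reveal.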

To rarefy or not to rarefy: Enhancing microbial community analysis through next-generation sequencing

10.1101/2020.09.09.290049 ◽

2020 ◽

Author(s):

Ellen S. Cameron ◽

Philip J. Schmidt ◽

Benjamin J.-M. Tremblay ◽

Monica B. Emelko ◽

Kirsten M. Müller

Keyword(s):

Microbial Community ◽

Community Analysis ◽

Amplicon Sequencing ◽

Microbial Community Analysis ◽

List Type ◽

Sequencing Data ◽

Water And Wastewater Treatment ◽

Library Size ◽

Size Selection ◽

The Impact

Abstract The application of amplicon sequencing in water research provides a rapid and sensitive technique for microbial community analysis in a variety of environments ranging from freshwater lakes to water and wastewater treatment plants. It has revolutionized our ability to study DNA collected from environmental samples by eliminating the challenges associated with lab cultivation and taxonomic identification. DNA sequencing data consist of discrete counts of sequence reads, the total number of which is the library size. Samples may have different library sizes and thus, a normalization technique is required to meaningfully compare them. The process of randomly subsampling sequences to a selected normalized library size from the sample library, known as rarefying, is one such normalization technique. However, rarefying has been criticized as a normalization technique because data can be omitted through the exclusion of either excess sequences or entire samples, depending on the rarefied library size selected. Although it has been suggested that rarefying should be avoided altogether, we propose that repeatedly rarefying enables (i) characterization of the variation introduced to diversity analyses by this random subsampling and (ii) selection of smaller library sizes where necessary to incorporate all samples in the analysis. Rarefying may be a statistically valid normalization technique, but researchers should evaluate their data to make appropriate decisions regarding library size selection and subsampling type. The impact of normalized library size selection and of rarefying with or without replacement on diversity analyses was evaluated herein.

Highlights
▪ Amplicon sequencing technology for environmental water samples is reviewed
▪ Sequencing data must be normalized to allow comparison in diversity analyses
▪ Rarefying normalizes library sizes by subsampling from observed sequences
▪ Criticisms of data loss through rarefying can be resolved by rarefying repeatedly
▪ Rarefying repeatedly characterizes errors introduced by subsampling sequences
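
The difference between rarefying with and without replacement can be made concrete with a short Python sketch. This is a minimal illustration with hypothetical names and toy counts, not the procedure evaluated in the paper. Without replacement, every observed read can be drawn at most once, so rarefying to the full library size returns the original counts. With replacement, each draw is independent and weighted by the observed counts, so the result varies even at the full library size.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng, replace=False):
    """Subsample `depth` reads from a dict of taxon -> read count."""
    taxa = list(counts)
    weights = [counts[t] for t in taxa]
    if replace:
        # With replacement: independent draws weighted by observed counts.
        draws = rng.choices(taxa, weights=weights, k=depth)
    else:
        # Without replacement: each observed read can be selected at most once.
        pool = [t for t, n in zip(taxa, weights) for _ in range(n)]
        draws = rng.sample(pool, depth)
    return Counter(draws)


sample = {"taxon_a": 6, "taxon_b": 3, "taxon_c": 1}  # hypothetical library of 10 reads
rng = random.Random(42)

without = rarefy(sample, depth=10, rng=rng, replace=False)
with_repl = rarefy(sample, depth=10, rng=rng, replace=True)

print("without replacement:", dict(without))
print("with replacement:   ", dict(with_repl))
```

One practical consequence is that a rare taxon such as `taxon_c` can never exceed its observed count without replacement, whereas with replacement it may be drawn more than once or not at all.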

rANOMALY: AmplicoN wOrkflow for Microbial community AnaLYsis

F1000Research ◽

10.12688/f1000research.27268.1 ◽

2021 ◽

Vol 10 ◽

pp. 7

Author(s):

Sebastien Theil ◽

Etienne Rifa

Keyword(s):

Data Analysis ◽

Microbial Community ◽

Statistical Tests ◽

Marker Gene ◽

R Package ◽

Microbial Community Analysis ◽

Differential Analysis ◽

Sequencing Data ◽

Statistical Validation ◽

Bioinformatic Tools

Bioinformatic tools for marker gene sequencing data analysis are continuously and rapidly evolving; thus, integrating the most recent techniques and tools is challenging. We present an R package for data analysis of 16S and ITS amplicon-based sequencing. This workflow is based on several R functions and performs automatic treatments from fastq sequence files to diversity and differential analysis with statistical validation. The main purpose of this package is to automate bioinformatic analysis, ensure reproducibility between projects, and be flexible enough to quickly integrate new bioinformatic tools or statistical methods. rANOMALY is an easy-to-install and customizable R package that uses the amplicon sequence variant (ASV) level for microbial community characterization. It integrates the assets of the latest bioinformatics methods, such as better sequence tracking, decontamination from control samples, use of multiple reference databases for taxonomic annotation, all main ecological analyses, for which we propose advanced statistical tests, and a differential analysis cross-validated by four different methods. Our package produces ready-to-publish figures, and all of its outputs are made to be integrated in Rmarkdown code to produce automated reports.

Microbial Community Ecology & Insect Nutrition

American Entomologist ◽

10.1093/ae/46.3.173 ◽

2000 ◽

Vol 46 (3) ◽

pp. 173-185 ◽

Cited By ~ 38

Author(s):

Michael G. Kaufman ◽

Edward D. Walker ◽

David A. Odelson ◽

Michael J. Klug

Keyword(s):

Microbial Community ◽

Community Ecology ◽

Insect Nutrition ◽

Microbial Community Ecology

What is microbial community ecology?

The ISME Journal ◽

10.1038/ismej.2009.88 ◽

2009 ◽

Vol 3 (11) ◽

pp. 1223-1230 ◽

Cited By ~ 200

Author(s):

Allan Konopka

Keyword(s):

Microbial Community ◽

Community Ecology ◽

Microbial Community Ecology

Reference-based error correction of amplicon sequencing data from synthetic communities

10.1101/2021.01.15.426834 ◽

2021 ◽

Author(s):

Pengfan Zhang ◽

Stijn Spaepen ◽

Yang Bai ◽

Stephane Hacquard ◽

Ruben Garrido-Oter

Keyword(s):

Microbial Communities ◽

Amplicon Sequencing ◽

R Package ◽

Fungal Communities ◽

Polymorphic Variation ◽

Sequencing Data ◽

Extensive Evaluation ◽

Culture Independent ◽

Reference Sequences ◽

Synthetic Microbial Communities

Abstract Motivation: Synthetic microbial communities (SynComs) constitute an emergent and powerful tool in biological, biomedical, and biotechnological research. Despite recent advances in algorithms for analysis of culture-independent amplicon sequencing data from microbial communities, there is a lack of tools specifically designed for analysing SynCom data, where reference sequences for each strain are available. Results: Here we present Rbec, a tool designed for analysing SynCom data that outperforms current methods by accurately correcting errors in amplicon sequences and identifying intra-strain polymorphic variation. Extensive evaluation using mock bacterial and fungal communities shows that our tool performs robustly for samples of varying complexity, diversity, and sequencing depth. Further, Rbec also allows accurate detection of contamination in SynCom experiments. Availability: Rbec is freely available as an open-source R package and can be downloaded at: https://github.com/PengfanZhang/Microbiome.

Microbial community ecology in bioelectrochemical systems (BESs) using 16S ribosomal RNA (rRNA) pyrosequencing

10.5353/th_b5543986 ◽

2014 ◽

Author(s):

Tae Jin Park

Keyword(s):

Microbial Community ◽

Community Ecology ◽

Ribosomal Rna ◽

16S Ribosomal Rna ◽

Bioelectrochemical Systems ◽

Microbial Community Ecology
