To rarefy or not to rarefy: Enhancing microbial community analysis through next-generation sequencing

2020
Author(s): Ellen S. Cameron, Philip J. Schmidt, Benjamin J.-M. Tremblay, Monica B. Emelko, Kirsten M. Müller

Abstract

The application of amplicon sequencing in water research provides a rapid and sensitive technique for microbial community analysis in a variety of environments ranging from freshwater lakes to water and wastewater treatment plants. It has revolutionized our ability to study DNA collected from environmental samples by eliminating the challenges associated with lab cultivation and taxonomic identification. DNA sequencing data consist of discrete counts of sequence reads, the total number of which is the library size. Samples may have different library sizes, and thus a normalization technique is required to compare them meaningfully. The process of randomly subsampling sequences from the sample library to a selected normalized library size, known as rarefying, is one such normalization technique. However, rarefying has been criticized as a normalization technique because data can be omitted through the exclusion of either excess sequences or entire samples, depending on the rarefied library size selected. Although it has been suggested that rarefying should be avoided altogether, we propose that repeatedly rarefying enables (i) characterization of the variation introduced to diversity analyses by this random subsampling and (ii) selection of smaller library sizes where necessary to incorporate all samples in the analysis. Rarefying may be a statistically valid normalization technique, but researchers should evaluate their data to make appropriate decisions regarding library size selection and subsampling type. The impact of normalized library size selection, and of rarefying with or without replacement, on diversity analyses was evaluated herein.

Highlights
▪ Amplicon sequencing technology for environmental water samples is reviewed
▪ Sequencing data must be normalized to allow comparison in diversity analyses
▪ Rarefying normalizes library sizes by subsampling from observed sequences
▪ Criticisms of data loss through rarefying can be resolved by rarefying repeatedly
▪ Rarefying repeatedly characterizes errors introduced by subsampling sequences
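The difference between rarefying with and without replacement can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name `rarefy`, the example counts, and the chosen depth are all assumptions made for the sketch. Subsampling without replacement is modelled as a multivariate hypergeometric draw from the observed reads, and subsampling with replacement as a multinomial draw from the observed proportions.

```python
import numpy as np

def rarefy(counts, depth, replace=False, rng=None):
    """Randomly subsample `depth` reads from a vector of per-taxon read counts.

    `rarefy` is a hypothetical helper written for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=np.int64)
    total = counts.sum()
    if replace:
        # With replacement: each subsampled read is drawn independently
        # according to the observed proportions.
        return rng.multinomial(depth, counts / total)
    if depth > total:
        # Without replacement, a sample smaller than the chosen depth
        # cannot be rarefied and would have to be excluded.
        raise ValueError("depth exceeds the library size of this sample")
    # Without replacement: no taxon can receive more reads than were observed.
    return rng.multivariate_hypergeometric(counts, depth)

# Hypothetical sample with a library size of 1000 reads across four taxa.
observed = [500, 300, 150, 50]
rng = np.random.default_rng(42)
print(rarefy(observed, 200, replace=False, rng=rng))
print(rarefy(observed, 200, replace=True, rng=rng))
```

Both calls return a count vector that sums to the requested depth; only the draw without replacement is guaranteed to stay within the observed per-taxon counts.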

2021
Vol 11 (1)
Author(s): Ellen S. Cameron, Philip J. Schmidt, Benjamin J.-M. Tremblay, Monica B. Emelko, Kirsten M. Müller

Abstract

Amplicon sequencing has revolutionized our ability to study DNA collected from environmental samples by providing a rapid and sensitive technique for microbial community analysis that eliminates the challenges associated with lab cultivation and taxonomic identification through microscopy. In water resources management, it can be especially useful to evaluate ecosystem shifts in response to natural and anthropogenic landscape disturbances to signal potential water quality concerns, such as the detection of toxic cyanobacteria or pathogenic bacteria. Amplicon sequencing data consist of discrete counts of sequence reads, the sum of which is the library size. Groups of samples typically have different library sizes that are not representative of biological variation; library size normalization is required to meaningfully compare diversity between them. Rarefaction is a widely used normalization technique that involves the random subsampling of sequences from the initial sample library to a selected normalized library size. This process is often dismissed as statistically invalid because subsampling effectively discards a portion of the observed sequences, yet it remains prevalent in practice, and its suitability for diversity analysis, relative to many other normalization approaches, has been argued. Here, repeated rarefying is proposed as a tool to normalize library sizes for diversity analyses. This enables (i) proportionate representation of all observed sequences and (ii) characterization of the random variation introduced to diversity analyses by rarefying to a smaller library size shared by all samples. While many deterministic data transformations are not tailored to produce equal library sizes, repeatedly rarefying reflects the probabilistic process by which amplicon sequencing data are obtained as a representation of the amplified source microbial community. Specifically, it evaluates which data might have been obtained if a particular sample's library size had been smaller and allows graphical representation of the effects of this library size normalization process upon diversity analysis results.
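The idea of repeatedly rarefying to a library size shared by all samples can be illustrated with a short sketch. It is an assumption-laden example and not the authors' code: the helper names `shannon` and `repeated_rarefy_shannon`, the two example samples, the choice of the Shannon index, and the number of repeats are all hypothetical. Each repeat subsamples without replacement, and the spread of the resulting diversity values reflects the variation introduced by rarefying.

```python
import numpy as np

def shannon(counts):
    """Shannon diversity index (natural log) of a vector of read counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def repeated_rarefy_shannon(counts, depth, n_repeats=100, seed=0):
    """Rarefy one sample `n_repeats` times and return the Shannon index of each draw.

    Hypothetical helper: subsampling is done without replacement via a
    multivariate hypergeometric draw.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    draws = rng.multivariate_hypergeometric(counts, depth, size=n_repeats)
    return np.array([shannon(draw) for draw in draws])

# Two hypothetical samples with very different library sizes.
samples = {
    "A": [5000, 3000, 1500, 500],  # library size 10000
    "B": [800, 150, 40, 10],       # library size 1000
}
# Normalize to the smallest library size so that no sample is excluded.
depth = min(sum(c) for c in samples.values())

for name, counts in samples.items():
    values = repeated_rarefy_shannon(counts, depth)
    low, high = np.percentile(values, [2.5, 97.5])
    print(f"{name}: mean={values.mean():.3f}, 2.5-97.5 percentile=({low:.3f}, {high:.3f})")
```

In this sketch the smallest sample is rarefied at its full library size, so every repeat returns the observed counts and its diversity values do not vary; the larger sample shows the spread attributable to subsampling.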


Author(s): Lauren V. Alteio, Joana Séneca, Alberto Canarini, Roey Angel, Ksenia Guseva, ...

Microbial community analysis via marker gene amplicon sequencing has become a routine method in the field of soil research. In this perspective, we discuss technical challenges and limitations of amplicon sequencing studies in soil and present statistical and experimental approaches that can help address the spatio-temporal complexity of soil and the high diversity of organisms therein. We illustrate the impact of compositionality on the interpretation of relative abundance data and discuss effects of sample replication on the statistical power in soil community analysis. Additionally, we argue for the need for increased study reproducibility and data availability, as well as complementary techniques for generating deeper ecological insights into microbial roles and our understanding thereof in soil ecosystems. At this stage, we call upon researchers and specialized soil journals to consider the current state of data analysis, interpretation and availability to improve the rigor of future studies.
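The effect of compositionality on relative abundance data can be shown with a small numerical sketch. The abundances below are invented for illustration and are not taken from the perspective: one taxon grows while the other two stay constant in absolute terms, yet their relative abundances fall.

```python
import numpy as np

def relative_abundance(counts):
    """Convert absolute abundances to proportions that sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

# Hypothetical absolute abundances of three taxa at two time points.
before = np.array([100, 100, 100])
after = np.array([400, 100, 100])  # only taxon 0 changes

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

for i in range(len(before)):
    print(
        f"taxon {i}: absolute {before[i]} -> {after[i]}, "
        f"relative {rel_before[i]:.3f} -> {rel_after[i]:.3f}"
    )
```

Taxa 1 and 2 appear to decline in the relative data although their absolute abundances are unchanged, which is why a change in relative abundance alone cannot be read as a change in the organism itself.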


2015
Vol 15 (1)
Author(s): Liyou Wu, Chongqing Wen, Yujia Qin, Huaqun Yin, Qichao Tu, ...

2015
pp. 2.4.2-1-2.4.2-26
Author(s): Danny Ionescu, Will A. Overholt, Michael D. J. Lynch, Josh D. Neufeld, Ankur Naqib, ...

2017
Vol 6 (5)
pp. e00500
Author(s): Jacob H. Jacob, Emad I. Hussein, Muhamad Ali K. Shakhatreh, Christopher T. Cornelison

2018
Author(s): Kasper S. Andersen, Rasmus H. Kirkegaard, Søren M. Karst, Mads Albertsen

Abstract

Summary: Microbial community analysis using 16S rRNA gene amplicon sequencing is the backbone of many microbial ecology studies. Several approaches and pipelines exist for processing the raw data generated through DNA sequencing and converting the data into OTU-tables. Here we present ampvis2, an R package designed for analysis of microbial community data in OTU-table format with a focus on simplicity, reproducibility, and sample metadata integration, with a minimal set of intuitive commands. Unique features include flexible heatmaps and simplified ordination. By generating plots using the ggplot2 package, ampvis2 produces publication-ready figures that can be easily customised. Furthermore, ampvis2 includes features for interactive visualisation, which can be convenient for larger, more complex data.

Availability: ampvis2 is implemented in the R statistical language and is released under the GNU A-GPL license. The documentation website and source code are maintained at: https://github.com/MadsAlbertsen/ampvis2

Contact: Mads Albertsen ([email protected])

