A Rarefaction-Without-Resampling Extension of PERMANOVA for Testing Presence-Absence Associations in the Microbiome

Author(s):  
Yi-Juan Hu ◽  
Glen A. Satten

Abstract. Background: PERMANOVA [1] is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweighted UniFrac distance. However, such presence-absence analyses face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias, but at the potential costs of information loss and the introduction of a stochastic component into the analysis. Methods: Here we develop a non-stochastic approach to PERMANOVA presence-absence analyses that aggregates information over all potential rarefaction replicates without actual resampling, when the Jaccard or unweighted UniFrac distance is used. We compare this new approach to three possible ways of aggregating PERMANOVA over multiple rarefactions obtained from resampling: averaging the distance matrix, averaging the (element-wise) squared distance matrix, and averaging the F-statistic. Results: Our simulations indicate that our non-stochastic approach is robust to confounding by library size and outperforms each of the stochastic resampling approaches. We also show that, when overdispersion is low, averaging the (element-wise) squared distance outperforms averaging the unsquared distance, currently implemented in the R package vegan. We illustrate our methods using an analysis of data on inflammatory bowel disease (IBD) in which samples from case participants have systematically smaller library sizes than samples from control participants. Conclusions: Our extension of PERMANOVA for presence-absence analyses using a non-stochastic approach that aggregates information over all potential rarefaction replicates without actual resampling is robust to confounding by library size and outperforms stochastic resampling approaches.
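As a rough illustration of what "aggregating over all potential rarefaction replicates without actual resampling" can mean, the sketch below computes in closed form the probability that a taxon is still present after rarefaction to a given depth. This is a standard hypergeometric calculation offered only to convey the idea; it is not the authors' estimator, and the function name and arguments are hypothetical.

```python
from math import comb


def presence_probability(count: int, library_size: int, depth: int) -> float:
    """Probability that a taxon with `count` reads is seen at least once when
    `depth` reads are drawn without replacement from `library_size` reads.

    Illustrative helper with a hypothetical name, not the paper's estimator.
    """
    if not 0 <= count <= library_size:
        raise ValueError("count must lie between 0 and library_size")
    if not 0 <= depth <= library_size:
        raise ValueError("depth must lie between 0 and library_size")
    # P(absent) = C(N - c, r) / C(N, r); math.comb returns 0 when r > N - c,
    # so a taxon that cannot be missed gets probability 1 of being present.
    p_absent = comb(library_size - count, depth) / comb(library_size, depth)
    return 1.0 - p_absent
```

For example, a taxon with a single read in a library of 10 reads survives rarefaction to depth 5 with probability 0.5. Replacing the random presence indicator of one rarefied table by an expectation of this kind is what removes the stochastic component.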

2021 ◽  
Author(s):  
Yijuan Hu ◽  
Glen Satten

Background: PERMANOVA is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweighted UniFrac distance. However, such presence-absence analyses face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias, but at the potential costs of information loss and the introduction of a stochastic component into the analysis. Methods: Here we develop a non-stochastic approach to PERMANOVA presence-absence analyses that aggregates information over all potential rarefaction replicates without actual resampling, when the Jaccard or unweighted UniFrac distance is used. We compare this new approach to three possible ways of aggregating PERMANOVA over multiple rarefactions obtained from resampling: averaging the distance matrix, averaging the (element-wise) squared distance matrix, and averaging the F-statistic. Results: Our simulations indicate that our non-stochastic approach is robust to confounding by library size and outperforms each of the stochastic resampling approaches. We also show that, when overdispersion is low, averaging the (element-wise) squared distance outperforms averaging the unsquared distance, currently implemented in the R package vegan. We illustrate our methods using an analysis of data on inflammatory bowel disease (IBD) in which samples from case participants have systematically smaller library sizes than samples from control participants.
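The resampling-based alternatives mentioned above can be sketched for a single pair of samples as follows. This is a minimal, assumption-laden illustration using the Jaccard distance only: the helper names are hypothetical, the third option (averaging the F-statistic) is not shown, and nothing here reproduces the vegan implementation.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement; return per-taxon counts."""
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    rarefied = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        rarefied[taxon] += 1
    return rarefied


def jaccard(a, b):
    """Jaccard distance between the presence-absence profiles of two samples."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def aggregate_jaccard(x, y, depth, replicates, seed=0):
    """Return (mean distance, mean squared distance) over rarefaction replicates."""
    rng = random.Random(seed)
    dists = [jaccard(rarefy(x, depth, rng), rarefy(y, depth, rng))
             for _ in range(replicates)]
    mean_d = sum(dists) / replicates
    mean_d2 = sum(d * d for d in dists) / replicates
    return mean_d, mean_d2
```

The two averages generally differ: the square of the mean distance never exceeds the mean of the squared distances, which is one reason the choice of what to average can matter for a test that works with squared distances.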


2020 ◽  
Author(s):  
Yi-Juan Hu ◽  
Andrea Lane ◽  
Glen A. Satten

Abstract. Background: Many methods for testing association between the microbiome and covariates of interest (e.g., clinical outcomes, environmental factors) assume that these associations are driven by changes in the relative abundance of taxa. However, these associations may also result from changes in which taxa are present and which are absent. Analyses of such presence-absence associations face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias, but at the potential cost of information loss as well as the introduction of a stochastic component into the analysis. Currently, there is a need for robust and efficient methods for testing presence-absence associations in the presence of such confounding, both at the community level and at the individual-taxon level, that avoid the drawbacks of rarefaction. Methods: We have previously developed the linear decomposition model (LDM) that unifies the community-level and taxon-level tests into one framework. Here we present an extension of the LDM for testing presence-absence associations. The extended LDM is a non-stochastic approach that repeatedly applies the LDM to all rarefied taxa count tables, averages the residual sum-of-squares (RSS) terms over the rarefaction replicates, and then forms an F-statistic based on these average RSS terms. We show that this approach compares favorably to averaging the F-statistic from R rarefaction replicates, which can only be calculated stochastically. The flexible nature of the LDM allows discrete or continuous traits or interactions to be tested while allowing confounding covariates to be adjusted for. Results: Our simulations indicate that our proposed method is robust to any systematic differences in library size and has better power than alternative approaches. We illustrate our method using an analysis of data on inflammatory bowel disease (IBD) in which case samples have systematically smaller library sizes than controls. Conclusions: The rarefaction-based extension of the LDM performs well for testing presence-absence associations and should be adopted even when there is no obvious systematic variation in library size.
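The distinction between forming one F-statistic from averaged sums of squares and averaging per-replicate F-statistics can be shown with a small numerical sketch. It assumes a generic F-type ratio of a model sum of squares to a residual sum of squares; the LDM's actual statistic is built from its own decomposition, and the function names here are hypothetical.

```python
def f_from_averaged_ss(replicates, df_model, df_resid):
    """Average the sums of squares over replicates, then form one F-type ratio.

    `replicates` is a list of (model_ss, residual_ss) pairs, one per
    rarefaction replicate (a generic stand-in, not the LDM's own terms).
    """
    n = len(replicates)
    mean_model = sum(m for m, _ in replicates) / n
    mean_resid = sum(r for _, r in replicates) / n
    return (mean_model / df_model) / (mean_resid / df_resid)


def averaged_f(replicates, df_model, df_resid):
    """Form an F-type ratio within each replicate, then average the ratios."""
    n = len(replicates)
    return sum((m / df_model) / (r / df_resid) for m, r in replicates) / n
```

With two replicates `[(2.0, 1.0), (1.0, 2.0)]` and one degree of freedom each, the first function gives 1.0 and the second gives 1.25, so a ratio of averages and an average of ratios are not interchangeable.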


2019 ◽  
Vol 3 (2) ◽  
Author(s):  
C. G. Bower ◽  
S. C. Fernando ◽  
G. A. Sullivan

Objectives: This study aimed to evaluate the spoilage microbiota of beef throughout various processing steps and identify key differences in the microbiome associated with each phase of processing. Materials and Methods: In each of three replicates, products representing each phase of processing were made from the same uniform meat block (beef shoulder clods): T1-ground beef; T2-fresh sausage; T3-cooked links; T4-beef franks; T5-sliced bologna; T6-bologna with HPP treatment; T7-bologna with lactate/diacetate. Raw treatments were evaluated every 3 d for 21 d, and cooked treatments were evaluated every 14 d for 112 d. Heat-treated products were cooked to an internal temperature of 71°C and chilled overnight at 4°C. Parameters for HPP were 600 MPa for 3 min. Aerobic (APC), anaerobic (AnPC), lactic acid bacteria (LAB), and psychrotrophic (PPC) plate counts were measured. Microbial communities were evaluated using high throughput 16S rRNA gene sequencing on the Illumina MiSeq platform. Reads were processed using QIIME, binned into operational taxonomic units (OTUs) at 97% similarity, and assigned taxonomy using the Greengenes database as reference. α and β diversity of bacterial communities were analyzed using QIIME and R. α diversity was estimated using observed OTUs and Chao1 estimates, and β diversity was determined using the weighted and unweighted UniFrac distance matrices (Fig. 2). Raw and cooked samples were analyzed independently for plate counts and α diversity. Results: There was a treatment by storage time interaction for AnPC in cooked samples (P = 0.003), where T3, T4, and T7 increased from Day 28 and 42. In raw samples, there was a main effect of storage time on APC, AnPC, LAB, and PPC (P < 0.001), where growth increased over time. In cooked samples, there was a main effect of storage time on APC, LAB, and PPC, and a main effect of treatment for APC and LAB (P < 0.030). Higher APC and LAB counts were observed in T5, while a general increase in APC, LAB, and PPC was seen throughout storage time. There were main effects of treatment and storage time on Chao1 and observed OTUs in raw samples (P < 0.023) and a main effect of treatment in cooked samples (P < 0.009). In raw samples, bacterial richness was greater in T2 compared to T1, and generally decreased throughout storage time. In cooked samples, richness was the greatest in T3 and T4, the least in T5, and intermediate in T6 and T7. There were main effects for treatment and storage time on the bacterial community structure according to the weighted UniFrac distance matrix (P < 0.004) and a treatment by storage time interaction for the unweighted UniFrac distance matrix (P = 0.031). For the weighted UniFrac, T1 and T5 samples formed a cluster relatively separate from the other treatments, while T2 formed an additional cluster by itself. For the unweighted UniFrac, T1, T2, and T5 formed a cluster separate from the other samples, with increased storage times being further separated from the other samples. Conclusion: Results from this study indicate that the microbiota of cooked, sliced bologna is somewhat similar to that of raw ground beef, whereas fresh sausage, cooked links, and bologna with HPP and antimicrobial treatments are different from the former. Treatments where microbial growth was reduced had a significantly different microbial composition compared to those with greater amounts of growth.
Figure 2. PCoA plot of weighted (a) and unweighted (b) UniFrac distance matrices.
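The two richness measures named in the methods, observed OTUs and Chao1, can be computed from a vector of OTU counts as sketched below. The abstract does not say which Chao1 variant was used, so the bias-corrected form is an assumption here, and the function names are hypothetical rather than QIIME's.

```python
def observed_otus(counts):
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant).

    S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the
    numbers of OTUs observed exactly once and exactly twice.
    """
    s_obs = observed_otus(counts)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))
```

For the counts `[1, 1, 2, 5, 0]` there are 4 observed OTUs, 2 singletons and 1 doubleton, giving a Chao1 estimate of 4.5.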


2021 ◽  
Vol 12 ◽  
Author(s):  
Lifeng Zhu ◽  
Wei Zhu ◽  
Tian Zhao ◽  
Hua Chen ◽  
Chunlin Zhao ◽  
...  

An increasing number of studies have shown that warming also influences the animal gut microbiome (altering the community structure and decreasing its diversity), which might further impact host fitness. Here, based on an analysis of the stomach and gut (the entire intestine: from the anterior intestine to the cloaca) microbiome in laboratory larvae of giant salamanders (Andrias davidianus) under different living water temperatures (5, 15, and 25°C) at two sample time points (80 and 330 days after the acclimation), we investigated the potential effect of temperature on the gastrointestinal microbiome community. We found a significant interaction between sampling time and temperature, or sample type (stomach or gut), on the Shannon index in the gastrointestinal microbiome of the giant salamanders. We also found a significant difference in the Shannon index among temperature groups within the same sample type (stomach or gut) at each sample time. Ten percent of the variation in the microbiome community could be explained by temperature alone across all samples. Both the stomach and gut microbiomes displayed the highest similarity in the microbiome community (significantly lowest pairwise unweighted UniFrac distance) in the 25°C group between the two sampling times compared to those in the 5°C and 15°C groups. Moreover, the salamanders in the 25°C treatment showed the highest food intake and body mass compared with those in the other temperature treatments. A significant increase in the abundance of Firmicutes in the gastrointestinal microbiome on day 330 with increasing temperatures might be caused by increased host metabolism and food consumption. Therefore, we speculate that the high environmental temperature might indirectly affect both alpha and beta diversity of the gastrointestinal microbiome.
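For readers unfamiliar with the Shannon index used above as the alpha-diversity measure, a minimal sketch of its calculation from taxon counts follows. The abstract does not state the logarithm base, so the natural logarithm is an assumption, and the function name is hypothetical.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    Natural logarithm assumed; some tools report base-2 values instead.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)
```

A sample dominated by a single taxon has an index of 0, while two equally abundant taxa give ln 2, roughly 0.693; the index rises with both the number of taxa and the evenness of their abundances.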


Ragnar Frisch, the Nobel laureate in economics, drew attention to two phenomena: propagation problems and impulse problems in dynamic economics. His deep scientific contribution relates to the interpretation of the business cycle as transformed under the influence of impulses (shocks). But some terminological misunderstandings arose. One of them led the authors to focus on the phenomenon of systems' self-movement: their self-organization in statics and their self-development in dynamics. Another relates to the exogenous nature of impulses (shocks), which led the authors to argue for the endogenous embeddedness of shocks in the mechanisms by which dialectical laws are implemented. Eugen Slutsky demonstrated the stochastic approach, treating random fluctuations as a source of cyclical processes in the economy. The confusion between the concepts of cycles and waves creates the need for a wave theory of systemic self-organization (Chapter 2). Modern shock theory develops a new approach that makes it possible to eliminate the misconceptions of past theories.


2016 ◽  
Vol 32 (4) ◽  
pp. 947-962 ◽  
Author(s):  
Kim Dunstan ◽  
Christopher Ball

Abstract. Statistics New Zealand is one of the few national statistical agencies to have applied a stochastic (probabilistic) approach to official demographic projections. This article discusses the experience and benefits of adopting this new approach, including the perspective of a key user of projections, the New Zealand Treasury. Our experience is that the change is less difficult to make than might be expected. Uncertainty in the different projection inputs (components) can be modelled simply or with more complexity, and progressively applied to different projection types. This means that not all the different demographic projections an agency produces need to adopt a stochastic approach simultaneously. At the same time, users of the projections are keen to better understand the relative certainty and uncertainty of projected outcomes, given the important uses of projections.


2019 ◽  
Author(s):  
Deepank R Korandla ◽  
Jacob M Wozniak ◽  
Anaamika Campeau ◽  
David J Gonzalez ◽  
Erik S Wright

Abstract. Motivation: A core task of genomics is to identify the boundaries of protein coding genes, which may cover over 90% of a prokaryote's genome. Several programs are available for gene finding, yet it is currently unclear how well these programs perform and whether any offers superior accuracy. This is in part because there is no universal benchmark for gene finding and, therefore, most developers select their own benchmarking strategy. Results: Here, we introduce AssessORF, a new approach for benchmarking prokaryotic gene predictions based on evidence from proteomics data and the evolutionary conservation of start and stop codons. We applied AssessORF to compare gene predictions offered by GenBank, GeneMarkS-2, Glimmer and Prodigal on genomes spanning the prokaryotic tree of life. Gene predictions were 88–95% in agreement with the available evidence, with Glimmer performing the worst but no clear winner. All programs were biased towards selecting start codons that were upstream of the actual start. Given these findings, there remains considerable room for improvement, especially in the detection of correct start sites. Availability and implementation: AssessORF is available as an R package via the Bioconductor package repository. Supplementary information: Supplementary data are available at Bioinformatics online.


2004 ◽  
Vol 39 (3) ◽  
pp. 448-452 ◽  
Author(s):  
Grier L Arthur ◽  
Marshall Z Schwartz ◽  
Keith A Kuenzler ◽  
Ruth Birbe

2021 ◽  
Author(s):  
Ye Yue ◽  
Yi-Juan Hu

Background: Understanding whether and which microbes play a mediating role between an exposure and a disease outcome is essential for researchers to develop clinical interventions that treat the disease by modulating the microbes. Existing methods for mediation analysis of the microbiome are often limited to a global test of community-level mediation or selection of mediating microbes without control of the false discovery rate (FDR). Further, while the null hypothesis of no mediation at each microbe is a composite null that consists of three types of null (no exposure-microbe association, no microbe-outcome association given the exposure, or neither), most existing methods for the global test such as MedTest and MODIMA treat the microbes as if they are all under the same type of null. Methods: We propose a new approach based on inverse regression that regresses the (possibly transformed) relative abundance of each taxon on the exposure and the exposure-adjusted outcome to assess the exposure-taxon and taxon-outcome associations simultaneously. Then the association p-values are used to test mediation at both the community and individual taxon levels. This approach fits nicely into our Linear Decomposition Model (LDM) framework, so our new method is implemented in the LDM and enjoys all the features of the LDM, i.e., allowing an arbitrary number of taxa to be tested, supporting continuous, discrete, or multivariate exposures and outcomes as well as adjustment of confounding covariates, accommodating clustered data, and offering analysis at the relative abundance or presence-absence scale. We refer to this new method as LDM-med. Results: Using extensive simulations, we showed that LDM-med always controlled the type I error of the global test and had compelling power over existing methods; LDM-med always preserved the FDR of testing individual taxa and had much better sensitivity than alternative approaches. In contrast, MedTest and MODIMA had severely inflated type I error when different taxa were under different types of null. The flexibility of LDM-med for a variety of mediation analyses is illustrated by the application to a murine microbiome dataset. Availability and Implementation: Our new method has been added to our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.

