A New Approach to Testing Mediation of the Microbiome using the LDM

2021
Author(s): Ye Yue, Yi-Juan Hu

Background: Understanding whether, and which, microbes play a mediating role between an exposure and a disease outcome is essential for researchers to develop clinical interventions that treat the disease by modulating the microbes. Existing methods for mediation analysis of the microbiome are often limited to a global test of community-level mediation or to selection of mediating microbes without control of the false discovery rate (FDR). Further, while the null hypothesis of no mediation at each microbe is a composite null consisting of three types of null (no exposure-microbe association, no microbe-outcome association given the exposure, or neither association), most existing methods for the global test, such as MedTest and MODIMA, treat the microbes as if they were all under the same type of null. Methods: We propose a new approach based on inverse regression that regresses the (possibly transformed) relative abundance of each taxon on the exposure and the exposure-adjusted outcome, assessing the exposure-taxon and taxon-outcome associations simultaneously. The association p-values are then used to test mediation at both the community and individual-taxon levels. This approach fits naturally into our Linear Decomposition Model (LDM) framework, so the new method is implemented in the LDM and inherits all of its features: it allows an arbitrary number of taxa to be tested; supports continuous, discrete, or multivariate exposures and outcomes as well as adjustment for confounding covariates; accommodates clustered data; and offers analysis at the relative-abundance or presence-absence scale. We refer to this new method as LDM-med. Results: In extensive simulations, LDM-med always controlled the type I error of the global test and had compelling power relative to existing methods; it always preserved the FDR when testing individual taxa and had much better sensitivity than alternative approaches. In contrast, MedTest and MODIMA had severely inflated type I error when different taxa were under different types of null. The flexibility of LDM-med for a variety of mediation analyses is illustrated by an application to a murine microbiome dataset, which identified a plausible mediator. Conclusions: Inverse regression coupled with the LDM performs well and can handle mediation analysis in a wide variety of microbiome studies. Availability and Implementation: The new method has been added to our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.
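A minimal sketch of how an LDM-med analysis might be invoked is given below, assuming the formula interface of the LDM package; the argument and output names (test.mediation, med.p.global.omni, med.detected.otu.omni) reflect a reading of the package README and should be checked against the current documentation.

```r
# Sketch of a mediation analysis with the LDM package
# (https://github.com/yijuanhu/LDM); names of arguments and results
# are assumptions based on the abstract and the package README.
library(LDM)  # install via devtools::install_github("yijuanhu/LDM")

# otu.tab: samples-by-taxa count matrix; meta: data frame holding the
# exposure, the outcome, and any confounders for the same samples
fit <- ldm(otu.tab | confounders ~ (exposure) + (outcome),
           data = meta,
           seed = 42,
           test.mediation = TRUE,  # assumed switch that turns on LDM-med
           fdr.nominal = 0.1)      # target FDR for taxon-level tests

fit$med.p.global.omni      # community-level (global) mediation p-value
fit$med.detected.otu.omni  # taxa detected as mediators at the target FDR
```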



2021
Author(s): Amanda Kay Montoya, Chris Aberson, Jessica Fossum, Donna Chen, Oscar Gonzalez

Mediation analysis is commonly used in social-personality psychology to evaluate potential mechanisms of effects. In the wake of the replicability crisis, researchers are turning to power analysis to help plan studies; however, power analysis for mediation is not implemented in popular software (e.g., G*Power). Our symposium includes two presentations focusing on the implementation of power analysis for mediation: (1) describing easy-to-use tools for implementing power analysis (e.g., the pwr2ppl R package), and (2) evaluating whether different inferential methods result in similar recommended sample sizes and the role of assumption violations in these differences. Two presenters focus on study characteristics that can affect power: (1) use of the bias-corrected confidence interval and alternatives that better balance power and type I error, and (2) how measurement error in the mediator can reduce power and how to correct this issue with latent variable models. Presentations include applied examples aimed at a social-personality audience and provide concrete steps for increasing the validity and replicability of mediation analyses conducted in social-personality research. (Symposium presented at SPSP 2021)
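Since power analysis for mediation is absent from tools such as G*Power, a simple Monte Carlo approach is often the most transparent option. The sketch below estimates power for a single-mediator model under the joint-significance rule; the effect sizes and sample size are illustrative only, not taken from the symposium.

```r
# Monte Carlo power for a single-mediator model (X -> M -> Y), using the
# joint-significance rule: declare mediation when both the X -> M path
# and the M -> Y | X path are significant.
power_med <- function(n, a = 0.3, b = 0.3, nsim = 2000, alpha = 0.05) {
  hits <- replicate(nsim, {
    x <- rnorm(n)
    m <- a * x + rnorm(n)          # X -> M path with coefficient a
    y <- b * m + rnorm(n)          # M -> Y path with coefficient b
    p_a <- summary(lm(m ~ x))$coefficients["x", 4]
    p_b <- summary(lm(y ~ x + m))$coefficients["m", 4]
    p_a < alpha && p_b < alpha
  })
  mean(hits)
}

set.seed(1)
power_med(n = 100)  # estimated power to detect the indirect effect
```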


2020 · Vol 36 (14) · pp. 4106-4115
Author(s): Yi-Juan Hu, Glen A Satten

Abstract Motivation Methods for analyzing microbiome data generally fall into one of two groups: tests of the global hypothesis of any microbiome effect, which provide no information on the contributions of individual operational taxonomic units (OTUs); and tests of individual OTUs, which do not typically provide a global test of microbiome effect. Without a unified approach, the findings of a global test may be hard to reconcile with the findings at the individual-OTU level. Further, many tests of individual OTU effects do not preserve the false discovery rate (FDR). Results We introduce the linear decomposition model (LDM), which provides a single analysis path that includes global tests of any microbiome effect, tests of individual-OTU effects that account for multiple testing by controlling the FDR, and a connection to distance-based ordination. The LDM accommodates both continuous and discrete variables (e.g. clinical outcomes, environmental factors) as well as interaction terms, tested either singly or in combination; allows adjustment for confounding covariates; and uses permutation-based P-values that can account for sample correlation. The LDM can also be applied to transformed data, and an 'omnibus' test can easily combine results from analyses conducted on different transformation scales. We also provide a new implementation of PERMANOVA based on our approach. For global testing, our simulations indicate that the LDM provides correct type I error and power comparable to existing distance-based methods. For testing individual OTUs, our simulations indicate that the LDM controls the FDR well; in contrast, DESeq2 often had inflated FDR, and MetagenomeSeq generally had the lowest sensitivity. The flexibility of the LDM for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. We also show that our implementation of PERMANOVA can outperform existing implementations. Availability and implementation The R package LDM is available on GitHub at https://github.com/yijuanhu/LDM in formats appropriate for Macintosh or Windows. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
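A minimal sketch of a standard LDM analysis (global test plus FDR-controlled OTU-level tests) follows; the formula syntax and result names are reconstructed from the package's GitHub page and should be verified against the documentation.

```r
# Sketch of a standard LDM analysis; the | syntax for confounders, the
# parenthesized tested set, and the result names are assumptions based
# on the package README.
library(LDM)

fit <- ldm(otu.tab | confounder ~ (group),
           data = meta,
           seed = 42,
           fdr.nominal = 0.1)

fit$p.global.omni      # omnibus global p-value across transformation scales
fit$detected.otu.omni  # OTUs detected at the nominal FDR

# The package also ships its own PERMANOVA implementation (assumed
# function name permanovaFL, as listed on the GitHub page):
pfit <- permanovaFL(otu.tab | confounder ~ group, data = meta, seed = 42)
pfit$p.permanova
```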


2019
Author(s): Alvin Vista

Cheating detection is an important issue in standardized testing, especially in large-scale settings. Statistical approaches are often computationally intensive and require specialised software. We present a two-stage approach that quickly filters suspected groups using statistical testing on an IRT-based answer-copying index. We also present an approach to mitigate data contamination and improve the performance of the index. The computation of the index was implemented in a modified version of an open-source R package, enabling wider access to the method. Using data from PIRLS 2011 (N = 64,232), we conduct a simulation to demonstrate our approach. Type I error was well controlled and no control group was falsely flagged for cheating, while 16 (combined n = 12,569) of the 18 (combined n = 14,149) simulated groups were detected. Implications for system-level cheating detection and further improvements of the approach are discussed.
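The two-stage logic can be illustrated with a small sketch: a cheap screening statistic filters groups, and only the survivors receive the computationally intensive IRT-based index. Both statistics below are stand-ins rather than the index used in the study, and CopyDetect is mentioned only as one possible CRAN package for answer-copying indices.

```r
# Stage 1: cheap screen — flag groups whose mean pairwise answer
# similarity is unusually high relative to all groups. The similarity
# scores here are a stand-in for the paper's IRT-based index.
screen_groups <- function(similarity, group, z_cut = 2) {
  g <- tapply(similarity, group, mean)
  z <- (g - mean(g)) / sd(g)
  names(z)[z > z_cut]               # groups passed on to stage 2
}

# Stage 2 (expensive, run only on flagged groups): fit an IRT model and
# compute a formal answer-copying index per examinee pair, e.g. with a
# package such as CopyDetect (one possible choice, not the paper's code).
```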


2018 · Vol 93 (5) · pp. 223-244
Author(s): Ryan D. Guggenmos, M. David Piercey, Christopher P. Agoglia

ABSTRACT Contrast analysis has become prevalent in experimental accounting research since Buckless and Ravenscroft (1990) introduced it to the accounting literature over 25 years ago. Since its introduction, the scope of contrast testing has expanded, yet guidance on the most appropriate methods of specifying, conducting, interpreting, and exhibiting these tests has not kept pace. We survey the use of contrast analysis in the recent literature and propose a three-part testing approach that provides a more comprehensive picture of contrast results. Our approach considers three pieces of complementary evidence: visual evaluation of fit, traditional significance testing, and quantitative evaluation of the contrast variance residual. Our measure of the contrast variance residual, q², is proposed for the first time in this work. After presenting our approach, we walk through six common contrast-testing scenarios in which current practices may fall short and our approach may guide researchers. We extend Buckless and Ravenscroft (1990) and contribute to the accounting research methods literature by documenting current contrast analysis practices that result in elevated Type I error and by proposing a potential solution to mitigate these concerns.
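The three-part approach can be sketched on a simulated four-cell design, as below. Note that the q² computation assumes the "contrast variance residual" is the share of between-cell variability left unexplained by the contrast; consult the paper for the exact definition.

```r
# Three-part contrast evaluation on a simulated 2x2 (four-cell) design.
set.seed(1)
cell <- factor(rep(1:4, each = 20))               # four cells, n = 20 each
y    <- rnorm(80, mean = c(1, 1, 1, 3)[cell])     # interaction-like pattern
w    <- c(-1, -1, -1, 3)                          # contrast weights (sum to 0)

means <- tapply(y, cell, mean)
n_c   <- tapply(y, cell, length)

# (1) Visual evaluation of fit: do the cell means track the weights?
plot(w, means, xlab = "contrast weight", ylab = "cell mean")

# (2) Traditional significance test of the contrast
mse  <- sum(tapply(y, cell, function(v) sum((v - mean(v))^2))) / (80 - 4)
tval <- sum(w * means) / sqrt(mse * sum(w^2 / n_c))
pval <- 2 * pt(-abs(tval), df = 80 - 4)

# (3) Contrast variance residual, under the ASSUMED definition:
# proportion of between-cell sum of squares not captured by the contrast
ss_between  <- sum(n_c * (means - mean(y))^2)
ss_contrast <- sum(w * means)^2 / sum(w^2 / n_c)
q2 <- 1 - ss_contrast / ss_between
```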


2019 · Vol 35 (24) · pp. 5155-5162
Author(s): Chengzhong Ye, Terence P Speed, Agus Salim

Abstract Motivation Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data; left unaddressed, it compromises the validity of statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule-capture process in scRNA-seq experiments. Results We show that DECENT improves DE performance over existing DE methods that do not explicitly model dropout. This improvement is observed consistently across several public scRNA-seq datasets generated on different technological platforms, and it is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity, and its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. Availability and implementation The method is implemented as a publicly available R package at https://github.com/cz-ye/DECENT. Supplementary information Supplementary data are available at Bioinformatics online.
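A minimal usage sketch for the DECENT package follows; the argument names (data.obs, X, use.spikes) reflect a reading of the GitHub README and may differ across package versions.

```r
# Sketch of a DECENT differential-expression run
# (https://github.com/cz-ye/DECENT); argument names are assumptions
# based on the README.
library(DECENT)

# counts: genes-by-cells UMI count matrix; group: per-cell condition label
de <- decent(data.obs = counts,
             X = ~group,          # design for the DE comparison
             use.spikes = FALSE)  # calibrate the capture model w/o spike-ins

head(de)  # per-gene DE statistics and p-values
```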


Biometrika · 2019 · Vol 106 (2) · pp. 353-367
Author(s): B Karmakar, B French, D S Small

Summary A sensitivity analysis for an observational study assesses how much bias, due to nonrandom assignment of treatment, would be necessary to change the conclusions of an analysis that assumes treatment assignment was effectively random. The evidence for a treatment effect can be strengthened if two different analyses, which could be affected by different types of biases, are both somewhat insensitive to bias. The finding from the observational study is then said to be replicated. Evidence factors allow for two independent analyses to be constructed from the same dataset. When combining the evidence factors, the Type I error rate must be controlled to obtain valid inference. A powerful method is developed for controlling the familywise error rate for sensitivity analyses with evidence factors. It is shown that the Bahadur efficiency of sensitivity analysis for the combined evidence is greater than for either evidence factor alone. The proposed methods are illustrated through a study of the effect of radiation exposure on the risk of cancer. An R package, evidenceFactors, is available from CRAN to implement the methods of the paper.
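The combination step can be illustrated generically: each evidence factor yields an upper-bound p-value from its own sensitivity analysis at a given bias level, and the two (approximately independent) p-values are pooled, e.g. by Fisher's method. This sketch shows only the idea; the evidenceFactors package on CRAN implements the paper's recommended procedure.

```r
# Fisher's method for pooling two approximately independent upper-bound
# p-values, one per evidence factor, at a fixed sensitivity parameter.
combine_factors <- function(p1, p2) {
  stat <- -2 * (log(p1) + log(p2))
  pchisq(stat, df = 4, lower.tail = FALSE)  # chi-squared with 2k = 4 df
}

combine_factors(p1 = 0.04, p2 = 0.08)  # joint evidence from both factors
```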


2017 · Vol 28 (4) · pp. 1157-1169
Author(s): Hua He, Hui Zhang, Peng Ye, Wan Tang

Excessive zeros are common in practice and may cause overdispersion and invalidate inference when fitting Poisson regression models. There is a large body of literature on zero-inflated Poisson (ZIP) models. However, methods for testing whether there are excessive zeros are less well developed. The Vuong test comparing a Poisson and a zero-inflated Poisson model is commonly applied in practice, but its type I error often deviates seriously from the nominal level, casting serious doubt on the validity of the test in such applications. In this paper, we develop a new approach for testing inflated zeros under the Poisson model. Unlike the Vuong test, our method does not require fitting a zero-inflated Poisson model to perform the test. Simulation studies show that, compared with the Vuong test, our approach not only controls the type I error rate better but also yields more power.
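To convey the idea of testing for inflated zeros without fitting a ZIP model, here is a parametric-bootstrap sketch: it compares the observed excess of zeros over the Poisson-implied expectation to the null distribution of that excess, simulated from the fitted model. This illustrates the general strategy, not the authors' statistic.

```r
# Parametric-bootstrap test for excess zeros under Poisson regression:
# the statistic is observed zeros minus model-implied expected zeros,
# and its null distribution is simulated from the fitted model.
test_excess_zeros <- function(y, x, nboot = 2000) {
  fit  <- glm(y ~ x, family = poisson)
  stat <- function(yy) {
    f <- glm(yy ~ x, family = poisson)     # refit under the null draw
    sum(yy == 0) - sum(exp(-fitted(f)))    # observed minus expected zeros
  }
  obs <- stat(y)
  sim <- replicate(nboot, stat(rpois(length(y), fitted(fit))))
  (1 + sum(sim >= obs)) / (nboot + 1)      # one-sided Monte Carlo p-value
}

set.seed(1)
x <- rnorm(200)
y <- ifelse(runif(200) < 0.2, 0, rpois(200, exp(0.5 + 0.5 * x)))  # 20% inflation
test_excess_zeros(y, x)  # small p-value flags excess zeros
```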


2016 · Vol 27 (8) · pp. 2437-2446
Author(s): Hezhi Lu, Hua Jin, Weixiong Zeng

Hida and Tango established a statistical testing framework for the three-arm non-inferiority trial, which includes a placebo arm and a pre-specified non-inferiority margin, to overcome the shortcomings of traditional two-arm non-inferiority trials (such as having to choose the non-inferiority margin). In this paper, we propose a new method that improves their approach in two respects: we construct our test statistics from the best unbiased pooled estimator of the homogeneous variance, and we use the principle of intersection-union tests to determine the rejection rule. We prove theoretically that our test is better than that of Hida and Tango for large sample sizes; for small and moderate sample sizes, our simulation studies show that our approach also performs better. Although both tests controlled the type I error rate, theirs was more conservative and ours had higher statistical power.
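The two ingredients named above, a pooled variance estimate and an intersection-union rejection rule, can be sketched as follows for normal outcomes; this is a generic illustration, not the authors' exact test.

```r
# Intersection-union test for a three-arm trial (treatment T, reference R,
# placebo P) with pre-specified margin Delta: reject the overall null only
# if T is superior to P AND T is non-inferior to R within Delta.
iut_three_arm <- function(yT, yR, yP, Delta, alpha = 0.025) {
  n  <- c(T = length(yT), R = length(yR), P = length(yP))
  df <- sum(n) - 3
  s2 <- (sum((yT - mean(yT))^2) + sum((yR - mean(yR))^2) +
         sum((yP - mean(yP))^2)) / df            # pooled variance estimate
  t1 <- (mean(yT) - mean(yP)) / sqrt(s2 * (1 / n["T"] + 1 / n["P"]))
  t2 <- (mean(yT) - mean(yR) + Delta) / sqrt(s2 * (1 / n["T"] + 1 / n["R"]))
  crit <- qt(1 - alpha, df)
  # Intersection-union rule: both component tests must reject
  list(reject = (t1 > crit) && (t2 > crit), t1 = unname(t1), t2 = unname(t2))
}
```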


2021 · Vol 22 (1)
Author(s): William R. P. Denault, Astanand Jugessur

Abstract Background We present here a computational shortcut to improve a powerful wavelet-based method by Shim and Stephens (Ann Appl Stat 9(2):665–686, 2015. 10.1214/14-AOAS776) called WaveQTL that was originally designed to identify DNase I hypersensitivity quantitative trait loci (dsQTL). Results WaveQTL relies on permutations to evaluate the significance of an association. We applied a recent method by Zhou and Guan (J Am Stat Assoc 113(523):1362–1371, 2017. 10.1080/01621459.2017.1328361) to boost computational speed, which involves calculating the distribution of Bayes factors and estimating the significance of an association by simulations rather than permutations. We called this simulation-based approach “fast functional wavelet” (FFW), and tested it on a publicly available DNA methylation (DNAm) dataset on colorectal cancer. The simulations confirmed a substantial gain in computational speed compared to the permutation-based approach in WaveQTL. Furthermore, we show that FFW controls the type I error satisfactorily and has good power for detecting differentially methylated regions. Conclusions Our approach has broad utility and can be applied to detect associations between different types of functions and phenotypes. As more and more DNAm datasets are being made available through public repositories, an attractive application of FFW would be to re-analyze these data and identify associations that might have been missed by previous efforts. The full R package for FFW is freely available at GitHub https://github.com/william-denault/ffw.
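The computational idea, replacing permutations with direct simulation of the test statistic's null distribution, can be sketched generically as below; the statistic is a stand-in, since FFW itself works with Bayes factors over wavelet coefficients (see https://github.com/william-denault/ffw).

```r
# Estimate significance by simulating the null distribution of a test
# statistic directly, instead of recomputing it over many permutations
# of the data. Toy statistic: the maximum of d squared z-scores.
simulate_null_p <- function(obs_stat, null_sampler, nsim = 1e4) {
  null_stats <- replicate(nsim, null_sampler())
  (1 + sum(null_stats >= obs_stat)) / (nsim + 1)  # Monte Carlo p-value
}

set.seed(1)
d   <- 10
obs <- max((rnorm(d) + c(rep(0, d - 1), 4))^2)  # signal in one coordinate
simulate_null_p(obs, function() max(rnorm(d)^2))
```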

