Testing Mediation Effects in High-Dimensional Microbiome Data with False Discovery Rate Control

Author(s):  
Ye Yue ◽  
Yijuan Hu

Abstract Background: Understanding whether and which microbes play a mediating role between an exposure and a disease outcome is essential for researchers to develop clinical interventions that treat the disease by modulating the microbes. Existing methods for mediation analysis of the microbiome are often limited to a global test of community-level mediation or to selection of mediating microbes without control of the false discovery rate (FDR). Further, while the null hypothesis of no mediation at each microbe is a composite null consisting of three types of null (no exposure-microbe association, no microbe-outcome association given the exposure, or neither), most existing methods for the global test, such as MedTest and MODIMA, treat the microbes as if they were all under the same type of null. Results: We propose a new approach based on inverse regression that regresses the (possibly transformed) relative abundance of each taxon on the exposure and the exposure-adjusted outcome to assess the exposure-taxon and taxon-outcome associations simultaneously. The association p-values are then used to test mediation at both the community and individual-taxon levels. This approach fits naturally into our Linear Decomposition Model (LDM) framework, so our new method is implemented in the LDM and enjoys all the features of the LDM: it allows an arbitrary number of taxa to be tested; supports continuous, discrete, or multivariate exposures and outcomes as well as adjustment for confounding covariates; accommodates clustered data; and offers analysis at the relative-abundance or presence-absence scale. We refer to this new method as LDM-med. In extensive simulations, LDM-med always controlled the type I error of the global test and had compelling power over existing methods; it always preserved the FDR when testing individual taxa and had much better sensitivity than alternative approaches. In contrast, MedTest and MODIMA had severely inflated type I error when different taxa were under different types of null. The flexibility of LDM-med for a variety of mediation analyses is illustrated by application to a murine microbiome dataset, which identified a plausible mediator. Conclusions: Inverse regression coupled with the LDM is a strategy that performs well and can handle mediation analysis in a wide variety of microbiome studies.
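The composite null described above (mediation requires both an exposure-taxon and a taxon-outcome association) admits a simple, conservative per-taxon combination of the two association p-values. The sketch below is a generic illustration of that idea, not the authors' actual LDM-med procedure; the function name and the max-p combination rule are assumptions for illustration.

```python
import numpy as np

def composite_null_pvalues(p_exposure, p_outcome):
    """Per-taxon p-values for the composite null of no mediation.

    p_exposure : p-values for the exposure-taxon associations
    p_outcome  : p-values for the taxon-outcome associations
                 (adjusted for the exposure)

    The composite null holds unless BOTH associations are present,
    so the per-taxon maximum of the two p-values is a valid, though
    conservative, p-value for the composite null.
    """
    return np.maximum(np.asarray(p_exposure, dtype=float),
                      np.asarray(p_outcome, dtype=float))
```

The combined p-values can then be passed to any standard FDR-controlling procedure (e.g., Benjamini-Hochberg) to select mediating taxa with FDR control.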

2021 ◽  
Author(s):  
Ye Yue ◽  
Yi-Juan Hu

Availability and Implementation: Our new method has been added to our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.


BMC Genetics ◽  
2005 ◽  
Vol 6 (Suppl 1) ◽  
pp. S134 ◽  
Author(s):  
Qiong Yang ◽  
Jing Cui ◽  
Irmarie Chazaro ◽  
L Adrienne Cupples ◽  
Serkalem Demissie

Genetics ◽  
1998 ◽  
Vol 150 (4) ◽  
pp. 1699-1706 ◽  
Author(s):  
Joel Ira Weller ◽  
Jiu Zhou Song ◽  
David W Heyen ◽  
Harris A Lewin ◽  
Micha Ron

Abstract Saturated genetic marker maps are being used to map individual genes affecting quantitative traits. Controlling the “experimentwise” type-I error severely lowers power to detect segregating loci. For preliminary genome scans, we propose controlling the “false discovery rate,” that is, the expected proportion of true null hypotheses within the class of rejected null hypotheses. Examples are given based on a granddaughter design analysis of dairy cattle and simulated backcross populations. By controlling the false discovery rate, power to detect true effects is not dependent on the number of tests performed. If no detectable genes are segregating, controlling the false discovery rate is equivalent to controlling the experimentwise error rate. If quantitative loci are segregating in the population, statistical power is increased as compared to control of the experimentwise type-I error. The difference between the two criteria increases with the increase in the number of false null hypotheses. The false discovery rate can be controlled at the same level whether the complete genome or only part of it has been analyzed. Additional levels of contrasts, such as multiple traits or pedigrees, can be handled without the necessity of a proportional decrease in the critical test probability.
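The contrast drawn above between experimentwise control and FDR control can be made concrete with the standard Benjamini-Hochberg step-up procedure; this is a minimal sketch of that general procedure, not the paper's granddaughter-design analysis, and the example p-values are invented for illustration.

```python
import numpy as np

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k
    is the largest index with p_(k) <= q * k / m, controlling the expected
    proportion of false discoveries at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Contrast with the experimentwise (Bonferroni) criterion p <= q/m:
pvals = [0.001, 0.015, 0.03, 0.04, 0.6]
fdr_hits = bh_reject(pvals, q=0.05)                      # rejects four tests
bonf_hits = [p <= 0.05 / len(pvals) for p in pvals]      # rejects only one
```

As the abstract notes, the gap between the two criteria grows with the number of false null hypotheses: here FDR control declares four loci while the experimentwise criterion declares one.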


2006 ◽  
Vol 45 (9) ◽  
pp. 1181-1189 ◽  
Author(s):  
D. S. Wilks

Abstract The conventional approach to evaluating the joint statistical significance of multiple hypothesis tests (i.e., “field,” or “global,” significance) in meteorology and climatology is to count the number of individual (or “local”) tests yielding nominally significant results and then to judge the unusualness of this integer value in the context of the distribution of such counts that would occur if all local null hypotheses were true. The sensitivity (i.e., statistical power) of this approach is potentially compromised both by the discrete nature of the test statistic and by the fact that the approach ignores the confidence with which locally significant tests reject their null hypotheses. An alternative global test statistic that has neither of these problems is the minimum p value among all of the local tests. Evaluation of field significance using the minimum local p value as the global test statistic, which is also known as the Walker test, has strong connections to the joint evaluation of multiple tests in a way that controls the “false discovery rate” (FDR, or the expected fraction of local null hypothesis rejections that are incorrect). In particular, using the minimum local p value to evaluate field significance at a level α_global is nearly equivalent to the slightly more powerful global test based on the FDR criterion. An additional advantage shared by Walker’s test and the FDR approach is that both are robust to spatial dependence within the field of tests. The FDR method not only provides a more broadly applicable and generally more powerful field significance test than the conventional counting procedure but also allows better identification of locations with significant differences, because fewer than α_global × 100% (on average) of apparently significant local tests will have resulted from local null hypotheses that are true.
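The Walker test described above has a closed form under an independence assumption: for K independent local tests, the smallest p-value has distribution function 1 − (1 − p)^K under the global null. This sketch shows that form only; it does not reproduce the paper's full field-significance comparison, and the independence assumption is stronger than the robustness to dependence the abstract discusses.

```python
def walker_global_p(local_pvals):
    """Walker test: global p-value for 'field significance' computed from
    the smallest of K local p-values, assuming the local tests are
    independent. Under the global null, min(p) ~ Beta(1, K), so
    P(min <= p_min) = 1 - (1 - p_min)^K."""
    K = len(local_pvals)
    p_min = min(local_pvals)
    return 1.0 - (1.0 - p_min) ** K
```

For example, with K = 100 local tests, a single local p-value of 0.0005 gives a global p-value of about 0.049, just clearing a 5% field-significance level.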


mSystems ◽  
2017 ◽  
Vol 2 (6) ◽  
Author(s):  
Lingjing Jiang ◽  
Amnon Amir ◽  
James T. Morton ◽  
Ruth Heller ◽  
Ery Arias-Castro ◽  
...  

ABSTRACT DS-FDR can achieve higher statistical power to detect significant findings in sparse and noisy microbiome data compared to the commonly used Benjamini-Hochberg procedure and other FDR-controlling procedures. Differential abundance testing is a critical task in microbiome studies that is complicated by the sparsity of data matrices. Here we adapt for microbiome studies a solution from the field of gene expression analysis to produce a new method, discrete false-discovery rate (DS-FDR), that greatly improves the power to detect differential taxa by exploiting the discreteness of the data. Additionally, DS-FDR is relatively robust to the number of noninformative features, and thus removes the problem of filtering taxonomy tables by an arbitrary abundance threshold. We show by using a combination of simulations and reanalysis of nine real-world microbiome data sets that this new method outperforms existing methods at the differential abundance testing task, producing a false-discovery rate that is up to threefold more accurate, and halves the number of samples required to find a given difference (thus increasing the efficiency of microbiome experiments considerably). We therefore expect DS-FDR to be widely applied in microbiome studies. IMPORTANCE DS-FDR can achieve higher statistical power to detect significant findings in sparse and noisy microbiome data compared to the commonly used Benjamini-Hochberg procedure and other FDR-controlling procedures.
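DS-FDR belongs to the family of permutation-based FDR estimators, which compare the number of observed rejections at a threshold with the average number obtained under permuted sample labels. The sketch below shows that general estimator only; it is not the exact DS-FDR algorithm (which additionally exploits the discreteness of sparse count data), and the function name and inputs are assumptions for illustration.

```python
import numpy as np

def permutation_fdr(obs_stats, perm_stats, threshold):
    """Estimate the FDR at a test-statistic threshold.

    obs_stats  : (n_features,) observed per-feature test statistics
    perm_stats : (n_perms, n_features) statistics recomputed after
                 permuting the sample labels (null distribution)

    The estimate is the average number of permutation rejections
    divided by the number of observed rejections, capped at 1.
    """
    r_obs = np.sum(obs_stats >= threshold)
    if r_obs == 0:
        return 0.0
    r_perm = np.mean(np.sum(perm_stats >= threshold, axis=1))
    return min(1.0, r_perm / r_obs)
```

Scanning thresholds and taking the largest one whose estimated FDR is below the target level yields a rejection set with approximate FDR control, without any parametric assumption on the sparse counts.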


2016 ◽  
Vol 27 (8) ◽  
pp. 2437-2446 ◽  
Author(s):  
Hezhi Lu ◽  
Hua Jin ◽  
Weixiong Zeng

Hida and Tango established a statistical testing framework for the three-arm non-inferiority trial, which includes a placebo arm and a pre-specified non-inferiority margin, to overcome the shortcomings of traditional two-arm non-inferiority trials (such as having to choose the non-inferiority margin). In this paper, we propose a new method that improves their approach in two respects: we construct our test statistics from the best unbiased pooled estimators of the homogeneous variance, and we use the principle of intersection-union tests to determine the rejection rule. We prove theoretically that our test outperforms that of Hida and Tango for large sample sizes. For small or moderate sample sizes, our simulation studies showed that our approach also performed better: although both tests controlled the type I error rate, theirs was more conservative and ours had higher statistical power.
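The intersection-union principle invoked above has a very simple rejection rule: when the composite null is the union of several component nulls, the composite is rejected at level α exactly when every component test rejects at level α. The sketch below shows that generic rule only; it does not implement Hida and Tango's (or the authors') specific test statistics.

```python
def intersection_union_reject(p_values, alpha=0.05):
    """Intersection-union test (IUT): the null is the UNION of the
    component nulls, so it is rejected at level alpha only if every
    component test rejects at level alpha. Notably, no multiplicity
    adjustment of alpha is needed for this direction of combination."""
    return all(p <= alpha for p in p_values)
```

In a three-arm non-inferiority trial, the components would typically be assay sensitivity (test and reference each superior to placebo) and non-inferiority of the test treatment to the reference.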


2019 ◽  
Vol 16 (2) ◽  
pp. 73-82
Author(s):  
I. A. ADELEKE ◽  
A. O. ADEYEMI ◽  
E. E.E. AKARAWAK

Multiple testing involves the simultaneous testing of many hypotheses and frequently calls for adjusting the level of significance so that the probability of observing at least one significant result due to chance remains below the desired level. This study developed a Binomial Model Approximations (BMA) method as an alternative approach to the multiplicity problem of testing more than one hypothesis at a time. The proposed method demonstrated the capacity to control the type I error rate as sample size increases when compared with the existing Bonferroni and false discovery rate (FDR) procedures.


2019 ◽  
Vol 22 (8) ◽  
pp. 1339-1346 ◽  
Author(s):  
Rachel Nolan-Kenney ◽  
Fen Wu ◽  
Jiyuan Hu ◽  
Liying Yang ◽  
Dervla Kelly ◽  
...  

Abstract Introduction Epidemiological studies investigating alterations in gut microbial composition associated with smoking are lacking. This study examined the composition of the gut microbiome in smokers compared with nonsmokers. Aims and Methods Stool samples were collected in a cross-sectional study of 249 participants selected from the Health Effects of Arsenic Longitudinal Study in Bangladesh. Microbial DNA was extracted from the fecal samples and sequenced by 16S rRNA gene sequencing. The associations of smoking status and intensity of smoking with the relative abundance, or the absence and presence, of individual bacterial taxa from the phylum to genus levels were examined. Results The relative abundance of bacterial taxa along the Erysipelotrichi-to-Catenibacterium lineage was significantly higher in current smokers than in never-smokers. The odds ratio comparing the mean relative abundance in current smokers with that in never-smokers was 1.91 (95% confidence interval = 1.36–2.69) for the genus Catenibacterium and 1.89 (95% confidence interval = 1.39–2.56) for the family Erysipelotrichaceae, the order Erysipelotrichales, and the class Erysipelotrichi (false discovery rate-adjusted p values = .0008–.01). A dose-response association was observed for each of these bacterial taxa. The presence of Alphaproteobacteria was significantly greater in current than in never-smokers (odds ratio = 4.85, false discovery rate-adjusted p value = .04). Conclusions Our data from a Bangladeshi population are consistent with an association of smoking status and dose with changes in gut bacterial composition. Implications This study is the first to examine the relationship between smoking and gut microbiome composition. The data suggest that smoking status may play an important role in the composition of the gut microbiome, especially among individuals with higher levels of tobacco exposure.

