The Romano–Wolf multiple-hypothesis correction in Stata

Author(s):
Damian Clarke
Joseph P. Romano
Michael Wolf

When considering multiple-hypothesis tests simultaneously, standard statistical techniques will lead to overrejection of null hypotheses unless the multiplicity of the testing framework is explicitly considered. In this article, we discuss the Romano–Wolf multiple-hypothesis correction and document its implementation in Stata. The Romano–Wolf correction (asymptotically) controls the familywise error rate, that is, the probability of rejecting at least one true null hypothesis among a family of hypotheses under test. This correction is considerably more powerful than earlier multiple-testing procedures, such as the Bonferroni and Holm corrections, given that it takes into account the dependence structure of the test statistics by resampling from the original data. We describe a command, rwolf, that implements this correction and provide several examples based on a wide range of models. We document and discuss the performance gains from using rwolf over other multiple-testing procedures that control the familywise error rate.
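The core of the Romano–Wolf procedure is a stepdown loop over hypotheses ordered by significance, with critical values taken from the maximum of resampled null statistics over the hypotheses still in play. A minimal Python sketch of that idea follows; it illustrates the algorithm, not the rwolf implementation, and it assumes null-centered bootstrap statistics have already been computed (the function name and arguments are hypothetical).

```python
import numpy as np

def romano_wolf_stepdown(t_obs, t_boot):
    """Stepdown adjusted p-values from resampled null statistics.

    t_obs  : (S,) observed test statistics, one per hypothesis.
    t_boot : (B, S) statistics recomputed on B bootstrap samples,
             centered so that they follow the null.
    """
    S, B = len(t_obs), t_boot.shape[0]
    order = np.argsort(-np.abs(t_obs))      # most significant first
    p_adj = np.empty(S)
    running_max = 0.0
    for step, j in enumerate(order):
        remaining = order[step:]            # hypotheses not yet dealt with
        max_null = np.abs(t_boot[:, remaining]).max(axis=1)
        p = (1 + np.sum(max_null >= abs(t_obs[j]))) / (1 + B)
        running_max = max(running_max, p)   # enforce monotone p-values
        p_adj[j] = running_max
    return p_adj
```

Because each step takes the maximum only over the hypotheses that survive earlier steps, the critical values shrink as the loop proceeds, which is where the power gain over Bonferroni- and Holm-type corrections comes from.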

2004
Vol 3 (1)
pp. 1-69
Author(s):
Sandrine Dudoit
Mark J. van der Laan
Katherine S. Pollard

The present article proposes general single-step multiple testing procedures for controlling Type I error rates defined as arbitrary parameters of the distribution of the number of Type I errors, such as the generalized family-wise error rate. A key feature of our approach is the test statistics null distribution (rather than data generating null distribution) used to derive cut-offs (i.e., rejection regions) for these test statistics and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which single-step common-quantile and common-cut-off procedures asymptotically control the Type I error rate, for arbitrary data generating distributions, without the need for conditions such as subset pivotality. Inspired by this general characterization of a null distribution, we then propose as an explicit null distribution the asymptotic distribution of the vector of null value shifted and scaled test statistics. In the special case of family-wise error rate (FWER) control, our method yields the single-step minP and maxT procedures, based on minima of unadjusted p-values and maxima of test statistics, respectively, with the important distinction in the choice of null distribution. Single-step procedures based on consistent estimators of the null distribution are shown to also provide asymptotic control of the Type I error rate. A general bootstrap algorithm is supplied to conveniently obtain consistent estimators of the null distribution. The special cases of t- and F-statistics are discussed in detail. The companion articles focus on step-down multiple testing procedures for control of the FWER (van der Laan et al., 2004b) and on augmentations of FWER-controlling methods to control error rates such as tail probabilities for the number of false positives and for the proportion of false positives among the rejected hypotheses (van der Laan et al., 2004a). The proposed bootstrap multiple testing procedures are evaluated by a simulation study and applied to genomic data in the fourth article of the series (Pollard et al., 2004).
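For FWER control, the single-step common-cut-off (maxT) procedure reduces to comparing each observed statistic against the distribution of the maximum over the whole vector of null statistics. A hedged sketch, assuming draws from an estimated joint null distribution are already available (names are illustrative):

```python
import numpy as np

def single_step_maxT(t_obs, t_null):
    """Single-step maxT adjusted p-values.

    t_obs  : (S,) observed test statistics.
    t_null : (B, S) draws from an estimate of the joint null
             distribution, e.g. null value shifted and scaled
             bootstrap statistics as proposed in the article.
    """
    max_null = np.abs(t_null).max(axis=1)   # one common cut-off for all S tests
    return np.array([(max_null >= abs(t)).mean() for t in t_obs])
```

The common-quantile (minP) variant is the analogue on the p-value scale, replacing the maximum of test statistics by the minimum of unadjusted p-values.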


2019
Vol 1 (2)
pp. 653-683
Author(s):
Frank Emmert-Streib
Matthias Dehmer

A statistical hypothesis test is one of the most eminent methods in statistics. Its pivotal role comes from the wide range of practical problems it can be applied to and its minimal requirements on the data. Being an unsupervised method makes it very flexible in adapting to real-world situations. The availability of high-dimensional data makes it necessary to apply such statistical hypothesis tests simultaneously to the test statistics of the underlying covariates. However, if applied without correction, this leads to an inevitable increase in Type 1 errors. To counteract this effect, multiple testing procedures have been introduced to control various types of errors, most notably the Type 1 error. In this paper, we review modern multiple testing procedures for controlling either the family-wise error rate (FWER) or the false-discovery rate (FDR). We emphasize their underlying principles, which allow them to be categorized as (1) single-step vs. stepwise approaches, (2) adaptive vs. non-adaptive approaches, and (3) marginal vs. joint multiple testing procedures. We place a particular focus on procedures that can deal with data having a (strong) correlation structure, because real-world data are rarely uncorrelated. Furthermore, we provide background information that makes the often technically intricate methods accessible to interdisciplinary data scientists.
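To make the single-step vs. stepwise and FWER vs. FDR distinctions concrete, here is a small, self-contained sketch of three marginal procedures covered by such reviews: Bonferroni (single-step, FWER), Holm (step-down, FWER), and Benjamini–Hochberg (step-up, FDR). It operates on raw p-values only; joint procedures that exploit the correlation structure require resampling and are not shown.

```python
import numpy as np

def adjust(p, method="holm"):
    """Adjusted p-values for Bonferroni and Holm (FWER control) and
    Benjamini-Hochberg (FDR control); p is a 1-D array of raw p-values."""
    p = np.asarray(p, dtype=float)
    m = p.size
    if method == "bonferroni":                 # single-step: multiply by m
        return np.minimum(p * m, 1.0)
    order = np.argsort(p)                      # sort p-values ascending
    out = np.empty(m)
    if method == "holm":                       # step-down: multipliers m, m-1, ..., 1
        adj = np.maximum.accumulate((m - np.arange(m)) * p[order])
    elif method == "bh":                       # step-up: p * m / rank, monotone from the top
        ranked = p[order] * m / np.arange(1, m + 1)
        adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out[order] = np.minimum(adj, 1.0)
    return out

# e.g. adjust([0.001, 0.02, 0.04], method="bh")
```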


2015
Vol 14 (1)
pp. 1-19
Author(s):
Rosa J. Meijer
Thijmen J.P. Krebs
Jelle J. Goeman

We present a multiple testing method for hypotheses that are ordered in space or time. Given such hypotheses, the elementary hypotheses as well as regions of consecutive hypotheses are of interest. These region hypotheses not only have intrinsic meaning, but testing them also has the advantage that (potentially small) signals across a region are combined in one test. Because the expected number and length of potentially interesting regions are usually not available beforehand, we propose a method that tests all possible region hypotheses as well as all individual hypotheses in a single multiple testing procedure that controls the familywise error rate. We start by testing the global null hypothesis, and when this hypothesis can be rejected, we continue by further specifying the exact location(s) of the effect present. The method is implemented in the …
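The authors' all-regions procedure is more involved than space permits here, but the top-down idea, testing the global null first and zooming in only where rejections occur, can be sketched with a simpler hierarchical scheme in the spirit of Meinshausen-style hierarchical testing. The combination rule (Fisher's method, which assumes independent p-values) and all names below are illustrative assumptions, not the authors' method:

```python
import numpy as np
from scipy import stats

def region_p(p_vals):
    """Fisher combination of the p-values inside a region
    (assumes independent p-values; a simplifying assumption)."""
    stat = -2 * np.sum(np.log(p_vals))
    return stats.chi2.sf(stat, df=2 * len(p_vals))

def test_tree(p, lo, hi, alpha, n, rejected):
    """Top-down testing: region [lo, hi) is tested at level
    alpha * |region| / n and is split only when rejected."""
    if region_p(p[lo:hi]) > alpha * (hi - lo) / n:
        return
    rejected.append((lo, hi))
    if hi - lo > 1:
        mid = (lo + hi) // 2
        test_tree(p, lo, mid, alpha, n, rejected)
        test_tree(p, mid, hi, alpha, n, rejected)

# rejected = []; test_tree(p, 0, len(p), 0.05, len(p), rejected)
# collects every rejected region, from the full sequence down to
# individual hypotheses.
```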


2021
Vol 18 (5)
pp. 521-528
Author(s):
Eric S Leifer
James F Troendle
Alexis Kolecki
Dean A Follmann

Background/aims: The two-by-two factorial design randomizes participants to receive treatment A alone, treatment B alone, both treatments A and B (AB), or neither treatment (C). When the combined effect of A and B is less than the sum of the A and B effects, called a subadditive interaction, there can be low power to detect the A effect using an overall test, that is, a factorial analysis which compares the A and AB groups to the C and B groups. Such an interaction may have occurred in the Action to Control Cardiovascular Risk in Diabetes blood pressure trial (ACCORD BP), which simultaneously randomized participants to receive intensive or standard blood pressure control and intensive or standard glycemic control. For the primary outcome of major cardiovascular events, the overall test for efficacy of intensive blood pressure control was nonsignificant. In such an instance, simple effect tests of A versus C and B versus C may be useful, since they are not affected by a subadditive interaction, but they can have lower power since they use half the participants of the overall trial. We investigate multiple testing procedures which exploit the overall tests' sample size advantage and the simple tests' robustness to a potential interaction.

Methods: In the time-to-event setting, we use the stratified and ordinary logrank statistics' asymptotic means to calculate the power of the overall and simple tests under various scenarios. We consider the A and B research questions to be unrelated and allocate a 0.05 significance level to each. For each question, we investigate three multiple testing procedures which allocate the type 1 error in different proportions to the overall and simple effects as well as the AB effect. The Equal Allocation 3 procedure allocates equal type 1 error to each of the three effects; the Proportional Allocation 2 procedure allocates 2/3 of the type 1 error to the overall A (respectively, B) effect and the remaining type 1 error to the AB effect; and the Equal Allocation 2 procedure allocates equal amounts to the simple A (respectively, B) and AB effects. These procedures are applied to ACCORD BP.

Results: Across various scenarios, Equal Allocation 3 had robust power for detecting a true effect. For ACCORD BP, all three procedures would have detected a benefit of intensive glycemic control.

Conclusions: When there is no interaction, Equal Allocation 3 has less power than a factorial analysis. However, Equal Allocation 3 often has greater power when there is an interaction. The R package factorial2x2 can be used to explore the power gain or loss for different scenarios.
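As a reading aid, the three allocation procedures amount to different splits of each question's 0.05 level across the overall, simple, and AB effects. The sketch below is a deliberate Bonferroni-style simplification under that assumption (the actual procedures, including any refinements, live in the authors' factorial2x2 package); all names are illustrative.

```python
ALPHA = 0.05

# Allocation of the 0.05 level across the overall A, simple A-vs-C,
# and AB-vs-C effects, mirroring the three procedures described above.
allocations = {
    "equal_allocation_3":        {"overall": ALPHA / 3,     "simple": ALPHA / 3, "ab": ALPHA / 3},
    "proportional_allocation_2": {"overall": 2 * ALPHA / 3, "simple": 0.0,       "ab": ALPHA / 3},
    "equal_allocation_2":        {"overall": 0.0,           "simple": ALPHA / 2, "ab": ALPHA / 2},
}

def decide(p_overall, p_simple, p_ab, scheme):
    """Reject each effect whose p-value falls below its allocated level.
    Effects with a 0.0 allocation are never tested under that scheme."""
    a = allocations[scheme]
    return {
        "overall": p_overall < a["overall"],
        "simple":  p_simple  < a["simple"],
        "ab":      p_ab      < a["ab"],
    }
```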


Author(s):
Dan Lin
Ziv Shkedy
Dani Yekutieli
Tomasz Burzykowski
Hinrich W.H. Göhlmann
...  

Dose-response studies are commonly used in pharmaceutical research to investigate the dependence of the response on dose, that is, a trend of the response (for example, a toxicity level) with respect to dose. In this paper we focus on dose-response experiments within a microarray setting, in which several microarrays are available for a sequence of increasing dose levels. A gene is called differentially expressed if there is a monotonic trend (with respect to dose) in its expression. We review several testing procedures which can be used to test equality among the gene expression means against ordered alternatives with respect to dose, namely Williams' statistic (Williams 1971, 1972), Marcus' statistic (Marcus 1976), the global likelihood ratio test (Bartholomew 1961; Barlow et al. 1972; Robertson et al. 1988), and the M statistic (Hu et al. 2005). Additionally, we introduce a modification to the standard error of the M statistic. We compare the performance of these five test statistics. Moreover, we discuss the issue of one-sided versus two-sided testing procedures. The False Discovery Rate (Benjamini and Hochberg 1995; Ge et al. 2003) and the resampling-based familywise error rate (Westfall and Young 1993) are used to handle the multiple testing issue. The methods above are applied to a data set with 4 doses (3 arrays per dose) and 16,998 genes. Results on the number of significant genes from each statistic are discussed. A simulation study is conducted to investigate the power of each statistic. An R library, IsoGene, implementing the methods is available from the first author.
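The order-restricted likelihood ratio statistic at the heart of these procedures can be computed per gene from an isotonic regression fit. A hedged single-gene sketch in Python follows (not the IsoGene implementation; it uses scikit-learn's IsotonicRegression and a permutation null, and all names are illustrative):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def trend_stat(y, dose):
    """Likelihood-ratio-type statistic for a monotone trend in means:
    relative reduction in residual sum of squares when the flat model
    is replaced by the isotonic (order-restricted) fit."""
    flat = np.sum((y - y.mean()) ** 2)
    if flat == 0:                               # constant expression: no trend
        return 0.0
    fit = IsotonicRegression(increasing=True).fit_transform(dose, y)
    iso = np.sum((y - fit) ** 2)
    return (flat - iso) / flat

def permutation_p(y, dose, n_perm=1000, seed=None):
    """Null distribution obtained by permuting expression values
    across arrays, breaking any dose-response relationship."""
    rng = np.random.default_rng(seed)
    obs = trend_stat(y, dose)
    null = [trend_stat(rng.permutation(y), dose) for _ in range(n_perm)]
    return (1 + np.sum(np.asarray(null) >= obs)) / (1 + n_perm)
```

The resulting per-gene p-values would then be fed into an FDR or resampling-based FWER procedure as described above.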


2019
Vol 35 (22)
pp. 4764-4766
Author(s):
Jonathan Cairns
William R Orchard
Valeriya Malysheva
Mikhail Spivakov

Summary: Capture Hi-C is a powerful approach for detecting chromosomal interactions involving, at least on one end, DNA regions of interest, such as gene promoters. We present Chicdiff, an R package for robust detection of differential interactions in Capture Hi-C data. Chicdiff enhances a state-of-the-art differential testing approach for count data with bespoke normalization and multiple testing procedures that account for specific statistical properties of Capture Hi-C. We validate Chicdiff on published Promoter Capture Hi-C data in human monocytes and CD4+ T cells, identifying multitudes of cell type-specific interactions, and confirming the overall positive association between promoter interactions and gene expression.

Availability and implementation: Chicdiff is implemented as an R package that is publicly available at https://github.com/RegulatoryGenomicsGroup/chicdiff.

Supplementary information: Supplementary data are available at Bioinformatics online.

