Sample Size Calculation for Simulation-Based Multiple-Testing Procedures

2005 ◽ Vol 15 (6) ◽ pp. 957-967 ◽ Author(s): Heejung Bang, Sin-Ho Jung, Stephen L. George

2013 ◽ Vol 2013 ◽ pp. 1-11 ◽ Author(s): Dongmei Li, Timothy D. Dye

Resampling-based multiple testing procedures are widely used in genomic studies to identify differentially expressed genes and to conduct genome-wide association studies. However, the power and stability properties of these popular resampling-based multiple testing procedures have not been extensively evaluated. Our study investigates the power and stability of seven resampling-based multiple testing procedures frequently used in high-throughput data analysis for small-sample-size data through simulations and gene oncology examples. The bootstrap single-step minP procedure and the bootstrap step-down minP procedure perform best among all tested procedures when the sample size is as small as 3 in each group and either familywise error rate or false discovery rate control is desired. When the sample size increases to 12 and false discovery rate control is desired, the permutation maxT procedure and the permutation minP procedure perform best. Our results provide guidance for high-throughput data analysis when the sample size is small.
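To make the resampling idea concrete, here is a minimal Python sketch of the permutation single-step maxT procedure, one of the families of procedures compared above. The gene count, group sizes, and permutation count are illustrative assumptions, not the paper's simulation design; the step-down minP variants work analogously but operate on the p-value scale and successively remove the most significant genes.

```python
# Minimal sketch: permutation single-step maxT adjusted p-values
# (FWER control). Shapes and counts are illustrative assumptions.
import numpy as np
from scipy import stats

def maxT_adjusted_pvalues(x, y, n_perm=10_000, seed=0):
    """x: (n1, m) and y: (n2, m) expression matrices for m genes."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(stats.ttest_ind(x, y, axis=0).statistic)
    pooled = np.vstack([x, y])
    n1 = x.shape[0]
    max_t = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(pooled)            # shuffle group labels
        t_b = stats.ttest_ind(perm[:n1], perm[n1:], axis=0).statistic
        max_t[b] = np.max(np.abs(t_b))            # max |T| across genes
    # adjusted p-value: share of permutations whose max |T| beats each gene
    return (1 + (max_t[:, None] >= t_obs).sum(axis=0)) / (1 + n_perm)

# Tiny example with n = 3 per group, the small-sample case discussed above
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(3, 50))
y = rng.normal(0.0, 1.0, size=(3, 50))
print(maxT_adjusted_pvalues(x, y, n_perm=2_000).min())
```

With only 3 samples per group there are just 20 distinct label permutations, so permutation p-values are necessarily coarse; that granularity is consistent with the finding above that the bootstrap procedures fare better at n = 3.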


2011 ◽ Vol 55 (1) ◽ pp. 110-122 ◽ Author(s): Jie Chen, Jianfeng Luo, Kenneth Liu, Devan V. Mehrotra

Author(s): Amelie Elsäßer, Anja Victor, Gerhard Hommel

In candidate gene association studies, several elementary hypotheses are usually tested simultaneously on one particular set of data. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran-Armitage test for trend. To account for the multiplicity of the test situation, different types of multiple testing procedures have been proposed. The question arises whether procedures that take the discreteness of the situation into account show a benefit, especially in the case of correlated data. We empirically evaluate several different multiple testing procedures via simulation studies using simulated correlated SNP data. We analyze FDR and FWER controlling procedures, special procedures for discrete situations, and the resampling-based minP procedure. Within the simulation study, we examine a broad range of gene data scenarios. We show that the main difference in the varying performance of the procedures is due to sample size. In small-sample scenarios, the resampling-based minP procedure had more power than the classical FDR controlling procedures, even though it controls the stricter FWER. In contrast, FDR controlling procedures led to more rejections in larger-sample scenarios.
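For the non-resampling side of this comparison, the sketch below applies a per-SNP Cochran-Armitage trend test followed by the Benjamini-Hochberg step-up adjustment, one standard FDR controlling procedure. The genotype scores, counts, and function names are illustrative assumptions, not the study's simulation setup.

```python
# Minimal sketch: per-SNP Cochran-Armitage trend tests + BH FDR adjustment.
import numpy as np
from scipy import stats

def cochran_armitage_p(case_counts, ctrl_counts, scores=(0, 1, 2)):
    """Two-sided trend test p-value from genotype counts (AA, Aa, aa)."""
    t = np.asarray(scores, float)
    r = np.asarray(case_counts, float)       # cases per genotype
    n = r + np.asarray(ctrl_counts, float)   # column totals
    N, R = n.sum(), r.sum()
    p = R / N
    T = np.sum(t * (r - n * p))              # score statistic for trend
    var = p * (1 - p) * (np.sum(n * t**2) - np.sum(n * t)**2 / N)
    return 2 * stats.norm.sf(abs(T) / np.sqrt(var))

def benjamini_hochberg(pvals):
    """Step-up BH adjusted p-values (FDR control)."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    adj = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(adj[::-1])[::-1]   # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(adj, 0, 1)
    return out

# Example: 1,000 null SNPs, 100 cases and 100 controls each
rng = np.random.default_rng(0)
geno = [0.25, 0.50, 0.25]                     # Hardy-Weinberg-style mix
pvals = [cochran_armitage_p(rng.multinomial(100, geno),
                            rng.multinomial(100, geno))
         for _ in range(1000)]
print((benjamini_hochberg(pvals) < 0.05).sum())   # typically 0 rejections
```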


Author(s): Damian Clarke, Joseph P. Romano, Michael Wolf

When considering multiple-hypothesis tests simultaneously, standard statistical techniques will lead to overrejection of null hypotheses unless the multiplicity of the testing framework is explicitly considered. In this article, we discuss the Romano–Wolf multiple-hypothesis correction and document its implementation in Stata. The Romano–Wolf correction (asymptotically) controls the familywise error rate, that is, the probability of rejecting at least one true null hypothesis among a family of hypotheses under test. This correction is considerably more powerful than earlier multiple-testing procedures, such as the Bonferroni and Holm corrections, given that it takes into account the dependence structure of the test statistics by resampling from the original data. We describe a command, rwolf, that implements this correction and provide several examples based on a wide range of models. We document and discuss the performance gains from using rwolf over other multiple-testing procedures that control the familywise error rate.
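rwolf itself is a Stata command; as a language-neutral illustration, the Python sketch below mirrors only the stepdown logic of the Romano–Wolf correction, under the assumption that bootstrap statistics recentered to satisfy the null are already in hand. It is a simplified analogue, not the command's implementation, which works from regression models and handles the resampling itself.

```python
# Simplified sketch of the Romano-Wolf stepdown correction, assuming
# recentered (null-enforced) bootstrap statistics are already computed.
import numpy as np

def romano_wolf_stepdown(t_obs, t_boot, alpha=0.05):
    """t_obs: (m,) absolute observed statistics.
    t_boot: (B, m) bootstrap statistics recentered to satisfy the null.
    Returns a boolean rejection vector with (asymptotic) FWER control."""
    order = np.argsort(-t_obs)        # most significant hypothesis first
    reject = np.zeros(t_obs.size, dtype=bool)
    active = list(order)              # hypotheses not yet rejected
    for j in order:
        # (1 - alpha) quantile of the max over hypotheses still in play
        crit = np.quantile(np.abs(t_boot)[:, active].max(axis=1), 1 - alpha)
        if t_obs[j] > crit:
            reject[j] = True          # reject and re-tighten the max
            active.remove(j)
        else:
            break                     # stop at the first non-rejection
    return reject

# Example: 20 hypotheses, one true effect; dependence ignored for brevity
rng = np.random.default_rng(0)
t_obs = np.abs(rng.normal(0, 1, 20)); t_obs[0] += 4.0
t_boot = rng.normal(0, 1, (5_000, 20))
print(np.flatnonzero(romano_wolf_stepdown(t_obs, t_boot)))
```

Because each step recomputes the critical value over only the surviving hypotheses, the stepdown correction is less conservative than a single-step max-statistic rule while still exploiting the dependence captured by the resampled statistics, which is where the power gains over Bonferroni and Holm come from.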


2021 ◽ Vol 18 (5) ◽ pp. 521-528 ◽ Author(s): Eric S Leifer, James F Troendle, Alexis Kolecki, Dean A Follmann

Background/aims: The two-by-two factorial design randomizes participants to receive treatment A alone, treatment B alone, both treatments A and B (AB), or neither treatment (C). When the combined effect of A and B is less than the sum of the A and B effects, called a subadditive interaction, there can be low power to detect the A effect using an overall test, that is, a factorial analysis, which compares the A and AB groups to the C and B groups. Such an interaction may have occurred in the Action to Control Cardiovascular Risk in Diabetes blood pressure trial (ACCORD BP), which simultaneously randomized participants to receive intensive or standard blood pressure control and intensive or standard glycemic control. For the primary outcome of major cardiovascular events, the overall test for efficacy of intensive blood pressure control was nonsignificant. In such an instance, simple effect tests of A versus C and B versus C may be useful since they are not affected by a subadditive interaction, but they can have lower power since they use half the participants of the overall trial. We investigate multiple testing procedures which exploit the overall tests’ sample size advantage and the simple tests’ robustness to a potential interaction.

Methods: In the time-to-event setting, we use the stratified and ordinary logrank statistics’ asymptotic means to calculate the power of the overall and simple tests under various scenarios. We consider the A and B research questions to be unrelated and allocate a 0.05 significance level to each. For each question, we investigate three multiple testing procedures which allocate the type 1 error in different proportions to the overall and simple effects as well as the AB effect. The Equal Allocation 3 procedure allocates equal type 1 error to each of the three effects, the Proportional Allocation 2 procedure allocates 2/3 of the type 1 error to the overall A (respectively, B) effect and the remaining type 1 error to the AB effect, and the Equal Allocation 2 procedure allocates equal amounts to the simple A (respectively, B) and AB effects. These procedures are applied to ACCORD BP.

Results: Across various scenarios, Equal Allocation 3 had robust power for detecting a true effect. For ACCORD BP, all three procedures would have detected a benefit of intensive glycemia control.

Conclusions: When there is no interaction, Equal Allocation 3 has less power than a factorial analysis. However, Equal Allocation 3 often has greater power when there is an interaction. The R package factorial2x2 can be used to explore the power gain or loss for different scenarios.
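The power trade-off in the conclusions can be sketched with Schoenfeld's normal approximation for the logrank statistic, under which the Z-statistic has mean roughly log(HR) × sqrt(events)/2 for 1:1 allocation. The event counts and hazard ratios below are hypothetical, chosen only to illustrate a subadditive interaction; the authors' own tool for such explorations is the R package factorial2x2.

```python
# Rough power sketch for the 2x2 factorial trade-off; all numbers are
# hypothetical, not ACCORD BP values.
import numpy as np
from scipy import stats

def power_two_sided(mu, alpha):
    """Approximate power of a two-sided Z-test with mean mu."""
    z = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.sf(z - abs(mu)) + stats.norm.sf(z + abs(mu))

events = 800                  # total events in the trial
hr_simple = 0.80              # A-vs-C hazard ratio
hr_overall = 0.90             # attenuated by a subadditive interaction

# The overall (factorial) test pools all events; the simple A-vs-C test
# uses only two of the four arms, roughly half the events.
mu_overall = np.log(hr_overall) * np.sqrt(events) / 2
mu_simple = np.log(hr_simple) * np.sqrt(events / 2) / 2

print("factorial analysis at 0.05:  ", round(power_two_sided(mu_overall, 0.05), 3))
print("Equal Allocation 3, overall: ", round(power_two_sided(mu_overall, 0.05 / 3), 3))
print("Equal Allocation 3, simple:  ", round(power_two_sided(mu_simple, 0.05 / 3), 3))
```

Under this subadditive scenario, the simple test at the reduced 0.05/3 level still outpowers the factorial analysis at the full 0.05 level, which is the pattern the allocation procedures above are designed to exploit.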

