Multiplicity Issues and Methods for Controlling the Type I Error Rate

2021 ◽  
pp. 263-281
Author(s):  
Weichung Joe Shih ◽  
Joseph Aisner
2014 ◽  
Vol 53 (05) ◽  
pp. 343-343

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138 –143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.


2003 ◽  
Vol 22 (5) ◽  
pp. 665-675 ◽  
Author(s):  
Weichung J. Shih ◽  
Peter Ouyang ◽  
Hui Quan ◽  
Yong Lin ◽  
Bart Michiels ◽  
...  

2021 ◽  
pp. 174077452110101
Author(s):  
Jennifer Proper ◽  
John Connett ◽  
Thomas Murray

Background: Bayesian response-adaptive designs, which data adaptively alter the allocation ratio in favor of the better performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating type I error rate, and increasing sample size requirements. The implementation of these designs using the Thompson sampling methods has generally assumed a simple beta-binomial probability model in the literature; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss up to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial using the Thompson sampling methods based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and power, among other performance metrics. Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design in regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.


1977 ◽  
Vol 2 (3) ◽  
pp. 187-206 ◽  
Author(s):  
Charles G. Martin ◽  
Paul A. Games

This paper presents an exposition and an empirical comparison of two potentially useful tests for homogeneity of variance. Control of Type I error rate, P(EI), and power are investigated for three forms of the Box test and for two forms of the jackknife test with equal and unequal n's under conditions of normality and nonnormality. The Box test is shown to be robust to violations of the assumption of normality. The jackknife test is shown not to be robust. When n's are unequal, the problem of heterogeneous within-cell variances of the transformed values and unequal n's affects the jackknife and Box tests. Previously reported suggestions for selecting subsample sizes for the Box test are shown to be inappropriate, producing an inflated P(EI). Two procedures which alleviate this problem are presented for the Box test. Use of the jack-knife test with a reduced alpha is shown to provide power and control of P(EI) at approximately the same level as the Box test. Recommendations for the use of these techniques and computational examples of each are provided.


2018 ◽  
Vol 28 (8) ◽  
pp. 2385-2403 ◽  
Author(s):  
Tobias Mütze ◽  
Ekkehard Glimm ◽  
Heinz Schmidli ◽  
Tim Friede

Robust semiparametric models for recurrent events have received increasing attention in the analysis of clinical trials in a variety of diseases including chronic heart failure. In comparison to parametric recurrent event models, robust semiparametric models are more flexible in that neither the baseline event rate nor the process inducing between-patient heterogeneity needs to be specified in terms of a specific parametric statistical model. However, implementing group sequential designs in the robust semiparametric model is complicated by the fact that the sequence of Wald statistics does not follow asymptotically the canonical joint distribution. In this manuscript, we propose two types of group sequential procedures for a robust semiparametric analysis of recurrent events. The first group sequential procedure is based on the asymptotic covariance of the sequence of Wald statistics and it guarantees asymptotic control of the type I error rate. The second procedure is based on the canonical joint distribution and does not guarantee asymptotic type I error rate control but is easy to implement and corresponds to the well-known standard approach for group sequential designs. Moreover, we describe how to determine the maximum information when planning a clinical trial with a group sequential design and a robust semiparametric analysis of recurrent events. We contrast the operating characteristics of the proposed group sequential procedures in a simulation study motivated by the ongoing phase 3 PARAGON-HF trial (ClinicalTrials.gov identifier: NCT01920711) in more than 4600 patients with chronic heart failure and a preserved ejection fraction. We found that both group sequential procedures have similar operating characteristics and that for some practically relevant scenarios, the group sequential procedure based on the canonical joint distribution has advantages with respect to the control of the type I error rate. The proposed method for calculating the maximum information results in appropriately powered trials for both procedures.


2016 ◽  
Vol 54 (1) ◽  
pp. 100-113 ◽  
Author(s):  
Joseph L. Brooks ◽  
Alexia Zoumpoulaki ◽  
Howard Bowman

1992 ◽  
Vol 17 (4) ◽  
pp. 315-339 ◽  
Author(s):  
Michael R. Harwell ◽  
Elaine N. Rubinstein ◽  
William S. Hayes ◽  
Corley C. Olds

Meta-analytic methods were used to integrate the findings of a sample of Monte Carlo studies of the robustness of the F test in the one- and two-factor fixed effects ANOVA models. Monte Carlo results for the Welch (1947) and Kruskal-Wallis (Kruskal & Wallis, 1952) tests were also analyzed. The meta-analytic results provided strong support for the robustness of the Type I error rate of the F test when certain assumptions were violated. The F test also showed excellent power properties. However, the Type I error rate of the F test was sensitive to unequal variances, even when sample sizes were equal. The error rate of the Welch test was insensitive to unequal variances when the population distribution was normal, but nonnormal distributions tended to inflate its error rate and to depress its power. Meta-analytic and exact statistical theory results were used to summarize the effects of assumption violations for the tests.


Trials ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jiyu Kim ◽  
Andrea B. Troxel ◽  
Scott D. Halpern ◽  
Kevin G. Volpp ◽  
Brennan C. Kahan ◽  
...  

Abstract Introduction In a five-arm randomized clinical trial (RCT) with stratified randomization across 54 sites, we encountered low primary outcome event proportions, resulting in multiple sites with zero events either overall or in one or more study arms. In this paper, we systematically evaluated different statistical methods of accounting for center in settings with low outcome event proportions. Methods We conducted a simulation study and a reanalysis of a completed RCT to compare five popular methods of estimating an odds ratio for multicenter trials with stratified randomization by center: (i) no center adjustment, (ii) random intercept model, (iii) Mantel–Haenszel model, (iv) generalized estimating equation (GEE) with an exchangeable correlation structure, and (v) GEE with small sample correction (GEE-small sample correction). We varied the number of total participants (200, 500, 1000, 5000), number of centers (5, 50, 100), control group outcome percentage (2%, 5%, 10%), true odds ratio (1, > 1), intra-class correlation coefficient (ICC) (0.025, 0.075), and distribution of participants across the centers (balanced, skewed). Results Mantel–Haenszel methods generally performed poorly in terms of power and bias and led to the exclusion of participants from the analysis because some centers had no events. Failure to account for center in the analysis generally led to lower power and type I error rates than other methods, particularly with ICC = 0.075. GEE had an inflated type I error rate except in some settings with a large number of centers. GEE-small sample correction maintained the type I error rate at the nominal level but suffered from reduced power and convergence issues in some settings when the number of centers was small. Random intercept models generally performed well in most scenarios, except with a low event rate (i.e., 2% scenario) and small total sample size (n ≤ 500), when all methods had issues. Discussion Random intercept models generally performed best across most scenarios. GEE-small sample correction performed well when the number of centers was large. We do not recommend the use of Mantel–Haenszel, GEE, or models that do not account for center. When the expected event rate is low, we suggest that the statistical analysis plan specify an alternative method in the case of non-convergence of the primary method.


Sign in / Sign up

Export Citation Format

Share Document