Evaluations of the Optimal Discovery Procedure for Multiple Testing

2016 ◽  
Vol 12 (1) ◽  
pp. 21-29 ◽  
Author(s):  
Daniel B. Rubin

Abstract. The Optimal Discovery Procedure (ODP) is a method for simultaneous hypothesis testing that attempts to gain power relative to more standard techniques by exploiting multivariate structure [1]. Specializing to the example of testing whether components of a Gaussian mean vector are zero, we compare the power of the ODP to that of a Bonferroni-style method and of the Benjamini-Hochberg method when each procedure is calibrated to control a given Type I error rate measure, such as the expected number of false positives or the false discovery rate. We show through theoretical results, numerical comparisons, and two microarray examples that when the rejection regions for the ODP test statistics are chosen such that the procedure is guaranteed to uniformly control a Type I error rate measure, the technique is generally less powerful than competing methods. We contrast and explain these results in light of previously proven optimality theory for the ODP. We also compare the ordering given by the ODP test statistics to the standard ranking based on sorting univariate p-values from smallest to largest. In the cases we considered, the standard ordering was superior, and the ODP rankings were adversely impacted by correlation.
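The two baseline procedures in this comparison are standard and easy to state concretely. Below is a minimal sketch, assuming independent Gaussian test statistics on synthetic data (the paper's microarray data and the ODP statistic itself are not reproduced here): a Bonferroni-style cut-off at alpha/m and the Benjamini-Hochberg step-up rule applied to the sorted univariate p-values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, m1 = 1000, 50                       # hypotheses tested, true signals (assumed)
mu = np.concatenate([np.full(m1, 3.0), np.zeros(m - m1)])
z = rng.normal(mu, 1.0)                # one Gaussian statistic per hypothesis
p = 2 * stats.norm.sf(np.abs(z))       # two-sided univariate p-values

alpha = 0.05
bonf_reject = p <= alpha / m           # Bonferroni-style per-test cut-off

# Benjamini-Hochberg step-up: reject the k smallest p-values, where k is the
# largest rank with p_(k) <= k * alpha / m.
order = np.argsort(p)
below = p[order] <= alpha * np.arange(1, m + 1) / m
k = below.nonzero()[0].max() + 1 if below.any() else 0
bh_reject = np.zeros(m, dtype=bool)
bh_reject[order[:k]] = True
print(bonf_reject.sum(), bh_reject.sum())
```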

2004 ◽  
Vol 3 (1) ◽  
pp. 1-69 ◽  
Author(s):  
Sandrine Dudoit ◽  
Mark J. van der Laan ◽  
Katherine S. Pollard

The present article proposes general single-step multiple testing procedures for controlling Type I error rates defined as arbitrary parameters of the distribution of the number of Type I errors, such as the generalized family-wise error rate. A key feature of our approach is the test statistics null distribution (rather than data generating null distribution) used to derive cut-offs (i.e., rejection regions) for these test statistics and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which single-step common-quantile and common-cut-off procedures asymptotically control the Type I error rate, for arbitrary data generating distributions, without the need for conditions such as subset pivotality. Inspired by this general characterization of a null distribution, we then propose as an explicit null distribution the asymptotic distribution of the vector of null value shifted and scaled test statistics. In the special case of family-wise error rate (FWER) control, our method yields the single-step minP and maxT procedures, based on minima of unadjusted p-values and maxima of test statistics, respectively, with the important distinction lying in the choice of null distribution. Single-step procedures based on consistent estimators of the null distribution are shown to also provide asymptotic control of the Type I error rate. A general bootstrap algorithm is supplied to conveniently obtain consistent estimators of the null distribution. The special cases of t- and F-statistics are discussed in detail. The companion articles focus on step-down multiple testing procedures for control of the FWER (van der Laan et al., 2004b) and on augmentations of FWER-controlling methods to control error rates such as tail probabilities for the number of false positives and for the proportion of false positives among the rejected hypotheses (van der Laan et al., 2004a). The proposed bootstrap multiple testing procedures are evaluated by a simulation study and applied to genomic data in the fourth article of the series (Pollard et al., 2004).
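The single-step maxT procedure with a bootstrap null distribution can be summarized compactly. The sketch below is an illustrative reading of that recipe, not the authors' software: it bootstraps the vector of two-sample Welch t-statistics within groups, applies the null value shift by centering each coordinate at zero (scaling is omitted), and computes adjusted p-values from the maxima of the resampled statistics.

```python
import numpy as np

def single_step_maxT(X, y, B=2000, seed=1):
    """X: (n, m) data matrix; y: 0/1 group labels.
    Returns single-step maxT adjusted p-values for the m coordinate-wise tests."""
    rng = np.random.default_rng(seed)
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)

    def welch_t(rows0, rows1):
        a, b = X[rows0], X[rows1]
        se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
        return (b.mean(0) - a.mean(0)) / se

    t_obs = welch_t(idx0, idx1)
    # Bootstrap the whole vector of statistics within groups, then shift each
    # coordinate to mean zero so the resampled vectors act as draws from a
    # test statistics null distribution ("null value shifted"; scaling omitted).
    T = np.array([welch_t(rng.choice(idx0, idx0.size), rng.choice(idx1, idx1.size))
                  for _ in range(B)])
    T -= T.mean(0)
    max_null = np.abs(T).max(axis=1)          # maximum over all m hypotheses
    return (max_null[:, None] >= np.abs(t_obs)).mean(axis=0)
```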


2020 ◽  
Vol 17 (3) ◽  
pp. 273-284 ◽  
Author(s):  
Babak Choodari-Oskooei ◽  
Daniel J Bratton ◽  
Melissa R Gannon ◽  
Angela M Meade ◽  
Matthew R Sydes ◽  
...  

Background: Experimental treatments pass through various stages of development. If a treatment passes through early-phase experiments, the investigators may want to assess it in a late-phase randomised controlled trial. An efficient way to do this is to add it as a new research arm to an ongoing trial while the existing research arms continue, a so-called multi-arm platform trial. The familywise type I error rate is often a key quantity of interest in any multi-arm platform trial. We set out to clarify how it should be calculated when new arms are added to a trial some time after it has started. Methods: We show how the familywise type I error rate, any-pair power, and all-pairs power can be calculated when a new arm is added to a platform trial. We extend the Dunnett probability and derive analytical formulae for the correlation between the test statistic of an existing pairwise comparison and that of the newly added arm. We also verify our analytical derivations via simulations. Results: Our results indicate that the familywise type I error rate depends on the amount of information shared through the common control arm (i.e. the number of control arm individuals for continuous and binary outcomes, and the number of control arm primary outcome events for time-to-event outcomes) and on the allocation ratio. The familywise type I error rate is driven more by the number of pairwise comparisons and the corresponding (pairwise) type I error rates than by the timing of the addition of the new arms. The familywise type I error rate can be estimated using Šidák’s correction if the correlation between the test statistics of pairwise comparisons is less than 0.30. Conclusions: The findings we present in this article can be used to design trials with pre-planned deferred arms or to add new pairwise comparisons within an ongoing platform trial where control of the pairwise error rate or familywise type I error rate (for a subset of pairwise comparisons) is required.
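Two of the quantities discussed here have simple closed forms in the basic setting. The sketch below assumes two pairwise comparisons against a fully concurrent common control arm with a continuous outcome; the hypothetical functions compute the standard Dunnett correlation induced by the shared control data and the Šidák approximation to the familywise type I error rate (the article's formulae additionally handle arms added after the trial has started).

```python
import numpy as np

def dunnett_correlation(n0, n1, n2):
    """Correlation between the two z-statistics that share a common control arm
    of size n0 (treatment arm sizes n1, n2), for a continuous outcome."""
    return (1 / n0) / np.sqrt((1 / n1 + 1 / n0) * (1 / n2 + 1 / n0))

def sidak_fwer(alpha_pair, k):
    """Šidák approximation to the FWER for k pairwise comparisons; the article
    reports it is accurate when the correlation above is below about 0.30."""
    return 1 - (1 - alpha_pair) ** k

rho = dunnett_correlation(n0=100, n1=100, n2=100)   # 1:1:1 allocation -> 0.5
print(rho, sidak_fwer(alpha_pair=0.025, k=2))
```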


Methodology ◽  
2005 ◽  
Vol 1 (1) ◽  
pp. 27-38 ◽  
Author(s):  
Holmes Finch

Abstract. Multivariate analysis of variance (MANOVA) is a useful tool for social scientists because it allows for the comparison of response-variable means across multiple groups. MANOVA requires that the observations are independent, the response variables are multivariate normally distributed, and the covariance matrix of the response variables is homogeneous across groups. When the assumptions of normality and homogeneous covariance matrices are not met, past research has shown that the type I error rate of the standard MANOVA test statistics can be inflated while their power can be attenuated. The current study compares the performance of a nonparametric alternative to one of the standard parametric test statistics when these two assumptions are not met. Results show that when the assumption of homogeneous covariance matrices is not met, the nonparametric approach has a lower type I error rate and higher power than the most robust parametric statistic. When the assumption of normality is untenable, the parametric statistic is robust, and slightly outperforms the nonparametric statistic in terms of type I error rate and power.
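The kind of Monte Carlo comparison described here is straightforward to sketch. The example below estimates the empirical Type I error rate of Pillai's trace, a common choice for the more robust of the standard parametric MANOVA statistics (the abstract does not name the statistic used), when two groups share a mean vector but have unequal covariance matrices; the sample sizes and covariances are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def pillai_p_value(groups):
    """One-way MANOVA via Pillai's trace with the usual F approximation."""
    X = np.vstack(groups)
    N, p = X.shape
    g = len(groups)
    grand = X.mean(0)
    H = sum(len(G) * np.outer(G.mean(0) - grand, G.mean(0) - grand) for G in groups)
    E = sum((G - G.mean(0)).T @ (G - G.mean(0)) for G in groups)
    V = np.trace(H @ np.linalg.inv(H + E))      # Pillai's trace
    s = min(p, g - 1)
    m = (abs(p - g + 1) - 1) / 2
    n = (N - g - p - 1) / 2
    F = ((2 * n + s + 1) / (2 * m + s + 1)) * (V / s) / (1 - V / s)
    return stats.f.sf(F, s * (2 * m + s + 1), s * (2 * n + s + 1))

rng = np.random.default_rng(2)
p_vals = []
for _ in range(2000):
    g1 = rng.multivariate_normal([0, 0], np.eye(2), size=20)
    g2 = rng.multivariate_normal([0, 0], 4 * np.eye(2), size=40)  # same mean,
    p_vals.append(pillai_p_value([g1, g2]))                       # unequal cov
print(np.mean(np.array(p_vals) < 0.05))   # empirical Type I error rate
```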


2011 ◽  
Vol 36 (6) ◽  
pp. 699-719 ◽  
Author(s):  
Ann A. Lazar ◽  
Gary O. Zerbe

Researchers often compare the relationship between an outcome and a covariate for two or more groups by evaluating whether the fitted regression curves differ significantly. When they do, researchers need to determine the “significance region,” or the values of the covariate where the curves significantly differ. In analysis of covariance (ANCOVA), the Johnson-Neyman procedure can be used to determine the significance region; for the hierarchical linear model (HLM), the Miyazaki and Maier (M-M) procedure has been suggested. However, neither procedure accommodates nonnormally distributed data. Furthermore, the M-M procedure produces biased (downward) results because it uses the Wald test, does not control the Type I error rate inflation due to multiple testing, and requires implementing multiple software packages to determine the significance region. In this article, we address these limitations by proposing solutions for determining the significance region suitable for the generalized linear model (GLM) and the generalized linear mixed model (GLMM). These proposed solutions incorporate test statistics that resolve the biased results, control the Type I error rate using Scheffé’s method, and use a single statistical software package to determine the significance region.
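For intuition, the classical linear-model version of the significance region can be sketched directly; the article's contribution is extending this idea to the GLM/GLMM while controlling the Type I error rate over the whole covariate range. The function below is a hypothetical illustration in the two-group ANCOVA setting: it fits a separate line per group and returns the covariate values where the fitted lines differ by more than a Scheffé-type critical bound.

```python
import numpy as np
from scipy import stats

def significance_region(x, y, group, alpha=0.05, grid=None):
    """Return the covariate values where two groups' fitted lines differ
    significantly, using a Scheffé-type bound over the covariate range."""
    fits = {}
    for g in (0, 1):
        Xg = np.column_stack([np.ones(np.sum(group == g)), x[group == g]])
        yg = y[group == g]
        beta = np.linalg.solve(Xg.T @ Xg, Xg.T @ yg)
        df = len(yg) - 2
        s2 = np.sum((yg - Xg @ beta) ** 2) / df
        fits[g] = (beta, s2 * np.linalg.inv(Xg.T @ Xg), df)
    (bA, covA, dfA), (bB, covB, dfB) = fits[0], fits[1]
    if grid is None:
        grid = np.linspace(x.min(), x.max(), 200)
    G = np.column_stack([np.ones_like(grid), grid])
    diff = G @ (bA - bB)                                   # fitted difference
    se = np.sqrt(np.einsum('ij,jk,ik->i', G, covA + covB, G))
    crit = np.sqrt(2 * stats.f.ppf(1 - alpha, 2, dfA + dfB))  # Scheffé bound
    return grid[np.abs(diff) / se > crit]
```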


2004 ◽  
Vol 3 (1) ◽  
pp. 1-33 ◽  
Author(s):  
Mark J. van der Laan ◽  
Sandrine Dudoit ◽  
Katherine S. Pollard

The present article proposes two step-down multiple testing procedures for asymptotic control of the family-wise error rate (FWER): the first procedure is based on maxima of test statistics (step-down maxT), while the second relies on minima of unadjusted p-values (step-down minP). A key feature of our approach is the characterization and construction of a test statistics null distribution (rather than data generating null distribution) for deriving cut-offs (i.e., rejection regions) for these test statistics and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which the step-down maxT and minP procedures asymptotically control the Type I error rate, for arbitrary data generating distributions, without the need for conditions such as subset pivotality. Inspired by this general characterization, we then propose as an explicit null distribution the asymptotic distribution of the vector of null value shifted and scaled test statistics. Step-down procedures based on consistent estimators of the null distribution are shown to also provide asymptotic control of the Type I error rate. A general bootstrap algorithm is supplied to conveniently obtain consistent estimators of the null distribution.
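Given draws from an estimated test statistics null distribution (for example, from the bootstrap algorithm mentioned above), the step-down maxT adjustment itself is short. The sketch below is an illustrative implementation under that assumption, not the authors' software; the enforced monotonicity of the adjusted p-values is what distinguishes the step-down procedure from its single-step counterpart.

```python
import numpy as np

def step_down_maxT(t_obs, T_null):
    """t_obs: (m,) observed statistics; T_null: (B, m) null draws (mean zero).
    Returns FWER-adjusted p-values in the original hypothesis order."""
    m = len(t_obs)
    order = np.argsort(-np.abs(t_obs))           # most significant first
    abs_null = np.abs(T_null[:, order])
    p_adj = np.empty(m)
    prev = 0.0
    for r in range(m):
        # maximum over the hypotheses not yet dealt with at this step
        max_stat = abs_null[:, r:].max(axis=1)
        p = np.mean(max_stat >= abs(t_obs[order[r]]))
        prev = max(prev, p)                      # enforce monotone adjustment
        p_adj[order[r]] = prev
    return p_adj
```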


2014 ◽  
Vol 53 (05) ◽  
pp. 343-343

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 in Tables 4, 5, and 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138–143). In a small number of cases, the floating-point representation of numeric values in SAS resulted in incorrect categorization, owing to representation error in computed differences. We corrected the simulation by applying the SAS round function in the calculation process, with the same seeds as before. For Table 4, the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5, the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6, the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. All changes are minor (smaller than 0.03) and do not affect the interpretation of the results or our recommendations.
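The underlying failure mode is generic floating-point representation error rather than anything specific to SAS. A minimal Python illustration of the analogous problem, using the well-known 0.1 + 0.2 case: a raw comparison against a cut-off can misclassify a value that is mathematically exactly at the boundary, and rounding both sides before comparing resolves it.

```python
a, b = 0.1 + 0.2, 0.3
print(a == b)                        # False: 0.1 + 0.2 -> 0.30000000000000004
print(a > b)                         # True, so a raw cut-off at 0.3 misclassifies
print(round(a, 10) == round(b, 10))  # True once values are rounded first
```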


2003 ◽  
Vol 22 (5) ◽  
pp. 665-675 ◽  
Author(s):  
Weichung J. Shih ◽  
Peter Ouyang ◽  
Hui Quan ◽  
Yong Lin ◽  
Bart Michiels ◽  
...  

2021 ◽  
pp. 174077452110101
Author(s):  
Jennifer Proper ◽  
John Connett ◽  
Thomas Murray

Background: Bayesian response-adaptive designs, which data-adaptively alter the allocation ratio in favor of the better performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating the type I error rate, and increasing sample size requirements. Implementations of these designs using Thompson sampling have generally assumed a simple beta-binomial probability model in the literature; however, the effect of this choice on the resulting operating characteristics, relative to other reasonable alternatives, has not been fully examined. Motivated by the Advanced REperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced REperfusion STrategies for Refractory Cardiac Arrest trial using Thompson sampling based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and for power, among other performance metrics. Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over the type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design with regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.
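The conventional implementation being contrasted here is easy to sketch. The code below is a minimal illustration of Thompson sampling under the beta-binomial model (not the trial's actual software): the probability of allocating the next subject to arm B is the posterior probability that B has the higher response rate, tempered by a tuning exponent c; the logistic-model variant would replace the beta posteriors with posterior draws from a logistic regression.

```python
import numpy as np

def thompson_allocation_prob(y_a, n_a, y_b, n_b, c=0.5, draws=100_000, seed=3):
    """Beta(1,1) priors; y successes out of n per arm. Returns P(assign to B)."""
    rng = np.random.default_rng(seed)
    theta_a = rng.beta(1 + y_a, 1 + n_a - y_a, draws)   # posterior draws, arm A
    theta_b = rng.beta(1 + y_b, 1 + n_b - y_b, draws)   # posterior draws, arm B
    p_b = np.mean(theta_b > theta_a)                    # P(theta_B > theta_A | data)
    return p_b ** c / (p_b ** c + (1 - p_b) ** c)       # tempered allocation prob

print(thompson_allocation_prob(y_a=8, n_a=20, y_b=12, n_b=20))
```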


1977 ◽  
Vol 2 (3) ◽  
pp. 187-206 ◽  
Author(s):  
Charles G. Martin ◽  
Paul A. Games

This paper presents an exposition and an empirical comparison of two potentially useful tests for homogeneity of variance. Control of the Type I error rate, P(EI), and power are investigated for three forms of the Box test and for two forms of the jackknife test with equal and unequal n's under conditions of normality and nonnormality. The Box test is shown to be robust to violations of the assumption of normality; the jackknife test is shown not to be robust. When n's are unequal, the problem of heterogeneous within-cell variances of the transformed values and unequal n's affects both the jackknife and Box tests. Previously reported suggestions for selecting subsample sizes for the Box test are shown to be inappropriate, producing an inflated P(EI). Two procedures that alleviate this problem are presented for the Box test. Use of the jackknife test with a reduced alpha is shown to provide power and control of P(EI) at approximately the same level as the Box test. Recommendations for the use of these techniques and computational examples of each are provided.
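Of the two tests compared, the jackknife test is the more compact to state. The sketch below is an illustrative two-group version under common assumptions (Miller-style pseudo-values of the log sample variance, compared with an ordinary t test); the paper's Box test, which works with log variances of random subsamples, is omitted, and the data are synthetic.

```python
import numpy as np
from scipy import stats

def jackknife_pseudo_values(x):
    """Pseudo-values n*log(s^2) - (n-1)*log(s^2_(-i)) for one group."""
    n = len(x)
    full = np.log(x.var(ddof=1))
    loo = np.array([np.log(np.delete(x, i).var(ddof=1)) for i in range(n)])
    return n * full - (n - 1) * loo

rng = np.random.default_rng(4)
g1 = rng.normal(0, 1.0, 25)
g2 = rng.normal(0, 2.0, 25)           # twice the standard deviation
t, p = stats.ttest_ind(jackknife_pseudo_values(g1), jackknife_pseudo_values(g2))
print(p)                              # small p suggests unequal variances
```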

