A General Experimentwise Error Rate for Multiple Significance Tests

1976 ◽  
Vol 43 (3_suppl) ◽  
pp. 1263-1277 ◽  
Author(s):  
Stanley J. Rule

No current method of controlling error rate is appropriate for all experiments. When the error rate is set at traditional levels, a per-comparison error rate can yield too high a proportion of Type I errors, while an experimentwise error rate can be too conservative because the purpose of the experiment is not taken into account. A definition of error rate is proposed in which the number of significant outcomes needed to answer the question of interest is considered and a distinction is made between tests of fundamental importance and those of only subsidiary interest. The definition provides a systematic method of unequally allotting the error rate such that more power is provided for tests of crucial interest and for experiments in which several significant results are required.
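Rule's allocation scheme is not specified beyond the abstract, but the core idea of dividing an overall error rate unequally can be sketched as follows (a minimal illustration, assuming a simple proportional-weights split; the function name and weights are hypothetical):

```python
# Illustrative sketch (not Rule's exact procedure): splitting an overall
# error rate unequally so that tests of crucial interest get more power.
def allocate_alpha(total_alpha, weights):
    """Divide total_alpha across tests in proportion to importance weights."""
    s = sum(weights)
    return [total_alpha * w / s for w in weights]

# Two crucial tests and two subsidiary tests under an overall rate of .05:
alphas = allocate_alpha(0.05, [2, 2, 1, 1])
print(alphas)
```

The individual levels still sum to the overall rate, but the crucial tests run at twice the per-test level of the subsidiary ones.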

1993 ◽  
Vol 76 (2) ◽  
pp. 407-412 ◽  
Author(s):  
Donald W. Zimmerman

This study investigated violations of random sampling and random assignment in data analyzed by nonparametric significance tests. A computer program induced correlations within groups, as well as between groups, and performed one-sample and two-sample versions of the Mann-Whitney-Wilcoxon test on the resulting scores. Nonindependence of observations within groups spuriously inflated the probability of Type I errors and depressed the probability of Type II errors, and nonindependence between groups had the reverse effect. This outcome, which parallels the influence of nonindependence on parametric tests, can be explained by the equivalence of the Mann-Whitney-Wilcoxon test and the Student t test performed on ranks replacing the initial scores.
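The equivalence invoked above rests on a standard identity: the Mann-Whitney U statistic is a monotone function of the rank sum of one group, so a Student t test computed on the pooled ranks orders samples the same way the U test does. A minimal sketch of that identity (tie-free, made-up data):

```python
# Sketch of the rank connection behind the MWW test: U counts how often
# an x exceeds a y, and equals the rank sum of x minus its minimum
# possible value, which is why the MWW test is equivalent to a Student
# t test performed on the ranks of the pooled scores.
x = [2.1, 3.4, 1.7, 5.0]
y = [2.8, 4.1, 0.9, 3.9, 4.4]

# U by direct pair counting (ties would need half-counts; none here).
u = sum(1 for xi in x for yj in y if xi > yj)

# Rank sum of x within the pooled sample, shifted by n1(n1 + 1)/2.
pooled = sorted(x + y)
ranks_x = sum(pooled.index(v) + 1 for v in x)
u_from_ranks = ranks_x - len(x) * (len(x) + 1) // 2

print(u, u_from_ranks)  # identical by construction
```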


2011 ◽  
pp. 87-102 ◽  
Author(s):  
S. Avdasheva

The article is devoted to antitrust policy towards tacit collusion as a form of coordination that restricts competition. Competing approaches to defining tacit collusion, i.e., concerted practice and excessive monopoly pricing as an abuse of dominance, are compared. The evidence that allows one to reject the hypothesis of concerted practice as a form of tacit collusion is discussed and compared with the criteria used by Russian antitrust authorities to classify practice as concerted. The adopted standards of proof leave room for Type I errors, in which actions of sellers who had no intention of restricting competition and/or coordinating prices are qualified as illegal. Moreover, it is possible to qualify as illegal actions that do not comply with the definition of concerted practice in the law "On Protection of Competition".


2020 ◽  
Vol 43 (3) ◽  
pp. 605-616 ◽  
Author(s):  
Marc J. Lanovaz ◽  
Stéphanie Turgeon

Design quality guidelines typically recommend that multiple baseline designs include at least three demonstrations of effects. Despite its widespread adoption, this recommendation does not appear grounded in empirical evidence. The main purpose of our study was to address this issue by assessing Type I error rate and power in multiple baseline designs. First, we generated 10,000 multiple baseline graphs, applied the dual-criteria method to each tier, and computed Type I error rate and power for different numbers of tiers showing a clear change. Second, two raters categorized the tiers for 300 multiple baseline graphs to replicate our analyses using visual inspection. When multiple baseline designs had at least three tiers and two or more of these tiers showed a clear change, the Type I error rate remained adequate (< .05) while power also reached acceptable levels (> .80). In contrast, requiring all tiers to show a clear change resulted in overly stringent conclusions (i.e., unacceptably low power). Therefore, our results suggest that researchers and practitioners should carefully consider limitations in power when requiring all tiers of a multiple baseline design to show a clear change in their analyses.
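The dual-criteria method itself is only named above. As a rough sketch of the idea (simplified, with a least-squares trend line standing in for the published criterion lines, and made-up data), one projects the baseline mean line and trend line into the treatment phase and counts the treatment points falling above both:

```python
# Simplified sketch of the dual-criteria idea (the published method
# uses specific criterion tables to judge the resulting count).
def dual_criteria_count(baseline, treatment):
    n = len(baseline)
    mean_line = sum(baseline) / n
    # Least-squares trend fitted to the baseline, then extrapolated.
    xbar = (n - 1) / 2
    slope = (sum(i * y for i, y in enumerate(baseline)) - n * xbar * mean_line) \
            / sum((i - xbar) ** 2 for i in range(n))
    intercept = mean_line - slope * xbar
    count = 0
    for j, y in enumerate(treatment, start=n):
        trend = intercept + slope * j
        if y > mean_line and y > trend:  # above BOTH criterion lines
            count += 1
    return count

print(dual_criteria_count([3, 4, 3, 5, 4], [6, 7, 8, 7, 9]))   # clear change
print(dual_criteria_count([3, 4, 3, 5, 4], [4, 3, 4, 3, 4]))   # no change
```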


2004 ◽  
Vol 3 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Mark J. van der Laan ◽  
Sandrine Dudoit ◽  
Katherine S. Pollard

This article shows that any single-step or stepwise multiple testing procedure (asymptotically) controlling the family-wise error rate (FWER) can be augmented into procedures that (asymptotically) control tail probabilities for the number of false positives and the proportion of false positives among the rejected hypotheses. Specifically, given any procedure that (asymptotically) controls the FWER at level alpha, we propose simple augmentation procedures that provide (asymptotic) level-alpha control of: (i) the generalized family-wise error rate, i.e., the tail probability, gFWER(k), that the number of Type I errors exceeds a user-supplied integer k, and (ii) the tail probability, TPPFP(q), that the proportion of Type I errors among the rejected hypotheses exceeds a user-supplied value 0 < q < 1.
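The gFWER(k) augmentation described above is simple enough to sketch directly: starting from any FWER-controlling rejection set, additionally reject the k hypotheses with the next-smallest p-values (the function name and example p-values are illustrative):

```python
# Minimal sketch of the gFWER(k) augmentation: any FWER-level rejection
# set may be enlarged by the k next-most-significant hypotheses while
# keeping the chance of more than k Type I errors at level alpha.
def augment_gfwer(pvals, fwer_rejected, k):
    """pvals: dict name -> p-value; fwer_rejected: set of names."""
    remaining = sorted((p, h) for h, p in pvals.items() if h not in fwer_rejected)
    extra = {h for _, h in remaining[:k]}
    return set(fwer_rejected) | extra

pvals = {'H1': 0.001, 'H2': 0.004, 'H3': 0.020, 'H4': 0.300, 'H5': 0.600}
base = {'H1', 'H2'}  # e.g. Bonferroni rejections at alpha = .05 / 5
print(sorted(augment_gfwer(pvals, base, k=1)))  # H3 is added
```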


1988 ◽  
Vol 13 (2) ◽  
pp. 173-182 ◽  
Author(s):  
Philip H. Ramsey ◽  
Patricia P. Ramsey

The normal approximation to the binomial test, with and without a continuity correction, is evaluated in terms of control of Type I errors and power. The normal approximations are evaluated as robust for a given sample size, N, and at a given level α if the true Type I error rate never exceeds 1.5α. The uncorrected normal test is found to be less robust than the currently applied guidelines imply. The most stringent guideline currently in use, requiring σ² ≥ 10, is adequate at α = .05 but must be increased to σ² ≥ 35 at α = .01. The corrected test is shown to be robust but not conservative. Both tests show substantial power loss in comparison with the exact binomial test.
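The robustness check described above can be reproduced in outline by summing exact binomial probabilities over the rejection region implied by the normal approximation. A sketch (in Python rather than the authors' setup; n = 26 at p₀ = .5 gives σ² = 6.5, below the σ² ≥ 10 guideline):

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def true_alpha(n, p0, z_crit, corrected=False):
    """Exact Type I error rate of the normal-approximation binomial test."""
    sd = math.sqrt(n * p0 * (1 - p0))
    total = 0.0
    for k in range(n + 1):
        diff = abs(k - n * p0)
        if corrected:
            diff -= 0.5            # continuity correction
        if diff / sd > z_crit:     # two-sided rejection at nominal alpha
            total += binom_pmf(n, p0, k)
    return total

print(true_alpha(26, 0.5, 1.959964))                  # uncorrected
print(true_alpha(26, 0.5, 1.959964, corrected=True))  # corrected
```

At this sample size the uncorrected test's true rate exceeds the nominal .05 while the corrected test stays below it, consistent with the abstract's conclusion about the uncorrected test.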


1976 ◽  
Vol 1 (2) ◽  
pp. 113-125 ◽  
Author(s):  
Paul A. Games ◽  
John F. Howell

Three different methods for testing all pairs of means, ȳk − ȳk′, were contrasted under varying sample size (n) and variance conditions. With unequal n's of six and up, only the Behrens-Fisher statistic provided satisfactory control of both the familywise Type I error rate and the Type I error rate on each contrast. Satisfactory control with unequal n's of three and up is dubious even with this statistic.
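The Behrens-Fisher statistic referred to above pairs each mean difference with its own standard error and Welch-Satterthwaite degrees of freedom, which is what accommodates unequal n's and variances. A minimal sketch for one pair (made-up data; the studentized-range critical values used for the familywise comparison are omitted):

```python
import math
from statistics import mean, variance

# Welch-type (Behrens-Fisher) statistic for one pair of groups:
# each comparison gets its own standard error and its own df.
def welch_pair(a, b):
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximate degrees of freedom.
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

t, df = welch_pair([4.1, 5.0, 6.2, 5.5], [2.0, 9.5, 1.1, 8.8, 7.3, 0.4])
print(round(t, 3), round(df, 1))
```

Note that the df is data-dependent: it falls between the smaller group's n − 1 and the pooled n₁ + n₂ − 2, shrinking when the high-variance group is small.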


Methodology ◽  
2015 ◽  
Vol 11 (3) ◽  
pp. 110-115 ◽  
Author(s):  
Rand R. Wilcox ◽  
Jinxia Ma

The paper compares methods that allow both within-group and between-group heteroscedasticity when performing all pairwise comparisons of the least squares lines associated with J independent groups. The methods are based on a simple extension of results derived by Johansen (1980) and Welch (1938), in conjunction with the HC3 and HC4 estimators. The probability of one or more Type I errors is controlled using Hochberg's (1988) improvement on the Bonferroni method. Results are illustrated using data from the Well Elderly 2 study, which motivated this paper.
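Hochberg's (1988) step-up improvement on Bonferroni, used above to control the probability of one or more Type I errors, scans the ordered p-values from largest to smallest and rejects every hypothesis at or below the first one satisfying p(i) ≤ α/(m − i + 1). A minimal sketch (example p-values are made up):

```python
# Hochberg step-up procedure: strictly more powerful than Bonferroni,
# since only the smallest p-value ever faces the full alpha/m threshold.
def hochberg(pvals, alpha=0.05):
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    m = len(pvals)
    rejected = [False] * m
    for rank in range(m - 1, -1, -1):          # largest p-value first
        if pvals[order[rank]] <= alpha / (m - rank):
            for j in order[:rank + 1]:         # reject this one and all smaller
                rejected[j] = True
            break
    return rejected

print(hochberg([0.010, 0.030, 0.040, 0.500]))
print(hochberg([0.040, 0.045]))  # both rejected; Bonferroni rejects neither
```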


2014 ◽  
Vol 53 (05) ◽  
pp. 343-343

We have to report marginal changes in the empirical Type I error rates for the cut-offs 2/3 and 4/7 in Tables 4, 5, and 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, and N. Heussen (Methods Inf Med 2012; 51: 138-143). In a small number of cases, the representation of numeric values in SAS resulted in incorrect categorization, owing to a numeric representation error in computed differences. We corrected the simulation by applying the SAS round function in the calculation process, with the same seeds as before. For Table 4, the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5, the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6, the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141, “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).”, has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. All changes were smaller than 0.03 and do not affect the interpretation of the results or our recommendations.
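The representation error being corrected is the familiar floating-point one. A Python analogue (not the SAS code itself) shows how a difference that equals 2/3 on paper can land on the wrong side of a 2/3 cut-off, and how rounding before comparison repairs the check:

```python
# A value that is exactly 2/3 mathematically can exceed the stored
# 2/3 cut-off purely because of binary floating-point rounding.
a, b = 1.0, 1.0 / 3.0
diff = a - b                       # mathematically 2/3

print(diff > 2.0 / 3.0)            # True: spurious categorization
print(round(diff, 12) > round(2.0 / 3.0, 12))  # False: rounding fixes it
```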

