Is the ANOVA F-Test Robust to Variance Heterogeneity When Sample Sizes are Equal?: An Investigation via a Coefficient of Variation

1977 ◽  
Vol 14 (4) ◽  
pp. 493-498 ◽  
Author(s):  
Joanne C. Rogan ◽  
H. J. Keselman

Numerous investigations have examined the effects of variance heterogeneity on the empirical probability of a Type I error for the analysis of variance (ANOVA) F-test, and the prevailing conclusion has been that when sample sizes are equal, the ANOVA is robust to variance heterogeneity. However, Box (1954) reported a Type I error rate of .12, for a 5% nominal level, when unequal variances were paired with equal sample sizes. The present paper explored this finding, examining varying degrees and patterns of variance heterogeneity for varying sample sizes and numbers of treatment groups. The data indicate that the rate of Type I error varies as a function of the degree of variance heterogeneity and, consequently, it should not be assumed that the ANOVA F-test is always robust to variance heterogeneity when sample sizes are equal.
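The inflation Box observed can be reproduced in a small Monte Carlo sketch. This is an illustration only, not a replication of the paper's exact conditions; the function name, group count, and variance pattern are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def f_test_type1_rate(sds, n=5, reps=4000, alpha=0.05):
    """Estimate the Type I error rate of the one-way ANOVA F-test when
    all population means are equal but standard deviations differ."""
    rejections = 0
    for _ in range(reps):
        groups = [rng.normal(0.0, sd, n) for sd in sds]
        rejections += stats.f_oneway(*groups).pvalue < alpha
    return rejections / reps

# Equal group sizes (n = 5 each): homogeneous vs. heterogeneous variances
homogeneous = f_test_type1_rate([1.0, 1.0, 1.0])    # typically near the nominal .05
heterogeneous = f_test_type1_rate([1.0, 2.0, 5.0])  # typically inflated above .05
```

Even with equal sample sizes, the empirical rejection rate climbs with the degree of heterogeneity, which is the paper's central point.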

1980 ◽  
Vol 5 (4) ◽  
pp. 337-349 ◽  
Author(s):  
Philip H. Ramsey

It is noted that disagreements have arisen in the literature about the robustness of the t test in normal populations with unequal variances. Hsu's procedure is applied to determine exact Type I error rates for t. Employing fairly liberal but objective standards for assessing robustness, it is shown that the t test is not always robust to violations of the assumption of equal population variances even when sample sizes are equal. Several guidelines are suggested, including the point that to apply t at α = .05 without regard for unequal variances would require equal sample sizes of at least 15 by one of the standards considered. In many cases, especially those with unequal N's, an alternative such as Welch's procedure is recommended.
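The direction of the bias is easy to see in a quick simulation (an illustrative sketch, not the paper's exact computation): pairing the smaller sample with the larger variance makes the pooled-variance t liberal, while Welch's procedure (SciPy's `equal_var=False`) stays near the nominal level. The sample sizes and standard deviations below are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def t_type1_rates(n1, n2, sd1, sd2, reps=4000, alpha=0.05):
    """Compare Type I error rates of Student's t (pooled variance) and
    Welch's t under equal population means but unequal variances."""
    student = welch = 0
    for _ in range(reps):
        a = rng.normal(0.0, sd1, n1)
        b = rng.normal(0.0, sd2, n2)
        student += stats.ttest_ind(a, b, equal_var=True).pvalue < alpha
        welch += stats.ttest_ind(a, b, equal_var=False).pvalue < alpha
    return student / reps, welch / reps

# Smaller sample paired with the larger variance: pooled t becomes liberal
student_rate, welch_rate = t_type1_rates(n1=5, n2=15, sd1=4.0, sd2=1.0)
```

Reversing the pairing (larger sample with larger variance) makes the pooled t conservative instead, which is why Welch's procedure is the usual recommendation with unequal N's.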


1992 ◽  
Vol 17 (4) ◽  
pp. 315-339 ◽  
Author(s):  
Michael R. Harwell ◽  
Elaine N. Rubinstein ◽  
William S. Hayes ◽  
Corley C. Olds

Meta-analytic methods were used to integrate the findings of a sample of Monte Carlo studies of the robustness of the F test in the one- and two-factor fixed effects ANOVA models. Monte Carlo results for the Welch (1947) and Kruskal-Wallis (Kruskal & Wallis, 1952) tests were also analyzed. The meta-analytic results provided strong support for the robustness of the Type I error rate of the F test when certain assumptions were violated. The F test also showed excellent power properties. However, the Type I error rate of the F test was sensitive to unequal variances, even when sample sizes were equal. The error rate of the Welch test was insensitive to unequal variances when the population distribution was normal, but nonnormal distributions tended to inflate its error rate and to depress its power. Meta-analytic and exact statistical theory results were used to summarize the effects of assumption violations for the tests.
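The Kruskal-Wallis test analyzed in the meta-analysis can be exercised with a small check of its Type I error rate under a skewed population; this sketch is illustrative only, with an assumed exponential distribution and arbitrary group sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def kw_type1_rate(reps=3000, n=10, k=3, alpha=0.05):
    """Estimate the Type I error rate of the Kruskal-Wallis rank test when
    all groups are drawn from the same skewed (exponential) population."""
    rejections = 0
    for _ in range(reps):
        samples = [rng.exponential(1.0, n) for _ in range(k)]
        rejections += stats.kruskal(*samples).pvalue < alpha
    return rejections / reps

rate = kw_type1_rate()  # near the nominal .05 under identical populations
```

Because the rank test only assumes identically distributed groups under the null, skewness alone does not disturb its error rate, consistent with the rank test's role as a nonparametric alternative in the studies reviewed.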


1994 ◽  
Vol 19 (3) ◽  
pp. 275-291 ◽  
Author(s):  
James Algina ◽  
T. C. Oshima ◽  
Wen-Ying Lin

Type I error rates were estimated for three tests that compare means by using data from two independent samples: the independent samples t test, Welch’s approximate degrees of freedom test, and James’s second-order test. Type I error rates were estimated for skewed distributions, equal and unequal variances, equal and unequal sample sizes, and a range of total sample sizes. Welch’s test and James’s test have very similar Type I error rates and tend to control the Type I error rate as well or better than the independent samples t test does. The results provide guidance about the total sample sizes required for controlling Type I error rates.


1995 ◽  
Vol 77 (1) ◽  
pp. 155-159 ◽  
Author(s):  
John E. Overall ◽  
Robert S. Atlas ◽  
Janet M. Gibson

Welch (1947) proposed an adjusted t test that can be used to correct the serious bias in Type I error protection that is otherwise present when both sample sizes and variances are unequal. The implications of the Welch adjustment for power of tests for the difference between two treatments across k levels of a concomitant factor are evaluated in this article for k × 2 designs with unequal sample sizes and unequal variances. Analyses confirm that, although Type I error is uniformly controlled, power of the Welch test of significance for the main effect of treatments remains rather seriously dependent on direction of the correlation between unequal variances and unequal sample sizes. Nevertheless, considering the fact that analysis of variance is not an acceptable option in such cases, the Welch t test appears to have an important role to play in the analysis of experimental data.


2018 ◽  
Vol 34 (4) ◽  
pp. 258-261 ◽  
Author(s):  
Rand Wilcox ◽  
Travis J. Peterson ◽  
Jill L. McNitt-Gray

The paper reviews advances and insights relevant to comparing groups when the sample sizes are small. There are conditions under which conventional, routinely used techniques are satisfactory. But major insights regarding outliers, skewed distributions, and unequal variances (heteroscedasticity) make it clear that under general conditions these techniques provide poor control over the Type I error probability and can have relatively poor power. In practical terms, important differences among groups can be missed and poorly characterized. Many new and improved methods have been derived that are aimed at dealing with the shortcomings of classic methods. To provide a conceptual basis for understanding the practical importance of modern methods, the paper reviews some modern insights related to why methods based on means can perform poorly. Then some strategies for dealing with nonnormal distributions and unequal variances are described. For brevity, the focus is on comparing 2 independent groups or 2 dependent groups based on the usual difference scores. The paper concludes with comments on issues to consider when choosing from among the methods reviewed in the paper.
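One of the modern strategies alluded to here, comparing trimmed means rather than means, is available in recent SciPy (1.7+) through the `trim` argument of `ttest_ind`, which performs Yuen's trimmed t-test. The data below are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two invented samples; gross outliers contaminate the first group
a = rng.standard_normal(30)
b = rng.standard_normal(30) + 1.0
a[:3] += 10.0  # outliers that inflate the group-1 mean and variance

# Welch's t on the raw means vs. Yuen's test on 20% trimmed means;
# trimming discards the injected outliers before comparing locations
welch = stats.ttest_ind(a, b, equal_var=False)
yuen = stats.ttest_ind(a, b, equal_var=False, trim=0.2)
```

With 20% trimming, the six most extreme observations in each tail of each group are set aside, so a handful of outliers cannot dominate either the location estimate or its standard error.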


1984 ◽  
Vol 9 (3) ◽  
pp. 227-236 ◽  
Author(s):  
Rand R. Wilcox

A problem of considerable practical importance when applying multiple comparison procedures is that unequal variances can seriously affect power and the probability of a Type I error. A related problem is getting a precise indication of how many observations are required so that the length of the confidence intervals will be reasonably short. Two-stage procedures have been proposed that give an exact solution to these problems, the first stage being a pilot study for the purpose of obtaining sample estimates of the variances. However, the critical values of these procedures are available only when there are equal sample sizes in the first stage. This paper suggests a method of evaluating the experimentwise Type I error probability when the first stage has unequal sample sizes.


2021 ◽  
Author(s):  
Josue E. Rodriguez ◽  
Donald Ray Williams ◽  
Paul-Christian Bürkner

Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. An assumption in tests of moderator effects is that of a constant between-study variance across all levels of the moderator. Although it rarely receives serious thought, there can be drastic ramifications to upholding this assumption. We propose that researchers should instead assume unequal between-study variances by default. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates for the between-study variances. In two extensive simulation studies, we show that in terms of Type I error and statistical power, nearly nothing is lost by using the MELSM for moderator tests, but there can be serious costs when a mixed-effects model with equal variances is used. Most notably, in scenarios with balanced sample sizes or equal between-study variance, the Type I error and power rates are nearly identical between the mixed-effects model and the MELSM. On the other hand, with imbalanced sample sizes and unequal variances, the Type I error rate under the mixed-effects model can be grossly inflated or overly conservative, whereas the MELSM controlled the Type I error well across all scenarios. With respect to power, the MELSM had comparable or higher power than the mixed-effects model in all conditions where the latter produced valid (i.e., not inflated) Type I error rates. Altogether, our results strongly support assuming unequal between-study variances as a default strategy when testing categorical moderators.


1979 ◽  
Vol 4 (1) ◽  
pp. 14-23 ◽  
Author(s):  
Juliet Popper Shaffer

If used only when a preliminary F test yields significance, the usual multiple range procedures can be modified to increase the probability of detecting differences without changing the control of Type I error. The modification consists of a reduction in the critical value when comparing the largest and smallest means. Equivalence of modified and unmodified procedures in error control is demonstrated. The modified procedure is also compared with the alternative of using the unmodified range test without a preliminary F test, and it is shown that each has advantages over the other under some circumstances.


1982 ◽  
Vol 7 (3) ◽  
pp. 207-214 ◽  
Author(s):  
Jennifer J. Clinch ◽  
H. J. Keselman

The ANOVA, Welch, and Brown and Forsythe tests for mean equality were compared using Monte Carlo methods. The tests’ rates of Type I error and power were examined when populations were non-normal, variances were heterogeneous, and group sizes were unequal. The ANOVA F test was most affected by the assumption violations. The test proposed by Brown and Forsythe appeared, on the average, to be the “best” test statistic for testing an omnibus hypothesis of mean equality.
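Neither the Welch nor the Brown-Forsythe statistic for mean equality is exposed directly by `scipy.stats`, so the sketch below hand-rolls the Brown-Forsythe F* from the standard published formulas. It is an illustration under those formulas, not the authors' code, and the example data are invented.

```python
import numpy as np
from scipy import stats

def brown_forsythe_means(*groups):
    """Brown-Forsythe (1974) F* test for equality of means, implemented
    from the standard formulas with a Satterthwaite-type df correction."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = np.array([len(g) for g in groups])
    N = n.sum()
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    grand_mean = np.concatenate(groups).mean()
    # Numerator: weighted between-group sum of squares
    numerator = np.sum(n * (means - grand_mean) ** 2)
    # Denominator: variances weighted by (1 - n_i / N), not pooled
    denom_terms = (1.0 - n / N) * variances
    f_star = numerator / denom_terms.sum()
    # Approximate denominator degrees of freedom (Satterthwaite)
    c = denom_terms / denom_terms.sum()
    df2 = 1.0 / np.sum(c ** 2 / (n - 1))
    df1 = len(groups) - 1
    return f_star, stats.f.sf(f_star, df1, df2)

# With equal group sizes, F* coincides with the classic ANOVA F statistic
g1 = [1.1, 2.3, 0.5, 1.8]
g2 = [2.0, 3.1, 2.7, 2.2]
g3 = [0.9, 1.5, 1.1, 1.3]
f_star, p = brown_forsythe_means(g1, g2, g3)
```

Because each group's variance enters the denominator separately rather than through a pooled error term, F* does not require homogeneous variances, which is why it fared well in the Monte Carlo comparison.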

