Take Two Orthogonals and Call Me in the Morning

2003 ◽  
Vol 24 (7) ◽  
pp. 544-547 ◽  
Author(s):  
David Birnbaum

Analysis of variance (ANOVA) is used to prevent inflated type I error when hypothesis testing involves comparing more than two groups. If an ANOVA result indicates that a statistically significant difference exists somewhere among the groups, the next task is to discover exactly which combination or combinations of those groups account for the significant difference. Among the many methods available for that exploration, orthogonal contrasts and relatively simple graphs are noteworthy (Infect Control Hosp Epidemiol 2003;24:544-547).
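
As a concrete illustration of pairing an omnibus ANOVA with orthogonal contrasts, here is a minimal Python sketch on hypothetical data (the group values, labels, and contrast weights are illustrative, not taken from the article):

```python
# Minimal sketch: one-way ANOVA followed by orthogonal contrasts.
import numpy as np
from scipy import stats

groups = [
    np.array([4.1, 3.8, 5.0, 4.4, 4.7]),   # e.g., control ward (hypothetical)
    np.array([3.2, 2.9, 3.6, 3.1, 3.4]),   # intervention ward A
    np.array([3.0, 3.3, 2.8, 3.5, 3.1]),   # intervention ward B
]

# Omnibus test: is there a difference somewhere among the groups?
f_stat, p_omnibus = stats.f_oneway(*groups)
print(f"omnibus F={f_stat:.2f}, p={p_omnibus:.4f}")

# Orthogonal contrasts partition the between-group variation:
#   C1: control vs. the average of the two interventions
#   C2: intervention A vs. intervention B
contrasts = [np.array([1.0, -0.5, -0.5]), np.array([0.0, 1.0, -1.0])]

means = np.array([g.mean() for g in groups])
ns = np.array([len(g) for g in groups])
df_within = sum(len(g) - 1 for g in groups)
ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / df_within

for c in contrasts:
    estimate = c @ means                              # contrast of group means
    se = np.sqrt(ms_within * np.sum(c ** 2 / ns))     # SE from pooled MSE
    t = estimate / se
    p = 2 * stats.t.sf(abs(t), df_within)
    print(f"contrast {c}: estimate={estimate:.3f}, t={t:.2f}, p={p:.4f}")
```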

2019 ◽  
Author(s):  
Axel Mayer ◽  
Felix Thoemmes

The analysis of variance (ANOVA) is still one of the most widely used statistical methods in the social sciences. This paper is about stochastic group weights in ANOVA models, a neglected aspect in the literature. Stochastic group weights are present whenever the experimenter does not determine the exact group sizes before conducting the experiment. We show that classic ANOVA tests based on estimated marginal means can have an inflated type I error rate when stochastic group weights are not taken into account, even in randomized experiments. We propose two new ways to incorporate stochastic group weights in tests of average effects: one based on the general linear model and one based on multigroup structural equation models (SEMs). We show in simulation studies that our methods have nominal type I error rates in experiments with stochastic group weights, while classic approaches show an inflated type I error rate. The SEM approach can additionally deal with heteroscedastic residual variances and latent variables. An easy-to-use software package with a graphical user interface is provided.
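
The core distinction can be illustrated in a few lines: an average effect computed with fixed, equal subgroup weights (as estimated marginal means implicitly do) versus one weighted by the observed, stochastic subgroup proportions. The simulation below is a hypothetical sketch of that distinction only, not the authors' SEM-based procedure or software:

```python
# Sketch: equal-weight vs. observed-proportion-weighted average effects
# when subgroup sizes are stochastic. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)

n = 200
stratum = rng.integers(0, 2, size=n)          # subgroup membership is random
treat = rng.integers(0, 2, size=n)            # randomized treatment
effect_by_stratum = np.array([0.2, 0.8])      # heterogeneous true effects
y = effect_by_stratum[stratum] * treat + rng.normal(size=n)

# Conditional treatment-effect estimates within each subgroup
effects = np.array([
    y[(stratum == s) & (treat == 1)].mean() - y[(stratum == s) & (treat == 0)].mean()
    for s in (0, 1)
])

# Estimated marginal means implicitly use fixed, equal weights ...
avg_equal = effects.mean()
# ... whereas the population average effect weights by the observed
# (stochastic) subgroup proportions, whose sampling variability
# classic ANOVA tests ignore.
w_hat = np.bincount(stratum) / n
avg_weighted = w_hat @ effects

print(avg_equal, avg_weighted)
```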


2021 ◽  
pp. 096228022110082
Author(s):  
Yang Li ◽  
Wei Ma ◽  
Yichen Qin ◽  
Feifang Hu

Concerns have been expressed over the validity of statistical inference under covariate-adaptive randomization despite its extensive use in clinical trials. In the literature, the inferential properties under covariate-adaptive randomization have been studied mainly for continuous responses; in particular, it is well known that the usual two-sample t-test for treatment effect is typically conservative. This phenomenon of invalid tests has also been found for generalized linear models without adjusting for the covariates and is sometimes more worrisome due to inflated Type I error. The purpose of this study is to examine the unadjusted test for treatment effect under generalized linear models and covariate-adaptive randomization. For a large class of covariate-adaptive randomization methods, we obtain the asymptotic distribution of the test statistic under the null hypothesis and derive the conditions under which the test is conservative, valid, or anti-conservative. Several commonly used generalized linear models, such as logistic regression and Poisson regression, are discussed in detail. An adjustment method is also proposed to achieve a valid size based on the asymptotic results. Numerical studies confirm the theoretical findings and demonstrate the effectiveness of the proposed adjustment method.
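
A minimal Monte Carlo sketch of the setting, assuming statsmodels is available: treatment is assigned by stratified permuted blocks (one common covariate-adaptive scheme), the binary outcome depends only on the covariate (so the null holds), and an unadjusted logistic regression tests the treatment effect. All sample sizes and coefficients are illustrative:

```python
# Sketch: Type I error of the unadjusted logistic-regression test under
# stratified permuted-block randomization, with a null treatment effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def permuted_blocks(n, block=4):
    """Treatment assignments balanced within consecutive blocks."""
    a = np.tile([0, 0, 1, 1], n // block + 1)[:n].astype(float)
    for i in range(0, n, block):
        rng.shuffle(a[i:i + block])   # shuffles the block in place
    return a

def one_trial(n=200):
    z = rng.integers(0, 2, size=n)                # binary stratification covariate
    t = np.empty(n)
    for s in (0, 1):                              # randomize within each stratum
        idx = np.flatnonzero(z == s)
        t[idx] = permuted_blocks(len(idx))
    logit_p = -0.5 + 1.0 * z                      # outcome depends on z, not on t
    y = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(float)
    fit = sm.Logit(y, sm.add_constant(t)).fit(disp=0)
    return fit.pvalues[1]                         # unadjusted test of treatment

pvals = np.array([one_trial() for _ in range(2000)])
print("rejection rate at alpha=0.05:", (pvals < 0.05).mean())
```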


2020 ◽  
Author(s):  
Jeff Miller

Contrary to the warning of Miller (1988), Rousselet and Wilcox (2020) argued that it is better to summarize each participant’s single-trial reaction times (RTs) in a given condition with the median than with the mean when comparing the central tendencies of RT distributions across experimental conditions. They acknowledged that median RTs can produce inflated Type I error rates when conditions differ in the number of trials tested, consistent with Miller’s warning, but they showed that the bias responsible for this error rate inflation could be eliminated with a bootstrap bias correction technique. The present simulations extend their analysis by examining the power of bias-corrected medians to detect true experimental effects and by comparing this power with the power of analyses using means and regular medians. Unfortunately, although bias-corrected medians solve the problem of inflated Type I error rates, their power is lower than that of means or regular medians in many realistic situations. In addition, even when conditions do not differ in the number of trials tested, the power of tests (e.g., t-tests) is generally lower using medians rather than means as the summary measures. Thus, the present simulations demonstrate that summary means will often provide the most powerful test for differences between conditions, and they show what aspects of the RT distributions determine the size of the power advantage for means.
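
For reference, the standard bootstrap bias correction for a sample median looks as follows; this is a generic sketch, not necessarily the exact procedure of Rousselet and Wilcox (2020):

```python
# Sketch: bootstrap bias correction of the sample median for RT-like data.
import numpy as np

rng = np.random.default_rng(42)

def bias_corrected_median(rt, n_boot=2000):
    med = np.median(rt)
    boot_meds = np.median(
        rng.choice(rt, size=(n_boot, rt.size), replace=True), axis=1
    )
    bias = boot_meds.mean() - med      # estimated small-sample bias of the median
    return med - bias                  # i.e., 2*median - mean(bootstrap medians)

# Skewed, RT-like data; few trials exaggerate the median's bias.
rt = rng.lognormal(mean=6.0, sigma=0.4, size=20)
print(np.median(rt), bias_corrected_median(rt))
```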


2017 ◽  
Vol 21 (4) ◽  
pp. 321-329 ◽  
Author(s):  
Mark Rubin

Gelman and Loken (2013, 2014) proposed that when researchers base their statistical analyses on the idiosyncratic characteristics of a specific sample (e.g., a nonlinear transformation of a variable because it is skewed), they open up alternative analysis paths in potential replications of their study that are based on different samples (i.e., no transformation of the variable because it is not skewed). These alternative analysis paths count as additional (multiple) tests and, consequently, they increase the probability of making a Type I error during hypothesis testing. The present article considers this forking paths problem and evaluates four potential solutions that might be used in psychology and other fields: (a) adjusting the prespecified alpha level, (b) preregistration, (c) sensitivity analyses, and (d) abandoning the Neyman-Pearson approach. It is concluded that although preregistration and sensitivity analyses are effective solutions to p-hacking, they are ineffective against result-neutral forking paths, such as those caused by transforming data. Conversely, although adjusting the alpha level cannot address p-hacking, it can be effective for result-neutral forking paths. Finally, abandoning the Neyman-Pearson approach represents a further solution to the forking paths problem.
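
Option (a) can be made concrete with a standard correction, assuming the researcher can count (or bound) the number k of plausible analysis paths; the Bonferroni and Šidák forms below are common choices rather than a prescription from the article:

```python
# Sketch: adjusting the prespecified alpha for k counted forking paths.
def adjusted_alpha(alpha=0.05, k=4, method="bonferroni"):
    if method == "bonferroni":
        return alpha / k                      # simple, conservative
    if method == "sidak":
        return 1 - (1 - alpha) ** (1 / k)     # exact under independence
    raise ValueError(method)

# e.g., two defensible transformations x two outlier rules = 4 paths
print(adjusted_alpha(k=4))                    # 0.0125
print(adjusted_alpha(k=4, method="sidak"))    # ~0.0127
```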


2011 ◽  
Vol 55 (1) ◽  
pp. 366-374 ◽  
Author(s):  
Robin L. Young ◽  
Janice Weinberg ◽  
Verónica Vieira ◽  
Al Ozonoff ◽  
Thomas F. Webster

1998 ◽  
Vol 55 (9) ◽  
pp. 2127-2140 ◽  
Author(s):  
Brian J Pyper ◽  
Randall M Peterman

Autocorrelation in fish recruitment and environmental data can complicate statistical inference in correlation analyses. To address this problem, researchers often either adjust hypothesis testing procedures (e.g., adjust degrees of freedom) to account for autocorrelation or remove the autocorrelation using prewhitening or first-differencing before analysis. However, the effectiveness of methods that adjust hypothesis testing procedures has not yet been fully explored quantitatively. We therefore compared several adjustment methods via Monte Carlo simulation and found that a modified version of these methods kept Type I error rates near nominal levels. In contrast, methods that remove autocorrelation control Type I error rates well but may in some circumstances increase Type II error rates (the probability of failing to detect some environmental effect) and hence reduce statistical power, in comparison with adjusting the test procedure. Specifically, our Monte Carlo simulations show that prewhitening and especially first-differencing decrease power in the common situations where low-frequency (slowly changing) processes are important sources of covariation in fish recruitment or in environmental variables. Conversely, removing autocorrelation can increase power when low-frequency processes account for only some of the covariation. We therefore recommend that researchers carefully consider the importance of different time scales of variability when analyzing autocorrelated data.
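
Both strategies are easy to sketch for two series x and y. The effective-sample-size correction below follows the Chelton-style adjustment this literature builds on, 1/N* = 1/N + (2/N) Σ r_xx(j) r_yy(j); the truncation lag and simulated AR(1) data are illustrative:

```python
# Sketch: (1) correlation test with autocorrelation-adjusted degrees of
# freedom vs. (2) removing autocorrelation by first-differencing.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def autocorr(x, lag):
    x = x - x.mean()
    return (x[:-lag] * x[lag:]).sum() / (x ** 2).sum()

def effective_n(x, y, max_lag=None):
    n = len(x)
    max_lag = max_lag or n // 5                 # illustrative truncation
    s = sum(autocorr(x, j) * autocorr(y, j) for j in range(1, max_lag + 1))
    return 1.0 / (1.0 / n + (2.0 / n) * s)      # Chelton-style N*

def corr_test(x, y, n_eff):
    r = np.corrcoef(x, y)[0, 1]
    df = n_eff - 2                              # adjusted degrees of freedom
    t = r * np.sqrt(df / (1 - r ** 2))
    return r, 2 * stats.t.sf(abs(t), df)

# Two AR(1) series with no true cross-correlation
n = 60
x = np.empty(n); y = np.empty(n); x[0] = y[0] = 0.0
for i in range(1, n):
    x[i] = 0.7 * x[i - 1] + rng.normal()
    y[i] = 0.7 * y[i - 1] + rng.normal()

print("adjusted-df test:", corr_test(x, y, effective_n(x, y)))
print("first-differenced:", stats.pearsonr(np.diff(x), np.diff(y)))
```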


1982 ◽  
Vol 7 (3) ◽  
pp. 207-214 ◽  
Author(s):  
Jennifer J. Clinch ◽  
H. J. Keselman

The ANOVA, Welch, and Brown and Forsythe tests for mean equality were compared using Monte Carlo methods. The tests’ rates of Type I error and power were examined when populations were non-normal, variances were heterogeneous, and group sizes were unequal. The ANOVA F test was most affected by the assumption violations. The test proposed by Brown and Forsythe appeared, on the average, to be the “best” test statistic for testing an omnibus hypothesis of mean equality.
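
Assuming a recent statsmodels (whose anova_oneway exposes all three tests), a minimal sketch of the comparison on heteroscedastic, unequal-n samples:

```python
# Sketch: classic F vs. Welch vs. Brown-Forsythe on unequal-variance,
# unequal-n groups with equal true means. Data are illustrative.
import numpy as np
from statsmodels.stats.oneway import anova_oneway

rng = np.random.default_rng(3)

samples = [
    rng.normal(10, 1, size=10),
    rng.normal(10, 3, size=20),
    rng.normal(10, 6, size=40),
]

for label, use_var in [("classic F", "equal"),
                       ("Welch", "unequal"),
                       ("Brown-Forsythe", "bf")]:
    res = anova_oneway(samples, use_var=use_var)
    print(f"{label:15s} stat={res.statistic:.3f} p={res.pvalue:.4f}")
```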


1997 ◽  
Vol 85 (1) ◽  
pp. 193-194
Author(s):  
Peter Hassmén

Violation of the sphericity assumption in repeated-measures analysis of variance can lead to positively biased tests, i.e., the likelihood of a Type I error exceeds the alpha level set by the user. Two widely applicable solutions exist, the use of an epsilon-corrected univariate analysis of variance or the use of a multivariate analysis of variance. It is argued that the latter method offers advantages over the former.
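
A minimal sketch of the epsilon-corrected univariate route, estimating the Greenhouse-Geisser epsilon from the sample covariance matrix and shrinking the F-test's degrees of freedom (data are illustrative; the multivariate route would instead fit a MANOVA, which does not assume sphericity):

```python
# Sketch: Greenhouse-Geisser epsilon for a repeated-measures design.
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(size=(12, 4))        # 12 participants x 4 conditions (hypothetical)
n, k = data.shape

S = np.cov(data, rowvar=False)
# Double-center the covariance matrix, then apply the GG formula:
# eps = trace(S_dc)^2 / ((k-1) * sum of squared elements of S_dc)
row = S.mean(axis=0)
S_dc = S - row[None, :] - row[:, None] + S.mean()
eps = np.trace(S_dc) ** 2 / ((k - 1) * np.sum(S_dc ** 2))

# Shrink the univariate F-test's degrees of freedom by epsilon
df1, df2 = (k - 1) * eps, (k - 1) * (n - 1) * eps
print(f"GG epsilon = {eps:.3f}; corrected dfs = ({df1:.2f}, {df2:.2f})")
```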

