Evaluating the Efficacy of Conditional Analysis of Variance under Heterogeneity and Non-Normality

A simulation study was conducted to examine the efficacy of conditional analysis of variance (ANOVA) methods where the initial homogeneity of variance screening leads to the choice between the ANOVA F test and robust ANOVA methods. Type I error control and statistical power were investigated under various conditions.

Download Full-text

Empirical Comparison of Tests for One-Factor ANOVA Under Heterogeneity and Non-Normality: A Monte Carlo Study

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1604190000 ◽

2020 ◽

Vol 18 (2) ◽

pp. 2-30

Author(s):

Diep Nguyen ◽

Eunsook Kim ◽

Yan Wang ◽

Thanh Vinh Pham ◽

Yi-Hsin Chen ◽

...

Keyword(s):

Statistical Power ◽

Error Control ◽

Type I Error ◽

Monte Carlo Study ◽

Type I ◽

Omnibus Test ◽

Homogeneity Of Variance ◽

Mean Equality ◽

Comparison Of Tests ◽

Structured Means

Although the Analysis of Variance (ANOVA) F test is one of the most popular statistical tools to compare group means, it is sensitive to violations of the homogeneity of variance (HOV) assumption. This simulation study examines the performance of thirteen tests in one-factor ANOVA models in terms of their Type I error rate and statistical power under numerous (82,080) conditions. The results show that when HOV was satisfied, the ANOVA F or the Brown-Forsythe test outperformed the other methods in terms of both Type I error control and statistical power even under non-normality. When HOV was violated, the Structured Means Modeling (SMM) with Bartlett or SMM with Maximum Likelihood was strongly recommended for the omnibus test of group mean equality.

Download Full-text

Comparison of Means: An F Test Followed by a Modified Multiple Range Procedure

Journal of Educational Statistics ◽

10.3102/10769986004001014 ◽

1979 ◽

Vol 4 (1) ◽

pp. 14-23 ◽

Cited By ~ 9

Author(s):

Juliet Popper Shaffer

Keyword(s):

Error Control ◽

Type I Error ◽

Critical Value ◽

The Other ◽

Type I ◽

F Test ◽

Range Test

If used only when a preliminary F test yields significance, the usual multiple range procedures can be modified to increase the probability of detecting differences without changing the control of Type I error. The modification consists of a reduction in the critical value when comparing the largest and smallest means. Equivalence of modified and unmodified procedures in error control is demonstrated. The modified procedure is also compared with the alternative of using the unmodified range test without a preliminary F test, and it is shown that each has advantages over the other under some circumstances.

Download Full-text

Parametric Alternatives to the Analysis of Variance

Journal of Educational Statistics ◽

10.3102/10769986007003207 ◽

1982 ◽

Vol 7 (3) ◽

pp. 207-214 ◽

Cited By ~ 22

Author(s):

Jennifer J. Clinch ◽

H. J. Keselman

Keyword(s):

Monte Carlo ◽

Analysis Of Variance ◽

Monte Carlo Methods ◽

Type I Error ◽

Type I ◽

Test Statistic ◽

F Test ◽

Assumption Violations ◽

Mean Equality

The ANOVA, Welch, and Brown and Forsyth tests for mean equality were compared using Monte Carlo methods. The tests’ rates of Type I error and power were examined when populations were non-normal, variances were heterogeneous, and group sizes were unequal. The ANOVA F test was most affected by the assumption violations. The test proposed by Brown and Forsyth appeared, on the average, to be the “best” test statistic for testing an omnibus hypothesis of mean equality.

Download Full-text

Are Reaction Time Transformations Really Beneficial?

10.31234/osf.io/9ksa6 ◽

2019 ◽

Cited By ~ 6

Author(s):

Pele Schramm ◽

Jeffrey Rouder

Keyword(s):

Statistical Power ◽

Error Control ◽

Type I Error ◽

Response Times ◽

Error Rates ◽

Type I ◽

Drift Diffusion Model ◽

Reciprocal Transformation ◽

The Common ◽

Time Transformations

We investigate whether or not the common practice of transforming response times prior to conventional analyses of central tendency yields any notable benefits. We generate data from a realistic single-bound drift diffusion model with parameters informed by several different typical experiments in cognition. We then examine the effects of log and reciprocal transformation on expected effect size, statistical power, and Type I error rates for conventional two-sample t-tests. One of the key elements of our setup is that RTs have a lower bound, called the shift, which is well above 0. We closely examine the effect that different shifts have for the analyses. We conclude that logarithm and reciprocal transformation offer no gain in power or Type I error control. In some typical cases, reciprocal transformations are detrimental as they lead to a lowering of power.

Download Full-text

Detecting Unit of Analysis Problems in Nested Designs: Statistical Power and Type I Error Rates of the F Test for Groups-within-Treatments Effects

Educational and Psychological Measurement ◽

10.1177/0013164496056002003 ◽

1996 ◽

Vol 56 (2) ◽

pp. 215-231 ◽

Cited By ~ 9

Author(s):

Jeffrey D. Kromrey ◽

Wendy B. Dickinson

Keyword(s):

Statistical Power ◽

Type I Error ◽

Error Rates ◽

Type I ◽

F Test ◽

Unit Of Analysis ◽

Type I Error Rates ◽

Nested Designs

Download Full-text

Taking Parametric Assumptions Seriously Arguments for the Use of Welch’s F-test instead of the Classical F-test in One-way ANOVA (in press for the International Review of Social Psychology)

10.31234/osf.io/wnezg ◽

2018 ◽

Cited By ~ 1

Author(s):

Marie Delacre ◽

Daniel Lakens ◽

Youri Mora ◽

Christophe Leys

Keyword(s):

Statistical Power ◽

Type I Error ◽

Type I ◽

International Review ◽

F Test ◽

Statistical Inferences ◽

Student’S T ◽

One Way Anova ◽

Practical Recommendations ◽

Student’S T Test

Student's t-test and classical F-test ANOVA rely on the assumptions that two or more samples are independent, and that independent and identically distributed residuals are normal and have equal variances between groups. We focus on the assumptions of normality and equality of variances, and argue that these assumptions are often unrealistic in the field of psychology. We underline the current lack of attention to these assumptions through an analysis of researchers' practices. Through Monte Carlo simulations we illustrate the consequences of performing the classic parametric F-test for ANOVA when the test assumptions are not met on the Type I error rate and statistical power. Under realistic deviations from the assumption of equal variances the classic F-test can yield severely biased results and lead to invalid statistical inferences. We examine two common alternatives to the F-test, namely the Welch's ANOVA (W-test) and the Brown-Forsythe test (F*-test). Our simulations show that under a range of realistic scenariosthe W-test is a better alternative and we therefore recommend using the W-test by default when comparing means. We provide a detailed example explaining how to perform the W-test in SPSS and R. We summarize our conclusions in practical recommendations that researchers can use to improve their statistical practices.

Download Full-text

Coal-Miner: a coalescent-based method for GWA studies of quantitative traits with complex evolutionary origins

10.1101/132951 ◽

2017 ◽

Author(s):

Hussein A. Hejase ◽

Natalie Vande Pol ◽

Gregory M. Bonito ◽

Patrick P. Edger ◽

Kevin J. Liu

Keyword(s):

Statistical Power ◽

Error Control ◽

Quantitative Traits ◽

Type I Error ◽

Type I ◽

Phenotypic Data ◽

Coal Miner ◽

Evolutionary Origins ◽

Candidate Loci ◽

Gwa Studies

AbstractAssociation mapping (AM) methods are used in genome-wide association (GWA) studies to test for statistically significant associations between genotypic and phenotypic data. The genotypic and phenotypic data share common evolutionary origins – namely, the evolutionary history of sampled organisms – introducing covariance which must be distinguished from the covariance due to biological function that is of primary interest in GWA studies. A variety of methods have been introduced to perform AM while accounting for sample relatedness. However, the state of the art predominantly utilizes the simplifying assumption that sample relatedness is effectively fixed across the genome. In contrast, population genetic theory and empirical studies have shown that sample relatedness can vary greatly across different loci within a genome; this phenomena – referred to as local genealogical variation – is commonly encountered in many genomic datasets. New AM methods are needed to better account for local variation in sample relatedness within genomes.We address this gap by introducing Coal-Miner, a new statistical AM method. The Coal-Miner algorithm takes the form of a methodological pipeline. The initial stages of Coal-Miner seek to detect candidate loci, or loci which contain putatively causal markers. Subsequent stages of Coal-Miner perform test for association using a linear mixed model with multiple effects which account for sample relatedness locally within candidate loci and globally across the entire genome.Using synthetic and empirical datasets, we compare the statistical power and type I error control of Coal-Miner against state-of-theart AM methods. The simulation conditions reflect a variety of genomic architectures for complex traits and incorporate a range of evolutionary scenarios, each with different evolutionary processes that can generate local genealogical variation. The empirical benchmarks include a large-scale dataset that appeared in a recent high-profile publication. Across the datasets in our study, we find that Coal-Miner consistently offers comparable or typically better statistical power and type I error control compared to the state-of-art methods.CCS CONCEPTSApplied computing → Computational genomics; Computational biology; Molecular sequence analysis; Molecular evolution; Computational genomics; Systems biology; Bioinformatics; Population genetics;ACM Reference formatHussein A. Hejase, Natalie Vande Pol, Gregory M. Bonito, Patrick P. Edger, and Kevin J. Liu. 2017. Coal-Miner: a coalescent-based method for GWA studies of quantitative traits with complex evolutionary origins. In Proceedings of ACM BCB, Boston, MA, 2017 (BCB), 10 pages. DOI: 10.475/123 4

Download Full-text

Assessing Person Fit With the Information Matrix Test

Methodology ◽

10.1027/1614-2241/a000085 ◽

2015 ◽

Vol 11 (1) ◽

pp. 3-12 ◽

Cited By ~ 2

Author(s):

Jochen Ranger ◽

Jörg-Tobias Kuhn

Keyword(s):

Simulation Study ◽

Type I Error ◽

Information Matrix ◽

Small Samples ◽

Type I ◽

Person Fit ◽

Power Of The Test ◽

Order Expansion ◽

Trait Stability ◽

Information Matrix Test

In this manuscript, a new approach to the analysis of person fit is presented that is based on the information matrix test of White (1982) . This test can be interpreted as a test of trait stability during the measurement situation. The test follows approximately a χ2-distribution. In small samples, the approximation can be improved by a higher-order expansion. The performance of the test is explored in a simulation study. This simulation study suggests that the test adheres to the nominal Type-I error rate well, although it tends to be conservative in very short scales. The power of the test is compared to the power of four alternative tests of person fit. This comparison corroborates that the power of the information matrix test is similar to the power of the alternative tests. Advantages and areas of application of the information matrix test are discussed.

Download Full-text

How to Detect Publication Bias in Psychological Research

Zeitschrift für Psychologie ◽

10.1027/2151-2604/a000386 ◽

2019 ◽

Vol 227 (4) ◽

pp. 261-279 ◽

Cited By ~ 2

Author(s):

Frank Renkewitz ◽

Melanie Keiner

Keyword(s):

Publication Bias ◽

Effect Size ◽

Statistical Power ◽

Type I Error ◽

Psychological Research ◽

Type I ◽

True Effect Size ◽

Questionable Research Practices ◽

True Effect ◽

Meta Analyses

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.

Download Full-text

Correction: “Influence of Selection Bias on the Test Decision – A Simulation Study”

Methods of Information in Medicine ◽

10.3414/me11-01-0043e ◽

2014 ◽

Vol 53 (05) ◽

pp. 343-343

Keyword(s):

Selection Bias ◽

Simulation Study ◽

Error Rate ◽

Type I Error ◽

Block Size ◽

Error Rates ◽

Type I ◽

Type I Error Rate ◽

Representation Error ◽

Numeric Representation

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138 –143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.

Download Full-text