Violating the normality assumption may be the lesser of two evils

Author(s): Ulrich Knief, Wolfgang Forstmeier

Abstract: When data are not normally distributed, researchers are often uncertain whether it is legitimate to use tests that assume Gaussian errors, or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are robust to non-normality over a wide range of conditions, meaning that p values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also performed well in terms of power across all simulated scenarios. Parameter estimates were mostly unbiased and precise except if sample sizes were small or the distribution of the predictor was highly skewed. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise, for instance, when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Hence, newly developed statistical methods not only bring new opportunities, but they can also pose new threats to reliability. We argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and particularly difficult to check during peer review. Scientists and reviewers who are not fully aware of the risks might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data.

2018
Author(s): Ulrich Knief, Wolfgang Forstmeier

Abstract: When data are not normally distributed (e.g. skewed, zero-inflated, binomial, or count data), researchers are often uncertain whether it may be legitimate to use tests that assume Gaussian errors (e.g. regression, t-test, ANOVA, Gaussian mixed models), or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are remarkably robust to non-normality over a wide range of conditions, meaning that p values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also perform well in terms of power and they can be useful for parameter estimation but usually not for extrapolation. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise, for instance, when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Overall, we argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and difficult to check during peer review. Hence, as long as scientists and reviewers are not fully aware of the risks, science might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data in a transparent way.

Tweetable abstract: Gaussian models are remarkably robust to even dramatic violations of the normality assumption.
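As an illustration of the kind of Monte Carlo check the authors describe, the following sketch fits a Gaussian model (a two-sample t-test, i.e. OLS with one binary predictor) to strongly skewed data under a true null and records how often p falls below alpha. The exponential error distribution, group size and alpha level are illustrative assumptions, not the authors' actual simulation settings.

```python
# Minimal sketch: empirical type I error of a Gaussian test on skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_per_group, alpha = 10_000, 30, 0.05

false_positives = 0
for _ in range(n_sims):
    # Both groups come from the same skewed (exponential) distribution,
    # so every rejection is a type I error.
    a = rng.exponential(scale=1.0, size=n_per_group)
    b = rng.exponential(scale=1.0, size=n_per_group)
    false_positives += stats.ttest_ind(a, b).pvalue < alpha

# If the Gaussian model is robust, this should stay close to 0.05.
print(f"empirical type I error: {false_positives / n_sims:.4f}")
```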


2017, Vol 94, pp. 305-315
Author(s): Hannes Matuschek, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, Douglas Bates

Methodology, 2010, Vol 6 (4), pp. 147-151
Author(s): Emanuel Schmider, Matthias Ziegler, Erik Danay, Luzi Beyer, Markus Bühner

Empirical evidence on the robustness of the analysis of variance (ANOVA) against violation of the normality assumption is presented by means of Monte Carlo methods. High-quality samples from normally, rectangularly, and exponentially distributed populations are created by drawing random numbers from the respective generators, checking their goodness of fit, and allowing only the best 10% to enter the investigation. A one-way fixed-effects design with three groups of 25 values each is chosen. Effect sizes are implemented in the samples and varied over a broad range. Comparing the outcomes of the ANOVA calculations for the different types of distributions gives reason to regard the ANOVA as robust. Both the empirical type I error rate α and the empirical type II error rate β remain constant under violation. Moreover, regression analysis identifies the factor "type of distribution" as not significant in explaining the ANOVA results.
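A hedged sketch of this simulation design, run under a true null (no effect sizes implemented) and without the goodness-of-fit pre-screening of samples that the authors use: three groups of 25 values each are drawn from normal, rectangular (uniform) and exponential populations, and the empirical type I error rate of the one-way ANOVA is recorded for each distribution type.

```python
# Sketch: one-way ANOVA robustness across parent distributions under the null.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n, alpha = 10_000, 25, 0.05

samplers = {
    "normal": lambda: rng.normal(size=n),
    "rectangular": lambda: rng.uniform(size=n),
    "exponential": lambda: rng.exponential(size=n),
}

for name, draw in samplers.items():
    hits = sum(
        stats.f_oneway(draw(), draw(), draw()).pvalue < alpha
        for _ in range(n_sims)
    )
    print(f"{name:>12}: empirical type I error = {hits / n_sims:.4f}")
```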


2013, Vol 52 (4), pp. 351-359
Author(s): M. O. Scheinhardt, A. Ziegler

Summary Background: Gene, protein, or metabolite expression levels are often non-normally distributed, heavy-tailed and contain outliers. Standard statistical approaches may fail as location tests in this situation. Objectives: In three Monte Carlo simulation studies, we aimed at comparing the type I error levels and empirical power of standard location tests and three adaptive tests [O'Gorman, Can J Stat 1997; 25: 269–279; Keselman et al., Brit J Math Stat Psychol 2007; 60: 267–293; Szymczak et al., Stat Med 2013; 32: 524–537] for a wide range of distributions. Methods: We simulated two-sample scenarios using the g-and-k-distribution family to systematically vary tail length and skewness with identical and varying variability between groups. Results: All tests kept the type I error level when groups did not vary in their variability. The standard non-parametric U-test performed well in all simulated scenarios. It was outperformed by the two non-parametric adaptive methods in case of heavy tails or large skewness. Most tests did not keep the type I error level for skewed data in the case of heterogeneous variances. Conclusions: The standard U-test was a powerful and robust location test for most of the simulated scenarios except for very heavy-tailed or heavily skewed data, and it is thus recommended except in these cases. The non-parametric adaptive tests were powerful for both normal and non-normal distributions under sample variance homogeneity. But when sample variances differed, they did not keep the type I error level. The parametric adaptive test lacks power for skewed and heavy-tailed distributions.
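The g-and-k family lets a simulation vary skewness (g) and tail weight (k) through a single quantile-based transformation of standard normal draws. A minimal sketch of a two-sample type I error check in this spirit follows; the parameter values and sample sizes are illustrative assumptions, not those of the paper.

```python
# Sketch: type I error of U-test vs. t-test under a skewed, heavy-tailed
# g-and-k null scenario (both groups identical).
import numpy as np
from scipy import stats

def r_gk(rng, size, a=0.0, b=1.0, g=0.0, k=0.0, c=0.8):
    """Draw g-and-k variates by transforming standard normal quantiles."""
    z = rng.normal(size=size)
    return a + b * (1 + c * np.tanh(g * z / 2)) * z * (1 + z**2) ** k

rng = np.random.default_rng(7)
n_sims, n, alpha = 5_000, 25, 0.05

hits_u = hits_t = 0
for _ in range(n_sims):
    x = r_gk(rng, n, g=0.5, k=0.2)   # skewed (g) and heavy-tailed (k)
    y = r_gk(rng, n, g=0.5, k=0.2)   # same distribution: rejections are errors
    hits_u += stats.mannwhitneyu(x, y).pvalue < alpha
    hits_t += stats.ttest_ind(x, y).pvalue < alpha

print(f"U-test type I error: {hits_u / n_sims:.4f}")
print(f"t-test type I error: {hits_t / n_sims:.4f}")
```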


2021
Author(s): Angély Loubert, Antoine Regnault, Véronique Sébille, Jean-Benoit Hardouin

Abstract

Background: In the analysis of clinical trial endpoints, calibration of patient-reported outcome (PRO) instruments ensures that resulting "scores" represent the same quantity of the measured concept between applications. Rasch measurement theory (RMT) is a psychometric approach that guarantees algebraic separation of person and item parameter estimates, allowing formal calibration of PRO instruments. In the RMT framework, calibration is performed using the item parameter estimates obtained from a previous "calibration" study. But if calibration is based on poorly estimated item parameters (e.g., because the sample size of the calibration study was low), this may hamper the ability to detect a treatment effect, and direct estimation of item parameters from the trial data (non-calibration) may then be preferred. The objective of this simulation study was to assess the impact of calibration on the comparison of PRO results between treatment groups, using different analysis methods.

Methods: PRO results were simulated following a polytomous Rasch model for a calibration and a trial sample. Scenarios included varying sample sizes, instruments with varying numbers of items and modalities, and varying item parameter distributions. Different treatment effect sizes and distributions of the two patient samples were also explored. Comparison of treatment groups was performed using different methods based on a random effect Rasch model. Calibrated and non-calibrated approaches were compared based on type I error, power, bias, and variance of the estimates for the difference between groups.

Results: There was no impact of the calibration approach on type I error, power, bias, or dispersion of the estimates. Among other findings, mistargeting between the PRO instrument and patients from the trial sample (regarding the level of the measured concept) resulted in lower power and higher position bias than appropriate targeting.

Conclusions: Calibration of PROs in clinical trials does not compromise the ability to accurately assess a treatment effect and is essential to properly interpret PRO results. Given its important added value, calibration should thus always be performed when a PRO instrument is used as an endpoint in a clinical trial in the RMT framework.
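The study simulates PRO responses from a polytomous Rasch model; as a minimal illustration of the model family, the sketch below simulates the simpler dichotomous Rasch model, in which the response probability depends only on the difference between person ability and item difficulty. All parameter values here are illustrative assumptions.

```python
# Sketch: simulating dichotomous Rasch-model responses.
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 200, 10

theta = rng.normal(0.0, 1.0, size=n_persons)   # person abilities
b = np.linspace(-2.0, 2.0, n_items)            # item difficulties

# Rasch model: P(X=1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))
logits = theta[:, None] - b[None, :]
p = 1.0 / (1.0 + np.exp(-logits))
responses = (rng.random((n_persons, n_items)) < p).astype(int)

print("observed item means:", responses.mean(axis=0).round(2))
print("expected item means:", p.mean(axis=0).round(2))
```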


Author(s): Patrick J. Rosopa, Alice M. Brawley, Theresa P. Atkinson, Stephen A. Robertson

Preliminary tests for homoscedasticity may be unnecessary in general linear models. Based on Monte Carlo simulations, the results suggest that, when testing for differences between independent slopes, the unconditional use of weighted least squares regression and HC4 regression performs best across a wide range of conditions.
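For readers who want to reproduce the HC4 estimator: to my knowledge, statsmodels' OLS fit() exposes HC0–HC3 but not HC4, so the sketch below computes the HC4 covariance (Cribari-Neto, 2004) by hand. The data-generating process with variance growing in x is purely illustrative.

```python
# Sketch: HC4 heteroscedasticity-consistent standard errors by hand.
import numpy as np

rng = np.random.default_rng(11)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 * x)   # error variance grows with x

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_ii
p = X.shape[1]
delta = np.minimum(4.0, n * h / p)              # HC4 discount exponent

# Sandwich: (X'X)^-1 X' Omega X (X'X)^-1 with Omega_ii = e_i^2/(1-h_i)^delta_i
omega = resid**2 / (1.0 - h) ** delta
cov_hc4 = XtX_inv @ (X.T * omega) @ X @ XtX_inv
se_hc4 = np.sqrt(np.diag(cov_hc4))
print("slope:", beta[1], "HC4 SE:", se_hc4[1])
```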


1993, Vol 30 (2), pp. 246-255
Author(s): Murali Chandrashekaran, Beth A. Walker

To enhance the utility of meta-analysis as an integrative tool for marketing research, heteroscedastic MLE (HMLE), a maximum-likelihood-based estimation procedure, is proposed as a method that overcomes heteroscedasticity, a problem known to impair OLS estimates and threaten the validity of meta-analytic findings. The results of a Monte Carlo simulation experiment reveal that, under a wide range of heteroscedastic conditions, HMLE is more efficient and powerful than OLS and achieves these performance advantages without inflating type I error. Further, the relative performance of HMLE increases as heteroscedasticity becomes more severe. An empirical analysis of a meta-analytic dataset in marketing confirmed and extended these findings by illustrating how the enhanced efficiency and power of HMLE improve the ability to detect moderator variables and by demonstrating how the theoretical generalizations emerging from a meta-analysis are affected by the choice of the analytic procedure.
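A minimal sketch of a heteroscedastic maximum-likelihood fit in this spirit: the mean is linear in a predictor and the error variance is modeled as log-linear in the same predictor, with both parts estimated jointly. This is a generic illustration, not the authors' exact HMLE procedure.

```python
# Sketch: joint ML estimation of a linear mean and log-linear variance model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0, 2, size=n)
# True parameters: mean (1.0, 2.0), log-variance (0.2, 0.6).
y = 1.0 + 2.0 * x + rng.normal(scale=np.exp(0.5 * (0.2 + 0.6 * x)))

def neg_loglik(params):
    b0, b1, g0, g1 = params
    mu = b0 + b1 * x
    log_var = g0 + g1 * x           # variance model: log sigma^2 = g0 + g1*x
    # Normal negative log-likelihood up to an additive constant.
    return 0.5 * np.sum(log_var + (y - mu) ** 2 / np.exp(log_var))

res = minimize(neg_loglik, x0=np.zeros(4), method="BFGS")
print("mean params:", res.x[:2], "variance params:", res.x[2:])
```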


2017, Vol 78 (3), pp. 460-481
Author(s): Margarita Olivera-Aguilar, Samuel H. Rikoon, Oscar Gonzalez, Yasemin Kisbu-Sakarya, David P. MacKinnon

When testing a statistical mediation model, it is assumed that factorial measurement invariance holds for the mediating construct across levels of the independent variable X. The consequences of failing to address the violations of measurement invariance in mediation models are largely unknown. The purpose of the present study was to systematically examine the impact of mediator noninvariance on the Type I error rates, statistical power, and relative bias in parameter estimates of the mediated effect in the single mediator model. The results of a large simulation study indicated that, in general, the mediated effect was robust to violations of invariance in loadings. In contrast, most conditions with violations of intercept invariance exhibited severely positively biased mediated effects, Type I error rates above acceptable levels, and statistical power larger than in the invariant conditions. The implications of these results are discussed and recommendations are offered.
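As a reference point for the single mediator model examined here, this sketch estimates the mediated effect as the product of the X→M path (a) and the M→Y path controlling for X (b), with a Sobel-type standard error. The simulated data and effect sizes are illustrative assumptions.

```python
# Sketch: mediated effect a*b in the single mediator model, with Sobel test.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 500
x = rng.normal(size=n)
m = 0.4 * x + rng.normal(size=n)             # true a = 0.4
y = 0.3 * m + 0.2 * x + rng.normal(size=n)   # true b = 0.3

fit_a = sm.OLS(m, sm.add_constant(x)).fit()
fit_b = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit()

a, se_a = fit_a.params[1], fit_a.bse[1]
b, se_b = fit_b.params[1], fit_b.bse[1]      # coefficient on m

sobel_se = np.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
z = (a * b) / sobel_se
print(f"mediated effect a*b = {a * b:.3f}, Sobel z = {z:.2f}")
```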


2011, Vol 2011, pp. 1-12
Author(s): Emily A. Blood, Debbie M. Cheng

Linear mixed models (LMMs) are frequently used to analyze longitudinal data. Although these models can be used to evaluate mediation, they do not directly model causal pathways. Structural equation models (SEMs) are an alternative technique that allows explicit modeling of mediation. The goal of this paper is to evaluate the performance of LMMs relative to SEMs in the analysis of mediated longitudinal data with time-dependent predictors and mediators. We simulated mediated longitudinal data from an SEM and specified delayed effects of the predictor. A variety of model specifications were assessed, and the LMMs and SEMs were evaluated with respect to bias, coverage probability, power, and Type I error. Models evaluated in the simulation were also applied to data from an observational cohort of HIV-infected individuals. We found that when carefully constructed, the LMM adequately models mediated exposure effects that change over time in the presence of mediation, even when the data arise from an SEM.
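A hedged sketch of the "carefully constructed" LMM idea: a random-intercept model with a lagged time-varying predictor to capture a delayed exposure effect, fit with statsmodels' MixedLM. The data frame and column names are hypothetical placeholders.

```python
# Sketch: random-intercept LMM with a lagged time-dependent predictor.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n_subj, n_visits = 100, 5

df = pd.DataFrame({
    "id": np.repeat(np.arange(n_subj), n_visits),
    "visit": np.tile(np.arange(n_visits), n_subj),
    "x": rng.normal(size=n_subj * n_visits),
})
df["x_lag"] = df.groupby("id")["x"].shift(1)       # delayed exposure effect
subj_int = rng.normal(size=n_subj)                 # subject random intercepts
df["y"] = (0.5 * df["x_lag"].fillna(0)
           + subj_int[df["id"]]
           + rng.normal(size=len(df)))

# Drop first visits (no lagged value) and fit the mixed model.
model = smf.mixedlm("y ~ x_lag", data=df.dropna(), groups="id")
print(model.fit().summary())
```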


2021
Author(s): Sebastian Sosa, Cristian Pasquaretta, Ivan Puga-Gonzalez, F Stephen Dobson, Vincent A Viblanc, ...

Animal social network analyses (ASNA) have led to a foundational shift in our understanding of animal sociality that transcends the disciplinary boundaries of genetics, spatial movements, epidemiology, information transmission, evolution, species assemblages and conservation. However, some analytical protocols (i.e., permutation tests) used in ASNA have recently been called into question due to the unacceptable rates of false positives (type I errors) and false negatives (type II errors) they generate in statistical hypothesis testing. Here, we show that these rates are related to the way in which observation heterogeneity is accounted for in association indices. To solve this issue, we propose a method termed the "global index" (GI) that consists of computing the average of individual association indices per unit of time. In addition, we developed an "index of interactions" (II) that allows the use of the GI approach for directed behaviours. Our simulations show that the GI: 1) returns more reasonable rates of false negatives and false positives, with or without observational biases in the collected data, 2) can be applied to both directed and undirected behaviours, 3) can be applied to focal sampling, scan sampling or "gambit of the group" data collection protocols, and 4) can be applied to first- and second-order social network measures. Finally, we provide a method to control for non-social biological confounding factors using linear regression residuals. By providing a reliable approach for a wide range of scenarios, we propose a novel methodology in ASNA with the aim of better understanding social interactions from a mechanistic, ecological and evolutionary perspective.
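A hedged sketch of the global index (GI) idea: compute a dyad's association index separately for each sampling period and average across periods, instead of pooling all observations first. The simple ratio index and the simulated data layout here are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch: per-period association index averaged over time ("global index"
# style) versus a pooled index, for one dyad (A, B).
import numpy as np

rng = np.random.default_rng(2)
n_periods = 20

# Per period: times A and B were seen together (x), and seen apart
# (ya = A without B, yb = B without A).
x = rng.poisson(3, size=n_periods)
ya = rng.poisson(2, size=n_periods)
yb = rng.poisson(2, size=n_periods)

denom = x + ya + yb
valid = denom > 0                          # skip periods with no sightings
sri_per_period = x[valid] / denom[valid]   # simple ratio index per period
global_index = sri_per_period.mean()       # average across time units

pooled = x.sum() / denom.sum()             # all periods collapsed
print(f"GI-style average: {global_index:.3f}, pooled: {pooled:.3f}")
```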

