The Importance of Type I Error Rates When Studying Bias in Monte Carlo Studies in Statistics

Michael Harwell

doi:10.22237/jmasm/1556670360

Summarizing Monte Carlo Results in Methodological Research: The One- and Two-Factor Fixed Effects ANOVA Cases

Journal of Educational Statistics, 1992, Vol 17 (4), pp. 315-339
doi:10.3102/10769986017004315
Cited By ~ 181

Author(s): Michael R. Harwell, Elaine N. Rubinstein, William S. Hayes, Corley C. Olds

Keyword(s): Monte Carlo, Error Rate, Fixed Effects, Type I Error, Type I, F Test, Unequal Variances, Type I Error Rate, Monte Carlo Studies, The One

Meta-analytic methods were used to integrate the findings of a sample of Monte Carlo studies of the robustness of the F test in the one- and two-factor fixed effects ANOVA models. Monte Carlo results for the Welch (1947) and Kruskal-Wallis (Kruskal & Wallis, 1952) tests were also analyzed. The meta-analytic results provided strong support for the robustness of the Type I error rate of the F test when certain assumptions were violated. The F test also showed excellent power properties. However, the Type I error rate of the F test was sensitive to unequal variances, even when sample sizes were equal. The error rate of the Welch test was insensitive to unequal variances when the population distribution was normal, but nonnormal distributions tended to inflate its error rate and to depress its power. Meta-analytic and exact statistical theory results were used to summarize the effects of assumption violations for the tests.

Summarizing Monte Carlo Results in Methodological Research

Journal of Educational Statistics, 1992, Vol 17 (4), pp. 297-313
doi:10.3102/10769986017004297
Cited By ~ 18

Author(s): Michael R. Harwell

Keyword(s): Monte Carlo, Type I Error, Error Rates, Type I, Type I Error Rates, Power Of A Test, Methodological Research, Assumption Violations, Monte Carlo Studies, Effective Use

Monte Carlo studies provide information that can assist researchers in selecting a statistical test when underlying assumptions of the test are violated. Effective use of this literature is hampered by the lack of an overarching theory to guide the interpretation of Monte Carlo studies. The problem is exacerbated by the impressionistic nature of the studies, which can lead different readers to different conclusions. These shortcomings can be addressed using meta-analytic methods to integrate the results of Monte Carlo studies. Quantitative summaries of the effects of assumption violations on the Type I error rate and power of a test can assist researchers in selecting the best test for their data. Such summaries can also be used to evaluate the validity of previously published statistical results. This article provides a methodological framework for quantitatively integrating Type I error rates and power values for Monte Carlo studies. An example is provided using Monte Carlo studies of Bartlett’s (1937) test of equality of variances. The importance of relating meta-analytic results to exact statistical theory is emphasized.
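As a small illustration of why quantitative summaries of empirical Type I error rates need care: a rejection rate estimated from a finite number of replications is a binomial proportion, so it carries its own sampling error. The sketch below is not taken from the article; the function name `rejection_rate_summary` and the normal-approximation interval are assumptions made for illustration only.

```python
import math


def rejection_rate_summary(n_rejections, n_replications, z=1.96):
    """Summarize an empirical rejection rate from a Monte Carlo study.

    Hypothetical helper: returns the estimated rate, its binomial
    standard error, and a normal-approximation interval (z=1.96 gives
    roughly 95% coverage when the number of replications is large).
    """
    p_hat = n_rejections / n_replications
    # Standard error of a binomial proportion.
    se = math.sqrt(p_hat * (1.0 - p_hat) / n_replications)
    lower = max(0.0, p_hat - z * se)
    upper = min(1.0, p_hat + z * se)
    return p_hat, se, (lower, upper)
```

For example, `rejection_rate_summary(50, 1000)` describes a study that rejected a true null hypothesis in 50 of 1000 replications; the resulting interval indicates how far the estimate could plausibly sit from the nominal level through simulation noise alone.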

Assessment of Type I Error Rates and Power of Common Analysis Methods in Murine Obesity-Related Study: ‘Plasmode-Based’ Simulation (P13-011-19)

Current Developments in Nutrition, 2019, Vol 3 (Supplement_1)
doi:10.1093/cdn/nzz036.p13-011-19

Author(s): Keisuke Ejima, Andrew Brown, Daniel Smith, Ufuk Beyaztas, David Allison

Keyword(s): Sample Size, Error Rate, Type I Error, Error Rates, T Test, Small Samples, Type I, Type I Error Rates, Type I Error Rate, Weight Distributions

Objectives: Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved from various aspects, we focused on type I error rates and power of commonly used statistical analyses testing mean differences of two groups, using small (n ≤ 5) to moderate sample sizes.

Methods: We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether type I error rate could be affected by choice of statistical tests, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a ‘plasmode’ dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes.

Results: Type I error rates were unreasonably higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and permutation especially when sample size was small for Case 1, whereas inflation was observed only for permutation for Case 2. Deflation was noted for bootstrap with small sample. Increasing sample size mitigated inflation and deflation, except for Wilcoxon in Case 1 because heterogeneity of weight distributions between groups violated assumptions for the purposes of testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, bootstrap was underpowered with small samples as a tradeoff for maintaining type I error rates.

Conclusions: With small samples (n ≤ 5), bootstrap avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for other tests, sample size should be increased. Wilcoxon should be avoided because of heterogeneity of weight distributions between mutant and control mice.

Funding Sources: This study was supported in part by NIH and Japan Society for Promotion of Science (JSPS) KAKENHI grant.
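The Case 2 procedure described in this abstract (pool both groups so the null holds, resample small groups, test, count rejections) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function names, the choice to resample with replacement, and the use of a Monte Carlo permutation test as the only test are all assumptions, and any weights passed in would be hypothetical.

```python
import random


def permutation_p_value(x, y, n_perm, rng):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of group membership
        left, right = pooled[:n_x], pooled[n_x:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (extreme + 1) / (n_perm + 1)


def plasmode_type1_rate(control, mutant, n_per_group, n_plasmodes,
                        n_perm=199, alpha=0.05, seed=0):
    """Estimate the type I error rate of the permutation test on plasmodes.

    Assumption: Case 2 is mimicked by pooling both groups into one
    distribution and drawing each plasmode group with replacement.
    """
    rng = random.Random(seed)
    null_population = list(control) + list(mutant)
    rejections = 0
    for _ in range(n_plasmodes):
        group_a = rng.choices(null_population, k=n_per_group)
        group_b = rng.choices(null_population, k=n_per_group)
        if permutation_p_value(group_a, group_b, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_plasmodes
```

Because both plasmode groups come from the same pooled distribution, every rejection is a false positive, so the returned proportion estimates the type I error rate for that sample size.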

Correcting the Bias Correction for the Bootstrap Confidence Interval in Mediation Analysis

2021
doi:10.31234/osf.io/pe4m2

Author(s): Tristan Tibbe, Amanda Kay Montoya

Keyword(s): Confidence Interval, Error Rate, Indirect Effect, Mediation Analysis, Type I Error, Error Rates, Type I, Bootstrap Confidence Interval, Type I Error Rates, Type I Error Rate

The bias-corrected bootstrap confidence interval (BCBCI) was once the method of choice for conducting inference on the indirect effect in mediation analysis due to its high power in small samples, but now it is criticized by methodologists for its inflated type I error rates. In its place, the percentile bootstrap confidence interval (PBCI), which does not adjust for bias, is currently the recommended inferential method for indirect effects. This study proposes two alternative bias-corrected bootstrap methods for creating confidence intervals around the indirect effect. Using a Monte Carlo simulation, these methods were compared to the BCBCI, PBCI, and a bias-corrected method introduced by Chen and Fritz (2021). The results showed that the methods perform on a continuum, where the BCBCI has the best balance (i.e., having closest to an equal proportion of CIs falling above and below the true effect), highest power, and highest type I error rate; the PBCI has the worst balance, lowest power, and lowest type I error rate; and the alternative bias-corrected methods fall between these two methods on all three performance criteria. An extension of the original simulation that compared the bias-corrected methods to the PBCI after controlling for type I error rate inflation suggests that the increased power of these methods might only be due to their higher type I error rates. Thus, if control over the type I error rate is desired, the PBCI is still the recommended method for use with the indirect effect. Future research should examine the performance of these methods in the presence of missing data, confounding variables, and other real-world complications to enhance the generalizability of these results.
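For readers unfamiliar with the percentile bootstrap confidence interval (PBCI) mentioned in this abstract, a minimal sketch for a single-mediator model follows. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the indirect effect is taken as the product of two ordinary least squares slopes, and the interval endpoints use a simple index-based quantile rule.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Return a*b, or None if the resample is degenerate.

    a: slope of M regressed on X.
    b: slope of Y on M, adjusting for X (two-predictor least squares).
    """
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    denom = sxx * smm - sxm ** 2
    if sxx == 0 or denom == 0:
        return None
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / denom
    return a * b


def percentile_bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect (no bias correction)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample whole cases with replacement so (x, m, y) stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        est = indirect_effect([x[i] for i in idx],
                              [m[i] for i in idx],
                              [y[i] for i in idx])
        if est is not None:
            estimates.append(est)
    estimates.sort()
    lo_idx = int((alpha / 2) * len(estimates))
    hi_idx = int((1 - alpha / 2) * len(estimates)) - 1
    return estimates[lo_idx], estimates[hi_idx]
```

Under this sketch, the indirect effect would be declared significant when the returned interval excludes zero. The bias-corrected variants discussed in the abstract shift the percentile cut points before reading off the endpoints, which this sketch deliberately does not do.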

Correction: “Influence of Selection Bias on the Test Decision – A Simulation Study”

Methods of Information in Medicine, 2014, Vol 53 (05), pp. 343-343
doi:10.3414/me11-01-0043e

Keyword(s): Selection Bias, Simulation Study, Error Rate, Type I Error, Block Size, Error Rates, Type I, Type I Error Rate, Representation Error, Numeric Representation

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138 –143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.
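The kind of numeric representation error described in this correction is not specific to SAS; any binary floating-point arithmetic can misclassify a value that sits exactly on a cut-off. The sketch below reproduces the general effect in Python with hypothetical values (it does not use the paper's data or cut-offs, and the function names and the choice of 10 decimal digits are assumptions).

```python
def exceeds_cutoff_naive(value, cutoff):
    """Categorize by direct comparison of binary floating-point values."""
    return value > cutoff


def exceeds_cutoff_rounded(value, cutoff, digits=10):
    """Round both sides first so representation noise cannot flip the result."""
    return round(value, digits) > round(cutoff, digits)


# Hypothetical example: the sum is mathematically equal to the cut-off,
# but 0.1 + 0.2 is stored as a value slightly above 0.3.
difference = 0.1 + 0.2
cutoff = 0.3

naive_result = exceeds_cutoff_naive(difference, cutoff)
rounded_result = exceeds_cutoff_rounded(difference, cutoff)
```

Here the naive comparison places the value above the cut-off even though the two quantities are equal on paper, while the rounded comparison does not. Over many simulated trials, a small number of such borderline cases is enough to shift an empirical error rate slightly.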

Comparision Mann-Whitney U Test and Students’ t Test in Terms of Type I Error Rate and Test Power: A Monte Carlo Sımulation Study

Afyon Kocatepe University Journal of Sciences and Engineering, 2014, Vol 14 (1), pp. 5-11
doi:10.5578/fmbd.7380

Author(s): Recep Bindak

Keyword(s): Monte Carlo Simulation, Monte Carlo, Simulation Study, Error Rate, Type I Error, T Test, Type I, Monte Carlo Simulation Study, Test Power, Type I Error Rate

Comparison of methods to account for autocorrelation in correlation analyses of fish data

Canadian Journal of Fisheries and Aquatic Sciences, 1998, Vol 55 (9), pp. 2127-2140
doi:10.1139/f98-104
Cited By ~ 445

Author(s): Brian J Pyper, Randall M Peterman

Keyword(s): Monte Carlo, Hypothesis Testing, Type I Error, Low Frequency, Error Rates, Type I, Testing Procedures, Type I Error Rates, Fish Recruitment, Correlation Analyses

Autocorrelation in fish recruitment and environmental data can complicate statistical inference in correlation analyses. To address this problem, researchers often either adjust hypothesis testing procedures (e.g., adjust degrees of freedom) to account for autocorrelation or remove the autocorrelation using prewhitening or first-differencing before analysis. However, the effectiveness of methods that adjust hypothesis testing procedures has not yet been fully explored quantitatively. We therefore compared several adjustment methods via Monte Carlo simulation and found that a modified version of these methods kept Type I error rates near the nominal α. In contrast, methods that remove autocorrelation control Type I error rates well but may in some circumstances increase Type II error rates (probability of failing to detect some environmental effect) and hence reduce statistical power, in comparison with adjusting the test procedure. Specifically, our Monte Carlo simulations show that prewhitening and especially first-differencing decrease power in the common situations where low-frequency (slowly changing) processes are important sources of covariation in fish recruitment or in environmental variables. Conversely, removing autocorrelation can increase power when low-frequency processes account for only some of the covariation. We therefore recommend that researchers carefully consider the importance of different time scales of variability when analyzing autocorrelated data.

A Monte Carlo Simulation Study for Kolmogorov-Smirnov Two-Sample Test Under the Precondition of Heterogeneity: Upon the Changes on the Probabilities of Statistical Power and Type I Error Rates with Respect to Skewness Measure

SSRN Electronic Journal, 2013
doi:10.2139/ssrn.2497601

Author(s): Ötüken Senger, Ali Kemal Çelik

Keyword(s): Monte Carlo Simulation, Monte Carlo, Statistical Power, Type I Error, Error Rates, Type I, Monte Carlo Simulation Study, Type I Error Rates, Sample Test, Kolmogorov Smirnov

316. Note: Graphical Monte Carlo Type I Error Rates for Multiple Comparison Procedures

Biometrics, 1971, Vol 27 (3), pp. 738
doi:10.2307/2528613
Cited By ~ 23

Author(s): Thomas J. Boardman, Donald R. Moffitt

Keyword(s): Monte Carlo, Type I Error, Error Rates, Multiple Comparison, Type I, Type I Error Rates, Multiple Comparison Procedures

Performance of Monte Carlo Permutation and Approximate Tests for Multivariate Means Comparisons With Small Sample Sizes When Parametric Assumptions are Violated

Methodology, 2009, Vol 5 (2), pp. 60-70
doi:10.1027/1614-2241.5.2.60
Cited By ~ 6

Author(s): W. Holmes Finch, Teresa Davenport

Keyword(s): Monte Carlo, Type I Error, Permutation Tests, Error Rates, Covariance Matrices, Small Sample, Type I, Permutation Testing, Sample Sizes, Type I Error Rates

Permutation testing has been suggested as an alternative to the standard F approximate tests used in multivariate analysis of variance (MANOVA). These approximate tests, such as Wilks’ Lambda and Pillai’s Trace, have been shown to perform poorly when assumptions of normally distributed dependent variables and homogeneity of group covariance matrices were violated. Because Monte Carlo permutation tests do not rely on distributional assumptions, they may be expected to work better than their approximate cousins when the data do not conform to the assumptions described above. The current simulation study compared the performance of four standard MANOVA test statistics with their Monte Carlo permutation-based counterparts under a variety of conditions with small samples, including conditions when the assumptions were met and when they were not. Results suggest that for sample sizes of 50 subjects, power is very low for all the statistics. In addition, Type I error rates for both the approximate F and Monte Carlo tests were inflated under the condition of nonnormal data and unequal covariance matrices. In general, the performance of the Monte Carlo permutation tests was slightly better in terms of Type I error rates and power when both assumptions of normality and homogeneous covariance matrices were not met. It should be noted that these simulations were based upon the case with three groups only, and as such results presented in this study can only be generalized to similar situations.
