Analysis of Variance Models with Stochastic Group Weights

Mapping Intimacies ◽

10.31234/osf.io/t98w7 ◽

2019 ◽

Author(s):

Axel Mayer ◽

Felix Thoemmes

Keyword(s):

Analysis Of Variance ◽

Error Rate ◽

Latent Variables ◽

Structural Equation Models ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Type I Error Rate ◽

Inflated Type ◽

Group Weights

The analysis of variance (ANOVA) is still one of the most widely used statistical methods in the social sciences. This paper is about stochastic group weights in ANOVA models – a neglected aspect in the literature. Stochastic group weights are present whenever the experimenter does not determine the exact group sizes before conducting the experiment. We show that classic ANOVA tests based on estimated marginal means can have an inflated type I error rate when stochastic group weights are not taken into account, even in randomized experiments. We propose two new ways to incorporate stochastic group weights in the tests of average effects - one based on the general linear model and one based on multigroup structural equation models (SEMs). We show in simulation studies that our methods have nominal type I error rates in experiments with stochastic group weights while classic approaches show an inflated type I error rate. The SEM approach can additionally deal with heteroscedastic residual variances and latent variables. An easy-to-use software package with graphical user interface is provided.

Download Full-text

Correction: “Influence of Selection Bias on the Test Decision – A Simulation Study”

Methods of Information in Medicine ◽

10.3414/me11-01-0043e ◽

2014 ◽

Vol 53 (05) ◽

pp. 343-343

Keyword(s):

Selection Bias ◽

Simulation Study ◽

Error Rate ◽

Type I Error ◽

Block Size ◽

Error Rates ◽

Type I ◽

Type I Error Rate ◽

Representation Error ◽

Numeric Representation

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138 –143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.

Download Full-text

Assessment of Type I Error Rates and Power of Common Analysis Methods in Murine Obesity-Related Study: ‘Plasmode-Based’ Simulation (P13-011-19)

Current Developments in Nutrition ◽

10.1093/cdn/nzz036.p13-011-19 ◽

2019 ◽

Vol 3 (Supplement_1) ◽

Author(s):

Keisuke Ejima ◽

Andrew Brown ◽

Daniel Smith ◽

Ufuk Beyaztas ◽

David Allison

Keyword(s):

Sample Size ◽

Error Rate ◽

Type I Error ◽

Error Rates ◽

T Test ◽

Small Samples ◽

Type I ◽

Type I Error Rates ◽

Type I Error Rate ◽

Weight Distributions

Abstract Objectives Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved from various aspects, we focused on type I error rates and power of commonly used statistical analyses testing mean differences of two groups, using small (n ≤ 5) to moderate sample sizes. Methods We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether type I error rate could be affected by choice of statistical tests, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a ‘plasmode’ dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes. Results Type I error rates were unreasonably higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and permutation especially when sample size was small for Case 1, whereas inflation was observed only for permutation for Case 2. Deflation was noted for bootstrap with small sample. Increasing sample size mitigated inflation and deflation, except for Wilcoxon in Case 1 because heterogeneity of weight distributions between groups violated assumptions for the purposes of testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, bootstrap was underpowered with small samples as a tradeoff for maintaining type I error rates. Conclusions With small samples (n ≤ 5), bootstrap avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for other tests, sample size should be increased. Wilcoxon should be avoided because of heterogeneity of weight distributions between mutant and control mice. Funding Sources This study was supported in part by NIH and Japan Society for Promotion of Science (JSPS) KAKENHI grant.

Download Full-text

The Devil is Mainly in the Nuisance Parameters: Performance of Structural Fit Indices Under Misspecified Structural Models in SEM

10.31234/osf.io/d8tuy ◽

2021 ◽

Author(s):

Moritz Heene ◽

Michael Maraun ◽

Nadine J. Glushko ◽

Sunthud Pornprasertmanit

Keyword(s):

Error Rate ◽

Structural Component ◽

Latent Variable ◽

Structural Equation Models ◽

Type I Error ◽

Correct Rejection ◽

Type I ◽

Fit Indices ◽

Type I Error Rate ◽

Model Features

To provide researchers with a means of assessing the fit of the structural component of structural equation models, structural fit indices- modifications of the composite fit indices, RMSEA, SRMR, and CFI- have recently been developed. We investigated the performance of four of these structural fit indices- RMSEA-P, RMSEAs, SRMRs, and CFIs-, when paired with widely accepted cutoff values, in the service of detecting structural misspecification. In particular, by way of simulation study, for each of seven fit indices- 3 composite and 4 structural-, and the traditional chi-square test of perfect composite fit, we estimated the following rates: a) Type I error rate (i.e., the probability of (incorrect) rejection of a correctly specified structural component), under each of four degrees of misspecification in the measurement component; and b) Power (i.e., the probability of (correct) rejection of an incorrectly specified structural model), under each condition formed of the pairing of one of three degrees of structural misspecification with one of four degrees of measurement component misspecification. In addition to sample size, the impacts of two model features, incidental to model misspecification- number of manifest variables per latent variable and magnitude of factor loading- were investigated. The results suggested that, although the structural fit indices performed relatively better than the composite fit indices, none of the GFICV pairings was capable of delivering an entirely satisfactory Type I error rate/Power balance, [RMSEA-S,.05] failing entirely in this regard. Of the remaining pairings; a) RMSEA-P and CFIs suffered from a severely inflated Type I error rate; b) despite the fact that they were designed to pick up on structural features of candidate models, all pairings- and especially, RMSEA-P and CFIs- manifested sensitivities to model features, incidental to structural misspecification; and c) although, in the main, behaving in a sensible fashion, SRMRS was only sensitive to structural misspecification when it occurred at a relatively high degree.

Download Full-text

Adding new experimental arms to randomised clinical trials: Impact on error rates

Clinical Trials ◽

10.1177/1740774520904346 ◽

2020 ◽

Vol 17 (3) ◽

pp. 273-284 ◽

Cited By ~ 1

Author(s):

Babak Choodari-Oskooei ◽

Daniel J Bratton ◽

Melissa R Gannon ◽

Angela M Meade ◽

Matthew R Sydes ◽

...

Keyword(s):

Error Rate ◽

Type I Error ◽

Controlled Trial ◽

Late Phase ◽

Pairwise Comparison ◽

Error Rates ◽

Pairwise Comparisons ◽

Type I ◽

Test Statistics ◽

Type I Error Rate

Background: Experimental treatments pass through various stages of development. If a treatment passes through early-phase experiments, the investigators may want to assess it in a late-phase randomised controlled trial. An efficient way to do this is adding it as a new research arm to an ongoing trial while the existing research arms continue, a so-called multi-arm platform trial. The familywise type I error rate is often a key quantity of interest in any multi-arm platform trial. We set out to clarify how it should be calculated when new arms are added to a trial some time after it has started. Methods: We show how the familywise type I error rate, any-pair and all-pairs powers can be calculated when a new arm is added to a platform trial. We extend the Dunnett probability and derive analytical formulae for the correlation between the test statistics of the existing pairwise comparison and that of the newly added arm. We also verify our analytical derivation via simulations. Results: Our results indicate that the familywise type I error rate depends on the shared control arm information (i.e. individuals in continuous and binary outcomes and primary outcome events in time-to-event outcomes) from the common control arm patients and the allocation ratio. The familywise type I error rate is driven more by the number of pairwise comparisons and the corresponding (pairwise) type I error rates than by the timing of the addition of the new arms. The familywise type I error rate can be estimated using Šidák’s correction if the correlation between the test statistics of pairwise comparisons is less than 0.30. Conclusions: The findings we present in this article can be used to design trials with pre-planned deferred arms or to add new pairwise comparisons within an ongoing platform trial where control of the pairwise error rate or familywise type I error rate (for a subset of pairwise comparisons) is required.

Download Full-text

Genotyping and inflated type I error rate in genome-wide association case/control studies

BMC Bioinformatics ◽

10.1186/1471-2105-10-68 ◽

2009 ◽

Vol 10 (1) ◽

Cited By ~ 5

Author(s):

Joshua N Sampson ◽

Hongyu Zhao

Keyword(s):

Error Rate ◽

Type I Error ◽

Case Control ◽

Genome Wide Association ◽

Type I ◽

Case Control Studies ◽

Type I Error Rate ◽

Genome Wide ◽

Inflated Type

Download Full-text

Dyadic Measurement Invariance and Its Importance for Replicability in Romantic Relationship Research

10.31234/osf.io/9vcnz ◽

2019 ◽

Cited By ~ 1

Author(s):

John Kitchener Sakaluk ◽

Robyn Kilshaw ◽

Alexandra Noelle Fisher ◽

Connor Emont Leshner

Keyword(s):

Measurement Invariance ◽

Latent Variables ◽

Romantic Relationship ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Model Framework ◽

Psychological Variables ◽

Inflated Type ◽

Regression Slopes

Comparisons of group means, variances, correlations, and/or regression slopes involving psychological variables rely on an assumption of measurement invariance–that the latent variables under investigation have equivalent meaning and measurement across group. When measures are noninvariant, replicability suffers, as comparisons are either conceptually meaningless, or hindered by inflated Type I error rates. We propose that the failure to account for interdependence amongst dyad members when testing measurement invariance may be a potential source of unreplicable findings in relationship research. We develop fully dyadic versions of invariance-testing in an Actor-Partner Interdependence Model framework, and propose a Registered Report for gauging the extent of dyadic (non)invariance in romantic relationship research.

Download Full-text

Correcting the Bias Correction for the Bootstrap Confidence Interval in Mediation Analysis

10.31234/osf.io/pe4m2 ◽

2021 ◽

Author(s):

Tristan Tibbe ◽

Amanda Kay Montoya

Keyword(s):

Confidence Interval ◽

Error Rate ◽

Indirect Effect ◽

Mediation Analysis ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Bootstrap Confidence Interval ◽

Type I Error Rates ◽

Type I Error Rate

The bias-corrected bootstrap confidence interval (BCBCI) was once the method of choice for conducting inference on the indirect effect in mediation analysis due to its high power in small samples, but now it is criticized by methodologists for its inflated type I error rates. In its place, the percentile bootstrap confidence interval (PBCI), which does not adjust for bias, is currently the recommended inferential method for indirect effects. This study proposes two alternative bias-corrected bootstrap methods for creating confidence intervals around the indirect effect. Using a Monte Carlo simulation, these methods were compared to the BCBCI, PBCI, and a bias-corrected method introduced by Chen and Fritz (2021). The results showed that the methods perform on a continuum, where the BCBCI has the best balance (i.e., having closest to an equal proportion of CIs falling above and below the true effect), highest power, and highest type I error rate; the PBCI has the worst balance, lowest power, and lowest type I error rate; and the alternative bias-corrected methods fall between these two methods on all three performance criteria. An extension of the original simulation that compared the bias-corrected methods to the PBCI after controlling for type I error rate inflation suggests that the increased power of these methods might only be due to their higher type I error rates. Thus, if control over the type I error rate is desired, the PBCI is still the recommended method for use with the indirect effect. Future research should examine the performance of these methods in the presence of missing data, confounding variables, and other real-world complications to enhance the generalizability of these results.

Download Full-text

Extending the CWM approach to intraspecific trait variation: how to deal with overly optimistic standard tests?

10.1101/2021.09.09.459685 ◽

2021 ◽

Author(s):

David Zelený ◽

Kenny Helsen ◽

Yi-Nuo Lee

Keyword(s):

Error Rate ◽

Real World ◽

Type I Error ◽

Cloud Forest ◽

Error Rates ◽

Type I ◽

Trait Variation ◽

Site Specific ◽

Intraspecific Trait Variation ◽

Type I Error Rate

AbstractCommunity weighted means (CWMs) are widely used to study the relationship between community-level functional traits and environment variation. When relationships between CWM traits and environmental variables are directly assessed using linear regression or ANOVA and tested by standard parametric tests, results are prone to inflated Type I error rates, thus producing overly optimistic results. Previous research has found that this problem can be solved by permutation tests (i.e. the max test). A recent extension of this CWM approach, that allows the inclusion of intraspecific trait variation (ITV) by partitioning information in fixed, site-specific and intraspecific CWMs, has proven popular. However, this raises the question whether the same kind of Type I error rate inflation also exists for site-specific CWM or intraspecific CWM-environment relationships. Using simulated community datasets and a real-world dataset from a subtropical montane cloud forest in Taiwan, we show that site-specific CWM-environment relationships also suffer from Type I error rate inflation, and that the severity of this inflation is negatively related to the relative ITV magnitude. In contrast, for intraspecific CWM-environment relationships, standard parametric tests have the correct Type I error rate, while being somewhat conservative, with reduced statistical power. We introduce an ITV-extended version of the max test for the ITV-extended CWM approach, which can solve the inflation problem for site-specific CWM-environment relationships, and which, without considering ITV, becomes equivalent to the “original” max test used for the CWM approach. On both simulated and real-world data, we show that this new ITV-extended max test works well across the full possible magnitude of ITV. We also provide guidelines and R codes of max test solutions for each CWM type and situation. Finally, we suggest recommendations on how to handle the results of previously published studies using the CWM approach without controlling for Type I error rate inflation.

Download Full-text

WILCOXON TEST AFTER LEVENE'S TRANSFORMATION CAN HAVE AN INFLATED TYPE I ERROR RATE

Psychological Reports ◽

10.2466/pr0.94.3.1419-1420 ◽

2004 ◽

Vol 94 (3) ◽

pp. 1419

Author(s):

MARKUS NEUHAUSER

Keyword(s):

Error Rate ◽

Type I Error ◽

Wilcoxon Test ◽

Type I ◽

Type I Error Rate ◽

Inflated Type

Download Full-text

Which results of the standard test in community weighted mean approach are too optimistic?

10.1101/349589 ◽

2018 ◽

Author(s):

David Zelený

Keyword(s):

Species Composition ◽

Error Rate ◽

Type I Error ◽

Permutation Test ◽

Standard Test ◽

Type I ◽

Type I Error Rate ◽

Inflated Type ◽

Community Weighted Mean ◽

The Relationship

AbstractQuestionsCommunity weighted mean (CWM) approach analyses the relationship species attributes (like traits or Ellenberg-type indicator values) to sample attributes (environmental variables). Recently it has been shown to suffer from inflated Type I error rate if tested by standard parametric or (row-based) permutation test. Results of many published studies are likely influenced, reporting overly optimistic relationships that are in fact merely a numerical artefact. Can we evaluate results of which studies are likely to be influenced and how much?MethodsI suggest that hypotheses commonly tested by CWM approach are classified into three categories, which differ by assumption they make about the link of species composition to either species or sample attributes. I used a set of simulated and one simple real dataset to show how is the inflated Type I error rate influenced by data characteristics.ResultsFor hypotheses assuming the link of species composition to species attributes, CWM approach with standard test returns correct Type I error rate. However, for the other two categories (assuming link of species composition to sample attributes or not assuming any link) it returns inflated Type I error rate and requires alternative tests to control for it (column-based and max test, respectively). Inflation index is negatively related to the beta diversity of species composition and positively to the strength of species composition-sample attributes relationship and the number of samples in the dataset. Inflation index is also influenced by modifying species composition matrix (by transformation or removal of species). The relationship of CWM with intrinsic species attributes is a case of spurious correlation and can be tested by column-based (modified) permutation test.ConclusionsThe concept of three hypothesis categories offers a simple tool to evaluate whether given study reports correct or inflated Type I error rate, and how inflated the rate can be.

Download Full-text