Permutation-based methods for mediation analysis in studies with small sample sizes

PeerJ ◽

10.7717/peerj.8246 ◽

2020 ◽

Vol 8 ◽

pp. e8246

Author(s):

Miranda E. Kroehl ◽

Sharon Lutz ◽

Brandie D. Wagner

Keyword(s):

Indirect Effect ◽

Mediation Analysis ◽

Type I Error ◽

Permutation Test ◽

Error Rates ◽

Small Sample ◽

Type I ◽

Permutation Testing ◽

Sample Sizes ◽

Small Sample Sizes

Background Mediation analysis can be used to evaluate the effect of an exposure on an outcome acting through an intermediate variable or mediator. For studies with small sample sizes, permutation testing may be useful in evaluating the indirect effect (i.e., the effect of exposure on the outcome through the mediator) while maintaining the appropriate type I error rate. For mediation analysis in studies with small sample sizes, existing permutation testing methods permute the residuals under the full or alternative model, but have not been evaluated under situations where covariates are included. In this article, we consider and evaluate two additional permutation approaches for testing the indirect effect in mediation analysis based on permutating the residuals under the reduced or null model which allows for the inclusion of covariates. Methods Simulation studies were used to empirically evaluate the behavior of these two additional approaches: (1) the permutation test of the Indirect Effect under Reduced Models (IERM) and (2) the Permutation Supremum test under Reduced Models (PSRM). The performance of these methods was compared to the standard permutation approach for mediation analysis, the permutation test of the Indirect Effect under Full Models (IEFM). We evaluated the type 1 error rates and power of these methods in the presence of covariates since mediation analysis assumes no unmeasured confounders of the exposure–mediator–outcome relationships. Results The proposed PSRM approach maintained type I error rates below nominal levels under all conditions, while the proposed IERM approach exhibited grossly inflated type I rates in many conditions and the standard IEFM exhibited inflated type I error rates under a small number of conditions. Power did not differ substantially between the proposed PSRM approach and the standard IEFM approach. Conclusions The proposed PSRM approach is recommended over the existing IEFM approach for mediation analysis in studies with small sample sizes.

Download Full-text

Performance of Monte Carlo Permutation and Approximate Tests for Multivariate Means Comparisons With Small Sample Sizes When Parametric Assumptions are Violated

Methodology ◽

10.1027/1614-2241.5.2.60 ◽

2009 ◽

Vol 5 (2) ◽

pp. 60-70 ◽

Cited By ~ 6

Author(s):

W. Holmes Finch ◽

Teresa Davenport

Keyword(s):

Monte Carlo ◽

Type I Error ◽

Permutation Tests ◽

Error Rates ◽

Covariance Matrices ◽

Small Sample ◽

Type I ◽

Permutation Testing ◽

Sample Sizes ◽

Type I Error Rates

Permutation testing has been suggested as an alternative to the standard F approximate tests used in multivariate analysis of variance (MANOVA). These approximate tests, such as Wilks’ Lambda and Pillai’s Trace, have been shown to perform poorly when assumptions of normally distributed dependent variables and homogeneity of group covariance matrices were violated. Because Monte Carlo permutation tests do not rely on distributional assumptions, they may be expected to work better than their approximate cousins when the data do not conform to the assumptions described above. The current simulation study compared the performance of four standard MANOVA test statistics with their Monte Carlo permutation-based counterparts under a variety of conditions with small samples, including conditions when the assumptions were met and when they were not. Results suggest that for sample sizes of 50 subjects, power is very low for all the statistics. In addition, Type I error rates for both the approximate F and Monte Carlo tests were inflated under the condition of nonnormal data and unequal covariance matrices. In general, the performance of the Monte Carlo permutation tests was slightly better in terms of Type I error rates and power when both assumptions of normality and homogeneous covariance matrices were not met. It should be noted that these simulations were based upon the case with three groups only, and as such results presented in this study can only be generalized to similar situations.

Download Full-text

A Monte Carlo Comparison of Seven ε-Adjustment Procedures in Repeated Measures Designs With Small Sample Sizes

Journal of Educational Statistics ◽

10.3102/10769986019001057 ◽

1994 ◽

Vol 19 (1) ◽

pp. 57-71 ◽

Cited By ~ 18

Author(s):

Stephen M. Quintana ◽

Scott E. Maxwell

Keyword(s):

Repeated Measures ◽

Type I Error ◽

Error Rates ◽

Small Sample ◽

Small Samples ◽

Type I ◽

Sample Sizes ◽

Type I Error Rates ◽

Repeated Measures Designs ◽

Small Sample Sizes

The purpose of this study was to evaluate seven univariate procedures for testing omnibus null hypotheses for data gathered from repeated measures designs. Five alternate approaches are compared to the two more traditional adjustment procedures (Geisser and Greenhouse’s ε̂ and Huynh and Feldt’s ε̃), neither of which may be entirely adequate when sample sizes are small and the number of levels of the repeated factors is large. Empirical Type I error rates and power levels were obtained by simulation for conditions where small samples occur in combination with many levels of the repeated factor. Results suggested that alternate univariate approaches were improvements to the traditional approaches. One alternate approach in particular was found to be most effective in controlling Type I error rates without unduly sacrificing power.

Download Full-text

Differences of Type I error rates for ANOVA and Multilevel-Linear-Models using SAS and SPSS for repeated measures designs

Meta-Psychology ◽

10.15626/mp.2018.898 ◽

2019 ◽

Vol 3 ◽

Author(s):

Nicolas Haverkamp ◽

André Beauducel

Keyword(s):

Repeated Measures ◽

Linear Models ◽

Type I Error ◽

Error Rates ◽

Small Sample ◽

Small Samples ◽

Type I ◽

Sample Sizes ◽

Type I Error Rates ◽

Multilevel Linear Models

To derive recommendations on how to analyze longitudinal data, we examined Type I error rates of Multilevel Linear Models (MLM) and repeated measures Analysis of Variance (rANOVA) using SAS and SPSS. We performed a simulation with the following specifications: To explore the effects of high numbers of measurement occasions and small sample sizes on Type I error, measurement occasions of m = 9 and 12 were investigated as well as sample sizes of n = 15, 20, 25 and 30. Effects of non-sphericity in the population on Type I error were also inspected: 5,000 random samples were drawn from two populations containing neither a within-subject nor a between-group effect. They were analyzed including the most common options to correct rANOVA and MLM-results: The Huynh-Feldt-correction for rANOVA (rANOVA-HF) and the Kenward-Roger-correction for MLM (MLM-KR), which could help to correct progressive bias of MLM with an unstructured covariance matrix (MLM-UN). Moreover, uncorrected rANOVA and MLM assuming a compound symmetry covariance structure (MLM-CS) were also taken into account. The results showed a progressive bias for MLM-UN for small samples which was stronger in SPSS than in SAS. Moreover, an appropriate bias correction for Type I error via rANOVA-HF and an insufficient correction by MLM-UN-KR for n < 30 were found. These findings suggest MLM-CS or rANOVA if sphericity holds and a correction of a violation via rANOVA-HF. If an analysis requires MLM, SPSS yields more accurate Type I error rates for MLM-CS and SAS yields more accurate Type I error rates for MLM-UN.

Download Full-text

Correcting the Bias Correction for the Bootstrap Confidence Interval in Mediation Analysis

10.31234/osf.io/pe4m2 ◽

2021 ◽

Author(s):

Tristan Tibbe ◽

Amanda Kay Montoya

Keyword(s):

Confidence Interval ◽

Error Rate ◽

Indirect Effect ◽

Mediation Analysis ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Bootstrap Confidence Interval ◽

Type I Error Rates ◽

Type I Error Rate

The bias-corrected bootstrap confidence interval (BCBCI) was once the method of choice for conducting inference on the indirect effect in mediation analysis due to its high power in small samples, but now it is criticized by methodologists for its inflated type I error rates. In its place, the percentile bootstrap confidence interval (PBCI), which does not adjust for bias, is currently the recommended inferential method for indirect effects. This study proposes two alternative bias-corrected bootstrap methods for creating confidence intervals around the indirect effect. Using a Monte Carlo simulation, these methods were compared to the BCBCI, PBCI, and a bias-corrected method introduced by Chen and Fritz (2021). The results showed that the methods perform on a continuum, where the BCBCI has the best balance (i.e., having closest to an equal proportion of CIs falling above and below the true effect), highest power, and highest type I error rate; the PBCI has the worst balance, lowest power, and lowest type I error rate; and the alternative bias-corrected methods fall between these two methods on all three performance criteria. An extension of the original simulation that compared the bias-corrected methods to the PBCI after controlling for type I error rate inflation suggests that the increased power of these methods might only be due to their higher type I error rates. Thus, if control over the type I error rate is desired, the PBCI is still the recommended method for use with the indirect effect. Future research should examine the performance of these methods in the presence of missing data, confounding variables, and other real-world complications to enhance the generalizability of these results.

Download Full-text

Comparing Alternative Corrections for Bias in the Bias-Corrected Bootstrap Test of Mediation

Evaluation & the Health Professions ◽

10.1177/01632787211024356 ◽

2021 ◽

pp. 016327872110243

Author(s):

Donna Chen ◽

Matthew S. Fritz

Keyword(s):

Effect Size ◽

Statistical Power ◽

Type I Error ◽

Small Sample ◽

Medium Effect ◽

Type I ◽

Sample Sizes ◽

Bootstrap Test ◽

Small Effect Size ◽

Small Sample Sizes

Although the bias-corrected (BC) bootstrap is an often-recommended method for testing mediation due to its higher statistical power relative to other tests, it has also been found to have elevated Type I error rates with small sample sizes. Under limitations for participant recruitment, obtaining a larger sample size is not always feasible. Thus, this study examines whether using alternative corrections for bias in the BC bootstrap test of mediation for small sample sizes can achieve equal levels of statistical power without the associated increase in Type I error. A simulation study was conducted to compare Efron and Tibshirani’s original correction for bias, z 0, to six alternative corrections for bias: (a) mean, (b–e) Winsorized mean with 10%, 20%, 30%, and 40% trimming in each tail, and (f) medcouple (robust skewness measure). Most variation in Type I error (given a medium effect size of one regression slope and zero for the other slope) and power (small effect size in both regression slopes) was found with small sample sizes. Recommendations for applied researchers are made based on the results. An empirical example using data from the ATLAS drug prevention intervention study is presented to illustrate these results. Limitations and future directions are discussed.

Download Full-text

Likelihood ratio test between two groups of castor oil plant traits

Ciência Rural ◽

10.1590/0103-8478cr20151418 ◽

2016 ◽

Vol 46 (7) ◽

pp. 1158-1164

Author(s):

Betania Brum ◽

Sidinei José Lopes ◽

Daniel Furtado Ferreira ◽

Lindolfo Storck ◽

Alberto Cargnelutti Filho

Keyword(s):

Likelihood Ratio ◽

Likelihood Ratio Test ◽

Type I Error ◽

Plant Traits ◽

Small Sample ◽

Ratio Test ◽

Type I ◽

Sample Sizes ◽

Oil Plant ◽

Small Sample Sizes

ABSTRACT: The likelihood ratio test (LRT), to the independence between two sets of variables, allows to identify whether there is a dependency relationship between them. The aim of this study was to calculate the type I error and power of the LRT for determining independence between two sets of variables under multivariate normal distributions in scenarios consisting of combinations of 16 sample sizes; 40 combinations of the number of variables of the two groups; and nine degrees of correlation between the variables (for the power). The rate of type I error and power were calculate at 640 and 5,760 scenarios, respectively. A performance evaluation of the LRT was conducted by computer simulation by the Monte Carlo method, using 2,000 simulations in each scenario. When the number of variables was large (24), the TRV controlled the rate of type I errors and showed high power in sizes greater than 100 samples. For small sample sizes (25, 30 and 50), the test showed good performance because the number of variables did not exceed 12.

Download Full-text

Type I Error Inflation of the Separate-Variances Welch t test with Very Small Sample Sizes when Assumptions Are Met

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1304224320 ◽

2011 ◽

Vol 10 (1) ◽

pp. 362-372 ◽

Cited By ~ 3

Author(s):

Albert K. Adusah ◽

Gordon P. Brooks

Keyword(s):

Type I Error ◽

Small Sample ◽

T Test ◽

Type I ◽

Sample Sizes ◽

Small Sample Sizes

Download Full-text

Cluster Wild Bootstrapping to Handle Dependent Effect Sizes in Meta-Analysis with a Small Number of Studies

10.31222/osf.io/x6uhk ◽

2021 ◽

Author(s):

Megha Joshi ◽

James E Pustejovsky ◽

S. Natasha Beretvas

Keyword(s):

Effect Size ◽

Type I Error ◽

Meta Analysis ◽

Error Rates ◽

Small Sample ◽

Type I ◽

Hypothesis Tests ◽

Type I Error Rates ◽

Meta Analyses ◽

Small Sample Correction

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence in the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.

Download Full-text

Improved Small Sample Inference Methods for a Mixed-Effects Model for Repeated Measures Approach in Incomplete Longitudinal Data Analysis

Stats ◽

10.3390/stats2020013 ◽

2019 ◽

Vol 2 (2) ◽

pp. 174-188

Author(s):

Yoshifumi Ukyo ◽

Hisashi Noma ◽

Kazushi Maruo ◽

Masahiko Gosho

Keyword(s):

Clinical Trial ◽

Repeated Measures ◽

Type I Error ◽

Mixed Effects ◽

Error Rates ◽

Small Sample ◽

Mixed Effects Model ◽

Model Complexity ◽

Type I ◽

Inference Methods

The mixed-effects model for repeated measures (MMRM) approach has been widely applied for longitudinal clinical trials. Many of the standard inference methods of MMRM could possibly lead to the inflation of type I error rates for the tests of treatment effect, when the longitudinal dataset is small and involves missing measurements. We propose two improved inference methods for the MMRM analyses, (1) the Bartlett correction with the adjustment term approximated by bootstrap, and (2) the Monte Carlo test using an estimated null distribution by bootstrap. These methods can be implemented regardless of model complexity and missing patterns via a unified computational framework. Through simulation studies, the proposed methods maintain the type I error rate properly, even for small and incomplete longitudinal clinical trial settings. Applications to a postnatal depression clinical trial are also presented.

Download Full-text

On the use of chi-square analyses in studies of resource utilization

Canadian Journal of Forest Research ◽

10.1139/x91-009 ◽

1991 ◽

Vol 21 (1) ◽

pp. 58-65 ◽

Cited By ~ 10

Author(s):

Dennis E. Jelinski

Keyword(s):

Resource Utilization ◽

Goodness Of Fit ◽

Small Sample ◽

Homogeneity Test ◽

Type I ◽

Sample Sizes ◽

Goodness Of Fit Test ◽

Chi Square ◽

Test Of Homogeneity ◽

Small Sample Sizes

Chi-square (χ2) tests are analytic procedures that are often used to test the hypothesis that animals use a particular food item or habitat in proportion to its availability. Unfortunately, several sources of error are common to the use of χ2 analysis in studies of resource utilization. Both the goodness-of-fit and homogeneity tests have been incorrectly used interchangeably when resource availabilities are estimated or known apriori. An empirical comparison of the two methods demonstrates that the χ2 test of homogeneity may generate results contrary to the χ2 goodness-of-fit test. Failure to recognize the conservative nature of the χ2 homogeneity test, when "expected" values are known apriori, may lead to erroneous conclusions owing to the increased possibility of committing a type II error. Conversely, proper use of the goodness-of-fit method is predicated on the availability of accurate maps of resource abundance, or on estimates of resource availability based on very large sample sizes. Where resource availabilities have been estimated from small sample sizes, the use of the χ2 goodness-of-fit test may lead to type I errors beyond the nominal level of α. Both tests require adherence to specific critical assumptions that often have been violated, and accordingly, these assumptions are reviewed here. Alternatives to the Pearson χ2 statistic are also discussed.

Download Full-text