Cluster Wild Bootstrapping to Handle Dependent Effect Sizes in Meta-Analysis with a Small Number of Studies

2021
Author(s): Megha Joshi, James E. Pustejovsky, S. Natasha Beretvas

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence among the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small-sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
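
The bootstrap procedure itself is straightforward to hand-code. The following is a minimal sketch of the general cluster wild bootstrap algorithm, not the authors' R package: it assumes a data frame dat with columns yi (effect size estimate), vi (sampling variance), x (a single moderator), and study (the clustering identifier), perturbs null-model residuals with study-level Rademacher weights, and compares cluster-robust t statistics.

```r
# Minimal cluster wild bootstrap sketch for one meta-regression coefficient.
# Assumed columns in `dat`: yi (effect size), vi (sampling variance),
# x (moderator), study (cluster id). Illustration only, not the authors' package.
cwb_test <- function(dat, B = 1999) {
  w <- 1 / dat$vi                                  # inverse-variance working weights

  full <- lm(yi ~ x, data = dat, weights = w)      # full working model
  null <- lm(yi ~ 1, data = dat, weights = w)      # model under H0: beta_x = 0

  # Cluster-robust (CR0) t statistic for the moderator coefficient
  cr_t <- function(fit) {
    X <- model.matrix(fit)
    u <- residuals(fit) * w                        # weighted residuals (scores)
    bread <- solve(crossprod(X * sqrt(w)))         # (X'WX)^{-1}
    meat <- Reduce(`+`, lapply(split(seq_len(nrow(X)), dat$study), function(idx) {
      s <- colSums(X[idx, , drop = FALSE] * u[idx])
      tcrossprod(s)
    }))
    V <- bread %*% meat %*% bread
    coef(fit)["x"] / sqrt(V["x", "x"])
  }
  t_obs <- cr_t(full)

  # Perturb the null-model residuals with one Rademacher weight per study
  r0 <- residuals(null); f0 <- fitted(null)
  studies <- unique(dat$study)
  t_boot <- replicate(B, {
    eta <- sample(c(-1, 1), length(studies), replace = TRUE)
    names(eta) <- studies
    dat_b <- dat
    dat_b$yi <- f0 + eta[as.character(dat$study)] * r0
    cr_t(lm(yi ~ x, data = dat_b, weights = w))
  })

  mean(abs(t_boot) >= abs(t_obs))                  # bootstrap p-value
}
```

For a multiple-contrast hypothesis the t statistic would be replaced by a cluster-robust Wald statistic, but the resampling scheme is the same.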

2019, Vol. 189(3), pp. 235-242
Author(s): Chia-Chun Wang, Wen-Chung Lee

Random-effects meta-analysis is one of the mainstream methods for research synthesis. The heterogeneity in meta-analyses is usually assumed to follow a normal distribution. This is actually a strong assumption, but one that often receives little attention and is used without justification. Although methods for assessing the normality assumption are readily available, they cannot be used directly because the included studies have different within-study standard errors. Here we present a standardization framework for evaluation of the normality assumption and examine its performance in random-effects meta-analyses with simulation studies and real examples. We use both a formal statistical test and a quantile-quantile plot for visualization. Simulation studies show that our normality test has well-controlled Type I error rates and reasonable power. We also illustrate the real-world significance of examining the normality assumption with examples. Investigating the normality assumption can provide valuable information for further analysis or clinical application. We recommend routine examination of the normality assumption with the proposed framework in future meta-analyses.
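
As a rough illustration of the idea (a simplified standardization, not the authors' exact framework), one can scale each study's deviation from the pooled random-effects estimate by its total (within- plus between-study) variance and then inspect a Q-Q plot and a formal normality test. The sketch below assumes only vectors yi (effect estimates) and vi (within-study sampling variances).

```r
# Simplified normality check for random-effects meta-analysis.
# Not the authors' exact standardization; assumes vectors yi and vi.
check_normality <- function(yi, vi) {
  # DerSimonian-Laird estimate of the between-study variance tau^2
  w     <- 1 / vi
  mu_fe <- sum(w * yi) / sum(w)
  Q     <- sum(w * (yi - mu_fe)^2)
  k     <- length(yi)
  tau2  <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))

  # Random-effects pooled estimate and standardized deviates
  w_re  <- 1 / (vi + tau2)
  mu_re <- sum(w_re * yi) / sum(w_re)
  z     <- (yi - mu_re) / sqrt(vi + tau2)

  qqnorm(z); qqline(z)        # visual check via a quantile-quantile plot
  shapiro.test(z)             # formal test (Shapiro-Wilk)
}
```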


Methodology, 2009, Vol. 5(2), pp. 60-70
Author(s): W. Holmes Finch, Teresa Davenport

Permutation testing has been suggested as an alternative to the standard approximate F tests used in multivariate analysis of variance (MANOVA). These approximate tests, such as Wilks' Lambda and Pillai's Trace, have been shown to perform poorly when the assumptions of normally distributed dependent variables and homogeneity of group covariance matrices are violated. Because Monte Carlo permutation tests do not rely on distributional assumptions, they may be expected to work better than their approximate counterparts when the data do not conform to the assumptions described above. The current simulation study compared the performance of four standard MANOVA test statistics with their Monte Carlo permutation-based counterparts under a variety of conditions with small samples, including conditions in which the assumptions were met and in which they were not. Results suggest that for sample sizes of 50 subjects, power is very low for all of the statistics. In addition, Type I error rates for both the approximate F and Monte Carlo tests were inflated under the condition of nonnormal data and unequal covariance matrices. In general, the performance of the Monte Carlo permutation tests was slightly better in terms of Type I error rates and power when the assumptions of normality and homogeneous covariance matrices were both unmet. It should be noted that these simulations were based on the case of three groups only, and as such the results presented in this study can only be generalized to similar situations.
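
A Monte Carlo permutation test of this kind can be coded in a few lines. The sketch below is a hypothetical example for two outcomes and one grouping factor (column names y1, y2, and group in a data frame dat are assumptions): the observed Wilks' Lambda is compared against its distribution under random relabeling of the groups, so no normality or equal-covariance assumption is needed for the p-value.

```r
# Monte Carlo permutation MANOVA sketch; assumes data frame `dat` with
# numeric outcomes y1, y2 and a factor `group`.
perm_manova <- function(dat, B = 4999) {
  wilks <- function(g) {
    fit <- manova(cbind(y1, y2) ~ g, data = dat)
    summary(fit, test = "Wilks")$stats[1, "Wilks"]
  }
  obs  <- wilks(dat$group)
  perm <- replicate(B, wilks(sample(dat$group)))   # permute group labels
  # Smaller Wilks' Lambda indicates stronger group separation, so count
  # permuted statistics at least as extreme (as small) as the observed one
  mean(perm <= obs)
}
```

The same loop applies to Pillai's Trace or the other standard MANOVA statistics by changing the test argument.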


2019
Author(s): Melissa Angelina Rodgers, James E. Pustejovsky

Selective reporting of results based on their statistical significance threatens the validity of meta-analytic findings. A variety of techniques for detecting selective reporting, publication bias, or small-study effects are available and are routinely used in research syntheses. Most such techniques are univariate, in that they assume that each study contributes a single, independent effect size estimate to the meta-analysis. In practice, however, studies often contribute multiple, statistically dependent effect size estimates, such as for multiple measures of a common outcome construct. Many methods are available for meta-analyzing dependent effect sizes, but methods for investigating selective reporting while also handling effect size dependencies require further investigation. Using Monte Carlo simulations, we evaluate three available univariate tests for small-study effects or selective reporting, including the Trim & Fill test, Egger's regression test, and a likelihood ratio test from a three-parameter selection model (3PSM), when dependence is ignored or handled using ad hoc techniques. We also examine two variants of Egger's regression test that incorporate robust variance estimation (RVE) or multi-level meta-analysis (MLMA) to handle dependence. Simulation results demonstrate that ignoring dependence inflates Type I error rates for all univariate tests. Variants of Egger's regression maintain Type I error rates when dependent effect sizes are sampled or handled using RVE or MLMA. The 3PSM likelihood ratio test does not fully control Type I error rates. With the exception of the 3PSM, all methods have limited power to detect selection bias except under strong selection for statistically significant effects.
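
For concreteness, one plausible specification of the multilevel (MLMA) variant of Egger's regression test is sketched below using the metafor package; the column names (yi, vi, study, esid) and the exact random-effects structure are assumptions, not necessarily the specification used in the paper.

```r
# Sketch of an MLMA-style Egger regression: effect sizes regressed on their
# standard errors inside a multilevel random-effects model, so that
# within-study dependence is modeled rather than ignored.
library(metafor)

egger_mlma <- function(dat) {
  dat$sei <- sqrt(dat$vi)
  fit <- rma.mv(yi, vi,
                mods   = ~ sei,               # Egger-type small-study term
                random = ~ 1 | study / esid,  # between- and within-study heterogeneity
                data   = dat)
  # A significant coefficient on sei suggests small-study effects / selection
  summary(fit)
}
```

An RVE variant would instead compute cluster-robust standard errors for the coefficient on sei, clustering by study.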


2019
Author(s): Joshua Nugent, Ken Kleinman

Background: Linear mixed models (LMM) are a common approach to analyzing data from cluster randomized trials (CRTs). Inference on parameters can be performed via Wald tests or likelihood ratio tests (LRT), but both approaches may give incorrect Type I error rates in common finite sample settings. The impact of different combinations of cluster size, number of clusters, intraclass correlation coefficient (ICC), and analysis approach on Type I error rates has not been well studied. Reviews of published CRTs find that small sample sizes are not uncommon, so the performance of different inferential approaches in these settings can guide data analysts to the best choices. Methods: Using a random-intercept LMM structure, we use simulations to study Type I error rates with the LRT and Wald test with different degrees of freedom (DF) choices across different combinations of cluster size, number of clusters, and ICC. Results: Our simulations show that the LRT can be anti-conservative when the ICC is large and the number of clusters is small, with the effect most pronounced when the cluster size is relatively large. Wald tests with the between-within DF method or the Satterthwaite DF approximation maintain Type I error control at the stated level, though they are conservative when the number of clusters, the cluster size, and the ICC are small. Conclusions: Depending on the structure of the CRT, analysts should choose a hypothesis testing approach that will maintain the appropriate Type I error rate for their data. Wald tests with the Satterthwaite DF approximation work well in many circumstances, but in other cases the LRT may have Type I error rates closer to the nominal level.
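
In R, the two testing approaches compared here can be run side by side with lme4 and lmerTest; the sketch below assumes a data frame dat with outcome y, treatment indicator trt, and cluster identifier cluster (all names hypothetical).

```r
library(lme4)
library(lmerTest)   # adds Satterthwaite degrees of freedom to Wald t tests

# Random-intercept model: Wald t test for trt with Satterthwaite DF
fit <- lmerTest::lmer(y ~ trt + (1 | cluster), data = dat, REML = TRUE)
summary(fit)

# Likelihood ratio test: fit both models by maximum likelihood and compare;
# this chi-squared test can be anti-conservative with few clusters and large ICC
fit_ml  <- lmer(y ~ trt + (1 | cluster), data = dat, REML = FALSE)
null_ml <- lmer(y ~ 1   + (1 | cluster), data = dat, REML = FALSE)
anova(null_ml, fit_ml)
```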


2017
Author(s): Hilde Augusteijn, Robbie Cornelis Maria van Aert, Marcel A. L. M. van Assen

One of the main goals of meta-analysis is to test and estimate the heterogeneity of effect size. We examined the effect of publication bias on the Q-test and assessments of heterogeneity, as a function of true heterogeneity, publication bias, true effect size, number of studies, and variation of sample sizes. The expected values of the heterogeneity measures H2 and I2 were analytically derived, and the power and the Type I error rate of the Q-test were examined in a Monte Carlo simulation study. Our results show that the effect of publication bias on the Q-test and assessment of heterogeneity is large, complex, and non-linear. Publication bias can both dramatically decrease and increase heterogeneity. Extreme homogeneity can occur even when the population heterogeneity is large. Particularly when the number of studies is large and the population effect size is small, publication bias can push both the Type I error rate and the power of the Q-test close to 0 or 1. We therefore conclude that the Q-test of homogeneity and the heterogeneity measures H2 and I2 are generally not valid for assessing and testing heterogeneity when publication bias is present, especially when the true effect size is small and the number of studies is large. We introduce a web application, Q-sense, which can be used to assess the sensitivity of the Q-test to publication bias, and we apply it to two published meta-analyses. Meta-analytic methods should be enhanced so that they can deal with publication bias in their assessment and tests of heterogeneity.
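
For reference, the quantities under study are easy to compute directly from the study-level estimates; the sketch below (assuming vectors yi and vi) returns Cochran's Q with its p-value and the derived H2 and I2 measures whose behaviour under publication bias the paper examines.

```r
# Cochran's Q test and derived heterogeneity measures, given effect
# estimates yi with sampling variances vi.
q_test <- function(yi, vi) {
  w  <- 1 / vi
  mu <- sum(w * yi) / sum(w)              # fixed-effect pooled estimate
  Q  <- sum(w * (yi - mu)^2)
  df <- length(yi) - 1
  H2 <- Q / df
  I2 <- max(0, (Q - df) / Q) * 100        # percent of variation due to heterogeneity
  p  <- pchisq(Q, df, lower.tail = FALSE)
  c(Q = Q, df = df, p = p, H2 = H2, I2 = I2)
}
```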


2021, Vol. 18(1)
Author(s): Lawrence M. Paul

Background: The use of meta-analysis to aggregate the results of multiple studies has increased dramatically over the last 40 years. For homogeneous meta-analysis, the Mantel–Haenszel technique has typically been utilized. In such meta-analyses, the effect size across the contributing studies differs only by statistical error. If homogeneity cannot be assumed or established, the most popular technique developed to date is the inverse-variance DerSimonian and Laird (DL) technique (DerSimonian and Laird, Control Clin Trials 7(3):177–88, 1986). However, both of these techniques are based on large-sample, asymptotic assumptions. At best, they are approximations, especially when the number of cases observed in any cell of the corresponding contingency tables is small. Results: This research develops an exact, non-parametric test for evaluating statistical significance, and a related method for estimating effect size, in the meta-analysis of k 2 × 2 tables for any level of heterogeneity, as an alternative to the asymptotic techniques. Monte Carlo simulations show that even for large values of heterogeneity, the Enhanced Bernoulli Technique (EBT) is far superior to the DL technique at maintaining the pre-specified level of Type I error. A fully tested implementation in the R statistical language is freely available from the author, as is a second, related exact test for estimating the effect size. Conclusions: This research has developed two exact tests for the meta-analysis of dichotomous, categorical data. The EBT was strongly superior to the DL technique in maintaining a pre-specified level of Type I error even at extremely high levels of heterogeneity, whereas the DL technique showed many large violations of this level. Given the various biases toward finding statistical significance prevalent in epidemiology today, a strong focus on maintaining a pre-specified level of Type I error would seem critical. In addition, a related exact method for estimating the effect size was developed.
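
The EBT itself is not described here in enough detail to reproduce, but the two asymptotic baselines it is compared against are standard. A sketch, under the assumption that tables is a 2 x 2 x k array of study-level counts (treatment/control by event/no-event by study):

```r
# Mantel-Haenszel test for the homogeneous (common odds ratio) case
mantelhaen.test(tables)

# DerSimonian-Laird random-effects pooling of the study-level log odds ratios
dl_pool <- function(tables, cc = 0.5) {
  lor <- apply(tables, 3, function(tb) {
    tb <- tb + cc                                  # continuity correction
    log((tb[1, 1] * tb[2, 2]) / (tb[1, 2] * tb[2, 1]))
  })
  v <- apply(tables, 3, function(tb) {
    tb <- tb + cc
    sum(1 / tb)                                    # large-sample variance of the log OR
  })
  w <- 1 / v
  Q <- sum(w * (lor - sum(w * lor) / sum(w))^2)
  k <- length(lor)
  tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_re <- 1 / (v + tau2)
  est <- sum(w_re * lor) / sum(w_re)
  se  <- sqrt(1 / sum(w_re))
  c(logOR = est, se = se, z = est / se, p = 2 * pnorm(-abs(est / se)))
}
```

Both rely on large-sample approximations of exactly the kind the exact EBT is designed to avoid when cell counts are small.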


2015, Vol. 26(3), pp. 1500-1518
Author(s): Annamaria Guolo, Cristiano Varin

This paper investigates the impact of the number of studies on meta-analysis and meta-regression within the random-effects model framework. It is frequently neglected that inference in random-effects models requires a substantial number of studies in the meta-analysis to guarantee reliable conclusions. Several authors warn about the risk of inaccurate results from the traditional DerSimonian and Laird approach, especially in the common case of a meta-analysis involving a limited number of studies. This paper presents a selection of likelihood and non-likelihood methods for inference in meta-analysis proposed to overcome the limitations of the DerSimonian and Laird procedure, with a focus on the effect of the number of studies. The applicability and the performance of the methods are investigated in terms of Type I error rates and empirical power to detect effects, according to scenarios of practical interest. Simulation studies and applications to real meta-analyses highlight that it is not possible to identify an approach that is uniformly superior to the alternatives. The overall recommendation is to avoid the DerSimonian and Laird method when the number of studies in the meta-analysis is modest and to prefer a more comprehensive procedure that compares alternative inferential approaches. R code for meta-analysis according to all of the inferential methods examined in the paper is provided.
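
The paper supplies its own R code; purely as an illustration of the kind of comparison it recommends, the sketch below fits the same random-effects model with the DerSimonian and Laird estimator, restricted maximum likelihood, and the Knapp-Hartung adjustment using metafor (vectors yi and vi assumed).

```r
library(metafor)

fit_dl   <- rma(yi, vi, method = "DL")    # DerSimonian and Laird
fit_reml <- rma(yi, vi, method = "REML")  # restricted maximum likelihood

# Knapp-Hartung adjustment, often recommended when the number of studies is small
fit_kh   <- rma(yi, vi, method = "REML", test = "knha")

summary(fit_dl); summary(fit_reml); summary(fit_kh)
```

Comparing the resulting confidence intervals and p-values across estimators is a simple way to gauge how sensitive the conclusions are to the choice of inferential approach when the number of studies is modest.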

