Type I error and statistical power of the Mantel-Haenszel procedure for detecting DIF: A meta-analysis.

2013 ◽  
Vol 18 (4) ◽  
pp. 553-571 ◽  
Author(s):  
Georgina Guilera ◽  
Juana Gómez-Benito ◽  
Maria Dolores Hidalgo ◽  
Julio Sánchez-Meca


2020 ◽  
Author(s):  
Han Du ◽  
Ge Jiang ◽  
Zijun Ke

Meta-analysis combines pertinent information from existing studies to provide an overall estimate of population parameters/effect sizes, as well as to quantify and explain the differences between studies. However, testing between-study heterogeneity is one of the most troublesome topics in meta-analysis research, and no methods have been proposed to test whether the heterogeneity exceeds a specific level. The existing methods, such as the Q test and likelihood ratio (LR) tests, are criticized for failing to control the Type I error rate and/or failing to attain adequate statistical power. Although better approximations of the reference distribution have been proposed in the literature, their expressions are complicated and their applications are limited. In this article, we propose bootstrap-based heterogeneity tests that combine the restricted maximum likelihood (REML) ratio test or the Q test with bootstrap procedures, denoted B-REML-LRT and B-Q, respectively. Simulation studies were conducted to examine and compare the performance of the proposed methods with the regular LR tests, the regular Q test, and the improved Q test in both random-effects and mixed-effects meta-analysis. Based on the resulting Type I error rates and statistical power, B-Q is recommended. An R package, boot.heterogeneity, is provided to facilitate the implementation of the proposed method.
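
To give a concrete sense of the general approach described here, the Python sketch below compares an observed Cochran's Q statistic with its parametric-bootstrap distribution under the homogeneity null. The function names and toy data are hypothetical, and this is only an illustration of the idea; the boot.heterogeneity package may differ in its exact procedure (for instance, in how a nonzero heterogeneity level is specified).

```python
import numpy as np

def q_statistic(y, v):
    """Cochran's Q statistic for effect sizes y with within-study variances v."""
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)          # fixed-effect (inverse-variance) estimate
    return np.sum(w * (y - mu_hat) ** 2)

def bootstrap_q_test(y, v, n_boot=2000, seed=0):
    """Parametric-bootstrap Q test of H0: no between-study heterogeneity (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)
    q_obs = q_statistic(y, v)
    # Simulate studies under H0: y_i ~ N(mu_hat, v_i), i.e., zero between-study variance.
    q_boot = np.array([q_statistic(rng.normal(mu_hat, np.sqrt(v)), v) for _ in range(n_boot)])
    p_value = (1 + np.sum(q_boot >= q_obs)) / (n_boot + 1)
    return q_obs, p_value

# Hypothetical data: eight standardized mean differences and their sampling variances.
y = np.array([0.31, 0.12, 0.45, 0.27, 0.05, 0.38, 0.22, 0.17])
v = np.array([0.04, 0.06, 0.05, 0.03, 0.07, 0.04, 0.05, 0.06])
q, p = bootstrap_q_test(y, v)
print(f"Q = {q:.2f}, bootstrap p = {p:.3f}")
```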


2017 ◽  
Author(s):  
Hyemin Han ◽  
Andrea L. Glenn

Abstract. In fMRI research, the goal of correcting for multiple comparisons is to identify areas of activity that reflect true effects and thus would be expected to replicate in future studies. Finding an appropriate balance, minimizing false positives (Type I error) without being so stringent that true effects are omitted (Type II error), can be challenging. Furthermore, the advantages and disadvantages of these types of errors may differ across areas of study. In many areas of social neuroscience that involve complex processes and considerable individual differences, such as the study of moral judgment, effects are typically smaller and statistical power weaker, leading to the suggestion that less stringent corrections, which allow for more sensitivity, may be beneficial, although they also result in more false positives. Using moral judgment fMRI data, we evaluated four commonly used methods for multiple comparison correction implemented in SPM12 by examining which method produced the most precise overlap with results from a meta-analysis of relevant studies and with results from nonparametric permutation analyses. We found that voxel-wise thresholding with family-wise error correction based on Random Field Theory provides a more precise overlap (i.e., without omitting too many regions or encompassing too many additional regions) than cluster-wise thresholding, Bonferroni correction, or false discovery rate correction.
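
As a toy illustration of why the choice of correction matters, the Python sketch below applies two of the corrections named in the abstract, Bonferroni and Benjamini-Hochberg false discovery rate, to a hypothetical vector of voxel-wise p-values. The simulated data and function names are assumptions, not the authors' SPM12 pipeline, and the sketch omits the Random Field Theory and cluster-wise methods.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Bonferroni correction: reject where p < alpha / m."""
    return pvals < alpha / pvals.size

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR: reject the k smallest p-values, where k is the largest
    rank i with p_(i) <= i * alpha / m."""
    m = pvals.size
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        reject[order[:k + 1]] = True
    return reject

# Toy voxel-wise p-values: mostly null voxels plus a handful of strong true effects.
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=9950), rng.uniform(0, 1e-4, size=50)])
print("Bonferroni detections:", int(bonferroni(pvals).sum()))
print("BH-FDR detections:    ", int(benjamini_hochberg(pvals).sum()))
```

On this toy data the FDR correction retains more suprathreshold voxels than Bonferroni, mirroring the sensitivity-versus-false-positive trade-off the abstract describes.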


2015 ◽  
Vol 14s5 ◽  
pp. CIN.S27718 ◽  
Author(s):  
Putri W. Novianti ◽  
Ingeborg Van Der Tweel ◽  
Victor L. Jong ◽  
Kit C. B. Roes ◽  
Marinus J. C. Eijkemans

Most of the discoveries from gene expression data are driven by a study claiming an optimal subset of genes that play a key role in a specific disease. Meta-analysis of the available datasets can help in getting concordant results so that a real-life application may be more successful. Sequential meta-analysis (SMA) is an approach for combining studies in chronological order while preserving the type I error and pre-specifying the statistical power to detect a given effect size. We focus on the application of SMA to find gene expression signatures across experiments in acute myeloid leukemia. SMA of seven raw datasets is used to evaluate whether the accumulated samples show enough evidence or more experiments should be initiated. We found 313 differentially expressed genes, based on the cumulative information of the experiments. SMA offers an alternative to existing methods in generating a gene list by evaluating the adequacy of the cumulative information.
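
The Python sketch below conveys the general flavor of sequential monitoring of accumulating studies: a cumulative fixed-effect estimate is checked against a simplified O'Brien-Fleming-type boundary at each look. This is not the exact SMA procedure used by the authors, and the seven effect sizes and variances shown are hypothetical.

```python
import numpy as np
from scipy import stats

def sequential_meta(effects, variances, alpha=0.05):
    """Cumulative fixed-effect meta-analysis monitored against a simplified
    O'Brien-Fleming-type boundary z_k = z_(alpha/2) * sqrt(K / k) at look k of K."""
    effects, variances = np.asarray(effects, float), np.asarray(variances, float)
    K = effects.size
    z_crit = stats.norm.ppf(1 - alpha / 2)
    for k in range(1, K + 1):
        w = 1.0 / variances[:k]
        mu = np.sum(w * effects[:k]) / np.sum(w)
        z = mu / np.sqrt(1.0 / np.sum(w))       # cumulative pooled z-statistic
        boundary = z_crit * np.sqrt(K / k)      # strict early on, ~z_crit at the final look
        verdict = "evidence sufficient" if abs(z) > boundary else "continue accumulating"
        print(f"look {k}: z = {z:5.2f}, boundary = {boundary:4.2f}, {verdict}")

# Hypothetical log fold-changes for a single gene across seven experiments, in order.
sequential_meta(effects=[0.35, 0.42, 0.30, 0.55, 0.25, 0.48, 0.40],
                variances=[0.04, 0.05, 0.03, 0.06, 0.04, 0.05, 0.04])
```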


2021 ◽  
Author(s):  
Josue E. Rodriguez ◽  
Donald Ray Williams ◽  
Paul-Christian Bürkner

Categorical moderators are often included in mixed-effects meta-analysis to explain heterogeneity in effect sizes. Tests of moderator effects typically assume a constant between-study variance across all levels of the moderator. Although this assumption rarely receives serious thought, upholding it can have drastic ramifications. We propose that researchers should instead assume unequal between-study variances by default. To achieve this, we suggest using a mixed-effects location-scale model (MELSM) to allow group-specific estimates of the between-study variances. In two extensive simulation studies, we show that in terms of Type I error and statistical power, nearly nothing is lost by using the MELSM for moderator tests, but there can be serious costs when a mixed-effects model with equal variances is used. Most notably, in scenarios with balanced sample sizes or equal between-study variances, the Type I error and power rates are nearly identical between the mixed-effects model and the MELSM. In contrast, with imbalanced sample sizes and unequal variances, the Type I error rate under the mixed-effects model can be grossly inflated or overly conservative, whereas the MELSM controlled the Type I error rate well across all scenarios. With respect to power, the MELSM had comparable or higher power than the mixed-effects model in all conditions where the latter produced valid (i.e., not inflated) Type I error rates. Altogether, our results strongly support assuming unequal between-study variances as the default strategy when testing categorical moderators.
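
With a single categorical moderator, allowing unequal between-study variances amounts to estimating tau^2 separately within each moderator level. The Python sketch below does this with the DerSimonian-Laird estimator on hypothetical data; it is only meant to convey the idea, not the MELSM itself, which models the scale (the log between-study standard deviation) jointly with the location submodel.

```python
import numpy as np

def dersimonian_laird_tau2(y, v):
    """DerSimonian-Laird estimate of the between-study variance tau^2."""
    w = 1.0 / v
    mu = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - (y.size - 1)) / c)

# Hypothetical effect sizes, sampling variances, and a two-level categorical moderator.
y = np.array([0.30, 0.25, 0.40, 0.10, 0.90, 0.05, 0.70, -0.20])
v = np.array([0.05, 0.04, 0.06, 0.05, 0.07, 0.06, 0.05, 0.04])
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

for g in np.unique(group):
    mask = group == g
    print(f"group {g}: tau^2 = {dersimonian_laird_tau2(y[mask], v[mask]):.3f}")
```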


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
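
For readers unfamiliar with this class of tools, the Python sketch below implements one widely used bias-detection method, Egger's regression test for funnel-plot asymmetry, on hypothetical data in which smaller studies report larger effects. The abstract does not name the six methods evaluated, so this example is purely illustrative and is not a reproduction of the simulation study.

```python
import numpy as np
from scipy import stats

def eggers_test(y, v):
    """Egger's regression test for funnel-plot asymmetry: regress the standardized
    effect z_i = y_i / se_i on precision 1 / se_i and test the intercept against zero."""
    se = np.sqrt(np.asarray(v, float))
    z = np.asarray(y, float) / se
    X = np.column_stack([np.ones_like(se), 1.0 / se])
    beta, rss, *_ = np.linalg.lstsq(X, z, rcond=None)
    n, p = X.shape
    cov = (rss[0] / (n - p)) * np.linalg.inv(X.T @ X)
    t_stat = beta[0] / np.sqrt(cov[0, 0])       # intercept = small-study (asymmetry) effect
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - p)
    return beta[0], p_value

# Hypothetical literature in which smaller (noisier) studies report larger effects.
y = np.array([0.82, 0.65, 0.51, 0.40, 0.30, 0.28, 0.22, 0.20])
v = np.array([0.20, 0.15, 0.10, 0.08, 0.05, 0.04, 0.03, 0.02])
intercept, p = eggers_test(y, v)
print(f"Egger intercept = {intercept:.2f}, p = {p:.3f}")
```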


2001 ◽  
Vol 26 (1) ◽  
pp. 105-132 ◽  
Author(s):  
Douglas A. Powell ◽  
William D. Schafer

The robustness literature for the structural equation model was synthesized following the method of Harwell which employs meta-analysis as developed by Hedges and Vevea. The study focused on the explanation of empirical Type I error rates for six principal classes of estimators: two that assume multivariate normality (maximum likelihood and generalized least squares), elliptical estimators, two distribution-free estimators (asymptotic and others), and latent projection. Generally, the chi-square tests for overall model fit were found to be sensitive to non-normality and the size of the model for all estimators (with the possible exception of the elliptical estimators with respect to model size and the latent projection techniques with respect to non-normality). The asymptotic distribution-free (ADF) and latent projection techniques were also found to be sensitive to sample sizes. Distribution-free methods other than ADF showed, in general, much less sensitivity to all factors considered.


2021 ◽  
Author(s):  
Megha Joshi ◽  
James E Pustejovsky ◽  
S. Natasha Beretvas

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence in the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small-sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
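
The Python sketch below conveys the resampling scheme behind cluster wild bootstrapping in a meta-regression setting: residuals from the null model are flipped with a single Rademacher weight per study, which preserves within-study dependence. It is a simplified illustration on hypothetical data; a full implementation would typically studentize the statistic with a cluster-robust standard error, which this sketch omits, and it is not the authors' R package.

```python
import numpy as np

def cluster_wild_bootstrap(y, X, cluster, coef=1, n_boot=1999, seed=0):
    """Cluster wild bootstrap p-value for H0: beta[coef] = 0 in an OLS meta-regression.

    Residuals from the null model are multiplied by a single Rademacher weight per
    study (cluster), preserving the within-study dependence structure.
    """
    rng = np.random.default_rng(seed)
    y, X = np.asarray(y, float), np.asarray(X, float)
    clusters = np.unique(cluster)
    idx = np.searchsorted(clusters, cluster)     # map each estimate to its study index

    def fit(y_, X_):
        return np.linalg.lstsq(X_, y_, rcond=None)[0]

    stat_obs = abs(fit(y, X)[coef])
    X0 = np.delete(X, coef, axis=1)              # design matrix with the tested column removed
    beta0 = fit(y, X0)
    fitted0, resid0 = X0 @ beta0, y - X0 @ beta0

    stat_boot = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=clusters.size)
        stat_boot[b] = abs(fit(fitted0 + resid0 * w[idx], X)[coef])
    return (1 + np.sum(stat_boot >= stat_obs)) / (n_boot + 1)

# Hypothetical data: 12 effect sizes nested in 5 studies, one continuous moderator.
study = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
x = np.array([0.2, 0.3, 0.1, 0.8, 0.7, 0.4, 0.5, 0.6, 0.9, 1.0, 0.0, 0.1])
y = 0.3 + np.random.default_rng(1).normal(0, 0.2, size=x.size)   # no true moderator effect
X = np.column_stack([np.ones_like(x), x])
print("cluster wild bootstrap p-value for the moderator:", cluster_wild_bootstrap(y, X, study))
```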

