Canons and Sparrows II*: The Enhanced Bernoulli Exact Method for Determining Statistical Significance and Effect Size in the Meta-Analysis of k 2 x 2 Tables

Statistical Error ◽

Type I ◽

Exact Test ◽

Strong Focus ◽

Inverse Variance ◽

Abstract Background The use of meta-analysis to aggregate the results of multiple studies has increased dramatically over the last 40 years. For homogeneous meta-analysis, the Mantel-Haenszel technique has typically been utilized. In such meta-analyses, the effect size across the contributing studies of the meta-analysis differs only by statistical error. If homogeneity cannot be assumed or established, the most popular technique developed to date is the inverse-variance DerSimonian & Laird (DL) technique [1]. However, both of these techniques are based on large sample, asymptotic assumptions. At best, they are approximations, especially when the number of cases observed in any cell of the corresponding contingency tables is small. Results This research develops an exact, non-parametric test for evaluating statistical significance and a related method for estimating effect size in the meta-analysis of k 2 x 2 tables for any level of heterogeneity as an alternative to the asymptotic techniques. Monte Carlo simulations show that even for large values of heterogeneity, the Enhanced Bernoulli Technique (EBT) is far superior at maintaining the pre-specified level of Type I Error than the DL technique. A fully tested implementation in the R statistical language is freely available from the author. In addition, a second related exact test for estimating the Effect Size was developed and is also freely available. Conclusions This research has developed two exact tests for the meta-analysis of dichotomous, categorical data. The EBT technique was strongly superior to the DL technique in maintaining a pre-specified level of Type I Error even at extremely high levels of heterogeneity. As shown, the DL technique demonstrated many large violations of this level. Given the various biases towards finding statistical significance prevalent in epidemiology today, a strong focus on maintaining a pre-specified level of Type I Error would seem critical.

Cannons and sparrows II: the enhanced Bernoulli exact method for determining statistical significance and effect size in the meta-analysis of k 2 × 2 tables

Emerging Themes in Epidemiology ◽

10.1186/s12982-021-00101-8 ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Lawrence M. Paul

Keyword(s):

Effect Size ◽

Type I Error ◽

Meta Analysis ◽

Statistical Error ◽

Exact Method ◽

Type I ◽

Exact Test ◽

Inverse Variance ◽

Abstract Background The use of meta-analysis to aggregate the results of multiple studies has increased dramatically over the last 40 years. For homogeneous meta-analysis, the Mantel–Haenszel technique has typically been utilized. In such meta-analyses, the effect size across the contributing studies of the meta-analysis differs only by statistical error. If homogeneity cannot be assumed or established, the most popular technique developed to date is the inverse-variance DerSimonian and Laird (DL) technique (DerSimonian and Laird, in Control Clin Trials 7(3):177–88, 1986). However, both of these techniques are based on large sample, asymptotic assumptions. At best, they are approximations especially when the number of cases observed in any cell of the corresponding contingency tables is small. Results This research develops an exact, non-parametric test for evaluating statistical significance and a related method for estimating effect size in the meta-analysis of k 2 × 2 tables for any level of heterogeneity as an alternative to the asymptotic techniques. Monte Carlo simulations show that even for large values of heterogeneity, the Enhanced Bernoulli Technique (EBT) is far superior at maintaining the pre-specified level of Type I Error than the DL technique. A fully tested implementation in the R statistical language is freely available from the author. In addition, a second related exact test for estimating the Effect Size was developed and is also freely available. Conclusions This research has developed two exact tests for the meta-analysis of dichotomous, categorical data. The EBT technique was strongly superior to the DL technique in maintaining a pre-specified level of Type I Error even at extremely high levels of heterogeneity. As shown, the DL technique demonstrated many large violations of this level. 
Given the various biases towards finding statistical significance prevalent in epidemiology today, a strong focus on maintaining a pre-specified level of Type I Error would seem critical. In addition, a related exact method for estimating the Effect Size was developed.
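
For orientation, the asymptotic DL baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the author's R implementation and not the EBT itself: the function name, the choice of the log odds ratio as effect measure, and the 0.5 continuity correction for zero cells are all assumptions made here.

```python
import math


def dersimonian_laird(tables, correction=0.5):
    """Pool log odds ratios from k 2 x 2 tables with the DL random-effects model.

    Each table is (a, b, c, d): events and non-events in the treated group,
    then events and non-events in the control group.
    """
    effects, variances = [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            # Assumption: add 0.5 to every cell of a table that contains a zero.
            a, b, c, d = (x + correction for x in (a, b, c, d))
        effects.append(math.log((a * d) / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)  # large-sample variance

    k = len(tables)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    scale = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / scale) if scale > 0 else 0.0

    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    p_value = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided normal p
    return {"log_or": pooled, "se": se, "tau2": tau2, "q": q, "p_value": p_value}
```

The p-value here rests on a normal approximation, and the variance formula on large cell counts, which is exactly where the abstract argues the approach becomes unreliable for sparse tables.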

Cluster Wild Bootstrapping to Handle Dependent Effect Sizes in Meta-Analysis with a Small Number of Studies

10.31222/osf.io/x6uhk ◽

2021 ◽

Author(s):

Megha Joshi ◽

James E Pustejovsky ◽

S. Natasha Beretvas

Keyword(s):

Effect Size ◽

Type I Error ◽

Meta Analysis ◽

Error Rates ◽

Small Sample ◽

Type I ◽

Hypothesis Tests ◽

Type I Error Rates ◽

Meta Analyses ◽

Small Sample Correction

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence in the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
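
The idea behind cluster wild bootstrapping can be sketched for the simplest case, a test that the overall mean effect equals a null value. This is a deliberately reduced illustration and not the authors' R package: it assumes an unweighted intercept-only model, a CR0 cluster-robust standard error, and Rademacher (plus or minus one) weights, and the function names are hypothetical.

```python
import math
import random


def cluster_robust_t(clusters, null_value=0.0):
    """t-statistic for the overall mean with a cluster-robust (CR0) standard error."""
    n = sum(len(c) for c in clusters)
    mean = sum(sum(c) for c in clusters) / n
    # Sum residuals within each study first, so within-study dependence is kept.
    meat = sum(sum(y - mean for y in c) ** 2 for c in clusters)
    se = math.sqrt(meat) / n
    if se == 0.0:
        return 0.0 if mean == null_value else math.copysign(math.inf, mean - null_value)
    return (mean - null_value) / se


def cluster_wild_bootstrap_p(clusters, null_value=0.0, reps=1999, seed=0):
    """Bootstrap p-value for H0: mean effect == null_value."""
    rng = random.Random(seed)
    t_obs = cluster_robust_t(clusters, null_value)
    # Residuals from the null (restricted) model.
    resid = [[y - null_value for y in c] for c in clusters]
    extreme = 0
    for _ in range(reps):
        boot = []
        for c in resid:
            sign = rng.choice((-1.0, 1.0))  # one Rademacher weight per study
            boot.append([null_value + sign * e for e in c])
        if abs(cluster_robust_t(boot, null_value)) >= abs(t_obs):
            extreme += 1
    return (extreme + 1) / (reps + 1)
```

Because every effect size from the same study receives the same random sign, the dependence structure within a study is preserved in each bootstrap sample, which is the feature that distinguishes the cluster version from an ordinary wild bootstrap.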

How to Detect Publication Bias in Psychological Research

Zeitschrift für Psychologie ◽

10.1027/2151-2604/a000386 ◽

2019 ◽

Vol 227 (4) ◽

pp. 261-279 ◽

Cited By ~ 2

Author(s):

Frank Renkewitz ◽

Melanie Keiner

Keyword(s):

Publication Bias ◽

Effect Size ◽

Statistical Power ◽

Type I Error ◽

Psychological Research ◽

Type I ◽

True Effect Size ◽

Questionable Research Practices ◽

True Effect ◽

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.

Can Reliance be Placed on a Single Meta-Analysis?

Australian & New Zealand Journal of Psychiatry ◽

10.3109/00048679009077710 ◽

1990 ◽

Vol 24 (3) ◽

pp. 405-415 ◽

Cited By ~ 16

Author(s):

Nathaniel McConaghy

Keyword(s):

Literature Review ◽

Effect Size ◽

Meta Analysis ◽

Effect Sizes ◽

Control Groups ◽

Consistent Finding ◽

Placebo Controls ◽

Effect Of Treatment ◽

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy concerning evaluation of treatment effects. Statistical significance measured reliability of the effect of treatment, not its efficacy. It was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence but, by standardizing the size of the treatment effect, could distort it. Meta-analyses that combine the results of studies employing different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists with varying experience can be misleading. To ensure discussion of these variables, meta-analyses should be used as an aid to, rather than a substitute for, literature review. While meta-analyses produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment hopefully will render the use of untreated control groups obsolete.

Combining the strengths of inverse-variance weighting and Egger regression in Mendelian randomization using a mixture of regressions model

PLoS Genetics ◽

10.1371/journal.pgen.1009922 ◽

2021 ◽

Vol 17 (11) ◽

pp. e1009922

Author(s):

Zhaotong Lin ◽

Yangqing Deng ◽

Wei Pan

Keyword(s):

Large Scale ◽

Type I Error ◽

Mendelian Randomization ◽

Meta Analysis ◽

Type I ◽

Perturbation Scheme ◽

Analysis Model ◽

Component Mixture ◽

Inverse Variance ◽

Summary Data

With the increasing availability of large-scale GWAS summary data on various traits, Mendelian randomization (MR) has become commonly used to infer causality between a pair of traits, an exposure and an outcome. It depends on using genetic variants, typically SNPs, as instrumental variables (IVs). The inverse-variance weighted (IVW) method (with a fixed-effect meta-analysis model) is most powerful when all IVs are valid; however, when horizontal pleiotropy is present, it may lead to biased inference. On the other hand, Egger regression is one of the most widely used methods robust to (uncorrelated) pleiotropy, but it suffers from loss of power. We propose a two-component mixture of regressions to combine and thus take advantage of both IVW and Egger regression; it is often both more efficient (i.e. higher powered) and more robust to pleiotropy (i.e. controlling type I error) than either IVW or Egger regression alone by accounting for both valid and invalid IVs respectively. We propose a model averaging approach and a novel data perturbation scheme to account for uncertainties in model/IV selection, leading to more robust statistical inference for finite samples. Through extensive simulations and applications to the GWAS summary data of 48 risk factor-disease pairs and 63 genetically uncorrelated trait pairs, we showcase that our proposed methods could often control type I error better while achieving much higher power than IVW and Egger regression (and sometimes than several other new/popular MR methods). We expect that our proposed methods will be a useful addition to the toolbox of Mendelian randomization for causal inference.
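
The two baseline estimators named in the abstract can be sketched from summary statistics alone. This is an illustration of standard IVW and Egger regression only, not of the proposed mixture-of-regressions model; the function names are hypothetical, and the sketch assumes the SNP-exposure associations have already been oriented to be positive, as is usual for Egger regression.

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted slope of beta_y on beta_x, forced through the origin."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den


def egger_estimate(beta_x, beta_y, se_y):
    """Weighted regression with an intercept; returns (intercept, slope).

    A non-zero intercept is read as evidence of directional pleiotropy.
    """
    w = [1 / s ** 2 for s in se_y]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, beta_x)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, beta_y)) / sw
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, beta_x, beta_y))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, beta_x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

With made-up inputs in which every SNP has the same pleiotropic offset (for example beta_y = 0.1 + 0.5 * beta_x), the Egger slope recovers 0.5 while the IVW estimate is pulled upward, which is the bias-versus-power trade-off the abstract describes.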

Low statistical power in biomedical science: a review of three human research domains

Royal Society Open Science ◽

10.1098/rsos.160254 ◽

2017 ◽

Vol 4 (2) ◽

pp. 160254 ◽

Cited By ~ 71

Author(s):

Estelle Dumas-Mallet ◽

Katherine S. Button ◽

Thomas Boraud ◽

Francois Gonon ◽

Marcus R. Munafò

Keyword(s):

Effect Size ◽

Statistical Power ◽

Meta Analysis ◽

Average Power ◽

Biomedical Science ◽

Significant Finding ◽

Biomedical Sciences ◽

True Effect Size ◽

Studies with low statistical power increase the likelihood that a statistically significant finding represents a false positive result. We conducted a review of meta-analyses of studies investigating the association of biological, environmental or cognitive parameters with neurological, psychiatric and somatic diseases, excluding treatment studies, in order to estimate the average statistical power across these domains. Taking the effect size indicated by a meta-analysis as the best estimate of the likely true effect size, and assuming a threshold for declaring statistical significance of 5%, we found that approximately 50% of studies have statistical power in the 0–10% or 11–20% range, well below the minimum of 80% that is often considered conventional. Studies with low statistical power appear to be common in the biomedical sciences, at least in the specific subject areas captured by our search strategy. However, we also observe evidence that this depends in part on research methodology, with candidate gene studies showing very low average power and studies using cognitive/behavioural measures showing high average power. This warrants further investigation.
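
The kind of calculation behind such a review can be sketched as follows: take the meta-analytic effect size as the true effect and compute each study's power at a 5% threshold. This sketch assumes a standardized mean difference, two equal groups, and a normal approximation; the review itself may have used other effect measures, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * math.sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)
```

Under these assumptions, a study with 20 participants per group and a true effect of d = 0.2 has power of roughly 10%, which falls in the lowest range the abstract reports as common.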

The Effect of Publication Bias on the Assessment of Heterogeneity

10.31219/osf.io/gv25c ◽

2017 ◽

Author(s):

Hilde Augusteijn ◽

Robbie Cornelis Maria van Aert ◽

Marcel A. L. M. van Assen

Keyword(s):

Publication Bias ◽

Effect Size ◽

Type I Error ◽

Meta Analysis ◽

Error Rates ◽

Population Heterogeneity ◽

Type I ◽

Monte Carlo Simulation Study ◽

True Effect Size ◽

True Effect

One of the main goals of meta-analysis is to test and estimate the heterogeneity of effect size. We examined the effect of publication bias on the Q-test and assessments of heterogeneity, as a function of true heterogeneity, publication bias, true effect size, number of studies, and variation of sample sizes. The expected values of heterogeneity measures H² and I² were analytically derived, and the power and the type I error rate of the Q-test were examined in a Monte-Carlo simulation study. Our results show that the effect of publication bias on the Q-test and assessment of heterogeneity is large, complex, and non-linear. Publication bias can both dramatically decrease and increase heterogeneity. Extreme homogeneity can occur even when the population heterogeneity is large. Particularly if the number of studies is large and population effect size is small, publication bias can cause both extreme type I error rates and power of the Q-test close to 0 or 1. We therefore conclude that the Q-test of homogeneity and heterogeneity measures H² and I² are generally not valid in assessing and testing heterogeneity when publication bias is present, especially when the true effect size is small and the number of studies is large. We introduce a web application, Q-sense, which can be used to assess the sensitivity of the Q-test to publication bias, and we apply it to two published meta-analyses. Meta-analytic methods should be enhanced in order to be able to deal with publication bias in their assessment and tests of heterogeneity.
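
For reference, the three heterogeneity quantities discussed above follow directly from the study effect sizes and their sampling variances. The sketch below uses the standard definitions (Q as the weighted sum of squared deviations, H² = Q / (k - 1), I² = (Q - (k - 1)) / Q truncated at zero); the function name is hypothetical, and the chi-square p-value of the Q-test is omitted.

```python
def heterogeneity(effects, variances):
    """Return (Q, H2, I2) for k effect sizes with known sampling variances."""
    w = [1 / v for v in variances]  # inverse-variance weights
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    h2 = q / df
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, h2, i2
```

For example, effects of 0.1, 0.3 and 0.5 with a common variance of 0.01 give Q = 8, H² = 4 and I² = 0.75. Since all three depend only on the observed effects, selectively removing non-significant studies changes them, which is the mechanism the abstract examines.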

Visual Inference for the Funnel Plot in Meta-Analysis

Zeitschrift für Psychologie ◽

10.1027/2151-2604/a000358 ◽

2019 ◽

Vol 227 (1) ◽

pp. 83-89 ◽

Cited By ~ 2

Author(s):

Michael Kossmeier ◽

Ulrich S. Tran ◽

Martin Voracek

Keyword(s):

Visual Inspection ◽

Type I Error ◽

Statistical Tests ◽

Meta Analysis ◽

Real Data ◽

Type I ◽

Funnel Plot ◽

Funnel Plots ◽

Meta Analyses ◽

Open Nature

Abstract. The funnel plot is widely used in meta-analyses to assess potential publication bias. However, experimental evidence suggests that informal, purely visual inspection of funnel plots frequently leads to incorrect conclusions, and formal statistical tests (Egger regression and others) focus entirely on funnel plot asymmetry. We suggest using the visual inference framework with funnel plots routinely, including for didactic purposes. In this framework, the type I error is controlled by design, while the explorative, holistic, and open nature of visual graph inspection is preserved. Specifically, the funnel plot of the actually observed data is presented simultaneously, in a lineup, with null funnel plots showing data simulated under the null hypothesis. Only when the real data funnel plot is identifiable among all the funnel plots presented might funnel plot-based conclusions be warranted. Software to implement visual funnel plot inference is provided via a tailored R function.
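
The lineup construction can be sketched in a few lines. This is not the authors' R function: the null model used here (each effect drawn from a normal distribution centred on the fixed-effect pooled estimate, with the study's own standard error) and the function name are assumptions for illustration.

```python
import random


def funnel_lineup(effects, std_errors, n_panels=20, seed=0):
    """Hide the real data among n_panels - 1 simulated null data sets."""
    rng = random.Random(seed)
    w = [1 / se ** 2 for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Null panels: same standard errors, effects scattered around the pooled value.
    panels = [[rng.gauss(pooled, se) for se in std_errors]
              for _ in range(n_panels - 1)]
    real_position = rng.randrange(n_panels)
    panels.insert(real_position, list(effects))
    return panels, real_position
```

If the real data are indistinguishable from the null panels, a viewer picks the real plot with probability 1 / n_panels, so a lineup of 20 panels corresponds to a type I error rate of 5% by construction.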

Random-effects meta-analysis: the number of studies matters

Statistical Methods in Medical Research ◽

10.1177/0962280215583568 ◽

2015 ◽

Vol 26 (3) ◽

pp. 1500-1518 ◽

Cited By ~ 70

Author(s):

Annamaria Guolo ◽

Cristiano Varin

Keyword(s):

Random Effects ◽

Type I Error ◽

Meta Analysis ◽

Practical Interest ◽

Error Rates ◽

Type I ◽

Model Framework ◽

Random Effects Models ◽

Meta Analyses ◽

The Impact

This paper investigates the impact of the number of studies on meta-analysis and meta-regression within the random-effects model framework. It is frequently neglected that inference in random-effects models requires a substantial number of studies included in meta-analysis to guarantee reliable conclusions. Several authors warn about the risk of inaccurate results of the traditional DerSimonian and Laird approach especially in the common case of meta-analysis involving a limited number of studies. This paper presents a selection of likelihood and non-likelihood methods for inference in meta-analysis proposed to overcome the limitations of the DerSimonian and Laird procedure, with a focus on the effect of the number of studies. The applicability and the performance of the methods are investigated in terms of Type I error rates and empirical power to detect effects, according to scenarios of practical interest. Simulation studies and applications to real meta-analyses highlight that it is not possible to identify an approach uniformly superior to alternatives. The overall recommendation is to avoid the DerSimonian and Laird method when the number of meta-analysis studies is modest and prefer a more comprehensive procedure that compares alternative inferential approaches. R code for meta-analysis according to all of the inferential methods examined in the paper is provided.

Investigating heterogeneity in meta-analysis of studies with rare events

METRON ◽

10.1007/s40300-021-00211-y ◽

2021 ◽

Author(s):

Dankmar Böhning ◽

Heinz Holling ◽

Walailuck Böhning ◽

Patarawan Sangnawakij

Keyword(s):

Risk Ratio ◽

Type I Error ◽

Rare Events ◽

Meta Analysis ◽

Rare Event ◽

Control Group ◽

Type I ◽

Bootstrap Approach ◽

Testing Homogeneity ◽

Abstract In many meta-analyses, the variable of interest is frequently a count outcome reported in an intervention and a control group. Single- or double-zero studies are often observed in this type of data. Given this setting, the well-known Cochran’s Q statistic for testing homogeneity becomes undefined. In this paper, we propose two statistics for testing homogeneity of the risk ratio, particularly for application in the case of rare events in meta-analysis. The first one is a chi-square type statistic. It is constructed based on information of the conditional probability of the number of events in the treatment group given the total number of events. The second one is a likelihood ratio statistic, derived from the logistic regression models allowing fixed and random effects for the risk ratio. Both proposed statistics are well defined even in the situation of single-zero studies. In a simulation study, the proposed tests show a performance better than the traditional test in terms of type I error and power of the test under common and rare event situations. However, as the performance of the two newly proposed tests is still unsatisfactory in the very rare events setting, we suggest a bootstrap approach that does not rely on asymptotic distributional theory and it is shown that the bootstrap approach performs well in terms of type I error. Furthermore, a number of empirical meta-analyses are used to illustrate the methods.