Power Statistics for Meta-analysis: Tests for Mean Effects and Homogeneity

Author(s):  
Marc J. Lajeunesse

The common justification for meta-analysis is the increased statistical power to detect effects over what is obtained from individual studies. For ecologists and evolutionary biologists, the statistical power of meta-analysis is important because effect sizes are usually relatively small in these fields, and experimental sample sizes are often limited for logistical reasons. Consequently, many studies lack sufficient power to detect an experimental effect should it exist. This chapter provides a brief overview of the factors that determine the statistical power of meta-analysis. It presents statistics for calculating the power of pooled effect sizes to evaluate nonzero effects and the power of within- and between-study homogeneity tests. It also surveys ways to improve the statistical power of meta-analysis, and ends with a discussion of the overall utility of power statistics for meta-analysis.
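
To make the mean-effect calculation concrete, here is a minimal base-R sketch of the power of the two-sided z-test on a fixed-effect pooled mean, in the spirit of the statistics this chapter presents; the function name, effect size, and study variances are hypothetical illustrations, not the chapter's own code.

```r
# Power of the two-sided z-test that a fixed-effect pooled mean effect
# differs from zero. A minimal sketch; all inputs are hypothetical.
pooled_power <- function(theta, vi, alpha = 0.05) {
  # theta : assumed true mean effect size
  # vi    : vector of within-study sampling variances
  w      <- 1 / vi                # inverse-variance weights
  se     <- sqrt(1 / sum(w))      # standard error of the pooled effect
  lambda <- theta / se            # noncentrality parameter
  z_crit <- qnorm(1 - alpha / 2)  # two-sided critical value
  # probability of rejecting H0 under the assumed alternative
  pnorm(lambda - z_crit) + pnorm(-lambda - z_crit)
}

# Example: 10 studies, each with sampling variance 0.08, true effect 0.3
pooled_power(theta = 0.3, vi = rep(0.08, 10))
```

Under these hypothetical numbers, pooling ten studies gives power of roughly 0.92, whereas a single study of the same precision reaches only about 0.18, which is the increased-power argument in miniature.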

2021
Vol 30
Author(s):  
Pim Cuijpers ◽  
Jason W. Griffin ◽  
Toshi A. Furukawa

Abstract: One of the most widely used methods for examining sources of heterogeneity in meta-analyses is the so-called ‘subgroup analysis’. In a subgroup analysis, the included studies are divided into two or more subgroups, and it is tested whether the pooled effect sizes found in these subgroups differ significantly from each other. Subgroup analyses can be considered a core component of most published meta-analyses. One important problem of subgroup analyses is the lack of statistical power to find significant differences between subgroups. In this paper, we explore the power problems of subgroup analyses in more detail, using ‘metapower’, a recently developed R package for examining power in meta-analyses, including subgroup analyses. We show that subgroup analyses require many more included studies than the main analyses of a meta-analysis. We work out an example of an ‘average’ meta-analysis, in which a subgroup analysis requires 3–4 times the number of studies needed for the main analysis to have sufficient power. The required number of studies increases exponentially with decreasing effect sizes and when the studies are not evenly divided over the subgroups. Higher heterogeneity also requires increasing numbers of studies. We conclude that subgroup analyses remain an important method for examining potential sources of heterogeneity in meta-analyses, but meta-analysts should keep in mind that power is very low for most subgroup analyses. As in any statistical evaluation, researchers should not rely solely on a test and p-value to interpret results, but should also compare confidence intervals and interpret results carefully.
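
To see the scale of the problem, here is a rough base-R sketch of the power of a two-subgroup comparison under a random-effects model; it is not the ‘metapower’ package itself, and the function name and all inputs (effect difference, variances, study counts) are hypothetical.

```r
# Rough power of a two-subgroup comparison in a random-effects meta-analysis.
# Hypothetical inputs; see the 'metapower' package for a fuller treatment.
subgroup_power <- function(delta, k1, k2, v_bar, tau2, alpha = 0.05) {
  # delta : assumed true difference between subgroup mean effects
  # k1,k2 : number of studies per subgroup
  # v_bar : typical within-study sampling variance
  # tau2  : between-study variance (heterogeneity)
  v1 <- (v_bar + tau2) / k1       # variance of subgroup 1 pooled mean
  v2 <- (v_bar + tau2) / k2
  lambda <- delta / sqrt(v1 + v2)
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(lambda - z_crit) + pnorm(-lambda - z_crit)
}

subgroup_power(delta = 0.2, k1 = 10, k2 = 10, v_bar = 0.08, tau2 = 0.02)
```

With these hypothetical numbers, 20 studies give roughly 80% power to detect a pooled mean of 0.2 against zero, but only about 29% power to detect a subgroup difference of the same size; reaching 80% power for the subgroup comparison requires about 80 studies, consistent with the 3–4× figure above.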


2021
Vol 21 (1)
Author(s):  
Lisa Holper

Abstract
Background: Conditional power of network meta-analysis (NMA) can support the planning of randomized controlled trials (RCTs) assessing medical interventions. Conditional power is the probability that updating existing inconclusive evidence in an NMA with additional trial(s) will yield conclusive evidence, given assumptions regarding trial design, anticipated effect sizes, or event probabilities.
Methods: The present work aimed to estimate conditional power for potential future trials on antidepressant treatments. Existing evidence was based on a published network of 502 RCTs conducted between 1979 and 2018 assessing acute antidepressant treatment in major depressive disorder (MDD). Primary outcomes were efficacy, in terms of symptom change on the Hamilton Depression Scale (HAMD), and tolerability, in terms of the dropout rate due to adverse events. The network compares 21 antidepressants across 231 relative treatment comparisons, of which 164 (efficacy) and 127 (tolerability) are currently assumed to have inconclusive evidence.
Results: Required sample sizes to achieve new conclusive evidence with at least 80% conditional power were estimated to range from N = 894 to 4,190 (efficacy) and from N = 521 to 1,246 (tolerability). Conversely, sample sizes ranging from N = 49 to 485 (efficacy) and from N = 40 to 320 (tolerability) may require stopping for futility, based on a boundary at 20% conditional power. Optimizing trial designs by considering multiple trials that contribute both direct and indirect evidence, or by anticipating alternative effect sizes or event probabilities, may increase conditional power, but the required sample sizes remain high. The antidepressants with the greatest conditional power and the smallest required sample sizes were those for which current evidence is sparse, i.e., clomipramine, levomilnacipran, milnacipran, nefazodone, and vilazodone, with respect to both outcomes.
Conclusions: The present results suggest that conditional power to achieve new conclusive evidence in ongoing or future trials on antidepressant treatments is low. The main limitations of the presented conditional power analysis are the large sample sizes that would be required in future trials and the well-known small effect sizes of antidepressant treatments. These findings may inform researchers and decision-makers regarding the clinical relevance and justification of research in ongoing or future antidepressant RCTs in MDD.
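
The core idea of conditional power can be conveyed with a pairwise (non-network) base-R sketch: how likely is it that one additional trial tips an inconclusive pooled estimate into significance? The function and all numbers below are hypothetical, and the paper's NMA-based machinery is considerably more involved.

```r
# Conditional power: the probability that adding one new two-arm trial of
# per-arm size n turns inconclusive evidence into a conclusive result.
# A pairwise sketch with hypothetical inputs, not the paper's NMA method.
conditional_power <- function(theta_hat, se_old, theta_new, n, sd = 1,
                              alpha = 0.05) {
  # theta_hat : current pooled estimate;  se_old : its standard error
  # theta_new : anticipated true effect;  n : per-arm size, outcome SD = sd
  v_new  <- 2 * sd^2 / n              # variance of the new trial's estimate
  w_old  <- 1 / se_old^2
  w_new  <- 1 / v_new
  se_upd <- sqrt(1 / (w_old + w_new)) # SE of the updated pooled estimate
  # the updated estimate is random only through the new trial's result
  mean_upd <- (w_old * theta_hat + w_new * theta_new) / (w_old + w_new)
  sd_upd   <- sqrt(w_new) / (w_old + w_new)
  z_crit   <- qnorm(1 - alpha / 2)
  # probability that the updated z-statistic clears the critical value
  # (upper direction of the two-sided test; the lower tail is negligible here)
  pnorm((mean_upd - z_crit * se_upd) / sd_upd)
}

conditional_power(theta_hat = 0.15, se_old = 0.10, theta_new = 0.30, n = 200)
```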


2021
Vol 3 (1)
pp. 61-89
Author(s):  
Stefan Geiß

Abstract: This study uses Monte Carlo simulation techniques to estimate the minimum levels of intercoder reliability required in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The resulting procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥.800 or ≥.667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation allows precise power calculations that consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power, whereas in studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g., when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate use of this framework, along with example code as an online appendix.
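
A toy version of such a Monte Carlo can be written in a few lines of base R. The coding-error model below (the coder substitutes pure noise with probability 1 − agreement) is a deliberate simplification, not the paper's design, and the function name and all parameters are hypothetical.

```r
# Toy Monte Carlo: how coding error attenuates power to detect a correlation.
# A simplified sketch, not the paper's exact simulation design.
set.seed(1)
sim_power <- function(n, rho, p_agree, reps = 2000, alpha = 0.05) {
  hits <- replicate(reps, {
    x_true <- rnorm(n)                                  # true construct values
    y      <- rho * x_true + sqrt(1 - rho^2) * rnorm(n) # correlated outcome
    # with probability (1 - p_agree) the coder records pure noise
    noisy  <- runif(n) > p_agree
    x_obs  <- ifelse(noisy, rnorm(n), x_true)
    cor.test(x_obs, y)$p.value < alpha                  # significant or not?
  })
  mean(hits)   # proportion of significant replications = empirical power
}

sim_power(n = 200, rho = 0.25, p_agree = 0.90)  # high coder reliability
sim_power(n = 200, rho = 0.25, p_agree = 0.60)  # low coder reliability
```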


1995
Vol 55 (5)
pp. 773-776
Author(s):  
Bernard S. Gorman ◽  
Louis H. Primavera ◽  
David B. Allison

Author(s):  
Michael D. Jennions ◽  
Christopher J. Lortie ◽  
Julia Koricheva

This chapter begins with a brief review of why effect sizes and their variances are more informative than P-values. It then discusses how meta-analysis promotes “effective thinking” that can change approaches to several commonplace problems. Specifically, it addresses the issues of (1) exemplar studies versus average trends, (2) resolving “conflict” between specific studies, (3) presenting results, (4) deciding on the level at which to replicate studies, (5) understanding the constraints imposed by low statistical power, and (6) asking broad-scale questions that cannot be resolved in a single study. The chapter focuses on estimating effect sizes as a key outcome of meta-analysis, but acknowledges that other outcomes might be of more interest in other situations.


2020
Vol 63 (5)
pp. 1572-1580
Author(s):  
Laura Gaeta ◽  
Christopher R. Brydges

Purpose: The purpose was to examine the effect size distributions reported in published audiology and speech-language pathology research in order to provide researchers and clinicians with more relevant guidelines for interpreting potentially clinically meaningful findings.
Method: Cohen’s d, Hedges’ g, Pearson r, and sample sizes (n = 1,387) were extracted from 32 meta-analyses in speech-language pathology and audiology journals. Percentile ranks (25th, 50th, 75th) were calculated to determine estimates for small, medium, and large effect sizes, respectively. The median sample size was also used to explore statistical power for small, medium, and large effect sizes.
Results: For individual differences research, effect sizes of Pearson r = .24, .41, and .64 were found. For group differences, Cohen’s d/Hedges’ g = 0.25, 0.55, and 0.93. These values can be interpreted as small, medium, and large effect sizes in speech-language pathology and audiology. The majority of published research was inadequately powered to detect a medium effect size.
Conclusions: Effect size interpretations in published audiology and speech-language pathology research were found to be underestimated when based on Cohen’s (1988, 1992) guidelines. Researchers in the field should consider using Pearson r = .25, .40, and .65 and Cohen’s d/Hedges’ g = 0.25, 0.55, and 0.95 as small, medium, and large effect sizes, respectively, and should collect larger sample sizes to ensure that both significant and nonsignificant findings are robust and replicable.
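
Both computations described here, empirical percentile benchmarks and power at a typical sample size, are straightforward to reproduce in base R; the effect size values below are invented, and the power formula uses Fisher's z approximation rather than the authors' exact procedure.

```r
# Sketch of the two computations in the abstract. All values hypothetical.
es <- c(0.12, 0.21, 0.24, 0.33, 0.41, 0.47, 0.58, 0.64, 0.71)  # Pearson r's
quantile(es, probs = c(0.25, 0.50, 0.75))  # small / medium / large benchmarks

# Approximate power for a correlation test via Fisher's z transformation
r_power <- function(r, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(sqrt(n - 3) * atanh(r) - z_crit)
}
r_power(r = 0.41, n = 40)  # power to detect a 'medium' r at a typical n
```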


2014
Vol 115 (1)
pp. 276-278
Author(s):  
Derrick C. McLean ◽  
Benjamin R. Thomas

A wide literature on the unsuccessful treatment of writer's block has emerged since the early 1970s. Findings within this literature seem to support the generalizability of this procedure; however, small sample sizes may limit this interpretation. This meta-analysis independently analyzed effect sizes for “self-treatments” and “group-treatments,” using the number of words in the body of the publication as an indication of a failure to treat writer's block. The reported findings suggest that group-treatments tend to be slightly more unsuccessful than self-treatments.


Author(s):  
Yayouk E. Willems ◽  
Jian-bin Li ◽  
Anne M. Hendriks ◽  
Meike Bartels ◽  
Catrin Finkenauer

Theoretical studies propose an association between family violence and low self-control in adolescence, yet empirical findings on this association are inconclusive. The aim of the present research was to systematically summarize available findings on the relation between family violence and self-control across adolescence. We included 27 studies with 143 effect sizes, representing more than 25,000 participants from eight countries, from early to late adolescence. Applying multilevel meta-analysis, which takes dependency between effect sizes into account while retaining statistical power, we examined the magnitude and direction of the overall effect size. Additionally, we investigated whether theoretical moderators (e.g., age, gender, country) and methodological moderators (cross-sectional/longitudinal design, informant) influenced the magnitude of the association between family violence and self-control. Our results revealed a small to moderate significant negative association between family violence and self-control (r = -.191). This association did not vary across gender, country, or informant. The strength of the association, however, decreased with age and in longitudinal studies. This finding provides evidence that researchers and clinicians may expect low self-control in the wake of family violence, especially in early adolescence. Recommendations for future research in the area are discussed.
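
A minimal sketch of a multilevel (three-level) model of this kind, using the widely used metafor package, might look as follows; the data frame and all values are fabricated for illustration.

```r
# Minimal sketch of a three-level meta-analysis with 'metafor'.
# The data are fabricated for illustration only.
library(metafor)

dat <- data.frame(
  study = c(1, 1, 2, 2, 3),                      # study ID
  es_id = 1:5,                                   # effect size ID
  ri    = c(-0.25, -0.18, -0.30, -0.12, -0.22),  # made-up correlations
  ni    = c(120, 150, 250, 250, 90)              # sample sizes
)

# Fisher's z transformation of the correlations and their sampling variances
dat <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat)

# Random intercepts for studies and for effect sizes nested within studies
# model the dependency among effect sizes coming from the same study
res <- rma.mv(yi, vi, random = ~ 1 | study / es_id, data = dat)
summary(res)
```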


2004
Vol 67 (11)
pp. 2587-2595
Author(s):  
SUMEET R. PATIL ◽  
ROBERTA MORALES ◽  
SHERYL CATES ◽  
DONALD ANDERSON ◽  
DAVID KENDALL

Meta-analysis provides a structured method for combining results from several studies and for accounting for and differentiating between study variables. Food safety consumer research studies often focus on specific behaviors among different subpopulations but fail to provide a holistic picture of consumer behavior. Combining information from several studies provides a broader understanding of differences and trends among demographic subpopulations and thus helps in developing effective risk communication messages. In the illustrated example, raw/undercooked ground beef consumption and hygienic practices were evaluated according to gender, ethnicity, and age. The percentages of people engaging in each of these behaviors (referred to as effect sizes) were combined using weighted averages. Several measures, including sampling errors, random variance between studies, sample sizes of studies, and homogeneity of findings across studies, were used in the meta-analysis. The statistical significance of differences in behaviors across demographic segments was evaluated using analysis of variance. The meta-analysis identified considerable variability in effect sizes for raw/undercooked ground beef consumption and poor hygienic practices. More males, African Americans, and adults between 30 and 54 years (mid-age) consumed raw/undercooked ground beef than other demographic segments. Males, Caucasians and Hispanics, and young adults between 18 and 29 years were more likely to engage in poor hygienic practices. Compared with traditional qualitative review methods, meta-analysis quantitatively accounts for interstudy differences, allows greater consideration of data from studies with smaller sample sizes, and offers ease of analysis as newer data become available; it thus merits consideration for application in food safety consumer research.
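
The pooling step described here, inverse-variance weighted averages of study proportions with an allowance for between-study variance, can be sketched in base R as follows; the data are invented, and the DerSimonian-Laird estimator is assumed for the between-study variance rather than being confirmed as the authors' choice.

```r
# Pooling proportions across studies with inverse-variance weights and a
# DerSimonian-Laird between-study variance. Hypothetical data throughout.
p  <- c(0.18, 0.25, 0.12, 0.22)  # proportion engaging in the behavior
n  <- c(350, 500, 275, 410)      # study sample sizes
vi <- p * (1 - p) / n            # sampling variance of each proportion

w_fixed <- 1 / vi
p_bar   <- sum(w_fixed * p) / sum(w_fixed)   # fixed-effect pooled proportion

# DerSimonian-Laird between-study variance from the homogeneity statistic Q
Q    <- sum(w_fixed * (p - p_bar)^2)
df   <- length(p) - 1
C    <- sum(w_fixed) - sum(w_fixed^2) / sum(w_fixed)
tau2 <- max(0, (Q - df) / C)

w_rand <- 1 / (vi + tau2)                    # random-effects weights
p_re   <- sum(w_rand * p) / sum(w_rand)      # random-effects pooled proportion
se_re  <- sqrt(1 / sum(w_rand))
c(estimate = p_re, lower = p_re - 1.96 * se_re, upper = p_re + 1.96 * se_re)
```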

