Brief Report: Post Hoc Power, Observed Power, A Priori Power, Retrospective Power, Prospective Power, Achieved Power: Sorting Out Appropriate Uses of Statistical Power Analyses

2007 ◽  
Vol 1 (4) ◽  
pp. 291-299 ◽  
Author(s):  
Daniel J. O'Keefe


2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample-planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. A power analysis allows the researcher to estimate how many participants are required to detect a minimally meaningful effect size at a specified level of power and Type I error rate. However, several drawbacks render the procedure “a mess.” Specifically, identifying the minimally meaningful effect size is often difficult yet unavoidable if the procedure is to be conducted properly; the procedure is not precision oriented; and it does not guide the researcher to collect as many participants as is feasible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning sample size, such as collecting the maximum sample size that is feasible.
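
To make the procedure under review concrete, here is a minimal sketch of a traditional a priori power analysis in R (base stats package). The minimally meaningful effect of d = 0.4, the 80% power target, and the 5% Type I error rate are hypothetical illustrative values, not figures from the study.

    # Solve for the per-group sample size needed to detect a hypothetical
    # minimally meaningful effect of d = 0.4 at 80% power and a 5% Type I error rate.
    power.t.test(delta = 0.4, sd = 1, sig.level = 0.05, power = 0.80,
                 type = "two.sample", alternative = "two.sided")
    # With sd = 1, delta is on the scale of Cohen's d; this returns roughly 99 participants per group.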


2021 ◽  
Author(s):  
Jordan D.A. Hart ◽  
Daniel W. Franks ◽  
Lauren J.N. Brent ◽  
Michael N. Weiss

Power analysis is used to estimate the probability of correctly rejecting a null hypothesis for a given statistical model and dataset. Conventional power analyses assume complete information, but the stochastic nature of behavioural sampling can mean that true and estimated networks are poorly correlated. Power analyses of animal social networks do not currently take the effect of sampling into account, which could lead to inaccurate estimates of statistical power and potentially misleading results. Here we develop a method for computing how well an estimated social network correlates with its true network, using a Gamma-Poisson model of interaction rates. We use simulations to assess how the level of correlation between true and estimated networks affects the power of nodal regression analyses. We also develop a generic method of power analysis, applicable to any statistical test, based on the concept of diminishing returns. We demonstrate that our network correlation estimator is both accurate and moderately robust to violations of its assumptions. We show that social differentiation, mean interaction rate, and the harmonic mean of sampling times positively affect the strength of the correlation between true and estimated networks. We also show that the level of correlation required to achieve a given power level depends on many factors, but that 80% correlation usually corresponds to around 80% power for nodal regression. We provide guidelines for using our network correlation estimator to verify the accuracy of interaction networks and to conduct power analysis. It can be used prior to data collection, in post hoc analyses, or even for subsetting networks for use in dynamic network analysis. We make our code available so that custom power analyses can be used in future studies.
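
As an illustration of the sampling problem the authors describe, the sketch below simulates the Gamma-Poisson setup in R: true dyadic interaction rates are drawn from a Gamma distribution, observed counts are Poisson given each dyad's sampling time, and the correlation between true and estimated edge weights is computed directly. This is not the authors' analytic estimator, and the mean rate, social differentiation, and sampling times are hypothetical values.

    set.seed(1)
    n_dyads   <- 200
    mean_rate <- 0.5                    # hypothetical mean interaction rate (events per hour)
    cv        <- 0.8                    # hypothetical social differentiation (CV of true rates)
    obs_time  <- runif(n_dyads, 5, 50)  # hypothetical sampling effort per dyad (hours)

    # Gamma-distributed true rates with the chosen mean and coefficient of variation
    true_rate <- rgamma(n_dyads, shape = 1 / cv^2, rate = 1 / (mean_rate * cv^2))
    counts    <- rpois(n_dyads, lambda = true_rate * obs_time)  # Poisson-sampled interaction counts
    est_rate  <- counts / obs_time                              # estimated edge weights

    cor(true_rate, est_rate)  # correlation between true and estimated networks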


2021 ◽  
Vol 4 (1) ◽  
pp. 251524592095150
Author(s):  
Daniël Lakens ◽  
Aaron R. Caldwell

Researchers often rely on analysis of variance (ANOVA) when they report results of experiments. To ensure that a study is adequately powered to yield informative results with an ANOVA, researchers can perform an a priori power analysis. However, power analysis for factorial ANOVA designs is often a challenge. Current software solutions do not allow power analyses for complex designs with several within-participants factors. Moreover, power analyses often need partial eta squared or Cohen’s f as input, but these effect sizes are not intuitive and do not generalize to different experimental designs. We have created the R package Superpower and online Shiny apps to enable researchers without extensive programming experience to perform simulation-based power analysis for ANOVA designs of up to three within- or between-participants factors. Predicted effects are entered by specifying means, standard deviations, and, for within-participants factors, the correlations. The simulation provides the statistical power for all ANOVA main effects, interactions, and individual comparisons. The software can plot power across a range of sample sizes, can control for multiple comparisons, and can compute power when the homogeneity or sphericity assumption is violated. This Tutorial demonstrates how to perform a priori power analysis to design informative studies for main effects, interactions, and individual comparisons and highlights important factors that determine the statistical power for factorial ANOVA designs.
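
For readers new to the package, a minimal sketch of the workflow described above follows. The 2 (between) x 2 (within) design, sample size, cell means, standard deviation, and correlation are hypothetical, and the function and argument names reflect the CRAN release of Superpower we are aware of; consult the package documentation for the current interface.

    library(Superpower)

    # Hypothetical mixed design: one two-level between-participants factor and
    # one two-level within-participants factor.
    design <- ANOVA_design(design = "2b*2w",
                           n  = 40,                       # hypothetical sample size per condition
                           mu = c(0.0, 0.3, 0.0, 0.5),    # hypothetical cell means
                           sd = 1,                        # common standard deviation
                           r  = 0.5)                      # correlation for the within factor

    # Power for all main effects and the interaction at alpha = .05
    ANOVA_exact(design, alpha_level = 0.05)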


2014 ◽  
Vol 45 (3) ◽  
pp. 239-245 ◽  
Author(s):  
Robert J. Calin-Jageman ◽  
Tracy L. Caldwell

A recent series of experiments suggests that fostering superstitions can substantially improve performance on a variety of motor and cognitive tasks (Damisch, Stoberock, & Mussweiler, 2010). We conducted two high-powered and precise replications of one of these experiments, examining whether telling participants they had a lucky golf ball could improve their performance on a 10-shot golf task relative to controls. We found that the effect of superstition on performance is elusive: Participants told they had a lucky ball performed almost identically to controls. Our failure to replicate the target study was not due to a lack of impact, a lack of statistical power, differences in task difficulty, or differences in participants' belief in luck. A meta-analysis indicates significant heterogeneity in the effect of superstition on performance. This could be due to an unknown moderator, but no effect was observed among the studies with the strongest research designs (e.g., high power, a priori sampling plan).


2019 ◽  
Author(s):  
Curtis David Von Gunten ◽  
Bruce D Bartholow

A primary psychometric concern with laboratory-based inhibition tasks has been their reliability. However, a reliable measure may be neither necessary nor sufficient for reliably detecting effects (statistical power). The current study used a bootstrap sampling approach to systematically examine how the number of participants, the number of trials, the magnitude of the effect, and study design (between- vs. within-subject) jointly contribute to power in five commonly used inhibition tasks. The results demonstrate the shortcomings of relying solely on measurement reliability when determining the number of trials to use in an inhibition task: high internal reliability can be accompanied by low power, and low reliability can be accompanied by high power. For instance, adding trials once sufficient reliability has been reached can still yield large gains in power. The dissociation between reliability and power was particularly apparent in between-subject designs, where the number of participants contributed greatly to power but little to reliability, and where the number of trials contributed greatly to reliability but only modestly (depending on the task) to power. For between-subject designs, the probability of detecting small-to-medium-sized effects with 150 participants (total) was generally less than 55%. However, effect size was positively associated with the number of trials. Thus, researchers have some control over effect size, and this needs to be considered when conducting power analyses with analytic methods that take effect size as an input. Results are discussed in the context of recent claims regarding the role of inhibition tasks in experimental and individual-difference designs.
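
The sketch below illustrates, in R, the general logic of such a simulation-based approach: trial noise is averaged over the number of trials, so the observed effect size, and with it power, grows as trials are added. The effect size, variance components, and task model are hypothetical and much simpler than the study's bootstrap of real task data.

    # Estimate power for a between-subject comparison as a function of the
    # number of participants per group and the number of trials per participant.
    simulate_power <- function(n_per_group, n_trials, d = 0.3,
                               sd_between = 1, sd_trial = 2, n_sims = 1000) {
      mean(replicate(n_sims, {
        g1 <- rnorm(n_per_group, 0, sd_between)   # true scores, group 1
        g2 <- rnorm(n_per_group, d, sd_between)   # true scores, group 2
        # observed scores are trial means, so trial noise shrinks with n_trials
        o1 <- g1 + rnorm(n_per_group, 0, sd_trial / sqrt(n_trials))
        o2 <- g2 + rnorm(n_per_group, 0, sd_trial / sqrt(n_trials))
        t.test(o1, o2)$p.value < 0.05
      }))
    }

    simulate_power(n_per_group = 75, n_trials = 24)  # fewer trials: noisier scores, lower power
    simulate_power(n_per_group = 75, n_trials = 96)  # more trials: larger observed effect, higher power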


2017 ◽  
Vol 21 (4) ◽  
pp. 308-320 ◽  
Author(s):  
Mark Rubin

Hypothesizing after the results are known, or HARKing, occurs when researchers check their research results and then add or remove hypotheses on the basis of those results without acknowledging this process in their research report ( Kerr, 1998 ). In the present article, I discuss 3 forms of HARKing: (a) using current results to construct post hoc hypotheses that are then reported as if they were a priori hypotheses; (b) retrieving hypotheses from a post hoc literature search and reporting them as a priori hypotheses; and (c) failing to report a priori hypotheses that are unsupported by the current results. These 3 types of HARKing are often characterized as being bad for science and a potential cause of the current replication crisis. In the present article, I use insights from the philosophy of science to present a more nuanced view. Specifically, I identify the conditions under which each of these 3 types of HARKing is most and least likely to be bad for science. I conclude with a brief discussion about the ethics of each type of HARKing.


2018 ◽  
Vol 53 (7) ◽  
pp. 716-719
Author(s):  
Monica R. Lininger ◽  
Bryan L. Riemann

Objective: To describe the concept of statistical power as related to comparative interventions and how various factors, including sample size, affect statistical power. Background: Having a sufficiently sized sample is necessary for an investigation to demonstrate that an effective treatment is statistically superior. Many researchers fail to conduct and report a priori sample-size estimates, which makes nonsignificant results difficult to interpret and causes the clinician to question the planning of the research design. Description: Statistical power is the probability of statistically detecting a treatment effect when one truly exists. The α level, the magnitude of the difference between groups (the effect size), the variability of the data, and the sample size all affect statistical power. Recommendations: Authors should conduct and report a priori sample-size estimations in the literature. This will assist clinicians in determining whether the lack of a statistically significant treatment effect is due to an underpowered study or to a treatment's actually having no effect.
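
To make these dependencies concrete, a small sketch in R with hypothetical values shows how each factor moves the power of a two-group comparison.

    # Power for a two-group t test; each call changes one factor relative to the baseline.
    power.t.test(n = 30, delta = 10, sd = 15, sig.level = 0.05)$power  # baseline
    power.t.test(n = 60, delta = 10, sd = 15, sig.level = 0.05)$power  # larger sample: power rises
    power.t.test(n = 30, delta = 15, sd = 15, sig.level = 0.05)$power  # larger group difference: power rises
    power.t.test(n = 30, delta = 10, sd = 25, sig.level = 0.05)$power  # more variable data: power falls
    power.t.test(n = 30, delta = 10, sd = 15, sig.level = 0.10)$power  # more lenient alpha: power rises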


2017 ◽  
Author(s):  
Etienne P. LeBel ◽  
Derek Michael Berger ◽  
Lorne Campbell ◽  
Timothy Loving

Finkel, Eastwick, and Reis (2016; FER2016) argued that the post-2011 methodological reform movement has focused narrowly on replicability, neglecting other essential goals of research. We agree that multiple scientific goals are essential, but we argue that a more fine-grained language, conceptualization, and approach to replication is needed to accomplish these goals. Replication is the general empirical mechanism for testing and falsifying theory. Sufficiently methodologically similar replications, also known as direct replications, test the basic existence of phenomena and ensure that cumulative progress is possible a priori. In contrast, increasingly methodologically dissimilar replications, also known as conceptual replications, test the relevance of auxiliary hypotheses (e.g., manipulation and measurement issues, contextual factors) required to productively investigate validity and generalizability. Without prioritizing replicability, a field is not empirically falsifiable. We also disagree with FER2016’s position that “bigger samples are generally better, but … that very large samples could have the downside of commandeering resources that would have been better invested in other studies” (abstract). We identify problematic assumptions in FER2016’s modifications of our original research-economic model and present an improved model that quantifies when (and whether) it is reasonable to worry that increasing statistical power will engender trade-offs. Sufficiently powering studies (i.e., statistical power > 80%) maximizes both research efficiency and confidence in the literature (research quality). Given that we agree with FER2016 on all key open-science points, we are eager to see the accelerated rate of cumulative knowledge development about social psychological phenomena that such a sufficiently transparent, powered, and falsifiable approach will generate.

