A Multi-faceted Mess: A Review of Statistical Power Analysis in Psychology Journal Articles

Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many professional bodies recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, several drawbacks render the procedure “a mess.” Specifically, identifying the minimally meaningful effect size is often difficult yet unavoidable if the procedure is to be conducted properly, the procedure is not precision oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, this study reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning samples, such as collecting the maximum sample size feasible.
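As a concrete illustration, the sketch below runs the kind of traditional a priori power analysis the review examined, using Python's statsmodels (our choice of tool, not one used in the review); the smallest effect size of interest (d = 0.30), alpha, and target power are hypothetical values.

```python
# A minimal sketch of a traditional a priori power analysis for a two-group
# design. The effect size, alpha, and power below are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.30,  # hypothetical minimally meaningful effect (Cohen's d)
    alpha=0.05,        # Type I error rate
    power=0.80,        # desired statistical power
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.0f}")
```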

2017 ◽  
Author(s):  
Chris Aberson

Preprint of chapter appearing as: Aberson, C. L. (2015). Statistical power analysis. In R. A. Scott & S. M. Kosslyn (Eds.), Emerging trends in the behavioral and social sciences. Hoboken, NJ: Wiley. Statistical power refers to the probability of rejecting a false null hypothesis (i.e., finding what the researcher wants to find). Power analysis allows researchers to determine an adequate sample size for designing studies with an optimal probability of rejecting false null hypotheses. When conducted correctly, power analysis helps researchers make informed decisions about sample size selection. Statistical power analysis most commonly involves specifying statistical test criteria (Type I error rate), the desired level of power, and the effect size expected in the population. This article outlines the basic concepts relevant to statistical power, factors that influence power, how to establish the different parameters for a power analysis, and the determination and interpretation of effect size estimates for power. I also address innovative work such as the continued development of software resources for power analysis and protocols for designing for precision of confidence intervals (a.k.a., accuracy in parameter estimation). Finally, I outline understudied areas such as power analysis for designs with multiple predictors, reporting and interpreting power analyses in published work, designing for meaningfully sized effects, and power to detect multiple effects in the same study.
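The precision-oriented alternative mentioned above (accuracy in parameter estimation) can be sketched roughly as follows: choose n so that the confidence interval for a mean difference is no wider than a target half-width. The large-sample normal approximation, the assumed standard deviation, and the target width are illustrative assumptions, not values from the chapter.

```python
# A rough precision-based sample size sketch: pick n per group so the CI for
# the mean difference has at most a target half-width (large-sample normal
# approximation). sigma and half_width are hypothetical placeholder values.
import math
from scipy.stats import norm

sigma = 1.0        # assumed common standard deviation
half_width = 0.20  # desired CI half-width for the mean difference
alpha = 0.05

z = norm.ppf(1 - alpha / 2)
n_per_group = math.ceil(2 * (z * sigma / half_width) ** 2)
print(f"n per group for a +/- {half_width} CI: {n_per_group}")
```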


2021 ◽  
Author(s):  
James Edward Bartlett ◽  
Sarah Jane Charles

Authors have highlighted for decades that sample size justification through power analysis is the exception rather than the rule. Even when authors do report a power analysis, there is often no justification for the smallest effect size of interest, or they do not provide enough information for the analysis to be reproducible. We argue one potential reason for these omissions is the lack of a truly accessible introduction to the key concepts and decisions behind power analysis. In this tutorial, we demonstrate a priori and sensitivity power analysis using jamovi for two independent samples and two dependent samples. Respectively, these power analyses allow you to ask the questions: “How many participants do I need to detect a given effect size?”, and “What effect sizes can I detect with a given sample size?”. We emphasise how power analysis is most effective as a reflective process during the planning phase of research to balance your inferential goals with your available resources. By the end of the tutorial, you will be able to understand the fundamental concepts behind power analysis and extend them to more advanced statistical models.
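A rough Python equivalent of the tutorial's two questions, using statsmodels rather than jamovi (the tool the tutorial itself demonstrates); the effect size and sample size below are placeholder values.

```python
# A sketch of a priori and sensitivity power analyses for two independent
# samples. Values are illustrative, not taken from the tutorial.
from statsmodels.stats.power import TTestIndPower

ttest_power = TTestIndPower()

# A priori: "How many participants do I need to detect a given effect size?"
n_needed = ttest_power.solve_power(effect_size=0.5, alpha=0.05, power=0.90)

# Sensitivity: "What effect sizes can I detect with a given sample size?"
d_detectable = ttest_power.solve_power(nobs1=64, alpha=0.05, power=0.90)

print(f"A priori: {n_needed:.0f} per group; sensitivity: d = {d_detectable:.2f}")
```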


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
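The inflation mechanism these simulations target can be sketched in a few lines: generate many small studies with a modest true effect, "publish" only the significant ones, and compare the average observed effect. The true effect, sample size, and selection rule below are arbitrary choices for illustration, not the authors' simulation design.

```python
# A toy simulation of publication bias inflating meta-analytic effect sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n_per_group, n_studies = 0.2, 30, 5000

observed_d, published_d = [], []
for _ in range(n_studies):
    a = rng.normal(true_d, 1.0, n_per_group)
    b = rng.normal(0.0, 1.0, n_per_group)
    d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    p = stats.ttest_ind(a, b).pvalue
    observed_d.append(d)
    if p < 0.05 and d > 0:  # crude publication filter: significant results only
        published_d.append(d)

print(f"Mean d, all studies:      {np.mean(observed_d):.2f}")
print(f"Mean d, 'published' only: {np.mean(published_d):.2f}")
```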


2020 ◽  
Vol 6 (2) ◽  
pp. 106-113
Author(s):  
A. M. Grjibovski ◽  
M. A. Gorbatova ◽  
A. N. Narkevich ◽  
K. A. Vinogradov

Sample size calculation in the planning phase is still uncommon in Russian research practice. This situation threatens the validity of conclusions and may introduce Type II errors, in which a false null hypothesis is accepted because the study lacks the statistical power to detect an existing difference between the means. Comparing two means using unpaired Student's t-tests is the most common statistical procedure in the Russian biomedical literature. However, calculations of the minimal required sample size, or retrospective calculations of statistical power, were observed in only very few publications. In this paper we demonstrate how to calculate the required sample size for comparing means in unpaired samples using WinPepi and Stata software. In addition, we produced tables of the minimal required sample size for studies in which two means have to be compared and body mass index and blood pressure are the variables of interest. The tables were constructed for unpaired samples for different levels of statistical power and standard deviations obtained from the literature.
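A Python sketch of the same calculation (the paper itself uses WinPepi and Stata): convert an expected raw difference and an assumed standard deviation into a standardized effect size and solve for the sample size. The BMI values below are hypothetical placeholders, not those tabulated in the paper.

```python
# Sample size for comparing two means in unpaired samples, starting from a raw
# expected difference and an assumed SD. Values are illustrative only.
from statsmodels.stats.power import TTestIndPower

mean_diff = 1.5  # hypothetical expected difference in BMI (kg/m^2)
sd = 4.0         # assumed common standard deviation of BMI

d = mean_diff / sd  # standardized effect size (Cohen's d)
n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80)
print(f"Required sample size per group: {n_per_group:.0f}")
```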


Scientifica ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-5 ◽  
Author(s):  
R. Eric Heidel

Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.


2017 ◽  
Author(s):  
Daniel Lakens ◽  
Casper J Albers

When designing a study, the planned sample size is often based on power analyses. One way to choose an effect size for a power analysis is to rely on pilot data. A priori power analyses are only accurate when the effect size estimate is accurate. In this paper we highlight two sources of bias when performing a priori power analyses for between-subject designs based on pilot data. First, we examine how the choice of effect size index (η², ω², and ε²) affects the sample size and power of the main study. Based on our observations, we recommend against the use of η² in a priori power analyses. Second, we examine how the maximum sample size researchers are willing to collect in a main study (e.g., due to time or financial constraints) leads to overestimated effect size estimates in the studies that are performed. Determining the required sample size exclusively based on effect size estimates from pilot data, and following up on pilot studies only when the sample size estimate for the main study is considered feasible, creates what we term follow-up bias. We explain how follow-up bias leads to underpowered main studies. Our simulations show that designing main studies based on effect sizes estimated from small pilot studies does not yield desired levels of power due to accuracy bias and follow-up bias, even when publication bias is not an issue. We urge researchers to consider alternative approaches to determining the sample size of their studies, and discuss several options.
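For reference, the three indices compared in the paper can be computed from one-way ANOVA sums of squares as sketched below; the formulas are the standard definitions, and the pilot data are invented for the example.

```python
# Computing eta-squared, omega-squared, and epsilon-squared for a one-way
# between-subjects design. The three small groups are made-up pilot data.
import numpy as np

groups = [np.array([4.1, 5.0, 3.8, 4.6]),
          np.array([5.2, 5.9, 4.8, 5.5]),
          np.array([4.4, 4.9, 5.1, 4.0])]

grand_mean = np.mean(np.concatenate(groups))
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ss_between + ss_within
df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
ms_within = ss_within / df_within

eta_sq = ss_between / ss_total
omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
epsilon_sq = (ss_between - df_between * ms_within) / ss_total

print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}, epsilon^2 = {epsilon_sq:.3f}")
```

The bias-corrected indices (ω² and ε²) subtract the error expected under the null, which is why they run smaller than η² in small pilot samples.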


2021 ◽  
Author(s):  
Nick J. Broers ◽  
Henry Otgaar

Since the early work of Cohen (1962), psychological researchers have become aware of the importance of conducting a power analysis to ensure that the predicted effect will be detectable with sufficient statistical power. APA guidelines require researchers to justify the chosen sample size with reference to the expected effect size, an expectation that should be based on previous research. However, we argue that a credible estimate of an expected effect size is only reasonable under two conditions: either the new study forms a direct replication of earlier work, or the outcome scale makes use of meaningful and familiar units that allow for the quantification of a minimal effect of psychological interest. In practice, neither of these conditions is usually met. We propose a different rationale for a power analysis that will ensure that researchers are able to justify their sample size as meaningful and adequate.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2652 ◽  
Author(s):  
Todd C. Pataky ◽  
Mark A. Robinson ◽  
Jos Vanrenterghem

One-dimensional (1D) kinematic, force, and EMG trajectories are often analyzed using zero-dimensional (0D) metrics like local extrema. Recently, whole-trajectory 1D methods have emerged in the literature as alternatives. Since 0D and 1D methods can yield qualitatively different results, the two approaches may appear to be theoretically distinct. The purposes of this paper were (a) to clarify that 0D and 1D approaches are actually just special cases of a more general region-of-interest (ROI) analysis framework, and (b) to demonstrate how ROIs can augment statistical power. We first simulated millions of smooth, random 1D datasets to validate theoretical predictions of the 0D, 1D, and ROI approaches and to emphasize how ROIs provide a continuous bridge between 0D and 1D results. We then analyzed a variety of public datasets to demonstrate potential effects of ROIs on biomechanical conclusions. Results showed, first, that a priori ROI particulars can qualitatively affect the biomechanical conclusions that emerge from analyses and, second, that ROIs derived from exploratory/pilot analyses can detect smaller biomechanical effects than are detectable using full 1D methods. We recommend regarding ROIs, like data filtering particulars and Type I error rate, as parameters which can affect hypothesis testing results, and thus as sensitivity analysis tools to ensure arbitrary decisions do not influence scientific interpretations. Last, we describe open-source Python and MATLAB implementations of 1D ROI analysis for arbitrary experimental designs ranging from one-sample t tests to MANOVA.
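A stripped-down illustration of the 0D-versus-ROI contrast, using plain NumPy/SciPy rather than the authors' spm1d implementations; the trajectories, the localized group effect, and the ROI window are all invented for the example.

```python
# Toy comparison of a 0D metric (trajectory peak) versus an a priori ROI mean
# for two groups of simulated 1D trajectories. Not the authors' method; a
# simplified sketch of the general idea only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)                       # normalized time (1D domain)
effect = 0.4 * np.exp(-((t - 0.6) ** 2) / 0.01)  # effect localized near t = 0.6

group_a = rng.normal(0, 1, (15, t.size)) + effect
group_b = rng.normal(0, 1, (15, t.size))

# 0D approach: reduce each trajectory to a single local extremum (its peak)
p_0d = stats.ttest_ind(group_a.max(axis=1), group_b.max(axis=1)).pvalue

# ROI approach: average over an a priori window where the effect is expected
roi = (t >= 0.55) & (t <= 0.65)
p_roi = stats.ttest_ind(group_a[:, roi].mean(axis=1),
                        group_b[:, roi].mean(axis=1)).pvalue

print(f"p (0D peak metric) = {p_0d:.3f}, p (ROI mean) = {p_roi:.3f}")
```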


2018 ◽  
Vol 108 (1) ◽  
pp. 15-22 ◽  
Author(s):  
David H. Gent ◽  
Paul D. Esker ◽  
Alissa B. Kriss

In null hypothesis testing, failure to reject a null hypothesis may have two potential interpretations. One interpretation is that the treatments being evaluated do not have a significant effect, and a correct conclusion was reached in the analysis. Alternatively, a treatment effect may have existed but the conclusion of the study was that there was none. This is termed a Type II error, which is most likely to occur when studies lack sufficient statistical power to detect a treatment effect. In basic terms, the power of a study is the ability to identify a true effect through a statistical test. The power of a statistical test is 1 − (the probability of a Type II error), and depends on the size of the treatment effect (termed the effect size), the variance, the sample size, and the significance criterion (the probability of a Type I error, α). Low statistical power is prevalent in the scientific literature in general, including plant pathology. However, power is rarely reported, creating uncertainty in the interpretation of nonsignificant results and potentially underestimating small, yet biologically significant, relationships. The appropriate level of power for a study depends on the impact of Type I versus Type II errors, and no single level of power is acceptable for all purposes. Nonetheless, by convention 0.8 is often considered an acceptable threshold, and studies with power less than 0.5 generally should not be conducted if the results are to be conclusive. The emphasis on power analysis should be in the planning stages of an experiment. Commonly employed strategies to increase power include increasing sample sizes, selecting a less stringent threshold probability for Type I errors, increasing the hypothesized or detectable effect size, including as few treatment groups as possible, reducing measurement variability, and including relevant covariates in analyses. Power analysis will lead to more efficient use of resources and more precisely structured hypotheses, and may even indicate that some studies should not be undertaken. Moreover, adequately powered studies are less prone to erroneous conclusions and inflated estimates of treatment effectiveness, especially when effect sizes are small.
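As a small numerical illustration of how these factors interact, the sketch below computes power for a two-group comparison across a few sample sizes and Type I error thresholds using statsmodels; the assumed effect size of d = 0.5 is an arbitrary choice.

```python
# How sample size and the Type I error threshold move statistical power for a
# fixed, hypothetical effect size (d = 0.5) in a two-group comparison.
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()
for n in (10, 20, 40, 80):
    for alpha in (0.05, 0.10):
        power = power_calc.power(effect_size=0.5, nobs1=n, alpha=alpha)
        print(f"n per group = {n:3d}, alpha = {alpha:.2f}, power = {power:.2f}")
```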


2016 ◽  
Vol 27 (8) ◽  
pp. 2437-2446 ◽  
Author(s):  
Hezhi Lu ◽  
Hua Jin ◽  
Weixiong Zeng

Hida and Tango established a statistical testing framework for three-arm non-inferiority trials that include a placebo with a pre-specified non-inferiority margin, to overcome the shortcomings of traditional two-arm non-inferiority trials (such as having to choose the non-inferiority margin). In this paper, we propose a new method that improves their approach in two respects: we construct our test statistics based on the best unbiased pooled estimators of the homogeneous variance, and we use the principle of intersection-union tests to determine the rejection rule. We theoretically prove that our test is better than that of Hida and Tango for large sample sizes. Furthermore, when the sample size was small or moderate, our simulation studies showed that our approach performed better than Hida and Tango's. Although both tests controlled the Type I error rate, theirs was more conservative and the statistical power of our test was higher.

