Statistical power in clinical trials of interventions for mood, anxiety, and psychotic disorders

Author(s):  
Ymkje Anna de Vries ◽  
Robert A Schoevers ◽  
Julian Higgins ◽  
Marcus Munafo ◽  
Jojanneke Bastiaansen

Background: Previous research has suggested that statistical power is suboptimal in many biomedical disciplines, but it is unclear whether power is better in trials for particular interventions, disorders, or outcome types. We therefore performed a detailed examination of power in trials of psychotherapy, pharmacotherapy, and complementary and alternative medicine (CAM) for mood, anxiety, and psychotic disorders. Methods: We extracted data from the Cochrane Database of Systematic Reviews (Mental Health). We focused on continuous efficacy outcomes and estimated power to detect standardized effect sizes (SMD=0.20-0.80, primary effect size SMD=0.40) and the meta-analytic effect size (ESMA). We performed meta-regression to estimate the influence of including underpowered studies in meta-analyses. Results: We included 216 reviews with 8809 meta-analyses and 36540 studies. Statistical power for continuous efficacy outcomes was very low across intervention and disorder types (overall median [IQR] power for SMD=0.40: 0.33 [0.19-0.54]; for ESMA: 0.15 [0.07-0.44]), only reaching conventionally acceptable levels (80%) for SMD=0.80. Median power to detect the ESMA was higher in TAU/waitlist-controlled (0.54-0.66) or placebo-controlled (0.15-0.40) trials than in trials comparing active treatments (0.07-0.10). Meta-regression indicated that adequately-powered studies produced smaller effect sizes than underpowered studies (B=-0.06, p=0.008). Conclusions: Power to detect both fixed and meta-analytic effect sizes in clinical trials in psychiatry was low across all interventions and disorders examined. As underpowered studies produced larger effect sizes than adequately-powered studies, these results confirm the need to increase sample sizes and to reduce reporting bias against studies reporting null results to improve the reliability of the published literature.
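To make the power figures above concrete, here is a minimal Python sketch (not part of the review) that computes the power of a two-arm trial to detect a given SMD with a two-sided t-test, using the noncentral t distribution; the per-arm sample sizes and alpha level are illustrative assumptions.

```python
# Minimal sketch: power of a two-arm trial to detect a standardized mean
# difference (SMD) with a two-sided t-test, via the noncentral t distribution.
# The per-arm sample sizes and alpha level are illustrative assumptions,
# not values taken from the review.
import numpy as np
from scipy import stats

def two_sample_power(smd, n1, n2, alpha=0.05):
    """Power of a two-sided, two-sample t-test to detect effect size `smd`."""
    df = n1 + n2 - 2
    ncp = smd * np.sqrt(n1 * n2 / (n1 + n2))   # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)    # two-sided critical value
    # probability of rejecting H0 when the true effect equals `smd`
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

for n in (25, 50, 100, 200):                   # hypothetical per-arm sizes
    print(f"n = {n:3d} per arm: power for SMD = 0.40 is {two_sample_power(0.40, n, n):.2f}")
```

Under these assumptions, roughly 100 participants per arm are needed for 80% power at SMD = 0.40.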

2020 ◽  
Vol 63 (5) ◽  
pp. 1572-1580
Author(s):  
Laura Gaeta ◽  
Christopher R. Brydges

Purpose: The purpose was to examine effect size distributions reported in published audiology and speech-language pathology research in order to provide researchers and clinicians with more relevant guidelines for the interpretation of potentially clinically meaningful findings. Method: Cohen's d, Hedges' g, Pearson r, and sample sizes (n = 1,387) were extracted from 32 meta-analyses in journals in speech-language pathology and audiology. Percentile ranks (25th, 50th, 75th) were calculated to determine estimates for small, medium, and large effect sizes, respectively. The median sample size was also used to explore statistical power for small, medium, and large effect sizes. Results: For individual differences research, effect sizes of Pearson r = .24, .41, and .64 were found. For group differences, Cohen's d/Hedges' g = 0.25, 0.55, and 0.93. These values can be interpreted as small, medium, and large effect sizes in speech-language pathology and audiology. The majority of published research was inadequately powered to detect a medium effect size. Conclusions: Effect size interpretations from published research in audiology and speech-language pathology were found to be underestimated based on Cohen's (1988, 1992) guidelines. Researchers in the field should consider using Pearson r = .25, .40, and .65 and Cohen's d/Hedges' g = 0.25, 0.55, and 0.95 as small, medium, and large effect sizes, respectively, and collect larger sample sizes to ensure that both significant and nonsignificant findings are robust and replicable.
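A minimal sketch of the percentile-based benchmark idea described above, assuming a set of hypothetical effect sizes rather than the article's extracted data:

```python
# Sketch of the percentile-based benchmark approach: the 25th, 50th, and 75th
# percentiles of observed effect sizes serve as field-specific small, medium,
# and large anchors. The effect sizes below are made-up placeholders, not the
# values extracted in the article.
import numpy as np

observed_d = np.array([0.12, 0.31, 0.48, 0.55, 0.62, 0.80, 0.95, 1.10])  # hypothetical
small, medium, large = np.percentile(observed_d, [25, 50, 75])
print(f"small ~ {small:.2f}, medium ~ {medium:.2f}, large ~ {large:.2f}")
```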


2016 ◽  
Author(s):  
Daniel Lakens

Scientists should be able to provide support for the absence of a meaningful effect. Currently, researchers often incorrectly conclude that an effect is absent based on a non-significant result. A widely recommended approach within a Frequentist framework is to test for equivalence. In equivalence tests, such as the Two One-Sided Tests (TOST) procedure discussed in this article, an upper and a lower equivalence bound are specified based on the smallest effect size of interest. The TOST procedure can be used to statistically reject the presence of effects large enough to be considered worthwhile. This practical primer, with an accompanying spreadsheet and R package, enables psychologists to easily perform equivalence tests (and power analyses) by setting equivalence bounds based on standardized effect sizes, and provides recommendations on how to pre-specify equivalence bounds. Extending your statistical toolkit with equivalence tests might very well be the easiest way for psychologists to improve their statistical and theoretical inferences.
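Below is a minimal Python sketch of the TOST logic for two independent means, assuming raw-unit equivalence bounds; it is an illustration only, not the article's spreadsheet or R package.

```python
# Minimal sketch of the Two One-Sided Tests (TOST) procedure for two
# independent means: equivalence is claimed only if BOTH one-sided tests
# against the lower and upper bounds are significant. The bounds, sample
# sizes, and simulated data are illustrative assumptions.
import numpy as np
from scipy import stats

def tost_two_sample(x, y, low, high, alpha=0.05):
    """TOST for the difference in means of x and y against bounds (low, high)."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # pooled standard error of the mean difference
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    t_low = (diff - low) / se    # H0: diff <= low   vs  H1: diff > low
    t_high = (diff - high) / se  # H0: diff >= high  vs  H1: diff < high
    p_low = 1 - stats.t.cdf(t_low, df)
    p_high = stats.t.cdf(t_high, df)
    p_tost = max(p_low, p_high)  # equivalence is supported if this is < alpha
    return diff, p_tost, p_tost < alpha

rng = np.random.default_rng(1)
a, b = rng.normal(0, 1, 80), rng.normal(0.05, 1, 80)  # two nearly identical groups
print(tost_two_sample(a, b, low=-0.5, high=0.5))
```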


2017 ◽  
Author(s):  
Herm J. Lamberink ◽  
Willem M. Otte ◽  
Michel R.T. Sinke ◽  
Daniël Lakens ◽  
Paul P. Glasziou ◽  
...  

Abstract. Background: Biomedical studies with low statistical power are a major concern in the scientific community and are one of the underlying reasons for the reproducibility crisis in science. If randomized clinical trials, which are considered the backbone of evidence-based medicine, also suffer from low power, this could affect medical practice. Methods: We analysed the statistical power of 137 032 clinical trials published between 1975 and 2017, extracted from meta-analyses in the Cochrane Database of Systematic Reviews. We determined study power to detect standardized effect sizes according to Cohen, and in meta-analyses with a p-value below 0.05 we based power on the meta-analysed effect size. Average power, effect size, and temporal patterns were examined. Results: The number of trials with power ≥80% was low but increased over time: from 9% in 1975-1979 to 15% in 2010-2014. This increase was mainly due to increasing sample sizes, whilst effect sizes remained stable, with a median Cohen's h of 0.21 (IQR 0.12-0.36) and a median Cohen's d of 0.31 (0.19-0.51). The proportion of trials with power of at least 80% to detect a standardized effect size of 0.2 (small), 0.5 (moderate), and 0.8 (large) was 7%, 48%, and 81%, respectively. Conclusions: This study demonstrates that sufficient power in clinical trials is still problematic, although the situation is slowly improving. Our data encourage further efforts to increase statistical power in clinical trials to guarantee rigorous and reproducible evidence-based medicine.
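As a rough companion to the Cohen's h figures above, the following sketch computes approximate power for a comparison of two proportions via the arcsine transformation; the proportions and per-arm sample sizes are hypothetical assumptions.

```python
# Sketch: power to detect a difference between two proportions expressed as
# Cohen's h (arcsine-transformed difference), using a normal approximation.
# The proportions and per-arm sizes are hypothetical, not data from the study.
import numpy as np
from scipy import stats

def cohens_h(p1, p2):
    return 2 * np.arcsin(np.sqrt(p1)) - 2 * np.arcsin(np.sqrt(p2))

def power_two_proportions(h, n_per_arm, alpha=0.05):
    """Two-sided power via the normal approximation: z = h * sqrt(n/2)."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    z_effect = abs(h) * np.sqrt(n_per_arm / 2)
    return (1 - stats.norm.cdf(z_crit - z_effect)) + stats.norm.cdf(-z_crit - z_effect)

h = cohens_h(0.40, 0.30)        # roughly h = 0.21, close to the median reported above
for n in (100, 200, 400):
    print(f"n = {n} per arm: power ~ {power_two_proportions(h, n):.2f}")
```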


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
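The kind of simulation described above can be sketched compactly: generate primary studies, censor non-significant ones with some probability, and apply one detection tool (here, Egger's regression test, one common choice). All parameter values below are arbitrary illustrative assumptions.

```python
# Toy version of the simulation logic described above: generate primary
# studies, censor non-significant results with some probability (publication
# bias), then apply one detection tool -- Egger's regression test, i.e.
# regress effect/SE on 1/SE and test whether the intercept differs from zero.
# All parameter values are arbitrary illustrative assumptions.
# Note: result.intercept_stderr requires scipy >= 1.7.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_literature(k=60, true_d=0.2, p_publish_nonsig=0.2):
    """Simulate k studies; non-significant ones are published with low probability."""
    n = rng.integers(20, 100, size=k)               # per-arm sample sizes
    se = np.sqrt(2 / n + true_d**2 / (4 * n))       # approximate SE of Cohen's d
    d = rng.normal(true_d, se)                      # observed effect sizes
    significant = np.abs(d / se) > 1.96
    published = significant | (rng.random(k) < p_publish_nonsig)
    return d[published], se[published]

def egger_test(d, se):
    """Egger's regression: a non-zero intercept suggests small-study bias."""
    res = stats.linregress(1 / se, d / se)
    t_intercept = res.intercept / res.intercept_stderr
    p_intercept = 2 * stats.t.sf(abs(t_intercept), len(d) - 2)
    return res.intercept, p_intercept

d, se = simulate_literature()
print("Egger intercept = %.2f, p = %.3f" % egger_test(d, se))
```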


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Liansheng Larry Tang ◽  
Michael Caudy ◽  
Faye Taxman

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, but they may yield different or sometimes discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results from multiple meta-analyses, especially when their results are conflicting. In this paper, we first introduce a method to synthesize the meta-analytic results when multiple meta-analyses use the same type of summary effect estimates. When meta-analyses use different types of effect sizes, the meta-analysis results cannot be directly combined. We propose a two-step frequentist procedure to first convert the effect size estimates to the same metric and then summarize them with a weighted mean estimate. Our proposed method offers several advantages over existing methods by Hemming et al. (2012). First, different types of summary effect sizes are considered. Second, our method provides the same overall effect size as conducting a meta-analysis on all individual studies from multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
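A minimal sketch of the pooling step described above, assuming the estimates are already on a common metric and using a fixed-effect inverse-variance weighted mean; it is not the authors' full two-step procedure, and the numbers are placeholders.

```python
# Minimal sketch of an inverse-variance weighted mean, a standard way to pool
# summary estimates that are already on a common metric. The estimates and
# standard errors below are placeholders, not data from the paper.
import numpy as np

def pooled_estimate(effects, ses):
    """Fixed-effect inverse-variance pooling of several summary effect sizes."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    w = 1.0 / ses**2                      # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    return pooled, pooled_se

print(pooled_estimate([0.35, 0.48, 0.22], [0.10, 0.15, 0.08]))  # hypothetical meta-analyses
```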


2021 ◽  
Vol 3 (1) ◽  
pp. 61-89
Author(s):  
Stefan Geiß

Abstract. This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥.80 or ≥.667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation approach allows precise power calculations that take the specifics of each study's context into account, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .80 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g., when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate use of this framework, along with example code as an online appendix.
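In the spirit of the simulation described above (though far simpler than the author's procedure), the toy Monte Carlo below estimates the power of a correlation test when a binary code is measured with imperfect coder agreement; the sample size, correlation, agreement level, and the flip model for coder error are arbitrary assumptions.

```python
# Toy Monte Carlo: a binary code is measured with imperfect coder agreement,
# and we estimate the power of a correlation test between the observed
# (noisy) code and an outcome. Sample size, correlation, agreement level,
# and the flip model for coder error are arbitrary illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def simulated_power(n=200, rho=0.2, agreement=0.8, n_sims=2000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        code = rng.integers(0, 2, size=n)              # latent binary code
        code_std = (code - 0.5) / 0.5                   # rescaled to -1/+1 (unit variance)
        outcome = rho * code_std + np.sqrt(1 - rho**2) * rng.normal(0, 1, n)
        # the coder records the correct category with probability `agreement`,
        # otherwise flips it -- a crude stand-in for imperfect reliability
        flip = rng.random(n) > agreement
        observed = np.where(flip, 1 - code, code)
        _, p = stats.pearsonr(observed, outcome)
        hits += p < alpha
    return hits / n_sims

print(f"estimated power: {simulated_power():.2f}")
```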


2018 ◽  
Author(s):  
Qianying Wang ◽  
Jing Liao ◽  
Kaitlyn Hair ◽  
Alexandra Bannach-Brown ◽  
Zsanett Bahor ◽  
...  

Abstract. Background: Meta-analysis is increasingly used to summarise the findings identified in systematic reviews of animal studies modelling human disease. Such reviews typically identify a large number of individually small studies, testing efficacy under a variety of conditions. This leads to substantial heterogeneity, and identifying potential sources of this heterogeneity is an important function of such analyses. However, the statistical performance of different approaches (normalised compared with standardised mean difference estimates of effect size; stratified meta-analysis compared with meta-regression) is not known. Methods: We used data from 3116 experiments in focal cerebral ischaemia to construct a linear model predicting observed improvement in outcome contingent on 25 independent variables. We used stochastic simulation to attribute these variables to simulated studies according to their prevalence. To ascertain the ability to detect an effect of a given variable, we additionally introduced a “variable of interest” with a given prevalence and effect. To establish any impact of a latent variable on the apparent influence of the variable of interest, we also introduced a “latent confounding variable” with a given prevalence and effect, and allowed the prevalence of the variable of interest to differ in the presence and absence of the latent variable. Results: Generally, the normalised mean difference (NMD) approach had higher statistical power than the standardised mean difference (SMD) approach. Even when the effect size and the number of studies contributing to the meta-analysis were small, there was good statistical power to detect the overall effect, with a low false positive rate. For detecting an effect of the variable of interest, stratified meta-analysis was associated with a substantial false positive rate with NMD estimates of effect size, while using an SMD estimate of effect size had very low statistical power. Univariate and multivariable meta-regression performed substantially better, with low false positive rates for both NMD and SMD approaches; power was higher for NMD than for SMD. The presence or absence of a latent confounding variable only introduced an apparent effect of the variable of interest when there was substantial asymmetry in the prevalence of the variable of interest in the presence or absence of the confounding variable. Conclusions: In meta-analysis of data from animal studies, NMD estimates of effect size should be used in preference to SMD estimates, and meta-regression should, where possible, be chosen over stratified meta-analysis. The power to detect the influence of the variable of interest depends on the effect of the variable of interest and its prevalence, but unless effects are very large, adequate power is only achieved once at least 100 experiments are included in the meta-analysis.
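A compact sketch of a univariate meta-regression of the kind compared above: weighted least squares of study-level effect sizes on a binary “variable of interest”, with inverse-variance weights. The data and the statsmodels-based implementation are illustrative assumptions, not the authors' analysis pipeline.

```python
# Compact sketch of a univariate meta-regression: weighted least squares of
# study-level effect sizes on a binary moderator ("variable of interest"),
# with inverse-variance weights. All data below are simulated placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
k = 100                                          # number of experiments
moderator = rng.integers(0, 2, size=k)           # variable of interest (absent/present)
se = rng.uniform(0.1, 0.4, size=k)               # per-study standard errors
true_effect = 0.3 + 0.2 * moderator              # moderator shifts the effect by 0.2
effect = rng.normal(true_effect, se)             # observed effect sizes

X = sm.add_constant(moderator)                   # intercept + moderator column
fit = sm.WLS(effect, X, weights=1 / se**2).fit() # inverse-variance weighted regression
print(fit.params, fit.pvalues)                   # slope estimates the moderator's influence
```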



Author(s):  
Ascensión Fumero ◽  
Wenceslao Peñate ◽  
Cristián Oyanadel ◽  
Bárbara Porter

Objective: There has been a growing interest in the study of the effectiveness of mindfulness-based interventions (MBIs). Many clinical trials and experimental designs have been implemented, with different samples and diverse MBI procedures. Reviews have shown unclear results, apart from a tendency to identify low-to-moderate effectiveness. The purpose of this review is to examine the effectiveness of MBIs on anxiety complaints, analyzing available systematic reviews and meta-analyses. Method: The literature search was done in MEDLINE (PubMed) and PsycINFO, from the first available review in 2003 until March 2020. From 82 initial references, 12 reviews were selected. Results: Reviews confirmed a moderate effect size of MBIs in improving anxiety symptoms. This efficacy was similar to that of well-established therapies for reducing anxiety symptoms, such as cognitive behavioral therapies. A large effect size was found when well-developed MBI protocols were applied. Discussion: More refined clinical trials are needed to establish clear conditions of MBI effectiveness (protocols, samples, psychological mechanisms, etc.). In addition, considering mindfulness processes, new outcome measures are needed (such as acceptance, self-awareness, or well-being) to test the incremental value of MBIs.

