Statistical Power of Negative Randomized Controlled Trials Presented at American Society of Clinical Oncology Annual Meetings

2007, Vol 25 (23), pp. 3482-3487
Author(s):  
Philippe L. Bedard ◽  
Monika K. Krzyzanowska ◽  
Melania Pintilie ◽  
Ian F. Tannock

Purpose To investigate the prevalence of underpowered randomized controlled trials (RCTs) presented at American Society of Clinical Oncology (ASCO) annual meetings. Methods We surveyed all two-arm phase III RCTs presented at ASCO annual meetings from 1995 to 2003 for which negative results were obtained. Post hoc calculations were performed using a power of 80% and an α level of .05 (two sided) to determine the sample sizes required to detect small, medium, and large effect sizes. For studies reporting a proportion or time-to-event as primary end point, effect size was expressed as an odds ratio (OR) or hazard ratio (HR), respectively, with a small effect size defined as OR/HR ≥ 1.3, a medium effect size as OR/HR ≥ 1.5, and a large effect size as OR/HR ≥ 2.0. Logistic regression was used to identify factors associated with lack of statistical power. Results Of 423 negative RCTs for which post hoc sample size calculations could be performed, 45 (10.6%), 138 (32.6%), and 233 (55.1%) had an adequate sample size to detect small, medium, and large effect sizes, respectively. Only 35 negative RCTs (7.1%) reported a reason for inadequate sample size. In a multivariable model, studies presented at oral sessions (P = .0038), multicenter studies supported by a cooperative group (P < .0001), and studies with a time-to-event primary outcome (P < .0001) were more likely to have adequate sample size. Conclusion More than half of the negative RCTs presented at ASCO annual meetings do not have an adequate sample size to detect a medium-sized treatment effect.
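For the time-to-event endpoints, the kind of post hoc calculation described above can be sketched with Schoenfeld's approximation for the number of events required by a log-rank / proportional-hazards analysis. The code below is illustrative rather than a reproduction of the authors' procedure: it assumes 1:1 allocation, two-sided α = .05, and 80% power, and it returns required events (not patients), which additionally depend on accrual, follow-up, and event rates.

```python
# Schoenfeld's approximation for the number of events needed to detect a given
# hazard ratio with a log-rank / proportional-hazards analysis, 1:1 allocation.
import math
from scipy.stats import norm

def events_required(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Events needed to detect `hazard_ratio` at two-sided `alpha` and `power`."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for two-sided alpha = .05
    z_beta = norm.ppf(power)            # 0.84 for 80% power
    log_hr = math.log(hazard_ratio)
    return math.ceil((z_alpha + z_beta) ** 2
                     / (allocation * (1 - allocation) * log_hr ** 2))

# Small, medium, and large effects as defined in the abstract.
for hr in (1.3, 1.5, 2.0):
    print(f"HR = {hr}: about {events_required(hr)} events required")
```

The steep growth in required events as the target hazard ratio shrinks toward 1.3 is what makes small effects so hard to rule out in trials of modest size.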

2007, Vol 25 (18_suppl), pp. 6516-6516
Author(s):  
P. Bedard ◽  
M. K. Krzyzanowska ◽  
M. Pintilie ◽  
I. F. Tannock

6516 Background: Underpowered randomized clinical trials (RCTs) may expose participants to the risks and burdens of research without scientific merit. We investigated the prevalence of underpowered RCTs presented at ASCO annual meetings. Methods: We surveyed all two-arm parallel phase III RCTs presented at ASCO annual meetings from 1995 to 2003 for which the difference in the primary endpoint was not statistically significant. Post hoc calculations were performed using a power of 80% and α = .05 (two-sided) to determine the sample size required to detect a small, medium, and large effect size between the two groups. For studies reporting a proportion or time to event as the primary endpoint, effect size was expressed as an odds ratio (OR) or hazard ratio (HR), respectively, with a small effect size defined as OR/HR = 1.3, a medium effect size as OR/HR = 1.5, and a large effect size as OR/HR = 2.0. Logistic regression was used to identify factors associated with lack of statistical power. Results: Of 423 negative RCTs for which post hoc sample size calculations could be performed, 45 (10.6%), 138 (32.6%), and 333 (78.7%) had an adequate sample size to detect small, medium, and large effect sizes, respectively. Only 35 negative RCTs (7.1%) reported a reason for inadequate sample size. In a multivariable model, studies presented at plenary or oral sessions (p < 0.0001) and multicenter studies supported by a cooperative group (p < 0.0001) were more likely to have adequate sample size. Conclusion: Two-thirds of negative RCTs presented at the ASCO annual meeting do not have an adequate sample size to detect a medium-sized treatment effect. Most underpowered negative RCTs do not report a sample size calculation or reasons for inadequate patient accrual. No significant financial relationships to disclose.
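For trials whose primary endpoint is a proportion, the corresponding per-arm sample size can be sketched with the standard normal-approximation formula for comparing two independent proportions, converting the target odds ratio into a treatment-arm proportion. The 30% control-arm response rate below is an illustrative assumption, not a figure from the abstract; the required sample size varies considerably with that rate.

```python
# Normal-approximation sample size for comparing two independent proportions,
# with the treatment-arm proportion derived from the target odds ratio.
# The 30% control-arm rate is an illustrative assumption.
import math
from scipy.stats import norm

def n_per_arm_for_odds_ratio(odds_ratio, p_control=0.30, alpha=0.05, power=0.80):
    """Per-arm sample size to detect `odds_ratio` against `p_control`."""
    odds_treat = odds_ratio * p_control / (1 - p_control)
    p_treat = odds_treat / (1 + odds_treat)          # convert the odds back to a proportion
    z_alpha, z_beta = norm.ppf(1 - alpha / 2), norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)

for odds_ratio in (1.3, 1.5, 2.0):
    print(f"OR = {odds_ratio}: about {n_per_arm_for_odds_ratio(odds_ratio)} patients per arm")
```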


2021, pp. 174077452098487
Author(s):  
Brian Freed ◽  
Brian Williams ◽  
Xiaolu Situ ◽  
Victoria Landsman ◽  
Jeehyoung Kim ◽  
...  

Background: Blinding aims to minimize biases arising from what participants and investigators know or believe. Randomized controlled trials, despite being the gold standard for evaluating treatment effects, do not generally assess the success of blinding. We investigated the extent of blinding in back pain trials and the associations between participant guesses and treatment effects. Methods: We conducted a review using PubMed/Ovid MEDLINE, 2000–2019. Eligibility criteria were back pain trials with data available on treatment effect and on participants' guesses of their treatment. For blinding, the blinding index was used as a chance-corrected measure of excess correct guesses (0 indicating random guessing). For treatment effects, within- or between-arm effect sizes were used. Exploratory analyses of investigators' guesses/blinding and of treatment modality were also performed. Results: Forty trials (3899 participants) were included. The active and sham treatment groups had mean blinding indexes of 0.26 (95% confidence interval: 0.12, 0.41) and 0.01 (−0.11, 0.14), respectively, meaning that 26% of participants in the active arms believed they received active treatment, whereas only 1% of those in the sham arms believed they received sham treatment, beyond what would be expected from random guessing. A greater belief of receiving active treatment was associated with a larger within-arm effect size in both arms, and ideal blinding (namely, "random guessing," and "wishful thinking," in which both groups believe they received active treatment) showed smaller effect sizes; the correlation between effect size and the summary blinding index was 0.35 (p = 0.028) for the between-arm comparison. We observed uniformly large sham treatment effects for all modalities, and a larger correlation for investigators' (un)blinding, 0.53 (p = 0.046). Conclusion: Participants in active treatments in back pain trials guessed their treatment identity more often than chance, while those in sham treatments tended to remain successfully blinded. Excess correct guesses (which could reflect weaker blinding and/or noticeable effects) by participants and investigators were associated with larger effect sizes. Blinding and sham treatment effects on back pain deserve due consideration in individual trials and meta-analyses.
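The chance-corrected blinding index described here behaves like the commonly used Bang blinding index, although the abstract does not name its exact estimator; the sketch below assumes that definition and uses hypothetical guess counts, not data from the review.

```python
# Minimal sketch of a chance-corrected blinding index per arm, in the spirit of
# Bang et al.'s blinding index (an assumption; the abstract does not name its
# estimator). 0 means guesses are consistent with chance, +1 means everyone
# correctly identified their assignment, -1 means everyone guessed the opposite
# arm. "Don't know" answers shrink the index toward 0.
def blinding_index(correct, incorrect, dont_know=0):
    """Chance-corrected excess of correct guesses within one treatment arm."""
    guessed = correct + incorrect
    total = guessed + dont_know
    if guessed == 0:
        return 0.0
    return (2 * correct / guessed - 1) * (guessed / total)

# Hypothetical counts: an active arm where correct guesses exceed chance,
# and a sham arm close to random guessing.
print(blinding_index(correct=63, incorrect=37))   # ~0.26
print(blinding_index(correct=51, incorrect=49))   # ~0.02
```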


2021, Vol 3 (1), pp. 61-89
Author(s):  
Stefan Geiß

Abstract This study uses Monte Carlo simulation techniques to estimate the minimum levels of intercoder reliability required in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The resulting procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥.800 or ≥.667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation allows precise power calculations that take the specifics of each study's context into account, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power, whereas in studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help in both evaluating and designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g., when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate use of this framework, along with example code as an online appendix.
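The logic of the simulation can be illustrated with a much simpler Monte Carlo sketch than the one in the paper: degrade a coded variable so that its reliability matches a chosen level, then estimate how often a correlation test at a given sample size and true effect size rejects the null. The paper works with categorical coding decisions and chance-adjusted agreement, so the continuous measurement-error model below is only a rough analogue.

```python
# Degrade one variable with coding error so that its reliability equals `rel`,
# then estimate the power of a Pearson correlation test by simulation.
import numpy as np
from scipy.stats import pearsonr

def simulated_power(n, true_r, rel, n_sim=5000, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    error_sd = np.sqrt((1 - rel) / rel)      # yields var(true) / var(coded) = rel
    rejections = 0
    for _ in range(n_sim):
        x_true = rng.standard_normal(n)
        y = true_r * x_true + np.sqrt(1 - true_r ** 2) * rng.standard_normal(n)
        x_coded = x_true + error_sd * rng.standard_normal(n)   # unreliable coding
        _, p = pearsonr(x_coded, y)
        if p < alpha:
            rejections += 1
    return rejections / n_sim

# Lower coding reliability attenuates the observed correlation, and with it the power.
print(simulated_power(n=200, true_r=0.2, rel=1.0))   # perfectly reliable coding
print(simulated_power(n=200, true_r=0.2, rel=0.6))   # noisy coding
```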


2018
Author(s):  
Alexander Rozental ◽  
Roz Shafran ◽  
Tracey D Wade ◽  
Radha Kothari ◽  
Sarah J Egan ◽  
...  

BACKGROUND Perfectionism can become a debilitating condition that may negatively affect functioning in multiple areas, including mental health. Prior research has indicated that internet-based cognitive behavioral therapy can be beneficial, but few studies have included follow-up data. OBJECTIVE The objective of this study was to explore the outcomes at follow-up of internet-based cognitive behavioral therapy with guided self-help, delivered as 2 separate randomized controlled trials conducted in Sweden and the United Kingdom. METHODS In total, 120 participants randomly assigned to internet-based cognitive behavioral therapy were included in both intention-to-treat and completer analyses: 78 in the Swedish trial and 62 in the UK trial. The primary outcome measure was the Frost Multidimensional Perfectionism Scale, Concern over Mistakes subscale (FMPS CM). Secondary outcome measures varied between the trials and consisted of the Clinical Perfectionism Questionnaire (CPQ; both trials), the 9-item Patient Health Questionnaire (PHQ-9; Swedish trial), the 7-item Generalized Anxiety Disorder scale (GAD-7; Swedish trial), and the 21-item Depression Anxiety Stress Scale (DASS-21; UK trial). Follow-up occurred after 6 months for the UK trial and after 12 months for the Swedish trial. RESULTS Analysis of covariance revealed a significant difference between pretreatment and follow-up in both studies. Intention-to-treat within-group Cohen d effect sizes were 1.21 (Swedish trial; 95% CI 0.86-1.54) and 1.24 (UK trial; 95% CI 0.85-1.62) for the FMPS CM. Furthermore, 29 (59%; Swedish trial) and 15 (43%; UK trial) of the participants met the criteria for recovery on the FMPS CM. Improvements were also significant for the CPQ, with effect sizes of 1.32 (Swedish trial; 95% CI 0.97-1.66) and 1.49 (UK trial; 95% CI 1.09-1.88); the PHQ-9, effect size 0.60 (95% CI 0.28-0.92); the GAD-7, effect size 0.67 (95% CI 0.34-0.99); and the DASS-21, effect size 0.50 (95% CI 0.13-0.85). CONCLUSIONS The results are promising for the use of internet-based cognitive behavioral therapy as a way of targeting perfectionism, but the findings need to be replicated in studies that include a comparison condition.
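The within-group effect sizes reported above can be illustrated with a short sketch that standardizes the pre-to-follow-up change by the pre-treatment standard deviation and attaches a percentile-bootstrap confidence interval. The trials' exact estimator and confidence-interval method are not specified in the abstract, and the scores below are simulated for illustration only.

```python
# Within-group (pre-to-follow-up) Cohen's d, standardized by the pre-treatment SD,
# with a percentile-bootstrap 95% CI. Scores are simulated, not trial data.
import numpy as np

def within_group_d(pre, post):
    # Lower scores on the FMPS CM indicate improvement, so pre - post is the gain.
    return (pre.mean() - post.mean()) / pre.std(ddof=1)

def bootstrap_ci(pre, post, n_boot=5000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(pre)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample participants with replacement
        draws.append(within_group_d(pre[idx], post[idx]))
    return tuple(np.percentile(draws, [2.5, 97.5]))

rng = np.random.default_rng(42)
pre = rng.normal(30, 6, 60)                  # hypothetical pre-treatment FMPS CM scores
post = pre - rng.normal(7, 5, 60)            # hypothetical improvement at follow-up
print(within_group_d(pre, post), bootstrap_ci(pre, post))
```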


2017
Author(s):  
Clarissa F. D. Carneiro ◽  
Thiago C. Moulin ◽  
Malcolm R. Macleod ◽  
Olavo B. Amaral

Abstract Proposals to increase research reproducibility frequently call for focusing on effect sizes instead of p values, as well as for increasing the statistical power of experiments. However, it is unclear to what extent these two concepts are indeed taken into account in basic biomedical science. To study this in a real-case scenario, we performed a systematic review of effect sizes and statistical power in studies on learning of rodent fear conditioning, a widely used behavioral task to evaluate memory. Our search criteria yielded 410 experiments comparing control and treated groups in 122 articles. Interventions had a mean effect size of 29.5%, and amnesia caused by memory-impairing interventions was nearly always partial. Mean statistical power to detect the average effect size observed in well-powered experiments with significant differences (37.2%) was 65%, and was lower among studies with non-significant results. Only one article reported a sample size calculation, and our estimated sample size to achieve 80% power considering typical effect sizes and variances (15 animals per group) was reached in only 12.2% of experiments. Actual effect sizes correlated with effect size inferences made by readers on the basis of textual descriptions of results only when findings were non-significant, and neither effect size nor power correlated with study quality indicators, number of citations or impact factor of the publishing journal. In summary, effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited. Failure to take these concepts into consideration might limit attempts to improve reproducibility in this field of science.
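The power and sample-size figures discussed here rest on standard two-sample calculations, which can be sketched as follows. The standardized effect size of d = 1.0 and the group size of 10 are illustrative assumptions rather than values taken from the review.

```python
# Power and required sample size for a two-sample comparison of a control and a
# treated group, at an assumed standardized effect size (Cohen's d = 1.0).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power achieved with 10 animals per group at d = 1.0
power = analysis.power(effect_size=1.0, nobs1=10, alpha=0.05, ratio=1.0,
                       alternative='two-sided')
print(f"Power with n = 10 per group: {power:.2f}")

# Animals per group needed for 80% power at the same effect size
n_required = analysis.solve_power(effect_size=1.0, power=0.80, alpha=0.05,
                                  ratio=1.0, alternative='two-sided')
print(f"n per group for 80% power: {n_required:.1f}")
```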


2017, Vol 28 (11), pp. 1547-1562
Author(s):  
Samantha F. Anderson ◽  
Ken Kelley ◽  
Scott E. Maxwell

The sample size necessary to obtain a desired level of statistical power depends in part on the population value of the effect size, which is, by definition, unknown. A common approach to sample-size planning uses the sample effect size from a prior study as an estimate of the population value of the effect to be detected in the future study. Although this strategy is intuitively appealing, effect-size estimates, taken at face value, are typically not accurate estimates of the population effect size because of publication bias and uncertainty. We show that the use of this approach often results in underpowered studies, sometimes to an alarming degree. We present an alternative approach that adjusts sample effect sizes for bias and uncertainty, and we demonstrate its effectiveness for several experimental designs. Furthermore, we discuss an open-source R package, BUCSS, and user-friendly Web applications that we have made available to researchers so that they can easily implement our suggested methods.
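The core problem the authors describe can be demonstrated with a small Monte Carlo sketch: plan a study to have 80% power for the effect size observed in a small, "significant" prior study, then check the power actually achieved against the true effect. This is not the BUCSS adjustment itself, and the assumed true effect size (d = 0.35) and prior-study size (25 per group) are illustrative.

```python
# Monte Carlo sketch of naive effect-size plug-in planning (NOT the BUCSS method):
# keep only "significant" prior results as a crude stand-in for publication bias,
# plan the follow-up study for 80% power at the observed effect, and record the
# power actually achieved for the true effect.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(7)
analysis = TTestIndPower()
true_d, n_prior, n_sim = 0.35, 25, 2000          # assumed true effect and prior-study size
achieved = []

for _ in range(n_sim):
    x = rng.normal(true_d, 1.0, n_prior)                  # treated group of the prior study
    y = rng.normal(0.0, 1.0, n_prior)                     # control group of the prior study
    sp = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)     # pooled SD
    d_obs = (x.mean() - y.mean()) / sp
    _, p = ttest_ind(x, y)
    if p >= 0.05 or d_obs <= 0:
        continue                                          # non-significant prior results go "unpublished"
    n_planned = analysis.solve_power(effect_size=d_obs, power=0.80, alpha=0.05)
    achieved.append(analysis.power(effect_size=true_d, nobs1=n_planned, alpha=0.05))

print(f"Median power actually achieved: {np.median(achieved):.2f}")   # well below the nominal 80%
```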

