Randomized trials in oncology stopped early for benefit: A systematic review

2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 6513-6513
Author(s):  
R. A. Wilcox ◽  
G. H. Guyatt ◽  
V. M. Montori

6513 Background: Investigators finding a large treatment effect in an interim analysis may terminate a randomized trial (RCT) earlier than planned. A systematic review (Montori et al., JAMA 2005;294:2203–2209) found that RCTs stopped early for benefit are poorly reported and may overestimate the true treatment effect. The extent to which RCTs in oncology stopped early for benefit share these concerns remains unclear. Methods: We selected the oncology RCTs reported in the original systematic review and reviewed their study characteristics, features related to the decision to monitor and stop the study early (sample size, interim analyses, monitoring and stopping rules), the number of events, and the estimated treatment effects. Results: We found 29 RCTs in malignant hematology (n=6) and oncology (n=23); 52% were published in 2000–2004 and 41% in 3 high-impact medical journals (New England Journal of Medicine, Lancet, JAMA). The majority (79%) of trials reported a planned sample size and, on average, recruited 67% of the planned sample size (SD 31%). RCTs reported (1) the planned sample size (n=20), (2) the interim analysis at which the study was terminated (n=16), and (3) whether the decision to stop the study prematurely was informed by a stopping rule (n=16); only 13 reported all three. There was a highly significant correlation between the number of events and the treatment effect (r=0.68, p=0.0007). The odds of finding a large treatment effect (relative risk < median of 0.54, IQR 0.3–0.7) when studies stopped after few events (number of events < median of 54, IQR 22–125) were 6.2 times greater than when studies stopped later. Conclusions: RCTs in oncology stopped early for benefit tend to report large treatment effects that may overestimate the true treatment effect, particularly when the number of events driving study termination is small. In addition, information pertinent to the decision to stop early was inconsistently reported. Clinicians and policymakers should interpret such studies with caution, especially when information about the decision to stop early is not provided and few events occurred. No significant financial relationships to disclose.
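
The selection mechanism behind this overestimation is easy to demonstrate. The following minimal simulation sketch (Python; every parameter is an illustrative assumption, not a value from the review) monitors a two-arm trial at a few interim looks and stops whenever a naive efficacy boundary is crossed; trials that stop early report, on average, a relative risk more extreme than the truth:

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_RR = 0.8                  # true relative risk of the event (illustrative)
P_CONTROL = 0.30               # control-arm event probability (illustrative)
LOOKS = [50, 100, 200, 400]    # patients per arm at each interim look
Z_STOP = 2.5                   # naive efficacy boundary (illustrative)

def run_trial():
    n_max = LOOKS[-1]
    ctrl = rng.random(n_max) < P_CONTROL
    trt = rng.random(n_max) < P_CONTROL * TRUE_RR
    for n in LOOKS:
        p_c, p_t = ctrl[:n].mean(), trt[:n].mean()
        se = np.sqrt(p_c * (1 - p_c) / n + p_t * (1 - p_t) / n)
        z = (p_c - p_t) / se if se > 0 else 0.0
        if z > Z_STOP:                        # "stopped early for benefit"
            return n, p_t / p_c               # look size, estimated RR
    return n_max, trt.mean() / ctrl.mean()    # ran to completion

results = [run_trial() for _ in range(20_000)]
early = [rr for n, rr in results if n < LOOKS[-1]]
print(f"true RR: {TRUE_RR}, trials stopped early: {len(early)}")
print(f"mean estimated RR among early-stopped trials: {np.mean(early):.2f}")
```

The early-stopped subset reports a mean relative risk well below 0.8 even though every simulated trial has the same true effect, mirroring the review's finding that small numbers of events at termination go with exaggerated estimates.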

Biometrika ◽  
2020 ◽  
Author(s):  
Oliver Dukes ◽  
Stijn Vansteelandt

Summary Eliminating the effect of confounding in observational studies typically involves fitting a model for an outcome adjusted for covariates. When, as often, these covariates are high-dimensional, this necessitates the use of sparse estimators, such as the lasso, or other regularization approaches. Naïve use of such estimators yields confidence intervals for the conditional treatment effect parameter that are not uniformly valid. Moreover, as the number of covariates grows with the sample size, correctly specifying a model for the outcome is nontrivial. In this article we deal with both of these concerns simultaneously, obtaining confidence intervals for conditional treatment effects that are uniformly valid, regardless of whether the outcome model is correct. This is done by incorporating an additional model for the treatment selection mechanism. When both models are correctly specified, we can weaken the standard conditions on model sparsity. Our procedure extends to multivariate treatment effect parameters and complex longitudinal settings.
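
For intuition, here is a minimal cross-fitted sketch in the spirit of combining an outcome model and a treatment-selection model with the lasso (a generic double-machine-learning partialling-out estimator, not the authors' bias-reduced procedure, which additionally retains validity when the outcome model is misspecified; all data and tuning choices are illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

# Partially linear model: Y = tau*A + g(X) + noise, with high-dimensional X.
rng = np.random.default_rng(1)
n, p, tau = 500, 200, 1.0
X = rng.normal(size=(n, p))
A = X[:, 0] + rng.normal(size=n)        # treatment depends on covariates
Y = tau * A + 2 * X[:, 0] + rng.normal(size=n)

# Cross-fitted lasso residualization of both the outcome and the treatment.
res_y, res_a = np.empty(n), np.empty(n)
for train, test in KFold(5, shuffle=True, random_state=1).split(X):
    res_y[test] = Y[test] - LassoCV(cv=3).fit(X[train], Y[train]).predict(X[test])
    res_a[test] = A[test] - LassoCV(cv=3).fit(X[train], A[train]).predict(X[test])

# Final-stage estimate of the treatment effect and a sandwich-style CI.
tau_hat = (res_a @ res_y) / (res_a @ res_a)
psi = res_a * (res_y - tau_hat * res_a)       # influence-function contributions
se = np.sqrt(np.sum(psi ** 2)) / np.sum(res_a ** 2)
print(f"tau_hat = {tau_hat:.3f}, 95% CI = [{tau_hat - 1.96*se:.3f}, {tau_hat + 1.96*se:.3f}]")
```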


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Brent Strong ◽  
John A Oostema ◽  
Nadia Nikroo ◽  
Murtaza Hussain ◽  
Mathew J Reeves

Introduction: A priori sample size determination is an essential step in designing randomized controlled trials (RCTs). Failure to reach the pre-planned sample size increases the risk of both false-negative and spuriously positive findings. We undertook a systematic review of contemporary acute stroke trials to document the prevalence of, and reasons for, termination of trials prior to completion of enrollment. Methods: We searched MEDLINE for RCTs of acute stroke therapy published between 2013 and 2018 in 9 major journals. Manuscripts describing the final primary results of phase 3 and large phase 2 trials of any therapeutic intervention were eligible for inclusion. Study characteristics, including the presence of a data monitoring committee (DMC) and stopping rules, risk-of-bias assessment, funding sources, and conflicts of interest, were abstracted from published manuscripts and trial protocols by two independent reviewers. The prevalence of and reasons for early termination were quantified. Multivariable logistic regression was used to identify study-level predictors of early termination. Results: Of 756 hits, 60 were eligible for inclusion, 21 (35%) of which were terminated early. Among the trials stopped early, 10 (48%) reported stopping for benefit or newly available evidence while 11 (52%) were terminated for futility; 20 (95%) reported a DMC and 17 (81%) reported the use of a pre-specified statistical stopping rule. Factors associated with early termination included study location in North America, larger planned sample size, and industry funding (Table). Study location in North America and larger planned sample size retained statistical significance in a multivariable model. Conclusions: One in three contemporary stroke trials was terminated prior to completion of enrollment. Reasons for termination were evenly split between benefit and futility. Further study is needed to understand the reasons for and the impact of early termination on study results.
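
A sketch of the kind of study-level analysis described (Python with statsmodels; the data below are simulated placeholders, not the review's 60 trials):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical study-level data mimicking the review's candidate predictors.
rng = np.random.default_rng(2)
n = 60
df = pd.DataFrame({
    "north_america": rng.integers(0, 2, n),
    "log_planned_n": np.log(rng.integers(100, 3000, n)),
    "industry_funded": rng.integers(0, 2, n),
})
# Simulated truth: location and planned size predict early termination.
logit = 0.8 * df.north_america + 0.6 * (df.log_planned_n - 6) - 1.2
df["terminated_early"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(df[["north_america", "log_planned_n", "industry_funded"]])
fit = sm.Logit(df["terminated_early"], X).fit(disp=0)
print(np.exp(fit.params))      # adjusted odds ratios
print(np.exp(fit.conf_int()))  # 95% CIs on the odds-ratio scale
```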


Trials ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Julia M. Edwards ◽  
Stephen J. Walters ◽  
Cornelia Kunz ◽  
Steven A. Julious

Abstract Introduction Sample size calculations require assumptions regarding treatment response and variability. Incorrect assumptions can result in under- or overpowered trials, posing ethical concerns. Sample size re-estimation (SSR) methods investigate the validity of these assumptions and increase the sample size if necessary. The "promising zone" concept (Mehta and Pocock, Stat Med 30:3267–3284, 2011) is appealing to researchers for its design simplicity. However, it is still relatively new in application and has been a source of controversy. Objectives This research aims to synthesise current approaches to, and the practical implementation of, the promising zone design. Methods This systematic review comprehensively identifies reports of methodological research and of clinical trials using the promising zone. Databases were searched according to a pre-specified search strategy, and pearl-growing techniques were implemented. Results The combined search methods identified 270 unique records; 171 were included in the review, of which 30 were trials. The interim analysis took place at a median of 60% of the original target sample size (IQR 41–73%). Of the 15 completed trials, 7 increased their sample size. Only 21 studies reported the maximum sample size that would be considered, for which the median increase was 50% (IQR 35–100%). Conclusions The promising zone design is being implemented in a range of trials worldwide, albeit in low numbers. Identifying trials using the promising zone was difficult due to the lack of reporting of SSR methodology. Even when SSR methodology was reported, some reports had key interim analysis details missing, and only eight papers provided promising zone ranges.
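
For readers unfamiliar with the design, here is a minimal sketch of a Mehta–Pocock-style promising-zone rule based on conditional power under the current trend (the zone boundaries, the single interim look, and the z-statistic approximation are all illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def conditional_power(z1, n1, N, z_alpha=1.96):
    """Conditional power at final size N, assuming the current trend continues."""
    theta = z1 / np.sqrt(n1)                 # estimated drift per sqrt(patient)
    mean = z1 * np.sqrt(n1 / N) + theta * (N - n1) / np.sqrt(N)
    sd = np.sqrt((N - n1) / N)
    return 1 - norm.cdf((z_alpha - mean) / sd)

def promising_zone_n(z1, n1, N_planned, N_max, cp_low=0.36, target=0.80):
    """Return the (possibly increased) sample size under a promising-zone rule."""
    cp = conditional_power(z1, n1, N_planned)
    if not (cp_low <= cp < target):          # outside the promising zone: no change
        return N_planned, cp
    for N in range(N_planned, N_max + 1):    # smallest N restoring target power
        if conditional_power(z1, n1, N) >= target:
            return N, cp
    return N_max, cp                         # cap binds: take the maximum allowed

# Example: interim z = 1.4 at 150 of 300 planned patients, cap at 600.
print(promising_zone_n(z1=1.4, n1=150, N_planned=300, N_max=600))
```

The interim result falls in the promising zone (conditional power of roughly 0.5, between 0.36 and 0.80), so the sketch re-estimates the sample size upward within the pre-specified cap.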


2020 ◽  
Vol 18 (6) ◽  
pp. 3045-3089
Author(s):  
Eva Vivalt

Abstract Impact evaluations can help to inform policy decisions, but they are rooted in particular contexts and to what extent they generalize is an open question. I exploit a new data set of impact evaluation results and find a large amount of effect heterogeneity. Effect sizes vary systematically with study characteristics, with government-implemented programs having smaller effect sizes than academic or non-governmental organization-implemented programs, even controlling for sample size. I show that treatment effect heterogeneity can be appreciably reduced by taking study characteristics into account.
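
The kind of analysis described, regressing effect sizes on study characteristics, can be sketched as an inverse-variance-weighted meta-regression (Python; the data are simulated placeholders, not the paper's data set):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical impact-evaluation results: effect size, SE, and characteristics.
rng = np.random.default_rng(3)
k = 120
govt = rng.integers(0, 2, k)                  # 1 = government-implemented
log_n = np.log(rng.integers(100, 10_000, k))  # log study sample size
se = 3.0 / np.sqrt(np.exp(log_n))             # sampling SE shrinks with size
effect = 0.25 - 0.10 * govt + rng.normal(0, 0.05, k) + rng.normal(0, se)

# Inverse-variance-weighted meta-regression on study characteristics.
X = sm.add_constant(np.column_stack([govt, log_n]))
fit = sm.WLS(effect, X, weights=1 / se**2).fit()
print(fit.params)   # [intercept, government gap, sample-size gradient]
print(fit.bse)
```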


2012 ◽  
Vol 46 (1) ◽  
pp. 12-14
Author(s):  
Zahra Sohani

ABSTRACT The goal of a systematic review is to present a balanced summary of existing research. In order to accomplish this, systematic reviews include a thorough search for relevant articles, both published and unpublished, using explicitly defined and reproducible criteria. The main rationale for conducting a meta-analysis comes from the fact that combining individual studies provides an increased sample size, which consequently improves the statistical power to detect a treatment effect. If all the steps outlined are followed properly and authors remain transparent regarding the design and conduct of the meta-analysis, this technique provides an excellent and scientifically sound means of synthesizing evidence. How to cite this article Sohani Z. Meta-analysis: Statistical Trickery or Sound Science? J Postgrad Med Edu Res 2012;46(1):12-14.
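
The power gain from pooling can be made concrete with the classic fixed-effect inverse-variance method (a minimal sketch with made-up study results; note that the pooled standard error is necessarily smaller than any single study's):

```python
import numpy as np

# Illustrative per-study treatment effects (e.g., mean differences) and SEs.
effects = np.array([0.30, 0.15, 0.42, 0.20, 0.26])
ses     = np.array([0.20, 0.18, 0.25, 0.15, 0.22])

w = 1 / ses**2                          # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))      # smaller than every single-study SE
z = pooled / pooled_se

print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f}), z = {z:.2f}")
```

No individual study here is statistically significant on its own, yet the pooled estimate is, which is exactly the power argument the abstract makes (a random-effects model would be the usual choice when between-study heterogeneity is suspected).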


2009 ◽  
Vol 26 (3) ◽  
pp. 931-951 ◽  
Author(s):  
Yanqin Fan ◽  
Sang Soo Park

In this paper, we propose nonparametric estimators of sharp bounds on the distribution of treatment effects of a binary treatment and establish their asymptotic distributions. We note the possible failure of the standard bootstrap with the same sample size and apply the fewer-than-n bootstrap to making inferences on these bounds. The finite sample performances of the confidence intervals for the bounds based on normal critical values, the standard bootstrap, and the fewer-than-n bootstrap are investigated via a simulation study. Finally, we establish sharp bounds on the treatment effect distribution when covariates are available.
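
A minimal sketch of the resampling scheme itself, the m-out-of-n (fewer-than-n) bootstrap: draw m < n observations with replacement and form percentile intervals. The choice m = sqrt(n) below is a common illustrative default; the centering and scaling required for valid inference on the bounds themselves are developed in the paper:

```python
import numpy as np

def m_out_of_n_ci(data, stat, m, B=2000, alpha=0.05, seed=0):
    """Percentile CI from the fewer-than-n (m-out-of-n) bootstrap."""
    rng = np.random.default_rng(seed)
    reps = np.array([stat(rng.choice(data, size=m, replace=True))
                     for _ in range(B)])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

# Example with a non-smooth statistic, where the standard (m = n) bootstrap
# is known to fail: the sample maximum.
x = np.random.default_rng(4).uniform(0, 1, size=500)
print(m_out_of_n_ci(x, np.max, m=int(len(x) ** 0.5)))  # m = sqrt(n)
```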


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e13564-e13564
Author(s):  
Brian Hobbs ◽  
Thanh Ton ◽  
Xiao Li ◽  
David S. Hong ◽  
Rebecca A. Hubbard ◽  
...  

e13564 Background: Traditional randomized phase 2 (rPh2) trials have limitations that may yield suboptimal phase 3 go (Ph3-GO) decisions. Compared to an rPh2 of equivalent sample size, a single-arm trial augmented with a real-world external control (SAT+rwEC) allows more patients to receive experimental therapies while preserving the ability to compare experimental and control groups. Bias arising from measurement error and confounding in the rwEC, however, poses challenges to statistical inference. Preliminary studies suggest that higher response rates are observed in rwECs than in randomized controls. We compared Ph3-GO decisions between SAT+rwEC and rPh2 designs. Methods: Ph3-GO probability was compared using simulation studies that resembled the oncology setting with an objective response rate (ORR) endpoint. Simulated rPh2 parameters were: sample size (60-120) with 1:1 randomization, ORR in the rPh2 control arm (15%-50%), and true treatment effect (ΔORR: 0-50). For each rPh2 of a given sample size, we evaluated an SAT+rwEC that re-allocated all rPh2 control patients to the experimental arm (i.e., doubling the sample size of the experimental arm) and added an rwEC. SAT+rwEC designs were simulated with assumptions for size (rwEC-to-SAT ratio: 0.5 to 2) and net bias (-10 to +10), which was simulated as a composite representing ORR measurement error plus residual confounding after multivariable adjustment. A positive net bias corresponds to a higher ORR in the rwEC. Ph3-GO thresholds varied from 10-30%. A Ph3-GO decision was considered a "False-GO" when the true treatment effect < threshold, and a "True-GO" when the true treatment effect ≥ threshold. Results: With a positive net bias of +10, SAT+rwEC had lower False-GO and True-GO probabilities compared to rPh2. With a negative net bias of -10, both False-GO and True-GO probabilities were higher for the SAT+rwEC. When net bias = 0, the increased size of SAT+rwEC resulted in observable Ph3-GO improvements, with lower False-GO and higher True-GO probabilities than the corresponding rPh2. Conclusions: An interactive dashboard was developed for users. The magnitude and direction of net bias relative to the decision threshold affect the performance of SAT+rwEC. The relative sample size of the rwEC to the rPh2 may also impact performance. The dashboard can provide quantitative guidance for Ph3-GO if net bias can be estimated from independent studies. Further work to quantify net bias and refine Ph3-GO criteria can help reduce the currently high False-GO rates while increasing opportunities for patients to receive experimental therapies through the SAT+rwEC design. Ph3-GO probability for rPh2 vs. SAT+rwEC with threshold = 15%, baseline ORR = 20% (select scenarios). [Table: see text]
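
A stripped-down version of this kind of simulation (Python; the GO rule used here, a point-estimate threshold on the observed ORR difference, is an illustrative assumption since the abstract does not specify the exact decision criterion):

```python
import numpy as np

rng = np.random.default_rng(5)

def go_prob(orr_ctrl, delta, n_per_arm, design, net_bias=0.0,
            threshold=0.15, sims=20_000, ec_ratio=1.0):
    """P(observed ORR difference >= threshold) under each design."""
    orr_exp = orr_ctrl + delta
    if design == "rPh2":
        n_e, n_c, orr_c = n_per_arm, n_per_arm, orr_ctrl
    else:  # SAT+rwEC: controls re-allocated to the experimental arm, rwEC added
        n_e = 2 * n_per_arm
        n_c = int(ec_ratio * n_e)
        orr_c = orr_ctrl + net_bias       # net bias shifts the external-control ORR
    diff = (rng.binomial(n_e, orr_exp, sims) / n_e
            - rng.binomial(n_c, orr_c, sims) / n_c)
    return np.mean(diff >= threshold)

# False-GO scenario: true effect 0.10 is below the 0.15 threshold.
print("rPh2 False-GO     :", go_prob(0.20, 0.10, 60, "rPh2"))
print("SAT+rwEC False-GO :", go_prob(0.20, 0.10, 60, "SAT+rwEC", net_bias=0.10))
```

Consistent with the abstract, a positive net bias (higher ORR in the external control) pushes the observed difference down and lowers the GO probability in this toy setup; flipping the sign of the bias has the opposite effect.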


Author(s):  
David Aaby ◽  
Juned Siddique

Abstract Background Lifestyle intervention studies often use self-reported measures of diet as an outcome variable to measure changes in dietary intake. The presence of measurement error in self-reported diet, due to participants' failure to accurately report their diet, is well known. Less familiar to researchers is differential measurement error, where the nature of the measurement error differs by treatment group and/or time. Differential measurement error is often present in intervention studies and can result in biased estimates of the treatment effect and reduced power to detect treatment effects. Investigators need to be aware of the impact of differential measurement error when designing intervention studies that use self-reported measures. Methods We use simulation to assess the consequences of differential measurement error on the ability to estimate treatment effects in a two-arm randomized trial with two time points. We simulate data under a variety of scenarios, focusing on how different factors affect power to detect a treatment effect, bias of the treatment effect, and coverage of the 95% confidence interval of the treatment effect. Simulations use realistic scenarios based on data from the Trials of Hypertension Prevention Study. Simulated sample sizes ranged from 110 to 380 per group. Results Realistic differential measurement error of the kind seen in lifestyle intervention studies can require an increased sample size to achieve 80% power to detect a treatment effect and may result in a biased estimate of the treatment effect. Conclusions Investigators designing intervention studies that use self-reported measures should take differential measurement error into account by increasing their sample size, incorporating an internal validation study, and/or identifying statistical methods to correct for differential measurement error.
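
The power loss is easy to reproduce. The sketch below (Python; all effect and bias values are illustrative, not taken from the study) adds a reporting bias to one arm only, which attenuates the observed effect and drags power down:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def power_with_error(n_per_arm=200, true_effect=-0.5,
                     bias_trt=0.0, bias_ctrl=0.0, sims=2000):
    """Power of a two-sample t-test when reporting bias differs by arm."""
    hits = 0
    for _ in range(sims):
        ctrl = rng.normal(0.0, 1.0, n_per_arm) + bias_ctrl
        trt = rng.normal(true_effect, 1.0, n_per_arm) + bias_trt
        hits += stats.ttest_ind(trt, ctrl).pvalue < 0.05
    return hits / sims

# Treatment arm over-reports by +0.3, shrinking the apparent reduction
# from -0.5 to -0.2 (the estimate is biased by bias_trt - bias_ctrl):
print("no differential error :", power_with_error())
print("differential error    :", power_with_error(bias_trt=0.3))
```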

