A Menu-driven Facility for Complex Sample Size Calculation in Randomized Controlled Trials with a Survival or a Binary Outcome

Author(s):  
Patrick Royston ◽  
Abdel Babiker

We present a menu-driven Stata program for the calculation of sample size or power for complex clinical trials with a survival time or a binary outcome. The features supported include up to six treatment arms, an arbitrary time-to-event distribution, fixed or time-varying hazard ratios, unequal patient allocation, loss to follow-up, staggered patient entry, and crossover of patients from their allocated treatment to an alternative treatment. The computations of sample size and power are based on the logrank test and are done according to the asymptotic distribution of the logrank test statistic, adjusted appropriately for the design features.
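
As a rough illustration of the kind of calculation involved, the sketch below applies Schoenfeld's approximation for the number of events required by a two-arm logrank test under proportional hazards. It is a textbook formula, not the algorithm implemented in the Stata program, and it ignores the design features listed above (loss to follow-up, staggered entry, crossover, more than two arms). The function names and default values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.90, alloc_treat=0.5):
    """Approximate events needed for a two-sided logrank test (Schoenfeld).

    alloc_treat is the proportion of patients allocated to the treatment arm.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    p1, p2 = alloc_treat, 1 - alloc_treat
    events = (z_alpha + z_beta) ** 2 / (p1 * p2 * math.log(hazard_ratio) ** 2)
    return math.ceil(events)


def required_patients(events, prob_event):
    """Convert events to patients, given the overall probability of an event."""
    return math.ceil(events / prob_event)


if __name__ == "__main__":
    d = required_events(0.75)
    print(d, required_patients(d, prob_event=0.6))
```

Unequal allocation enters only through the product of the two allocation proportions, so moving away from 1:1 always increases the number of events required for the same hazard ratio.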

Author(s):  
John W Welsh ◽  
Behnood Bikdeli ◽  
Yasir Akram ◽  
Ike Lee ◽  
Nihar R Desai ◽  
...  

Introduction: Randomized controlled trials (RCTs) designed to demonstrate non-inferiority of an intervention compared with control have become increasingly common in cardiovascular medicine. Such RCTs may be biased toward null findings through low enrollment, post-randomization exclusions, loss to follow-up, or wide inferiority margins. We characterized the features of non-inferiority cardiovascular RCTs published in high-impact journals that could lead to bias. Methods: We searched PubMed for non-inferiority cardiovascular RCTs published between January 1, 1990 and August 11, 2016 in The New England Journal of Medicine, Lancet, and JAMA. We reviewed methodological characteristics, including sample size, power estimates, selected non-inferiority margin, and success of studies in achieving non-inferiority. Results: Of 3,689 screened studies, we identified 104 non-inferiority RCTs. Publication increased over time (P<0.001), with more than 50% (n=53) published since 2010. Of 101 trials with eligible data, 80 (77%) trials claimed non-inferiority (19 of which also demonstrated superiority), whereas 21 (20%) did not (including 7 which showed worse outcomes with the tested intervention, and 14 that had inconclusive results, Figure). Only 1 study had >10% of participants lost to follow-up. Of 75 studies with available data, 14 reported >10% post-randomization exclusions. Of 89 studies with available information, 10 analyzed a cohort >20% smaller than their calculated sample size. Only 55 studies (53%) reported all the randomized patients in the primary endpoint analyses. Only 52 trials (50%) reported analyses from both the intention-to-treat and per-protocol cohorts, of which 2 found a discrepancy in analyses. Treatment adherence was reported in 18 trials (34%). Pre-specified non-inferiority margins ranged widely, with absolute differences of 0.4-14%, hazard ratios of 1.05-2.85, odds ratios of 1.1-2.0, and relative risks of 1.1-2.0. Only 9 studies (8.7%) used a placebo or no-intervention arm. Conclusion: Non-inferiority designed RCTs in cardiovascular medicine are increasingly published in high-impact journals, commonly conclude non-inferiority of the new intervention, and frequently have design features that might bias the studies toward non-inferiority.


2020 ◽  
Author(s):  
Miles D. Witham ◽  
James Wason ◽  
Richard M Dodds ◽  
Avan A Sayer

Introduction: Frailty is the loss of ability to withstand a physiological stressor, and is associated with multiple adverse outcomes in older people. Trials to prevent or ameliorate frailty are in their infancy. A range of different outcome measures have been proposed, but current measures require either large sample sizes, long follow-up, or do not directly measure the construct of frailty. Methods: We propose a composite outcome for frailty prevention trials, comprising progression to the frail state, death, or being too unwell to continue in a trial. To determine likely event rates, we used data from the English Longitudinal Study of Ageing, collected 4 years apart. We calculated transition rates between non-frail, prefrail, frail or loss to follow-up due to death or illness. We used Markov state transition models to interpolate one- and two-year transition rates, and performed sample size calculations for a range of differences in transition rates using simple and composite outcomes. Results: The frailty category was calculable for 4650 individuals at baseline (2226 non-frail, 1907 prefrail, 517 frail); at follow-up, 1282 were non-frail, 1108 were prefrail, 318 were frail and 1936 had dropped out or were unable to complete all tests for frailty. Transition probabilities for those prefrail at baseline, measured at wave 4, were respectively 0.176, 0.286, 0.096 and 0.442 to non-frail, prefrail, frail and dead/dropped out. Interpolated transition probabilities were 0.159, 0.494, 0.113 and 0.234 at two years, and 0.108, 0.688, 0.087 and 0.117 at one year. Required sample sizes for a two-year outcome were between 1000 and 7200 for transition from prefrailty to frailty alone, 250 to 1600 for transition to the composite measure, and 75 to 350 using the composite measure with an ordinal logistic regression approach. Conclusion: Use of a composite outcome for frailty trials offers reduced sample sizes and could ameliorate the effect of high loss to follow-up inherent in such trials due to death and illness.


Author(s):  
Patrick Royston

Most randomized controlled trials with a time-to-event outcome are designed and analyzed assuming proportional hazards of the treatment effect. The sample-size calculation is based on a log-rank test or the equivalent Cox test. Nonproportional hazards are seen increasingly in trials and are recognized as a potential threat to the power of the log-rank test. To address the issue, Royston and Parmar (2016, BMC Medical Research Methodology 16: 16) devised a new “combined test” of the global null hypothesis of identical survival curves in each trial arm. The test, which combines the conventional Cox test with a new formulation, is based on the maximal standardized difference in restricted mean survival time (RMST) between the arms. The test statistic is based on evaluations of RMST over several preselected time points. The combined test involves the minimum p-value across the Cox and RMST-based tests, appropriately standardized to have the correct null distribution. In this article, I outline the combined test and introduce a command, stctest, that implements the combined test. I point the way to additional tools currently under development for power and sample-size calculation for the combined test.
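
To make the RMST quantity concrete, the following sketch computes the restricted mean survival time for one arm as the area under a Kaplan-Meier curve up to a chosen time `tau`. It is a definitional illustration only: it does not implement the combined test or the `stctest` command, and it does not standardize the difference or scan several time points as the test does. The function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)  # final segment up to tau
    return area


def rmst_difference(arm_a, arm_b, tau):
    """Unstandardized RMST difference; each arm is a (times, events) pair."""
    return rmst(*arm_a, tau) - rmst(*arm_b, tau)
```

With no censoring, the RMST at `tau` equals the mean of each follow-up time truncated at `tau`, which gives a quick sanity check on the integration.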


2016 ◽  
Vol 27 (7) ◽  
pp. 2132-2141 ◽  
Author(s):  
Guogen Shan

In an agreement test between two raters with binary endpoints, existing methods for sample size calculation are always based on asymptotic approaches that use limiting distributions of a test statistic under null and alternative hypotheses. These calculated sample sizes may not be reliable because of the unsatisfactory type I error control of asymptotic approaches. We propose a new sample size calculation based on exact approaches, which control the type I error rate. Two exact approaches are considered: one based on maximization and the other based on estimation and maximization. We found that the latter approach is generally more powerful than the one based on maximization. Therefore, we present the sample size calculation based on estimation and maximization. A real example from a clinical trial to diagnose low back pain in patients is used to illustrate the two exact testing procedures and sample size determination.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jaclyn M. Beca ◽  
Kelvin K. W. Chan ◽  
David M. J. Naimark ◽  
Petros Pechlivanoglou

Introduction: Extrapolation of time-to-event data from clinical trials is commonly used in decision models for health technology assessment (HTA). The objective of this study was to assess performance of standard parametric survival analysis techniques for extrapolation of time-to-event data for a single event from clinical trials with limited data due to small samples or short follow-up. Methods: Simulated populations with 50,000 individuals were generated with an exponential hazard rate for the event of interest. A scenario consisted of 5000 repetitions with six sample size groups (30–500 patients) artificially censored after every 10% of events observed. Goodness-of-fit statistics (AIC, BIC) were used to determine the best-fitting among standard parametric distributions (exponential, Weibull, log-normal, log-logistic, generalized gamma, Gompertz). Median survival, one-year survival probability, time horizon (1% survival time, or 99th percentile of survival distribution) and restricted mean survival time (RMST) were compared to population values to assess coverage and error (e.g., mean absolute percentage error). Results: The true exponential distribution was correctly identified using goodness-of-fit according to BIC more frequently compared to AIC (average 92% vs 68%). Under-coverage and large errors were observed for all outcomes when distributions were specified by AIC and for time horizon and RMST with BIC. Errors in point estimates were found to be strongly associated with sample size and completeness of follow-up. Small samples produced larger average error, even with complete follow-up, than large samples with short follow-up. Correctly specifying the event distribution reduced magnitude of error in larger samples but not in smaller samples. Conclusions: Limited clinical data from small samples, or short follow-up of large samples, produce large error in estimates relevant to HTA regardless of whether the correct distribution is specified. The associated uncertainty in estimated parameters may not capture the true population values. Decision models that base lifetime time horizon on the model’s extrapolated output are not likely to reliably estimate mean survival or its uncertainty. For data with an exponential event distribution, BIC more reliably identified the true distribution than AIC. These findings have important implications for health decision modelling and HTA of novel therapies seeking approval with limited evidence.
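
The two goodness-of-fit statistics compared above can be written down in a few lines. The sketch below fits an exponential model to right-censored data by maximum likelihood and computes AIC and BIC from the standard definitions; it is not the study's simulation code. The function names are hypothetical, and the choice of what counts as the number of observations in BIC (patients here, although some analyses use events) is an assumption.

```python
import math


def exponential_loglik(times, events):
    """Maximized log-likelihood and rate for an exponential model.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    d = sum(events)
    if d == 0:
        raise ValueError("at least one observed event is required")
    total_time = sum(times)
    rate = d / total_time  # maximum likelihood estimate of the hazard
    loglik = d * math.log(rate) - rate * total_time
    return loglik, rate


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * loglik


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    times = [2.0, 4.0, 6.0, 3.0, 5.0, 1.0, 7.0, 8.0]
    events = [1, 1, 0, 1, 0, 1, 1, 0]
    ll, rate = exponential_loglik(times, events)
    print(rate, aic(ll, 1), bic(ll, 1, len(times)))
```

Because the BIC penalty per parameter is ln(n) rather than 2, BIC penalizes additional parameters more heavily than AIC once n is 8 or more, which is consistent with BIC selecting the one-parameter exponential model more often than AIC.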


Author(s):  
Friederike M.-S. Barthel ◽  
Patrick Royston ◽  
Abdel Babiker

Royston and Babiker (2002) presented a menu-driven Stata program for the calculation of sample size or power for complex clinical trial designs under a survival time or binary outcome. In the present article, the package is updated to Stata 8 under the new name ART. Furthermore, the program has been extended to incorporate noninferiority designs and provides more detailed output. This package is the only realistic sample size tool for survival studies available in Stata.


2019 ◽  
Vol 101-B (11) ◽  
pp. 1408-1415 ◽  
Author(s):  
Peter D. Hull ◽  
Daud T. S. Chou ◽  
Sophie Lewis ◽  
Andrew D. Carrothers ◽  
Joseph M. Queally ◽  
...  

Aims: The aim of this study was to assess the feasibility of conducting a full-scale, appropriately powered, randomized controlled trial (RCT) comparing internal fracture fixation and distal femoral replacement (DFR) for distal femoral fractures in older patients. Patients and Methods: Seven centres recruited patients into the study. Patients were eligible if they were greater than 65 years of age with a distal femoral fracture, and if the surgeon felt that they were suitable for either form of treatment. Outcome measures included the patients’ willingness to participate, clinicians’ willingness to recruit, rates of loss to follow-up, the ability to capture data, estimates of standard deviation to inform the sample size calculation, and the main determinants of cost. The primary clinical outcome measure was the EuroQol five-dimensional index (EQ-5D) at six months following injury. Results: Of 36 patients who met the inclusion criteria, five declined to participate and eight were not recruited, leaving 23 patients to be randomized. One patient withdrew before surgery. Of the remaining patients, five (23%) withdrew during the follow-up period and six (26%) died. A 100% response rate was achieved for the EQ-5D at each follow-up point, excluding one missing datapoint at baseline. In the DFR group, the mean cost of the implant outweighed the mean cost of many other items, including theatre time, length of stay, and readmissions. For a powered RCT, a total sample size of 1400 would be required with 234 centres recruiting over three years. At six months, the EQ-5D utility index was lower in the DFR group. Conclusion: This study found that running a full-scale trial in this country would not be feasible. However, it may be feasible to undertake an international multicentre trial, and our findings provide some guidance about the power of such a study, the numbers required, and some challenges that should be anticipated and addressed.
Cite this article: Bone Joint J 2019;101-B:1408–1415.

