Using the Geometric Average Hazard Ratio in Sample Size Calculation for Time-to-event Data With Composite Endpoints

Author(s): Jordi Cortés Martínez, Ronald B Geskus, KyungMann Kim, Guadalupe Gómez Melis

Abstract Background: Sample size calculation is a key point in the design of a randomized controlled trial. With time-to-event outcomes, it is often based on the logrank test. We provide a sample size calculation method for a composite endpoint (CE) based on the geometric average hazard ratio (gAHR) for the case in which the proportional hazards assumption holds for the components but not for the CE. Methods: The formulae for the required number of events, the sample size and the power are based on the non-centrality parameter of the logrank test under the alternative hypothesis, which is a function of the gAHR. We use the web platform CompARE for the sample size computations. A simulation study evaluates the empirical power of the logrank test for the CE when the sample size is computed in terms of the gAHR. We consider different values of the component hazard ratios, of the probabilities of observing the events in the control group and of the degree of association between the components. We illustrate the sample size computations with two published randomized controlled trials, whose primary CEs are, respectively, progression-free survival (time to disease progression or death) and the composite of bacteriologically confirmed treatment failure or Staphylococcus aureus-related death by 12 weeks. Results: For a target power of 0.80, the simulation study gave mean (± SE) empirical powers of 0.799 (±0.004) and 0.798 (±0.004) in the exponential and non-exponential settings, respectively. The target power was attained in more than 95% of the simulated scenarios, and the empirical power was always above 0.78, regardless of whether the proportional hazards assumption held. Conclusions: The geometric average hazard ratio, as an effect measure for a composite endpoint, has a meaningful interpretation in the case of non-proportional hazards. Furthermore, it is the natural effect measure when the logrank test is used to compare the hazard rates of two groups, and it should be used instead of the standard hazard ratio.
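The sample size logic described above can be sketched as follows. This is a minimal illustration, not the CompARE implementation, and it rests on stated assumptions: the gAHR is taken as the exponential of a weighted mean of log HR(t) over a follow-up window [0, tau], with a user-supplied weight density (the abstract does not restate the exact weighting), and the gAHR is plugged into the familiar Schoenfeld events formula d = (z_{1-α/2} + z_{1-β})² / (p(1-p)(log gAHR)²), where p is the allocation fraction. The function names and the pooled event probability passed to `sample_size` are hypothetical inputs.

```python
import math
from statistics import NormalDist


def trapezoid(ys, xs):
    """Trapezoidal-rule integral of ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


def geometric_average_hr(hr, weight_density, tau, n_grid=2000):
    """exp of the weighted mean of log HR(t) over [0, tau].

    hr: hazard ratio function HR(t) (experimental / control), assumed > 0.
    weight_density: weighting function over time (an assumption of this sketch).
    """
    xs = [tau * i / n_grid for i in range(n_grid + 1)]
    w = [weight_density(t) for t in xs]
    weighted_logs = [math.log(hr(t)) * wi for t, wi in zip(xs, w)]
    return math.exp(trapezoid(weighted_logs, xs) / trapezoid(w, xs))


def required_events(effect, alpha=0.05, power=0.80, alloc=0.5):
    """Schoenfeld-type number of events for a two-sided logrank test.

    effect: hazard-ratio-type effect size (here the gAHR).
    alloc: fraction of patients allocated to the experimental arm.
    """
    z = NormalDist().inv_cdf
    numerator = (z(1 - alpha / 2) + z(power)) ** 2
    return math.ceil(numerator / (alloc * (1 - alloc) * math.log(effect) ** 2))


def sample_size(effect, prob_event, **kwargs):
    """Total patients: required events divided by the pooled probability of observing the CE."""
    return math.ceil(required_events(effect, **kwargs) / prob_event)
```

For a hazard ratio that is constant over time, the gAHR equals that ratio, so the formula reduces to the usual proportional hazards calculation; with a constant ratio of 0.7, 1:1 allocation, two-sided α = 0.05 and 80% power, `required_events` gives 247 events. A non-constant HR(t), such as `lambda t: 1.0 if t < 1 else 0.5`, yields a single summary value that can be passed to the same function.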

2018, Vol 57 (03), pp. 089-100
Author(s): Werner Brannath, Matthias Brückner, Meinhard Kieser, Geraldine Rauch

Summary Background: In many clinical trial applications, the endpoint of interest is a time-to-event endpoint. In this case, group differences are usually expressed by the hazard ratio and assessed by the logrank test, which is optimal under the proportional hazards assumption. However, this assumption is violated in many situations. Especially in applications where a full population and several subgroups, or a composite time-to-first-event endpoint and several components, are considered, the proportional hazards assumption usually does not hold simultaneously for all test problems under investigation. As an alternative effect measure, Kalbfleisch and Prentice proposed the so-called ‘average hazard ratio’. The average hazard ratio is based on a flexible weighting function that modifies the influence of time, and it has a meaningful interpretation even in the case of non-proportional hazards. Despite this favorable property, it is hardly ever used in practice, whereas the standard hazard ratio is commonly reported in clinical trials regardless of whether the proportional hazards assumption holds. Objectives: There are two main approaches to constructing estimators and tests for the average hazard ratio: the first relies on weighted Cox regression, the second on a simple plug-in estimator. The aim of this work is to compare these two approaches and the standard logrank test systematically for different time-to-event settings with proportional and non-proportional hazards, and to illustrate their pros and cons in application. Methods: We conduct a systematic comparative study based on Monte Carlo simulations and a real clinical trial example. Results: Our results suggest that the properties of the average hazard ratio depend on the underlying weighting function. The two approaches to constructing estimators and related tests show very similar performance for adequately chosen weights. In general, the average hazard ratio defines a more valid effect measure than the standard hazard ratio under non-proportional hazards, and the corresponding tests provide a power advantage over the common logrank test. Conclusions: As non-proportional hazards are often met in clinical practice and the average hazard ratio tests often outperform the common logrank test, this approach should be used more routinely in applications.
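The average hazard ratio can be made concrete with a short numerical sketch. It evaluates one common way of writing the Kalbfleisch and Prentice measure, AHR = ∫ w(t) h1(t)/(h0(t)+h1(t)) dt / ∫ w(t) h0(t)/(h0(t)+h1(t)) dt, for known hazard functions on [0, tau]. It is a population-level calculation, not either of the estimators compared in the paper; the weight function w(t) and the function name are assumptions for illustration. Under proportional hazards, h1 = c·h0, the ratio reduces to c for any weight.

```python
def average_hazard_ratio(h0, h1, weight, tau, n_grid=2000):
    """Average hazard ratio in the Kalbfleisch and Prentice form.

    AHR = int_0^tau w(t) h1/(h0+h1) dt / int_0^tau w(t) h0/(h0+h1) dt,
    approximated with the trapezoidal rule on an equally spaced grid.
    h0, h1: hazard functions of the control and experimental groups.
    weight: non-negative weighting function w(t) (user-supplied assumption).
    """
    num = den = 0.0
    for i in range(n_grid + 1):
        t = tau * i / n_grid
        end = 0.5 if i in (0, n_grid) else 1.0  # trapezoidal end-point factor
        total = h0(t) + h1(t)
        num += end * weight(t) * h1(t) / total
        den += end * weight(t) * h0(t) / total
    return num / den  # the grid spacing cancels in the ratio
```

With h0(t) = 1, h1(t) = 1 before t = 1 and 0.25 afterwards, and a uniform weight on [0, 2], this gives 0.7/1.3 ≈ 0.54, whereas the geometric mean of the hazard ratio under the same weight is 0.5. The two summaries answer different questions, which is one reason the choice of weighting function matters.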


2021, pp. 096228022098857
Author(s): Yongqiang Tang

Log-rank tests have been widely used to compare two survival curves in biomedical research. We describe a unified approach to power and sample size calculation for the unweighted and weighted log-rank tests in superiority, noninferiority and equivalence trials. It is suitable for both time-driven and event-driven trials. We suggest a numerical algorithm that allows flexible specification of the patient accrual distribution, baseline hazards, and proportional or nonproportional hazards patterns, and that enables efficient sample size calculation when there is a range of choices for the patient accrual pattern and trial duration. A confidence interval method is proposed for the trial duration of an event-driven trial. We point out potential issues with several popular sample size formulae. Under proportional hazards, the power of a survival trial is commonly believed to be determined by the number of observed events. This belief is roughly valid for noninferiority and equivalence trials with similar survival and censoring distributions between the two groups, and for superiority trials with balanced group sizes. In unbalanced superiority trials, the power also depends on other factors, such as data maturity. Surprisingly, in simulations under proportional hazards, the log-rank test usually yields slightly higher power than the Wald test from the Cox model. We consider various nonproportional hazards patterns induced by delayed effects, cure fractions and/or treatment switching. Explicit power formulae are derived for the combination test that takes the maximum of two or more weighted log-rank tests to handle uncertain nonproportional hazards patterns. Numerical examples are presented for illustration.
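The unweighted and weighted log-rank statistics referred to above can be sketched as below. This is a minimal illustration, not the paper's numerical algorithm: it computes the standardized statistic Z = Σ w(t)(O1 - E1) / sqrt(Σ w(t)² V(t)) for group 1 versus group 0 at each distinct event time, with V(t) the hypergeometric variance. The Fleming-Harrington G(rho, gamma) weights are our choice for illustration; the abstract does not say which weight family is used.

```python
import math


def weighted_logrank_z(times, events, groups, rho=0.0, gamma=0.0):
    """Standardized weighted log-rank statistic, group 1 versus group 0.

    times:  follow-up times
    events: 1 if the event was observed, 0 if censored
    groups: 0 or 1
    Weights are Fleming-Harrington G(rho, gamma): w(t) = S(t-)**rho * (1 - S(t-))**gamma,
    with S the pooled Kaplan-Meier estimate just before t.
    rho = gamma = 0 gives the ordinary (unweighted) log-rank test.
    Quadratic-time loops: fine for illustration, not for large data sets.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    surv = 1.0   # pooled Kaplan-Meier estimate just before the current event time
    score = 0.0  # sum of w * (observed - expected) events in group 1
    var = 0.0    # sum of w**2 * hypergeometric variance
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                      # at risk, pooled
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)          # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e)                # events at t, pooled
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)    # events at t, group 1
        w = surv ** rho * (1.0 - surv) ** gamma
        score += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (1.0 - n1 / n) * (n - d) / (n - 1)
        surv *= 1.0 - d / n
    return score / math.sqrt(var)
```

A positive Z means more events than expected in group 1. A max-combination test of the kind mentioned above would take the largest |Z| over several (rho, gamma) choices, but its critical value requires the joint normal distribution of the component statistics, which this sketch does not compute.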


Author(s): Patrick Royston, Abdel Babiker

We present a menu-driven Stata program for the calculation of sample size or power for complex clinical trials with a survival time or a binary outcome. The features supported include up to six treatment arms, an arbitrary time-to-event distribution, fixed or time-varying hazard ratios, unequal patient allocation, loss to follow-up, staggered patient entry, and crossover of patients from their allocated treatment to an alternative treatment. The computations of sample size and power are based on the logrank test and are done according to the asymptotic distribution of the logrank test statistic, adjusted appropriately for the design features.
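To show how design features such as unequal allocation, staggered entry and loss to follow-up enter a logrank power calculation, here is a deliberately simpler sketch than the program described above: two arms, exponential survival, a constant hazard ratio and no crossover. It is not the program's algorithm. It assumes uniform patient entry over an accrual period followed by a fixed additional follow-up, treats loss to follow-up as an exponential competing risk, and applies the Schoenfeld approximation to the expected number of events. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def prob_event(hazard, accrual, follow_up, dropout=0.0):
    """Probability that a patient's event is observed before the analysis.

    Assumes exponential event times (rate `hazard`), uniform entry over
    [0, accrual], analysis at accrual + follow_up, and exponential loss to
    follow-up with rate `dropout` acting as a competing risk.
    """
    rate = hazard + dropout
    # E[exp(-rate * C)] for potential follow-up C ~ Uniform(follow_up, accrual + follow_up)
    mean_surv = (math.exp(-rate * follow_up) - math.exp(-rate * (accrual + follow_up))) / (rate * accrual)
    return hazard / rate * (1.0 - mean_surv)


def logrank_power(n_total, hr, control_hazard, accrual, follow_up,
                  dropout=0.0, alloc=0.5, alpha=0.05):
    """Approximate two-sided logrank power (Schoenfeld) with unequal allocation.

    alloc is the fraction of patients randomized to the experimental arm.
    """
    p_control = prob_event(control_hazard, accrual, follow_up, dropout)
    p_treat = prob_event(control_hazard * hr, accrual, follow_up, dropout)
    expected_events = n_total * ((1 - alloc) * p_control + alloc * p_treat)
    std_normal = NormalDist()
    drift = abs(math.log(hr)) * math.sqrt(expected_events * alloc * (1 - alloc))
    return std_normal.cdf(drift - std_normal.inv_cdf(1 - alpha / 2))
```

Because power here depends on the data only through the expected number of events and the allocation fraction, dropout and shorter follow-up lower power by lowering `prob_event`. Time-varying hazard ratios, more than two arms and crossover, which the program supports, need a more detailed period-by-period calculation than this.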


Author(s): Patrick Royston

The changes made to Royston (2018) and to power_ct are i) in section 2.4 (Sample-size calculation for the combined test), to replace ordinary least-squares (OLS) regression using regress with grouped probit regression using glm; ii) in section 4 (Examples), to revisit the worked examples of sample-size estimation in light of the revised estimation procedure; and iii) to update the help-file entry for the option n(numlist). The updated software is version 1.2.0.
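The idea behind change i) can be sketched as follows, under assumptions of our own. Simulated rejection counts at several candidate sample sizes are modeled by a grouped (binomial) probit regression, and the fitted curve is inverted to find the sample size for a target power. We use sqrt(n) as the covariate because, under the normal approximation, power ≈ Φ(δ√n - z_{1-α/2}), so the probit of power is linear in √n; whether power_ct uses this covariate is not stated here. The fit uses Fisher scoring (iteratively reweighted least squares), a standard way to fit such a model. The helper names are hypothetical, and this is not the power_ct code.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fit_grouped_probit(xs, successes, trials, max_iter=100, tol=1e-12):
    """Fit P(success | x) = Phi(b0 + b1 * x) to grouped binomial data.

    Uses Fisher scoring (iteratively reweighted least squares).
    successes[i] out of trials[i] at covariate value xs[i]. Returns (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Weighted least-squares sums for regressing the working response z on x.
        sw = swx = swxx = swz = swxz = 0.0
        for x, k, m in zip(xs, successes, trials):
            eta = b0 + b1 * x
            mu = min(max(_STD_NORMAL.cdf(eta), 1e-12), 1.0 - 1e-12)
            dens = max(_STD_NORMAL.pdf(eta), 1e-300)  # d mu / d eta for the probit link
            w = m * dens * dens / (mu * (1.0 - mu))     # working weight
            z = eta + (k / m - mu) / dens               # working response
            sw += w
            swx += w * x
            swxx += w * x * x
            swz += w * z
            swxz += w * x * z
        new_b1 = (sw * swxz - swx * swz) / (sw * swxx - swx * swx)
        new_b0 = (swz - new_b1 * swx) / sw
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            break
    return b0, b1


def target_n(b0, b1, target_power=0.80):
    """Invert probit(power) = b0 + b1 * sqrt(n) for n."""
    return ((_STD_NORMAL.inv_cdf(target_power) - b0) / b1) ** 2
```

In use, one would pass `xs = [math.sqrt(n) for n in candidate_ns]` with the number of rejections and replicates at each n, then call `target_n`. Compared with OLS on raw empirical powers, the binomial model respects the 0 to 1 range of power and weights each point by its binomial precision.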

