Using the Geometric Average Hazard Ratio in Sample Size Calculation for Time-to-event Data With Composite Endpoints

Abstract Background: Sample size calculation is a key point in the design of a randomized controlled trial. With time-to-event outcomes, it is often based on the logrank test. We provide a sample size calculation method for a composite endpoint (CE) based on the geometric average hazard ratio (gAHR) for the case in which the proportional hazards assumption can be assumed to hold for the components, but not for the CE. Methods: The required number of events, sample size and power formulae are based on the non-centrality parameter of the logrank test under the alternative hypothesis, which is a function of the gAHR. We use the web platform CompARE for the sample size computations. A simulation study evaluates the empirical power of the logrank test for the CE based on the sample size in terms of the gAHR. We consider different values of the component hazard ratios, the probabilities of observing the events in the control group and the degrees of association between the components. We illustrate the sample size computations using two published randomized controlled trials. Their primary CEs are, respectively, progression-free survival (time to progression of disease or death) and the composite of bacteriologically confirmed treatment failure or Staphylococcus aureus related death by 12 weeks. Results: For a target power of 0.80, the simulation study provided mean (± SE) empirical powers equal to 0.799 (±0.004) and 0.798 (±0.004) in the exponential and non-exponential settings, respectively. The power was attained in more than 95% of the simulated scenarios and was always above 0.78, regardless of compliance with the proportional-hazard assumption. Conclusions: The geometric average hazard ratio as an effect measure for a composite endpoint has a meaningful interpretation in the case of non-proportional hazards. Furthermore, it is the natural effect measure when using the logrank test to compare the hazard rates of two groups and should be used instead of the standard hazard ratio.
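As a rough illustration of the kind of calculation this abstract describes, the sketch below uses the classical Schoenfeld approximation for the logrank test and plugs a gAHR value in where the hazard ratio would normally go. The function names, the example gAHR of 0.75 and the event probabilities are assumptions made for illustration only; they are not taken from the paper or from CompARE, whose exact formulae may differ.

```python
import math
from statistics import NormalDist


def required_events(effect, alpha=0.05, power=0.80, alloc=0.5):
    """Approximate number of events for a two-sided logrank test.

    `effect` is the assumed hazard ratio (here, hypothetically, a gAHR);
    `alloc` is the proportion of patients allocated to the treatment arm.
    Uses the Schoenfeld approximation, not the paper's own derivation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * math.log(effect) ** 2)


def required_sample_size(events, p_event_control, p_event_treated, alloc=0.5):
    """Convert events into patients via the pooled probability of observing the CE."""
    p_event = (1 - alloc) * p_event_control + alloc * p_event_treated
    return math.ceil(events / p_event)


# Illustrative inputs only: gAHR = 0.75, event probabilities 0.30 and 0.24.
events = required_events(0.75)
n_total = required_sample_size(events, 0.30, 0.24)
print(math.ceil(events), n_total)
```

The conversion from events to patients assumes the probabilities of observing the composite event in each arm are known in advance, which in practice depends on the component event rates and their association.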


The Average Hazard Ratio – A Good Effect Measure for Time-to-event Endpoints when the Proportional Hazard Assumption is Violated?

Methods of Information in Medicine ◽

10.3414/me17-01-0058 ◽

2018 ◽

Vol 57 (03) ◽

pp. 089-100 ◽

Cited By ~ 1

Author(s):

Werner Brannath ◽

Matthias Brückner ◽

Meinhard Kieser ◽

Geraldine Rauch

Keyword(s):

Proportional Hazards ◽

Weighting Function ◽

Hazard Ratio ◽

Group Differences ◽

Proportional Hazard ◽

Time To Event ◽

Logrank Test ◽

Proportional Hazard Assumption ◽

Effect Measure ◽

Average Hazard Ratio

Summary Background: In many clinical trial applications, the endpoint of interest corresponds to a time-to-event endpoint. In this case, group differences are usually expressed by the hazard ratio. Group differences are commonly assessed by the logrank test, which is optimal under the proportional hazard assumption. However, there are many situations in which this assumption is violated. Especially in applications where a full population and several subgroups or a composite time-to-first-event endpoint and several components are considered, the proportional hazard assumption usually does not simultaneously hold true for all test problems under investigation. As an alternative effect measure, Kalbfleisch and Prentice proposed the so-called ‘average hazard ratio’. The average hazard ratio is based on a flexible weighting function to modify the influence of time and has a meaningful interpretation even in the case of non-proportional hazards. Despite this favorable property, it is hardly ever used in practice, whereas the standard hazard ratio is commonly reported in clinical trials regardless of whether the proportional hazard assumption holds true or not. Objectives: There exist two main approaches to construct corresponding estimators and tests for the average hazard ratio, where the first relies on weighted Cox regression and the second on a simple plug-in estimator. The aim of this work is to give a systematic comparison of these two approaches and the standard logrank test for different time-to-event settings with proportional and nonproportional hazards and to illustrate the pros and cons in application. Methods: We conduct a systematic comparative study based on Monte-Carlo simulations and a real clinical trial example. Results: Our results suggest that the properties of the average hazard ratio depend on the underlying weighting function. The two approaches to construct estimators and related tests show very similar performance for adequately chosen weights. In general, the average hazard ratio defines a more valid effect measure than the standard hazard ratio under non-proportional hazards and the corresponding tests provide a power advantage over the common logrank test. Conclusions: As non-proportional hazards are often met in clinical practice and the average hazard ratio tests often outperform the common logrank test, this approach should be used more routinely in applications.
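To make the idea of a weighted average hazard ratio concrete, here is a minimal numerical sketch. It assumes one common form of the definition, the ratio of the two hazards each integrated against a weight w(t), and it assumes the particular weight w(t) = S1(t)·S0(t). The function name, the choice of weight and the example distributions are illustrative assumptions, not the estimators compared in the paper.

```python
import math


def average_hazard_ratio(hazard_1, hazard_0, surv_1, surv_0, t_max, steps=20000):
    """Ratio of weighted integrated hazards on [0, t_max].

    Assumed weight: w(t) = S1(t) * S0(t). Integration uses the midpoint rule,
    which also avoids evaluating the hazards exactly at t = 0.
    """
    dt = t_max / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        w = surv_1(t) * surv_0(t)
        num += hazard_1(t) * w * dt
        den += hazard_0(t) * w * dt
    return num / den


# Proportional hazards: exponential arms with rates 0.5 and 1.0.
# The average hazard ratio then coincides with the ordinary hazard ratio.
ahr_ph = average_hazard_ratio(
    lambda t: 0.5, lambda t: 1.0,
    lambda t: math.exp(-0.5 * t), lambda t: math.exp(-t),
    t_max=10.0,
)

# Non-proportional hazards: a linearly increasing hazard versus a constant one.
ahr_nph = average_hazard_ratio(
    lambda t: t, lambda t: 1.0,
    lambda t: math.exp(-0.5 * t * t), lambda t: math.exp(-t),
    t_max=10.0,
)
print(ahr_ph, ahr_nph)
```

Under proportional hazards the weight cancels and the result equals the constant hazard ratio; under non-proportional hazards the value depends on the weight chosen, which is consistent with the abstract's remark that the properties of the measure depend on the weighting function.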


Sample size calculation for small sample single-arm trials for time-to-event data: Logrank test with normal approximation or test statistic based on exact chi-square distribution?

Contemporary Clinical Trials Communications ◽

10.1016/j.conctc.2019.100360 ◽

2019 ◽

Vol 15 ◽

pp. 100360 ◽

Cited By ~ 4

Author(s):

Milind A. Phadnis

Keyword(s):

Sample Size ◽

Normal Approximation ◽

Sample Size Calculation ◽

Small Sample ◽

Event Data ◽

Time To Event ◽

Test Statistic ◽

Chi Square ◽

Logrank Test ◽

Time To Event Data


A unified approach to power and sample size determination for log-rank tests under proportional and nonproportional hazards

Statistical Methods in Medical Research ◽

10.1177/0962280220988570 ◽

2021 ◽

pp. 096228022098857

Author(s):

Yongqiang Tang

Keyword(s):

Sample Size ◽

Proportional Hazards ◽

Sample Size Calculation ◽

Unified Approach ◽

Rank Tests ◽

Trial Duration ◽

Nonproportional Hazards ◽

Equivalence Trials ◽

Superiority Trials ◽

Log Rank Tests

Log-rank tests have been widely used to compare two survival curves in biomedical research. We describe a unified approach to power and sample size calculation for the unweighted and weighted log-rank tests in superiority, noninferiority and equivalence trials. It is suitable for both time-driven and event-driven trials. A numerical algorithm is suggested. It allows flexible specification of the patient accrual distribution, baseline hazards, and proportional or nonproportional hazards patterns, and enables efficient sample size calculation when there are a range of choices for the patient accrual pattern and trial duration. A confidence interval method is proposed for the trial duration of an event-driven trial. We point out potential issues with several popular sample size formulae. Under proportional hazards, the power of a survival trial is commonly believed to be determined by the number of observed events. The belief is roughly valid for noninferiority and equivalence trials with similar survival and censoring distributions between two groups, and for superiority trials with balanced group sizes. In unbalanced superiority trials, the power depends also on other factors such as data maturity. Surprisingly, the log-rank test usually yields slightly higher power than the Wald test from the Cox model under proportional hazards in simulations. We consider various nonproportional hazards patterns induced by delayed effects, cure fractions, and/or treatment switching. Explicit power formulae are derived for the combination test that takes the maximum of two or more weighted log-rank tests to handle uncertain nonproportional hazards patterns. Numerical examples are presented for illustration.
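The commonly held belief mentioned in this abstract, that power under proportional hazards is determined by the number of observed events, can be sketched with the standard events-only normal approximation. This is not the unified method of the paper; the function name and example inputs are assumptions for illustration, and the abstract itself cautions that the approximation is only roughly valid, particularly for unbalanced superiority trials.

```python
import math
from statistics import NormalDist


def logrank_power_from_events(events, hazard_ratio, alpha=0.05, alloc=0.5):
    """Approximate two-sided logrank power given only the number of events.

    Events-only (Schoenfeld-type) approximation under proportional hazards;
    `alloc` is the proportion of patients in the treatment arm. It ignores
    factors such as data maturity that can matter in unbalanced trials.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Approximate mean of the standardized test statistic under the alternative.
    drift = math.sqrt(events * alloc * (1 - alloc)) * abs(math.log(hazard_ratio))
    return NormalDist().cdf(drift - z_alpha)


# Illustrative inputs only: 380 events, hazard ratio 0.75.
balanced = logrank_power_from_events(380, 0.75)
unbalanced = logrank_power_from_events(380, 0.75, alloc=0.3)
print(round(balanced, 3), round(unbalanced, 3))
```

In this approximation, allocation enters only through the factor alloc·(1 − alloc), so any dependence of power on censoring patterns or data maturity is invisible to it.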


Determination of hazard ratio for progression‐free survival considering the tumor assessment schedule in sample size calculation

Pharmaceutical Statistics ◽

10.1002/pst.1973 ◽

2020 ◽

Vol 19 (2) ◽

pp. 126-136

Author(s):

Takanori Tanase

Keyword(s):

Sample Size ◽

Sample Size Calculation ◽

Progression Free Survival ◽

Hazard Ratio ◽

Free Survival


A Menu-driven Facility for Complex Sample Size Calculation in Randomized Controlled Trials with a Survival or a Binary Outcome

The Stata Journal: Promoting communications on statistics and Stata ◽

10.1177/1536867x0200200204 ◽

2002 ◽

Vol 2 (2) ◽

pp. 151-163 ◽

Cited By ~ 18

Author(s):

Patrick Royston ◽

Abdel Babiker

Keyword(s):

Sample Size ◽

Sample Size Calculation ◽

Binary Outcome ◽

Test Statistic ◽

Logrank Test ◽

Loss To Follow Up ◽

Patient Allocation ◽

Hazard Ratios ◽

Event Distribution

We present a menu-driven Stata program for the calculation of sample size or power for complex clinical trials with a survival time or a binary outcome. The features supported include up to six treatment arms, an arbitrary time-to-event distribution, fixed or time-varying hazard ratios, unequal patient allocation, loss to follow-up, staggered patient entry, and crossover of patients from their allocated treatment to an alternative treatment. The computations of sample size and power are based on the logrank test and are done according to the asymptotic distribution of the logrank test statistic, adjusted appropriately for the design features.


Sample size calculation for logrank test and prediction of number of events over time

Pharmaceutical Statistics ◽

10.1002/pst.2069 ◽

2020 ◽

Author(s):

Kaifeng Lu

Keyword(s):

Sample Size ◽

Sample Size Calculation ◽

Logrank Test ◽

Over Time


Power and Sample-Size Analysis for the Royston–Parmar Combined Test in Clinical Trials with a Time-to-Event Outcome: Correction and Program Update

The Stata Journal: Promoting communications on statistics and Stata ◽

10.1177/1536867x1801800414 ◽

2018 ◽

Vol 18 (4) ◽

pp. 995-996 ◽

Cited By ~ 1

Author(s):

Patrick Royston

Keyword(s):

Sample Size ◽

Sample Size Calculation ◽

Estimation Procedure ◽

Worked Examples ◽

Ordinary Least Squares ◽

Size Analysis ◽

Size Estimation ◽

Probit Regression ◽

Time To Event ◽

Combined Test

The changes made to Royston (2018) and to power_ct are i) in section 2.4 (Sample-size calculation for the combined test), to replace ordinary least-squares (OLS) regression using regress with grouped probit regression using glm; ii) in section 4 (Examples), to revisit the worked examples of sample-size estimation in light of the revised estimation procedure; and iii) to update the help file entry for the option n(numlist). The updated software is version 1.2.0.
