Event-specific win ratios and testing with terminal and non-terminal events

2020 ◽  
pp. 174077452097240
Author(s):  
Song Yang ◽  
James Troendle

Background/aims In clinical trials, the primary outcome is often a composite endpoint defined as time to the first occurrence of either death or certain non-fatal events. Thus, a portion of available data would be omitted. In the win ratio approach, priorities are given to the clinically more important events, and more data are used. However, its power may be low if the treatment effect is predominantly on the non-terminal event. Methods We propose event-specific win ratios obtained separately on the terminal and non-terminal events. They can then be used to form global tests such as a linear combination test, the maximum test, or a [Formula: see text] test. Results In simulations, these tests often improve the power of the original win ratio test. Furthermore, when the terminal and non-terminal events experience differential treatment effects, the new tests are often more powerful than the log-rank test for the composite outcome. Whether the treatment effect is primarily on the terminal events or not, the new tests based on the event-specific win ratios can be useful when different types of events are present. The new tests can reject the null hypothesis of no difference in the event distributions in the two treatment arms with the terminal event showing detrimental effect and the non-terminal event showing beneficial effect. The maximum test and the [Formula: see text] test do not have test-estimation coherency, but the maximum test has the coherency that the global null is rejected if and only if the null for one of the event types is rejected. When applied to data from the trial Aldosterone Antagonist Therapy for Adults With Heart Failure and Preserved Systolic Function (TOPCAT), the new tests all reject the null hypothesis of no treatment effect while both the log-rank test used in TOPCAT and the original win ratio approach show non-significant p-values. 
Conclusion Whether the treatment effect is primarily on the terminal events or the non-terminal events, the maximum test based on the event-specific win ratios can be a useful alternative for testing treatment effect in clinical trials with time-to-event outcomes when different types of events are present.
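The prioritized pairwise comparison behind the original win ratio can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' event-specific estimator: the `Patient` record, the `compare` rule, and the unmatched all-pairs design are hypothetical choices made for the example, and no variance or test statistic is computed.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    t_death: float    # time of death, or of censoring if death was not observed
    died: bool        # True if death was observed
    t_event: float    # time of first non-fatal event, or of censoring for it
    had_event: bool   # True if the non-fatal event was observed


def compare(a: Patient, b: Patient) -> int:
    """Return 1 if a wins over b, -1 if a loses, 0 if the pair is tied."""
    # Priority 1: terminal event. A patient wins if the other is known to die first.
    if b.died and b.t_death < a.t_death:
        return 1
    if a.died and a.t_death < b.t_death:
        return -1
    # Priority 2: non-terminal event, consulted only when death does not decide.
    if b.had_event and b.t_event < a.t_event:
        return 1
    if a.had_event and a.t_event < b.t_event:
        return -1
    return 0


def win_ratio(treatment, control):
    """Compare every treatment patient with every control patient."""
    results = [compare(t, c) for t in treatment for c in control]
    wins = results.count(1)
    losses = results.count(-1)
    ratio = wins / losses if losses else float("inf")
    return wins, losses, ratio


# Made-up data for illustration only.
treatment = [Patient(10, False, 10, False), Patient(8, True, 4, True)]
control = [Patient(5, True, 5, False), Patient(9, False, 3, True)]
print(win_ratio(treatment, control))  # 3 wins, 1 loss
```

Because the non-terminal event is consulted only when death does not decide a pair, a treatment effect that acts mainly on the non-terminal event contributes fewer decisive comparisons, which is the loss of power the abstract describes.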

2020 ◽  
Vol 29 (12) ◽  
pp. 3525-3532
Author(s):  
Thomas J Prior

Clinical trials in oncology often involve the statistical analysis of time-to-event data such as progression-free survival or overall survival to determine the benefit of a treatment or therapy. The log-rank test is commonly used to compare time-to-event data from two groups. The log-rank test is especially powerful when the two groups have proportional hazards. However, survival curves encountered in oncology studies that differ from one another do not always differ by having proportional hazards; in such instances, the log-rank test loses power, and the survival curves are said to have “non-proportional hazards”. This non-proportional hazards situation occurs for immunotherapies in oncology; immunotherapies often have a delayed treatment effect when compared to chemotherapy or radiation therapy. To correctly identify and deliver efficacious treatments to patients, it is important in oncology studies to have available a statistical test that can detect the difference in survival curves even in a non-proportional hazards situation such as one caused by delayed treatment effect. An attempt to address this need was the “max-combo” test, which was originally described only for a single analysis timepoint; this article generalizes that test to preserve type I error when there are one or more interim analyses, enabling efficacious treatments to be identified and made available to patients more rapidly.
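As a rough illustration of the idea behind a max-combo statistic, the sketch below computes several Fleming-Harrington weighted log-rank Z statistics and takes the largest in absolute value. The four weight pairs are an assumption commonly seen in the literature rather than something stated in the abstract, the function names are hypothetical, and the sketch stops short of the p-value, which requires the joint distribution of the correlated statistics, and of any interim-analysis adjustment.

```python
import math


def fh_weighted_logrank(times, events, groups, rho=0.0, gamma=0.0):
    """Fleming-Harrington G(rho, gamma) weighted log-rank Z statistic.

    times: follow-up times; events: 1 = event, 0 = censored; groups: 1 or 0.
    rho = gamma = 0 gives the standard log-rank test.
    """
    data = sorted(zip(times, events, groups))
    n = len(data)                      # number at risk (both groups)
    n1 = sum(g for _, _, g in data)    # number at risk in group 1
    surv = 1.0                         # pooled Kaplan-Meier just before time t
    num = 0.0
    var = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = leaving = leaving1 = 0
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            leaving += 1
            leaving1 += g
            i += 1
        if d > 0:
            w = surv ** rho * (1.0 - surv) ** gamma
            num += w * (d1 - d * n1 / n)          # weighted observed - expected
            if n > 1:                             # hypergeometric variance term
                var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
            surv *= 1.0 - d / n
        n -= leaving
        n1 -= leaving1
    return num / math.sqrt(var)


def max_combo_statistic(times, events, groups,
                        weights=((0, 0), (0, 1), (1, 0), (1, 1))):
    """Return the largest |Z| over the weight pairs, plus each individual Z."""
    z = {w: fh_weighted_logrank(times, events, groups, *w) for w in weights}
    return max(abs(v) for v in z.values()), z


# Made-up data for illustration only.
z_max, z_all = max_combo_statistic([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(z_max, z_all)
```

Weights with gamma > 0 emphasize late differences, so including them is what lets the combination pick up a delayed treatment effect that the unweighted log-rank test can miss.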


Author(s):  
Patrick Royston

Randomized controlled trials with a time-to-event outcome are usually designed and analyzed assuming proportional hazards (PH) of the treatment effect. The sample-size calculation is based on a log-rank test or the nearly identical Cox test, henceforth called the Cox/log-rank test. Nonproportional hazards (non-PH) has become more common in trials and is recognized as a potential threat to interpreting the trial treatment effect and the power of the log-rank test—hence to the success of the trial. To address the issue, in 2016, Royston and Parmar ( BMC Medical Research Methodology 16: 16) proposed a “combined test” of the global null hypothesis of identical survival curves in each trial arm. The Cox/log-rank test is combined with a new test derived from the maximal standardized difference in restricted mean survival time (RMST) between the trial arms. The test statistic is based on evaluations of the between-arm difference in RMST over several preselected time points. The combined test involves the minimum p-value across the Cox/log-rank and RMST-based tests, appropriately standardized to have the correct distribution under the global null hypothesis. In this article, I introduce a new command, power_ct, that uses simulation to implement power and sample-size calculations for the combined test. power_ct supports designs with PH or non-PH of the treatment effect. I provide examples in which the power of the combined test is compared with that of the Cox/log-rank test under PH and non-PH scenarios. I conclude by offering guidance for sample-size calculations in time-to-event trials to allow for possible non-PH.
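The between-arm RMST difference that the combined test evaluates at several time points can be sketched as the area under each arm's Kaplan-Meier curve up to a truncation time. This is an illustrative Python sketch only: `km_rmst` and `rmst_differences` are hypothetical helper names unrelated to the Stata commands, and the standardization of the differences and the combination with the Cox/log-rank p-value are omitted.

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau].

    times: follow-up times; events: 1 = event, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0      # Kaplan-Meier estimate on the current interval
    area = 0.0
    prev_t = 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += surv * (t - prev_t)     # rectangle up to this time
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
        at_risk -= leaving
        prev_t = t
    area += surv * (tau - prev_t)       # last rectangle up to tau
    return area


def rmst_differences(arm1, arm0, taus):
    """RMST(arm1) - RMST(arm0) at each preselected truncation time."""
    return {tau: km_rmst(*arm1, tau) - km_rmst(*arm0, tau) for tau in taus}


# Made-up data for illustration only: (times, events) per arm.
experimental = ([2, 4, 6, 8], [1, 0, 1, 0])
control = ([1, 2, 3, 4], [1, 1, 1, 1])
print(rmst_differences(experimental, control, taus=[2, 3, 4]))
```

Evaluating the difference at several truncation times rather than one is what allows the test to respond to survival curves that separate late or cross.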


2019 ◽  
Vol 111 (11) ◽  
pp. 1186-1191 ◽  
Author(s):  
Julien Péron ◽  
Alexandre Lambert ◽  
Stephane Munier ◽  
Brice Ozenne ◽  
Joris Giai ◽  
...  

Abstract Background The treatment effect in survival analysis is commonly quantified as the hazard ratio, and tested statistically using the standard log-rank test. Modern anticancer immunotherapies are successful in a proportion of patients who remain alive even after a long-term follow-up. This new phenomenon induces a nonproportionality of the underlying hazards of death. Methods The properties of the net survival benefit were illustrated using the dataset from a trial evaluating ipilimumab in metastatic melanoma. The net survival benefit was then investigated through simulated datasets under typical scenarios of proportional hazards, delayed treatment effect, and cure rate. The net survival benefit test was computed according to the value of the minimal survival difference considered clinically relevant. As comparators, the standard and the weighted log-rank tests were also performed. Results In the illustrative dataset, the net survival benefit favored ipilimumab [Δ(0) = 15.8%, 95% confidence interval = 4.6% to 27.3%, P = .006]. This favorable effect was maintained when the analysis was focused on long-term survival differences [eg, >12 months, Δ(12) = 12.5%, 95% confidence interval = 4.4% to 20.6%, P = .002]. Under the scenarios of a delayed treatment effect and cure rate, the power of the net survival benefit test compared favorably to the standard log-rank test power and was comparable to the power of the weighted log-rank test for large values of the threshold of clinical relevance. Conclusion The net long-term survival benefit is a measure of treatment effect that is meaningful whether or not hazards are proportional. The associated statistical test is more powerful than the standard log-rank test when a delayed treatment effect is anticipated.
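The quantity Δ(m) can be read as the probability that a patient on the experimental arm survives at least m longer than a control patient, minus the probability of the reverse. The sketch below computes it by comparing all pairs, assuming fully observed (uncensored) survival times; the function name is hypothetical, and the handling of censored pairs and the confidence interval used in the article are not reproduced.

```python
def net_survival_benefit(experimental, control, m=0.0):
    """Pairwise estimate of Delta(m) for uncensored survival times.

    A pair counts as a win when the experimental patient outlives the control
    patient by more than m, and as a loss when the reverse holds.
    """
    wins = sum(1 for x in experimental for y in control if x - y > m)
    losses = sum(1 for x in experimental for y in control if y - x > m)
    return (wins - losses) / (len(experimental) * len(control))


# Made-up survival times in months, for illustration only.
experimental = [10, 20, 30]
control = [5, 15, 25]
print(net_survival_benefit(experimental, control, m=0))
print(net_survival_benefit(experimental, control, m=12))
```

Raising the threshold m discards pairs whose survival times differ by less than the clinically relevant margin, which focuses the measure on long-term differences.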


Author(s):  
Patrick Royston

Most randomized controlled trials with a time-to-event outcome are designed and analyzed assuming proportional hazards of the treatment effect. The sample-size calculation is based on a log-rank test or the equivalent Cox test. Nonproportional hazards are seen increasingly in trials and are recognized as a potential threat to the power of the log-rank test. To address the issue, Royston and Parmar (2016, BMC Medical Research Methodology 16: 16) devised a new “combined test” of the global null hypothesis of identical survival curves in each trial arm. The test, which combines the conventional Cox test with a new formulation, is based on the maximal standardized difference in restricted mean survival time (RMST) between the arms. The test statistic is based on evaluations of RMST over several preselected time points. The combined test involves the minimum p-value across the Cox and RMST-based tests, appropriately standardized to have the correct null distribution. In this article, I outline the combined test and introduce a command, stctest, that implements the combined test. I point the way to additional tools currently under development for power and sample-size calculation for the combined test.


2018 ◽  
Vol 15 (3) ◽  
pp. 305-312 ◽  
Author(s):  
Song Yang ◽  
Walter T Ambrosius ◽  
Lawrence J Fine ◽  
Adam P Bress ◽  
William C Cushman ◽  
...  

Background/aims In clinical trials with time-to-event outcomes, usually the significance tests and confidence intervals are based on a proportional hazards model. Thus, the temporal pattern of the treatment effect is not directly considered. This could be problematic if the proportional hazards assumption is violated, as such violation could impact both interim and final estimates of the treatment effect. Methods We describe the application of inference procedures developed recently in the literature for time-to-event outcomes when the treatment effect may or may not be time-dependent. The inference procedures are based on a new model which contains the proportional hazards model as a sub-model. The temporal pattern of the treatment effect can then be expressed and displayed. The average hazard ratio is used as the summary measure of the treatment effect. The test of the null hypothesis uses adaptive weights that often lead to improvement in power over the log-rank test. Results Without needing to assume proportional hazards, the new approach yields results consistent with previously published findings in the Systolic Blood Pressure Intervention Trial. It provides a visual display of the time course of the treatment effect. At four of the five scheduled interim looks, the new approach yields smaller p values than the log-rank test. The average hazard ratio and its confidence interval indicate a treatment effect nearly a year earlier than a restricted mean survival time–based approach. Conclusion When the hazards are proportional between the comparison groups, the new methods yield results very close to the traditional approaches. When the proportional hazards assumption is violated, the new methods continue to be applicable and can potentially be more sensitive to departure from the null hypothesis.

