Comparison of four sequential methods allowing for early stopping of comparative clinical trials

2000 ◽  
Vol 98 (5) ◽  
pp. 569-578 ◽  
Author(s):  
Véronique SEBILLE ◽  
Eric BELLISSANT

Phase III trials aim to assess whether a new treatment is more effective than a standard treatment. Sequential methods, such as the sequential probability ratio test (SPRT), the triangular test (TT) and so-called one-parameter boundaries (OPB), now allow early stopping of such trials, both in the case of efficacy (alternative hypothesis; H1) and in the case of lack of efficacy (null hypothesis; H0). We compared the statistical properties of the SPRT, the TT, and OPB with Pocock (OPBΔ = 0.5) and O'Brien and Fleming (OPBΔ = 0) type boundaries in the setting of one-sided comparative trials with a normal response. Using simulations, we studied the type I error (α), power (1-β), average sample number (ASN) and 90th percentile (P90) of the number of patients required to reach a conclusion. The four tests were also compared with the corresponding single-stage design (SSD). All sequential tests display α and 1-β close to nominal values and, as compared with the SSD, allow substantial decreases in ASN: for example, -48%, -42%, -40% and -31% under H0 and H1 for the SPRT, TT, OPBΔ = 0.5 and OPBΔ = 0 respectively. For situations between H0 and H1, the ASNs of all sequential tests were still smaller than the sample size required by the SSD, with the TT displaying the largest decrease (-25%). The P90s of the TT and OPBΔ = 0 under H0 and H1 were smaller than those of the SPRT and OPBΔ = 0.5, which were similar to the sample size required by the SSD. Although all sequential tests display broadly similar features, the TT is the most appealing with regard to decreases in sample size, especially for situations between H0 and H1.
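As a rough illustration of how the SPRT reaches such early decisions, the following sketch simulates Wald's SPRT for a one-sided normal comparison and estimates its empirical power and ASN. The parameter values (effect size 0.5, α = 0.05, β = 0.10) are illustrative, not those used in the paper:

```python
import math
import random

def sprt_normal(xs, theta0=0.0, theta1=0.5, sigma=1.0, alpha=0.05, beta=0.10):
    """Wald's SPRT for H0: mu = theta0 vs H1: mu = theta1 (known sigma).
    Returns (decision, number of observations used)."""
    upper = math.log((1 - beta) / alpha)   # cross -> accept H1
    lower = math.log(beta / (1 - alpha))   # cross -> accept H0
    llr, n = 0.0, 0
    for x in xs:
        n += 1
        # per-observation log-likelihood-ratio increment for normal data
        llr += (theta1 - theta0) * (x - (theta0 + theta1) / 2) / sigma**2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n

random.seed(1)
# 2000 simulated trials under H1 (true mean 0.5): estimate power and ASN
results = [sprt_normal(random.gauss(0.5, 1.0) for _ in range(10_000))
           for _ in range(2000)]
power = sum(d == "H1" for d, _ in results) / len(results)
asn = sum(n for _, n in results) / len(results)
print(f"empirical power {power:.3f} (nominal 0.90), ASN {asn:.1f}")
```

The ASN here is far below the fixed sample size a single-stage design would need for the same error rates, which is the effect quantified in the abstract.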

2016 ◽  
Vol 14 (1) ◽  
pp. 48-58 ◽  
Author(s):  
Qiang Zhang ◽  
Boris Freidlin ◽  
Edward L Korn ◽  
Susan Halabi ◽  
Sumithra Mandrekar ◽  
...  

Background: Futility (inefficacy) interim monitoring is an important component in the conduct of phase III clinical trials, especially in life-threatening diseases. Desirable futility monitoring guidelines allow timely stopping if the new therapy is harmful or if it is unlikely to be shown sufficiently effective were the trial to continue to its final analysis. There are a number of analytical approaches that are used to construct futility monitoring boundaries. The most common approaches are based on conditional power, sequential testing of the alternative hypothesis, or sequential confidence intervals. The resulting futility boundaries vary considerably with respect to the level of evidence required for recommending stopping the study. Purpose: We evaluate the performance of commonly used methods using event histories from completed phase III clinical trials of the Radiation Therapy Oncology Group, Cancer and Leukemia Group B, and North Central Cancer Treatment Group. Methods: We considered published superiority phase III trials with survival endpoints initiated after 1990. There are 52 studies available for this analysis from different disease sites. Total sample size and maximum number of events (statistical information) for each study were calculated using the protocol-specified effect size and type I and type II error rates. In addition to the common futility approaches, we considered a recently proposed linear inefficacy boundary approach with an early harm look followed by several lack-of-efficacy analyses. For each futility approach, interim test statistics were generated for three schedules with different analysis frequencies, and early stopping was recommended if the interim result crossed a futility stopping boundary. For trials not demonstrating superiority, the impact of each rule is summarized as savings on sample size, study duration, and information time scales.
Results: For negative studies, our results show that the futility approaches based on testing the alternative hypothesis and repeated confidence interval rules yielded smaller savings (compared to the other two rules). These boundaries are too conservative, especially during the first half of the study (<50% of information). The conditional power rules are too aggressive during the second half of the study (>50% of information) and may stop a trial even when there is a clinically meaningful treatment effect. The linear inefficacy boundary with three or more interim analyses provided the best results. For positive studies, we demonstrated that none of the futility rules would have stopped the trials. Conclusion: The linear inefficacy boundary futility approach is attractive from statistical, clinical, and logistical standpoints in clinical trials evaluating new anti-cancer agents.
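The conditional power rules evaluated above rest on a standard B-value calculation: given the interim Z-statistic at information fraction t and an assumed drift, one computes the probability of a significant final result. A minimal sketch with illustrative numbers (observed Z = 0.3 at 50% information; the drift 3.24 corresponds to 90% design power at one-sided α = 0.025) is:

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z_t, t, theta, alpha=0.025):
    """Conditional power of a one-sided level-alpha test given interim
    Z-statistic z_t at information fraction t, assuming drift theta
    (theta = expected final Z under the assumed effect)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha)
    b_t = z_t * sqrt(t)               # B-value at information fraction t
    mean_rest = theta * (1 - t)       # expected increment over the remainder
    return 1 - nd.cdf((z_crit - b_t - mean_rest) / sqrt(1 - t))

# Futility check at 50% information under the design effect
cp = conditional_power(z_t=0.3, t=0.5, theta=3.24)
print(f"conditional power {cp:.2f}")
```

A futility rule would recommend stopping when this value falls below some threshold (e.g. 0.10); the abstract's finding is that such rules can be too aggressive late in a trial.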


2016 ◽  
Vol 27 (4) ◽  
pp. 1115-1127 ◽  
Author(s):  
Stavros Nikolakopoulos ◽  
Kit CB Roes ◽  
Ingeborg van der Tweel

Sequential monitoring is a well-known methodology for the design and analysis of clinical trials. Driven by the lower expected sample size, recent guidelines and published research suggest the use of sequential methods for the conduct of clinical trials in rare diseases. However, the vast majority of the developed and most commonly used sequential methods rely on asymptotic assumptions concerning the distribution of the test statistics. It is not uncommon for trials in (very) rare diseases to be conducted with only a few dozen patients, and the use of sequential methods that rely on large-sample approximations could inflate the type I error probability. Additionally, the setting of a rare disease could make the traditional paradigm of designing a clinical trial (deciding on the sample size given type I and II errors and the anticipated effect size) irrelevant. One could think of the situation where the number of available patients has a fixed maximum that should be used as efficiently as possible. In this work, we evaluate the operational characteristics of sequential designs in the setting of very small to moderate sample sizes with normally distributed outcomes and demonstrate the necessity of simple corrections of the critical boundaries. We also suggest a method for deciding on an optimal sequential design given a maximum sample size and some prior belief on the treatment effect (data driven or based on expert opinion).
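The inflation the authors warn about is easy to reproduce with a toy simulation (this is an illustration of the problem, not the authors' correction method): a two-look design with only a few dozen patients compares small-sample t-statistics against a large-sample Pocock z-boundary, and the empirical type I error exceeds the nominal 0.05.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(7)

# Two-look one-sample design: interim at n=6, final at n=12.
# Pocock boundary for K=2 looks, two-sided alpha=0.05: |Z| > 2.178.
# Comparing t-statistics (unknown variance, tiny n) against this
# normal-theory boundary inflates the type I error.
POCOCK = 2.178
LOOKS = (6, 12)

def trial_rejects():
    data = [random.gauss(0.0, 1.0) for _ in range(LOOKS[-1])]  # H0 is true
    for n in LOOKS:
        x = data[:n]
        t = mean(x) / (stdev(x) / sqrt(n))
        if abs(t) > POCOCK:
            return True
    return False

reps = 20_000
type1 = sum(trial_rejects() for _ in range(reps)) / reps
print(f"empirical type I error {type1:.3f} (nominal 0.05)")
```

Replacing the z-boundary with a suitably widened critical value (the kind of simple correction the paper evaluates) would bring the empirical rate back toward the nominal level.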


2020 ◽  
Vol 38 (4_suppl) ◽  
pp. 103-103
Author(s):  
Aaron James Scott ◽  
Steven J. Cohen ◽  
Atrayee Basu Mallick ◽  
Efrat Dotan ◽  
Philip Jordan Gold ◽  
...  

103 Background: Therapeutic resistance to antiangiogenics in metastatic colorectal cancer (mCRC) inevitably develops via multiple mechanisms, including upregulation of the MET kinase pathway. Cabozantinib, an oral multi-tyrosine kinase inhibitor targeting MET, AXL, and VEGFR, demonstrated significant anti-tumor activity in CRC xenograft and cell line models. Methods: A single-arm, two-stage phase II study was conducted at 7 AGICC centers nationwide. 44 patients (pts) with mCRC who had progressed on or were intolerant of standard of care agents were treated with cabozantinib 60 mg daily in q3 wk cycles. The primary endpoint was the 12-wk PFS rate. Based on the control arm of the phase III CORRECT study, the Kaplan-Meier 12-wk PFS rate estimate was 13% and served as the null hypothesis. This study was powered at 0.906 to detect the alternative hypothesis of a 12-wk PFS rate of 33% with a type I error rate of 0.044. Secondary endpoints were safety, RR, OS, and retrospective analysis of PFS and RR based on RAS, BRAF, and PIK3CA mutation status. Results: 44 pts were enrolled and 34 pts were response-evaluable, having undergone at least the first 6-wk restaging scan. 10 pts discontinued treatment prior to the first 6-wk scan due to clinical disease progression. The median number of cycles was 4 and median follow-up was 2.5 months. As of the data cutoff (8/23/2019), 55 Grade 3/4 AEs were reported, the most common being hypertension, fatigue, diarrhea, pain, HFS, nausea, vomiting, and proteinuria. 32 SAEs occurred in 18 pts. 5 Grade 5 AEs were reported: disease progression (3), disseminated intravascular coagulopathy, and bowel perforation. 15 pts (34%) achieved ≥ 12-wk PFS and 8 patients remain on treatment. Best response was 1 PR and 31 SD, with a DCR at 6 wks of 72.7%. Of the pts who achieved ≥ 12-wk PFS, 12 had left-sided primary tumors, 5 had a RAS mutation, 1 had a PIK3CA mutation, and all pts were BRAF WT and MSI stable.
Conclusions: Cabozantinib was deemed safe and demonstrated encouraging efficacy in a heavily pretreated mCRC pt population. These results support further investigation of cabozantinib in mCRC. Clinical trial information: NCT03542877.
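The quoted operating characteristics (null 13% vs alternative 33%, α = 0.044, power 0.906) can be roughly reproduced with an exact single-stage binomial calculation. This is a simplification for illustration: the trial actually used a Kaplan-Meier estimate of the 12-wk PFS rate and a two-stage design, so the numbers will not match exactly.

```python
from math import comb

def binom_tail(n, p, r):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

def exact_design(n, p0, p1, alpha):
    """Smallest cutoff r such that rejecting H0 when X >= r has one-sided
    size <= alpha; returns (r, achieved size, power at p1)."""
    for r in range(n + 1):
        size = binom_tail(n, p0, r)
        if size <= alpha:
            return r, size, binom_tail(n, p1, r)
    raise ValueError("no valid cutoff")

r, size, power = exact_design(n=44, p0=0.13, p1=0.33, alpha=0.044)
print(f"reject H0 if >= {r}/44 successes: size {size:.3f}, power {power:.3f}")
```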


1996 ◽  
Vol 26 (4) ◽  
pp. 525-536 ◽  
Author(s):  
Yue Wang ◽  
Valerie M. LeMay

To determine whether an existing tree volume estimation equation is acceptable for application to a new species, a new region, or a local area, an accuracy test could be carried out. Three sequential accuracy testing plans (SATP) were developed, along with approximate operating characteristic and average sample number curves. These plans extend fixed sample size accuracy tests to sequential sampling tests by means of sequential probability ratio tests. Using Monte Carlo simulations with normally distributed error terms, the SATP procedures were shown to be reliable for classifying existing volume models as acceptable or not acceptable, with achieved error probabilities (type I and type II) lower than the nominal values specified prior to testing. Also, on average, the use of the SATP procedures resulted in a 40 to 50% reduction in the expected sample size compared with that required for an equally reliable fixed sample size procedure. A detailed example is given to illustrate the application of the SATP procedures.


2012 ◽  
Vol 30 (30_suppl) ◽  
pp. 34-34 ◽  
Author(s):  
Sumithra J. Mandrekar ◽  
Ming-Wen An ◽  
Daniel J. Sargent

34 Background: Phase II clinical trials aim to identify promising experimental regimens for further testing in phase III trials. Testing targeted therapies with predictive biomarkers mandates efficient trial designs. Current biomarker-based trial designs, including the enrichment, all-comers, and adaptive designs, randomize patients to receive treatment or not throughout the entire duration of the trial. Recognizing the need for randomization yet acknowledging the possibility of promising but nonconclusive results after a preplanned interim analysis (IA), we propose a two-stage phase II design that allows for the possibility of direct assignment (i.e., stop randomization and assign all patients to the experimental arm in stage II) based on IA results. Methods: Using simulations, we compared properties of the direct assignment option design to a 1:1 randomized phase II design and assessed the impact of the timing of IA (after 33%, 50%, or 67% of accrual) and the number of IAs (one versus two, with the option for direct assignment at the first and second) over a range of response rate ratios (between 1.0 and 3.0). Results: Between 12% and 30% of the trials (out of 6,000 simulated trials) adopt direct assignment in stage II, with the adoption rate depending on the treatment effect size and the specified type I error rate (T1ER). The direct assignment option design has minimal loss in power (<1.8%) and minimal increase in T1ER (<2.1%) compared to a 1:1 randomized design. The maximum loss in power across possible timings of IA was <1.2%. For the direct assignment option design, there was a 20%-50% increase in the number of patients treated on the experimental (vs. control) arm for the 1 IA case, and a 40%-100% increase for the 2 IA case. Conclusions: Testing predictive biomarkers in clinical trials requires new design strategies.
In the spectrum of phase II designs from adaptive to balanced randomized all-comers or enrichment designs, the direct assignment design provides a middle ground with desirable statistical properties that may appeal to both clinicians and patients.


2013 ◽  
Vol 31 (15_suppl) ◽  
pp. 6576-6576
Author(s):  
Satoshi Teramukai ◽  
Takashi Daimon ◽  
Sarah Zohar

6576 Background: The aim of phase II trials is to determine if a new treatment is promising for further testing in confirmatory clinical trials. Most phase II clinical trials are designed as single-arm trials using a binary outcome, with or without interim monitoring for early stopping. In this context, we propose a Bayesian adaptive design denoted as PSSD, predictive sample size selection design (Statistics in Medicine 2012;31:4243-4254). Methods: The design allows for sample size selection followed by any planned interim analyses for early stopping of a trial, together with sample size determination before starting the trial. In the PSSD, we determine the sample size using the predictive probability criterion with two kinds of prior distributions, that is, an 'analysis prior' used to compute posterior probabilities and a 'design prior' used to obtain prior predictive distributions. In the sample size determination, we provide two sample sizes, N and Nmax, using two types of design priors. At each interim analysis, we calculate the predictive probability of achieving a successful result at the end of the trial using the analysis prior in order to stop the trial in case of low or high efficacy, and we select an optimal sample size, that is, either N or Nmax as needed, on the basis of the predictive probabilities. Results: We investigated the operating characteristics through simulation studies, and the PSSD was retrospectively applied to a lung cancer clinical trial. As the number of interim looks increases, the probability of type I errors slightly decreases, and that of type II errors increases. The type I error probabilities of the proposed PSSD are similar to those of the non-adaptive design. The type II error probabilities of the PSSD are between those of the two fixed sample size (N or Nmax) designs.
Conclusions: From a practical standpoint, the proposed design could be useful in phase II single-arm clinical trials with a binary endpoint. In the near future, this approach will be implemented in actual clinical trials to assess its usefulness and to extend it to more complicated clinical trials.
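The predictive probability machinery underlying designs of this kind can be sketched with generic Beta-binomial calculations. This is illustrative only (a flat Beta(1,1) analysis prior and made-up interim numbers), not the authors' PSSD implementation:

```python
from math import comb, exp, lgamma

def beta_cdf(x, a, b):
    """P(p <= x) for p ~ Beta(a, b), integer a, b >= 1, via the
    binomial-tail identity (a-th order statistic of a+b-1 uniforms)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def beta_binom_pmf(y, m, a, b):
    """P(Y = y) for Y ~ BetaBinomial(m, a, b)."""
    log_beta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return comb(m, y) * exp(log_beta(a + y, b + m - y) - log_beta(a, b))

def predictive_prob(x, n, n_max, p0, a=1, b=1, theta=0.95):
    """Predictive probability of a successful final analysis: at n_max
    patients the posterior P(p > p0) must exceed theta, given x successes
    in n patients so far and a Beta(a, b) analysis prior."""
    m = n_max - n
    pp = 0.0
    for y in range(m + 1):                      # possible future successes
        if 1 - beta_cdf(p0, a + x + y, b + n_max - (x + y)) > theta:
            pp += beta_binom_pmf(y, m, a + x, b + n - x)  # posterior predictive
    return pp

pp = predictive_prob(x=7, n=20, n_max=40, p0=0.20)
print(f"predictive probability of success: {pp:.3f}")
```

An interim rule of the kind described above would stop for futility when this probability is very low, and for efficacy when it is very high.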


2015 ◽  
Vol 26 (6) ◽  
pp. 2812-2820 ◽  
Author(s):  
Songshan Yang ◽  
James A Cranford ◽  
Runze Li ◽  
Robert A Zucker ◽  
Anne Buu

This study proposes a time-varying effect model that can be used to characterize gender-specific trajectories of health behaviors and conduct hypothesis testing for gender differences. The motivating examples demonstrate that the proposed model is applicable not only to multi-wave longitudinal studies but also to short-term studies that involve intensive data collection. The simulation study shows that the accuracy of estimation of trajectory functions improves as the sample size and the number of time points increase. In terms of the performance of the hypothesis testing, the type I error rates are close to their corresponding significance levels under all combinations of sample size and number of time points. Furthermore, the power increases as the alternative hypothesis deviates more from the null hypothesis, and the rate of this increasing trend is higher when the sample size and the number of time points are larger.


2017 ◽  
Vol 54 (3) ◽  
pp. 419-427 ◽  
Author(s):  
Rukhsana Liza ◽  
Gordon A. Fenton ◽  
Craig B. Lake ◽  
D.V. Griffiths

This paper presents an analytical approach to selecting the sample size required to achieve acceptable quality control in a cement-based “solidification/stabilization” construction cell program intended for the treatment–containment of contaminated soils. The proposed approach is based on the hypothesis test that the cell does not have an acceptably low hydraulic conductivity (the null hypothesis) versus the alternative hypothesis that it does. Analytical solutions are developed to compute the probabilities of both type I (mistakenly rejecting the null hypothesis) and type II (mistakenly failing to reject the null hypothesis) errors as functions of the number of samples and the statistics of the hydraulic conductivity field. The analytical results are validated by Monte Carlo simulations and are then used to develop rational sampling requirements. An example is presented to illustrate how the proposed approach can be used in practice to assess the required sample size for the quality control program of cement-based S/S construction cells.
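The same style of analytical error calculation can be sketched and checked by Monte Carlo for a one-sided test on the mean of (log) hydraulic conductivity. The numbers below are illustrative, and the sketch ignores the spatial correlation that the paper's random-field treatment accounts for:

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(11)
nd = NormalDist()

# H0: mu = mu0 (cell not acceptably impermeable) vs H1: mu = mu1 < mu0,
# for the mean of log10 hydraulic conductivity; known sigma, n samples.
mu0, mu1, sigma, alpha, n = -6.0, -6.5, 1.0, 0.05, 25
crit = mu0 - nd.inv_cdf(1 - alpha) * sigma / sqrt(n)  # reject H0 if mean < crit

# Analytical type II error: failing to reject H0 when H1 is true
beta_analytic = nd.cdf(nd.inv_cdf(1 - alpha) - (mu0 - mu1) * sqrt(n) / sigma)

# Monte Carlo validation under H1
reps = 20_000
misses = sum(
    sum(random.gauss(mu1, sigma) for _ in range(n)) / n >= crit
    for _ in range(reps)
)
beta_mc = misses / reps
print(f"type II error: analytic {beta_analytic:.3f}, Monte Carlo {beta_mc:.3f}")
```

Sweeping n in such a calculation yields the kind of rational sampling requirement the paper derives: the smallest n for which both error probabilities are acceptably low.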


1983 ◽  
Vol 13 (6) ◽  
pp. 1197-1203 ◽  
Author(s):  
Gary W. Fowler

Monte Carlo operating characteristic (OC) and average sample number (ASN) functions were compared with Wald's OC and ASN equations for sequential sampling plans based on Wald's sequential probability ratio test (SPRT) using the binomial, negative binomial, normal, and Poisson distributions. This comparison showed that the errors inherent in Wald's equations as a result of "overshooting" the decision boundaries of the SPRT can be large. Relative errors increased for the OC and ASN equations as the difference between the null (θ0) and alternative (θ1) test parameter values increased. Relative errors also increased for the ASN equation as the probabilities of type I (α) and type II (β) errors increased. For discrete distributions, the relative errors also increased as θ0 increased with θ1/θ0 fixed. Wald's equations, in general, overestimate the true error probabilities and underestimate the true ASN. For the values of θ0, θ1, α, and β used in many sequential sampling plans in forestry, Wald's equations may not be adequate. For those cases where the errors in Wald's equations are important compared with the other errors associated with the sampling plan, two alternative Monte Carlo OC and ASN functions are proposed.
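The underestimation of the ASN can be demonstrated directly: Wald's ASN approximation ignores boundary overshoot, so Monte Carlo simulation of the same SPRT yields a larger average sample number. The binomial parameter values below are illustrative, not drawn from the paper:

```python
import math
import random

random.seed(3)

# SPRT for a binomial proportion: H0 theta=0.2 vs H1 theta=0.5
theta0, theta1, alpha, beta = 0.2, 0.5, 0.05, 0.10
A = math.log((1 - beta) / alpha)              # upper boundary (accept H1)
B = math.log(beta / (1 - alpha))              # lower boundary (accept H0)
z1 = math.log(theta1 / theta0)                # LLR increment for a success
z0 = math.log((1 - theta1) / (1 - theta0))    # LLR increment for a failure

# Wald's ASN approximation under H0 (assumes the LLR hits a boundary exactly)
ez0 = theta0 * z1 + (1 - theta0) * z0
wald_asn = ((1 - alpha) * B + alpha * A) / ez0

def run_trial():
    """Sample under H0 until the LLR crosses a boundary; return n used."""
    llr, n = 0.0, 0
    while B < llr < A:
        n += 1
        llr += z1 if random.random() < theta0 else z0
    return n

mc_asn = sum(run_trial() for _ in range(5000)) / 5000
print(f"Wald ASN {wald_asn:.1f}, Monte Carlo ASN {mc_asn:.1f}")
```

Because each discrete increment can carry the LLR well past a boundary, the realized stopping times exceed Wald's approximation, which is the effect the abstract quantifies.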


Author(s):  
Hyun Kang

Appropriate sample size calculation and power analysis have become major issues in research and publication processes. However, calculating sample size and power requires broad statistical knowledge, personnel with programming skills are in short supply, and commercial programs are often too expensive to use in practice. This review article aims to explain the basic concepts of sample size calculation and power analysis; the process of sample estimation; and how to calculate sample size using the G*Power software (latest ver. 3.1.9.7; Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany) with 5 statistical examples. The null and alternative hypotheses, effect size, power, alpha, type I error, and type II error should be specified when calculating the sample size or power. G*Power is recommended for sample size and power calculations for various statistical methods (F, t, χ2, z, and exact tests) because it is easy to use and free. The process of sample estimation consists of establishing research goals and hypotheses, choosing appropriate statistical tests, choosing one of 5 possible power analysis methods, inputting the required variables for analysis, and clicking the "Calculate" button. This software is helpful for researchers to estimate the sample size and to conduct power analysis.
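The kind of a priori calculation G*Power performs for a two-sample t-test can be approximated in a few lines with the standard normal-approximation formula n = 2((z₁₋α/₂ + z₁₋β)/d)² per group. G*Power's exact noncentral-t computation gives a slightly larger answer (64 rather than 63 per group for the example below), so this sketch is an approximation, not a replacement:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-sample t-test,
    using the normal approximation to the noncentral t."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_b = nd.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_a + z_b) / effect_size) ** 2)

# Medium effect (Cohen's d = 0.5), alpha = 0.05, power = 0.80
print(n_per_group(0.5))
```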

