Using multiple outcomes in intervention studies for improved trade-off between power and type I errors: the Adjust NVar approach

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 991
Author(s):  
Dorothy V. M. Bishop

Background The CONSORT guidelines for clinical trials recommend use of a single primary outcome, to guard against the raised risk of false positive findings when multiple measures are considered. It is, however, possible to include a suite of multiple outcomes in an intervention study, while controlling the familywise error rate, if the criterion for rejecting the null hypothesis specifies that N or more of the outcomes reach an agreed level of statistical significance, where N depends on the total number of outcome measures included in the study, and the correlation between them. Methods Simulations were run, using a conventional null-hypothesis significance testing approach with alpha set at .05, to explore the case when between 2 and 12 outcome measures are included to compare two groups, with average correlation between measures ranging from zero to .8, and true effect size ranging from 0 to .7. In step 1, a table is created giving the minimum number of significant outcomes (MinNSig) that is required for a given set of outcome measures to control the familywise error rate at 5%. In step 2, data are simulated using MinNSig values for each set of correlated outcomes and the resulting proportion of significant results is computed for different sample sizes, correlations, and effect sizes. Results The Adjust NVar approach can achieve a more efficient trade-off between power and type I error rate than use of a single outcome when there are three or more moderately intercorrelated outcome variables. Conclusions Where it is feasible to have a suite of moderately correlated outcome measures, this might be a more efficient approach than reliance on a single primary outcome measure in an intervention study. In effect, it builds an internal replication into the study. This approach can also be used to evaluate published intervention studies.
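Step 1 of the procedure can be illustrated by simulation. The sketch below is a hedged illustration, not the author's published script: under the global null it estimates the probability that at least k of several equicorrelated outcomes reach p < .05, so MinNSig can be read off as the smallest k keeping that rate at or below 5%. Group size, correlation, and simulation counts are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def fwer_at_least_k(n_outcomes, k, r, n_per_group=50, alpha=0.05, n_sim=2000):
    """Estimate P(>= k of n_outcomes reach p < alpha) under the global null,
    with equicorrelated (correlation r) normal outcomes in two groups."""
    cov = np.full((n_outcomes, n_outcomes), r)
    np.fill_diagonal(cov, 1.0)
    hits = 0
    for _ in range(n_sim):
        a = rng.multivariate_normal(np.zeros(n_outcomes), cov, size=n_per_group)
        b = rng.multivariate_normal(np.zeros(n_outcomes), cov, size=n_per_group)
        p = stats.ttest_ind(a, b).pvalue          # one p-value per outcome
        if (p < alpha).sum() >= k:
            hits += 1
    return hits / n_sim

# MinNSig is the smallest k whose familywise error rate is at or below 5%
for k in range(1, 5):
    print(k, round(fwer_at_least_k(n_outcomes=4, k=k, r=0.3), 3))
```

Requiring only one significant outcome among several inflates the familywise error rate well above 5%; demanding more simultaneous hits brings it back under control.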

2015 ◽  
Vol 14 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Rosa J. Meijer ◽  
Thijmen J.P. Krebs ◽  
Jelle J. Goeman

Abstract We present a multiple testing method for hypotheses that are ordered in space or time. Given such hypotheses, the elementary hypotheses as well as regions of consecutive hypotheses are of interest. These region hypotheses not only have intrinsic meaning but testing them also has the advantage that (potentially small) signals across a region are combined in one test. Because the expected number and length of potentially interesting regions are usually not available beforehand, we propose a method that tests all possible region hypotheses as well as all individual hypotheses in a single multiple testing procedure that controls the familywise error rate. We start at testing the global null-hypothesis and when this hypothesis can be rejected we continue with further specifying the exact location/locations of the effect present. The method is implemented in the
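The "global first, then localise" idea can be illustrated with a toy top-down scan. This is a hedged sketch using Fisher's combination test and naive recursion; it is not the authors' exact procedure, which controls the familywise error rate over all region hypotheses simultaneously.

```python
import numpy as np
from scipy import stats

def region_scan(pvals, lo, hi, alpha=0.05, found=None):
    """Toy sketch: test the region [lo, hi) with Fisher's combination test;
    only if it rejects, recurse into its two halves to localise the signal."""
    if found is None:
        found = []
    fisher = -2 * np.sum(np.log(pvals[lo:hi]))
    p_region = stats.chi2.sf(fisher, df=2 * (hi - lo))
    if p_region < alpha:
        found.append((lo, hi))
        if hi - lo > 1:
            mid = (lo + hi) // 2
            region_scan(pvals, lo, mid, alpha, found)
            region_scan(pvals, mid, hi, alpha, found)
    return found

# Six ordered hypotheses with a signal around positions 2-3
p = np.array([0.6, 0.4, 0.001, 0.003, 0.5, 0.7])
print(region_scan(p, 0, len(p)))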


BMJ Open ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. e041319
Author(s):  
Naveen Poonai ◽  
Kamary Coriolano ◽  
Terry Klassen ◽  
Anna Heath ◽  
Maryna Yaskina ◽  
...  

Introduction Up to 40% of orthopaedic injuries in children require a closed reduction, almost always necessitating procedural sedation. Intravenous ketamine is the most commonly used sedative agent. However, intravenous insertion is painful and can be technically difficult in children. We hypothesise that a combination of intranasal dexmedetomidine plus intranasal ketamine (Ketodex) will be non-inferior to intravenous ketamine for effective sedation in children undergoing a closed reduction. Methods and analysis This is a six-centre, four-arm, adaptive, randomised, blinded, controlled, non-inferiority trial. We will include children aged 4–17 years with a simple upper limb fracture or dislocation that requires sedation for a closed reduction. Participants will be randomised to receive either intranasal Ketodex (one of three dexmedetomidine and ketamine combinations) or intravenous ketamine. The primary outcome is adequate sedation as measured using the Paediatric Sedation State Scale. Secondary outcomes include length of stay, time to wakening and adverse effects. The results of both per protocol and intention-to-treat analyses will be reported for the primary outcome. All inferential analyses will be undertaken using a response-adaptive Bayesian design. Logistic regression will be used to model the dose–response relationship for the combinations of intranasal Ketodex. Using the Average Length Criterion for Bayesian sample size estimation, a survey-informed non-inferiority margin of 17.8% and priors from historical data, a sample size of 410 participants will be required. Simulations estimate a type II error rate of 0.08 and a type I error rate of 0.047. Ethics and dissemination Ethics approval was obtained from Clinical Trials Ontario for London Health Sciences Centre and McMaster Research Ethics Board. Other sites have yet to receive approval from their institutions. Informed consent will be obtained from guardians of all participants in addition to assent from participants. Study data will be submitted for publication regardless of results. Trial registration number NCT0419525.
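The Bayesian non-inferiority comparison at the heart of such an analysis can be sketched with conjugate beta posteriors. All counts below are hypothetical placeholders (the trial had not reported results); only the 17.8% margin comes from the protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts of adequate sedation per arm (illustrative only)
success_ketodex, n_ketodex = 170, 205
success_iv, n_iv = 180, 205
margin = 0.178   # survey-informed non-inferiority margin from the protocol

# Beta(1, 1) priors; posterior draws for each arm's sedation success rate
post_ketodex = rng.beta(1 + success_ketodex, 1 + n_ketodex - success_ketodex, 100_000)
post_iv = rng.beta(1 + success_iv, 1 + n_iv - success_iv, 100_000)

# Posterior probability that Ketodex is no worse than IV ketamine by more
# than the non-inferiority margin
prob_noninferior = np.mean(post_ketodex > post_iv - margin)
print(round(prob_noninferior, 3))
```

With these illustrative counts the observed difference is far smaller than the margin, so the posterior probability of non-inferiority is close to one; the trial's actual analysis is response-adaptive and more elaborate.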


Methodology ◽  
2008 ◽  
Vol 4 (4) ◽  
pp. 159-167 ◽  
Author(s):  
Donna L. Coffman

This study investigated the degree to which violation of the parameter drift assumption affects the Type I error rate for the test of close fit and the power analysis procedures proposed by MacCallum et al. (1996) for both the test of close fit and the test of exact fit. The parameter drift assumption states that as sample size increases both sampling error and model error (i.e., the degree to which the model is an approximation in the population) decrease. Model error was introduced using a procedure proposed by Cudeck and Browne (1992). The empirical power for both the test of close fit, in which the null hypothesis specifies that the root mean square error of approximation (RMSEA) ≤ 0.05, and the test of exact fit, in which the null hypothesis specifies that RMSEA = 0, is compared with the theoretical power computed using the MacCallum et al. (1996) procedure. The empirical power and the theoretical power for both the test of close fit and the test of exact fit are nearly identical under violations of the assumption. The results also indicated that the test of close fit maintains the nominal Type I error rate under violations of the assumption.
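The theoretical power computation of MacCallum et al. (1996) rests on noncentral chi-square distributions with noncentrality λ = (N − 1) · df · ε². A minimal sketch (N and df are illustrative):

```python
from scipy.stats import ncx2

def rmsea_power(n, df, rmsea0=0.05, rmsea_a=0.08, alpha=0.05):
    """Power for the test of close fit (H0: RMSEA <= rmsea0), following the
    noncentral chi-square approach of MacCallum et al. (1996)."""
    ncp0 = (n - 1) * df * rmsea0 ** 2      # noncentrality under H0
    ncp_a = (n - 1) * df * rmsea_a ** 2    # noncentrality under the alternative
    crit = ncx2.ppf(1 - alpha, df, ncp0)   # critical chi-square under H0
    return ncx2.sf(crit, df, ncp_a)        # rejection probability under Ha

# Example: N = 200, df = 50 (illustrative values)
print(round(rmsea_power(200, 50), 3))
```

Setting rmsea0 = 0 recovers the test of exact fit; power grows with sample size because both noncentrality parameters scale with N − 1.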


Biometrika ◽  
2019 ◽  
Vol 106 (2) ◽  
pp. 353-367 ◽  
Author(s):  
B Karmakar ◽  
B French ◽  
D S Small

Summary A sensitivity analysis for an observational study assesses how much bias, due to nonrandom assignment of treatment, would be necessary to change the conclusions of an analysis that assumes treatment assignment was effectively random. The evidence for a treatment effect can be strengthened if two different analyses, which could be affected by different types of biases, are both somewhat insensitive to bias. The finding from the observational study is then said to be replicated. Evidence factors allow for two independent analyses to be constructed from the same dataset. When combining the evidence factors, the Type I error rate must be controlled to obtain valid inference. A powerful method is developed for controlling the familywise error rate for sensitivity analyses with evidence factors. It is shown that the Bahadur efficiency of sensitivity analysis for the combined evidence is greater than for either evidence factor alone. The proposed methods are illustrated through a study of the effect of radiation exposure on the risk of cancer. An R package, evidenceFactors, is available from CRAN to implement the methods of the paper.
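The combination step for two independent evidence factors can be sketched with Fisher's method. This is a generic illustration of combining independent p-values while controlling the Type I error rate, not the specific powerful method developed in the paper.

```python
import numpy as np
from scipy import stats

def combine_evidence(p1, p2):
    """Fisher combination of two independent evidence-factor p-values.
    Each p-value would come from its own (sensitivity) analysis; because
    the factors are independent, -2(log p1 + log p2) is chi-square with
    4 df under the joint null."""
    fisher = -2 * (np.log(p1) + np.log(p2))
    return stats.chi2.sf(fisher, df=4)

# Two individually unremarkable analyses combine into stronger evidence
print(round(combine_evidence(0.04, 0.03), 4))
```

Neither factor alone is compelling, but the combined p-value is an order of magnitude smaller, which is the sense in which combining evidence factors strengthens a finding.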


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S169-S170
Author(s):  
Klaus Hauer ◽  
Patrick Heldmann ◽  
Christian Werner

Abstract Selecting appropriate outcome measures for multimorbid, acutely hospitalized geriatric patients poses specific challenges, which may have caused inconsistent findings of previous intervention trials on early inpatient rehabilitation. The objective of this review was to describe primary outcome measures used in randomized controlled trials (RCTs) on early rehabilitation in older hospital patients, to analyze their matching to intervention programs, and to evaluate the effects of matching on the main findings of these RCTs. A systematic literature search was conducted in PubMed, Cochrane CENTRAL, CINAHL, and PEDro databases. Inclusion criteria were: RCT, patients aged ≥ 65 years, admission to hospital, physical exercise intervention, and primary outcome measure during hospitalization. Two independent reviewers extracted the data, assessed the methodological quality, and analyzed the matching of primary outcome measures to the intervention, study sample, and setting. Main study findings were related to the results of the matching procedure. In 28 included articles, 33 different primary outcome measures were identified, which we grouped into six categories: functional status, mobility status, hospital outcomes, adverse clinical events, psychological status, and cognitive functioning. Outcome measures differed considerably within each category showing a large heterogeneity in their matching to the intervention, study sample, and setting. Outcome measures that specifically matched the intervention contents were more likely to document intervention-induced benefits. Mobility instruments seemed to be the most sensitive outcome measures to reveal such benefits. High specificity (optimized match) of outcome measures and intervention contents is a key factor to reveal benefits of early rehabilitation in acutely hospitalized geriatric patients.


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Ertugrul Colak ◽  
Hulya Ozen ◽  
Busra Emir ◽  
Setenay Oner

The aim of this study is to propose a new pairwise multiple comparison adjustment procedure based on Genz’s numerical computation of probabilities from a multivariate normal distribution. This method is applied to the results of two-sample log-rank and weighted log-rank statistics where the survival data contained right-censored observations. We conducted Monte Carlo simulation studies not only to evaluate the familywise error rate and power of the proposed procedure but also to compare the procedure with conventional methods. The proposed method is also applied to the data set consisting of 815 patients on a liver transplant waiting list from 1990 to 1999. It was found that the proposed method can control the type I error rate, and it yielded similar power as Tukey’s and high power with respect to the other adjustment procedures. In addition to having a straightforward formula, it is easy to implement.
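The core computation, an adjusted p-value from the joint multivariate normal distribution of the test statistics, can be sketched as follows; scipy evaluates the multivariate normal CDF with Genz's algorithm. The correlation matrix and observed statistic are illustrative, and the sketch is one-sided for brevity.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def adjusted_p(z_obs, corr):
    """Single-step adjusted p-value for the largest one-sided pairwise
    statistic: P(max Z_i > z_obs) under the global null, computed from the
    joint multivariate normal distribution of all comparisons."""
    k = corr.shape[0]
    mvn = multivariate_normal(mean=np.zeros(k), cov=corr)
    return 1.0 - mvn.cdf(np.full(k, z_obs))

# Three pairwise log-rank comparisons with common correlation 0.5
corr = np.array([[1.0, 0.5, 0.5],
                 [0.5, 1.0, 0.5],
                 [0.5, 0.5, 1.0]])
print(round(adjusted_p(2.2, corr), 3))
print(round(norm.sf(2.2), 3))   # unadjusted single-comparison p-value
```

Because the joint distribution accounts for the positive correlation between statistics, the adjusted p-value sits below the Bonferroni bound of three times the unadjusted value, which is the source of the power gain over conventional adjustments.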


2017 ◽  
Vol 21 (3) ◽  
pp. 269-275 ◽  
Author(s):  
Mark Rubin

Several researchers have recently argued that p values lose their meaning in exploratory analyses due to an unknown inflation of the alpha level (e.g., Nosek & Lakens, 2014; Wagenmakers, 2016). For this argument to be tenable, the familywise error rate must be defined in relation to the number of hypotheses that are tested in the same study or article. Under this conceptualization, the familywise error rate is usually unknowable in exploratory analyses because it is usually unclear how many hypotheses have been tested on a spontaneous basis and then omitted from the final research report. In the present article, I argue that it is inappropriate to conceptualize the familywise error rate in relation to the number of hypotheses that are tested. Instead, it is more appropriate to conceptualize familywise error in relation to the number of different tests that are conducted on the same null hypothesis in the same study. Under this conceptualization, alpha-level adjustments in exploratory analyses are (a) less necessary and (b) objectively verifiable. As a result, p values do not lose their meaning in exploratory analyses.
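The arithmetic behind the familywise error rate for a family of m tests of the same null hypothesis is simple (the m = 3 below is an arbitrary illustration):

```python
# Familywise error when ONE null hypothesis is tested m times with
# independent tests: P(at least one false rejection) = 1 - (1 - alpha)^m.
# Under the per-null conceptualization, alpha is adjusted within this
# family of m tests, not across every hypothesis tested in the study.
alpha, m = 0.05, 3
unadjusted_fwe = 1 - (1 - alpha) ** m
bonferroni_fwe = 1 - (1 - alpha / m) ** m   # Bonferroni within one null
print(round(unadjusted_fwe, 4), round(bonferroni_fwe, 4))
```

The family size m is objectively verifiable from the report itself, which is the practical advantage claimed for this conceptualization.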


Biometrika ◽  
2019 ◽  
Vol 106 (4) ◽  
pp. 929-940
Author(s):  
J Pouget-Abadie ◽  
G Saint-Jacques ◽  
M Saveski ◽  
W Duan ◽  
S Ghosh ◽  
...  

Summary Experimentation platforms are essential to large modern technology companies, as they are used to carry out many randomized experiments daily. The classic assumption of no interference among users, under which the outcome for one user does not depend on the treatment assigned to other users, is rarely tenable on such platforms. Here, we introduce an experimental design strategy for testing whether this assumption holds. Our approach is in the spirit of the Durbin–Wu–Hausman test for endogeneity in econometrics, where multiple estimators return the same estimate if and only if the null hypothesis holds. The design that we introduce makes no assumptions on the interference model between units, nor on the network among the units, and has a sharp bound on the variance and an implied analytical bound on the Type I error rate. We discuss how to apply the proposed design strategy to large experimentation platforms, and we illustrate it in the context of an experiment on the LinkedIn platform.
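The Durbin–Wu–Hausman spirit of the approach, where two estimators agree if and only if the no-interference null holds, can be sketched as a standardized comparison of two estimates. The independence assumption below is a simplification for illustration; the paper's design delivers the required joint distribution without it.

```python
import numpy as np
from scipy.stats import norm

def no_interference_test(est1, se1, est2, se2):
    """Hausman-style check: under no interference both designs estimate the
    same quantity, so a large standardized difference between the two
    estimates is evidence of interference. Assumes independent estimates."""
    z = (est1 - est2) / np.hypot(se1, se2)
    return 2 * norm.sf(abs(z))   # two-sided p-value

# Illustrative numbers: the two designs disagree by 0.9
print(round(no_interference_test(2.0, 0.3, 1.1, 0.2), 4))
```

A small p-value flags that the no-interference assumption is untenable, without committing to any particular interference model or network structure.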


1997 ◽  
Vol 22 (3) ◽  
pp. 291-307 ◽  
Author(s):  
Gregory R. Hancock ◽  
Alan J. Klockars

When testing a family of comparisons or contrasts across k treatment groups, researchers are often encouraged to maintain control over the familywise Type I error rate. For common families such as comparisons against a reference group, sets of orthogonal and/or nonorthogonal contrasts, and all possible pairwise comparisons, numerous simultaneous (and more recently sequential) testing methods have been proposed. Many of the simultaneous methods can be shown to be a form of Krishnaiah’s (e.g., 1979) finite intersection test (FIT) for simultaneous multiple comparisons, which controls the familywise error rate at precisely α under conditions assumed in standard ANOVA scenarios. Other methods, however, merely represent conservative approximations to a FIT procedure, yielding suboptimal power for conducting simultaneous testing. The purpose of the current article is threefold. First, we discuss how FIT methodology represents a paradigm that unifies many existing methods for simultaneous inference, as well as how it suggests an improved method for testing nonorthogonal contrasts. Second, we illustrate more powerful multiple comparison procedures that combine FIT methodology with sequential hypothesis testing strategies. Third, we present a simple simulation strategy for generating critical values necessary to conduct these more powerful FIT-based methods. Examples of these methods are given.
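A simulation strategy for generating critical values can be sketched for the maximum absolute statistic over a family of pairwise contrasts. This is a hedged, known-variance (z rather than t) illustration, not the article's exact procedure; all settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def sim_critical_value(contrasts, n_per_group, alpha=0.05, n_sim=5000):
    """Monte Carlo critical value for the max-|z| statistic over a family
    of contrasts on k group means, under the null with unit error variance
    and equal group sizes."""
    k = contrasts.shape[1]                     # number of groups
    max_stats = np.empty(n_sim)
    for s in range(n_sim):
        means = rng.normal(0, 1 / np.sqrt(n_per_group), size=k)  # null means
        z = contrasts @ means / np.sqrt((contrasts ** 2).sum(axis=1) / n_per_group)
        max_stats[s] = np.abs(z).max()
    return np.quantile(max_stats, 1 - alpha)

# Family of all pairwise comparisons among 3 groups
C = np.array([[1, -1, 0], [1, 0, -1], [0, 1, -1]], dtype=float)
crit = sim_critical_value(C, n_per_group=30)
print(round(crit, 2))
```

Because the simulated maximum respects the correlation among the contrast statistics, the resulting critical value exceeds the per-comparison 1.96 but falls below the Bonferroni cutoff, which is where the power advantage over conservative approximations comes from.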


2019 ◽  
Vol 16 (2) ◽  
pp. 132-141 ◽  
Author(s):  
Alexandra Blenkinsop ◽  
Mahesh KB Parmar ◽  
Babak Choodari-Oskooei

Background The multi-arm multi-stage framework uses intermediate outcomes to assess lack-of-benefit of research arms at interim stages in randomised trials with time-to-event outcomes. However, the design lacks formal methods to evaluate early evidence of overwhelming efficacy on the definitive outcome measure. We explore the operating characteristics of this extension to the multi-arm multi-stage design and how to control the pairwise and familywise type I error rate. Using real examples and the updated nstage program, we demonstrate how such a design can be developed in practice. Methods We used the Dunnett approach for assessing treatment arms when conducting comprehensive simulation studies to evaluate the familywise error rate, with and without interim efficacy looks on the definitive outcome measure, at the same time as the planned lack-of-benefit interim analyses on the intermediate outcome measure. We studied the effect of the timing of interim analyses, allocation ratio, lack-of-benefit boundaries, efficacy rule, number of stages and research arms on the operating characteristics of the design when efficacy stopping boundaries are incorporated. Methods for controlling the familywise error rate with efficacy looks were also addressed. Results Incorporating Haybittle–Peto stopping boundaries on the definitive outcome at the interim analyses will not inflate the familywise error rate in a multi-arm design with two stages. However, this rule is conservative; in general, more liberal stopping boundaries can be used with minimal impact on the familywise error rate. Efficacy bounds in trials with three or more stages using an intermediate outcome may inflate the familywise error rate, but we show how to maintain strong control. Conclusion The multi-arm multi-stage design allows stopping for both lack-of-benefit on the intermediate outcome and efficacy on the definitive outcome at the interim stages. 
We provide guidelines on how to control the familywise error rate when efficacy boundaries are implemented in practice.
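The claim that a Haybittle–Peto efficacy look barely inflates the pairwise type I error in a two-stage design can be checked with a short simulation (one research arm, one-sided testing; all settings are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def type1_with_peto_look(n_sim=20000, peto_p=0.001, final_alpha=0.05):
    """Under the null: an interim efficacy look on half the information using
    the stringent Haybittle-Peto bound, then a final test at the unadjusted
    level. Returns the overall rejection rate (pairwise type I error)."""
    z_half = rng.standard_normal(n_sim)          # interim z-statistic
    z_extra = rng.standard_normal(n_sim)         # independent increment
    z_final = (z_half + z_extra) / np.sqrt(2)    # final z on full information
    stop_early = z_half > norm.isf(peto_p)       # efficacy stop at interim
    reject_final = z_final > norm.isf(final_alpha)
    return np.mean(stop_early | reject_final)

print(round(type1_with_peto_look(), 4))
```

The overall rejection rate sits only fractionally above the nominal 5%, consistent with the finding that the stringent interim bound has minimal impact; more liberal bounds or more stages would require the explicit familywise adjustments discussed in the paper.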

