Simple compared to covariate-constrained randomization methods in balancing baseline characteristics: a case study of randomly allocating 72 hemodialysis centers in a cluster trial

Trials ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ahmed A. Al-Jaishi ◽  
Stephanie N. Dixon ◽  
Eric McArthur ◽  
P. J. Devereaux ◽  
Lehana Thabane ◽  
...  

Abstract Background and aim Some parallel-group cluster-randomized trials use covariate-constrained rather than simple randomization. This is done to increase the chance of balancing the groups on cluster- and patient-level baseline characteristics. This study assessed how well two covariate-constrained randomization methods balanced baseline characteristics compared with simple randomization. Methods We conducted a mock 3-year cluster-randomized trial, with no active intervention, that started April 1, 2014, and ended March 31, 2017. We included a total of 11,832 patients from 72 hemodialysis centers (clusters) in Ontario, Canada. We randomly allocated the 72 clusters into two groups in a 1:1 ratio on a single date using individual- and cluster-level data available until April 1, 2013. Initially, we generated 1000 allocation schemes using simple randomization. Then, as an alternative, we performed covariate-constrained randomization based on historical data from these centers. In one analysis, we restricted on a set of 11 individual-level prognostic variables; in the other, we restricted on principal components generated using 29 baseline historical variables. We created 300,000 different allocations for the covariate-constrained randomizations, and we restricted our analysis to the 30,000 best allocations based on the smallest sum of the penalized standardized differences. We then randomly sampled 1000 schemes from the 30,000 best allocations. We summarized our results with each randomization approach as the median (25th and 75th percentile) number of balanced baseline characteristics. There were 156 baseline characteristics, and a variable was balanced when the between-group standardized difference was ≤ 10%. Results The three randomization techniques had at least 125 of 156 balanced baseline characteristics in 90% of sampled allocations. The median number of balanced baseline characteristics using simple randomization was 147 (142, 150). The corresponding value for covariate-constrained randomization using 11 prognostic characteristics was 149 (146, 151), while for principal components, the value was 150 (147, 151). Conclusion In this setting with 72 clusters, constraining the randomization using historical information achieved better balance on baseline characteristics compared with simple randomization; however, the magnitude of benefit was modest.
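
As a rough illustration of the procedure described above, the Python sketch below generates candidate 1:1 allocations of 72 clusters, scores each candidate by the sum of absolute standardized differences across cluster-level covariates, keeps the best 10%, and then draws the final allocation at random from that constrained set. It is a minimal sketch under stated assumptions, not the authors' implementation: the simulated covariate matrix, the candidate count, and the unpenalized scoring rule are illustrative stand-ins for the penalized criterion and the historical registry data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2014)

def constrained_allocations(cluster_covs, n_candidates=10_000, keep_frac=0.10):
    """Generate candidate 1:1 allocations of clusters to two arms and keep the
    fraction with the smallest sum of absolute standardized differences
    (an unpenalized stand-in for the criterion described in the abstract)."""
    n_clusters = cluster_covs.shape[0]
    half = n_clusters // 2
    pooled_sd = cluster_covs.std(axis=0, ddof=1)
    scores, allocations = [], []
    for _ in range(n_candidates):
        arm = np.zeros(n_clusters, dtype=int)
        arm[rng.choice(n_clusters, size=half, replace=False)] = 1
        diff = cluster_covs[arm == 1].mean(axis=0) - cluster_covs[arm == 0].mean(axis=0)
        scores.append(np.sum(np.abs(diff) / pooled_sd))
        allocations.append(arm)
    keep = np.argsort(scores)[: int(n_candidates * keep_frac)]
    return [allocations[i] for i in keep]

# Hypothetical example: 72 clusters summarized by 11 cluster-level prognostic covariates.
covs = rng.normal(size=(72, 11))
best = constrained_allocations(covs)
chosen = best[rng.integers(len(best))]  # final allocation, drawn at random from the constrained set
std_diff = np.abs(covs[chosen == 1].mean(axis=0) - covs[chosen == 0].mean(axis=0)) / covs.std(axis=0, ddof=1)
print(f"{(std_diff <= 0.10).sum()} of 11 covariates balanced (standardized difference <= 10%)")
```

Scaling the candidate count up to the paper's 300,000 allocations only changes the constants above; the balance check at the end mirrors the ≤ 10% standardized-difference criterion used to declare a characteristic balanced.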


Biostatistics ◽  
2020 ◽  
Author(s):  
Dustin J Rabideau ◽  
Rui Wang

Summary In a cluster randomized trial (CRT), groups of people are randomly assigned to different interventions. Existing parametric and semiparametric methods for CRTs rely on distributional assumptions or a large number of clusters to maintain nominal confidence interval (CI) coverage. Randomization-based inference is an alternative approach that is distribution-free and does not require a large number of clusters to be valid. Although it is well-known that a CI can be obtained by inverting a randomization test, this requires testing a non-zero null hypothesis, which is challenging with non-continuous and survival outcomes. In this article, we propose a general method for randomization-based CIs using individual-level data from a CRT. This approach accommodates various outcome types, can account for design features such as matching or stratification, and employs a computationally efficient algorithm. We evaluate this method’s performance through simulations and apply it to the Botswana Combination Prevention Project, a large HIV prevention trial with an interval-censored time-to-event outcome.
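
The central idea, a confidence interval obtained by inverting a randomization test, can be shown in a much simpler setting than the one the paper handles. The Python sketch below assumes a continuous cluster-level outcome, a constant additive treatment effect, and unstratified re-randomization; it is not the authors' general method for non-continuous or interval-censored outcomes, and the grid of candidate effects, the permutation count, and the toy data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def randomization_pvalue(cluster_means, arm, delta, n_perm=2000):
    """P-value for H0: additive treatment effect equals delta, obtained by
    re-randomizing arm labels after subtracting delta from treated clusters."""
    adjusted = cluster_means - delta * arm              # outcomes under H0
    observed = adjusted[arm == 1].mean() - adjusted[arm == 0].mean()
    n_treat = arm.sum()
    stats = []
    for _ in range(n_perm):
        perm = np.zeros_like(arm)
        perm[rng.choice(arm.size, size=n_treat, replace=False)] = 1
        stats.append(adjusted[perm == 1].mean() - adjusted[perm == 0].mean())
    return np.mean(np.abs(np.asarray(stats)) >= abs(observed))

def randomization_ci(cluster_means, arm, grid, alpha=0.05, n_perm=2000):
    """Invert the test: the CI is the set of deltas not rejected at level alpha."""
    kept = [d for d in grid if randomization_pvalue(cluster_means, arm, d, n_perm) > alpha]
    return min(kept), max(kept)

# Toy example: 20 clusters, true additive effect of +1 on the cluster-mean outcome.
arm = np.repeat([0, 1], 10)
y = rng.normal(size=20) + 1.0 * arm
print(randomization_ci(y, arm, grid=np.linspace(-2, 4, 61)))
```

The interval is simply the set of hypothesized effects that the randomization test fails to reject at the 5% level, which is what makes the approach distribution-free.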


2021 ◽  
pp. 174077452110285
Author(s):  
Conner L Jackson ◽  
Kathryn Colborn ◽  
Dexiang Gao ◽  
Sangeeta Rao ◽  
Hannah C Slater ◽  
...  

Background: Cluster-randomized trials allow for the evaluation of a community-level or group-/cluster-level intervention. For studies that require a cluster-randomized trial design to evaluate cluster-level interventions aimed at controlling vector-borne diseases, it may be difficult to assess a large number of clusters while performing the additional work needed to monitor participants, vectors, and environmental factors associated with the disease. One such example of a cluster-randomized trial with few clusters was the “efficacy and risk of harms of repeated ivermectin mass drug administrations for control of malaria” trial. Although previous work has provided recommendations for analyzing trials like repeated ivermectin mass drug administrations for control of malaria, additional evaluation of the multiple approaches for analysis is needed for study designs with count outcomes. Methods: Using a simulation study, we applied three analysis frameworks to three cluster-randomized trial designs (single-year, 2-year parallel, and 2-year crossover) in the context of a 2-year parallel follow-up of repeated ivermectin mass drug administrations for control of malaria. Mixed-effects models, generalized estimating equations, and cluster-level analyses were evaluated. Additional 2-year parallel designs with different numbers of clusters and different cluster correlations were also explored. Results: Mixed-effects models with a small sample correction and unweighted cluster-level summaries yielded both high power and control of the Type I error rate. Generalized estimating equation approaches that utilized small sample corrections controlled the Type I error rate but did not confer greater power when compared to a mixed model approach with small sample correction. The crossover design generally yielded higher power relative to the parallel equivalent. Differences in power between analysis methods became less pronounced as the number of clusters increased. The strength of within-cluster correlation impacted the relative differences in power. Conclusion: Regardless of study design, cluster-level analyses as well as individual-level analyses like mixed-effects models or generalized estimating equations with small sample size corrections can both provide reliable results in small cluster settings. For 2-year parallel follow-up of repeated ivermectin mass drug administrations for control of malaria, we recommend a mixed-effects model with a pseudo-likelihood approximation method and Kenward–Roger correction. Similarly designed studies with small sample sizes and count outcomes should consider adjustments for small sample sizes when using a mixed-effects model or generalized estimating equation for analysis. Although the 2-year parallel follow-up of repeated ivermectin mass drug administrations for control of malaria is already underway as a parallel trial, applying the simulation parameters to a crossover design yielded improved power, suggesting that crossover designs may be valuable in settings where the number of available clusters is limited. Finally, the sensitivity of the analysis approach to the strength of within-cluster correlation should be carefully considered when selecting the primary analysis for a cluster-randomized trial.
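
For a concrete starting point, the Python sketch below simulates a small parallel cluster-randomized trial with a count outcome and applies two of the evaluated frameworks: a Poisson GEE with an exchangeable working correlation and a bias-reduced (small-sample) sandwich covariance, and an unweighted cluster-level t-test. The recommended pseudo-likelihood mixed model with a Kenward-Roger correction is typically fit in SAS or R and is not shown here. All cluster counts, event rates, and the between-cluster variance are illustrative assumptions, and the bias-reduced covariance option assumes a reasonably recent statsmodels release.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)

# Simulate a small parallel CRT with a count outcome: 8 clusters per arm,
# 50 participants per cluster, log-normal cluster effects inducing correlation.
clusters, n_per = 16, 50
arm = np.repeat([0, 1], clusters // 2)
cluster_effect = rng.normal(0, 0.25, clusters)
rows = []
for j in range(clusters):
    mu = np.exp(np.log(2.0) + np.log(0.7) * arm[j] + cluster_effect[j])  # assumed rate ratio 0.7
    rows.append(pd.DataFrame({"y": rng.poisson(mu, n_per), "arm": arm[j], "cluster": j}))
df = pd.concat(rows, ignore_index=True)

# Individual-level analysis: Poisson GEE with exchangeable working correlation
# and a bias-reduced (small-sample) sandwich covariance.
gee = sm.GEE.from_formula("y ~ arm", groups="cluster", data=df,
                          family=sm.families.Poisson(),
                          cov_struct=sm.cov_struct.Exchangeable())
print(gee.fit(cov_type="bias_reduced").summary())

# Cluster-level analysis: unweighted t-test on cluster mean counts.
cluster_means = df.groupby(["cluster", "arm"])["y"].mean().reset_index()
t, p = stats.ttest_ind(cluster_means.loc[cluster_means.arm == 1, "y"],
                       cluster_means.loc[cluster_means.arm == 0, "y"])
print(f"cluster-level t-test: t = {t:.2f}, p = {p:.3f}")
```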


2017 ◽  
Author(s):  
Ronald de Vlaming ◽  
Magnus Johannesson ◽  
Patrik K.E. Magnusson ◽  
M. Arfan Ikram ◽  
Peter M. Visscher

Abstract LD-score (LDSC) regression disentangles the contribution of polygenic signal, in terms of SNP-based heritability, and population stratification, in terms of a so-called intercept, to GWAS test statistics. Whereas LDSC regression uses summary statistics, methods like Haseman-Elston (HE) regression and genomic-relatedness-matrix (GRM) restricted maximum likelihood infer parameters such as SNP-based heritability from individual-level data directly. Therefore, these two types of methods are typically considered to be profoundly different. Nevertheless, recent work has revealed that LDSC and HE regression yield near-identical SNP-based heritability estimates when confounding stratification is absent. We now extend the equivalence; under the stratification assumed by LDSC regression, we show that the intercept can be estimated from individual-level data by transforming the coefficients of a regression of the phenotype on the leading principal components from the GRM. Using simulations, considering various degrees and forms of population stratification, we find that intercept estimates obtained from individual-level data are nearly equivalent to estimates from LDSC regression (R² > 99%). An empirical application corroborates these findings. Hence, LDSC regression is not profoundly different from methods using individual-level data; parameters that are identified by LDSC regression are also identified by methods using individual-level data. In addition, our results indicate that, under strong stratification, there is misattribution of stratification to the slope of LDSC regression, inflating estimates of SNP-based heritability from LDSC regression ceteris paribus. Hence, the intercept is not a panacea for population stratification. Consequently, LDSC-regression estimates should be interpreted with caution, especially when the intercept estimate is significantly greater than one.
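
Below is a toy Python sketch of the individual-level ingredients the abstract refers to: standardized genotypes, the GRM, its leading principal components, and a regression of the phenotype on those components. The specific transformation of these regression coefficients into an LDSC-intercept estimate follows the authors' derivation and is not reproduced here; the sample size, SNP count, and variance components are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy genotype matrix: N individuals x M SNPs coded as 0/1/2 allele counts.
N, M = 500, 2000
freqs = rng.uniform(0.05, 0.5, M)
G = rng.binomial(2, freqs, size=(N, M)).astype(float)

# Column-standardize genotypes and form the genomic-relatedness matrix (GRM).
Z = (G - G.mean(axis=0)) / G.std(axis=0)
GRM = Z @ Z.T / M

# Leading principal components of the GRM via eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(GRM)
order = np.argsort(eigvals)[::-1]
pcs = eigvecs[:, order[:10]]                    # top-10 PCs

# Simulated phenotype: polygenic component (SNP-based heritability ~ 0.5) plus noise.
beta = rng.normal(0, np.sqrt(0.5 / M), M)
y = Z @ beta + rng.normal(0, np.sqrt(0.5), N)

# Regression of the phenotype on the leading PCs; the paper derives how a
# transformation of these coefficients recovers the LDSC intercept.
X = np.column_stack([np.ones(N), pcs])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("phenotype-on-PC regression coefficients:", np.round(coef[1:], 3))
```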


2020 ◽  
Vol 42 (3) ◽  
pp. 354-374
Author(s):  
Jessaca Spybrook ◽  
Qi Zhang ◽  
Ben Kelcey ◽  
Nianbo Dong

Over the past 15 years, we have seen an increase in the use of cluster randomized trials (CRTs) to test the efficacy of educational interventions. These studies are often designed with the goal of determining whether a program works, or answering the what works question. Recently, the goals of these studies expanded to include for whom and under what conditions an intervention is effective. In this study, we examine the capacity of a set of CRTs to provide rigorous evidence about for whom and under what conditions an intervention is effective. The findings suggest that studies are more likely to be designed with the capacity to detect potentially meaningful individual-level moderator effects, for example, gender, than cluster-level moderator effects, for example, school type.
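
The design issue behind this finding, namely that a cluster-level moderator effect is estimated only from between-cluster variation, can be seen in a small simulation. The Python sketch below is not the authors' analysis: it fits a linear mixed model to one simulated CRT with an individual-level moderator (gender) and a cluster-level moderator (school type), and the standard error of the cluster-level interaction comes out noticeably larger. The effect sizes, intraclass correlation, and cluster counts are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated CRT: 40 schools (clusters) of 30 students, treatment assigned to schools.
J, n = 40, 30
trt = np.repeat(rng.permutation(np.repeat([0, 1], J // 2)), n)
cluster = np.repeat(np.arange(J), n)
school_type = np.repeat(rng.integers(0, 2, J), n)    # cluster-level moderator
gender = rng.integers(0, 2, J * n)                   # individual-level moderator
u = np.repeat(rng.normal(0, np.sqrt(0.15), J), n)    # random cluster effects (ICC ~ 0.15)
y = (0.2 * trt + 0.2 * trt * gender + 0.2 * trt * school_type
     + u + rng.normal(0, np.sqrt(0.85), J * n))

df = pd.DataFrame({"y": y, "trt": trt, "cluster": cluster,
                   "school_type": school_type, "gender": gender})
fit = sm.MixedLM.from_formula("y ~ trt * gender + trt * school_type",
                              groups="cluster", data=df).fit()
print(fit.summary())   # compare the SE of trt:school_type with that of trt:gender
```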


2019 ◽  
Vol 3 (1) ◽  
pp. 81-93 ◽  
Author(s):  
Blakeley B. McShane ◽  
Ulf Böckenholt

Meta-analysis typically involves the analysis of summary data (e.g., means, standard deviations, and sample sizes) from a set of studies via a statistical model that is a special case of a hierarchical (or multilevel) model. Unfortunately, the common summary-data approach to meta-analysis used in psychological research is often employed in settings where the complexity of the data warrants alternative approaches. In this article, we propose a thought experiment that can lead meta-analysts to move away from the common summary-data approach to meta-analysis and toward richer and more appropriate summary-data approaches when the complexity of the data warrants it. Specifically, we propose that it can be extremely fruitful for meta-analysts to act as if they possess the individual-level data from the studies and consider what model specifications they might fit even when they possess only summary data. This thought experiment is justified because (a) the analysis of the individual-level data from the studies via a hierarchical model is considered the “gold standard” for meta-analysis and (b) for a wide variety of cases common in meta-analysis, the summary-data and individual-level-data approaches are, by a principle known as statistical sufficiency, equivalent when the underlying models are appropriately specified. We illustrate the value of our thought experiment via a case study that evolves across five parts that cover a wide variety of data settings common in meta-analysis.
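
The sufficiency argument can be made concrete in the simplest setting the article starts from: a common mean difference across two-arm studies with roughly equal variances. In the Python sketch below, which is an illustration rather than a general recipe, an inverse-variance pooled estimate computed from per-study summary statistics is compared with a regression fitted to the simulated individual-level data with study fixed effects; in this balanced, common-variance setting the two agree closely. The number of studies, their sizes, and the true effect are assumptions, and the richer settings discussed in the article call for correspondingly richer model specifications.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Simulate individual-level data for 8 two-arm studies with a common effect of 0.3.
studies = []
for k in range(8):
    n = rng.integers(40, 120)
    for arm in (0, 1):
        studies.append(pd.DataFrame({"y": rng.normal(0.3 * arm, 1.0, n),
                                     "arm": arm, "study": k}))
df = pd.concat(studies, ignore_index=True)

# Summary-data approach: per-study mean differences pooled by inverse variance.
rows = df.groupby(["study", "arm"])["y"].agg(["mean", "var", "count"]).unstack("arm")
d = rows["mean"][1] - rows["mean"][0]
v = rows["var"][1] / rows["count"][1] + rows["var"][0] / rows["count"][0]
w = 1 / v
pooled = (w * d).sum() / w.sum()

# Individual-level-data approach: regression with study fixed effects.
ols = sm.OLS.from_formula("y ~ arm + C(study)", data=df).fit()

print(f"summary-data pooled estimate:   {pooled:.3f} (SE {np.sqrt(1 / w.sum()):.3f})")
print(f"individual-level-data estimate: {ols.params['arm']:.3f} (SE {ols.bse['arm']:.3f})")
```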


Author(s):  
Stephanie Hazel ◽  
Angela Detlev

Faculty often ask what they can learn about their students before the semester begins so that they can plan instruction that will better engage students in learning. While most individual-level data are protected, Mason makes available a great deal of information through student-level data, statistical profiles, the Common Data Set, student surveys, and learning outcomes assessment reports. We will guide participants through a brief case study and discuss the implications, limitations, and inferences that can be reasonably drawn from institutional data.

