Power and sample size calculations for cluster randomized trials with binary outcomes when intracluster correlation coefficients vary by treatment arm

Background/Aims Generalized estimating equations are commonly used to fit logistic regression models to clustered binary data from cluster randomized trials. A commonly used correlation structure assumes that the intracluster correlation coefficient does not vary by treatment arm or other covariates, but the consequences of this assumption are understudied. We aim to evaluate the effect of allowing variation of the intracluster correlation coefficient by treatment or other covariates on the efficiency of analysis and show how to account for such variation in sample size calculations. Methods We develop formulae for the asymptotic variance of the estimated difference in outcome between treatment arms obtained when the true exchangeable correlation structure depends on the treatment arm and the working correlation structure used in the generalized estimating equations analysis is: (i) correctly specified, (ii) independent, or (iii) exchangeable with no dependence on treatment arm. These formulae require a known distribution of cluster sizes; we also develop simplifications for the case when cluster sizes do not vary and approximations that can be used when the first two moments of the cluster size distribution are known. We then extend the results to settings with adjustment for a second binary cluster-level covariate. We provide formulae to calculate the required sample size for cluster randomized trials using these variances. Results We show that the asymptotic variance of the estimated difference in outcome between treatment arms using these three working correlation structures is the same if all clusters have the same size, and this asymptotic variance is approximately the same when intracluster correlation coefficient values are small. 
We illustrate these results using data from a recent cluster randomized trial for infectious disease prevention in which the clusters are groups of households and modest in size (mean 9.6 individuals), with intracluster correlation coefficient values of 0.078 in the control arm and 0.057 in an intervention arm. In this application, we found a negligible difference between the variances calculated using structures (i) and (iii) and only a small increase (typically [Formula: see text]) for the independent correlation structure (ii), and hence minimal effect on power or sample size requirements. The impact may be larger in other applications if there is greater variation in the ICC between treatment arms or with an additional covariate. Conclusion The common approach of fitting generalized estimating equations with an exchangeable working correlation structure with a common intracluster correlation coefficient across arms likely does not substantially reduce the power or efficiency of the analysis in the setting of a large number of small or modest-sized clusters, even if the intracluster correlation coefficient varies by treatment arm. Our formulae, however, allow formal evaluation of this and may identify situations in which variation in intracluster correlation coefficient by treatment arm or another binary covariate may have a more substantial impact on power and hence sample size requirements.
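As a rough illustration of how arm-specific intracluster correlation coefficients enter a sample size calculation, the sketch below uses the textbook design-effect approximation for a difference in proportions with equal cluster sizes. It is not the authors' generalized estimating equations formulae. The function names and the outcome proportions (0.20 and 0.10) are hypothetical; only the ICC values (0.078 and 0.057) and the mean cluster size (9.6, treated here as a fixed size, which is an assumption) come from the abstract.

```python
import math
from statistics import NormalDist


def design_effect(m, icc):
    """Standard variance inflation factor 1 + (m - 1) * icc for clusters of size m."""
    return 1.0 + (m - 1.0) * icc


def clusters_per_arm(p0, p1, icc0, icc1, m, alpha=0.05, power=0.80):
    """Approximate clusters per arm to detect p1 - p0 (two-sided test).

    Assumes equal cluster sizes, equal allocation, and a normal
    approximation on the risk-difference scale; each arm gets its own ICC.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Per-cluster variance contribution of each arm, inflated by its own design effect.
    var0 = p0 * (1 - p0) * design_effect(m, icc0) / m
    var1 = p1 * (1 - p1) * design_effect(m, icc1) / m
    return math.ceil(z ** 2 * (var0 + var1) / (p1 - p0) ** 2)


if __name__ == "__main__":
    # Hypothetical proportions; ICCs and cluster size taken from the abstract.
    arm_specific = clusters_per_arm(0.20, 0.10, 0.078, 0.057, 9.6)
    common_high = clusters_per_arm(0.20, 0.10, 0.078, 0.078, 9.6)
    print("arm-specific ICCs:", arm_specific)
    print("common ICC 0.078: ", common_high)
```

Comparing the arm-specific result with the result under a single common ICC gives a quick sense of how much the design is affected by letting the ICC differ between arms.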

Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research

Clinical Trials ◽

10.1191/1740774505cn071oa ◽

2005 ◽

Vol 2 (2) ◽

pp. 99-107 ◽

Cited By ~ 137

Author(s):

Marion K Campbell ◽

Peter M Fayers ◽

Jeremy M Grimshaw

Keyword(s):

Correlation Coefficient ◽

Implementation Research ◽

Randomized Trials ◽

Cluster Randomized Trials ◽

Intracluster Correlation ◽

Intracluster Correlation Coefficient ◽

Cluster Randomized

Prior distributions for the intracluster correlation coefficient, based on multiple previous estimates, and their application in cluster randomized trials

Clinical Trials ◽

10.1191/1740774505cn072oa ◽

2005 ◽

Vol 2 (2) ◽

pp. 108-118 ◽

Cited By ~ 36

Author(s):

Rebecca M Turner ◽

Simon G Thompson ◽

David J Spiegelhalter

Keyword(s):

Correlation Coefficient ◽

Randomized Trials ◽

Cluster Randomized Trials ◽

Intracluster Correlation ◽

Prior Distributions ◽

Intracluster Correlation Coefficient ◽

Cluster Randomized

Effect of imbalance and intracluster correlation coefficient in cluster randomized trials with binary outcomes

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2008.09.007 ◽

2009 ◽

Vol 53 (3) ◽

pp. 596-602 ◽

Cited By ~ 4

Author(s):

Chul Ahn ◽

Fan Hu ◽

Celette Sugg Skinner

Keyword(s):

Correlation Coefficient ◽

Randomized Trials ◽

Binary Outcomes ◽

Cluster Randomized Trials ◽

Intracluster Correlation ◽

Intracluster Correlation Coefficient ◽

Cluster Randomized

Allowing for imprecision of the intracluster correlation coefficient in the design of cluster randomized trials

Statistics in Medicine ◽

10.1002/sim.1721 ◽

2004 ◽

Vol 23 (8) ◽

pp. 1195-1214 ◽

Cited By ~ 38

Author(s):

Rebecca M. Turner ◽

A. Toby Prevost ◽

Simon G. Thompson

Keyword(s):

Correlation Coefficient ◽

Randomized Trials ◽

Cluster Randomized Trials ◽

Intracluster Correlation ◽

Intracluster Correlation Coefficient ◽

Cluster Randomized

Constructing intervals for the intracluster correlation coefficient using Bayesian modelling, and application in cluster randomized trials

Statistics in Medicine ◽

10.1002/sim.2304 ◽

2006 ◽

Vol 25 (9) ◽

pp. 1443-1456 ◽

Cited By ~ 17

Author(s):

Rebecca M. Turner ◽

Rumana Z. Omar ◽

Simon G. Thompson

Keyword(s):

Correlation Coefficient ◽

Randomized Trials ◽

Bayesian Modelling ◽

Cluster Randomized Trials ◽

Intracluster Correlation ◽

Intracluster Correlation Coefficient ◽

Cluster Randomized

Sample size estimation for modified Poisson analysis of cluster randomized trials with a binary outcome

Statistical Methods in Medical Research ◽

10.1177/0962280221990415 ◽

2021 ◽

pp. 096228022199041

Author(s):

Fan Li ◽

Guangyu Tong

Keyword(s):

Relative Risk ◽

Sample Size ◽

Poisson Regression ◽

Cluster Size ◽

Randomized Trials ◽

Cluster Randomized Trials ◽

Binomial Regression ◽

Size Variability ◽

Working Correlation ◽

Cluster Randomized

The modified Poisson regression coupled with a robust sandwich variance has become a viable alternative to log-binomial regression for estimating the marginal relative risk in cluster randomized trials. However, a corresponding sample size formula for relative risk regression via the modified Poisson model is currently not available for cluster randomized trials. Through analytical derivations, we show that there is no loss of asymptotic efficiency for estimating the marginal relative risk via the modified Poisson regression relative to the log-binomial regression. This finding holds both under the independence working correlation and under the exchangeable working correlation provided a simple modification is used to obtain the consistent intraclass correlation coefficient estimate. Therefore, the sample size formulas developed for log-binomial regression naturally apply to the modified Poisson regression in cluster randomized trials. We further extend the sample size formulas to accommodate variable cluster sizes. An extensive Monte Carlo simulation study is carried out to validate the proposed formulas. We find that the proposed formulas have satisfactory performance across a range of cluster size variability, as long as suitable finite-sample corrections are applied to the sandwich variance estimator and the number of clusters is at least 10. Our findings also suggest that the sample size estimate under the exchangeable working correlation is more robust to cluster size variability, and recommend the use of an exchangeable working correlation over an independence working correlation for both design and analysis. The proposed sample size formulas are illustrated using the Stop Colorectal Cancer (STOP CRC) trial.
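To make the relative-risk setting concrete, the sketch below computes an approximate number of clusters per arm from the delta-method variance of the log relative risk, inflated by the usual design effect. This is a generic textbook approximation under equal cluster sizes and equal allocation, not the formula derived in the paper, and it applies no finite-sample correction. The function name and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def clusters_per_arm_log_rr(p0, p1, icc, m, alpha=0.05, power=0.80):
    """Approximate clusters per arm for a two-sided test of log(p1 / p0) = 0.

    Assumes equal cluster size m, a common ICC, and equal allocation.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    deff = 1.0 + (m - 1.0) * icc
    # Delta method: var(log p_hat) is approximately (1 - p) / (n * p) for n individuals.
    unit_var = deff / m * ((1 - p0) / p0 + (1 - p1) / p1)
    return math.ceil(z ** 2 * unit_var / math.log(p1 / p0) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: control risk 0.30, intervention risk 0.20,
    # ICC 0.05, 20 individuals per cluster.
    print(clusters_per_arm_log_rr(0.30, 0.20, icc=0.05, m=20))
```

The result is best read as a starting point for planning; the abstract notes that satisfactory performance in its simulations required finite-sample corrections to the sandwich variance and at least 10 clusters.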

A Review of Assumed and Reported Intracluster Correlations in Cluster Randomized Trials

10.21203/rs.2.12500/v1 ◽

2019 ◽

Author(s):

Xiaoran Han ◽

Jiaye Lin ◽

Jinjing Xu ◽

Maggie Wang ◽

Benny Zee ◽

...

Keyword(s):

Sample Size ◽

Randomized Trials ◽

Intraclass Correlation ◽

Cluster Effect ◽

Size Estimation ◽

Cluster Randomized Trials ◽

Intracluster Correlation ◽

Eligibility Criteria ◽

Sample Size Planning ◽

Cluster Randomized

Abstract Background Cluster randomized trials (CRTs) are widely adopted in health and primary care research. However, the cluster effect needs to be taken into account appropriately in the design and analysis of CRTs. The objectives of this study were (i) to review the reporting of intracluster correlations in CRTs; and (ii) to evaluate whether the assumed intracluster correlation measures in sample size planning are consistent with those obtained in the analysis. Methods The Aggregate Analysis of ClinicalTrials.gov database was searched to identify CRTs registered between January 1, 2004 and March 27, 2016. The selected CRTs with accessible publications were screened according to eligibility criteria. Results Of the 281 CRTs identified, the percentage of studies accounting for cluster effect increased annually. A total of 183 studies accounted for clustering in sample size estimation, among them 43% of CRTs adopted the intraclass correlation coefficient (ICC) but the exact estimated value of ICC was provided in only 26% of the included studies. In different intervention types, there were no statistically significant differences between the assumed and reported values of ICC (all p-values >0.05). Conclusion Although the difference between the values of ICC assumed in sample size planning and that reported in the analysis was not statistically significant, deficiencies in CRTs are still common, such as low rates of considering cluster effect in sample size and reporting intracluster correlation estimates. We also suggest that researchers ought to be familiar with the properties of statistical approaches to improve the analysis of CRTs. Thus, more recommendations and guidelines such as the CONSORT statement for CRTs should be suggested to researchers.

Intracluster correlation coefficients in the Greater Mekong Subregion for sample size calculations of cluster randomized malaria trials

Malaria Journal ◽

10.1186/s12936-019-3062-x ◽

2019 ◽

Vol 18 (1) ◽

Cited By ~ 1

Author(s):

Pimnara Peerawaranun ◽

Jordi Landier ◽

Francois H. Nosten ◽

Thuy-Nhien Nguyen ◽

Tran Tinh Hien ◽

...

Keyword(s):

Randomized Trial ◽

Sample Size ◽

Randomized Trials ◽

Cluster Randomized Trial ◽

Cluster Randomized Trials ◽

Intracluster Correlation ◽

Variable Approach ◽

Sample Size Calculations ◽

Cluster Randomized ◽

Greater Mekong Subregion

Abstract Background Sample size calculations for cluster randomized trials are a recognized methodological challenge for malaria research in pre-elimination settings. Positively correlated responses from the participants in the same cluster are a key feature in the estimated sample size required for a cluster randomized trial. The degree of correlation is measured by the intracluster correlation coefficient (ICC) where a higher coefficient suggests a closer correlation hence less heterogeneity within clusters but more heterogeneity between clusters. Methods Data on uPCR-detected Plasmodium falciparum and Plasmodium vivax infections from a recent cluster randomized trial which aimed at interrupting malaria transmission through mass drug administrations were used to calculate the ICCs for prevalence and incidence of Plasmodium infections. The trial was conducted in four countries in the Greater Mekong Subregion, Laos, Myanmar, Vietnam and Cambodia. Exact and simulation approaches were used to estimate ICC values for both the prevalence and the incidence of parasitaemia. In addition, the latent variable approach to estimate ICCs for the prevalence was utilized. Results The ICCs for prevalence ranged between 0.001 and 0.082 for all countries. The ICC from the combined 16 villages in the Greater Mekong Subregion were 0.26 and 0.21 for P. falciparum and P. vivax respectively. The ICCs for incidence of parasitaemia ranged between 0.002 and 0.075 for Myanmar, Cambodia and Vietnam. There were very high ICCs for incidence in the range of 0.701 to 0.806 in Laos during follow-up. Conclusion ICC estimates can help researchers when designing malaria cluster randomized trials. A high variability in ICCs and hence sample size requirements between study sites was observed. Realistic sample size estimates for cluster randomized malaria trials in the Greater Mekong Subregion have to assume high between cluster heterogeneity and ICCs. 
This work focused on uPCR-detected infections; there remains a need to develop more ICC references for trials designed around prevalence and incidence of clinical outcomes. Adequately powered trials are critical to estimate the benefit of interventions to malaria in a reliable and reproducible fashion. Trial registration: ClinicalTrials.gov NCT01872702. Registered 7 June 2013. Retrospectively registered. https://clinicaltrials.gov/ct2/show/NCT01872702
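The practical weight of the ICC range reported above can be seen through the standard design effect, 1 + (m - 1) × ICC, which multiplies the sample size needed under individual randomization. The sketch below is illustrative only: the ICC values are those quoted in the abstract, while the cluster size of 50 and the function name are hypothetical.

```python
def design_effect(m, icc):
    """Variance inflation factor for equal clusters of size m with the given ICC."""
    return 1.0 + (m - 1.0) * icc


if __name__ == "__main__":
    cluster_size = 50  # hypothetical number of participants per cluster
    # ICC values quoted in the abstract: low and high prevalence ICCs,
    # the combined P. falciparum estimate, and the highest incidence ICC in Laos.
    for icc in (0.001, 0.082, 0.26, 0.806):
        print(f"ICC = {icc:<5}  design effect = {design_effect(cluster_size, icc):.2f}")
```

Even at a fixed cluster size, moving from the lowest to the highest reported ICC changes the inflation factor by more than an order of magnitude, which is why site-specific ICC estimates matter for planning.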

Impact of unequal cluster sizes for GEE analyses of stepped wedge cluster randomized trials with binary outcomes

10.1101/2021.04.07.21255090 ◽

2021 ◽

Author(s):

Zibo Tian ◽

John Preisser ◽

Denise Esserman ◽

Elizabeth Turner ◽

Paul Rathouz ◽

...

Keyword(s):

Sample Size ◽

Randomized Trials ◽

Washington State ◽

Binary Outcomes ◽

Cluster Randomized Trials ◽

Stepped Wedge ◽

Study Planning ◽

Working Correlation ◽

Cluster Randomized ◽

Unequal Cluster

The stepped wedge design is a type of unidirectional crossover design where cluster units switch from control to intervention condition at different pre-specified time points. While a convention in study planning is to assume the cluster-period sizes are identical, stepped wedge cluster randomized trials (SW-CRTs) involving repeated cross-sectional designs frequently have unequal cluster-period sizes, which can impact the efficiency of the treatment effect estimator. In this article, we provide a comprehensive investigation of the efficiency impact of unequal cluster sizes for generalized estimating equation analyses of SW-CRTs, with a focus on binary outcomes as in the Washington State Expedited Partner Therapy trial. Several major distinctions between our work and existing work include: (i) we consider multilevel correlation structures in marginal models with binary outcomes; (ii) we study the implications of both the between-cluster and within-cluster imbalances in sizes; and (iii) we provide a comparison between the independence working correlation versus the true working correlation and detail the consequences of ignoring correlation estimation in SW-CRTs with unequal cluster sizes. We conclude that the working independence assumption can lead to substantial efficiency loss and a large sample size regardless of cluster-period size variability in SW-CRTs, and recommend accounting for correlations in the analysis. To improve study planning, we additionally provide a computationally efficient search algorithm to estimate the sample size in SW-CRTs accounting for unequal cluster-period sizes, and conclude by illustrating the proposed approach in the context of the Washington State study.
