Constructing intervals for the intracluster correlation coefficient using Bayesian modelling, and application in cluster randomized trials

Background/Aims Generalized estimating equations are commonly used to fit logistic regression models to clustered binary data from cluster randomized trials. A commonly used correlation structure assumes that the intracluster correlation coefficient does not vary by treatment arm or other covariates, but the consequences of this assumption are understudied. We aim to evaluate the effect of allowing variation of the intracluster correlation coefficient by treatment or other covariates on the efficiency of analysis and show how to account for such variation in sample size calculations.

Methods We develop formulae for the asymptotic variance of the estimated difference in outcome between treatment arms obtained when the true exchangeable correlation structure depends on the treatment arm and the working correlation structure used in the generalized estimating equations analysis is: (i) correctly specified, (ii) independent, or (iii) exchangeable with no dependence on treatment arm. These formulae require a known distribution of cluster sizes; we also develop simplifications for the case when cluster sizes do not vary and approximations that can be used when the first two moments of the cluster size distribution are known. We then extend the results to settings with adjustment for a second binary cluster-level covariate. We provide formulae to calculate the required sample size for cluster randomized trials using these variances.

Results We show that the asymptotic variance of the estimated difference in outcome between treatment arms using these three working correlation structures is the same if all clusters have the same size, and this asymptotic variance is approximately the same when intracluster correlation coefficient values are small. We illustrate these results using data from a recent cluster randomized trial for infectious disease prevention in which the clusters are groups of households and modest in size (mean 9.6 individuals), with intracluster correlation coefficient values of 0.078 in the control arm and 0.057 in an intervention arm. In this application, we found a negligible difference between the variances calculated using structures (i) and (iii) and only a small increase (typically [Formula: see text]) for the independent correlation structure (ii), and hence minimal effect on power or sample size requirements. The impact may be larger in other applications if there is greater variation in the ICC between treatment arms or with an additional covariate.

Conclusion The common approach of fitting generalized estimating equations with an exchangeable working correlation structure with a common intracluster correlation coefficient across arms likely does not substantially reduce the power or efficiency of the analysis in the setting of a large number of small or modest-sized clusters, even if the intracluster correlation coefficient varies by treatment arm. Our formulae, however, allow formal evaluation of this and may identify situations in which variation in intracluster correlation coefficient by treatment arm or another binary covariate may have a more substantial impact on power and hence sample size requirements.
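
To make the role of an arm-specific intracluster correlation coefficient concrete, the following minimal sketch applies the textbook design effect 1 + (m - 1)ρ to each arm separately and sums the two inflated binomial variances. It is not the article's asymptotic variance formula: it assumes equal cluster sizes, works on the risk-difference scale rather than the log-odds scale, and the function names, outcome proportions and number of clusters are hypothetical. Only the ICC values (0.078, 0.057) and the mean cluster size (9.6) come from the abstract.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Standard variance inflation factor 1 + (m - 1) * rho for equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def var_risk_difference(p0: float, p1: float, icc0: float, icc1: float,
                        cluster_size: float, clusters_per_arm: int) -> float:
    """Approximate variance of (p1_hat - p0_hat) with a separate ICC in each arm.

    Assumes equal cluster sizes and the same number of clusters in both arms.
    """
    n_per_arm = cluster_size * clusters_per_arm
    v0 = p0 * (1.0 - p0) / n_per_arm * design_effect(cluster_size, icc0)
    v1 = p1 * (1.0 - p1) / n_per_arm * design_effect(cluster_size, icc1)
    return v0 + v1


if __name__ == "__main__":
    # ICCs and mean cluster size are taken from the abstract; the outcome
    # proportions and the number of clusters per arm are hypothetical.
    p0, p1, m, k = 0.30, 0.20, 9.6, 100
    arm_specific = var_risk_difference(p0, p1, 0.078, 0.057, m, k)
    pooled_icc = (0.078 + 0.057) / 2.0  # crude common-ICC assumption
    common = var_risk_difference(p0, p1, pooled_icc, pooled_icc, m, k)
    print(f"arm-specific ICC variance: {arm_specific:.6e}")
    print(f"common ICC variance:       {common:.6e}")
    print(f"ratio:                     {arm_specific / common:.4f}")
```

Comparing the two variances for a planned design gives a quick, informal sense of how much a common-ICC assumption matters before turning to the formal results.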

A comparison of confidence interval methods for the intraclass correlation coefficient in cluster randomized trials

Statistics in Medicine ◽

10.1002/sim.1330 ◽

2002 ◽

Vol 21 (24) ◽

pp. 3757-3774 ◽

Cited By ~ 61

Author(s):

Obioha C. Ukoumunne

Keyword(s):

Confidence Interval ◽

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Randomized Trials ◽

Intraclass Correlation ◽

Interval Methods ◽

Cluster Randomized Trials ◽

Cluster Randomized

A readily available improvement over method of moments for intra-cluster correlation estimation in the context of cluster randomized trials and fitting a GEE–type marginal model for binary outcomes

Clinical Trials ◽

10.1177/1740774518803635 ◽

2018 ◽

Vol 16 (1) ◽

pp. 41-51 ◽

Cited By ~ 1

Author(s):

Philip M Westgate

Keyword(s):

Correlation Coefficient ◽

Randomized Trials ◽

Estimating Equation ◽

Generalized Estimating Equation ◽

Cluster Randomized Trials ◽

Coefficient Estimation ◽

Pseudo Likelihood ◽

Cluster Randomized ◽

Cluster Correlation ◽

Generalized Estimating

Background/aims Cluster randomized trials are popular in health-related research due to the need or desire to randomize clusters of subjects to different trial arms as opposed to randomizing each subject individually. As outcomes from subjects within the same cluster tend to be more alike than outcomes from subjects within other clusters, an exchangeable correlation arises that is measured via the intra-cluster correlation coefficient. Intra-cluster correlation coefficient estimation is especially important due to the increasing awareness of the need to publish such values from studies in order to help guide the design of future cluster randomized trials. Therefore, numerous methods have been proposed to accurately estimate the intra-cluster correlation coefficient, with much attention given to binary outcomes. As marginal models are often of interest, we focus on intra-cluster correlation coefficient estimation in the context of fitting such a model with binary outcomes using generalized estimating equations. Traditionally, intra-cluster correlation coefficient estimation with generalized estimating equations has been based on the method of moments, although such estimators can be negatively biased. Furthermore, alternative estimators that work well, such as the analysis of variance estimator, are not as readily applicable in the context of practical data analyses with generalized estimating equations. Therefore, in this article we assess, in terms of bias, the readily available residual pseudo-likelihood approach to intra-cluster correlation coefficient estimation with the GLIMMIX procedure of SAS (SAS Institute, Cary, NC). Furthermore, we study a possible corresponding approach to confidence interval construction for the intra-cluster correlation coefficient.

Methods We utilize a simulation study and application example to assess bias in intra-cluster correlation coefficient estimates obtained from GLIMMIX using residual pseudo-likelihood. This estimator is contrasted with method of moments and analysis of variance estimators, which are standards of comparison. The approach to confidence interval construction is assessed by examining coverage probabilities.

Results Overall, the residual pseudo-likelihood estimator performs very well. It has considerably less bias than moment estimators, which are its competitors for general generalized estimating equation–based analyses, and therefore, it is a major improvement in practice. Furthermore, it works almost as well as analysis of variance estimators when they are applicable. Confidence intervals have near-nominal coverage when the intra-cluster correlation coefficient estimate has negligible bias.

Conclusion Our results show that the residual pseudo-likelihood estimator is a good option for intra-cluster correlation coefficient estimation when conducting a generalized estimating equation–based analysis of binary outcome data arising from cluster randomized trials. The estimator is practical in that it is simply a result from fitting a marginal model with GLIMMIX, and a confidence interval can be easily obtained. An additional advantage is that, unlike most other options for performing generalized estimating equation–based analyses, GLIMMIX provides analysts the option to utilize small-sample adjustments that ensure valid inference.
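
For readers unfamiliar with the analysis of variance estimator used above as a standard of comparison, here is a minimal sketch of the usual one-way ANOVA estimator of the intra-cluster correlation coefficient for unequal cluster sizes. It is an illustration only, not the article's code and not the residual pseudo-likelihood approach; the function name `anova_icc` and the toy data are hypothetical.

```python
from typing import Sequence


def anova_icc(clusters: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA estimator of the ICC.

    `clusters` holds one sequence of outcomes (e.g. 0/1 values) per cluster.
    The estimate is (MSB - MSW) / (MSB + (n0 - 1) * MSW), where n0 is the
    adjusted mean cluster size; it can be negative.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    total = sum(sizes)
    if k < 2 or min(sizes) < 1 or total <= k:
        raise ValueError("need at least two non-empty clusters and total > k")

    grand_mean = sum(sum(c) for c in clusters) / total
    means = [sum(c) / n for c, n in zip(clusters, sizes)]

    # Between- and within-cluster sums of squares.
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)

    msb = ssb / (k - 1)
    msw = ssw / (total - k)
    n0 = (total - sum(n * n for n in sizes) / total) / (k - 1)

    denom = msb + (n0 - 1.0) * msw
    if denom == 0:
        raise ValueError("outcome does not vary; ICC is undefined")
    return (msb - msw) / denom


if __name__ == "__main__":
    # Hypothetical binary outcomes from four clusters of unequal size.
    data = [[1, 1, 0, 1], [0, 0, 0], [1, 0, 1, 1, 1], [0, 1, 0, 0]]
    print(f"ANOVA ICC estimate: {anova_icc(data):.3f}")
```

Because the estimator is not constrained to be non-negative, analysts sometimes truncate negative estimates at zero; whether to do so is a reporting choice that this sketch leaves open.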

The Intra-Cluster Correlation Coefficient in Cluster Randomized Trials: A Review of Definitions

International Statistical Review ◽

10.1111/j.1751-5823.2009.00092.x ◽

2009 ◽

Vol 77 (3) ◽

pp. 378-394 ◽

Cited By ~ 79

Author(s):

Sandra M. Eldridge ◽

Obioha C. Ukoumunne ◽

John B. Carlin

Keyword(s):

Correlation Coefficient ◽

Randomized Trials ◽

Cluster Randomized Trials ◽

Cluster Randomized ◽

Cluster Correlation

Estimates of intraclass correlation coefficient and design effect for surveys and cluster randomized trials on injection use in Pakistan and developing countries

Tropical Medicine & International Health ◽

10.1111/j.1365-3156.2006.01736.x ◽

2006 ◽

Vol 11 (12) ◽

pp. 1832-1840 ◽

Cited By ~ 9

Author(s):

Naveed Zafar Janjua ◽

Mohammad Imran Khan ◽

John D. Clemens

Keyword(s):

Developing Countries ◽

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Randomized Trials ◽

Intraclass Correlation ◽

Cluster Randomized Trials ◽

Design Effect ◽

Injection Use ◽

Cluster Randomized

A Review of Assumed and Reported Intracluster Correlations in Cluster Randomized Trials

10.21203/rs.2.12500/v1 ◽

2019 ◽

Author(s):

Xiaoran Han ◽

Jiaye Lin ◽

Jinjing Xu ◽

Maggie Wang ◽

Benny Zee ◽

...

Keyword(s):

Sample Size ◽

Randomized Trials ◽

Intraclass Correlation ◽

Cluster Effect ◽

Size Estimation ◽

Cluster Randomized Trials ◽

Intracluster Correlation ◽

Eligibility Criteria ◽

Sample Size Planning ◽

Cluster Randomized

Abstract

Background Cluster randomized trials (CRTs) are widely adopted in health and primary care research. However, the cluster effect needs to be taken into account appropriately in the design and analysis of CRTs. The objectives of this study were (i) to review the reporting of intracluster correlations in CRTs; and (ii) to evaluate whether the assumed intracluster correlation measures in sample size planning are consistent with those obtained in the analysis.

Methods The Aggregate Analysis of ClinicalTrials.gov database was searched to identify CRTs registered between January 1, 2004 and March 27, 2016. The selected CRTs with accessible publications were screened according to eligibility criteria.

Results Of the 281 CRTs identified, the percentage of studies accounting for cluster effect increased annually. A total of 183 studies accounted for clustering in sample size estimation; of these, 43% adopted the intraclass correlation coefficient (ICC), but the exact estimated value of the ICC was provided in only 26% of the included studies. Across intervention types, there were no statistically significant differences between the assumed and reported values of the ICC (all p-values >0.05).

Conclusion Although the difference between the ICC values assumed in sample size planning and those reported in the analysis was not statistically significant, deficiencies in CRTs are still common, such as low rates of considering cluster effect in sample size and reporting intracluster correlation estimates. We also suggest that researchers ought to be familiar with the properties of statistical approaches to improve the analysis of CRTs. Thus, more recommendations and guidelines such as the CONSORT statement for CRTs should be suggested to researchers.
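
Why the assumed ICC matters in sample size planning can be illustrated with a short sketch. It uses the standard normal-approximation sample size for comparing two proportions, inflated by the design effect 1 + (m - 1)ρ, to show how the number of clusters per arm responds to the assumed ICC. This is a generic textbook calculation under an equal-cluster-size assumption, not a method from the review; the function name and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def clusters_per_arm(p0: float, p1: float, icc: float, cluster_size: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Clusters needed per arm to detect p0 vs p1 with a two-sided test.

    Individually randomized sample size per arm, multiplied by the design
    effect 1 + (m - 1) * icc and divided by the (equal) cluster size m.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    n_individual = z ** 2 * (p0 * (1.0 - p0) + p1 * (1.0 - p1)) / (p0 - p1) ** 2
    deff = 1.0 + (cluster_size - 1.0) * icc
    return math.ceil(n_individual * deff / cluster_size)


if __name__ == "__main__":
    # Hypothetical planning scenario: 30% vs 20% event rate, 20 subjects per cluster.
    for assumed_icc in (0.01, 0.05, 0.10):
        k = clusters_per_arm(0.30, 0.20, assumed_icc, cluster_size=20)
        print(f"assumed ICC {assumed_icc:.2f}: {k} clusters per arm")
```

Running the loop for a planned trial makes visible how far an optimistic ICC assumption can understate the number of clusters required, which is one reason reporting the assumed and estimated values is useful.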
