Methods for dealing with unequal cluster sizes in cluster randomized trials: A scoping review

In a cluster-randomized trial (CRT), the number of participants enrolled often varies across clusters. This variation should be considered during both trial design and data analysis to ensure statistical performance goals are achieved. Most methodological literature on the CRT design has assumed equal cluster sizes. This scoping review focuses on methodology for unequal cluster size CRTs. EMBASE, Medline, Google Scholar, MathSciNet and Web of Science databases were searched to identify English-language articles reporting on methodology for unequal cluster size CRTs published until March 2021. We extracted data on the focus of the paper (power calculation, Type I error etc.), the type of CRT, the type and the range of parameter values investigated (number of clusters, mean cluster size, cluster size coefficient of variation, intra-cluster correlation coefficient, etc.), and the main conclusions. Seventy-nine of 5032 identified papers met the inclusion criteria. Papers primarily focused on the parallel-arm CRT (p-CRT, n = 60, 76%) and the stepped-wedge CRT (n = 14, 18%). Roughly 75% of the papers addressed trial design issues (sample size/power calculation) while 25% focused on analysis considerations (Type I error, bias, etc.). The ranges of parameter values explored varied substantially across different studies. Methods for accounting for unequal cluster sizes in the p-CRT have been investigated extensively for Gaussian and binary outcomes. Synthesizing the findings of these works is difficult as the magnitude of impact of the unequal cluster sizes varies substantially across the combinations and ranges of input parameters. Limited investigations have been done for other combinations of a CRT design by outcome type, particularly methodology involving binary outcomes—the most commonly used type of primary outcome in trials. The paucity of methodological papers outside of the p-CRT with Gaussian or binary outcomes highlights the need for further methodological development to fill the gaps.

Download Full-text

On Small-Sample Inference in Multiphase Stepped Wedge Cluster Randomized Trial (MSW-CRT) With Binary Outcomes

10.21203/rs.3.rs-993153/v1 ◽

2021 ◽

Author(s):

Jing Peng ◽

Abigail Shoben ◽

Pengyue Zhang ◽

Philip M. Westgate ◽

Soledad Fernandez

Keyword(s):

Randomized Trial ◽

Type I Error ◽

Event Rate ◽

Cluster Randomized Trial ◽

Small Sample ◽

Binary Outcomes ◽

Type I ◽

Stepped Wedge ◽

Number Of Clusters ◽

Cluster Randomized

Abstract BackgroundThe stepped wedge cluster randomized trial (SW-CRT) design is now preferred for many health- related trials because of its flexibility on resource allocation and clinical ethics concerns. However, as a necessary extension of studying multiple interventions, multiphase stepped wedge designs (MSW-CRT) have not been studied adequately. Since estimated intervention effect from Generalized estimating equations (GEE) has a population-average interpretation, valid inference methods for binary outcomes based on GEE are preferred by public health policy makers.MethodsWe form hypothesis testing of add-on effect of a second treatment based on GEE analysis in an MSW-CRT design with limited number of clusters. Four variance-correction estimators are used to adjust the bias of the sandwich estimator. Simulation studies have been used to compare the statistical power and type I error rate of these methods under different correlation matrices.Results We demonstrate that an average estimator with t(I-3) can stably maintain type I error close to the nominal level with limited sample sizes in our settings. We show that power of testing the add-on effect depends on the baseline event rate, effect sizes of two interventions and the number of clusters. Moreover, by changing the design with including more sequences, power benefit can be achieved. ConclusionsFor designing the MSW-CRT, we suggest using more sequences and checking event rate after initiating the first intervention via interim analysis. When the number of clusters is not very large in MSW-CRTs, inference can be conduct using GEE analysis with an average estimator with t(I-3) sampling distribution.

Download Full-text

Type I Error Control for Cluster Randomized Trials Under Varying Small Sample Structures

10.21203/rs.2.17855/v1 ◽

2019 ◽

Author(s):

Joshua Nugent ◽

Ken Kleinman

Keyword(s):

Cluster Size ◽

Error Control ◽

Type I Error ◽

Error Rates ◽

Small Sample ◽

Type I ◽

Number Of Clusters ◽

Type I Error Rates ◽

Wald Tests ◽

Cluster Randomized

Abstract Background: Linear mixed models (LMM) are a common approach to analyzing data from cluster randomized trials (CRTs). Inference on parameters can be performed via Wald tests or likelihood ratio tests (LRT), but both approaches may give incorrect Type I error rates in common finite sample settings. The impact of interactions of cluster size, number of clusters, intraclass correlation coefficient (ICC), and analysis approach on Type I error rates have not been well studied. Reviews of published CRTs find that small sample sizes are not uncommon, so the performance of different inferential approaches in these settings can guide data analysts to the best choices. Methods: Using a random-intercept LMM stucture, we use simulations to study Type I error rates with the LRT and Wald test with different degrees of freedom (DF) choices across different combinations of cluster size, number of clusters, and ICC.Results: Our simulations show that the LRT can be anti-conservative when the ICC is large and the number of clusters is small, with the effect most pronouced when the cluster size is relatively large. Wald tests with the Between-Within DF method or the Satterthwaite DF approximation maintain Type I error control at the stated level, though they are conservative when the number of clusters, the cluster size, and the ICC are small. Conclusions: Depending on the structure of the CRT, analysts should choose a hypothesis testing approach that will maintain the appropriate Type I error rate for their data. Wald tests with the Satterthwaite DF approximation work well in many circumstances, but in other cases the LRT may have Type I error rates closer to the nominal level.

Download Full-text

Type I Error Control for Cluster Randomized Trials Under Varying Small Sample Structures

10.21203/rs.2.17855/v2 ◽

2020 ◽

Author(s):

Joshua Nugent ◽

Ken Kleinman

Keyword(s):

Cluster Size ◽

Error Control ◽

Type I Error ◽

Error Rates ◽

Small Sample ◽

Type I ◽

Number Of Clusters ◽

Type I Error Rates ◽

Wald Tests ◽

Cluster Randomized

Abstract Background: Linear mixed models (LMM) are a common approach to analyzing data from cluster randomized trials (CRTs). Inference on parameters can be performed via Wald tests or likelihood ratio tests (LRT), but both approaches may give incorrect Type I error rates in common finite sample settings. The impact of different combinations of cluster size, number of clusters, intraclass correlation coefficient (ICC), and analysis approach on Type I error rates has not been well studied. Reviews of published CRTs nd that small sample sizes are not uncommon, so the performance of different inferential approaches in these settings can guide data analysts to the best choices.Methods: Using a random-intercept LMM stucture, we use simulations to study Type I error rates with the LRT and Wald test with different degrees of freedom (DF) choices across different combinations of cluster size, number of clusters, and ICC.Results: Our simulations show that the LRT can be anti-conservative when the ICC is large and the number of clusters is small, with the effect most pronounced when the cluster size is relatively large. Wald tests with the between-within DF method or the Satterthwaite DF approximation maintain Type I error control at the stated level, though they are conservative when the number of clusters, the cluster size, and the ICC are small.Conclusions: Depending on the structure of the CRT, analysts should choose a hypothesis testing approach that will maintain the appropriate Type I error rate for their data. Wald tests with the Satterthwaite DF approximation work well in many circumstances, but in other cases the LRT may have Type I error rates closer to the nominal level.

Download Full-text

Type I error control for cluster randomized trials under varying small sample structures

BMC Medical Research Methodology ◽

10.1186/s12874-021-01236-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Joshua R. Nugent ◽

Ken P. Kleinman

Keyword(s):

Cluster Size ◽

Error Control ◽

Type I Error ◽

Error Rates ◽

Small Sample ◽

Type I ◽

Number Of Clusters ◽

Type I Error Rates ◽

Wald Tests ◽

Cluster Randomized

Abstract Background Linear mixed models (LMM) are a common approach to analyzing data from cluster randomized trials (CRTs). Inference on parameters can be performed via Wald tests or likelihood ratio tests (LRT), but both approaches may give incorrect Type I error rates in common finite sample settings. The impact of different combinations of cluster size, number of clusters, intraclass correlation coefficient (ICC), and analysis approach on Type I error rates has not been well studied. Reviews of published CRTs find that small sample sizes are not uncommon, so the performance of different inferential approaches in these settings can guide data analysts to the best choices. Methods Using a random-intercept LMM stucture, we use simulations to study Type I error rates with the LRT and Wald test with different degrees of freedom (DF) choices across different combinations of cluster size, number of clusters, and ICC. Results Our simulations show that the LRT can be anti-conservative when the ICC is large and the number of clusters is small, with the effect most pronouced when the cluster size is relatively large. Wald tests with the between-within DF method or the Satterthwaite DF approximation maintain Type I error control at the stated level, though they are conservative when the number of clusters, the cluster size, and the ICC are small. Conclusions Depending on the structure of the CRT, analysts should choose a hypothesis testing approach that will maintain the appropriate Type I error rate for their data. Wald tests with the Satterthwaite DF approximation work well in many circumstances, but in other cases the LRT may have Type I error rates closer to the nominal level.

Download Full-text

Alternative models and randomization techniques for Bayesian response-adaptive randomization with binary outcomes

Clinical Trials ◽

10.1177/17407745211010139 ◽

2021 ◽

pp. 174077452110101

Author(s):

Jennifer Proper ◽

John Connett ◽

Thomas Murray

Keyword(s):

Logistic Regression ◽

Sample Size ◽

Error Rate ◽

Adaptive Design ◽

Type I Error ◽

Probability Model ◽

Binary Outcomes ◽

Type I ◽

Operating Characteristics ◽

Type I Error Rate

Background: Bayesian response-adaptive designs, which data adaptively alter the allocation ratio in favor of the better performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating type I error rate, and increasing sample size requirements. The implementation of these designs using the Thompson sampling methods has generally assumed a simple beta-binomial probability model in the literature; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss up to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial using the Thompson sampling methods based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and power, among other performance metrics. Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design in regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.

Download Full-text

Efficient Noninferiority Testing Procedures for Simultaneously Assessing Sensitivity and Specificity of Two Diagnostic Tests

Computational and Mathematical Methods in Medicine ◽

10.1155/2015/128930 ◽

2015 ◽

Vol 2015 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Guogen Shan ◽

Amei Amei ◽

Daniel Young

Keyword(s):

Sensitivity And Specificity ◽

Diagnostic Tests ◽

Error Control ◽

Type I Error ◽

Binary Outcomes ◽

Type I ◽

Test Statistics ◽

Asymptotic Approach ◽

Testing Procedures ◽

Assessing Sensitivity

Sensitivity and specificity are often used to assess the performance of a diagnostic test with binary outcomes. Wald-type test statistics have been proposed for testing sensitivity and specificity individually. In the presence of a gold standard, simultaneous comparison between two diagnostic tests for noninferiority of sensitivity and specificity based on an asymptotic approach has been studied by Chen et al. (2003). However, the asymptotic approach may suffer from unsatisfactory type I error control as observed from many studies, especially in small to medium sample settings. In this paper, we compare three unconditional approaches for simultaneously testing sensitivity and specificity. They are approaches based on estimation, maximization, and a combination of estimation and maximization. Although the estimation approach does not guarantee type I error, it has satisfactory performance with regard to type I error control. The other two unconditional approaches are exact. The approach based on estimation and maximization is generally more powerful than the approach based on maximization.

Download Full-text

Impact of unequal cluster sizes for GEE analyses of stepped wedge cluster randomized trials with binary outcomes

Biometrical Journal ◽

10.1002/bimj.202100112 ◽

2021 ◽

Author(s):

Zibo Tian ◽

John S. Preisser ◽

Denise Esserman ◽

Elizabeth L. Turner ◽

Paul J. Rathouz ◽

...

Keyword(s):

Randomized Trials ◽

Binary Outcomes ◽

Cluster Randomized Trials ◽

Stepped Wedge ◽

Cluster Randomized ◽

Unequal Cluster

Download Full-text

Inflation of type I error rate in two statistical tests for the detection of publication bias in meta-analyses with binary outcomes

Statistics in Medicine ◽

10.1002/sim.1224 ◽

2002 ◽

Vol 21 (17) ◽

pp. 2465-2477 ◽

Cited By ~ 50

Author(s):

Guido Schwarzer ◽

Gerd Antes ◽

Martin Schumacher

Keyword(s):

Publication Bias ◽

Error Rate ◽

Type I Error ◽

Statistical Tests ◽

Binary Outcomes ◽

Type I ◽

Type I Error Rate ◽

Meta Analyses

Download Full-text

Improved procedures and computer programs for equivalence assessment of correlation coefficients

PLoS ONE ◽

10.1371/journal.pone.0252323 ◽

2021 ◽

Vol 16 (5) ◽

pp. e0252323

Author(s):

Gwowen Shieh

Keyword(s):

Correlation Coefficient ◽

Type I Error ◽

Reference Range ◽

Correlation Coefficients ◽

Sample Size Determination ◽

Power Calculation ◽

Type I ◽

Significance Tests ◽

Equivalence Tests ◽

Essential Problem

The correlation coefficient is the most commonly used measure for summarizing the magnitude and direction of linear relationship between two response variables. Considerable literature has been devoted to the inference procedures for significance tests and confidence intervals of correlations. However, the essential problem of evaluating correlation equivalence has not been adequately examined. For the purpose of expanding the usefulness of correlational techniques, this article focuses on the Pearson product-moment correlation coefficient and the Fisher’s z transformation for developing equivalence procedures of correlation coefficients. Equivalence tests are proposed to assess whether a correlation coefficient is within a designated reference range for declaring equivalence decisions. The important aspects of Type I error rate, power calculation, and sample size determination are also considered. Special emphasis is given to clarify the nature and deficiency of the two one-sided tests for detecting a lack of association. The findings demonstrate the inappropriateness of existing methods for equivalence appraisal and validate the suggested techniques as reliable and primary tools in correlation analysis.

Download Full-text