The impact of covariate misclassification using generalized linear regression under covariate-adaptive randomization

2015 ◽  
Vol 27 (1) ◽  
pp. 20-34 ◽  
Author(s):  
Liqiong Fan ◽  
Sharon D Yeatts ◽  
Bethany J Wolf ◽  
Leslie A McClure ◽  
Magdy Selim ◽  
...  

Under covariate-adaptive randomization, the covariate is tied to both randomization and analysis. Misclassification of such a covariate will impact the intended treatment assignment; further, it is unclear what the appropriate analysis strategy should be. We explore the impact of such misclassification on the trial's statistical operating characteristics. Simulation scenarios were created based on the misclassification rate and the covariate effect on the outcome. Models that were unadjusted, adjusted for the misclassified covariate, or adjusted for the corrected covariate were compared using logistic regression for a binary outcome and Poisson regression for a count outcome. For the binary outcome using logistic regression, type I error can be maintained in the adjusted model, but the test is conservative using an unadjusted model. Power decreased with increasing covariate effect on the outcome as well as with increasing misclassification rate. Treatment effect estimates were biased towards the null for both the misclassified and unadjusted models. For the count outcome using a Poisson model, covariate misclassification led to inflated type I error probabilities and reduced power in both the misclassified and unadjusted models. The impact of covariate misclassification under covariate-adaptive randomization differs depending on the underlying distribution of the outcome.
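
As a concrete illustration of the comparison described above, the sketch below simulates one trial replicate with a misclassified binary covariate and fits the three logistic models (unadjusted, adjusted for the misclassified covariate, adjusted for the true covariate). The sample size, misclassification rate, effect sizes, and the use of stratified permuted blocks as a stand-in for covariate-adaptive randomization are illustrative assumptions, not the authors' actual simulation settings.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, misclass_rate = 400, 0.15           # assumed trial size and misclassification rate
beta_trt, beta_cov = 0.0, 1.0          # null treatment effect, strong covariate effect

z_true = rng.binomial(1, 0.5, n)                     # true binary covariate
flip = rng.binomial(1, misclass_rate, n)             # misclassification indicator
z_obs = np.where(flip == 1, 1 - z_true, z_true)      # covariate actually recorded

# stand-in for covariate-adaptive randomization: permuted blocks within z_obs strata
trt = np.empty(n, dtype=int)
for s in (0, 1):
    idx = np.where(z_obs == s)[0]
    blocks = np.tile([0, 1], len(idx) // 2 + 1)[: len(idx)]
    trt[idx] = rng.permutation(blocks)

# the outcome depends on the TRUE covariate, not the observed one
p = 1 / (1 + np.exp(-(-0.5 + beta_trt * trt + beta_cov * z_true)))
y = rng.binomial(1, p)

for label, X in [("unadjusted", trt.reshape(-1, 1)),
                 ("adjusted (misclassified)", np.column_stack([trt, z_obs])),
                 ("adjusted (true)", np.column_stack([trt, z_true]))]:
    fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    print(f"{label:26s} treatment z = {fit.tvalues[1]:.2f}, p = {fit.pvalues[1]:.3f}")
```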

2020 ◽  
pp. 174077452096420
Author(s):  
Leandro Garcia Barrado ◽  
Tomasz Burzykowski

Objective: We investigate the impact of a biomarker assay's accuracy on the operating characteristics of a Bayesian biomarker-driven outcome-adaptive randomization design. Methods: In a simulation study, we assume a trial with two treatments, two biomarker-based strata, and a binary clinical outcome (response). Pbt denotes the probability of response for treatment t (t = 0 or 1) in biomarker stratum b (b = 0 or 1). Four different scenarios in terms of true underlying response probabilities are considered: a null (P00 = P01 = 0.25, P10 = P11 = 0.25) and a consistent (P00 = P10 = 0.25, P01 = P11 = 0.5) treatment effect scenario, as well as a quantitative (P00 = P01 = P10 = 0.25, P11 = 0.5) and a qualitative (P00 = P11 = 0.5, P01 = P10 = 0.25) stratum-treatment interaction. For each scenario, we compare the case of a perfect biomarker assay with that of an imperfect assay with sensitivity and specificity of 0.8 and 0.7, respectively. In addition, biomarker-positive prevalence values P(B = 1) = 0.2 and 0.5 are investigated. Results: Results show that the use of an imperfect assay affects the operating characteristics of the Bayesian biomarker-based outcome-adaptive randomization design. In particular, the misclassification causes a substantial reduction in power accompanied by a considerable increase in the type I error probability. The magnitude of these effects depends on the sensitivity and specificity of the assay, as well as on the distribution of the biomarker in the patient population. Conclusion: With an imperfect biomarker assay, the decision to apply a biomarker-based outcome-adaptive randomization design may require careful reflection.
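
The dilution mechanism behind these results can be reproduced with a few lines of arithmetic: an observed biomarker stratum mixes patients from both true strata, pulling the stratum-specific response rates toward each other. The sketch below works through the quantitative-interaction scenario at prevalence 0.2; it is an illustrative calculation, not the paper's simulation code.

```python
sens, spec = 0.8, 0.7           # assay sensitivity and specificity from the paper
prev = 0.2                      # P(B = 1); the paper also considers 0.5
# true response probabilities P[(b, t)]: quantitative-interaction scenario
P = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.5}

# joint probabilities of (true stratum b, observed stratum b_obs)
joint = {(1, 1): prev * sens, (1, 0): prev * (1 - sens),
         (0, 1): (1 - prev) * (1 - spec), (0, 0): (1 - prev) * spec}

for b_obs in (0, 1):
    denom = joint[(0, b_obs)] + joint[(1, b_obs)]
    for t in (0, 1):
        # the response rate in the OBSERVED stratum mixes the two true strata
        p_mix = sum(joint[(b, b_obs)] * P[(b, t)] for b in (0, 1)) / denom
        print(f"observed stratum {b_obs}, treatment {t}: response rate = {p_mix:.3f}")
```

At prevalence 0.2, the treated response rate in the observed-positive stratum drops from a true 0.5 to a mixed 0.35, which is the kind of dilution that drives the reported power loss.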


2021 ◽  
pp. 174077452110101
Author(s):  
Jennifer Proper ◽  
John Connett ◽  
Thomas Murray

Background: Bayesian response-adaptive designs, which data-adaptively alter the allocation ratio in favor of the better-performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating the type I error rate, and increasing sample size requirements. In the literature, implementations of these designs using Thompson sampling have generally assumed a simple beta-binomial probability model; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced R2Eperfusion STrategies for Refractory Cardiac Arrest (ARREST) trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the ARREST trial using Thompson sampling based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and for power, among other performance metrics. Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over the type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much-improved response-adaptive design with regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.
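
For readers unfamiliar with the conventional implementation criticized here, the sketch below shows the core beta-binomial Thompson sampling step: the next subject is assigned to arm 1 with a probability driven by the posterior Pr(p1 > p0), approximated by Monte Carlo. The Beta(1, 1) priors, the tempering exponent, and the example counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def thompson_alloc_prob(y1, n1, y0, n0, kappa=0.5, draws=100_000):
    """Posterior Pr(p1 > p0) under independent Beta(1, 1) priors, tempered by
    a tuning exponent kappa (kappa = 1 recovers plain Thompson sampling)."""
    p1 = rng.beta(1 + y1, 1 + n1 - y1, draws)
    p0 = rng.beta(1 + y0, 1 + n0 - y0, draws)
    pi = (p1 > p0).mean()
    return pi ** kappa / (pi ** kappa + (1 - pi) ** kappa)

# e.g. after 20 subjects per arm with 12 vs 8 responses
print(f"P(assign next subject to arm 1) = {thompson_alloc_prob(12, 20, 8, 20):.3f}")
```

The urn or permuted block methods discussed in the article act on top of a probability like this one, capping how far realized assignments can drift from the evolving target ratio.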


2021 ◽  
pp. 096228022110082
Author(s):  
Yang Li ◽  
Wei Ma ◽  
Yichen Qin ◽  
Feifang Hu

Concerns have been expressed over the validity of statistical inference under covariate-adaptive randomization despite its extensive use in clinical trials. In the literature, the inferential properties under covariate-adaptive randomization have mainly been studied for continuous responses; in particular, it is well known that the usual two-sample t-test for the treatment effect is typically conservative. This phenomenon of invalid tests has also been found for generalized linear models without adjustment for the covariates, and it is sometimes more worrisome due to inflated type I error. The purpose of this study is to examine the unadjusted test for the treatment effect under generalized linear models and covariate-adaptive randomization. For a large class of covariate-adaptive randomization methods, we obtain the asymptotic distribution of the test statistic under the null hypothesis and derive the conditions under which the test is conservative, valid, or anti-conservative. Several commonly used generalized linear models, such as logistic regression and Poisson regression, are discussed in detail. An adjustment method is also proposed to achieve a valid size based on the asymptotic results. Numerical studies confirm the theoretical findings and demonstrate the effectiveness of the proposed adjustment method.
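
A small simulation along these lines (far simpler than the paper's asymptotic analysis) can estimate the size of the unadjusted test empirically. The sketch below does this for a Poisson outcome under stratified permuted-block randomization; the sample size, covariate effect, and block scheme are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

def one_trial(n=200, beta_cov=1.0):
    z = rng.binomial(1, 0.5, n)                      # binary covariate
    trt = np.empty(n, dtype=int)
    for s in (0, 1):                                 # permuted blocks within strata
        idx = np.where(z == s)[0]
        trt[idx] = rng.permutation(np.tile([0, 1], len(idx) // 2 + 1)[: len(idx)])
    y = rng.poisson(np.exp(beta_cov * z))            # null: no treatment effect
    fit = sm.GLM(y, sm.add_constant(trt), family=sm.families.Poisson()).fit()
    return fit.pvalues[1] < 0.05                     # unadjusted Wald test at 5%

size = np.mean([one_trial() for _ in range(2000)])
print(f"empirical size of the unadjusted test: {size:.3f} (nominal 0.05)")
```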


Author(s):  
Judith H. Parkinson-Schwarz ◽  
Arne C. Bathke

Abstract In this paper, we propose a new non-parametric test for equality of distributions. The test is based on the recently introduced measure of (niche) overlap and its rank-based estimator. As the estimator makes only one basic assumption on the underlying distributions, namely continuity, the test is universally applicable, in contrast to many tests that are restricted to specific scenarios. By construction, the new test is capable of detecting differences in location and scale. It thus complements the large class of rank-based tests that are constructed from the non-parametric relative effect. In simulations, the new test procedure obtained higher power and a lower type I error than two common tests in several settings. The new procedure shows good overall performance; together with its simplicity, it can be applied broadly.
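
The authors' rank-based overlap estimator is not reproduced here, but the general shape of such a test can be sketched as a permutation test whose statistic combines a location and a scale component of the ranks, so that shifts and spread differences are both detectable. The statistic below is purely a placeholder to illustrate the mechanics, not the proposed overlap measure.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(5)

def stat(x, y):
    r = rankdata(np.concatenate([x, y]))
    rx = r[: len(x)]
    # placeholder: a shift component plus a spread component of the x-ranks
    shift = abs(rx.mean() - (len(r) + 1) / 2)
    spread = abs(rx.var() - (len(r) ** 2 - 1) / 12)
    return shift + np.sqrt(spread)

def perm_test(x, y, reps=5000):
    pooled, nx = np.concatenate([x, y]), len(x)
    obs = stat(x, y)
    null = [stat(*np.split(rng.permutation(pooled), [nx])) for _ in range(reps)]
    return np.mean([s >= obs for s in null])

x, y = rng.normal(0, 1, 40), rng.normal(0, 2, 40)   # equal location, unequal scale
print(f"permutation p-value: {perm_test(x, y):.3f}")
```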


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Elahe Allahyari ◽  
Peyman Jafari ◽  
Zahra Bagheri

Objective. The present study uses simulated data to determine the optimal number of response categories for achieving adequate power in the ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality-of-life scale with three, four, and five response categories was simulated. The power and type I error rates of the OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and magnitude of uniform DIF across the reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 resulted in an increase of approximately 8% in the power of the OLR model for detecting uniform DIF. The power of OLR was less than 0.36 when the ability distributions in the reference and focal groups were highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories on detecting DIF was smaller than might be expected.
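
The uniform-DIF test that this study evaluates is, in essence, a likelihood-ratio comparison of two ordinal logistic models, one with and one without a group term. A minimal sketch follows; the simulated data, the five response categories, and the use of a noisy rest-score as the matching variable are illustrative assumptions, not the study's design.

```python
import numpy as np
from scipy.stats import chi2
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(11)
n = 600
group = rng.binomial(1, 0.5, n)                    # 0 = reference, 1 = focal
theta = rng.normal(0, 1, n)                        # latent trait, identical by design
# five-category item with a uniform DIF shift of 0.4 for the focal group
lin = theta + 0.4 * group + rng.logistic(0, 1, n)
item = np.digitize(lin, [-1.5, -0.5, 0.5, 1.5])    # responses coded 0..4
restscore = theta + rng.normal(0, 0.3, n)          # noisy stand-in for matching score

m0 = OrderedModel(item, restscore.reshape(-1, 1), distr="logit").fit(
    method="bfgs", disp=False)
m1 = OrderedModel(item, np.column_stack([restscore, group]), distr="logit").fit(
    method="bfgs", disp=False)
lr = 2 * (m1.llf - m0.llf)                         # uniform-DIF likelihood ratio
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=1):.4f}")
```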


2015 ◽  
Vol 46 (3) ◽  
pp. 586-603 ◽  
Author(s):  
Ma Dolores Hidalgo ◽  
Isabel Benítez ◽  
Jose-Luis Padilla ◽  
Juana Gómez-Benito

The growing use of scales in survey questionnaires warrants the need to address how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the type I error and effect size of the independent samples t-test on observed total scale scores. A simulation study was conducted, focusing on potential variables related to DIF in polytomous items, such as DIF pattern, sample size, DIF magnitude, and percentage of DIF items. The results showed that DIF patterns and the number of DIF items affected the type I error rates and effect sizes of t-test values. The results highlight the need to analyze DIF before making comparative group interpretations.
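
The mechanism under study can be seen in a single simulated replicate: when some items shift for one group while the latent trait distributions are identical, the total-score t-test picks up the DIF rather than a true group difference. The generating model below is a deliberately simplified stand-in for the study's design; all sizes and magnitudes are assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(21)
n, n_items, n_dif = 300, 10, 3        # per-group size; 30% of items carry DIF
dif_shift = 0.5                        # uniform DIF magnitude on the logit scale

def total_scores(group):
    theta = rng.normal(0, 1, n)       # identical trait distribution in both groups
    total = np.zeros(n)
    for j in range(n_items):
        shift = dif_shift if (group == 1 and j < n_dif) else 0.0
        lin = theta + shift + rng.logistic(0, 1, n)
        total += np.digitize(lin, [-1.0, 0.0, 1.0])   # 4-category items, scored 0..3
    return total

t, p = ttest_ind(total_scores(0), total_scores(1))
print(f"t = {t:.2f}, p = {p:.4f}")    # any systematic difference here is DIF-induced
```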


2021 ◽  
Author(s):  
Angély Loubert ◽  
Antoine Regnault ◽  
Véronique Sébille ◽  
Jean-Benoit Hardouin

Abstract Background In the analysis of clinical trial endpoints, calibration of patient-reported outcome (PRO) instruments ensures that the resulting "scores" represent the same quantity of the measured concept between applications. Rasch measurement theory (RMT) is a psychometric approach that guarantees algebraic separation of person and item parameter estimates, allowing formal calibration of PRO instruments. In the RMT framework, calibration is performed using the item parameter estimates obtained from a previous "calibration" study. But if calibration is based on poorly estimated item parameters (e.g., because the sample size of the calibration study was small), this may hamper the ability to detect a treatment effect, and direct estimation of item parameters from the trial data (non-calibration) may then be preferred. The objective of this simulation study was to assess the impact of calibration on the comparison of PRO results between treatment groups, using different analysis methods. Methods PRO results were simulated following a polytomous Rasch model for a calibration sample and a trial sample. Scenarios included varying sample sizes, instruments with varying numbers of items and response categories, and varying item parameter distributions. Different treatment effect sizes and distributions of the two patient samples were also explored. Comparison of treatment groups was performed using different methods based on a random-effect Rasch model. Calibrated and non-calibrated approaches were compared in terms of type I error, power, bias, and variance of the estimates of the difference between groups. Results There was no impact of the calibration approach on type I error, power, bias, or dispersion of the estimates. Among other findings, mistargeting between the PRO instrument and patients from the trial sample (regarding the level of the measured concept) resulted in lower power and higher position bias than appropriate targeting. Conclusions Calibration of PROs in clinical trials does not compromise the ability to accurately assess a treatment effect and is essential to properly interpret PRO results. Given its important added value, calibration should thus always be performed when a PRO instrument is used as an endpoint in a clinical trial in the RMT framework.
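
Operationally, calibration in the RMT framework means fixing the item threshold parameters at values estimated in the earlier study rather than re-estimating them from trial data. The sketch below shows the category probabilities of a polytomous Rasch (partial credit) item evaluated at assumed calibrated thresholds; all numeric values are illustrative.

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Category probabilities for one item under the partial credit model;
    deltas[k] is the threshold between categories k and k + 1."""
    cum = np.concatenate([[0.0], np.cumsum(theta - np.asarray(deltas))])
    num = np.exp(cum - cum.max())              # numerically stabilized
    return num / num.sum()

# thresholds "calibrated" in an earlier study (assumed values, four categories)
calibrated_deltas = [-1.2, 0.1, 1.4]

for theta in (-1.0, 0.0, 1.5):
    probs = pcm_probs(theta, calibrated_deltas)
    print(f"theta = {theta:+.1f}: P(category 0..3) = {np.round(probs, 3)}")
```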


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuai Wang ◽  
James B. Meigs ◽  
Josée Dupuis

Abstract Background Advancements in statistical methods and sequencing technology have led to numerous novel discoveries in human genetics in the past two decades. Among phenotypes of interest, most attention has been given to studying genetic associations with continuous or binary traits. Efficient statistical methods have been proposed and are available for both types of traits under different study designs. However, for multinomial categorical traits in related samples, there is a lack of efficient statistical methods and software. Results We propose an efficient score test to analyze a multinomial trait in family samples, in the context of genome-wide association/sequencing studies. An alternative Wald statistic is also proposed. We also extend the methodology to be applicable to ordinal traits. We performed extensive simulation studies to evaluate the type I error of the score and Wald tests, compared to multinomial logistic regression for unrelated samples, under different allele frequencies and study designs. We also evaluated the power of these methods. Results show that both the score and Wald tests have a well-controlled type I error rate, whereas multinomial logistic regression has an inflated type I error rate when applied to family samples. We illustrate the score test with an application to the Framingham Heart Study to uncover genetic variants associated with diabesity, a multi-category phenotype. Conclusion Both proposed tests have a correct type I error rate and similar power. However, because the Wald statistic relies on computationally intensive estimation, it is less efficient than the score test for applications to large-scale genetic association studies. We provide a computer implementation for both multinomial and ordinal traits.
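
The unrelated-samples comparator in this study, multinomial logistic regression with a likelihood-ratio test for the variant effect, can be sketched as follows. The genotype and trait data are simulated placeholders, and the family-based score test itself is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(13)
n, maf = 1000, 0.2
g = rng.binomial(2, maf, n)                          # additive genotype coding 0/1/2
# three-category trait, mildly associated with genotype (assumed effect sizes)
logits = np.column_stack([np.zeros(n), 0.1 * g, 0.2 * g])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

full = sm.MNLogit(y, sm.add_constant(g)).fit(disp=0)
null = sm.MNLogit(y, np.ones((n, 1))).fit(disp=0)
lr = 2 * (full.llf - null.llf)
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=2):.4f}")  # df = 2 genotype coefficients
```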

