Logistic regression for dichotomized counts

2016 ◽  
Vol 25 (6) ◽  
pp. 3038-3056 ◽  
Author(s):  
John S Preisser ◽  
Kalyan Das ◽  
Habtamu Benecha ◽  
John W Stamm

Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren.
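The efficiency argument rests on a simple link: under a Poisson model for the counts, the dichotomous outcome is fully determined by the count mean, since P(Y > 0) = 1 - exp(-mu). A minimal sketch of this relationship (the group means below are illustrative values, not from the paper):

```python
import math

def prob_positive(mu):
    # Under a Poisson(mu) model, P(Y > 0) = 1 - exp(-mu).
    return 1.0 - math.exp(-mu)

def marginal_log_odds_ratio(mu0, mu1):
    # Log odds ratio for a positive count between two covariate groups
    # with Poisson means mu0 and mu1 -- the kind of dichotomous-outcome
    # effect the logistic model part targets.
    p0, p1 = prob_positive(mu0), prob_positive(mu1)
    return math.log(p1 / (1.0 - p1)) - math.log(p0 / (1.0 - p0))

lor = marginal_log_odds_ratio(0.5, 1.5)  # illustrative group means
```

Ordinary logistic regression estimates this log odds ratio from the dichotomized data alone; the shared-parameter hurdle model additionally exploits the counts themselves, which is where the efficiency gain comes from.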

2021 ◽  
pp. 174077452110101
Author(s):  
Jennifer Proper ◽  
John Connett ◽  
Thomas Murray

Background: Bayesian response-adaptive designs, which data-adaptively alter the allocation ratio in favor of the better performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating type I error rate, and increasing sample size requirements. Implementations of these designs using Thompson sampling have generally assumed a simple beta-binomial probability model in the literature; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced REperfusion STrategies for Refractory Cardiac Arrest (ARREST) trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss the extent to which this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the ARREST trial using Thompson sampling based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and power, among other performance metrics.
Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design in regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.
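Under the conventional beta-binomial implementation discussed above, the Thompson sampling allocation probability for an arm is the posterior probability that its response rate is the higher one. A minimal sketch of that quantity (the uniform Beta(1, 1) priors, draw count, and interim counts are illustrative assumptions):

```python
import random

def thompson_alloc_prob(successes, failures, draws=20000, seed=1):
    # Monte Carlo estimate of P(p_1 > p_0) under independent
    # Beta(1 + s, 1 + f) posteriors, i.e. the allocation probability
    # for arm 1 under a beta-binomial Thompson sampling scheme.
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p0 = rng.betavariate(1 + successes[0], 1 + failures[0])
        p1 = rng.betavariate(1 + successes[1], 1 + failures[1])
        wins += p1 > p0
    return wins / draws

# Interim data strongly favoring arm 1: the target allocation tilts heavily.
alloc = thompson_alloc_prob(successes=(10, 30), failures=(30, 10))
```

Coupling this evolving target with an urn or permuted block method, as the paper proposes, caps how far the realized treatment arm sample sizes can drift from it.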


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuai Wang ◽  
James B. Meigs ◽  
Josée Dupuis

Abstract Background Advancements in statistical methods and sequencing technology have led to numerous novel discoveries in human genetics in the past two decades. Among phenotypes of interest, most attention has been given to studying genetic associations with continuous or binary traits. Efficient statistical methods have been proposed and are available for both types of traits under different study designs. However, for multinomial categorical traits in related samples, there is a lack of efficient statistical methods and software. Results We propose an efficient score test to analyze a multinomial trait in family samples, in the context of genome-wide association/sequencing studies. An alternative Wald statistic is also proposed. We also extend the methodology to be applicable to ordinal traits. We performed extensive simulation studies to evaluate the type-I error of the score and Wald tests compared to multinomial logistic regression for unrelated samples, under different allele frequencies and study designs. We also evaluated the power of these methods. Results show that both the score and Wald tests have a well-controlled type-I error rate, but multinomial logistic regression has an inflated type-I error rate when applied to family samples. We illustrated the score test with an application to the Framingham Heart Study to uncover genetic variants associated with diabesity, a multi-category phenotype. Conclusion Both proposed tests have correct type-I error rate and similar power. However, because the Wald statistic relies on computationally intensive estimation, it is less efficient than the score test for applications to large-scale genetic association studies. We provide computer implementation for both multinomial and ordinal traits.
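For intuition, the score-test idea can be shown in the simplest unrelated-sample setting: a binary trait regressed on genotype dosage, where the statistic needs only the fit under the null (intercept-only) model. The paper's test extends this idea to multinomial traits and family correlation; the toy data below are illustrative:

```python
def score_test_binary(x, y):
    # Score test for H0: no genotype effect in a logistic regression of
    # a binary trait y on genotype dosage x (unrelated samples).  Only
    # the null model (intercept only) is fitted, which is what makes
    # score tests cheap in genome-wide scans.
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    u = sum(xi * (yi - ybar) for xi, yi in zip(x, y))          # score
    v = ybar * (1 - ybar) * sum((xi - xbar) ** 2 for xi in x)  # information
    return u * u / v  # approximately chi-square with 1 df under H0

stat = score_test_binary([0, 1, 2, 1], [0, 1, 1, 0])  # toy genotypes / trait
```

A Wald statistic, by contrast, requires estimating the genotype effect under the alternative at every variant, which is the computational burden the abstract alludes to.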


2010 ◽  
Vol 49 (06) ◽  
pp. 618-624 ◽  
Author(s):  
H. Schmidli ◽  
T. Friede

Summary Background: In the planning of clinical trials with count outcomes such as the number of exacerbations in chronic obstructive pulmonary disease (COPD), often considerable uncertainty exists with regard to the overall event rate and the level of overdispersion, which are both crucial for sample size calculations. Objectives: To develop a sample size reestimation strategy that maintains the blinding of the trial, controls the type I error rate and is robust against misspecification of the nuisance parameters in the planning phase in that the actual power is close to the target. Methods: The operating characteristics of the developed sample size reestimation procedure are investigated in a Monte Carlo simulation study. Results: Estimators of the overall event rate and the overdispersion parameter that do not require unblinding can be used to effectively adjust the sample size without inflating the type I error rate while providing power values close to the target. Conclusions: If only little information is available regarding the size of the overall event rate and the overdispersion parameter in the design phase of a trial, we recommend the use of a design with sample size reestimation such as the one suggested here. Trials in COPD are expected to benefit from the proposed sample size reestimation strategy.
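A sketch of the kind of sample size formula involved, for comparing two negative binomial event rates on the log scale (the parameterization with variance mu + shape*mu^2 and this particular approximation are standard textbook forms, not lifted from the paper; the blinded reestimation step would plug interim estimates of the overall rate and overdispersion into such a formula):

```python
from math import ceil, log
from statistics import NormalDist

def nb_per_arm_n(rate_c, rate_t, shape, t, alpha=0.05, power=0.9):
    # Approximate per-arm sample size for a Wald test of the log rate
    # ratio between two negative binomial arms followed for time t;
    # 'shape' is the overdispersion parameter (variance = mu + shape*mu^2).
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    delta = log(rate_t / rate_c)                     # log rate ratio
    var = (1 / rate_c + 1 / rate_t) / t + 2 * shape  # n * Var(estimated delta)
    return ceil((za + zb) ** 2 * var / delta ** 2)
```

More overdispersion means a larger trial, which is why a misspecified overdispersion parameter at the planning stage is costly and why blinded reestimation of it helps.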


2016 ◽  
Vol 2 (2) ◽  
pp. 290
Author(s):  
Ying Jin ◽  
Hershel Eason

The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under multilevel data structure. MAD was frequently observed in large-scale cross-country comparison studies where the primary sampling units were more likely to be clusters (e.g., schools). With short tests, regular DIF methods under MAD-present conditions might suffer from inflated type I error rate due to low reliability of test scores, which would adversely impact the matching ability of the covariate (i.e., the total score) in DIF analysis. The current study compared the performance of three DIF methods: logistic regression (LR), hierarchical logistic regression (HLR) taking multilevel structure into account, and hierarchical logistic regression with latent covariate (HLR-LC) taking multilevel structure into account as well as accounting for low reliability and MAD. The results indicated that HLR-LC outperformed both LR and HLR under most simulated conditions, especially under the MAD-present conditions when tests were short. Practical implications of the implementation of HLR-LC were also discussed.
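The baseline LR method fits, per item, a logistic model of the item response on the matching total score plus a group indicator, where a non-zero group coefficient signals uniform DIF. A self-contained sketch using pure-Python iteratively reweighted least squares (the item parameters and weighted data below are made up for illustration):

```python
from math import exp

def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))

def solve3(a, b):
    # Gauss-Jordan elimination with partial pivoting for a 3x3 system.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_dif_lr(rows):
    # Weighted logistic regression (intercept, total score, group) fitted
    # by Newton/IRLS; rows are (x_vector, y, weight) triples.
    beta = [0.0, 0.0, 0.0]
    for _ in range(30):
        g = [0.0] * 3
        h = [[0.0] * 3 for _ in range(3)]
        for x, y, w in rows:
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            for i in range(3):
                g[i] += w * (y - p) * x[i]
                for j in range(3):
                    h[i][j] += w * p * (1 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve3(h, g))]
    return beta

# Hypothetical item generated with logit P(correct) = -2 + 0.4*score + 0.8*group,
# i.e. uniform DIF of 0.8 favoring the focal group at equal total score.
rows = []
for s in range(11):
    for grp in (0, 1):
        p = sigmoid(-2 + 0.4 * s + 0.8 * grp)
        rows.append(((1.0, float(s), float(grp)), 1, 10 * p))
        rows.append(((1.0, float(s), float(grp)), 0, 10 * (1 - p)))
beta = fit_dif_lr(rows)
```

HLR adds a random cluster intercept to this model, and HLR-LC further replaces the observed total score with a latent covariate; both extensions change the matching step, not the DIF logic itself.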


2017 ◽  
Vol 2 ◽  
pp. 54 ◽  
Author(s):  
Richard Howey ◽  
Heather J. Cordell

Background: In a recent paper, a novel W-test for pairwise epistasis testing was proposed that appeared, in computer simulations, to have higher power than competing alternatives. Application to genome-wide bipolar data detected significant epistasis between SNPs in genes of relevant biological function. Network analysis indicated that the implicated genes formed two separate interaction networks, each containing genes highly related to autism and neurodegenerative disorders. Methods: Here we investigate further the properties and performance of the W-test via theoretical evaluation, computer simulations and application to real data. Results: We demonstrate that, for common variants, the W-test is closely related to several existing tests of association allowing for interaction, including logistic regression on 8 degrees of freedom, although logistic regression can show inflated type I error for low minor allele frequencies, whereas the W-test shows good/conservative type I error control. Although in some situations the W-test can show higher power, logistic regression is not limited to tests on 8 degrees of freedom but can instead be tailored to impose greater structure on the assumed alternative hypothesis, offering a power advantage when the imposed structure matches the true structure. Conclusions: The W-test is a potentially useful method for testing for association - without necessarily implying interaction - between genetic variants and disease, particularly when one or more of the genetic variants are rare. For common variants, the advantages of the W-test are less clear, and, indeed, there are situations where existing methods perform better. In our investigations, we further uncover a number of problems with the practical implementation and application of the W-test (to bipolar disorder) previously described, apparently due to inadequate use of standard data quality-control procedures.
This observation leads us to urge caution in interpretation of the previously-presented results, most of which we consider are highly likely to be artefacts.
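The 8-degree-of-freedom comparison arises because two biallelic SNPs define 3 x 3 = 9 genotype combinations, so a saturated case/control association test over those cells has 8 df. A minimal Pearson chi-square version on a 2 x 9 table (the counts are illustrative):

```python
def pearson_chi2(table):
    # Pearson chi-square statistic for an r x c contingency table; for a
    # 2 x 9 case/control-by-genotype-pair table it has 8 degrees of
    # freedom, matching the 8-df logistic regression comparison.
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Illustrative counts: cases and controls across 9 two-SNP genotype cells.
cases    = [30, 25, 10, 28, 40, 12,  9, 14,  6]
controls = [32, 24, 11, 30, 35, 13, 10, 12,  7]
stat = pearson_chi2([cases, controls])
```

The logistic regression formulation covers the same alternative space but can also be constrained (e.g., to additive main effects plus a single interaction term), which is the tailoring the Results section refers to.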


2015 ◽  
Vol 27 (1) ◽  
pp. 20-34 ◽  
Author(s):  
Liqiong Fan ◽  
Sharon D Yeatts ◽  
Bethany J Wolf ◽  
Leslie A McClure ◽  
Magdy Selim ◽  
...  

Under covariate-adaptive randomization, the covariate is tied to both randomization and analysis. Misclassification of such a covariate will impact the intended treatment assignment; further, it is unclear what the appropriate analysis strategy should be. We explore the impact of such misclassification on the trial’s statistical operating characteristics. Simulation scenarios were created based on the misclassification rate and the covariate effect on the outcome. Models that were unadjusted, adjusted for the misclassified covariate, or adjusted for the corrected covariate were compared using logistic regression for a binary outcome and Poisson regression for a count outcome. For the binary outcome using logistic regression, type I error can be maintained in the adjusted model, but the test is conservative using an unadjusted model. Power decreased with both increasing covariate effect on the outcome and increasing misclassification rate. Treatment effect estimates were biased towards the null for both the misclassified and unadjusted models. For the count outcome using a Poisson model, covariate misclassification led to inflated type I error probabilities and reduced power in the misclassified and the unadjusted models. The impact of covariate misclassification under covariate-adaptive randomization differs depending on the underlying distribution of the outcome.
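The bias-toward-the-null behavior for a binary outcome can be seen in closed form: nondifferential misclassification of a binary covariate attenuates the covariate-outcome odds ratio. A small sketch of that attenuation (the outcome probabilities, prevalence, and error rate are illustrative values, not from the paper):

```python
def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def misclassified_or(p1, p0, prev, err):
    # Odds ratio between a binary outcome Y and a misclassified binary
    # covariate Z*: p1/p0 are P(Y=1) given true Z=1/0, prev is P(Z=1),
    # and err is a symmetric, nondifferential misclassification rate.
    a = prev * (1 - err) * p1 + (1 - prev) * err * p0                # P(Z*=1, Y=1)
    b = prev * (1 - err) * (1 - p1) + (1 - prev) * err * (1 - p0)    # P(Z*=1, Y=0)
    c = prev * err * p1 + (1 - prev) * (1 - err) * p0                # P(Z*=0, Y=1)
    d = prev * err * (1 - p1) + (1 - prev) * (1 - err) * (1 - p0)    # P(Z*=0, Y=0)
    return (a / b) / (c / d)

true_or = odds_ratio(0.6, 0.3)                       # 3.5 with these values
observed_or = misclassified_or(0.6, 0.3, 0.5, 0.2)   # pulled toward 1
```

An analysis adjusted for the misclassified covariate inherits this attenuation, consistent with the null-biased treatment effect estimates reported above.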

