Shrinkage Parameter Estimation in Penalized Logistic Regression Analysis of Case-Control Data

Author(s):  
Ying Yu ◽  
Siyuan Chen ◽  
Brad McNeney

Abstract: In genetic epidemiology, rare variant case-control studies aim to investigate the association between rare genetic variants and human diseases. Rare genetic variants lead to sparse covariates that are predominantly zero, and this sparseness leads to estimators of log odds ratio (log-OR) parameters that are biased away from their null value of zero. Different penalized-likelihood methods have been developed to mitigate this sparse-data bias in case-control studies. In this research article, we study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE towards zero. We propose a maximum marginal likelihood method for estimating m, with the marginal likelihood obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approximate approaches to maximizing the marginal likelihood: (i) a Monte Carlo EM algorithm and (ii) a combination of a Laplace approximation and derivative-free optimization of the marginal likelihood. We evaluate the statistical properties of the estimator through simulation studies and apply the methods to the analysis of genetic data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
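A minimal sketch of the two ingredients of approach (ii), for a single binary variant plus an intercept. The log-F(m, m) prior on the log-OR β has density exp(mβ/2)(1 + exp(β))^(−m) / B(m/2, m/2), so it adds a concave penalty to the logistic log-likelihood that keeps β finite even when every carrier is a case. Everything else here is an assumption for illustration, not the authors' implementation: the intercept gets a flat prior, the Laplace approximation is taken jointly over (intercept, log-OR), variants are treated as independent given m so their log marginals are summed, and m is chosen by a grid search as a simple stand-in for the derivative-free optimizer. All function names are hypothetical, and the Monte Carlo EM route is not shown.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log1pexp(t):
    """log(1 + exp(t)) without overflow."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def penalized_terms(x, y, m, a, b):
    """Log-posterior (flat prior on a), gradient and observed information for
    logit P(y = 1) = a + b * x with a log-F(m, m) prior on the log-OR b."""
    lp = ga = gb = haa = hab = hbb = 0.0
    for xi, yi in zip(x, y):
        eta = a + b * xi
        p = sigmoid(eta)
        w = p * (1.0 - p)
        lp += yi * eta - log1pexp(eta)
        ga += yi - p
        gb += xi * (yi - p)
        haa += w
        hab += xi * w
        hbb += xi * xi * w
    s = sigmoid(b)
    # log-F(m, m) log-density: m*b/2 - m*log(1 + e^b) - log B(m/2, m/2)
    lp += m * b / 2.0 - m * log1pexp(b) - (2.0 * math.lgamma(m / 2.0) - math.lgamma(m))
    gb += m / 2.0 - m * s          # gradient of the log-prior
    hbb += m * s * (1.0 - s)       # curvature of the log-prior
    return lp, (ga, gb), (haa, hab, hbb)


def fit_logf(x, y, m, iters=100, tol=1e-10):
    """Posterior mode (a, b) by Newton-Raphson; requires m > 0."""
    a = b = 0.0
    for _ in range(iters):
        _, (ga, gb), (haa, hab, hbb) = penalized_terms(x, y, m, a, b)
        det = haa * hbb - hab * hab  # > 0: information is positive definite
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def laplace_log_marginal(x, y, m):
    """Laplace approximation to log of the integral of likelihood * prior over (a, b)."""
    a, b = fit_logf(x, y, m)
    lp, _, (haa, hab, hbb) = penalized_terms(x, y, m, a, b)
    det = haa * hbb - hab * hab
    # d = 2 parameters: log[(2*pi)^(d/2)] = log(2*pi)
    return lp + math.log(2.0 * math.pi) - 0.5 * math.log(det)


def estimate_m(variants, grid):
    """Grid search for the m maximizing the summed log marginal likelihood."""
    return max(grid, key=lambda m: sum(laplace_log_marginal(x, y, m) for x, y in variants))


if __name__ == "__main__":
    y = [1] * 10 + [0] * 10            # hypothetical: 10 cases, 10 controls
    x = [1, 1] + [0] * 8 + [0] * 10    # variant carried by two cases only
    for m in (0.5, 1.0, 2.0, 4.0):
        print(m, fit_logf(x, y, m)[1])  # log-OR estimate shrinks as m grows
```

With two carriers who are both cases, the unpenalized MLE of β diverges, whereas `fit_logf` returns a finite positive estimate that moves toward zero as m increases.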

2017 ◽  
Vol 28 (3) ◽  
pp. 822-834
Author(s):  
Mitchell H Gail ◽  
Sebastien Haneuse

Sample size calculations are needed to design and assess the feasibility of case-control studies. Although such calculations are readily available for simple case-control designs and univariate analyses, there is limited theory and software for multivariate unconditional logistic analysis of case-control data. Here we outline the theory needed to detect scalar exposure effects or scalar interactions while controlling for other covariates in logistic regression. Both analytical and simulation methods are presented, together with links to the corresponding software.
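The simulation route to power for an adjusted exposure effect can be sketched as follows. This is not the authors' software: it assumes one binary exposure x, one binary confounder z, a control distribution over the four (x, z) cells chosen purely for illustration, and a two-sided 5% Wald test. Cases are drawn from the control distribution tilted by exp(βx + γz), which is what a logistic disease model implies under case-control sampling; the logistic model is then fitted by Newton-Raphson on the cell counts. Function names and inputs are hypothetical.

```python
import math
import random

# Cells of (exposure x, confounder z); probability lists follow this order.
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]
Z_975 = 1.959963984540054  # standard normal 0.975 quantile


def solve(a, b):
    """Solve a @ v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ZeroDivisionError("singular information matrix")
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (aug[r][n] - sum(aug[r][k] * v[k] for k in range(r + 1, n))) / aug[r][r]
    return v


def score_info(theta, cases, ctrls):
    """Score vector and Fisher information for logit P(case) = t0 + t1*x + t2*z."""
    g = [0.0] * 3
    info = [[0.0] * 3 for _ in range(3)]
    for (x, z), c, d in zip(CELLS, cases, ctrls):
        v = (1.0, x, z)
        p = 1.0 / (1.0 + math.exp(-sum(t * vi for t, vi in zip(theta, v))))
        n = c + d
        for i in range(3):
            g[i] += v[i] * (c - n * p)
            for j in range(3):
                info[i][j] += n * p * (1.0 - p) * v[i] * v[j]
    return g, info


def wald_z(cases, ctrls, iters=25):
    """Wald z for the exposure coefficient, or None if the fit breaks down."""
    theta = [0.0, 0.0, 0.0]
    try:
        for _ in range(iters):
            g, info = score_info(theta, cases, ctrls)
            theta = [t + s for t, s in zip(theta, solve(info, g))]
            if max(abs(t) for t in theta) > 30:
                return None  # (quasi-)separation: count as a failed fit
        _, info = score_info(theta, cases, ctrls)
        var_b = solve(info, [0.0, 1.0, 0.0])[1]  # [I^{-1}]_{11}
    except ZeroDivisionError:
        return None
    return theta[1] / math.sqrt(var_b)


def simulated_power(n_cases, n_ctrls, ctrl_probs, beta, gamma, reps=500, seed=1):
    """Fraction of simulated case-control studies rejecting H0: beta = 0."""
    w = [p * math.exp(beta * x + gamma * z) for (x, z), p in zip(CELLS, ctrl_probs)]
    case_probs = [wi / sum(w) for wi in w]
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        cases, ctrls = [0] * 4, [0] * 4
        for k in rng.choices(range(4), weights=case_probs, k=n_cases):
            cases[k] += 1
        for k in rng.choices(range(4), weights=ctrl_probs, k=n_ctrls):
            ctrls[k] += 1
        zstat = wald_z(cases, ctrls)
        if zstat is not None and abs(zstat) > Z_975:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    ctrl = [0.48, 0.24, 0.12, 0.16]  # hypothetical P(x, z | control), CELLS order
    for n in (50, 100, 200, 400):
        print(n, simulated_power(n, n, ctrl, math.log(2.0), math.log(1.5)))
```

Scanning n in this way and taking the smallest size that reaches the target power gives a simulation-based sample size; fits that separate are counted as non-rejections, which makes the estimate slightly conservative.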


Biostatistics ◽  
2020 ◽  
Author(s):  
Nadim Ballout ◽  
Cedric Garcia ◽  
Vivian Viallon

Summary: The analysis of case–control studies with several disease subtypes is increasingly common, e.g. in cancer epidemiology. For matched designs, a natural strategy is based on a stratified conditional logistic regression model. Then, to account for the potential homogeneity among disease subtypes, we adapt the ideas of data shared lasso, which has been recently proposed for the estimation of stratified regression models. For unmatched designs, we compare two standard methods based on $L_1$-norm penalized multinomial logistic regression. We describe formal connections between these two approaches, from which practical guidance can be derived. We show that one of these approaches, which is based on a symmetric formulation of the multinomial logistic regression model, actually reduces to a data shared lasso version of the other. Consequently, the relative performance of the two approaches critically depends on the level of homogeneity that exists among disease subtypes: more precisely, when homogeneity is moderate to high, the non-symmetric formulation with controls as the reference is not recommended. Empirical results obtained from synthetic data are presented, which confirm the benefit of properly accounting for potential homogeneity under both matched and unmatched designs, in terms of estimation and prediction accuracy, variable selection and identification of heterogeneities. We also present preliminary results from the analysis of a case–control study nested within the EPIC (European Prospective Investigation into Cancer and Nutrition) cohort, where the objective is to identify metabolites associated with the occurrence of subtypes of breast cancer.
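The reduction of the symmetric formulation to a data shared lasso version of the reference-coded one can be seen in a few lines. A sketch under stated assumptions, in our own notation rather than the authors': K subtypes plus controls (class 0), covariate vector $x$, unpenalized intercepts, and equal penalty weights on every coefficient vector.

```latex
\begin{aligned}
&\text{Symmetric model } (k = 0, \dots, K):\quad
P(Y = k \mid x) = \frac{\exp(\alpha_k + x^\top \beta_k)}{\sum_{j=0}^{K} \exp(\alpha_j + x^\top \beta_j)},
\qquad \text{penalty } \lambda \sum_{k=0}^{K} \lVert \beta_k \rVert_1 . \\
&\text{Reference coding } (k = 1, \dots, K):\quad
\log \frac{P(Y = k \mid x)}{P(Y = 0 \mid x)} = (\alpha_k - \alpha_0) + x^\top \theta_k,
\qquad \theta_k = \beta_k - \beta_0 . \\
&\text{Set } \beta := -\beta_0:\quad
\theta_k = \beta + \beta_k,
\qquad \lambda \sum_{k=0}^{K} \lVert \beta_k \rVert_1
= \lambda \Big( \lVert \beta \rVert_1 + \sum_{k=1}^{K} \lVert \beta_k \rVert_1 \Big).
\end{aligned}
```

The last line is a data shared lasso penalty on the reference-coded log-ORs $\theta_k$: a component $\beta$ shared by all subtypes plus subtype-specific deviations $\beta_k$, each deviation weighted 1. When subtypes are homogeneous, the deviations can be shrunk to zero so that the subtypes share $\beta$, which the plain reference-coded lasso (penalizing each $\theta_k$ separately) cannot exploit.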


2014 ◽  
Vol 121 (2) ◽  
pp. 285-296 ◽  
Author(s):  
Cody L. Nesvick ◽  
Clinton J. Thompson ◽  
Frederick A. Boop ◽  
Paul Klimo

Object: Observational studies, such as cohort and case-control studies, are valuable instruments in evidence-based medicine. Case-control studies, in particular, are becoming increasingly popular in the neurosurgical literature due to their low cost and relative ease of execution; however, no one has yet systematically assessed these types of studies for quality in methodology and reporting. Methods: The authors performed a literature search using PubMed/MEDLINE to identify all studies that explicitly identified themselves as “case-control” and were published in the JNS Publishing Group journals (Journal of Neurosurgery, Journal of Neurosurgery: Pediatrics, Journal of Neurosurgery: Spine, and Neurosurgical Focus) or Neurosurgery. Each paper was evaluated for 22 descriptive variables and then categorized as having either met or missed the basic definition of a case-control study. All studies that evaluated risk factors for a well-defined outcome were considered true case-control studies. The authors sought to identify key features or phrases that were or were not predictive of a true case-control study. Those papers that satisfied the definition were further evaluated using the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist. Results: The search detected 67 papers that met the inclusion criteria, of which 32 (48%) represented true case-control studies. The frequency of true case-control studies has not changed with time. Use of odds ratios (ORs) and logistic regression (LR) analysis were strong positive predictors of true case-control studies (for odds ratios, OR 15.33 and 95% CI 4.52–51.97; for logistic regression analysis, OR 8.77 and 95% CI 2.69–28.56). Conversely, negative predictors included focus on a procedure/intervention (OR 0.35, 95% CI 0.13–0.998) and use of the word “outcome” in the Results section (OR 0.23, 95% CI 0.082–0.65). After exclusion of nested case-control studies, the negative correlation between focus on a procedure/intervention and true case-control studies was strengthened (OR 0.053, 95% CI 0.0064–0.44). There was a trend toward a negative association between the use of survival analysis or Kaplan-Meier curves and true case-control studies (OR 0.13, 95% CI 0.015–1.12). True case-control studies were no more likely than their counterparts to use a potential study design “expert” (OR 1.50, 95% CI 0.57–3.95). The overall average STROBE score was 72% (range 50–86%). Examples of reporting deficiencies were reporting of bias (28%), missing data (55%), and funding (44%). Conclusions: The results of this analysis show that the majority of studies in the neurosurgical literature that identify themselves as “case-control” studies are, in fact, labeled incorrectly. Positive and negative predictors were identified. The authors provide several recommendations that may reverse the incorrect and inappropriate use of the term “case-control” and improve the quality of design and reporting of true case-control studies in neurosurgery.
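The ORs and 95% CIs above look like Wald-type intervals that are symmetric on the log scale: for example, √(4.52 × 51.97) ≈ 15.33 and √(2.69 × 28.56) ≈ 8.77. Assuming that (the abstract does not state how the intervals were computed), the standard error of log(OR) and an approximate two-sided p-value can be recovered from the reported numbers alone. A minimal sketch; the function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal 0.975 quantile


def from_reported_ci(or_hat, lower, upper, z=Z_975):
    """Back-calculate SE of log(OR), the Wald z statistic and the two-sided
    p-value, assuming the 95% CI is log(OR) +/- z * SE on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    z_stat = math.log(or_hat) / se
    p = math.erfc(abs(z_stat) / math.sqrt(2.0))  # = 2 * (1 - Phi(|z|))
    return se, z_stat, p


def geometric_midpoint(lower, upper):
    """For a log-symmetric interval, sqrt(lower * upper) recovers the OR."""
    return math.sqrt(lower * upper)


if __name__ == "__main__":
    reported = {
        "odds ratios": (15.33, 4.52, 51.97),
        "logistic regression": (8.77, 2.69, 28.56),
        "survival analysis / Kaplan-Meier": (0.13, 0.015, 1.12),
    }
    for name, (or_hat, lo, hi) in reported.items():
        se, z_stat, p = from_reported_ci(or_hat, lo, hi)
        print(f"{name}: SE={se:.3f} z={z_stat:.2f} p={p:.2g}")
```

For the survival-analysis association (OR 0.13, 95% CI 0.015–1.12) this gives p ≈ 0.06, in line with the abstract describing it as a trend rather than a significant association.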

