Penalized logistic regression for high-dimensional DNA methylation data with case-control studies

2012 ◽  
Vol 28 (10) ◽  
pp. 1368-1375 ◽  
Author(s):  
Hokeun Sun ◽  
Shuang Wang
Author(s):  
Ying Yu ◽  
Siyuan Chen ◽  
Brad McNeney

Abstract: In genetic epidemiology, rare variant case-control studies aim to investigate the association between rare genetic variants and human diseases. Rare genetic variants lead to sparse covariates that are predominantly zeros, and this sparseness leads to estimators of log-OR parameters that are biased away from their null value of zero. Different penalized-likelihood methods have been developed to mitigate this sparse-data bias for case-control studies. In this research article, we study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE towards zero. We propose a maximum marginal likelihood method for estimating m, with the marginal likelihood obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approximate approaches to maximizing the marginal likelihood: (i) a Monte Carlo EM algorithm and (ii) a combination of a Laplace approximation and derivative-free optimization of the marginal likelihood. We evaluate the statistical properties of the estimator through simulation studies and apply the methods to the analysis of genetic data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
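To make the shrinkage idea concrete, here is a minimal sketch of logistic regression with a log-F(m, m) penalty on a single log-OR, fitted by Newton-Raphson. It is not the authors' implementation: the function name `fit_logf_logistic`, the toy carrier counts, and the fixed value m = 2 are all assumptions for illustration, and m is held fixed rather than estimated by marginal likelihood as the abstract proposes. The penalty term is the log of the log-F(m, m) density, (m/2)·b − m·log(1 + exp(b)), up to a constant.

```python
import math

def fit_logf_logistic(x, y, m=0.0, iters=100, tol=1e-10):
    """Fit logit P(y=1) = a + b*x with a log-F(m, m) penalty on b only.

    m = 0 gives the ordinary MLE. The intercept a is left unpenalized.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = 0.0                 # gradient of the penalized log-likelihood
        haa = hab = hbb = 0.0         # negative Hessian entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += yi - p
            gb += (yi - p) * xi
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        # Penalty (m/2)*b - m*log(1 + exp(b)): gradient and curvature in b.
        q = 1.0 / (1.0 + math.exp(-b))
        gb += m * (0.5 - q)
        hbb += m * q * (1.0 - q)
        # Solve the 2x2 Newton system by hand.
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Hypothetical sparse data: 100 controls (1 carrier) and 100 cases (4 carriers).
x = [1] * 1 + [0] * 99 + [1] * 4 + [0] * 96
y = [0] * 100 + [1] * 100

_, b_mle = fit_logf_logistic(x, y, m=0.0)
_, b_pen = fit_logf_logistic(x, y, m=2.0)
print("MLE log-OR:      ", b_mle)
print("penalized log-OR:", b_pen)
```

Because both the log-likelihood and the penalty are concave in the log-OR, the penalized estimate falls between zero and the MLE; larger m pulls it further toward zero.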


2017 ◽  
Vol 28 (3) ◽  
pp. 822-834
Author(s):  
Mitchell H Gail ◽  
Sebastien Haneuse

Sample size calculations are needed to design and assess the feasibility of case-control studies. Although such calculations are readily available for simple case-control designs and univariate analyses, there is limited theory and software for multivariate unconditional logistic analysis of case-control data. Here we outline the theory needed to detect scalar exposure effects or scalar interactions while controlling for other covariates in logistic regression. Both analytical and simulation methods are presented, together with links to the corresponding software.


Cells ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 3384
Author(s):  
Chenglong Yu ◽  
Allison M. Hodge ◽  
Ee Ming Wong ◽  
Jihoon Eric Joo ◽  
Enes Makalic ◽  
...  

Genetic variants in FOXO3 are associated with longevity. Here, we assessed whether blood DNA methylation at FOXO3 was associated with cancer risk, survival, and mortality. We used data from eight prospective case–control studies of breast (n = 409 cases), colorectal (n = 835), gastric (n = 170), kidney (n = 143), lung (n = 332), prostate (n = 869), and urothelial (n = 428) cancer and B-cell lymphoma (n = 438). Case–control pairs were matched on age, sex, country of birth, and smoking (lung cancer study). Conditional logistic regression was used to assess associations between cancer risk and methylation at 45 CpGs of FOXO3 included on the HumanMethylation450 assay. Mixed-effects Cox models were used to estimate hazard ratios (HR) and 95% confidence intervals (CI) for associations with cancer survival (total n = 2286 deaths). Additionally, using data from 1088 older participants, we assessed associations of FOXO3 methylation with overall and cause-specific mortality (n = 354 deaths). Methylation at a CpG in the first exon region of FOXO3 (6:108882981) was associated with gastric cancer survival (HR = 2.39, 95% CI: 1.60–3.56, p = 1.9 × 10⁻⁵). Methylation at three CpGs in TSS1500 and gene body was associated with lung cancer survival (p < 6.1 × 10⁻⁵). We found no evidence of associations of FOXO3 methylation with cancer risk and mortality. Our findings may contribute to understanding the implication of FOXO3 in longevity.
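For readers unfamiliar with conditional logistic regression on 1:1 matched pairs, the following is a minimal sketch under simplifying assumptions, not the analysis used in the study. With one covariate, each pair contributes expit(β·d) to the conditional likelihood, where d is the case value minus the control value. The function name `fit_conditional_logistic_pairs` and the pair counts are hypothetical; a binary exposure is used (instead of a continuous methylation value) only because the estimate can then be checked against the closed form log(n₁₀/n₀₁), the log ratio of discordant pairs.

```python
import math

def fit_conditional_logistic_pairs(diffs, iters=100, tol=1e-12):
    """Maximize sum(log expit(beta * d)) over beta by Newton-Raphson.

    diffs holds d = x_case - x_control for each 1:1 matched pair.
    """
    beta = 0.0
    for _ in range(iters):
        grad = 0.0
        info = 0.0   # negative second derivative
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            grad += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical pairs: 30 with only the case exposed (d = +1),
# 12 with only the control exposed (d = -1), 58 concordant (d = 0).
diffs = [1] * 30 + [-1] * 12 + [0] * 58

beta_hat = fit_conditional_logistic_pairs(diffs)
print("conditional log-OR:", beta_hat)
print("closed form:       ", math.log(30 / 12))
```

Concordant pairs have d = 0 and drop out of both the gradient and the information, which is why only discordant pairs determine the estimate.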


PLoS ONE ◽  
2019 ◽  
Vol 14 (5) ◽  
pp. e0217057 ◽  
Author(s):  
Sam Doerken ◽  
Marta Avalos ◽  
Emmanuel Lagarde ◽  
Martin Schumacher

Biostatistics ◽  
2020 ◽  
Author(s):  
Nadim Ballout ◽  
Cedric Garcia ◽  
Vivian Viallon

Summary The analysis of case–control studies with several disease subtypes is increasingly common, e.g. in cancer epidemiology. For matched designs, a natural strategy is based on a stratified conditional logistic regression model. Then, to account for the potential homogeneity among disease subtypes, we adapt the ideas of data shared lasso, which has been recently proposed for the estimation of stratified regression models. For unmatched designs, we compare two standard methods based on $L_1$-norm penalized multinomial logistic regression. We describe formal connections between these two approaches, from which practical guidance can be derived. We show that one of these approaches, which is based on a symmetric formulation of the multinomial logistic regression model, actually reduces to a data shared lasso version of the other. Consequently, the relative performance of the two approaches critically depends on the level of homogeneity that exists among disease subtypes: more precisely, when homogeneity is moderate to high, the non-symmetric formulation with controls as the reference is not recommended. Empirical results obtained from synthetic data are presented, which confirm the benefit of properly accounting for potential homogeneity under both matched and unmatched designs, in terms of estimation and prediction accuracy, variable selection and identification of heterogeneities. We also present preliminary results from the analysis of a case–control study nested within the EPIC (European Prospective Investigation into Cancer and nutrition) cohort, where the objective is to identify metabolites associated with the occurrence of subtypes of breast cancer.


Entropy ◽  
2020 ◽  
Vol 22 (5) ◽  
pp. 543 ◽  
Author(s):  
Konrad Furmańczyk ◽  
Wojciech Rejchel

In this paper, we consider prediction and variable selection in misspecified binary classification models under the high-dimensional scenario. We focus on two approaches to classification that are computationally efficient but lead to model misspecification. The first is to apply penalized logistic regression to classification data that possibly do not follow the logistic model. The second method is even more radical: we simply treat the class labels of objects as if they were numbers and apply penalized linear regression. We thoroughly investigate these two approaches and provide conditions that guarantee they are successful in prediction and variable selection. Our results hold even if the number of predictors is much larger than the sample size. The paper concludes with experimental results.
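The second approach can be sketched as follows: 0/1 class labels are centered and fed to an L1-penalized least-squares fit as if they were numeric responses. This is only an illustration under stated assumptions, not the authors' procedure: the abstract says "penalized linear regression" without naming the penalty, so the lasso is an assumed choice, and the coordinate-descent solver `lasso_cd`, the simulated data, and the penalty level are all hypothetical. Features are not centered and no intercept is fitted, for brevity.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(cols, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    cols is a list of p feature columns, each of length n; y is centered.
    """
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)
    col_sq = [sum(v * v for v in c) / n for c in cols]
    for _ in range(sweeps):
        for j, c in enumerate(cols):
            # Correlation of column j with the partial residual.
            rho = sum(v * r for v, r in zip(c, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * v for r, v in zip(resid, c)]
                beta[j] = new
    return beta

# Simulated data with more predictors than samples; only feature 0 carries signal.
rng = random.Random(0)
n, p = 60, 200
cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
labels = [1.0 if cols[0][i] + 0.5 * rng.gauss(0.0, 1.0) > 0 else 0.0 for i in range(n)]

# Treat the 0/1 labels as numbers: center them and run the lasso.
mean_label = sum(labels) / n
y = [v - mean_label for v in labels]
lam_max = max(abs(sum(v * r for v, r in zip(c, y))) / n for c in cols)
beta = lasso_cd(cols, y, 0.5 * lam_max)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected features:", selected)
```

Here `lam_max` is the smallest penalty at which every coefficient is zero, so any value below it selects at least one predictor; in practice the penalty level would be tuned, for example by cross-validation.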

