Multinomial logistic regression approach to haplotype association analysis in population-based case-control studies

BMC Genetics ◽  
2006 ◽  
Vol 7 (1) ◽  
Author(s):  
Yi-Hau Chen ◽  
Jau-Tsuen Kao
Biostatistics ◽  
2020 ◽  
Author(s):  
Nadim Ballout ◽  
Cedric Garcia ◽  
Vivian Viallon

Summary: The analysis of case–control studies with several disease subtypes is increasingly common, e.g. in cancer epidemiology. For matched designs, a natural strategy is based on a stratified conditional logistic regression model. Then, to account for the potential homogeneity among disease subtypes, we adapt the ideas of data shared lasso, which has been recently proposed for the estimation of stratified regression models. For unmatched designs, we compare two standard methods based on $L_1$-norm penalized multinomial logistic regression. We describe formal connections between these two approaches, from which practical guidance can be derived. We show that one of these approaches, which is based on a symmetric formulation of the multinomial logistic regression model, actually reduces to a data shared lasso version of the other. Consequently, the relative performance of the two approaches critically depends on the level of homogeneity that exists among disease subtypes: more precisely, when homogeneity is moderate to high, the non-symmetric formulation with controls as the reference is not recommended. Empirical results obtained from synthetic data are presented, which confirm the benefit of properly accounting for potential homogeneity under both matched and unmatched designs, in terms of estimation and prediction accuracy, variable selection and identification of heterogeneities. We also present preliminary results from the analysis of a case–control study nested within the EPIC (European Prospective Investigation into Cancer and nutrition) cohort, where the objective is to identify metabolites associated with the occurrence of subtypes of breast cancer.
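
The connection stated in the summary above can be sketched in a few lines of LaTeX. The notation is an assumption introduced here for illustration and is not taken from the abstract: $K$ disease subtypes, controls coded as $k = 0$, covariate vector $x$, coefficient vectors $\beta_k$ and $\gamma_k$, tuning parameter $\lambda$, and intercepts left out of the penalty.

```latex
% Non-symmetric formulation: controls (k = 0) are the reference, beta_0 = 0.
\[
P(Y = k \mid x) = \frac{\exp(x^\top \beta_k)}{1 + \sum_{j=1}^{K} \exp(x^\top \beta_j)},
\quad k = 1, \dots, K,
\qquad \text{penalty: } \lambda \sum_{k=1}^{K} \lVert \beta_k \rVert_1 .
\]

% Symmetric (over-parameterized) formulation: every class has its own vector.
\[
P(Y = k \mid x) = \frac{\exp(x^\top \gamma_k)}{\sum_{j=0}^{K} \exp(x^\top \gamma_j)},
\quad k = 0, \dots, K,
\qquad \text{penalty: } \lambda \sum_{k=0}^{K} \lVert \gamma_k \rVert_1 .
\]

% The likelihood depends on the gamma_k only through beta_k = gamma_k - gamma_0.
% Writing mu = -gamma_0, so that gamma_k = beta_k - mu, the symmetric penalty is
\[
\lambda \sum_{k=0}^{K} \lVert \gamma_k \rVert_1
= \lambda \Bigl( \lVert \mu \rVert_1 + \sum_{k=1}^{K} \lVert \beta_k - \mu \rVert_1 \Bigr).
\]
```

Read this way, $\mu$ plays the role of an effect shared by all subtypes and $\beta_k - \mu$ the subtype-specific deviation, which is the decomposition that data shared lasso penalizes. The sketch is only meant to make the claimed reduction plausible; the paper itself should be consulted for the exact statement and weighting.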


2005 ◽  
Vol 86 (3) ◽  
pp. 223-231 ◽  
Author(s):  
QIHUA TAN ◽  
LENE CHRISTIANSEN ◽  
KAARE CHRISTENSEN ◽  
LISE BATHUM ◽  
SHUXIA LI ◽  
...  

Haplotype inference has become an important part of human genetic data analysis due to its functional and statistical advantages over the single-locus approach in linkage disequilibrium mapping. Different statistical methods have been proposed for detecting haplotype-disease associations using unphased multi-locus genotype data, ranging from the early simple gene-counting method to recent work using the generalized linear model. However, these methods are either confined to the case-control design or unable to yield unbiased point and interval estimates of haplotype effects. Based on the popular logistic regression model, we present a new approach for haplotype association analysis of human disease traits. Using haplotype-based parameterization, our model infers the effects of specific haplotypes (point estimation) and constructs confidence intervals for the risks of haplotypes (interval estimation). Based on the estimated parameters, the model calculates haplotype frequency conditional on the trait value for both discrete and continuous traits. Moreover, our model provides an overall significance level for the association between the disease trait and a group or all of the haplotypes. Featuring direct maximization in haplotype estimation, our method also facilitates a computer simulation approach for correcting the significance level of individual haplotypes to adjust for multiple testing. We show, by applying the model to an empirical data set, that our method based on the well-known logistic regression model is a useful tool for haplotype association analysis of human disease traits.


Neurology ◽  
2018 ◽  
Vol 90 (7) ◽  
pp. e583-e592 ◽  
Author(s):  
Pei-Chen Lee ◽  
Ismaïl Ahmed ◽  
Marie-Anne Loriot ◽  
Claire Mulot ◽  
Kimberly C. Paul ◽  
...  

Objective: To investigate whether cigarette smoking interacts with genes involved in individual susceptibility to xenobiotics for the risk of Parkinson disease (PD). Methods: Two French population-based case-control studies (513 patients, 1,147 controls) were included as a discovery sample to examine gene-smoking interactions based on 3,179 single nucleotide polymorphisms (SNPs) in 289 genes involved in individual susceptibility to xenobiotics. SNP-by-cigarette smoking interactions were tested in the discovery sample through an empirical Bayes (EB) approach. Nine SNPs were selected for replication in a population-based case-control study from California (410 patients, 845 controls) with standard logistic regression and the EB approach. For SNPs that replicated, we performed pooled analyses including the discovery and replication datasets and computed pooled odds ratios and confidence intervals (CIs) using random-effects meta-analysis. Results: Nine SNPs interacted with smoking in the discovery dataset and were selected for replication. Interactions of smoking with rs4240705 in the RXRA gene and rs1900586 in the SLC17A6 gene were replicated. In pooled analyses (logistic regression), the interactions between smoking and rs4240705-G and rs1900586-G were 1.66 (95% CI 1.28–2.14, p = 1.1 × 10⁻⁴, p for heterogeneity = 0.366) and 1.61 (95% CI 1.17–2.21, p = 0.003, p for heterogeneity = 0.616), respectively. For both SNPs, while smoking was significantly less frequent in patients than controls in AA homozygotes, this inverse association disappeared in G allele carriers. Conclusions: We identified and replicated suggestive gene-by-smoking interactions in PD. The inverse association of smoking with PD was less pronounced in carriers of minor alleles of both RXRA-rs4240705 and SLC17A6-rs1900586. These findings may help identify biological pathways involved in the inverse association between smoking and PD.
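
The pooling step mentioned in the abstract above (pooled odds ratios and CIs from a random-effects meta-analysis) can be illustrated with a minimal Python sketch. The abstract does not say which random-effects estimator was used; the DerSimonian-Laird estimator below is an assumption, and the function name `pool_random_effects` and its inputs are hypothetical.

```python
import math


def pool_random_effects(log_ors, ses, z=1.96):
    """Pool study-level log odds ratios with a DerSimonian-Laird model.

    log_ors: per-study log odds ratios (hypothetical inputs).
    ses:     per-study standard errors of those log odds ratios.
    Returns (pooled OR, lower CI bound, upper CI bound, tau^2).
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum_w

    # Cochran's Q measures between-study heterogeneity.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    return (
        math.exp(pooled),
        math.exp(pooled - z * se_pooled),
        math.exp(pooled + z * se_pooled),
        tau2,
    )
```

A call takes one log odds ratio and one standard error per dataset, for example `pool_random_effects([math.log(1.5), math.log(1.8)], [0.20, 0.25])`; these numbers are made up for illustration and are not the study's estimates.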


Cancers ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1378
Author(s):  
Tú Nguyen-Dumont ◽  
James G. Dowty ◽  
Jason A. Steen ◽  
Anne-Laure Renault ◽  
Fleur Hammet ◽  
...  

Case-control studies of breast cancer have consistently shown that pathogenic variants in CHEK2 are associated with about a 3-fold increased risk of breast cancer. Information about the recurrent protein-truncating variant CHEK2 c.1100delC dominates this estimate. There have been no formal estimates of age-specific cumulative risk of breast cancer for all CHEK2 pathogenic (including likely pathogenic) variants combined. We conducted a population-based case-control-family study of pathogenic CHEK2 variants (26 families, 1071 relatives) and estimated the age-specific cumulative risk of breast cancer using segregation analysis. The estimated hazard ratio for carriers of pathogenic CHEK2 variants (combined) was 4.9 (95% CI 2.5–9.5) relative to non-carriers. The HR for carriers of the CHEK2 c.1100delC variant was estimated to be 3.5 (95% CI 1.02–11.6) and the HR for carriers of all other CHEK2 variants combined was estimated to be 5.7 (95% CI 2.5–12.9). The age-specific cumulative risk of breast cancer was estimated to be 18% (95% CI 11–30%) and 33% (95% CI 21–48%) to age 60 and 80 years, respectively. These findings provide important information for the clinical management of breast cancer risk for women carrying pathogenic variants in CHEK2.


2017 ◽  
Vol 28 (3) ◽  
pp. 822-834
Author(s):  
Mitchell H Gail ◽  
Sebastien Haneuse

Sample size calculations are needed to design and assess the feasibility of case-control studies. Although such calculations are readily available for simple case-control designs and univariate analyses, there is limited theory and software for multivariate unconditional logistic analysis of case-control data. Here we outline the theory needed to detect scalar exposure effects or scalar interactions while controlling for other covariates in logistic regression. Both analytical and simulation methods are presented, together with links to the corresponding software.
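
For orientation, the "simple case-control design and univariate analysis" case that the abstract above describes as readily available can be sketched as follows. This is the classical two-proportion approximation for an unmatched design with one control per case and a binary exposure. It is not the multivariate method the paper develops, and the function names and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def exposure_prev_in_cases(p0, odds_ratio):
    """Exposure prevalence among cases implied by the control prevalence p0
    and the target odds ratio."""
    return odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))


def cases_needed(p0, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of controls) needed
    to detect `odds_ratio` with a two-sided test of two proportions."""
    if odds_ratio == 1.0:
        raise ValueError("odds_ratio must differ from 1")
    p1 = exposure_prev_in_cases(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0

    # Standard normal quantiles for the significance level and the power.
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)

    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)
```

Calling `cases_needed(0.3, 2.0)` gives the approximate number of cases for a 30% exposure prevalence in controls and a target odds ratio of 2. Adjusting for other covariates, as the paper discusses, generally changes the required sample size, so this figure is only a rough starting point.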


2017 ◽  
Vol 17 (9) ◽  
pp. 965-973 ◽  
Author(s):  
Grant A Mackenzie ◽  
Philip C Hill ◽  
Shah M Sahito ◽  
David J Jeffries ◽  
Ilias Hossain ◽  
...  
