1,000x Faster Than PLINK: Genome-Wide Epistasis Detection with Logistic Regression Using Combined FPGA and GPU Accelerators

GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies

BMC Bioinformatics ◽

10.1186/1471-2105-14-166 ◽

2013 ◽

Vol 14 (1) ◽

Cited By ~ 20

Author(s):

Karolina Sikorska ◽

Emmanuel Lesaffre ◽

Patrick FJ Groenen ◽

Paul HC Eilers

Keyword(s):

Logistic Regression ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide


Abstract TMP50: A Polymorphism in HDAC9 was Associated with Large Vessel Stroke in the Vienna and German Stroke Studies

Stroke ◽

10.1161/str.44.suppl_1.atmp50 ◽

2013 ◽

Vol 44 (suppl_1) ◽

Author(s):

May M Luke ◽

Carmen H Tong ◽

Joseph J Catanese ◽

James J Devlin ◽

Christine Mannhalter ◽

...

Keyword(s):

Logistic Regression ◽

Ischemic Stroke ◽

Genome Wide Association Study ◽

Multinomial Logistic Regression ◽

Large Vessel ◽

Stroke Subtype ◽

German Study ◽

Genome Wide ◽

Increased Risk ◽

Large Vessel Stroke

Introduction: The International Stroke Genetics Consortium (ISGC) and the Wellcome Trust Case Control Consortium 2 (WTCCC2) performed a large genome-wide association study of ischemic stroke and its subtypes (large vessel stroke (LVD), small vessel stroke (SVD), cardioembolic stroke (CE)), and identified a polymorphism in HDAC9 (rs11984041) associated with the LVD subtype of ischemic stroke. Hypothesis: We assessed the hypothesis that rs11984041 is associated with LVD in two additional studies. Methods: The genotype of rs11984041 was determined for participants of the Vienna Study (815 controls, 122 LVD, 165 SVD, 202 CE) and of the German Study (1040 controls, 495 LVD, 230 SVD, 462 CE). The association of rs11984041 with LVD was assessed by logistic regression. Heterogeneity of the effect of rs11984041 on LVD, CE or SVD was assessed by testing the equality of the corresponding regression coefficients from a multinomial logistic regression model. Results: Carriers of the minor (T) allele of rs11984041 (23.3% of LVD cases and 17.4% of controls), compared with noncarriers, had increased risk for LVD: the odds ratios (OR) were 1.92 (95% CI 1.25-2.96) for the Vienna Study and 1.33 (95% CI 1.02-1.74) for the German Study. Adjusting for covariates including sex, age, diabetes, and hypertension did not materially change the ORs. Heterogeneity of the effects of rs11984041 on LVD vs CE was significant in the Vienna Study (p = 0.009) and in the German Study (p = 0.005). Heterogeneity of the effects of rs11984041 on LVD vs SVD trended toward significance in the Vienna Study (p = 0.088) and was significant in the German Study (p = 0.047). Adjusting for covariates did not materially change the heterogeneity test p values. Conclusions: The HDAC9 polymorphism rs11984041 was associated with the LVD stroke subtype in the Vienna Study and the German Study. These results replicated the ISGC/WTCCC2 findings.
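The odds ratios and confidence intervals reported in this abstract can be put back on the log-odds scale, where logistic regression coefficients live. The following is a minimal sketch, not the authors' analysis: it assumes the published intervals are symmetric Wald intervals on the log scale with a critical value of 1.96, and the function name `wald_from_or_ci` is hypothetical. Because the inputs are rounded published values, the recovered standard errors and p values are only approximate.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover log-OR, its standard error, a Wald z and a two-sided p value.

    Assumption: the interval is exp(log_or +/- z_crit * se), i.e. a Wald
    interval that is symmetric on the log-odds scale.
    """
    log_or = math.log(odds_ratio)
    # The interval width on the log scale equals 2 * z_crit * se.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p_value

# Odds ratios and 95% CIs as reported in the abstract above.
reported = {
    "Vienna Study": (1.92, 1.25, 2.96),
    "German Study": (1.33, 1.02, 1.74),
}

for study, (or_, low, high) in reported.items():
    log_or, se, z, p = wald_from_or_ci(or_, low, high)
    print(f"{study}: log(OR)={log_or:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Working on the log scale is also what makes the two studies comparable: the same standard errors would be the inputs to an inverse-variance meta-analysis, which the abstract does not report.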


Genome-wide discovery of monotone mutations using logistic regression analysis of evolutionary experiment with Drosophila

Journal of the Korean Data and Information Science Society ◽

10.7465/jkdi.2019.30.2.503 ◽

2019 ◽

Vol 30 (2) ◽

pp. 503-513

Author(s):

Minjung Kwak

Keyword(s):

Logistic Regression ◽

Regression Analysis ◽

Logistic Regression Analysis ◽

Genome Wide


Mixed logistic regression in genome-wide association studies

BMC Bioinformatics ◽

10.1186/s12859-020-03862-2 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Jacqueline Milet ◽

David Courtin ◽

André Garcia ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Cox Model ◽

Association Studies ◽

Scale Up ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract. Background: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for the mixed logistic regression (MLR). However, this test does not produce an estimate of the variants’ effects. We propose two computationally efficient methods to estimate the variants’ effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated and real genomic data from a recent GWAS in two geographically close populations in West Africa. Results: We show that, when the disease prevalence differs between population strata, MLM is inappropriate to analyze binary traits. MLR performs the best in all circumstances. The variants’ effects are well evaluated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Conclusion: The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. mixed Cox model for survival analysis).
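The remark that a score test gives no effect estimate is easier to see in a stripped-down setting. The sketch below is not GMMAT and not the milorGWAS implementation: it assumes an ordinary logistic regression with an intercept-only null model and no random effect, and the function name and toy data are hypothetical. Only the null model is fitted (here in closed form, as the case fraction), so the test returns a statistic and a p value but never a coefficient for the variant.

```python
import math

def score_test_intercept_only(genotypes, phenotypes):
    """Score test for one variant in a plain logistic regression.

    Assumptions: binary phenotypes coded 0/1, genotypes coded as allele
    counts, null model with an intercept only (no covariates, no random
    effect). Requires some variation in both genotypes and phenotypes.
    """
    n = len(phenotypes)
    p0 = sum(phenotypes) / n          # fitted probability under the null
    g_mean = sum(genotypes) / n
    # Score: derivative of the log-likelihood w.r.t. the variant effect at 0.
    u = sum(g * (y - p0) for g, y in zip(genotypes, phenotypes))
    # Information for the variant effect, adjusted for the intercept.
    v = p0 * (1 - p0) * sum((g - g_mean) ** 2 for g in genotypes)
    stat = u * u / v                  # asymptotically chi-square, 1 df
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail
    return stat, p_value

# Toy data for illustration only (not from the paper).
genotypes = [0, 0, 1, 1, 2, 2, 0, 1]
phenotypes = [0, 0, 0, 1, 1, 1, 0, 1]
stat, p_value = score_test_intercept_only(genotypes, phenotypes)
print(f"score statistic = {stat:.3f}, p = {p_value:.4f}")
```

Getting an effect size would require fitting the alternative model for each variant, which is the cost that approximate estimation methods of the kind the abstract proposes aim to reduce.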


CUDA-LR: CUDA-accelerated logistic regression analysis tool for gene-gene interaction for genome-wide association study

2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) ◽

10.1109/bibmw.2011.6112454 ◽

2011 ◽

Author(s):

Sungyoung Lee ◽

Min-Seok Kwon ◽

Ik-Soo Huh ◽

Taesung Park

Keyword(s):

Logistic Regression ◽

Regression Analysis ◽

Association Study ◽

Logistic Regression Analysis ◽

Genome Wide Association Study ◽

Gene Interaction ◽

Genome Wide Association ◽

Analysis Tool ◽

Genome Wide


Mixed Logistic Regression in Genome-Wide Association Studies

10.1101/2020.01.17.910109 ◽

2020 ◽

Author(s):

Jacqueline Milet ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Association Studies ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mixed Linear Models ◽

Genome Wide

Abstract. Motivation: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for the mixed logistic regression (MLR). However, this test does not allow an estimation of the variants’ effects. Results: We propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulation sets, and compared with other methods (MLM, logistic regression). MLR performs the best in all circumstances. The variants’ effects are well evaluated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Availability: All methods are implemented in the R package milorGWAS available at https://github.com/genostats/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Detecting latent exposure in genome-wide association studies using a breakpoint model for logistic regression

Statistical Methods in Medical Research ◽

10.1177/0962280218776385 ◽

2018 ◽

Vol 28 (6) ◽

pp. 1781-1792

Author(s):

Flora Alarcon ◽

Gregory Nuel

Keyword(s):

Logistic Regression ◽

Control Method ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Powerful Method ◽

Challenging Problem ◽

Standard Methods ◽

Gene Environment ◽

Genome Wide

Detecting gene-environment (G × E) interactions in the context of genome-wide association studies (GWAS) is a challenging problem since standard methods generally lack power. An additional difficulty arises from the fact that the causal exposure is seldom observed; only a proxy of this exposure is. This leads to a further drop in power and explains the failure of standard methods to detect interactions, even very strong ones. In this article, we consider the latent exposure as a source of heterogeneity and we propose a new powerful method, named “Breakpoint Model for Logistic Regression” (BMLR), based on a breakpoint model, in order to detect G × E interactions when causal exposure is unobserved. First, the BMLR method is compared through simulations to the ordered-subset analysis for case-control method, which was developed for the same purpose. This highlights the ability of BMLR to detect the heterogeneity, and therefore to detect interaction with latent exposure. Finally, the BMLR method is compared to standard methods, such as PLINK, to perform a GWAS on a published realistic benchmark.


Genome-wide association analysis by lasso penalized logistic regression

Bioinformatics ◽

10.1093/bioinformatics/btp041 ◽

2009 ◽

Vol 25 (6) ◽

pp. 714-721 ◽

Cited By ~ 413

Author(s):

Tong Tong Wu ◽

Yi Fang Chen ◽

Trevor Hastie ◽

Eric Sobel ◽

Kenneth Lange

Keyword(s):

Logistic Regression ◽

Association Analysis ◽

Genome Wide Association ◽

Genome Wide Association Analysis ◽

Genome Wide ◽

Penalized Logistic Regression


A Differential Privacy Preserving Approach for Logistic Regression in Genome-Wide Association Studies

2019 International Conference on Networking and Network Applications (NaNA) ◽

10.1109/nana.2019.00040 ◽

2019 ◽

Author(s):

Ziwei Han ◽

Laifeng Lu ◽

Hai Liu

Keyword(s):

Logistic Regression ◽

Differential Privacy ◽

Association Studies ◽

Privacy Preserving ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide


Identification of the NF-κB activating protein-like locus as a risk locus for rheumatoid arthritis

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2012-202076 ◽

2012 ◽

Vol 72 (7) ◽

pp. 1249-1254 ◽

Cited By ~ 4

Author(s):

Gang Xie ◽

Yue Lu ◽

Ye Sun ◽

Steven Shiyang Zhang ◽

Edward Clark Keystone ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Logistic Regression ◽

Haplotype Block ◽

Conditional Logistic Regression ◽

Risk Alleles ◽

Genome Wide ◽

The Us ◽

Single Marker ◽

Risk Locus ◽

Genome Wide Study

Objective: To fine-map the NF-κB activating protein-like (NKAPL) locus identified in a prior genome-wide study as a possible rheumatoid arthritis (RA) risk locus and thereby delineate additional variants with stronger and/or independent disease association. Methods: Genotypes for 101 SNPs across the NKAPL locus on chromosome 6p22.1 were obtained on 1368 Canadian RA cases and 1471 controls. Single marker associations were examined using logistic regression and the most strongly associated NKAPL locus SNPs then typed in another Canadian and a US-based RA case/control cohort. Results: Fine-mapping analyses identified six NKAPL locus variants in a single haplotype block showing association with p ≤ 5.6×10⁻⁸ in the combined Canadian cohort. Among these SNPs, rs35656932 in the zinc finger 193 gene and rs13208096 in the NKAPL gene remained significant after conditional logistic regression, contributed independently to risk for disease, and were replicated in the US cohort (Pcomb = 4.24×10⁻¹⁰ and 2.44×10⁻⁹, respectively). These associations remained significant after conditioning on SNPs tagging the HLA-shared epitope (SE) DRB1*0401 allele and were significantly stronger in the HLA-SE negative versus positive subgroup, with a significant negative interaction apparent between HLA-DRB1 SE and NKAPL risk alleles. Conclusions: By illuminating additional NKAPL variants with highly significant effects on risk that are distinct from, but interactive with, those arising from the HLA-DRB1 locus, our data conclusively identify NKAPL as an RA susceptibility locus.
