1,000x Faster Than PLINK: Genome-Wide Epistasis Detection with Logistic Regression Using Combined FPGA and GPU Accelerators

Author(s):  
Lars Wienbrandt ◽  
Jan Christian Kässens ◽  
Matthias Hübenthal ◽  
David Ellinghaus
Stroke ◽  
2013 ◽  
Vol 44 (suppl_1) ◽  
Author(s):  
May M Luke ◽  
Carmen H Tong ◽  
Joseph J Catanese ◽  
James J Devlin ◽  
Christine Mannhalter ◽  
...  

Introduction The International Stroke Genetics Consortium (ISGC) and the Wellcome Trust Case Control Consortium 2 (WTCCC2) performed a large genome-wide association study of ischemic stroke and its subtypes (large vessel stroke (LVD), small vessel stroke (SVD), cardioembolic stroke (CE)), and identified a polymorphism in HDAC9 (rs11984041) associated with the LVD subtype of ischemic stroke. Hypothesis We assessed the hypothesis that rs11984041 is associated with LVD in two additional studies. Methods The genotype of rs11984041 was determined for participants of the Vienna Study (815 controls, 122 LVD, 165 SVD, 202 CE) and of the German Study (1040 controls, 495 LVD, 230 SVD, 462 CE). The association of rs11984041 with LVD was assessed by logistic regression. Heterogeneity of the effect of rs11984041 on LVD, CE or SVD was assessed by testing the equality of the corresponding regression coefficients from a multinomial logistic regression model. Results Carriers of the minor (T) allele of rs11984041 (23.3% of LVD cases and 17.4% of controls), compared with noncarriers, had increased risk for LVD: the odds ratios (OR) were 1.92 (95% CI 1.25-2.96) for the Vienna Study and 1.33 (95% CI 1.02-1.74) for the German Study. Adjusting for covariates including sex, age, diabetes, and hypertension did not materially change the ORs. Heterogeneity of the effects of rs11984041 on LVD vs CE was significant in the Vienna Study (p = 0.009) and in the German Study (p = 0.005). Heterogeneity of the effects of rs11984041 on LVD vs SVD trended toward significance in the Vienna Study (p = 0.088) and was significant in the German Study (p = 0.047). Adjusting for covariates did not materially change the heterogeneity test p values. Conclusions The HDAC9 polymorphism rs11984041 was associated with the LVD stroke subtype in the Vienna Study and the German Study. These results replicated the ISGC/WTCCC2 findings.
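
The reported figures can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative and not part of the study's analysis: it computes the crude odds ratio implied by the pooled carrier frequencies (23.3% vs 17.4%), and it assumes the reported intervals are Wald intervals, which are symmetric on the log-odds scale, so the geometric mean of the bounds should sit close to the point estimate. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)

def crude_odds_ratio(p_cases, p_controls):
    """Unadjusted odds ratio from carrier frequencies in cases and controls."""
    return odds(p_cases) / odds(p_controls)

def ci_midpoint(lower, upper):
    """Geometric mean of the CI bounds; equals the OR for a Wald interval."""
    return math.sqrt(lower * upper)

def se_from_ci(lower, upper, level=0.95):
    """Standard error of log(OR) implied by a Wald interval (assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

# Pooled carrier frequencies quoted in the abstract.
print(round(crude_odds_ratio(0.233, 0.174), 2))  # → 1.44

# Reported intervals: Vienna 1.25-2.96, German 1.02-1.74.
print(round(ci_midpoint(1.25, 2.96), 2), round(ci_midpoint(1.02, 1.74), 2))
```

The pooled crude estimate falls between the two study-specific odds ratios (1.33 and 1.92), and the interval midpoints agree with the reported point estimates to within rounding.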


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jacqueline Milet ◽  
David Courtin ◽  
André Garcia ◽  
Hervé Perdry

Abstract Background Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for the mixed logistic regression (MLR). However, this test does not produce an estimate of the variants’ effects. We propose two computationally efficient methods to estimate the variants’ effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated and real genomic data from a recent GWAS in two geographically close populations in West Africa. Results We show that, when the disease prevalence differs between population strata, MLM is inappropriate for analyzing binary traits. MLR performs best in all circumstances. The variants’ effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, which improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Conclusion The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. mixed Cox model for survival analysis).
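
To make the idea of a stratified QQ diagnostic concrete, here is a minimal Python sketch. It is not the milorGWAS implementation (that package is in R), and the abstract does not say how variants are assigned to strata, so the `strata` labels are a hypothetical input. For each stratum the sketch returns the expected versus observed -log10 p-values and a genomic inflation factor, computed under the assumption of 1-df chi-square test statistics.

```python
import math
from collections import defaultdict
from statistics import NormalDist, median

_Z = NormalDist()

def qq_points(pvalues):
    """(expected, observed) -log10 p pairs under a uniform null."""
    obs = sorted(pvalues)
    n = len(obs)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(obs)]

def inflation_factor(pvalues):
    """Median observed 1-df chi-square statistic over its null median."""
    # For a two-sided test, chi2 = z**2 with z = inv_cdf(p / 2).
    chi2 = [_Z.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    null_median = _Z.inv_cdf(0.25) ** 2  # about 0.455
    return median(chi2) / null_median

def stratified_qq(pvalues, strata):
    """Group p-values by stratum label (hypothetical input), then
    compute QQ points and the inflation factor per stratum."""
    groups = defaultdict(list)
    for p, s in zip(pvalues, strata):
        groups[s].append(p)
    return {s: (qq_points(ps), inflation_factor(ps))
            for s, ps in groups.items()}

# Toy data: one well-calibrated stratum, one with p-values shifted downward.
n = 1000
null_p = [(i + 0.5) / n for i in range(n)]
inflated_p = [p ** 1.5 for p in null_p]
result = stratified_qq(null_p + inflated_p, ["A"] * n + ["B"] * n)
for label, (_, lam) in sorted(result.items()):
    print(label, round(lam, 2))
```

An inflation factor near 1 indicates calibrated p-values in that stratum; values clearly above or below 1 suggest inflation or deflation that a pooled QQ-plot could hide.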


2020 ◽  
Author(s):  
Jacqueline Milet ◽  
Hervé Perdry

Abstract Motivation Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for the mixed logistic regression (MLR). However, this test does not allow estimation of the variants’ effects. Results We propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulation sets and compared with other methods (MLM, logistic regression). MLR performs best in all circumstances. The variants’ effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, which improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Availability All methods are implemented in the R package milorGWAS available at https://github.com/genostats/[email protected] information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 28 (6) ◽  
pp. 1781-1792
Author(s):  
Flora Alarcon ◽  
Gregory Nuel

Detecting gene-environment (G × E) interactions in the context of genome-wide association studies (GWAS) is a challenging problem since standard methods generally lack power. An additional difficulty arises from the fact that the causal exposure is seldom observed; only a proxy of this exposure is available. This leads to a further drop in power and explains the failure of standard methods to detect interactions, even very strong ones. In this article, we consider the latent exposure as a source of heterogeneity and we propose a new powerful method, named “Breakpoint Model for Logistic Regression” (BMLR), based on a breakpoint model, in order to detect G × E interactions when the causal exposure is unobserved. First, the BMLR method is compared through simulations to the ordered-subset analysis for case-control method, which was developed for the same purpose. This highlights the ability of BMLR to detect the heterogeneity and, therefore, to detect interaction with a latent exposure. Finally, the BMLR method is compared to standard methods, such as PLINK, to perform a GWAS on a published realistic benchmark.


2009 ◽  
Vol 25 (6) ◽  
pp. 714-721 ◽  
Author(s):  
Tong Tong Wu ◽  
Yi Fang Chen ◽  
Trevor Hastie ◽  
Eric Sobel ◽  
Kenneth Lange

2012 ◽  
Vol 72 (7) ◽  
pp. 1249-1254 ◽  
Author(s):  
Gang Xie ◽  
Yue Lu ◽  
Ye Sun ◽  
Steven Shiyang Zhang ◽  
Edward Clark Keystone ◽  
...  

Objective To fine-map the NF-κB activating protein-like (NKAPL) locus identified in a prior genome-wide study as a possible rheumatoid arthritis (RA) risk locus and thereby delineate additional variants with stronger and/or independent disease association. Methods Genotypes for 101 SNPs across the NKAPL locus on chromosome 6p22.1 were obtained on 1368 Canadian RA cases and 1471 controls. Single marker associations were examined using logistic regression and the most strongly associated NKAPL locus SNPs then typed in another Canadian and a US-based RA case/control cohort. Results Fine-mapping analyses identified six NKAPL locus variants in a single haplotype block showing association with p ≤ 5.6×10⁻⁸ in the combined Canadian cohort. Among these SNPs, rs35656932 in the zinc finger 193 gene and rs13208096 in the NKAPL gene remained significant after conditional logistic regression, contributed independently to risk for disease, and were replicated in the US cohort (Pcomb = 4.24×10⁻¹⁰ and 2.44×10⁻⁹, respectively). These associations remained significant after conditioning on SNPs tagging the HLA-shared epitope (SE) DRB1*0401 allele and were significantly stronger in the HLA-SE negative versus positive subgroup, with a significant negative interaction apparent between HLA-DRB1 SE and NKAPL risk alleles. Conclusions By illuminating additional NKAPL variants with highly significant effects on risk that are distinct from, but interactive with those arising from the HLA-DRB1 locus, our data conclusively identify NKAPL as an RA susceptibility locus.

