A mixed model reduces spurious genetic associations produced by population stratification in genome-wide association studies

Abstract Background Genome-wide Association Studies (GWAS) have contributed to unraveling associations between genetic variants in the human genome and complex traits for more than a decade. While many works have been invented as follow-ups to detect interactions between SNPs, epistasis are still yet to be modeled and discovered more thoroughly. Results In this paper, following the previous study of detecting marginal epistasis signals, and motivated by the universal approximation power of deep learning, we propose a neural network method that can potentially model arbitrary interactions between SNPs in genetic association studies as an extension to the mixed models in correcting confounding factors. Our method, namely Deep Mixed Model, consists of two components: 1) a confounding factor correction component, which is a large-kernel convolution neural network that focuses on calibrating the residual phenotypes by removing factors such as population stratification, and 2) a fixed-effect estimation component, which mainly consists of an Long-short Term Memory (LSTM) model that estimates the association effect size of SNPs with the residual phenotype. Conclusions After validating the performance of our method using simulation experiments, we further apply it to Alzheimer’s disease data sets. Our results help gain some explorative understandings of the genetic architecture of Alzheimer’s disease.

Download Full-text

A practical approach to adjusting for population stratification in genome-wide association studies: principal components and propensity scores (PCAPS)

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2017-0054 ◽

2018 ◽

Vol 17 (6) ◽

Cited By ~ 2

Author(s):

Huaqing Zhao ◽

Nandita Mitra ◽

Peter A. Kanetsky ◽

Katherine L. Nathanson ◽

Timothy R. Rebbeck

Keyword(s):

Principal Components ◽

Population Stratification ◽

Propensity Scores ◽

Association Studies ◽

Germ Cell Tumors ◽

Gwas Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Testicular Germ Cell ◽

Genome Wide

Abstract Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct bias due to PS is principal components (PCs) analysis (PCA), but there is no objective method to guide which PCs to include as covariates. Often, the ten PCs with the highest eigenvalues are included to adjust for PS. This selection is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare a principal components and propensity scores (PCAPS) approach to PCA and EMMAX using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, resulting in odds ratio (OR) estimates closer to the true OR. We illustrate our PCAPS method using GWAS data from a study of testicular germ cell tumors. PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduction of bias compared to PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.

Download Full-text

GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies

10.1101/783100 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jan A. Freudenthal ◽

Markus J. Ankenbrand ◽

Dominik G. Grimm ◽

Arthur Korte

Keyword(s):

Complex Traits ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Large Datasets ◽

Genome Wide Association ◽

Small Data ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Non Gaussian

AbstractMotivationGenome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations has become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand for these analyses grew remarkably during the last decades. This is especially true, if one wants to implement permutation-based significance thresholds, instead of using the naïve Bonferroni threshold. Permutation-based methods have the advantage to provide an adjusted multiple hypothesis correction threshold that takes the underlying phenotypic distribution into account and will thus remove the need to find the correct transformation for non Gaussian phenotypes. To enable efficient analyses of large datasets and the possibility to compute permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can make use of the available CPU or GPU infrastructure to decrease the time of the analyses especially for large datasets.ResultsWe were able to show that our application GWAS-Flow outperforms custom GWAS scripts in terms of speed without loosing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error for each individual marker. The CPU-based version is the default choice for small data, while the GPU-based version of GWAS-Flow is especially suited for the analyses of big data.AvailabilityGWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT-License.

Download Full-text

Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies

Methods ◽

10.1016/j.ymeth.2018.04.021 ◽

2018 ◽

Vol 145 ◽

pp. 2-9 ◽

Cited By ~ 1

Author(s):

Haohan Wang ◽

Bryon Aragam ◽

Eric P. Xing

Keyword(s):

Variable Selection ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Heterogeneous Datasets

Download Full-text

Genome-Wide Association Studies Reveal Susceptibility Loci for Digital Dermatitis in Holstein Cattle

Animals ◽

10.3390/ani10112009 ◽

2020 ◽

Vol 10 (11) ◽

pp. 2009

Author(s):

Ellen Lai ◽

Alexa L. Danner ◽

Thomas R. Famula ◽

Anita M. Oberbauer

Keyword(s):

Predictive Value ◽

Mixed Model ◽

Linear Mixed Model ◽

Bos Taurus ◽

Association Studies ◽

Bayesian Regression ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Digital Dermatitis ◽

Genome Wide

Digital dermatitis (DD) causes lameness in dairy cattle. To detect the quantitative trait loci (QTL) associated with DD, genome-wide association studies (GWAS) were performed using high-density single nucleotide polymorphism (SNP) genotypes and binary case/control, quantitative (average number of FW per hoof trimming record) and recurrent (cases with ≥2 DD episodes vs. controls) phenotypes from cows across four dairies (controls n = 129 vs. FW n = 85). Linear mixed model (LMM) and random forest (RF) approaches identified the top SNPs, which were used as predictors in Bayesian regression models to assess the SNP predictive value. The LMM and RF analyses identified QTL regions containing candidate genes on Bos taurus autosome (BTA) 2 for the binary and recurrent phenotypes and BTA7 and 20 for the quantitative phenotype that related to epidermal integrity, immune function, and wound healing. Although larger sample sizes are necessary to reaffirm these small effect loci amidst a strong environmental effect, the sample cohort used in this study was sufficient for estimating SNP effects with a high predictive value.

Download Full-text

Genetic Associations With Serum Total IgE: A Meta-Analysis Of Genome-Wide Association Studies From North American Population Groups

10.1164/ajrccm-conference.2012.185.1_meetingabstracts.a4893 ◽

2012 ◽

Author(s):

Albert M. Levin ◽

Rasika A. Mathias ◽

Lili Huang ◽

Mao Yang ◽

Kathleen C. Barnes ◽

...

Keyword(s):

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

North American Population ◽

Genome Wide Association Studies ◽

American Population ◽

Genetic Associations ◽

Population Groups ◽

Genome Wide ◽

Serum Total Ige

Download Full-text

Genetic associations with radiological damage in rheumatoid arthritis: Meta-analysis of seven genome-wide association studies of 2,775 cases

PLoS ONE ◽

10.1371/journal.pone.0223246 ◽

2019 ◽

Vol 14 (10) ◽

pp. e0223246 ◽

Cited By ~ 2

Author(s):

Matthew Traylor ◽

Rachel Knevel ◽

Jing Cui ◽

John Taylor ◽

Westra Harm-Jan ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genome Wide ◽

Radiological Damage

Download Full-text

Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies

2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2017.8217687 ◽

2017 ◽

Cited By ~ 9

Author(s):

Haohan Wang ◽

Bryon Aragam ◽

Eric P. Xing

Keyword(s):

Variable Selection ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Heterogeneous Datasets

Download Full-text

Accounting for Population Stratification in Practice: A Comparison of the Main Strategies Dedicated to Genome-Wide Association Studies

PLoS ONE ◽

10.1371/journal.pone.0028845 ◽

2011 ◽

Vol 6 (12) ◽

pp. e28845 ◽

Cited By ~ 35

Author(s):

Matthieu Bouaziz ◽

Christophe Ambroise ◽

Mickael Guedj

Keyword(s):

Population Stratification ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Download Full-text

Challenges and progress in interpretation of non-coding genetic variants associated with human disease

Experimental Biology and Medicine ◽

10.1177/1535370217713750 ◽

2017 ◽

Vol 242 (13) ◽

pp. 1325-1334 ◽

Cited By ~ 19

Author(s):

Yizhou Zhu ◽

Cagdas Tazearslan ◽

Yousin Suh

Keyword(s):

Molecular Mechanisms ◽

Target Genes ◽

Disease Risk ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Functional Interpretation ◽

Genome Wide ◽

Coding Variants

Genome-wide association studies have shown that the far majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. To identify truly causal non-coding variants and their affected target genes remains challenging but is a critical step to translate the genetic associations to molecular mechanisms and ultimately clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants and experimental strategies to validate their functionalities. Impact statement Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations remained challenging. We reviewed recent progress in methodologies of studying the non-coding genome and argued that no single approach allows one to effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review potentially provided a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.

Download Full-text