On the use of GBLUP and its extension for GWAS with additive and epistatic effects

Abstract Genomic best linear unbiased prediction (GBLUP) is the most widely used model for genome-wide prediction. Interestingly, it is also possible to perform genome-wide association studies (GWAS) based on GBLUP. Although the estimated marker effects in GBLUP are shrunken and the conventional test based on such effects has low power, it was observed that a modified test statistic can be constructed whose results are identical to those of a standard GWAS model. Later, a mathematical proof was given for the special case in which GBLUP contains no fixed covariates. Since then, the approach has been called “GWAS by GBLUP”. Nevertheless, covariates such as environmental and subpopulation effects are very common in GBLUP, so it is necessary to confirm the equivalence in the general case. Recently, the concept was generalized to GWAS for epistatic effects, and the new approach was termed rapid epistatic mixed-model association analysis (REMMA) because it greatly improved computational efficiency. However, the relationship between REMMA and the standard GWAS model has not been investigated. In this study, we first provided a general mathematical proof of the equivalence between “GWAS by GBLUP” and the standard GWAS model for additive effects. Then, we compared REMMA with the standard GWAS model for epistatic effects through a theoretical investigation and empirical data analyses. We hypothesized that the similarity of the two models is influenced by the relative contribution of additive and epistatic effects to the phenotypic variance, and this hypothesis was verified by empirical and simulation studies.
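
The additive equivalence can be checked numerically. The sketch below is not the paper's proof; it assumes the variance components var_a and var_e are known and that the covariance matrix V = var_a·ZZ' + var_e·I is held fixed for every marker (no per-marker re-estimation). Under those assumptions, both the standardized GBLUP marker effect a_hat_j / sd(a_hat_j) and the generalized least squares (GLS) Wald statistic for one marker fitted as a fixed effect reduce to z_j'Py / sqrt(z_j'Pz_j), where P = V^-1 - V^-1 X (X'V^-1 X)^-1 X'V^-1 absorbs the fixed covariates X. Function names and simulated data are illustrative.

```python
import numpy as np


def projection_matrix(V, X):
    """P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1, which absorbs the fixed covariates X."""
    Vi = np.linalg.inv(V)
    ViX = Vi @ X
    return Vi - ViX @ np.linalg.solve(X.T @ ViX, ViX.T)


def gwas_by_gblup(y, X, Z, var_a, var_e):
    """Standardized GBLUP marker effects a_hat_j / sd(a_hat_j)."""
    V = var_a * Z @ Z.T + var_e * np.eye(len(y))
    P = projection_matrix(V, X)
    a_hat = var_a * Z.T @ P @ y  # BLUP of marker effects: var_a * Z' V^-1 (y - X b_hat)
    # Var(a_hat) = var_a^2 * Z' P Z; only its diagonal is needed
    var_a_hat = var_a**2 * np.einsum("ij,ij->j", Z, P @ Z)
    return a_hat / np.sqrt(var_a_hat)


def gls_single_marker(y, X, Z, var_a, var_e):
    """Wald statistic for each marker fitted as a fixed effect, with V held fixed."""
    V = var_a * Z @ Z.T + var_e * np.eye(len(y))
    Vi = np.linalg.inv(V)
    t = np.empty(Z.shape[1])
    for j in range(Z.shape[1]):
        W = np.column_stack([X, Z[:, j]])  # covariates plus the tested marker
        C = np.linalg.inv(W.T @ Vi @ W)
        beta = C @ W.T @ Vi @ y
        t[j] = beta[-1] / np.sqrt(C[-1, -1])
    return t


# Simulated example: intercept plus a two-level environment covariate.
rng = np.random.default_rng(1)
n, m = 80, 25
Z = rng.integers(0, 3, size=(n, m)).astype(float)
Z -= Z.mean(axis=0)  # center genotype codes 0/1/2
X = np.column_stack([np.ones(n), rng.integers(0, 2, size=n)])
y = X @ np.array([10.0, 2.0]) + Z @ rng.normal(0.0, 0.3, m) + rng.normal(0.0, 1.0, n)

t_gblup = gwas_by_gblup(y, X, Z, var_a=0.09, var_e=1.0)
t_gls = gls_single_marker(y, X, Z, var_a=0.09, var_e=1.0)
print(np.max(np.abs(t_gblup - t_gls)))  # agreement up to floating-point error
```

The scale factor var_a cancels in the ratio (V, and therefore P, still depends on it), which is why shrunken GBLUP effects can still yield the same test as the fixed-effect GLS model.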

A new approach of dissecting genetic effects for complex traits

2020. DOI: 10.1101/2020.10.16.336180

Author(s): Meng Luo, Shiliang Gu

Keyword(s): Population Structure, Complex Traits, Mixed Model, Association Studies, Genome Wide Association, Genome Wide Association Studies, Computationally Efficient, New Approach, Genome Wide, Outbred Mice

Abstract During the past decades, genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with complex traits in humans, animals, and plants. All common genome-wide association (GWA) methods rely on population structure correction to avoid false genotype-phenotype associations. However, population structure correction is a stringent penalization that also impedes the identification of real associations. Here, we used recent statistical advances to propose iterative screen regression (ISR), which enables simultaneous association testing of multiple markers and is shown to appropriately correct for population stratification and cryptic relatedness in GWAS. Results from analyses of simulated data suggest that ISR performed well in terms of power (sensitivity) versus FDR (false discovery rate) and specificity, and showed less bias (higher accuracy) in effect-size (PVE) estimation than existing multi-locus (mixed) models and the single-locus (mixed) model. We also show the practicality of our approach by applying it to rice, outbred mice, and A. thaliana datasets, in which it identified several new causal loci that other methods did not detect. ISR provides an alternative for multi-locus GWAS, and its implementation is computationally efficient, making analyses of large datasets (n > 100,000) practicable.
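
The simulation metrics named above can be computed from the markers a method declares significant and the simulated causal markers. A minimal sketch, assuming each marker is simply classified as detected or not and that a detection counts as a true positive only if it is exactly a causal marker (real evaluations often allow a window around each causal locus); the function name is hypothetical. Sweeping the significance threshold and recomputing these values traces the power-versus-FDR curve.

```python
def detection_metrics(detected, causal, n_markers):
    """Sensitivity (power), false discovery rate, and specificity for one simulation replicate."""
    detected, causal = set(detected), set(causal)
    tp = len(detected & causal)    # causal markers that were detected
    fp = len(detected - causal)    # detected markers that are not causal
    fn = len(causal - detected)    # causal markers that were missed
    tn = n_markers - tp - fp - fn  # non-causal markers correctly left undetected
    sensitivity = tp / (tp + fn) if causal else float("nan")
    fdr = fp / (tp + fp) if detected else 0.0  # convention: FDR = 0 when nothing is detected
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"sensitivity": sensitivity, "fdr": fdr, "specificity": specificity}


metrics = detection_metrics(detected=[3, 17, 42, 88], causal=[3, 17, 55], n_markers=1000)
print(metrics)
```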

Genome-wide association studies identified loci contribute to phenotypic variance of gastric cancer

Gut, Vol 67 (7), pp. 1366-1368, 2017. DOI: 10.1136/gutjnl-2017-315230

Author(s): Caiwang Yan, Meng Zhu, Tongtong Huang, Fei Yu, Guangfu Jin

Keyword(s): Gastric Cancer, Association Studies, Genome Wide Association, Phenotypic Variance, Genome Wide Association Studies, Genome Wide

297 GWAS for complex models accounting for populations structure with GBLUP and ssGBLUP

Journal of Animal Science, Vol 98 (Supplement_4), p. 32, 2020. DOI: 10.1093/jas/skaa278.057

Author(s): Juan P Steibel, Ignacio Aguilar

Keyword(s): Hypothesis Testing, Large Scale, Mixed Model, Prediction Models, Association Studies, Least Square, Type I, Phenotypic Variance, Genome Wide Association Studies, Formal Hypothesis Testing

Abstract Genomic Best Linear Unbiased Prediction (GBLUP) is the method of choice for incorporating genomic information into the genetic evaluation of livestock species. Furthermore, single-step GBLUP (ssGBLUP) has been adopted by many breeders' associations and private entities managing large-scale breeding programs. While prediction of breeding values remains the primary use of genomic markers in animal breeding, a secondary interest is performing genome-wide association studies (GWAS). The goal of GWAS is to uncover genomic regions harboring variants that explain a large proportion of the phenotypic variance, which then become candidates for discovering and studying causative variants. Several methods have been proposed and successfully applied for embedding GWAS into genomic prediction models. Most of them avoid formal hypothesis testing and instead estimate SNP effects, relying on visual inspection of graphical outputs to determine candidate regions. However, with the advent of high-throughput phenomics and transcriptomics, a more formal testing approach with automatic discovery thresholds is more appealing. In this work, we present the methodological details of a method for formal hypothesis testing for GWAS in GBLUP models. First, we present the method and its equivalences with, and differences from, other GWAS methods, and we demonstrate through simulation that the proposed method controls the type I error rate at the nominal level. Second, we demonstrate two possible computational implementations: one based on the mixed model equations for ssGBLUP and one based on the generalized least squares (GLS) equations. We show that ssGBLUP can handle datasets with extremely large numbers of animals and markers and with multiple traits, whereas GLS implementations are well suited to smaller numbers of animals with tens of thousands of phenotypes. Third, we show several useful extensions, such as testing multiple markers at once, testing pleiotropic effects, and testing association of social genetic effects.
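
One of the listed extensions, testing several markers at once, can be sketched as a Wald test on a subset S of GBLUP marker effects. Assuming known variance components and a V held fixed as in a GLS implementation, the statistic a_S' Var(a_S)^-1 a_S simplifies to y'P Z_S (Z_S' P Z_S)^-1 Z_S' P y and is referred to a chi-square distribution with |S| degrees of freedom. This is an illustrative sketch, not the authors' software; the function names and simulated data are assumptions, and the GLS function is included only as a cross-check.

```python
import numpy as np


def p_matrix(V, X):
    """P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1 (absorbs the fixed covariates X)."""
    Vi = np.linalg.inv(V)
    ViX = Vi @ X
    return Vi - ViX @ np.linalg.solve(X.T @ ViX, ViX.T)


def gblup_joint_wald(y, X, Z, subset, var_a, var_e):
    """Wald statistic a_S' Var(a_S)^-1 a_S for the GBLUP effects of markers in `subset`."""
    V = var_a * Z @ Z.T + var_e * np.eye(len(y))
    P = p_matrix(V, X)
    Zs = Z[:, subset]
    a_s = var_a * Zs.T @ P @ y          # BLUP of the selected marker effects
    cov_a_s = var_a**2 * Zs.T @ P @ Zs  # Var(a_S) = var_a^2 * Zs' P Zs
    return float(a_s @ np.linalg.solve(cov_a_s, a_s))


def gls_joint_wald(y, X, Z, subset, var_a, var_e):
    """Same hypothesis, fitting the markers in `subset` as fixed effects with V held fixed."""
    V = var_a * Z @ Z.T + var_e * np.eye(len(y))
    Vi = np.linalg.inv(V)
    W = np.column_stack([X, Z[:, subset]])
    C = np.linalg.inv(W.T @ Vi @ W)
    k = len(subset)
    b = (C @ W.T @ Vi @ y)[-k:]
    return float(b @ np.linalg.solve(C[-k:, -k:], b))


rng = np.random.default_rng(3)
n, m = 100, 30
Z = rng.integers(0, 3, size=(n, m)).astype(float)
Z -= Z.mean(axis=0)
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus a continuous covariate
y = X @ np.array([5.0, 1.0]) + Z @ rng.normal(0.0, 0.2, m) + rng.normal(size=n)

subset = [0, 1, 2]
stat = gblup_joint_wald(y, X, Z, subset, var_a=0.04, var_e=1.0)
print(stat, gls_joint_wald(y, X, Z, subset, var_a=0.04, var_e=1.0))  # compare with chi-square, 3 df
```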

Genome-wide association study reveals candidate genes for flowering time in cowpea (Vigna unguiculata [L.] Walp)

2021. DOI: 10.1101/2021.04.01.438123

Author(s): Dev Paudel, Rocheteau Dareus, Julia Rosenwald, Maria Munoz-Amatriain, Esteban Rios

Keyword(s): Flowering Time, Candidate Genes, Vigna Unguiculata, Association Studies, Snp Markers, Genome Wide Association, Human Consumption, Phenotypic Variance, Genome Wide Association Studies, Genome Wide

Cowpea (Vigna unguiculata [L.] Walp., diploid, 2n = 22) is a major crop used as a protein source for human consumption as well as a quality feed for livestock. It is drought and heat tolerant and has been bred to develop varieties that are resilient to changing climates. Plant adaptation to new climates and yield are strongly affected by flowering time. Therefore, understanding the genetic basis of flowering time is critical to advancing cowpea breeding. The aim of this study was to perform genome-wide association studies (GWAS) to identify marker-trait associations for flowering time in cowpea using single nucleotide polymorphism (SNP) markers. A total of 367 accessions from a cowpea mini-core collection were evaluated in Ft. Collins, CO in 2019 and 2020, and 292 accessions were evaluated in Citra, FL in 2018. These accessions were genotyped using the Cowpea iSelect Consortium Array containing 51,128 SNPs. GWAS revealed seven reliable SNPs for flowering time that explained 8-12% of the phenotypic variance. Candidate genes associated with flowering time, including FT, GI, CRY2, LSH3, UGT87A2, LIF2, and HTA9, were identified for the significant SNP markers. Further efforts to validate these loci will help clarify their role in flowering time in cowpea and could facilitate the transfer of some of this knowledge to other closely related legume species.
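
Per-SNP "proportion of phenotypic variance explained" (PVE) figures are often approximated as Var(genotype code) × beta^2 / Var(y), where beta is the additive allele-substitution effect for 0/1/2 coding. The abstract does not state which estimator was used, so this is a generic sketch under that assumption. Under Hardy-Weinberg equilibrium the genotype variance is 2p(1-p); for largely homozygous (inbred) accession panels the observed genotype variance is the safer choice, because it can be up to twice that value.

```python
import numpy as np


def pve_hwe(p, beta, var_y):
    """PVE assuming Hardy-Weinberg equilibrium: genotype variance 2p(1-p) for 0/1/2 coding."""
    return 2.0 * p * (1.0 - p) * beta**2 / var_y


def pve_empirical(genotypes, beta, var_y):
    """PVE using the observed variance of the 0/1/2 genotype codes (suits inbred panels)."""
    return np.var(genotypes) * beta**2 / var_y


print(pve_hwe(0.3, 0.5, 1.0))
print(pve_empirical(np.array([0, 2, 0, 2, 2, 0]), 0.5, 2.0))
```

Note that PVE values of SNPs in linkage disequilibrium with each other are not additive, so per-SNP figures should not simply be summed.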

GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies

2019. DOI: 10.1101/783100

Author(s): Jan A. Freudenthal, Markus J. Ankenbrand, Dominik G. Grimm, Arthur Korte

Keyword(s): Complex Traits, Mixed Model, Linear Mixed Model, Association Studies, Large Datasets, Genome Wide Association, Small Data, Genome Wide Association Studies, Genome Wide, Non Gaussian

Abstract Motivation: Genome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations have become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand of these analyses has grown remarkably during the last decades. This is especially true if one wants to implement permutation-based significance thresholds instead of the naïve Bonferroni threshold. Permutation-based methods have the advantage of providing an adjusted multiple-testing threshold that takes the underlying phenotypic distribution into account, removing the need to find the correct transformation for non-Gaussian phenotypes. To enable efficient analyses of large datasets and the computation of permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can use the available CPU or GPU infrastructure to decrease analysis time, especially for large datasets.

Results: We show that GWAS-Flow outperforms custom GWAS scripts in terms of speed without losing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error for each marker. The CPU-based version is the default choice for small datasets, while the GPU-based version of GWAS-Flow is especially suited for analyses of big data.

Availability: GWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT License.
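
The permutation idea can be sketched without a mixed model: shuffle the phenotype, rerun the marker scan, record the strongest statistic of each permutation, and take its upper quantile as the genome-wide threshold. For brevity, this sketch uses ordinary least-squares single-marker tests rather than GWAS-Flow's linear mixed model, and it thresholds |t| directly (equivalent to a p-value threshold when all markers share the same degrees of freedom). It does not use TensorFlow or any GWAS-Flow API; the function names and simulated data are hypothetical. Like GWAS-Flow, it also reports each marker's effect size and standard error.

```python
import numpy as np


def marker_scan(y, G):
    """Per-marker OLS effect size, standard error, and t statistic (intercept plus one SNP)."""
    n = len(y)
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    sxx = np.sum(Gc**2, axis=0)
    beta = Gc.T @ yc / sxx
    resid_ss = np.sum(yc**2) - beta**2 * sxx  # residual sum of squares per marker
    se = np.sqrt(resid_ss / (n - 2) / sxx)
    return beta, se, beta / se


def permutation_threshold(y, G, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum |t| over permuted phenotypes."""
    rng = np.random.default_rng(seed)
    max_t = np.empty(n_perm)
    for k in range(n_perm):
        _, _, t_perm = marker_scan(rng.permutation(y), G)
        max_t[k] = np.max(np.abs(t_perm))
    return np.quantile(max_t, 1.0 - alpha)


rng = np.random.default_rng(42)
n, m = 200, 500
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = 0.8 * G[:, 10] + rng.normal(size=n)  # marker 10 is causal in this simulation
beta, se, t = marker_scan(y, G)
thr = permutation_threshold(y, G)
hits = np.flatnonzero(np.abs(t) > thr)
print(thr, hits)
```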

An approach to gene-based testing accounting for dependence of tests among nearby genes

2021. DOI: 10.1101/2021.05.24.445494

Author(s): Ronald J Yurko, Kathryn Roeder, Bernie Devlin, Max G'Sell

Keyword(s): Multiple Testing, Association Studies, Autism Spectrum, P Value, Genome Wide Association Studies, Strongly Correlated, Test Statistics, Test Statistic, Genome Wide, Insight Into

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signals by reducing the multiple-testing burden and pooling signal strength. While such tests account for the linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD between SNPs falling in different nearby genes, which can induce correlation among gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of the others, it is assessed separately; when the test statistics of nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into the SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, contrasted with more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably, our workflow is based on summary statistics.
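
The correlation that LD induces between neighboring gene-based statistics has a closed form for a simple sum-of-squared-z-scores statistic: if SNP z-scores are jointly N(0, R) under the null, then Var(z_A'z_A) = 2·tr(R_AA^2) and Cov(z_A'z_A, z_B'z_B) = 2·(sum of squared entries of R_AB). The sketch uses that correlation to merge adjacent genes greedily whenever it exceeds a threshold. The statistic, the greedy rule, and the 0.3 cutoff are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def quad_stat_corr(R, a, b):
    """Null correlation of T_A = sum(z_A^2) and T_B = sum(z_B^2) when z ~ N(0, R)."""
    Raa, Rbb, Rab = R[np.ix_(a, a)], R[np.ix_(b, b)], R[np.ix_(a, b)]
    cov = 2.0 * np.sum(Rab**2)    # Cov(T_A, T_B) = 2 * ||R_AB||_F^2
    var_a = 2.0 * np.sum(Raa**2)  # Var(T_A) = 2 * tr(R_AA^2) for symmetric R_AA
    var_b = 2.0 * np.sum(Rbb**2)
    return cov / np.sqrt(var_a * var_b)


def agglomerate(R, genes, threshold):
    """Greedily merge each gene into the preceding locus when their statistics are strongly correlated."""
    loci = [[0]]
    for g in range(1, len(genes)):
        locus_snps = [s for k in loci[-1] for s in genes[k]]
        if quad_stat_corr(R, locus_snps, genes[g]) > threshold:
            loci[-1].append(g)  # test this gene jointly with its neighbor(s)
        else:
            loci.append([g])    # test this gene on its own
    return loci


# Three genes of two SNPs each, ordered along the chromosome; SNPs 1 and 2 are in strong LD.
R = np.eye(6)
R[1, 2] = R[2, 1] = 0.95
genes = [[0, 1], [2, 3], [4, 5]]
print(quad_stat_corr(R, genes[0], genes[1]))
print(agglomerate(R, genes, threshold=0.3))  # [[0, 1], [2]]
```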

Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies

Methods, Vol 145, pp. 2-9, 2018. DOI: 10.1016/j.ymeth.2018.04.021

Author(s): Haohan Wang, Bryon Aragam, Eric P. Xing

Keyword(s): Variable Selection, Mixed Model, Linear Mixed Model, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Heterogeneous Datasets

Genome-Wide Association Studies Reveal Susceptibility Loci for Digital Dermatitis in Holstein Cattle

Animals, Vol 10 (11), pp. 2009, 2020. DOI: 10.3390/ani10112009

Author(s): Ellen Lai, Alexa L. Danner, Thomas R. Famula, Anita M. Oberbauer

Keyword(s): Predictive Value, Mixed Model, Linear Mixed Model, Bos Taurus, Association Studies, Bayesian Regression, Genome Wide Association, Genome Wide Association Studies, Digital Dermatitis, Genome Wide

Digital dermatitis (DD) causes lameness in dairy cattle. To detect quantitative trait loci (QTL) associated with DD, genome-wide association studies (GWAS) were performed using high-density single nucleotide polymorphism (SNP) genotypes and binary case/control, quantitative (average number of FW per hoof trimming record), and recurrent (cases with ≥2 DD episodes vs. controls) phenotypes from cows across four dairies (controls n = 129 vs. FW n = 85). Linear mixed model (LMM) and random forest (RF) approaches identified the top SNPs, which were used as predictors in Bayesian regression models to assess SNP predictive value. The LMM and RF analyses identified QTL regions containing candidate genes related to epidermal integrity, immune function, and wound healing on Bos taurus autosome (BTA) 2 for the binary and recurrent phenotypes and on BTA7 and BTA20 for the quantitative phenotype. Although larger sample sizes are necessary to reaffirm these small-effect loci amid a strong environmental effect, the sample cohort used in this study was sufficient for estimating SNP effects with high predictive value.

Heritability jointly explained by host genotype and microbiome: will improve traits prediction?

Briefings in Bioinformatics, 2020. DOI: 10.1093/bib/bbaa175

Author(s): Denis Awany, Emile R Chimusa

Keyword(s): Genetic Variants, Association Studies, Heritability Estimate, Substantial Part, Phenotypic Variance, Genome Wide Association Studies, Host Genotype, Genome Wide, Heritability Estimation

Abstract As we observe the 70th anniversary of the publication by Robertson that formalized the notion of ‘heritability’, geneticists remain puzzled by the problem of missing/hidden heritability, where heritability estimates from genome-wide association studies (GWASs) fall short of those from twin-based studies. Many possible explanations have been offered for this discrepancy, including the existence of genetic variants poorly captured by existing arrays, dominance, epistasis, and unaccounted-for environmental factors, although these remain controversial. We believe a substantial part of this problem could be solved or better understood by incorporating the host's microbiota information into the GWAS model for heritability estimation, which may also improve the prediction of human traits for clinical use. This is because, despite empirical observations such as (i) the intimate role of the microbiome in many complex human phenotypes, (ii) the overlap between genetic variants associated with both microbiome attributes and complex diseases, and (iii) the existence of heritable bacterial taxa, current GWAS models for heritability estimation do not take into account the contributory role of the microbiome. Furthermore, heritability estimates from twin-based studies do not distinguish the microbiome component of the observed total phenotypic variance. Here, we summarize the concept of heritability in GWAS and microbiome-wide association studies, focusing on its estimation, from a statistical genetics perspective. We then discuss a possible statistical method to incorporate the microbiome in the estimation of heritability in host GWAS.
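
One way to let the microbiome enter a heritability model is a second variance component: y = g + b + e with Var(g) = sigma2_g·G and Var(b) = sigma2_b·M, where G is a genomic relationship matrix and M a microbiome similarity matrix. The sketch below estimates both variance proportions (the second is sometimes called microbiability) with Haseman-Elston regression, swapped in here only because it fits in a few lines; REML is the more common estimator, and the review's own proposal may differ. The data are simulated, independent Gaussian features stand in for transformed taxa abundances, and the model ignores the host-genotype-microbiome dependence the review highlights.

```python
import numpy as np


def standardized_kernel(F):
    """Return (similarity matrix, standardized features) from a feature matrix (SNPs or taxa)."""
    S = (F - F.mean(axis=0)) / F.std(axis=0)
    return S @ S.T / F.shape[1], S


def he_regression(y, kernels):
    """Haseman-Elston regression of y_i * y_j on kernel entries over all pairs i < j."""
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)
    design = np.column_stack([K[i, j] for K in kernels])
    coef, *_ = np.linalg.lstsq(design, y[i] * y[j], rcond=None)
    return coef  # one variance proportion per kernel


rng = np.random.default_rng(7)
n, n_snp, n_taxa = 600, 2000, 400
h2_true, m2_true = 0.4, 0.2  # assumed host-genetic and microbiome variance proportions
G, Sg = standardized_kernel(rng.binomial(2, 0.4, size=(n, n_snp)).astype(float))
M, Sm = standardized_kernel(rng.normal(size=(n, n_taxa)))  # stand-in for transformed abundances
g = Sg @ rng.normal(0.0, np.sqrt(h2_true / n_snp), n_snp)
b = Sm @ rng.normal(0.0, np.sqrt(m2_true / n_taxa), n_taxa)
e = rng.normal(0.0, np.sqrt(1.0 - h2_true - m2_true), n)
estimates = he_regression(g + b + e, [G, M])
print(estimates)  # rough estimates of (h2, m2); sampling error is large at this n
```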
