297 GWAS for complex models accounting for populations structure with GBLUP and ssGBLUP

2020, Vol 98 (Supplement_4), p. 32
Author(s): Juan P Steibel, Ignacio Aguilar

Genomic Best Linear Unbiased Prediction (GBLUP) is the method of choice for incorporating genomic information into the genetic evaluation of livestock species, and single-step GBLUP (ssGBLUP) has been adopted by many breeders' associations and private entities managing large-scale breeding programs. While prediction of breeding values remains the primary use of genomic markers in animal breeding, a secondary interest is performing genome-wide association studies (GWAS). The goal of GWAS is to uncover genomic regions harboring variants that explain a large proportion of the phenotypic variance, which then become candidates for discovering and studying causative variants. Several methods have been proposed and successfully applied for embedding GWAS into genomic prediction models. Most of them avoid formal hypothesis testing and instead estimate SNP effects, relying on visual inspection of graphical outputs to determine candidate regions. However, with the advent of high-throughput phenomics and transcriptomics, a formal testing approach with automatic discovery thresholds is more appealing. In this work we present the methodological details of a formal hypothesis test for GWAS in GBLUP models. First, we present the method and its equivalences with, and differences from, other GWAS methods, and we demonstrate through simulation that it controls the type I error rate at the nominal level. Second, we demonstrate two computational implementations: one based on the mixed model equations for ssGBLUP and one based on the generalized least squares (GLS) equations. We show that the ssGBLUP implementation can handle datasets with extremely large numbers of animals and markers, as well as multiple traits, whereas GLS implementations are well suited to smaller numbers of animals with tens of thousands of phenotypes. Third, we show several useful extensions, such as testing multiple markers at once, testing pleiotropic effects, and testing the association of social genetic effects.
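To illustrate the GLS route and the "testing multiple markers at once" extension, here is a minimal sketch, assuming the variance components are already known so that V = σ²_g G + σ²_e I is fixed. A set of markers Z_s is tested jointly with a GLS Wald statistic, which is chi-square with as many degrees of freedom as tested markers under the null. The function names `gls_wald_test` and `chi2_sf`, the relationship-matrix scaling, and the simulated data are all hypothetical. This is not the authors' implementation, and variance-component estimation (e.g. REML) is left out.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df (no SciPy needed)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite series.
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


def gls_wald_test(y, X, Zs, V):
    """Joint GLS Wald test of H0: beta_s = 0 in y = X b + Zs beta_s + g + e, Var(y) = V."""
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    # P = V^-1 - V^-1 X (X'V^-1 X)^-1 X'V^-1 absorbs the fixed covariates X.
    P = Vinv - VinvX @ np.linalg.solve(X.T @ VinvX, VinvX.T)
    A = Zs.T @ P @ Zs                          # inverse covariance of beta_hat
    beta = np.linalg.solve(A, Zs.T @ P @ y)    # GLS estimates of the tested marker effects
    stat = float(beta @ A @ beta)              # Wald chi-square, df = number of markers
    return beta, stat, chi2_sf(stat, Zs.shape[1])


# Hypothetical simulated data; variance components are ASSUMED known (no REML step).
rng = np.random.default_rng(1)
n, m = 300, 500
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
Z = M - M.mean(axis=0)                      # column-centred genotypes
G = Z @ Z.T / m                             # simple relationship matrix (assumed scaling)
sigma_g2, sigma_e2 = 1.0, 1.0
V = sigma_g2 * G + sigma_e2 * np.eye(n)
X = np.ones((n, 1))                         # intercept only
g = Z @ rng.normal(0.0, math.sqrt(sigma_g2 / m), m)       # polygenic background
y = 1.0 + 1.0 * Z[:, 0] + g + rng.normal(0.0, math.sqrt(sigma_e2), n)

beta, stat, p = gls_wald_test(y, X, Z[:, [0, 1]], V)   # markers 0 and 1 jointly
print(beta, stat, p)
```

Because the fixed covariates are absorbed into P once, the same P can be reused across many marker sets. This is one reason a GLS formulation is attractive when the number of animals is modest.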

2017
Author(s): Wei Zhou, Jonas B. Nielsen, Lars G. Fritsche, Rounak Dey, Maiken E. Gabrielsen, ...

In genome-wide association studies (GWAS) of thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly in the analysis of phenotypes with unbalanced case-control ratios, producing inflated type I error rates. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation (SPA) to calibrate the distribution of score test statistics. This method, SAIGE, provides accurate p-values even when case-control ratios are extremely unbalanced. It uses state-of-the-art optimization strategies to reduce the computational time and memory cost of the generalized mixed model. Its computational cost scales linearly with sample size, so it can be applied to GWAS of thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data from 408,961 white British European-ancestry samples for >1400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data while controlling for unbalanced case-control ratios and sample relatedness.
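The saddlepoint idea can be sketched for a single SNP. SAIGE applies SPA to a score statistic from a logistic mixed model. The sketch below deliberately drops the random effect and assumes independent Bernoulli outcomes with fitted null probabilities `mu`, a simplification. It computes the cumulant generating function K(t) of the score T = Σ g_i (y_i − μ_i), solves K′(ζ) = q, and applies the Lugannani-Rice tail formula. The rule of using the normal approximation when |q| is within 2 standard deviations is stated here as an assumption. The function `spa_pvalue` and the example genotypes are hypothetical.

```python
import math
import numpy as np


def _sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def _norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def spa_pvalue(q, g, mu, cutoff=2.0, tol=1e-10):
    """Two-sided SPA p-value for the score T = sum_i g_i (y_i - mu_i).

    Assumption: y_i ~ Bernoulli(mu_i) independently under the null (no random effect).
    """
    g = np.asarray(g, dtype=float)
    mu = np.asarray(mu, dtype=float)
    offset = np.log(mu) - np.log1p(-mu)          # logit(mu_i)
    shift = float(np.sum(g * mu))

    def K(t):   # cumulant generating function of T
        return float(np.sum(np.log1p(-mu) + np.logaddexp(0.0, g * t + offset))) - t * shift

    def K1(t):  # K'(t)
        return float(np.sum(g * _sigmoid(g * t + offset))) - shift

    def K2(t):  # K''(t)
        s = _sigmoid(g * t + offset)
        return float(np.sum(g * g * s * (1.0 - s)))

    sd = math.sqrt(K2(0.0))
    if abs(q) < cutoff * sd:
        # Assumed switch rule: plain normal approximation near the centre.
        return math.erfc(abs(q) / (sd * math.sqrt(2.0)))

    def lr_arg(x):
        # Solve K'(zeta) = x by bracketing and bisection (K' is increasing).
        lo, hi = -1.0, 1.0
        for _ in range(60):
            if K1(lo) <= x:
                break
            lo *= 2.0
        for _ in range(60):
            if K1(hi) >= x:
                break
            hi *= 2.0
        if not K1(lo) <= x <= K1(hi):
            raise ValueError("q lies outside the support of T")
        for _ in range(200):
            if hi - lo <= tol:
                break
            mid = 0.5 * (lo + hi)
            if K1(mid) < x:
                lo = mid
            else:
                hi = mid
        zeta = 0.5 * (lo + hi)
        w = math.copysign(math.sqrt(max(2.0 * (zeta * x - K(zeta)), 0.0)), zeta)
        v = zeta * math.sqrt(K2(zeta))
        # Lugannani-Rice (Barndorff-Nielsen form): P(T < x) ~ Phi(w + log(v / w) / w)
        return w + math.log(v / w) / w

    upper = _norm_cdf(-lr_arg(abs(q)))    # P(T >= |q|)
    lower = _norm_cdf(lr_arg(-abs(q)))    # P(T <= -|q|)
    return min(1.0, upper + lower)


# Hypothetical SNP: 100/80/20 individuals with 0/1/2 alleles, centred; 2% cases under H0.
genotypes = np.array([0] * 100 + [1] * 80 + [2] * 20, dtype=float)
g = genotypes - genotypes.mean()
mu = np.full(g.size, 0.02)
sd = math.sqrt(float(np.sum(g * g * mu * (1.0 - mu))))
for k in (3.0, 5.0):
    print(k, spa_pvalue(k * sd, g, mu), math.erfc(k / math.sqrt(2.0)))  # SPA vs normal
```

With a 2% case rate the score distribution is strongly skewed. The SPA tails can therefore differ markedly from the symmetric normal tails printed alongside them, which is the calibration problem the abstract describes.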


Author(s): Myles Lewis, Tim Vyse

The advent of genome-wide association studies (GWAS) has been an exciting breakthrough in our understanding of the genetic aetiology of autoimmune diseases. Substantial overlap in susceptibility genes has been found across multiple diseases, from connective tissue diseases and rheumatoid arthritis (RA) to inflammatory bowel disease, coeliac disease, and psoriasis. Major technological advances now permit genotyping of millions of single nucleotide polymorphisms (SNPs). Group analysis of SNPs by haplotypes, aided by completion of the HapMap project, has improved our ability to pinpoint causal genetic variants. International collaboration to pool large-scale patient cohorts has enabled GWAS in systemic lupus erythematosus (SLE), systemic sclerosis, and Behçet's disease, with studies in progress for ANCA-associated vasculitis. These 'hypothesis-free' studies have revealed many novel disease-associated genes. In both SLE and systemic sclerosis, identified genes map to known pathways including antigen presentation (MHC, TNFSF4), autoreactivity of B and T lymphocytes (BLK, BANK1), type I interferon production (STAT4, IRF5), and the NF-κB pathway (TNIP1). In SLE alone, additional genes appear to be involved in dysregulated clearance of apoptotic cells (ITGAM, TREX1, C1q, C4) and recognition of immune complexes (FCGR2A, FCGR3B). Future developments include whole-genome sequencing to identify rare variants and efforts to understand the functional consequences of susceptibility genes. Putative environmental triggers for connective tissue diseases include infectious agents, especially Epstein-Barr virus; cigarette smoking; occupational exposure to toxins, including silica; and low vitamin D, given its immunomodulatory effects. Despite numerous studies of toxin exposure and connective tissue diseases, conclusive evidence is lacking because of the rarity of either the exposure or the disease.


2015
Author(s): Oriol Canela-Xandri, Konrad Rawlik, John A. Woolliams, Albert Tenesa

Genome-wide association studies (GWAS) promised to translate their findings into clinically beneficial improvements in patient management by tailoring disease management to the individual through the prediction of disease risk. However, the translation of GWAS findings into predictive tools of clinical utility that may inform clinical practice has, so far, been encouraging but limited. Here we propose a more powerful statistical approach that enables the prediction of multiple medically relevant phenotypes without the cost of developing a separate genetic test for each of them. As a proof of principle, we used a common panel of 319,038 SNPs to train prediction models in 114,264 unrelated White-British individuals for height and four obesity-related traits (body mass index, basal metabolic rate, body fat percentage, and waist-to-hip ratio). We obtained prediction accuracies ranging between 46% and 75% of the maximum achievable given each trait's heritable component. This represents an improvement of up to 75% in phenotypic variance explained over predictors developed through large collaborations that used more than twice as many training samples. Across-population predictions in White non-British individuals were similar to those in White-British individuals, whilst those in Asian and Black individuals were informative but less accurate. The genotyping of circa 500,000 UK Biobank participants will yield predictions ranging between 66% and 83% of the maximum. We anticipate that our models, together with a common panel of genetic markers usable across multiple traits and diseases, will be the starting point for tailoring disease management to the individual. Ultimately, we will be able to capitalise on whole-genome sequence and environmental risk factors to realise the full potential of genomic medicine.
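To reason about how the fraction of the achievable accuracy grows with training size, one widely used deterministic approximation (usually attributed to Daetwyler and colleagues) gives the expected squared accuracy as r² = N h² / (N h² + M_e). Here N is the training size, h² the heritability, and M_e the effective number of independent chromosome segments. Because r² = R² / h², this is also the fraction of the maximum variance explained. The abstract does not state which projection the authors used, so this is only an illustration. The `h2` and `m_e` values below are hypothetical.

```python
def fraction_of_max(n_train, h2, m_e):
    """Expected r^2 / h^2 under the deterministic approximation N h2 / (N h2 + M_e).

    Assumptions: M_e (effective number of independent chromosome segments) and h2
    are supplied by the user; this is not the projection method of the study above.
    """
    if n_train < 0 or not 0.0 < h2 <= 1.0 or m_e <= 0:
        raise ValueError("need n_train >= 0, 0 < h2 <= 1 and m_e > 0")
    return n_train * h2 / (n_train * h2 + m_e)


# Hypothetical h2 and M_e, evaluated at the two sample sizes quoted in the abstract.
for n in (114_264, 500_000):
    print(n, round(fraction_of_max(n, h2=0.5, m_e=60_000), 3))
```

The curve rises steeply at first and then saturates toward 1, which is why gains from additional genotyped participants diminish as N grows.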


Author(s): Greg Dyson, Charles F. Sing

We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. The proposed strategy reaches a computational limit when the number of genomic variations (predictor variables) under study is large (>500), because permutations are used to generate the null distribution for testing the significance of a term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that enables quick calculation of the Type I and Type II errors for terms evaluated during the peeling and pasting processes of a PRIM analysis; with a permutation-based hypothesis test, these errors are under-estimated and non-existent, respectively. The resulting savings in computational time make it possible to consider larger numbers of genomic variations (an example genome-wide association study is given) when selecting statistically significant terms in the formulation of PRIM prediction models.
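To show what "peeling" refers to, here is a generic sketch of the standard top-down PRIM peeling step on continuous predictors. At each step, a fraction `alpha` of the current box is removed from one edge of one variable, choosing the peel that maximizes the mean response in what remains. Peeling stops before support falls below `min_support`. This is not the authors' modified PRIM: pasting, discrete genotype coding, and any significance test are omitted, and `prim_peel`, `alpha`, `min_support`, and the simulated data are hypothetical.

```python
import numpy as np


def prim_peel(X, y, alpha=0.05, min_support=0.1):
    """Greedy top-down peeling of a box (hyper-rectangle) that targets a high mean of y."""
    n, p = X.shape
    inside = np.ones(n, dtype=bool)
    lower = X.min(axis=0).astype(float)
    upper = X.max(axis=0).astype(float)
    trajectory = [(1.0, float(y.mean()))]      # (support, box mean) after each peel
    while True:
        best = None
        for j in range(p):
            xj = X[inside, j]
            for side, cut in (("lower", np.quantile(xj, alpha)),
                              ("upper", np.quantile(xj, 1.0 - alpha))):
                # Candidate box: drop the alpha tail of variable j on one side.
                if side == "lower":
                    keep = inside & (X[:, j] >= cut)
                else:
                    keep = inside & (X[:, j] <= cut)
                n_keep = int(keep.sum())
                if n_keep == int(inside.sum()) or n_keep < min_support * n:
                    continue                   # removes nothing, or box would be too small
                box_mean = float(y[keep].mean())
                if best is None or box_mean > best[0]:
                    best = (box_mean, j, side, cut, keep)
        if best is None:
            break
        box_mean, j, side, cut, keep = best
        inside = keep
        if side == "lower":
            lower[j] = cut
        else:
            upper[j] = cut
        trajectory.append((float(inside.mean()), box_mean))
    return lower, upper, inside, trajectory


# Hypothetical data: the response is high only when x0 > 0.7.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))
y = (X[:, 0] > 0.7).astype(float) + rng.normal(0.0, 0.1, 1000)
lower, upper, inside, traj = prim_peel(X, y)
print(lower, upper, inside.mean(), y[inside].mean())
```

Each candidate term along this trajectory is what a permutation test would have to re-evaluate many times, which is the computational burden the abstract's analytical error calculation is meant to avoid.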


2021
Author(s): Suyash S Shringarpure, Wei Wang, Yunxuan Jiang, Alison Acevedo, Devika Dhamija, ...

A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma, and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases. This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.


2020
Author(s): Wenjian Bi, Wei Zhou, Rounak Dey, Bhramar Mukherjee, Joshua N Sampson, ...

In genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, owing to the lack of analysis tools, methods designed for binary and quantitative traits have often been applied inappropriately to categorical phenotypes, which can inflate type I error rates or reduce power. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, the Proportional Odds Logistic Mixed Model (POLMM). POLMM is shown to be computationally efficient for analyzing large datasets with hundreds of thousands of genetically related samples, to control type I error rates at a stringent significance level regardless of the phenotypic distribution, and to be more powerful than alternative methods. We applied POLMM to 258 ordinal categorical phenotypes using array genotypes and imputed genotypes from 408,961 individuals in the UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 (7.2%) are rare variants with MAF < 0.01.
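Under a proportional odds model, all category boundaries share one linear predictor η, and only the cutpoints differ. A minimal sketch of the implied category probabilities follows, assuming the parametrization P(Y ≤ j) = logistic(ε_j − η) with strictly increasing cutpoints ε_1 < … < ε_{J−1}. In POLMM η would also contain a random genetic effect, which this sketch omits. The function `proportional_odds_probs` and the example values are hypothetical.

```python
import numpy as np


def proportional_odds_probs(eta, cutpoints):
    """Category probabilities P(Y = j), j = 1..J, for a 1-D array of linear predictors eta.

    Assumed parametrization: P(Y <= j) = logistic(cutpoint_j - eta).
    """
    if np.any(np.diff(cutpoints) <= 0):
        raise ValueError("cutpoints must be strictly increasing")
    eta = np.asarray(eta, dtype=float)[:, None]
    eps = np.asarray(cutpoints, dtype=float)[None, :]
    cum = 1.0 / (1.0 + np.exp(-(eps - eta)))          # P(Y <= j), j = 1..J-1
    n = cum.shape[0]
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    return np.diff(cum, axis=1)                       # successive differences give P(Y = j)


# Hypothetical example: three individuals, four ordered categories.
probs = proportional_odds_probs([-1.0, 0.0, 2.0], [-0.5, 0.5, 1.5])
print(probs.round(3))
```

A single coefficient on a SNP shifts η and thereby moves probability mass toward higher or lower categories at every boundary at once. That shared effect is what makes a one-degree-of-freedom association test possible for an ordinal trait.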


Author(s): Jie Zhang, Fang Liu, Jochen C Reif, Yong Jiang

Genomic best linear unbiased prediction (GBLUP) is the most widely used model for genome-wide prediction. Interestingly, it is also possible to perform genome-wide association studies (GWAS) based on GBLUP. Although the marker effects estimated in GBLUP are shrunken and the conventional test based on such effects has low power, it was observed that a modified test statistic can be constructed whose results are identical to those of a standard GWAS model. A mathematical proof was later given for the special case in which GBLUP contains no fixed covariate, and the approach has since been called “GWAS by GBLUP”. Nevertheless, covariates such as environmental and subpopulation effects are very common in GBLUP, so the equivalence needs to be confirmed for the general case. Recently, the concept was generalized to GWAS for epistatic effects, and the new approach was termed rapid epistatic mixed-model association analysis (REMMA) because it greatly improved computational efficiency. However, the relationship between REMMA and the standard GWAS model has not been investigated. In this study, we first provided a general mathematical proof of the equivalence between “GWAS by GBLUP” and the standard GWAS model for additive effects. Then, we compared REMMA with the standard GWAS model for epistatic effects through a theoretical investigation and empirical data analyses. We hypothesized that the similarity of the two models is influenced by the relative contribution of additive and epistatic effects to the phenotypic variance, which was verified by empirical and simulation studies.
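The additive equivalence can be checked numerically under explicit assumptions: variance components known, G = ZZ′/m, the tested SNP kept in G, and fixed covariates present. With P = V⁻¹ − V⁻¹X(X′V⁻¹X)⁻¹X′V⁻¹ and σ²_a = σ²_g / m, the back-solved SNP effect is â = σ²_a Z′Py with variance σ⁴_a diag(Z′PZ). Its t statistic therefore reduces to z′Py / √(z′Pz), which is also the GLS t statistic for that SNP fitted as a fixed effect. The function names and simulated data are hypothetical, and REMMA/epistatic effects are not covered.

```python
import numpy as np


def gwas_by_gblup(y, X, Z, sigma_g2, sigma_e2):
    """t statistics from back-solved SNP effects of a GBLUP fit (variance components given)."""
    n, m = Z.shape
    G = Z @ Z.T / m                                  # assumed scaling of G
    V = sigma_g2 * G + sigma_e2 * np.eye(n)
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    P = Vinv - VinvX @ np.linalg.solve(X.T @ VinvX, VinvX.T)
    sigma_a2 = sigma_g2 / m                          # per-SNP variance implied by G = ZZ'/m
    a_hat = sigma_a2 * (Z.T @ (P @ y))               # BLUP of SNP effects
    var_a_hat = sigma_a2 ** 2 * np.einsum("ij,ij->j", Z, P @ Z)   # diag of Var(a_hat)
    return a_hat / np.sqrt(var_a_hat), V


def standard_gls_t(y, X, z, V):
    """t statistic for one SNP fitted as a fixed effect, with Var(y) = V."""
    W = np.column_stack([X, z])
    Vinv = np.linalg.inv(V)
    C = np.linalg.inv(W.T @ Vinv @ W)
    beta = C @ W.T @ Vinv @ y
    return beta[-1] / np.sqrt(C[-1, -1])


# Hypothetical data with an intercept and one extra fixed covariate.
rng = np.random.default_rng(2)
n, m = 200, 300
M = rng.binomial(2, 0.4, size=(n, m)).astype(float)
Z = M - M.mean(axis=0)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 0.5]) + Z @ rng.normal(0.0, np.sqrt(0.5 / m), m) + rng.normal(0.0, 1.0, n)

t_gblup, V = gwas_by_gblup(y, X, Z, sigma_g2=0.5, sigma_e2=1.0)
t_std = np.array([standard_gls_t(y, X, Z[:, j], V) for j in range(5)])
print(np.max(np.abs(t_gblup[:5] - t_std)))   # differences at round-off level
```

The practical payoff is that one GBLUP solve yields statistics for all m SNPs, whereas the standard model needs a separate GLS fit per SNP.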


2014
Author(s): Minsun Song, Wei Hao, John D. Storey

We present a new statistical test of association between a trait and genetic markers, which we show theoretically and empirically to be robust to arbitrarily complex population structure. The test involves a set of parameters that can be estimated directly from large-scale genotyping data, such as those measured in genome-wide association studies (GWAS). We also derive a new methodology, called the genotype-conditional association test (GCAT), which is shown to provide accurate association tests in populations with complex structures, manifested in both the genetic and environmental contributions to the trait. We demonstrate the proposed method in a large simulation study and on the Northern Finland Birth Cohort study, in which we identify several new significant loci that other methods do not detect. Our proposed framework provides a substantially different approach to the problem from existing methods, such as linear mixed models and principal component approaches.
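The core idea of conditioning on the trait and modelling the genotype can be illustrated with a rough reverse-regression sketch. This is not the published GCAT procedure, which, as we understand it, estimates structure through logistic factors fitted to the genotype data. Here the structure covariates `H` are simply supplied by the user. A SNP's genotype count is modelled as Binomial(2, π) with logit π depending on H, with and without the trait, and the nested fits are compared by a likelihood ratio test. The functions, `H`, and the two-subpopulation simulation are hypothetical.

```python
import math
import numpy as np


def binomial_logit_loglik(x, D, n_trials=2, iters=100):
    """Maximized log-likelihood (up to a constant) of x ~ Binomial(n_trials, expit(D b))."""
    b = np.zeros(D.shape[1])
    for _ in range(iters):                           # Newton-Raphson (canonical link)
        eta = D @ b
        pi = 0.5 * (1.0 + np.tanh(0.5 * eta))        # expit(eta), numerically stable
        grad = D.T @ (x - n_trials * pi)
        hess = D.T @ (D * (n_trials * pi * (1.0 - pi))[:, None])
        step = np.linalg.solve(hess, grad)
        b = b + step
        if np.max(np.abs(step)) < 1e-10:
            break
    eta = D @ b
    # x*log(pi) + (n - x)*log(1 - pi) = x*eta - n*log(1 + exp(eta))
    return float(np.sum(x * eta - n_trials * np.logaddexp(0.0, eta)))


def reverse_regression_test(x, trait, H):
    """LRT of H0: genotype counts x do not depend on the trait, given structure covariates H."""
    n = x.size
    D0 = np.column_stack([np.ones(n), H])
    D1 = np.column_stack([D0, trait])
    stat = max(2.0 * (binomial_logit_loglik(x, D1) - binomial_logit_loglik(x, D0)), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))    # chi-square(1) upper tail


# Hypothetical simulation: two subpopulations differing in allele frequency and mean trait.
rng = np.random.default_rng(3)
n = 1000
pop = rng.integers(0, 2, n)
freq = np.where(pop == 1, 0.6, 0.2)
x_null = rng.binomial(2, freq).astype(float)         # unrelated to the trait within populations
x_causal = rng.binomial(2, freq).astype(float)
trait = 1.0 * pop + 0.5 * x_causal + rng.normal(0.0, 1.0, n)
H = pop[:, None].astype(float)                       # structure covariate, ASSUMED known here
no_H = np.empty((n, 0))

print(reverse_regression_test(x_null, trait, no_H))      # confounded by structure
print(reverse_regression_test(x_null, trait, H))
print(reverse_regression_test(x_causal, trait, H))
```

Without `H` the null SNP looks strongly associated purely through population membership. Conditioning on structure removes that spurious signal while the causal SNP remains detectable.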

