RL-SKAT: An exact and efficient score test for heritability and set tests

2017
Author(s):
Regev Schweiger
Omer Weissbrod
Elior Rahmani
Martina Müller-Nurasyid
Sonja Kunze
...

Abstract Testing for the existence of variance components in linear mixed models is a fundamental task in many applied fields. In statistical genetics, the score test has recently become instrumental in the task of testing an association between a set of genetic markers and a phenotype. With few markers, this amounts to set-based variance component tests, which attempt to increase power in association studies by aggregating weak individual effects. When the entire genome is considered, it allows testing for the heritability of a phenotype, defined as the proportion of phenotypic variance explained by genetics. In the popular score-based Sequence Kernel Association Test (SKAT) method, the assumed distribution of the score test statistic is uncalibrated in small samples, with a correction being computationally expensive. This may cause severe inflation or deflation of p-values, even when the null hypothesis is true. Here, we characterize the conditions under which this discrepancy holds, and show that it may also occur in large real datasets, such as a dataset from the Wellcome Trust Case Control Consortium 2 (n=13,950) study, and in particular when the individuals in the sample are unrelated. In these cases the SKAT approximation tends to be highly over-conservative and therefore underpowered. To address this limitation, we suggest an efficient method to calculate exact p-values for the score test in the case of a single variance component and a continuous response vector, which can speed up the analysis by orders of magnitude. Our results enable fast and accurate application of the score test in heritability and in set-based association tests. Our method is available at http://github.com/cozygene/RL-SKAT.
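As a rough illustration of why an exact p-value can be tractable with a single variance component and a continuous response, the sketch below uses a generic ratio-of-quadratic-forms argument. The symbols ($X$, $K$, $P$, $U$, $\lambda_i$, $q$) and the particular form of the statistic are assumptions made for illustration; they are not taken from the abstract, and the paper's own formulation may differ.

```latex
% Assumed null model: y = X\beta + e, with e ~ N(0, \sigma_e^2 I_n) and X of full column rank p.
% K is a kernel (e.g. genetic similarity) matrix; P projects out the covariates.
\begin{align*}
P &= I_n - X (X^\top X)^{-1} X^\top = U U^\top,
  \qquad U \in \mathbb{R}^{n \times (n-p)},\quad U^\top U = I_{n-p}, \\
Q &= \frac{y^\top P K P y}{y^\top P y}, \\
\Pr(Q > q \mid H_0) &= \Pr\Big( \sum_{i=1}^{n-p} (\lambda_i - q)\, z_i^2 > 0 \Big),
  \qquad z_i \overset{\text{iid}}{\sim} N(0, 1),
\end{align*}
% where \lambda_1, \dots, \lambda_{n-p} are the eigenvalues of U^\top K U.
```

Under these assumptions $\sigma_e^2$ cancels in the ratio, so the null distribution depends only on the eigenvalues $\lambda_i$. The right-hand side is the tail probability of a weighted sum of independent chi-square variables, which can be evaluated numerically rather than approximated.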

2008
Vol 90 (4)
pp. 363-374
Author(s):
SUZANNE J. ROWE
RICARDO PONG-WONG
CHRISTOPHER S. HALEY
SARA A. KNOTT
DIRK-JAN DE KONING

Summary Dominance is an important source of variation in complex traits. Here, we have carried out the first thorough investigation of quantitative trait locus (QTL) detection using variance component (VC) models extended to incorporate both additive and dominant QTL effects. Simulation results showed that the empirical distribution of the test statistic when testing for dominant QTL effects did not behave in accordance with existing theoretical expectations and varied with pedigree structure. Extensive simulations were carried out to assess accuracy of estimates, type 1 error and statistical power in two-generation human-, poultry- and pig-type pedigrees each with 1900 progeny in small-, medium- and large-sized families, respectively. The distribution of the likelihood-ratio test statistic was heavily dependent on family structure, with empirical thresholds lower for human pedigrees. Power to detect QTL was high (0·84–1·0) in pig and poultry scenarios for dominance effects accounting for >7% of phenotypic variance but much lower (0·42) in human-type pedigrees. Maternal or common environment effects can be partially confounded with dominance and must be fitted in the QTL model. Including dominance in the QTL model did not affect power to detect additive QTL effects. Also, detection of spurious dominance QTL effects only occurred when maternal effects were not included in the QTL model. When dominance effects were present in the data but not in the analysis model, this resulted in spurious detection of additive QTL or inflated estimates of additive QTL effects. The study demonstrates that dominance can be included routinely in QTL analysis of general pedigrees; however, optimal power is dependent on selection of the appropriate thresholds for pedigree structure.


Author(s):  
Jie Zhang
Fang Liu
Jochen C Reif
Yong Jiang

Abstract Genomic best linear unbiased prediction (GBLUP) is the most widely used model for genome-wide predictions. Interestingly, it is also possible to perform genome-wide association studies (GWAS) based on GBLUP. Although the estimated marker effects in GBLUP are shrunken and the conventional test based on such effects has low power, it was observed that a modified test statistic can be produced and that the result of the test was identical to that of a standard GWAS model. Later, a mathematical proof was given for the special case in which there is no fixed covariate in GBLUP. Since then, the new approach has been called “GWAS by GBLUP”. Nevertheless, covariates such as environmental and subpopulation effects are very common in GBLUP. Thus, it is necessary to confirm the equivalence in the general case. Recently, the concept was generalized to GWAS for epistatic effects and the new approach was termed rapid epistatic mixed-model association analysis (REMMA) because it greatly improved the computational efficiency. However, the relationship between REMMA and the standard GWAS model has not been investigated. In this study, we first provided a general mathematical proof of the equivalence between “GWAS by GBLUP” and the standard GWAS model for additive effects. Then, we compared REMMA with the standard GWAS model for epistatic effects by a theoretical investigation and by empirical data analyses. We hypothesized that the similarity of the two models is influenced by the relative contribution of additive and epistatic effects to the phenotypic variance, which was verified by empirical and simulation studies.


2018
Vol 28 (10-11)
pp. 2937-2951
Author(s):
James J Yang
Elisa M Trucco
Anne Buu

Permutation tests are very useful when parametric assumptions are violated or distributions of test statistics are mathematically intractable. The major advantage of permutation tests is that the procedure is so general that it is applicable to most test statistics. The computational expense is, however, impractical in high-dimensional settings such as genomewide association studies. This study provides a comprehensive review of existing methods that can compute very small p-values efficiently. A common issue with existing methods is that they can only be applied to a specific test statistic. To fill in the knowledge gap, we propose a hybrid method of the sequential Monte Carlo and the Edgeworth expansion approximation for a studentized statistic, which is applicable to a variety of test statistics. The simulation results show that the proposed method performs better than competing methods. Furthermore, applications of the proposed method are demonstrated by statistical analysis on the genomewide association studies data from the Study of Addiction: Genetics and Environment (SAGE).
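To make the computational issue concrete, here is a minimal sketch of a plain Monte Carlo permutation test for a two-sample difference in means. It is not the hybrid sequential Monte Carlo and Edgeworth method proposed in the paper; the function name `permutation_p_value`, the choice of statistic, and the default of 2000 permutations are assumptions for illustration only.

```python
import random


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for |mean(x) - mean(y)| (illustrative only)."""
    rng = random.Random(seed)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled sample.
        rng.shuffle(pooled)
        if stat(pooled[:n_x], pooled[n_x:]) >= observed:
            hits += 1
    # The +1 correction counts the observed labelling itself,
    # so the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    control = [float(v) for v in range(10)]
    treated = [float(v) for v in range(100, 110)]
    print(permutation_p_value(control, treated))
```

The smallest p-value this estimator can return is 1 / (n_perm + 1), so resolving p-values at genome-wide significance levels would require an impractically large number of permutations per test, which is the cost the abstract refers to.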


Author(s):  
Jiabin Zhou
Shitao Li
Ying Zhou
Xiaona Sheng

Abstract Identifying gene × environment (G × E) interactions, especially when rare variants are included in genome-wide association studies, is a major challenge in statistical genetics. However, the detection of G × E interactions is very important for understanding the etiology of complex diseases. Although currently some statistical methods have been developed to detect the interactions between genes and environment, the detection of the interactions for the case of rare variants is still limited. Therefore, it is particularly important to develop a new method to detect the interactions between genes and environment for rare variants. In this paper, we extend an existing method of adaptive combination of P-values (ADA) and design a novel strategy (called iSADA) for testing the effects of G × E interactions for rare variants. We propose a new two-stage test to detect the interactions between genes and environment in a certain region of a chromosome or even for the whole genome. First, the score statistic is used to test the associations between trait value and the interaction terms of genes and environment and obtain the original P-values. Then, based on the idea of the ADA method, we further construct a full test statistic via the P-values of the preliminary tests in the first stage, so that we can comprehensively test the interactions between genes and environment in the considered genome region. Simulation studies are conducted to compare our proposed method with other existing methods. The results show that the iSADA has higher power than other methods in each case. A GAW17 data set is also applied to illustrate the applicability of the new method.


2001
Vol 26 (2)
pp. 133-152
Author(s):
Johannes Berkhof
Tom A. B. Snijders

Available variance component tests are reviewed and three new score tests are presented. In the first score test, the asymptotic normal distribution of the test statistic is used as a reference distribution. In the other two score tests, a Satterthwaite approximation is used for the null distribution of the test statistic. We evaluate the performance of the score tests and other available tests by means of a Monte Carlo study. The new tests are computationally relatively cheap and have good power properties.
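For readers unfamiliar with the Satterthwaite approximation, the sketch below shows the generic moment-matching form: a weighted sum of independent chi-square(1) variables is approximated by a scaled chi-square whose mean and variance agree with it. This is the textbook construction, not necessarily the exact one used for the score tests in the paper; the function name `satterthwaite_params` and the `weights` input are assumptions for illustration.

```python
def satterthwaite_params(weights):
    """Return (scale, df) so that scale * chi2(df) matches the mean and
    variance of Q = sum_i weights[i] * chi2(1), with independent terms."""
    s1 = sum(weights)                  # E[Q] = s1
    s2 = sum(w * w for w in weights)   # Var[Q] = 2 * s2
    if s1 <= 0 or s2 <= 0:
        raise ValueError("weights must have a positive sum and not all be zero")
    scale = s2 / s1        # from scale * df = s1 and 2 * scale^2 * df = 2 * s2
    df = s1 * s1 / s2
    return scale, df


if __name__ == "__main__":
    # Equal weights recover an ordinary chi-square with 3 degrees of freedom.
    print(satterthwaite_params([1.0, 1.0, 1.0]))
```

With the two parameters in hand, a p-value would be obtained by dividing the observed statistic by `scale` and referring it to a chi-square distribution with `df` degrees of freedom, for example through a statistics library's chi-square survival function.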


2019
Vol 27 (9)
pp. 1445-1455
Author(s):
Ron Nudel
Michael E. Benros
Morten Dybdahl Krebs
Rosa Lundbye Allesøe
Camilla Koldbæk Lemvigh
...

Abstract Human leukocyte antigen (HLA) genes encode proteins with important roles in the regulation of the immune system. Many studies have also implicated HLA genes in psychiatric and neurodevelopmental disorders. However, these studies usually focus on one disorder and/or on one HLA candidate gene, often with small samples. Here, we access a large dataset of 65,534 genotyped individuals consisting of controls (N = 19,645) and cases having one or more of autism spectrum disorder (N = 12,331), attention deficit hyperactivity disorder (N = 14,397), schizophrenia (N = 2401), bipolar disorder (N = 1391), depression (N = 18,511), anorexia (N = 2551) or intellectual disability (N = 3175). We imputed participants’ HLA alleles to investigate the involvement of HLA genes in these disorders using regression models. We found a pronounced protective effect of DPB1*1501 on susceptibility to autism (p = 0.0094, OR = 0.72) and intellectual disability (p = 0.00099, OR = 0.41), with an increased protective effect on a comorbid diagnosis of both disorders (p = 0.003, OR = 0.29). We also identified a risk allele for intellectual disability, B*5701 (p = 0.00016, OR = 1.33). Associations with both alleles survived FDR correction and a permutation procedure. We did not find significant evidence for replication of previously-reported associations for autism or schizophrenia. Our results support an implication of HLA genes in autism and intellectual disability, which requires replication by other studies. Our study also highlights the importance of large sample sizes in HLA association studies.


Gut
2017
Vol 67 (7)
pp. 1366-1368
Author(s):
Caiwang Yan
Meng Zhu
Tongtong Huang
Fei Yu
Guangfu Jin

2020
Vol 98 (Supplement_4)
pp. 32-32
Author(s):
Juan P Steibel
Ignacio Aguilar

Abstract Genomic Best Linear Unbiased Prediction (GBLUP) is the method of choice for incorporating genomic information into the genetic evaluation of livestock species. Furthermore, single step GBLUP (ssGBLUP) is adopted by many breeders’ associations and private entities managing large scale breeding programs. While prediction of breeding values remains the primary use of genomic markers in animal breeding, a secondary interest focuses on performing genome-wide association studies (GWAS). The goal of GWAS is to uncover genomic regions that harbor variants that explain a large proportion of the phenotypic variance, and thus become candidates for discovering and studying causative variants. Several methods have been proposed and successfully applied for embedding GWAS into genomic prediction models. Most methods avoid formal hypothesis testing and resort to estimation of SNP effects, relying on visual inspection of graphical outputs to determine candidate regions. However, with the advent of high throughput phenomics and transcriptomics, a more formal testing approach with automatic discovery thresholds is more appealing. In this work we present the methodological details of a method for performing formal hypothesis testing for GWAS in GBLUP models. First, we present the method and its equivalencies and differences with other GWAS methods. Moreover, we demonstrate through simulation analyses that the proposed method controls type I error rate at the nominal level. Second, we demonstrate two possible computational implementations, one based on mixed model equations for ssGBLUP and one based on the generalized least squares (GLS) equations. We show that ssGBLUP can deal with datasets with extremely large numbers of animals and markers and with multiple traits. GLS implementations are well suited for dealing with smaller numbers of animals with tens of thousands of phenotypes. Third, we show several useful extensions, such as testing multiple markers at once, testing pleiotropic effects and testing association of social genetic effects.

