Genotype-covariate correlation and interaction disentangled by a whole-genome multivariate reaction norm model

2018 ◽  
Author(s):  
Guiyan Ni ◽  
Julius van der Werf ◽  
Xuan Zhou ◽  
Elina Hyppönen ◽  
Naomi R. Wray ◽  
...  

ABSTRACT. The genomics era has brought useful tools to dissect the genetic architecture of complex traits. We propose a reaction norm model (RNM) to tackle genotype-environment correlation and interaction problems in the context of genome-wide association analyses of complex traits. In our approach, an environmental risk factor affecting the trait of interest can be modeled as dependent on a continuous covariate that is itself regulated by genetic as well as environmental factors. Our multivariate RNM approach allows the joint modelling of the relation between the genotype (G) and the covariate (C), so that both their correlation (association) and interaction (effect modification) can be estimated. Hence we jointly estimate genotype-covariate correlation and interaction (GCCI). We demonstrate using simulation that the proposed multivariate RNM performs better than the current state-of-the-art methods that ignore G-C correlation. We apply the method to data from the UK Biobank (N = 66,281) in an analysis of body mass index using smoking quantity as a covariate. We find a highly significant G-C correlation, but a negligible G-C interaction. In contrast, when a conventional G-C interaction analysis is applied (i.e., G-C correlation is not included in the model), highly significant G-C interaction estimates are found. It is also notable that we find a significant heterogeneity in the estimated residual variances across different covariate levels, probably due to residual-covariate interaction. Using simulation we also show that the residual variances estimated by genomic restricted maximum likelihood (GREML) or linkage disequilibrium score regression (LDSC) can be inflated in the presence of interactions, implying that the currently reported SNP-heritability estimates from these methods should be interpreted with caution.
We conclude that it is essential to correctly account for both interaction and correlation in complex trait analyses and that the failure to do so may lead to substantial biases in inferences relating to genetic architecture of complex traits, including estimated SNP-heritability.
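
The distinction between G-C correlation and G-C interaction can be illustrated with a small simulation sketch. It assumes a toy first-order reaction norm, y = g0 + g1·c + e, in which g0 is the main genetic effect, g1 is a genetic slope on the covariate c (the interaction), and c has a genetic part correlated with g0 (the correlation). The function names, parameter names, and variance values are illustrative assumptions, not taken from the paper, and the sketch only generates data; it does not fit the multivariate RNM.

```python
import random
import statistics


def simulate_rnm(n, var_g1, rho_gc, seed=0):
    """Simulate (covariate, phenotype) pairs under a toy first-order reaction norm.

    y = g0 + g1 * c + e, where corr(g0, genetic part of c) = rho_gc.
    Every variance except var_g1 is fixed to 1 purely for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g0 = rng.gauss(0.0, 1.0)  # main genetic effect on the trait
        # Genetic part of the covariate, correlated with g0 by rho_gc.
        gc = rho_gc * g0 + (1.0 - rho_gc ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        c = gc + rng.gauss(0.0, 1.0)  # covariate = genetic + environmental part
        g1 = rng.gauss(0.0, var_g1 ** 0.5)  # genetic slope on the covariate
        e = rng.gauss(0.0, 1.0)  # residual
        rows.append((c, g0 + g1 * c + e))
    return rows


def variance_by_covariate(rows, low=0.5, high=1.5):
    """Phenotypic variance among individuals with small versus large |c|."""
    near = [y for c, y in rows if abs(c) < low]
    far = [y for c, y in rows if abs(c) > high]
    return statistics.variance(near), statistics.variance(far)


# Compare a model without interaction to one with interaction (no G-C correlation).
for var_g1 in (0.0, 0.5):
    near, far = variance_by_covariate(simulate_rnm(20000, var_g1, rho_gc=0.0))
    print(f"var_g1={var_g1}: Var(y | small |c|)={near:.2f}, Var(y | large |c|)={far:.2f}")
```

In this toy setting, a nonzero `var_g1` makes the phenotypic variance grow with |c|, whereas a nonzero `rho_gc` makes y covary with c even when `var_g1` is zero. These are the two components that the abstract argues must be modelled jointly.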

2018 ◽  
Author(s):  
Palle Duun Rohde ◽  
Izel Fourie Sørensen ◽  
Peter Sørensen

Abstract. Summary: Studies of complex traits and diseases are strongly dependent on the availability of user-friendly software designed to handle large-scale genetic and phenotypic data. Here, we present the R package qgg, which provides an environment for large-scale genetic analyses of quantitative traits and disease phenotypes. The qgg package provides an infrastructure for efficient processing of large-scale genetic data and functions for estimating genetic parameters, performing single and multiple marker association analyses, and genomic-based predictions of phenotypes. In particular, we have developed novel predictive models that use information on functional features of the genome that enable more accurate predictions of complex trait phenotypes. We illustrate the core facilities of the qgg package by analysing human standing height from the UK Biobank. Availability and implementation: The R package qgg is freely available. For the latest updates, user guides and example scripts, consult the main page http://psoerensen.github.io/qgg/.


Author(s):  
Armin P. Schoech ◽  
Omer Weissbrod ◽  
Luke J. O’Connor ◽  
Nick Patterson ◽  
Huwenbo Shi ◽  
...  

Abstract. Most models of complex trait genetic architecture assume that signed causal effect sizes of each SNP (defined with respect to the minor allele) are uncorrelated with those of nearby SNPs, but it is currently unknown whether this is the case. We develop a new method, autocorrelation LD regression (ACLR), for estimating the genome-wide autocorrelation of causal minor allele effect sizes as a function of genomic distance. Our method estimates these autocorrelations by regressing the products of summary statistics on distance-dependent LD scores. We determined that ACLR robustly assesses the presence or absence of nonzero autocorrelation, producing unbiased estimates with well-calibrated standard errors in null simulations regardless of genetic architecture; if true autocorrelation is nonzero, ACLR correctly detects its sign, although estimates of the autocorrelation magnitude are susceptible to bias in cases of certain genetic architectures. We applied ACLR to 31 diseases and complex traits from the UK Biobank (average N = 331K), meta-analyzing results across traits. We determined that autocorrelations were significantly negative at distances of 1-50 bp (P = 8 × 10^−6, point estimate −0.35 ± 0.08) and 50-100 bp (P = 2 × 10^−3, point estimate −0.33 ± 0.11). We show that the autocorrelation is primarily driven by pairs of SNPs in positive LD, which is consistent with the expectation that linked SNPs with opposite effects are less impacted by natural selection. Our findings suggest that this mechanism broadly affects complex trait genetic architectures, and we discuss implications for association mapping, heritability estimation, and genetic risk prediction.


2018 ◽  
Author(s):  
Carla Márquez-Luna ◽  
Steven Gazal ◽  
Po-Ru Loh ◽  
Samuel S. Kim ◽  
Nicholas Furlotte ◽  
...  

Abstract. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a new method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, which includes coding, conserved, regulatory and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. LDpred-funct attained higher prediction accuracy than other polygenic prediction methods in simulations using real genotypes. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank. We used association statistics from British-ancestry samples as training data (avg N = 373K) and samples of other European ancestries as validation data (avg N = 22K), to minimize confounding. LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R² = 0.144; highest R² = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (total N = 1107K; higher heritability in UK Biobank cohort) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.


BMC Biology ◽  
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Shuli Liu ◽  
Ying Yu ◽  
Shengli Zhang ◽  
John B. Cole ◽  
Albert Tenesa ◽  
...  

2017 ◽  
Vol 114 (32) ◽  
pp. 8602-8607 ◽  
Author(s):  
Loic Yengo ◽  
Zhihong Zhu ◽  
Naomi R. Wray ◽  
Bruce S. Weir ◽  
Jian Yang ◽  
...  

Quantifying the effects of inbreeding is critical to characterizing the genetic architecture of complex traits. This study highlights through theory and simulations the strengths and shortcomings of three SNP-based inbreeding measures commonly used to estimate inbreeding depression (ID). We demonstrate that heterogeneity in linkage disequilibrium (LD) between causal variants and SNPs biases ID estimates, and we develop an approach to correct this bias using LD and minor allele frequency stratified inference (LDMS). We quantified ID in 25 traits measured in ∼140,000 participants of the UK Biobank, using LDMS, and confirmed previously published ID for 4 traits. We find unique evidence of ID for handgrip strength, waist/hip ratio, and visual and auditory acuity (ID between −2.3 and −5.2 phenotypic SDs for complete inbreeding; P<0.001). Our results illustrate that a careful choice of the measure of inbreeding combined with LDMS stratification improves both detection and quantification of ID using SNP data.


2021 ◽  
Author(s):  
Zhixiu Li ◽  
Allan F McRae ◽  
Geng Wang ◽  
Jonathan J Ellis ◽  
Tony J Kenna ◽  
...  

Ankylosing Spondylitis (AS) is a highly heritable inflammatory arthritis which occurs more frequently in men than women. In their recent publication examining sex differences in the genetic aetiology of common complex traits and diseases, Bernabeu et al. (2021) observe differences in heritability of AS between sexes, and a genome-wide significant genotype by sex interaction in risk of AS at the major histocompatibility (MHC) locus. The authors then present evidence suggesting that this genotype by sex interaction arises primarily as a result of differential expression of the gene MICA across the sexes in skeletal muscle tissue. Through a series of conditional association analyses in the UK Biobank, reanalysis of the GTEx gene expression resource and RNASeq experiments on peripheral blood cells from AS cases and controls, we show that the genotype by sex interaction the authors report is unlikely to be a result of variation in MICA, but probably reflects a known interaction between the HLA-B gene, sex and risk of AS. We demonstrate that the diagnostic accuracy of AS in the UK Biobank is low, particularly amongst women, likely explaining some of the observed differences in heritability across the sexes and the difficulty in precisely locating association signals in the cohort.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xuan Zhou ◽  
S. Hong Lee

Abstract. Complementary to the genome, the concept of exposome has been proposed to capture the totality of human environmental exposures. While there has been some recent progress on the construction of the exposome, few tools exist that can integrate the genome and exposome for complex trait analyses. Here we propose a linear mixed model approach to bridge this gap, which jointly models the random effects of the two omics layers on phenotypes of complex traits. We illustrate our approach using traits from the UK Biobank (e.g., BMI and height for N ~ 35,000) with a small fraction of the exposome that comprises 28 lifestyle factors. The joint model of the genome and exposome explains substantially more phenotypic variance and significantly improves phenotypic prediction accuracy, compared to the model based on the genome alone. The additional phenotypic variance captured by the exposome includes its additive effects as well as non-additive effects such as genome–exposome (gxe) and exposome–exposome (exe) interactions. For example, 19% of variation in BMI is explained by additive effects of the genome, while an additional 7.2% is explained by additive effects of the exposome, 1.9% by exe interactions and 4.5% by gxe interactions. Correspondingly, the prediction accuracy for BMI, computed using Pearson’s correlation between the observed and predicted phenotypes, improves from 0.15 (based on the genome alone) to 0.35 (based on the genome and exposome). We also show, using established theories, that integrating genomic and exposomic data can be an effective way of attaining a clinically meaningful level of prediction accuracy for disease traits. In conclusion, the genomic and exposomic effects can contribute to phenotypic variation via their latent relationships, i.e. genome-exposome correlation, and gxe and exe interactions, and modelling these effects has the potential to improve phenotypic prediction accuracy and thus holds great promise for future clinical practice.
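
The BMI figures quoted in the abstract can be put side by side with a short worked calculation. The sketch below assumes that the four reported components are non-overlapping and simply add up, and it uses the general fact that a predictor capturing a fraction R² of phenotypic variance has a Pearson correlation of at most √R² with the phenotype. The variable names are illustrative; only the percentages and the two reported accuracies come from the abstract.

```python
import math

# Percent of BMI variance attributed to each component, as quoted in the abstract.
components = {
    "genome (additive)": 19.0,
    "exposome (additive)": 7.2,
    "exe interactions": 1.9,
    "gxe interactions": 4.5,
}

# Assumption: the components are non-overlapping, so they can be summed.
total_pct = sum(components.values())

# Upper bound on Pearson correlation if a predictor captured all of that variance.
genome_ceiling = math.sqrt(components["genome (additive)"] / 100.0)
joint_ceiling = math.sqrt(total_pct / 100.0)

# Prediction accuracies reported in the abstract.
reported = {"genome alone": 0.15, "genome + exposome": 0.35}

print(f"total variance explained: {total_pct:.1f}%")
print(f"ceiling, genome alone: {genome_ceiling:.2f} (reported {reported['genome alone']})")
print(f"ceiling, joint model:  {joint_ceiling:.2f} (reported {reported['genome + exposome']})")
```

Under these assumptions the components sum to about 32.6% of phenotypic variance, and both reported accuracies sit below their respective ceilings of roughly 0.44 and 0.57. That is what one would expect when effects are estimated from a finite sample.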


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Marion Patxot ◽  
Daniel Trejo Banos ◽  
Athanasios Kousathanas ◽  
Etienne J. Orliac ◽  
Sven E. Ojavee ◽  
...  

Abstract. We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10 kb upstream of genes, while 12–25% is attributed to coding regions, 32–44% to introns, and 22–28% to distal 10–500 kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data.


Author(s):  
Kenneth E. Westerman ◽  
Duy T. Pham ◽  
Liang Hong ◽  
Ye Chen ◽  
Magdalena Sevilla-González ◽  
...  

ABSTRACT. Motivation: Gene-environment interaction (GEI) studies are a general framework that can be used to identify genetic variants that modify the effects of environmental, physiological, lifestyle, or treatment effects on complex traits. Moreover, accounting for GEIs can enhance our understanding of the genetic architecture of complex diseases. However, commonly-used statistical software programs for GEI studies are either not applicable to testing certain types of GEI hypotheses or have not been optimized for use in large samples. Results: Here, we develop a new software program, GEM (Gene-Environment interaction analysis in Millions of samples), which supports the inclusion of multiple GEI terms, adjustment for GEI covariates, and robust inference, while allowing multi-threading to reduce computation time. GEM can conduct GEI tests as well as joint tests of genetic effects for both continuous and binary phenotypes. Through simulations, we demonstrate that GEM scales to millions of samples while addressing limitations of existing software programs. We additionally conduct a gene-sex interaction analysis on waist-hip ratio in 352,768 unrelated individuals from the UK Biobank, identifying 39 novel loci in the joint test that have not previously been reported in combined or sex-specific analyses. Our results demonstrate that GEM can facilitate the next generation of large-scale GEI studies and help advance our understanding of genomic contributions to complex traits. Availability: GEM is freely available as an open source project at https://github.com/large-scale-gxe-methods/[email protected], [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Ali Pazokitoroudi ◽  
Alec M. Chiu ◽  
Kathryn S. Burch ◽  
Bogdan Pasaniuc ◽  
Sriram Sankararaman

Abstract. The proportion of variation in complex traits that can be attributed to non-additive genetic effects has been a topic of intense debate. The availability of Biobank-scale datasets of genotype and trait data from unrelated individuals opens up the possibility of obtaining precise estimates of the contribution of non-additive genetic effects. We present an efficient method that can partition the variation in complex traits into variance that can be attributed to additive (additive heritability) and dominance (dominance heritability) effects across all genotyped SNPs in a large collection of unrelated individuals. Over a wide range of genetic architectures, our method yields unbiased estimates of heritability. We applied our method, in turn, to array genotypes as well as imputed genotypes (at common SNPs with minor allele frequency, MAF > 1%) and 50 quantitative traits measured in 291,273 unrelated white British individuals in the UK Biobank. Averaged across these 50 traits, we find that additive heritability on array SNPs is 21.86% while dominance heritability is 0.13% (about 0.48% of the additive heritability) with qualitatively similar results for imputed genotypes. We find no evidence for dominance heritability (accounting for the number of traits tested) and estimate that dominance heritability is unlikely to exceed 1% for the traits analyzed. Our analyses indicate a limited contribution of dominance heritability to complex trait variation.
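
Partitioning variance into additive and dominance parts relies on genotype codings whose contributions do not overlap. The abstract does not state which coding the method uses, so the sketch below shows one commonly used parameterisation as an assumption: a standardized additive code and a dominance code constructed to be uncorrelated with it under Hardy-Weinberg equilibrium (HWE). The function names are illustrative. The code checks the moments exactly from the HWE genotype frequencies rather than from simulated data.

```python
def hwe_frequencies(p):
    """Genotype frequencies for 0, 1, 2 copies of an allele with frequency p under HWE."""
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}


def additive_code(x, p):
    """Standardized additive coding of genotype x (0, 1 or 2 allele copies)."""
    return (x - 2.0 * p) / (2.0 * p * (1.0 - p)) ** 0.5


def dominance_code(x, p):
    """Standardized dominance coding, built to be uncorrelated with the additive code."""
    raw = {0: 0.0, 1: 2.0 * p, 2: 4.0 * p - 2.0}[x]
    return (raw - 2.0 * p * p) / (2.0 * p * (1.0 - p))


def moments(p):
    """Return (mean_a, var_a, mean_d, var_d, cov_ad) of the two codes under HWE."""
    freq = hwe_frequencies(p)
    mean_a = sum(f * additive_code(x, p) for x, f in freq.items())
    mean_d = sum(f * dominance_code(x, p) for x, f in freq.items())
    var_a = sum(f * (additive_code(x, p) - mean_a) ** 2 for x, f in freq.items())
    var_d = sum(f * (dominance_code(x, p) - mean_d) ** 2 for x, f in freq.items())
    cov_ad = sum(
        f * (additive_code(x, p) - mean_a) * (dominance_code(x, p) - mean_d)
        for x, f in freq.items()
    )
    return mean_a, var_a, mean_d, var_d, cov_ad


for p in (0.01, 0.2, 0.5):
    print(p, [round(m, 6) for m in moments(p)])
```

With this coding both codes have mean zero and unit variance and their covariance is zero at any allele frequency, so variance attributed to the dominance code is not already counted in the additive part. Departures from HWE would break this exact orthogonality.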

