Coordinated Interaction: A model and test for globally signed epistasis in complex traits

Interactions between genetic variants – epistasis – are pervasive in model systems and can profoundly impact evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work we develop a model for structured polygenic epistasis, called Coordinated Interaction (CI), and prove that several recent theories of genetic architecture fall under the formal umbrella of CI. Unlike standard polygenic epistasis models that assume interaction and main effects are independent, in the CI model, sets of SNPs broadly interact positively or negatively, on balance skewing the penetrance of main genetic effects. To test for the existence of CI we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CI in 14 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CI is a new dimension of genetic architecture that can capture structured, systemic interactions in complex human traits.


A model and test for coordinated polygenic epistasis in complex traits

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1922305118 ◽

2021 ◽

Vol 118 (15) ◽

pp. e1922305118

Author(s):

Brooke Sheppard ◽

Nadav Rappoport ◽

Po-Ru Loh ◽

Stephan J. Sanders ◽

Noah Zaitlen ◽

...

Keyword(s):

Genetic Variants ◽

Complex Traits ◽

Genetic Architecture ◽

Model Systems ◽

Disease Dynamics ◽

UK Biobank ◽

Biological Models ◽

The UK ◽

Complex Human Traits ◽

Main Effects

Interactions between genetic variants (epistasis) are pervasive in model systems and can profoundly impact evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue–trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.
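The abstract does not spell out how the EO test works. The following is a minimal sketch under one assumption: that the test builds one polygenic score from even-numbered chromosomes and one from odd-numbered chromosomes, then asks whether their product predicts the trait beyond the two scores alone. The function names (`eo_test`, `simulate`), the simulation design, and the use of true effect sizes as score weights are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def eo_test(y, prs_even, prs_odd):
    """Regress y on even PRS, odd PRS, and their product.

    Returns (gamma_hat, t_stat) for the product term. Assumed form of the
    test, for illustration only.
    """
    pe = (prs_even - prs_even.mean()) / prs_even.std()
    po = (prs_odd - prs_odd.mean()) / prs_odd.std()
    X = np.column_stack([np.ones_like(pe), pe, po, pe * po])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance
    return coef[3], coef[3] / np.sqrt(cov[3, 3])

def simulate(gamma, n=5000, m=200, seed=0):
    """Toy data: alternate SNPs stand in for even/odd chromosomes.

    gamma scales a coordinated interaction between the two scores;
    gamma = 0 gives a purely additive trait.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    beta = rng.normal(size=m)  # true effects, used as weights for simplicity
    prs_even = G[:, 0::2] @ beta[0::2]
    prs_odd = G[:, 1::2] @ beta[1::2]
    pe = (prs_even - prs_even.mean()) / prs_even.std()
    po = (prs_odd - prs_odd.mean()) / prs_odd.std()
    y = pe + po + gamma * pe * po + rng.normal(size=n)
    return y, prs_even, prs_odd

if __name__ == "__main__":
    for gamma in (0.0, 0.2):
        g_hat, t = eo_test(*simulate(gamma))
        print(f"gamma={gamma}: estimate={g_hat:.3f}, t={t:.2f}")
```

In real data the score weights would have to be estimated in samples independent of the test set, and covariates such as ancestry components would be needed; the sketch omits both.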


Detection and quantification of inbreeding depression for complex traits from SNP data

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1621096114 ◽

2017 ◽

Vol 114 (32) ◽

pp. 8602-8607 ◽

Cited By ~ 15

Author(s):

Loic Yengo ◽

Zhihong Zhu ◽

Naomi R. Wray ◽

Bruce S. Weir ◽

Jian Yang ◽

...

Keyword(s):

Inbreeding Depression ◽

Complex Traits ◽

Genetic Architecture ◽

Handgrip Strength ◽

UK Biobank ◽

SNP Data ◽

Causal Variants ◽

Auditory Acuity ◽

The UK ◽

Detection And Quantification

Quantifying the effects of inbreeding is critical to characterizing the genetic architecture of complex traits. This study highlights through theory and simulations the strengths and shortcomings of three SNP-based inbreeding measures commonly used to estimate inbreeding depression (ID). We demonstrate that heterogeneity in linkage disequilibrium (LD) between causal variants and SNPs biases ID estimates, and we develop an approach to correct this bias using LD and minor allele frequency stratified inference (LDMS). We quantified ID in 25 traits measured in ∼140,000 participants of the UK Biobank, using LDMS, and confirmed previously published ID for 4 traits. We find unique evidence of ID for handgrip strength, waist/hip ratio, and visual and auditory acuity (ID between −2.3 and −5.2 phenotypic SDs for complete inbreeding; P<0.001). Our results illustrate that a careful choice of the measure of inbreeding combined with LDMS stratification improves both detection and quantification of ID using SNP data.


Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture

10.1101/526855 ◽

2019 ◽

Cited By ~ 3

Author(s):

Kangcheng Hou ◽

Kathryn S. Burch ◽

Arunabha Majumdar ◽

Huwenbo Shi ◽

Nicholas Mancuso ◽

...

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Accurate Estimation ◽

Phenotypic Variance ◽

UK Biobank ◽

Genome Wide ◽

Wide Range ◽

Fundamental Quantity ◽

The UK ◽

Scale Data

The proportion of phenotypic variance attributable to the additive effects of a given set of genotyped SNPs (i.e. SNP-heritability) is a fundamental quantity in the study of complex traits. Recent works have shown that existing methods to estimate genome-wide SNP-heritability often yield biases when their assumptions are violated. While various approaches have been proposed to account for frequency- and LD-dependent genetic architectures, it remains unclear which estimates of SNP-heritability reported in the literature are reliable. Here we show that genome-wide SNP-heritability can be accurately estimated from biobank-scale data irrespective of the underlying genetic architecture of the trait, without specifying a heritability model or partitioning SNPs by minor allele frequency and/or LD. We use theoretical justifications coupled with extensive simulations starting from real genotypes from the UK Biobank (N=337K) to show that, unlike existing methods, our closed-form estimator for SNP-heritability is highly accurate across a wide range of architectures. We provide estimates of SNP-heritability for 22 complex traits and diseases in the UK Biobank and show that, consistent with our results in simulations, existing biobank-scale methods yield estimates up to 30% different from our theoretically-justified approach.


Probabilistic inference of the genetic architecture of functional enrichment of complex traits

10.1101/2020.09.04.20188433 ◽

2020 ◽

Author(s):

Marion Patxot ◽

Daniel Trejo Banos ◽

Athanasios Kousathanas ◽

Etienne J Orliac ◽

Sven E Ojavee ◽

...

Keyword(s):

Effect Size ◽

Complex Traits ◽

Genetic Architecture ◽

Penalized Regression ◽

Functional Enrichment ◽

Effect Sizes ◽

UK Biobank ◽

Regulatory Regions ◽

Coding Regions ◽

The UK

Due to the complexity of linkage disequilibrium (LD) and gene regulation, understanding the genetic basis of common complex traits remains a major challenge. We develop a Bayesian model (BayesRR-RC) implemented in a hybrid-parallel algorithm that scales to whole-genome sequence data on many hundreds of thousands of individuals, taking 22 seconds per iteration to estimate the inclusion probabilities and effect sizes of 8.4 million markers and 78 SNP-heritability parameters in the UK Biobank. Unlike naive penalized regression or mixed-linear model approaches, BayesRR-RC accurately estimates annotation-specific genetic architecture, determines the underlying joint effect size distribution and provides a probabilistic determination of association within marker groups in a single step. Of the genetic variation captured for height, body mass index, cardiovascular disease, and type-2 diabetes in the UK Biobank, only ≤ 10% is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, up to 40% to intronic regions, and 22-28% to distal 10-500kb upstream regions. ≥60% of the variance contributed by these exonic, intronic and distal 10-500kb regions is underlain by many thousands of common variants, each with larger average effect sizes compared to the rest of the genome. We also find differences in the relationship between effect size and heterozygosity across annotation groups and across traits. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance for just these four traits. In the Estonian Biobank, we show improved prediction accuracy over other approaches and generate a posterior predictive distribution for each individual.


Analysis of genetic dominance in the UK Biobank

10.1101/2021.08.15.456387 ◽

2021 ◽

Author(s):

Duncan S Palmer ◽

Wei Zhou ◽

Liam Abbott ◽

Nik Baya ◽

Claire Churchhouse ◽

...

Keyword(s):

Complex Traits ◽

Multiple Testing ◽

Model Organisms ◽

Systematic Evaluation ◽

Hair Color ◽

Phenotypic Variance ◽

Additive Effects ◽

UK Biobank ◽

Genome Wide ◽

The UK

In classical statistical genetic theory, a dominance effect is defined as the deviation from a purely additive genetic effect for a biallelic variant. Dominance effects are well documented in model organisms. However, evidence in humans is limited to a handful of traits, particularly those with strong single locus effects such as hair color. We carried out the largest systematic evaluation of dominance effects on phenotypic variance in the UK Biobank. We curated and tested over 1,000 phenotypes for dominance effects through GWAS scans, identifying 175 loci at genome-wide significance correcting for multiple testing (P < 4.7 × 10⁻¹¹). Power to detect non-additive loci is much lower than power to detect additive effects for complex traits: based on the relative effect sizes at genome-wide significant additive loci, we estimate a factor of 20-30 increase in sample size will be necessary to capture clear evidence of dominance similar to those currently observed for additive effects. However, these localised dominance hits do not extend to a significant aggregate contribution to phenotypic variance genome-wide. By deriving a version of LD-score regression to detect dominance effects tagged by common variation genome-wide (minor allele frequency > 0.05), we found no strong evidence of a contribution to phenotypic variance when accounting for multiple testing. Across the 267 continuous and 793 binary traits the median contribution was 5.73 × 10⁻⁴, with unbiased point estimates ranging from -0.261 to 0.131. Finally, we introduce dominance fine-mapping to explore whether the more rapid decay of dominance LD can be leveraged to find causal variants. These results provide the most comprehensive assessment of dominance trait variation in humans to date.
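The definition of a dominance effect as a deviation from additivity can be made concrete with a small regression sketch. It assumes a simple parameterisation (an allele-count term plus a heterozygote indicator) rather than the study's actual GWAS or LD-score pipeline; the function name `fit_additive_dominance` is hypothetical.

```python
import numpy as np

def fit_additive_dominance(genotypes, y):
    """Least-squares fit of y = mu + a * g + d * het.

    g is the allele count (0, 1, 2) and het marks heterozygotes (g == 1).
    d measures how far the heterozygote mean sits from the midpoint of
    the two homozygote means; d = 0 means purely additive.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(y, dtype=float)
    het = (g == 1).astype(float)
    X = np.column_stack([np.ones_like(g), g, het])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], coef[2]  # (additive a, dominance deviation d)

if __name__ == "__main__":
    # Genotype means 0.0, 0.8, 1.0: the midpoint of the homozygotes is 0.5,
    # so the heterozygote deviates from additivity by 0.3.
    a_hat, d_hat = fit_additive_dominance([0, 1, 2], [0.0, 0.8, 1.0])
    print(a_hat, d_hat)
```

This parameterisation leaves the additive and dominance columns correlated. Variance-partitioning analyses typically use a dominance coding that is orthogonal to the additive one, which this sketch does not attempt.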


Pairwise genetic interactions modulate lipid plasma levels and cellular uptake

10.1101/2020.10.29.360818 ◽

2020 ◽

Author(s):

Magdalena Zimon ◽

Yunfeng Huang ◽

Anthi Trasta ◽

Jimmy Z. Liu ◽

Chia-Yen Chen ◽

...

Keyword(s):

Complex Traits ◽

Drug Target ◽

Human Genetics ◽

Large Population ◽

Genetic Interactions ◽

Model Systems ◽

Lipid Lowering ◽

Gene Pairs ◽

Population Sizes ◽

The UK

Genetic interactions (GIs), the joint impact of different genes or variants on a phenotype, are foundational to the genetic architecture of complex traits. However, identifying GIs through human genetics is challenging since it necessitates very large population sizes, while findings from model systems do not always translate to humans. Here, we combined exome-sequencing and genotyping in the UK Biobank with combinatorial RNA-interference (coRNAi) screening to systematically test for pairwise GIs between 30 lipid GWAS genes. Gene-based protein-truncating variant (PTV) burden analyses from 240,970 exomes revealed additive GIs for APOB with PCSK9 and LPL, respectively. Both genetics and coRNAi identified additive GIs for 12 additional gene pairs. Overlapping non-additive GIs were detected only for TOMM40 at the APOE locus with SORT1 and NCAN. Our study identifies distinct gene pairs that modulate both plasma and cellular lipid levels via additive and non-additive effects and nominates drug target pairs for improved lipid-lowering combination therapies.


158. Exploring the Common Genetic Architecture of PTSD Symptoms in the UK Biobank

Biological Psychiatry ◽

10.1016/j.biopsych.2018.02.176 ◽

2018 ◽

Vol 83 (9) ◽

pp. S64 ◽

Cited By ~ 1

Author(s):

Gerome Breen ◽

Jonathan Coleman

Keyword(s):

Genetic Architecture ◽

PTSD Symptoms ◽

UK Biobank ◽

The Common ◽

The UK


Sex differences in genetic architecture in the UK Biobank

Nature Genetics ◽

10.1038/s41588-021-00912-0 ◽

2021 ◽

Vol 53 (9) ◽

pp. 1283-1289

Author(s):

Elena Bernabeu ◽

Oriol Canela-Xandri ◽

Konrad Rawlik ◽

Andrea Talenti ◽

James Prendergast ◽

...

Keyword(s):

Sex Differences ◽

Genetic Architecture ◽

UK Biobank ◽

The UK


LDpred-funct: incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets

10.1101/375337 ◽

2018 ◽

Cited By ~ 21

Author(s):

Carla Márquez-Luna ◽

Steven Gazal ◽

Po-Ru Loh ◽

Samuel S. Kim ◽

Nicholas Furlotte ◽

...

Keyword(s):

Complex Traits ◽

Prediction Accuracy ◽

Causal Effect ◽

Complex Trait ◽

Training Data ◽

Data Sets ◽

UK Biobank ◽

Validation Data ◽

Functional Regions ◽

The UK

Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a new method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, which includes coding, conserved, regulatory and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. LDpred-funct attained higher prediction accuracy than other polygenic prediction methods in simulations using real genotypes. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank. We used association statistics from British-ancestry samples as training data (avg N=373K) and samples of other European ancestries as validation data (avg N=22K), to minimize confounding. LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R²=0.144; highest R²=0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (total N=1107K; higher heritability in UK Biobank cohort) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.


Accurate Genomic Prediction Of Human Height

10.1101/190124 ◽

2017 ◽

Cited By ~ 6

Author(s):

Louis Lello ◽

Steven G. Avery ◽

Laurent Tellier ◽

Ana I. Vazquez ◽

Gustavo de los Campos ◽

...

Keyword(s):

Genetic Architecture ◽

UK Biobank ◽

Multiple Phenotypes ◽

Human Height ◽

Out Of Sample ◽

Heel Bone ◽

The Common ◽

Actual Height ◽

The UK ◽

Missing Heritability Problem

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
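The two height figures quoted above, a correlation of about 0.65 and roughly 40 percent of variance captured, are the same statement in different units. The worked step below assumes a linear predictor, for which the share of variance explained is the squared correlation between predicted and actual values; the symbol $r$ is introduced here only for illustration.

```latex
\[
  r \approx 0.65
  \quad\Longrightarrow\quad
  R^2 = r^2 \approx 0.65^2 = 0.4225 \approx 42\%,
\]
```

This is consistent with the "∼40 percent" reported for height.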
