Quantification of frequency-dependent genetic architectures and action of negative selection in 25 UK Biobank traits

Mapping Intimacies

10.1101/188086

2017

Cited By ~ 19

Author(s):

Armin P Schoech

Daniel Jordan

Po-Ru Loh

Steven Gazal

Luke O’Connor

...

Keyword(s):

Rare Variant

Complex Traits

Negative Selection

Rare Variants

Effect Sizes

Significant Heterogeneity

UK Biobank

Model Framework

Allele Effect

Variant Effect

Abstract: Understanding the role of rare variants is important in elucidating the genetic basis of human diseases and complex traits. It is widely believed that negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to $[p(1-p)]^{\alpha}$, where $p$ is the MAF and negative values of $\alpha$ imply larger effect sizes for rare variants. We estimate $\alpha$ by maximizing its profile likelihood in a linear mixed model framework using imputed genotypes, including rare variants (MAF > 0.07%). We applied this method to 25 UK Biobank diseases and complex traits (N = 113,851). All traits produced negative $\alpha$ estimates, 20 of them significantly negative, implying larger rare-variant effect sizes. The inferred best-fit distribution of true $\alpha$ values across traits had mean −0.38 (s.e. 0.02) and standard deviation 0.08 (s.e. 0.03), with statistically significant heterogeneity across traits (P = 0.0014). Despite larger rare-variant effect sizes, we show that for most traits analyzed, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability. Using evolutionary modeling and forward simulations, we validated the $\alpha$ model of MAF-dependent trait effects and estimated the level of coupling between fitness effects and trait effects. Based on this analysis, an average genome-wide negative selection coefficient on the order of $10^{-4}$ or stronger is necessary to explain the $\alpha$ values that we inferred.
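The $\alpha$ model can be turned into a quick back-of-the-envelope calculation. An additive SNP with per-allele effect $\beta$ explains $2p(1-p)\beta^2$ of the phenotypic variance, so the model implies an expected per-SNP heritability proportional to $[p(1-p)]^{1+\alpha}$. The minimal Python sketch below, assuming a hypothetical MAF spectrum (not UK Biobank data) and ignoring LD and estimation error, shows how the share of SNP-heritability from rare variants (MAF < 1%) moves with $\alpha$; the function names are illustrative only.

```python
# Sketch of the alpha model: Var(beta_j) is proportional to [p_j(1 - p_j)]^alpha,
# where p_j is the MAF of SNP j. The MAF spectrum below is hypothetical.


def expected_h2_per_snp(p, alpha):
    """Expected variance explained by one SNP, up to a shared constant.

    A SNP with per-allele effect beta explains 2p(1-p)beta^2, so under the
    alpha model its expected contribution scales as [p(1-p)]^(1 + alpha).
    """
    return (p * (1.0 - p)) ** (1.0 + alpha)


def rare_h2_fraction(mafs, alpha, rare_cutoff=0.01):
    """Share of expected SNP-heritability from SNPs with MAF below rare_cutoff."""
    contributions = [expected_h2_per_snp(p, alpha) for p in mafs]
    rare = sum(c for p, c in zip(mafs, contributions) if p < rare_cutoff)
    return rare / sum(contributions)


# Hypothetical spectrum: 800 rare SNPs and 200 common SNPs.
mafs = [0.001] * 500 + [0.005] * 300 + [0.05] * 150 + [0.3] * 50

for alpha in (-1.0, -0.38, 0.0):
    print(f"alpha = {alpha:+.2f}: rare share = {rare_h2_fraction(mafs, alpha):.3f}")
```

With $\alpha = -1$ every SNP contributes equally, so the rare share equals the fraction of SNPs that are rare (0.8 here); as $\alpha$ rises toward 0, the share falls because each rare SNP then carries little variance.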

Polygenicity of complex traits is explained by negative selection

10.1101/420497

2018

Cited By ~ 6

Author(s):

Luke J. O’Connor

Armin P. Schoech

Farhad Hormozdiari

Steven Gazal

Nick Patterson

...

Keyword(s):

Complex Traits

Negative Selection

Genetic Architecture

Low Frequency

Effect Sizes

Common Disease

Common Variants

Robust Statistical Method

Genetic Signal

Definition Of

Complex traits and common disease are highly polygenic: thousands of common variants are causal, and their effect sizes are almost always small. Polygenicity could be explained by negative selection, which constrains common-variant effect sizes and may reshape their distribution across the genome. We refer to this phenomenon as flattening, as genetic signal is flattened relative to the underlying biology. We introduce a mathematical definition of polygenicity, the effective number of associated SNPs, and a robust statistical method to estimate it. This definition of polygenicity differs from the number of causal SNPs, a standard definition; it depends strongly on SNPs with large effects. In analyses of 33 complex traits (average N=361k), we determined that common variants are ∼4x more polygenic than low-frequency variants, consistent with pervasive flattening. Moreover, functionally important regions of the genome have increased polygenicity in proportion to their increased heritability, implying that heritability enrichment reflects differences in the number of associations rather than their magnitude (which is constrained by selection). We conclude that negative selection constrains the genetic signal of biologically important regions and genes, reshaping genetic architecture.
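One way to see why such a definition "depends strongly on SNPs with large effects" is to write it as a kurtosis-like ratio of moments. The sketch below assumes the form $M_e = 3M\,E[\beta^2]^2 / E[\beta^4]$, which equals $M$ for normally distributed effects and, in expectation, the number of causal SNPs when effects are either zero or normal. This exact form and the function name are assumptions for illustration; the paper's estimator, which works from GWAS summary statistics and accounts for LD, is not reproduced here.

```python
import random


def effective_number(effects):
    """Assumed kurtosis-based effective number of associated SNPs.

    M_e = 3 * M * E[b^2]^2 / E[b^4], with moments taken over all M SNPs
    (including those with zero effect).
    """
    m = len(effects)
    second = sum(b * b for b in effects) / m
    fourth = sum(b ** 4 for b in effects) / m
    return 3.0 * m * second ** 2 / fourth


rng = random.Random(0)
# Hypothetical architecture: 200 of 2,000 SNPs have normal effects, the rest zero.
effects = [rng.gauss(0.0, 1.0) for _ in range(200)] + [0.0] * 1800
print(round(effective_number(effects)))  # near 200, up to sampling noise

# Replacing one zero-effect SNP with a single large effect collapses M_e.
effects_with_outlier = effects[:-1] + [15.0]
print(round(effective_number(effects_with_outlier)))
```

The second call illustrates the sensitivity described above: one SNP with a large effect dominates the fourth moment, so $M_e$ drops far below the 200 SNPs that have nonzero effects.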

RAREsim: A simulation method for very rare genetic variants

10.1101/2021.04.13.439644

2021

Author(s):

Megan Null

Josée Dupuis

Christopher R. Gignoux

Audrey E. Hendricks

Keyword(s):

Rare Variant

Complex Traits

Rare Variants

Simulated Data

Real Data

Simulation Method

Sequencing Data

Variant Annotation

Causal Variants

Rare Genetic Variants

Abstract: Identification of rare variant associations is crucial to fully characterize the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real variant annotation enables in silico comparison of methods that focus on putative causal variants, such as rare variant association tests and polygenic scoring methods. Existing simulation methods are either unable to employ real variant annotation or severely under- or overestimate the number of singletons and doubletons, reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real variant annotations. We highlight RAREsim’s utility across various genetic regions, sample sizes, ancestries, and variant classes.
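Comparing simulated and real data on this point comes down to the allele count spectrum, in particular how many variants are singletons (minor allele count 1) and doubletons (minor allele count 2). A minimal sketch, assuming phased haplotypes coded as 0/1 in plain Python lists (a hypothetical input format, not RAREsim's own):

```python
from collections import Counter


def mac_spectrum(haplotypes):
    """Tally variants by minor allele count (MAC).

    haplotypes: list of equal-length 0/1 lists, one per haplotype; column j is
    variant j. Returns a Counter mapping MAC -> number of variants
    (MAC 0 means the variant is monomorphic in this sample).
    """
    n_hap = len(haplotypes)
    spectrum = Counter()
    for j in range(len(haplotypes[0])):
        alt = sum(h[j] for h in haplotypes)
        spectrum[min(alt, n_hap - alt)] += 1
    return spectrum


haplotypes = [
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
spectrum = mac_spectrum(haplotypes)
print("singletons:", spectrum[1], "doubletons:", spectrum[2])
```

Running the same tally on simulated and real haplotypes of matching sample size gives a direct check of the singleton and doubleton counts that existing simulators are said to get wrong.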

Effect sizes of causal variants for gene expression and complex traits differ between populations

10.1101/2021.12.06.471235

2021

Author(s):

Roshni A. Patel

Shaila A. Musharoff

Jeffrey P. Spence

Harold Pimentel

Catherine Tcheandjieu

...

Keyword(s):

Gene Expression

Complex Traits

Association Studies

Causal Variant

Effect Sizes

European Ancestry

Genome Wide Association Studies

Polygenic Scores

Causal Variants

Variant Effect

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.
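For orientation, the most basic per-variant comparison of two independent effect size estimates is a z-test on their difference. The sketch below is a generic illustration with hypothetical inputs, not the authors' method: on its own it does not address the causal-variant and LD confounding described above, which their approach handles by restricting comparisons to European-ancestry segments with similar LD.

```python
import math


def effect_difference_test(beta1, se1, beta2, se2):
    """Two-sided z-test of H0: beta1 == beta2 for two independent estimates.

    Returns (z, p_value). Assumes approximately normal sampling errors and no
    covariance between the two estimates.
    """
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical estimates for one variant in two populations.
z, p = effect_difference_test(beta1=0.30, se1=0.05, beta2=0.15, se2=0.06)
print(f"z = {z:.3f}, p = {p:.3f}")
```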

Negative selection on complex traits limits genetic risk prediction accuracy between populations

10.1101/721936

2019

Cited By ~ 2

Author(s):

Arun Durvasula

Kirk E. Lohmueller

Keyword(s):

Risk Prediction

Genetic Risk

Complex Traits

Negative Selection

Association Studies

Demographic History

Effect Sizes

Genetic Risk Prediction

The Impact

European Populations

Accurate genetic risk prediction is a key goal for medical genetics, and great progress has been made toward identifying individuals with extreme risk across several traits and diseases (Collins and Varmus, 2015). However, many of these studies are done in predominantly European populations (Bustamante et al., 2011; Popejoy and Fullerton, 2016). Although GWAS effect sizes correlate across ancestries (Wojcik et al., 2019), risk scores show substantial reductions in accuracy when applied to non-European populations (Kim et al., 2018; Martin et al., 2019; Scutari et al., 2016). We use simulations to show that human demographic history and negative selection on complex traits result in population-specific genetic architectures. For traits under moderate negative selection, ~50% of the heritability can be accounted for by variants in Europe that are absent from Africa. We show that this directly leads to poor performance in risk prediction when using variants discovered in Europe to predict risk in African populations, especially in the tails of the risk distribution. To evaluate the impact of this effect in genomic data, we built a Bayesian model to stratify heritability between European-specific and shared variants and applied it to 43 traits and diseases in the UK Biobank. Across these phenotypes, we find that ~50% of the heritability comes from European-specific variants, setting an upper bound on the accuracy of genetic risk prediction in non-European populations using effect sizes discovered in European populations. We conclude that genetic association studies need to include more diverse populations to extend the utility of genetic risk prediction to all populations.
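The quantity being stratified can be illustrated with a naive calculation. The sketch below is not the Bayesian model described above: it assumes a hypothetical table of independent variants (no LD) with known European effect sizes, scores each variant's contribution as $2p(1-p)\beta^2$ using its European frequency, and labels a variant European-specific when its frequency in the African sample is zero.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    freq_eur: float  # allele frequency in the European sample
    freq_afr: float  # allele frequency in the African sample (0.0 = absent)
    beta: float      # per-allele effect size estimated in Europeans


def heritability_split(variants):
    """Return (European-specific share, shared share) of European SNP-heritability."""
    specific = shared = 0.0
    for v in variants:
        h2 = 2.0 * v.freq_eur * (1.0 - v.freq_eur) * v.beta ** 2
        if v.freq_afr == 0.0:
            specific += h2
        else:
            shared += h2
    total = specific + shared
    return specific / total, shared / total


# Hypothetical variants: two shared common variants, two rare European-specific ones.
variants = [
    Variant(0.30, 0.25, 0.02),
    Variant(0.10, 0.40, 0.03),
    Variant(0.005, 0.0, 0.20),
    Variant(0.002, 0.0, 0.30),
]
print(heritability_split(variants))
```

In this toy table the two rare European-specific variants carry most of the variance because their larger effects outweigh their low frequencies; the numbers are illustrative only.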

rareSurvival: rare variant association analysis for time-to-event outcomes

10.1101/2021.12.19.473338

2021

Author(s):

Hamzah Syed

Andrea Jorgensen

Andrew Morris

Keyword(s):

Rare Variant

Complex Traits

High Performance

Proportional Hazards

Proportional Hazards Model

Rare Variants

Cox Proportional Hazards

Cox Proportional Hazards Model

Time To Event

Medicine Research

Rare variants have been proposed as contributing to the "missing heritability" of complex human traits. There has been much recent development of methodology to investigate the association of complex traits with multiple rare variants within predefined "units" from sequence- and array-based studies of the exome or genome. However, software for modelling time-to-event outcomes in rare variant association analysis has been underdeveloped in comparison with that for binary and quantitative traits. We introduce a new command-line application, rareSurvival, for the analysis of rare variants with time-to-event outcomes. The program is compatible with high-performance computing (HPC) clusters for batch processing. rareSurvival implements statistical methods that combine widely used survival and gene-based analysis techniques, such as the Cox proportional hazards model and the burden test. We introduce a novel piece of software that will be at the forefront of efforts to discover rare variants associated with a variety of complex diseases with survival endpoints. Availability & Implementation: rareSurvival is implemented in C# and is available on Linux, Windows and Mac OS X. It is freely available (GNU General Public License, version 3) to download from https://www.liverpool.ac.uk/translational-medicine/research/statistical-genetics/software/. On Linux or Mac OS X, download Mono to run the software.
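As a rough illustration of combining a burden test with the Cox proportional hazards model (not rareSurvival's C# implementation), the Python sketch below collapses the rare variants in one region into a single burden score per individual and tests that score with a Cox score test at $\beta = 0$, a statistic closely related to the log-rank test. The data, the function names and the choice of a score test rather than a full likelihood fit are assumptions.

```python
import math


def burden_scores(genotypes):
    """Collapse rare variants: total minor-allele count per individual.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at
    rare variant j within one gene or region.
    """
    return [sum(row) for row in genotypes]


def cox_score_test(times, events, x):
    """Score test of H0: beta = 0 for one covariate x in a Cox model.

    Uses the Breslow convention for tied event times. Returns
    (chi-square statistic with 1 df, p-value). Raises ZeroDivisionError if x
    never varies within a risk set at an event time.
    """
    n = len(times)
    score = 0.0        # sum over events of (x_i - mean of x in the risk set)
    information = 0.0  # sum over events of the variance of x in the risk set
    for i in range(n):
        if not events[i]:
            continue  # censored individuals only enter risk sets
        risk_set = [x[k] for k in range(n) if times[k] >= times[i]]
        mean = sum(risk_set) / len(risk_set)
        score += x[i] - mean
        information += sum((v - mean) ** 2 for v in risk_set) / len(risk_set)
    chi2 = score ** 2 / information
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    return chi2, p_value


# Hypothetical data: 6 individuals, 3 rare variants in one gene.
genotypes = [[0, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0], [2, 0, 0], [0, 0, 0]]
times = [10.0, 4.0, 2.0, 12.0, 3.0, 8.0]  # follow-up time
events = [1, 1, 1, 0, 1, 1]               # 1 = event observed, 0 = censored

burden = burden_scores(genotypes)
chi2, p = cox_score_test(times, events, burden)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A score test needs no iterative fitting, which suits the per-gene batch processing mentioned above; a real analysis would also adjust for covariates and use far larger samples.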

Cohort-wide deep whole genome sequencing and the allelic architecture of complex traits

10.1101/283481

2018

Cited By ~ 1

Author(s):

Arthur Gilly

Daniel Suveges

Karoline Kuchenbaecker

Martin Pollard

Lorraine Southam

...

Keyword(s):

Whole Genome Sequencing

Genome Sequencing

Rare Variant

Complex Traits

Rare Variants

Whole Genome

Isolated Population

Gamma Glutamyltransferase

Cardiometabolic Traits

The role of rare variants in complex traits remains uncharted. Here, we conduct deep whole genome sequencing of 1,457 individuals from an isolated population, and test for rare variant burdens across six cardiometabolic traits. We identify a role for rare regulatory variation, which has hitherto been missed. We find evidence of rare variant burdens overlapping with, and mostly independent of, established common variant signals (ADIPOQ and adiponectin, $P = 4.2\times10^{-8}$; APOC3 and triglyceride levels, $P = 1.58\times10^{-26}$; GGT1 and gamma-glutamyltransferase, $P = 2.3\times10^{-6}$; UGT1A9 and bilirubin, $P = 1.9\times10^{-8}$), and identify replicating evidence for a burden associated with triglyceride levels in FAM189A ($P = 2.26\times10^{-8}$), indicating a role for this gene in lipid metabolism.
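Burden tests of this kind collapse the rare variants in a gene or region into one per-individual score and test it against the trait. A minimal, unweighted version for a quantitative trait is sketched below; the 1% MAF cutoff, equal variant weights, ordinary least squares without covariates and the normal-approximation p-value are simplifications chosen for illustration, not the study's pipeline, and the data are hypothetical.

```python
import math


def burden_linear_test(genotypes, trait, mafs, maf_cutoff=0.01):
    """Unweighted burden test for a quantitative trait.

    Counts minor alleles at variants with MAF < maf_cutoff, regresses the trait
    on that count by ordinary least squares, and returns (slope, se, p_value).
    The p-value uses a normal approximation, reasonable only for large samples.
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    burden = [sum(row[j] for j in rare) for row in genotypes]
    n = len(trait)
    mean_x = sum(burden) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in burden)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(burden, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(burden, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    p_value = math.erfc(abs(slope / se) / math.sqrt(2.0))
    return slope, se, p_value


# Hypothetical data: 6 individuals; the third variant is common and excluded.
mafs = [0.002, 0.004, 0.20]
genotypes = [[0, 0, 1], [0, 0, 2], [0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
trait = [1.0, 1.2, 0.9, 1.6, 1.4, 2.1]
print(burden_linear_test(genotypes, trait, mafs))
```

With six individuals the p-value is only illustrative; the point is the collapsing step, which turns many rare variants into a single tested predictor.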

Highly pleiotropic variants of human traits are enriched in genomic regions with strong background selection

10.21203/rs.3.rs-345603/v1

2021

Author(s):

Irene Novo

Eugenio López-Cortegano

Armando Caballero

Keyword(s):

Complex Traits

Rare Variants

Effect Sizes

Frequency Effect

Background Selection

The Mean

Using Data

Genomic Regions

Human Complex

GWAS Catalog

Abstract: Recent studies have shown the ubiquity of pleiotropy for variants affecting human complex traits. These studies also show that rare variants tend to be less pleiotropic than common ones, suggesting that purifying natural selection acts against highly pleiotropic variants of large effect. Here we investigate the mean frequency, effect size and recombination rate associated with pleiotropic variants, and focus particularly on whether highly pleiotropic variants are enriched in regions with putative strong background selection. We evaluate variants for 41 human traits using data from the NHGRI-EBI GWAS Catalog, as well as data from three other studies. Our results show that variants involving a higher degree of pleiotropy tend to be more common, have larger mean effect sizes, and contribute more to heritability than variants with a lower degree of pleiotropy. Using data from four different studies, we show that more pleiotropic variants are enriched in genomic regions with stronger background selection than less pleiotropic variants. Thus, we conclude that even though highly pleiotropic variants found so far have larger average effect sizes and frequencies than less pleiotropic ones, they are likely to be subjected to stronger background selection.
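The first step of such an analysis, grouping catalog variants by their degree of pleiotropy and summarizing frequency per group, can be sketched in a few lines. This assumes that degree of pleiotropy is simply the number of distinct traits a variant is associated with, and uses hypothetical catalog rows; the study's trait definitions and filtering are not reproduced.

```python
from collections import defaultdict


def mean_maf_by_degree(records):
    """Mean MAF of variants grouped by degree of pleiotropy.

    records: iterable of (variant_id, trait, maf) association rows. The degree
    of pleiotropy of a variant is taken to be the number of distinct traits it
    is associated with. Returns {degree: mean MAF}, ordered by degree.
    """
    traits = defaultdict(set)
    maf = {}
    for variant, trait, freq in records:
        traits[variant].add(trait)
        maf[variant] = freq
    by_degree = defaultdict(list)
    for variant, trait_set in traits.items():
        by_degree[len(trait_set)].append(maf[variant])
    return {d: sum(f) / len(f) for d, f in sorted(by_degree.items())}


# Hypothetical catalog rows (variant, trait, MAF).
records = [
    ("rs1", "height", 0.35), ("rs1", "BMI", 0.35), ("rs1", "LDL", 0.35),
    ("rs2", "height", 0.02),
    ("rs3", "BMI", 0.20), ("rs3", "T2D", 0.20),
    ("rs4", "LDL", 0.01),
]
print(mean_maf_by_degree(records))
```

The same grouping can be reused for mean effect size or for an annotation such as a background-selection statistic by swapping the per-variant value being averaged.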

Selection and explosive growth may hamper the performance of rare variant association tests

10.1101/015917

2015

Cited By ~ 2

Author(s):

Lawrence H. Uricchio

John S. Witte

Ryan D. Hernandez

Keyword(s):

Natural Selection

Rare Variant

Complex Traits

Statistical Power

Rare Variants

Model Parameters

Phenotypic Variance

Additive Variance

Rare Variant Association

Association Tests

Much recent debate has focused on the role of rare variants in complex phenotypes. However, it is well known that rare alleles can only contribute a substantial proportion of the phenotypic variance when they have much larger effect sizes than common variants, which is most easily explained by natural selection constraining trait-altering alleles to low frequency. It is also plausible that demographic events will influence the genetic architecture of complex traits. Unfortunately, most rare variant association tests do not explicitly model natural selection or non-equilibrium demography. Here, we develop a novel evolutionary model of complex traits. We perform numerical calculations and simulate phenotypes under this model using inferred human demographic and selection parameters. We show that rare variants only contribute substantially to complex traits under very strong assumptions about the relationship between effect size and selection strength. We then assess the performance of state-of-the-art rare variant tests using our simulations across a broad range of model parameters. Counterintuitively, we find that statistical power is lowest when rare variants make the greatest contribution to the additive variance, and that power is substantially lower under our model than previously studied models. While many empirical studies have attempted to identify causal loci using rare variant association methods, few have reported novel associations. Some authors have interpreted this to mean that rare variants contribute little to heritability, but our results show that an alternative explanation is that rare variant tests have less power than previously estimated.
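The reason rare alleles need much larger effects follows from the additive variance of a single biallelic variant under Hardy-Weinberg equilibrium, $2p(1-p)\beta^2$. The sketch below, which considers one variant at a time and ignores LD, computes the effect size a rare variant needs in order to explain as much variance as a given common variant; the frequencies and effect size are hypothetical.

```python
import math


def variance_explained(p, beta):
    """Additive variance from one biallelic variant under HWE: 2p(1-p)beta^2."""
    return 2.0 * p * (1.0 - p) * beta ** 2


def beta_for_equal_variance(p_rare, p_common, beta_common):
    """Effect size a rare variant needs to match a common variant's variance."""
    return beta_common * math.sqrt(
        (p_common * (1.0 - p_common)) / (p_rare * (1.0 - p_rare))
    )


b = beta_for_equal_variance(p_rare=0.001, p_common=0.3, beta_common=0.05)
print(round(b, 3))
```

Here a variant at frequency 0.1% needs an effect roughly 14.5 times larger than a variant at 30% frequency to explain the same variance, which is why large rare-variant contributions require strong coupling between selection and effect size.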

Genetic architecture of smoking: Evaluating rare variant contribution from deep whole-genome sequencing of up to 26,000 individuals

10.21203/rs.3.rs-475149/v1

2021

Author(s):

Seon-Kyeong Jang

Luke Evans

Allison Fialkowski

Donna Arnett

Diane Becker

...

Keyword(s):

Population Structure

Rare Variant

Tobacco Use

Upper Bound

Complex Traits

Rare Variants

European Ancestry

Substantial Contribution

Whole Genome

Close Relatives

Abstract: Background. Across complex traits, common variants explain only a modest amount of variance, with SNP-heritability consistently below heritability estimates from close relatives. Here, we examined the contribution of rare variants to tobacco use risk in up to 26,000 individuals of European ancestry in the Trans-Omics for Precision Medicine (TOPMed) program with whole genome sequence (WGS; ~30X coverage). Method. We grouped about 35 million genetic variants by their minor allele frequencies (MAF) and linkage disequilibrium (LD) and estimated SNP-heritability for age of smoking initiation (N = 14,747), cigarettes smoked per day (N = 15,425), smoking cessation (N = 17,871) and initiation (N = 26,340) using a linear mixed model. Rare-variant population structure was detected and adjusted for by a permutation procedure. We estimated an upper bound for narrow-sense heritability for tobacco use using available pedigrees consisting of close relatives in TOPMed. Results. Rare variants with MAF 0.1–0.01%, mostly from non-protein-altering regions, accounted for 26% of variation in age of initiation and 15% for cessation. Follow-up analysis indicated that about one-third of this rare-variant contribution is potentially confounded with rare-variant structure even after adjusting for principal components. After further conservative adjustment for population structure, we estimated SNP-based heritability to be 0.21 (SE = 0.08) for age of initiation, 0.15 (0.06) for cigarettes per day, 0.21 (0.09) for cessation, and 0.24 (0.07) for initiation, 1.8–4.5 times higher than previous SNP-based estimates. Our pedigree-based upper bound for SNP-based heritability ranged from 0.18 to 0.35. Conclusion. The substantial contribution of rare variants to several smoking phenotypes sheds light on the missing heritability and genetic etiology of tobacco use. It also informs fine-mapping strategies, since the majority of the rare-variant contribution was located in non-coding regulatory regions.
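Grouping variants by MAF and LD is commonly done by binning on MAF and then splitting each MAF bin into LD-score quartiles, as in the GREML-LDMS approach, with each resulting stratum given its own variance component in the linear mixed model. The sketch below assumes that scheme, hypothetical bin edges and precomputed per-variant LD scores; the study's actual bins and LD measure may differ.

```python
import bisect
from collections import defaultdict

# Hypothetical MAF bin edges; the last bin is closed at 0.5.
MAF_EDGES = [0.0001, 0.001, 0.01, 0.05, 0.5]


def maf_bin(maf, edges=MAF_EDGES):
    """Index i with edges[i] <= maf < edges[i + 1] (last bin includes 0.5), else None."""
    if maf == edges[-1]:
        return len(edges) - 2
    i = bisect.bisect_right(edges, maf) - 1
    return i if 0 <= i < len(edges) - 1 else None


def stratify(variants, edges=MAF_EDGES):
    """Assign each variant to a (MAF bin, LD-score quartile) stratum.

    variants: list of (variant_id, maf, ld_score). LD quartiles are computed
    separately within each MAF bin, by rank. Variants outside all bins are dropped.
    """
    by_bin = defaultdict(list)
    for variant_id, maf, ld_score in variants:
        b = maf_bin(maf, edges)
        if b is not None:
            by_bin[b].append((ld_score, variant_id))
    strata = {}
    for b, members in by_bin.items():
        members.sort()
        n = len(members)
        for rank, (_, variant_id) in enumerate(members):
            strata[variant_id] = (b, rank * 4 // n)  # quartile 0..3
    return strata


# Hypothetical input: eight rare variants, one common, one below the lowest edge.
variants = [(f"r{k}", 0.0005, float(k)) for k in range(8)]
variants += [("c1", 0.3, 5.0), ("x1", 0.00001, 1.0)]
print(stratify(variants))
```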

Probabilistic inference of the genetic architecture of functional enrichment of complex traits

10.1101/2020.09.04.20188433

2020

Author(s):

Marion Patxot

Daniel Trejo Banos

Athanasios Kousathanas

Etienne J Orliac

Sven E Ojavee

...

Keyword(s):

Effect Size

Complex Traits

Genetic Architecture

Penalized Regression

Functional Enrichment

Effect Sizes

UK Biobank

Regulatory Regions

Coding Regions

The UK

Due to the complexity of linkage disequilibrium (LD) and gene regulation, understanding the genetic basis of common complex traits remains a major challenge. We develop a Bayesian model (BayesRR-RC) implemented in a hybrid-parallel algorithm that scales to whole-genome sequence data on many hundreds of thousands of individuals, taking 22 seconds per iteration to estimate the inclusion probabilities and effect sizes of 8.4 million markers and 78 SNP-heritability parameters in the UK Biobank. Unlike naive penalized regression or mixed-linear model approaches, BayesRR-RC accurately estimates annotation-specific genetic architecture, determines the underlying joint effect size distribution and provides a probabilistic determination of association within marker groups in a single step. Of the genetic variation captured for height, body mass index, cardiovascular disease, and type-2 diabetes in the UK Biobank, only ≤ 10% is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, up to 40% to intronic regions, and 22-28% to distal 10-500kb upstream regions. ≥60% of the variance contributed by these exonic, intronic and distal 10-500kb regions is underlain by many thousands of common variants, each with larger average effect sizes compared to the rest of the genome. We also find differences in the relationship between effect size and heterozygosity across annotation groups and across traits. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance for just these four traits. In the Estonian Biobank, we show improved prediction accuracy over other approaches and generate a posterior predictive distribution for each individual.
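BayesR-type models such as the one described here place a mixture prior on each marker effect: the effect is either exactly zero or drawn from one of several normal components of increasing variance, with mixture proportions that can differ between annotation groups. The sketch below only samples from such a prior for two hypothetical annotation groups. The variance multipliers (values commonly used in BayesR), the mixture proportions and the group names are assumptions, and this is not the paper's hybrid-parallel sampler.

```python
import random

# BayesR-style variance multipliers (an assumption here): nonzero component k
# has effect variance GAMMAS[k] * sigma2_g.
GAMMAS = [0.0, 1e-4, 1e-3, 1e-2]


def draw_group_effects(n, pi, sigma2_g, rng, gammas=GAMMAS):
    """Draw n marker effects for one annotation group from a spike-and-slab mixture.

    pi[0] is the probability of a zero effect; pi[k] for k >= 1 is the
    probability of a normal effect with variance gammas[k] * sigma2_g.
    """
    components = rng.choices(range(len(pi)), weights=pi, k=n)
    return [0.0 if k == 0 else rng.gauss(0.0, (gammas[k] * sigma2_g) ** 0.5)
            for k in components]


rng = random.Random(1)
# Hypothetical groups with their own mixture proportions.
coding = draw_group_effects(20000, [0.90, 0.05, 0.03, 0.02], 0.5, rng)
intronic = draw_group_effects(20000, [0.98, 0.015, 0.004, 0.001], 0.5, rng)
for name, effects in (("coding", coding), ("intronic", intronic)):
    nonzero = sum(1 for b in effects if b != 0.0)
    print(name, "nonzero share:", nonzero / len(effects))
```

In a fitted model the mixture proportions and variances per group are what get estimated, so that annotation-specific architecture can be read off as differences in the share and size of nonzero effects between groups.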