Widespread allelic heterogeneity in complex traits

Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWAS and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4152 loci with strong evidence of AH. The proportion of all loci with identified AH is 4-23% in eQTLs, 35% in GWAS of High-Density Lipoprotein (HDL), and 23% in schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R² = 0.85, P = 2.2e-16), indicating that limited statistical power prevents identification of AH in other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.

Identifying causal variants by fine mapping across multiple studies

PLoS Genetics ◽

10.1371/journal.pgen.1009733 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009733

Author(s):

Nathan LaPierre ◽

Kodi Taraszka ◽

Helen Huang ◽

Rosemary He ◽

Farhad Hormozdiari ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Association Studies ◽

Density Lipoprotein ◽

Genome Wide Association Studies ◽

Multivariate Normal ◽

Multiple Study ◽

Genome Wide ◽

Causal Variants ◽

Different Populations

Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
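The multivariate normal treatment of summary statistics described above can be illustrated with a minimal single-study sketch in the style of CAVIAR. It is not the MsCAVIAR implementation and omits the random effects model across studies. It assumes z-scores follow N(0, Σ + σ² Σ Σ_c Σ) for a causal configuration c, enumerates configurations with up to `max_causal` causal SNPs, and sums posterior mass into per-SNP inclusion probabilities. The function names, the effect-size variance `sigma2`, and the per-SNP prior `prior_causal` are illustrative assumptions.

```python
import itertools
import numpy as np


def log_mvn_zero_mean(z, cov):
    """Log density of a zero-mean multivariate normal N(0, cov) at z."""
    m = len(z)
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (m * np.log(2 * np.pi) + logdet + quad)


def causal_posteriors(z, ld, max_causal=2, sigma2=25.0, prior_causal=0.01):
    """Posterior inclusion probability per SNP from z-scores and an LD matrix.

    sigma2 and prior_causal are assumed values chosen for illustration.
    The all-null configuration is left out to keep the sketch short.
    """
    m = len(z)
    configs, log_post = [], []
    for k in range(1, max_causal + 1):
        for causal in itertools.combinations(range(m), k):
            c = np.zeros(m)
            c[list(causal)] = 1.0
            # Marginal covariance of z when the SNPs in `causal` have effects.
            cov = ld + sigma2 * ld @ np.diag(c) @ ld
            log_prior = k * np.log(prior_causal) + (m - k) * np.log(1 - prior_causal)
            configs.append(causal)
            log_post.append(log_mvn_zero_mean(z, cov) + log_prior)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()
    pip = np.zeros(m)
    for causal, p in zip(configs, post):
        pip[list(causal)] += p
    return pip


# Toy locus: SNP 0 carries the signal, SNP 1 is in LD with it.
ld = np.array([[1.0, 0.6, 0.1],
               [0.6, 1.0, 0.2],
               [0.1, 0.2, 1.0]])
z = np.array([5.5, 3.3, 0.2])
pip = causal_posteriors(z, ld)
print(pip.round(3))
```

Exhaustive enumeration grows combinatorially with `max_causal`, so a sketch like this is only practical for small loci.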

Across-cohort QC analyses of genome-wide association study summary statistics from complex traits

10.1101/033787 ◽

2015 ◽

Author(s):

Guo-Bo Chen ◽

Sang Hong Lee ◽

Matthew R Robinson ◽

Maciej Trzaskowski ◽

Zhi-Xiang Zhu ◽

...

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

False Negative ◽

Genome Wide Association ◽

Effect Sizes ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Unknown Sample ◽

Genome Wide

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically the effect sizes of SNP alleles are very small, and this has led to large genome-wide association meta-analyses (GWAMA) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose a pair of methods for examining the concordance between demographic information and summary statistics. In method I, we use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts, we propose a method based on randomly generated genetic predictors, which does not require the sharing of individual-level genotype data and does not breach individual privacy.
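Methods I and II both start from reported allele frequencies, which a short sketch can make concrete. The abstract does not specify the exact estimators, so the code below assumes a simple ratio-of-averages Fst based on expected heterozygosity and a plain SVD-based principal component analysis. The function names and the simulated cohorts are hypothetical.

```python
import numpy as np


def pairwise_fst(freqs):
    """Pairwise Fst between cohorts from a (n_cohorts, n_snps) frequency array.

    Uses Fst = sum(H_T - H_S) / sum(H_T), an assumed simple estimator.
    """
    n = freqs.shape[0]
    fst = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            p_bar = (freqs[i] + freqs[j]) / 2
            h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled pair
            h_s = (2 * freqs[i] * (1 - freqs[i])
                   + 2 * freqs[j] * (1 - freqs[j])) / 2  # mean within-cohort
            fst[i, j] = fst[j, i] = (h_t - h_s).sum() / h_t.sum()
    return fst


def cohort_pcs(freqs, n_components=2):
    """Principal component scores of cohorts from centered allele frequencies."""
    centered = freqs - freqs.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulated example: cohorts 0 and 1 share one ancestry, cohorts 2 and 3 another.
rng = np.random.default_rng(0)
base_a = rng.uniform(0.1, 0.9, size=500)
base_b = np.clip(base_a + rng.normal(0.0, 0.1, size=500), 0.01, 0.99)
freqs = np.clip(
    np.vstack([base + rng.normal(0.0, 0.01, size=500)
               for base in (base_a, base_a, base_b, base_b)]),
    0.01, 0.99,
)
fst = pairwise_fst(freqs)
pcs = cohort_pcs(freqs)
print(fst.round(4))
print(pcs[:, 0].round(3))
```

In this toy setting, cohorts of the same ancestry show a smaller pairwise Fst and fall on the same side of the first principal component, which is the kind of pattern an outlier check would look for.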

An atlas of genetic associations in UK Biobank

10.1101/176834 ◽

2017 ◽

Cited By ~ 18

Author(s):

Oriol Canela-Xandri ◽

Konrad Rawlik ◽

Albert Tenesa

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Genetic Associations ◽

Genome Wide ◽

Related Individuals ◽

Sufficient Statistical Power

Genome-wide association studies have revealed many loci contributing to the variation of complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is complicated by incidental structure present when collecting such large cohorts. For instance, UK Biobank comprises 107,162 third degree or closer related participants. Traditionally, GWAS have removed related individuals because they comprised an insignificant proportion of the overall sample size; however, removing related individuals in UK Biobank would entail a substantial loss of power. Furthermore, modelling such structure using linear mixed models is computationally expensive, which requires a computational infrastructure that may not be accessible to all researchers. Here we present an atlas of genetic associations for 118 non-binary and 599 binary traits of 408,455 related and unrelated UK Biobank participants of White-British descent. Results are compiled in a publicly accessible database that allows querying genome-wide association summary results for 623,944 genotyped and HapMap2 imputed SNPs, as well as downloading whole GWAS summary statistics for over 30 million imputed SNPs from the Haplotype Reference Consortium panel. Our atlas of associations (GeneATLAS, http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results in an easy way without incurring high computational costs.

SparsePro: an efficient genome-wide fine-mapping method integrating summary statistics and functional annotations

10.1101/2021.10.04.463133 ◽

2021 ◽

Author(s):

Wenmin Zhang ◽

Hamed S Najafabadi ◽

Yue Li

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Genetic Architecture ◽

Association Studies ◽

Computational Cost ◽

Mapping Method ◽

Genome Wide Association Studies ◽

Functional Annotations ◽

Genome Wide ◽

Causal Variants

Identifying causal variants from genome-wide association studies (GWASs) is challenging due to widespread linkage disequilibrium (LD). Functional annotations of the genome may help prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. However, classical fine-mapping methods have a high computational cost, particularly when the underlying genetic architecture and LD patterns are complex. Here, we propose a novel approach, SparsePro, to efficiently conduct functionally informed statistical fine-mapping. Our method has two major innovations. First, by creating a sparse low-dimensional projection of the high-dimensional genotype, we enable a linear search of causal variants instead of the exponential search of causal configurations used in existing methods. Second, we adopt a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors. We evaluate SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved more accurate and well-calibrated posterior inference with greatly reduced computation time. We demonstrate the utility of SparsePro by investigating the genetic architecture of five functional biomarkers of vital organs. We identify potential causal variants contributing to the genetically encoded coordination mechanisms between vital organs and pinpoint target genes with potential pleiotropic effects. In summary, we have developed an efficient genome-wide fine-mapping method with the ability to integrate functional annotations. Our method may have wide utility in understanding the genetics of complex traits as well as in increasing the yield of functional follow-up studies of GWASs.

Genome-wide Marginal Epistatic Association Mapping in Case-Control Studies

10.1101/374983 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lorin Crawford ◽

Xiang Zhou

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Computational Cost ◽

Case Control ◽

Type I ◽

Genome Wide Association Studies ◽

Case Control Studies ◽

Control Data ◽

Genome Wide

Epistasis, commonly defined as the interaction between genetic loci, is an important contributor to the genetic architecture underlying many complex traits and common diseases. Most existing epistatic mapping methods in genome-wide association studies explicitly search over all pairwise or higher-order interactions. However, due to the potentially large search space and the resulting multiple testing burden, these conventional approaches often suffer from heavy computational cost and low statistical power. A recently proposed attractive alternative for mapping epistasis focuses instead on detecting marginal epistasis, which is defined as the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional epistatic mapping procedures. However, previous marginal epistatic mapping methods are based on quantitative trait models. As we will show here, these lack statistical power in case-control studies. Here, we develop a liability threshold mixed model that extends marginal epistatic mapping to case-control studies. Our method properly accounts for case-control ascertainment and the binary nature of case-control data. We refer to this method as the liability threshold marginal epistasis test (LT-MAPIT). With simulations, we illustrate the benefits of LT-MAPIT in terms of providing effective type I error control, and being more powerful than both existing marginal epistatic mapping methods and conventional explicit search-based approaches in case-control data. We finally apply LT-MAPIT to identify both marginal and pairwise epistasis in seven complex diseases from the Wellcome Trust Case Control Consortium (WTCCC) 1 study.

Integrating gene expression with summary association statistics to identify susceptibility genes for 30 complex traits

10.1101/072967 ◽

2016 ◽

Cited By ~ 2

Author(s):

Nicholas Mancuso ◽

Huwenbo Shi ◽

Pagé Goddard ◽

Gleb Kichaev ◽

Alexander Gusev ◽

...

Keyword(s):

Gene Expression ◽

Genetic Correlation ◽

Complex Traits ◽

Association Studies ◽

Susceptibility Genes ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Causal Variants

Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. We leverage recently introduced methods to integrate gene expression measurements from 45 expression panels with summary GWAS data to perform 30 transcriptome-wide association studies (TWASs). First, we identify 1,196 susceptibility genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant, thus providing new risk loci. Second, we find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, 8 are not found through genetic correlation at the SNP level. Third, we use bi-directional regression to find evidence for BMI causally influencing triglyceride levels, and triglyceride levels causally influencing LDL. Taken together, our results provide insights into the role of expression in susceptibility to complex traits and diseases.
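A worked sketch may help show how expression weights combine with summary GWAS data. The abstract does not spell out the test statistic, so the code assumes the commonly used summary-based form z_TWAS = wᵀz / sqrt(wᵀΣw), where w are per-SNP expression prediction weights for a gene, z are GWAS z-scores, and Σ is the LD matrix of those SNPs. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def twas_z(weights, gwas_z, ld):
    """Summary-based gene-level association z-score (assumed standard form).

    weights: expression prediction weights for the SNPs near one gene
    gwas_z:  GWAS z-scores for the same SNPs, in the same order
    ld:      LD (correlation) matrix for the same SNPs
    """
    numerator = weights @ gwas_z
    denominator = np.sqrt(weights @ ld @ weights)
    return numerator / denominator


# Toy example: two SNPs in LD (r = 0.5), equal weights, equal GWAS z-scores.
weights = np.array([1.0, 1.0])
gwas_z = np.array([2.0, 2.0])
ld = np.array([[1.0, 0.5],
               [0.5, 1.0]])
gene_z = twas_z(weights, gwas_z, ld)
print(round(float(gene_z), 3))  # 4 / sqrt(3)
```

The denominator accounts for LD between the weighted SNPs, so two correlated SNPs contribute less evidence than two independent ones would.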

Identifying Causal Variants by Fine Mapping Across Multiple Studies

10.1101/2020.01.15.908517 ◽

2020 ◽

Cited By ~ 2

Author(s):

Nathan LaPierre ◽

Kodi Taraszka ◽

Helen Huang ◽

Rosemary He ◽

Farhad Hormozdiari ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Association Studies ◽

Genome Wide Association Studies ◽

Multiple Study ◽

Current State ◽

Genome Wide ◽

Causal Variants ◽

Different Populations

Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. In a trans-ethnic, trans-biobank Type 2 Diabetes analysis, we show that MsCAVIAR returns causal set sizes that are over 20% smaller than those given by current state of the art methods for trans-ethnic fine-mapping.

Admixed Populations Improve Power for Variant Discovery and Portability in Genome-wide Association Studies

10.1101/2021.03.09.434643 ◽

2021 ◽

Author(s):

Meng Lin ◽

Danny S. Park ◽

Noah A. Zaitlen ◽

Brenna M. Henn ◽

Christopher R. Gignoux

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Genetic Architecture ◽

Association Studies ◽

Genome Wide Association ◽

Ancestral Population ◽

Genome Wide Association Studies ◽

Variant Discovery ◽

Genome Wide ◽

Source Populations

Genome-wide association studies (GWAS) are primarily conducted in single-ancestry settings. The low transferability of results has limited our understanding of human genetic architecture across a range of complex traits. In contrast to homogeneous populations, admixed populations provide an opportunity to capture genetic architecture contributed from multiple source populations and thus improve statistical power. Here, we provide a mechanistic simulation framework to investigate the statistical power and transferability of GWAS under directional polygenic selection or varying divergence. We focus on a two-way admixed population and show that GWAS in admixed populations can be enriched for power in discovery by up to 2-fold compared to the ancestral populations under similar sample size. Moreover, higher accuracy of cross-population polygenic score estimates is also observed if variants and weights are trained in the admixed group rather than in the ancestral groups. Common variant associations are also more likely to replicate if first discovered in the admixed group and then transferred to an ancestral population, than the other way around (across 50 iterations with 1,000 causal SNPs, training on 10,000 individuals, testing on 1,000 in each population, p=3.78e-6, 6.19e-101, ~0 for FST = 0.2, 0.5, 0.8, respectively). While some of these FST values may appear extreme, we demonstrate that they are found across the entire phenome in the GWAS catalog. This framework demonstrates that investigation of admixed populations harbors significant advantages over GWAS in single-ancestry cohorts for uncovering the genetic architecture of traits and will improve downstream applications such as personalized medicine across diverse populations.

A unifying framework for joint trait analysis under a non-infinitesimal model

10.1101/293803 ◽

2018 ◽

Author(s):

Ruth Johnson ◽

Huwenbo Shi ◽

Bogdan Pasaniuc ◽

Sriram Sankararaman

Keyword(s):

Complex Traits ◽

Association Studies ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Posterior Density ◽

Trait Analysis ◽

Genome Wide ◽

Genetic Overlap ◽

Causal Variants ◽

Supplementary Material

Motivation: A large proportion of risk regions identified by genome-wide association studies (GWAS) are shared across multiple diseases and traits. Understanding whether this clustering is due to sharing of causal variants or chance colocalization can provide insights into shared etiology of complex traits and diseases. Results: In this work, we propose a flexible, unifying framework to quantify the overlap between a pair of traits called UNITY (Unifying Non-Infinitesimal Trait analYsis). We formulate a Bayesian generative model that relates the overlap between pairs of traits to GWAS summary statistic data under a non-infinitesimal genetic architecture underlying each trait. We propose a Metropolis-Hastings sampler to compute the posterior density of the genetic overlap parameters in this model. We validate our method through comprehensive simulations and analyze summary statistics from height and BMI GWAS to show that it produces estimates consistent with the known genetic makeup of both traits. Availability: The UNITY software is made freely available to the research community at: https://github.com/bogdanlab/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
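The idea of sampling a posterior under a non-infinitesimal architecture can be sketched with a much simpler toy model than UNITY. The code below is not the UNITY sampler and handles a single trait only. It assumes each SNP's z-score comes from the mixture (1 - p) N(0, 1) + p N(0, 1 + σ²), places a uniform prior on the causal proportion p, and draws from its posterior with a random-walk Metropolis-Hastings sampler. All names, the fixed `sigma2`, and the simulated data are illustrative assumptions.

```python
import numpy as np


def log_normal_pdf(z, var):
    """Log density of N(0, var) evaluated elementwise at z."""
    return -0.5 * (np.log(2 * np.pi * var) + z**2 / var)


def log_posterior(p, z, sigma2):
    """Unnormalized log posterior of the causal proportion p (uniform prior)."""
    if not 0.0 < p < 1.0:
        return -np.inf
    null = np.log1p(-p) + log_normal_pdf(z, 1.0)
    causal = np.log(p) + log_normal_pdf(z, 1.0 + sigma2)
    return np.logaddexp(null, causal).sum()


def metropolis_hastings(z, sigma2, n_iter=3000, step=0.02, seed=0):
    """Random-walk Metropolis-Hastings over p; returns the full chain."""
    rng = np.random.default_rng(seed)
    p = 0.5
    current = log_posterior(p, z, sigma2)
    samples = np.empty(n_iter)
    for t in range(n_iter):
        proposal = p + rng.normal(0.0, step)
        candidate = log_posterior(proposal, z, sigma2)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if np.log(rng.uniform()) < candidate - current:
            p, current = proposal, candidate
        samples[t] = p
    return samples


# Simulated z-scores with 10% causal SNPs (values chosen for illustration).
rng = np.random.default_rng(1)
n_snps, true_p, sigma2 = 2000, 0.1, 16.0
is_causal = rng.uniform(size=n_snps) < true_p
z = rng.normal(0.0, np.sqrt(np.where(is_causal, 1.0 + sigma2, 1.0)))

samples = metropolis_hastings(z, sigma2)
posterior_mean = samples[500:].mean()  # discard the first 500 draws as burn-in
print(round(float(posterior_mean), 3))
```

A model for a pair of traits would replace the single proportion with parameters for trait-specific and shared causal variants, which is where the overlap estimate comes from.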

Quantitative and Qualitative Role of Antagonistic Heterogeneity in Genetics of Blood Lipids

The Journals of Gerontology Series A ◽

10.1093/gerona/glz225 ◽

2019 ◽

Vol 75 (10) ◽

pp. 1811-1819

Author(s):

Alexander M Kulminski ◽

Yury Loika ◽

Alireza Nazarian ◽

Irina Culminskaya

Keyword(s):

Genetic Predisposition ◽

Complex Traits ◽

Blood Lipids ◽

Association Studies ◽

Density Lipoprotein ◽

European Ancestry ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Trade Offs ◽

Genome Wide

Prevailing strategies in genome-wide association studies (GWAS) mostly rely on principles of medical genetics emphasizing the one gene, one function, one phenotype concept. Here, we performed GWAS of blood lipids leveraging a new systemic concept emphasizing complexity of genetic predisposition to such phenotypes. We focused on total cholesterol, low- and high-density lipoprotein cholesterols, and triglycerides available for 29,902 individuals of European ancestry from seven independent studies, men and women combined. To implement the new concept, we leveraged the inherent heterogeneity in genetic predisposition to such complex phenotypes and emphasized a new counterintuitive phenomenon of antagonistic genetic heterogeneity, which is characterized by misalignment of the directions of genetic effects and the phenotype correlation. This analysis identified 37 loci associated with blood lipids, but only one locus, FBXO33, was not reported in previous top GWAS. We, however, found a strong effect of antagonistic heterogeneity that led to profound (quantitative and qualitative) changes in the associations with blood lipids in most loci (25 of 37, or 68%). These changes suggested new roles for some genes whose functions were considered well established, such as GCKR, SIK3 (APOA1 locus), LIPC, and LIPG, among others. The antagonistic heterogeneity highlighted a new class of genetic associations emphasizing beneficial and adverse trade-offs in predisposition to lipids. Our results argue that rigorous analyses dissecting heterogeneity in genetic predisposition to complex traits such as lipids, beyond those implemented in current GWAS, are required to facilitate translation of genetic discoveries into health care.
