Genome-wide Marginal Epistatic Association Mapping in Case-Control Studies

Epistasis, commonly defined as the interaction between genetic loci, is an important contributor to the genetic architecture underlying many complex traits and common diseases. Most existing epistatic mapping methods in genome-wide association studies explicitly search over all pairwise or higher-order interactions. However, because of the potentially large search space and the resulting multiple-testing burden, these conventional approaches often suffer from heavy computational cost and low statistical power. A recently proposed alternative focuses instead on detecting marginal epistasis, defined as the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants involved in epistasis without identifying the exact partners with which they interact, potentially alleviating much of the statistical and computational burden of conventional epistatic mapping. However, previous marginal epistatic mapping methods are based on quantitative trait models, which, as we show here, lack statistical power in case-control studies. Here, we develop a liability threshold mixed model that extends marginal epistatic mapping to case-control studies and properly accounts for case-control ascertainment and the binary nature of case-control data. We refer to this method as the liability threshold marginal epistasis test (LT-MAPIT). Using simulations, we show that LT-MAPIT provides effective type I error control and is more powerful than both existing marginal epistatic mapping methods and conventional explicit search-based approaches in case-control data. Finally, we apply LT-MAPIT to identify both marginal and pairwise epistasis in seven complex diseases from the Wellcome Trust Case Control Consortium (WTCCC) 1 study.
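
The next sketch makes "marginal epistasis" concrete. As we understand the marginal epistasis formulation that LT-MAPIT extends (MAPIT), all pairwise interactions between a focal variant k and every other variant are collapsed into one random effect. That effect's covariance across individuals is Omega_k = D_k K_{-k} D_k. Here D_k = diag(x_k) holds the standardized genotypes at SNP k, and K_{-k} is the relatedness matrix built from all SNPs except k. The code below only constructs this covariance for a toy genotype matrix. It does not model the liability threshold, case-control ascertainment, or LT-MAPIT's estimation and testing steps, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def standardize_columns(X: Matrix) -> Matrix:
    """Center and scale each SNP (column) to mean 0 and variance 1."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        sd = var ** 0.5 if var > 0 else 1.0  # guard against monomorphic SNPs
        for i in range(n):
            Z[i][j] = (col[i] - mean) / sd
    return Z


def leave_one_out_grm(Z: Matrix, k: int) -> Matrix:
    """Relatedness matrix K_{-k} = Z_{-k} Z_{-k}^T / (p - 1), excluding SNP k."""
    n, p = len(Z), len(Z[0])
    others = [j for j in range(p) if j != k]
    return [[sum(Z[a][j] * Z[b][j] for j in others) / len(others)
             for b in range(n)] for a in range(n)]


def marginal_epistasis_cov(Z: Matrix, k: int) -> Matrix:
    """Omega_k = D_k K_{-k} D_k, with D_k = diag(z_k).

    Entry (a, b) is K_{-k}[a][b] scaled by the genotypes of individuals
    a and b at the focal SNP k.
    """
    K = leave_one_out_grm(Z, k)
    n = len(Z)
    return [[Z[a][k] * K[a][b] * Z[b][k] for b in range(n)] for a in range(n)]


if __name__ == "__main__":
    # Toy genotypes (0/1/2 minor-allele counts): 4 individuals x 3 SNPs.
    X = [[0, 1, 2],
         [1, 1, 0],
         [2, 0, 1],
         [1, 2, 1]]
    Z = standardize_columns(X)
    for row in marginal_epistasis_cov(Z, k=0):
        print(["%.3f" % v for v in row])
```

Suppose each interaction term z_k * z_j (j != k) has an independent effect with variance sigma^2 / (p - 1). Then the covariance of their summed contribution is sigma^2 * Omega_k. This is why one variance component per SNP can stand in for all p - 1 pairwise terms without naming the interaction partners.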

A General Framework for Two-Stage Analysis of Genome-wide Association Studies and Its Application to Case-Control Studies

The American Journal of Human Genetics, Vol 90 (5), pp. 760-773, 2012. DOI: 10.1016/j.ajhg.2012.03.007

Cited By ~ 15

Author(s): James M.S. Wason, Frank Dudbridge

Keyword(s): General Framework, Association Studies, Case Control, Genome Wide Association, Genome Wide Association Studies, Case Control Studies, Two Stage, Genome Wide, Stage Analysis

Across-cohort QC analyses of genome-wide association study summary statistics from complex traits

DOI: 10.1101/033787, 2015

Author(s): Guo-Bo Chen, Sang Hong Lee, Matthew R Robinson, Maciej Trzaskowski, Zhi-Xiang Zhu, ...

Keyword(s): Complex Traits, Statistical Power, Association Studies, False Negative, Genome Wide Association, Effect Sizes, Genome Wide Association Studies, Summary Statistics, Unknown Sample, Genome Wide

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically, the effect sizes of SNP alleles are very small, which has led to large genome-wide association meta-analyses (GWAMAs) designed to maximize statistical power. The trend towards ever-larger GWAMAs is likely to continue. Yet handling summary statistics from hundreds of cohorts increases logistical and quality-control problems, including unknown sample overlap, and these problems can lead to both false positive and false negative findings. In this study we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose a pair of methods for examining the concordance between demographic information and summary statistics. In method I, we use the population genetics FST statistic to verify the genetic origin and geographic location of each cohort. Using GWAMA data from the GIANT Consortium, we demonstrate that the geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, which recovers the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors. It does not require sharing individual-level genotype data and does not breach individual privacy.
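
Two of the summary-statistic checks described above can be illustrated with per-cohort summary statistics alone. The sketch assumes each cohort reports, for every SNP, an allele frequency, an effect size, and its standard error. It computes two quantities:

- A simple two-cohort, Wright-style FST from the reported allele frequencies, combined across SNPs as a ratio of sums.
- The Pearson correlation of per-SNP z-scores (beta / SE) between two cohorts. This is a common heuristic: under the null, shared samples induce correlated z-scores.

Neither function is claimed to be the authors' exact statistic, and the names and input layout are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pairwise_fst(freq_a: Sequence[float], freq_b: Sequence[float]) -> float:
    """Wright-style FST for two equally weighted cohorts, as a ratio of sums.

    Per SNP: p_bar = (pa + pb) / 2, between-cohort variance ((pa - pb) / 2) ** 2,
    total variance p_bar * (1 - p_bar). Summing numerators and denominators
    before dividing keeps near-monomorphic SNPs from dominating.
    """
    num = den = 0.0
    for pa, pb in zip(freq_a, freq_b):
        p_bar = (pa + pb) / 2
        num += ((pa - pb) / 2) ** 2
        den += p_bar * (1 - p_bar)
    return num / den if den > 0 else 0.0


def zscore_correlation(beta_a: Sequence[float], se_a: Sequence[float],
                       beta_b: Sequence[float], se_b: Sequence[float]) -> float:
    """Pearson correlation of per-SNP z-scores (beta / SE) between two cohorts."""
    za = [b / s for b, s in zip(beta_a, se_a)]
    zb = [b / s for b, s in zip(beta_b, se_b)]
    n = len(za)
    ma, mb = sum(za) / n, sum(zb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(za, zb))
    va = sum((x - ma) ** 2 for x in za)
    vb = sum((y - mb) ** 2 for y in zb)
    return cov / sqrt(va * vb)


if __name__ == "__main__":
    print(pairwise_fst([0.10, 0.50, 0.30], [0.12, 0.45, 0.35]))
    print(zscore_correlation([0.1, -0.2, 0.05], [0.05, 0.05, 0.05],
                             [0.12, -0.15, 0.02], [0.06, 0.06, 0.06]))
```

In practice, the z-score heuristic would typically be restricted to SNPs not expected to be associated with the trait, because true signals also correlate z-scores across cohorts.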

An atlas of genetic associations in UK Biobank

DOI: 10.1101/176834, 2017

Cited By ~ 18

Author(s): Oriol Canela-Xandri, Konrad Rawlik, Albert Tenesa

Keyword(s): Complex Traits, Statistical Power, Association Studies, Genome Wide Association, Genome Wide Association Studies, UK Biobank, Genetic Associations, Genome Wide, Related Individuals, Sufficient Statistical Power

Genome-wide association studies have revealed many loci contributing to the variation of complex traits, yet most loci that contribute to the heritability of complex traits remain elusive. Detecting the small effects of the as-yet unidentified genetic variants requires large study populations with sufficient statistical power. However, the analysis of huge cohorts such as UK Biobank is complicated by the incidental structure that arises when collecting such large cohorts. For instance, UK Biobank includes 107,162 participants related at third degree or closer. Traditionally, GWASs have removed related individuals because they made up an insignificant proportion of the overall sample. However, removing related individuals in UK Biobank would entail a substantial loss of power. Furthermore, modelling such structure with linear mixed models is computationally expensive and requires computational infrastructure that may not be accessible to all researchers. Here we present an atlas of genetic associations for 118 non-binary and 599 binary traits of 408,455 related and unrelated UK Biobank participants of White-British descent. Results are compiled in a publicly accessible database. It allows querying genome-wide association summary results for 623,944 genotyped and HapMap2 imputed SNPs, as well as downloading whole GWAS summary statistics for over 30 million imputed SNPs from the Haplotype Reference Consortium panel. Our atlas of associations (GeneATLAS, http://geneatlas.roslin.ed.ac.uk) will help researchers query UK Biobank results easily without incurring high computational costs.

Genetic factors of polygenic urolithiasis

Urologia Journal, Vol 87 (2), pp. 57-64, 2020. DOI: 10.1177/0391560319898375

Author(s): Filippova Tamara Vladimirovna, Khafizov Kamil Faridovich, Rudenko Vadim Igorevich, Rapoport Leonid Mikhailovich, Tsarichenko Dmitry Georgievich, ...

Keyword(s): Genetic Factors, Preventive Measures, Association Studies, Case Control, International Studies, Genome Wide Association Studies, Case Control Studies, Genome Wide, Genetic Technologies, High Throughput DNA Sequencing

The article summarizes the findings of Russian and international studies of the genetic aspects of polygenic urolithiasis associated with impaired calcium metabolism. It analyzes the genetic risk factors of polygenic nephrolithiasis (16 genes) that show significant association with the disease in case-control studies and genome-wide association studies. We describe the functions of the genes involved in stone (concrement) formation in polygenic nephrolithiasis. Modern molecular genetic technologies (DNA microarrays, high-throughput DNA sequencing, etc.) make it possible to identify genetic predisposition to a specific disease, deliver individualized treatment to the patient, and take timely preventive measures among the proband's relatives.

SparsePro: an efficient genome-wide fine-mapping method integrating summary statistics and functional annotations

DOI: 10.1101/2021.10.04.463133, 2021

Author(s): Wenmin Zhang, Hamed S Najafabadi, Yue Li

Keyword(s): Fine Mapping, Complex Traits, Genetic Architecture, Association Studies, Computational Cost, Mapping Method, Genome Wide Association Studies, Functional Annotations, Genome Wide, Causal Variants

Identifying causal variants from genome-wide association studies (GWASs) is challenging due to widespread linkage disequilibrium (LD). Functional annotations of the genome may help prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. However, classical fine-mapping methods have a high computational cost, particularly when the underlying genetic architecture and LD patterns are complex. Here, we propose a novel approach, SparsePro, to efficiently conduct functionally informed statistical fine-mapping. Our method enjoys two major innovations: First, by creating a sparse low-dimensional projection of the high-dimensional genotype, we enable a linear search of causal variants instead of an exponential search of causal configurations used in existing methods; Second, we adopt a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors. We evaluate SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved more accurate and well-calibrated posterior inference with greatly reduced computation time. We demonstrate the utility of SparsePro by investigating the genetic architecture of five functional biomarkers of vital organs. We identify potential causal variants contributing to the genetically encoded coordination mechanisms between vital organs and pinpoint target genes with potential pleiotropic effects. In summary, we have developed an efficient genome-wide fine-mapping method with the ability to integrate functional annotations. Our method may have wide utility in understanding the genetics of complex traits as well as in increasing the yield of functional follow-up studies of GWASs.
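
Simple arithmetic shows the gap between an exponential search over causal configurations and a linear search. The sketch counts two quantities for a locus of p variants with at most K causal variants:

- The configurations an exhaustive method would enumerate, which is the sum of C(p, k) for k = 0..K.
- The p x K single-variant evaluations needed if each of K effect groups scans the p variants once.

Treating the linear search as p x K evaluations is our simplifying assumption, not a description of SparsePro's algorithm.

```python
from math import comb


def exhaustive_configurations(p: int, max_causal: int) -> int:
    """Number of causal configurations with at most `max_causal` causal variants."""
    return sum(comb(p, k) for k in range(max_causal + 1))


def linear_assignments(p: int, num_effects: int) -> int:
    """Evaluations if each of `num_effects` effect groups scans all p variants once."""
    return p * num_effects


if __name__ == "__main__":
    p, K = 1000, 3
    print(exhaustive_configurations(p, K))  # 166667501
    print(linear_assignments(p, K))         # 3000
```

Even at this modest locus size, the exhaustive count exceeds the linear one by more than four orders of magnitude, and the gap widens rapidly as K grows.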

Technical Note: Efficient and accurate estimation of genotype odds ratios in biobank-based unbalanced case-control studies

DOI: 10.1101/646018, 2019

Author(s): Rounak Dey, Seunggeun Lee

Keyword(s): Large Scale, Association Studies, Case Control, Accurate Estimation, Genome Wide Association Studies, Odds Ratios, Case Control Studies, Genome Wide, A Genome, Wide Scale

In genome-wide association studies (GWASs), genotype log-odds ratios (LORs) quantify the effects of variants on binary phenotypes, and several downstream analyses require genotype LORs for all markers. Calculating genotype LORs at genome-wide scale is computationally challenging, especially for large-scale biobank data, where thousands of GWASs are performed phenome-wide. Most binary phenotypes in biobank-based studies have unbalanced (case : control = 1 : 10) or often extremely unbalanced (case : control = 1 : 100) case-control ratios. In these settings, existing methods cannot provide a scalable and accurate way to estimate genotype LORs. Traditional logistic regression gives biased LOR estimates in such situations. The Firth bias correction method can provide approximately unbiased LOR estimates, but it does not scale to the genome-wide or phenome-wide association analyses typical of biobank-based studies, especially when the number of non-genetic covariates is large. The saddlepoint approximation-based test (fastSPA) provides accurate p values and scales to large biobank data. However, because it is a score-based test, it does not provide genotype LOR estimates. Here, we propose a scalable method based on score statistics that accurately estimates genotype LORs while adjusting for non-genetic covariates. Compared with the Firth method, our method reduces the computational complexity from O(nK^2 + K^3) to O(n), where n is the sample size and K is the number of non-genetic covariates. Our method is about 10 times faster than the Firth method when 15 covariates are adjusted for. Through extensive numerical simulations, we show that the proposed method is both scalable and accurate in estimating genotype LORs at genome-wide or phenome-wide scale.
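
A generic one-step approximation illustrates how an effect estimate can be obtained from score statistics. The procedure has three steps:

1. Fit the null logistic model once. Here it is intercept-only, so the fitted case probability mu is the sample case fraction.
2. For each variant, compute the score U = sum of g_i * (y_i - mu) and its null variance V = mu * (1 - mu) * sum of (g_i - g_bar)^2.
3. Take U / V as an approximate log-odds ratio.

Once the null fit is shared, each variant costs O(n). This is a textbook one-step estimator under our simplifying assumption of no covariates, not the authors' method. It is reasonable for small effects in balanced data but may be inaccurate for large effects or strong case-control imbalance, which is the setting the method above targets.

```python
from typing import Sequence


def one_step_log_or(genotypes: Sequence[float], status: Sequence[int]) -> float:
    """Approximate genotype log-odds ratio as U / V under an intercept-only null.

    genotypes: allele counts or dosages per individual.
    status: 1 for cases, 0 for controls.
    """
    n = len(status)
    mu = sum(status) / n          # null fitted case probability
    g_bar = sum(genotypes) / n
    # Score for the genotype coefficient, evaluated at the null fit.
    u = sum(g * (y - mu) for g, y in zip(genotypes, status))
    # Null variance of the score, with the genotype centered on the intercept.
    v = mu * (1 - mu) * sum((g - g_bar) ** 2 for g in genotypes)
    return u / v


if __name__ == "__main__":
    # Toy data: 500 carriers (260 cases) and 500 non-carriers (240 cases).
    g = [1] * 500 + [0] * 500
    y = [1] * 260 + [0] * 240 + [1] * 240 + [0] * 260
    print(round(one_step_log_or(g, y), 4))  # prints 0.16; exact 2x2-table log OR is about 0.1601
```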

Widespread allelic heterogeneity in complex traits

DOI: 10.1101/076984, 2016

Cited By ~ 3

Author(s): Farhad Hormozdiari, Anthony Zhu, Gleb Kichaev, Ayellet V. Segrè, Chelsea J.-T. Ju, ...

Keyword(s): Complex Traits, Statistical Power, Association Studies, Density Lipoprotein, Computational Method, Allelic Heterogeneity, Genome Wide Association Studies, New Methods, Genome Wide, Causal Variants

Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWAS and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4152 loci with strong evidence of AH. The proportion of loci with identified AH is 4-23% in eQTLs, 35% in GWAS of high-density lipoprotein (HDL), and 23% in schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R^2 = 0.85, P = 2.2e-16), indicating that limited statistical power prevents identification of AH at other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.
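
As we read it, the probability of AH at a locus is a posterior event: the locus harbours two or more causal variants. Given posterior probabilities over causal configurations, for example from a fine-mapping model, the probability of AH is the total mass on configurations with at least two causal variants. The sketch below assumes such a posterior is already available as a mapping from sets of variant IDs to probabilities. It does not show how those probabilities are computed, which is what the method above addresses. The variant IDs and numbers are hypothetical.

```python
from typing import Dict, FrozenSet


def prob_allelic_heterogeneity(posterior: Dict[FrozenSet[str], float]) -> float:
    """P(at least 2 causal variants | data) from configuration posteriors."""
    total = sum(posterior.values())
    if total <= 0:
        raise ValueError("posterior must have positive mass")
    ah_mass = sum(p for config, p in posterior.items() if len(config) >= 2)
    return ah_mass / total  # renormalize in case the input does not sum to exactly 1


if __name__ == "__main__":
    # Hypothetical posterior over configurations of three variants in one locus.
    posterior = {
        frozenset(): 0.05,
        frozenset({"rsA"}): 0.40,
        frozenset({"rsB"}): 0.15,
        frozenset({"rsA", "rsB"}): 0.30,
        frozenset({"rsA", "rsB", "rsC"}): 0.10,
    }
    print(prob_allelic_heterogeneity(posterior))
```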

Statistical Learning Methods Applicable to Genome-Wide Association Studies on Unbalanced Case-Control Disease Data

Genes, Vol 12 (5), pp. 736, 2021. DOI: 10.3390/genes12050736

Author(s): Xiaotian Dai, Guifang Fu, Shaofei Zhao, Yifei Zeng

Keyword(s): Type I Error, Association Studies, Case Control, Error Rates, Genome Wide Association, Type I, Genome Wide Association Studies, Learning Approaches, Genome Wide, Control Disease

Although imbalance between case and control groups is prevalent in genome-wide association studies (GWAS), it is often overlooked. The problem is becoming more significant and urgent as the rapid growth of biobanks and electronic health records has enabled the collection of thousands of phenotypes from large cohorts, in particular for diseases with low prevalence. Unbalanced binary traits pose serious challenges to traditional statistical methods for both genomic selection and disease prediction. For example, the well-established linear mixed models (LMMs) yield inflated type I error rates when case-control ratios are unbalanced. In this article, we review multiple statistical approaches developed to overcome the inaccuracy caused by unbalanced case-control ratios and comment on the advantages and limitations of each. In addition, we explore the potential of several powerful and popular state-of-the-art machine-learning approaches that have not yet been applied in the GWAS field. This review paves the way for better analysis and understanding of unbalanced case-control disease data in GWAS.

FiMAP: A Fast Identity-by-Descent Mapping Test for Biobank-scale Cohorts

DOI: 10.1101/2021.06.30.21259773, 2021

Author(s): Han Chen, Ardalan Naseri, Degui Zhi

Keyword(s): Complex Traits, Association Studies, Copy Number Variations, Error Rates, Chromosome 8, Type I, Genome Wide Association Studies, Identity By Descent, Genome Wide, A Genome

Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture is still not fully understood for many complex traits. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS or variant set tests for rare variants. Here we leverage the identity-by-descent (IBD) segments inferred from a random projection-based IBD detection algorithm in the mapping of genetic associations with complex traits, to develop a computationally efficient statistical test for IBD mapping in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and outperformed traditional GWAS approaches and variant set tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using the UK Biobank data and identified a 4 cM region on chromosome 8 associated with multiple traits related to body fat distribution or weight.

Admixed Populations Improve Power for Variant Discovery and Portability in Genome-wide Association Studies

DOI: 10.1101/2021.03.09.434643, 2021

Author(s): Meng Lin, Danny S. Park, Noah A. Zaitlen, Brenna M. Henn, Christopher R. Gignoux

Keyword(s): Complex Traits, Statistical Power, Genetic Architecture, Association Studies, Genome Wide Association, Ancestral Population, Genome Wide Association Studies, Variant Discovery, Genome Wide, Source Populations

Genome-wide association studies (GWAS) are primarily conducted in single-ancestry settings. The low transferability of results has limited our understanding of human genetic architecture across a range of complex traits. Unlike homogeneous populations, admixed populations provide an opportunity to capture genetic architecture contributed by multiple source populations and thus improve statistical power. Here, we provide a mechanistic simulation framework to investigate the statistical power and transferability of GWAS under directional polygenic selection or varying divergence. We focus on a two-way admixed population and show that, at similar sample sizes, GWAS in the admixed population can have up to 2-fold greater discovery power than GWAS in the ancestral populations. Moreover, cross-population polygenic score estimates are more accurate when variants and weights are trained in the admixed group rather than in the ancestral groups. Common variant associations are also more likely to replicate if first discovered in the admixed group and then transferred to an ancestral population than the other way around. This holds across 50 iterations with 1,000 causal SNPs, training on 10,000 individuals and testing on 1,000 in each population (p = 3.78e-6, 6.19e-101, and ~0 for FST = 0.2, 0.5, and 0.8, respectively). While some of these FST values may appear extreme, we demonstrate that they are found across the entire phenome in the GWAS catalog. This framework demonstrates that studying admixed populations offers significant advantages over GWAS in single-ancestry cohorts for uncovering the genetic architecture of traits. It should also improve downstream applications such as personalized medicine across diverse populations.
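
The sketch below shows one common way to generate two source populations at a chosen FST and the allele frequency in a two-way admixed population. It uses the Balding-Nichols model. Each source population's frequency at a SNP is drawn from Beta(p(1 - F)/F, (1 - p)(1 - F)/F) around an ancestral frequency p, which gives mean p and variance F * p(1 - p). The admixed frequency is taken as the ancestry-weighted average alpha * p1 + (1 - alpha) * p2. This is a standard simplification, not the mechanistic, selection-aware framework described above. The ancestral-frequency range, admixture proportion, and seed are arbitrary illustrative choices.

```python
import random
from typing import List, Tuple


def balding_nichols(p_anc: float, fst: float, rng: random.Random) -> float:
    """Draw a population allele frequency with mean p_anc and variance fst * p(1 - p)."""
    shape = (1 - fst) / fst
    return rng.betavariate(p_anc * shape, (1 - p_anc) * shape)


def simulate_frequencies(n_snps: int, fst: float, alpha: float,
                         seed: int = 1) -> List[Tuple[float, float, float]]:
    """Return (p1, p2, p_admixed) per SNP for two diverged sources and their admixture.

    alpha is the fraction of ancestry the admixed group draws from source 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_snps):
        p_anc = rng.uniform(0.05, 0.5)  # ancestral frequency (arbitrary range)
        p1 = balding_nichols(p_anc, fst, rng)
        p2 = balding_nichols(p_anc, fst, rng)
        rows.append((p1, p2, alpha * p1 + (1 - alpha) * p2))
    return rows


if __name__ == "__main__":
    for fst in (0.2, 0.5, 0.8):
        rows = simulate_frequencies(n_snps=3, fst=fst, alpha=0.5)
        print(fst, [tuple(round(v, 3) for v in r) for r in rows])
```

Larger FST spreads p1 and p2 further from the shared ancestral frequency, so the admixed group sees intermediate frequencies at SNPs where the sources have diverged.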
