Gene-Based Association Tests Using GWAS Summary Statistics and Incorporating eQTL

Abstract: Although genome-wide association studies (GWAS) have been successfully applied to a variety of complex diseases and have identified many genetic variants underlying them, a considerable portion of the heritability of complex diseases still cannot be explained by GWAS. One approach to addressing the missing heritability caused by genetic heterogeneity is gene-based analysis, which considers the aggregate effects of multiple genetic variants in a single test. Another approach is the transcriptome-wide association study (TWAS), which aggregates genomic information into functionally relevant units that map to genes and their expression. TWAS is not only powerful but can also improve the biological interpretability of the identified trait-associated genes. In this study, we propose two powerful and computationally efficient gene-based association tests, Overall and Copula. Using GWAS summary statistics, these two tests aggregate information from three traditional types of gene-based association tests and also incorporate expression quantitative trait locus (eQTL) data into GWAS. Overall uses the extended Simes procedure, and Copula uses a Gaussian copula approximation-based method. We show that after a small number of replications to estimate the correlation among the integrated gene-based tests, the P values of both methods can be calculated analytically. Simulation studies show that the two tests control the type I error rate well and have higher power than the tests we compared them with. We also apply both methods to two schizophrenia GWAS summary datasets and two lipid GWAS summary datasets. The results show that the two new methods identify more significant genes than the other methods compared.
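
As a rough illustration of how Overall combines several gene-based P values into one, the sketch below implements the standard Simes combination. This is an assumption-laden simplification: the extended Simes procedure named in the abstract replaces the raw counts m and j with effective numbers of tests that account for correlation among the combined tests, a step omitted here. The function name and the example P values are hypothetical.

```python
def simes_combined_p(pvalues):
    """Combine p-values with the standard (not extended) Simes procedure.

    p_S = min_j (m * p_(j) / j), where p_(1) <= ... <= p_(m) are the sorted
    p-values and m is the number of tests. The extended Simes procedure would
    replace m and j with effective numbers of tests; that step is omitted.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    ordered = sorted(pvalues)
    m = len(ordered)
    combined = min(m * p / j for j, p in enumerate(ordered, start=1))
    return min(combined, 1.0)


# Hypothetical P values from three gene-based tests of one gene.
print(simes_combined_p([0.01, 0.04, 0.5]))
```

The combined P value is driven by the smallest suitably penalized P value, which is why the method stays powerful when only one of the aggregated tests carries signal.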

CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research, 2019. DOI: 10.1093/nar/gkz1026. Cited by ~2.

Author(s): Jianhua Wang, Dandan Huang, Yao Zhou, Hongcheng Yao, Huanhuan Liu, ...

Keyword(s): Fine Mapping; Genetic Variants; Association Studies; Complex Trait; Genome Wide Association; Genome Wide Association Studies; Summary Statistics; Genome Wide; Credible Sets; Causal Variants

Abstract: Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most significant genotype-phenotype associations the true causal variants remain unknown. Identifying causal genetic variants and interpreting how they confer disease susceptibility remains a major challenge. Here we introduce a new database, CAUSALdb, which integrates the most comprehensive collection of GWAS summary statistics to date and identifies credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to the MeSH ontology, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots that allow credible sets to be visualized efficiently on a single web page; (v) enables online comparison of causal relations at the variant, gene and trait levels among studies with different sample sizes or populations; and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
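
To make "causal probabilities" and "credible sets" concrete, the sketch below computes posterior inclusion probabilities (PIPs) and a 95% credible set from summary statistics. It is not one of the three fine-mapping tools CAUSALdb uses; it is a simpler, swapped-in approach that assumes a single causal variant per locus and uses Wakefield's approximate Bayes factor. The prior effect-size standard deviation (0.2) and all function names are illustrative assumptions.

```python
import math


def wakefield_log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor (causal vs. null) for one variant.

    Wakefield's approximation with V = se^2 and prior variance W = prior_sd^2:
    log ABF = 0.5 * log(V / (V + W)) + 0.5 * z^2 * W / (V + W).
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta / se
    return 0.5 * math.log(v / (v + w)) + 0.5 * z * z * w / (v + w)


def credible_set(betas, ses, coverage=0.95, prior_sd=0.2):
    """Return (PIPs, indices in the credible set), assuming one causal variant."""
    log_bfs = [wakefield_log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(log_bfs)
    weights = [math.exp(x - top) for x in log_bfs]  # log-sum-exp for stability
    total = sum(weights)
    pips = [w / total for w in weights]
    # Add variants in decreasing PIP order until the target coverage is reached.
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return pips, chosen


pips, cs = credible_set([0.16, 0.04, 0.02], [0.02, 0.02, 0.02])
print(cs)
```

A strongly associated variant dominates the posterior and yields a one-variant credible set, while variants in perfect LD with equal z-scores split the probability and all enter the set.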

Partitioning heritability by functional category using GWAS summary statistics

2015. DOI: 10.1101/014241. Cited by ~9.

Author(s): Hilary Kiyo Finucane, Brendan Bulik-Sullivan, Alexander Gusev, Gosia Trynka, Yakir Reshef, ...

Keyword(s): Association Studies; Smoking Behavior; Complex Diseases; New Method; Age At Menarche; Genome Wide Association Studies; Summary Statistics; Cell Type; Genome Wide; Cell Type Specific

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.
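
The abstract does not state the model behind the new method. As we understand the published approach (stratified LD score regression), the expected association chi-square statistic of SNP j is linear in its LD scores with respect to each functional category C, and enrichment is derived from the fitted coefficients. The following is our summary of that model, not a quotation from the paper:

```latex
\mathbb{E}\left[\chi^2_j\right] = N \sum_{C} \tau_C \, \ell(j, C) + N a + 1,
\qquad
\ell(j, C) = \sum_{k \in C} r_{jk}^2,

h^2(C) = \sum_{j \in C} \; \sum_{C' \ni j} \tau_{C'},
\qquad
\text{enrichment}(C) = \frac{h^2(C) / h^2_g}{M_C / M}
```

Here N is the GWAS sample size, τ_C the per-SNP heritability contribution of category C, r_{jk} the correlation between SNPs j and k, a a term for confounding biases, h²_g the total SNP heritability, M the number of SNPs and M_C the number of SNPs in C. Because the LD scores include linked SNPs outside C, fitting all categories jointly is what "controlling for linked markers" refers to.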

MR-LDP: a two-sample Mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy

NAR Genomics and Bioinformatics, 2020, Vol 2 (2). DOI: 10.1093/nargab/lqaa028. Cited by ~2.

Author(s): Qing Cheng, Yi Yang, Xingjie Shi, Kar-Fu Yeung, Can Yang, ...

Keyword(s): Risk Factors; Linkage Disequilibrium; Genetic Variants; Mendelian Randomization; Association Studies; Alternative Methods; Genome Wide Association Studies; Summary Statistics; Causal Relationships; Disease Outcomes

Abstract: The proliferation of genome-wide association studies (GWAS) has prompted the use of two-sample Mendelian randomization (MR), with genetic variants as instrumental variables (IVs), to draw reliable causal relationships between health risk factors and disease outcomes. However, the unique features of GWAS demand that MR methods account for both linkage disequilibrium (LD) and the ubiquitous horizontal pleiotropy among complex traits, the phenomenon wherein a variant affects the outcome through mechanisms other than exclusively through the exposure. Statistical methods that fail to consider LD and horizontal pleiotropy can therefore lead to biased estimates and false-positive causal relationships. To overcome these limitations, we propose a probabilistic model for MR analysis (MR-LDP) that identifies the causal effects between risk factors and disease outcomes using GWAS summary statistics in the presence of LD while properly accounting for horizontal pleiotropy among genetic variants, and we develop a computationally efficient algorithm to make the causal inference. We then conducted comprehensive simulation studies to demonstrate the advantages of MR-LDP over existing methods. Moreover, we used two real exposure–outcome pairs to validate the results from MR-LDP against alternative methods, showing that our method is more efficient in using all instrumental variants in LD. By further applying MR-LDP to lipid traits and body mass index (BMI) as risk factors for complex diseases, we identified multiple pairs of significant causal relationships, including a protective effect of high-density lipoprotein cholesterol on peripheral vascular disease and a positive causal effect of BMI on hemorrhoids.
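
For contrast with MR-LDP, the sketch below shows the conventional fixed-effect inverse-variance-weighted (IVW) estimator computed from summary statistics. It is not MR-LDP: it assumes independent instruments (no LD) and no horizontal pleiotropy, which are exactly the assumptions the abstract says lead to bias. The function name and the toy effect sizes are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) MR estimate.

    Assumes independent instruments (no LD) and no horizontal pleiotropy,
    the assumptions MR-LDP is designed to relax. Returns (estimate, se).
    """
    num = sum(bx * by / s ** 2
              for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Toy instruments whose outcome effects are exactly half their exposure effects.
est, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.1, 0.15], [0.01, 0.01, 0.01])
print(est, se)
```

If the instruments were in LD, the terms in both sums would be correlated and this standard error would be too small, one reason LD-aware models such as MR-LDP are needed.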

Multiple phenotype association tests using summary statistics in genome-wide association studies

Biometrics, 2017, Vol 74 (1), pp. 165-175. DOI: 10.1111/biom.12735. Cited by ~19.

Author(s): Zhonghua Liu, Xihong Lin

Keyword(s): Association Studies; Genome Wide Association; Genome Wide Association Studies; Summary Statistics; Association Tests; Genome Wide; Multiple Phenotype

CoMM-S4: A Collaborative Mixed Model Using Summary-Level eQTL and GWAS Datasets in Transcriptome-Wide Association Studies

Frontiers in Genetics, 2021, Vol 12. DOI: 10.3389/fgene.2021.704538.

Author(s): Yi Yang, Kar-Fu Yeung, Jin Liu

Keyword(s): Likelihood Ratio; Genetic Variants; Association Studies; Ratio Test; Genome Wide Association Studies; Summary Statistics; Expression Trait; Individual Level; Trait Association; eQTL Data

Motivation: Genome-wide association studies (GWAS) have achieved remarkable success in identifying SNP-trait associations over the last decade. However, it is challenging to identify the mechanisms that connect genetic variants with complex traits, because the majority of GWAS associations lie in non-coding regions. Methods that integrate genomic and transcriptomic data allow us to investigate how genetic variants may affect a trait through their effect on gene expression. These include CoMM and CoMM-S2, likelihood-ratio-based methods that integrate GWAS and eQTL studies to assess expression-trait association. However, their reliance on individual-level eQTL data renders them inapplicable when only summary-level eQTL results, such as those from large-scale eQTL analyses, are available.

Result: We develop an efficient probabilistic model, CoMM-S4, to explore expression-trait association using summary-level eQTL and GWAS datasets. Compared with CoMM-S2, which uses individual-level eQTL data, CoMM-S4 requires only summary-level eQTL data. To test expression-trait association, we constructed an efficient variational Bayesian EM algorithm and a likelihood ratio test. We applied CoMM-S4 to both simulated and real data. The simulation results demonstrate that CoMM-S4 performs as well as CoMM-S2 and S-PrediXcan, and analyses using GWAS summary statistics from Biobank Japan and eQTL summary statistics from eQTLGen and GTEx suggest novel susceptibility loci for cardiovascular diseases and osteoporosis.

Availability and implementation: The developed R package is available at https://github.com/gordonliu810822/CoMM.
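
CoMM-S4 itself is fitted with a variational Bayesian EM algorithm, which is not sketched here. To show what a summary-level expression-trait test looks like, the sketch below computes the S-PrediXcan-style gene z-score that the abstract uses as a comparator. The expression weights, SNP standard deviations and LD matrix are assumed inputs (for example from an eQTL prediction model and a reference panel), and the function name is hypothetical.

```python
import math


def spredixcan_z(weights, snp_sd, ld, z_scores):
    """S-PrediXcan-style gene-level z-score from GWAS and eQTL summaries.

    z_g = sum_i w_i * (sd_i / sd_g) * z_i, where the predicted-expression
    variance is sd_g^2 = sum_{i,j} w_i w_j sd_i sd_j R_ij and R is the SNP
    correlation (LD) matrix from a reference panel.
    """
    n = len(weights)
    var_g = sum(
        weights[i] * weights[j] * snp_sd[i] * snp_sd[j] * ld[i][j]
        for i in range(n)
        for j in range(n)
    )
    sd_g = math.sqrt(var_g)
    return sum(w * s * z for w, s, z in zip(weights, snp_sd, z_scores)) / sd_g


# Two independent SNPs with equal weights and equal GWAS z-scores.
print(spredixcan_z([1.0, 1.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 2.0]))
```

Note how LD enters only through sd_g: with perfectly correlated SNPs the same inputs give a smaller gene z-score, because the two SNPs no longer carry independent evidence.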

The length of the expressed 3’ UTR is an intermediate molecular phenotype linking genetic variants to complex diseases

2019. DOI: 10.1101/540088.

Author(s): Elisa Mariella, Federico Marotta, Elena Grassi, Stefano Gilotto, Paolo Provero

Keyword(s): Genetic Variants; Association Studies; Alternative Polyadenylation; Complex Diseases; Genome Wide Association Studies; Sequencing Data; Transcriptional Regulatory Mechanism; Transcriptional Regulatory; Common Genetic Variants; Molecular Phenotypes

Abstract: Over the last decades, genome-wide association studies (GWAS) have uncovered tens of thousands of associations between common genetic variants and complex diseases. However, these statistical associations can rarely be interpreted functionally and mechanistically. Because the majority of disease-associated variants are located far from coding sequences, even the relevant gene is often unclear. One way to gain insight into the relevant mechanisms is to study the genetic determinants of intermediate molecular phenotypes, such as gene expression and transcript structure. We propose a computational strategy to discover genetic variants affecting the relative expression of alternative 3’ untranslated region (UTR) isoforms, which are generated through alternative polyadenylation, a widespread post-transcriptional regulatory mechanism known to have relevant functional consequences. When the strategy was applied to a large dataset with whole-genome and RNA sequencing data for 373 European individuals, 2,530 genes with alternative polyadenylation quantitative trait loci (apaQTL) were identified. We analyze and discuss possible mechanisms of action of these variants, and we show that they are significantly enriched in GWAS hits, in particular those concerning immune-related and neurological disorders. Our results point to an important role for genetically determined alternative polyadenylation in affecting predisposition to complex diseases, and suggest new ways to extract functional information from GWAS data.

Integrative analysis of sequencing and array genotype data for discovering disease associations with rare mutations

Proceedings of the National Academy of Sciences, 2015, Vol 112 (4), pp. 1019-1024. DOI: 10.1073/pnas.1406143112. Cited by ~11.

Author(s): Yi-Juan Hu, Yun Li, Paul L. Auer, Dan-Yu Lin

Keyword(s): Type I Error; Rare Variants; Extreme Values; Association Studies; Cost Effective; Type I; Genome Wide Association Studies; Score Statistic; Sequencing Data; Association Tests

In the large cohorts that have been used for genome-wide association studies (GWAS), it is prohibitively expensive to sequence all cohort members. A cost-effective strategy is to sequence subjects with extreme values of quantitative traits or those with specific diseases. By imputing the sequencing data from the GWAS data for the cohort members who are not selected for sequencing, one can dramatically increase the number of subjects with information on rare variants. However, ignoring the uncertainties of imputed rare variants in downstream association analysis will inflate the type I error when sequenced subjects are not a random subset of the GWAS subjects. In this article, we provide a valid and efficient approach to combining observed and imputed data on rare variants. We consider commonly used gene-level association tests, all of which are constructed from the score statistic for assessing the effects of individual variants on the trait of interest. We show that the score statistic based on the observed genotypes for sequenced subjects and the imputed genotypes for nonsequenced subjects is unbiased. We derive a robust variance estimator that reflects the true variability of the score statistic regardless of the sampling scheme and imputation quality, such that the corresponding association tests always have correct type I error. We demonstrate through extensive simulation studies that the proposed tests are substantially more powerful than the use of accurately imputed variants only and the use of sequencing data alone. We provide an application to the Women’s Health Initiative. The relevant software is freely available.
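
The abstract notes that the gene-level tests are all built from per-variant score statistics. The sketch below shows one such test, a weighted burden test, assuming the score vector U and its covariance V are already available. The paper's actual contribution, a robust variance estimator for V that combines observed and imputed genotypes, is not reproduced; V is simply taken as given, and the function name is hypothetical.

```python
import math


def burden_score_test(scores, cov, weights=None):
    """Weighted burden test built from per-variant score statistics.

    T = (sum_j w_j U_j)^2 / (w' V w), approximately chi-square with 1 df
    under the null hypothesis. Returns (T, p-value).
    """
    m = len(scores)
    w = weights if weights is not None else [1.0] * m
    numerator = sum(wj * uj for wj, uj in zip(w, scores)) ** 2
    variance = sum(w[i] * w[j] * cov[i][j] for i in range(m) for j in range(m))
    t = numerator / variance
    p = math.erfc(math.sqrt(t / 2.0))  # upper tail of chi-square with 1 df
    return t, p


# Two uncorrelated rare variants, each with a unit score statistic.
print(burden_score_test([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
```

The type I error of T depends entirely on V being correct, which is why an unbiased score combined with a variance estimator that is robust to the sampling scheme is the crux of the approach described above.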

Operating Characteristics of the Rank-Based Inverse Normal Transformation for Quantitative Trait Analysis in Genome-Wide Association Studies

2019. DOI: 10.1101/635706. Cited by ~2.

Author(s): Zachary R. McCaw, Jacqueline M. Lane, Richa Saxena, Susan Redline, Xihong Lin

Keyword(s): Type I Error; Association Studies; Genome Wide Association; Type I; Genome Wide Association Studies; Operating Characteristics; Association Tests; Genome Wide; Normal Transformation; Normally Distributed

Summary: Quantitative traits analyzed in Genome-Wide Association Studies (GWAS) are often non-normally distributed. For such traits, association tests based on standard linear regression are subject to reduced power and inflated type I error in finite samples. Applying the rank-based Inverse Normal Transformation (INT) to non-normally distributed traits has become common practice in GWAS. However, the different variations on INT-based association testing have not been formally defined, and guidance is lacking on when to use which approach. In this paper, we formally define and systematically compare the direct (D-INT) and indirect (I-INT) INT-based association tests. We discuss their assumptions, underlying generative models, and connections. We demonstrate that the relative powers of D-INT and I-INT depend on the underlying data generating process. Since neither approach is uniformly most powerful, we combine them into an adaptive omnibus test (O-INT). O-INT is robust to model misspecification, protects the type I error, and is well powered against a wide range of non-normally distributed traits. Extensive simulations were conducted to examine the finite sample operating characteristics of these tests. Our results demonstrate that, for non-normally distributed traits, INT-based tests outperform the standard untransformed association test (UAT), both in terms of power and type I error rate control. We apply the proposed methods to GWAS of spirometry traits in the UK Biobank. O-INT has been implemented in the R package RNOmni, which is available on CRAN.
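
The transformation itself is simple enough to sketch. The code below maps each observation to Φ⁻¹((rank − k)/(n − 2k + 1)), using the Blom offset k = 3/8, a common choice that we assume here; ties receive their average rank. It is a standalone sketch, not the RNOmni implementation. In a direct (D-INT) test, the transformed trait would then be regressed on genotype.

```python
from statistics import NormalDist


def rank_inverse_normal(values, k=0.375):
    """Rank-based inverse normal transformation (INT).

    Maps each value to Phi^{-1}((rank - k) / (n - 2k + 1)), where k = 3/8 is
    the Blom offset. Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the block
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    dist = NormalDist()
    return [dist.inv_cdf((r - k) / (n - 2 * k + 1)) for r in ranks]


# A skewed toy trait; the output is symmetric around zero.
print(rank_inverse_normal([10.0, 1.0, 5.0]))
```

Because only ranks survive, the transformation removes skew and outliers, but it also discards information about the original scale, which is one reason the relative power of D-INT and I-INT depends on the data-generating process.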

DESE: estimating driver tissues by selective expression of genes associated with complex diseases or traits

Genome Biology, 2019, Vol 20 (1). DOI: 10.1186/s13059-019-1801-5. Cited by ~2.

Author(s): Lin Jiang, Chao Xue, Sheng Dai, Shangzhen Chen, Peikai Chen, ...

Keyword(s): Association Studies; Complex Diseases; Cell Types; Genome Wide Association Studies; Summary Statistics; Unified Framework; Expression Of Genes; Genome Wide; Selective Expression; Disease Associated Genes

Abstract: The driver tissues or cell types in which susceptibility genes initiate diseases remain elusive. We develop a unified framework to detect the causal tissues of complex diseases or traits according to the selective expression of disease-associated genes in genome-wide association studies (GWASs). The framework consists of three components that run iteratively to produce a converged prioritization list of driver tissues. It also outputs a list of prioritized genes as a byproduct. We apply the framework to six representative complex diseases or traits with GWAS summary statistics, which leads to the lung being estimated as a tissue associated with rheumatoid arthritis.

Evaluation and application of summary statistic imputation to discover new height-associated loci

2017. DOI: 10.1101/204560.

Author(s): Sina Rüeger, Aaron McDaid, Zoltán Kutalik

Keyword(s): Genetic Variants; Association Studies; Low Frequency; Cost Effective; Genotype Imputation; Genome Wide Association Studies; Summary Statistics; UK Biobank; Genome Wide; The UK

Abstract: As most of the heritability of complex traits is attributed to common and low-frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discovering the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome, as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to impute the summary statistics directly, termed summary statistics imputation. Its performance relative to genotype imputation, and its practical utility, have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, while genotype imputation boasts a 2- to 5-fold lower root-mean-square error, summary statistics imputation better distinguishes true associations from null ones: we observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded an increase in statistical power of 15, 10 and 3%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Author summary: Genome-wide association studies (GWASs) quantify the effect of genetic variants on traits, such as height. Such estimates are called association summary statistics and are typically shared publicly through publication. GWASs are usually carried out by genotyping ~500,000 SNVs for each individual, which are then combined with sequenced reference panels to infer untyped SNVs in each individual's genome. This process of genotype imputation is resource intensive and can therefore be a limitation when combining many GWASs. An alternative approach is to bypass the use of individual data and impute summary statistics directly. In our work we compare the performance of summary statistics imputation to genotype imputation. Although we observe a 2- to 5-fold lower RMSE for genotype imputation compared to summary statistics imputation, summary statistics imputation better distinguishes true associations from null results. Furthermore, we demonstrate the potential of summary statistics imputation by presenting 34 novel height-associated loci, 19 of which were confirmed in UK Biobank. Our study demonstrates that, given current reference panels, summary statistics imputation is a very efficient and cost-effective way to identify common or low-frequency trait-associated loci.
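
The core of summary statistics imputation, as commonly formulated, is the conditional expectation of an untyped SNP's z-score given the z-scores of typed SNPs under a multivariate normal model: z_t = c′R⁻¹z, with imputation quality c′R⁻¹c, where R is the LD matrix among typed SNPs and c the target's LD with each of them. The sketch below is a minimal version under that assumption; the function names are hypothetical, and practical implementations typically regularize R because reference-panel LD estimates are noisy.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def impute_z(ld_typed, ld_target_typed, z_typed):
    """Impute an untyped SNP's z-score from typed SNPs' z-scores.

    z_t = c' R^{-1} z and info = c' R^{-1} c, where R (ld_typed) is the LD
    matrix among typed SNPs and c (ld_target_typed) is the target's LD with
    each typed SNP. Returns (z_t, info).
    """
    w = solve(ld_typed, ld_target_typed)  # w = R^{-1} c (R is symmetric)
    z_t = sum(wi * zi for wi, zi in zip(w, z_typed))
    info = sum(wi * ci for wi, ci in zip(w, ld_target_typed))
    return z_t, info


# One typed SNP in LD r = 0.8 with the target and an observed z-score of 5.
print(impute_z([[1.0]], [0.8], [5.0]))
```

The info value shrinks toward zero when the target is poorly tagged, which matches the abstract's observation that the approaches differ most for low-frequency, poorly imputed variants.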
