An Omnibus Test for Detecting Multiple Phenotype Associations Based on GWAS Summary Level Data

2021 ◽  
Vol 12 ◽  
Author(s):  
Wei Liu ◽  
Yunshan Guo ◽  
Zhonghua Liu

Abundant genome-wide association study (GWAS) findings have reflected the sharing of genetic variants among multiple phenotypes. Exploring the association between genetic variants and multiple traits can provide novel insights into the biological mechanisms of complex human traits. In this article, we propose applying the generalized Berk-Jones (GBJ) test and the generalized higher criticism (GHC) test to identify genetic variants that affect multiple traits based on GWAS summary statistics. To be more robust to different gene-multiple trait association patterns across the whole genome, we propose an omnibus test (OMNI) based on the aggregated Cauchy association test. We conducted extensive simulation studies to investigate the type I error rates and compare the power of the proposed tests (i.e., the GBJ, GHC and OMNI tests) and the existing tests (i.e., the minimum of the p-values (MinP) and the cross-phenotype association test (CPASSOC)) in a wide range of simulation settings. We found that all of these methods controlled the type I error rates well and that the proposed OMNI test has robust power. We applied these methods to the summary statistics dataset from the Global Lipids Genetics Consortium and identified 19 new genetic variants that were missed by the original single-trait association analysis.
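
The Cauchy combination rule behind the aggregated Cauchy association test can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `cauchy_combine`, the equal default weights, and the choice of which component p-values to pass in (for example, one from GBJ and one from GHC) are assumptions made here.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Hypothetical helper for illustration; equal weights are assumed
    when none are given.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    total = sum(weights)
    # Under the null, tan((0.5 - p) * pi) is a standard Cauchy variate,
    # and a weighted average of such variates is again standard Cauchy.
    stat = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, pvalues)
    )
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi


# Example: assumed p-values from two component tests on one variant.
combined = cauchy_combine([1e-6, 0.5])
```

A single very small p-value dominates the combined statistic, which is why a combination of this kind can stay powerful when only one of the component tests detects the signal.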

2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Hanna Julienne ◽  
Pierre Lechat ◽  
Vincent Guillemot ◽  
Carla Lasry ◽  
Chunzi Yao ◽  
...  

Abstract Genome-wide association study (GWAS) has been the driving force for identifying association between genetic variants and human phenotypes. Thousands of GWAS summary statistics covering a broad range of human traits and diseases are now publicly available. These GWAS have proven their utility for a range of secondary analyses, including in particular the joint analysis of multiple phenotypes to identify new associated genetic variants. However, although several methods have been proposed, there are very few large-scale applications published so far because of challenges in implementing these methods on real data. Here, we present JASS (Joint Analysis of Summary Statistics), a polyvalent Python package that addresses this need. Our package incorporates recently developed joint tests such as the omnibus approach and various weighted sum of Z-score tests while solving all practical and computational barriers for large-scale multivariate analysis of GWAS summary statistics. This includes data cleaning and harmonization tools, an efficient algorithm for fast derivation of joint statistics, an optimized data management process and a web interface for exploration purposes. Both benchmark analyses and real data applications demonstrated the robustness and strong potential of JASS for the detection of new associated genetic variants. Our package is freely available at https://gitlab.pasteur.fr/statistical-genetics/jass.
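
As a rough illustration of what a weighted sum of Z-scores test computes, the sketch below forms the statistic w'z / sqrt(w'Rw) and a two-sided normal p-value. It is not taken from JASS: the function name `sum_z_test`, the argument layout, and the assumption that `corr` holds the correlation of the Z-scores under the null are all introduced here for illustration.

```python
import math


def sum_z_test(z, weights, corr):
    """Weighted sum of Z-scores test for one variant across k traits.

    z       : list of k per-trait Z-scores
    weights : list of k weights
    corr    : k x k correlation matrix of the Z-scores under the null
              (assumed known or estimated elsewhere)
    Returns the standardized statistic and its two-sided p-value.
    """
    k = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    # Variance of the weighted sum under the null: w' R w.
    variance = sum(
        weights[i] * corr[i][j] * weights[j]
        for i in range(k)
        for j in range(k)
    )
    stat = numerator / math.sqrt(variance)
    # Two-sided standard normal p-value: 2 * (1 - Phi(|stat|)).
    pvalue = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, pvalue


# Example: two uncorrelated traits with modest, concordant signals.
stat, pvalue = sum_z_test([1.96, 1.96], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
```

Here two Z-scores that are each only nominally significant combine into a clearly stronger signal, because the weights point in the direction of the shared effect.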


2018 ◽  
Author(s):  
Bin Guo ◽  
Baolin Wu

Abstract Genetics holds great promise for precision medicine by tailoring treatment to the individual patient based on their genetic profiles. Toward this goal, many large-scale genome-wide association studies (GWAS) have been performed in the last decade to identify genetic variants associated with various traits and diseases. They have successfully identified tens of thousands of disease-related variants. However, they have explained only a small proportion of the overall trait heritability for most traits and are of very limited clinical use. This is partly owing to the small effect sizes of most genetic variants, and the common practice of “testing association between one trait and one genetic variant at a time” in most GWAS, even when multiple related traits are often measured for each individual. Increasing evidence suggests that many genetic variants can influence multiple traits simultaneously, and we can gain more power by testing association of multiple traits simultaneously. It is appealing to develop novel multi-trait association test methods that need only GWAS summary data, since it is generally very hard to access the individual-level GWAS phenotype and genotype data. Most existing GWAS summary data based association test methods have relied on ad hoc approaches or crude Monte Carlo approximations. In this paper we develop rigorous statistical methods for efficient and powerful multi-trait association testing. We develop robust and efficient methods to accurately estimate the marginal trait correlation matrix using only GWAS summary data. We construct the principal component (PC) based association test from the summary statistics. The PC based test has optimal power when the underlying multi-trait signal can be captured by the first PC, and otherwise it will have suboptimal performance. We develop an adaptive test by optimally weighting the PC based test and the omnibus chi-square test to achieve robust performance under various scenarios. 
We develop efficient numerical algorithms to compute the analytical p-values for all the proposed tests without the need for Monte Carlo sampling. We illustrate the utility of the proposed methods through application to the GWAS meta-analysis summary data for multiple lipids and glycemic traits. We identify multiple novel loci that were missed by individual trait based association tests. All the proposed methods are implemented in an R package available at http://www.github.com/baolinwu/MTAR. The developed R programs are extremely efficient: it takes less than two minutes to compute the list of genome-wide significant SNPs for all proposed multi-trait tests for the lipids GWAS summary data with 2.5 million SNPs on a single Linux desktop.


Author(s):  
Matthew Lyon ◽  
Shea J Andrews ◽  
Ben Elsworth ◽  
Tom R Gaunt ◽  
Gibran Hemani ◽  
...  

Genome-wide association study (GWAS) summary statistics are a fundamental resource for a variety of research applications [1–6]. Yet despite their widespread utility, no common storage format has been widely adopted, hindering tool development and data sharing, analysis and integration. Existing tabular formats [7,8] often ambiguously or incompletely store information about genetic variants and their associations, and also lack essential metadata, increasing the possibility of errors in data interpretation and post-GWAS analyses. Additionally, data in these formats are typically not indexed, requiring the whole file to be read, which is computationally inefficient. To address these issues, we propose an adaptation of the variant call format [9] (GWAS-VCF) and have produced a suite of open-source tools for using this format in downstream analyses. Simulation studies show that GWAS-VCF is 9-46x faster than tabular alternatives when extracting variant(s) by genomic position. Our results demonstrate that GWAS-VCF provides a robust and performant solution for sharing, analysis and integration of GWAS data. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (available from: https://gwas.mrcieu.ac.uk).


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Peitao Wu ◽  
Biqi Wang ◽  
Steven A. Lubitz ◽  
Emelia J. Benjamin ◽  
James B. Meigs ◽  
...  

Abstract Because single genetic variants may have pleiotropic effects, one trait can be a confounder in a genome-wide association study (GWAS) that aims to identify loci associated with another trait. A typical approach to address this issue is to perform an additional analysis adjusting for the confounder. However, obtaining conditional results can be time-consuming. We propose an approximate conditional phenotype analysis based on GWAS summary statistics, the covariance between outcome and confounder, and the variant minor allele frequency (MAF). GWAS summary statistics and MAF are taken from GWAS meta-analysis results, while the trait covariance may be estimated by two strategies: (i) estimates from a subset of the phenotypic data; or (ii) estimates from published studies. We compare our two strategies with estimates using individual-level data from the full GWAS sample (gold standard). A simulation study for both binary and continuous traits demonstrates that our approximate approach is accurate. We apply our method to the Framingham Heart Study (FHS) GWAS and to large-scale cardiometabolic GWAS results. We observed a high consistency of genetic effect size estimates between our method and individual-level data analysis. Our approach leads to an efficient way to perform approximate conditional analysis using large-scale GWAS summary statistics.


2019 ◽  
Author(s):  
Jianjun Zhang ◽  
Qiuying Sha ◽  
Guanfu Liu ◽  
Xuexia Wang

Abstract There is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases for which multiple correlated traits are often measured. Joint analysis of multiple traits could increase statistical power by aggregating multiple weak effects. Existing methods for multiple trait association tests usually study each of the multiple traits separately and then combine the univariate test statistics or combine p-values of the univariate tests for identifying disease-associated genetic variants. However, ignoring correlation between phenotypes may cause power loss. Additionally, the genetic variants in one gene (including common and rare variants) are often viewed as a whole that affects the underlying disease, since the basic functional unit of inheritance is a gene rather than a genetic variant. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigation, whereas many existing methods for multiple trait association tests only focus on testing a single common variant rather than a gene. In this article, we propose a statistical method by Testing an Optimally Weighted Combination of Multiple traits (TOW-CM) to test the association between multiple traits and multiple variants in a genomic region (a gene or pathway). We investigate the performance of the proposed method through extensive simulation studies. Our simulation studies show that the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful tests. In addition, we illustrate the usefulness of TOW-CM by analyzing whole-genome genotyping data from a COPDGene study.


2016 ◽  
Author(s):  
Il-Youp Kwak ◽  
Wei Pan

Abstract To identify novel genetic variants associated with complex traits and to shed new light on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or p-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. The methods are implemented in R package aSPU, freely and publicly available on CRAN at: https://cran.r-project.org/web/packages/aSPU/.


2018 ◽  
Vol 35 (13) ◽  
pp. 2251-2257 ◽  
Author(s):  
Bin Guo ◽  
Baolin Wu

Abstract Motivation: Genetics holds great promise for precision medicine by tailoring treatment to the individual patient based on their genetic profiles. Toward this goal, many large-scale genome-wide association studies (GWAS) have been performed in the last decade to identify genetic variants associated with various traits and diseases. They have successfully identified tens of thousands of disease-related variants. However, they have explained only a small proportion of the overall trait heritability for most traits and are of very limited clinical use. This is partly owing to the small effect sizes of most genetic variants, and the common practice of testing association between one trait and one genetic variant at a time in most GWAS, even when multiple related traits are often measured for each individual. Increasing evidence suggests that many genetic variants can influence multiple traits simultaneously, and we can gain more power by testing association of multiple traits simultaneously. It is appealing to develop novel multi-trait association test methods that need only GWAS summary data, since it is generally very hard to access the individual-level GWAS phenotype and genotype data. Results: Many existing GWAS summary data-based association test methods have relied on ad hoc approaches or crude Monte Carlo approximations. In this article, we develop rigorous statistical methods for efficient and powerful multi-trait association testing. We develop robust and efficient methods to accurately estimate the marginal trait correlation matrix using only GWAS summary data. We construct the principal component (PC)-based association test from the summary statistics. The PC-based test has optimal power when the underlying multi-trait signal can be captured by the first PC, and otherwise it will have suboptimal performance. We develop an adaptive test by optimally weighting the PC-based test and the omnibus chi-square test to achieve robust performance under various scenarios. 
We develop efficient numerical algorithms to compute the analytical P-values for all the proposed tests without the need for Monte Carlo sampling. We illustrate the utility of the proposed methods through application to the GWAS meta-analysis summary data for multiple lipids and glycemic traits. We identify multiple novel loci that were missed by individual trait-based association tests. Availability and implementation: All the proposed methods are implemented in an R package available at http://www.github.com/baolinwu/MTAR. The developed R programs are extremely efficient: it takes less than 2 min to compute the list of genome-wide significant single nucleotide polymorphisms (SNPs) for all proposed multi-trait tests for the lipids GWAS summary data with 2.5 million SNPs on a single Linux desktop. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 130-130
Author(s):  
Yury Loika ◽  
Alexander Kulminski

Abstract The connections between genes and multifactorial polygenic age-related traits are not trivial due to the complexity of metabolic networks in an organism, which were primarily adapted to maximize fitness at reproductive age in ancient environments. Given this complexity, pleiotropy in predisposition to complex traits appears to be a common phenomenon. Identifying mechanisms of pleiotropic predisposition to multiple age-related traits can be a key factor in developing strategies for extending health-span and lifespan. Correlation between complex traits may be a factor shedding light on these mechanisms. Recently, we used an omnibus test leveraging correlation between multiple age-related traits to gain insights into pleiotropic predisposition to them. The analysis using individual-level data identified a large number of new pleiotropic loci and highlighted a novel phenomenon of antagonistic genetic heterogeneity, which was characterized by antagonistic directions of genetic effects for directly correlated traits. Here, we demonstrate the feasibility of our approach using summary statistics from univariate genome-wide (GW) association studies (GWAS). Our analysis focused on the results for high density lipoprotein cholesterol (HDL-C) and triglycerides (TG) from the Global Lipids Genetic Consortium, which reported 94 GW significant loci (p ≤ 5×10⁻⁸). The traits’ correlation was estimated from the individual-level data. Our approach identified 28 loci with pleiotropic predisposition to HDL-C and TG at p ≤ 5×10⁻⁸, which did not attain univariate GW significance with either of these traits. Fifteen of them (53%) demonstrated antagonistic heterogeneity. These results show that our approach can be efficiently used in the analysis of summary statistics from published studies to identify novel pleiotropic loci.
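
To make the idea of a correlation-aware omnibus test on two traits concrete, the sketch below computes the generic chi-square statistic z'R⁻¹z for a pair of Z-scores, whose upper tail on 2 degrees of freedom is exactly exp(-T/2). This is a standard textbook form and is not claimed to be the exact test used in the abstract above; the function name `omnibus_two_traits` and the example numbers are assumptions.

```python
import math


def omnibus_two_traits(z1, z2, r):
    """Chi-square omnibus test for two traits from their Z-scores.

    r is the (assumed known) correlation between the two Z-scores
    under the null. Returns the statistic and its p-value on 2 df.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    # Quadratic form z' R^{-1} z written out for a 2 x 2 correlation matrix.
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    # Survival function of the chi-square distribution with 2 df.
    pvalue = math.exp(-stat / 2.0)
    return stat, pvalue


# Illustrative numbers: positively correlated traits (r = 0.5) with
# effects in opposite directions versus the same direction.
opposite = omnibus_two_traits(4.0, -4.0, 0.5)
concordant = omnibus_two_traits(4.0, 4.0, 0.5)
```

With these illustrative inputs the opposite-direction pair gives a statistic of 64 against about 21.3 for the concordant pair, which shows how effects that run against the trait correlation can reach joint significance even when neither univariate Z-score does.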


2016 ◽  
Author(s):  
Xiang Zhu ◽  
Matthew Stephens

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.


2021 ◽  
Vol 12 ◽  
Author(s):  
Liwan Fu ◽  
Yuquan Wang ◽  
Tingting Li ◽  
Yue-Qing Hu

As a pivotal research tool, the genome-wide association study has successfully identified numerous genetic variants underlying distinct diseases. However, these identified genetic variants only explain a small proportion of the phenotypic variation for certain diseases, suggesting that there are still more genetic signals to be detected. One reason may be that one-phenotype, one-variant association studies are not efficient at detecting variants with weak effects. It is increasingly recognized that joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases, providing more clues to their underlying biological mechanisms. We therefore propose a Weighted Combination of multiple phenotypes following Hierarchical Clustering method (WCHC) for simultaneously analyzing multiple phenotypes in association studies. A series of simulations are conducted, and the results show that WCHC is either the most powerful method or comparable with the most powerful competitor in most of the simulation scenarios. Additionally, we evaluated the performance of WCHC in its application to the obesity-related phenotypes from the Atherosclerosis Risk in Communities study, and several associated variants are reported.

