Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies

2016 ◽  
pp. btw690 ◽  
Author(s):  
Wei Jiang ◽  
Weichuan Yu
2017 ◽  
Author(s):  
Rong W. Zablocki ◽  
Richard A. Levine ◽  
Andrew J. Schork ◽  
Shujing Xu ◽  
Yunpeng Wang ◽  
...  

While genome-wide association studies (GWAS) have discovered thousands of risk loci for heritable disorders, so far even very large meta-analyses have recovered only a fraction of the heritability of most complex traits. Recent work utilizing variance components models has demonstrated that a larger fraction of the heritability of complex phenotypes is captured by the additive effects of SNPs than is evident only in loci surpassing genome-wide significance thresholds, typically set at a Bonferroni-inspired p ≤ 5 × 10⁻⁸. Procedures that control false discovery rate can be more powerful, yet these are still under-powered to detect the majority of non-null effects from GWAS. The current work proposes a novel Bayesian semi-parametric two-group mixture model and develops a Markov chain Monte Carlo (MCMC) algorithm for a covariate-modulated local false discovery rate (cmfdr). The probability of being non-null depends on a set of covariates via a logistic function, and the non-null distribution is approximated as a linear combination of B-spline densities, where the weight of each B-spline density depends on a multinomial function of the covariates. The proposed methods were motivated by work on a large meta-analysis of schizophrenia GWAS performed by the Psychiatric Genetics Consortium (PGC). We show that the new cmfdr model fits the PGC schizophrenia GWAS test statistics well, performing better than our previously proposed parametric gamma model for estimating the non-null density and substantially improving power over usual fdr. Using loci declared significant at cmfdr ≤ 0.20, we perform follow-up pathway analyses using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Homo sapiens pathways database. We demonstrate that the increased yield from the cmfdr model results in an improved ability to test for pathways associated with schizophrenia compared to using those SNPs selected according to usual fdr.
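The two-group structure described in this abstract can be illustrated with a minimal sketch. It is not the authors' MCMC procedure: the B-spline non-null density is replaced here by a wide normal density purely as a stand-in, and the function names, the coefficient vector `beta`, and the value of `nonnull_sd` are hypothetical. The sketch only shows how a covariate-dependent prior, passed through a logistic function, changes the local false discovery rate for the same test statistic.

```python
import math

def normal_pdf(z, sd=1.0):
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def cm_local_fdr(z, x, beta, nonnull_sd=3.0):
    """Posterior probability that a SNP with statistic z and covariates x is null.

    x and beta are equal-length sequences; x[0] is typically 1.0 (intercept).
    """
    # Prior probability of being non-null, modulated by covariates (logistic link).
    pi1 = logistic(sum(b * xi for b, xi in zip(beta, x)))
    f0 = normal_pdf(z)              # theoretical null density N(0, 1)
    f1 = normal_pdf(z, nonnull_sd)  # ASSUMPTION: stand-in for the B-spline mixture
    return (1.0 - pi1) * f0 / ((1.0 - pi1) * f0 + pi1 * f1)

if __name__ == "__main__":
    beta = [-2.0, 1.5]  # hypothetical intercept and enrichment coefficient
    print(cm_local_fdr(4.0, [1.0, 0.0], beta))  # SNP without the annotation
    print(cm_local_fdr(4.0, [1.0, 1.0], beta))  # SNP with the annotation
```

With a positive enrichment coefficient, the annotated SNP receives a smaller local fdr than the unannotated one at the same z-score, which is the mechanism by which covariates can increase the yield at a fixed cmfdr cutoff.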


2017 ◽  
Vol 11 (4) ◽  
pp. 2252-2269 ◽  
Author(s):  
Rong W. Zablocki ◽  
Richard A. Levine ◽  
Andrew J. Schork ◽  
Shujing Xu ◽  
Yunpeng Wang ◽  
...  

2014 ◽  
Vol 30 (15) ◽  
pp. 2098-2104 ◽  
Author(s):  
Rong W. Zablocki ◽  
Andrew J. Schork ◽  
Richard A. Levine ◽  
Ole A. Andreassen ◽  
Anders M. Dale ◽  
...  

Crop Science ◽  
2011 ◽  
Vol 51 (1) ◽  
pp. 52-59 ◽  
Author(s):  
Peter Bradbury ◽  
Thomas Parker ◽  
Martha T. Hamblin ◽  
Jean-Luc Jannink

2017 ◽  
Vol 27 (9) ◽  
pp. 2795-2808 ◽  
Author(s):  
Wei Jiang ◽  
Weichuan Yu

In genome-wide association studies, we normally discover associations between genetic variants and diseases/traits in primary studies, and validate the findings in replication studies. We consider the associations identified in both primary and replication studies as true findings. An important question under this two-stage setting is how to determine significance levels in both studies. In traditional methods, significance levels of the primary and replication studies are determined separately. We argue that the separate determination strategy reduces the power in the overall two-stage study. Therefore, we propose a novel method to determine significance levels jointly. Our method is a reanalysis method that needs summary statistics from both studies. We find the most powerful significance levels when controlling the false discovery rate in the two-stage study. To enjoy the power improvement from the joint determination method, we need to select single nucleotide polymorphisms for replication at a less stringent significance level. This is a common practice in studies designed for discovery purposes. We suggest this practice is also suitable in studies designed for validation, in order to identify more true findings. Simulation experiments show that our method can provide more power than traditional methods and that the false discovery rate is well-controlled. Empirical experiments on datasets of five diseases/traits demonstrate that our method can help identify more associations. The R-package is available at: http://bioinformatics.ust.hk/RFdr.html.
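The idea of choosing the two significance levels jointly can be sketched with a toy calculation. This is not the method implemented in the R package mentioned above: it assumes a known null proportion `pi0`, a single known effect size per stage (`mu1`, `mu2`), one-sided z-tests, and independent studies, and it searches a coarse grid. All function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power(alpha, mu):
    """Power of a one-sided z-test at level alpha when the statistic has mean mu."""
    return 1.0 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - alpha) - mu)

def two_stage_fdr_and_power(a1, a2, pi0, mu1, mu2):
    """FDR and power when a SNP must pass level a1 (primary) and a2 (replication)."""
    joint_power = power(a1, mu1) * power(a2, mu2)
    false_pos = pi0 * a1 * a2             # a null SNP passes both independent stages
    true_pos = (1.0 - pi0) * joint_power  # a non-null SNP passes both stages
    return false_pos / (false_pos + true_pos), joint_power

def best_joint_levels(pi0, mu1, mu2, q=0.05, grid=None):
    """Grid search for the pair (a1, a2) with the highest power subject to FDR <= q."""
    if grid is None:
        grid = [10 ** (-k / 4) for k in range(4, 33)]  # 1e-1 down to 1e-8
    best = None
    for a1 in grid:
        for a2 in grid:
            fdr, pw = two_stage_fdr_and_power(a1, a2, pi0, mu1, mu2)
            if fdr <= q and (best is None or pw > best[2]):
                best = (a1, a2, pw)
    return best

if __name__ == "__main__":
    print(best_joint_levels(pi0=0.999, mu1=5.0, mu2=3.0))
    print(two_stage_fdr_and_power(5e-8, 0.05, 0.999, 5.0, 3.0))  # separate levels
```

Under these assumptions, the jointly chosen pair relaxes the primary-stage level well below the genome-wide threshold while still meeting the FDR target, which mirrors the abstract's point about selecting SNPs for replication at a less stringent level.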


2018 ◽  
Author(s):  
Holly Trochet ◽  
Matti Pirinen ◽  
Gavin Band ◽  
Luke Jostins ◽  
Gilean McVean ◽  
...  

Genome-wide association studies (GWAS) are a powerful tool for understanding the genetic basis of diseases and traits, but most studies have been conducted in isolation, with a focus on either a single or a set of closely related phenotypes. We describe MetABF, a simple Bayesian framework for performing integrative meta-analysis across multiple GWAS using summary statistics. The approach is applicable across a wide range of study designs and can increase the power by 50% compared to standard frequentist tests when only a subset of studies have a true effect. We demonstrate its utility in a meta-analysis of 20 diverse GWAS which were part of the Wellcome Trust Case-Control Consortium 2. The novelty of the approach is its ability to explore, and assess the evidence for, a range of possible true patterns of association across studies in a computationally efficient framework.
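To make "patterns of association across studies" concrete, the sketch below scores every non-empty subset of studies with a Bayes factor built from summary statistics. It is not the MetABF implementation: it uses the Wakefield-style approximate Bayes factor for each study and assumes effects are a priori independent across studies, in which case the Bayes factor for a subset against the global null is the product of the per-study factors. The prior standard deviation `prior_sd` and the function names are hypothetical.

```python
import math
from itertools import combinations

def abf(beta_hat, se, prior_sd=0.2):
    """Approximate Bayes factor (alternative vs. null) from an estimate and its SE."""
    v, w = se ** 2, prior_sd ** 2
    z2 = (beta_hat / se) ** 2
    return math.sqrt(v / (v + w)) * math.exp(0.5 * z2 * w / (v + w))

def subset_bayes_factors(betas, ses, prior_sd=0.2):
    """Bayes factor for 'effect present only in this subset' vs. 'no effect anywhere'.

    ASSUMPTION: independent effects across studies, so the subset Bayes factor
    is the product of the per-study approximate Bayes factors.
    """
    per_study = [abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    factors = {}
    for k in range(1, len(betas) + 1):
        for subset in combinations(range(len(betas)), k):
            factors[subset] = math.prod(per_study[i] for i in subset)
    return factors

if __name__ == "__main__":
    bfs = subset_bayes_factors([0.30, 0.00, 0.25], [0.05, 0.05, 0.05])
    print(max(bfs, key=bfs.get))  # best-supported pattern of association
```

The number of subsets grows as 2^n, so this exhaustive enumeration is only practical for a small number of studies.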


2020 ◽  
Author(s):  
Reza Nasirigerdeh ◽  
Reihaneh Torkzadehmahani ◽  
Julian Matschinske ◽  
Tobias Frisch ◽  
Markus List ◽  
...  

Genome-wide association studies (GWAS) have been widely used to unravel connections between genetic variants and diseases. Larger sample sizes in GWAS can lead to discovering more associations and more accurate genetic predictors. However, sharing and combining distributed genomic data to increase the sample size is often challenging or even impossible due to privacy concerns and privacy protection laws such as the GDPR. While meta-analysis has been established as an effective approach to combine summary statistics of several GWAS, its accuracy can be attenuated in the presence of cross-study heterogeneity. Here, we present sPLINK (safe PLINK), a user-friendly tool, which performs federated GWAS on distributed datasets while preserving the privacy of data and the accuracy of the results. sPLINK neither exchanges raw data nor does it rely on summary statistics. Instead, it performs model training in a federated manner, communicating only model parameters between cohorts and a central server. We verify that the federated results from sPLINK are the same as those from aggregated analyses conducted with PLINK. We demonstrate that sPLINK is robust against heterogeneous distributions of data (phenotype and confounding factors) across cohorts, while existing meta-analysis tools lose considerable accuracy in such scenarios. We also show that sPLINK achieves practical runtime, on the order of minutes or hours, and acceptable network bandwidth consumption for chi-square and linear/logistic regression tests. Federated analysis with sPLINK, thus, has the potential to replace meta-analysis as the gold standard for collaborative GWAS. The user-friendly, readily usable sPLINK tool is available at https://exbio.wzw.tum.de/splink.
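The claim that federated results can match a pooled analysis exactly can be illustrated for the simplest case, a single-SNP linear regression. The sketch below is not sPLINK's protocol, and what sPLINK actually transmits is not specified here: as an assumption, each cohort shares only a handful of local aggregate sums, and a server combines them into the same least-squares slope that the pooled data would give. Function names and the toy data are hypothetical.

```python
def local_stats(genotypes, phenotypes):
    """Aggregates a cohort would share: n, sum g, sum y, sum g*g, sum g*y."""
    return (
        len(genotypes),
        sum(genotypes),
        sum(phenotypes),
        sum(g * g for g in genotypes),
        sum(g * y for g, y in zip(genotypes, phenotypes)),
    )

def aggregate_slope(stats_list):
    """Server side: add the cohort aggregates and solve for the regression slope."""
    n, sg, sy, sgg, sgy = (sum(column) for column in zip(*stats_list))
    return (n * sgy - sg * sy) / (n * sgg - sg * sg)

if __name__ == "__main__":
    g_a, y_a = [0, 1, 2, 1], [1.0, 1.4, 2.1, 1.6]  # cohort A (toy data)
    g_b, y_b = [2, 0, 1], [2.0, 0.8, 1.5]          # cohort B (toy data)
    federated = aggregate_slope([local_stats(g_a, y_a), local_stats(g_b, y_b)])
    pooled = aggregate_slope([local_stats(g_a + g_b, y_a + y_b)])
    print(federated, pooled)
```

Because the sums are additive across cohorts, the federated and pooled slopes agree up to floating-point rounding, whereas an inverse-variance meta-analysis of per-cohort slopes generally would not.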


2007 ◽  
Vol 65 (4) ◽  
pp. 183-194 ◽  
Author(s):  
Karl Forner ◽  
Marc Lamarine ◽  
Mickaël Guedj ◽  
Jérôme Dauvillier ◽  
Jérôme Wojcik

Author(s):  
Ismaïl Ahmed ◽  
Anna-Liisa Hartikainen ◽  
Marjo-Riitta Järvelin ◽  
Sylvia Richardson

Stability Selection, which combines penalized regression with subsampling, is a promising algorithm for variable selection in ultra-high dimensions. This work is motivated by its evaluation in the context of genome-wide association studies (GWAS). One critical aspect of its use lies in the choice of a decision rule that accounts for the massive number of comparisons performed. The current decision rule relies on the control of the family-wise error rate (FWER) by means of an upper bound derived theoretically. Alternatively, we propose to set the detection threshold according to the more liberal false discovery rate (FDR) criterion. The procedure we propose for its estimation relies on permutations. This procedure is evaluated by simulations according to several scenarios mimicking various correlation structures of genetic data and is compared to the original FWER upper bound. The proposed procedure is shown to be less conservative, and able to pick up more true signals than the FWER upper bound. Finally, the proposed methodology is illustrated on a GWAS analysis of a lipid phenotype (high-density lipoproteins, HDL) in the Northern Finland Birth Cohort.
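A permutation-based FDR estimate for subsampling selection frequencies can be sketched as follows. This is a simplified illustration, not the procedure evaluated in the paper: the penalized regression step is swapped for a much cruder selector (keep the `k` variables with the largest absolute marginal correlation in each subsample), and the FDR at a frequency threshold is estimated as the average number of variables reaching that threshold under permuted phenotypes divided by the number reaching it in the observed data. All names and default values are hypothetical.

```python
import random

def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def selection_frequencies(columns, y, k, n_sub, rng):
    """Fraction of half-size subsamples in which each variable is among the top k."""
    n, p = len(y), len(columns)
    counts = [0] * p
    for _ in range(n_sub):
        idx = rng.sample(range(n), n // 2)
        y_sub = [y[i] for i in idx]
        scores = [abs_corr([col[i] for i in idx], y_sub) for col in columns]
        # STAND-IN for penalized regression: rank by marginal correlation.
        for j in sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]:
            counts[j] += 1
    return [c / n_sub for c in counts]

def permutation_fdr(columns, y, threshold, k=1, n_sub=50, n_perm=20, seed=0):
    """Observed selection frequencies and a permutation estimate of the FDR."""
    rng = random.Random(seed)
    observed = selection_frequencies(columns, y, k, n_sub, rng)
    n_selected = sum(f >= threshold for f in observed)
    null_total = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # breaks any genotype-phenotype association
        freqs = selection_frequencies(columns, y_perm, k, n_sub, rng)
        null_total += sum(f >= threshold for f in freqs)
    expected_false = null_total / n_perm
    return observed, min(1.0, expected_false / max(n_selected, 1))
```

In practice one would evaluate this estimate over a range of thresholds and pick the most liberal threshold whose estimated FDR stays below the target.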

