Efficient and Powerful Method for Combining P-Values in Genome-Wide Association Studies

2016 ◽  
Vol 13 (6) ◽  
pp. 1100-1106 ◽  
Author(s):  
Natalia Vilor-Tejedor ◽  
Juan R. Gonzalez ◽  
M. Luz Calle
2017 ◽  
Author(s):  
Shrayashi Biswas ◽  
Soumen Pal ◽  
Samsiddhi Bhattacharjee

Abstract: Traditional unbiased genome-wide association studies (GWAS) have successfully identified thousands of loci associated with various complex diseases, but there is evidence to suggest that many variants were missed at stringent genome-wide thresholds. Fortunately, there is a rapidly increasing amount of prior knowledge in publicly available genomic datasets and biological databases that can be harnessed to enhance the power of discovering SNPs/genes from existing or new GWAS datasets. For most diseases, many of the identified loci tend to cluster into a few specific biological pathways/networks. From the point of view of disease etiology, such clustering is generally to be expected. This phenomenon can be exploited to conduct a more powerful genome-wide scan that is tailored to identify loci that are interconnected in pathways. We propose a scalable regression-based analytical framework to enable such a pathway-guided GWAS and demonstrate that it provides significant gains in power to detect disease-associated SNPs. Our method requires two inputs, namely a) genome-wide summary-level data (e.g., SNP p-values) and b) a grouping of genes into biologically meaningful categories (e.g., a database of pathways). It automatically adjusts the input p-values by incorporating the knowledge derived adaptively from the data and the pathways specified. The method involves a regularized logistic regression analysis to derive priors for each SNP and then re-weights the p-values of SNPs so as to maximize the overall power of making discoveries. It increases the power to discover SNPs co-clustering into some of these pathways, while maintaining the global type-1 error rate (FWER) at the desired level. We used whole-genome simulations and summary data from real GWA studies of psoriasis, SLE, coronary artery disease and type-2 diabetes to illustrate the power improvement achieved by pathway-guided search.
Our pipeline, implemented as an R package, can flexibly handle a large number of prior annotations, possibly derived from multiple databases.
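
The idea of re-weighting p-values while holding the FWER fixed can be illustrated with a weighted Bonferroni rule. The following is a minimal sketch and not the authors' pipeline: the function name `weighted_bonferroni` and the example weights are hypothetical, and the weights are simply supplied by hand rather than derived from a regularized logistic regression on pathway annotations. The sketch assumes non-negative weights, which it rescales to average 1 so that the per-SNP thresholds still sum to alpha.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return the indices rejected by a weighted Bonferroni rule.

    Each hypothesis i is rejected when p_i <= alpha * w_i / m, where the
    weights w_i are rescaled to average 1. Because the thresholds sum to
    alpha, the family-wise error rate stays at or below alpha.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    m = len(p_values)
    total = sum(weights)
    if m == 0 or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("need at least one test and non-negative weights with positive sum")
    # Rescale so the weights average to 1 (hypothetical prior weights).
    scaled = [w * m / total for w in weights]
    return [i for i, (p, w) in enumerate(zip(p_values, scaled))
            if w > 0 and p <= alpha * w / m]


# Hypothetical example: four SNPs, the second up-weighted by prior knowledge.
p = [0.004, 0.03, 0.5, 0.9]
print(weighted_bonferroni(p, [1, 1, 1, 1]))  # equal weights: plain Bonferroni
print(weighted_bonferroni(p, [1, 3, 0, 0]))  # prior favours the first two SNPs
```

With equal weights only the first SNP passes; shifting weight toward the second SNP lets it pass as well, at the cost of giving up any chance of rejecting the down-weighted SNPs.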


2015 ◽  
Author(s):  
Xia Shen

Motivation: Genome-wide association studies have been conducted in inbred populations where the sample size is small. The ordinary association p-values and multiple testing correction therefore become questionable, as the detected genetic effect may or may not be due to chance, depending on the minor allele frequency distribution across the genome. Instead of permutation testing, the marker-specific false positive rate can be analytically calculated in inbred populations without heterozygotes. Results: Solutions of exact p-values for genome-wide association studies in inbred populations were derived and implemented. An example is presented to illustrate that the marker-specific experiment-wise p-value varies as the genome-wide minor allele frequency distribution changes. A simulation using a real Arabidopsis thaliana genome indicates that the use of exact p-values improves detection power and reduces inflation due to population structure. An analysis of a defense-related case-control phenotype using the exact p-values revealed the causal locus, where markers with higher MAFs had smaller p-values than the top variants with lower MAFs in ordinary genome-wide association analysis. Availability and Implementation: Project URL: https://r-forge.r-project.org/projects/statomics/. The R package p.exact: https://r-forge.r-project.org/R/?group_id=2030.


2015 ◽  
Author(s):  
Inti Inal Pedroso ◽  
Michael R Barnes ◽  
Anbarasu Lourdusamy ◽  
Ammar Al-Chalabi ◽  
Gerome Breen

Genome-wide association studies (GWAS) have proven a valuable tool to explore the genetic basis of many traits. However, many GWAS lack statistical power and the commonly used single-point analysis method needs to be complemented to enhance power and interpretation. Multivariate region- or gene-wide association analyses are an alternative, allowing for identification of disease genes in a manner more robust to allelic heterogeneity. Gene-based association also facilitates systems biology analyses by generating a single p-value per gene. We have designed and implemented FORGE, a software suite which implements a range of methods for the combination of p-values for the individual genetic variants within a gene or genomic region. The software can be used with summary statistics (marker ids and p-values) and accepts as input the result file formats of commonly used genetic association software. When applied to a study of Crohn's disease susceptibility, it identified all genes found by single-SNP analysis and additional genes identified by a large independent meta-analysis. FORGE p-values on gene-set analyses highlighted association with the Jak-STAT and cytokine signalling pathways, both previously associated with CD. We highlight the software's main features, its future development directions and provide a comparison with alternative available software tools. FORGE can be freely accessed at https://github.com/inti/FORGE.
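
Combining per-variant p-values into a single gene-level p-value can be sketched with Fisher's method, one classical combination rule. This is an illustration only and makes no claim about which methods FORGE implements or how; the function name `fisher_combined_p` is hypothetical. The sketch assumes the input p-values are independent, which SNPs in linkage disequilibrium within a gene generally are not, so a real analysis would need a correction for correlation.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has the closed form
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    so no statistics library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Hypothetical gene with three SNP p-values.
print(fisher_combined_p([0.01, 0.20, 0.03]))
```

A single p-value is returned unchanged, which is a quick sanity check on the closed-form tail.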


2018 ◽  
Vol 28 (6) ◽  
pp. 1781-1792
Author(s):  
Flora Alarcon ◽  
Gregory Nuel

Detecting gene-environment (G × E) interactions in the context of genome-wide association studies (GWAS) is a challenging problem since standard methods generally lack power. An additional difficulty arises from the fact that the causal exposure is seldom observed and only a proxy of this exposure is observed. This leads to an additional drop in power and explains the failure of standard methods in detecting interactions, even very strong ones. In this article, we consider the latent exposure as a source of heterogeneity and we propose a new powerful method, named “Breakpoint Model for Logistic Regression” (BMLR), based on a breakpoint model, in order to detect G × E interactions when causal exposure is unobserved. First, the BMLR method is compared through simulations to the ordered-subset analysis for case-control method, which has been developed for the same purpose. This highlights the ability of BMLR to detect the heterogeneity, and therefore, to detect interaction with latent exposure. Finally, the BMLR method is compared to standard methods, such as Plink, to perform a GWAS on a published realistic benchmark.


2010 ◽  
Vol 49 (06) ◽  
pp. 632-640 ◽  
Author(s):  
J. Hebebrand ◽  
H.-E. Wichmann ◽  
K.-H. Jöckel ◽  
A. Scherag

Summary. Background: Genome-wide association studies (GWAS) were highly successful in identifying new susceptibility loci of complex traits. Such studies usually start with genotyping fixed arrays of genetic markers in an initial sample. Out of these markers, some are selected which will be further genotyped in independent samples. Due to the very low a priori probability of a true positive association, the vast majority of all marker signals will turn out to be false positive. Thus, several methods to sort marker data have been proposed which will be evaluated here. Objectives: We compared statistical properties of ranking by p-values, q-values, the False Positive Report Probability (FPRP) and the Bayesian False-Discovery Probability (BFDP). Methods: We performed simulation studies for a genomic region derived from GWAS data sets and calculated descriptive statistics as well as mean square errors with regard to the true marker ranking. Additionally, we applied all measures to a GWAS for early onset extreme obesity superimposing a priori information on candidate genes. Results: Despite the known, more extreme probability results for traditional p-values, we observed that both p-values and the BFDP were more precise in reconstructing the “true” order of the markers in a region. In addition, the BFDP was useful to attenuate unexpected effects at a genome-wide scale. Conclusions: For the purpose of selecting markers from an initial GWAS and within the limits of this study, we recommend either ranking by p-values or the application of a full Bayesian approach for which the BFDP is a first approximation.
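
The remark that a very low prior probability makes most marker signals false positives can be made concrete with the FPRP. The sketch below uses the commonly cited definition, FPRP = α(1 − π) / (α(1 − π) + (1 − β)π), where α is the significance level (or observed p-value), 1 − β the power and π the prior probability of a true association. The function name and the numeric inputs are hypothetical and are not taken from the study.

```python
def false_positive_report_probability(alpha, power, prior):
    """Probability that a 'significant' finding is a false positive.

    alpha : significance level (or observed p-value) of the test
    power : probability of detecting a true association, 1 - beta
    prior : prior probability that the association is real
    """
    if not (0.0 < alpha <= 1.0 and 0.0 < power <= 1.0):
        raise ValueError("alpha and power must lie in (0, 1]")
    if not (0.0 < prior < 1.0):
        raise ValueError("prior must lie strictly between 0 and 1")
    false_positives = alpha * (1.0 - prior)  # null markers that pass
    true_positives = power * prior           # real markers that pass
    return false_positives / (false_positives + true_positives)


# Hypothetical inputs: a candidate-gene prior versus a genome-wide prior.
print(false_positive_report_probability(alpha=0.05, power=0.8, prior=0.5))
print(false_positive_report_probability(alpha=1e-4, power=0.8, prior=1e-4))
```

Under these assumed inputs, the same test that is rarely wrong with an even prior is wrong more often than not when the prior is as small as the significance level.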


2019 ◽  
Author(s):  
Ignacio Aguilar ◽  
Andres Legarra ◽  
Fernando Cardoso ◽  
Yutaka Masuda ◽  
Daniela Lourenco ◽  
...  

Abstract. Background: Single Step GBLUP (SSGBLUP) is the most comprehensive method for genomic prediction. Point estimates of marker effects from SSGBLUP are often used for Genome Wide Association Studies (GWAS) without a formal framework of hypothesis testing. Our objective was to implement p-values for GWAS in the ssGBLUP framework, showing algorithms, computational procedures, and an application to a large beef cattle population. Methods: P-values were obtained based on the prediction error (co)variance for SNP, which uses the inverse of the coefficient matrix and formulas to compute SNP effects. Results: Computation of p-values took a negligible time for a dataset with almost 2 million animals in the pedigree and 1424 genotyped sires, and no inflation was observed. The SNPs passing the Bonferroni threshold of 5.9 in the −log10 scale were the same as those that explained the highest proportion of additive genetic variance, but the latter was penalized (as GWAS signal) by low allele frequency. Conclusion: The exact p-value for SSGWAS is a very general and efficient strategy for QTL detection and testing. It can be used in complex data sets such as those used in animal breeding, where only a proportion of pedigreed animals are genotyped.

