Controlling the Rate of GWAS False Discoveries

2016 ◽  
Author(s):  
Damian Brzyski ◽  
Christine B. Peterson ◽  
Piotr Sobczyk ◽  
Emmanuel J. Candès ◽  
Malgorzata Bogdan ◽  
...  

Abstract: With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR-controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR-controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated with multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on pre-screening to identify the level of resolution of distinct hypotheses. We show how FDR-controlling strategies can be adapted to account for this initial selection, both with theoretical results and with simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using tests based on either single-marker or multivariate regression. We provide an R package that allows practitioners to apply our procedure to standard GWAS format data, and illustrate its performance on lipid traits in the NFBC66 cohort study.
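For orientation, the baseline that such strategies build on can be sketched as the standard Benjamini-Hochberg step-up procedure. This is only a minimal illustration of FDR control on a flat list of p-values, not the pre-screening method proposed in the abstract; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the sorted indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at target FDR level q."""
    m = len(pvalues)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * q / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values for ten markers (illustration only).
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example, q=0.05)
print(rejected)
```

Note that each rejected index counts as one discovery here. If several rejected markers in strong linkage disequilibrium are later reported as a single locus, the count of discoveries changes after the fact, which is the source of the FDR inflation the abstract describes.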

2016 ◽  
Author(s):  
Valentina Iotchkova ◽  
Graham R.S. Ritchie ◽  
Matthias Geihs ◽  
Sandro Morganella ◽  
Josine L. Min ◽  
...  

Loci discovered by genome-wide association studies (GWAS) predominantly map outside protein-coding genes. The interpretation of functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages GWAS findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding that current methods do not address. We further assess enrichment statistics for 27 GWAS traits within regulatory regions from the ENCODE and Roadmap projects. We characterise unique enrichment patterns for traits and annotations, driving novel biological insights. The method is implemented as standalone software and as an R package to facilitate its application by the research community.


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract Summary: Increasing sample size is not the only strategy to improve discovery in genome-wide association studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes factors, posterior effects and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation: The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Matthias Munz ◽  
Inken Wohlers ◽  
Eric Simon ◽  
Tobias Reinberger ◽  
Hauke Busch ◽  
...  

Abstract: Exploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs (eQTLs) is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of human variants with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implying a substantial number of incorrect and as yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R package (https://doi.org/10.18129/B9.bioc.Qtlizer).
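The grouping step mentioned above (eQTLs grouped by r² > 0.95) can be illustrated with a small sketch. The abstract does not say how Qtlizer forms its groups, so the single-linkage rule used here is an assumption, and the function `group_by_ld`, the variant identifiers and the r² values are all hypothetical.

```python
def group_by_ld(variants, r2, threshold=0.95):
    """Group variants so that any pair with r2 above the threshold
    ends up in the same group (single-linkage, via union-find).

    variants: iterable of variant identifiers
    r2: dict mapping (variant_a, variant_b) pairs to their r2 value;
        both identifiers of every pair must appear in `variants`
    """
    variants = list(variants)
    parent = {v: v for v in variants}

    def find(v):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the groups of every pair in sufficiently strong LD.
    for (a, b), value in r2.items():
        if value > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for v in variants:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical identifiers and r2 values (illustration only).
ld = {("rs1", "rs2"): 0.98, ("rs2", "rs3"): 0.96, ("rs3", "rs4"): 0.50}
print(group_by_ld(["rs1", "rs2", "rs3", "rs4"], ld))
```

Single linkage is the most permissive choice: rs1 and rs3 share a group through rs2 even if their own pairwise r² is not listed. A stricter rule, such as grouping around a lead variant, would produce smaller groups.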


2010 ◽  
Vol 49 (06) ◽  
pp. 625-631
Author(s):  
H. Schäfer ◽  
B. H. Greene

Summary Background: Genome-wide association studies (GWAS) have been used successfully to identify genetic loci associated with complex diseases and phenotypes. Often this association takes the form of several significant signals (such as small p-values) in a univariate analysis at various markers within a single genetic region. Once confirmed, these associations lead to the question of whether a single marker tags the association signal of another, functionally relevant variant or whether the single marker tags a functionally relevant haplotype. To deal with this question, methods for family data based on logistic regression, adaptations of the transmission/disequilibrium test (TDT) or weighted haplotype likelihood (WHL) methods have been proposed in the literature. Objectives: To examine the effect of parameters such as sample size, inheritance model, and linkage disequilibrium (LD) in the region on the ability of a selection of methods to detect an independent effect from an additional locus. Methods: All methods tested were applied to simulated genetic data of trios comprising a single affected offspring and two parents. Results: While regression-based methods have advantages such as model flexibility, potentially increasing power, the WHL method was more robust against increasing LD in the scenarios analyzed. Conclusions: Simulation results suggest that the regression and WHL methods have greater statistical power than the adaptation of the TDT analyzed here to detect genetic effects at an additional locus while controlling for confounding at another locus.


2019 ◽  
Vol 35 (22) ◽  
pp. 4724-4729 ◽  
Author(s):  
Wujuan Zhong ◽  
Cassandra N Spracklen ◽  
Karen L Mohlke ◽  
Xiaojing Zheng ◽  
Jason Fine ◽  
...  

Abstract Summary: Tens of thousands of reproducibly identified GWAS (genome-wide association study) variants, the vast majority of which fall in non-coding regions that yield no protein product, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on a phenotypic outcome. We propose the multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains, up to 92%, over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation: The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information: Supplementary data are available at Bioinformatics online.
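The general intersection-union principle behind the test's name can be sketched in a few lines. In a mediation setting the null hypothesis holds if either link is absent (no SNP effect on the mediator, or no mediator effect on the outcome), so mediation is declared only when both component tests reject, and the combined p-value is the larger of the two. How SMUT computes its component p-values is not described in the abstract; the function `intersection_union_pvalue` and the numbers below are hypothetical.

```python
def intersection_union_pvalue(component_pvalues):
    """Combine component p-values with the intersection-union rule:
    the overall null is rejected only if every component null is
    rejected, so the combined p-value is the maximum."""
    pvals = list(component_pvalues)
    if not pvals:
        raise ValueError("at least one component p-value is required")
    return max(pvals)


# Hypothetical component p-values (illustration only):
# p_snp_to_mediator tests the joint effect of the SNPs on the mediator,
# p_mediator_to_outcome tests the mediator's effect on the phenotype.
p_snp_to_mediator = 0.001
p_mediator_to_outcome = 0.03
print(intersection_union_pvalue([p_snp_to_mediator, p_mediator_to_outcome]))
```

Because the maximum is used, no further multiplicity correction across the two component tests is needed, at the cost of the test being conservative when both links are weak.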


Animals ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1300 ◽  
Author(s):  
Elisabetta Manca ◽  
Alberto Cesarani ◽  
Giustino Gaspa ◽  
Silvia Sorbolini ◽  
Nicolò P.P. Macciotta ◽  
...  

Genome-wide association studies (GWAS) are traditionally carried out using the single marker regression model, which often yields very few associations when only a small number of individuals is involved. Bayesian methods, such as BayesR, have obtained encouraging results when applied to GWAS. However, these approaches require that an a priori posterior inclusion probability threshold be fixed, thus arbitrarily affecting the obtained associations. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values of a specific trait carry different allelic combinations for genes involved in its determinism. Three multivariate techniques were used to highlight the differences between the individuals assembled in high and low phenotype groups: canonical discriminant analysis, discriminant analysis and stepwise discriminant analysis. The multivariate method was tested on both simulated and real data. The results from the simulation study highlighted that the multivariate GWAS detected a greater number of truly associated single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) than the single marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. On average, with the real data, the multivariate GWAS found 108 associated markers for each trait under study, and around 63% of these SNPs were also found by the single marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.


2010 ◽  
Vol 25 (5) ◽  
pp. 307-309 ◽  
Author(s):  
J. Lasky-Su ◽  
C. Lange

Abstract: The etiology of suicide is complex in nature with both environmental and genetic causes that are extremely diverse. This extensive heterogeneity weakens the relationship between genotype and phenotype and as a result, we face many challenges when studying the genetic etiology of suicide. We are now in the midst of a genetics revolution, where genotyping costs are decreasing and genotyping speed is increasing at a fast rate, allowing genetic association studies to genotype thousands to millions of SNPs that cover the entire human genome. As such, genome-wide association studies (GWAS) are now the norm. In this article we address several statistical challenges that occur when studying the genetic etiology of suicidality in the age of the genetics revolution. These challenges include: (1) the large number of statistical tests; (2) complex phenotypes that are difficult to quantify; and (3) modest genetic effect sizes. We address these statistical issues in the context of family-based study designs. Specifically, we discuss several statistical extensions of family-based association tests (FBATs) that work to alleviate these challenges. As our intention is to describe how statistical methodology may work to identify disease variants for suicidality, we avoid the mathematical details of the methodologies presented.


2017 ◽  
Vol 28 (7) ◽  
pp. 1927-1941
Author(s):  
Jiyuan Hu ◽  
Wei Zhang ◽  
Xinmin Li ◽  
Dongdong Pan ◽  
Qizhai Li

In the past decade, genome-wide association studies have identified thousands of susceptibility variants associated with complex human diseases and traits. Conducting follow-up genetic association studies has become a standard approach to validate the findings of genome-wide association studies. One problem of high interest in genetic association studies is to accurately estimate the strength of the association, which is often quantified by odds ratios in case-control studies. However, estimating the association directly from follow-up studies is inefficient since this approach ignores information from the genome-wide association studies. In this article, an estimator called GFcom, which integrates information from genome-wide association studies and follow-up studies, is proposed. The estimator includes both the point estimate and the corresponding confidence interval. GFcom is more efficient than competing estimators in terms of mean squared error (MSE) and the length of confidence intervals. The superiority of GFcom is particularly evident when the genome-wide association study suffers from severe selection bias. Comprehensive simulation studies and applications to three real follow-up studies demonstrate the performance of the proposed estimator. An R package, “GFcom”, implementing our method is publicly available at https://github.com/JiyuanHu/GFcom.
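As a reference point, the quantity being estimated can be sketched with the plain odds ratio and its Wald confidence interval from a single case-control table. This is the simple follow-up-only estimate that the abstract calls inefficient, not the GFcom estimator; the function `odds_ratio_wald` and the counts below are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying the risk allele      b: cases not carrying it
    c: controls carrying the risk allele   d: controls not carrying it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts from a follow-up study (illustration only).
estimate, lower, upper = odds_ratio_wald(20, 80, 10, 90)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

When a variant is carried forward only because it passed a genome-wide significance threshold, the discovery-stage estimate of this odds ratio tends to be biased upward, which is the selection bias the abstract refers to.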


2015 ◽  
Vol 134 (1) ◽  
pp. 28-39 ◽  
Author(s):  
Inka Gawenda ◽  
Patrick Thorwarth ◽  
Torsten Günther ◽  
Frank Ordon ◽  
Karl J. Schmid
