Optimal two-stage genome-wide association designs based on false discovery rate

In genome-wide association studies, associations between genetic variants and diseases or traits are normally discovered in primary studies and validated in replication studies. We consider associations identified in both the primary and the replication study to be true findings. An important question in this two-stage setting is how to determine the significance levels of the two studies. Traditional methods determine the significance levels of the primary and replication studies separately. We argue that this separate determination strategy reduces the power of the overall two-stage study, and we therefore propose a novel method that determines the significance levels jointly. Our method is a reanalysis method that needs summary statistics from both studies. It finds the most powerful significance levels subject to controlling the false discovery rate in the two-stage study. To benefit from the power improvement of the joint determination method, single nucleotide polymorphisms must be selected for replication at a less stringent significance level. This is common practice in studies designed for discovery. We suggest that the practice is also suitable in studies designed for validation, in order to identify more true findings. Simulation experiments show that our method can provide more power than traditional methods and that the false discovery rate is well controlled. Empirical experiments on datasets of five diseases/traits demonstrate that our method can help identify more associations. The R package is available at: http://bioinformatics.ust.hk/RFdr.html.
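The two-stage decision rule in the abstract above can be illustrated with a small simulation. This is a minimal sketch and not the method implemented in the RFdr package: it assumes independent SNPs, z-scores as summary statistics, and a common effect size for all causal SNPs. All names (`two_stage_discoveries`, `simulate`, `alpha1`, `alpha2`) and parameter values are hypothetical. It only evaluates a given pair of significance levels; it does not search for the optimal pair.

```python
import random
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value of a z-score under a standard normal null."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def two_stage_discoveries(z_primary, z_replication, alpha1, alpha2):
    """Indices of SNPs significant at alpha1 (primary) and alpha2 (replication)."""
    return [
        i
        for i, (z1, z2) in enumerate(zip(z_primary, z_replication))
        if two_sided_p(z1) <= alpha1 and two_sided_p(z2) <= alpha2
    ]


def simulate(n_snps=2000, n_causal=50, effect=4.0, seed=0):
    """Draw z-scores for both studies; the first n_causal SNPs are non-null (assumed)."""
    rng = random.Random(seed)
    causal = set(range(n_causal))

    def draw():
        return [rng.gauss(effect if i in causal else 0.0, 1.0) for i in range(n_snps)]

    return draw(), draw(), causal


def empirical_fdr_and_power(found, causal):
    """False discovery proportion and power for one simulated data set."""
    false_hits = sum(1 for i in found if i not in causal)
    fdr = false_hits / max(len(found), 1)
    power = (len(found) - false_hits) / len(causal)
    return fdr, power


if __name__ == "__main__":
    z1, z2, causal = simulate()
    # Compare a stringent and a relaxed primary-stage level (illustrative values).
    for alpha1, alpha2 in [(1e-4, 0.05), (1e-2, 1e-3)]:
        found = two_stage_discoveries(z1, z2, alpha1, alpha2)
        print(alpha1, alpha2, empirical_fdr_and_power(found, causal))
```

Running the script prints the empirical false discovery proportion and power for each pair, which is one way to explore how relaxing the primary-stage level while tightening the replication-stage level trades off the two quantities.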


Comparison of one-stage and two-stage genome-wide association studies

10.1101/099291 ◽

2017 ◽

Author(s):

Shang Xue ◽

Funda Ogut ◽

Zachary Miller ◽

Janu Verma ◽

Peter J. Bradbury ◽

...

Keyword(s):

False Discovery Rate ◽

Association Studies ◽

Experimental Designs ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Two Stage ◽

False Discovery ◽

Genome Wide ◽

Stage Analysis

Linear mixed models are widely used in humans, animals, and plants to conduct genome-wide association studies (GWAS). A characteristic of experimental designs for plants is that experimental units are typically multiple-plant plots of families or lines that are replicated across environments. This structure can present computational challenges to conducting a genome scan on raw (plot-level) data. Two-stage methods have been proposed to reduce the complexity and increase the computational speed of whole-genome scans. The first stage of the analysis fits raw data to a model including environment and line effects, but no individual marker effects. The second stage involves the whole genome scan of marker tests using summary values for each line as the dependent variable. Missing data and unbalanced experimental designs can result in biased estimates of marker association effects from two-stage analyses. In this study, we developed a weighted two-stage analysis to reduce bias and improve power of GWAS while maintaining the computational efficiency of two-stage analyses. Simulation based on real marker data of a diverse panel of maize inbred lines was used to compare power and false discovery rate of the new weighted two-stage method to single-stage and other two-stage analyses and to compare different two-stage models. In the case of severely unbalanced data, only the weighted two-stage GWAS has power and false discovery rate similar to the one-stage analysis. The weighted GWAS method has been implemented in the open-source software TASSEL.
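The two-stage structure described above can be sketched in a few lines. This is an illustration under strong simplifying assumptions, not the TASSEL implementation: stage one reduces plot-level data to a per-line mean (environment effects are ignored here), and stage two runs a weighted least-squares regression of those means on one marker, using the number of plots per line as the weight. The abstract does not state which weights the authors use, so the plot count is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def stage_one_line_means(plots):
    """Stage 1: summarise plot-level records (line_id, phenotype) per line.

    Returns {line_id: (mean_phenotype, n_plots)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line_id, phenotype in plots:
        sums[line_id] += phenotype
        counts[line_id] += 1
    return {line: (sums[line] / counts[line], counts[line]) for line in sums}


def weighted_marker_slope(genotypes, summaries):
    """Stage 2: weighted least-squares slope of line means on one marker.

    genotypes: {line_id: numeric genotype code}
    summaries: output of stage_one_line_means; n_plots is the weight (assumed).
    """
    lines = sorted(summaries)
    w = [summaries[line][1] for line in lines]
    x = [genotypes[line] for line in lines]
    y = [summaries[line][0] for line in lines]
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("marker is monomorphic across the observed lines")
    return sxy / sxx


if __name__ == "__main__":
    # Unbalanced toy data: line A has two plots, lines B and C have one each.
    plots = [("A", 1.0), ("A", 3.0), ("B", 5.0), ("C", 4.0)]
    genotypes = {"A": 0, "B": 1, "C": 1}
    summaries = stage_one_line_means(plots)
    print(weighted_marker_slope(genotypes, summaries))
```

With a single biallelic marker, the weighted slope equals the difference between the weighted mean phenotypes of the two genotype classes, so lines observed in more plots contribute more to the estimate.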


A simple yet efficient method of local false discovery rate estimation designed for genome-wide association data analysis

Statistical Methods & Applications ◽

10.1007/s10260-021-00560-y ◽

2021 ◽

Author(s):

Ali Karimnezhad

Keyword(s):

Data Analysis ◽

False Discovery Rate ◽

Efficient Method ◽

Genome Wide Association ◽

Local False Discovery Rate ◽

Rate Estimation ◽

False Discovery ◽

Genome Wide ◽

False Discovery Rate Estimation ◽

Association Data


Hidden Markov Models for Controlling False Discovery Rate in Genome-Wide Association Analysis

Next Generation Microarray Bioinformatics - Methods in Molecular Biology ◽

10.1007/978-1-61779-400-1_22 ◽

2011 ◽

pp. 337-344 ◽

Cited By ~ 2

Author(s):

Zhi Wei

Keyword(s):

False Discovery Rate ◽

Hidden Markov Models ◽

Association Analysis ◽

Markov Models ◽

Hidden Markov ◽

Genome Wide Association ◽

Genome Wide Association Analysis ◽

False Discovery ◽

Genome Wide


Assessment of Power and False Discovery Rate in Genome-Wide Association Studies using the BarleyCAP Germplasm

Crop Science ◽

10.2135/cropsci2010.02.0064 ◽

2011 ◽

Vol 51 (1) ◽

pp. 52-59 ◽

Cited By ~ 35

Author(s):

Peter Bradbury ◽

Thomas Parker ◽

Martha T. Hamblin ◽

Jean-Luc Jannink

Keyword(s):

False Discovery Rate ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

False Discovery ◽

Genome Wide


SEMI-PARAMETRIC COVARIATE-MODULATED LOCAL FALSE DISCOVERY RATE FOR GENOME-WIDE ASSOCIATION STUDIES

10.1101/183384 ◽

2017 ◽

Author(s):

Rong W. Zablocki ◽

Richard A. Levine ◽

Andrew J. Schork ◽

Shujing Xu ◽

Yunpeng Wang ◽

...

Keyword(s):

False Discovery Rate ◽

Complex Traits ◽

Association Studies ◽

Logistic Function ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Local False Discovery Rate ◽

B Spline ◽

False Discovery ◽

Genome Wide

While genome-wide association studies (GWAS) have discovered thousands of risk loci for heritable disorders, so far even very large meta-analyses have recovered only a fraction of the heritability of most complex traits. Recent work utilizing variance components models has demonstrated that a larger fraction of the heritability of complex phenotypes is captured by the additive effects of SNPs than is evident only in loci surpassing genome-wide significance thresholds, typically set at a Bonferroni-inspired p ≤ 5 × 10⁻⁸. Procedures that control false discovery rate can be more powerful, yet these are still under-powered to detect the majority of non-null effects from GWAS. The current work proposes a novel Bayesian semi-parametric two-group mixture model and develops a Markov Chain Monte Carlo (MCMC) algorithm for a covariate-modulated local false discovery rate (cmfdr). The probability of being non-null depends on a set of covariates via a logistic function, and the non-null distribution is approximated as a linear combination of B-spline densities, where the weight of each B-spline density depends on a multinomial function of the covariates. The proposed methods were motivated by work on a large meta-analysis of schizophrenia GWAS performed by the Psychiatric Genetics Consortium (PGC). We show that the new cmfdr model fits the PGC schizophrenia GWAS test statistics well, performing better than our previously proposed parametric gamma model for estimating the non-null density and substantially improving power over usual fdr. Using loci declared significant at cmfdr ≤ 0.20, we perform follow-up pathway analyses using the Kyoto Encyclopedia of Genes and Genomes (KEGG) homo sapiens pathways database. We demonstrate that the increased yield from the cmfdr model results in an improved ability to test for pathways associated with schizophrenia compared to using those SNPs selected according to usual fdr.
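The two-group structure behind a covariate-modulated local false discovery rate can be written out directly. The sketch below is a simplified stand-in rather than the authors' model: the prior probability of being non-null follows a logistic function of the covariates, as in the abstract, but the non-null density is a single zero-mean normal with a wider standard deviation instead of a covariate-dependent B-spline mixture, and the coefficients are supplied by hand rather than estimated by MCMC. The function name and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def logistic(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def cm_local_fdr(z, covariates, coefs, intercept, nonnull_sd=3.0):
    """Posterior probability that a SNP with z-score z is null.

    pi1(x) = logistic(intercept + coefs . covariates) is the prior non-null
    probability; f0 is N(0, 1) and f1 is N(0, nonnull_sd^2) (an assumption).
    """
    linear = intercept + sum(c * x for c, x in zip(coefs, covariates))
    pi1 = logistic(linear)
    f0 = NormalDist(0.0, 1.0).pdf(z)
    f1 = NormalDist(0.0, nonnull_sd).pdf(z)
    null_part = (1.0 - pi1) * f0
    return null_part / (null_part + pi1 * f1)


if __name__ == "__main__":
    # Same z-score, two values of one enrichment covariate (illustrative).
    for x in (0.0, 2.0):
        print(x, cm_local_fdr(z=3.5, covariates=[x], coefs=[1.0], intercept=-3.0))
```

In this toy setting a SNP would be declared significant when the returned value falls at or below a chosen cutoff, such as the 0.20 used in the abstract, so a covariate that raises the prior non-null probability lowers the z-score needed for discovery.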


Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btw690 ◽

2016 ◽

pp. btw690 ◽

Cited By ~ 2

Author(s):

Wei Jiang ◽

Weichuan Yu

Keyword(s):

False Discovery Rate ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Local False Discovery Rate ◽

False Discovery ◽

Genome Wide


Universal False Discovery Rate Estimation Methodology for Genome-Wide Association Studies

Human Heredity ◽

10.1159/000112365 ◽

2007 ◽

Vol 65 (4) ◽

pp. 183-194 ◽

Cited By ~ 11

Author(s):

Karl Forner ◽

Marc Lamarine ◽

Mickaël Guedj ◽

Jérôme Dauvillier ◽

Jérôme Wojcik

Keyword(s):

False Discovery Rate ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Rate Estimation ◽

False Discovery ◽

Genome Wide ◽

False Discovery Rate Estimation


False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1663 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 4

Author(s):

Ismaïl Ahmed ◽

Anna-Liisa Hartikainen ◽

Marjo-Riitta Järvelin ◽

Sylvia Richardson

Keyword(s):

False Discovery Rate ◽

Decision Rule ◽

Upper Bound ◽

Association Studies ◽

Penalized Regression ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Stability Selection ◽

False Discovery ◽

Genome Wide

Stability Selection, which combines penalized regression with subsampling, is a promising algorithm to perform variable selection in ultra high dimension. This work is motivated by its evaluation in the context of genome-wide association studies (GWAS). One critical aspect for its use lies in the choice of a decision rule that accounts for the massive number of comparisons realised. The current decision rule relies on the control of the Family Wise Error Rate (FWER) by means of an upper bound derived theoretically. Alternatively, we propose to set the detection threshold according to the more liberal false discovery rate (FDR) criterion. The procedure we propose for its estimation relies on permutations. This procedure is evaluated by simulations according to several scenarios mimicking various correlation structures of genetic data and is compared to the original FWER upper bound. The proposed procedure is shown to be less conservative, and able to pick up more true signals than the FWER upper bound. Finally, the proposed methodology is illustrated on a GWAS analysis of a lipid phenotype (high-density lipoproteins, HDL) in the Northern Finland Birth Cohort.
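The idea of estimating the false discovery rate by permutation can be sketched generically. The code below is not the authors' procedure: instead of stability-selection frequencies from penalized regression, it uses the absolute Pearson correlation between each marker and the phenotype as a stand-in selection score. The expected number of false selections at a threshold is estimated by the average number of markers exceeding it after shuffling the phenotype. All names and defaults are hypothetical.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def count_above(features, phenotype, threshold):
    """Number of features whose score reaches the detection threshold."""
    return sum(1 for f in features if abs_correlation(f, phenotype) >= threshold)


def permutation_fdr(features, phenotype, threshold, n_perm=200, seed=0):
    """Estimated FDR: mean selections under permutation / observed selections."""
    observed = count_above(features, phenotype, threshold)
    if observed == 0:
        return 0.0
    rng = random.Random(seed)
    shuffled = list(phenotype)
    total = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-phenotype association
        total += count_above(features, shuffled, threshold)
    return min(1.0, (total / n_perm) / observed)


if __name__ == "__main__":
    phenotype = [1, 2, 3, 4, 5, 6, 7, 8]
    features = [[1, 2, 3, 4, 5, 6, 7, 8], [1, -1, 1, -1, 1, -1, 1, -1]]
    print(permutation_fdr(features, phenotype, threshold=0.9))
```

Shuffling the phenotype while leaving the marker matrix intact preserves the correlation structure among markers, which is why permutation estimates of this kind are often used when genetic data are correlated.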
