Optimum two-stage designs in case–control association studies using false discovery rate

In genome-wide association studies, associations between genetic variants and diseases or traits are normally discovered in primary studies and validated in replication studies. Associations identified in both the primary and replication studies are considered true findings. An important question in this two-stage setting is how to determine the significance levels for the two studies. Traditional methods determine the significance levels of the primary and replication studies separately. We argue that separate determination reduces the power of the overall two-stage study, and we therefore propose a novel method that determines the significance levels jointly. Our method is a reanalysis method that requires summary statistics from both studies. It finds the most powerful significance levels while controlling the false discovery rate in the two-stage study. To benefit from the power improvement of joint determination, single nucleotide polymorphisms must be selected for replication at a less stringent significance level. This is common practice in studies designed for discovery. We suggest that the practice is also suitable for studies designed for validation, in order to identify more true findings. Simulation experiments show that our method can provide more power than traditional methods and that the false discovery rate is well controlled. Empirical experiments on datasets of five diseases/traits demonstrate that our method can help identify more associations. The R package is available at http://bioinformatics.ust.hk/RFdr.html.
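
The idea of choosing the two significance levels jointly under a false discovery rate constraint can be illustrated with a minimal sketch. This is not the RFdr implementation. It assumes independent two-sided z-tests in the two stages, a known proportion `pi1` of truly associated variants, and known noncentrality parameters `ncp1` and `ncp2`. All function names and parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided(alpha, ncp):
    """Power of a two-sided z-test at level alpha when the statistic is N(ncp, 1)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(-z - ncp) + 1 - STD_NORMAL.cdf(z - ncp)


def two_stage_rates(alpha1, alpha2, pi1, ncp1, ncp2):
    """Approximate FDR and power when a variant must pass both stages.

    Assumption: the stages are independent, so pass probabilities multiply.
    """
    power = power_two_sided(alpha1, ncp1) * power_two_sided(alpha2, ncp2)
    false_pos = (1 - pi1) * alpha1 * alpha2  # null variants passing both stages
    true_pos = pi1 * power                   # associated variants passing both
    return false_pos / (false_pos + true_pos), power


def best_levels(q, pi1, ncp1, ncp2, grid):
    """Grid search for the (alpha1, alpha2) pair with highest power and FDR <= q."""
    best = None
    for a1 in grid:
        for a2 in grid:
            fdr, power = two_stage_rates(a1, a2, pi1, ncp1, ncp2)
            if fdr <= q and (best is None or power > best[2]):
                best = (a1, a2, power, fdr)
    return best


# Hypothetical settings: levels from 1e-1 down to 1e-8 on a log scale.
grid = [10 ** (-k / 4) for k in range(4, 33)]
best = best_levels(q=0.05, pi1=0.001, ncp1=4.0, ncp2=4.0, grid=grid)
print(best)  # (alpha1, alpha2, power, fdr)
```

Because the search ranges over pairs of levels, any separately chosen pair that satisfies the same constraint is one of the candidates, so under these assumptions the jointly chosen pair has at least as much power.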

Optimal DNA Pooling-Based Two-Stage Designs in Case-Control Association Studies

Human Heredity ◽ 10.1159/000164398 ◽ 2008 ◽ Vol 67 (1) ◽ pp. 46-56

Cited By ~ 10

Author(s): Yihong Zhao ◽ Shuang Wang

Keyword(s): Association Studies ◽ Case Control ◽ DNA Pooling ◽ Two Stage ◽ Control Association

Comparison of one-stage and two-stage genome-wide association studies

10.1101/099291 ◽ 2017

Author(s): Shang Xue ◽ Funda Ogut ◽ Zachary Miller ◽ Janu Verma ◽ Peter J. Bradbury ◽ ...

Keyword(s): False Discovery Rate ◽ Association Studies ◽ Experimental Designs ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Whole Genome ◽ Two Stage ◽ False Discovery ◽ Genome Wide ◽ Stage Analysis

Abstract: Linear mixed models are widely used in humans, animals, and plants to conduct genome-wide association studies (GWAS). A characteristic of experimental designs for plants is that experimental units are typically multiple-plant plots of families or lines that are replicated across environments. This structure can present computational challenges to conducting a genome scan on raw (plot-level) data. Two-stage methods have been proposed to reduce the complexity and increase the computational speed of whole-genome scans. The first stage of the analysis fits raw data to a model including environment and line effects, but no individual marker effects. The second stage involves the whole genome scan of marker tests using summary values for each line as the dependent variable. Missing data and unbalanced experimental designs can result in biased estimates of marker association effects from two-stage analyses. In this study, we developed a weighted two-stage analysis to reduce bias and improve power of GWAS while maintaining the computational efficiency of two-stage analyses. Simulation based on real marker data of a diverse panel of maize inbred lines was used to compare power and false discovery rate of the new weighted two-stage method to single-stage and other two-stage analyses and to compare different two-stage models. In the case of severely unbalanced data, only the weighted two-stage GWAS has power and false discovery rate similar to the one-stage analysis. The weighted GWAS method has been implemented in the open-source software TASSEL.
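
The two-stage structure described in this abstract can be sketched in a few lines. This is a simplified illustration, not the TASSEL implementation. It assumes that stage 1 reduces to a per-line mean (environment effects are ignored), that stage 2 is a weighted least-squares regression of line means on a single marker, and that each line is weighted by its number of plots. The abstract does not state the weighting scheme, so that choice and all names below are hypothetical.

```python
from collections import defaultdict


def line_means(plots):
    """Stage 1 (simplified): summarize plot-level data as {line: (mean, n_plots)}.

    `plots` is a list of (line_id, phenotype) pairs; environment effects,
    which the real first-stage model includes, are ignored here.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line_id, value in plots:
        totals[line_id] += value
        counts[line_id] += 1
    return {line: (totals[line] / counts[line], counts[line]) for line in totals}


def weighted_marker_effect(genotypes, summaries):
    """Stage 2: weighted least-squares slope of line means on marker genotype.

    Assumption: the weight of a line is its number of plots, so lines with
    more replicates count for more in an unbalanced design.
    """
    lines = sorted(summaries)
    x = [genotypes[line] for line in lines]
    y = [summaries[line][0] for line in lines]
    w = [summaries[line][1] for line in lines]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    return sxy / sxx


# Hypothetical unbalanced data: line A has four plots, line C only one.
plots = [("A", 1.0)] * 4 + [("C", 3.0)] + [("B", 5.0)] * 2 + [("D", 7.0)] * 2
genotypes = {"A": 0, "C": 0, "B": 1, "D": 1}
means = line_means(plots)
effect = weighted_marker_effect(genotypes, means)
print(effect)
```

With equal weights the same data would give a different estimate, which shows how unbalanced replication can change the second-stage marker effect.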
