Integrating co-expression networks with GWAS to prioritize causal genes in maize
Abstract

Background
Genome-wide association studies (GWAS) have identified thousands of loci linked to hundreds of traits in many different species. However, because linkage disequilibrium implicates a broad region surrounding each identified locus, the causal genes often remain unknown. This problem is especially pronounced in non-human, non-model species, where functional annotations are sparse and there is frequently little information available for prioritizing candidate genes.

Results
To address this issue, we developed a computational approach called Camoco (Co-Analysis of Molecular Components) that systematically integrates loci identified by GWAS with gene co-expression networks to prioritize putative causal genes. We applied Camoco to prioritize candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize seeds. Camoco identified statistically significant subnetworks for the majority of traits examined, producing a prioritized list of high-confidence causal genes for several agronomically important maize traits. Two candidate genes identified by our approach were validated through analysis of mutant phenotypes. Strikingly, the performance of our approach depended strongly on the type of co-expression network used: networks derived from expression variation across genetically diverse individuals in a relevant tissue context (in our case, maize roots) outperformed the alternatives.

Conclusions
Our study demonstrates that co-expression networks can provide a powerful basis for prioritizing candidate causal genes from GWAS loci, but suggests that the success of such strategies can depend strongly on the context of the gene expression data. Both the Camoco software and the lessons on integrating GWAS data with co-expression networks generalize to species beyond maize.