Optimizing the identification of causal variants across varying genetic architectures in crops
Abstract

Background
Association studies use statistical links between genetic markers and variation in a phenotype’s value across many individuals to identify genes controlling variation in the target phenotype. However, this approach, particularly when conducted on a genome-wide scale (GWAS), has limited power to identify the genes responsible for variation in traits controlled by complex genetic architectures.

Results
Here we employ simulation studies utilizing real-world genotype datasets from association populations in four species with distinct minor allele frequency distributions, population structures, and patterns of linkage disequilibrium to evaluate the impact of variation in both heritability and trait complexity on conventional mixed linear model-based GWAS and on two new approaches specifically developed for complex traits. Mixed linear model-based GWAS rapidly loses power as traits become more complex. FarmCPU, a method based on multi-locus mixed linear models, provides the greatest statistical power for moderately complex traits. A Bayesian approach adopted from genomic prediction provides the greatest statistical power to identify causal genetic loci for extremely complex traits.

Conclusions
Using estimates of the complexity of the genetic architecture of target traits can guide the selection of appropriate statistical methods and improve the overall accuracy and power of GWAS.
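
The simulation design summarized in the abstract varies two quantities: heritability and the number of causal loci (trait complexity). The following is a minimal sketch of how a phenotype with a chosen heritability and complexity could be simulated from a genotype matrix. It is an illustration under stated assumptions, not the authors' pipeline: the function name `simulate_phenotype`, the normally distributed effect sizes, and the randomly generated toy genotypes (standing in for the real-world datasets) are all hypothetical choices.

```python
import numpy as np


def simulate_phenotype(genotypes, n_causal, h2, rng):
    """Simulate a phenotype from a (individuals x markers) genotype matrix.

    Assumptions (not taken from the paper): causal markers are drawn
    uniformly at random, and their effects are standard normal.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in the interval (0, 1]")
    n_ind, n_markers = genotypes.shape

    # Pick the causal loci; more loci means a more complex trait.
    causal_idx = rng.choice(n_markers, size=n_causal, replace=False)
    effects = rng.normal(0.0, 1.0, size=n_causal)

    # Genetic value of each individual: sum of allele count times effect.
    genetic = genotypes[:, causal_idx] @ effects

    # Choose the residual variance so that var_g / (var_g + var_e) = h2.
    var_g = genetic.var()
    var_e = var_g * (1.0 - h2) / h2
    noise = rng.normal(0.0, np.sqrt(var_e), size=n_ind)

    return genetic + noise, causal_idx, effects


# Toy data: 500 individuals, 1000 biallelic markers coded 0/1/2.
rng = np.random.default_rng(42)
maf = rng.uniform(0.05, 0.5, size=1000)
toy_genotypes = rng.binomial(2, maf, size=(500, 1000))

phenotype, causal_idx, effects = simulate_phenotype(
    toy_genotypes, n_causal=64, h2=0.5, rng=rng
)

# The realized heritability fluctuates around the target because the
# residuals are a finite random sample.
realized_h2 = (toy_genotypes[:, causal_idx] @ effects).var() / phenotype.var()
print(f"realized heritability: {realized_h2:.2f}")
```

Repeating such a simulation over a grid of `h2` and `n_causal` values, and recording how often each method recovers the known causal loci, is one way to compare the power of different association methods.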