Abstract
The ability to identify true-positive variants increases as more genotyped animals become available. Although thousands of animals can be genotyped, the dimensionality of genomic information is limited. Therefore, a certain number of animals is sufficient to represent all independent chromosome segments (Me) segregating in the population. Me can be approximated from the eigenvalue decomposition of the genomic relationship matrix (G). This limited dimensionality may therefore help determine how many animals to use in genome-wide association studies (GWAS). The first objective of this study was to examine different discovery set sizes for GWAS, with set sizes based on the number of largest eigenvalues explaining a given proportion of the variance in G. Additionally, we investigated the impact of adding variants selected from the different set sizes to the regular SNP chip used for genomic prediction. Sequence data were simulated containing 500k SNP and 2k QTL, with the genetic variance fully explained by the QTL. GWAS was conducted using a number of genotyped animals equal to the number of largest eigenvalues of G (EIG) explaining 50, 60, 70, 80, 90, 95, 98, and 99% of the variance in G. SNP were considered significant if their p-value was below 0.05 after Bonferroni correction. Further, the SNP with the largest effect sizes (top 10, 100, 500, 1k, 2k, and 4k) were selected and added to the regular 50k chip. Genomic predictions using the 50k chip combined with the selected SNP were conducted with single-step GBLUP (ssGBLUP). Using a number of animals corresponding to at least EIG98 enabled the identification of the QTL with the largest effects. The greatest prediction accuracy was obtained when the top 2k SNP were added to the 50k chip. The dimensionality of genomic information should be taken into account when selecting variants through GWAS.
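The discovery set sizes above rest on a few simple computations: building G, counting the largest eigenvalues that explain a given share of its variance (EIG50 to EIG99), the Bonferroni threshold for 500k tests, and picking the top-k SNP by effect size. The following is a minimal sketch under stated assumptions, not the authors' pipeline. The abstract does not say how G was constructed, so VanRaden's first method (centred 0/1/2 genotypes) is assumed here, and all function names are hypothetical.

```python
import numpy as np

def vanraden_G(M):
    """Assumed construction: VanRaden method 1 G from an n x m matrix of 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0                   # allele frequency of each SNP
    Z = M - 2.0 * p                            # centre genotypes by 2p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def n_largest_eigenvalues(G, proportion):
    """Smallest number of largest eigenvalues whose sum explains `proportion` of the variance in G."""
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse to descending
    # and clip tiny negative values caused by rounding
    eig = np.clip(np.linalg.eigvalsh(G)[::-1], 0.0, None)
    cumulative = np.cumsum(eig) / eig.sum()
    # first position where the cumulative share reaches `proportion`, converted to a count
    return min(int(np.searchsorted(cumulative, proportion)) + 1, eig.size)

def bonferroni_threshold(alpha, n_tests):
    """Per-SNP p-value threshold under Bonferroni correction."""
    return alpha / n_tests

def top_k_snps(effects, k):
    """Indices of the k SNP with the largest absolute estimated effects."""
    return np.argsort(-np.abs(effects))[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.integers(0, 3, size=(200, 1000)).astype(float)   # hypothetical toy genotypes
    G = vanraden_G(M)
    for prop in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.98, 0.99):
        print(f"EIG{round(prop * 100)}:", n_largest_eigenvalues(G, prop))
    print("Bonferroni threshold for 500k SNP:", bonferroni_threshold(0.05, 500_000))
```

In this framing, the EIG count for a chosen proportion would be the number of genotyped animals used as the GWAS discovery set, and `top_k_snps` would supply the variants added to the 50k chip. The toy genotypes are random and carry none of the linkage-disequilibrium structure of a real population, so their EIG values are only illustrative.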