Fish and chips: the origin of human gene families is a predictor of the location of GWAS signals
Abstract
Genome-wide association studies (GWAS) have identified thousands of loci associated with human complex diseases and traits. How these loci are distributed through the genome has not been systematically evaluated. We hypothesised that the locations of GWAS loci differ between ancestral linkage groups (ALGs) in a manner related to the paralogy and function of genes. We used data from the NHGRI-EBI GWAS Catalog to determine whether the density of GWAS loci relative to HapMap variants differed between ALGs, and whether ALGs were enriched for Experimental Factor Ontology (EFO) terms assigned to the GWAS traits. In gene-level analyses we explored the characteristics of genes linked to GWAS loci and of genes mapping to the ALGs. GWAS loci were enriched in 9 and deficient in 7 of the 17 ALGs, while there was no difference in the number of GWAS loci in regions of the human genome unassigned to an ALG. All but 2 ALGs were significantly enriched or deficient for one or more EFO terms. Lastly, genes assigned to an ALG are under higher levels of selective constraint, have longer coding sequences and have higher median expression in their tissue of highest expression than genes not mapping to an ALG. Genes associated with GWAS loci, in turn, have longer genomic length and exhibit higher levels of selective constraint than non-GWAS genes. Collectively, these findings suggest that understanding the location and ancestral origins of GWAS signals may be informative for the development of tools for variant prioritization and interpretation.
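
As an illustration of what "density of GWAS loci relative to HapMap variants" could mean in practice, the sketch below compares the share of GWAS loci falling in one ALG with the share of reference variants in that ALG, using a normal approximation to the binomial. The abstract does not state which statistical test was used, so the choice of test, the function name `enrichment_z_test` and the counts in the example are all assumptions made for illustration only; they are not the authors' method or data.

```python
import math


def enrichment_z_test(gwas_in_alg, gwas_total, ref_in_alg, ref_total):
    """Hypothetical helper: test whether one ALG holds more or fewer GWAS
    loci than expected from its share of reference (e.g. HapMap) variants.

    Returns (fold_change, z_score, two_sided_p_value).
    """
    # Expected share of GWAS loci if they followed the reference variant density.
    expected_share = ref_in_alg / ref_total
    observed_share = gwas_in_alg / gwas_total

    # Normal approximation to the binomial under the null hypothesis.
    standard_error = math.sqrt(expected_share * (1 - expected_share) / gwas_total)
    z_score = (observed_share - expected_share) / standard_error

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))

    fold_change = observed_share / expected_share
    return fold_change, z_score, p_value


# Made-up counts, for illustration only (not taken from the study).
fold, z, p = enrichment_z_test(
    gwas_in_alg=120, gwas_total=1000, ref_in_alg=100_000, ref_total=1_000_000
)
print(f"fold change = {fold:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A fold change above 1 with a small p-value would mark an ALG as enriched and a fold change below 1 as deficient. With 17 ALGs tested, a multiple-testing correction would also be needed, which this sketch omits.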