Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data
Abstract
Pigs (Sus scrofa) exhibit diverse phenotypes across breeds, shaped by the combined effects of local adaptation and artificial selection. To comprehensively characterize the genetic diversity of pigs, we constructed a pig pan-genome by comparing genome assemblies of 11 representative pig breeds with the reference genome (Sscrofa11.1). Approximately 72.5 Mb of non-redundant sequences absent from Sscrofa11.1 were identified as pan-sequences. On average, 41.7 kb of spurious heterozygous SNPs per individual are removed and 12.9 kb of novel SNPs per individual are recovered when the pan-genome is used as the reference for SNP calling, thereby providing enhanced resolution of genetic diversity in pigs. Homolog annotation and analyses using RNA-seq and Hi-C data indicate that these pan-sequences contain protein-coding regions and regulatory elements. These pan-sequences can further improve the interpretation of local 3D structure. The pan-genome and its accompanying web-based database will serve as a primary resource for exploring genetic diversity and will promote pig breeding and biomedical research.