11 Million SNP Reference Profiles for Identity Searching Across Ethnicities, Kinship, and Admixture
AbstractHigh throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) provides additional applications for DNA forensics including identification, mixture analysis, kinship prediction, and biogeographic ancestry prediction. Public repositories of human genetic data are being rapidly generated and released, but the majorities of these samples are de-identified to protect privacy, and have little or no individual metadata such as appearance (photos), ethnicity, relatives, etc. A reference in silico dataset has been generated to enable development and testing of new DNA forensics algorithms. This dataset provides 11 million SNP profiles for individuals with defined ethnicities and family relationships spanning eight generations with admixture for a panel with 39,108 SNPs.