Genotype imputation accuracy in multiple equine breeds from medium- to high-density genotypes

Genomic selection enables cumulative genetic gains in key production traits such as disease resistance, playing an important role in the economic and environmental sustainability of aquaculture production. However, it requires genome-wide genetic marker data on large populations, which can be prohibitively expensive. Genotype imputation is a cost-effective method for obtaining high-density genotypes, but its value in aquaculture breeding programs, which are characterized by large full-sibling families, has yet to be fully assessed. The aim of this study was to optimize the use of low-density genotypes and evaluate genotype imputation strategies for cost-effective genomic prediction. Phenotypes and genotypes (78,362 SNPs) were obtained for 610 individuals from a Scottish Atlantic salmon breeding program population (Landcatch, UK) challenged with sea lice, Lepeophtheirus salmonis. Genomic prediction accuracy was calculated using GBLUP approaches and compared across SNP panels of varying densities and composition, with and without imputation. Imputation was tested when parents were genotyped for the optimal SNP panel and offspring were genotyped for a range of lower-density imputation panels. Reducing SNP density had little impact on prediction accuracy until 5,000 SNPs, below which the accuracy dropped. Imputation accuracy increased with increasing imputation panel density. Genomic prediction accuracy when offspring were genotyped for just 200 SNPs, and parents for 5,000 SNPs, was 0.53. This accuracy was similar to that of the full high-density and optimal-density datasets, and markedly higher than that obtained using 200 SNPs without imputation. These results suggest that imputation from very low to medium density can be a cost-effective tool for genomic selection in Atlantic salmon breeding programs.


A multi-breed reference panel and additional rare variants maximize imputation accuracy in cattle

Genetics Selection Evolution ◽

10.1186/s12711-019-0519-x ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 8

Author(s):

Troy N. Rowan ◽

Jesse L. Hoff ◽

Tamar E. Crum ◽

Jeremy F. Taylor ◽

Robert D. Schnabel ◽

...

Keyword(s):

Rare Variants ◽

Imputation Accuracy ◽

High Density ◽

Genotype Imputation ◽

Reference Panel ◽

Low Density ◽

Single Nucleotide ◽

Functional Variants ◽

Causal Variants ◽

Downstream Analysis

Background: During the last decade, the use of common-variant array-based single nucleotide polymorphism (SNP) genotyping in the beef and dairy industries has produced an astounding amount of medium-to-low density genomic data. Although low-density assays work well in the context of genomic prediction, they are less useful for detecting and mapping causal variants and the effects of rare variants are not captured. The objective of this project was to maximize the accuracies of genotype imputation from medium- and low-density assays to the marker set obtained by combining two high-density research assays (~ 850,000 SNPs), the Illumina BovineHD and the GGP-F250 assays, which contains a large proportion of rare and potentially functional variants and for which the assay design is described here. This 850 K SNP set is useful for both imputation to sequence-level genotypes and direct downstream analysis. Results: We found that a large multi-breed composite imputation reference panel that includes 36,131 samples with either BovineHD and/or GGP-F250 genotypes significantly increased imputation accuracy compared with a within-breed reference panel, particularly at variants with low minor allele frequencies. Individual animal imputation accuracies were maximized when more genetically similar animals were represented in the composite reference panel, particularly with complete 850 K genotypes. The addition of rare variants from the GGP-F250 assay to our composite reference panel significantly increased the imputation accuracy of rare variants that are exclusively present on the BovineHD assay. In addition, we show that an assay marker density of 50 K SNPs balances cost and accuracy for imputation to 850 K. Conclusions: Using high-density genotypes on all available individuals in a multi-breed reference panel maximized imputation accuracy for tested cattle populations. Admixed animals or those from breeds with a limited representation in the composite reference panel were still imputed at high accuracy, which is expected to further increase as the reference panel expands. We anticipate that the addition of rare variants from the GGP-F250 assay will increase the accuracy of imputation to sequence level.


Genotype imputation accuracy in a F2 pig population using high density and low density SNP panels

BMC Genetics ◽

10.1186/1471-2156-14-38 ◽

2013 ◽

Vol 14 (1) ◽

pp. 38 ◽

Cited By ~ 33

Author(s):

Jose L Gualdrón Duarte ◽

Ronald O Bates ◽

Catherine W Ernst ◽

Nancy E Raney ◽

Rodolfo JC Cantet ◽

...

Keyword(s):

Imputation Accuracy ◽

High Density ◽

Genotype Imputation ◽

Low Density ◽

Snp Panels


Inference of Ancestries and Heterozygosity Proportion and Genotype Imputation in West African Cattle Populations

Frontiers in Genetics ◽

10.3389/fgene.2021.584355 ◽

2021 ◽

Vol 12 ◽

Author(s):

Netsanet Z. Gebrehiwot ◽

Hassan Aliloo ◽

Eva M. Strucken ◽

Karen Marshall ◽

Mohammad Al Kalaldeh ◽

...

Keyword(s):

Reference Data ◽

Bos Taurus ◽

Imputation Accuracy ◽

West African ◽

High Density ◽

Genotype Imputation ◽

Indigenous Populations ◽

Medium Density ◽

Crossbred Cattle ◽

Ancestral Origin

Several studies have evaluated computational methods that infer haplotypes from population genotype data in European cattle populations. However, little is known about how well they perform in African indigenous and crossbred populations. This study investigates: (1) global and local ancestry inference; (2) heterozygosity proportion estimation; and (3) genotype imputation in West African indigenous and crossbred cattle populations. Principal component analysis (PCA), ADMIXTURE, and LAMP-LD were used to analyse a medium-density single nucleotide polymorphism (SNP) dataset from Senegalese crossbred cattle. Reference SNP data of East and West African indigenous and crossbred cattle populations were used to investigate the accuracy of imputation from low to medium-density and from medium to high-density SNP datasets using Minimac v3. The first two principal components differentiated Bos indicus from European Bos taurus and African Bos taurus from other breeds. Irrespective of assuming two or three ancestral breeds for the Senegalese crossbreds, breed proportion estimates from ADMIXTURE and LAMP-LD showed a high correlation (r ≥ 0.981). The observed ancestral origin heterozygosity proportion in putative F1 crosses was close to the expected value of 1.0, and clearly differentiated F1 from all other crosses. The imputation accuracies (estimated as the correlation between imputed and real genotypes) in crossbred animals ranged from 0.142 to 0.717 when imputing from low to medium-density, and from 0.478 to 0.899 for imputation from medium to high-density. The imputation accuracy was generally higher when the reference data came from the same geographical region as the target population, and when crossbred reference data was used to impute crossbred genotypes. The lowest imputation accuracies were observed for indigenous breed genotypes. This study shows that ancestral origin heterozygosity can be estimated with high accuracy and will be far superior to the use of observed individual heterozygosity for estimating heterosis in African crossbred populations. It was not possible to achieve high imputation accuracy in West African crossbred or indigenous populations based on reference data sets from East Africa, and population-specific genotyping with high-density SNP assays is required to improve imputation.


Family-specific genotype arrays increase the accuracy of pedigree based imputation at very low marker densities

10.1101/502989 ◽

2018 ◽

Author(s):

Andrew Whalen ◽

John M Hickey ◽

Gregor Gorjanc

Keyword(s):

Imputation Accuracy ◽

Population Based ◽

High Density ◽

Genotype Imputation ◽

Low Density ◽

Marker Selection ◽

Selection Strategies ◽

Statistical Regularities ◽

Missing Genotypes ◽

Density Marker

In this paper we evaluate the use of family-specific low-density genotype arrays to increase the accuracy of pedigree-based imputation. Genotype imputation is a widely used tool that decreases the cost of genotyping a population by genotyping the majority of individuals using a low-density array and using statistical regularities between the low-density and high-density individuals to fill in the missing genotypes. Previous work on population-based imputation has found that it is possible to increase the accuracy of imputation by maximizing the number of informative markers on an array. In the context of pedigree-based imputation, where the informativeness of a marker depends only on the genotypes of an individual's parents, it may be beneficial to select the markers on each low-density array on a family-by-family basis. We examined four family-specific low-density marker selection strategies and evaluated their performance in the context of a real pig breeding dataset. We found that family-specific or sire-specific arrays could increase imputation accuracy by 0.11 at 1 marker per chromosome, by 0.027 at 25 markers per chromosome and by 0.007 at 100 markers per chromosome. These results suggest that there may be room to use family-specific genotyping for very-low-density arrays, particularly if a given sire or sire-dam pairing has a large number of offspring.
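To make the idea of parent-dependent marker informativeness concrete, the following is a minimal Python sketch. It is not one of the four strategies evaluated in the paper, which the abstract does not specify. It assumes genotypes coded 0/1/2, and it assumes that a marker is more informative for a family when more of its parents are heterozygous at that marker. The function name `select_family_markers` and all parameters are hypothetical.

```python
import numpy as np

def select_family_markers(sire, dam, chrom, per_chrom):
    """Pick up to `per_chrom` markers per chromosome for one sire-dam family.

    ASSUMPTION (illustrative only): a marker's score is the number of
    heterozygous parents (0, 1 or 2), with genotypes coded 0/1/2.
    Returns marker indices, in genomic order within each chromosome.
    """
    sire = np.asarray(sire)
    dam = np.asarray(dam)
    chrom = np.asarray(chrom)

    # Score each marker by how many parents are heterozygous (genotype == 1).
    score = (sire == 1).astype(int) + (dam == 1).astype(int)

    chosen = []
    for c in np.unique(chrom):
        idx = np.flatnonzero(chrom == c)
        # Stable sort on descending score keeps genomic order among ties.
        ranked = idx[np.argsort(-score[idx], kind="stable")]
        chosen.extend(sorted(ranked[:per_chrom].tolist()))
    return chosen
```

A call such as `select_family_markers(sire, dam, chrom, per_chrom=25)` would return the indices of a hypothetical 25-marker-per-chromosome array for that family. A sire-specific variant could score markers on the sire's genotype alone.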


Assessing single nucleotide polymorphism selection methods for the development of a low-density panel optimized for imputation in South African Drakensberger beef cattle

Journal of Animal Science ◽

10.1093/jas/skab118 ◽

2021 ◽

Author(s):

Simon F Lashmar ◽

Donagh P Berry ◽

Rian Pierneef ◽

Farai C Muchadeyi ◽

Carina Visser

Keyword(s):

South African ◽

Clustering Algorithm ◽

Imputation Accuracy ◽

Developed Countries ◽

Genotype Imputation ◽

Selection Strategy ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Single Nucleotide Polymorphism Selection ◽

The Mean

A major obstacle in applying genomic selection (GS) to uniquely adapted local breeds in less-developed countries has been the cost of genotyping at high densities of single nucleotide polymorphisms (SNP). Cost reduction can be achieved by imputing genotypes from lower to higher densities. Locally adapted breeds tend to be admixed and exhibit a high degree of genomic heterogeneity thus necessitating the optimization of SNP selection for downstream imputation. The aim of this study was to quantify the achievable imputation accuracy for a sample of 1,135 South African (SA) Drakensberger using several custom-derived lower-density panels varying in both SNP density and how the SNP were selected. From a pool of 120,608 genotyped SNP, subsets of SNP were chosen 1) at random, 2) with even genomic dispersion, 3) by maximizing the mean minor allele frequency (MAF), 4) using a combined score of MAF and linkage disequilibrium (LD), 5) using a partitioning-around-medoids (PAM) algorithm, and finally 6) using a hierarchical LD-based clustering algorithm. Imputation accuracy to higher density improved as SNP density increased; animal-wise imputation accuracy defined as the within-animal correlation between the imputed and actual alleles ranged from 0.625 to 0.990 when 2,500 randomly selected SNP were chosen versus a range of 0.918 to 0.999 when 50,000 randomly selected SNP were used. At a panel density of 10,000 SNP, the mean (standard deviation) animal-wise allele concordance rate was 0.976 (0.018) versus 0.982 (0.014) when the worst (i.e., random) as opposed to the best (i.e., combination of MAF and LD) SNP selection strategy was employed. A difference of 0.071 units was observed between the mean correlation-based accuracy of imputed SNP categorized as low (0.01<MAF≤0.1) versus high MAF (0.4<MAF≤0.5). Greater mean imputation accuracy was achieved for SNP located on autosomal extremes when these regions were populated with more SNP. The presented results suggested that genotype imputation can be a practical cost-saving strategy for indigenous breeds such as the South African Drakensberger. Based on the results, a genotyping panel consisting of approximately 10,000 SNP selected based on a combination of MAF and LD would suffice in achieving a less than 3% imputation error rate for a breed characterized by genomic admixture on the condition that these SNP are selected based on breed-specific selection criteria.
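The two animal-wise metrics named in this abstract can be illustrated with a short Python sketch. It assumes unphased genotypes coded as allele counts (0/1/2), with animals in rows and SNP in columns, and it assumes that allele concordance counts the 2 − |true − imputed| matching alleles at each SNP. The abstract does not give the exact formulas, so the function name `animal_wise_accuracy` and these definitions are illustrative assumptions.

```python
import numpy as np

def animal_wise_accuracy(true_geno, imputed_geno):
    """Return per-animal correlation and allele concordance rate.

    ASSUMPTIONS (illustrative only): genotypes are allele counts 0/1/2,
    rows are animals, columns are SNP, and no values are missing.
    """
    true_geno = np.asarray(true_geno, dtype=float)
    imputed_geno = np.asarray(imputed_geno, dtype=float)

    # Within-animal Pearson correlation between true and imputed genotypes.
    # An animal with no genotype variation would yield NaN here.
    corr = np.array([
        np.corrcoef(t, i)[0, 1] for t, i in zip(true_geno, imputed_geno)
    ])

    # Each SNP carries 2 alleles; |difference| alleles are discordant.
    concordance = 1.0 - np.abs(true_geno - imputed_geno).mean(axis=1) / 2.0
    return corr, concordance
```

Under these definitions the imputation error rate quoted in the abstract would correspond to one minus the concordance rate, although that mapping is also an assumption here.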


EagleImp: Fast and Accurate Genome-wide Phasing and Imputation in a Single Tool

10.1101/2022.01.11.475810 ◽

2022 ◽

Author(s):

Lars Wienbrandt ◽

David Ellinghaus

Keyword(s):

Memory Management ◽

Imputation Accuracy ◽

Simulated Data ◽

Genotype Imputation ◽

Whole Genome Sequencing Data ◽

Common Variants ◽

Sequencing Data ◽

1000 Genomes ◽

Genome Wide ◽

Reference Genomes

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software tool with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.


High-density marker imputation accuracy in sixteen French cattle breeds

Genetics Selection Evolution ◽

10.1186/1297-9686-45-33 ◽

2013 ◽

Vol 45 (1) ◽

Cited By ~ 64

Author(s):

Chris Hozé ◽

Marie-Noëlle Fouilloux ◽

Eric Venot ◽

François Guillaume ◽

Romain Dassonneville ◽

...

Keyword(s):

Imputation Accuracy ◽

High Density ◽

Cattle Breeds ◽

Density Marker


Genotype-Imputation Accuracy across Worldwide Human Populations

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2009.01.013 ◽

2009 ◽

Vol 84 (2) ◽

pp. 235-250 ◽

Cited By ~ 150

Author(s):

Lucy Huang ◽

Yun Li ◽

Andrew B. Singleton ◽

John A. Hardy ◽

Gonçalo Abecasis ◽

...

Keyword(s):

Imputation Accuracy ◽

Genotype Imputation ◽

Human Populations


Genotype Imputation in Case-Only Studies of Gene-Environment Interaction: Validity and Power

10.21203/rs.3.rs-274423/v1 ◽

2021 ◽

Author(s):

Milda Aleknonytė-Resch ◽

Silke Szymczak ◽

Sandra Freitag-Wolf ◽

Astrid Dempfle ◽

Michael Krawczak

Keyword(s):

Statistical Power ◽

Disease Risk ◽

Imputation Accuracy ◽

Genotype Imputation ◽

Type I ◽

Environment Interaction ◽

Genetic Main Effect ◽

Gene Environment ◽

Main Effects ◽

Different Levels

Case-only (CO) studies are a powerful means to uncover gene-environment (G×E) interactions for complex human diseases. Moreover, such studies may in principle also draw upon genotype imputation to increase statistical power even further. However, genotype imputation usually employs healthy controls such as the Haplotype Reference Consortium (HRC) data as an imputation base, which may systematically perturb CO studies in genomic regions with main effects upon disease risk. Using genotype data from 719 German Crohn Disease (CD) patients, we investigated the level of imputation accuracy achievable for single nucleotide polymorphisms (SNPs) with or without a genetic main effect, and with varying minor allele frequency (MAF). Genotypes were imputed from neighbouring SNPs at different levels of linkage disequilibrium (LD) to the target SNP using the HRC data as an imputation base. Comparison of the true and imputed genotypes revealed lower imputation accuracy for SNPs with strong main effects. We also simulated different levels of G×E interaction to evaluate the potential loss of statistical validity and power incurred by the use of imputed genotypes. Simulations under the null hypothesis revealed that genotype imputation does not inflate the type I error rate of CO studies of G×E. However, the statistical power was found to be reduced by imputation, particularly for SNPs with low MAF, and a gradual loss of statistical power resulted when the level of LD to the SNPs driving the imputation decreased. Our study thus highlights that genotype imputation should be employed with great care in CO studies of G×E interaction.
