Genotype imputation accuracy in a F2 pig population using high density and low density SNP panels

BMC Genetics ◽  
2013 ◽  
Vol 14 (1) ◽  
pp. 38 ◽  
Author(s):  
Jose L Gualdrón Duarte ◽  
Ronald O Bates ◽  
Catherine W Ernst ◽  
Nancy E Raney ◽  
Rodolfo JC Cantet ◽  
...  
2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Troy N. Rowan ◽  
Jesse L. Hoff ◽  
Tamar E. Crum ◽  
Jeremy F. Taylor ◽  
Robert D. Schnabel ◽  
...  

Abstract. Background: During the last decade, the use of common-variant array-based single nucleotide polymorphism (SNP) genotyping in the beef and dairy industries has produced an astounding amount of medium-to-low density genomic data. Although low-density assays work well in the context of genomic prediction, they are less useful for detecting and mapping causal variants and the effects of rare variants are not captured. The objective of this project was to maximize the accuracies of genotype imputation from medium- and low-density assays to the marker set obtained by combining two high-density research assays (~850,000 SNPs), the Illumina BovineHD and the GGP-F250 assays, which contains a large proportion of rare and potentially functional variants and for which the assay design is described here. This 850 K SNP set is useful for both imputation to sequence-level genotypes and direct downstream analysis. Results: We found that a large multi-breed composite imputation reference panel that includes 36,131 samples with either BovineHD and/or GGP-F250 genotypes significantly increased imputation accuracy compared with a within-breed reference panel, particularly at variants with low minor allele frequencies. Individual animal imputation accuracies were maximized when more genetically similar animals were represented in the composite reference panel, particularly with complete 850 K genotypes. The addition of rare variants from the GGP-F250 assay to our composite reference panel significantly increased the imputation accuracy of rare variants that are exclusively present on the BovineHD assay. In addition, we show that an assay marker density of 50 K SNPs balances cost and accuracy for imputation to 850 K. Conclusions: Using high-density genotypes on all available individuals in a multi-breed reference panel maximized imputation accuracy for tested cattle populations. Admixed animals or those from breeds with a limited representation in the composite reference panel were still imputed at high accuracy, which is expected to further increase as the reference panel expands. We anticipate that the addition of rare variants from the GGP-F250 assay will increase the accuracy of imputation to sequence level.


2018 ◽  
Author(s):  
Andrew Whalen ◽  
John M Hickey ◽  
Gregor Gorjanc

In this paper we evaluate the performance of using family-specific low-density genotype arrays to increase the accuracy of pedigree-based imputation. Genotype imputation is a widely used tool that decreases the costs of genotyping a population by genotyping the majority of individuals using a low-density array and using statistical regularities between the low-density and high-density individuals to fill in the missing genotypes. Previous work on population-based imputation has found that it is possible to increase the accuracy of imputation by maximizing the number of informative markers on an array. In the context of pedigree-based imputation, where the informativeness of a marker depends only on the genotypes of an individual's parents, it may be beneficial to select the markers on each low-density array on a family-by-family basis. In this paper we examined four family-specific low-density marker selection strategies, and evaluated their performance in the context of a real pig breeding dataset. We found that family-specific or sire-specific arrays could increase imputation accuracy by 0.11 at 1 marker per chromosome, by 0.027 at 25 markers per chromosome and by 0.007 at 100 markers per chromosome. These results suggest that there may be room to use family-specific genotyping for very-low-density arrays, particularly if a given sire or sire-dam pairing has a large number of offspring.
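
The idea that a marker's informativeness in pedigree-based imputation depends on the parents' genotypes can be sketched as follows. This is a minimal illustration and not one of the four strategies evaluated in the paper: it assumes genotypes coded 0/1/2 (copies of one allele), assumes a marker is informative when at least one parent is heterozygous, and uses hypothetical function names.

```python
from typing import List


def informative_markers(sire: List[int], dam: List[int]) -> List[int]:
    """Return marker indices where at least one parent is heterozygous.

    Assumption for this sketch: genotypes are coded 0/1/2 and only a
    heterozygous parent (code 1) lets us track which allele was inherited.
    """
    return [i for i, (s, d) in enumerate(zip(sire, dam)) if s == 1 or d == 1]


def pick_family_array(sire: List[int], dam: List[int], n_markers: int) -> List[int]:
    """Choose up to n_markers informative markers, spread over the candidates."""
    candidates = informative_markers(sire, dam)
    if n_markers >= len(candidates):
        return candidates
    # Take every `step`-th candidate so the chosen markers are spread out.
    step = len(candidates) / n_markers
    return [candidates[int(k * step)] for k in range(n_markers)]


# Hypothetical parental genotypes at six markers on one chromosome.
sire = [0, 1, 2, 1, 0, 1]
dam = [2, 0, 1, 1, 0, 0]
family_array = pick_family_array(sire, dam, n_markers=2)
```

A sire-specific array, as mentioned in the abstract, would correspond to testing only the sire's genotype in `informative_markers`.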


Aquaculture ◽  
2018 ◽  
Vol 491 ◽  
pp. 147-154 ◽  
Author(s):  
Grazyella M. Yoshida ◽  
Roberto Carvalheiro ◽  
Jean P. Lhorente ◽  
Katharina Correa ◽  
René Figueroa ◽  
...  

animal ◽  
2018 ◽  
Vol 12 (11) ◽  
pp. 2235-2245 ◽  
Author(s):  
D.A. Grossi ◽  
L.F. Brito ◽  
M. Jafarikia ◽  
F.S. Schenkel ◽  
Z. Feng

2019 ◽  
Vol 10 (2) ◽  
pp. 581-590 ◽  
Author(s):  
Smaragda Tsairidou ◽  
Alastair Hamilton ◽  
Diego Robledo ◽  
James E. Bron ◽  
Ross D. Houston

Genomic selection enables cumulative genetic gains in key production traits such as disease resistance, playing an important role in the economic and environmental sustainability of aquaculture production. However, it requires genome-wide genetic marker data on large populations, which can be prohibitively expensive. Genotype imputation is a cost-effective method for obtaining high-density genotypes, but its value in aquaculture breeding programs, which are characterized by large full-sibling families, has yet to be fully assessed. The aim of this study was to optimize the use of low-density genotypes and evaluate genotype imputation strategies for cost-effective genomic prediction. Phenotypes and genotypes (78,362 SNPs) were obtained for 610 individuals from a Scottish Atlantic salmon breeding program population (Landcatch, UK) challenged with sea lice, Lepeophtheirus salmonis. Genomic prediction accuracy was calculated using GBLUP approaches and compared across SNP panels of varying densities and composition, with and without imputation. Imputation was tested when parents were genotyped for the optimal SNP panel, and offspring were genotyped for a range of lower density imputation panels. Reducing SNP density had little impact on prediction accuracy until density fell to 5,000 SNPs, below which accuracy dropped. Imputation accuracy increased with increasing imputation panel density. Genomic prediction accuracy when offspring were genotyped for just 200 SNPs, and parents for 5,000 SNPs, was 0.53. This accuracy was similar to the full high density and optimal density dataset, and markedly higher than using 200 SNPs without imputation. These results suggest that imputation from very low to medium density can be a cost-effective tool for genomic selection in Atlantic salmon breeding programs.


2020 ◽  
Vol 60 (8) ◽  
pp. 999
Author(s):  
Lianjie Hou ◽  
Wenshuai Liang ◽  
Guli Xu ◽  
Bo Huang ◽  
Xiquan Zhang ◽  
...  

A low-density single-nucleotide polymorphism (LD-SNP) panel is one effective way to reduce the cost of genomic selection in animal breeding. The present study proposes a new type of LD-SNP panel called the mixed low-density (MLD) panel, which simultaneously considers SNPs with a substantial effect, estimated by Bayes method B (BayesB) from many traits, and an evenly spaced distribution. Simulated and real data were used to compare the imputation accuracy and genomic-selection accuracy of two types of LD-SNP panels. The result of genotype imputation for simulated data showed that the number of quantitative trait loci (QTL) had limited influence on the imputation accuracy only for MLD panels. The evenly spaced (ELD) panel was not affected by the number of QTL. For real data, ELD performed slightly better than did MLD when the panel contained 500 or 1000 SNPs. However, this advantage vanished quickly as the density increased. The result of genomic selection for simulated data using BayesB showed that MLD performed much better than did ELD when the number of QTL was 100. For real data, MLD also outperformed ELD in growth and carcass traits when using BayesB. In conclusion, the MLD strategy is superior to ELD in genomic selection under most situations.
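
A rough sketch of how a mixed panel could be assembled is given below. The abstract does not state how many large-effect SNPs are included or how spacing is computed, so the `n_top` split, the index-based spacing and the function names are all assumptions made for illustration. SNP effects are assumed to have been estimated beforehand (for example with BayesB) and are passed in as plain numbers ordered along the genome.

```python
from typing import List, Set


def _nearest_free(target: int, n: int, taken: Set[int]) -> int:
    """Return the index closest to `target` that is not already in the panel."""
    for offset in range(n):
        for idx in (target + offset, target - offset):
            if 0 <= idx < n and idx not in taken:
                return idx
    raise ValueError("no free marker left")


def mixed_low_density_panel(effects: List[float], panel_size: int, n_top: int) -> List[int]:
    """Pick n_top SNPs with the largest absolute effect, then fill the
    rest of the panel with SNPs spread evenly along the marker order."""
    n = len(effects)
    if not 0 <= n_top <= panel_size <= n:
        raise ValueError("need 0 <= n_top <= panel_size <= number of SNPs")
    by_effect = sorted(range(n), key=lambda i: abs(effects[i]), reverse=True)
    chosen = set(by_effect[:n_top])
    n_even = panel_size - n_top
    for k in range(n_even):
        # Centre of the k-th of n_even equal-width bins along the marker order.
        target = int((k + 0.5) * n / n_even)
        chosen.add(_nearest_free(target, n, chosen))
    return sorted(chosen)


# Hypothetical effect estimates for ten SNPs in genome order.
effects = [0.0, 0.9, 0.1, 0.0, 0.0, -0.8, 0.0, 0.0, 0.2, 0.0]
panel = mixed_low_density_panel(effects, panel_size=4, n_top=2)
```

Setting `n_top=0` in this sketch yields a purely evenly spaced panel, which corresponds to the ELD design that the study compares against.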


2018 ◽  
Author(s):  
Serap Gonen ◽  
Valentin Wimmer ◽  
R. Chris Gaynor ◽  
Ed Byrne ◽  
Gregor Gorjanc ◽  
...  

Abstract: This paper presents a new heuristic method for phasing and imputation of genomic data in diploid plant species. Our method, called AlphaPlantImpute, explicitly leverages features of plant breeding programs to maximise the accuracy of imputation. The features are a small number of parents, which can be inbred and usually have high-density genomic data, and few recombinations separating parents and focal individuals genotyped at low-density (i.e. descendants that are the imputation targets). AlphaPlantImpute works roughly in three steps. First, it identifies informative low-density genotype markers in parents. Second, it tracks the inheritance of parental alleles and haplotypes to focal individuals at informative markers. Finally, it uses this low-density information as anchor points to impute focal individuals to high-density. We tested the imputation accuracy of AlphaPlantImpute in simulated bi-parental populations across different scenarios. We also compared its accuracy to existing software called PlantImpute. In general, the imputation accuracy of AlphaPlantImpute was equal to or better than that of PlantImpute. The computational time and memory requirements of AlphaPlantImpute were tiny compared to PlantImpute. For example, accuracy of imputation was 0.96 for a scenario where both parents were inbred and genotyped at 25,000 markers per chromosome and a focal F2 individual was genotyped with 50 markers per chromosome. This scenario required at most 0.08 GB of memory and took 37 seconds to complete.
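
The three steps can be illustrated with a deliberately simplified toy. It is not the AlphaPlantImpute algorithm: it assumes two fully inbred parents (one haplotype each, alleles coded 0/1), a fully homozygous descendant line rather than a heterozygous F2, and a nearest-anchor rule for filling gaps. All names are hypothetical.

```python
from typing import Dict, List


def impute_from_inbred_parents(
    parent_a: List[int], parent_b: List[int], focal_low: Dict[int, int]
) -> List[int]:
    """Impute a homozygous descendant to high density.

    parent_a, parent_b: allele (0/1) of each inbred parent at every
        high-density marker, in chromosome order.
    focal_low: marker index -> observed allele at the low-density markers.
    """
    # Step 1: keep only low-density markers where the parents differ.
    # Step 2: at those markers, record which parent the allele came from.
    anchors = {}
    for pos, allele in focal_low.items():
        if parent_a[pos] != parent_b[pos]:
            anchors[pos] = "A" if allele == parent_a[pos] else "B"
    if not anchors:
        raise ValueError("no informative low-density markers")
    anchor_positions = sorted(anchors)

    # Step 3: copy each high-density allele from the parent assigned at
    # the nearest anchor (ties go to the anchor with the lower index).
    imputed = []
    for pos in range(len(parent_a)):
        nearest = min(anchor_positions, key=lambda a: abs(a - pos))
        source = parent_a if anchors[nearest] == "A" else parent_b
        imputed.append(source[pos])
    return imputed


# Hypothetical data: eight high-density markers, three genotyped at low density.
parent_a = [0, 0, 0, 0, 0, 0, 0, 0]
parent_b = [1, 1, 0, 1, 1, 1, 0, 1]
focal_low = {1: 0, 2: 0, 5: 1}
imputed = impute_from_inbred_parents(parent_a, parent_b, focal_low)
```

In this toy, marker 2 is ignored because both parents carry the same allele there, so only markers 1 and 5 serve as anchor points.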


2020 ◽  
Vol 60 (3) ◽  
pp. 333
Author(s):  
Ricardo V. Ventura ◽  
Luiz F. Brito ◽  
Gerson A. Oliveira ◽  
Hans D. Daetwyler ◽  
Flavio S. Schenkel ◽  
...  

There is evidence that some genotyping platforms might not work very well for Zebu cattle when compared with Taurine breeds. In addition, the availability of panels with low to moderate number of overlapping markers is a limitation for combining datasets for genomic evaluations, especially when animals are genotyped using different SNP panels. In the present study, we compared the performance of medium- and high-density (HD) commercially available panels and investigated the feasibility of developing an ultra-HD panel (SP) containing markers from an Illumina (HD_I) and an Affymetrix (HD_A) panel. The SP panel contained 1123442 SNPs. After performing SNP pruning on the basis of linkage disequilibrium, HD_A, HD_I and SP contained 429624, 365225 and 658770 markers distributed across the whole genome. The overall mean proportion of markers pruned out per chromosome for HD_A, HD_I and SP was 15.17%, 43.18% and 38.63%, respectively. The HD_I panel presented the highest mean number of runs-of-homozygosity segments per animal (45.48%, an increment of 5.11% compared with SP) and longer segments, on average (3057.95 kb per segment), than did both HD_A and SP. HD_I also showed the highest mean number of SNPs per run-of-homozygosity segment. Consequently, the majority of animals presented the highest genomic inbreeding levels when genotyped using HD_I. The visual examination of marker distribution along the genome illustrated uncovered regions among the different panels. In the haplotype-block comparison among panels, the average size of haplotypes constructed on the basis of HD_A was smaller than that of haplotypes from HD_I. The average number of SNPs per haplotype was different between HD_A and HD_I. Both HD_A and HD_I panels achieved high imputation accuracies when used as the lower-density panels for imputing to SP. However, imputation accuracy from HD_A to SP was greater than was imputation from HD_I to SP. Imputation from one HD panel to the other is also feasible. Low- and medium-density panels, composed of markers that are subsets of both HD_A and HD_I panels, should be developed to achieve better imputation accuracies to both HD levels. Therefore, the genomic analyses performed in the present study showed significant differences among the SNP panels used.


2019 ◽  
Author(s):  
Troy N. Rowan ◽  
Jesse L. Hoff ◽  
Tamar E. Crum ◽  
Jeremy F. Taylor ◽  
Robert D. Schnabel ◽  
...  

Abstract. Background: The use of array-based SNP genotyping in the beef and dairy industries has produced an astounding amount of medium-to-low density genomic data in the last decade. While low-density assays work exceptionally well in the context of genomic prediction, they are less useful in mapping and causal variant discovery. This project focuses on maximizing imputation accuracies to the marker set of two high-density research assays, the Illumina Bovine HD, and the GGP-F250 which contains a large proportion of rare and potentially functional variants (~850,000 total SNPs). This 850K SNP set is well-suited for both imputation to sequence-level genotypes and direct downstream analysis. Results: We find that a large multi-breed composite imputation reference comprised of 36,131 samples with either HD and/or F250 genotypes significantly increases imputation accuracy compared to a standard within-breed reference panel, particularly at low minor allele frequencies. Imputation accuracies were maximized when an individual’s ancestry was adequately represented in the composite reference, particularly with complete 850K genotypes. The addition of rare content from the F250 to our composite reference panel significantly increased the imputation accuracy of rare variants found exclusively on the HD. Additionally, we identify 50,000 variants as an ideal starting density for 850K imputation. Conclusion: Using high-density genotypes on all available individuals in a multi-breed reference panel maximizes imputation accuracy for all cattle populations. Admixed breeds or those sparsely represented in the composite reference are still imputed at high accuracy which will increase further as the reference panel grows. We expect that the addition of rare variation from the F250 will increase the accuracy of imputation at the sequence level.


2020 ◽  
Author(s):  
Serap Gonen ◽  
Valentin Wimmer ◽  
R. Chris Gaynor ◽  
Ed Byrne ◽  
Gregor Gorjanc ◽  
...  

Abstract: This paper presents an extension to a heuristic method for phasing and imputation of genotypes of descendants in bi-parental populations so that it can phase and impute genotypes of parents of bi-parental populations that are fully ungenotyped or partially genotyped. The imputed genotypes of the parent are then used to impute low-density genotyped descendants of the bi-parental population to high-density. The extension works in three steps. First, it identifies whether a parent has no or low-density genotypes available and it identifies all of its relatives that have high-density genotypes. Second, using the high-density information of relatives, it determines whether the parent is homozygous or heterozygous for a given locus. Third, it phases heterozygous positions of the parent by matching haplotypes to its relatives. We implemented the new algorithm in an extension of the AlphaPlantImpute software and tested its accuracy of imputing missing parent genotypes in simulated bi-parental populations from different scenarios. We also tested the accuracy of imputation of the missing parent’s descendants using the true genotype of the parent and compared this to using the imputed genotypes of the parent. Our results show that across all scenarios, the accuracy of imputation of a parent, measured as the correlation between true and imputed genotypes, was > 0.98 and did not drop below ∼ 0.96. The imputation accuracy of a parent was always higher when it was inbred than when it was outbred and when it had low-density genotypes. Including ancestors of the parent genotyped at high density, increasing the number of crosses and increasing the number of high-density descendants all increased the accuracy of imputation. The high imputation accuracy achieved for the parent across all scenarios translated to little or no impact on the accuracy of imputation of its descendants at low-density. Key Message: New fast and accurate method for phasing and imputation of SNP chip genotypes within diploid bi-parental plant populations.
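
The accuracy measure named in this abstract, the correlation between true and imputed genotypes, can be computed as in the sketch below. It assumes genotypes coded as allele counts or dosages (0 to 2) and uses the Pearson correlation. The function name is hypothetical, and returning NaN when either vector has no variation is a choice made for this example.

```python
import math
from typing import Sequence


def imputation_accuracy(true_g: Sequence[float], imputed_g: Sequence[float]) -> float:
    """Pearson correlation between true and imputed genotypes of one individual."""
    n = len(true_g)
    if n != len(imputed_g) or n < 2:
        raise ValueError("need two genotype vectors of equal length >= 2")
    mean_t = sum(true_g) / n
    mean_i = sum(imputed_g) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_g, imputed_g))
    var_t = sum((t - mean_t) ** 2 for t in true_g)
    var_i = sum((i - mean_i) ** 2 for i in imputed_g)
    if var_t == 0 or var_i == 0:
        # Correlation is undefined when one vector is constant.
        return float("nan")
    return cov / math.sqrt(var_t * var_i)


# Hypothetical genotypes at six markers; one marker is imputed incorrectly.
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_genotypes = [0, 1, 2, 2, 0, 2]
accuracy = imputation_accuracy(true_genotypes, imputed_genotypes)
```

Unlike a simple proportion of matching calls, the correlation centres both vectors on their means, so agreement that arises only because most genotypes share one common value earns less credit.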

