A Comprehensive Comparison of Haplotype-Based Single-Step Genomic Predictions in Livestock Populations With Different Genetic Diversity Levels: A Simulation Study

The level of genetic diversity in a population is inversely proportional to the linkage disequilibrium (LD) between individual single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs), leading to lower predictive ability of genomic estimated breeding values (GEBVs) in highly genetically diverse populations. Haplotype-based predictions could outperform individual SNP predictions by better capturing the LD between SNP and QTL. Therefore, we aimed to evaluate the accuracy and bias of individual-SNP- and haplotype-based genomic predictions under the single-step genomic best linear unbiased prediction (ssGBLUP) approach in genetically diverse populations. We simulated purebred and composite sheep populations using literature parameters for moderate and low heritability traits. The haplotypes were created based on LD thresholds of 0.1, 0.3, and 0.6. Pseudo-SNPs from unique haplotype alleles were used to create the genomic relationship matrix (G) in the ssGBLUP analyses. Alternative scenarios were compared in which the pseudo-SNPs were combined with non-LD clustered SNPs, only pseudo-SNPs, or haplotypes fitted in a second G (two relationship matrices). The GEBV accuracies for the moderate heritability-trait scenarios fitting individual SNPs ranged from 0.41 to 0.55 and with haplotypes from 0.17 to 0.54 in the most (Ne ≅ 450) and less (Ne < 200) genetically diverse populations, respectively, and the bias fitting individual SNPs or haplotypes ranged between −0.14 and −0.08 and from −0.62 to −0.08, respectively. For the low heritability-trait scenarios, the GEBV accuracies fitting individual SNPs ranged from 0.24 to 0.32, and for fitting haplotypes, they ranged from 0.11 to 0.32 in the more (Ne ≅ 250) and less (Ne ≅ 100) genetically diverse populations, respectively, and the bias ranged between −0.36 and −0.32 and from −0.78 to −0.33 fitting individual SNPs or haplotypes, respectively.
The lowest accuracies and largest biases were observed fitting only pseudo-SNPs from blocks constructed with an LD threshold of 0.3 (p < 0.05), whereas the best results were obtained using only SNPs or the combination of independent SNPs and pseudo-SNPs in one or two G matrices, in both heritability levels and all populations regardless of the level of genetic diversity. In summary, haplotype-based models did not improve the performance of genomic predictions in genetically diverse populations.

Level-biases in estimated breeding values due to the use of different SNP panels over time in ssGBLUP

Genetics Selection Evolution ◽

10.1186/s12711-019-0517-z ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 1

Author(s):

Øyvind Nordbø ◽

Arne B. Gjuvsland ◽

Leiv Sigbjørn Eikje ◽

Theo Meuwissen

Keyword(s):

Value Added ◽

Single Step ◽

Fine Tuning ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Optimal Selection ◽

Breeding Values ◽

Estimated Breeding Values ◽

Snp Panels ◽

Genomic Predictions

Abstract Background The main aim of single-step genomic predictions was to facilitate optimal selection in populations consisting of both genotyped and non-genotyped individuals. However, in spite of intensive research, biases still occur, which make it difficult to perform optimal selection across groups of animals. The objective of this study was to investigate whether incomplete genotype datasets with errors could be a potential source of level-bias between genotyped and non-genotyped animals and between animals genotyped on different single nucleotide polymorphism (SNP) panels in single-step genomic predictions. Results Incomplete and erroneous genotypes of young animals caused biases in breeding values between groups of animals. Systematic noise or missing data for less than 1% of the SNPs in the genotype data had substantial effects on the differences in breeding values between genotyped and non-genotyped animals, and between animals genotyped on different chips. The breeding values of young genotyped individuals were biased upward, and the magnitude was up to 0.8 genetic standard deviations, compared with breeding values of non-genotyped individuals. Similarly, the magnitude of a small value added to the diagonal of the genomic relationship matrix affected the level of average breeding values between groups of genotyped and non-genotyped animals. Cross-validation accuracies and regression coefficients were not sensitive to these factors. Conclusions Because, historically, different SNP chips have been used for genotyping different parts of a population, fine-tuning of imputation within and across SNP chips and handling of missing genotypes are crucial for reducing bias. Although all the SNPs used for estimating breeding values are present on the chip used for genotyping young animals, incompleteness and some genotype errors might lead to level-biases in breeding values.
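The abstract above refers to a small value added to the diagonal of the genomic relationship matrix and to the handling of missing genotypes. The following Python sketch illustrates both ideas under stated assumptions: it uses a VanRaden-type G (centered genotypes scaled by twice the sum of p(1 - p)), replaces missing calls by the SNP mean, and exposes the diagonal constant as a parameter. The function name `vanraden_g`, the parameter `diag_add`, and the toy genotypes are hypothetical and are not taken from the paper.

```python
import numpy as np

def vanraden_g(genotypes, diag_add=0.01):
    """Build a VanRaden-type G from 0/1/2 genotypes (np.nan = missing).

    Assumptions (not from the paper): allele frequencies come from the
    observed calls, and missing calls are replaced by the SNP mean.
    """
    m = np.asarray(genotypes, dtype=float)
    p = np.nanmean(m, axis=0) / 2.0        # allele frequency per SNP
    z = m - 2.0 * p                        # center each SNP column
    z[np.isnan(z)] = 0.0                   # missing call -> SNP mean
    scale = 2.0 * np.sum(p * (1.0 - p))
    g = z @ z.T / scale
    g[np.diag_indices_from(g)] += diag_add  # small constant on the diagonal
    return g

# Toy data: 4 animals, 4 SNPs, one missing call.
toy = [[0, 1, 2, 1],
       [1, 1, 0, 2],
       [2, 0, 1, np.nan],
       [0, 2, 1, 1]]
g_small = vanraden_g(toy, diag_add=0.01)
g_large = vanraden_g(toy, diag_add=0.05)
print(np.round(g_large - g_small, 4))
```

Comparing runs with different `diag_add` values, or with different rules for filling missing calls, is one way to see how such choices shift the matrix that enters the evaluation.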

Correction to: Validation of genomic predictions for body weight in broilers using crossbred information and considering breed-of-origin of alleles

Genetics Selection Evolution ◽

10.1186/s12711-019-0507-1 ◽

2019 ◽

Vol 51 (1) ◽

Author(s):

Pascal Duenk ◽

Mario P. L. Calus ◽

Yvonne C. J. Wientjes ◽

Vivian P. Breen ◽

John M. Henshall ◽

...

Keyword(s):

Body Weight ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Genomic Predictions

Following publication of original article [1], we noticed that there was an error: Eq. (3) on page 5 is the genomic relationship matrix that

Genomic Predictions for Fillet Yield and Firmness in Rainbow Trout Using Reduced-density SNP Panels

10.21203/rs.3.rs-36925/v2 ◽

2020 ◽

Author(s):

Rafet Al-Tobasei ◽

Ali R. Ali ◽

Andre L. S. Garcia ◽

Daniela Lourenco ◽

Tim Leeds ◽

...

Keyword(s):

Rainbow Trout ◽

Predictive Ability ◽

Study Data ◽

Single Step ◽

Breeding Values ◽

Snp Chip ◽

Estimated Breeding Values ◽

Snp Panels ◽

Fillet Yield ◽

Genomic Predictions

Abstract Background One of the most important goals for the rainbow trout aquaculture industry is to improve fillet yield and fillet quality. Previously, we showed that a 50K transcribed-SNP chip can be used to detect quantitative trait loci (QTL) associated with fillet yield and fillet firmness. In this study, data from 1,568 fish genotyped for the 50K transcribed-SNP chip and ~774 fish phenotyped for fillet yield and fillet firmness were used in a single-step genomic BLUP (ssGBLUP) model to compute the genomic estimated breeding values (GEBV). In addition, pedigree-based best linear unbiased prediction (PBLUP) was used to calculate traditional, family-based estimated breeding values (EBV). Results The genomic predictions outperformed the traditional EBV by 35% for fillet yield and 42% for fillet firmness. The predictive ability for fillet yield and fillet firmness was 0.19 - 0.20 with PBLUP, and 0.27 with ssGBLUP. Additionally, reducing SNP panel densities indicated that using 500 – 800 SNPs in genomic predictions still provides predictive abilities higher than PBLUP. Conclusion These results suggest that genomic evaluation is a feasible strategy to identify and select fish with superior genetic merit within rainbow trout families, even with low-density SNP panels.

Genomic Predictions for Muscle Yield and Fillet Firmness in Rainbow Trout using Reduced-Density SNP Panels

10.21203/rs.3.rs-36925/v1 ◽

2020 ◽

Author(s):

Rafet Al-Tobasei ◽

Ali R. Ali ◽

Andre L. S. Garcia ◽

Daniela Lourenco ◽

Tim Leeds ◽

...

Keyword(s):

Rainbow Trout ◽

Predictive Ability ◽

Study Data ◽

Single Step ◽

Breeding Values ◽

Snp Chip ◽

Estimated Breeding Values ◽

Snp Panels ◽

Family Based ◽

Genomic Predictions

Abstract Background One of the most important goals for the rainbow trout aquaculture industry is to improve muscle yield and fillet quality. Previously, we showed that a 50K transcribed-SNP chip can be used to detect quantitative trait loci (QTL) associated with muscle yield and fillet firmness. In this study, data from 1,568 fish genotyped for the 50K transcribed-SNP chip and ~774 fish phenotyped for muscle yield and fillet firmness were used in a single-step genomic BLUP (ssGBLUP) model to compute the genomic estimated breeding values (GEBV). In addition, pedigree-based best linear unbiased prediction (PBLUP) was used to calculate traditional, family-based estimated breeding values (EBV). Results The genomic predictions outperformed the traditional EBV by 35% for muscle yield and 42% for fillet firmness. The predictive ability for muscle yield and fillet firmness was 0.19 - 0.20 with PBLUP, and 0.27 with ssGBLUP. Additionally, reducing SNP panel densities indicated that using 500 – 800 SNPs in genomic predictions still provides predictive abilities higher than PBLUP. Conclusion These results suggest that genomic evaluation is a feasible strategy to identify and select fish with superior genetic merit within rainbow trout families, even with low-density SNP panels.

Genomic predictions for fillet yield and firmness in rainbow trout using reduced-density SNP panels

BMC Genomics ◽

10.1186/s12864-021-07404-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Rafet Al-Tobasei ◽

Ali Ali ◽

Andre L. S. Garcia ◽

Daniela Lourenco ◽

Tim Leeds ◽

...

Keyword(s):

Rainbow Trout ◽

Predictive Ability ◽

Study Data ◽

Single Step ◽

Breeding Values ◽

Snp Chip ◽

Estimated Breeding Values ◽

Snp Panels ◽

Fillet Yield ◽

Genomic Predictions

Abstract Background One of the most important goals for the rainbow trout aquaculture industry is to improve fillet yield and fillet quality. Previously, we showed that a 50 K transcribed-SNP chip can be used to detect quantitative trait loci (QTL) associated with fillet yield and fillet firmness. In this study, data from 1568 fish genotyped for the 50 K transcribed-SNP chip and ~ 774 fish phenotyped for fillet yield and fillet firmness were used in a single-step genomic BLUP (ssGBLUP) model to compute the genomic estimated breeding values (GEBV). In addition, pedigree-based best linear unbiased prediction (PBLUP) was used to calculate traditional, family-based estimated breeding values (EBV). Results The genomic predictions outperformed the traditional EBV by 35% for fillet yield and 42% for fillet firmness. The predictive ability for fillet yield and fillet firmness was 0.19–0.20 with PBLUP, and 0.27 with ssGBLUP. Additionally, reducing SNP panel densities indicated that using 500–800 SNPs in genomic predictions still provides predictive abilities higher than PBLUP. Conclusion These results suggest that genomic evaluation is a feasible strategy to identify and select fish with superior genetic merit within rainbow trout families, even with low-density SNP panels.

Core-dependent changes in genomic predictions using the Algorithm for Proven and Young in single-step genomic best linear unbiased prediction

Journal of Animal Science ◽

10.1093/jas/skaa374 ◽

2020 ◽

Vol 98 (12) ◽

Author(s):

Ignacy Misztal ◽

Shogo Tsuruta ◽

Ivan Pocrnic ◽

Daniela Lourenco

Keyword(s):

Prediction Accuracy ◽

Best Linear Unbiased Prediction ◽

Single Step ◽

Relationship Matrix ◽

Linear Unbiased Prediction ◽

Breeding Values ◽

Best Linear Unbiased ◽

The Impact ◽

Genomic Predictions ◽

Unbiased Prediction

Abstract Single-step genomic best linear unbiased prediction with the Algorithm for Proven and Young (APY) is a popular method for large-scale genomic evaluations. With the APY algorithm, animals are designated as core or noncore, and the computing resources to create the inverse of the genomic relationship matrix (GRM) are reduced by inverting only a portion of that matrix for core animals. However, using different core sets of the same size causes fluctuations in genomic estimated breeding values (GEBVs) up to one additive standard deviation without affecting prediction accuracy. About 2% of the variation in the GRM is noise. In the recursion formula for APY, the error term modeling the noise is different for every set of core animals, creating changes in breeding values. While average changes are small, and correlations between breeding values estimated with different core animals are close to 1.0, based on the normal distribution theory, outliers can be several times bigger than the average. Tests included commercial datasets from beef and dairy cattle and from pigs. Beyond a certain number of core animals, the prediction accuracy did not improve, but fluctuations decreased with more animals. Fluctuations were much smaller than the possible changes based on prediction error variance. GEBVs change over time even for animals with no new data because genomic relationships tie all the genotyped animals together, causing reranking of top animals. In contrast, changes in nongenomic models without new data are small. Also, GEBVs can change due to details in the model, such as redefinition of contemporary groups or unknown parent groups. In particular, increasing the fraction of blending of the GRM with a pedigree relationship matrix from 5% to 20% caused changes in GEBV up to 0.45 SD, with a correlation of GEBV > 0.99.
Fluctuations in genomic predictions are part of genomic evaluation models and are also present without the APY algorithm when genomic evaluations are computed with updated data. The best approach to reduce the impact of fluctuations in genomic evaluations is to make selection decisions not on individual animals with limited individual accuracy but on groups of animals with high average accuracy.
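The core/noncore partition described in this abstract can be sketched in Python as follows. This is a minimal dense illustration, not the authors' software: it assumes the commonly stated APY form in which only the core block is inverted directly and each noncore animal contributes one diagonal term m_ii = g_ii - g_ic Gcc^-1 g_ci. The function name `apy_inverse`, the simulated genotypes, and the identity matrix used as a stand-in for the pedigree relationship matrix in the 5% blending step are all hypothetical.

```python
import numpy as np

def apy_inverse(g, core):
    """Sparse-structured APY inverse of G for a given list of core indices."""
    g = np.asarray(g, dtype=float)
    n = g.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    gcc_inv = np.linalg.inv(g[np.ix_(core, core)])  # only dense inversion
    gcn = g[np.ix_(core, noncore)]
    p = gcc_inv @ gcn                               # recursion coefficients
    # One error variance per noncore animal: g_ii - g_ic Gcc^-1 g_ci
    m = np.diag(g)[noncore] - np.einsum("ij,ij->j", gcn, p)
    ginv = np.zeros((n, n))
    ginv[np.ix_(core, core)] = gcc_inv + (p / m) @ p.T
    ginv[np.ix_(core, noncore)] = -p / m
    ginv[np.ix_(noncore, core)] = (-p / m).T
    ginv[noncore, noncore] = 1.0 / m                # diagonal noncore block
    return ginv

# Simulated example: 8 animals, 50 SNPs (values are illustrative only).
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(8, 50)).astype(float)
freq = geno.mean(axis=0) / 2.0
z = geno - 2.0 * freq
g = z @ z.T / (2.0 * np.sum(freq * (1.0 - freq)))
a22 = np.eye(8)                      # stand-in for pedigree relationships
g_blend = 0.95 * g + 0.05 * a22      # 5% blending keeps G invertible

inv_a = apy_inverse(g_blend, core=[0, 1, 2, 3])
inv_b = apy_inverse(g_blend, core=[4, 5, 6, 7])
print(np.abs(inv_a - inv_b).max())   # different core sets, different inverse
```

Changing the `core` argument changes the noncore error terms, which is the mechanism the abstract links to core-dependent fluctuations in breeding values.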

A Weighted Genomic Relationship Matrix Based on Fixation Index (FST) Prioritized SNPs for Genomic Selection

Genes ◽

10.3390/genes10110922 ◽

2019 ◽

Vol 10 (11) ◽

pp. 922

Author(s):

Ling-Yun Chang ◽

Sajjad Toghiani ◽

El Hamidi Hay ◽

Samuel E. Aggrey ◽

Romdhane Rekaya

Keyword(s):

Genomic Selection ◽

Statistical Power ◽

Fixation Index ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Nucleotide Polymorphisms ◽

Genomic Relationship ◽

Single Nucleotide ◽

Relative Contribution ◽

Estimation Of Variance

A dramatic increase in the density of marker panels has been expected to increase the accuracy of genomic selection (GS); unfortunately, little to no improvement has been observed. By including all variants in the association model, the dimensionality of the problem should be dramatically increased, and it could undoubtedly reduce the statistical power. Using all single nucleotide polymorphisms (SNPs) to compute the genomic relationship matrix (G) does not necessarily increase accuracy as the additive relationships can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. The fixation index (FST) as a measure of population differentiation has been used to identify genome segments and variants under selection pressure. Using prioritized variants has increased the accuracy of GS. Additionally, FST can be used to weight the relative contribution of prioritized SNPs in computing G. In this study, relative weights based on FST scores were developed and incorporated into the calculation of G, and their impact on the estimation of variance components and accuracy was assessed. The results showed that prioritizing SNPs based on their FST scores resulted in an increase in the genetic similarity between training and validation animals and improved the accuracy of GS by more than 5%.
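The idea of weighting SNP contributions to G can be sketched in Python as below. The abstract does not give the weighting formula, so this is only one plausible form and should be read as an assumption: per-SNP scores are rescaled to mean 1 and used as a diagonal weight matrix D in G = Z D Z' / (2 Σ w p (1 - p)). The function name `weighted_g`, the toy genotypes, and the example scores are hypothetical; computing FST itself is outside the sketch.

```python
import numpy as np

def weighted_g(genotypes, snp_scores):
    """Weighted VanRaden-type G with one non-negative score per SNP.

    Assumption: scores (for example FST values) are rescaled to mean 1
    and applied as diagonal weights; the paper's exact rule may differ.
    """
    m = np.asarray(genotypes, dtype=float)
    w = np.asarray(snp_scores, dtype=float)
    w = w / w.mean()                       # relative weights, mean 1
    p = m.mean(axis=0) / 2.0               # allele frequency per SNP
    z = m - 2.0 * p                        # centered genotypes
    scale = 2.0 * np.sum(w * p * (1.0 - p))
    return (z * w) @ z.T / scale           # Z D Z' / scale

# Toy data: 4 animals, 5 SNPs, with illustrative per-SNP scores.
toy = np.array([[0, 1, 2, 1, 0],
                [1, 1, 0, 2, 1],
                [2, 0, 1, 0, 2],
                [0, 2, 1, 1, 1]])
scores = np.array([0.02, 0.30, 0.05, 0.10, 0.01])
print(np.round(weighted_g(toy, scores), 3))
```

With equal scores for all SNPs the function reduces to the unweighted matrix, which makes it easy to compare the two versions on the same data.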

335 Genomic predictions with a multi-breed genomic relationship matrix

Journal of Animal Science ◽

10.1093/jas/skz258.099 ◽

2019 ◽

Vol 97 (Supplement_3) ◽

pp. 49-50

Author(s):

Yvette Steyn ◽

Daniela Lourenco ◽

Ignacy Misztal

Keyword(s):

Prediction Accuracy ◽

Negative Impact ◽

Reference Population ◽

Single Step ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Effective Population ◽

Specific Allele ◽

Missing Genotypes

Abstract Multi-breed evaluations have the advantage of increasing the size of the reference population for genomic evaluations and are quite simple; however, combining breeds usually has a negative impact on prediction accuracy. The aim of this study was to evaluate the use of a multi-breed genomic relationship matrix (G), where SNP for each breed are non-shared. The multi-breed G is set assuming known genotypes for one breed and missing genotypes for the remaining breeds. This setup may avoid spurious IBS relationships between breeds and considers breed-specific allele frequencies. This scenario was contrasted to multi-breed evaluations where all SNP are shared, i.e., the same SNP, and to single-breed evaluations. Different SNP densities, namely 9k and 45k, and different effective population sizes (Ne) were tested. Five breeds mimicking recent beef cattle populations that diverged from the same historical population were simulated using different selection criteria. It was assumed that QTL effects were the same over all breeds. For the recent population, generations 1 to 9 had approximately half of the animals genotyped, whereas all 1200 animals were genotyped in generation 10. Genotyped animals in generation 10 were set as validation; therefore, each breed had a validation set. Analyses were performed using single-step GBLUP (ssGBLUP). Prediction accuracy was calculated as correlation between true (T) and genomic estimated (GE) BV. Accuracies of GEBV were lower for the larger Ne and low SNP density. All three scenarios using 45K resulted in similar accuracies, suggesting that the marker density is high enough to account for relationships and linkage disequilibrium with QTL. A shared multi-breed evaluation using 9K resulted in a decrease of accuracy of 0.08 for a smaller Ne and 0.11 for a larger Ne. This loss was mostly avoided when markers were treated as non-shared within the same genomic relationship matrix.
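The non-shared SNP setup described in this abstract can be illustrated with a small Python sketch. The abstract does not specify how missing genotypes are filled or how G is scaled, so the following choices are assumptions: each breed gets its own copy of the SNP columns, genotypes are centered with breed-specific allele frequencies, the columns belonging to other breeds are left at zero (the centered mean), and the cross-product is scaled by the average of the breed-specific 2 Σ p(1 - p) terms. The function name `multibreed_g` and the toy data are hypothetical.

```python
import numpy as np

def multibreed_g(genotypes, breed):
    """G with breed-specific (non-shared) SNP columns.

    Assumptions: other breeds' columns are set to zero after centering,
    and the scale is the mean of the breed-specific 2*sum(p*(1-p)).
    """
    m = np.asarray(genotypes, dtype=float)
    breed = np.asarray(breed)
    labels = np.unique(breed)
    n, k = m.shape
    z = np.zeros((n, k * len(labels)))     # one column block per breed
    scale = 0.0
    for b, label in enumerate(labels):
        rows = breed == label
        p = m[rows].mean(axis=0) / 2.0     # breed-specific frequencies
        z[rows, b * k:(b + 1) * k] = m[rows] - 2.0 * p
        scale += 2.0 * np.sum(p * (1.0 - p))
    return z @ z.T / (scale / len(labels))

# Toy data: two breeds, two animals each, three SNPs.
toy = np.array([[0, 1, 2],
                [2, 1, 0],
                [1, 1, 2],
                [1, 2, 0]])
breeds = np.array(["A", "A", "B", "B"])
print(np.round(multibreed_g(toy, breeds), 3))
```

Under these assumptions the between-breed blocks of G are exactly zero, so in a single-step analysis animals of different breeds would be connected only through the pedigree part of the model.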

Efficient computation of the genomic relationship matrix and other matrices used in single-step evaluation

Journal of Animal Breeding and Genetics ◽

10.1111/j.1439-0388.2010.00912.x ◽

2011 ◽

Vol 128 (6) ◽

pp. 422-428 ◽

Cited By ~ 89

Author(s):

I. Aguilar ◽

I. Misztal ◽

A. Legarra ◽

S. Tsuruta

Keyword(s):

Single Step ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Efficient Computation ◽

Genomic Relationship

Utility of Climatic Information via Combining Ability Models to Improve Genomic Prediction for Yield Within the Genomes to Fields Maize Project

Frontiers in Genetics ◽

10.3389/fgene.2020.592769 ◽

2021 ◽

Vol 11 ◽

Author(s):

Diego Jarquin ◽

Natalia de Leon ◽

Cinta Romay ◽

Martin Bohn ◽

Edward S. Buckler ◽

...

Keyword(s):

Genomic Prediction ◽

Combining Ability ◽

Prediction Models ◽

Predictive Ability ◽

Weather Data ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Environment Interaction ◽

Environmental Covariates ◽

Genotype By Environment

Genomic prediction provides an efficient alternative to conventional phenotypic selection for developing improved cultivars with desirable characteristics. New and improved methods for genomic prediction are continually being developed that attempt to deal with the integration of data types beyond genomic information. Modern automated weather systems offer the opportunity to capture continuous data on a range of environmental parameters at specific field locations. In principle, this information could characterize training and target environments and enhance predictive ability by incorporating weather characteristics as part of the genotype-by-environment (G×E) interaction component in prediction models. We assessed the usefulness of including weather data variables in genomic prediction models using a naïve environmental kinship model across 30 environments comprising the Genomes to Fields (G2F) initiative in 2014 and 2015. Specifically, four different prediction scenarios were evaluated: (i) tested genotypes in observed environments; (ii) untested genotypes in observed environments; (iii) tested genotypes in unobserved environments; and (iv) untested genotypes in unobserved environments. A set of 1,481 unique hybrids was evaluated for grain yield. Evaluations were conducted using five different models including main effect of environments; general combining ability (GCA) effects of the maternal and paternal parents modeled using the genomic relationship matrix; specific combining ability (SCA) effects between maternal and paternal parents; interactions between genetic (GCA and SCA) effects and environmental effects; and finally interactions between the genetic effects and environmental covariates. Incorporation of the genotype-by-environment interaction term improved predictive ability across all scenarios. However, predictive ability was not improved through inclusion of naïve environmental covariates in G×E models.
More research should be conducted to link the observed weather conditions with important physiological aspects in plant development to improve predictive ability through the inclusion of weather data.