334 Investigating core-dependent changes in predictions using the algorithm for proven and young in ssGBLUP

2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 50-50
Author(s):  
Daniela Lourenco ◽  
Shogo Tsuruta ◽  
Ivan Pocrnic ◽  
Ignacy Misztal

Abstract Large-scale single-step GBLUP (ssGBLUP) evaluations rely on techniques to approximate or avoid the inversion of the genomic relationship matrix (G). The algorithm for proven and young (APY) was developed to create the inverse of G without explicit inversion, and relies on the clustering of genotyped animals into two groups, namely core and non-core. Although the correlation between GEBV from regular ssGBLUP and APY ssGBLUP is greater than 0.99 when the appropriate number of core animals is used, reranking is still observed when different core groups are used. We investigated which animals are more susceptible to reranking and how the changes in GEBV can be minimized. Datasets from beef cattle, dairy cattle, and pigs were used. The beef cattle data comprised phenotypes on 3 growth traits for up to 6.8M animals, pedigree for 8.2M animals, and genotypes for 66k animals. The dairy cattle data comprised 9M phenotypes for udder depth, 10M animals in the pedigree, and 570k genotyped animals. The pig dataset had up to 770k phenotypes recorded on 4 traits, pedigree for 2.6M animals, and genotypes for 54k animals. Investigations included using several different core groups, increasing the number of core animals beyond the optimal number obtained by the eigenvalue decomposition, and comparisons with GEBV from ssGBLUP with direct inversion (except for dairy). Additionally, observed changes were compared with possible changes based on the SE of GEBV. In all datasets, larger changes in GEBV from using different core groups were observed for animals with lower accuracy. The observed changes relative to standard deviations of GEBV were, on average, 5% and ranged from 0 to 30%. Increasing the number of core animals beyond the optimal value helped to asymptotically reduce changes in GEBV. Although core-dependent changes in GEBV exist, they are small and can be reduced with larger core groups.
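As a rough illustration of the core/non-core construction described in this abstract, the NumPy sketch below builds an APY-style inverse of G. It assumes the commonly cited block form, in which only the core block of G is inverted directly and each non-core animal contributes a single diagonal term. The function name `apy_inverse` and the toy genotypes are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def apy_inverse(G, core):
    """APY-style inverse of G for a chosen set of core indices (sketch)."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    # Only the core block is inverted explicitly.
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn
    # One diagonal term per non-core animal: g_ii - g_ic Gcc^-1 g_ci.
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P)
    m_inv = 1.0 / m
    out = np.zeros((n, n))
    out[np.ix_(core, core)] = Gcc_inv + (P * m_inv) @ P.T
    out[np.ix_(core, noncore)] = -P * m_inv
    out[np.ix_(noncore, core)] = (-P * m_inv).T
    out[noncore, noncore] = m_inv
    return out

# Toy example (assumed data): 12 animals, 50 markers, small ridge on the diagonal.
rng = np.random.default_rng(0)
Z = rng.integers(0, 3, size=(12, 50)).astype(float)
Z -= Z.mean(axis=0)
G = Z @ Z.T / 50 + 0.05 * np.eye(12)
G_inv_a = apy_inverse(G, core=[0, 1, 2, 3, 4, 5, 6, 7])
G_inv_b = apy_inverse(G, core=[4, 5, 6, 7, 8, 9, 10, 11])
```

In this sketch, `G_inv_a` and `G_inv_b` generally differ because the core sets differ, which is the kind of core dependence the abstract examines.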

2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 6-7
Author(s):  
Andre Garcia ◽  
Ignacio Aguilar ◽  
Andres Legarra ◽  
Stephen P Miller ◽  
Shogo Tsuruta ◽  
...  

Abstract With an ever-increasing number of genotyped animals, there is a question of whether to include all genotypes in single-step GBLUP (ssGBLUP) evaluations or to include only genotyped animals with phenotypes and use indirect predictions (IP) for the remaining young genotyped animals. Under ssGBLUP, SNP effects can be backsolved from GEBV, and IP can be calculated as the sum of SNP effects weighted by the gene content. To publish IP, a measure of accuracy that reflects the standard error of prediction, and that is comparable to GEBV accuracy, is needed. Our first objective was to test formulas to compute accuracy of IP by backsolving prediction error covariance (PEC) of GEBV into PEC of SNP effects. The second objective was to investigate the number of genotyped animals needed to obtain robust IP accuracy. Data were provided by the American Angus Association, with 38,000 post-weaning gain phenotypes and 60,000 genotyped animals. Correlations between GEBV and IP were ≥0.99. When all genotyped animals were used for PEC computations, accuracy correlations were also ≥0.99. Additionally, GEBV and IP accuracies were compatible, whether G inverse was obtained by direct inversion of the genomic relationship matrix (G) or with the algorithm for proven and young (APY). As the number of genotyped animals in PEC computations decreased to 15,000, accuracy correlations were still high (≥0.96), but IP accuracies were biased downwards. Indirect prediction accuracy can be successfully obtained from ssGBLUP without running an extra SNP-BLUP evaluation to compute SNP PEC. It is possible to reduce the number of genotyped animals in PEC computations, but accuracies may be slightly underestimated. When the amount of genomic and phenotypic data is large, the polygenic part of GEBV becomes small and IP can be very accurate. Further research is needed to approximate SNP PEC with a large number of genotyped animals.
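The backsolving step mentioned in this abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not stated in the abstract: G is taken to be Z Z' / k with equal SNP weights, where Z holds centered gene content and k = 2 Σ p(1 − p), and there is no blending or polygenic component. The function names are hypothetical.

```python
import numpy as np

def backsolve_snp_effects(Z, G_inv, gebv, k):
    """SNP effects from GEBV, assuming G = Z Z' / k (equal SNP weights)."""
    return Z.T @ (G_inv @ gebv) / k

def indirect_prediction(Z_new, snp_effects):
    """IP as the sum of SNP effects weighted by centered gene content."""
    return Z_new @ snp_effects

# Toy example (assumed data): 8 genotyped animals, 60 SNP, allele frequency 0.5.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(8, 60)).astype(float)   # gene content coded 0/1/2
p = np.full(60, 0.5)
Z = M - 2 * p
k = 2 * np.sum(p * (1 - p))
G = Z @ Z.T / k
gebv = rng.normal(size=8)                             # stand-in for GEBV
snp_hat = backsolve_snp_effects(Z, np.linalg.inv(G), gebv, k)
ip_young = indirect_prediction(Z[:3], snp_hat)        # reuse rows as "young" animals
```

Under these assumptions, applying the backsolved effects to the same animals reproduces their GEBV, which is consistent with the high GEBV-IP correlations reported above. With a polygenic part in the model, the match would no longer be exact.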


2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Vinzent Boerner ◽  
David J. Johnston

Abstract Multi-trait single step genetic evaluation is increasingly facing the situation of having more individuals with genotypes than markers within each genotype. This creates a situation where the genomic relationship matrix ($$\mathbf{G}$$) is not of full rank and its inversion is algebraically impossible. Recently, the SS-T-BLUP method was proposed as a modified version of the single step equations, providing an elegant way to circumvent the inversion of $$\mathbf{G}$$ and therefore accommodate the situation described. SS-T-BLUP uses the Woodbury matrix identity, thus it requires an add-on matrix, which is usually the covariance matrix of the residual polygenic effect. In this paper, we examine the application of SS-T-BLUP to a large-scale multi-trait Australian Angus beef cattle dataset using the full BREEDPLAN single step genetic evaluation model and compare the results to the application of two different methods of using $$\mathbf{G}$$ in a single step model. Results clearly show that SS-T-BLUP outperforms other single step formulations in terms of computational speed and avoids approximation of the inverse of $$\mathbf{G}$$.
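The Woodbury matrix identity that this abstract refers to can be checked numerically with a short sketch. This is only the generic identity, (A + U C U')⁻¹ = A⁻¹ − A⁻¹ U (C⁻¹ + U' A⁻¹ U)⁻¹ U' A⁻¹, and not the SS-T-BLUP equations themselves. Here A stands in for an invertible add-on matrix, U for a marker matrix with fewer columns than rows, and all names and data are hypothetical.

```python
import numpy as np

def woodbury_inverse(A_inv, U, C_inv):
    """Inverse of (A + U C U') from A^-1 and C^-1 via the Woodbury identity."""
    # The only system solved here has the (small) column dimension of U.
    inner = C_inv + U.T @ A_inv @ U
    return A_inv - A_inv @ U @ np.linalg.solve(inner, U.T @ A_inv)

# Toy example (assumed data): 20 individuals, 5 markers, so U C U' is rank deficient.
rng = np.random.default_rng(2)
n, m = 20, 5
U = rng.normal(size=(n, m))
A = np.diag(rng.uniform(0.5, 1.5, size=n))    # stand-in for the add-on matrix
C = np.eye(m) / m
fast = woodbury_inverse(np.linalg.inv(A), U, np.linalg.inv(C))
direct = np.linalg.inv(A + U @ C @ U.T)
max_abs_diff = np.max(np.abs(fast - direct))
```

The point of the identity in this setting is that the rank-deficient term U C U' is never inverted on its own; only A and a matrix of marker dimension are.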


2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Øyvind Nordbø ◽  
Arne B. Gjuvsland ◽  
Leiv Sigbjørn Eikje ◽  
Theo Meuwissen

Abstract Background: The main aim of single-step genomic predictions was to facilitate optimal selection in populations consisting of both genotyped and non-genotyped individuals. However, in spite of intensive research, biases still occur, which make it difficult to perform optimal selection across groups of animals. The objective of this study was to investigate whether incomplete genotype datasets with errors could be a potential source of level-bias between genotyped and non-genotyped animals and between animals genotyped on different single nucleotide polymorphism (SNP) panels in single-step genomic predictions. Results: Incomplete and erroneous genotypes of young animals caused biases in breeding values between groups of animals. Systematic noise or missing data for less than 1% of the SNPs in the genotype data had substantial effects on the differences in breeding values between genotyped and non-genotyped animals, and between animals genotyped on different chips. The breeding values of young genotyped individuals were biased upward, and the magnitude was up to 0.8 genetic standard deviations, compared with breeding values of non-genotyped individuals. Similarly, the magnitude of a small value added to the diagonal of the genomic relationship matrix affected the level of average breeding values between groups of genotyped and non-genotyped animals. Cross-validation accuracies and regression coefficients were not sensitive to these factors. Conclusions: Because, historically, different SNP chips have been used for genotyping different parts of a population, fine-tuning of imputation within and across SNP chips and handling of missing genotypes are crucial for reducing bias. Although all the SNPs used for estimating breeding values are present on the chip used for genotyping young animals, incompleteness and some genotype errors might lead to level-biases in breeding values.


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 49-50
Author(s):  
Yvette Steyn ◽  
Daniela Lourenco ◽  
Ignacy Misztal

Abstract Multi-breed evaluations have the advantage of increasing the size of the reference population for genomic evaluations and are quite simple; however, combining breeds usually has a negative impact on prediction accuracy. The aim of this study was to evaluate the use of a multi-breed genomic relationship matrix (G), where SNP for each breed are non-shared. The multi-breed G is set assuming known genotypes for one breed and missing genotypes for the remaining breeds. This setup may avoid spurious IBS relationships between breeds and considers breed-specific allele frequencies. This scenario was contrasted with multi-breed evaluations where all SNP are shared, i.e., the same SNP, and with single-breed evaluations. Different SNP densities, namely 9k and 45k, and different effective population sizes (Ne) were tested. Five breeds mimicking recent beef cattle populations that diverged from the same historical population were simulated using different selection criteria. It was assumed that QTL effects were the same over all breeds. For the recent population, generations 1 to 9 had approximately half of the animals genotyped, whereas all 1200 animals were genotyped in generation 10. Genotyped animals in generation 10 were set as validation; therefore, each breed had a validation set. Analyses were performed using single-step GBLUP (ssGBLUP). Prediction accuracy was calculated as the correlation between true (T) and genomic estimated (GE) BV. Accuracies of GEBV were lower for the larger Ne and low SNP density. All three scenarios using 45k resulted in similar accuracies, suggesting that the marker density is high enough to account for relationships and linkage disequilibrium with QTL. A shared multi-breed evaluation using 9k resulted in a decrease in accuracy of 0.08 for a smaller Ne and 0.11 for a larger Ne. This loss was mostly avoided when markers were treated as non-shared within the same genomic relationship matrix.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
N. Khalilisamani ◽  
P. C. Thomson ◽  
H. W. Raadsma ◽  
M. S. Khatkar

Abstract Genotypic errors, i.e., conflicts between the recorded genotype and the true genotype, can lead to false or biased population genetic parameters. Here, the effect of genotypic errors on the accuracy of genomic predictions and on the genomic relationship matrix is investigated using a simulation study based on population and genomic structure comparable to black tiger prawn, Penaeus monodon. Fifty full-sib families across five generations with phenotypic and genotypic information on 53 K SNPs were simulated. Ten replicates of different scenarios with three heritability estimates, equal and unequal family contributions were generated. Within each scenario, four SNP densities and three genotypic error rates in each SNP density were implemented. Results showed that family contribution did not have a substantial impact on accuracy of predictions across different datasets. In the absence of genotypic errors, 3 K SNP density was found to be efficient in estimating the accuracy, whilst increasing the SNP density from 3 to 20 K resulted in a marginal increase in accuracy of genomic predictions using the current population and genomic parameters. In addition, results showed that the presence of even 10% errors in a 10 and 20 K SNP panel might not have a severe impact on accuracy of predictions. However, below 10 K marker density, even a 5% error can result in lower accuracy of predictions.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 20-20
Author(s):  
Sungbong Jang ◽  
Shogo Tsuruta ◽  
Natalia Leite ◽  
Ignacy Misztal ◽  
Daniela Lourenco

Abstract The ability to identify true-positive variants increases as more genotyped animals are available. Although thousands of animals can be genotyped, the dimensionality of the genomic information is limited. Therefore, there is a certain number of animals that represent all chromosome segments (Me) segregating in the population. The number of Me can be approximated from the eigenvalue decomposition of the genomic relationship matrix (G). Thus, the limited dimensionality may help to identify the number of animals to be used in genome-wide association (GWA). The first objective of this study was to examine different discovery set sizes for GWA, with set sizes based on the number of largest eigenvalues explaining a certain proportion of variance in G. Additionally, we investigated the impact of incorporating variants selected from different set sizes into the regular SNP chip used for genomic prediction. Sequence data were simulated that contained 500k SNP and 2k QTL, where the genetic variance was fully explained by QTL. The GWA was conducted using the number of genotyped animals equal to the number of largest eigenvalues of G (EIG) explaining 50, 60, 70, 80, 90, 95, 98, and 99 percent of the variance in G. Significant SNP had a p-value lower than 0.05 with Bonferroni correction. Further, SNP with the largest effect size (top 10, 100, 500, 1k, 2k, and 4k) were also selected to be incorporated into the 50k regular chip. Genomic predictions using the 50k combined with selected SNP were conducted using single-step GBLUP (ssGBLUP). Using the number of animals corresponding to at least EIG98 enabled the identification of the largest effect size QTL. The greatest accuracy of prediction was obtained when the top 2k SNP were combined with the 50k chip. The dimensionality of genomic information should be taken into account for variant selection in GWAS.
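The set-size rule described in this abstract (the number of largest eigenvalues of G explaining a given proportion of its variance) can be sketched as below. The function name `n_eigen_for_variance` and the toy matrix are hypothetical; the abstract does not say how the authors computed these counts.

```python
import numpy as np

def n_eigen_for_variance(G, proportion):
    """Number of largest eigenvalues of G explaining `proportion` of its variance."""
    eig = np.linalg.eigvalsh(G)[::-1]      # eigenvalues, largest first
    eig = np.clip(eig, 0.0, None)          # guard against tiny negative values
    cum = np.cumsum(eig) / eig.sum()
    # First position where the cumulative share reaches the target.
    count = int(np.searchsorted(cum, proportion)) + 1
    return min(count, len(eig))

# Toy example (assumed data): a 4 x 4 matrix with eigenvalues 5, 3, 1, 1.
G_toy = np.diag([5.0, 3.0, 1.0, 1.0])
sizes = {pct: n_eigen_for_variance(G_toy, pct / 100) for pct in (40, 75, 85, 95)}
```

For a real G, the same call with `proportion=0.98` would give a count analogous to what the abstract calls EIG98.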


Genes ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 790 ◽  
Author(s):  
Daniela Lourenco ◽  
Andres Legarra ◽  
Shogo Tsuruta ◽  
Yutaka Masuda ◽  
Ignacio Aguilar ◽  
...  

Single-step genomic evaluation has become a standard procedure in livestock breeding, and the main reason is the ability to combine all pedigree, phenotypes, and genotypes available into one single evaluation, without the need for post-analysis processing. Therefore, the incorporation of data on genotyped and non-genotyped animals in this method is straightforward. Since 2009, two main implementations of single-step have been proposed. One is called single-step genomic best linear unbiased prediction (ssGBLUP) and uses single nucleotide polymorphisms (SNP) to construct the genomic relationship matrix; the other is the single-step Bayesian regression (ssBR), which is a marker effect model. Under the same assumptions, both models are equivalent. In this review, we focus solely on ssGBLUP. The implementation of ssGBLUP into the BLUPF90 software suite was done in 2009, and since then, several changes have been made to make ssGBLUP flexible to any model, number of traits, number of phenotypes, and number of genotyped animals. Single-step GBLUP from the BLUPF90 software suite has been used for genomic evaluations worldwide. In this review, we will show theoretical developments and numerical examples of ssGBLUP using SNP data from regular chips to sequence data.


2019 ◽  
Vol 97 (8) ◽  
pp. 3237-3245
Author(s):  
Amanda M Maiorano ◽  
Alula Assen ◽  
Piter Bijma ◽  
Ching-Yi Chen ◽  
Josineudson Augusto II Vasconcelos Silva ◽  
...  

Abstract Pooling semen of multiple boars is commonly used in swine production systems. Compared with single boar systems, this technique changes family structure, creating maternal half-sib families. The aim of this simulation study was to investigate how pooling semen affects the accuracy of estimating direct and maternal effects for individual piglet birth weight in purebred pigs. Different scenarios of pooling semen were simulated by allowing the same female to mate with 1 to 6 boars per insemination, whereas litter size was kept constant (N = 12). In each pooled boar scenario, genomic information was used to construct either the genomic relationship matrix (G) or to reconstruct pedigree in addition to G. Genotypes were generated for 60,000 SNPs evenly distributed across 18 autosomes. From the 5 simulated generations, only animals from generations 3 to 5 were genotyped (N = 36,000). Direct and maternal true breeding values (TBV) were computed as the sum of the effects of the 1,080 QTLs. Phenotypes were constructed as the sum of direct TBV, maternal TBV, an overall mean of 1.25 kg, and a residual effect. The simulated heritabilities for direct and maternal effects were 0.056 and 0.19, respectively, and the genetic correlation between both effects was −0.25. All simulations were replicated 5 times. Variance components and direct and maternal heritability were estimated using average information REML. Predictions were computed via pedigree-based BLUP and single-step genomic BLUP (ssGBLUP). Genotyped littermates in the last generation were used for validation. Prediction accuracies were calculated as correlations between EBV and TBV for direct (accdirect) and maternal (accmat) effects. When boars were known, accdirect were 0.21 (1 boar) and 0.26 (6 boars) for BLUP, whereas for ssGBLUP, they were 0.38 (1 boar) and 0.43 (6 boars). When boars were unknown, accdirect was lower in BLUP but similar in ssGBLUP. For the scenario with known boars, accmat was 0.58 and 0.63 for 1 and 6 boars, respectively, under ssGBLUP. For unknown boars, accmat was 0.63 for 2 boars and 0.62 for 6 boars in ssGBLUP. In general, accdirect and accmat were lower in the single-boar scenario compared with pooled semen scenarios, indicating that a half-sib structure is more adequate to estimate direct and maternal effects. Using pooled semen from multiple boars can help us to improve accuracy of predicting maternal and direct effects when maternal half-sib families are larger than 2.

