63 A comparison of clustering methods for cross-validation of genomic predictors when training on phenotypes or deregressed Estimated Breeding Values

2019 ◽  
Vol 97 (Supplement_2) ◽  
pp. 34-35
Author(s):  
Johnna Baller ◽  
Jeremy T Howard ◽  
Stephen Kachman ◽  
Matthew L Spangler

Abstract The objective of the study was to evaluate the impact of clustering methods for cross-validation on the accuracy of prediction of molecular breeding values (MBV) in Red Angus cattle (n = 9,763) and in simulation. Individuals were clustered using seven methods [k-means, k-medoids, principal component analysis on the numerator relationship matrix (A) and identical-by-state genomic matrix (G) as data and covariance matrices, and random] and two response variables [deregressed Estimated Breeding Values (DEBV) and adjusted phenotypes]. Genotypes were imputed to a 50K reference panel. Using cross-validation and a Bayes C model, MBV were estimated for traits including birth weight (BWT), marbling (MARB), rib-eye area (REA), and yearling weight (YWT) for DEBV and BWT, YWT, and ultrasonically measured intramuscular fat percentage and rib-eye area for adjusted phenotypes. A bivariate animal model was used to estimate prediction accuracies calculated using the genetic correlation between estimated MBV and the associated response variable. To quantify the difference between true and estimated accuracies, a simulation mimicking a cattle population was replicated five times. The same clustering methods were used as with the Red Angus data with the addition of forward validation and two genotyping methods (random selection and selection of the top 25% of animals). Predicted accuracies were estimated similarly and true accuracies were estimated using the residual correlation of a bivariate model using MBV and true breeding values (TBV). The Rand index was used to quantify the similarity between clustering methods, showing relationship-based clusters were clearly different from random clusters. In simulation, random genotyping led to higher estimated accuracies than selection of top individuals; however, estimated accuracies overpredicted true accuracies with random genotyping but underpredicted true accuracies with the selection of top individuals. When forward validation was evaluated within simulation, results suggested DEBV led to less biased estimates of MBV accuracy.
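
The Rand index mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `rand_index` and the toy label vectors are hypothetical, and the abstract does not say whether the plain or the adjusted Rand index was used, so the plain (unadjusted) form is assumed here. It counts the fraction of animal pairs that two clusterings treat the same way (both together or both apart).

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain (unadjusted) Rand index between two cluster assignments.

    labels_a and labels_b are equal-length sequences giving the cluster
    label of each animal under two clustering methods (hypothetical inputs).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two animals to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]  # pair together under method A?
        same_b = labels_b[i] == labels_b[j]  # pair together under method B?
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total


# Identical partitions with relabelled clusters agree on every pair.
identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Two partitions that cut across each other agree on only 2 of 6 pairs.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value near 1 means two clustering methods assign animals to folds in nearly the same way; lower values indicate that the methods produce different fold structures.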

1987 ◽  
Vol 67 (1) ◽  
pp. 201-204
Author(s):  
R. A. KEMP ◽  
J. W. WILTON

A numerator relationship matrix (Ac) due to sires and dams was compared with a numerator relationship matrix (Ai) due to sires and maternal grandsires in a multiple-trait-reduced animal model (MT-RAM). Best linear unbiased predictors of estimated breeding values (EBV) for 200-d weight (WW) and postweaning gain (PG) (gain from 200 to 365 d of age) were estimated from data simulating a beef cattle population. As expected, mean EBV and bias (EBV-BV) for both traits were not significantly affected by different relationship matrices. The mean variances of EBV with Ac were larger than those with Ai for both traits. The mean EBV variances were closer to mean BV variances with Ac compared to Ai, which is consistent with increased precision of EBV. Product-moment correlations of EBV and BV (accuracy of prediction) were not equal (P < 0.01) for Ac compared to Ai with WW or PG. The EBV using Ac were more accurate than EBV using Ai. The increased precision and accuracy of EBV from a MT-RAM with Ac would result in greater genetic progress in the population. Key words: Relationship matrices, estimated breeding values, MT-RAM
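
For readers unfamiliar with the numerator relationship matrix, the sire-and-dam form can be built with the standard tabular method, sketched below. This is an illustrative sketch only, not the procedure used in the study: the function name, the pedigree encoding (a list of `(sire, dam)` index pairs with `None` for unknown parents, parents listed before offspring), and the toy pedigree are all assumptions. The sire and maternal grandsire variant compared in the abstract is not shown.

```python
def numerator_relationship_matrix(pedigree):
    """Build A by the tabular method from sire and dam information.

    pedigree[i] = (sire, dam), where sire and dam are indices of animals
    that appear earlier in the list, or None when a parent is unknown.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        for parent in (sire, dam):
            if parent is not None and parent >= i:
                raise ValueError("parents must be listed before offspring")

        # Diagonal: 1 + inbreeding, where inbreeding is half the
        # relationship between the two parents (0 if either is unknown).
        if sire is not None and dam is not None:
            A[i][i] = 1.0 + 0.5 * A[sire][dam]
        else:
            A[i][i] = 1.0

        # Off-diagonal: average of the relationships of animal j
        # with the known parents of animal i.
        for j in range(i):
            a_ij = 0.0
            if sire is not None:
                a_ij += 0.5 * A[j][sire]
            if dam is not None:
                a_ij += 0.5 * A[j][dam]
            A[i][j] = a_ij
            A[j][i] = a_ij
    return A


# Toy pedigree: two unrelated founders (0, 1), two full sibs (2, 3),
# and one offspring of the full-sib mating (4).
toy_pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A_toy = numerator_relationship_matrix(toy_pedigree)
```

In the toy pedigree the full sibs have a relationship of 0.5, and their offspring has a diagonal element of 1.25, that is, an inbreeding coefficient of 0.25.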


2021 ◽  
Vol 12 ◽  
Author(s):  
Mohammad Ali Nilforooshan ◽  
Dorian Garrick

Reduced models are equivalent models to the full model that enable reduction in the computational demand for solving the problem, here, mixed model equations for estimating breeding values of selection candidates. Since phenotyped animals provide data to the model, the aim of this study was to reduce animal models to those equations corresponding to phenotyped animals. Non-phenotyped ancestral animals have normally been included in analyses as they facilitate formation of the inverse numerator relationship matrix. However, a reduced model can exclude those animals and obtain identical solutions for the breeding values of the animals of interest. Solutions corresponding to non-phenotyped animals can be back-solved from the solutions of phenotyped animals and specific blocks of the inverted relationship matrix. This idea was extended to other forms of animal model and the results from each reduced model (and back-solving) were identical to the results from the corresponding full model. Previous studies have been mainly focused on reduced animal models that absorb equations corresponding to non-parents and solve equations only for parents of phenotyped animals. These two types of reduced animal model can be combined to formulate only equations corresponding to phenotyped parents of phenotyped progeny.


Heredity ◽  
2020 ◽  
Vol 126 (1) ◽  
pp. 206-217
Author(s):  
Xiang Ma ◽  
Ole F. Christensen ◽  
Hongding Gao ◽  
Ruihua Huang ◽  
Bjarne Nielsen ◽  
...  

Abstract Records on groups of individuals could be valuable for predicting breeding values when a trait is difficult or costly to measure on single individuals, such as feed intake and egg production. Adding genomic information has been shown to improve the accuracy of genetic evaluation of quantitative traits with individual records. Here, we investigated the value of genomic information for traits with group records. We also investigated the improvement in accuracy of genetic evaluation for group-recorded traits when including information on a correlated trait with individual records. The study was based on a simulated pig population, including three scenarios of group structure and size. The results showed that both the genomic information and a correlated trait increased the accuracy of estimated breeding values (EBVs) for traits with group records. The accuracies of EBV obtained from group records with a size of 24 were much lower than those with a size of 12. Random assignment of animals to pens led to lower accuracy due to the weaker relationship between individuals within each group. These results suggest that group records are valuable for genetic evaluation of a trait that is difficult to record on individuals, and that the accuracy of genetic evaluation can be considerably increased using genomic information. Moreover, the genetic evaluation for a trait with group records can be greatly improved using a bivariate model, including correlated traits that are recorded individually. For efficient use of group records in genetic evaluation, relatively small group size and close relationships between individuals within one group are recommended.


2019 ◽  
Vol 97 (Supplement_2) ◽  
pp. 37-39
Author(s):  
Andrea Plotzki Reis ◽  
Rodrigo Fagundes da Costa ◽  
Fabyano Fonseca e Silva ◽  
Fernando Flores Cardoso ◽  
Matthew L Spangler

Abstract The aim of this study was to investigate selective phenotyping to maintain adequate prediction accuracy. A simulation was conducted, with 10 replicates, using QMSim to mimic the structure and size of a Braford population. A population with 50 generations, 500 animals per generation, was created with phenotyping and genotyping beginning in generation 11. The scenarios investigated were: 1) Randomly phenotype and genotype 10, 25, 50, 75, and 100% of individuals each generation; and 2) Randomly phenotype and genotype 10, 25, 50, 75, and 100% of individuals in every other generation. Estimated breeding values (EBV) were obtained using single-step GBLUP and accuracy was determined as the correlation between true BV from simulation and those estimated from the blupf90 family of programs. For scenarios where phenotyping and genotyping occurred every generation, EBV accuracies in generation 11 and 50 ranged from 0.32 to 0.32, 0.42 to 0.43, 0.49 to 0.51, 0.53 to 0.56 and 0.57 to 0.59 when 10, 25, 50, 75, and 100% of animals were chosen, respectively. The highest accuracies were 0.40 and 0.50 in generation 38 for scenarios 10 and 25%; 0.56, 0.61 and 0.64 in generation 40 for scenarios 50, 75 and 100%, respectively. When animals were selected every other generation, EBV accuracy in generation 11 and 50 ranged from 0.24 to 0.26, 0.36 to 0.36, 0.43 to 0.42, 0.48 to 0.44 and 0.53 to 0.48 for 10, 25, 50, 75 and 100% of selected animals, respectively. The highest accuracies were in generation 23 for scenario 10% (0.31), in generation 37 for scenarios 25 (0.43), 50 (0.50) and 75% (0.55) and in generation 39 for 100% (0.59). Although increasing the density of phenotyped and genotyped animals increased prediction accuracy, some gains were marginal. These differences in accuracy must be contemplated in an economic framework to determine the cost-benefit of additional information.


2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Øyvind Nordbø ◽  
Arne B. Gjuvsland ◽  
Leiv Sigbjørn Eikje ◽  
Theo Meuwissen

Abstract Background The main aim of single-step genomic predictions was to facilitate optimal selection in populations consisting of both genotyped and non-genotyped individuals. However, in spite of intensive research, biases still occur, which make it difficult to perform optimal selection across groups of animals. The objective of this study was to investigate whether incomplete genotype datasets with errors could be a potential source of level-bias between genotyped and non-genotyped animals and between animals genotyped on different single nucleotide polymorphism (SNP) panels in single-step genomic predictions. Results Incomplete and erroneous genotypes of young animals caused biases in breeding values between groups of animals. Systematic noise or missing data for less than 1% of the SNPs in the genotype data had substantial effects on the differences in breeding values between genotyped and non-genotyped animals, and between animals genotyped on different chips. The breeding values of young genotyped individuals were biased upward, and the magnitude was up to 0.8 genetic standard deviations, compared with breeding values of non-genotyped individuals. Similarly, the magnitude of a small value added to the diagonal of the genomic relationship matrix affected the level of average breeding values between groups of genotyped and non-genotyped animals. Cross-validation accuracies and regression coefficients were not sensitive to these factors. Conclusions Because, historically, different SNP chips have been used for genotyping different parts of a population, fine-tuning of imputation within and across SNP chips and handling of missing genotypes are crucial for reducing bias. Although all the SNPs used for estimating breeding values are present on the chip used for genotyping young animals, incompleteness and some genotype errors might lead to level-biases in breeding values.


2009 ◽  
Vol 49 (6) ◽  
pp. 525 ◽  
Author(s):  
W. A. McKiernan ◽  
J. F. Wilkins ◽  
J. Irwin ◽  
B. Orchard ◽  
S. A. Barwick

The steer progeny of sires genetically diverse for fatness and meat yield were grown at different rates from weaning to feedlot entry and effects on growth, carcass and meat-quality traits were examined. The present paper, the second of a series, reports the effects of genetic and growth treatments on carcass traits. A total of 43 sires, within three ‘carcass class’ categories, defined as high potential for meat yield, marbling or both traits, was used. Where available, estimated breeding values for the carcass traits of retail beef yield (RBY%) and intramuscular fat (IMF%) were used in selection of the sires, which were drawn from Angus, Charolais, Limousin, Black Wagyu and Red Wagyu breeds, to provide a range of carcass sire types across the three carcass classes. Steer progeny of Hereford dams were grown at either conventional (slow: ~0.5 kg/day) or accelerated (fast: ~0.7 kg/day) rates from weaning to feedlot entry weight, with group means of ~400 kg. Accelerated and conventionally grown groups from successive calvings were managed to enter the feedlot at similar mean feedlot entry weights at the same time for the 100-day finish under identical conditions. Faster-backgrounded groups had greater fat levels in the carcass than did slower-backgrounded groups. Dressing percentages and fat colour were unaffected by growth treatment, whereas differences in ossification score and meat colour were explained by age at slaughter. There were significant effects of sire type for virtually all carcass traits measured in the progeny. Differences in hot standard carcass weight showed a clear advantage to European types, with variable outcomes for the Angus and Wagyu progeny. Sire selection by estimated breeding values (within the Angus breed) for yield and/or fat traits resulted in expected differences in the progeny for those traits. There were large differences in both meat yield and fatness among the types of greatest divergence in genetic potential for those traits, with the Black Wagyu and the Angus IMF clearly superior for IMF%, and the European types for RBY%. The Angus IMF progeny performed as well as that of the Black Wagyu for all fatness traits. Differences in RBY% among types were generally reflected by similar differences in eye muscle area. Results here provide guidelines for selecting sire types to target carcass traits for specific markets. The absence of interactions between growth and genetic treatments ensures that consistent responses can be expected across varying management and production systems.


2020 ◽  
Vol 98 (12) ◽  
Author(s):  
Ignacy Misztal ◽  
Shogo Tsuruta ◽  
Ivan Pocrnic ◽  
Daniela Lourenco

Abstract Single-step genomic best linear unbiased prediction with the Algorithm for Proven and Young (APY) is a popular method for large-scale genomic evaluations. With the APY algorithm, animals are designated as core or noncore, and the computing resources to create the inverse of the genomic relationship matrix (GRM) are reduced by inverting only a portion of that matrix for core animals. However, using different core sets of the same size causes fluctuations in genomic estimated breeding values (GEBVs) up to one additive standard deviation without affecting prediction accuracy. About 2% of the variation in the GRM is noise. In the recursion formula for APY, the error term modeling the noise is different for every set of core animals, creating changes in breeding values. While average changes are small, and correlations between breeding values estimated with different core animals are close to 1.0, based on the normal distribution theory, outliers can be several times bigger than the average. Tests included commercial datasets from beef and dairy cattle and from pigs. Beyond a certain number of core animals, the prediction accuracy did not improve, but fluctuations decreased with more animals. Fluctuations were much smaller than the possible changes based on prediction error variance. GEBVs change over time even for animals with no new data as genomic relationships tie all the genotyped animals, causing reranking of top animals. In contrast, changes in nongenomic models without new data are small. Also, GEBV can change due to details in the model, such as redefinition of contemporary groups or unknown parent groups. In particular, increasing the fraction of blending of the GRM with a pedigree relationship matrix from 5% to 20% caused changes in GEBV up to 0.45 SD, with a correlation of GEBV > 0.99. Fluctuations in genomic predictions are part of genomic evaluation models and are also present without the APY algorithm when genomic evaluations are computed with updated data. The best approach to reduce the impact of fluctuations in genomic evaluations is to make selection decisions not on individual animals with limited individual accuracy but on groups of animals with high average accuracy.


2008 ◽  
Vol 90 (2) ◽  
pp. 199-208 ◽  
Author(s):  
T. ROUGHSEDGE ◽  
R. PONG-WONG ◽  
J.A. WOOLLIAMS ◽  
B. VILLANUEVA

Summary Over recent years, selection methodologies have been developed to allow the maximization of genetic gain whilst constraining the rate of inbreeding. The desired rate of inbreeding is achieved by constraining the group coancestry using the numerator relationship matrix computed from pedigree. It is shown that when the method is applied to mixed inheritance models, where a QTL is segregating together with polygenes, the rate of inbreeding achieved in the region around a QTL is greater than the desired level. The constraint on group coancestry at specific positions around the QTL is achieved by using a relationship matrix computed from pedigree and genetic markers. However, the rate of inbreeding realized at the position of constraint is lower than that expected given the assumed relationship between group coancestry and the subsequent rate of inbreeding. The use of markers in the calculation of the relationship matrix allows the selection of candidates with very low or zero relationships because they are homozygous for alternative alleles, which results in a heterozygosity amongst their offspring higher than would be expected given their allele frequencies. A generation of random selection restored the expected relationship between group coancestry and inbreeding.

