Predicting the accuracy of genomic predictions

2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Jack C. M. Dekkers ◽  
Hailin Su ◽  
Jian Cheng

Abstract
Background: Mathematical models are needed for the design of breeding programs using genomic prediction. While deterministic models for selection on pedigree-based estimates of breeding values (PEBV) are available, these have not been fully developed for genomic selection, with a key missing component being the accuracy of genomic EBV (GEBV) of selection candidates. Here, a deterministic method was developed to predict this accuracy within a closed breeding population based on the accuracy of GEBV and PEBV in the reference population and the distance of selection candidates from their closest ancestors in the reference population.
Methods: The accuracy of GEBV was modeled as a combination of the accuracy of PEBV and of EBV based on genomic relationships deviated from pedigree (DEBV). Loss of the accuracy of DEBV from the reference to the target population was modeled based on the effective number of independent chromosome segments in the reference population (Me). Measures of Me derived from the inverse of the variance of relationships and from the accuracies of GEBV and PEBV in the reference population, derived using either a Fisher information or a selection index approach, were compared by simulation.
Results: Using simulation, both the Fisher and the selection index approach correctly predicted accuracy in the target population over time, both with and without selection. The index approach, however, resulted in estimates of Me that were less affected by heritability, reference size, and selection, and which are, therefore, more appropriate as a population parameter. The variance of relationships underpredicted Me and was greatly affected by selection. A leave-one-out cross-validation approach was proposed to estimate required accuracies of EBV in the reference population. Aspects of the methods were validated using real data.
Conclusions: A deterministic method was developed to predict the accuracy of GEBV in selection candidates in a closed breeding population. The population parameter Me that is required for these predictions can be derived from an available reference data set, and applied to other reference data sets and traits for that population. This method can be used to evaluate the benefit of genomic prediction and to optimize genomic selection breeding programs.
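
To give a feel for how Me enters a deterministic accuracy prediction, the following is a minimal sketch. It does not reproduce the method of the abstract above (which combines PEBV and DEBV and models the distance of candidates from the reference population). It assumes the widely used Daetwyler-type approximation, in which accuracy depends only on reference size, heritability and Me. The function name `expected_accuracy` and all scenario values are hypothetical.

```python
import math


def expected_accuracy(n_ref, h2, me):
    """Approximate accuracy of genomic prediction.

    Assumed Daetwyler-type approximation, not the method of the paper:
        r = sqrt(n_ref * h2 / (n_ref * h2 + me))

    n_ref: number of genotyped and phenotyped reference individuals
    h2:    heritability of the trait
    me:    effective number of independent chromosome segments
    """
    return math.sqrt(n_ref * h2 / (n_ref * h2 + me))


# Hypothetical scenario values, chosen only for illustration.
for me in (250, 500, 1000, 2000):
    acc = expected_accuracy(n_ref=1000, h2=0.5, me=me)
    print(f"Me = {me:5d}  approximate accuracy = {acc:.3f}")
```

Under this approximation, a larger Me lowers the expected accuracy for a fixed reference size, which is why a stable estimate of Me is useful as a population parameter.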

2018 ◽  
Author(s):  
Julien Frouin ◽  
Axel Labeyrie ◽  
Arnaud Boisnard ◽  
Gian Attilio Sacchi ◽  
Nourollah Ahmadi

Abstract The high concentration of arsenic in paddy fields and, consequently, in rice grains is a critical issue in many rice-growing areas. Breeding arsenic-tolerant rice varieties that prevent arsenic (As) uptake and its accumulation in the grains is a major mitigation option. However, the genetic control of the trait is complex, involving a large number of genes of limited individual effect, which raises the question of the most efficient breeding method. Using data from three years of experiments in a naturally arsenic-rich field, we analysed the performance of the two major breeding methods: conventional selection based on quantitative trait loci, which targets loci involved in arsenic tolerance, and the emerging genomic selection, which predicts genetic values without prior hypotheses on causal relationships between markers and target traits. We showed that, once calibrated in a reference population, the accuracy of genomic prediction of arsenic content in the grains of the breeding population was rather high, ensuring genetic gains per unit of time close to those of phenotypic selection. Conversely, selection targeting quantitative trait loci proved to be less robust: although in agreement with the literature on the genetic bases of arsenic tolerance, few of the target loci identified in the reference population could be validated in the breeding population.


Author(s):  
Pascal Duenk ◽  
Piter Bijma ◽  
Yvonne C J Wientjes ◽  
Mario P L Calus

Abstract Breeding programs aiming to improve the performance of crossbreds may benefit from genomic prediction of crossbred (CB) performance for purebred (PB) selection candidates. In this review, we compared genomic prediction strategies that differed in (1) the genomic prediction model used, or (2) the data used in the reference population. We found 27 unique studies, two of which used deterministic simulation, 11 used stochastic simulation, and 14 used real data. Differences in accuracy and response to selection between strategies depended on i) the value of the purebred-crossbred genetic correlation (rpc), ii) the genetic distance between the parental lines, iii) the size of PB and CB reference populations, and iv) the relatedness of these reference populations to the selection candidates. In studies where a PB reference population was used, the use of a dominance model yielded accuracies that were equal to or higher than those of additive models. When rpc was lower than ~0.8, and was caused mainly by GxE, it was beneficial to create a reference population of PB animals that are tested in a CB environment. In general, the benefit of collecting CB information increased with decreasing rpc. For a given rpc, the benefit of collecting CB information increased with increasing size of the reference populations. Collecting CB information was not beneficial when rpc was higher than ~0.9, especially when the reference populations were small. Collecting only phenotypes of CB animals may slightly improve accuracy and response to selection, but requires that the pedigree is known. It is therefore advisable to genotype these CB animals as well. Finally, considering the breed-origin of alleles allows for modelling breed-specific effects in the CB, but this did not always lead to higher accuracies. Our review shows that the differences in accuracy and response to selection between strategies depend on several factors. 
One of the most important factors is rpc, and we therefore recommend obtaining accurate estimates of rpc for all breeding goal traits. Furthermore, knowledge about the importance of the components of rpc (i.e., dominance, epistasis, and GxE) can help breeders to decide which model to use, and whether to collect data on animals in a CB environment. Future research should focus on the development of a tool that predicts accuracy and response to selection from scenario-specific parameters.


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as the special case λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant gains (between 5 and 10%) in prediction accuracy relative to the G-BLUP.
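
The idea of a sparse index with G-BLUP as the λ = 0 special case can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the index weights b for one candidate minimize 0.5·b'Pb − g'b + λ·Σ|b_j|, where P is the training genomic relationship matrix plus a variance ratio on the diagonal and g holds the candidate's relationships with the training individuals. The function names, the coordinate-descent solver, the variance ratio `lambda0` and all toy numbers are hypothetical.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparse_index_weights(P, g, lam, n_iter=1000, tol=1e-12):
    """Coordinate descent for: min_b 0.5*b'Pb - g'b + lam*sum(|b_j|).

    With lam = 0 this solves P b = g, i.e. dense BLUP-type index weights.
    """
    n = len(g)
    b = [0.0] * n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(n):
            # Partial residual for coordinate j, excluding its own term.
            r_j = g[j] - sum(P[j][k] * b[k] for k in range(n) if k != j)
            new_bj = soft_threshold(r_j, lam) / P[j][j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Toy inputs (hypothetical): three training individuals, one candidate.
G_trn = [[1.0, 0.5, 0.1],
         [0.5, 1.0, 0.1],
         [0.1, 0.1, 1.0]]
g = [0.6, 0.4, 0.05]          # candidate's relationships with training set
lambda0 = 1.0                 # assumed ratio of residual to genetic variance
P = [[G_trn[i][j] + (lambda0 if i == j else 0.0) for j in range(3)]
     for i in range(3)]

y = [1.2, 0.8, -0.5]          # toy training phenotypes
y_mean = sum(y) / len(y)

b_dense = sparse_index_weights(P, g, lam=0.0)    # G-BLUP-like weights
b_sparse = sparse_index_weights(P, g, lam=0.1)   # sparse index weights

for name, b in (("lam=0.0", b_dense), ("lam=0.1", b_sparse)):
    u_hat = sum(bj * (yj - y_mean) for bj, yj in zip(b, y))
    print(name, [round(bj, 4) for bj in b], round(u_hat, 4))
```

In this toy example the third training individual, which is only weakly related to the candidate, receives a weight of exactly zero at lam = 0.1, so the prediction is derived from the two closer relatives only. In practice λ would be tuned, for example by cross-validation.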


Author(s):  
T. Pook ◽  
L. Büttgen ◽  
A. Ganesan ◽  
N.T. Ha ◽  
H. Simianer

Abstract Selective breeding has been a continuous element of both crop and livestock breeding since early prehistory. In this work, we propose a new web-based simulation framework (“MoBPSweb”) that combines a unified language to describe breeding programs with the simulation software MoBPS, which stands for ‘Modular Breeding Program Simulator’. MoBPSweb thereby provides a flexible environment to enter, simulate, evaluate and compare breeding programs. Inputs can be provided via modules ranging from a Vis.js-based flash environment for “drawing” the breeding program to a variety of modules to provide phenotype information, economic parameters and other relevant information. Similarly, results of the simulation study can be extracted and compared to other scenarios via output modules (e.g. observed phenotypes, accuracy of breeding value estimation, inbreeding rates). The usability of the framework is showcased with a toy example of a dairy cattle breeding program at farm level, comparing scenarios that differ in the implemented breeding value estimation, selection index and selection intensity. Comparisons consider both short- and long-term effects of the different scenarios in terms of genomic gains, rates of inbreeding and the accuracy of the breeding value estimation. Lastly, the general applicability of the MoBPSweb framework and the general potential of simulation studies for genetics, and in particular for breeding, are discussed.


2019 ◽  
Author(s):  
Daniel Runcie ◽  
Hao Cheng

Abstract Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.


2015 ◽  
Vol 8 (2) ◽  
Author(s):  
Xuehui Li ◽  
Yanling Wei ◽  
Ananta Acharya ◽  
Julie L. Hansen ◽  
Jamie L. Crawford ◽  
...  

animal ◽  
2016 ◽  
Vol 10 (6) ◽  
pp. 1067-1075 ◽  
Author(s):  
G. Su ◽  
P. Ma ◽  
U.S. Nielsen ◽  
G.P. Aamand ◽  
G. Wiggans ◽  
...  

Animals ◽  
2019 ◽  
Vol 9 (9) ◽  
pp. 672 ◽  
Author(s):  
Beatriz Castro Dias Cuyabano ◽  
Hanna Wackel ◽  
Donghyun Shin ◽  
Cedric Gondro

Genomic models that incorporate dense marker information have been widely used for predicting genomic breeding values since they were first introduced, and it is known that the relationship between individuals in the reference population and selection candidates affects the prediction accuracy. When genomic evaluation is performed over generations of the same population, prediction accuracy is expected to decay if the reference population is not updated. Therefore, the reference population must be updated in each generation, but little is known about the optimal way to do it. This study presents an empirical assessment of the prediction accuracy of genomic breeding values of production traits, across five generations in two Korean pig breeds. We verified the decay in prediction accuracy over time when the reference population was not updated. Additionally we compared the prediction accuracy using only the previous generation as the reference population, as opposed to using all previous generations as the reference population. Overall, the results suggested that, although there is a clear need to continuously update the reference population, it may not be necessary to keep all ancestral genotypes. Finally, comprehending how the accuracy of genomic prediction evolves over generations within a population adds relevant information to improve the performance of genomic selection.

