Review: Optimizing genomic selection for crossbred performance by model improvement and data collection

Abstract Breeding programs aiming to improve the performance of crossbreds may benefit from genomic prediction of crossbred (CB) performance for purebred (PB) selection candidates. In this review, we compared genomic prediction strategies that differed in (1) the genomic prediction model used, or (2) the data used in the reference population. We found 27 unique studies, two of which used deterministic simulation, 11 used stochastic simulation, and 14 used real data. Differences in accuracy and response to selection between strategies depended on (i) the value of the purebred-crossbred genetic correlation (rpc), (ii) the genetic distance between the parental lines, (iii) the size of PB and CB reference populations, and (iv) the relatedness of these reference populations to the selection candidates. In studies where a PB reference population was used, the use of a dominance model yielded accuracies that were equal to or higher than those of additive models. When rpc was lower than ~0.8, and was caused mainly by GxE, it was beneficial to create a reference population of PB animals that are tested in a CB environment. In general, the benefit of collecting CB information increased with decreasing rpc. For a given rpc, the benefit of collecting CB information increased with increasing size of the reference populations. Collecting CB information was not beneficial when rpc was higher than ~0.9, especially when the reference populations were small. Collecting only phenotypes of CB animals may slightly improve accuracy and response to selection, but requires that the pedigree is known. It is therefore advisable to genotype these CB animals as well. Finally, considering the breed-origin of alleles allows for modelling breed-specific effects in the CB, but this did not always lead to higher accuracies. Our review shows that the differences in accuracy and response to selection between strategies depend on several factors.
One of the most important factors is rpc, and we therefore recommend obtaining accurate estimates of rpc for all breeding goal traits. Furthermore, knowledge about the importance of the components of rpc (i.e., dominance, epistasis, and GxE) can help breeders to decide which model to use, and whether to collect data on animals in a CB environment. Future research should focus on the development of a tool that predicts accuracy and response to selection from scenario-specific parameters.

Optimizing genomic reference populations to improve crossbred performance

Genetics Selection Evolution ◽

10.1186/s12711-020-00573-3 ◽

2020 ◽

Vol 52 (1) ◽

Author(s):

Yvonne C. J. Wientjes ◽

Piter Bijma ◽

Mario P. L. Calus

Keyword(s):

Genomic Prediction ◽

Genetic Relatedness ◽

Reference Population ◽

Breeding Goal ◽

Poultry Breeding ◽

Population Sizes ◽

Estimated Breeding Values ◽

Production Animals ◽

Chromosome Segments ◽

Different Levels

Abstract Background In pig and poultry breeding, the objective is to improve the performance of crossbred production animals, while selection takes place in the purebred parent lines. One way to achieve this is to use genomic prediction with a crossbred reference population. A crossbred reference population benefits from expressing the breeding goal trait but suffers from a lower genetic relatedness with the purebred selection candidates than a purebred reference population. Our aim was to investigate the benefit of using a crossbred reference population for genomic prediction of crossbred performance for: (1) different levels of relatedness between the crossbred reference population and purebred selection candidates, (2) different levels of the purebred-crossbred correlation, and (3) different reference population sizes. We simulated a crossbred breeding program with 0, 1 or 2 multiplication steps to generate the crossbreds, and compared the accuracy of genomic prediction of crossbred performance in one generation using either a purebred or a crossbred reference population. For each scenario, we investigated the empirical accuracy based on simulation and the predicted accuracy based on the estimated effective number of independent chromosome segments between the reference animals and selection candidates. Results When the purebred-crossbred correlation was 0.75, the accuracy was highest for a two-way crossbred reference population but similar for purebred and four-way crossbred reference populations, for all reference population sizes. When the purebred-crossbred correlation was 0.5, a purebred reference population always resulted in the lowest accuracy. Among the different crossbred reference populations, the accuracy was slightly lower when more multiplication steps were used to create the crossbreds. In general, the benefit of crossbred reference populations increased when the size of the reference population increased. 
All predicted accuracies overestimated their corresponding empirical accuracies, but the different scenarios were ranked accurately when the reference population was large. Conclusions The benefit of a crossbred reference population becomes larger when the crossbred population is more related to the purebred selection candidates, when the purebred-crossbred correlation is lower, and when the reference population is larger. The purebred-crossbred correlation and reference population size interact with each other with respect to their impact on the accuracy of genomic estimated breeding values.
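
The "predicted accuracy based on the estimated effective number of independent chromosome segments" can be illustrated with a widely used deterministic approximation, r = rpc * sqrt(N h2 / (N h2 + Me)). The sketch below is a minimal illustration under stated assumptions: the abstract does not give the exact expression the authors used, so this form, the function name `predicted_accuracy`, and all numeric inputs are hypothetical and chosen only to show how rpc, reference size, and Me trade off.

```python
import math


def predicted_accuracy(n_ref, h2, me, rpc=1.0):
    """Approximate accuracy of genomic prediction (assumed formula).

    n_ref: number of animals in the reference population
    h2:    heritability of the trait recorded in the reference population
    me:    effective number of independent chromosome segments between
           reference animals and selection candidates
    rpc:   purebred-crossbred genetic correlation; use 1.0 when the
           reference population already expresses the crossbred trait
    """
    if n_ref < 0 or me <= 0 or not 0 < h2 <= 1:
        raise ValueError("need n_ref >= 0, me > 0 and 0 < h2 <= 1")
    theta = n_ref * h2
    return rpc * math.sqrt(theta / (theta + me))


if __name__ == "__main__":
    # Hypothetical scenario: a purebred reference is closely related to the
    # candidates (small Me) but is penalised by rpc; a crossbred reference
    # expresses the breeding goal trait (rpc = 1) but is less related (large Me).
    for rpc in (0.5, 0.75):
        pb = predicted_accuracy(2000, 0.3, me=500, rpc=rpc)
        cb = predicted_accuracy(2000, 0.3, me=1500, rpc=1.0)
        print(f"rpc={rpc}: purebred ref {pb:.3f}, crossbred ref {cb:.3f}")
```

Under this approximation a lower rpc scales down only the purebred-reference accuracy, which is one way to see why the benefit of a crossbred reference grows as rpc falls. The abstract reports that such predictions overestimated the empirical accuracies, so the output is better read as a ranking aid than as an absolute value.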

Exploring the size of reference population for expected accuracy of genomic prediction using simulated and real data in Japanese Black cattle

BMC Genomics ◽

10.1186/s12864-021-08121-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Masayuki Takeda ◽

Keiichi Inoue ◽

Hidemi Oyama ◽

Katsuo Uchiyama ◽

Kanako Yoshinari ◽

...

Keyword(s):

Genomic Prediction ◽

Simulation Analysis ◽

Carcass Traits ◽

Real Data ◽

Reference Population ◽

Weighting Factor ◽

Breeding Value ◽

Japanese Black Cattle ◽

Simulation Results ◽

Estimated Breeding Value

Abstract Background The size of the reference population is a crucial factor affecting the accuracy of prediction of the genomic estimated breeding value (GEBV). Few studies in beef cattle have compared accuracies achieved using real data with those achieved using simulated data and deterministic predictions. Thus, the extent to which traits of interest affect the accuracy of genomic prediction in Japanese Black cattle remains unclear. This study aimed to explore the size of the reference population needed for the expected accuracy of genomic prediction for simulated and carcass traits in Japanese Black cattle using a large number of samples. Results A simulation analysis showed that heritability and size of the reference population substantially impacted the accuracy of GEBV, whereas the number of quantitative trait loci did not. The estimated numbers of independent chromosome segments (Me) and the related weighting factor (w) derived from simulation results and a maximum likelihood (ML) approach were 1900–3900 and 1, respectively. The expected accuracy for a trait with heritability of 0.1–0.5 fitted well with empirical values when the reference population comprised > 5000 animals. The heritability for carcass traits was estimated to be 0.29–0.41 and the accuracy of GEBVs was relatively consistent with simulation results. When the reference population comprised 7000–11,000 animals, the accuracy of GEBV for carcass traits ranged from 0.73 to 0.79, which is comparable to that of the estimated breeding value obtained in a progeny test. Conclusion Our simulation analysis demonstrated that the expected accuracy of GEBV for a polygenic trait with low-to-moderate heritability could be practical in the Japanese Black cattle population. For carcass traits, a total of 7000–11,000 animals can be a sufficient size of reference population for genomic prediction.
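
The question of how large a reference population must be can be sketched by inverting a deterministic accuracy formula. The example below assumes the form r = w * sqrt(N h2 / (N h2 + Me)), in which the weighting factor w multiplies the square-root term. The abstract does not state how w enters the authors' expression, so this form, the function name `required_reference_size`, and the inputs in the usage loop are assumptions for illustration. The loop uses the Me range quoted in the abstract and is not expected to reproduce the reported accuracies.

```python
def required_reference_size(target_r, h2, me, w=1.0):
    """Reference population size needed to reach a target accuracy.

    Solves r = w * sqrt(N*h2 / (N*h2 + me)) for N (assumed formula).
    target_r: desired accuracy of GEBV, must be below w
    h2:       trait heritability
    me:       effective number of independent chromosome segments
    w:        weighting factor (assumed to scale the maximum accuracy)
    """
    if not 0 < h2 <= 1 or me <= 0 or w <= 0:
        raise ValueError("need 0 < h2 <= 1, me > 0 and w > 0")
    q = (target_r / w) ** 2
    if not 0 < q < 1:
        raise ValueError("target accuracy must lie strictly between 0 and w")
    return q * me / (h2 * (1.0 - q))


if __name__ == "__main__":
    # Illustrative inputs only: a heritability inside the range reported
    # for carcass traits and the two ends of the reported Me range.
    for me in (1900, 3900):
        n = required_reference_size(0.75, h2=0.35, me=me)
        print(f"Me={me}: about {n:.0f} animals for r=0.75")
```

Because the required N is proportional to Me and inversely proportional to h2, uncertainty in either parameter carries over directly into the recommended reference size.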

Improving Selection Efficiency of Crop Breeding With Genomic Prediction Aided Sparse Phenotyping

Frontiers in Plant Science ◽

10.3389/fpls.2021.735285 ◽

2021 ◽

Vol 12 ◽

Author(s):

Sang He ◽

Yong Jiang ◽

Rebecca Thistlethwaite ◽

Matthew J. Hayden ◽

Richard Trethowan ◽

...

Keyword(s):

Genomic Prediction ◽

Selection Response ◽

Response To Selection ◽

Selection Efficiency ◽

Crop Breeding ◽

Main Interest ◽

Data Set ◽

Breeding Programs ◽

Selection Accuracy

Increasing the number of environments for phenotyping of crop lines in earlier stages of breeding programs can improve selection accuracy. However, this is often not feasible due to cost. In our study, we investigated a sparse phenotyping method that does not test all entries in all environments, but instead capitalizes on genomic prediction to predict missing phenotypes in additional environments without extra phenotyping expenditure. The breeders’ main interest – response to selection – was directly simulated to evaluate the effectiveness of the sparse genomic phenotyping method in a wheat and a rice data set. Whether sparse phenotyping resulted in more selection response depended on the correlations of phenotypes between environments. The sparse phenotyping method consistently showed statistically significantly higher responses to selection, compared to complete phenotyping, when the majority of completely phenotyped environments were negatively (wheat) or lowly positively (rice) correlated and any extension environment was highly positively correlated with any of the completely phenotyped environments. When all environments were positively correlated (wheat) or any highly positively correlated environments existed (wheat and rice), sparse phenotyping did not improve response. Our results indicate that genomics-based sparse phenotyping can improve selection response in the middle stages of crop breeding programs.

Predicting the accuracy of genomic predictions

Genetics Selection Evolution ◽

10.1186/s12711-021-00647-w ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Jack C. M. Dekkers ◽

Hailin Su ◽

Jian Cheng

Keyword(s):

Genomic Prediction ◽

Reference Data ◽

Selection Index ◽

Target Population ◽

Reference Population ◽

Breeding Population ◽

Population Parameter ◽

Deterministic Method ◽

Breeding Programs ◽

Index Approach

Abstract Background Mathematical models are needed for the design of breeding programs using genomic prediction. While deterministic models for selection on pedigree-based estimates of breeding values (PEBV) are available, these have not been fully developed for genomic selection, with a key missing component being the accuracy of genomic EBV (GEBV) of selection candidates. Here, a deterministic method was developed to predict this accuracy within a closed breeding population based on the accuracy of GEBV and PEBV in the reference population and the distance of selection candidates from their closest ancestors in the reference population. Methods The accuracy of GEBV was modeled as a combination of the accuracy of PEBV and of EBV based on genomic relationships deviated from pedigree (DEBV). Loss of the accuracy of DEBV from the reference to the target population was modeled based on the effective number of independent chromosome segments in the reference population (Me). Measures of Me derived from the inverse of the variance of relationships and from the accuracies of GEBV and PEBV in the reference population, derived using either a Fisher information or a selection index approach, were compared by simulation. Results Using simulation, both the Fisher and the selection index approach correctly predicted accuracy in the target population over time, both with and without selection. The index approach, however, resulted in estimates of Me that were less affected by heritability, reference size, and selection, and which are, therefore, more appropriate as a population parameter. The variance of relationships underpredicted Me and was greatly affected by selection. A leave-one-out cross-validation approach was proposed to estimate required accuracies of EBV in the reference population. Aspects of the methods were validated using real data. 
Conclusions A deterministic method was developed to predict the accuracy of GEBV in selection candidates in a closed breeding population. The population parameter Me that is required for these predictions can be derived from an available reference data set, and applied to other reference data sets and traits for that population. This method can be used to evaluate the benefit of genomic prediction and to optimize genomic selection breeding programs.
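
One of the Me measures compared in this abstract is "the inverse of the variance of relationships". A minimal sketch of that idea follows, assuming the common form Me = 1 / Var(G_ij - A_ij), where G_ij and A_ij are the genomic and pedigree relationships for the same pair of animals. The function name `me_from_relationships` and the toy values are hypothetical, and the abstract does not confirm that this is the exact estimator the authors used.

```python
from statistics import pvariance


def me_from_relationships(genomic, pedigree):
    """Estimate Me as the inverse variance of (genomic - pedigree) relationships.

    genomic, pedigree: equal-length sequences holding the relationship
    for each pair of animals (off-diagonal elements only), in the same order.
    """
    if len(genomic) != len(pedigree) or len(genomic) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    deviations = [g - a for g, a in zip(genomic, pedigree)]
    var = pvariance(deviations)
    if var == 0:
        raise ValueError("deviations have zero variance; Me is undefined")
    return 1.0 / var


if __name__ == "__main__":
    # Toy data: four pairs whose genomic relationship deviates by +/-0.02
    # from the pedigree expectation.
    g = [0.27, 0.23, 0.52, 0.48]
    a = [0.25, 0.25, 0.50, 0.50]
    print(round(me_from_relationships(g, a)))
```

The abstract reports that this variance-based measure underpredicted Me and was strongly affected by selection, so the sketch shows the quantity being compared rather than a recommended estimator.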

Incorporating selfing to purge deleterious alleles in a cassava genomic selection program

10.1101/2020.04.04.025841 ◽

2020 ◽

Author(s):

Mohamed Somo ◽

Jean-Luc Jannink

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Additive Models ◽

Allele Frequencies ◽

Favorable Allele ◽

Breeding Cycle ◽

Founder Population ◽

Dominance Model ◽

Selfed Progeny ◽

Directional Dominance

Abstract Cassava has been found to carry high levels of recessive deleterious mutations and it is known to suffer from inbreeding depression. Breeders therefore consider specific approaches to decrease cassava’s genetic load. Using self-fertilization to unmask deleterious recessive alleles and therefore accelerate their purging is one possibility. Before implementation of this approach we sought to better understand its consequences through simulation. Founder populations with high directional dominance were simulated using a natural selection forward simulator. The founder population was then subjected to five generations of genomic selection in schemes that did or did not include a generation of phenotypic selection on selfed progeny. We found that genomic selection was less effective under the directional dominance model than under the additive models that have commonly been used in simulations. While selection did increase favorable allele frequencies, increased inbreeding during selection caused decreased gain in genotypic values under directional dominance. While purging selection on selfed individuals was effective in the first breeding cycle, it was not effective in later cycles, an effect we attributed to the fact that the generation of selfing decreased the relatedness of the genomic prediction training population to the selection candidates. That decreased relatedness caused genomic prediction accuracy to be lower in schemes incorporating selfing. We found that selection on individuals partially inbred by one generation of selfing did increase mean genetic value of the partially inbred population, but that this gain was accompanied by a relatively small increase in favorable allele frequencies such that improvement in the outbred population was lower than might have been intuited.

Improving Selection Efficiency of Crop Breeding with a Genomic Prediction Aided Partial Phenotyping Strategy

10.21203/rs.3.rs-365517/v1 ◽

2021 ◽

Author(s):

Sang He ◽

Yong Jiang ◽

Rebecca Thistlethwaite ◽

Matthew Hayden ◽

Richard Trethowan ◽

...

Keyword(s):

Genomic Prediction ◽

Selection Response ◽

Response To Selection ◽

Selection Efficiency ◽

Crop Breeding ◽

Main Interest ◽

Breeding Programs ◽

Selection Accuracy

Abstract Increasing the number of environments for phenotyping of crop lines in earlier stages of breeding programs can improve selection accuracy. However, this is often not feasible due to cost. In our study, we investigated a partial phenotyping strategy that does not test all entries in all environments, but instead capitalizes on genomic prediction to predict missing phenotypes in additional environments without extra phenotyping expenditure. The breeders’ main interest – response to selection – was directly simulated to evaluate the effectiveness of the partial genomic phenotyping strategy in a wheat dataset. Whether the partial phenotyping strategy resulted in more selection response depended on the correlations of phenotypes between environments. The partial phenotyping strategy consistently showed statistically significant higher simulated responses to selection, compared to complete phenotyping, when the majority of completely phenotyped environments were negatively correlated and any extension environment was highly positively correlated with any of the completely phenotyped environments. Our results indicate that genomics-based partial phenotyping can improve selection response at middle stages of crop breeding programs.

What every mathematician should know about modelling crime

European Journal of Applied Mathematics ◽

10.1017/s0956792510000070 ◽

2010 ◽

Vol 21 (4-5) ◽

pp. 275-281 ◽

Cited By ~ 9

Author(s):

MARCUS FELSON

Keyword(s):

Mathematical Models ◽

Real Data ◽

Future Research ◽

Data Modelling

This paper by a criminologist explains why it makes more sense to model criminal acts than to model criminals, how many preconceptions about crime can mislead modellers and offers some simple crime modelling ideas. Many opportunities for simulation now exist, and new opportunities for real-data modelling are emerging. The author suggests mathematical models of crime, including offender foraging for crime targets, as a rich area for future research.

Lock-In and Team Effects

Journal of Sports Economics ◽

10.1177/1527002515577885 ◽

2015 ◽

Vol 18 (4) ◽

pp. 376-387

Author(s):

Trey Dronyk-Trosper ◽

Brandli Stitzel

Keyword(s):

Set Covering ◽

Future Research ◽

Prior Work ◽

Omitted Variable Bias ◽

Specific Effects ◽

Heterogeneous Effects ◽

Team Success ◽

Variable Bias ◽

Lock In ◽

Least Squares Approach

How important is recruiting to a football program’s success? While prior research has attempted to answer this question, we utilize an extensive panel set covering 13 years of games along with a two-stage least squares approach to investigate the effects of recruiting on team success. This article also includes new control variables to account for omitted variable bias that prior work may have missed. We also split our sample to investigate whether recruiting displays heterogeneous effects across schools. Additionally, we find evidence that the benefits of recruiting are driven by team-specific effects, indicating that team success may be more heavily derived from the ability of teams to harness and improve their recruits than their ability to utilize each athlete’s raw abilities. This leads to important revelations regarding future research into both the value of recruits and what drives a football team’s success.

An evaluation of the interpretability and predictive performance of the BayesR model for genomic prediction

10.1101/2020.10.23.351700 ◽

2020 ◽

Author(s):

Fanny Mollandin ◽

Andrea Rau ◽

Pascal Croiseau

Keyword(s):

Linkage Disequilibrium ◽

Qtl Mapping ◽

Effect Size ◽

Genomic Prediction ◽

Simulated Data ◽

Predictive Performance ◽

Real Data ◽

Sliding Windows ◽

Size Classes ◽

The Impact

Abstract Technological advances and decreasing costs have led to the rise of increasingly dense genotyping data, making feasible the identification of potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially yield improved accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes. BayesR has been shown to yield accurate predictions and promise for quantitative trait loci (QTL) mapping in real data applications, but an extensive benchmarking in simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium. We compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance for BayesR in moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability or weak linkage disequilibrium with the causal marker in 50k genotypes, QTL mapping is a challenge, regardless of the criterion used. BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect size classes. We illustrated the performance of BayesR in a variety of simulation scenarios, and compared the advantages and limitations of each.

The Evolution of Data Sharing Practices in the Psychological Literature

10.31234/osf.io/3xdja ◽

2021 ◽

Author(s):

Judith Neve ◽

Guillaume A Rousselet

Keyword(s):

Data Sharing ◽

Linear Models ◽

Generalized Additive Models ◽

Additive Models ◽

Future Research ◽

Science Data ◽

Editorial Policies ◽

Psychological Science ◽

The Impact

Sharing data has many benefits. However, data sharing rates remain low, for the most part well below 50%. A variety of interventions encouraging data sharing have been proposed. We focus here on editorial policies. Kidwell et al. (2016) assessed the impact of the introduction of badges in Psychological Science; Hardwicke et al. (2018) assessed the impact of Cognition’s mandatory data sharing policy. Both studies found policies to improve data sharing practices, but only assessed the impact of the policy for up to 25 months after its implementation. We examined the effect of these policies over a longer term by reusing their data and collecting a follow-up sample including articles published up until December 31st, 2019. We fit generalized additive models as these allow for a flexible assessment of the effect of time, in particular to identify non-linear changes in the trend. These models were compared to generalized linear models to examine whether the non-linearity is needed. Descriptive results and the outputs from generalized additive and linear models were consistent with previous findings: following the policies in Cognition and Psychological Science, data sharing statement rates increased immediately and continued to increase beyond the timeframes examined previously, until reaching close to 100%. In Clinical Psychological Science, data sharing statement rates started to increase only two years following the implementation of badges. Reusability rates jumped from close to 0% to around 50% but did not show changes within either the pre-policy or the post-policy timeframes. Journals that did not implement a policy showed no change in data sharing rates or reusability over time. There was variability across journals in the levels of increase, so we suggest future research should examine a larger number of policies to draw conclusions about their efficacy.
We also encourage future research to investigate the barriers to data sharing specific to psychology subfields to identify the best interventions to tackle them.
