Extensions of BLUP models for genomic prediction in heterogeneous populations: Application in a diverse switchgrass sample

bioRxiv

10.1101/124081

2017

Author(s):

Guillaume P. Ramstein

Michael D. Casler

Keyword(s):

Genetic Gain

Genomic Prediction

Prediction Accuracy

DNA Marker

Population Heterogeneity

Heterogeneous Populations

Interactive Models

Marker Information

ABSTRACT Genomic prediction is a useful tool to accelerate genetic gain in selection using DNA marker information. However, this technology usually relies on models that are not designed to accommodate population heterogeneity, which results from differences in marker effects across genetic backgrounds. Previous studies have proposed three ways of coping with population heterogeneity: (i) ignoring it, thereby relying on the robustness of standard approaches; (ii) reducing it, by selecting homogeneous subsets of individuals in the sample; or (iii) modelling it with interactive models. In this study, we assessed all three approaches, applying existing and novel procedures for each. All procedures developed are based on deterministic optimizations, can account for heteroscedasticity, and are applicable to admixed populations. In a case study on a diverse switchgrass sample, we compared the procedures to a control in which predictions rely on homogeneous subsamples. Compared with the control, ignoring heterogeneity was often not detrimental, and sometimes beneficial, to prediction accuracy. Reducing heterogeneity did not further increase accuracy. However, when subsample sizes were limited, a novel procedure that accounted for redundancy within subsamples outperformed the existing procedure, which only considered relationships to selection candidates. Modelling heterogeneity substantially increased accuracy in the cases where accounting for population heterogeneity yielded a highly significant improvement in fit. Our study illustrates the advantages and limits of each approach, all of which are promising in various contexts of population heterogeneity, e.g. prediction based on historical datasets or dynamic breeding.
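BLUP-based genomic prediction of this kind typically relies on a genomic relationship matrix built from DNA markers. The abstract does not say how the study built it. The sketch below therefore assumes VanRaden's first method: markers are coded as allele dosages 0/1/2, centred by twice the allele frequency, and scaled by 2Σp(1−p). The function name `genomic_relationship` is hypothetical. A single matrix of this kind implicitly treats marker effects as identical across genetic backgrounds, and population heterogeneity violates exactly that assumption.

```python
from typing import List


def genomic_relationship(dosages: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix following VanRaden's first method (assumed here).

    dosages[i][j] is the number (0, 1 or 2) of counted alleles that
    individual i carries at marker j.
    """
    n_ind = len(dosages)
    n_mrk = len(dosages[0])
    # Frequency of the counted allele at each marker in this sample.
    freqs = [sum(row[j] for row in dosages) / (2.0 * n_ind) for j in range(n_mrk)]
    # Scaling factor 2 * sum(p * (1 - p)); monomorphic markers contribute nothing.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    # Centre each dosage by its expected value 2p.
    centred = [[row[j] - 2.0 * freqs[j] for j in range(n_mrk)] for row in dosages]
    # G = Z Z' / scale
    return [
        [sum(za * zb for za, zb in zip(centred[a], centred[b])) / scale for b in range(n_ind)]
        for a in range(n_ind)
    ]


if __name__ == "__main__":
    G = genomic_relationship([[0, 2], [2, 0]])
    print(G)  # [[2.0, -2.0], [-2.0, 2.0]]
```

Because every marker column is centred, each row of G sums to zero. The two opposite homozygotes in the example are therefore maximally "unrelated" relative to the sample mean.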


The look ahead trace back optimizer for genomic selection under transparent and opaque simulators

Scientific Reports

10.1038/s41598-021-83567-5

2021

Vol 11 (1)

Author(s):

Fatemeh Amini

Felipe Restrepo Franco

Guiping Hu

Lizhi Wang

Keyword(s):

Genomic Selection

Genetic Gain

Genomic Prediction

In Planta

Data Set

Look Ahead

Trace Back

New Variant

The Look

Partially Observable

Abstract Recent advances in genomic selection (GS) have demonstrated the importance not only of the accuracy of genomic prediction but also of the intelligence of selection strategies. The look ahead selection algorithm, for example, has been found to significantly outperform the widely used truncation selection approach in terms of genetic gain, thanks to its strategy of selecting breeding parents that may not necessarily be elite themselves but have the best chance of producing elite progeny in the future. This paper presents the look ahead trace back algorithm as a new variant of the look ahead approach, which introduces several improvements to further accelerate genetic gain, especially under imperfect genomic prediction. Perhaps an even more significant contribution of this paper is the design of opaque simulators for evaluating the performance of GS algorithms. These simulators are partially observable, explicitly capture both additive and non-additive genetic effects, and simulate uncertain recombination events more realistically. In contrast, most existing GS simulation settings are transparent, either explicitly or implicitly allowing the GS algorithm to exploit critical information that may not be available in actual breeding programs. Comprehensive computational experiments were carried out using a maize data set to compare a variety of GS algorithms under four simulators with different levels of opacity. The results reveal how differently the same GS algorithm interacts with different simulators, suggesting the need for continued research on the design of more realistic simulators. As long as GS algorithms continue to be trained in silico rather than in planta, the best way to avoid disappointing discrepancies between their simulated and actual performance may be to make the simulator as akin as possible to nature, with all its complexity and opacity.
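The toy sketch below makes the contrast with truncation selection concrete. It is an illustration under strong assumptions: homozygous lines, unlinked and purely additive loci, and hypothetical line names. It is not the look ahead or look ahead trace back algorithm. The sketch scores candidate parents in two ways. Truncation selection keeps the two individually best lines. A crude "best fixable descendant" score instead rewards pairs that carry complementary favourable alleles.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical inbred lines: 1 = favourable allele fixed at a locus, 0 = not.
# Loci are assumed unlinked and each favourable allele adds one unit (additive model).
Lines = Dict[str, List[int]]


def truncation_pair(lines: Lines) -> Tuple[str, str]:
    """Truncation selection: keep the two lines with the highest own value."""
    ranked = sorted(lines, key=lambda name: sum(lines[name]), reverse=True)
    return ranked[0], ranked[1]


def best_fixable_value(a: List[int], b: List[int]) -> int:
    """Value of the best line that could be fixed from a cross of a and b.

    With unlinked loci, every favourable allele carried by either parent can,
    in principle, be combined in one descendant after enough generations.
    """
    return sum(max(x, y) for x, y in zip(a, b))


def complementary_pair(lines: Lines) -> Tuple[str, str]:
    """Choose the pair of lines with the highest best-fixable-descendant value."""
    return max(
        combinations(lines, 2),
        key=lambda pair: best_fixable_value(lines[pair[0]], lines[pair[1]]),
    )


if __name__ == "__main__":
    lines = {
        "A": [1, 1, 1, 1, 1, 0, 0, 0],  # own value 5
        "B": [1, 1, 1, 1, 0, 1, 0, 0],  # own value 5, largely redundant with A
        "C": [0, 0, 0, 0, 1, 1, 1, 1],  # own value 4, complements A and B
    }
    print(truncation_pair(lines))     # ('A', 'B'): best fixable value 6
    print(complementary_pair(lines))  # ('A', 'C'): best fixable value 8
```

Line C is not elite on its own value, yet pairing it with A gives the better long-term potential. This is the intuition the abstract attributes to look ahead selection. The real algorithms also have to handle linkage, recombination uncertainty, and limited time horizons.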


Optimal breeding-value prediction using a Sparse Selection Index

Genetics

10.1093/genetics/iyab030

2021

Author(s):

Marco Lopez-Cruz

Gustavo de los Campos

Keyword(s):

Sample Size

DNA Sequences

Genomic Prediction

Prediction Accuracy

Regularization Parameter

Selection Index

Prediction Method

Training Data

Breeding Value

Data Set

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous training data set may not lead to optimal prediction accuracy. Some studies have tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which its prediction is derived. The methodology we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used in high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); G-BLUP (the prediction method most commonly used in plant and animal breeding) is the special case λ = 0. In this study, we present the methodology and demonstrate, using two wheat data sets with phenotypes collected in ten different environments, that the SSI can achieve significant gains in prediction accuracy (between 5% and 10%) relative to G-BLUP.
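One way to see how the index reduces to G-BLUP is to write it as a penalized quadratic problem. The sketch below assumes the formulation β_i = argmin ½β'(G_TT + θI)β − G_Ti'β + λ‖β‖₁. Here G_TT is the genomic relationship matrix among training individuals, G_Ti holds relationships between the training individuals and prediction candidate i, and θ is a residual-to-genetic variance ratio. This formulation is an assumption here and should be checked against the paper. The sketch solves the problem by cyclic coordinate descent with soft thresholding. With λ = 0 the weights equal (G_TT + θI)⁻¹G_Ti, the G-BLUP selection-index weights. Larger λ sets more weights exactly to zero, so only a few support points contribute. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def soft_threshold(x: float, lam: float) -> float:
    """Shrink x toward zero by lam; values within [-lam, lam] become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def ssi_weights(
    g_train: Matrix,
    g_target: Sequence[float],
    theta: float,
    lam: float,
    max_sweeps: int = 1000,
    tol: float = 1e-12,
) -> List[float]:
    """Coordinate descent for min 0.5*b'(G + theta*I)b - g'b + lam*||b||_1."""
    n = len(g_train)
    # A = G_TT + theta * I (positive definite when theta > 0 and G is PSD).
    a = [[g_train[r][c] + (theta if r == c else 0.0) for c in range(n)] for r in range(n)]
    b = [0.0] * n
    for _ in range(max_sweeps):
        largest_step = 0.0
        for j in range(n):
            # Partial residual: linear term minus contributions of the other weights.
            partial = g_target[j] - sum(a[j][k] * b[k] for k in range(n) if k != j)
            new_bj = soft_threshold(partial, lam) / a[j][j]
            largest_step = max(largest_step, abs(new_bj - b[j]))
            b[j] = new_bj
        if largest_step < tol:
            break
    return b


def ssi_predict(weights: Sequence[float], y_train: Sequence[float]) -> float:
    """Selection-index prediction: weighted sum of (centred) training phenotypes."""
    return sum(w * y for w, y in zip(weights, y_train))
```

For example, with `G = [[1.0, 0.5], [0.5, 1.0]]`, `g = [0.8, 0.3]` and `theta=1.0`, `lam=0.0` returns the G-BLUP weights (G + I)⁻¹g ≈ [0.387, 0.053]. With `lam=0.2` the second weight becomes exactly zero (≈ [0.3, 0.0]), so that training individual is no longer a support point for this candidate.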


Investigating the impact of input variable selection on daily solar radiation prediction accuracy using data-driven models: a case study in northern Iran

Stochastic Environmental Research and Risk Assessment

10.1007/s00477-021-02070-5

2021

Author(s):

Mohammad Sina Jahangir

Seyed Mostafa Biazar

David Hah

John Quilty

Mohammad Isazadeh

Keyword(s):

Solar Radiation

Variable Selection

Prediction Accuracy

Data Driven

Northern Iran

Using Data

The Impact


Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship

PLoS ONE

10.1371/journal.pone.0189775

2017

Vol 12 (12)

pp. e0189775

Cited By ~ 24

Author(s):

S. Hong Lee

Sam Clark

Julius H. J. van der Werf

Keyword(s):

Genomic Prediction

Prediction Accuracy


PSXII-22 Genomic prediction accuracy for feed efficiency related traits using different pseudo-phenotypes, prediction and validation methods in Nellore cattle

Journal of Animal Science

10.1093/jas/skaa278.446

2020

Vol 98 (Supplement_4)

pp. 245-246

Author(s):

Cláudio U Magnabosco

Fernando Lopes

Valentina Magnabosco

Raysildo Lobo

Leticia Pereira

...

Keyword(s):

Body Weight

Weight Gain

Genomic Prediction

Feed Efficiency

Prediction Accuracy

Body Weight Gain

Prediction Methods

Genomic Breeding

Validation Population

Nellore Cattle

Abstract The aim of the study was to evaluate prediction methods, validation approaches, and pseudo-phenotypes for predicting the genomic breeding values of feed-efficiency-related traits in Nellore cattle. Phenotypic and genotypic information from 4,329 and 3,594 animals, respectively, was used. The animals were tested for residual feed intake (RFI), dry matter intake (DMI), feed efficiency (FE), feed conversion ratio (FCR), residual body weight gain (RG), and residual intake and body weight gain (RIG). Six prediction methods were used: ssGBLUP, BayesA, BayesB, BayesCπ, BLASSO, and BayesR. Three validation approaches were used: 1) random: the data were randomly divided into ten subsets, and each subset was used in turn for validation; 2) age: animals were divided into training (born 2010 to 2016) and validation (born 2017) populations based on year of birth; 3) estimated breeding value (EBV) accuracy: the training population comprised animals with EBV accuracy above 0.45, and the validation population those below 0.45. We evaluated the accuracy and bias of genomic estimated breeding values (GEBV). GEBV accuracy was highest when predictions were obtained with ssGBLUP (0.05 to 0.31) (Figure 1). The low heritabilities, mainly for FE (0.07 ± 0.03) and FCR (0.09 ± 0.03), limited GEBV accuracy, which ranged from low to moderate. Regression coefficient estimates were close to 1 and similar across prediction methods, validation approaches, and pseudo-phenotypes. Cross-validation produced the most accurate predictions, ranging from 0.07 to 0.037. Prediction accuracy was higher for phenotypes adjusted for fixed effects than for EBV and deregressed EBV (30.0% and 34.3%, respectively). Genomic prediction can provide reliable estimates of genomic breeding values for RFI, DMI, RG, and RIG, suggesting that these traits may achieve greater genetic gain than FE and FCR.
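The three validation approaches and the two evaluation criteria can be sketched as below. This is a minimal illustration under several assumptions, and the following details are all hypothetical:

- animal identifiers;
- the dictionaries of birth years and EBV accuracies;
- the random seed;
- the handling of animals exactly at 0.45.

"Accuracy" is computed as the Pearson correlation between GEBV and the validation target; some studies additionally divide it by the square root of heritability. "Bias" is computed as the slope of the regression of the target on GEBV, which the abstract reports as close to 1.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def random_folds(ids: Sequence[str], k: int = 10, seed: int = 2020) -> List[List[str]]:
    """Approach 1: shuffle animals and deal them into k validation subsets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def split_by_birth_year(
    birth_year: Dict[str, int], train_years=(2010, 2016), validation_year=2017
) -> Tuple[List[str], List[str]]:
    """Approach 2: older animals train the model, the youngest cohort validates it."""
    first, last = train_years
    train = [a for a, yr in birth_year.items() if first <= yr <= last]
    valid = [a for a, yr in birth_year.items() if yr == validation_year]
    return train, valid


def split_by_ebv_accuracy(
    ebv_accuracy: Dict[str, float], threshold: float = 0.45
) -> Tuple[List[str], List[str]]:
    """Approach 3: well-evaluated animals train, poorly evaluated ones validate.

    Animals exactly at the threshold go to validation (an assumption; the
    abstract only says 'above' and 'below' 0.45).
    """
    train = [a for a, acc in ebv_accuracy.items() if acc > threshold]
    valid = [a for a, acc in ebv_accuracy.items() if acc <= threshold]
    return train, valid


def accuracy_and_bias(gebv: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation and slope of the regression of target on GEBV.

    A slope near 1 indicates little inflation or deflation of the GEBV.
    """
    mx, my = mean(gebv), mean(target)
    sxy = sum((x - mx) * (y - my) for x, y in zip(gebv, target))
    sxx = sum((x - mx) ** 2 for x in gebv)
    syy = sum((y - my) ** 2 for y in target)
    return sxy / (sxx * syy) ** 0.5, sxy / sxx
```

For instance, `accuracy_and_bias([1, 2, 3], [2, 4, 6])` gives a correlation of 1.0 but a slope of 2.0. The ranking is perfect while the GEBV are deflated, which is why the two criteria are reported separately.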


Increased genomic prediction accuracy in wheat breeding using a large Australian panel

Theoretical and Applied Genetics

10.1007/s00122-017-2975-4

2017

Vol 130 (12)

pp. 2543-2555

Cited By ~ 12

Author(s):

Adam Norman

Julian Taylor

Emi Tanaka

Paul Telfer

James Edwards

...

Keyword(s):

Genomic Prediction

Prediction Accuracy

Wheat Breeding


The effects of training population design on genomic prediction accuracy in wheat

Theoretical and Applied Genetics

10.1007/s00122-019-03327-y

2019

Cited By ~ 16

Author(s):

Stefan McKinnon Edwards

Jaap B. Buntjer

Robert Jackson

Alison R. Bentley

Jacob Lage

...

Keyword(s):

Genomic Prediction

Prediction Accuracy

Training Population

Population Design


Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods

10.1101/595397

2019

Author(s):

Daniel Runcie

Hao Cheng

Keyword(s):

Genomic Prediction

Prediction Accuracy

Cross Validation

Prediction Models

Selection Index

Parametric Method

Multiple Traits

Gold Standard Method

Secondary Traits

Validation Strategy

ABSTRACT Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased. This bias can lead to sub-optimal choices between single- and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method, which we call CV2*, that validates model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
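A toy simulation shows why naive cross-validation can mislead in this setting. Suppose a secondary trait measured on the test individuals shares non-genetic (residual) variation with the focal trait. Correlating predictions with the observed focal phenotypes then rewards that shared noise. The setup below is hypothetical and deliberately extreme: the secondary trait carries no genetic signal at all. It illustrates the bias, not any of the three remedies proposed in the paper. Validating against focal-trait measurements of related individuals, as in CV2*, avoids this particular shared residual.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def naive_vs_real_accuracy(n: int = 5000, seed: int = 7) -> Tuple[float, float]:
    """Toy simulation of the pitfall; all variances are hypothetical."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true genetic values (focal trait)
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]   # focal-trait residuals
    focal = [gi + ei for gi, ei in zip(g, e)]      # observed focal phenotypes
    # Secondary trait measured on the same test individuals: it shares the
    # focal residual but carries no genetic signal at all.
    secondary = [ei + rng.gauss(0.0, 1.0) for ei in e]
    prediction = secondary  # a model that leans entirely on the secondary trait
    naive = pearson(prediction, focal)  # what naive cross-validation reports
    real = pearson(prediction, g)       # correlation with the genetic value
    return naive, real
```

With these settings the naive estimate is close to 0.5, whereas the correlation with the true genetic values is close to 0. Naive cross-validation would therefore favour a model that is useless for selection.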


Persistency of Prediction Accuracy and Genetic Gain in Synthetic Populations Under Recurrent Genomic Selection

G3 Genes|Genome|Genetics

10.1534/g3.116.036582

2017

Vol 7 (3)

pp. 801-811

Cited By ~ 16

Author(s):

Dominik Müller

Pascal Schopp

Albrecht E. Melchinger

Keyword(s):

Genomic Selection

Genetic Gain

Prediction Accuracy

Synthetic Populations


Natural variation and genomic prediction of growth, physiological traits, and nitrogen-use efficiency in perennial ryegrass under low-nitrogen stress

Journal of Experimental Botany

10.1093/jxb/eraa388

2020

Vol 71 (20)

pp. 6670-6683

Author(s):

Xiongwei Zhao

Gang Nie

Yanyu Yao

Zhongjie Ji

Jianhua Gao

...

Keyword(s):

Perennial Ryegrass

Nitrogen Use Efficiency

Ridge Regression

Genomic Prediction

Prediction Accuracy

Prediction Models

Physiological Traits

Grass Species

Nitrogen Use

Use Efficiency

Abstract Genomic prediction of nitrogen-use efficiency (NUE) has not previously been studied in perennial grass species exposed to low-N stress. Here, we conducted genomic prediction of physiological traits and NUE in 184 global accessions of perennial ryegrass (Lolium perenne) in response to a normal (7.5 mM) and a low (0.75 mM) N supply. After 21 d of treatment under greenhouse conditions, we observed significant variation among accessions, more pronounced under low-N stress, in the following traits: plant height increment (ΔHT), leaf fresh weight (LFW), leaf dry weight (LDW), chlorophyll index (Chl), chlorophyll fluorescence, leaf N and carbon (C) contents, C/N ratio, and NUE. Six genomic prediction models were applied to the data: the Bayesian method Bayes C, Bayesian LASSO, Bayesian Ridge Regression, Ridge Regression-Best Linear Unbiased Prediction, Reproducing Kernel Hilbert Spaces, and randomForest. These models produced similar prediction accuracies within the normal or low-N treatment, but accuracy differed between the two treatments. ΔHT, LFW, LDW, and C were predicted slightly better under normal N (mean Pearson r = 0.26) than under low N (r = 0.22). In contrast, prediction accuracies for Chl, N, C/N, and NUE were significantly improved under low-N stress (mean r = 0.45, compared with r = 0.26 under normal N). The panel comprised three subpopulations, and this population structure generally had no effect on prediction accuracy. The moderate prediction accuracies obtained for N, C, and NUE under low-N stress are promising and suggest a feasible means of initially assessing germplasm for more detailed studies in breeding programs.
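Of the six models listed, RR-BLUP is the simplest to write down. Marker effects are estimated by ridge regression, β̂ = (Z'Z + λI)⁻¹Z'(y − ȳ), where Z holds centred marker codes. The sketch below is a minimal pure-Python illustration, not the implementation used in the study. Here the shrinkage parameter `lam` is supplied by hand. In RR-BLUP it corresponds to the ratio of residual to marker-effect variance, which is usually estimated by REML. The function names `solve` and `rr_blup` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, rhs: Sequence[float]) -> List[float]:
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rr_blup(
    markers: Sequence[Sequence[float]], y: Sequence[float], lam: float
) -> Tuple[List[float], Callable[[Sequence[float]], float]]:
    """Ridge-regression marker effects and a predictor for new genotypes."""
    n, p = len(markers), len(markers[0])
    mu = sum(y) / n
    col_means = [sum(row[j] for row in markers) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in markers]
    # Normal equations (Z'Z + lam * I) beta = Z'(y - mu)
    lhs = [
        [sum(z[i][a] * z[i][b] for i in range(n)) + (lam if a == b else 0.0) for b in range(p)]
        for a in range(p)
    ]
    rhs = [sum(z[i][a] * (y[i] - mu) for i in range(n)) for a in range(p)]
    beta = solve(lhs, rhs)

    def predict(genotype: Sequence[float]) -> float:
        # Prediction = mean phenotype + centred genotype times marker effects.
        return mu + sum((genotype[j] - col_means[j]) * beta[j] for j in range(p))

    return beta, predict
```

With `lam = 0` this reduces to ordinary least squares on centred markers. Increasing `lam` shrinks all marker effects toward zero, which is what allows RR-BLUP to fit far more markers than individuals.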
