Increased genomic prediction accuracy in wheat breeding using a large Australian panel

Anther extrusion (AE) is the most important male floral trait for hybrid wheat seed production. AE is a complex quantitative trait that is difficult to phenotype reliably in field experiments, not only because of high genotype-by-environment effects but also because of its short expression window under field conditions. In this study, we conducted a genome-wide association scan (GWAS) and explored the possibility of applying genomic prediction (GP) for AE in the CIMMYT hybrid wheat breeding program. An elite set of male lines (n = 603) was phenotyped for anther count (AC) and anther visual score (VS) across three field experiments in 2017–2019 and genotyped with the 20K Infinium iSelect SNP array. GWAS produced five marker-trait associations with small effects. For GP, the main effects of lines (L), environment (E), genomic (G) and pedigree relationships (A), and their interaction effects with environments were used to develop seven statistical models of incremental complexity. The base model used only L and E, whereas the most complex model included L, E, G, A, and G × E and A × E. These models were evaluated in three cross-validation scenarios (CV0, CV1, and CV2). In cross-validation CV0, data from two environments were used to predict an untested environment; in random cross-validation CV1, the test set was never evaluated in any environment; and in CV2, the genotypes in the test set were evaluated in only a subset of environments. The prediction accuracies ranged from −0.03 to 0.74 for AC and −0.01 to 0.54 for VS across different models and CV schemes. For both traits, the highest prediction accuracies with low variance were observed in CV2, and inclusion of the interaction effects increased prediction accuracy for AC only. In CV0, the prediction accuracy was 0.73 and 0.45 for AC and VS, respectively, indicating the high reliability of across-environment prediction. Genomic prediction appears to be a very reliable tool for AE in hybrid wheat breeding. Moreover, high prediction accuracy in CV0 demonstrates the possibility of implementing genomic selection across breeding cycles in related germplasm, aiding the rapid breeding cycle.
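
The three cross-validation scenarios described in this abstract differ only in which line-by-environment records are masked. The following is a minimal Python sketch of how such train/test masks could be built; the function names, the toy layout of 6 lines in 3 environments, and the choice to mask exactly one environment per test line in CV2 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cv0_split(envs, test_env):
    """CV0: all records from one environment are held out."""
    test = envs == test_env
    return ~test, test

def cv1_split(lines, test_lines):
    """CV1: test lines have no records in any environment."""
    test = np.isin(lines, test_lines)
    return ~test, test

def cv2_split(lines, envs, test_lines, rng):
    """CV2: each test line is masked in one randomly chosen environment
    (an assumption of this sketch) and stays in training elsewhere."""
    test = np.zeros(len(lines), dtype=bool)
    for line in test_lines:
        idx = np.flatnonzero(lines == line)
        test[rng.choice(idx)] = True
    return ~test, test

# Hypothetical balanced toy data: 6 lines, each observed in 3 environments.
lines = np.repeat(np.arange(6), 3)
envs = np.tile(np.array(["E1", "E2", "E3"]), 6)
rng = np.random.default_rng(0)

train0, test0 = cv0_split(envs, "E3")
train1, test1 = cv1_split(lines, np.array([0, 1]))
train2, test2 = cv2_split(lines, envs, np.array([0, 1]), rng)
```

A prediction model would then be fitted on the records selected by the training mask and scored on those selected by the test mask.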

Optimal breeding-value prediction using a Sparse Selection Index

Genetics ◽

10.1093/genetics/iyab030 ◽

2021 ◽

Author(s):

Marco Lopez-Cruz ◽

Gustavo de los Campos

Keyword(s):

Sample Size ◽

Dna Sequences ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Regularization Parameter ◽

Selection Index ◽

Prediction Method ◽

Training Data ◽

Breeding Value ◽

Data Set

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.
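
To make the role of the regularization parameter concrete, here is a minimal Python sketch of an L1-penalized selection index for one prediction individual, solved by cyclic coordinate descent. It assumes the index weights b minimize 0.5·b'Pb − g'b + λ‖b‖₁, where P is the training genomic relationship matrix plus a variance-ratio term on the diagonal and g holds the relationships between the training set and the target individual. The simulated markers, the variance ratio `theta`, and all function names are hypothetical; this is not the authors' software.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the L1 proximal step)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def sparse_index_weights(P, g, lam, n_iter=1000, tol=1e-12):
    """Minimise 0.5*b'Pb - g'b + lam*||b||_1 by cyclic coordinate descent."""
    b = np.zeros(len(g))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(g)):
            # Gradient information for coordinate j with b[j] removed.
            r = g[j] - P[j] @ b + P[j, j] * b[j]
            new = soft_threshold(r, lam) / P[j, j]
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return b

# Hypothetical toy data: 25 individuals, 300 markers coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.choice([0.0, 1.0, 2.0], size=(25, 300))
Z = X - X.mean(axis=0)
G = Z @ Z.T / X.shape[1]            # simple genomic relationship matrix

trn, tst = np.arange(20), 20        # 20 training individuals, one target
theta = 1.0                         # assumed residual-to-genetic variance ratio
P = G[np.ix_(trn, trn)] + theta * np.eye(len(trn))
g = G[trn, tst]

b_gblup = sparse_index_weights(P, g, lam=0.0)               # dense weights
b_sparse = sparse_index_weights(P, g, lam=0.05)             # some weights exactly zero
b_empty = sparse_index_weights(P, g, lam=np.abs(g).max())   # no support points

# Predicted genetic value of the target from centered training phenotypes.
y_trn = rng.normal(size=len(trn))
u_hat = b_sparse @ (y_trn - y_trn.mean())
```

With λ = 0 the weights coincide with the closed-form solution P⁻¹g, which corresponds to the G-BLUP special case mentioned in the abstract; larger λ leaves fewer training individuals with non-zero weight.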

Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship

PLoS ONE ◽

10.1371/journal.pone.0189775 ◽

2017 ◽

Vol 12 (12) ◽

pp. e0189775 ◽

Cited By ~ 24

Author(s):

S. Hong Lee ◽

Sam Clark ◽

Julius H. J. van der Werf

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy

PSXII-22 Genomic prediction accuracy for feed efficiency related traits using different pseudo-phenotypes, prediction and validation methods in Nellore cattle

Journal of Animal Science ◽

10.1093/jas/skaa278.446 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 245-246

Author(s):

Cláudio U Magnabosco ◽

Fernando Lopes ◽

Valentina Magnabosco ◽

Raysildo Lobo ◽

Leticia Pereira ◽

...

Keyword(s):

Body Weight ◽

Weight Gain ◽

Genomic Prediction ◽

Feed Efficiency ◽

Prediction Accuracy ◽

Body Weight Gain ◽

Prediction Methods ◽

Genomic Breeding ◽

Validation Population ◽

Nellore Cattle

Abstract The aim of the study was to evaluate prediction methods, validation approaches and pseudo-phenotypes for the prediction of the genomic breeding values of feed efficiency related traits in Nellore cattle. It used the phenotypic and genotypic information of 4,329 and 3,594 animals, respectively, which were tested for residual feed intake (RFI), dry matter intake (DMI), feed efficiency (FE), feed conversion ratio (FCR), residual body weight gain (RG), and residual intake and body weight gain (RIG). Six prediction methods were used: ssGBLUP, BayesA, BayesB, BayesCπ, BLASSO, and BayesR. Three validation approaches were used: 1) random: the data were randomly divided into ten subsets and each subset was used in turn for validation; 2) age: the division into training (2010 to 2016) and validation (2017) populations was based on year of birth; 3) estimated breeding value (EBV) accuracy: animals with accuracy above 0.45 formed the training population and those below 0.45 formed the validation population. We checked the accuracy and bias of the genomic estimated breeding values (GEBV). The results showed that GEBV accuracy was highest when the prediction was obtained with ssGBLUP (0.05 to 0.31) (Figure 1). The low heritability obtained, mainly for FE (0.07 ± 0.03) and FCR (0.09 ± 0.03), limited the GEBV accuracy, which ranged from low to moderate. The regression coefficient estimates were close to 1, and similar between the prediction methods, validation approaches, and pseudo-phenotypes. The cross-validation gave the most accurate predictions, ranging from 0.07 to 0.037. The prediction accuracy was higher for phenotypes adjusted for fixed effects than for EBV and deregressed EBV (30.0 and 34.3%, respectively). Genomic prediction can provide a reliable estimate of genomic breeding values for RFI, DMI, RG and RIG, suggesting that those traits may achieve higher genetic gain than FE and FCR.
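
The accuracy and bias statistics mentioned in this abstract are commonly computed as the correlation between GEBV and the pseudo-phenotype in the validation set and as the slope of the regression of the pseudo-phenotype on GEBV, where a slope near 1 indicates little inflation or deflation. The Python sketch below illustrates that convention together with the age-based and EBV-accuracy-based splits; whether the authors used exactly these definitions is an assumption, and all names and simulated values are hypothetical.

```python
import numpy as np

def split_by_birth_year(years, last_train_year=2016):
    """Age validation: animals born up to last_train_year form the training set."""
    train = years <= last_train_year
    return train, ~train

def split_by_ebv_accuracy(ebv_acc, threshold=0.45):
    """EBV-accuracy validation: animals above the threshold form the training set.
    Animals exactly at the threshold go to validation (an arbitrary choice here)."""
    train = ebv_acc > threshold
    return train, ~train

def accuracy_and_slope(y_val, gebv):
    """Pearson correlation and slope of the regression of y_val on gebv."""
    acc = np.corrcoef(y_val, gebv)[0, 1]
    slope = np.cov(y_val, gebv)[0, 1] / np.var(gebv, ddof=1)
    return acc, slope

# Hypothetical simulated animals.
rng = np.random.default_rng(2)
years = rng.integers(2010, 2018, size=200)       # birth years 2010..2017
ebv_acc = rng.uniform(0.2, 0.8, size=200)
gebv = rng.normal(size=200)
pseudo_pheno = gebv + rng.normal(scale=2.0, size=200)

train_age, val_age = split_by_birth_year(years)
train_acc, val_acc = split_by_ebv_accuracy(ebv_acc)
acc, slope = accuracy_and_slope(pseudo_pheno[val_age], gebv[val_age])
```

In a real analysis the GEBV for the validation animals would come from a model fitted on the training animals only; here they are simulated purely to exercise the functions.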

The effects of training population design on genomic prediction accuracy in wheat

Theoretical and Applied Genetics ◽

10.1007/s00122-019-03327-y ◽

2019 ◽

Cited By ~ 16

Author(s):

Stefan McKinnon Edwards ◽

Jaap B. Buntjer ◽

Robert Jackson ◽

Alison R. Bentley ◽

Jacob Lage ◽

...

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Training Population ◽

Population Design

Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods

10.1101/595397 ◽

2019 ◽

Author(s):

Daniel Runcie ◽

Hao Cheng

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Cross Validation ◽

Prediction Models ◽

Selection Index ◽

Parametric Method ◽

Multiple Traits ◽

Gold Standard Method ◽

Secondary Traits ◽

Validation Strategy

Abstract Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.

Natural variation and genomic prediction of growth, physiological traits, and nitrogen-use efficiency in perennial ryegrass under low-nitrogen stress

Journal of Experimental Botany ◽

10.1093/jxb/eraa388 ◽

2020 ◽

Vol 71 (20) ◽

pp. 6670-6683

Author(s):

Xiongwei Zhao ◽

Gang Nie ◽

Yanyu Yao ◽

Zhongjie Ji ◽

Jianhua Gao ◽

...

Keyword(s):

Perennial Ryegrass ◽

Nitrogen Use Efficiency ◽

Ridge Regression ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Physiological Traits ◽

Grass Species ◽

Nitrogen Use ◽

Use Efficiency

Abstract Genomic prediction of nitrogen-use efficiency (NUE) has not previously been studied in perennial grass species exposed to low-N stress. Here, we conducted a genomic prediction of physiological traits and NUE in 184 global accessions of perennial ryegrass (Lolium perenne) in response to a normal (7.5 mM) and low (0.75 mM) supply of N. After 21 d of treatment under greenhouse conditions, significant variations in plant height increment (ΔHT), leaf fresh weight (LFW), leaf dry weight (LDW), chlorophyll index (Chl), chlorophyll fluorescence, leaf N and carbon (C) contents, C/N ratio, and NUE were observed among accessions, but to a greater extent under low-N stress. Six genomic prediction models were applied to the data, namely the Bayesian method Bayes C, Bayesian LASSO, Bayesian Ridge Regression, Ridge Regression-Best Linear Unbiased Prediction, Reproducing Kernel Hilbert Spaces, and randomForest. These models produced similar prediction accuracy of traits within the normal or low-N treatments, but the accuracy differed between the two treatments. ΔHT, LFW, LDW, and C were predicted slightly better under normal N with a mean Pearson r-value of 0.26, compared with r=0.22 under low N, while the prediction accuracies for Chl, N, C/N, and NUE were significantly improved under low-N stress with a mean r=0.45, compared with r=0.26 under normal N. The population panel contained three population structures, which generally had no effect on prediction accuracy. The moderate prediction accuracies obtained for N, C, and NUE under low-N stress are promising, and suggest a feasible means by which germplasm might be initially assessed for further detailed studies in breeding programs.
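
As an illustration of how a Pearson r prediction accuracy of this kind can be obtained, the following Python sketch fits a ridge regression on marker genotypes and scores it by k-fold cross-validation. The fixed penalty `lam`, the simulated genotypes and trait, and the function names are assumptions made for the example; in RR-BLUP the penalty is normally derived from estimated variance components, and this is not the pipeline used in the study.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression on centered markers; returns intercept and marker effects."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - y.mean()))
    intercept = y.mean() - x_mean @ beta
    return intercept, beta

def cv_accuracy(X, y, lam, k, rng):
    """Pearson r between observed values and k-fold cross-validated predictions."""
    folds = rng.permutation(len(y)) % k
    pred = np.empty(len(y))
    for f in range(k):
        tst = folds == f
        b0, beta = ridge_fit(X[~tst], y[~tst], lam)
        pred[tst] = b0 + X[tst] @ beta
    return np.corrcoef(y, pred)[0, 1]

# Hypothetical toy data: 184 accessions, 500 markers coded 0/1/2.
rng = np.random.default_rng(3)
X = rng.choice([0.0, 1.0, 2.0], size=(184, 500))
effects = rng.normal(scale=0.1, size=500)
y = X @ effects + rng.normal(size=184)

r = cv_accuracy(X, y, lam=500.0, k=5, rng=rng)
```

Repeating the cross-validation over several random fold assignments and averaging r gives a more stable estimate than a single run.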

A study of Genomic Prediction across Generations of Two Korean Pig Populations

Animals ◽

10.3390/ani9090672 ◽

2019 ◽

Vol 9 (9) ◽

pp. 672 ◽

Cited By ~ 1

Author(s):

Beatriz Castro Dias Cuyabano ◽

Hanna Wackel ◽

Donghyun Shin ◽

Cedric Gondro

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Reference Population ◽

Relevant Information ◽

Production Traits ◽

Genomic Evaluation ◽

Genomic Breeding ◽

Breeding Values ◽

The Relationship ◽

Dense Marker

Genomic models that incorporate dense marker information have been widely used for predicting genomic breeding values since they were first introduced, and it is known that the relationship between individuals in the reference population and selection candidates affects the prediction accuracy. When genomic evaluation is performed over generations of the same population, prediction accuracy is expected to decay if the reference population is not updated. Therefore, the reference population must be updated in each generation, but little is known about the optimal way to do it. This study presents an empirical assessment of the prediction accuracy of genomic breeding values of production traits, across five generations in two Korean pig breeds. We verified the decay in prediction accuracy over time when the reference population was not updated. Additionally we compared the prediction accuracy using only the previous generation as the reference population, as opposed to using all previous generations as the reference population. Overall, the results suggested that, although there is a clear need to continuously update the reference population, it may not be necessary to keep all ancestral genotypes. Finally, comprehending how the accuracy of genomic prediction evolves over generations within a population adds relevant information to improve the performance of genomic selection.

Genomic prediction based on selected variants from imputed whole-genome sequence data in Australian sheep populations

Genetics Selection Evolution ◽

10.1186/s12711-019-0514-2 ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 6

Author(s):

Nasir Moghaddar ◽

Majid Khansefid ◽

Julius H. J. van der Werf ◽

Sunduimijid Bolormaa ◽

Naomi Duijvesteijn ◽

...

Keyword(s):

Genome Sequence ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Sequence Data ◽

Whole Genome Sequence ◽

Sequence Variants ◽

Whole Genome ◽

Absolute Increase ◽

Genome Sequence Data ◽

Australian Sheep

Abstract Background Whole-genome sequence (WGS) data could contain information on genetic variants at or in high linkage disequilibrium with causative mutations that underlie the genetic variation of polygenic traits. Thus far, genomic prediction accuracy has shown limited increase when using such information in dairy cattle studies, in which one or few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction in a multi-breed Australian sheep population of relatively less related target individuals, when using information on imputed WGS genotypes. Methods Between 9626 and 26,657 animals with phenotypes were available for nine economically important sheep production traits and all had WGS imputed genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphisms (SNPs) based on a genome-wide association study (GWAS) and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using selected variants from imputed sequence data was compared to that using a standard array of 50k SNP genotypes, thereby comparing genomic best linear unbiased prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations that were each lowly related to the training set, one being purebred Merino and the other crossbred Border Leicester x Merino sheep. Results A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes as a separate variance component in GBLUP (2GBLUP) or in Bayesian analysis as a separate category of SNPs (BayesRC). From an average accuracy of 0.27 in both validation sets for the 50k array, the average absolute increase in accuracy across traits with 2GBLUP was 0.083 and 0.073 for purebred and crossbred animals, respectively, whereas with BayesRC it was 0.102 and 0.087. The average gain in accuracy was smaller when selected sequence variants were treated in the same category as 50k SNPs. Very little improvement over 50k prediction was observed when using all WGS variants. Conclusions Accuracy of genomic prediction in diverse sheep populations increased substantially by using variants selected from whole-genome sequence data based on an independent multi-breed GWAS, when compared to genomic prediction using standard 50k genotypes.
