Genetic architecture and genomic prediction accuracy of apple quantitative traits across environments

Implementation of genomic tools is desirable to increase the efficiency of apple breeding. The apple reference population (apple REFPOP) proved useful for rediscovering loci, estimating genomic prediction accuracy, and studying genotype by environment interactions (GxE). Here we show contrasting genetic architecture and genomic prediction accuracies for 30 quantitative traits across up to six European locations using the apple REFPOP. A total of 59 stable and 277 location-specific associations were found using GWAS, 69.2% of which are novel when compared with 41 reviewed publications. Average genomic prediction accuracies of 0.18-0.88 were estimated using single-environment univariate, single-environment multivariate, multi-environment univariate, and multi-environment multivariate models. The GxE accounted for up to 24% of the phenotypic variability. This most comprehensive genomic study in apple in terms of trait-environment combinations provided knowledge of trait biology and prediction models that can be readily applied for marker-assisted or genomic selection, thus facilitating increased breeding efficiency.

Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods

10.1101/595397 ◽

2019 ◽

Author(s):

Daniel Runcie ◽

Hao Cheng

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Cross Validation ◽

Prediction Models ◽

Selection Index ◽

Parametric Method ◽

Multiple Traits ◽

Gold Standard Method ◽

Secondary Traits ◽

Validation Strategy

ABSTRACT Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
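
The bias described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes that the problem arises when the focal and secondary traits share non-genetic variation, and it uses a deliberately crude "prediction" (each test individual's own secondary-trait value). The function names, variances, and the residual correlation `rho_e` are all hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho_e, n=20000, seed=1):
    """Return (naive CV estimate, true accuracy) for one toy scenario.

    Hypothetical setup: both traits share the same genetic value g
    (variance 1) and have residuals of variance 1 whose correlation
    is rho_e. The prediction for a test individual is simply its own
    secondary-trait measurement.
    """
    rng = random.Random(seed)
    g, focal, secondary = [], [], []
    for _ in range(n):
        gi = rng.gauss(0, 1)
        e1 = rng.gauss(0, 1)
        # Second residual constructed to have correlation rho_e with e1.
        e2 = rho_e * e1 + math.sqrt(1 - rho_e ** 2) * rng.gauss(0, 1)
        g.append(gi)
        focal.append(gi + e1)
        secondary.append(gi + e2)
    h = math.sqrt(0.5)  # square root of the focal-trait heritability
    naive = pearson(secondary, focal) / h  # validated against phenotypes
    true = pearson(secondary, g)           # validated against genetic values
    return naive, true


for rho in (0.0, 0.8):
    naive, true = simulate(rho)
    print(f"rho_e={rho}: naive estimate={naive:.2f}, true accuracy={true:.2f}")
```

In expectation, the two quantities agree when the residuals are uncorrelated, while a residual correlation of 0.8 pushes the naive estimate to about 0.9 / 0.707 ≈ 1.27 against a true accuracy of about 0.71. This is only a toy version of the issue; the abstract's three proposed remedies are not implemented here.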

Natural variation and genomic prediction of growth, physiological traits, and nitrogen-use efficiency in perennial ryegrass under low-nitrogen stress

Journal of Experimental Botany ◽

10.1093/jxb/eraa388 ◽

2020 ◽

Vol 71 (20) ◽

pp. 6670-6683

Author(s):

Xiongwei Zhao ◽

Gang Nie ◽

Yanyu Yao ◽

Zhongjie Ji ◽

Jianhua Gao ◽

...

Keyword(s):

Perennial Ryegrass ◽

Nitrogen Use Efficiency ◽

Ridge Regression ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Physiological Traits ◽

Grass Species ◽

Nitrogen Use ◽

Use Efficiency

Abstract Genomic prediction of nitrogen-use efficiency (NUE) has not previously been studied in perennial grass species exposed to low-N stress. Here, we conducted a genomic prediction of physiological traits and NUE in 184 global accessions of perennial ryegrass (Lolium perenne) in response to a normal (7.5 mM) and low (0.75 mM) supply of N. After 21 d of treatment under greenhouse conditions, significant variations in plant height increment (ΔHT), leaf fresh weight (LFW), leaf dry weight (LDW), chlorophyll index (Chl), chlorophyll fluorescence, leaf N and carbon (C) contents, C/N ratio, and NUE were observed among accessions, but to a greater extent under low-N stress. Six genomic prediction models were applied to the data, namely the Bayesian method Bayes C, Bayesian LASSO, Bayesian Ridge Regression, Ridge Regression-Best Linear Unbiased Prediction, Reproducing Kernel Hilbert Spaces, and randomForest. These models produced similar prediction accuracy of traits within the normal or low-N treatments, but the accuracy differed between the two treatments. ΔHT, LFW, LDW, and C were predicted slightly better under normal N with a mean Pearson r-value of 0.26, compared with r=0.22 under low N, while the prediction accuracies for Chl, N, C/N, and NUE were significantly improved under low-N stress with a mean r=0.45, compared with r=0.26 under normal N. The population panel contained three population structures, which generally had no effect on prediction accuracy. The moderate prediction accuracies obtained for N, C, and NUE under low-N stress are promising, and suggest a feasible means by which germplasm might be initially assessed for further detailed studies in breeding programs.
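
As a rough illustration of how a ridge-regression model of the RR-BLUP type yields a Pearson r "prediction accuracy", the sketch below fits marker effects on a training subset and correlates predictions with held-out phenotypes. Everything here is an assumption for illustration: the data are simulated, the penalty `lam` is fixed by hand rather than estimated from variance components, and the function names are hypothetical. It does not reproduce the study's analysis.

```python
import numpy as np


def rrblup_fit(X, y, lam):
    """Ridge regression on centered markers.

    Returns (intercept, marker column means, marker effects).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    col_means = X.mean(axis=0)
    Xc = X - col_means
    mu = y.mean()
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mu) for the marker effects.
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - mu))
    return mu, col_means, beta


def rrblup_predict(X, mu, col_means, beta):
    """Predict phenotypes for new genotypes from fitted effects."""
    return mu + (np.asarray(X, dtype=float) - col_means) @ beta


def prediction_accuracy(y_obs, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


# Simulated toy data: 184 individuals, 200 markers coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 184, 200
X = rng.integers(0, 3, size=(n, p))
effects = rng.normal(0, 1, size=p)
g = (X - X.mean(axis=0)) @ effects
y = g + rng.normal(0, g.std(), size=n)  # heritability of roughly 0.5

train, test = np.arange(0, 150), np.arange(150, n)
mu, means, beta = rrblup_fit(X[train], y[train], lam=100.0)  # lam is a hand-picked value
r = prediction_accuracy(y[test], rrblup_predict(X[test], mu, means, beta))
print(f"hold-out Pearson r = {r:.2f}")
```

In practice the penalty is usually tied to the ratio of residual to marker-effect variance, and accuracy is averaged over repeated cross-validation folds rather than a single split.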

A study of Genomic Prediction across Generations of Two Korean Pig Populations

Animals ◽

10.3390/ani9090672 ◽

2019 ◽

Vol 9 (9) ◽

pp. 672 ◽

Cited By ~ 1

Author(s):

Beatriz Castro Dias Cuyabano ◽

Hanna Wackel ◽

Donghyun Shin ◽

Cedric Gondro

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Reference Population ◽

Relevant Information ◽

Production Traits ◽

Genomic Evaluation ◽

Genomic Breeding ◽

Breeding Values ◽

The Relationship ◽

Dense Marker

Genomic models that incorporate dense marker information have been widely used for predicting genomic breeding values since they were first introduced, and it is known that the relationship between individuals in the reference population and selection candidates affects the prediction accuracy. When genomic evaluation is performed over generations of the same population, prediction accuracy is expected to decay if the reference population is not updated. Therefore, the reference population must be updated in each generation, but little is known about the optimal way to do so. This study presents an empirical assessment of the prediction accuracy of genomic breeding values of production traits, across five generations in two Korean pig breeds. We verified the decay in prediction accuracy over time when the reference population was not updated. Additionally, we compared the prediction accuracy using only the previous generation as the reference population, as opposed to using all previous generations as the reference population. Overall, the results suggested that, although there is a clear need to continuously update the reference population, it may not be necessary to keep all ancestral genotypes. Finally, comprehending how the accuracy of genomic prediction evolves over generations within a population adds relevant information to improve the performance of genomic selection.
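
The three reference-population strategies compared in this abstract (not updated, previous generation only, all previous generations) can be expressed as simple selections over generation labels. The sketch below is a hypothetical bookkeeping helper, not the study's pipeline; the function name, the strategy labels, and the toy animal IDs are assumptions.

```python
def reference_sets(generation_of, target_generation, base_generation=1):
    """Return training IDs under three hypothetical reference strategies.

    generation_of maps an animal ID to its generation number.
    - "not_updated":   only animals from base_generation
    - "previous_only": only animals from the generation before the target
    - "all_previous":  every animal born before the target generation
    """
    items = generation_of.items()
    return {
        "not_updated": sorted(a for a, g in items if g == base_generation),
        "previous_only": sorted(a for a, g in items if g == target_generation - 1),
        "all_previous": sorted(a for a, g in items if g < target_generation),
    }


# Toy example with four generations; candidates are in generation 4.
generation_of = {"a1": 1, "a2": 1, "b1": 2, "b2": 2, "c1": 3, "d1": 4}
sets = reference_sets(generation_of, target_generation=4)
for name, ids in sets.items():
    print(name, ids)
```

Each returned set would then be used to train the same prediction model, so that differences in accuracy for the target generation reflect only the choice of reference population.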

Accounting for Group-Specific Allele Effects and Admixture in Genomic Predictions: Theory and Experimental Evaluation in Maize

Genetics ◽

10.1534/genetics.120.303278 ◽

2020 ◽

Vol 216 (1) ◽

pp. 27-41

Author(s):

Simon Rio ◽

Laurence Moreau ◽

Alain Charcosset ◽

Tristan Mary-Huard

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Best Linear Unbiased Prediction ◽

Linear Unbiased Prediction ◽

Modeling Group ◽

A Genome ◽

Specific Allele ◽

Best Linear Unbiased ◽

Unbiased Prediction

Populations structured into genetic groups may display group-specific linkage disequilibrium, mutations, and/or interactions between quantitative trait loci and the genetic background. These factors lead to heterogeneous marker effects affecting the efficiency of genomic prediction, especially for admixed individuals. Such individuals have a genome that is a mosaic of chromosome blocks from different origins, and may be of interest to combine favorable group-specific characteristics. We developed two genomic prediction models adapted to the prediction of admixed individuals in the presence of heterogeneous marker effects: multigroup admixed genomic best linear unbiased prediction random individual (MAGBLUP-RI), modeling the ancestry of alleles; and multigroup admixed genomic best linear unbiased prediction random allele effect (MAGBLUP-RAE), modeling group-specific distributions of allele effects. MAGBLUP-RI can estimate the segregation variance generated by admixture, while MAGBLUP-RAE can disentangle the variability that is due to main allele effects from the variability that is due to group-specific deviation allele effects. Both models were evaluated for their genomic prediction accuracy using a maize panel including lines from the Dent and Flint groups, along with admixed individuals. Based on simulated traits, both models improved genomic prediction accuracy compared to standard GBLUP models. For real traits, a clear gain was observed at low marker densities whereas it became limited at high marker densities. The benefit of including admixed individuals in multigroup training sets was confirmed using simulated traits, but was variable using real traits. Both MAGBLUP models and admixed individuals are of interest whenever group-specific SNP allele effects exist.

Haplotype genomic prediction of phenotypic values based on chromosome distance and gene boundaries using low-coverage sequencing in Duroc pigs

Genetics Selection Evolution ◽

10.1186/s12711-021-00661-y ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Cheng Bian ◽

Dzianis Prakapenka ◽

Cheng Tan ◽

Ruifei Yang ◽

Di Zhu ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Average Daily Gain ◽

Live Weight ◽

Feed Conversion ◽

Muscle Area ◽

Haplotype Blocks ◽

Low Coverage

Abstract Background: Genomic selection using single nucleotide polymorphism (SNP) markers has been widely used for genetic improvement of livestock, but most current methods of genomic selection are based on SNP models. In this study, we investigated the prediction accuracies of haplotype models based on fixed chromosome distances and gene boundaries compared to those of SNP models for genomic prediction of phenotypic values. We also examined the reasons for the successes and failures of haplotype genomic prediction. Methods: We analyzed a swine population of 3195 Duroc boars with records on eight traits: body judging score (BJS), teat number (TN), age (AGW), loin muscle area (LMA), loin muscle depth (LMD) and back fat thickness (BF) at 100 kg live weight, and average daily gain (ADG) and feed conversion rate (FCR) from 30 to 100 kg live weight. Ten-fold validation was used to evaluate the prediction accuracy of each SNP model and each multi-allelic haplotype model based on 488,124 autosomal SNPs from low-coverage sequencing. Haplotype blocks were defined using fixed chromosome distances or gene boundaries. Results: Compared to the best SNP model, the accuracy of predicting phenotypic values using a haplotype model was greater by 7.4% for BJS, 7.1% for AGW, 6.6% for ADG, 4.9% for FCR, 2.7% for LMA, 1.9% for LMD, 1.4% for BF, and 0.3% for TN. The use of gene-based haplotype blocks resulted in the best prediction accuracy for LMA, LMD, and TN. Compared to estimates of SNP additive heritability, estimates of haplotype epistasis heritability were strongly correlated with the increase in prediction accuracy by haplotype models. The increase in prediction accuracy was largest for BJS, AGW, ADG, and FCR, which also had the largest estimates of haplotype epistasis heritability: 24.4% for BJS, 14.3% for AGW, 14.5% for ADG, and 17.7% for FCR. SNP and haplotype heritability profiles across the genome identified several genes with large genetic contributions to phenotypes: NUDT3 for LMA, LMD and BF, VRTN for TN, COL5A2 for BJS, BSND for ADG, and CARTPT for FCR. Conclusions: Haplotype prediction models improved the accuracy for genomic prediction of phenotypes in Duroc pigs. For some traits, the best prediction accuracy was obtained with haplotypes defined using gene regions, which provides evidence that functional genomic information can improve the accuracy of haplotype genomic prediction for certain traits.
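
To make the idea of haplotype blocks "defined using fixed chromosome distances" concrete, the sketch below groups SNPs into consecutive windows of a fixed base-pair length and recodes the alleles inside each block as one multi-allelic marker. The windowing rule (integer division of position by block size), the function names, and the toy data are assumptions; the abstract does not state how block boundaries were actually placed.

```python
from collections import defaultdict


def blocks_by_distance(positions, block_size_bp):
    """Group SNP indices into consecutive windows of block_size_bp base pairs."""
    blocks = defaultdict(list)
    for idx, pos in enumerate(positions):
        blocks[pos // block_size_bp].append(idx)
    # Return blocks ordered along the chromosome; empty windows are skipped.
    return [blocks[key] for key in sorted(blocks)]


def haplotype_alleles(haplotypes, block):
    """Recode the SNP alleles within one block as a multi-allelic marker.

    haplotypes: list of phased haplotypes, each a sequence of 0/1 alleles.
    Returns one integer code per haplotype; identical allele strings
    within the block receive the same code.
    """
    codes = {}
    result = []
    for hap in haplotypes:
        key = tuple(hap[i] for i in block)
        result.append(codes.setdefault(key, len(codes)))
    return result


# Toy chromosome: five SNPs, 1 kb blocks, three phased haplotypes.
positions = [100, 900, 1500, 2100, 2500]
haplotypes = [
    [0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1],
]
blocks = blocks_by_distance(positions, block_size_bp=1000)
for block in blocks:
    print(block, haplotype_alleles(haplotypes, block))
```

Gene-based blocks would follow the same recoding step, with block membership taken from gene start and end coordinates instead of fixed windows.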

Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize

PLoS Genetics ◽

10.1371/journal.pgen.1009568 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1009568

Author(s):

Anju Giri ◽

Merritt Khaipho-Burch ◽

Edward S. Buckler ◽

Guillaume P. Ramstein

Keyword(s):

Genomic Prediction ◽

Complex Traits ◽

Prediction Accuracy ◽

Prediction Models ◽

Cost Effective ◽

Rna Expression ◽

Haplotype Structure ◽

Association Panel ◽

Nested Association Mapping ◽

Mapping Panel

Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE), study its transferability across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis-acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferable across different tissues and populations compared to measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels, a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel), for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy as compared to measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.

Multi-generation genomic prediction of maize yield using parametric and non-parametric sparse selection indices

Heredity ◽

10.1038/s41437-021-00474-1 ◽

2021 ◽

Author(s):

Marco Lopez-Cruz ◽

Yoseph Beyene ◽

Manje Gowda ◽

Jose Crossa ◽

Paulino Pérez-Rodríguez ◽

...

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Selection Index ◽

Maize Yield ◽

Additive Models ◽

Training Data ◽

Training Set ◽

Gaussian Kernels ◽

Non Parametric

Abstract Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulates, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This leads to the question of whether all available data or a subset of it should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not contemplate the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic-BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, here we studied whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that when predicting grain yield the KBLUP outperformed the GBLUP, and that using SSI with additive relationships (GSSI) led to 5–17% increases in accuracy, relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.

Selection of trait-specific markers and multi-environment models improve genomic predictive ability in rice

10.1101/482109 ◽

2018 ◽

Author(s):

Aditi Bhandari ◽

Jérôme Bartholomé ◽

Tuong-Vi Cao ◽

Nilima Kumari ◽

Julien Frouin ◽

...

Keyword(s):

Drought Stress ◽

Genomic Prediction ◽

Complex Traits ◽

Prediction Models ◽

Predictive Ability ◽

Reference Population ◽

Snp Markers ◽

Selection Strategy ◽

Specific Marker ◽

Marker Selection

Abstract Developing high-yielding rice varieties that are tolerant to drought stress is crucial for the sustainable livelihood of rice farmers in rainfed rice cropping ecosystems. Genomic selection (GS) promises to be an effective breeding option for these complex traits. We evaluated the effectiveness of two rather new options in the implementation of GS: trait and environment-specific marker selection and the use of multi-environment prediction models. A reference population of 280 rainfed lowland accessions with 215k SNP marker data was phenotyped under a favorable and two managed drought environments. Trait-specific SNP subsets (28k) were selected for each trait under each environment, using results of GWAS performed with the complete genotype dataset. Performances of single-environment and multi-environment genomic prediction models were compared using kernel regression based methods (GBLUP and RKHS) under two cross-validation scenarios: phenotypic data for the validation set available (CV2) or not available (CV1) in one of the environments. With the most realistic trait-specific marker selection strategy, the predictive ability (PA) of genomic prediction was up to 22% higher than with markers selected on the basis of neutral linkage disequilibrium (LD). Tolerance to drought stress was up to 32% better predicted by multi-environment models (especially RKHS based models) under the CV2 strategy. Under the less favorable CV1 strategy, the multi-environment models achieved PA similar to that of the single-environment predictions. We also showed that reasonable PA could be obtained with as few as 3,000 SNP markers, even in a population of low LD extent, provided marker selection is based on pairwise LD. The implications of these findings for breeding for drought tolerance are discussed. The most resource-sparing option would be accurate phenotyping of the reference population in a favorable environment and under a managed drought, while the candidate population would be phenotyped only under one of those environments.
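
The difference between the CV1 and CV2 scenarios comes down to which phenotype records the model is allowed to see for the validation lines. The sketch below builds such a visibility mask for a lines-by-environments layout. It follows the abstract's one-line definition (phenotypes of the validation set available in another environment under CV2, not available under CV1); the function name, argument names, and toy labels are hypothetical.

```python
def training_mask(lines, environments, validation_lines, target_env, scheme):
    """Return {(line, env): bool} marking phenotypes visible to the model.

    scheme "CV1": validation lines have no visible phenotype in any environment.
    scheme "CV2": validation lines are hidden only in target_env and remain
                  visible in the other environments.
    """
    if scheme not in ("CV1", "CV2"):
        raise ValueError("scheme must be 'CV1' or 'CV2'")
    hidden = set(validation_lines)
    mask = {}
    for line in lines:
        for env in environments:
            if line not in hidden:
                mask[(line, env)] = True          # training line: always visible
            elif scheme == "CV1":
                mask[(line, env)] = False         # never observed anywhere
            else:
                mask[(line, env)] = env != target_env
    return mask


# Toy layout: three lines, two environments, one validation line.
lines = ["L1", "L2", "L3"]
environments = ["favorable", "drought"]
cv1 = training_mask(lines, environments, ["L3"], "drought", "CV1")
cv2 = training_mask(lines, environments, ["L3"], "drought", "CV2")
print("CV1:", cv1[("L3", "favorable")], cv1[("L3", "drought")])
print("CV2:", cv2[("L3", "favorable")], cv2[("L3", "drought")])
```

Under CV2 a multi-environment model can borrow the validation line's record from the other environment, which is consistent with the larger gains the abstract reports for that scenario.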

Genomic Prediction of Two Complex Orthopedic Traits Across Multiple Pure and Mixed Breed Dogs

Frontiers in Genetics ◽

10.3389/fgene.2021.666740 ◽

2021 ◽

Vol 12 ◽

Author(s):

Liping Jiang ◽

Zhuo Li ◽

Jessica J. Hayward ◽

Kei Hayashi ◽

Ursula Krotscheck ◽

...

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Cruciate Ligament ◽

Pearson Correlation ◽

Snp Array ◽

Reference Population ◽

Genotype Data ◽

Dna Array ◽

Cranial Cruciate Ligament ◽

Validation Population

Canine hip dysplasia (CHD) and rupture of the cranial cruciate ligament (RCCL) are two complex inherited orthopedic traits of dogs. These two traits may occur concurrently in the same dog. Genomic prediction of these two diseases would benefit veterinary medicine, dog owners, and dog breeders because of their high prevalence, and because both traits result in painful debilitating osteoarthritis in affected joints. In this study, 842 unique dogs from 6 breeds with hip and stifle phenotypes were genotyped on a customized Illumina high density 183 k single nucleotide polymorphism (SNP) array and also analyzed using an imputed dataset of 20,487,155 SNPs. To implement genomic prediction, two different statistical methods were employed: Genomic Best Linear Unbiased Prediction (GBLUP) and a Bayesian method called BayesC. The cross-validation results showed that the two methods gave similar prediction accuracy (r = 0.3–0.4) for CHD (measured as Norberg angle) and RCCL in the multi-breed population. For CHD, the average correlation of the AUC was 0.71 (BayesC) and 0.70 (GBLUP), which is a medium level of prediction accuracy and consistent with Pearson correlation results. For RCCL, the correlation of the AUC was slightly higher. The prediction accuracy of GBLUP from the imputed genotype data was similar to the accuracy from DNA array data. We demonstrated that the genomic prediction of CHD and RCCL with DNA array genotype data is feasible in a multiple breed population if there is a genetic connection, such as breed, between the reference population and the validation population. Although these traits have a heritability of about one-third, higher accuracy is needed to implement genomic prediction in a natural population, and predicting a complex phenotype will require a much larger number of dogs within a breed and across breeds. It is possible that with higher accuracy, genomic prediction of these orthopedic traits could be implemented in a clinical setting for early diagnosis and treatment, and the selection of dogs for breeding. These results need continuous improvement in model prediction through ongoing genotyping and data sharing. When genomic prediction indicates that a dog is susceptible to one of these orthopedic traits, it should be accompanied by clinical and radiographic screening at an acceptable age with appropriate follow-up.
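
For a case/control trait, the AUC mentioned in this abstract can be read as the probability that a randomly chosen affected dog receives a higher predicted risk than a randomly chosen unaffected dog. The sketch below computes it by direct pairwise comparison. It is a generic illustration with made-up scores and a hypothetical function name, not the study's evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    scores: predicted risk per individual (higher = more likely affected).
    labels: 1 for cases, 0 for controls.
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up predicted risks for two controls and two cases.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values near 0.7 are described in the abstract as a medium level of accuracy.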

Improving Genomic Prediction of Crossbred and Purebred Dairy Cattle

Frontiers in Genetics ◽

10.3389/fgene.2020.598580 ◽

2020 ◽

Vol 11 ◽

Author(s):

Majid Khansefid ◽

Michael E. Goddard ◽

Mekonnen Haile-Mariam ◽

Kon V. Konstantinov ◽

Chris Schrooten ◽

...

Keyword(s):

Genomic Prediction ◽

Milk Fat ◽

Prediction Accuracy ◽

Reference Population ◽

High Density ◽

Crossbred Cows ◽

Mixed Breed ◽

Best Linear Unbiased ◽

Custom Panel ◽

Accuracy Of Prediction

This study assessed the accuracy and bias of genomic prediction (GP) in purebred Holstein (H) and Jersey (J) as well as crossbred (H and J) validation cows using different reference sets and prediction strategies. The reference sets were made up of different combinations of 36,695 H and J purebreds and crossbreds. Additionally, the effect of using different sets of marker genotypes on GP was studied (conventional panel: 50k, custom panel enriched with, or close to, causal mutations: XT_50k, and conventional high-density with a limited custom set: pruned HDnGBS). We also compared the use of genomic best linear unbiased prediction (GBLUP) and Bayesian (emBayesR) models, and the traits tested were milk, fat, and protein yields. On average, by including crossbred cows in the reference population, the prediction accuracies increased by 0.01–0.08 and were less biased (regression coefficient closer to 1 by 0.02–0.16), and the benefit was greater for crossbreds compared to purebreds. The accuracy of prediction increased by 0.02 using XT_50k compared to 50k genotypes without affecting the bias. Although using pruned HDnGBS instead of 50k also increased the prediction accuracy by about 0.02, it increased the bias for purebred predictions in emBayesR models. Generally, emBayesR outperformed GBLUP for prediction accuracy when using 50k or pruned HDnGBS genotypes, but the benefits diminished with XT_50k genotypes. Crossbred predictions derived from a joint pure H and J reference were similar in accuracy to crossbred predictions derived from the two separate purebred reference sets and combined proportional to breed composition. However, the latter approach was less biased by 0.13. Most interestingly, using an equalized breed reference instead of an H-dominated reference, on average, reduced the bias of prediction by 0.16–0.19 and increased the accuracy by 0.04 for crossbred and J cows, with little change in the H accuracy. In conclusion, we observed improved genomic predictions for both crossbreds and purebreds by equalizing breed contributions in a mixed breed reference that included crossbred cows. Furthermore, we demonstrate that, compared to the conventional 50k or high-density panels, our customized set of 50k sequence markers improved or matched the prediction accuracy and reduced bias with both GBLUP and Bayesian models.
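
The abstract measures bias as the regression coefficient of observed values on predictions, with values closer to 1 indicating less bias. A minimal sketch of that statistic is shown below; the function name and the toy numbers are assumptions, and the study's exact definition of the observed values (for example, adjusted phenotypes) is not given in the abstract.

```python
def regression_slope(observed, predicted):
    """Slope of the simple linear regression of observed on predicted.

    A slope of 1 means predictions are on the right scale; a slope
    below 1 means predictions are over-dispersed (differences between
    animals are exaggerated), and a slope above 1 the opposite.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((p - mean_pred) * (o - mean_obs) for o, p in zip(observed, predicted))
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / var_pred


# Toy example: predictions rank animals perfectly but are twice too spread out.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
print(regression_slope(observed, predicted))
```

In this toy case the correlation between observed and predicted values is perfect, yet the slope of 0.5 flags a scaling problem, which is why accuracy and bias are reported separately.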
