Bivariate genomic prediction of phenotypes by selecting epistatic interactions across years

2020 ◽  
Author(s):  
Elaheh Vojgani ◽  
Torsten Pook ◽  
Armin C. Hölker ◽  
Manfred Mayer ◽  
Chris-Carolin Schön ◽  
...  

Abstract The importance of accurate genomic prediction of phenotypes in plant breeding is undeniable, as higher prediction accuracy can increase selection responses. In this study, we investigated the ability of three models to improve prediction accuracy by including phenotypic information from the last growing season. This was done by considering a single biological trait in two growing seasons (2017 and 2018) as separate traits in a multi-trait model. Thus, bivariate variants of the Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, Epistatic Random Regression BLUP (ERRBLUP) and selective Epistatic Random Regression BLUP (sERRBLUP) as epistasis models were compared with respect to their prediction accuracies for the second year. The results indicate that bivariate ERRBLUP is slightly superior to bivariate GBLUP in prediction accuracy, while bivariate sERRBLUP has the highest prediction accuracy in most cases. The average relative increase in prediction accuracy from bivariate GBLUP to maximum bivariate sERRBLUP across eight phenotypic traits, in the studied datasets of 471/402 doubled haploid lines from the European maize landraces Kemater Landmais Gelb/Petkuser Ferdinand Rot, was 7.61 and 3.47 percent, respectively. We further investigated the genomic correlation, phenotypic correlation and trait heritability as factors affecting the bivariate models’ prediction accuracy, with genetic correlation between growing seasons being the most important one. For all three considered model architectures, results were far worse when using a univariate version of the model, e.g. with an average reduction in prediction accuracy of 0.23/0.14 for Kemater/Petkuser when using univariate GBLUP. Key Message Bivariate models based on selected subsets of pairwise SNP interactions can increase the prediction accuracy by utilizing phenotypic data across years under the assumption of high genomic correlation across years.

2021 ◽  
Author(s):  
Elaheh Vojgani ◽  
Torsten Pook ◽  
Armin C. Hölker ◽  
Manfred Mayer ◽  
Chris-Carolin Schön ◽  
...  

Abstract The importance of accurate genomic prediction of phenotypes in plant breeding is undeniable, as higher prediction accuracy can increase selection responses. In this study, we investigated the ability of three models to improve prediction accuracy by including phenotypic information from the last growing season. This was done by considering a single biological trait in two growing seasons (2017 and 2018) as separate traits in a multi-trait model. Thus, bivariate variants of the Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, Epistatic Random Regression BLUP (ERRBLUP) and selective Epistatic Random Regression BLUP (sERRBLUP) as epistasis models were compared with respect to their prediction accuracies for the second year. The results indicate that bivariate ERRBLUP is almost identical to bivariate GBLUP in prediction accuracy, while bivariate sERRBLUP has the highest prediction accuracy in most cases. The obtained prediction accuracies were similar when utilizing pruned sets of SNPs and haplotype blocks, while utilizing haplotype blocks reduces the computational load significantly compared to utilizing pruned sets of SNPs. The prediction accuracies of bivariate GBLUP, ERRBLUP and sERRBLUP have been assessed across eight phenotypic traits and studied datasets from 471/402 doubled haploid lines in the European maize landrace Kemater Landmais Gelb/Petkuser Ferdinand Rot. We further investigated the genomic correlation, phenotypic correlation and trait heritability as factors affecting the bivariate models’ prediction accuracy, with genetic correlation between growing seasons being the most important one. For all three considered model architectures results were far worse when using a univariate version of the model.


Author(s):  
Elaheh Vojgani ◽  
Torsten Pook ◽  
Johannes W.R. Martini ◽  
Armin C. Hölker ◽  
Manfred Mayer ◽  
...  

Abstract We compared the predictive ability of various prediction models for a maize dataset derived from 910 doubled haploid lines from European landraces (Kemater Landmais Gelb and Petkuser Ferdinand Rot), which were tested in six locations in Germany and Spain. The compared models were Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, Epistatic Random Regression BLUP (ERRBLUP) accounting for all pairwise SNP interactions, and selective Epistatic Random Regression BLUP (sERRBLUP) accounting for a selected subset of pairwise SNP interactions. These models have been compared in both univariate and bivariate statistical settings within and across environments. Our results indicate that modeling all pairwise SNP interactions into the univariate/bivariate model (ERRBLUP) is not superior in predictive ability to the respective additive model (GBLUP). However, incorporating only a selected subset of interactions with the highest effect variances in univariate/bivariate sERRBLUP can increase predictive ability significantly compared to the univariate/bivariate GBLUP. Overall, bivariate models consistently outperform univariate models in predictive ability. Over all studied traits, locations, and landraces, the increase in prediction accuracy from univariate GBLUP to univariate sERRBLUP ranged from 5.9 to 112.4 percent, with an average increase of 47 percent. For bivariate models, the change ranged from −0.3 to +27.9 percent comparing the bivariate sERRBLUP to the bivariate GBLUP. The average increase across traits and locations was 11 percent. This considerable increase in predictive ability achieved by sERRBLUP may be of interest for “sparse testing” approaches in which only a subset of the lines/hybrids of interest is observed at each location. Key Message The prediction accuracy of genomic prediction of phenotypes can be increased by only including top ranked pairwise SNP interactions into the prediction models.
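The selection step described in the abstract above (keep only the pairwise SNP interactions with the highest effect variances) can be pictured with a minimal sketch. Everything named here is an assumption for illustration: the function names, the toy marker codes, the toy effect variances, and in particular the use of products of centered marker codes as stand-in interaction features, which is not necessarily the coding used by the authors. The effect variances are assumed to come from a previously fitted model and are not estimated here.

```python
from itertools import combinations


def center(column):
    """Subtract the marker mean from every genotype code."""
    mean = sum(column) / len(column)
    return [g - mean for g in column]


def pairwise_interactions(markers):
    """Build one feature per SNP pair as the product of centered codes.

    markers: dict mapping a SNP name to its list of codes (one per line).
    This product coding is a stand-in chosen for illustration only.
    """
    centered = {name: center(col) for name, col in markers.items()}
    return {
        (a, b): [x * y for x, y in zip(centered[a], centered[b])]
        for a, b in combinations(sorted(centered), 2)
    }


def select_top_interactions(effect_variances, proportion):
    """Keep the given proportion of interactions with the largest variance."""
    ranked = sorted(effect_variances, key=effect_variances.get, reverse=True)
    keep = max(1, round(proportion * len(ranked)))
    return ranked[:keep]


# Toy doubled haploid genotypes coded 0/2 (hypothetical values).
toy_markers = {"s1": [0, 2, 2, 0], "s2": [0, 0, 2, 2]}
toy_features = pairwise_interactions(toy_markers)

# Hypothetical effect variance estimates, e.g. from a model fitted earlier.
toy_variances = {("s1", "s2"): 0.02, ("s1", "s3"): 0.30, ("s2", "s3"): 0.11}
selected = select_top_interactions(toy_variances, proportion=0.34)
```

A relationship matrix for the selective model would then be built only from the features of the selected pairs; that step is left out of the sketch.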


2018 ◽  
Author(s):  
Malachy Campbell ◽  
Harkamal Walia ◽  
Gota Morota

Abstract The accessibility of high-throughput phenotyping platforms in both the greenhouse and field, as well as the relatively low cost of unmanned aerial vehicles, have provided researchers with an effective means to characterize large populations throughout the growing season. These longitudinal phenotypes can provide important insight into plant development and responses to the environment. Despite the growing use of these new phenotyping approaches in plant breeding, the use of genomic prediction models for longitudinal phenotypes is limited in major crop species. The objective of this study is to demonstrate the utility of random regression (RR) models using Legendre polynomials for genomic prediction of shoot growth trajectories in rice (Oryza sativa). An estimate of shoot biomass, projected shoot area (PSA), was recorded over a period of 20 days for a panel of 357 diverse rice accessions using an image-based greenhouse phenotyping platform. An RR model that included a fixed second-order Legendre polynomial, a random second-order Legendre polynomial for the additive genetic effect, a first-order Legendre polynomial for the environmental effect, and heterogeneous residual variances was used to model PSA trajectories. The utility of the RR model over a single time point (TP) approach, where PSA is fit at each time point independently, is shown through four prediction scenarios. In the first scenario, the RR and TP approaches were used to predict PSA for a set of lines lacking phenotypic data. The RR approach showed an 11.6% increase in prediction accuracy over the TP approach. Much of this improvement could be attributed to the greater additive genetic variance captured by the RR approach. The remaining scenarios focused on forecasting future phenotypes using a subset of early time points for known lines with phenotypic data, as well as new lines lacking phenotypic data. In all cases, PSA could be predicted with high accuracy (r: 0.79 to 0.89 and 0.55 to 0.58 for known and unknown lines, respectively). This study provides the first application of RR models for genomic prediction of a longitudinal trait in rice, and demonstrates that RR models can be effectively used to improve the accuracy of genomic prediction for complex traits compared to a TP approach.
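The Legendre polynomial covariates that such a random regression model relies on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the 20 time points are assumed to be daily and equally spaced, and the polynomials are left unnormalized (some random regression software rescales them).

```python
def standardize_time(t, t_min, t_max):
    """Map a time point from [t_min, t_max] onto [-1, 1]."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def legendre_basis(x, order=2):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recursion."""
    values = [1.0, x]
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


# Assumption: one measurement per day over the 20-day period.
days = list(range(1, 21))
design = [
    legendre_basis(standardize_time(d, days[0], days[-1]), order=2)
    for d in days
]
```

Each row of `design` holds the intercept, linear and quadratic covariates for one day; in a second-order model, the fixed and additive genetic trajectories would each be regressed on these three columns.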


2019 ◽  
Vol 15 ◽  
pp. 117693431983130 ◽  
Author(s):  
Diego Jarquín ◽  
Reka Howard ◽  
George Graef ◽  
Aaron Lorenz

An important and broadly used tool for selection purposes and to increase yield and genetic gain in plant breeding programs is genomic prediction (GP). Genomic prediction is a technique where molecular marker information and phenotypic data are used to predict the phenotype (eg, yield) of individuals for which only marker data are available. Higher prediction accuracy can be achieved not only by using efficient models but also by using quality molecular marker and phenotypic data. The steps of a typical quality control (QC) of marker data include the elimination of markers based on thresholds for minor allele frequency (MAF) and for the proportion of missing marker values, and the imputation of missing marker values. In this article, we evaluated how the prediction accuracy is influenced by the combination of 12 MAF values, 27 different percentages of missing marker values, and 2 imputation techniques (IT; naïve and Random Forest (RF)). We constructed a response surface of prediction accuracy values for the two ITs as a function of MAF and percentage of missing marker values using soybean data from the University of Nebraska–Lincoln Soybean Breeding Program. We found that both the genetic architecture of the trait and the IT affect the prediction accuracy, implying that we have to be careful how we perform QC on the marker data. For the corresponding combinations of MAF and percentage of missing values, we observed that implementing the RF imputation increased the number of markers by 2 to 5 times relative to the simple naïve imputation method, which is based on the mean allele dosage of the non-missing values at each locus. We conclude that there is no single strategy (combination of the QCs and imputation method) that outperforms the others for all traits.
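The QC steps listed in this abstract (filter on missing values, filter on MAF, then impute) can be sketched in a few lines. The sketch makes several assumptions: markers are coded as 0/1/2 allele dosages with `None` for a missing call, the function names and thresholds are hypothetical, and only the naïve mean-dosage imputation is shown, not the Random Forest variant.

```python
def missing_rate(column):
    """Fraction of missing calls (None) for one marker."""
    return sum(g is None for g in column) / len(column)


def minor_allele_frequency(column):
    """MAF of one marker coded as 0/1/2 allele dosages."""
    observed = [g for g in column if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2.0 * len(observed))
    return min(freq, 1.0 - freq)


def naive_impute(column):
    """Replace missing calls by the mean dosage of the observed calls."""
    observed = [g for g in column if g is not None]
    if not observed:
        return list(column)
    mean = sum(observed) / len(observed)
    return [mean if g is None else g for g in column]


def quality_control(markers, maf_min, missing_max):
    """Drop markers failing either threshold, then impute the rest."""
    kept = []
    for column in markers:
        if missing_rate(column) > missing_max:
            continue
        if minor_allele_frequency(column) < maf_min:
            continue
        kept.append(naive_impute(column))
    return kept


# Toy data: one list per marker, one entry per individual.
toy = [[0, 1, 2, None], [0, 0, 0, 0], [None, None, None, 2], [2, 2, 1, 0]]
cleaned = quality_control(toy, maf_min=0.05, missing_max=0.5)
```

Running the same function over a grid of `maf_min` and `missing_max` values, and fitting a prediction model to each cleaned marker set, would give the kind of response surface the abstract describes.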


2016 ◽  
Author(s):  
S. Hong Lee ◽  
W.M. Shalanee P. Weerasinghe ◽  
Naomi R. Wray ◽  
Michael E. Goddard ◽  
Julius H.J. Van der Werf

Abstract Genomic prediction shows promise for personalised medicine in which diagnosis and treatment are tailored to individuals based on their genetic profiles. Genomic prediction is arguably most needed for complex diseases and disorders for which both genetic and non-genetic factors contribute to risk. However, we have no adequate insight into the accuracy of such predictions, and how accuracy may vary between individuals or between populations. In this study, we present a theoretical framework to demonstrate that prediction accuracy can be maximised by targeting more informative individuals in a discovery set with closer relationships with the subjects, making prediction more similar to those in populations with small effective size (Ne). Increase of prediction accuracy from closer relationships is achieved under an additive model and does not rely on any interaction effects (gene × gene, gene × environment or gene × family). Using theory, simulations and real data analyses, we show that the predictive accuracy or the area under the receiver operating characteristic curve (AUC) increased exponentially with decreasing Ne. For example, with a set of realistic parameters (the sample size of discovery set N=3000 and heritability h²=0.5), the AUC value approached 0.9 (Ne=100) from 0.6 (Ne=10000), and the top percentile of the estimated genetic profile scores had a 23 times higher proportion of cases than the general population (with Ne=100), up from a 2 times higher proportion of cases (with Ne=10000). This suggests that different interventions in the top percentile risk groups may be justified (i.e. stratified medicine). In conclusion, it is argued that there is considerable room to increase prediction accuracy for polygenic traits by using an efficient design of a smaller Ne (e.g. a design consisting of closer relationships) so that genomic prediction can be more beneficial in clinical applications in the near future.


Author(s):  
Elaheh Vojgani ◽  
Torsten Pook ◽  
Johannes W. R. Martini ◽  
Armin C. Hölker ◽  
Manfred Mayer ◽  
...  

Abstract Key Message The accuracy of genomic prediction of phenotypes can be increased by including the top-ranked pairwise SNP interactions into the prediction model. Abstract We compared the predictive ability of various prediction models for a maize dataset derived from 910 doubled haploid lines from two European landraces (Kemater Landmais Gelb and Petkuser Ferdinand Rot), which were tested at six locations in Germany and Spain. The compared models were Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, Epistatic Random Regression BLUP (ERRBLUP) accounting for all pairwise SNP interactions, and selective Epistatic Random Regression BLUP (sERRBLUP) accounting for a selected subset of pairwise SNP interactions. These models have been compared in both univariate and bivariate statistical settings for predictions within and across environments. Our results indicate that modeling all pairwise SNP interactions into the univariate/bivariate model (ERRBLUP) is not superior in predictive ability to the respective additive model (GBLUP). However, incorporating only a selected subset of interactions with the highest effect variances in univariate/bivariate sERRBLUP can increase predictive ability significantly compared to the univariate/bivariate GBLUP. Overall, bivariate models consistently outperform univariate models in predictive ability. Across all studied traits, locations and landraces, the increase in prediction accuracy from univariate GBLUP to univariate sERRBLUP ranged from 5.9 to 112.4 percent, with an average increase of 47 percent. For bivariate models, the change ranged from −0.3 to +27.9 percent comparing the bivariate sERRBLUP to the bivariate GBLUP, with an average increase of 11 percent. This considerable increase in predictive ability achieved by sERRBLUP may be of interest for “sparse testing” approaches in which only a subset of the lines/hybrids of interest is observed at each location.


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.


PLoS ONE ◽  
2017 ◽  
Vol 12 (12) ◽  
pp. e0189775 ◽  
Author(s):  
S. Hong Lee ◽  
Sam Clark ◽  
Julius H. J. van der Werf

2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 245-246
Author(s):  
Cláudio U Magnabosco ◽  
Fernando Lopes ◽  
Valentina Magnabosco ◽  
Raysildo Lobo ◽  
Leticia Pereira ◽  
...  

Abstract The aim of the study was to evaluate prediction methods, validation approaches and pseudo-phenotypes for the prediction of the genomic breeding values of feed efficiency related traits in Nellore cattle. It used the phenotypic and genotypic information of 4,329 and 3,594 animals, respectively, which were tested for residual feed intake (RFI), dry matter intake (DMI), feed efficiency (FE), feed conversion ratio (FCR), residual body weight gain (RG), and residual intake and body weight gain (RIG). Six prediction methods were used: ssGBLUP, BayesA, BayesB, BayesCπ, BLASSO, and BayesR. Three validation approaches were used: 1) random: the data were randomly divided into ten subsets and validation was done on one subset at a time; 2) age: the division into the training (2010 to 2016) and validation (2017) populations was based on the year of birth; 3) genetic breeding value (EBV) accuracy: the training population consisted of animals with accuracy above 0.45 and the validation population of those below 0.45. We checked the accuracy and bias of the genomic estimated breeding values (GEBV). The results showed that the GEBV accuracy was highest when the prediction was obtained with ssGBLUP (0.05 to 0.31) (Figure 1). The low heritability obtained, mainly for FE (0.07 ± 0.03) and FCR (0.09 ± 0.03), limited the GEBV accuracy, which ranged from low to moderate. The regression coefficient estimates were close to 1, and similar between the prediction methods, validation approaches, and pseudo-phenotypes. The cross-validation presented the most accurate predictions ranging from 0.07 to 0.037. The prediction accuracy was higher for phenotype adjusted for fixed effects than for EBV and EBV deregressed (30.0 and 34.3%, respectively). Genomic prediction can provide a reliable estimate of genomic breeding values for RFI, DMI, RG and RIG, which suggests that those traits may have higher genetic gain than FE and FCR.
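The three validation approaches named in this abstract amount to three ways of splitting animals into training and validation sets. The sketch below shows one plausible reading of each split; the function names, the animal identifiers and the toy values are hypothetical, and how the authors handled animals with an EBV accuracy of exactly 0.45 is not stated, so they are left out of both sets here.

```python
import random


def random_folds(ids, k=10, seed=0):
    """Shuffle the animals and deal them into k validation subsets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def split_by_birth_year(birth_year, last_training_year):
    """Older animals train the model, younger animals validate it."""
    training = [a for a, y in birth_year.items() if y <= last_training_year]
    validation = [a for a, y in birth_year.items() if y > last_training_year]
    return training, validation


def split_by_ebv_accuracy(ebv_accuracy, threshold=0.45):
    """Animals above the threshold train, animals below it validate."""
    training = [a for a, acc in ebv_accuracy.items() if acc > threshold]
    validation = [a for a, acc in ebv_accuracy.items() if acc < threshold]
    return training, validation


# Toy inputs (hypothetical animals and values).
folds = random_folds(range(25), k=10)
by_age = split_by_birth_year({"a": 2015, "b": 2017, "c": 2010}, 2016)
by_acc = split_by_ebv_accuracy({"a": 0.60, "b": 0.30, "c": 0.50})
```

In the random approach each of the ten subsets serves once as the validation set while the remaining nine are used for training.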

