BWGS: a R package for genomic selection and its application to a wheat breeding programme

Abstract We developed an integrated R library called BWGS to enable easy computation of Genomic Estimates of Breeding Values (GEBV) for genomic selection. BWGS relies on existing R libraries, all freely available from CRAN servers. The two main functions make it possible to run 1) replicated random cross-validations within a training set of genotyped and phenotyped lines and 2) GEBV prediction for a set of genotyped-only lines. Options are available for 1) missing data imputation, 2) marker and training set selection and 3) genomic prediction with 15 different methods, either parametric or semi-parametric. The usefulness and efficiency of BWGS are illustrated using a population of wheat lines from a real breeding programme. Adjusted yield data from historical trials (highly unbalanced design) were used for testing the options of BWGS. In total, 760 candidate lines with adjusted phenotypes and genotypes for 47,839 robust SNPs were used. With a simple desktop computer, we obtained results comparable with previously published results on wheat genomic selection. As predicted by theory, the factors that most influence predictive ability, for a given trait of moderate heritability, are the size of the training population and a minimum number of markers for capturing the information from every QTL. Missing data up to 40%, if randomly distributed, do not degrade predictive ability once imputed, and up to 80% randomly distributed missing data are still acceptable once imputed with the Expectation-Maximization method of package rrBLUP. It is worth noting that selecting the markers most associated with the trait does improve predictive ability, compared with the whole set of markers, but only when marker selection is made on the whole population. When marker selection is made only on the sampled training set, this advantage nearly disappears, since it was clearly due to overfitting. Few differences are observed between the 15 prediction models with this dataset. Although non-parametric methods that are supposed to capture non-additive effects have slightly better predictive accuracy, differences remain small. Finally, the GEBV from the 15 prediction models are all highly correlated with each other. These results are encouraging for an efficient use of genomic selection in applied breeding programmes, and BWGS is a simple and powerful toolbox to apply in breeding programmes or training activities.
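The overfitting effect described above, where marker selection on the whole population inflates predictive ability, can be illustrated with a minimal Python sketch. It does not use BWGS or any of its functions. The simulated data are pure noise and the simple correlation-weighted marker score is an assumption made for illustration, as are all function names.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_markers(geno, pheno, rows, k):
    """Indices of the k markers with the largest absolute correlation with
    the phenotype, computed only on the individuals listed in `rows`."""
    y = [pheno[i] for i in rows]
    scores = []
    for j in range(len(geno[0])):
        col = [geno[i][j] for i in rows]
        scores.append((abs(pearson(col, y)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def cv_predictive_ability(geno, pheno, k_markers, n_folds, select_on_all, rng):
    """Mean test-fold correlation between a marker score and the phenotype."""
    order = list(range(len(pheno)))
    rng.shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    results = []
    for test in folds:
        test_set = set(test)
        train = [i for i in order if i not in test_set]
        # The only difference between the two schemes is which individuals
        # are visible to the marker selection step.
        rows_for_selection = order if select_on_all else train
        chosen = top_markers(geno, pheno, rows_for_selection, k_markers)
        # Marker weights are always estimated on the training fold only.
        y_train = [pheno[i] for i in train]
        weights = {j: pearson([geno[i][j] for i in train], y_train) for j in chosen}
        pred = [sum(weights[j] * geno[i][j] for j in chosen) for i in test]
        results.append(pearson(pred, [pheno[i] for i in test]))
    return sum(results) / len(results)


data_rng = random.Random(1)
n_lines, n_markers = 100, 500
# Pure-noise data: the markers carry no information about the phenotype.
geno = [[data_rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_lines)]
pheno = [data_rng.gauss(0.0, 1.0) for _ in range(n_lines)]

# The same fold seed is used twice so both schemes see identical folds.
leaky = cv_predictive_ability(geno, pheno, 20, 5, True, random.Random(2))
honest = cv_predictive_ability(geno, pheno, 20, 5, False, random.Random(2))
print(f"selection on whole population: r = {leaky:.2f}")
print(f"selection inside training set: r = {honest:.2f}")
```

Because the phenotype is unrelated to the markers, any clearly positive correlation in the first scheme comes from the selection step having seen the test individuals.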

Genomic predictive ability for foliar nutritive traits in perennial ryegrass

10.1101/727958 ◽

2019 ◽

Author(s):

Sai Krishna Arojju ◽

Mingshu Cao ◽

M. Z. Zulfi Jahufer ◽

Brent A Barrett ◽

Marty J Faville

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Nutritive Value ◽

Prediction Models ◽

Genotypic Variation ◽

Genetic Correlations ◽

Predictive Ability ◽

Water Soluble ◽

Training Set ◽

Sib Families

Abstract Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of genotypic, environmental and genotype-by-environment (G × E) variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P < 0.05) genotypic variation was detected for all nutritive traits and genomic heritability (h2g) was moderate to high (0.20 to 0.74). G × E interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesC genomic prediction models displayed similar predictive ability, estimated by 10-fold cross validation, for all nutritive traits with values ranging from r = 0.16 to 0.45 using phenotypes from across two environments. High predictive ability was observed for the mineral traits sulphur (0.44), sodium (0.45) and magnesium (0.45) and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from 1 million to as few as 50,000. The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined. For traits with lower predictive ability, multi-trait genomic prediction approaches that exploit the strong genetic correlations observed amongst some nutritive traits may be useful. This appears to be particularly important for WSC, considered one of the primary constituents of nutritive value for forages.

Genomic Predictive Ability for Foliar Nutritive Traits in Perennial Ryegrass

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400880 ◽

2019 ◽

Vol 10 (2) ◽

pp. 695-708 ◽

Cited By ~ 6

Author(s):

Sai Krishna Arojju ◽

Mingshu Cao ◽

M. Z. Zulfi Jahufer ◽

Brent A. Barrett ◽

Marty J. Faville

Keyword(s):

Genomic Selection ◽

Perennial Ryegrass ◽

Nutritive Value ◽

Prediction Models ◽

Predictive Ability ◽

Genotyping By Sequencing ◽

Water Soluble ◽

Soluble Carbohydrate ◽

Training Set ◽

Sib Families

Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits in perennial ryegrass has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of family, location and family-by-location variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P < 0.05) family variance was detected for all nutritive traits and genomic heritability (h2g) was moderate to high (0.20 to 0.74). Family-by-location interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesCπ genomic prediction models displayed similar predictive ability, estimated by 10-fold cross validation, for all nutritive traits with values ranging from r = 0.16 to 0.45 using phenotypes from across two locations. High predictive ability was observed for the mineral traits sulfur (0.44), sodium (0.45) and magnesium (0.45) and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from one million to as few as 50,000. 
The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined.

Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size

BMC Medical Research Methodology ◽

10.1186/s12874-021-01321-x ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Menelaos Pavlou ◽

Gareth Ambler ◽

Rumana Z. Omar

Keyword(s):

Risk Prediction ◽

Cluster Size ◽

Linear Models ◽

Prediction Models ◽

Predictive Accuracy ◽

Clustered Data ◽

Predictor Variable ◽

Simulated Data ◽

Predictive Ability ◽

Informative Cluster Size

Abstract Background: Clustered data arise in research when patients are clustered within larger units. Generalised Estimating Equations (GEE) and Generalised Linear Mixed Models (GLMM) can be used to provide marginal and cluster-specific inference and predictions, respectively. Methods: Confounding by Cluster (CBC) and Informative Cluster Size (ICS) are two complications that may arise when modelling clustered data. CBC can arise when the distribution of a predictor variable (termed ‘exposure’) varies between clusters, causing confounding of the exposure-outcome relationship. ICS means that the cluster size conditional on covariates is not independent of the outcome. In both situations, standard GEE and GLMM may provide biased or misleading inference, and modifications have been proposed. However, both CBC and ICS are routinely overlooked in the context of risk prediction, and their impact on the predictive ability of the models has been little explored. We study the effect of CBC and ICS on the predictive ability of risk models for binary outcomes when GEE and GLMM are used. We examine whether two simple approaches to handle CBC and ICS, which involve adjusting for the cluster mean of the exposure and the cluster size, respectively, can improve the accuracy of predictions. Results: Both CBC and ICS can be viewed as violations of the assumptions in the standard GLMM; the random effects are correlated with exposure for CBC and cluster size for ICS. Based on these principles, we simulated data subject to CBC/ICS. The simulation studies suggested that the predictive ability of models derived from using standard GLMM and GEE ignoring CBC/ICS was affected. Marginal predictions were found to be mis-calibrated. Adjusting for the cluster mean of the exposure or the cluster size improved calibration, discrimination and the overall predictive accuracy of marginal predictions, by explaining part of the between-cluster variability. The presence of CBC/ICS did not affect the accuracy of conditional predictions. We illustrate these concepts using real data from a multicentre study with potential CBC. Conclusion: Ignoring CBC and ICS when developing prediction models for clustered data can affect the accuracy of marginal predictions. Adjusting for the cluster mean of the exposure or the cluster size can improve the predictive accuracy of marginal predictions.
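The two adjustments named in this abstract, the cluster mean of the exposure and the cluster size, are ordinary derived covariates. The sketch below shows one way to compute them before fitting a risk model. The record layout, the field names and the patient values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def add_cluster_covariates(records):
    """Return copies of the records with two extra fields: the cluster mean
    of the exposure and the cluster size."""
    by_cluster = defaultdict(list)
    for rec in records:
        by_cluster[rec["cluster"]].append(rec["exposure"])
    out = []
    for rec in records:
        values = by_cluster[rec["cluster"]]
        new = dict(rec)  # copy, so the input records are not modified
        new["exposure_cluster_mean"] = sum(values) / len(values)
        new["cluster_size"] = len(values)
        out.append(new)
    return out


# Hypothetical patients from two centres (illustrative values only).
patients = [
    {"cluster": "A", "exposure": 1.0, "outcome": 0},
    {"cluster": "A", "exposure": 3.0, "outcome": 1},
    {"cluster": "B", "exposure": 4.0, "outcome": 1},
    {"cluster": "B", "exposure": 6.0, "outcome": 1},
    {"cluster": "B", "exposure": 8.0, "outcome": 0},
]
augmented = add_cluster_covariates(patients)
for row in augmented:
    print(row)
```

The augmented records can then be passed to whatever GEE or GLMM routine is in use, with the two new fields entered as additional fixed-effect covariates.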

Inbreeding in a Population of Polish Holstein-Friesian Young Bulls Before and After Genomic Selection

Annals of Animal Science ◽

10.2478/aoas-2019-0065 ◽

2020 ◽

Vol 20 (1) ◽

pp. 71-83

Author(s):

Piotr Topolski ◽

Wojciech Jagusiak

Keyword(s):

Genomic Selection ◽

Linear Trend ◽

Pedigree Analysis ◽

Breeding Programme ◽

Black And White ◽

Holstein Friesian ◽

Before And After ◽

Breeding Programmes ◽

Dynamics Of Population ◽

The Mean

Abstract Inbreeding was analysed in a population of 14,144 Polish Black-and-White Holstein-Friesian (PBWHF) young bulls born between 1994 and 2017 and bred under both conventional and genomic breeding programmes. The inbreeding coefficients were computed using a model with genetic groups, according to the algorithm given by VanRaden. It was found that in the analysed population all bulls are inbred (100% of the population), with the mean coefficient of inbreeding ranging from 0.09% to 26.95%. Pedigree analysis also showed a relationship between the changing number of bulls over the years and the dynamics of population inbreeding. These trends are connected with changes in the breeding scheme, related to the implementation of genomic selection in the breeding programme for PBWHF cattle in 2014. The increasing number of weaned young bulls in Poland was paralleled by a fairly consistent increase in the mean inbreeding, but the inbreeding dynamics were relatively small. A reverse trend was observed in the group of young bulls born after 2013. As the number of bulls very rapidly decreased in successive birth years, the mean inbreeding for successive birth-year groups very rapidly increased. As a result, the estimated linear trend was equal to 0.02% inbreeding per year of birth in the group of bulls raised before genomic selection (~20 birth years), whereas in the group of bulls raised after genomic selection (~4 birth years) the trend was much higher and amounted to 0.56% inbreeding per year of birth. The high mean inbreeding found in the group of the genomically selected young bulls may translate into higher inbreeding in the whole population of PBWHF cattle, because these bulls are now intensively used as sires. The results of our study also show that the implementation of genomic selection in the breeding programme caused a very rapid increase in the inbreeding rate per birth-year in young bulls.
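A linear trend of this kind is the slope of mean inbreeding regressed on birth year. The sketch below shows the ordinary least-squares calculation. The yearly means are invented for illustration and are not the study's data, and the abstract does not state which estimation procedure the authors used.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope and intercept of values regressed on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical mean inbreeding (%) per birth year, made up for illustration.
years = [2014, 2015, 2016, 2017]
mean_inbreeding = [3.0, 3.5, 4.0, 4.5]
slope, intercept = linear_trend(years, mean_inbreeding)
print(f"trend: {slope:.2f}% inbreeding per birth year")
```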

Financial Compass for Slovak Enterprises: Modeling Economic Stability of Agricultural Entities

Journal of Risk and Financial Management ◽

10.3390/jrfm13050092 ◽

2020 ◽

Vol 13 (5) ◽

pp. 92

Author(s):

Katarina Valaskova ◽

Pavol Durana ◽

Peter Adamko ◽

Jaroslav Jaros

Keyword(s):

Prediction Models ◽

Predictive Accuracy ◽

Characteristic Curve ◽

Confusion Matrix ◽

Predictive Ability ◽

Early Warning Systems ◽

Emerging Countries ◽

Bankruptcy Prediction ◽

Financial Health ◽

Prediction Ability

The risk of corporate financial distress negatively affects the operation of the enterprise itself and can change the financial performance of all other partners that come into close or wider contact. To identify these risks, business entities use early warning systems, prediction models, which help identify the level of corporate financial health. Despite the fact that the relevant financial analyses and financial health predictions are crucial to mitigate or eliminate the potential risks of bankruptcy, the modeling of financial health in emerging countries is mostly based on models which were developed in different economic sectors and countries. However, several prediction models have been introduced in emerging countries (also in Slovakia) in the last few years. Thus, the main purpose of the paper is to verify the predictive ability of the bankruptcy models formed in conditions of the Slovak economy in the sector of agriculture. To compare their predictive accuracy the confusion matrix (cross tables) and the receiver operating characteristic curve are used, which allow more detailed analysis than the mere proportion of correct classifications (predictive accuracy). The results indicate that the models developed in the specific economic sector highly outperform the prediction ability of other models either developed in the same country or abroad, usage of which is then questionable considering the issue of prediction accuracy. The research findings confirm that the highest predictive ability of the bankruptcy prediction models is achieved provided that they are used in the same economic conditions and industrial sector in which they were primarily developed.
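The two evaluation tools mentioned in this abstract can be sketched in a few lines of Python. The labels, scores and the 0.5 cut-off below are hypothetical. The area under the ROC curve is computed here through its rank interpretation (the probability that a distressed firm scores higher than a healthy one), which is an implementation choice made for this sketch and not necessarily the one used in the paper.

```python
def confusion_matrix(y_true, y_pred):
    """Counts of true/false positives and negatives for 0/1 labels
    (1 = financially distressed, 0 = healthy in this illustration)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def auc(y_true, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case has the higher score; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical firms: true status and a model's distress score.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

cm = confusion_matrix(y_true, y_pred)
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
print(cm, f"accuracy = {accuracy:.2f}", f"AUC = {auc(y_true, scores):.2f}")
```

Accuracy depends on the chosen cut-off, whereas the AUC summarises ranking quality over all cut-offs, which is one reason the two are reported together.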

Combining genetic resources and elite material populations to improve the accuracy of genomic prediction in apple

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab420 ◽

2021 ◽

Author(s):

Xabi Cazenave ◽

Bernard Petit ◽

Marc Lateur ◽

Hilde Nybom ◽

Jiri Sedlak ◽

...

Keyword(s):

Genetic Resources ◽

Genomic Selection ◽

Predictive Ability ◽

Practical Implementation ◽

Specific Marker ◽

Training Set ◽

High Genetic Diversity ◽

Breeding Programs ◽

Breeding Cycles ◽

Two Populations

Abstract Genomic selection is an attractive strategy for apple breeding that could reduce the length of breeding cycles. A possible limitation to the practical implementation of this approach lies in the creation of a training set large and diverse enough to ensure accurate predictions. In this study, we investigated the potential of combining two available populations, i.e. genetic resources and elite material, in order to obtain a large training set with a high genetic diversity. We compared the predictive ability of genomic predictions within-population, across-population or when combining both populations, and tested a model accounting for population-specific marker effects in this last case. The obtained predictive abilities were moderate to high according to the studied trait and small increases in predictive ability could be obtained for some traits when the two populations were combined into a unique training set. We also investigated the potential of such a training set to predict hybrids resulting from crosses between the two populations, with a focus on the method to design the training set and the best proportion of each population to optimize predictions. The measured predictive abilities were very similar for all the proportions, except for the extreme cases where only one of the two populations was used in the training set, in which case predictive abilities could be lower than when using both populations. Using an optimization algorithm to choose the genotypes in the training set also led to higher predictive abilities than when the genotypes were chosen at random. Our results provide guidelines to initiate breeding programs that use genomic selection when the implementation of the training set is a limitation.

META-ANALYSIS FOR EVALUATING THE EFFICIENCY OF GENOMIC SELECTION IN CEREALS

Journal of Basic and Applied Genetics ◽

10.35407/bag.2020.31.01.03 ◽

2020 ◽

Vol 31 (1) ◽

pp. 23-32

Author(s):

M. A. Rueda Calderón ◽

M. Balzarini ◽

C. Bruno

Keyword(s):

Systematic Review ◽

Genomic Selection ◽

Statistical Approach ◽

Predictive Accuracy ◽

Meta Analysis ◽

Predictive Ability ◽

Genomic Data ◽

Forest Plot ◽

Average Correlation ◽

Best Linear Unbiased

Genomic selection (GS) is used to predict the merit of a genotype with respect to a quantitative trait from molecular or genomic data. Statistically, GS requires fitting a regression model with multiple predictors associated with the states of the molecular markers (MM). The model is calibrated in a population with phenotypic and genomic data. The abundance and correlation of MM information make model estimation challenging. For that reason there are diverse strategies to fit the model: best linear unbiased predictors (BLUP), Bayesian regressions and machine learning methods. The correlation between the observed phenotype and the genetic merit predicted by the fitted model provides a measure of the efficiency (predictive ability) of GS. The objective of this work was to perform a meta-analysis on the efficiency of GS in cereals. A systematic review of related GS studies and a meta-analysis, in wheat and maize, were carried out to obtain a global measure of GS efficiency under different scenarios (MM quantity and statistical models used in GS). The meta-analysis indicated an average correlation coefficient of 0.61 between observed and predicted genetic merits. There were no significant differences in the efficiency of GS between the BLUP-based methods (RR-BLUP and GBLUP), the most common statistical approach. GS efficiency did not vary widely as the amount of MM data increased. Key words: Systematic review; Random effects model; Forest plot; Predictive accuracy.
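An average correlation across studies is commonly obtained by pooling on the Fisher z scale. The sketch below uses a simple fixed-effect pooling with weights n - 3. This is a deliberately simpler stand-in for the random effects model listed in the key words, and the study values are hypothetical.

```python
import math


def pooled_correlation(studies):
    """Fixed-effect pooled correlation from (r, n) pairs.

    Each r is transformed with Fisher's z = atanh(r), weighted by n - 3
    (the inverse of the sampling variance of z), averaged, and transformed
    back. Also returns an approximate 95% confidence interval."""
    num = 0.0
    den = 0.0
    for r, n in studies:
        weight = n - 3
        num += weight * math.atanh(r)
        den += weight
    z = num / den
    se = 1.0 / math.sqrt(den)
    return math.tanh(z), (math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se))


# Hypothetical (correlation, sample size) pairs, not taken from the review.
studies = [(0.5, 103), (0.7, 203), (0.6, 53)]
pooled, (low, high) = pooled_correlation(studies)
print(f"pooled r = {pooled:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

A random effects version would add a between-study variance term to each weight, which widens the interval when the studies disagree.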

Combining genetic resources and elite material populations to improve the accuracy of genomic prediction in apple

10.1101/2021.08.27.457920 ◽

2021 ◽

Author(s):

Xabi Cazenave ◽

Bernard Petit ◽

Francois Laurens ◽

Charles-Eric Durel ◽

Helene Muranty

Keyword(s):

Genetic Resources ◽

Genomic Selection ◽

Predictive Ability ◽

Practical Implementation ◽

Specific Marker ◽

Training Set ◽

High Genetic Diversity ◽

Breeding Programs ◽

Breeding Cycles ◽

Two Populations

Genomic selection is an attractive strategy for apple breeding that could reduce the length of breeding cycles. A possible limitation to the practical implementation of this approach lies in the creation of a training set large and diverse enough to ensure accurate predictions. In this study, we investigated the potential of combining two available populations, i.e. genetic resources and elite material, in order to obtain a large training set with a high genetic diversity. We compared the predictive ability of genomic predictions within-population, across-population or when combining both populations, and tested a model accounting for population-specific marker effects in this last case. The obtained predictive abilities were moderate to high according to the studied trait and were always highest when the two populations were combined into a unique training set. We also investigated the potential of such a training set to predict hybrids resulting from crosses between the two populations, with a focus on the method to design the training set and the best proportion of each population to optimize predictions. The measured predictive abilities were very similar for all the proportions, except for the extreme cases where only one of the two populations was used in the training set, in which case predictive abilities could be lower than when using both populations. Using an optimization algorithm to choose the genotypes in the training set also led to higher predictive abilities than when the genotypes were chosen at random. Our results provide guidelines to initiate breeding programs that use genomic selection when the implementation of the training set is a limitation.

Genomic Selection in Winter Wheat Breeding Using a Recommender Approach

Genes ◽

10.3390/genes11070779 ◽

2020 ◽

Vol 11 (7) ◽

pp. 779

Author(s):

Dennis N. Lozada ◽

Arron H. Carter

Keyword(s):

Winter Wheat ◽

Genomic Selection ◽

Prediction Models ◽

Heading Date ◽

Predictive Ability ◽

Wheat Breeding ◽

Snp Markers ◽

Bayesian Regression ◽

Phenotypic Trait ◽

Breeding Programs

Achieving optimal predictive ability is key to increasing the relevance of implementing genomic selection (GS) approaches in plant breeding programs. The potential of an item-based collaborative filtering (IBCF) recommender system in the context of multi-trait, multi-environment GS has been explored. Different GS scenarios for IBCF were evaluated for a diverse population of winter wheat lines adapted to the Pacific Northwest region of the US. Predictions across years through cross-validations resulted in improved predictive ability when there was a high correlation between environments. Using multiple spectral traits collected from high-throughput phenotyping resulted in better GS accuracies for grain yield (GY) compared to using only single traits for predictions. Trait adjustments through various Bayesian regression models using genomic information from SNP markers were the most effective in achieving improved accuracies for GY, heading date, and plant height among the GS scenarios evaluated. Bayesian LASSO had the highest predictive ability compared to other models for phenotypic trait adjustments. IBCF gave competitive accuracies compared to a genomic best linear unbiased predictor (GBLUP) model for predicting different traits. Overall, an IBCF approach could be used as an alternative to traditional prediction models for important target traits in wheat breeding programs.

Selection of trait-specific markers and multi-environment models improve genomic predictive ability in rice

10.1101/482109 ◽

2018 ◽

Author(s):

Aditi Bhandari ◽

Jérôme Bartholomé ◽

Tuong-Vi Cao ◽

Nilima Kumari ◽

Julien Frouin ◽

...

Keyword(s):

Drought Stress ◽

Genomic Prediction ◽

Complex Traits ◽

Prediction Models ◽

Predictive Ability ◽

Reference Population ◽

Snp Markers ◽

Selection Strategy ◽

Specific Marker ◽

Marker Selection

Abstract Developing high yielding rice varieties that are tolerant to drought stress is crucial for the sustainable livelihood of rice farmers in rainfed rice cropping ecosystems. Genomic selection (GS) promises to be an effective breeding option for these complex traits. We evaluated the effectiveness of two rather new options in the implementation of GS: trait and environment-specific marker selection and the use of multi-environment prediction models. A reference population of 280 rainfed lowland accessions with data for 215k SNP markers was phenotyped under a favorable and two managed drought environments. Trait-specific SNP subsets (28k) were selected for each trait under each environment, using results of GWAS performed with the complete genotype dataset. Performances of single-environment and multi-environment genomic prediction models were compared using kernel regression based methods (GBLUP and RKHS) under two cross validation scenarios: availability (CV2) or not (CV1) of phenotypic data for the validation set in one of the environments. With the most realistic trait-specific marker selection strategy, the predictive ability (PA) of genomic prediction was up to 22% higher than with markers selected on the basis of neutral linkage disequilibrium (LD). Tolerance to drought stress was up to 32% better predicted by multi-environment models (especially RKHS based models) under the CV2 strategy. Under the less favorable CV1 strategy, the multi-environment models achieved PA similar to that of the single-environment predictions. We also showed that reasonable PA could be obtained with as few as 3,000 SNP markers, even in a population of low LD extent, provided marker selection is based on pairwise LD. The implications of these findings for breeding for drought tolerance are discussed. The most resource-sparing option would be accurate phenotyping of the reference population in a favorable environment and under a managed drought, while the candidate population would be phenotyped only under one of those environments.
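The difference between the CV1 and CV2 scenarios comes down to which phenotypes of the validation lines are hidden from the model. The sketch below reads the abstract's description as follows: CV1 hides the validation lines in every environment, and CV2 hides them only in the environment being predicted. The data layout, names and values are hypothetical.

```python
def mask_phenotypes(pheno, validation_lines, target_env, scheme):
    """Return a copy of `pheno` ({(line, env): value}) in which the records
    hidden from the model under the given scheme are set to None.

    CV1: validation lines are hidden in every environment.
    CV2: validation lines are hidden only in `target_env`.
    """
    if scheme not in ("CV1", "CV2"):
        raise ValueError("scheme must be 'CV1' or 'CV2'")
    masked = {}
    for (line, env), value in pheno.items():
        hide = line in validation_lines and (scheme == "CV1" or env == target_env)
        masked[(line, env)] = None if hide else value
    return masked


# Hypothetical yields for three lines in two environments.
pheno = {
    ("L1", "favorable"): 5.1, ("L1", "drought"): 3.2,
    ("L2", "favorable"): 4.8, ("L2", "drought"): 2.9,
    ("L3", "favorable"): 5.4, ("L3", "drought"): 3.6,
}
cv1 = mask_phenotypes(pheno, {"L3"}, "drought", "CV1")
cv2 = mask_phenotypes(pheno, {"L3"}, "drought", "CV2")
print(cv1[("L3", "favorable")], cv2[("L3", "favorable")])
```

Under CV2 the model can borrow information from the validation line's own record in the other environment, which is consistent with the larger gains reported for multi-environment models in that scenario.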
