Genomic Prediction and Selection in Support of Sorghum Value Chains

Abstract Genomic prediction and selection (GS) models were deployed as part of the DataBio project infrastructure and solutions. The work addressed end-user requirements, i.e., the need for cost-effective technologies, simplified breeding schemes, and a shorter time to cultivar development through selection for genetic merit. Our solutions applied genomic modelling to sustainably improve productivity and profits. GS models were implemented in sorghum for several breeding scenarios. We fitted the best linear unbiased prediction data using Bayesian ridge regression, genomic best linear unbiased prediction, Bayesian least absolute shrinkage and selection operator, and BayesB algorithms. The performance of the models was evaluated using Monte Carlo cross-validation, with 70% and 30% of the data as training and validation sets, respectively. Our results show that genomic models perform comparably with traditional methods under single environments. Under multiple environments, predicting lines that were not evaluated in the field benefits from borrowing information from lines that were evaluated in other environments. After accounting for environmental noise and other factors, this model also gave accuracy comparable to traditional methods, and higher accuracy than the single-environment model. The GS accuracy was comparable for the genomic selection index, aboveground dry biomass yield, and plant height, while it was lower for the dry mass fraction of the fresh weight. The genomic selection model performances obtained in our pilots are high enough to sustain sorghum breeding for several traits, including antioxidant production, and to allow important genetic gains per unit of time and cost.
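
The Monte Carlo cross-validation described in this abstract (repeated random 70%/30% splits) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: plain ridge regression stands in for the Bayesian models named in the abstract, and the function names, the penalty `lam`, and the number of repetitions are assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression on centered data; returns (intercept, marker effects).

    Uses the dual form b = Xc' (Xc Xc' + lam*I)^-1 (y - mean(y)), which is
    cheaper than the primal form when markers outnumber lines.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y - y_mean)
    b = Xc.T @ alpha
    return y_mean - x_mean @ b, b

def monte_carlo_cv(X, y, lam=1.0, n_rep=20, train_frac=0.7, seed=0):
    """Repeat random train/validation splits and collect the correlation
    between predicted and observed values in each validation set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    accuracies = []
    for _ in range(n_rep):
        perm = rng.permutation(n)
        train, valid = perm[:n_train], perm[n_train:]
        b0, b = ridge_fit(X[train], y[train], lam)
        pred = b0 + X[valid] @ b
        accuracies.append(np.corrcoef(pred, y[valid])[0, 1])
    return np.array(accuracies)

if __name__ == "__main__":
    # Toy data (hypothetical): 200 lines, 300 markers coded 0/1/2.
    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(200, 300)).astype(float)
    y = X @ rng.normal(0, 0.2, size=300) + rng.normal(0, 1.0, size=200)
    acc = monte_carlo_cv(X, y, lam=25.0)
    print("mean accuracy:", acc.mean(), "sd:", acc.std())
```

Unlike k-fold cross-validation, the validation sets of different repetitions may overlap, so the spread of the accuracies across repetitions should be read as a rough guide rather than an independent-sample standard error.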

A classic approach for determining genomic prediction accuracy under terminal drought stress and well-watered conditions in wheat landraces and cultivars

PLoS ONE ◽ 10.1371/journal.pone.0247824 ◽ 2021 ◽ Vol 16 (3) ◽ pp. e0247824

Author(s): Morteza Shabannejad ◽ Mohammad-Reza Bihamta ◽ Eslam Majidi-Hervan ◽ Hadi Alipour ◽ Asa Ebrahimi

Keyword(s): Drought Stress ◽ Genomic Selection ◽ Bread Wheat ◽ Genomic Prediction ◽ Prediction Accuracy ◽ Terminal Drought ◽ Genome Wide ◽ Best Linear Unbiased ◽ Terminal Drought Stress ◽ Trait Associations

The present study aimed to improve the accuracy of genomic prediction of 16 agronomic traits in a diverse bread wheat (Triticum aestivum L.) germplasm under terminal drought stress and well-watered conditions in semi-arid environments. An association panel including 87 bread wheat cultivars and 199 landraces from Iranian bread wheat germplasm was planted under two irrigation systems in semi-arid climate zones. The whole association panel was genotyped with 9047 single nucleotide polymorphism markers using the genotyping-by-sequencing method. A total of 23 marker-trait associations were selected for traits under each condition, whereas 17 marker-trait associations were common between terminal drought stress and well-watered conditions. The identified marker-trait associations were mostly single nucleotide polymorphisms with minor allele effects. This study examined the effect of population structure, genomic selection method (ridge regression-best linear unbiased prediction, genomic best linear unbiased prediction, and Bayesian ridge regression), training set size, and type of marker set on genomic prediction accuracy. The prediction accuracies ranged from low (-0.32) to moderate (0.52). A marker set including 93 significant markers identified through genome-wide association studies with P values ≤ 0.001 increased the genomic prediction accuracy for all traits under both conditions. This study concluded that obtaining the highest genomic prediction accuracy depends on the extent of linkage disequilibrium, the genetic architecture of the trait, the genetic diversity of the population, and the genomic selection method. The results encouraged the integration of genome-wide association studies and genomic selection to enhance genomic prediction accuracy in applied breeding programs.

Genomic selection in American mink (Neovison vison) using a single-step genomic best linear unbiased prediction model for size and quality traits graded on live mink

Journal of Animal Science ◽ 10.1093/jas/skab003 ◽ 2021 ◽ Vol 99 (1)

Author(s): Trine M Villumsen ◽ Guosheng Su ◽ Bernt Guldbrandtsen ◽ Torben Asp ◽ Mogens S Lund

Keyword(s): Body Weight ◽ Genomic Selection ◽ Genomic Prediction ◽ Best Linear Unbiased Prediction ◽ Single Step ◽ Quality Traits ◽ Linear Unbiased Prediction ◽ Best Linear Unbiased ◽ Regression Slopes ◽ Unbiased Prediction

Abstract Genomic selection relies on single-nucleotide polymorphisms (SNPs), which are often collected using medium-density SNP arrays. In mink, no such array is available; instead, genotyping by sequencing (GBS) can be used to generate marker information. Here, we evaluated the effect of genomic selection for mink using GBS. We compared the estimated breeding values (EBVs) from single-step genomic best linear unbiased prediction (SSGBLUP) models to the EBVs from ordinary pedigree-based BLUP models. We analyzed seven size and quality traits from the live grading of brown mink. The phenotype data consisted of ~20,600 records for the seven traits from mink born between 2013 and 2016. Genotype data included 2,103 mink born between 2010 and 2014, mostly breeding animals. In total, 28,336 SNP markers from 391 scaffolds were available for genomic prediction. The pedigree file included 29,212 mink. The predictive ability was assessed by the correlation (r) between progeny trait deviation (PTD) and EBV, and the regression of PTD on EBV, using 5-fold cross-validation. For each fold, one-fifth of animals born in 2014 formed the validation set. For all traits, the SSGBLUP model resulted in higher accuracies than the BLUP model. The average increase in accuracy was 15% (between 3% for fur clarity and 28% for body weight). For three traits (body weight, silky appearance of the under wool, and guard hair thickness), the difference in r between the two models was significant (P < 0.05). For all traits, the regression slopes of PTD on EBV from SSGBLUP models were closer to 1 than regression slopes from BLUP models, indicating that SSGBLUP models resulted in less bias of EBV for selection candidates than the BLUP models. However, the regression coefficients did not differ significantly. In conclusion, the SSGBLUP model is superior to the conventional BLUP model in the accurate selection of superior animals, and, thus, it would increase genetic gain in a selective breeding program. In addition, this study shows that GBS data work well in genomic prediction in mink, demonstrating the potential of GBS for genomic selection in livestock species.
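
The two validation statistics named in this abstract, the correlation between PTD and EBV and the slope of the regression of PTD on EBV, can be computed as in the sketch below. The function name and the toy numbers are illustrative assumptions, and the 5-fold partitioning of the 2014-born animals is not reproduced here.

```python
import numpy as np

def predictive_ability(ptd, ebv):
    """Return (r, slope) for validation animals.

    r     : Pearson correlation between PTD and EBV.
    slope : coefficient of the regression of PTD on EBV,
            cov(PTD, EBV) / var(EBV).
    """
    ptd = np.asarray(ptd, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    r = np.corrcoef(ptd, ebv)[0, 1]
    slope = np.cov(ptd, ebv, ddof=1)[0, 1] / np.var(ebv, ddof=1)
    return r, slope

if __name__ == "__main__":
    # Hypothetical values: the EBVs here are twice as dispersed as the PTDs.
    ebv = [1.0, 2.0, 3.0, 4.0]
    ptd = [1.5, 2.0, 2.5, 3.0]
    print(predictive_ability(ptd, ebv))
```

A slope below 1 means the EBVs are more spread out than the deviations they predict, and a slope above 1 means they are less spread out, which is why slopes closer to 1 are read as less bias.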

The look ahead trace back optimizer for genomic selection under transparent and opaque simulators

Scientific Reports ◽ 10.1038/s41598-021-83567-5 ◽ 2021 ◽ Vol 11 (1)

Author(s): Fatemeh Amini ◽ Felipe Restrepo Franco ◽ Guiping Hu ◽ Lizhi Wang

Keyword(s): Genomic Selection ◽ Genetic Gain ◽ Genomic Prediction ◽ In Planta ◽ Data Set ◽ Look Ahead ◽ Trace Back ◽ New Variant ◽ The Look ◽ Partially Observable

Abstract Recent advances in genomic selection (GS) have demonstrated the importance of not only the accuracy of genomic prediction but also the intelligence of selection strategies. The look ahead selection algorithm, for example, has been found to significantly outperform the widely used truncation selection approach in terms of genetic gain, thanks to its strategy of selecting breeding parents that may not necessarily be elite themselves but have the best chance of producing elite progeny in the future. This paper presents the look ahead trace back algorithm as a new variant of the look ahead approach, which introduces several improvements to further accelerate genetic gain, especially under imperfect genomic prediction. Perhaps an even more significant contribution of this paper is the design of opaque simulators for evaluating the performance of GS algorithms. These simulators are partially observable, explicitly capture both additive and non-additive genetic effects, and simulate uncertain recombination events more realistically. In contrast, most existing GS simulation settings are transparent, either explicitly or implicitly allowing the GS algorithm to exploit certain critical information that may not be available in actual breeding programs. Comprehensive computational experiments were carried out using a maize data set to compare a variety of GS algorithms under four simulators with different levels of opacity. These results reveal how differently the same GS algorithm would interact with different simulators, suggesting the need for continued research in the design of more realistic simulators. As long as GS algorithms continue to be trained in silico rather than in planta, the best way to avoid a disappointing discrepancy between their simulated and actual performances may be to make the simulator as close to the complex and opaque nature of real breeding as possible.

Optimal breeding-value prediction using a Sparse Selection Index

Genetics ◽ 10.1093/genetics/iyab030 ◽ 2021

Author(s): Marco Lopez-Cruz ◽ Gustavo de los Campos

Keyword(s): Sample Size ◽ DNA Sequences ◽ Genomic Prediction ◽ Prediction Accuracy ◽ Regularization Parameter ◽ Selection Index ◽ Prediction Method ◽ Training Data ◽ Breeding Value ◽ Data Set

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant gains in prediction accuracy (between 5% and 10%) relative to the G-BLUP.
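
The idea of a per-individual sparse index can be illustrated with a short sketch. The abstract does not spell out the objective, so the following is an assumption: the index weights b for one target individual minimise 0.5*b'Ab - c'b + λ*||b||_1, where A is the training genomic relationship matrix plus a residual-to-genetic variance ratio on the diagonal, c holds the genomic relationships between the training individuals and the target, and the L1 penalty is the assumed sparsity-inducing technique. With λ = 0 the solution is A⁻¹c, the usual G-BLUP weights, which matches the special case mentioned in the abstract. All names are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def sparse_index_weights(A, c, lam, n_iter=500, tol=1e-10):
    """Minimise 0.5*b'Ab - c'b + lam*||b||_1 by cyclic coordinate descent.

    A : (n, n) symmetric positive definite matrix (assumed G_train + ratio*I).
    c : (n,) relationships between training individuals and the target.
    """
    b = np.zeros(len(c))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(c)):
            # c_j minus the contribution of all other coordinates.
            rho = c[j] - A[j] @ b + A[j, j] * b[j]
            new = soft_threshold(rho, lam) / A[j, j]
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return b

if __name__ == "__main__":
    # Toy relationships (hypothetical): 30 training lines and 1 target line.
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(31, 200))
    Z -= Z.mean(axis=0)
    G = Z @ Z.T / 200
    A = G[:30, :30] + 1.0 * np.eye(30)   # variance ratio of 1 is an assumption
    c = G[:30, 30]
    b = sparse_index_weights(A, c, lam=0.5 * np.max(np.abs(c)))
    print("support points:", np.count_nonzero(b), "of", len(b))
```

The predicted value for the target would then be the weighted sum of the (adjusted) training phenotypes, b @ y_train, so only training individuals with non-zero weight act as support points for that target.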

Genomic Prediction Using Alternative Strategies of Weighted Single-Step Genomic BLUP for Yearling Weight and Carcass Traits in Hanwoo Beef Cattle

Genes ◽ 10.3390/genes12020266 ◽ 2021 ◽ Vol 12 (2) ◽ pp. 266

Author(s): Hossein Mehrban ◽ Masoumeh Naserkheil ◽ Deuk Hwan Lee ◽ Chungil Cho ◽ Taejeong Choi ◽ ...

Keyword(s): Quantitative Trait Loci ◽ Beef Cattle ◽ Genomic Prediction ◽ Quantitative Trait ◽ Carcass Traits ◽ Best Linear Unbiased Prediction ◽ Single Step ◽ Linear Unbiased Prediction ◽ Single Nucleotide ◽ Best Linear Unbiased

The weighted single-step genomic best linear unbiased prediction (GBLUP) method has been proposed to exploit information from genotyped and non-genotyped relatives, allowing the use of weights for single-nucleotide polymorphisms in the construction of the genomic relationship matrix. The purpose of this study was to investigate the accuracy of genetic prediction using the following single-trait best linear unbiased prediction methods in Hanwoo beef cattle: pedigree-based (PBLUP), un-weighted (ssGBLUP), and weighted (WssGBLUP) single-step genomic methods. We also assessed the impact of alternative single and window weighting methods according to their effects on the traits of interest. The data comprised 15,796 phenotypic records for yearling weight (YW) and 5622 records for carcass traits (backfat thickness: BFT, carcass weight: CW, eye muscle area: EMA, and marbling score: MS). The genotypic data included 6616 animals for YW and 5134 for carcass traits, genotyped for 43,950 single-nucleotide polymorphisms. The ssGBLUP showed significant improvement in genomic prediction accuracy for carcass traits (71%) and yearling weight (99%) compared to the pedigree-based method. The window weighting procedures performed better than single SNP weighting for CW (11%), EMA (11%), MS (3%), and YW (6%), whereas no gain in accuracy was observed for BFT. In addition, the improvement in accuracy of window WssGBLUP over the un-weighted method was low for BFT and MS, while for CW, EMA, and YW it amounted to gains of 22%, 15%, and 20%, respectively, which indicates the presence of relevant quantitative trait loci for these traits. These findings indicate that WssGBLUP is an appropriate method for traits with a large quantitative trait loci effect.
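
How SNP weights can enter a genomic relationship matrix is sketched below. The abstract does not give the formula, so this assumes a common VanRaden-style construction, G = Z D Z' / (2 Σ p(1 - p)), where Z is the allele-count matrix centered by twice the allele frequency and D is a diagonal matrix of SNP weights; with all weights equal to 1 it reduces to the un-weighted matrix. The function name and toy genotypes are hypothetical, and the single-step blending with pedigree relationships is not shown.

```python
import numpy as np

def weighted_grm(M, weights=None):
    """Genomic relationship matrix with optional per-SNP weights.

    M       : (animals, snps) allele counts coded 0/1/2.
    weights : (snps,) non-negative SNP weights; None means all ones,
              which gives the un-weighted matrix.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                 # allele frequencies from the sample
    Z = M - 2.0 * p                          # center each SNP column
    if weights is None:
        weights = np.ones(M.shape[1])
    scale = 2.0 * np.sum(p * (1.0 - p))
    return (Z * weights) @ Z.T / scale       # Z D Z' / scale

if __name__ == "__main__":
    # Toy genotypes (hypothetical): 10 animals, 50 SNPs.
    rng = np.random.default_rng(0)
    M = rng.integers(0, 3, size=(10, 50))
    w = np.ones(50)
    w[:5] = 4.0                              # up-weight a window of 5 SNPs
    print(weighted_grm(M, w).round(2)[:3, :3])
```

Giving the same weight to every SNP in a window, as in the last lines of the example, is one way to picture the window weighting the abstract compares against single-SNP weighting.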

Genomic prediction models trained with historical records enable populating the German ex situ genebank bio-digital resource center of barley (Hordeum sp.) with information on resistances to soilborne barley mosaic viruses

Theoretical and Applied Genetics ◽ 10.1007/s00122-021-03815-0 ◽ 2021

Author(s): Maria Y. Gonzalez ◽ Yusheng Zhao ◽ Yong Jiang ◽ Nils Stein ◽ Antje Habekuss ◽ ...

Keyword(s): Genetic Resources ◽ Mosaic Virus ◽ Genomic Prediction ◽ Prediction Models ◽ Ex Situ ◽ Yellow Mosaic Virus ◽ Plant Responses ◽ Resource Center ◽ Digital Resource ◽ Best Linear Unbiased

Key message: Genomic prediction with special weight of major genes is a valuable tool to populate bio-digital resource centers. Abstract: Phenotypic information of crop genetic resources is a prerequisite for an informed selection that aims to broaden the genetic base of the elite breeding pools. We investigated the potential of genomic prediction based on historical screening data of plant responses against the Barley yellow mosaic viruses for populating the bio-digital resource center of barley. Our study includes dense marker data for 3838 accessions of winter barley, and historical screening data of 1751 accessions for Barley yellow mosaic virus (BaYMV) and of 1771 accessions for Barley mild mosaic virus (BaMMV). Linear mixed models were fitted by considering combinations for the effects of genotypes, years, and locations. The best linear unbiased estimations displayed a broad spectrum of plant responses against BaYMV and BaMMV. Prediction abilities, computed as correlations between predictions and observed phenotypes of accessions, were low for the marker-assisted selection approach, amounting to 0.42. In contrast, prediction abilities of genomic best linear unbiased predictions were high, with values of 0.62 for BaYMV and 0.64 for BaMMV. Prediction abilities of genomic prediction were improved by up to ~5% using W-BLUP, in which more weight is given to markers with significant major effects found by association mapping. Our results outline the utility of historical screening data and the W-BLUP model to predict the performance of the non-phenotyped individuals in genebank collections. The presented strategy can be considered as part of the different approaches used in genebank genomics to valorize genetic resources for their usage in disease resistance breeding and research.

Genomic selection in French dairy cattle

Animal Production Science ◽ 10.1071/an11119 ◽ 2012 ◽ Vol 52 (3) ◽ pp. 115 ◽ Cited By ~ 63

Author(s): D. Boichard ◽ F. Guillaume ◽ A. Baur ◽ P. Croiseau ◽ M. N. Rossignol ◽ ...

Keyword(s): Linkage Disequilibrium ◽ Genomic Selection ◽ Reference Population ◽ Specific Reference ◽ Nucleotide Polymorphisms ◽ Genomic Evaluation ◽ Linear Unbiased Prediction ◽ Best Linear Unbiased ◽ Estimated Breeding Values ◽ Restricted Use

Genomic selection is implemented in French Holstein, Montbéliarde, and Normande breeds (70%, 16% and 12% of French dairy cows). A characteristic of the model for genomic evaluation is the use of haplotypes instead of single-nucleotide polymorphisms (SNPs), so as to maximise linkage disequilibrium between markers and quantitative trait loci (QTLs). For each trait, a QTL-BLUP model (i.e. a best linear unbiased prediction model including QTL random effects) includes 300–700 trait-dependent chromosomal regions selected either by linkage disequilibrium and linkage analysis or by elastic net. This model requires an important effort to phase genotypes, detect QTLs, select SNPs, but was found to be the most efficient one among all tested ones. QTLs are defined within breed and many of them were found to be breed specific. Reference populations include 1800 and 1400 bulls in Montbéliarde and Normande breeds. In Holstein, the very large reference population of 18 300 bulls originates from the EuroGenomics consortium. Since 2008, ~65 000 animals have been genotyped for selection by Labogena with the 50k chip. Bulls genomic estimated breeding values (GEBVs) were made official in June 2009. In 2010, the market share of the young bulls reached 30% and is expected to increase rapidly. Advertising actions have been undertaken to recommend a time-restricted use of young bulls with a limited number of doses. In January 2011, genomic selection was opened to all farmers for females. Current developments focus on the extension of the method to a multi-breed context, to use all reference populations simultaneously in genomic evaluation.

A review of deep learning applications for genomic selection

BMC Genomics ◽ 10.1186/s12864-020-07319-x ◽ 2021 ◽ Vol 22 (1)

Author(s): Osval Antonio Montesinos-López ◽ Abelardo Montesinos-López ◽ Paulino Pérez-Rodríguez ◽ José Alberto Barrón-López ◽ Johannes W. R. Martini ◽ ...

Keyword(s): Deep Learning ◽ Plant Breeding ◽ Genomic Selection ◽ Genomic Prediction ◽ Mixed Model ◽ Prediction Models ◽ Genetic Effect ◽ Training Data ◽ Additive Genetic Effect ◽ Main Body

Abstract Background: Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. The DL methods are nonparametric models providing flexibility to adapt to complicated associations between data and output, with the ability to adapt to very complex patterns. Main body: We review the applications of DL methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches as well as the current trends in DL applications. Conclusions: The main requirement for using DL is a sufficiently large, high-quality training data set. However, based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared to conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. Deep learning algorithms are able to integrate data from different sources, as is usually needed in GS-assisted breeding, and they show the ability to improve prediction accuracy for large plant breeding data. It is important to apply DL to large training-testing data sets.

Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods

10.1101/595397 ◽ 2019

Author(s): Daniel Runcie ◽ Hao Cheng

Keyword(s): Genomic Prediction ◽ Prediction Accuracy ◽ Cross Validation ◽ Prediction Models ◽ Selection Index ◽ Parametric Method ◽ Multiple Traits ◽ Gold Standard Method ◽ Secondary Traits ◽ Validation Strategy

Abstract Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.

Evaluation of Genomic Prediction for Pasmo Resistance in Flax

International Journal of Molecular Sciences ◽ 10.3390/ijms20020359 ◽ 2019 ◽ Vol 20 (2) ◽ pp. 359 ◽ Cited By ~ 8

Author(s): Liqiang He ◽ Jin Xiao ◽ Khalid Rashid ◽ Gaofeng Jia ◽ Pingchuan Li ◽ ...

Keyword(s): Population Size ◽ Genomic Prediction ◽ Field Experiments ◽ Training Dataset ◽ Nucleotide Polymorphisms ◽ Training Population ◽ Yield And Quality ◽ Genome Wide ◽ Best Linear Unbiased ◽ Breeding Efficiency

Pasmo (Septoria linicola) is a fungal disease causing major losses in seed yield and quality and stem fibre quality in flax. Pasmo resistance (PR) is quantitative and has low heritability. To improve PR breeding efficiency, the accuracy of genomic prediction (GP) was evaluated using a diverse worldwide core collection of 370 accessions. Four marker sets, including three defined by 500, 134 and 67 previously identified quantitative trait loci (QTL) and one of 52,347 PR-correlated genome-wide single nucleotide polymorphisms, were used to build ridge regression best linear unbiased prediction (RR-BLUP) models using pasmo severity (PS) data collected from field experiments performed during five consecutive years. With five-fold random cross-validation, GP accuracy as high as 0.92 was obtained from the models using the 500 QTL when the average PS was used as the training dataset. GP accuracy increased with training population size, reaching values >0.9 with a training population size greater than 185. Linear regression of the observed PS on the number of positive-effect QTL in accessions provided an alternative GP approach with an accuracy of 0.86. The results demonstrate that GP models based on marker information from all identified QTL and the 5-year PS average are highly effective for PR prediction.
