A review of deep learning applications for genomic selection

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Osval Antonio Montesinos-López ◽  
Abelardo Montesinos-López ◽  
Paulino Pérez-Rodríguez ◽  
José Alberto Barrón-López ◽  
Johannes W. R. Martini ◽  
...  

Abstract Background: Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. DL methods are nonparametric models that provide the flexibility to adapt to complicated associations between data and output, including very complex patterns. Main body: We review the applications of DL methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches, as well as the current trends in DL applications. Conclusions: The main requirement for using DL is high-quality and sufficiently large training data. Based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared to conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. DL algorithms are able to integrate data from different sources, as is usually needed in GS-assisted breeding, and they show the ability to improve prediction accuracy for large plant breeding data. It is important to apply DL to large training-testing data sets.

2019 ◽  
Author(s):  
L.M. Souza ◽  
F.R. Francisco ◽  
P.S. Gonçalves ◽  
E.J. Scaloppi Junior ◽  
V. Le Guen ◽  
...  

Abstract Several genomic prediction models incorporating genotype × environment (G×E) interactions have recently been developed and used in genomic selection (GS) in plant breeding programs. G×E interactions decrease selection accuracy and limit genetic gains in plant breeding. Two genomic data sets were used to compare the prediction ability of multi-environment G×E genomic models and two kernel methods, a linear kernel (genomic best linear unbiased predictor, GBLUP; GB) and a nonlinear kernel (Gaussian kernel, GK), and the prediction accuracy (PA) of five genomic prediction models: (1) one without environmental data (BSG); (2) a single-environment, main genotypic effect model (SM); (3) a multi-environment, main genotypic effect model (MM); (4) a multi-environment, single variance G×E deviation model (MDs); and (5) a multi-environment, environment-specific variance G×E deviation model (MDe). We evaluated the utility of GS with 435 rubber tree individuals at two sites and genotyped the individuals with genotyping-by-sequencing (GBS) of single-nucleotide polymorphisms (SNPs). Prediction models were estimated for diameter (DAP) and height (AP) at different ages, with heritability ranging from 0.59 to 0.75 for both traits. Applying the model (BSG, SM, MM, MDs, and MDe) and kernel method (GBLUP and GK) combinations to the rubber tree data showed that models with the nonlinear GK and the linear GBLUP kernel had similar PAs. Multi-environment models were superior to single-environment genomic models regardless of the kernel (GBLUP or GK), suggesting that introducing interactions between markers and environmental conditions increases the proportion of variance explained by the model and, more importantly, the PA. In the best scenario (well-watered, WW/GK), 6.7-fold and 8.7-fold increases in genetic gain over the conventional genetic breeding model (CBM) can be obtained for AP and DAP, respectively, with multi-environment GS (MM, MDe and MDs). Furthermore, GS resulted in a more balanced selection response in DAP and AP and, if used in conjunction with traditional genetic breeding programs, will contribute to a reduction in selection time. With the rapid advances in and declining costs of genotyping methods, balanced against the overall costs of managing large progeny trials and the potential for increased gains per unit time, we are hopeful that GS can be implemented in rubber tree breeding programs.


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Cheng Bian ◽  
Dzianis Prakapenka ◽  
Cheng Tan ◽  
Ruifei Yang ◽  
Di Zhu ◽  
...  

Abstract Background: Genomic selection using single nucleotide polymorphism (SNP) markers has been widely used for genetic improvement of livestock, but most current methods of genomic selection are based on SNP models. In this study, we investigated the prediction accuracies of haplotype models based on fixed chromosome distances and gene boundaries compared to those of SNP models for genomic prediction of phenotypic values. We also examined the reasons for the successes and failures of haplotype genomic prediction. Methods: We analyzed a swine population of 3195 Duroc boars with records on eight traits: body judging score (BJS), teat number (TN), age (AGW), loin muscle area (LMA), loin muscle depth (LMD) and back fat thickness (BF) at 100 kg live weight, and average daily gain (ADG) and feed conversion rate (FCR) from 30 to 100 kg live weight. Ten-fold validation was used to evaluate the prediction accuracy of each SNP model and each multi-allelic haplotype model based on 488,124 autosomal SNPs from low-coverage sequencing. Haplotype blocks were defined using fixed chromosome distances or gene boundaries. Results: Compared to the best SNP model, the accuracy of predicting phenotypic values using a haplotype model was greater by 7.4% for BJS, 7.1% for AGW, 6.6% for ADG, 4.9% for FCR, 2.7% for LMA, 1.9% for LMD, 1.4% for BF, and 0.3% for TN. The use of gene-based haplotype blocks resulted in the best prediction accuracy for LMA, LMD, and TN. Compared to estimates of SNP additive heritability, estimates of haplotype epistasis heritability were strongly correlated with the increase in prediction accuracy by haplotype models. The increase in prediction accuracy was largest for BJS, AGW, ADG, and FCR, which also had the largest estimates of haplotype epistasis heritability: 24.4% for BJS, 14.3% for AGW, 14.5% for ADG, and 17.7% for FCR. SNP and haplotype heritability profiles across the genome identified several genes with large genetic contributions to phenotypes: NUDT3 for LMA, LMD and BF, VRTN for TN, COL5A2 for BJS, BSND for ADG, and CARTPT for FCR. Conclusions: Haplotype prediction models improved the accuracy of genomic prediction of phenotypes in Duroc pigs. For some traits, the best prediction accuracy was obtained with haplotypes defined using gene regions, which provides evidence that functional genomic information can improve the accuracy of haplotype genomic prediction for certain traits.


Author(s):  
Osval Antonio Montesinos López ◽  
Abelardo Montesinos López ◽  
Jose Crossa

Abstract We give a detailed description of random forest and exemplify its use with data from plant breeding and genomic selection. The motivations for using random forest in genomic-enabled prediction are explained. Then we describe the process of building decision trees, which are a key component for building random forest models. We give (1) the random forest algorithm, (2) the main hyperparameters that need to be tuned, and (3) different splitting rules that are key for implementing random forest models for continuous, binary, categorical, and count response variables. In addition, many examples are provided for training random forest models with different types of response variables with plant breeding data. The random forest algorithm for multivariate outcomes is provided and its most popular splitting rules are also explained. In this case, some examples are provided for illustrating its implementation even with mixed outcomes (continuous, binary, and categorical). Final comments about the pros and cons of random forest are provided.
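
To make the idea of a splitting rule for a continuous response concrete, the following is a minimal sketch rather than the chapter's own implementation. It assumes the usual squared-error rule for regression trees: among all thresholds on one predictor, pick the one that minimises the summed within-node squared error. The function names (`sse`, `best_split`) and the toy marker data are hypothetical.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, remaining_sse) for the best split of the form x <= threshold.

    If no split lowers the squared error, the threshold is None.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical predictor values cannot be separated
        threshold = (lo + hi) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: marker dosage (0, 1, 2) and a continuous trait.
dosage = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 1.1, 0.9, 3.0, 3.2]
threshold, remaining = best_split(dosage, trait)
```

A random forest repeats this search on bootstrap samples and on random subsets of predictors at each node. Binary, categorical, and count responses would need a different impurity measure than squared error.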


Agronomy ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. 1591
Author(s):  
Sebastian Michel ◽  
Franziska Löschenberger ◽  
Ellen Sparry ◽  
Christian Ametz ◽  
Hermann Bürstmayr

The availability of cost-efficient genotyping technologies has facilitated the implementation of genomic selection in numerous breeding programs. However, some studies have reported a superiority of pedigree over genomic selection in line breeding, and since pedigree-based prediction incurs no additional costs aside from systematic record keeping, the question arises of what the actual benefit of fingerprinting several hundred lines each year is. This study thus aimed to shed light on this question by comparing pedigree, genomic, and single-step prediction models using phenotypic and genotypic data collected over ten years in an applied wheat breeding program. For this purpose, the models were tested empirically in a multi-year forward prediction as well as in a supporting simulation study. Given the availability of deep pedigree records, pedigree prediction performed similarly to genomic prediction for some of the investigated traits if preexisting information on the selection candidates was available. Nevertheless, blending both information sources substantially increased the prediction accuracy and thus the selection gain, especially for traits with low heritability. The largest advantage of genomic predictions can be seen in breeding scenarios where such preexisting information is not systematically available or is difficult and costly to obtain.


Energies ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 7378
Author(s):  
Pedro M. R. Bento ◽  
Jose A. N. Pombo ◽  
Maria R. A. Calado ◽  
Silvio J. P. S. Mariano

Short-Term Load Forecasting is critical for reliable power system operation, and the search for enhanced methodologies has been a constant field of investigation, particularly in an increasingly competitive environment where the market operator and its participants need to better inform their decisions. Hence, it is important to continue advancing in terms of forecasting accuracy and consistency. This paper presents a new deep learning-based ensemble methodology for 24 h ahead load forecasting, where an automatic framework is proposed to select the best Box-Jenkins models (ARIMA Forecasters) from a wide range of combinations. The method is distinct in its parameters but, more importantly, in considering different batches of historical (training) data, thus benefiting from prediction models focused on recent and longer load trends. Afterwards, these accurate predictions, mainly the linear components of the load time-series, are fed to the ensemble Deep Forward Neural Network. This flexible type of network architecture not only functions as a combiner but also receives additional historical and auxiliary data to further its generalization capabilities. Numerical testing using New England market data validated the proposed ensemble approach with diverse base forecasters, achieving promising results in comparison with other state-of-the-art methods.


2019 ◽  
Author(s):  
Sai Krishna Arojju ◽  
Mingshu Cao ◽  
M. Z. Zulfi Jahufer ◽  
Brent A Barrett ◽  
Marty J Faville

Abstract Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of genotypic, environmental and genotype-by-environment (G × E) variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P<0.05) genotypic variation was detected for all nutritive traits and genomic heritability (h2g) was moderate to high (0.20 to 0.74). G × E interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesC genomic prediction models displayed similar predictive ability, estimated by 10-fold cross validation, for all nutritive traits with values ranging from r = 0.16 to 0.45 using phenotypes from across two environments. High predictive ability was observed for the mineral traits sulphur (0.44), sodium (0.45) and magnesium (0.45) and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from 1 million to as few as 50,000. The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined. For traits with lower predictive ability, multi-trait genomic prediction approaches that exploit the strong genetic correlations observed amongst some nutritive traits may be useful. This appears to be particularly important for WSC, considered one of the primary constituents of nutritive value for forages.
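
The abstract reports predictive ability as a correlation r estimated by 10-fold cross validation. The sketch below illustrates that evaluation scheme only. It is not the study's pipeline: a one-covariate least-squares line stands in for GBLUP or BayesC, the data are simulated, and pooling the held-out predictions before computing one Pearson correlation is an assumption (averaging per-fold correlations is another common choice). All function names are hypothetical.

```python
import random
from statistics import fmean


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def fit_line(x, y):
    """Least-squares slope and intercept (placeholder for a genomic model)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def kfold_predictive_ability(x, y, k=10, seed=0):
    """Correlation between observed values and pooled held-out predictions."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all records
    predicted = [0.0] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            predicted[i] = intercept + slope * x[i]  # predict unseen records only
    return pearson(y, predicted)


# Simulated toy data: one covariate and a noisy phenotype.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(50)]
ys = [0.5 * v + rng.gauss(0, 1) for v in xs]
ability = kfold_predictive_ability(xs, ys)
```

Each record is predicted exactly once, by a model that never saw it, so the resulting correlation reflects out-of-sample performance rather than goodness of fit.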


Heredity ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Yoseph Beyene ◽  
Manje Gowda ◽  
Jose Crossa ◽  
Paulino Pérez-Rodríguez ◽  
...  

Abstract Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulates, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This leads to the question of whether all available data or a subset of it should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not contemplate the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic-BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, here we studied whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that when predicting grain yield the KBLUP outperformed the GBLUP, and that using SSI with additive relationships (GSSI) led to 5–17% increases in accuracy, relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.
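
The contrast between additive genomic relationships (GBLUP) and Gaussian kernels (KBLUP) comes down to how similarity between two genotypes is computed from their markers. The sketch below shows both under simplifying assumptions that are not taken from the paper: the linear kernel is the cross-product of column-centered markers divided by the number of markers, and the Gaussian kernel scales squared Euclidean distances by their mean off-diagonal value, with a hypothetical `bandwidth` parameter. The marker matrix is a toy example.

```python
import math
from statistics import fmean


def center_columns(markers):
    """Subtract each marker's mean so every column sums to zero."""
    means = [fmean(col) for col in zip(*markers)]
    return [[v - m for v, m in zip(row, means)] for row in markers]


def linear_kernel(markers):
    """Additive relationship matrix G = ZZ'/p from centered markers Z (simplified scaling)."""
    z = center_columns(markers)
    p = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in z] for zi in z]


def gaussian_kernel(markers, bandwidth=1.0):
    """K[i][j] = exp(-bandwidth * d2[i][j] / mean off-diagonal d2)."""
    n = len(markers)
    d2 = [[sum((a - b) ** 2 for a, b in zip(ri, rj)) for rj in markers]
          for ri in markers]
    scale = fmean(d2[i][j] for i in range(n) for j in range(n) if i != j)
    return [[math.exp(-bandwidth * d / scale) for d in row] for row in d2]


# Hypothetical toy data: 4 lines scored at 5 markers (allele dosage 0, 1, 2).
markers = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 1, 1],
    [2, 0, 0, 1, 2],
    [2, 1, 0, 2, 0],
]
G = linear_kernel(markers)
K = gaussian_kernel(markers)
```

Either matrix can serve as the covariance structure of the genetic values in a mixed model. The Gaussian kernel decays nonlinearly with marker distance, which is one way such models can capture non-additive patterns.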


Agronomy ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 95 ◽  
Author(s):  
Charlotte Robertsen ◽  
Rasmus Hjortshøj ◽  
Luc Janss

Genomic Selection (GS) is a method in plant breeding to predict the genetic value of untested lines based on genome-wide marker data. The method has been widely explored with simulated data and also in real plant breeding programs. However, the optimal strategy and stage for implementation of GS in a plant-breeding program is still uncertain. The accuracy of GS has been shown to be affected by the data used in the GS model, including size of the training population, relationships between individuals, marker density, and use of pedigree information. GS is commonly used to predict the additive genetic value of a line, whereas non-additive genetics are often disregarded. In this review, we provide background knowledge on genomic prediction models used for GS and a view on important considerations concerning data used in these models. We compare within- and across-breeding cycle strategies for implementation of GS in cereal breeding and possibilities for using GS to select untested lines as parents. We further discuss the difference between estimating additive and non-additive genetic values and its usefulness for selecting either new parents or new candidate varieties.


2019 ◽  
Vol 136 (4) ◽  
pp. 279-300 ◽  
Author(s):  
Daniel J. Tolhurst ◽  
Ky L. Mathews ◽  
Alison B. Smith ◽  
Brian R. Cullis

Agronomy ◽  
2020 ◽  
Vol 10 (3) ◽  
pp. 340
Author(s):  
Marty Faville ◽  
Mingshu Cao ◽  
Jana Schmidt ◽  
Douglas Ryan ◽  
Siva Ganesh ◽  
...  

Increasing the rate of genetic gain for dry matter (DM) yield in perennial ryegrass (Lolium perenne L.), which is a key source of nutrition for ruminants in temperate environments, is an important goal for breeders. Genomic selection (GS) is a strategy used to improve genetic gain by using molecular marker information to predict breeding values in selection candidates. An empirical assessment of GS for herbage accumulation (HA; proxy for DM yield) and days-to-heading (DTH) was completed by using existing genomic prediction models to conduct one cycle of divergent GS in four selection populations (Pop I G1 and G3; Pop III G1 and G3), for each trait. G1 populations were the offspring of the training set and G3 populations were two generations further on from that. The HA of the High GEBV selection group (SG) progenies, averaged across all four populations, was 28% higher (p < 0.05) than Low GEBV SGs when assessed in the target environment, while it did not differ significantly in a second environment. Divergence was greater in Pop I (43%–65%) than Pop III (10%–16%) and the selection response was higher in G1 than in G3. Divergent GS for DTH also produced significant (p < 0.05) differences between High and Low GEBV SGs in G1 populations (+6.3 to 9.1 days; 31%–61%) and smaller, non-significant (p > 0.05) responses in G3. This study shows that genomic prediction models, trained from a small, composite reference set, can be used to improve traits with contrasting genetic architectures in perennial ryegrass. The results highlight the importance of target environment selection for training models, as well as the influence of relatedness between the training set and selection populations.

