LightGBM: accelerated genomically designed crop breeding through ensemble learning

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jun Yan ◽  
Yuetong Xu ◽  
Qian Cheng ◽  
Shuqin Jiang ◽  
Qian Wang ◽  
...  

Abstract LightGBM is an ensemble model of decision trees for classification and regression prediction. We demonstrate its utility in genomic selection-assisted breeding with a large dataset of inbred and hybrid maize lines. LightGBM exhibits superior performance in terms of prediction precision, model stability, and computing efficiency through a series of benchmark tests. We also assess the factors that are essential to ensure the best performance of genomic selection prediction by taking complex scenarios in crop hybrid breeding into account. LightGBM has been implemented as a toolbox, CropGBM, encompassing multiple novel functions and analytical modules to facilitate genomically designed breeding in crops.

2021 ◽  
Vol 12 ◽  
Author(s):  
Marlee R. Labroo ◽  
Anthony J. Studer ◽  
Jessica E. Rutkoski

Although hybrid crop varieties are among the most popular agricultural innovations, the rationale for hybrid crop breeding is sometimes misunderstood. Hybrid breeding is slower and more resource-intensive than inbred breeding, but it allows systematic improvement of a population by recurrent selection and exploitation of heterosis simultaneously. Inbred parental lines can identically reproduce both themselves and their F1 progeny indefinitely, whereas outbred lines cannot, so uniform outbred lines must be bred indirectly through their inbred parents to harness heterosis. Heterosis is an expected consequence of whole-genome non-additive effects at the population level over evolutionary time. Understanding heterosis from the perspective of molecular genetic mechanisms alone may be elusive, because heterosis is likely an emergent property of populations. Hybrid breeding is a process of recurrent population improvement to maximize hybrid performance. Hybrid breeding is not maximization of heterosis per se, nor testing random combinations of individuals to find an exceptional hybrid, nor using heterosis in place of population improvement. Though there are methods to harness heterosis other than hybrid breeding, such as use of open-pollinated varieties or clonal propagation, they are not currently suitable for all crops or production environments. The use of genomic selection can decrease cycle time and costs in hybrid breeding, particularly by rapidly establishing heterotic pools, reducing testcrossing, and limiting the loss of genetic variance. Open questions in optimal use of genomic selection in hybrid crop breeding programs remain, such as how to choose founders of heterotic pools, the importance of dominance effects in genomic prediction, the necessary frequency of updating the training set with phenotypic information, and how to maintain genetic variance and prevent fixation of deleterious alleles.


Plants ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 895
Author(s):  
Samira El Hanafi ◽  
Souad Cherkaoui ◽  
Zakaria Kehel ◽  
Ayed Al-Abdallat ◽  
Wuletaw Tadesse

Hybrid wheat breeding is one of the most promising technologies for further sustainable yield increases. However, the cleistogamous nature of wheat poses a major bottleneck for a successful hybrid breeding program. Thus, an optimized breeding strategy by developing appropriate parental lines with favorable floral trait combinations is the best way to enhance the outcrossing ability. This study, therefore, aimed to dissect the genetic basis of various floral traits using a genome-wide association study (GWAS) and to assess the potential of genome-wide prediction (GP) for anther extrusion (AE), visual anther extrusion (VAE), pollen mass (PM), pollen shedding (PSH), pollen viability (PV), anther length (AL), openness of the flower (OPF), duration of floret opening (DFO) and stigma length. To this end, we employed 196 ICARDA spring bread wheat lines evaluated for three years and genotyped with 10,477 polymorphic SNPs. In total, 70 significant markers associated with the various assessed traits were identified at FDR ≤ 0.05, contributing a minor to large proportion of the phenotypic variance (8–26.9%) and affecting the traits either positively or negatively. GWAS revealed multi-marker-based associations among AE, VAE, PM, OPF and DFO, most likely linked markers, suggesting a potential genomic region controlling the genetic association of these complex traits. Of these markers, Kukri_rep_c103359_233 and wsnp_Ex_rep_c107911_91350930 deserve particular attention. The consistently significant markers with large effect could be useful for marker-assisted selection. Genomic selection revealed medium to high prediction accuracy ranging between 52% and 92% for the assessed traits, with the lowest and highest values observed for stigma length and visual anther extrusion, respectively. This indicates the feasibility of implementing genomic selection to predict the performance of hybrid floral traits with high reliability.


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii203-ii203
Author(s):  
Alexander Hulsbergen ◽  
Yu Tung Lo ◽  
Vasileios Kavouridis ◽  
John Phillips ◽  
Timothy Smith ◽  
...  

Abstract INTRODUCTION Survival prediction in brain metastases (BMs) remains challenging. Current prognostic models have been created and validated almost completely with data from patients receiving radiotherapy only, leaving uncertainty about surgical patients. Therefore, the aim of this study was to build and validate a model predicting 6-month survival after BM resection using different machine learning (ML) algorithms. METHODS An institutional database of 1062 patients who underwent resection for BM was split into an 80:20 training and testing set. Seven different ML algorithms were trained and assessed for performance. Moreover, an ensemble model was created incorporating random forest, adaptive boosting, gradient boosting, and logistic regression algorithms. Five-fold cross validation was used for hyperparameter tuning. Model performance was assessed using area under the receiver-operating curve (AUC) and calibration and was compared against the diagnosis-specific graded prognostic assessment (ds-GPA), the most established prognostic model in BMs. RESULTS The ensemble model showed superior performance with an AUC of 0.81 in the hold-out test set, a calibration slope of 1.14, and a calibration intercept of -0.08, outperforming the ds-GPA (AUC 0.68). Patients were stratified into high-, medium- and low-risk groups for death at 6 months; these strata strongly predicted both 6-month and longitudinal overall survival (p < 0.001). CONCLUSIONS We developed and internally validated an ensemble ML model that accurately predicts 6-month survival after neurosurgical resection for BM, outperforms the most established model in the literature, and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.
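The evaluation workflow in this abstract (an 80:20 hold-out split, an ensemble of several classifiers, and AUC as the performance measure) can be illustrated with a minimal pure-Python sketch. The abstract does not say how the base models were combined, so averaging predicted probabilities (soft voting) is an assumption here, and the function names `train_test_split`, `soft_vote`, and `auc` are hypothetical helpers, not the authors' code.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and return (train, test) with the given test fraction."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def soft_vote(prob_lists):
    """Average predicted probabilities across models, patient by patient.

    Assumption: soft voting; the abstract does not state the combination rule.
    """
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive case receives the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage with made-up numbers: 1062 patient indices split 80:20.
train, test = train_test_split(list(range(1062)))
ensemble_scores = soft_vote([[0.1, 0.5, 0.3, 0.9], [0.1, 0.3, 0.4, 0.7]])
print(len(train), len(test), auc([0, 0, 1, 1], ensemble_scores))
```

The pairwise definition of AUC is quadratic in the number of patients, which is acceptable for a test set of about 200 cases; a rank-based formula would be the usual choice for larger sets.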


2021 ◽  
Author(s):  
Lance F Merrick ◽  
Dennis N Lozada ◽  
Xianming Chen ◽  
Arron H Carter

Most genomic prediction models are linear regression models that assume continuous and normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded in ordinal scales and percentages. Disease severity (SEV) and infection type (IT) data in germplasm screening nurseries generally do not follow these assumptions. In this regard, researchers may ignore the lack of normality, transform the phenotypes, use generalized linear models, or use supervised learning algorithms and classification models with no restriction on the distribution of response variables, which are less sensitive when modeling ordinal scores. The goal of this research was to compare classification and regression genomic selection models for skewed phenotypes using stripe rust SEV and IT in winter wheat. We extensively compared both regression and classification prediction models using two training populations composed of breeding lines phenotyped in four years (2016-2018 and 2020) and a diversity panel phenotyped in four years (2013-2016). The prediction models used 19,861 genotyping-by-sequencing single-nucleotide polymorphism markers. Overall, square root transformed phenotypes using rrBLUP and support vector machine regression models displayed the highest combination of accuracy and relative efficiency across the regression and classification models. Further, a classification system based on support vector machine and ordinal Bayesian models with a 2-Class scale for SEV reached the highest class accuracy of 0.99. This study showed that breeders can use linear and non-parametric regression models within their own breeding lines over combined years to accurately predict skewed phenotypes.


Author(s):  
Matthew McGowan ◽  
Zhiwu Zhang ◽  
Jiabo Wang ◽  
Haixiao Dong ◽  
Xiaolei Liu ◽  
...  

Estimation of breeding values through Best Linear Unbiased Prediction (BLUP) using pedigree-based kinship and Marker-Assisted Selection (MAS) are the two fundamental breeding methods used before and after the introduction of genetic markers, respectively. The emergence of high-density genome-wide markers has led to the development of two parallel series of approaches inspired by BLUP and MAS, which are collectively referred to as Genomic Selection (GS). The first series of GS methods alters pedigree-based BLUP by replacing pedigree-based kinship with marker-based kinship in a variety of ways, including weighting markers by their effects in genome-wide association study (GWAS), joining both pedigree and marker-based kinship together in a single-step BLUP, and substituting individuals with groups in a compressed BLUP. The second series of GS methods estimates the effects for all genetic markers simultaneously. For the second series methods, the marker effects are summed together regardless of their individual significance. Instead of fitting individuals as random effects like in the BLUP series, the second series fits markers as random effects. Differing assumptions regarding the underlying distribution of these marker effects have resulted in the development of many Bayesian-based GS methods. This review highlights critical concept developments for both of these series and explores ongoing GS developments in machine learning, multiple trait selection, and adaptation for hybrid breeding. Furthermore, considering the increasing use and variety of GS methods in plant breeding programs, this review addresses important concerns for future GS development and application, such as the use of GWAS-assisted GS, the long-term effectiveness of GS methods, and the valid assessment of prediction accuracy.
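The "second series" idea described above, estimating all marker effects simultaneously and summing them regardless of individual significance, can be illustrated with a minimal ridge-regression sketch in the spirit of RR-BLUP. This is an illustration under stated assumptions, not code from the review: the function names are hypothetical, the shrinkage parameter `lam` is supplied by hand (in RR-BLUP it corresponds to the ratio of residual to marker-effect variance, which would normally be estimated), and the genotypes and phenotypes are made-up toy values.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_marker_effects(genotypes, phenotypes, lam):
    """Estimate all marker effects at once by solving (Z'Z + lam*I) u = Z'y,
    where Z holds column-centered genotypes and y the centered phenotypes."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y_mean = sum(phenotypes) / n
    y = [v - y_mean for v in phenotypes]
    ztz = [[sum(z[i][a] * z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    zty = [sum(z[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve_linear(ztz, zty), col_means, y_mean


def predict_gebv(genotype, effects, col_means, y_mean):
    """Sum every marker's contribution, with no significance filtering."""
    return y_mean + sum((g - m) * u
                        for g, m, u in zip(genotype, col_means, effects))


# Toy data (hypothetical): 5 lines, 3 markers coded 0/1/2.
genotypes = [[0, 1, 2], [1, 0, 2], [2, 2, 0], [0, 2, 1], [1, 1, 1]]
phenotypes = [9.0, 12.0, 12.0, 8.0, 11.0]
effects, col_means, y_mean = ridge_marker_effects(genotypes, phenotypes, lam=1.0)
print(predict_gebv([2, 0, 1], effects, col_means, y_mean))
```

A larger `lam` shrinks every effect toward zero, so predictions move toward the population mean; the Bayesian methods mentioned above differ mainly in replacing this single common shrinkage with marker-specific prior distributions.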


Author(s):  
Chin Jian Yang ◽  
Rajiv Sharma ◽  
Gregor Gorjanc ◽  
Sarah Hearne ◽  
Wayne Powell ◽  
...  

Abstract Modern crop breeding is in constant demand for new genetic diversity as part of the arms race with genetic gain. The elite gene pool has limited genetic variation and breeders are trying to introduce novelty from unadapted germplasm, landraces and wild relatives. For polygenic traits, currently available approaches to introgression are not ideal, as there is a demonstrable bias against exotic alleles during selection. Here, we propose a partitioned form of genomic selection, called Origin Specific Genomic Selection (OSGS), where we identify and target selection on favourable exotic alleles. Briefly, within a population derived from a bi-parental cross, we isolate alleles originating from the elite and exotic parents, which then allows us to separate out the predicted marker effects based on the allele origins. We validated the usefulness of OSGS using two nested association mapping (NAM) datasets: barley NAM (elite-exotic) and maize NAM (elite-elite), as well as by computer simulation. Our results suggest that OSGS works well in bi-parental crosses, and it is possible to extend the approach to broader multi-parental populations.


2020 ◽  
Vol 2 (4) ◽  
pp. 15-23
Author(s):  
E. B. Khatefov ◽  
B. R. Shomakhov ◽  
R. S. Kushkhova ◽  
R. A. Kudaev ◽  
Z. T. Khashirova ◽  
...  

Abstract. Hybrid maize breeding requires constant renewal of the source material. In this regard, broadening of genetic variation in parental lines is one of the primary tasks in heterotic hybrid breeding programs. The use of reverse diploid inbred lines derived from a tetraploid population is considered as an innovative approach to achieve this goal. Results. The investigated material comprised 106 reverse diploid (rediploid) inbred lines originating from diploid plants selected in segregating selfed progenies of triploid populations and consequently subjected to inbreeding, while triploid populations resulted from a cross between plants of a tetraploid population with a broad genetic basis and a diploid line. The use of a system of crosses with 37 sterile testers belonging to different FAO maize maturity groups allowed the evaluation of the rediploid lines’ combining ability and the response to M and C types of CMS. Field tests were conducted in 2019 in the steppe zone of Kabardino-Balkaria. Forty-six lines (43.3%) with the combining ability ranging from ultra-high to good, and 78 lines (73.6%) maintaining the CMS character were identified. Among them, 59 lines (55.7%) were maintainers for the M type CMS, 15 lines (14.1%) for C type CMS, and 4 lines maintained sterility for both CMS types. Sixteen lines (15.1%) restored pollen fertility of the forms with M type CMS, 11 lines (10.4%) were restorers for the C type, and one line turned out to be a universal restorer for both CMS types. Ranking by the “sprout - flowering of ears” interstage period duration showed that most of the lines (66.0%) with the ability to maintain sterility or restore male fertility of M and C CMS types, as well as with the combining ability from ultra-high to good (32.6%), fell into the group with the flowering period duration of 51-55 days.
According to the results of the harvested grain moisture assessment, the hybrids ♀(РГС246с × OL213) × ♂92с5986·2·3, ♀714М × ♂1/67-1 and ♀714М × ♂92н136-4, with the values of 13.6%, 13.9%, and 14.0%, respectively, were identified. The hybrids ♀714М × ♂1/67-1 and ♀(OL563С × KL1392) × ♂92с0653 2 1 2 were characterized by the maximum value of the selection index, i.e. 5.03 and 5.13, respectively. Conclusions. The results of the studies showed the breeding value of rediploid lines as an initial material for hybrid maize breeding.

