Novel Bayesian Networks for Genomic Prediction of Developmental Traits in Biomass Sorghum

2019, Vol 10 (2), pp. 769-781
Author(s): Jhonathan P. R. dos Santos, Samuel B. Fernandes, Scott McCoy, Roberto Lozano, Patrick J. Brown, ...

The ability to connect genetic information between traits over time allows Bayesian networks to offer a powerful probabilistic framework for constructing genomic prediction models. In this study, we phenotyped a diversity panel of 869 biomass sorghum (Sorghum bicolor (L.) Moench) lines, which had been genotyped with 100,435 SNP markers, for plant height (PH), measured biweekly from 30 to 120 days after planting (DAP), and for end-of-season dry biomass yield (DBY) in four environments. We evaluated five genomic prediction models: Bayesian network (BN), Pleiotropic Bayesian network (PBN), Dynamic Bayesian network (DBN), multi-trait GBLUP (MTr-GBLUP), and multi-time GBLUP (MTi-GBLUP). In fivefold cross-validation, prediction accuracies ranged from 0.46 (PBN) to 0.49 (MTr-GBLUP) for DBY and from 0.47 (DBN, DAP120) to 0.75 (MTi-GBLUP, DAP60) for PH. Forward-chaining cross-validation further improved the prediction accuracies of the DBN, MTi-GBLUP, and MTr-GBLUP models for PH (training slice: 30–45 DAP) by 36.4–52.4% relative to the BN and PBN models. Coincidence indices (target: biomass; secondary: PH) and a coincidence index based on lines (PH time series) showed that the ranking of lines by PH changed minimally after 45 DAP. These results suggest that a two-level indirect selection method for PH at harvest (first-level target trait) and DBY (second-level target trait) could be conducted earlier in the season, based on the ranking of lines by PH at 45 DAP (secondary trait). With the advance of high-throughput phenotyping technologies, our proposed two-level indirect selection framework could be valuable for enhancing genetic gain per unit of time when selecting on developmental traits.
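The coincidence indices mentioned above compare which lines would be selected under two different rankings. The sketch below uses one common form of the index, (B - R)/(M - R), where B is the number of lines selected by both rankings, M the number selected, and R the overlap expected by chance, assumed here to be M²/N. The abstract does not give the exact definition used in the paper, and the PH values are hypothetical.

```python
from typing import Sequence


def top_set(values: Sequence[float], m: int) -> set:
    """Indices of the m lines with the largest values (ties broken by index)."""
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))
    return set(order[:m])


def coincidence_index(secondary: Sequence[float], target: Sequence[float], m: int) -> float:
    """(B - R) / (M - R): B = lines selected by both rankings, M = lines selected,
    R = overlap expected by chance, assumed here to be M * M / N."""
    n = len(target)
    if len(secondary) != n or not 0 < m < n:
        raise ValueError("need equal-length inputs and 0 < m < N")
    b = len(top_set(secondary, m) & top_set(target, m))
    r = m * m / n
    return (b - r) / (m - r)


# Hypothetical PH values for 10 lines at 45 DAP and at harvest.
ph_45 = [0.9, 1.2, 0.7, 1.5, 1.1, 0.8, 1.4, 1.0, 0.6, 1.3]
ph_harvest = [2.8, 3.1, 2.5, 3.6, 3.0, 2.6, 3.3, 2.9, 2.4, 3.4]
print(coincidence_index(ph_45, ph_harvest, m=3))  # -> 1.0
```

An index of 1 means the top lines at 45 DAP are exactly the top lines at harvest; values near 0 mean agreement no better than chance, and negative values mean worse than chance.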

2019
Author(s): Jhonathan P. R. dos Santos, Samuel B. Fernandes, Roberto Lozano, Patrick J. Brown, Edward S. Buckler, ...

The ability to connect genetic information between traits over time allows Bayesian networks to offer a powerful probabilistic framework for constructing genomic prediction models. In this study, we phenotyped a diversity panel of 869 biomass sorghum (Sorghum bicolor (L.) Moench) lines, which had been genotyped with 100,435 SNP markers, for plant height (PH), measured biweekly from 30 to 120 days after planting (DAP), and for end-of-season dry biomass yield (DBY) in four environments. We evaluated five genomic prediction models: Bayesian network (BN), Pleiotropic Bayesian network (PBN), Dynamic Bayesian network (DBN), multi-trait GBLUP (MTr-GBLUP), and multi-time GBLUP (MTi-GBLUP). In 5-fold cross-validation, prediction accuracies ranged from 0.48 (PBN) to 0.51 (MTr-GBLUP) for DBY and from 0.47 (DBN, DAP120) to 0.74 (MTi-GBLUP, DAP60) for PH. Forward-chaining cross-validation further improved the prediction accuracies of the DBN, MTi-GBLUP, and MTr-GBLUP models for PH (training slice: 30–45 DAP) by 36.4–52.4% relative to the BN and PBN models. Coincidence indices (target: biomass; secondary: PH) and a coincidence index based on lines (PH time series) showed that the ranking of lines by PH changed minimally after 45 DAP. These results suggest that a two-level indirect selection method for PH at harvest (first-level target trait) and DBY (second-level target trait) could be conducted earlier in the season, based on the ranking of lines by PH at 45 DAP (secondary trait). With the advance of high-throughput phenotyping technologies, our proposed two-level indirect selection framework could be valuable for enhancing genetic gain per unit of time when selecting on developmental traits.
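Forward-chaining cross-validation respects the time order of the measurements: models are trained only on earlier time points and evaluated on later ones. A minimal sketch of the split logic follows, assuming 15-day spacing between the biweekly PH measurements. The abstract does not say whether the training slice stays fixed at 30–45 DAP or grows after each step, so that choice is exposed as a hypothetical `expanding` flag.

```python
from typing import Iterator, Sequence


def forward_chaining_splits(
    time_points: Sequence[int], initial: int, expanding: bool = False
) -> Iterator[tuple[list[int], int]]:
    """Yield (training time points, time point to predict) pairs.

    Training never includes time points at or after the one being predicted.
    With expanding=False the first `initial` time points are a fixed training
    slice; with expanding=True the slice grows by one step per split.
    """
    if not 0 < initial < len(time_points):
        raise ValueError("initial must leave at least one time point to predict")
    for k in range(initial, len(time_points)):
        end = k if expanding else initial
        yield list(time_points[:end]), time_points[k]


# PH measurement days, assuming 15-day spacing from 30 to 120 DAP.
dap = [30, 45, 60, 75, 90, 105, 120]
for train, target in forward_chaining_splits(dap, initial=2):
    print(f"train on DAP {train} -> predict DAP {target}")
```

Unlike random k-fold splits, no split ever lets a later measurement inform the prediction of an earlier one, which is what makes early-season selection claims testable.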


Author(s): Josquin Foulliaron, Laurent Bouillaut, Patrice Aknin, Anne Barros

The maintenance optimization of complex systems is a key question. One important objective is to anticipate the maintenance actions that will be required, in order to optimize logistics and future investments. For this reason, predictive maintenance approaches have become an expanding area of research over the past few years. They rely on the concept of prognosis. Many papers have shown that dynamic Bayesian networks can be relevant for representing multicomponent complex systems and carrying out reliability studies. The diagnosis and maintenance group of the French Institute of Science and Technology for Transport, Development and Networks (IFSTTAR) developed a model based on dynamic Bayesian networks, VirMaLab (Virtual Maintenance Laboratory), to represent a multicomponent system together with its degradation dynamics and its diagnosis and maintenance processes. Its main purpose is to model a maintenance policy so that the maintenance parameters can be optimized using dynamic Bayesian networks. A discrete state-space system is considered, which is periodically observable through a diagnosis process. Such systems are common in railway and road infrastructure. This article presents a prognosis algorithm that computes the remaining useful life of the system and updates this estimate each time a new diagnosis becomes available. A representation of this algorithm as a dynamic Bayesian network is then given, so that it can be integrated into the Virtual Maintenance Laboratory model to cover the set of predictive maintenance policies. Inference computation on the resulting dynamic Bayesian networks is discussed. Finally, an application on simulated data is presented.
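The prognosis step described above (a discrete-state degradation process, corrected whenever a diagnosis arrives, from which a remaining useful life is computed) can be illustrated without a DBN library. The following is a minimal sketch assuming a Markov degradation chain with an absorbing failure state and a diagnosis likelihood vector; the four states, the transition probabilities, and the diagnosis model are hypothetical, and this is not the VirMaLab algorithm itself.

```python
Matrix = list[list[float]]
Vector = list[float]


def propagate(belief: Vector, transition: Matrix) -> Vector:
    """One degradation step: b'[j] = sum_i b[i] * P[i][j]."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(belief: Vector, likelihood: Vector) -> Vector:
    """Bayes update of the state belief with P(diagnosis | state)."""
    post = [b * l for b, l in zip(belief, likelihood)]
    z = sum(post)
    if z == 0:
        raise ValueError("diagnosis is impossible under the current belief")
    return [p / z for p in post]


def rul_distribution(belief: Vector, transition: Matrix, failed: int, horizon: int) -> Vector:
    """P(failure is first reached exactly k steps ahead), for k = 1..horizon.

    Because the failed state is absorbing, the first-passage probability at
    step k is the increase in P(failed) between steps k-1 and k.
    """
    dist, b = [], belief
    for _ in range(horizon):
        nxt = propagate(b, transition)
        dist.append(nxt[failed] - b[failed])
        b = nxt
    return dist


# Hypothetical 4-state chain: 0 = as new, 1 = minor, 2 = major degradation, 3 = failed.
P = [
    [0.90, 0.10, 0.00, 0.00],
    [0.00, 0.85, 0.15, 0.00],
    [0.00, 0.00, 0.80, 0.20],
    [0.00, 0.00, 0.00, 1.00],
]
# Hypothetical diagnosis that reports "minor degradation": P(report | true state).
LIK_MINOR = [0.10, 0.80, 0.10, 0.00]

belief = [1.0, 0.0, 0.0, 0.0]
for _ in range(5):                   # five periods without a diagnosis
    belief = propagate(belief, P)
belief = update(belief, LIK_MINOR)   # a new diagnosis arrives
rul = rul_distribution(belief, P, failed=3, horizon=500)
mean_rul = sum(k * p for k, p in enumerate(rul, start=1))  # truncated at the horizon
print(f"expected RUL: {mean_rul:.1f} periods")
```

Each new diagnosis only changes the belief vector; the RUL distribution is then recomputed from it, which mirrors the "update the estimate each time a new diagnosis is available" loop.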


Author(s): Andrey Chukhray, Olena Havrylenko

The subject of research in the article is the process of intelligent computer training in engineering skills. The aim is to model the process of teaching engineering skills in intelligent computer training programs through dynamic Bayesian networks. Objectives: to propose an approach to modeling the process of teaching engineering skills; to assess the student's competence level by considering algorithm development skills in engineering tasks and the ability to implement algorithms; to create a dynamic Bayesian network structure for the learning process; to select values for the conditional probability tables; to solve the problems of filtering, forecasting, and retrospective analysis; and to simulate the developed dynamic Bayesian network in the Genie 2.0 environment. The methods used are probability theory and inference methods in Bayesian networks. The following results are obtained: a dynamic Bayesian network for the educational process based on the solution of engineering problems is presented. Mathematical calculations for probabilistic inference problems such as filtering, forecasting, and smoothing are considered. Solving the filtering problem makes it possible to assess the student's current competence level after obtaining the latest probabilities for developing the algorithm and for performing the task's numerical calculations. The probability distribution of the learning process model is predicted, and the number of additional iterations required to reach the required competence level is estimated. The retrospective analysis yields a smoothed assessment of the competence level, obtained after completing the previous instance of the task and computing new additional probabilities that characterize the implementation of the two checkpoints.
Solving the described probabilistic inference problems makes it possible to provide correct information about the learning process to intelligent computer training systems. This helps to give proper feedback and to track the student's competence level. The developed probabilistic inference kernel can serve as the basis of the decision-making model for an automated training process. The scientific novelty lies in applying dynamic Bayesian networks to a new class of problems: simulating engineering skills training while students perform algorithmic tasks.
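Filtering, forecasting, and smoothing (retrospective analysis) are the standard inference queries on a dynamic Bayesian network. A minimal sketch under strong simplifying assumptions follows: a single hidden competence variable with two levels, two checkpoints (algorithm development and numerical implementation) that are conditionally independent given competence, and hypothetical transition and pass probabilities in place of the authors' conditional probability tables. It is not the authors' Genie model.

```python
from typing import Optional

Vector = list[float]

# Hypothetical CPTs. State 0 = competence below the required level, 1 = at or above it.
TRANSITION = [[0.70, 0.30],   # P(next state | current state 0)
              [0.05, 0.95]]   # P(next state | current state 1)
# P(checkpoint passed | state) for (algorithm development, numerical implementation).
P_PASS = [(0.30, 0.40), (0.85, 0.90)]


def normalize(v: Vector) -> Vector:
    s = sum(v)
    return [x / s for x in v]


def likelihood(obs: tuple[bool, bool]) -> Vector:
    """P(checkpoint outcomes | state); checkpoints assumed independent given the state."""
    return [(pa if obs[0] else 1 - pa) * (pn if obs[1] else 1 - pn) for pa, pn in P_PASS]


def predict(b: Vector) -> Vector:
    """Forecast one task iteration ahead with no new evidence."""
    return [sum(b[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]


def filter_beliefs(prior: Vector, observations: list[tuple[bool, bool]]) -> list[Vector]:
    """Filtering: P(state_t | outcomes up to t) for every t."""
    beliefs, b = [], prior
    for t, obs in enumerate(observations):
        if t > 0:
            b = predict(b)
        b = normalize([x * l for x, l in zip(b, likelihood(obs))])
        beliefs.append(b)
    return beliefs


def smooth(prior: Vector, observations: list[tuple[bool, bool]]) -> list[Vector]:
    """Smoothing (forward-backward): P(state_t | all outcomes)."""
    filtered = filter_beliefs(prior, observations)
    smoothed = [filtered[-1]]
    beta = [1.0, 1.0]
    for t in range(len(observations) - 2, -1, -1):
        lik = likelihood(observations[t + 1])
        beta = normalize([sum(TRANSITION[i][j] * lik[j] * beta[j] for j in range(2)) for i in range(2)])
        smoothed.insert(0, normalize([f * b for f, b in zip(filtered[t], beta)]))
    return smoothed


def iterations_to_reach(b: Vector, target: float, max_steps: int = 100) -> Optional[int]:
    """Forecasted number of extra iterations until P(state 1) >= target, if ever."""
    for k in range(max_steps + 1):
        if b[1] >= target:
            return k
        b = predict(b)
    return None


outcomes = [(False, False), (True, False), (True, True)]
now = filter_beliefs([0.8, 0.2], outcomes)[-1]
print("filtered:", now)
print("smoothed:", smooth([0.8, 0.2], outcomes))
print("extra iterations to reach 0.8:", iterations_to_reach(now, 0.8))
```

With these hypothetical numbers the forecast converges to a stationary P(state 1) of 0.3/0.35, about 0.857, so targets above that are never reached without new evidence and the function returns `None`.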


2019
Author(s): Daniel Runcie, Hao Cheng

Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased, and can lead to sub-optimal choices between single- and multi-trait models, when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method, which we call CV2*, that validates model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
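The bias described above arises because, when the secondary trait is measured on the test individuals, a naive cross-validation that correlates predictions with focal-trait phenotypes also rewards residual (non-genetic) correlation between the traits. The small simulation below sketches that effect; it is not the authors' simulation. It assumes the observed secondary phenotype itself is the predictor, unit genetic and residual variances for both traits (so h² = 0.5), hypothetical genetic and residual correlations, and a naive estimate formed by dividing the predictor-phenotype correlation by √h², a common conversion from predictive ability to accuracy.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pair(rng, rho):
    """Two standard normal draws with correlation rho."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2


def simulate(n=20000, rg=0.5, rho_e=0.8, seed=1):
    """Return (true accuracy, naive CV estimate) when the focal trait is predicted
    from the secondary trait measured on the same (test) individuals."""
    rng = random.Random(seed)
    g1, y1, pred = [], [], []
    for _ in range(n):
        a1, a2 = correlated_pair(rng, rg)     # genetic values: focal, secondary
        e1, e2 = correlated_pair(rng, rho_e)  # residuals, correlated within an individual
        g1.append(a1)
        y1.append(a1 + e1)                    # focal phenotype
        pred.append(a2 + e2)                  # predictor: secondary phenotype
    true_acc = pearson(pred, g1)                   # correlation with the genetic value
    naive = pearson(pred, y1) / math.sqrt(0.5)     # correlation with phenotype / sqrt(h2)
    return true_acc, naive


true_acc, naive = simulate()
print(f"true accuracy {true_acc:.2f}, naive CV estimate {naive:.2f}")
```

Analytically, these settings give a true accuracy of about 0.35 but a naive estimate of about 0.92; setting `rho_e=0` makes the two agree, which isolates the residual correlation as the source of the bias.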


2020
Author(s): Rafael Massahiro Yassue, José Felipe Gonzaga Sabadin, Giovanni Galli, Filipe Couto Alves, Roberto Fritsche-Neto

Genomic prediction models are usually compared using validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of the training and validation sets strongly affects how models are compared and how subjective that comparison is. These procedures produce overlap across replicates, which, because of resampling, may inflate estimates, break the independence of residuals, and lead to less accurate results. Furthermore, ANOVA and post hoc tests are not recommended, because the assumption of independent residuals is not met. Thus, we propose a new way to sample observations into training and validation sets, based on a cross-validation alpha-based design (CV-α). CV-α is meant to create several validation scenarios (replicates × folds), regardless of the number of treatments. With CV-α, the number of genotypes that fell in the same fold across replicates was much lower than with K-fold, indicating greater independence of residuals. Therefore, as a proof of concept, we applied ANOVA to the CV-α results to compare the proposed methodology with RRS and K-fold, using four genomic prediction models on a simulated and a real dataset. In terms of predictive ability and bias, all validation methods performed similarly. However, in terms of mean squared error and coefficient of variation, the CV-α method performed best under the evaluated scenarios. Moreover, it adds neither cost nor complexity, is more reliable, and allows non-subjective methods to be used to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
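The property CV-α targets, genotypes rarely sharing a validation fold across replicates, can be measured directly. The sketch below counts how often each pair of genotypes lands in the same fold across replicates, comparing random K-fold splits with a square-lattice assignment, a simple resolvable incomplete-block construction related to alpha designs. The lattice stands in for, and is not, the authors' CV-α design; it assumes the number of genotypes is k² with k prime.

```python
import random
from collections import Counter
from itertools import combinations


def random_kfold(n, k, rng):
    """One replicate of random K-fold: shuffle ids and deal them into k folds."""
    ids = list(range(n))
    rng.shuffle(ids)
    return [ids[f::k] for f in range(k)]


def lattice_folds(k, replicate):
    """Folds for one replicate of a k x k square lattice (k prime, replicate <= k).

    Genotype g sits at row i = g // k, column j = g % k. Replicate 0 groups by
    row; replicate r >= 1 groups by (j - (r - 1) * i) mod k, so any two
    genotypes share a fold in at most one replicate.
    """
    folds = [[] for _ in range(k)]
    for g in range(k * k):
        i, j = divmod(g, k)
        f = i if replicate == 0 else (j - (replicate - 1) * i) % k
        folds[f].append(g)
    return folds


def max_co_occurrence(replicates):
    """Largest number of replicates in which any single pair of genotypes shares a fold."""
    counts = Counter()
    for folds in replicates:
        for fold in folds:
            counts.update(combinations(sorted(fold), 2))
    return max(counts.values())


k, n_reps = 7, 4
rng = random.Random(42)
rand_reps = [random_kfold(k * k, k, rng) for _ in range(n_reps)]
lat_reps = [lattice_folds(k, r) for r in range(n_reps)]
print("random K-fold:", max_co_occurrence(rand_reps))
print("lattice:", max_co_occurrence(lat_reps))
```

Random K-fold lets some pairs share a fold repeatedly, which is the replicate overlap the abstract blames for dependent residuals; the lattice caps every pair at one shared fold.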


2021, Vol 12
Author(s): Md. Abdullah Al Bari, Ping Zheng, Indalecio Viera, Hannah Worral, Stephen Szwiec, ...

Phenotypic evaluation and efficient utilization of germplasm collections can be time-intensive, laborious, and expensive. However, with the plummeting costs of next-generation sequencing and the addition of genomic selection to the plant breeder's toolbox, we can now tap the genetic diversity within large germplasm collections more efficiently. In this study, we evaluated the potential of genomic prediction for enhancing the selection of accessions from the USDA Pea Germplasm Collection, using a set of 482 pea (Pisum sativum L.) accessions genotyped with 30,600 single nucleotide polymorphism (SNP) markers and phenotyped for seed yield and yield-related components. Genomic prediction models and several factors affecting predictive ability were evaluated for complex traits in a series of cross-validation schemes. Different genomic prediction models gave similar results, with predictive ability across traits ranging from 0.23 to 0.60 and no model working best across all traits. Increasing the training population size improved the predictive ability of most traits, including seed yield. Predictive ability increased with the number of markers and then reached a plateau, presumably due to extensive linkage disequilibrium in the pea genome. Accounting for population structure effects did not significantly boost predictive ability, although we observed a slight improvement for seed yield. By applying a best-performing genomic prediction model (RR-BLUP), we then examined the distribution of genomic estimated breeding values (GEBV) for genotyped but nonphenotyped accessions, together with the reliability of those GEBV. The distribution of GEBV suggested that none of the nonphenotyped accessions were expected to perform outside the range of the phenotyped accessions. Desirable breeding values with higher reliability can be used to identify and screen favorable germplasm accessions.
Expanding the training set and incorporating additional orthogonal information (e.g., transcriptomics, metabolomics, or physiological traits) into the genomic prediction framework can enhance prediction accuracy.
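RR-BLUP, the model mentioned in this abstract, is ridge regression on marker dosages: on centred data the marker effects solve (Z'Z + λI)u = Z'y, and GEBVs are the centred dosages times u. The sketch below implements that in plain Python on simulated data with a fixed, hypothetical λ (in practice λ = σ²e/σ²u is usually derived from estimated variance components) and measures predictive ability as the correlation between GEBVs and phenotypes in a held-out set. It is an illustration, not the authors' pipeline.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rr_blup_fit(z, y, lam):
    """Centre markers and phenotypes, then solve (Z'Z + lam*I) u = Z'y for marker effects u."""
    n, p = len(z), len(z[0])
    zmean = [sum(row[j] for row in z) / n for j in range(p)]
    ymean = sum(y) / n
    zc = [[row[j] - zmean[j] for j in range(p)] for row in z]
    yc = [v - ymean for v in y]
    lhs = [[sum(r[j] * r[k] for r in zc) + (lam if j == k else 0.0) for k in range(p)] for j in range(p)]
    rhs = [sum(r[j] * v for r, v in zip(zc, yc)) for j in range(p)]
    return ymean, zmean, solve(lhs, rhs)


def gebv(model, z_new):
    """GEBV = training mean + centred marker dosages times estimated marker effects."""
    ymean, zmean, u = model
    return [ymean + sum((row[j] - zmean[j]) * u[j] for j in range(len(u))) for row in z_new]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Simulated data (hypothetical): 100 individuals, 20 markers coded 0/1/2.
rng = random.Random(7)
effects = [rng.gauss(0, 1) for _ in range(20)]
z = [[rng.choice((0, 1, 2)) for _ in range(20)] for _ in range(100)]
y = [sum(d * e for d, e in zip(row, effects)) + rng.gauss(0, 1.5) for row in z]

model = rr_blup_fit(z[:80], y[:80], lam=1.0)
pa = pearson(gebv(model, z[80:]), y[80:])
print(f"predictive ability on held-out lines: {pa:.2f}")
```

Real marker panels such as the 30,600 SNPs here have far more markers than lines, so practical implementations solve the equivalent n × n system (or use a GBLUP formulation) rather than this p × p one.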


Modelling, 2021, Vol 2 (2), pp. 240-258
Author(s): Nima Khakzad

The high complexity and growing interdependencies of chemical and process facilities have made them increasingly vulnerable to domino effects. Domino effects, particularly fire dominoes, are spatial-temporal phenomena in which not only the location of the involved units but also their temporal sequence in the accident chain matters. The spatial-temporal dependencies and uncertainties that prevail during domino effects, arising mainly from possible synergistic effects and the randomness of potential events, restrict the use of conventional risk assessment techniques such as fault trees and event trees. Bayesian networks, a type of probabilistic network for reasoning under uncertainty, have proven to be a reliable and robust technique for the modeling and risk assessment of domino effects. In the present study, applications of Bayesian networks to the modeling and safety assessment of domino effects in petroleum tank terminals are demonstrated through examples. The tutorial starts by illustrating the inefficacy of event tree analysis in domino effect modeling and then discusses the capabilities of Bayesian networks and their derivatives, such as dynamic Bayesian networks and influence diagrams. It also discusses how the noisy-OR model can be used to significantly reduce the complexity and the number of conditional probabilities required to build the model.
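Noisy-OR replaces a full conditional probability table, which for a unit with n possible sources of escalation needs 2^n rows, with n per-parent probabilities plus an optional leak term. The sketch below shows the calculation for a target tank exposed to three neighboring tanks, with hypothetical escalation probabilities. Note that the plain noisy-OR treats the sources as acting independently, so on its own it does not represent the synergistic effects discussed in the abstract.

```python
from itertools import product
from typing import Sequence


def noisy_or(active: Sequence[bool], link: Sequence[float], leak: float = 0.0) -> float:
    """P(target unit fails | parent states) under a leaky noisy-OR.

    Each active parent i independently causes failure with probability link[i];
    `leak` covers causes that are not modelled explicitly.
    """
    p_survive = 1.0 - leak
    for on, p in zip(active, link):
        if on:
            p_survive *= 1.0 - p
    return 1.0 - p_survive


def full_cpt(link: Sequence[float], leak: float = 0.0) -> dict:
    """Expand the noisy-OR into the 2**n-row table a general CPT would need."""
    return {
        states: noisy_or(states, link, leak)
        for states in product((False, True), repeat=len(link))
    }


# Hypothetical escalation probabilities from three burning neighbor tanks, plus a small leak.
links = [0.3, 0.5, 0.2]
cpt = full_cpt(links, leak=0.01)
print(len(cpt), "CPT rows generated from", len(links) + 1, "parameters")
print("P(fail | tanks 1 and 2 burning) =", round(cpt[(True, True, False)], 4))
```

The saving grows quickly: ten potential sources would need 1,024 CPT rows but only eleven noisy-OR parameters.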


2018
Author(s): Aditi Bhandari, Jérôme Bartholomé, Tuong-Vi Cao, Nilima Kumari, Julien Frouin, ...

Developing high-yielding rice varieties that are tolerant to drought stress is crucial for the sustainable livelihood of rice farmers in rainfed rice cropping ecosystems. Genomic selection (GS) promises to be an effective breeding option for these complex traits. We evaluated the effectiveness of two relatively new options in the implementation of GS: trait- and environment-specific marker selection, and the use of multi-environment prediction models. A reference population of 280 rainfed lowland accessions, genotyped with 215k SNP markers, was phenotyped in one favorable and two managed drought environments. Trait-specific SNP subsets (28k) were selected for each trait in each environment using the results of GWAS performed with the complete genotype dataset. Single-environment and multi-environment genomic prediction models were compared using kernel regression-based methods (GBLUP and RKHS) under two cross-validation scenarios: with (CV2) or without (CV1) phenotypic data for the validation set in one of the environments. With the most realistic trait-specific marker selection strategy, the predictive ability (PA) of genomic prediction was up to 22% higher than with markers selected on the basis of neutral linkage disequilibrium (LD). Tolerance to drought stress was predicted up to 32% better by multi-environment models (especially RKHS-based models) under the CV2 strategy. Under the less favorable CV1 strategy, the multi-environment models achieved PA similar to that of single-environment predictions. We also showed that reasonable PA could be obtained with as few as 3,000 SNP markers, even in a population with low LD extent, provided that marker selection is based on pairwise LD. The implications of these findings for breeding for drought tolerance are discussed.
The most resource-sparing option would be accurate phenotyping of the reference population in a favorable environment and under managed drought, while the candidate population would be phenotyped in only one of those environments.
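Selecting a reduced marker set on the basis of pairwise LD can be done in several ways, and the abstract does not give the exact procedure. One common approach is greedy pruning: scan markers in map order and drop any marker whose r² with an already-kept nearby marker exceeds a threshold. The sketch below uses the squared correlation of allele dosages as the r² estimate; the window size and threshold are hypothetical defaults.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of allele dosages, a common genotype-based proxy for LD r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes: Sequence[Sequence[int]], window: int = 50, threshold: float = 0.8) -> list:
    """Greedy LD pruning over markers given in map order.

    `genotypes` holds one list of dosages (0/1/2 across individuals) per marker.
    A marker is kept only if its r^2 with every kept marker among the previous
    `window` markers is at most `threshold`; monomorphic markers are skipped.
    """
    kept = []
    for m, dosages in enumerate(genotypes):
        if len(set(dosages)) < 2:
            continue
        recent = [k for k in kept if m - k <= window]
        if all(r_squared(dosages, genotypes[k]) <= threshold for k in recent):
            kept.append(m)
    return kept


markers = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # duplicate of marker 0: pruned
    [2, 1, 0, 2, 1, 0],  # perfectly negatively correlated with marker 0: pruned
    [0, 0, 2, 2, 1, 1],  # weakly correlated with marker 0: kept
    [1, 1, 1, 1, 1, 1],  # monomorphic: skipped
]
print(ld_prune(markers))  # -> [0, 3]
```

Because r² is symmetric in the allele coding, markers in perfect negative LD are treated as redundant too, which is usually what a pruning step intends.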


2019
Author(s): Ainhoa Calleja-Rodriguez, Jin Pan, Tomas Funda, Zhi-Qiang Chen, John Baison, ...

Higher genetic gains can be achieved through genomic selection (GS) by shortening the time needed for progeny testing in tree breeding programs. Genotyping-by-sequencing (GBS), combined with two imputation methods, allowed us to perform the current genomic prediction study in Scots pine (Pinus sylvestris L.). A total of 694 individuals representing 183 full-sib families were genotyped and phenotyped for growth and wood quality traits, and 8,719 SNPs were used to compare different genomic prediction models. In addition, we evaluated how different ratios of training to validation sets, and different subsets of SNP markers, affected the predictive ability (PA) and the prediction accuracy of the estimated genomic breeding values. Genomic Best Linear Unbiased Prediction (GBLUP) and Bayesian Ridge Regression (BRR) combined with the expectation maximization (EM) imputation algorithm showed higher PAs and prediction accuracies than Bayesian LASSO (BL). A subset of approximately 4,000 markers was sufficient to provide the same PAs and accuracies as the full set of 8,719 markers. Furthermore, PAs were similar for pedigree- and genomic-based estimations, whereas accuracies and heritabilities were slightly higher for pedigree-based estimations. However, the prediction accuracies of the genomic models were sufficient to achieve a higher selection efficiency per year, 50–87% higher than that of traditional pedigree-based selection.
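GBLUP, compared here and in several other abstracts in this listing, replaces pedigree relationships with a genomic relationship matrix. A widely used form is VanRaden's first method, G = ZZ'/(2 Σ p_j(1 - p_j)), where Z is the 0/1/2 dosage matrix centred by twice each marker's allele frequency. The sketch below computes it in plain Python; estimating the allele frequencies from the sample itself is an assumption, and the abstract does not state which G matrix the authors used.

```python
from typing import Sequence


def vanraden_g(m: Sequence[Sequence[int]]) -> list:
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    m: individuals x markers, dosages coded 0/1/2 (count of the counted allele).
    p_j is estimated from the sample itself; Z = M - 2p centres each marker.
    """
    n, k = len(m), len(m[0])
    p = [sum(row[j] for row in m) / (2 * n) for j in range(k)]
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    z = [[row[j] - 2 * p[j] for j in range(k)] for row in m]
    return [[sum(a * b for a, b in zip(z[i], z[l])) / denom for l in range(n)] for i in range(n)]


# Hypothetical dosages for 3 trees x 4 SNPs.
M = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 2, 1, 0]]
G = vanraden_g(M)
for row in G:
    print([round(x, 3) for x in row])
```

In GBLUP this G takes the place of the pedigree-based numerator relationship matrix A in the mixed model, which is what makes the pedigree-versus-genomic comparison in the abstract possible.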

