Genomic prediction of green fraction dynamics in soybean using UAV observations.

Author(s):  
Yusuke Toda ◽  
Goshi Sasaki ◽  
Yoshihiro Omori ◽  
Yuji Yamasaki ◽  
Hirokazu Takahashi ◽  
...  

Abstract With the widespread use of high-throughput phenotyping systems, growth process data are expected to become more easily available. By applying genomic prediction to growth data, it will be possible to predict the growth of untested genotypes. Predicting the growth process will be useful for crop breeding, as variability in the growth process has a significant impact on the management of plant cultivation. However, the integration of growth modeling and genomic prediction has yet to be studied in depth. In this study, we implemented new prediction models to propose a novel growth prediction scheme. Phenotype data of 198 soybean germplasm genotypes were acquired for three years in experimental fields in Tottori, Japan. The longitudinal changes in the green fractions were measured using UAV remote sensing. Then, a dynamic model was fitted to the green fraction to extract the dynamic characteristics of the green fraction as five parameters. Using the estimated growth parameters, we developed models for genomic prediction of the growth process and tested whether the inclusion of the dynamic model contributed to better prediction of growth. Our proposed models consist of two steps: first, predicting the parameters of the dynamic model with genomic prediction, and then substituting the predicted values for the parameters of the dynamic model. Evaluation of the heritability of the growth parameters showed that the dynamic model effectively extracted genetic diversity in the growth characteristics of the green fraction. In addition, the proposed prediction model showed higher prediction accuracy than conventional genomic prediction models, especially when the prediction target is the future growth of the test population and the values observed in the first half of growth are given as training data. This indicates that our model successfully combined information from the early growth period with phenotypic data from the training population for prediction. This prediction method could be applied to selection at an early growth stage in crop breeding, and could reduce the cost and time of field trials.
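The two-step scheme described in this abstract can be sketched roughly as follows. The abstract does not give the form of the five-parameter dynamic model or the genomic prediction method, so everything here is an assumption for illustration: the curve (a logistic rise multiplied by a logistic decline), the linear marker-effect predictor, and all names and toy values are hypothetical.

```python
import math


def green_fraction(t, k, r1, t1, r2, t2):
    """Hypothetical five-parameter curve: logistic rise times logistic decline.

    k: plateau, r1/t1: rate and midpoint of canopy closure,
    r2/t2: rate and midpoint of senescence. Not the authors' model.
    """
    rise = 1.0 / (1.0 + math.exp(-r1 * (t - t1)))
    decline = 1.0 / (1.0 + math.exp(r2 * (t - t2)))
    return k * rise * decline


def predict_parameters(markers, intercepts, effects):
    """Step 1: predict each growth parameter from marker scores.

    A plain linear predictor stands in for whatever genomic
    prediction model is actually used.
    """
    return [
        mu + sum(m * b for m, b in zip(markers, beta))
        for mu, beta in zip(intercepts, effects)
    ]


def predict_growth(markers, intercepts, effects, days):
    """Step 2: substitute the predicted parameters into the dynamic model."""
    params = predict_parameters(markers, intercepts, effects)
    return [green_fraction(t, *params) for t in days]


# Toy inputs (made up): one intercept and one effect vector per parameter.
intercepts = [0.9, 0.15, 30.0, 0.10, 100.0]
effects = [
    [0.02, -0.01, 0.00],
    [0.01, 0.00, 0.00],
    [-2.0, 1.0, 0.5],
    [0.00, 0.01, 0.00],
    [3.0, 0.0, -1.0],
]
markers = [1, -1, 0]  # marker scores of an untested genotype

params = predict_parameters(markers, intercepts, effects)
curve = predict_growth(markers, intercepts, effects, days=range(0, 121, 20))
```

The point of the design is that the genomic model only has to predict a handful of curve parameters, while the dynamic model supplies the shape of the whole trajectory.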

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Osval Antonio Montesinos-López ◽  
Abelardo Montesinos-López ◽  
Paulino Pérez-Rodríguez ◽  
José Alberto Barrón-López ◽  
Johannes W. R. Martini ◽  
...  

Abstract Background Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. The DL methods are nonparametric models that provide the flexibility to adapt to complicated associations between data and output, including very complex patterns. Main body We review the applications of DL methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches as well as the current trends in DL applications. Conclusions The main requirement for using DL is high-quality and sufficiently large training data. Based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared to conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. Deep learning algorithms are able to integrate data from different sources, as is usually needed in GS-assisted breeding, and they show the ability to improve prediction accuracy for large plant breeding data sets. It is important to apply DL to large training-testing data sets.


Heredity ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Yoseph Beyene ◽  
Manje Gowda ◽  
Jose Crossa ◽  
Paulino Pérez-Rodríguez ◽  
...  

Abstract Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulates, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This leads to the question of whether all available data or a subset of it should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not contemplate the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic-BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, here we studied whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that when predicting grain yield the KBLUP outperformed the GBLUP, and that using SSI with additive relationships (GSSI) led to 5–17% increases in accuracy, relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.
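As a rough illustration of the Gaussian kernel that replaces the additive relationship matrix in kernel-based models, the sketch below computes K[i][j] = exp(-h * d2), where d2 is the squared Euclidean distance between two marker profiles averaged over markers. This scaling, the bandwidth h, and the toy genotypes are assumptions; the abstract does not state how the kernel was parameterized.

```python
import math


def gaussian_kernel(genotypes, bandwidth=1.0):
    """Gaussian kernel between marker profiles (rows of `genotypes`).

    d2 is the squared Euclidean distance divided by the number of
    markers, an assumed scaling chosen only to keep values comparable
    across marker panels of different size.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(genotypes[i], genotypes[j])) / p
            kernel[i][j] = math.exp(-bandwidth * d2)
    return kernel


# Toy marker scores coded 0/1/2 (made up for illustration).
genotypes = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],  # identical to the first line
    [2, 1, 0, 1],  # differs at two markers by two allele copies
    [0, 1, 1, 1],  # differs at one marker by one allele copy
]
K = gaussian_kernel(genotypes, bandwidth=1.0)
```

Unlike an additive relationship matrix, the kernel value decays nonlinearly with marker distance, so near-identical lines dominate the similarity structure.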


2021 ◽  
Vol 550 ◽  
pp. 152938
Author(s):  
Kun Jie Yang ◽  
Yue-Lin Liu ◽  
Hai-Hong Li ◽  
Peng Shao ◽  
Xu Zhang ◽  
...  

Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.
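The role of the regularization parameter can be illustrated with a small sketch. It assumes that the index weights for one prediction individual minimize 0.5 * b'(G + theta*I)b - g'b + lambda * sum(|b_j|), where G holds the genomic relationships among training individuals, g holds the relationships between the target and the training individuals, and theta is a residual-to-genetic variance ratio. This objective, the coordinate-descent solver, and all toy numbers are assumptions for illustration and are not taken from the abstract.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def sparse_index_weights(P, v, lam, n_iter=500):
    """Coordinate descent for 0.5*b'Pb - v'b + lam*sum(|b_j|).

    P must be symmetric positive definite. With lam = 0 the solution
    satisfies P b = v, i.e. the unpenalized (G-BLUP-like) weights.
    """
    n = len(v)
    beta = [0.0] * n
    for _ in range(n_iter):
        for j in range(n):
            partial = v[j] - sum(P[j][k] * beta[k] for k in range(n) if k != j)
            beta[j] = soft_threshold(partial, lam) / P[j][j]
    return beta


# Toy relationships among three training lines and one target line.
G = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
theta = 1.0  # assumed variance ratio
P = [[G[i][j] + (theta if i == j else 0.0) for j in range(3)] for i in range(3)]
v = [0.6, 0.3, 0.05]          # target-to-training relationships
y_train = [1.2, -0.4, -0.8]   # centered training phenotypes (made up)

beta_gblup = sparse_index_weights(P, v, lam=0.0)
beta_sparse = sparse_index_weights(P, v, lam=0.25)

# The prediction is a weighted sum of training phenotypes.
pred_gblup = sum(b * y for b, y in zip(beta_gblup, y_train))
pred_sparse = sum(b * y for b, y in zip(beta_sparse, y_train))
```

With the penalty switched on, only the training line most related to the target keeps a nonzero weight, which is the "support points" idea: each prediction individual ends up with its own training subset.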


Animals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 2050
Author(s):  
Beatriz Castro Dias Cuyabano ◽  
Gabriel Rovere ◽  
Dajeong Lim ◽  
Tae Hun Kim ◽  
Hak Kyo Lee ◽  
...  

It is widely known that the environment influences phenotypic expression and that its effects must be accounted for in genetic evaluation programs. The most used method to account for environmental effects is to add herd and contemporary group to the model. Although generally informative, the herd effect treats different farms as independent units. However, if two farms are located physically close to each other, they potentially share correlated environmental factors. We introduce a method to model herd effects that uses the physical distances between farms based on the Global Positioning System (GPS) coordinates as a proxy for the correlation matrix of these effects that aims to account for similarities and differences between farms due to environmental factors. A population of Hanwoo Korean cattle was used to evaluate the impact of modelling herd effects as correlated, in comparison to assuming the farms as completely independent units, on the variance components and genomic prediction. The main result was an increase in the reliabilities of the predicted genomic breeding values compared to reliabilities obtained with traditional models (across four traits evaluated, reliabilities of prediction presented increases that ranged from 0.05 ± 0.01 to 0.33 ± 0.03), suggesting that these models may overestimate heritabilities. Although little to no significant gain was obtained in phenotypic prediction, the increased reliability of the predicted genomic breeding values is of practical relevance for genetic evaluation programs.
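One way to turn GPS coordinates into a correlation matrix for herd effects is sketched below. It assumes great-circle (haversine) distances and an exponential decay exp(-d / rho); the abstract only says that physical distance is used as a proxy, so the decay function, the range parameter rho, and the farm coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def herd_correlation(farms, rho_km):
    """Correlation between herd effects, decaying with distance.

    exp(-d / rho_km) is an assumed decay function, not the one
    used in the study.
    """
    return [
        [math.exp(-haversine_km(a, b) / rho_km) for b in farms]
        for a in farms
    ]


# Hypothetical farm locations (lat, lon): two close together, one far away.
farms = [(37.5, 127.0), (37.6, 127.1), (35.1, 129.0)]
C = herd_correlation(farms, rho_km=100.0)
```

Treating farms as independent corresponds to replacing this matrix with the identity; the distance-based version lets neighbouring farms share information about their common environment.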


Author(s):  
Maria Y. Gonzalez ◽  
Yusheng Zhao ◽  
Yong Jiang ◽  
Nils Stein ◽  
Antje Habekuss ◽  
...  

Abstract Key message Genomic prediction with special weight of major genes is a valuable tool to populate bio-digital resource centers. Abstract Phenotypic information of crop genetic resources is a prerequisite for an informed selection that aims to broaden the genetic base of the elite breeding pools. We investigated the potential of genomic prediction based on historical screening data of plant responses against the Barley yellow mosaic viruses for populating the bio-digital resource center of barley. Our study includes dense marker data for 3838 accessions of winter barley, and historical screening data of 1751 accessions for Barley yellow mosaic virus (BaYMV) and of 1771 accessions for Barley mild mosaic virus (BaMMV). Linear mixed models were fitted by considering combinations for the effects of genotypes, years, and locations. The best linear unbiased estimations displayed a broad spectrum of plant responses against BaYMV and BaMMV. Prediction abilities, computed as correlations between predictions and observed phenotypes of accessions, were low for the marker-assisted selection approach amounting to 0.42. In contrast, prediction abilities of genomic best linear unbiased predictions were high, with values of 0.62 for BaYMV and 0.64 for BaMMV. Prediction abilities of genomic prediction were improved by up to ~5% using W-BLUP, in which more weight is given to markers with significant major effects found by association mapping. Our results outline the utility of historical screening data and W-BLUP model to predict the performance of the non-phenotyped individuals in genebank collections. The presented strategy can be considered as part of the different approaches used in genebank genomics to valorize genetic resources for their usage in disease resistance breeding and research.
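Prediction ability as defined in this abstract, the correlation between predictions and observed phenotypes, can be computed as in the short sketch below. It assumes the Pearson correlation is meant; the function name and the toy values are hypothetical and unrelated to the barley data.

```python
import math


def prediction_ability(predicted, observed):
    """Pearson correlation between predicted and observed values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Made-up predictions and observations for four accessions.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.1, 1.9, 3.2, 3.8]
ability = prediction_ability(predicted, observed)
```

In practice the correlation would be computed on accessions held out from model fitting, so that it reflects performance on non-phenotyped individuals.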


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform. Results We have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines, and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.


Author(s):  
Wael H. Awad ◽  
Bruce N. Janson

Three different modeling approaches were applied to explain truck accidents at interchanges in Washington State during a 27-month period. Three models were developed for each ramp type including linear regression, neural networks, and a hybrid system using fuzzy logic and neural networks. The study showed that linear regression was able to predict accident frequencies that fell within one standard deviation from the overall mean of the dependent variable. However, the coefficient of determination was very low in all cases. The other two artificial intelligence (AI) approaches showed a high level of performance in identifying different patterns of accidents in the training data and presented a better fit when compared to the regression model. However, the ability of these AI models to predict test data that were not included in the training process showed unsatisfactory results.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Cong Bai ◽  
Zhong-Ren Peng ◽  
Qing-Chang Lu ◽  
Jian Sun

Accurate and real-time travel time information for buses can help passengers better plan their trips and minimize waiting times. This paper proposes a dynamic travel time prediction model for buses on roads with multiple bus routes, based on support vector machines (SVMs) and a Kalman filtering-based algorithm. In the proposed model, the well-trained SVM model predicts the baseline bus travel times from the historical bus trip data; the Kalman filtering-based dynamic algorithm then adjusts the bus travel times using the latest bus operation information and the estimated baseline travel times. The performance of the proposed dynamic model is validated with real-world data on roads with multiple bus routes in Shenzhen, China. The results show that the proposed dynamic model is feasible and applicable for bus travel time prediction and has the best prediction accuracy among the five models proposed in the study on roads with multiple bus routes.
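The adjustment step can be illustrated with a generic scalar Kalman update. This is a minimal sketch under stated assumptions, not the paper's formulation: the SVM baseline is treated as a given number, the latest observed travel time acts as the measurement, and the variances and all values are made up.

```python
def kalman_adjust(baseline, baseline_var, observed, observed_var):
    """Blend a baseline prediction with the latest observation.

    Standard scalar Kalman update: the gain weights the observation
    by how uncertain the baseline is relative to the measurement.
    Returns the adjusted estimate and its variance.
    """
    gain = baseline_var / (baseline_var + observed_var)
    adjusted = baseline + gain * (observed - baseline)
    adjusted_var = (1.0 - gain) * baseline_var
    return adjusted, adjusted_var


# Hypothetical numbers: the baseline model says 300 s for a segment,
# while the bus that just passed took 360 s.
adjusted, adjusted_var = kalman_adjust(
    baseline=300.0,      # stand-in for the SVM baseline travel time (s)
    baseline_var=400.0,  # assumed uncertainty of the baseline
    observed=360.0,      # latest observed travel time (s)
    observed_var=100.0,  # assumed measurement noise
)
```

Because the baseline is assumed to be four times more uncertain than the observation, the adjusted estimate moves 80% of the way toward the latest travel time.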

