Understanding the Effectiveness of Genomic Prediction in Tetraploid Potato

2021 ◽  
Vol 12 ◽  
Author(s):  
Stefan Wilson ◽  
Chaozhi Zheng ◽  
Chris Maliepaard ◽  
Han A. Mulder ◽  
Richard G. F. Visser ◽  
...  

Use of genomic prediction (GP) in tetraploids is becoming more common, which makes this a good time to compare GP models for tetraploid potato. The GP models compared contrasted shrinkage with variable selection, parametric with non-parametric models, and different ways of accounting for non-additive genetic effects. As a complement to GP, association studies were carried out in an attempt to understand the differences in prediction accuracy. We compared our GP models on a data set consisting of 147 cultivars, representing worldwide diversity, with over 39 k GBS markers and measurements on four tuber traits collected in six trials at three locations during 2 years. GP accuracies ranged from 0.32 for tuber count to 0.77 for dry matter content. For all traits, differences between GP models that utilised shrinkage penalties and those that performed variable selection were negligible. This was surprising for dry matter, as only a few additive markers explained over 50% of phenotypic variation. Accuracy for tuber count increased from 0.35 to 0.41 when dominance was included in the model. This result is supported by a genome-wide association study (GWAS), which found that additive and dominance effects together accounted for 37% of phenotypic variation, while significant additive effects alone accounted for 14%. For tuber weight, the Reproducing Kernel Hilbert Space (RKHS) model gave a larger improvement in prediction accuracy than explicitly modelling epistatic effects. This indicates that the between-locus epistatic effects of tuber weight can be captured more effectively using the semi-parametric RKHS model. Our results show good opportunities for GP in 4x potato.

Author(s):  
Hans-Jürgen Auinger ◽  
Christina Lehermeier ◽  
Daniel Gianola ◽  
Manfred Mayer ◽  
Albrecht E. Melchinger ◽  
...  

Key message: Model training on data from all selection cycles yielded the highest prediction accuracy by attenuating specific effects of individual cycles. Expected reliability was a robust predictor of accuracies obtained with different calibration sets. Abstract: The transition from phenotypic to genome-based selection requires a profound understanding of factors that determine genomic prediction accuracy. We analysed experimental data from a commercial maize breeding programme to investigate if genomic measures can assist in identifying optimal calibration sets for model training. The data set consisted of six contiguous selection cycles comprising testcrosses of 5968 doubled haploid lines genotyped with a minimum of 12,000 SNP markers. We evaluated genomic prediction accuracies in two independent prediction sets in combination with calibration sets differing in sample size and genomic measures (effective sample size, average maximum kinship, expected reliability, number of common polymorphic SNPs and linkage phase similarity). Our results indicate that across selection cycles prediction accuracies were as high as 0.57 for grain dry matter yield and 0.76 for grain dry matter content. Including data from all selection cycles in model training yielded the best results because interactions between calibration and prediction sets as well as the effects of different testers and specific years were attenuated. Among genomic measures, the expected reliability of genomic breeding values was the best predictor of empirical accuracies obtained with different calibration sets. For grain yield, a large difference between expected and empirical reliability was observed in one prediction set. We propose to use this difference as guidance for determining the weight phenotypic data of a given selection cycle should receive in model retraining and for selection when both genomic breeding values and phenotypes are available.


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as the special case λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant gains in prediction accuracy (between 5 and 10%) relative to the G-BLUP.
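The role of λ can be illustrated with a small sketch. It assumes (the abstract does not spell out this form) that the index weights b for one prediction individual minimize 0.5*b'Ab - c'b + λ*sum(|b|), where A is the training-set genomic relationship matrix with a residual-to-genetic variance ratio added to the diagonal and c holds the relationships between the prediction individual and the training individuals. The toy matrix, vector, and function names are hypothetical. Under these assumptions, λ = 0 gives weights that solve Ab = c (BLUP-type weights that use every training individual), while a larger λ sets some weights exactly to zero, leaving a smaller support set.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used in L1-penalized coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def sparse_index_weights(A, c, lam, n_iter=500):
    """Minimize 0.5*b'Ab - c'b + lam*sum(|b|) by cyclic coordinate descent.

    A is a symmetric positive definite matrix (list of lists), c is a list.
    This is an illustrative solver, not the authors' implementation.
    """
    n = len(c)
    b = [0.0] * n
    for _ in range(n_iter):
        for j in range(n):
            # Partial residual for coordinate j, excluding its own term.
            r = c[j] - sum(A[j][k] * b[k] for k in range(n) if k != j)
            b[j] = soft_threshold(r, lam) / A[j][j]
    return b


# Hypothetical toy inputs: three training individuals, one prediction individual.
A = [[1.5, 0.3, 0.1],
     [0.3, 1.5, 0.2],
     [0.1, 0.2, 1.5]]
c = [0.6, 0.1, 0.05]

dense = sparse_index_weights(A, c, lam=0.0)    # every training individual contributes
sparse = sparse_index_weights(A, c, lam=0.08)  # weakly related individuals drop out
print(dense)
print(sparse)
```

In this toy case only the most closely related training individual keeps a non-zero weight at λ = 0.08; in practice λ would be tuned, for example by cross-validation.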


2012 ◽  
Vol 137 (2) ◽  
pp. 71-79 ◽  
Author(s):  
A. Maaike Wubs ◽  
Yun T. Ma ◽  
Ep Heuvelink ◽  
Lia Hemerik ◽  
Leo F.M. Marcelis

Quantifying fruit growth can be desirable for several purposes (e.g., prediction of fruit yield and size, or for the use in crop simulation models). The goal of this article was to determine the best sigmoid function to describe fruit growth of pepper (Capsicum annuum) from nondestructive fruit growth measurements. The Richards, Gompertz, logistic, and beta growth functions were tested. Fruit growth of sweet pepper was measured nondestructively in an experiment with three different average daily temperatures (18, 21, and 24 °C) and in an experiment with six cultivars with different fruit sizes (20 to 205 g fresh weight). Measurements of fruit length and fruit diameter or circumference were performed twice per week. From these, fruit volume was estimated. A linear relationship related fruit fresh weight to estimated fruit volume, and a Ricker or polynomial function related fruit dry matter content to fruit age. These relations were used to convert estimated fruit volume into fruit fresh and dry weights. As dry weight increased until harvest, fitting the sigmoid function to the dry weight data was less suitable: it would create uncertainty in the estimated asymptote. Therefore, the sigmoid functions were fitted to fresh weight growth of the fruit. The Richards function was the best function in each data set, closely followed by the Gompertz function. The fruit dry weight growth is obtained by multiplication of the sigmoid function and the function relating fruit dry matter content to fruit age.
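As a rough illustration of the functions compared above, the sketch below writes the logistic, Gompertz, and Richards curves in one common parameterization; the article may use a different one, and the beta growth function is omitted. The parameter names (w_max, k, t_m, nu), the example values, and the linear dry matter content function are assumptions made for illustration only. The last helper shows the final step described in the abstract: dry weight as the product of the fresh weight curve and a function relating dry matter content to fruit age.

```python
import math


def logistic(t, w_max, k, t_m):
    """Symmetric sigmoid: half of w_max is reached at t = t_m."""
    return w_max / (1.0 + math.exp(-k * (t - t_m)))


def gompertz(t, w_max, k, t_m):
    """Asymmetric sigmoid with its inflection point at w_max / e."""
    return w_max * math.exp(-math.exp(-k * (t - t_m)))


def richards(t, w_max, k, t_m, nu):
    """Generalized logistic: nu = 1 gives the logistic curve,
    and nu close to 0 approaches the Gompertz curve."""
    return w_max / (1.0 + nu * math.exp(-k * (t - t_m))) ** (1.0 / nu)


def dry_weight(t, fresh_weight_fn, dmc_fn):
    """Dry weight = fresh weight at age t times dry matter content at age t."""
    return fresh_weight_fn(t) * dmc_fn(t)


# Hypothetical parameter values: 100 g asymptote, inflection near day 30.
def fresh(t):
    return logistic(t, w_max=100.0, k=0.15, t_m=30.0)


# Hypothetical linear (first-order polynomial) dry matter content, as a fraction.
def dmc(t):
    return 0.10 - 0.0005 * t


for day in (10, 30, 50):
    print(day, round(fresh(day), 2), round(dry_weight(day, fresh, dmc), 3))
```

Because the Richards curve contains the other two as special or limiting cases, it is at least as flexible as either, at the cost of one extra shape parameter.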


2014 ◽  
Author(s):  
Frank Technow ◽  
L. Radu Totir

Estimation set size is an important determinant of genomic prediction accuracy. Plant breeding programs are characterized by a high degree of structuring, particularly into populations. This hampers establishment of large estimation sets for each population. Pooling populations increases estimation set size but ignores unique genetic characteristics of each. A possible solution is partial pooling with multilevel models, which allows estimating population-specific marker effects while still leveraging information across populations. We developed a Bayesian multilevel whole-genome regression model and compared its performance to that of the popular BayesA model applied to each population separately (no pooling) and to the joined data set (complete pooling). As an example, we analyzed a wide array of traits from the nested association mapping maize population. There we show that for small population sizes (e.g., < 50), partial pooling increased prediction accuracy over no or complete pooling for populations represented in the estimation set. No pooling was superior, however, when populations were large. In another example data set of interconnected biparental maize populations, either partial or complete pooling was superior, depending on the trait. A simulation showed that no pooling is superior when differences in genetic effects among populations are large and partial pooling when they are intermediate. With small differences, partial and complete pooling achieved equally high accuracy. For prediction of new populations, partial and complete pooling had very similar accuracy in all cases. We conclude that partial pooling with multilevel models can maximize the potential of pooling by making optimal use of information in pooled estimation sets.
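The idea of partial pooling can be shown with a much simpler model than the Bayesian whole-genome regression described above. The sketch below uses a textbook normal-normal multilevel model for a single population-specific effect; it is not the authors' model, and the function name, variance values, and sample sizes are hypothetical. The estimate for a population is a weighted average of its own mean (no pooling) and the overall mean (complete pooling), with the weight set by the population size and by how much populations differ.

```python
def partial_pooling_mean(group_mean, n, overall_mean, sigma2, tau2):
    """Posterior mean of a group effect in a normal-normal multilevel model.

    sigma2: residual variance within a population (assumed known here).
    tau2:   variance of true effects among populations (assumed known here).
    """
    weight_own = (n / sigma2) / (n / sigma2 + 1.0 / tau2)
    return weight_own * group_mean + (1.0 - weight_own) * overall_mean


# Hypothetical numbers: own mean 2.0, overall mean 0.0.
small_pop = partial_pooling_mean(2.0, n=10, overall_mean=0.0, sigma2=1.0, tau2=0.1)
large_pop = partial_pooling_mean(2.0, n=1000, overall_mean=0.0, sigma2=1.0, tau2=0.1)
print(small_pop)  # pulled strongly toward the overall mean
print(large_pop)  # stays close to the population's own mean
```

This mirrors the pattern reported in the abstract: small populations borrow more information from the pooled data, large differences among populations (large tau2) push the estimate toward no pooling, and small differences (small tau2) push it toward complete pooling.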


2022 ◽  
Author(s):  
Jian Cheng ◽  
Francesco Tiezzi ◽  
Jeremy Howard ◽  
Christian Maltecca ◽  
Jicai Jiang

Abstract Background: Genomic selection has been implemented in livestock genetic evaluations for years. However, currently most genomic selection models only consider the additive effects associated with SNP markers and nonadditive genetic effects have been for the most part ignored. Methods: Production traits for 26,735 to 27,647 Duroc pigs and reproductive traits for 5,338 sows were used, including off-test body weight (WT), off-test back fat (BF), off-test loin muscle depth (MS), number born alive (NBA), number born dead (NBD), and number weaned (NW). All animals were genotyped with the PorcineSNP60K Bead Chip. Variance components were estimated using a linear mixed model that includes inbreeding coefficient, additive, dominance, additive-by-additive, additive-by-dominance, dominance-by-dominance effect, and common litter environmental effect. Genomic prediction performance, including all nonadditive genetic effects, was compared with a reduced model that included only additive genetic effect. Results: Significant estimates of additive-by-additive effect variance were observed for NBA, BF, and WT (31%, 9%, and 10%, respectively). Production traits showed significant, large estimates of additive-by-dominance variance (9%-23%). MS also showed a large estimate of dominance-by-dominance variance (10%). Dominance effect variance estimates were low for all traits (0%-2%). Compared to the reduced model, prediction accuracies using the full model, including nonadditive effects, increased significantly by 12%, 12%, and 1% for NBA, WT, and MS, respectively. A strong dominance association signal with BF was identified near AK5. Conclusions: Sizable estimates of epistatic effects were found for the reproduction and production traits, while the dominance effect was relatively small for all traits yet significant for all production traits. Including nonadditive effects, especially epistatic effects in the genomic prediction model, significantly improved prediction accuracy for NBA, WT, and MS.


2020 ◽  
Author(s):  
Yihuan Huang ◽  
Amanda Kay Montoya

Machine learning methods are being increasingly adopted in psychological research. Lasso performs variable selection and regularization, and is particularly appealing to psychology researchers because of its connection to linear regression. Researchers conflate properties of linear regression with properties of lasso; however, we demonstrate that this is not the case for models with categorical predictors. Specifically, the coding strategy used for categorical predictors impacts lasso’s performance but not linear regression. Group lasso is an alternative to lasso for models with categorical predictors. We demonstrate the inconsistency of lasso and group lasso models using a real data set: lasso performs different variable selection and has different prediction accuracy depending on the coding strategy, and group lasso performs consistent variable selection but has different prediction accuracy. Additionally, group lasso may include many predictors when very few are needed, leading to overfitting. Using Monte Carlo simulation, we show that categorical variables with one group mean differing from all others (one dominant group) are more likely to be included in the model by group lasso than lasso, leading to overfitting. This effect is strongest when the mean difference is large and there are many categories. Researchers primarily focus on the similarity between linear regression and lasso, but pay little attention to their different properties. This project demonstrates that when using lasso and group lasso, the effect of coding strategies should be considered. We conclude with recommended solutions to this issue and future directions of exploration to improve implementation of machine learning approaches in psychological science.
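A small worked example may help show why coding matters for lasso but not for ordinary linear regression. The sketch assumes dummy (treatment) coding with an unpenalized intercept, and the group means are made up. The same three fitted group means can be expressed with either reference group, so least squares gives identical fitted values, but the L1 norm of the coefficients, which is what lasso penalizes, differs between the two codings.

```python
def dummy_coefficients(group_means, reference):
    """Coefficients of the non-reference groups under dummy coding.

    The intercept equals the reference group's mean and is assumed
    to be left out of the penalty.
    """
    ref_mean = group_means[reference]
    return {g: m - ref_mean for g, m in group_means.items() if g != reference}


def l1_penalty(coefs):
    """Sum of absolute coefficient values, the quantity lasso penalizes."""
    return sum(abs(v) for v in coefs.values())


# Hypothetical group means with one dominant group (C).
group_means = {"A": 0.0, "B": 0.0, "C": 5.0}

pen_ref_a = l1_penalty(dummy_coefficients(group_means, "A"))  # |0| + |5|
pen_ref_c = l1_penalty(dummy_coefficients(group_means, "C"))  # |-5| + |-5|
print(pen_ref_a, pen_ref_c)
```

With A as the reference a single non-zero coefficient reproduces the group means, whereas with C as the reference two are needed, so at the same penalty strength lasso may shrink and select differently under the two codings.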


Author(s):  
Amarullah

Evaluating cassava varieties for productivity is necessary to assess cassava stem characteristics for their suitability as planting material and to improve the capability of cassava farmers to select good-quality cassava varieties. Cassava varieties are generally distinguished from each other by their morphological characteristics, which include leaf, stem and tuber colour. The cassava planting material used in this study, taken from five cassava varieties, consisted of mature stem cuttings about 20 cm in length, containing between 10 and 12 nodes, and planted in a vertical position along the top of the ridges. Each plot consisted of 30 plants, with data being taken from ten plants within each plot. The variety Malang-6 had the highest average yield, 13.81 t ha-1, followed by Singgah and Adira-4 with 11.98 t ha-1 and 11.11 t ha-1, respectively, in contrast to the variety Ketan, which yielded only 6.63 t ha-1. The harvest indices of Adira-4, Malang-6, UJ-5 and Singgah were 0.78, 0.77, 0.77 and 0.76, respectively, significantly higher than the harvest index of Ketan, 0.58. The Malang and Malang-6 varieties, which produced the highest tuber weights, had low dry matter contents of 5.65% and 5.62%. The Ketan and UJ-5 varieties had significantly lower tuber weights but higher dry matter contents, 8.69% and 8.68%. The UJ-5 variety had a tuber starch HCN value of 230.17, significantly higher than that of the other varieties. Int. J. Agril. Res. Innov. Tech. 10(1): 108-116, June 2020


2009 ◽  
Vol 57 (2) ◽  
pp. 119-125
Author(s):  
G. Hadi

The dry matter and moisture contents of the aboveground vegetative organs and kernels of four maize hybrids were studied in Martonvásár at five harvest dates, with four replications per hybrid. The dry matter yield per hectare of the kernels and other plant organs were investigated in order to obtain data on the optimum date of harvest for the purposes of biogas and silage production. It was found that the dry mass of the aboveground vegetative organs, both individually and in total, did not increase after silking. During the last third of the ripening period, however, a significant reduction in the dry matter content was sometimes observed as a function of the length of the vegetation period. The data suggest that, with the exception of extreme weather conditions or an extremely long vegetation period, the maximum dry matter yield could be expected to range from 22–42%, depending on the vegetation period of the variety. The harvest date should be chosen to give a kernel moisture content of above 35% for biogas production and below 35% for silage production. In this phenophase most varieties mature when the stalks are still green, so it is unlikely that transport costs can be reduced by waiting for the vegetative mass to dry.


2018 ◽  
Vol 13 (1) ◽  
pp. 23
Author(s):  
Rosileyde Golçalves Siqueira Cardoso ◽  
Adriene Woods Pedrosa ◽  
Mateus Cupertino Rodrigues ◽  
Ricardo Henrique Silva Santos ◽  
Paulo Roberto Cecon ◽  
...  

Knowledge of the rate of decomposition and nitrogen mineralization of green manures allows their use to be synchronized with the stage of highest nitrogen absorption by the coffee tree. The rate of decomposition and nitrogen mineralization varies with the species of green manure and with environmental factors. The aim of the present study was to evaluate the decomposition and nitrogen mineralization of two green manures intercropped with coffee trees for three different periods. The experiment was divided into two designs for statistical analysis, one referring to the characterization of plant material (fresh mass, dry matter, dry matter content, nitrogen concentration and accumulation in the jack bean (Canavalia ensiformis) and hyacinth bean (Dolichos lablab)) and another to evaluate the rate of decomposition and N mineralization of these species. The decomposition rate decreased in both species as their growth time in the field increased. The decomposition was influenced by the phenology of the green manures. Nitrogen mineralization of the jack bean decreased as the growth period in the field increased and was faster than that of the hyacinth bean only when cut at 60 days. The N mineralization was slower than mass decomposition in both species.

