Genomic Bayesian Prediction Model for Count Data with Genotype × Environment Interaction

2015 ◽  
Author(s):  
Abelardo Montesinos-Lopez ◽  
Osval Montesinos-Lopez ◽  
Jose Crossa ◽  
Juan Burgueno ◽  
Kent Eskridge ◽  
...  

Genomic tools allow the study of the whole genome and are facilitating the study of genotype-environment combinations and their relationship with the phenotype. However, most genomic prediction models developed so far are appropriate for Gaussian phenotypes. For this reason, appropriate genomic prediction models are needed for count data, since the conventional regression models used on count data with a large sample size (n) and a small number of parameters (p) cannot be used for genomic-enabled prediction where the number of parameters (p) is larger than the sample size (n). Here we propose a Bayesian mixed negative binomial (BMNB) genomic regression model for counts that takes into account genotype by environment (G × E) interaction. We also provide all the full conditional distributions to implement a Gibbs sampler. We evaluated the proposed model using a simulated data set and a real wheat data set from the International Maize and Wheat Improvement Center (CIMMYT) and collaborators. Results indicate that our BMNB model is a viable alternative for analyzing count data.
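To illustrate the kind of data such a model targets, the sketch below simulates overdispersed counts from a negative binomial distribution with a log link and a linear predictor made of environment, genotype, and G × E terms. It is not the authors' BMNB model or their Gibbs sampler. The effect sizes, the dispersion value `r`, and all function names are assumptions made for this example.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_negbin(mu, r, rng):
    """Negative binomial draw with mean mu and dispersion r, via the gamma-Poisson mixture."""
    lam = rng.gammavariate(r, mu / r)  # shape r, scale mu / r, so E[lam] = mu
    return sample_poisson(lam, rng)

def simulate_counts(env_eff, gen_eff, ge_eff, r, rng):
    """Return {(env, genotype): count} with log(mu) = env + genotype + G x E."""
    counts = {}
    for e, a in enumerate(env_eff):
        for g, b in enumerate(gen_eff):
            mu = math.exp(a + b + ge_eff[e][g])
            counts[(e, g)] = sample_negbin(mu, r, rng)
    return counts

rng = random.Random(1)
env_eff = [0.5, 1.0]                            # hypothetical environment effects
gen_eff = [-0.3, 0.0, 0.4]                      # hypothetical genotype effects
ge_eff = [[0.2, 0.0, -0.2], [-0.2, 0.0, 0.2]]   # hypothetical G x E deviations
counts = simulate_counts(env_eff, gen_eff, ge_eff, r=2.0, rng=rng)

# Overdispersion check: Var(y) = mu + mu^2 / r exceeds the Poisson variance mu.
draws = [sample_negbin(5.0, 2.0, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
```

With mean 5 and `r = 2`, the theoretical variance is 5 + 25/2 = 17.5, so the sample variance should sit well above the sample mean. That gap is the overdispersion a Poisson model cannot represent.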

Author(s):  
Osval Antonio Montesinos López ◽  
Abelardo Montesinos López ◽  
Jose Crossa

In this chapter, we explain, under a Bayesian framework, the fundamentals and practical issues for implementing genomic prediction models for categorical and count traits. First, we derive the Bayesian ordinal model and exemplify it with plant breeding data. These examples were implemented in the library BGLR. We also derive the ordinal logistic regression. The fundamentals and practical issues of penalized multinomial logistic regression and penalized Poisson regression are given including several examples illustrating the use of the glmnet library. All the examples include main effects of environments and genotypes as well as the genotype × environment interaction term.


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) arises as the special case λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.
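A minimal sketch of how an L1 penalty can make a selection index sparse is given below. The abstract does not state the objective, so the form used here is an assumption: the weights `w` for one prediction individual minimise 0.5·w'Aw − b'w + λ‖w‖₁. Here `A` stands for the training genomic relationship matrix plus a ridge term, and `b` for the relationships between that individual and the training lines. The solver is plain cyclic coordinate descent, and the matrix and vector values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def sparse_index_weights(A, b, lam, n_sweeps=200):
    """Minimise 0.5*w'Aw - b'w + lam*||w||_1 by cyclic coordinate descent."""
    n = len(b)
    w = [0.0] * n
    for _ in range(n_sweeps):
        for j in range(n):
            # Residual covariance for coordinate j with the other weights held fixed.
            partial = b[j] - sum(A[j][k] * w[k] for k in range(n) if k != j)
            w[j] = soft_threshold(partial, lam) / A[j][j]
    return w

# Hypothetical 3-line training set: A = G_training + ridge * I, b = G_(target, training).
A = [[2.0, 0.5, 0.2],
     [0.5, 2.0, 0.3],
     [0.2, 0.3, 2.0]]
b = [1.0, 0.4, 0.1]

w_dense = sparse_index_weights(A, b, lam=0.0)   # no penalty: solves A w = b
w_sparse = sparse_index_weights(A, b, lam=0.3)  # some weights shrink to exactly 0
w_empty = sparse_index_weights(A, b, lam=2.0)   # penalty >= max|b|: every weight is 0
```

With `lam=0.0` the weights solve the unpenalised system `A w = b`, which matches the abstract's statement that the G-BLUP is the special case λ = 0. Larger penalties leave fewer training lines with non-zero weight, and those lines are the support points for that individual.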


2020 ◽  
Vol 10 (8) ◽  
pp. 2629-2639
Author(s):  
Edna K. Mageto ◽  
Jose Crossa ◽  
Paulino Pérez-Rodríguez ◽  
Thanda Dhliwayo ◽  
Natalia Palacios-Rojas ◽  
...  

Zinc (Zn) deficiency is a major risk factor for human health, affecting about 30% of the world’s population. To study the potential of genomic selection (GS) for maize with increased Zn concentration, an association panel and two doubled haploid (DH) populations were evaluated in three environments. Three genomic prediction models (M1: Environment + Line; M2: Environment + Line + Genomic; M3: Environment + Line + Genomic + Genomic × Environment), incorporating main effects (lines and genomic) and the genomic × environment (G × E) interaction, were assessed to estimate the prediction ability (rMP) of each model. Two distinct cross-validation (CV) schemes simulating two genomic prediction breeding scenarios were used. CV1 predicts the performance of newly developed lines, whereas CV2 predicts the performance of lines tested in sparse multi-location trials. Predictions for Zn in CV1 ranged from -0.01 to 0.56 for DH1, 0.04 to 0.50 for DH2 and -0.001 to 0.47 for the association panel. For CV2, rMP values ranged from 0.67 to 0.71 for DH1, 0.40 to 0.56 for DH2 and 0.64 to 0.72 for the association panel. The genomic prediction model that included G × E had the highest average rMP for both CV1 (0.39 and 0.44) and CV2 (0.71 and 0.51) for the association panel and DH2 population, respectively. These results suggest that GS has potential to accelerate breeding for enhanced kernel Zn concentration by facilitating selection of superior genotypes.
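The difference between the two cross-validation schemes can be sketched as a difference in how records are assigned to folds. The abstract does not give the number of folds or the exact partitioning, so the five-fold setup, the function names, and the line and environment labels below are assumptions for illustration only.

```python
import random

def cv1_folds(lines, envs, n_folds, rng):
    """CV1: every record of a line shares one fold, so test lines are never seen in training."""
    shuffled = lines[:]
    rng.shuffle(shuffled)
    line_fold = {line: i % n_folds for i, line in enumerate(shuffled)}
    return {(line, env): line_fold[line] for line in lines for env in envs}

def cv2_folds(lines, envs, n_folds, rng):
    """CV2: a line's records go to distinct folds, so a held-out record
    still has the same line observed in the other environments."""
    if len(envs) > n_folds:
        raise ValueError("need at least as many folds as environments")
    folds = {}
    for line in lines:
        chosen = rng.sample(range(n_folds), len(envs))  # distinct folds for this line
        for env, fold in zip(envs, chosen):
            folds[(line, env)] = fold
    return folds

rng = random.Random(42)
lines = [f"line{i}" for i in range(10)]   # hypothetical line labels
envs = ["env1", "env2", "env3"]           # three environments, as in the abstract
cv1 = cv1_folds(lines, envs, n_folds=5, rng=rng)
cv2 = cv2_folds(lines, envs, n_folds=5, rng=rng)
```

When fold `f` is held out under `cv1`, the model has no phenotype for the test lines in any environment. Under `cv2`, each held-out record belongs to a line that is still phenotyped elsewhere. This is consistent with the higher CV2 prediction abilities reported above.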


2006 ◽  
Vol 9 (1) ◽  
pp. 1-8 ◽  
Author(s):  
Lindon J. Eaves

Recent studies have claimed to detect interaction between candidate genes and specific environmental factors (Genotype × Environment interaction, G × E) in susceptibility to psychiatric disorder. The objective of the present study was to examine possible artifacts that could explain widely publicized findings. The additive effects of candidate genes and measured environment on liability to disorder were simulated under a model that allowed for mixture of distributions in liability conditional on genotype and environment. Simulated liabilities were dichotomized at a threshold value to reflect diagnosis of disorder. Multiple blocks of simulated data were analyzed by standard statistical methods to test for the main effects and interactions of genes and environment on outcome. The main outcome of this study was simulated liabilities and diagnoses of major depression and antisocial behavior. Analysis of the dichotomized data by logistic regression frequently detected significant G × E interaction even though none was present for liability. There is therefore reason to question the biological significance of published findings.
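The scale dependence behind this artifact can be shown without any sampling. The sketch below is a simplified stand-in for the study's simulation: it uses a single standard normal liability rather than the mixture of distributions described in the abstract, and the effect sizes and threshold are hypothetical. Liability is strictly additive in genotype and environment, yet the interaction contrast computed on the logit scale is not zero.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def risk(g, e, b_g=1.0, b_e=1.0, threshold=2.0):
    """P(liability > threshold) when liability = b_g*g + b_e*e + N(0, 1) noise."""
    return 1.0 - STD_NORMAL.cdf(threshold - b_g * g - b_e * e)

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def interaction_contrast(transform):
    """Difference of differences across the 2 x 2 genotype-by-environment table."""
    t = {(g, e): transform(risk(g, e)) for g in (0, 1) for e in (0, 1)}
    return t[1, 1] - t[1, 0] - t[0, 1] + t[0, 0]

logit_gxe = interaction_contrast(logit)                # about -0.42: apparent G x E
probit_gxe = interaction_contrast(STD_NORMAL.inv_cdf)  # essentially 0: additive liability
```

The contrast vanishes on the probit scale, which matches the generating model, and is clearly non-zero on the logit scale. A logistic regression fitted to the dichotomized outcome can therefore report an interaction that reflects the choice of scale and not the biology.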


2016 ◽  
Vol 6 (5) ◽  
pp. 1165-1177 ◽  
Author(s):  
Abelardo Montesinos-López ◽  
Osval A. Montesinos-López ◽  
José Crossa ◽  
Juan Burgueño ◽  
Kent M. Eskridge ◽  
...  

2016 ◽  
Vol 27 (4) ◽  
pp. 1187-1201 ◽  
Author(s):  
Marzieh Mahmoodi ◽  
Abbas Moghimbeigi ◽  
Kazem Mohammad ◽  
Javad Faradmal

This study proposes semiparametric models for analysis of hierarchical count data containing excess zeros and overdispersion simultaneously. The methods discussed in this paper handle nonlinear covariate effects through flexible semiparametric multilevel regression techniques. This is done through a comprehensive comparison of semiparametric multilevel zero-inflated negative binomial and semiparametric multilevel zero-inflated generalized Poisson models on real and simulated data. An EM algorithm based on Newton–Raphson equations for maximum penalized likelihood estimation is developed. The performance of the proposed models is assessed by using a Monte Carlo simulation study. We also illustrate the methods by analyzing decayed, missing, and filled teeth of children aged 5–14 years.


Crop Science ◽  
2017 ◽  
Vol 57 (4) ◽  
pp. 1865-1880 ◽  
Author(s):  
Sivakumar Sukumaran ◽  
Jose Crossa ◽  
Diego Jarquín ◽  
Matthew Reynolds

2020 ◽  
Vol 10 (8) ◽  
pp. 2725-2739 ◽  
Author(s):  
Diego Jarquin ◽  
Reka Howard ◽  
Jose Crossa ◽  
Yoseph Beyene ◽  
Manje Gowda ◽  
...  

“Sparse testing” refers to reduced multi-environment breeding trials in which not all genotypes of interest are grown in each environment. Using genomic-enabled prediction and a model embracing genotype × environment interaction (GE), the non-observed genotype-in-environment combinations can be predicted. Consequently, the overall costs can be reduced and the testing capacities can be increased. The accuracy of predicting the unobserved data depends on different factors including (1) how many genotypes overlap between environments, (2) in how many environments each genotype is grown, and (3) which prediction method is used. In this research, we studied the predictive ability obtained when using a fixed number of plots and different sparse testing designs. The considered designs included the extreme cases of (1) no overlap of genotypes between environments, and (2) complete overlap of the genotypes between environments. In the latter case, the prediction set fully consists of genotypes that have not been tested at all. Moreover, we gradually go from one extreme to the other considering (3) intermediates between the two previous cases with varying numbers of different or non-overlapping (NO)/overlapping (O) genotypes. The empirical study is built upon two different maize hybrid data sets consisting of different genotypes crossed to two different testers (T1 and T2) and each data set was analyzed separately. For each set, phenotypic records on yield from three different environments are available. Three different prediction models were implemented, two main effects models (M1 and M2), and a model (M3) including GE. The results showed that the genome-based model including GE (M3) captured more phenotypic variation than the models that did not include this component. Also, M3 provided higher prediction accuracy than models M1 and M2 for the different allocation scenarios. Reducing the size of the calibration sets decreased the prediction accuracy under all allocation designs, with M3 being the least affected model; however, using the genome-enabled models (i.e., M2 and M3), the predictive ability is recovered when more genotypes are tested across environments. Our results indicate that a substantial part of the testing resources can be saved when using genome-based models including GE for optimizing sparse testing designs.
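The allocation designs described above can be sketched as one function that moves between the two extremes by varying the number of overlapping genotypes while keeping the number of plots per environment fixed. The function name, the counts, and the rule of taking genotypes in list order are assumptions for illustration. The abstract does not describe how genotypes were actually assigned.

```python
def allocate(genotypes, n_envs, plots_per_env, n_overlap):
    """Assign genotypes to environments: the first n_overlap go to every
    environment, and the remaining plots are filled without reusing a genotype."""
    if not 0 <= n_overlap <= plots_per_env:
        raise ValueError("n_overlap must lie between 0 and plots_per_env")
    n_unique = plots_per_env - n_overlap
    needed = n_overlap + n_envs * n_unique
    if needed > len(genotypes):
        raise ValueError("not enough genotypes for this design")
    shared = genotypes[:n_overlap]
    design = []
    for env in range(n_envs):
        start = n_overlap + env * n_unique
        design.append(shared + genotypes[start:start + n_unique])
    return design

genotypes = list(range(30))  # hypothetical genotype identifiers
no_overlap = allocate(genotypes, n_envs=3, plots_per_env=10, n_overlap=0)
intermediate = allocate(genotypes, n_envs=3, plots_per_env=10, n_overlap=4)
full_overlap = allocate(genotypes, n_envs=3, plots_per_env=10, n_overlap=10)

def n_tested(design):
    """Number of distinct genotypes observed in at least one environment."""
    return len({g for env in design for g in env})
```

All three designs use the same 30 plots. With no overlap, all 30 genotypes are observed once each. With complete overlap, only 10 genotypes are observed, each in every environment, and the other 20 must be predicted without any phenotypic record. That is the trade-off between the number of genotypes tested and the replication across environments that the study evaluates.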

