Bayesian and Classical Prediction Models for Categorical and Count Data

Multivariate Statistical Machine Learning Methods for Genomic Prediction ◽

10.1007/978-3-030-89010-0_7 ◽

2022 ◽

pp. 209-249

Author(s):

Osval Antonio Montesinos López ◽

Abelardo Montesinos López ◽

Jose Crossa

Keyword(s):

Logistic Regression ◽

Count Data ◽

Poisson Regression ◽

Prediction Models ◽

Multinomial Logistic Regression ◽

Bayesian Framework ◽

Environment Interaction ◽

Genotype Environment Interaction ◽

Interaction Term ◽

Main Effects

Abstract: In this chapter, we explain, under a Bayesian framework, the fundamentals and practical issues of implementing genomic prediction models for categorical and count traits. First, we derive the Bayesian ordinal model and exemplify it with plant breeding data. These examples were implemented in the library BGLR. We also derive the ordinal logistic regression. The fundamentals and practical issues of penalized multinomial logistic regression and penalized Poisson regression are given, including several examples illustrating the use of the glmnet library. All the examples include main effects of environments and genotypes as well as the genotype × environment interaction term.


Genomic Bayesian Prediction Model for Count Data with Genotype × Environment Interaction

10.1101/034967 ◽

2015 ◽

Author(s):

Abelardo Montesinos-Lopez ◽

Osval Montesinos-Lopez ◽

Jose Crossa ◽

Juan Burgueno ◽

Kent Eskridge ◽

...

Keyword(s):

Sample Size ◽

Count Data ◽

Genomic Prediction ◽

Negative Binomial ◽

Prediction Models ◽

Simulated Data ◽

Large Sample Size ◽

Environment Interaction ◽

Data Set ◽

Genotype Environment Interaction

Genomic tools allow the study of the whole genome and are facilitating the study of genotype-environment combinations and their relationship with the phenotype. However, most genomic prediction models developed so far are appropriate for Gaussian phenotypes. For this reason, appropriate genomic prediction models are needed for count data, since the conventional regression models used on count data with a large sample size (n) and a small number of parameters (p) cannot be used for genomic-enabled prediction where the number of parameters (p) is larger than the sample size (n). Here we propose a Bayesian mixed negative binomial (BMNB) genomic regression model for counts that takes into account genotype by environment (G × E) interaction. We also provide all the full conditional distributions to implement a Gibbs sampler. We evaluated the proposed model using a simulated data set and a real wheat data set from the International Maize and Wheat Improvement Center (CIMMYT) and collaborators. Results indicate that our BMNB model is a viable alternative for analyzing count data.
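
As a rough illustration of why a negative binomial likelihood suits count traits, the sketch below evaluates its probability mass function and checks numerically that the variance exceeds the mean. The parameterization by a mean `mu` and a size `r`, and the values 4.0 and 2.0, are assumptions chosen for illustration; the abstract does not state the parameterization used in the BMNB model, and nothing here reproduces its Gibbs sampler.

```python
import math


def neg_binomial_pmf(y, mu, r):
    """P(Y = y) for a negative binomial with mean mu and size (dispersion) r."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def moments(mu, r, y_max=2000):
    """Total probability, mean and variance, summed over counts 0..y_max."""
    probs = [neg_binomial_pmf(y, mu, r) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


# Illustrative values only: mean count 4, size 2.
total, mean, var = moments(mu=4.0, r=2.0)
print(total, mean, var)
```

Under this parameterization the variance is mu + mu^2 / r, so it is always larger than the Poisson variance mu, which is what lets the model absorb overdispersed counts.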


A comprehensive, contemporary assessment of the association between hepatosteatosis and coronary artery calcium scoring

European Heart Journal - Cardiovascular Imaging ◽

10.1093/ehjci/jeaa356.234 ◽

2021 ◽

Vol 22 (Supplement_1) ◽

Author(s):

T Heseltine ◽

SW Murray ◽

RL Jones ◽

M Fisher ◽

B Ruzsics

Keyword(s):

Risk Factors ◽

Logistic Regression ◽

Coronary Artery ◽

Regression Model ◽

Coronary Artery Calcium ◽

Logistic Regression Model ◽

Prediction Models ◽

Multinomial Logistic Regression ◽

Cvd Risk ◽

Male Sex

Abstract

Funding Acknowledgements: Type of funding sources: None.

On behalf of: Liverpool Multiparametric Imaging Collaboration

Background: Coronary artery calcium (CAC) score is a well-established technique for stratifying an individual’s cardiovascular disease (CVD) risk. Several well-established registries have incorporated CAC scoring into CVD risk prediction models to enhance accuracy. Hepatosteatosis (HS) has been shown to be an independent predictor of CVD events and can be measured on non-contrast computed tomography (CT). We sought to undertake a contemporary, comprehensive assessment of the influence of HS on CAC score alongside traditional CVD risk factors. In patients with HS it may be beneficial to offer routine CAC screening to evaluate CVD risk to enhance opportunities for earlier primary prevention strategies.

Methods: We performed a retrospective, observational analysis at a high-volume cardiac CT centre analysing consecutive CT coronary angiography (CTCA) studies. All patients referred for investigation of chest pain over a 28-month period (June 2014 to November 2016) were included. Patients with established CVD were excluded. The cardiac findings were reported by a cardiologist and retrospectively analysed by two independent radiologists for the presence of HS. Those with CAC of zero and those with CAC greater than zero were compared for demographic and cardiac risks. A multivariate analysis comparing the risk factors was performed to adjust for the presence of established risk factors. A binomial logistic regression model was developed to assess the association between the presence of HS and increasing strata of CAC.

Results: In total there were 1499 patients referred for CTCA without prior evidence of CVD. The assessment of HS was completed in 1195 (79.7%) and CAC score was performed in 1103 (92.3%). There were 466 with CVD and 637 without CVD. The prevalence of HS was significantly higher in those with CVD versus those without CVD on CTCA (51.3% versus 39.9%, p = 0.007). Male sex (50.7% versus 36.1%, p < 0.001), age (59.4 ± 13.7 versus 48.1 ± 13.6, p < 0.001) and diabetes (12.4% versus 6.9%, p = 0.04) were also significantly higher in the CAC group compared to the group with a CAC score of zero. HS was associated with increasing strata of CAC score compared with CAC of zero (CAC score 1-100: OR 1.47, p = 0.01; CAC score 101-400: OR 1.68, p = 0.02; CAC score >400: OR 1.42, p = 0.14). This association became non-significant in the highest stratum of CAC score.

Conclusion: We found a significant association of increasing age, male sex, diabetes and HS with the presence of CAC. HS was also associated with a more severe phenotype of CVD based on the multinomial logistic regression model. Although the association reduced for the highest stratum of CAC (CAC score >400), this likely reflects the overall low numbers of patients within this group and is likely a type II error. Based on these findings it may be appropriate to offer routine CVD risk stratification techniques in all those diagnosed with HS.


Evaluation of sugarcane genotypes with respect to sucrose yield across three crop cycles using GGE biplot analysis

Experimental Agriculture ◽

10.1017/s0014479721000144 ◽

2021 ◽

pp. 1-13

Author(s):

Aliya Momotaz ◽

Per H. McCord ◽

R. Wayne Davidson ◽

Duli Zhao ◽

Miguel Baltazar ◽

...

Keyword(s):

Block Design ◽

Genotype By Environment Interaction ◽

Environment Interaction ◽

Gge Biplot ◽

Genotype By Environment ◽

Biplot Analysis ◽

Genotype Environment Interaction ◽

Area 5 ◽

Randomized Complete Block Design ◽

Main Effects

Summary: The experiment was carried out in three crop cycles (plant cane, first ratoon, and second ratoon) at five locations on Florida muck soils (histosols) to evaluate the genotypes and test locations and to identify superior and stable sugarcane genotypes. There were 13 sugarcane genotypes along with three commercial cultivars as checks included in this study. Five locations were considered as environments to analyze genotype-by-environment interaction (GEI) in 13 genotypes in three crop cycles. The sugarcane genotypes were planted in a randomized complete block design with six replications at each location. Performance was measured by the traits of sucrose yield in tons per hectare (SY) and commercial recoverable sugar (CRS) in kilograms of sugar per ton of cane. The data were subjected to genotype main effects and genotype × environment interaction (GGE) analyses. The results showed significant effects for genotype (G), locations (E), and G × E (genotype × environment interaction) with respect to both traits. The GGE biplot analysis showed that the sugarcane genotype CP 12-1417 was high yielding and stable in terms of sucrose yield. The most discriminating and non-representative location was Knight Farm (KN) for both SY and CRS. For sucrose yield only, the most discriminating and non-representative locations were Knight Farm (KN), Duda and Sons, Inc. USSC, Area 5 (A5), and Okeelanta (OK).


GENOTYPE-ENVIRONMENT INTERACTION OF YIELD IN CEREAL CROPS IN NORTHWESTERN CANADA

Canadian Journal of Plant Science ◽

10.4141/cjps81-038 ◽

1981 ◽

Vol 61 (2) ◽

pp. 255-263 ◽

Cited By ~ 3

Author(s):

R. M. De PAUW ◽

D. G. FARIS ◽

C. J. WILLIAMS

Keyword(s):

Triticum Aestivum L ◽

Regression Coefficients ◽

Cereal Crops ◽

Environment Interaction ◽

Ge Interaction ◽

Hordeum Vulgare L ◽

Genotype Environment Interaction ◽

Early Maturing ◽

Frost Free Period ◽

Main Effects

Three cultivars of each crop, wheat (Triticum aestivum L.), oats (Avena sativa L.), and barley (Hordeum vulgare L.), were grown for 4 yr at five locations north of the 55th parallel in northwestern Canada. There were highly significant differences among all main effects and interactions. Galt barley produced the highest seed yield followed by Centennial barley, Random oats and Harmon oats. Victory oats, Olli barley, Neepawa wheat and Pitic 62 wheat yielded similarly to each other while Thatcher wheat was significantly lower yielding. Mean environment yields ranged from 2080 to 5610 kg/ha. The genotype-environment (GE) interaction of species and cultivars was sufficiently complicated that it could not be characterized by one or two statistics (e.g., stability variances or regression coefficients). However, variability in frost-free period among years and locations contributed to the GE interaction because, for example, some cultivars yielded well (e.g., Pitic 62) only in those year-location environments with a relatively long frost-free period while other early maturing cultivars (e.g., Olli) performed well even in a short frost-free period environment.


Genotype × Environment Interaction in Psychopathology: Fact or Artifact?

Twin Research and Human Genetics ◽

10.1375/twin.9.1.1 ◽

2006 ◽

Vol 9 (1) ◽

pp. 1-8 ◽

Cited By ~ 96

Author(s):

Lindon J. Eaves

Keyword(s):

Candidate Genes ◽

Biological Significance ◽

Simulated Data ◽

Threshold Value ◽

Environment Interaction ◽

Additive Effects ◽

Mixture Of Distributions ◽

Genotype Environment Interaction ◽

Genes And Environment ◽

Main Effects

Abstract: Recent studies have claimed to detect interaction between candidate genes and specific environmental factors (Genotype × Environment interaction, G × E) in susceptibility to psychiatric disorder. The objective of the present study was to examine possible artifacts that could explain widely publicized findings. The additive effects of candidate genes and measured environment on liability to disorder were simulated under a model that allowed for mixture of distributions in liability conditional on genotype and environment. Simulated liabilities were dichotomized at a threshold value to reflect diagnosis of disorder. Multiple blocks of simulated data were analyzed by standard statistical methods to test for the main effects and interactions of genes and environment on outcome. The main outcome of this study was simulated liabilities and diagnoses of major depression and antisocial behavior. Analysis of the dichotomized data by logistic regression frequently detected significant G × E interaction even though none was present for liability. There is therefore reason to question the biological significance of published findings.
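
The scale dependence behind this kind of artifact can be sketched with a much simpler setup than the one in the abstract. The example below assumes a binary genotype and a binary environment, a purely additive liability with standard normal noise, and a fixed diagnostic threshold; the effect sizes and threshold are hypothetical, and the mixture of distributions used in the study is not modelled. It computes the exact prevalence in each cell and then the interaction contrast (difference of differences) on the probit and logit scales.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal noise on the liability scale


def prevalence(g, e, a=1.0, b=1.0, threshold=2.0):
    """P(liability > threshold) when liability = a*g + b*e + N(0, 1) noise."""
    return 1.0 - STD_NORMAL.cdf(threshold - a * g - b * e)


def interaction_contrast(link):
    """Difference of differences of link(prevalence) over the 2 x 2 G x E table."""
    cell = {(g, e): link(prevalence(g, e)) for g in (0, 1) for e in (0, 1)}
    return cell[1, 1] - cell[1, 0] - cell[0, 1] + cell[0, 0]


def logit(p):
    """Log-odds transform used by logistic regression."""
    return math.log(p / (1.0 - p))


# Hypothetical additive model: no G x E term exists on the liability scale.
probit_contrast = interaction_contrast(STD_NORMAL.inv_cdf)
logit_contrast = interaction_contrast(logit)
print(probit_contrast, logit_contrast)
```

In this sketch the contrast vanishes (up to rounding) on the probit scale, which matches the additive liability, but is clearly non-zero on the logit scale, so a logistic model fitted to the dichotomized outcome would need an interaction term even though none was simulated.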


Genomic Bayesian Prediction Model for Count Data with Genotype × Environment Interaction

G3 Genes|Genome|Genetics ◽

10.1534/g3.116.028118 ◽

2016 ◽

Vol 6 (5) ◽

pp. 1165-1177 ◽

Cited By ~ 6

Author(s):

Abelardo Montesinos-López ◽

Osval A. Montesinos-López ◽

José Crossa ◽

Juan Burgueño ◽

Kent M. Eskridge ◽

...

Keyword(s):

Prediction Model ◽

Count Data ◽

Bayesian Prediction ◽

Environment Interaction ◽

Genotype Environment Interaction


GENOTYPE × ENVIRONMENT INTERACTION IN CACTUS PEAR (OPUNTIA SPP.), ADDITIVE MAIN EFFECTS AND MULTIPLICATIVE INTERACTION ANALYSIS OF FRUIT YIELD

Acta Horticulturae ◽

10.17660/actahortic.2006.728.12 ◽

2006 ◽

pp. 97-104 ◽

Cited By ~ 4

Author(s):

J. Potgieter ◽

M. Smith

Keyword(s):

Interaction Analysis ◽

Fruit Yield ◽

Environment Interaction ◽

Cactus Pear ◽

Multiplicative Interaction ◽

Genotype Environment Interaction ◽

Main Effects ◽

Opuntia Spp


Identification of pigeonpea genotypes with wider adaptability to rainfed environments through AMMI and GGE biplot analyses

Indian Journal of Genetics and Plant Breeding (The) ◽

10.31742/ijgpb.81.1.7 ◽

2021 ◽

Vol 81 (01) ◽

pp. 63-73

Author(s):

M. V. Nagesh Kumar ◽

V. Ramya ◽

C. V. Sameer Kumar ◽

T. Raju ◽

N. M. Sunil Kumar ◽

...

Keyword(s):

Cajanus Cajan ◽

Large Scale ◽

Interaction Model ◽

Environment Interaction ◽

Gge Biplot ◽

Rainfed Agriculture ◽

Genotype By Environment ◽

Genotype Environment Interaction ◽

Wide Range ◽

Main Effects

Pigeonpea [Cajanus cajan (L.) Millspaugh] is an important pulse crop grown under Indian rainfed agriculture. Twenty-eight pigeonpea genotypes were tested for stability and adaptability across ten rainfed locations in the States of Telangana and Karnataka, India, using the AMMI (additive main effects and multiplicative interaction) model and the GGE (genotype and genotype by environment) biplot method. The grain yields were significantly affected by environment (56.8%) followed by genotype × environment interaction (27.6%) and genotype (18.6%) variances. Two mega environments were identified with several winning genotypes, viz. ICPH 2740 (G15), TS 3R (G10), PRG 176 (G8) and ICPL 96058 (G22). E2 (Gulbarga, Karnataka), E3 (Bidar, Karnataka) and E6 (Vikarabad, Telangana) were the most discriminating environments. Genotypes ICPH 2740, PRG 176 and TS 3R were the best cultivars in all the environments, whereas PRG 158 (G9), ICPL 87119 (G12), ICPL 20098 (G19) and ICPL 96058 (G22) were suitable across a wide range of environments. Genotypes ICPH 2740 and PRG 176 can be recommended on a large scale to farmers with small holdings to enhance pigeonpea productivity and improve food security.
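
The "additive main effects" step of an AMMI analysis can be sketched as follows. The function removes the grand mean and the genotype and environment main effects from a table of cell means, leaving the interaction residuals that the multiplicative part of AMMI then decomposes (typically by a singular value decomposition, which is not shown here). The yield values are hypothetical and are not taken from the study.

```python
def interaction_residuals(table):
    """Remove grand mean, genotype and environment main effects from a G x E table."""
    n_g, n_e = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_g * n_e)
    g_mean = [sum(row) / n_e for row in table]
    e_mean = [sum(row[j] for row in table) / n_g for j in range(n_e)]
    return [
        [table[i][j] - g_mean[i] - e_mean[j] + grand for j in range(n_e)]
        for i in range(n_g)
    ]


# Hypothetical mean yields: 3 genotypes (rows) x 4 environments (columns).
yields = [
    [2.1, 2.9, 3.4, 1.8],
    [2.5, 2.7, 3.9, 2.2],
    [1.9, 3.3, 3.1, 1.6],
]
residuals = interaction_residuals(yields)
for row in residuals:
    print([round(v, 3) for v in row])
```

Every row and every column of the residual table sums to zero, which is a quick check that only the interaction component remains.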


Genotype × Environment Interaction for Resistance to Spider Mites in Fragaria

Journal of the American Society for Horticultural Science ◽

10.21273/jashs.124.4.353 ◽

1999 ◽

Vol 124 (4) ◽

pp. 353-357 ◽

Cited By ~ 3

Author(s):

José López Medina ◽

Patrick P. Moore ◽

Carl H. Shanks ◽

Fernando Flores Gil ◽

Craig K. Chandler

Keyword(s):

Spider Mite ◽

The United States ◽

Spider Mites ◽

Environment Interaction ◽

Interaction Patterns ◽

First Order ◽

Genotype Environment Interaction ◽

Main Effect ◽

Ammi Analysis ◽

Main Effects

Genotype × environment interaction for resistance to the twospotted spider mite (Tetranychus urticae Koch) of eleven clones of Fragaria L. sp. (strawberries) grown in six environments throughout the United States was examined using two multivariate analysis techniques, principal coordinate analysis (PCA) and additive main effect and multiplicative interaction (AMMI). Both techniques provided useful and interesting ways of investigating genotype × environment interaction. PCA analysis indicated that clones X-11 and E-15 were stable across both low and high environments for the number of spider mites per leaflet. The initial AMMI analysis showed that the main effects of genotype, environment, and their first-order interaction were highly significant, with genotype × environment interaction due mainly to cultivar 'Totem' and environment FL94. A second AMMI analysis, which excluded 'Totem' and FL94, showed that the main effects of the remaining genotypes, environments, and genotype × environment interaction were also highly significant. AMMI biplot analysis revealed that FL93 and GH93 were unstable environments, but with opposite interaction patterns; and GCL-8 and WSU2198 were unstable genotypes with similar interactions that were opposite those of WSU 2202.


Bayesian Genomic Linear Regression

Multivariate Statistical Machine Learning Methods for Genomic Prediction ◽

10.1007/978-3-030-89010-0_6 ◽

2022 ◽

pp. 171-208

Author(s):

Osval Antonio Montesinos López ◽

Abelardo Montesinos López ◽

Jose Crossa

Keyword(s):

Linear Regression ◽

Bayesian Methods ◽

Environment Interaction ◽

Continuous Response ◽

Interaction Terms ◽

Bayesian Paradigm ◽

Genotype Environment Interaction ◽

Environment Variables ◽

Other Information ◽

Main Effects

Abstract: The Bayesian paradigm for parameter estimation is introduced and linked to the main problem of genomic-enabled prediction to predict the trait of interest of the non-phenotyped individuals from genotypic information, environment variables, or other information (covariates). In this situation, a convenient practice is to include the individuals to be predicted in the posterior distribution to be sampled. We explained how the Bayesian Ridge regression method is derived and exemplified with data from plant breeding genomic selection. Other Bayesian methods (Bayes A, Bayes B, Bayes C, and Bayesian Lasso) were also described and exemplified for genome-based prediction. The chapter presented several examples that were implemented in the Bayesian generalized linear regression (BGLR) library for continuous response variables. The predictor under all these Bayesian methods includes main effects (of environments and genotypes) as well as interaction terms related to genotype × environment interaction.
