Bayesian estimation of a surface to account for a spatial trend using penalized splines in an individual-tree mixed model

2007 ◽  
Vol 37 (12) ◽  
pp. 2677-2688 ◽  
Author(s):  
Eduardo P. Cappa ◽  
Rodolfo J.C. Cantet

Unaccounted-for spatial variability leads to bias in estimating genetic parameters and predicting breeding values from forest genetic trials. Previous attempts to account for large-scale continuous spatial variation employed spatial coordinates in the direction of the rows (or columns). In this research, we use an individual-tree mixed model and the tensor product of B-spline bases with a proper covariance structure for the random knot effects to account for spatial variability. Dispersion parameters were estimated using Bayesian techniques via Gibbs sampling. The procedure is illustrated with data from a progeny trial of Eucalyptus globulus subsp. globulus Labill. Four different models were used in the sequel. The first model included block effects and the three other models included a surface on a grid of either 8 × 8, 12 × 12, or 18 × 18 knots. The three models with B-splines displayed a sizeably lower value of the deviance information criterion than the model with blocks. Also, the mixed models fitting a surface displayed a consistent reduction in the posterior mean of σ2e, an increase in the posterior means of σ2A and h2DBH, and an increase of 66% (for parents) or 60% (for offspring) in the accuracy of breeding values.
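
As a rough illustration of the tensor-product construction mentioned in the abstract above, the sketch below builds one row of a two-dimensional B-spline incidence matrix for a tree at a given row and column position. It is a minimal sketch, not the authors' implementation: the function names, the equally spaced knots, the cubic degree, and the trial dimensions are all assumptions made for the example, and the covariance structure for the random knot effects is not modelled.

```python
def equally_spaced_knots(lo, hi, n_segments, degree):
    """Knots covering [lo, hi], extended by `degree` extra knots on each side."""
    step = (hi - lo) / n_segments
    return [lo + step * k for k in range(-degree, n_segments + degree + 1)]


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x with the Cox-de Boor recursion."""
    n_intervals = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval that contains x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(n_intervals)]
    for d in range(1, degree + 1):
        raised = []
        for i in range(n_intervals - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            raised.append(left + right)
        basis = raised
    return basis


def tensor_product_row(row_basis, col_basis):
    """Kronecker product of the row-direction and column-direction bases."""
    return [br * bc for br in row_basis for bc in col_basis]


# Hypothetical trial of 20 rows by 30 columns, 4 segments per direction.
DEGREE = 3
row_knots = equally_spaced_knots(0.0, 20.0, 4, DEGREE)
col_knots = equally_spaced_knots(0.0, 30.0, 4, DEGREE)

# One tree planted at row 7, column 12 (made-up position).
row_basis = bspline_basis(7.0, row_knots, DEGREE)
col_basis = bspline_basis(12.0, col_knots, DEGREE)
surface_row = tensor_product_row(row_basis, col_basis)
```

Stacking one such row per tree would give the incidence matrix that relates each observation to the surface coefficients. Because each one-dimensional basis sums to one inside the trial area, every row of the tensor-product matrix also sums to one.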

2006 ◽  
Vol 36 (5) ◽  
pp. 1276-1285 ◽  
Author(s):  
Eduardo P Cappa ◽  
Rodolfo JC Cantet

In forest genetics, restricted maximum likelihood (REML) estimation of (co)variance components from normal multiple-trait individual-tree models is affected by the absence of observations in any trait and individual. Missing records affect the form of the distribution of REML estimates of genetic parameters, or of functions of them, and the estimating equations are computationally involved when several traits are analysed. An alternative to REML estimation is a fully Bayesian approach through Markov chain Monte Carlo. The present research describes the use of the full conjugate Gibbs algorithm proposed by Cantet et al. (R.J.C. Cantet, A.N. Birchmeier, and J.P. Steibel. 2004. Genet. Sel. Evol. 36: 49–64) to estimate (co)variance components in multiple-trait individual-tree models. This algorithm converges faster to the marginal posterior densities of the parameters than regular data augmentation from multivariate normal data with missing records. An expression to calculate the deviance information criterion for the selection of linear parameters in normal multiple-trait models is also given. The developments are illustrated by means of data from different crosses of two species of Pinus.


2008 ◽  
Vol 57 (1-6) ◽  
pp. 45-56 ◽  
Author(s):  
E. P. Cappa ◽  
R. J. C. Cantet

Abstract An individual tree model with additive direct and competition effects is introduced to account for competitive effects in forest genetics evaluation. The mixed linear model includes fixed effects as well as direct and competition breeding values plus permanent environmental effects. Competition effects, either additive or environmental, are identified in the phenotype of a competitor tree by means of ‘intensity of competition’ elements (IC), which are non-zero elements of the incidence matrix of the additive competition effects. The ICs are an inverse function of the distance and the number of competing individuals, either row-column wise or diagonally. The ICs allow standardization of the variance of competition effects in the phenotypic variance of any individual tree, so that the model accounts for unequal numbers of neighbors. Expressions are obtained for the bias in estimating additive variance using the covariance between half-sibs, when ignoring competition effects for row-plot designs and for single-tree plot designs. A data set of loblolly pines on growth at breast height is used to estimate the additive variances of direct and competition effects, the covariance between both effects, and the variance of permanent environmental effects using a Bayesian method via Gibbs sampling and Restricted Maximum Likelihood procedures (REML) via the Expectation-Maximization (EM) algorithm. No problem of convergence was detected with the model and ICs used when compared to what has been reported in the animal breeding literature for such models. Posterior means (standard error) of the estimated parameters were σ̂2Ad = 12.553 (1.447), σ̂2Ac = 1.259 (0.259), σ̂AdAc = -3.126 (0.492), σ̂2p = 1.186 (0.289), and σ̂2e = 5.819 (1.07). Leaving permanent environmental competition effects out of the model may bias the predictions of direct breeding values. Results suggest that selection for increasing direct growth while keeping a low level of competition is feasible.


2021 ◽  
Author(s):  
Runqing Yang ◽  
Jun Bao ◽  
Runqing Yang ◽  
Yuxin Song ◽  
Zhiyu Hao ◽  
...  

Abstract Generalized linear mixed models are computationally intensive and biased when mapping quantitative trait nucleotides for binary diseases. In genomic logit regression, we consider genomic breeding values estimated in advance as a known predictor, and then correct the deflated association test statistics by using genomic control, thereby successfully extending GRAMMAR-Lambda to analyze binary diseases in a complex structured population. Because there is no need to estimate genomic heritability and genomic breeding values can be estimated from a small number of sampled markers, the generalized mixed-model association analysis is greatly simplified for handling large-scale data. With almost perfect genomic control, joint analysis of the candidate quantitative trait nucleotides chosen by multiple testing offered a significant improvement in statistical power.
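
The genomic-control step named in the abstract above can be illustrated in its textbook form: the inflation (or deflation) factor lambda is the median of the observed 1-df chi-square statistics divided by the theoretical median, and each statistic is then divided by lambda. The sketch below assumes that textbook form only. It is not taken from the GRAMMAR-Lambda software, and the function name and the toy statistics are made up for the example.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# obtained as the square of the 75th percentile of a standard normal.
CHI2_1_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for 1-df chi-square test statistics."""
    lam = statistics.median(chi2_stats) / CHI2_1_MEDIAN
    return lam, [s / lam for s in chi2_stats]


# Toy, deliberately deflated statistics (hypothetical values).
deflated_stats = [0.1, 0.2, 0.3, 0.4, 0.5]
lam, corrected = genomic_control(deflated_stats)
```

With deflated statistics lambda falls below one, so dividing by it scales the statistics up until their median matches the theoretical value.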


2014 ◽  
Vol 96 ◽  
Author(s):  
JOAQUIM CASELLAS ◽  
DANIEL GIANOLA ◽  
JUAN F. MEDRANO

Summary The continuous uploading of polygenic additive mutational variability has been reported in several studies in laboratory species with an inbred genetic background. These studies have focused on the direct contribution of new mutations without considering the possibility of epistatic effects derived from the interaction of new mutations with pre-existing polymorphisms. In this work we focused on this topic and analysed the statistical and biological relevance of the epistatic variance for 9-week body weight in two populations of inbred mice. We developed a new linear mixed model parameterization where founder-related additive genetic variability, additive mutational variability and the interaction terms between both sources of variation were accounted for under a Bayesian design and without requiring the inversion of a matrix of epistatic genetic covariances. The analyses focused on a six-generation data set from C57BL/6J mice (n = 3736) and a five-generation data set from C57BL/6Jhg/hg mice (n = 2843). The deviance information criterion (DIC) clearly favoured the model accounting for epistatic variability with reductions larger than 50 DIC units in both populations. Modal estimates for founder-related, mutational and epistatic heritabilities were 0·068, 0·011 and 0·095 in C57BL/6J and 0·060, 0·010 and 0·113 in C57BL/6Jhg/hg, ruling out any doubt about the biological relevance of epistasis originating from new mutations in mice. These results contribute new insights on the relevance of epistasis in the genetic architecture of mammals and serve as an important component of an additional source of genetic heterogeneity for inbred strains of laboratory mice.


2011 ◽  
Vol 60 (1-6) ◽  
pp. 25-35 ◽  
Author(s):  
E. P. Cappa ◽  
M. Lstiburek ◽  
A. D. Yanchuk ◽  
Y. A. El-Kassaby

Abstract Spatial environmental heterogeneity is a well-known characteristic of field forest genetic trials, even in small experiments (<1 ha) established under seemingly uniform conditions and intensive site management. In such trials, it is commonly assumed that any simple type of experimental field design based on randomization theory, such as a completely randomized design (CRD), should account for any of the minor site variability. However, most published results indicate that these types of trials harbor a large component of spatial variation, which commonly resides in the error term. Here we applied a two-dimensional smoothed surface in an individual-tree mixed model, using the tensor product of linear, quadratic and cubic B-spline bases with different and equal numbers of knots for rows and columns, to account for the environmental spatial variability in two relatively small (i.e., 576 m2 and 5,705 m2) forest genetic trials with large multiple-tree contiguous plot configurations. In general, models accounting for site variability with a two-dimensional surface displayed a lower value of the deviance information criterion than the classical RCD. Linear B-spline bases may yield a reasonable description of the environmental variability when a relatively small amount of information is available. The mixed models fitting a smoothed surface resulted in a reduction in the posterior means of the error variance (σ2e), an increase in the posterior means of the additive genetic variance (σ2a) and heritability (h2HT), and an increase of 16.05% and 46.03% (for parents) or 11.86% and 44.68% (for offspring) in the accuracy of breeding values, respectively, in the two experiments.


Forests ◽  
2020 ◽  
Vol 11 (11) ◽  
pp. 1169
Author(s):  
Gary R. Hodge ◽  
Juan Jose Acosta

Research Highlights: An algorithm is presented that allows for the analysis of full-sib genetic datasets using generalized mixed-model software programs. The algorithm produces variance component estimates, genetic parameter estimates, and Best Linear Unbiased Prediction (BLUP) solutions for genetic values that are, for all practical purposes, identical to those produced by dedicated genetic software packages. Background and Objectives: The objective of this manuscript is to demonstrate an approach with a simulated full-sib dataset representing a typical forest tree breeding population (40 parents, 80 full-sib crosses, 4 tests, and 6000 trees) using two widely available mixed-model packages. Materials and Methods: The algorithm involves artificially doubling the dataset, so that each observation is in the dataset twice, once with the original female and male parent identification, and once with the female and male parent identities switched. Five linear models were examined: two models using a dedicated genetic software program (ASREML) with the capacity to specify A or other pedigree-related functions, and three models with the doubled dataset and a parent (or sire) linear model (ASREML, SAS Proc Mixed, and R lme4). Results: The variance components, genetic parameters, and BLUPs of the parental breeding values, progeny breeding values, and full-sib family-specific combining abilities were compared. Genetic parameter estimates were essentially the same across all the analyses (e.g., the heritability ranged from h2 = 0.220 to 0.223, and the proportion of dominance variance ranged from d2 = 0.057 to 0.058). The correlations between the BLUPs from the baseline analysis (ASREML with an individual tree model) and the doubled-dataset/parent models using SAS Proc Mixed or R lme4 were never lower than R = 0.99997. 
Conclusions: The algorithm can be useful for analysts who need to analyze full-sib genetic datasets and who are familiar with general-purpose statistical packages, but less familiar with or lacking access to other software.
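
The dataset-doubling step described in the Materials and Methods above can be sketched in a few lines: every record is kept once as observed and added once more with the female and male parent identities switched. This is only an illustration of that one step under assumed names. The record layout, the field names (`tree`, `female`, `male`, `height`), and the toy values are hypothetical, and the subsequent parent-model fitting in ASREML, SAS Proc Mixed, or R lme4 is not shown.

```python
def double_full_sib_records(records):
    """Return each record followed by a copy with female and male parents swapped."""
    doubled = []
    for rec in records:
        doubled.append(dict(rec))  # original parent assignment
        swapped = dict(rec)
        swapped["female"], swapped["male"] = rec["male"], rec["female"]
        doubled.append(swapped)    # same observation, parents switched
    return doubled


# Toy full-sib records (made-up identifiers and measurements).
trees = [
    {"tree": 1, "female": "P1", "male": "P2", "height": 10.2},
    {"tree": 2, "female": "P3", "male": "P1", "height": 9.7},
]
doubled = double_full_sib_records(trees)
```

After doubling, each parent appears in a single column (here `female`) for every cross it took part in, which is presumably what allows a general-purpose parent (or sire) model to be fitted to full-sib data.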


Methodology ◽  
2017 ◽  
Vol 13 (1) ◽  
pp. 9-22 ◽  
Author(s):  
Pablo Livacic-Rojas ◽  
Guillermo Vallejo ◽  
Paula Fernández ◽  
Ellián Tuero-Herrero

Abstract. Low precision of inferences from data analyzed with univariate or multivariate analysis of variance (ANOVA) models in repeated-measures designs is associated with non-normally distributed data, nonspherical covariance structures and free variation of the variances and covariances, lack of knowledge of the error structure underlying the data, and the wrong choice of covariance structure by different selectors. In this study, the levels of statistical power of the Modified Brown Forsythe (MBF) procedure and of two procedures based on mixed-model approaches (Akaike’s Criterion and the Correctly Identified Model [CIM]) are compared. The data were analyzed using the Monte Carlo simulation method with the statistical package SAS 9.2, a split-plot design, and six manipulated variables. The results show that the procedures exhibit high statistical power levels for within and interactional effects, and moderate and low levels for the between-groups effects under the different conditions analyzed. For the latter, only the Modified Brown Forsythe shows a high level of power, mainly for groups with 30 cases and Unstructured (UN) and Autoregressive Heterogeneity (ARH) matrices. For this reason, we recommend using this procedure since it exhibits higher levels of power for all effects and does not require a matrix type that underlies the structure of the data. Future research needs to be done in order to compare the power with corrected selectors using single-level and multilevel designs for fixed and random effects.


Genetics ◽  
1996 ◽  
Vol 143 (4) ◽  
pp. 1819-1829 ◽  
Author(s):  
G Thaller ◽  
L Dempfle ◽  
I Hoeschele

Abstract Maximum likelihood methodology was applied to determine the mode of inheritance of rare binary traits with data structures typical for swine populations. The genetic models considered included a monogenic, a digenic, a polygenic, and three mixed polygenic and major gene models. The main emphasis was on the detection of major genes acting on a polygenic background. Deterministic algorithms were employed to integrate and maximize likelihoods. A simulation study was conducted to evaluate model selection and parameter estimation. Three designs were simulated that differed in the number of sires/number of dams within sires (10/10, 30/30, 100/30). Major gene effects of at least one SD of the liability were detected with satisfactory power under the mixed model of inheritance, except for the smallest design. Parameter estimates were empirically unbiased with acceptable standard errors, except for the smallest design, and allowed the genetic models to be clearly distinguished. Distributions of the likelihood ratio statistic were evaluated empirically, because asymptotic theory did not hold. For each simulation model, the Average Information Criterion was computed for all models of analysis. The model with the smallest value was chosen as the best model and was equal to the true model in almost every case studied.

