Genomic Selection Using Environmental Covariates Within an Integrated Factor Analytic Linear Mixed Model

Author(s):  
Daniel Tolhurst ◽  
R. Chris Gaynor ◽  
Brian Gardunia ◽  
John Hickey ◽  
Gregor Gorjanc

Abstract This paper introduces a single-stage genomic selection approach which directly integrates environmental covariates within a special factor analytic framework. The factor analytic approach of Smith et al. (2001) is an effective method of analysis for multi-environment trial (MET) datasets, but has limited biological interpretation since the underlying factors are latent, so the modelled genotype by environment interaction (GEI) is observable, rather than predictable. The advantage of using known environmental covariates, such as soil moisture and daily temperature, is that the modelled GEI becomes directly interpretable, and thence predictable. This paper develops a model for both predictable and observable GEI in terms of a joint set of known and latent factors, as well as non-genetic sources of variation within trials and environments. This single-stage approach is referred to as the integrated factor analytic linear mixed model (IFA-LMM). The IFA-LMM is demonstrated on a late-stage cotton breeding MET dataset from Bayer Crop Science. The results show that the environmental covariates explain 34.6% of the genetic variance across environments (compared to only 23.3% for a conventional regression model). This represents 92.7% of the crossover GEI. The latent factors then explain 40.7% of the genetic variance, which represents 87.6% of the non-crossover GEI. This demonstrates the ability of the IFA-LMM to model crossover and non-crossover GEI in a manner that is both informative and practical for plant breeding.
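To make the idea of a joint set of known and latent factors concrete, the following is a minimal sketch of how a between-environment genetic covariance matrix might be split into a part driven by known covariates, a part driven by latent factors, and an environment-specific remainder. It is not the authors' IFA-LMM: the additive form, the assumption that the known and latent parts are uncorrelated, all symbol names (`S`, `Gamma`, `Lam`, `Psi`) and all numbers are hypothetical toy choices for illustration.

```python
import numpy as np

# Toy dimensions (hypothetical): 6 environments, 2 known covariates, 1 latent factor.
rng = np.random.default_rng(0)
p, q, k = 6, 2, 1

S = rng.normal(size=(p, q))               # known environmental covariates (toy values)
Gamma = np.diag([0.5, 0.3])               # assumed covariance of genotype responses to covariates
Lam = rng.normal(scale=0.6, size=(p, k))  # loadings on the latent factor(s)
Psi = np.diag(np.full(p, 0.2))            # environment-specific variances

# Assumed additive decomposition of the between-environment genetic covariance.
G_known = S @ Gamma @ S.T
G_latent = Lam @ Lam.T
G = G_known + G_latent + Psi

def pct_variance(component, total):
    """Percentage of total genetic variance (trace) attributed to one component."""
    return 100.0 * np.trace(component) / np.trace(total)

pct = {
    "known": pct_variance(G_known, G),
    "latent": pct_variance(G_latent, G),
    "specific": pct_variance(Psi, G),
}
print({name: round(value, 1) for name, value in pct.items()})
```

Percentages of this kind are analogous in spirit to the variance shares quoted in the abstract, but the toy output has no connection to the cotton dataset.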

2019 ◽  
Vol 136 (4) ◽  
pp. 279-300 ◽  
Author(s):  
Daniel J. Tolhurst ◽  
Ky L. Mathews ◽  
Alison B. Smith ◽  
Brian R. Cullis

2021 ◽  
Vol 12 ◽  
Author(s):  
Alison Smith ◽  
Adam Norman ◽  
Haydn Kuchel ◽  
Brian Cullis

A major challenge in the analysis of plant breeding multi-environment datasets is the provision of meaningful and concise information for variety selection in the presence of variety by environment interaction (VEI). This is addressed in the current paper by fitting a factor analytic linear mixed model (FALMM) then using the fundamental factor analytic parameters to define groups of environments in the dataset within which there is minimal crossover VEI, but between which there may be substantial crossover VEI. These groups are consequently called interaction classes (iClasses). Given that the environments within an iClass exhibit minimal crossover VEI, it is then valid to obtain predictions of overall variety performance (across environments) for each iClass. These predictions can then be used not only to select the best varieties within each iClass but also to match varieties in terms of their patterns of VEI across iClasses. The latter is aided with the use of a new graphical tool called an iClass Interaction Plot. The ideas are introduced in this paper within the framework of FALMMs in which the genetic effects for different varieties are assumed independent. The application to FALMMs which include information on genetic relatedness is the subject of a subsequent paper.
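As a rough illustration of grouping environments from factor analytic parameters, the sketch below assigns environments to classes by the sign pattern of their estimated loadings. The abstract only says that the "fundamental factor analytic parameters" define the iClasses; the sign-pattern rule, the function name `iclasses_from_loadings`, and the loadings themselves are assumptions made for this example.

```python
from collections import defaultdict

import numpy as np

def iclasses_from_loadings(loadings, env_names):
    """Group environments by the sign pattern of their factor loadings.

    Each environment is labelled with one character per factor:
    'p' for a non-negative loading and 'n' for a negative loading.
    """
    loadings = np.asarray(loadings, dtype=float)
    groups = defaultdict(list)
    for name, row in zip(env_names, loadings):
        label = "".join("p" if value >= 0 else "n" for value in row)
        groups[label].append(name)
    return dict(groups)

# Hypothetical loadings for five environments on two factors.
loadings = [[0.9, 0.4], [0.8, 0.1], [0.7, -0.3], [1.1, -0.5], [0.6, 0.2]]
envs = ["E1", "E2", "E3", "E4", "E5"]

classes = iclasses_from_loadings(loadings, envs)
print(classes)
```

With these toy values, environments sharing a sign pattern fall into the same class, and variety predictions could then be averaged within each class.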


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hae-Un Jung ◽  
Won Jun Lee ◽  
Tae-Woong Ha ◽  
Ji-One Kang ◽  
Jihye Kim ◽  
...  

Abstract Multiple environmental factors could interact with a single genetic factor to affect disease phenotypes. We used Struct-LMM to identify genetic variants that interacted with environmental factors related to body mass index (BMI) using data from the Korea Association Resource. The following factors were investigated: alcohol consumption, education, physical activity metabolic equivalent of task (PAMET), income, total calorie intake, protein intake, carbohydrate intake, and smoking status. Initial analysis identified 7 potential single nucleotide polymorphisms (SNPs) that interacted with the environmental factors (P value < 5.00 × 10⁻⁶). Of the 8 environmental factors, PAMET score was excluded from further analysis since it had an average Bayes Factor (BF) value < 1 (BF = 0.88). Interaction analysis using the remaining 7 environmental factors identified 11 SNPs (P value < 5.00 × 10⁻⁶). Of these, rs2391331 had the most significant interaction (P value = 7.27 × 10⁻⁹) and was located within the intron of EFNB2 (Chr 13). In addition, the gene-based genome-wide association study verified that the EFNB2 gene interacted significantly with the 7 environmental factors (P value = 5.03 × 10⁻¹⁰). BF analysis indicated that most environmental factors, except carbohydrate intake, contributed to the interaction of rs2391331 on BMI. Although replication of the results in other cohorts is warranted, these findings demonstrate the usefulness of Struct-LMM for identifying gene–environment interactions affecting disease.


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Miguel Gozalo-Marcilla ◽  
Jaap Buntjer ◽  
Martin Johnsson ◽  
Lorena Batista ◽  
Federico Diez ◽  
...  

Abstract Background Backfat thickness is an important carcass composition trait for pork production and is commonly included in swine breeding programmes. In this paper, we report the results of a large genome-wide association study for backfat thickness using data from eight lines of diverse genetic backgrounds. Methods Data comprised 275,590 pigs from eight lines with diverse genetic backgrounds (breeds included Large White, Landrace, Pietrain, Hampshire, Duroc, and synthetic lines) genotyped and imputed for 71,324 single-nucleotide polymorphisms (SNPs). For each line, we estimated SNP associations using a univariate linear mixed model that accounted for genomic relationships. SNPs with significant associations were identified using a threshold of p < 10⁻⁶ and used to define genomic regions of interest. The proportion of genetic variance explained by a genomic region was estimated using a ridge regression model. Results We found significant associations with backfat thickness for 264 SNPs across 27 genomic regions. Six genomic regions were detected in three or more lines. The average estimate of the SNP-based heritability was 0.48, with estimates by line ranging from 0.30 to 0.58. The genomic regions jointly explained from 3.2 to 19.5% of the additive genetic variance of backfat thickness within a line. Individual genomic regions explained up to 8.0% of the additive genetic variance of backfat thickness within a line. Some of these 27 genomic regions also explained up to 1.6% of the additive genetic variance in lines for which the genomic region was not statistically significant. We identified 64 candidate genes with annotated functions that can be related to fat metabolism, including well-studied genes such as MC4R, IGF2, and LEPR, and more novel candidate genes such as DHCR7, FGF23, MEDAG, DGKI, and PTN.
Conclusions Our results confirm the polygenic architecture of backfat thickness and the role of genes involved in energy homeostasis, adipogenesis, fatty acid metabolism, and insulin signalling pathways for fat deposition in pigs. The results also suggest that several less well-understood metabolic pathways contribute to backfat development, such as those of phosphate, calcium, and vitamin D homeostasis.


2017 ◽  
Author(s):  
Uche Godfrey Okeke ◽  
Deniz Akdemir ◽  
Ismail Rabbi ◽  
Peter Kulakow ◽  
Jean-Luc Jannink

List of abbreviations GS: Genomic Selection; BLUP: Best Linear Unbiased Prediction; EBVs: Estimated Breeding Values; EGVs: Estimated Genetic Values; GEBVs: Genomic Estimated Breeding Values; SNPs: Single Nucleotide Polymorphisms; GxE: Genotype-by-environment interactions; GxG: Gene-by-gene interactions; GxGxE: Gene-by-gene-by-environment interactions; uT: Univariate single environment one-step model; uE: Univariate multi environment one-step model; MT: Multi-trait single environment one-step model; ME: Multivariate single trait multi environment model.
Abstract Background Genomic selection (GS) promises to accelerate genetic gain in plant breeding programs, especially for long cycle crops like cassava. To practically implement GS in cassava breeding, it is useful to evaluate different GS models and to develop suitable models for an optimized breeding pipeline. Methods We compared prediction accuracies from a single-trait (uT) and a multi-trait (MT) mixed model for single environment genetic evaluation (Scenario 1), while for multi-environment evaluation accounting for genotype-by-environment interaction (Scenario 2) we compared accuracies from a univariate (uE) and a multivariate (ME) multi-environment mixed model. We used sixteen years of data for six target cassava traits for these analyses. All models for Scenario 1 and Scenario 2 were based on the one-step approach. A 5-fold cross validation scheme with 10 repeat cycles was used to assess model prediction accuracies. Results In Scenario 1, the MT models had higher prediction accuracies than the uT models for most traits and locations analyzed, amounting to 32 percent better prediction accuracy on average. However, for Scenario 2, we observed that the ME model had on average (across all locations and traits) 12 percent better predictive power than the uE model. Conclusion We recommend the use of multivariate mixed models (MT and ME) for cassava genetic evaluation. These models may be useful for other plant species.
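The 5-fold cross validation with 10 repeats described in the Methods can be sketched as follows. This is an illustration only: the data are simulated, a simple ridge regression stands in for the mixed models compared in the study, and measuring accuracy as the correlation between predicted and observed values is an assumption, since the abstract does not define it. The names `repeated_kfold_indices` and `ridge_predict` are hypothetical helpers.

```python
import numpy as np

def repeated_kfold_indices(n, n_folds=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross validation."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n)          # fresh random partition for each repeat
        for test_idx in np.array_split(perm, n_folds):
            train_idx = np.setdiff1d(perm, test_idx)
            yield train_idx, test_idx

def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Stand-in predictor: ridge regression with an unpenalised mean."""
    mu = y_train.mean()
    A = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
    beta = np.linalg.solve(A, X_train.T @ (y_train - mu))
    return mu + X_test @ beta

# Simulated toy data: 100 individuals, 20 marker-like predictors.
rng = np.random.default_rng(1)
n, m = 100, 20
X = rng.normal(size=(n, m))
y = X @ rng.normal(size=m) + rng.normal(size=n)

accuracies = []
for train_idx, test_idx in repeated_kfold_indices(n):
    y_hat = ridge_predict(X[train_idx], y[train_idx], X[test_idx])
    accuracies.append(np.corrcoef(y_hat, y[test_idx])[0, 1])

mean_accuracy = float(np.mean(accuracies))
print(len(accuracies), round(mean_accuracy, 3))
```

Each of the 10 repeats reshuffles the individuals before splitting, so the 50 accuracy values are averaged over different partitions rather than one arbitrary split.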


2003 ◽  
Vol 54 (12) ◽  
pp. 1395 ◽  
Author(s):  
A. P. Verbyla ◽  
P. J. Eckermann ◽  
R. Thompson ◽  
B. R. Cullis

A new approach for multi-environment quantitative trait locus (QTL) analysis based on an appropriate genetic model is presented. To accommodate a multi-environment analysis, the size of a QTL effect is assumed to be a random effect. The approach results in a multiplicative mixed model for QTL × environment interaction of the factor analytic type. The full genetic model may also include a factor analytic model for the residual genotype × environment interaction, whereas the environmental model for the non-genetic variation involves local, global, and extraneous variation. The approach is used to determine QTLs for yield in the Arapiles × Franklin doubled haploid population of the National Barley Molecular Marker Program. Analysis leads to the determination of 8 QTLs. Many of these QTLs are associated with other traits.


Author(s):  
Osval Antonio Montesinos López ◽  
Abelardo Montesinos López ◽  
Jose Crossa

Abstract The linear mixed model framework is explained in detail in this chapter. We explore three methods of parameter estimation (maximum likelihood, EM algorithm, and REML) and illustrate how genomic-enabled predictions are performed under this framework. We illustrate the use of linear mixed models by including in the predictor several components such as environments, genotypes, and genotype × environment interaction. Also, the linear mixed model is illustrated under a multi-trait framework that is important in the prediction performance when the degree of correlation between traits is moderate or large. We illustrate the use of single-trait and multi-trait linear mixed models and provide the R codes for performing the analyses.
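For orientation, the standard linear mixed model and Henderson's mixed model equations can be written as below. The notation (X, Z, G, R and so on) is the common textbook convention and is assumed here; the chapter itself may use different symbols.

```latex
% Linear mixed model: fixed effects \beta, random effects u, residuals e
\begin{aligned}
  y &= X\beta + Zu + e, \\
  u &\sim N(0,\, G), \qquad e \sim N(0,\, R), \qquad \operatorname{cov}(u, e) = 0 .
\end{aligned}

% Henderson's mixed model equations, giving the BLUE of \beta and the BLUP of u
\begin{bmatrix}
  X^{\top} R^{-1} X & X^{\top} R^{-1} Z \\
  Z^{\top} R^{-1} X & Z^{\top} R^{-1} Z + G^{-1}
\end{bmatrix}
\begin{bmatrix}
  \hat{\beta} \\ \hat{u}
\end{bmatrix}
=
\begin{bmatrix}
  X^{\top} R^{-1} y \\ Z^{\top} R^{-1} y
\end{bmatrix}
```

In genomic prediction, G is often taken to be proportional to a genomic relationship matrix, and the variance parameters in G and R are estimated by one of the methods the abstract lists, such as REML.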

