lme4qtl: linear mixed models with flexible covariance structure for genetic studies of related individuals

Abstract. Background: Quantitative trait locus (QTL) mapping in genetic data often involves analysis of correlated observations, which need to be accounted for to avoid false association signals. This is commonly performed by modeling such correlations as random effects in linear mixed models (LMMs). The R package lme4 is a well-established tool that implements major LMM features using sparse matrix methods; however, it is not fully adapted for QTL mapping association and linkage studies. In particular, two LMM features are lacking in the base version of lme4: the definition of random effects by custom covariance matrices; and parameter constraints, which are essential in advanced QTL models. Apart from applications in linkage studies of related individuals, such functionalities are of high interest for association studies in situations where multiple covariance matrices need to be modeled, a scenario not covered by many genome-wide association study (GWAS) software packages. Results: To address the aforementioned limitations, we developed a new R package lme4qtl as an extension of lme4. First, lme4qtl contributes new models for genetic studies within a single tool integrated with lme4 and its companion packages. Second, lme4qtl offers a flexible framework for scenarios with multiple levels of relatedness and becomes efficient when covariance matrices are sparse. We showed the value of our package using real family-based data in the Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) project. Conclusions: Our software lme4qtl enables QTL mapping models with a versatile structure of random effects and efficient computation for sparse covariances. lme4qtl is available at https://github.com/variani/lme4qtl.
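
To make the idea of a random effect defined by a custom covariance matrix concrete, the following is a minimal sketch in Python. It is not lme4qtl code and does not use its API. It assumes a single polygenic random effect, so that the trait covariance is V = var_genetic * K + var_residual * I, where K is a relationship matrix (twice the kinship coefficient). The two nuclear families, the variance values, and all function names are hypothetical and chosen only for illustration.

```python
def nuclear_family_relationship(n_children):
    """Relationship matrix (2 x kinship) for two unrelated parents and their children."""
    n = 2 + n_children
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        K[i][i] = 1.0                      # each individual with itself
    for child in range(2, n):
        for parent in (0, 1):              # parent-offspring pairs share 1/2
            K[child][parent] = K[parent][child] = 0.5
        for sibling in range(2, child):    # full siblings share 1/2
            K[child][sibling] = K[sibling][child] = 0.5
    return K


def block_diagonal(blocks):
    """Stack per-family matrices; individuals in different families are unrelated."""
    n = sum(len(b) for b in blocks)
    M = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                M[offset + i][offset + j] = value
        offset += len(b)
    return M


def trait_covariance(K, var_genetic, var_residual):
    """V = var_genetic * K + var_residual * I (assumed one-component model)."""
    n = len(K)
    return [[var_genetic * K[i][j] + (var_residual if i == j else 0.0)
             for j in range(n)] for i in range(n)]


# Hypothetical data: one family with 2 children, one with 3 children.
K = block_diagonal([nuclear_family_relationship(2), nuclear_family_relationship(3)])
V = trait_covariance(K, var_genetic=0.6, var_residual=0.4)

nonzero = sum(1 for row in K for value in row if value != 0.0)
print(f"{nonzero} of {len(K) ** 2} entries of K are non-zero")
```

Because K is block-diagonal by family, most of its entries are zero, which is the kind of sparsity the abstract refers to when it says computation becomes efficient for sparse covariance matrices.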

Basic Features of the Analysis of Germination Data with Generalized Linear Mixed Models

Data ◽

10.3390/data5010006 ◽

2020 ◽

Vol 5 (1) ◽

pp. 6 ◽

Cited By ~ 2

Author(s):

Alberto Gianinetti

Keyword(s):

Boundary Conditions ◽

Longitudinal Studies ◽

Random Effects ◽

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Covariance Structure ◽

Conditional Models ◽

Error Terms ◽

Germination Indices

Germination data are discrete and binomial. Although analysis of variance (ANOVA) has long been used for the statistical analysis of these data, generalized linear mixed models (GzLMMs) provide a more consistent theoretical framework. GzLMMs are suitable for final germination percentages (FGP) as well as longitudinal studies of germination time-courses. Germination indices (i.e., single-value parameters summarizing the results of a germination assay by combining the level and rapidity of germination) and other data with a Gaussian error distribution can be analyzed too. There are, however, different kinds of GzLMMs: Conditional (i.e., random effects are modeled as deviations from the general intercept with a specific covariance structure), marginal (i.e., random effects are modeled solely as a variance/covariance structure of the error terms), and quasi-marginal (some random effects are modeled as deviations from the intercept and some are modeled as a covariance structure of the error terms) models can be applied to the same data. 
It is shown that: (a) For germination data, conditional, marginal, and quasi-marginal GzLMMs tend to converge to a similar inference; (b) conditional models are the first choice for FGP; (c) marginal or quasi-marginal models are more suited for longitudinal studies, although conditional models lead to a congruent inference; (d) in general, common random factors are better dealt with as random intercepts, whereas serial correlation is easier to model in terms of the covariance structure of the error terms; (e) germination indices are not binomial and can be easier to analyze with a marginal model; (f) in boundary conditions (when some means approach 0% or 100%), conditional models with an integral approximation of true likelihood are more appropriate; in non-boundary conditions, (g) germination data can be fitted with default pseudo-likelihood estimation techniques, on the basis of the SAS-based code templates provided here; (h) GzLMMs are remarkably good for the analysis of germination data except if some means are 0% or 100%. In this case, alternative statistical approaches may be used, such as survival analysis or linear mixed models (LMMs) with transformed data, unless an ad hoc data adjustment in estimates of limit means is considered, either experimentally or computationally. This review is intended as a basic tutorial for the application of GzLMMs, and is, therefore, of interest primarily to researchers in the agricultural sciences.
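
One way to see why means of exactly 0% or 100% are troublesome is to look at the binomial log-likelihood on the logit scale. The sketch below is a simplified, fixed-effect-only illustration in Python (the review itself provides SAS-based templates, which are not reproduced here); the counts and function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion; infinite at the boundaries 0 and 1."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log(p / (1.0 - p))


def binomial_log_likelihood(germinated, total, eta):
    """Binomial log-likelihood (constant term dropped) at linear predictor eta."""
    log_p = -math.log1p(math.exp(-eta))    # log of germination probability
    log_q = -math.log1p(math.exp(eta))     # log of non-germination probability
    return germinated * log_p + (total - germinated) * log_q


# Interior case: 30 of 50 seeds germinated, the maximum sits at a finite logit.
eta_hat = logit(30 / 50)
interior = [binomial_log_likelihood(30, 50, eta_hat + shift) for shift in (-0.1, 0.0, 0.1)]

# Boundary case: 50 of 50 germinated, the likelihood keeps rising as eta grows.
boundary = [binomial_log_likelihood(50, 50, eta) for eta in (5.0, 10.0, 20.0)]

print("interior:", interior)
print("boundary:", boundary)
```

In the interior case the log-likelihood peaks at a finite value of the linear predictor, whereas in the boundary case it increases without reaching a maximum, so no finite estimate exists on the logit scale.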

Computationally feasible estimation of the covariance structure in generalized linear mixed models

Journal of Statistical Computation and Simulation ◽

10.1080/00949650701688547 ◽

2008 ◽

Vol 78 (12) ◽

pp. 1229-1239 ◽

Cited By ~ 1

Author(s):

MD. Moudud Alam ◽

Kenneth Carling

Keyword(s):

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Covariance Structure

Modeling the longitudinal outcomes of congestive heart failure patients: A case study at Wachemo University Nigist Eleni Mohammed Memorial Referral Hospital

10.21203/rs.3.rs-601836/v1 ◽

2021 ◽

Author(s):

Mohammed Sultan ◽

Ritbano Ahmed

Keyword(s):

Heart Failure ◽

Congestive Heart Failure ◽

Longitudinal Data ◽

Mixed Models ◽

Mixed Model ◽

Linear Mixed Model ◽

Linear Mixed Models ◽

Covariance Structure ◽

Repeated Measurements ◽

Within Subjects

Abstract: The linear mixed model is one of the common models used to analyze longitudinal data; it may take the form of separate (univariate), joint bivariate, or joint multivariate linear mixed models, depending on the number of response variables included in the analysis. Accounting for the correlation and covariance between and within subjects is one reason why modern longitudinal data analysis techniques are deemed more appropriate than some earlier methods of analysis. Some studies assume that the correlation between observations is zero. However, it is unlikely that repeated measurements on the same individual will actually be independent. To that end, it is necessary to compare the different linear mixed models and identify the one that appropriately describes the evolution of patients with congestive heart failure. In this study, the separate, bivariate, and multivariate linear mixed models were compared under different covariance and correlation structures. Finally, a multivariate linear mixed model with a first-order autoregressive correlation structure and an unstructured covariance structure for the random effects, accounting for within- and between-patient variation, was selected as the best model to depict the evolution of patients with congestive heart failure.
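
For readers unfamiliar with the first-order autoregressive structure mentioned above, here is a minimal Python sketch of the within-subject correlation matrix it implies for equally spaced visits: the correlation between visits i and j is rho raised to |i - j|. The number of visits and the value of rho are hypothetical and not taken from the study.

```python
def ar1_correlation(n_times, rho):
    """AR(1) correlation matrix: corr(i, j) = rho ** |i - j| for equally spaced times."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


def independence_correlation(n_times):
    """Identity matrix: the zero-correlation assumption criticised in the abstract."""
    return [[1.0 if i == j else 0.0 for j in range(n_times)] for i in range(n_times)]


# Hypothetical example: four repeated measurements, rho = 0.5.
R_ar1 = ar1_correlation(4, 0.5)
R_ind = independence_correlation(4)

for row in R_ar1:
    print(["%.3f" % value for value in row])
```

Under AR(1), measurements that are close in time are more strongly correlated than distant ones, and a single parameter describes the whole matrix.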

Misspecification of the covariance structure in generalized linear mixed models

Statistical Methods in Medical Research ◽

10.1177/0962280212462859 ◽

2012 ◽

Vol 25 (2) ◽

pp. 630-643 ◽

Cited By ~ 6

Author(s):

M Chavance ◽

S Escolano

Keyword(s):

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Covariance Structure

Restricted likelihood ratio testing in linear mixed models with general error covariance structure

Electronic Journal of Statistics ◽

10.1214/11-ejs654 ◽

2011 ◽

Vol 5 (0) ◽

pp. 1718-1734 ◽

Cited By ~ 4

Author(s):

Andrea Wiencierz ◽

Sonja Greven ◽

Helmut Küchenhoff

Keyword(s):

Likelihood Ratio ◽

Mixed Models ◽

Linear Mixed Models ◽

Covariance Structure ◽

Error Covariance ◽

Restricted Likelihood ◽

Likelihood Ratio Testing ◽

Error Covariance Structure ◽

General Error

Yield response of winter wheat cultivars to environments modeled by different variance-covariance structures in linear mixed models

Spanish Journal of Agricultural Research ◽

10.5424/sjar/2016142-8737 ◽

2016 ◽

Vol 14 (2) ◽

pp. e0703 ◽

Cited By ~ 1

Author(s):

Marcin Studnicki ◽

Wiesław Mądry ◽

Kinga Noras ◽

Elżbieta Wójcik-Gront ◽

Edward Gacek

Keyword(s):

Winter Wheat ◽

Mixed Models ◽

Linear Mixed Models ◽

Covariance Structure ◽

Covariance Structures ◽

Yield Response ◽

Analytic Structure ◽

Wheat Cultivars ◽

Good Tool ◽

Winter Wheat Cultivars

The main objectives of multi-environmental trials (METs) are to assess cultivar adaptation patterns under different environmental conditions and to investigate genotype by environment (G×E) interactions. Linear mixed models (LMMs) with more complex variance-covariance structures have become recognized and widely used for analyzing MET data. Best practice in MET analysis is to compare competing models with different variance-covariance structures. Improperly chosen variance-covariance structures may lead to biased estimation of means, resulting in incorrect conclusions. In this work we focused on the adaptive response of cultivars to environments as modeled by LMMs with different variance-covariance structures. We identified possible limitations of inference when using an inadequate variance-covariance structure. In the presented study we used a dataset on grain yield for 63 winter wheat cultivars, evaluated across 18 locations during three growing seasons (2008/2009-2010/2011), from the Polish Post-registration Variety Testing System. To evaluate the variance-covariance structures and describe cultivar adaptation to environments, we calculated adjusted means for each combination of cultivar and location in models with different variance-covariance structures. We concluded that, in order to fully describe cultivars' adaptive patterns, modelers should use the unrestricted variance-covariance structure. The restricted compound symmetry structure may interfere with proper interpretation of cultivars' adaptive patterns. We found that the factor-analytic structure is also a good tool to describe cultivars' reaction to environments, and it can be successfully used for MET data after determining the optimal number of components for each dataset.
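
The structures compared in this abstract differ mainly in how many parameters they spend on the covariance between environments. The Python sketch below builds a compound symmetry matrix and a one-factor factor-analytic matrix (loadings times loadings plus specific variances) and counts the parameters of an unstructured matrix. Treating the 18 locations as the dimension of the matrix is an assumption made for illustration, and all numeric values and function names are hypothetical.

```python
def compound_symmetry(n, variance, correlation):
    """One common variance and one common correlation for every pair."""
    return [[variance if i == j else variance * correlation
             for j in range(n)] for i in range(n)]


def factor_analytic_one(loadings, specific_variances):
    """FA(1): Sigma[i][j] = loading_i * loading_j, plus a specific variance on the diagonal."""
    n = len(loadings)
    return [[loadings[i] * loadings[j] + (specific_variances[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def n_parameters_unstructured(n):
    """Every variance and covariance is free: n * (n + 1) / 2 parameters."""
    return n * (n + 1) // 2


# Small hypothetical matrices for three environments.
cs = compound_symmetry(3, variance=2.0, correlation=0.5)
fa = factor_analytic_one([1.0, 2.0, 0.5], [0.5, 0.5, 0.5])

# Parameter counts if the matrix spanned 18 locations (assumed dimension).
n_locations = 18
print("compound symmetry:", 2)
print("factor-analytic, one factor:", 2 * n_locations)
print("unstructured:", n_parameters_unstructured(n_locations))
```

The factor-analytic structure sits between the two extremes: it allows heterogeneous variances and covariances while using far fewer parameters than the unstructured form.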

REHE: Fast Variance Components Estimation for Linear Mixed Models

10.1101/2021.02.03.429643 ◽

2021 ◽

Author(s):

Kun Yue ◽

Jing Ma ◽

Timothy Thornton ◽

Ali Shojaie

Keyword(s):

Mixed Models ◽

Variance Components ◽

Linear Mixed Models ◽

Real Data ◽

Small Samples ◽

Simulation Studies ◽

Genetic Studies ◽

Variance Components Estimation ◽

Comparable Accuracy ◽

Estimation Of Variance

Abstract: Linear mixed models are widely used in ecological and biological applications, especially in genetic studies. Reliable estimation of variance components is crucial for using linear mixed models. However, standard methods, such as the restricted maximum likelihood (REML), are computationally inefficient and may be unstable with small samples. Other commonly used methods, such as the Haseman-Elston (HE) regression, may yield negative estimates of variances. Utilizing regularized estimation strategies, we propose the restricted Haseman-Elston (REHE) regression and REHE with resampling (reREHE) estimators, along with an inference framework for REHE, as fast and robust alternatives that provide non-negative estimates with comparable accuracy to REML. The merits of REHE are illustrated using real data and benchmark simulation studies.
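
The negative-estimate problem of HE regression, and the effect of a non-negativity constraint, can be shown on a toy example. The Python sketch below is a generic two-component illustration and is not the authors' REHE implementation: it assumes a centered trait with covariance var_genetic * K + var_residual * I, regresses the products y_i * y_j on K_ij and an identity indicator, and, when constrained, solves the two-parameter non-negative least-squares problem by checking the boundary solutions. The data, the matrix K, and the function name are hypothetical.

```python
def he_regression(y, K, nonnegative=True):
    """Estimate (var_genetic, var_residual) from products y_i * y_j, pairs i <= j.

    Assumes y is centered and K is not the identity (otherwise the two
    regressors coincide and the normal equations are singular).
    """
    n = len(y)
    a11 = a12 = a22 = b1 = b2 = 0.0
    for i in range(n):
        for j in range(i, n):
            k = K[i][j]
            d = 1.0 if i == j else 0.0
            z = y[i] * y[j]
            a11 += k * k
            a12 += k * d
            a22 += d * d
            b1 += k * z
            b2 += d * z

    def objective(t1, t2):
        # Least-squares criterion up to an additive constant.
        return 0.5 * (a11 * t1 * t1 + 2.0 * a12 * t1 * t2 + a22 * t2 * t2) - b1 * t1 - b2 * t2

    det = a11 * a22 - a12 * a12
    unconstrained = ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
    if not nonnegative or min(unconstrained) >= 0.0:
        return unconstrained
    # Otherwise the constrained optimum lies on a boundary: fix one parameter
    # at zero, solve the one-dimensional problem for the other, keep the best.
    candidates = [(0.0, max(0.0, b2 / a22)), (max(0.0, b1 / a11), 0.0)]
    return min(candidates, key=lambda t: objective(*t))


# Hypothetical toy data: two related individuals with opposite trait values.
K = [[1.0, 0.5], [0.5, 1.0]]
y = [1.0, -1.0]

plain = he_regression(y, K, nonnegative=False)
constrained = he_regression(y, K, nonnegative=True)
print("unconstrained HE:", plain)
print("non-negative HE:", constrained)
```

On this toy input the unconstrained fit returns a negative genetic variance, while the constrained fit keeps both components non-negative.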

Estimating the effective sample size in association studies of quantitative traits

10.1101/2019.12.15.877217 ◽

2019 ◽

Author(s):

Andrey Ziyatdinov ◽

Jihye Kim ◽

Dmitry Prokopenko ◽

Florian Privé ◽

Fabien Laporte ◽

...

Keyword(s):

Sample Size ◽

Mixed Models ◽

Quantitative Traits ◽

Association Studies ◽

Linear Mixed Models ◽

Analytical Form ◽

Effective Sample Size ◽

Summary Statistics ◽

Related Individuals ◽

Family Based

Abstract: The effective sample size (ESS) is a quantity estimated in genome-wide association studies (GWAS) with related individuals and/or linear mixed models used in analysis. ESS originally measured relative power in family-based GWAS and has recently become important for correcting GWAS summary statistics in post-GWAS analyses. However, existing ESS approaches have been overlooked and are based on empirical estimation. This work presents an analytical form of ESS in mixed-model GWAS of quantitative traits, which is derived using the expectation of a quadratic form and validated in extensive simulations. We illustrate the performance and relevance of our ESS estimator in common GWAS scenarios and analytically show that (i) family-based studies are consistently underpowered compared to studies of unrelated individuals of the same sample size; (ii) conditioning on the polygenic genetic effect by linear mixed models boosts power; and (iii) power of detecting gene-environment interaction can be substantially gained or lost in family-based designs depending on exposure distribution. We further analyze the UK Biobank dataset in two samples of 336,347 unrelated and 68,910 related individuals. Analysis in unrelated individuals reveals a high accuracy of our ESS estimator compared to the existing empirical approach; and analysis of related individuals suggests that the loss in effective sample size due to relatedness is at most 0.94x. Overall, we provide an analytical form of ESS for guiding GWAS designs and processing summary statistics in post-GWAS analyses.

Exact distributions of statistics for making inferences on mixed models under the default covariance structure

Journal of Statistical Distributions and Applications ◽

10.1186/s40488-020-00105-w ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Samaradasa Weerahandi ◽

Ching-Ray Yu

Keyword(s):

Mixed Models ◽

Covariance Structure ◽

Exact Distributions
