Zero-Inflated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships

Mathematics ◽ 2021 ◽ Vol 9 (10) ◽ pp. 1100
Author(s): Luiz Paulo Fávero ◽ Joseph F. Hair ◽ Rafael de Freitas Souza ◽ Matheus Albergaria ◽ Talles V. Brugni

Our article explores an underused mathematical analytical methodology in the social sciences. In addition to describing the method and its advantages, we extend a previously reported application of mixed models to a well-known database about corruption in 149 countries. The dataset in that study included a sizable share of zeros (13.19%) in the outcome variable, which is typical of this type of research and of much social sciences research in general. In our paper, we present detailed guidelines for estimating models where the outcome variable contains an excess number of zeros and the dataset has a natural nested structure. Our analysis does not reject the hypothesis that favors adopting mixed modeling with zero inflation over the original, simpler framework; instead, our results demonstrate the importance of considering random effects at the country level and the zero-inflated nature of the outcome variable.
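As a rough illustration of what "zero-inflated" means for a count outcome, the sketch below evaluates a zero-inflated Poisson probability mass function. The abstract does not state which count distribution the authors used, so the Poisson base, the function name `zip_pmf`, and the parameter names `lam` (count mean) and `pi` (probability of a structural zero) are assumptions made here for illustration only. Random effects are omitted.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson pmf (illustrative; names are hypothetical).

    With probability `pi` the observation is a structural zero;
    otherwise it is drawn from a Poisson distribution with mean `lam`.
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # Zeros arise from both the structural and the count component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Compare the probability of a zero with and without inflation.
p_zero_plain = zip_pmf(0, lam=2.0, pi=0.0)
p_zero_inflated = zip_pmf(0, lam=2.0, pi=0.1)
print(p_zero_plain, p_zero_inflated)
```

In a zero-inflated mixed model, `lam` and `pi` would each depend on covariates and group-level random effects (for example, country-level intercepts); this sketch fixes them as constants to show only the mixture structure.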

Data ◽ 2020 ◽ Vol 5 (1) ◽ pp. 6
Author(s): Alberto Gianinetti

Germination data are discrete and binomial. Although analysis of variance (ANOVA) has long been used for the statistical analysis of these data, generalized linear mixed models (GzLMMs) provide a more consistent theoretical framework. GzLMMs are suitable for final germination percentages (FGP) as well as longitudinal studies of germination time-courses. Germination indices (i.e., single-value parameters summarizing the results of a germination assay by combining the level and rapidity of germination) and other data with a Gaussian error distribution can be analyzed too. There are, however, different kinds of GzLMMs: Conditional (i.e., random effects are modeled as deviations from the general intercept with a specific covariance structure), marginal (i.e., random effects are modeled solely as a variance/covariance structure of the error terms), and quasi-marginal (some random effects are modeled as deviations from the intercept and some are modeled as a covariance structure of the error terms) models can be applied to the same data. 
It is shown that: (a) for germination data, conditional, marginal, and quasi-marginal GzLMMs tend to converge to a similar inference; (b) conditional models are the first choice for FGP; (c) marginal or quasi-marginal models are more suited for longitudinal studies, although conditional models lead to a congruent inference; (d) in general, common random factors are better dealt with as random intercepts, whereas serial correlation is easier to model in terms of the covariance structure of the error terms; (e) germination indices are not binomial and can be easier to analyze with a marginal model; (f) in boundary conditions (when some means approach 0% or 100%), conditional models with an integral approximation of the true likelihood are more appropriate; (g) in non-boundary conditions, germination data can be fitted with default pseudo-likelihood estimation techniques, on the basis of the SAS-based code templates provided here; (h) GzLMMs are remarkably good for the analysis of germination data except when some means are exactly 0% or 100%. In this case, alternative statistical approaches may be used, such as survival analysis or linear mixed models (LMMs) with transformed data, unless an ad hoc adjustment of the estimates of the limit means is considered, either experimentally or computationally. This review is intended as a basic tutorial for the application of GzLMMs and is, therefore, of interest primarily to researchers in the agricultural sciences.


Author(s): Reinhard Schunck ◽ Francisco Perales

One typically analyzes clustered data using random- or fixed-effects models. Fixed-effects models allow consistent estimation of the effects of level-one variables, even if there is unobserved heterogeneity at level two. However, these models cannot estimate the effects of level-two variables. Hybrid and correlated random-effects models are flexible modeling specifications that separate within- and between-cluster effects and allow for both consistent estimation of level-one effects and inclusion of level-two variables. In this article, we elaborate on the separation of within- and between-cluster effects in generalized linear mixed models. These models present a unifying framework for an entire class of models whose response variables follow a distribution from the exponential family (for example, linear, logit, probit, ordered probit and logit, Poisson, and negative binomial models). We introduce the user-written command xthybrid, a shell for the meglm command. xthybrid can fit a variety of hybrid and correlated random-effects models.
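The separation of within- and between-cluster effects described above rests on splitting each level-one covariate into its cluster mean and the deviation from that mean. The following is a minimal Python sketch of that decomposition only; it is not the xthybrid implementation, and the function name `within_between` and the toy data are hypothetical.

```python
from collections import defaultdict


def within_between(cluster_ids, x):
    """Split a level-one covariate into within- and between-cluster parts.

    Returns two lists aligned with `x`:
      within  -- deviation of each value from its cluster mean
      between -- the cluster mean, repeated for each member
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, v in zip(cluster_ids, x):
        sums[c] += v
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    between = [means[c] for c in cluster_ids]
    within = [v - means[c] for c, v in zip(cluster_ids, x)]
    return within, between


# Toy data: two clusters ("a" and "b") with one covariate.
clusters = ["a", "a", "b", "b", "b"]
x = [1.0, 3.0, 2.0, 4.0, 6.0]
within, between = within_between(clusters, x)
print(within)   # deviations from cluster means
print(between)  # cluster means
```

A hybrid model then enters both parts as separate regressors alongside a random intercept, so the coefficient on the within part can be read as the within-cluster effect and the coefficient on the between part as the between-cluster effect.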

