Including random effects in statistical models in ecology: fewer than five levels?

2021 ◽  
Author(s):  
Dylan G.E. Gomes

Abstract: As generalized linear mixed-effects models (GLMMs) have become a widespread tool in ecology, the need to guide the use of such tools is increasingly important. One common guideline is that one needs at least five levels of a random effect. Having so few levels makes the estimation of the variance of random effects terms (such as ecological sites, individuals, or populations) difficult, but it need not muddy one's ability to estimate fixed effects terms, which are often of primary interest in ecology. Here, I simulate ecological datasets, fit simple models, and show that having too few levels of a random effect does not influence the parameter estimates or the uncertainty around those estimates for fixed effects terms. Thus, it should be acceptable to use fewer levels of random effects if one is not interested in making inference about the random effects terms (i.e. they are 'nuisance' parameters used to group non-independent data). I also use simulations to assess the potential for pseudoreplication in (generalized) linear models (LMs) when random effects are explicitly ignored, and find that LMs do not show increased type-I errors compared to their mixed-effects model counterparts. Instead, LM uncertainty (and p values) appears to be more conservative in an analysis of a real ecological dataset presented here. These results challenge the view that it is never appropriate to model random effects terms with fewer than five levels, specifically when inference is not being made for the random effects, and suggest that in simple cases LMs may be robust to ignored random effects terms. Given the widespread accessibility of GLMMs in ecology and evolution, future simulation studies and further assessments of these statistical methods are necessary to understand the consequences of both violating and blindly following simple guidelines.
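
A minimal R sketch of the kind of simulation described, assuming lme4; the grouping structure (three sites), effect sizes, and sample sizes below are illustrative choices, not the paper's actual settings.

```r
# Illustrative sketch (not the paper's code): simulate data with only three
# levels of a grouping factor, then compare the fixed-effect slope from a
# mixed model against a plain linear model that ignores the grouping.
library(lme4)

set.seed(1)
n_sites  <- 3          # deliberately fewer than the "five levels" guideline
n_per    <- 50
site     <- factor(rep(seq_len(n_sites), each = n_per))
site_eff <- rnorm(n_sites, sd = 0.5)   # random intercept per site
x        <- rnorm(n_sites * n_per)
y        <- 2 + 1.5 * x + site_eff[site] + rnorm(n_sites * n_per, sd = 1)
dat      <- data.frame(y, x, site)

m_glmm <- lmer(y ~ x + (1 | site), data = dat)  # random intercept for site
m_lm   <- lm(y ~ x, data = dat)                  # grouping ignored

# The fixed-effect estimates are typically very close; it is the site-level
# variance itself that is poorly estimated with so few levels.
fixef(m_glmm)["x"]
coef(m_lm)["x"]
```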

2019 ◽  
Author(s):  
Joakim Nyberg ◽  
E. Niclas Jonsson ◽  
Mats O. Karlsson ◽  
Jonas Häggström ◽  

Summary: Two full model approaches were compared with respect to their ability to handle missing covariate information. The reference analysis approach was the full model method in which the covariate effects are estimated conventionally using fixed effects, and missing covariate data are imputed with the median of the non-missing covariate information. This approach was compared to a novel full model method that treats the covariate data as observed data and estimates the covariates as random effects. A consequence of this way of handling the covariates is that no covariate imputation is required and that any missingness in the covariates is handled implicitly. The comparison between the two analysis methods was based on simulated data from a model of height-for-age z-scores as a function of age. Data were simulated with increasing degrees of randomly missing covariate information (0-90%) and analyzed using each of the two analysis approaches. Not surprisingly, the precision of the parameter estimates from both methods decreased with increasing degrees of missing covariate information. However, while the bias in the parameter estimates from the reference method likewise increased with the degree of missingness, the full random effects approach provided unbiased estimates for all degrees of covariate missingness.
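
A small R sketch of the reference workflow only (median imputation plus a fixed-effects covariate fit), under assumed variable names and illustrative parameter values; the full random-effects covariate approach described in the summary is typically fit in dedicated nonlinear mixed-effects software and is not reproduced here.

```r
# Introduce random missingness in the covariate and impute with the median of
# the observed values before fitting, i.e. the reference method only.
set.seed(2)
n   <- 500
age <- runif(n, 0, 5)                          # covariate (years), assumed
haz <- -0.5 - 0.2 * age + rnorm(n, sd = 0.8)   # height-for-age z-score (illustrative)

miss_frac <- 0.5                               # degree of missingness, e.g. 50%
age_obs   <- age
age_obs[sample(n, size = miss_frac * n)] <- NA

age_imp <- ifelse(is.na(age_obs), median(age_obs, na.rm = TRUE), age_obs)
fit_ref <- lm(haz ~ age_imp)                   # conventional fixed-effects fit
coef(fit_ref)
```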


2019 ◽  
Author(s):  
Michael Seedorff ◽  
Jacob Oleson ◽  
Bob McMurray

Mixed effects models have become a critical tool in all areas of psychology and allied fields, owing to their ability to account for multiple random factors and to handle proportional data in repeated measures designs. While substantial research has addressed how to structure fixed effects in such models, there is less understanding of appropriate random effects structures. Recent work with linear models suggests that the choice of random effects structure affects Type I error in such models (Barr, Levy, Scheepers, & Tily, 2013; Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017). This has not been examined for between-subject effects, which are crucial for many areas of psychology, nor has it been examined in logistic models. Moreover, mixed models expose a number of researcher degrees of freedom: the decision to aggregate data or not, the manner in which degrees of freedom are computed, and what to do when models do not converge. However, the implications of these choices for power and Type I error are not well known. To address these issues, we conducted a series of Monte Carlo simulations that examined linear and logistic models in a mixed design with crossed random effects. These suggest that considering the entire space of possible models using a simple information criterion such as AIC leads to optimal power while holding Type I error constant. They also suggest that data aggregation and the degrees-of-freedom computation have minimal effects on Type I error and power, and they suggest appropriate approaches for dealing with non-convergence.
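
A hedged R sketch of the model-comparison idea described, assuming lme4 and a data frame `dat` with trial-level accuracy, a within-subject condition, a between-subject group, and crossed subject and item factors; the variable names and candidate structures are assumptions, not the authors' simulation code.

```r
# Compare candidate random-effect structures for a design with crossed random
# effects (subjects and items) using AIC, as the abstract suggests.
library(lme4)

# dat is assumed to contain: accuracy (0/1), cond (within-subject factor),
# group (between-subject factor), subj, item
candidates <- list(
  full  = accuracy ~ cond * group + (1 + cond | subj) + (1 | item),
  slim  = accuracy ~ cond * group + (1 | subj) + (1 | item),
  nosub = accuracy ~ cond * group + (1 | item)
)

fits <- lapply(candidates, function(f)
  glmer(f, data = dat, family = binomial,
        control = glmerControl(optimizer = "bobyqa")))

sapply(fits, AIC)   # pick the structure with the lowest AIC
```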


2012 ◽  
Vol 69 (11) ◽  
pp. 1881-1893 ◽  
Author(s):  
Verena M. Trenkel ◽  
Mark V. Bravington ◽  
Pascal Lorance

Catch curves are widely used to estimate total mortality for exploited marine populations. The usual population dynamics model assumes constant recruitment across years and constant total mortality. We extend this to include annual recruitment and annual total mortality. Recruitment is treated as an uncorrelated random effect, while total mortality is modelled by a random walk. Data requirements are minimal, as only proportions-at-age and total catches are needed. We obtain the effective sample size for aggregated proportion-at-age data by fitting Dirichlet-multinomial distributions to the raw sampling data. Parameter estimation is carried out by approximate likelihood. We use simulations to study parameter estimability and estimation bias for four model versions, including models treating mortality as fixed effects and misspecified models. All model versions were, in general, estimable, though for certain parameter values or replicate runs they were not. Relative estimation biases of final-year total mortality and depletion rate were lower for the proposed random effects model than for the version treating total mortality as fixed effects. The model is demonstrated for blue ling (Molva dypterygia) to the west of the British Isles for the period 1988 to 2011.
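
A hedged R sketch of the data-generating structure described: annual recruitment as an uncorrelated random effect and annual total mortality as a random walk. All numeric values are illustrative, and the estimation side (approximate likelihood, Dirichlet-multinomial effective sample sizes) is not reproduced.

```r
# Simulate recruitment and total mortality, build cohort abundance-at-age, and
# derive the proportions-at-age that form the model's main data input.
set.seed(3)
n_years <- 24
n_ages  <- 10

R <- exp(rnorm(n_years, mean = log(1000), sd = 0.5))        # annual recruitment
Z <- numeric(n_years)
Z[1] <- 0.4
for (y in 2:n_years) Z[y] <- Z[y - 1] + rnorm(1, sd = 0.05)  # random walk
Z <- pmax(Z, 0.05)                                           # keep mortality positive

N <- matrix(0, n_years, n_ages)
for (y in 1:n_years) {
  N[y, 1] <- R[y]                                # recruits enter at age 1
  for (a in 2:n_ages) {
    if (y > 1) N[y, a] <- N[y - 1, a - 1] * exp(-Z[y - 1])   # survivors age up
  }
}

prop_at_age <- N / rowSums(N)        # proportions-at-age per year
round(prop_at_age[n_years, ], 3)     # final-year age composition
```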


2017 ◽  
Author(s):  
Mirko Thalmann ◽  
Marcel Niklaus ◽  
Klaus Oberauer

Using mixed-effects models and Bayesian statistics has been advocated by statisticians in recent years. Mixed-effects models allow researchers to adequately account for the structure in the data. Bayesian statistics – in contrast to frequentist statistics – can state the evidence in favor of or against an effect of interest. For frequentist statistical methods, it is known that mixed models can lead to serious over-estimation of evidence in favor of an effect (i.e., inflated Type-I error rate) when models fail to include individual differences in the effect sizes of predictors ("random slopes") that are actually present in the data. Here, we show through simulation that the same problem exists for Bayesian mixed models. Yet, at present there is no easy-to-use application that allows for the estimation of Bayes Factors for mixed models with random slopes on continuous predictors. Here, we close this gap by introducing a new R package called BayesRS. We tested its functionality in four simulation studies. They show that BayesRS offers a reliable and valid tool to compute Bayes Factors. BayesRS also allows users to account for correlations between random effects. In a fifth simulation study we show, however, that doing so leads to slight underestimation of the evidence in favor of an actually present effect. We only recommend modeling correlations between random effects when they are of primary interest and when sample size is large enough. BayesRS is available under https://cran.r-project.org/web/packages/BayesRS/, R code for all simulations is available under https://osf.io/nse5x/?view_only=b9a7caccd26a4764a084de3b8d459388
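
A hedged R sketch of the underlying issue using brms rather than BayesRS (the BayesRS interface itself is not shown); `d`, `y`, `x`, and `subj` are assumed names for a data frame with a continuous predictor and repeated measures per subject.

```r
# Bayes factors for the effect of x under two random-effect structures; the
# intercept-only structure tends to overstate the evidence for x when random
# slopes are truly present in the data.
library(brms)

# With random slopes (adequate structure)
m1_full <- brm(y ~ x + (1 + x | subj), data = d, save_pars = save_pars(all = TRUE))
m0_full <- brm(y ~ 1 + (1 + x | subj), data = d, save_pars = save_pars(all = TRUE))

# Without random slopes (the structure the abstract warns about)
m1_int <- brm(y ~ x + (1 | subj), data = d, save_pars = save_pars(all = TRUE))
m0_int <- brm(y ~ 1 + (1 | subj), data = d, save_pars = save_pars(all = TRUE))

bayes_factor(m1_full, m0_full)   # evidence for x with random slopes
bayes_factor(m1_int,  m0_int)    # evidence for x without random slopes
```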


Stats ◽  
2018 ◽  
Vol 1 (1) ◽  
pp. 48-76
Author(s):  
Freddy Hernández ◽  
Viviana Giampaoli

Mixed models are useful tools for analyzing clustered and longitudinal data. These models assume that random effects are normally distributed. However, this may be unrealistic or restrictive for representing the information in the data. Several papers have been published quantifying the impact of misspecifying the shape of the random effects in mixed models. Notably, these studies primarily concentrated on models with response variables that have normal, logistic and Poisson distributions, and the results were not conclusive. As such, we investigated the misspecification of the shape of the random effects in a Weibull regression mixed model with random intercepts in the two parameters of the Weibull distribution. Through an extensive simulation study considering six random effect distributions and assuming normality for the random effects in the estimation procedure, we found an impact of misspecification on the estimates of the fixed effects associated with the second parameter σ of the Weibull distribution. The variance components of the model were also affected by the misspecification.
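
A hedged R sketch of the kind of data-generating setup studied: cluster-level random intercepts on both Weibull parameters drawn from a skewed, non-normal distribution, while an analyst would assume normality at estimation time. Parameter values and the choice of a centred gamma distribution are illustrative assumptions.

```r
# Simulate clustered Weibull responses with non-normal random intercepts on
# both the scale and shape linear predictors.
set.seed(4)
n_clusters <- 50
n_per      <- 10

# Skewed random effects (centred gamma) instead of the normality assumed later
b_scale <- rgamma(n_clusters, shape = 2, rate = 2) - 1
b_shape <- rgamma(n_clusters, shape = 2, rate = 2) - 1

cluster <- rep(seq_len(n_clusters), each = n_per)
x       <- rnorm(n_clusters * n_per)

scale_i <- exp(1.0 + 0.5 * x + b_scale[cluster])   # first Weibull parameter
shape_i <- exp(0.3 - 0.2 * x + b_shape[cluster])   # second Weibull parameter

t_obs <- rweibull(n_clusters * n_per, shape = shape_i, scale = scale_i)
dat   <- data.frame(t_obs, x, cluster = factor(cluster))
head(dat)
```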


2021 ◽  
Vol 99 (Supplement_1) ◽  
pp. 158-159
Author(s):  
Chad A Russell ◽  
E J Pollak ◽  
Matthew L Spangler

Abstract: The commercial beef cattle industry relies heavily on the use of natural service sires. Either due to the size of breeding herds or to safeguard against injury during the breeding season, multiple-sire breeding pastures are utilized. Although each bull might be given an equal opportunity to produce offspring, evidence suggests that there is substantial variation in the number of calves sired by each bull in a breeding pasture. DNA-based paternity assignment enables correct assignment of calves to their respective sires in multi-sire pastures and presents an opportunity to investigate the degree to which this trait complex is under genetic control. Field data from a large commercial ranch were used to estimate genetic parameters for calf count (CC; n=623) and yearling scrotal circumference (SC; n=1962) using univariate and bivariate animal models. Average CC and SC were 12.1±11.1 calves and 35.4±2.30 cm, respectively. The average number of breeding seasons per bull and the average number of bulls per contemporary group were 1.40 and 24.9, respectively. The model for CC included fixed effects of age during the breeding season (in years) and contemporary group (concatenation of breeding pasture and year). Random effects included additive genetic and permanent environmental effects, and a residual. The model for SC included fixed effects of age (in days) and contemporary group (concatenation of month and year of measurement). Random effects included an additive genetic effect and a residual. Univariate model heritability estimates for CC and SC were 0.237±0.156 and 0.456±0.072, respectively. Similarly, the bivariate model resulted in heritability estimates for CC and SC of 0.240±0.155 and 0.461±0.072, respectively. Repeatability estimates for CC from the univariate and bivariate models were 0.517±0.054 and 0.518±0.053, respectively. The estimate of the genetic correlation between CC and SC was 0.270±0.220. Parameter estimates suggest that both CC and SC would respond favorably to selection and that CC is moderately repeatable.
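
A short R illustration of how the reported ratios are defined from the model's variance components; the numbers below are made up for illustration, not the study's estimated components.

```r
# Heritability and repeatability as ratios of variance components from an
# animal model with additive genetic (a), permanent environmental (pe), and
# residual (e) effects. Values are illustrative only.
var_a  <- 30   # additive genetic variance
var_pe <- 35   # permanent environmental variance
var_e  <- 60   # residual variance

h2            <- var_a / (var_a + var_pe + var_e)              # heritability
repeatability <- (var_a + var_pe) / (var_a + var_pe + var_e)   # repeatability

c(heritability = h2, repeatability = repeatability)
```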


2000 ◽  
Vol 25 (1) ◽  
pp. 1-12 ◽  
Author(s):  
Lynn Friedman

In meta-analyses, groups of study effect sizes often do not fit the model of a single population with only sampling, or estimation, variance differentiating the estimates. If the effect sizes in a group of studies are not homogeneous, a random effects model should be calculated, and a variance component for the random effect estimated. This estimate can be made in several ways, but two closed form estimators are in common use. The comparative efficiency of the two is the focus of this report. We show here that these estimators vary in relative efficiency with the actual size of the random effects model variance component. The latter depends on the study effect sizes. The closed form estimators are linear functions of quadratic forms whose moments can be calculated according to a well-known theorem in linear models. We use this theorem to derive the variances of the estimators, and show that one of them is smaller when the random effects model variance is near zero; however, the variance of the other is smaller when the model variance is larger. This leads to conclusions about their relative efficiency.
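
An R sketch of two widely used closed-form estimators of the random-effects (between-study) variance. The abstract does not name the pair it compares, so the DerSimonian-Laird and Hedges estimators are shown here as the usual candidates, not as a claim about which two the paper analyses.

```r
# Closed-form estimators of tau^2 from study effect sizes yi and their
# sampling variances vi.
tau2_closed_form <- function(yi, vi) {
  k       <- length(yi)
  wi      <- 1 / vi
  y_fixed <- sum(wi * yi) / sum(wi)          # fixed-effect pooled estimate
  Q       <- sum(wi * (yi - y_fixed)^2)      # heterogeneity statistic

  dl <- max(0, (Q - (k - 1)) / (sum(wi) - sum(wi^2) / sum(wi)))  # DerSimonian-Laird
  he <- max(0, var(yi) - mean(vi))                               # Hedges
  c(DL = dl, HE = he)
}

# Example with made-up effect sizes and sampling variances
tau2_closed_form(yi = c(0.2, 0.5, 0.1, 0.8, 0.4),
                 vi = c(0.05, 0.10, 0.08, 0.12, 0.06))
```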


2020 ◽  
Author(s):  
Brandon LeBeau

The linear mixed model is a commonly used model for longitudinal or nested data due to its ability to account for the dependency of nested data. Researchers typically rely on the random effects to adequately account for the dependency due to correlated data; however, serial correlation can also be used. If the random effect structure is misspecified (perhaps due to convergence problems), can the addition of serial correlation overcome this misspecification and allow for unbiased estimation and accurate inferences? This study explored this question with a simulation. Simulation results show that the fixed effects are unbiased; however, inflation of the empirical Type I error rate occurs when a random effect is missing from the model. Implications for applied researchers are discussed.
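
A hedged R sketch of the comparison described, using nlme and an assumed longitudinal data frame `dat` with columns `y`, `time` (an integer measurement occasion), and `id`; it fits a deliberately reduced random-effect structure with and without an AR(1) serial correlation term.

```r
# Random intercept only (random slope deliberately omitted), with and without
# within-subject AR(1) serial correlation.
library(nlme)

m_misspec <- lme(y ~ time, random = ~ 1 | id, data = dat)

m_ar1 <- lme(y ~ time, random = ~ 1 | id,
             correlation = corAR1(form = ~ time | id), data = dat)

# Compare fixed-effect inference across the two fits
summary(m_misspec)$tTable
summary(m_ar1)$tTable
```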


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 2045-2045
Author(s):  
Giuseppe Lombardi ◽  
Paola Del Bianco ◽  
Alba Ariela Brandes ◽  
Marica Eoli ◽  
Roberta Ruda ◽  
...  

Background: The REGOMA trial showed that regorafenib (REG) significantly improved OS and PFS in relapsed glioblastoma (GBM) patients (pts) compared with lomustine (LOM). REG showed a different toxicity profile compared to LOM. Here, we report the final results of the HRQoL assessment, a secondary end point. Methods: HRQoL was measured using the European Organization for Research and Treatment of Cancer (EORTC) core questionnaire (QLQ-C30) and brain module (QLQ-BN20), administered before any MRI assessment, every 8 weeks (+/- 2 weeks) until disease progression. To evaluate treatment impact on HRQoL, questionnaires at progression were excluded. Mixed-effect linear models were fitted for each of the HRQoL domains to examine the change over progression-free time within and between arms. The models included the time of questionnaire assessment, the treatment group and their interaction as fixed effects, and a compound symmetry covariance structure for the random effects. Differences of at least 10 points were classified as a clinically meaningful change. To correct for multiple comparisons and to avoid type I error, the level of significance was set at P = 0.01 (2-sided). Results: Of 119 randomized pts, 117 participated in the HRQoL evaluation, and 114 had a baseline assessment (n = 56 REG; n = 58 LOM). No statistically significant differences were observed in any generic or cancer-specific domain during treatment in the REG and LOM arms, or between the two arms, except for the appetite loss scale, which was significantly worse in pts treated with REG (global mean 14.7 (SD = 28.6) vs 7.6 (SD = 16.0); p = 0.0081). The proportion of pts with a clinically meaningful worsening of appetite loss was not statistically different between the two arms (9 of 24 and 0 of 13 in the REG and LOM arms, respectively; p = 0.0146). Conclusions: In the REGOMA trial, HRQoL did not change during REG treatment. Pts treated with REG and LOM reported no significant difference in HRQoL. Clinical trial information: NCT02926222.
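
A hedged R sketch of the kind of model described for a single HRQoL domain, using nlme; the data frame `hrqol` and the variables `score`, `time`, `arm`, and `pt` are assumed names, not the trial's dataset. A marginal compound-symmetry fit is shown, which corresponds to the exchangeable within-patient covariance the abstract describes.

```r
# One HRQoL domain modelled with assessment time, treatment arm, and their
# interaction as fixed effects, and a compound symmetry covariance structure
# within patient.
library(nlme)

fit_domain <- gls(score ~ time * arm,
                  correlation = corCompSymm(form = ~ 1 | pt),
                  data = hrqol, na.action = na.omit)
summary(fit_domain)
```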

