Maybe maximal: Good enough mixed models optimize power while controlling Type I error

Author(s):  
Michael Seedorff ◽  
Jacob Oleson ◽  
Bob McMurray

Mixed effects models have become a critical tool in all areas of psychology and allied fields. This is due to their ability to account for multiple random factors and to handle proportional data in repeated measures designs. While substantial research has addressed how to structure fixed effects in such models, there is less understanding of appropriate random effects structures. Recent work with linear models suggests the choice of random effects structure affects Type I error in such models (Barr, Levy, Scheepers, & Tily, 2013; Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017). This has not been examined for between-subject effects, which are crucial for many areas of psychology, nor has it been examined in logistic models. Moreover, mixed models expose a number of researcher degrees of freedom: the decision to aggregate data or not, the manner in which degrees of freedom are computed, and what to do when models do not converge. However, the implications of these choices for power and Type I error are not well known. To address these issues, we conducted a series of Monte Carlo simulations that examined linear and logistic models in a mixed design with crossed random effects. These suggest that considering the entire space of possible models using simple information criteria such as AIC leads to optimal power while holding Type I error constant. They also suggest that data aggregation and the d.f. computation have minimal effects on Type I error and power, and they suggest appropriate approaches for dealing with non-convergence.
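A minimal lme4 sketch of the kind of model survey the abstract describes, assuming a hypothetical data frame `dat` with a binary accuracy outcome `acc`, a within-subject factor `cond`, a between-subject factor `group`, and crossed `subject` and `item` random factors; this illustrates AIC-based comparison across candidate random effects structures, not the authors' simulation code.

```r
library(lme4)

# Candidate random effects structures, from intercepts-only to near-maximal.
f_int <- glmer(acc ~ cond * group + (1 | subject) + (1 | item),
               data = dat, family = binomial)
f_slp <- glmer(acc ~ cond * group + (1 + cond | subject) + (1 | item),
               data = dat, family = binomial)
f_max <- glmer(acc ~ cond * group + (1 + cond | subject) + (1 + cond | item),
               data = dat, family = binomial)

# Compare the converged candidates with a simple information criterion;
# non-converging fits would be simplified or refit before this step.
AIC(f_int, f_slp, f_max)
```

In the spirit of the abstract, the lowest-AIC structure is taken as rich enough to control Type I error without sacrificing power.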

Author(s):  
Aaron T. L. Lun ◽  
Gordon K. Smyth

Abstract RNA sequencing (RNA-seq) is widely used to study gene expression changes associated with treatments or biological conditions. Many popular methods for detecting differential expression (DE) from RNA-seq data use generalized linear models (GLMs) fitted to the read counts across independent replicate samples for each gene. This article shows that the standard formula for the residual degrees of freedom (d.f.) in a linear model is overstated when the model contains fitted values that are exactly zero. Such fitted values occur whenever all the counts in a treatment group are zero as well as in more complex models such as those involving paired comparisons. This misspecification results in underestimation of the genewise variances and loss of type I error control. This article proposes a formula for the reduced residual d.f. that restores error control in simulated RNA-seq data and improves detection of DE genes in a real data analysis. The new approach is implemented in the quasi-likelihood framework of the edgeR software package. The results of this article also apply to RNA-seq analyses that apply linear models to log-transformed counts, such as those in the limma software package, and more generally to any count-based GLM where exactly zero fitted values are possible.
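As a hedged toy illustration of the problem (not the edgeR implementation, and not necessarily the paper's exact d.f. formula), consider a Poisson GLM in which one group's counts are all zero: the standard residual d.f. formula still counts observations whose fitted values are exactly zero, even though they contribute nothing to variance estimation.

```r
# One treatment group with all-zero counts; its fitted values are numerically
# zero (R may warn that fitted rates numerically 0 occurred).
y     <- c(0, 0, 0, 5, 7, 9)
group <- factor(rep(c("A", "B"), each = 3))
fit   <- glm(y ~ group, family = poisson)

n <- length(y)
p <- length(coef(fit))
nominal_df <- n - p                   # standard formula gives 4
n_zero     <- sum(fitted(fit) < 1e-6) # three observations with zero fitted values
c(nominal_resid_df = nominal_df, zero_fitted_obs = n_zero)
# Only the three nonzero replicates (minus their one mean parameter) actually
# inform the variance, i.e. about 2 residual d.f. rather than the nominal 4;
# Lun & Smyth derive a corrected residual d.f. in this spirit.
```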


2021 ◽  
Author(s):  
Dylan G.E. Gomes

Abstract As generalized linear mixed-effects models (GLMMs) have become a widespread tool in ecology, the need to guide the use of such tools is increasingly important. One common guideline is that one needs at least five levels of a random effect. Having so few levels makes the estimation of the variance of random effects terms (such as ecological sites, individuals, or populations) difficult, but it need not muddy one’s ability to estimate fixed effects terms, which are often of primary interest in ecology. Here, I simulate ecological datasets, fit simple models, and show that having too few levels of a random effects term does not influence the parameter estimates or the uncertainty around those estimates for fixed effects terms. Thus, it should be acceptable to use fewer levels of random effects if one is not interested in making inference about the random effects terms (i.e. they are ‘nuisance’ parameters used to group non-independent data). I also use simulations to assess the potential for pseudoreplication in (generalized) linear models (LMs) when random effects are explicitly ignored, and find that LMs do not show increased Type I errors compared to their mixed-effects model counterparts. Instead, LM uncertainty (and p values) appears to be more conservative in an analysis of a real ecological dataset presented here. These results challenge the view that it is never appropriate to model random effects terms with fewer than five levels, specifically when inference is not being made for the random effects, but suggest that in simple cases LMs might be robust to ignored random effects terms. Given the widespread accessibility of GLMMs in ecology and evolution, future simulation studies and further assessments of these statistical methods are necessary to understand the consequences of both violating and blindly following simple guidelines.
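A minimal simulation sketch in the spirit of the abstract (an assumed setup, not the author's code): a slope is estimated with only three levels of a random grouping factor, and the fixed-effect estimate and standard error from `lmer` are compared with those from a plain `lm` that ignores the grouping.

```r
library(lme4)
set.seed(1)

n_sites <- 3                     # fewer than the usual "at least five" guideline
n_per   <- 30
site    <- factor(rep(seq_len(n_sites), each = n_per))
x       <- rnorm(n_sites * n_per)
site_re <- rnorm(n_sites, sd = 1)[site]           # random site intercepts
y       <- 2 + 1.5 * x + site_re + rnorm(n_sites * n_per)

m_mix <- lmer(y ~ x + (1 | site))                 # mixed model with few levels
m_lm  <- lm(y ~ x)                                # random effect ignored

# Fixed-effect slope estimate and standard error from each approach.
rbind(mixed = summary(m_mix)$coefficients["x", 1:2],
      plain = summary(m_lm)$coefficients["x", 1:2])
```

Repeating this over many simulated datasets, as the paper does, is what allows Type I error and estimate bias to be compared between the two approaches.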


1993 ◽  
Vol 18 (4) ◽  
pp. 305-319 ◽  
Author(s):  
H. J. Keselman ◽  
Keumhee Chough Carriere ◽  
Lisa M. Lix

For balanced designs, degrees of freedom-adjusted univariate F tests or multivariate test statistics can be used to obtain a robust test of repeated measures main and interaction effect hypotheses even when the assumption of equality of the covariance matrices is not satisfied. For unbalanced designs, however, covariance heterogeneity can seriously distort the rates of Type I error of either of these approaches. This article shows how a multivariate approximate degrees of freedom procedure based on Welch (1947, 1951) and James (1951, 1954), as simplified by Johansen (1980), can be applied to the analysis of unbalanced repeated measures designs without assuming covariance homogeneity. Through Monte Carlo methods, we demonstrate that this approach provides a robust test of the repeated measures main effect hypothesis even when the data are obtained from a skewed distribution. The Welch-James approach also provides a robust test of the interaction effect, provided that the smallest of the unequal group sizes is five to six times the number of repeated measurements minus one, or provided that a reduced level of significance is employed.
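A hedged univariate illustration of the underlying problem, not the Welch-James-Johansen procedure itself: with unequal group sizes and unequal variances, a pooled-variance test of a between-subjects effect becomes liberal, whereas an approximate degrees-of-freedom (Welch-type) test keeps the Type I error rate near the nominal level.

```r
set.seed(1)
n1 <- 10; n2 <- 30        # the smaller group is paired with the larger variance
sd1 <- 3;  sd2 <- 1
reps <- 5000
p_pooled <- p_welch <- numeric(reps)

for (i in seq_len(reps)) {
  g1 <- rnorm(n1, sd = sd1)                       # no true group difference
  g2 <- rnorm(n2, sd = sd2)
  p_pooled[i] <- t.test(g1, g2, var.equal = TRUE)$p.value
  p_welch[i]  <- t.test(g1, g2)$p.value           # Welch approximate d.f.
}

# Empirical Type I error at alpha = .05: the pooled test lands well above .05,
# the approximate-d.f. test close to it.
c(pooled = mean(p_pooled < .05), welch = mean(p_welch < .05))
```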


2019 ◽  
Vol 3 ◽  
Author(s):  
Nicolas Haverkamp ◽  
André Beauducel

To derive recommendations on how to analyze longitudinal data, we examined Type I error rates of Multilevel Linear Models (MLM) and repeated measures Analysis of Variance (rANOVA) using SAS and SPSS. We performed a simulation with the following specifications: to explore the effects of high numbers of measurement occasions and small sample sizes on Type I error, we investigated m = 9 and 12 measurement occasions and sample sizes of n = 15, 20, 25, and 30. Effects of non-sphericity in the population on Type I error were also inspected: 5,000 random samples were drawn from two populations containing neither a within-subject nor a between-group effect. They were analyzed with the most common options for correcting rANOVA and MLM results: the Huynh-Feldt correction for rANOVA (rANOVA-HF) and the Kenward-Roger correction for MLM (MLM-KR), which may help correct the progressive bias of MLM with an unstructured covariance matrix (MLM-UN). Uncorrected rANOVA and MLM assuming a compound symmetry covariance structure (MLM-CS) were also taken into account. The results showed a progressive bias for MLM-UN in small samples, which was stronger in SPSS than in SAS. Moreover, we found an appropriate bias correction of Type I error via rANOVA-HF but an insufficient correction of MLM-UN by the Kenward-Roger approach (MLM-UN-KR) for n < 30. These findings suggest using MLM-CS or rANOVA when sphericity holds and correcting violations via rANOVA-HF. If an analysis requires MLM, SPSS yields more accurate Type I error rates for MLM-CS and SAS yields more accurate Type I error rates for MLM-UN.
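The study itself uses SAS and SPSS; as a hedged R analogue (data frame and column names assumed), MLM-CS and MLM-UN correspond roughly to generalized least squares fits with compound-symmetry versus unstructured within-subject covariance, and Kenward-Roger denominator d.f. are available in R for lmer fits via the lmerTest/pbkrtest packages.

```r
library(nlme)
# Long-format data 'd' assumed: y, time (factor), group (factor), id,
# with rows ordered consistently by time within each id.
m_cs <- gls(y ~ time * group, data = d,
            correlation = corCompSymm(form = ~ 1 | id))        # MLM-CS analogue
m_un <- gls(y ~ time * group, data = d,
            correlation = corSymm(form = ~ 1 | id),             # MLM-UN analogue:
            weights = varIdent(form = ~ 1 | time))              # unstructured covariance
anova(m_cs, m_un)   # likelihood-based comparison of the two covariance structures
```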


1995 ◽  
Vol 20 (1) ◽  
pp. 83-99 ◽  
Author(s):  
H. J. Keselman ◽  
Lisa M. Lix

Approximate degrees of freedom omnibus and pairwise test statistics of Johansen (1980) and Keselman, Keselman, and Shaffer (1991), respectively, were used with numerous stepwise multiple comparison procedures (MCPs) to perform pairwise contrasts on repeated measures means. The MCPs were compared for their overall familywise rates of Type I error and for their sensitivity to detect true pairwise differences among means when multisample sphericity and multivariate normality assumptions were not satisfied. Results indicated that multiple range procedures modified according to the method described by Duncan (1957) were always robust with respect to Type I errors, were at least as powerful as the unmodified range procedures, and could yield increases in power as large as 22%. Overall, the Welsch (1977a) step-up, Peritz-Duncan (Peritz, 1970), and Ryan-Welsch-Duncan (Ryan, 1960; Welsch, 1977a) multiple range procedures were found to be most powerful.
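As a generic stand-in only (base R does not implement the modified Duncan, Welsch step-up, Peritz-Duncan, or Ryan-Welsch-Duncan procedures examined here), stepwise familywise-error control for pairwise contrasts on repeated measures means can be illustrated with Holm's step-down adjustment applied to paired t tests; the data frame and column names are assumptions.

```r
# 'd' is assumed to be in long format with columns: score, occasion (factor), id,
# and to be sorted by id within each occasion so that paired = TRUE pairs correctly.
with(d, pairwise.t.test(score, occasion, paired = TRUE,
                        p.adjust.method = "holm"))
```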


2019 ◽  
Vol 21 (Supplement_3) ◽  
pp. iii14-iii14
Author(s):  
G Lombardi ◽  
P Del Bianco ◽  
A Brandes ◽  
M Eoli ◽  
R Rudà ◽  
...  

Abstract BACKGROUND The REGOMA trial showed that regorafenib (REG) significantly improved OS and PFS compared with lomustine (LOM) in patients (pts) with relapsed GBM. REG showed a different toxicity profile compared to LOM. Here, we report final results of the HRQoL assessment, a secondary endpoint. MATERIAL AND METHODS HRQoL was measured using the European Organization for Research and Treatment of Cancer (EORTC) core questionnaire (QLQ-C30) and brain module (QLQ-BN20), administered before any MRI assessment, every 8 weeks (+/- 2 weeks), until disease progression. To evaluate the impact of treatment on HRQoL, questionnaires at progression were excluded. Mixed-effect linear models were fitted for each HRQoL domain to examine change over progression-free time within and between arms. The models included the time of questionnaire assessment, the treatment group, and their interaction as fixed effects, and a compound symmetry covariance structure for the random effects. Differences of at least 10 points were classified as clinically meaningful changes. To correct for multiple comparisons and control Type I error, the level of significance was set at P=0.01 (2-sided). RESULTS Of 119 randomized pts, 117 participated in the HRQoL evaluation, and 114 had a baseline assessment (n=56 REG; n=58 LOM). No statistically significant differences were observed in any generic or cancer-specific domain during treatment in the REG and LOM arms, or between the two arms, except for the appetite loss scale, which was significantly worse in pts treated with REG (global mean 14.7 (SD=28.6) vs 7.6 (SD=16.0); p=0.0081). The rate of pts with a clinically meaningful worsening of appetite loss was not statistically different between the two arms (9 of 24 and 0 of 13 in the REG and LOM arms, respectively; p=0.02). CONCLUSION In the REGOMA trial, HRQoL did not change during regorafenib treatment. Pts treated with regorafenib and lomustine reported no significant difference in HRQoL.
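A hedged sketch of the kind of model the methods describe, written in R rather than whatever software the trial team used, with hypothetical data frame and column names: one HRQoL domain score modelled on assessment time, treatment arm, and their interaction, with a compound-symmetry covariance structure within patient (questionnaires at progression would already be excluded upstream).

```r
library(nlme)
# 'hrqol' is a hypothetical long-format data frame with one row per patient per
# assessment: domain_score, assess_time (weeks), arm (REG/LOM), patient_id.
m <- gls(domain_score ~ assess_time * arm,
         correlation = corCompSymm(form = ~ 1 | patient_id),
         data = hrqol, na.action = na.omit)
summary(m)   # the assess_time:arm term tests differential change between arms
```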


2006 ◽  
Vol 131 (2) ◽  
pp. 201-208
Author(s):  
Dawn M. VanLeeuwen ◽  
Rolston St. Hilaire ◽  
Emad Y. Bsoul

Statistical analysis of data from repeated measures experiments with missing factor combinations encounters multiple complications. Data from asynchronous cyclic drought experiments incorporate unequal numbers of drought cycles for different sources and provide an example of data both with repeated measures and missing factor combinations. Repeated measures data are problematic because typical analyses with PROC GLM do not allow the researcher to compare candidate covariance structures. In contrast, PROC MIXED allows comparison of covariance structures and several options for modeling serial correlation and variance heterogeneity. When there are missing factor combinations, the cross-classified model traditionally used for synchronized trials is inappropriate. For asynchronous data, some least squares means estimates for treatment and source main effects, and treatment by source interaction effects, are inestimable. The objectives of this paper were to use an asynchronous drought cycle data set to 1) model an appropriate covariance structure using mixed models, and 2) compare the cross-classified fixed effects model to models with drought cycle nested within source. We used a data set of midday water potential measurements taken during a cyclic drought study of 15 half-siblings of bigtooth maples (Acer grandidentatum Nutt.) indigenous to Arizona, New Mexico, Texas, and Utah. Data were analyzed using SAS PROC MIXED software. Information criteria led to the selection of a model incorporating separate compound symmetric covariance structures for the two irrigation treatment groups. When using nested models in the fixed portion of the model, there are no missing factors because drought cycle is not treated as a crossed experimental factor. Nested models provided meaningful F tests and estimated all the least squares means, but the cross-classified model did not. Furthermore, the nested models adequately compared the treatment effect of sources subjected to asynchronous drought events. We conclude that researchers wishing to analyze data from asynchronous drought trials must consider using mixed models with nested fixed effects.
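An R analogue of the covariance-structure comparison described above (the study itself used SAS PROC MIXED); the data frame and column names are assumptions, and the heterogeneous-variance variant stands in for, rather than reproduces, the separate per-treatment compound-symmetric structures the authors selected.

```r
library(nlme)
# Hypothetical long-format data 'd': wp (midday water potential), trt (irrigation
# treatment), source, cycle (drought cycle), plant (experimental unit).
m_cs  <- gls(wp ~ trt * source, data = d,
             correlation = corCompSymm(form = ~ 1 | plant))
m_ar1 <- gls(wp ~ trt * source, data = d,
             correlation = corAR1(form = ~ 1 | plant))
m_het <- update(m_cs, weights = varIdent(form = ~ 1 | trt))  # separate variances
                                                             # per treatment group
AIC(m_cs, m_ar1, m_het)   # information criteria guide the choice of structure
```

The nested fixed-effects specification the authors recommend (drought cycle within source rather than crossed with it) would be written with R's nesting operator, e.g. `wp ~ trt * (source/cycle)`.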


2020 ◽  
Vol 13 (2) ◽  
pp. 206-217
Author(s):  
Indah Rini Setyowati ◽  
Khairil Anwar Notodiputro ◽  
Anang Kurnia

In linear models, panel data often violate the assumption that the error terms are independent. As a result, the estimated variance is usually large and the standard inferential methods are not appropriate. Previous research developed an inference method to address this problem using a variance estimator, namely the Heteroskedasticity Autocorrelation Consistent estimator of the Cross-Section Averages (HACSC), with some improvements. The test statistic of this method converges to a fixed-b asymptotic distribution. In this paper, the performance of the proposed inferential method is evaluated by means of simulation and compared with the standard method using the plm package in R. Several comparisons of the Type I error of these two methods have been carried out. The results showed that statistical inference based on the fixed-b asymptotic distribution outperforms the standard method, especially for panel data with small numbers of individuals and time periods.
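The standard method referred to in the comparison can be sketched in R as follows (variable names assumed); the HACSC estimator with fixed-b critical values is the paper's own contribution and is not reproduced here, only a conventional plm baseline with and without cluster-robust standard errors.

```r
library(plm)
library(lmtest)

# 'mydata' is a hypothetical panel data set with unit and time identifiers.
pdat <- pdata.frame(mydata, index = c("id", "year"))
fit  <- plm(y ~ x1 + x2, data = pdat, model = "within")   # fixed-effects panel model

coeftest(fit)                                              # conventional standard errors
coeftest(fit, vcov = vcovHC(fit, method = "arellano",      # cluster-robust standard
                            cluster = "group"))            # errors by individual
```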

