The Runs Test for Autocorrelated Errors: Unacceptable Properties

1996 ◽ Vol 21 (4) ◽ pp. 390-404 ◽ Author(s): Bradley E. Huitema, Joseph W. McKean, Jinsheng Zhao

The runs test is frequently recommended as a method of testing for nonindependent errors in time-series regression models. A Monte Carlo investigation was carried out to evaluate the empirical properties of this test using (a) several intervention and nonintervention regression models, (b) sample sizes ranging from 12 to 100, (c) three levels of α, (d) directional and nondirectional tests, and (e) 19 levels of autocorrelation among the errors. The results indicate that the runs test yields markedly asymmetrical error rates in the two tails and that neither directional nor nondirectional tests are satisfactory with respect to Type I error, even when the ratio of degrees of freedom to sample size is as high as .98. It is recommended that the test generally not be employed in evaluating the independence of the errors in time-series regression models.
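As a concrete illustration of the kind of Monte Carlo check described above, the sketch below (my own construction, not the paper's code) estimates the two-sided Type I error rate of the Wald-Wolfowitz runs test applied to OLS residuals from a simple time-trend regression with iid Normal errors; the sample size, replication count, and regressor are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def runs_test_z(resid):
    """Large-sample z statistic of the runs test on residual signs."""
    s = resid > 0
    n1, n2 = int(s.sum()), int((~s).sum())
    n = n1 + n2
    r = 1 + int(np.count_nonzero(s[1:] != s[:-1]))   # number of runs
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    return (r - mu) / np.sqrt(var)

def type1_rate(n=20, reps=10_000, alpha=0.05):
    """Two-sided rejection rate when the true errors are iid Normal."""
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix
    crit = stats.norm.ppf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(reps):
        e = rng.standard_normal(n)
        hits += abs(runs_test_z(e - H @ e)) > crit   # OLS residuals under H0
    return hits / reps

print(type1_rate())   # compare with the nominal 0.05
```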

2017 ◽ Vol 284 (1851) ◽ pp. 20161850 ◽ Author(s): Nick Colegrave, Graeme D. Ruxton

A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here, non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). The pooling is carried out only if a statistical test of the term in a preceding, more complicated model fitted to the same data motivates the simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we identify) the hoped-for improvement in statistical power will be small or non-existent, and the reliability of the statistical procedures is likely to be much reduced, with Type I error rates deviating from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for the initial selection of statistical models in light of this change in procedure.
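To make the procedure concrete, here is a small simulation in the same spirit (my own sketch; the factor sizes and the conventional preliminary-test threshold of 0.25 are illustrative choices, not taken from the paper). The A x B interaction is pooled into error whenever its preliminary F test is non-significant, and the realized Type I error rate for the test of factor A is recorded.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a, b, n = 2, 4, 3          # levels of A, levels of B, replicates per cell

def pooled_type1(reps=20_000, alpha=0.05, pool_alpha=0.25):
    """Type I error for factor A when A x B is pooled if non-significant."""
    hits = 0
    for _ in range(reps):
        y = rng.standard_normal((a, b, n))           # all effects null
        g = y.mean()
        ma, mb = y.mean(axis=(1, 2)), y.mean(axis=(0, 2))
        cell = y.mean(axis=2)
        ss_a = b * n * ((ma - g) ** 2).sum()
        ss_ab = n * ((cell - ma[:, None] - mb[None, :] + g) ** 2).sum()
        ss_e = ((y - cell[:, :, None]) ** 2).sum()
        df_a, df_ab, df_e = a - 1, (a - 1) * (b - 1), a * b * (n - 1)
        p_ab = stats.f.sf((ss_ab / df_ab) / (ss_e / df_e), df_ab, df_e)
        if p_ab > pool_alpha:                        # "non-significant"
            ss_e, df_e = ss_e + ss_ab, df_e + df_ab  # pool into error
        p_a = stats.f.sf((ss_a / df_a) / (ss_e / df_e), df_a, df_e)
        hits += p_a < alpha
    return hits / reps

print(pooled_type1())   # compare with the nominal 0.05
```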


1994 ◽ Vol 19 (3) ◽ pp. 275-291 ◽ Author(s): James Algina, T. C. Oshima, Wen-Ying Lin

Type I error rates were estimated for three tests that compare means using data from two independent samples: the independent-samples t test, Welch's approximate degrees of freedom test, and James's second-order test. Type I error rates were estimated for skewed distributions, equal and unequal variances, equal and unequal sample sizes, and a range of total sample sizes. Welch's test and James's test have very similar Type I error rates and tend to control the Type I error rate as well as or better than the independent-samples t test does. The results provide guidance about the total sample sizes required to control Type I error rates.
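For readers who want to reproduce the flavor of this comparison, here is a minimal sketch (mine, not the authors' code; James's second-order test is omitted because SciPy has no standard implementation of it). It contrasts the pooled-variance t test with Welch's test when a skewed parent is paired with unequal variances and unequal sample sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def type1_rates(n1=10, n2=40, reps=20_000, alpha=0.05):
    """Empirical Type I error of the pooled t test vs. Welch's test."""
    rej_t = rej_w = 0
    for _ in range(reps):
        # Centered chi-square(3) parents: equal means, skewed errors,
        # with the larger variance paired with the smaller sample.
        x = (rng.chisquare(3, n1) - 3.0) * 3.0
        y = rng.chisquare(3, n2) - 3.0
        rej_t += stats.ttest_ind(x, y, equal_var=True).pvalue < alpha
        rej_w += stats.ttest_ind(x, y, equal_var=False).pvalue < alpha
    return rej_t / reps, rej_w / reps

print(type1_rates())   # pooled t drifts from 0.05; Welch stays closer
```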


2020 ◽ Author(s): Corey Peltier, Reem Muharib, April Haas, Art Dowdy

Single-case research designs (SCRDs) are used to evaluate functional relations between an independent variable and dependent variable(s). SCRDs are frequently used when analyzing data related to autism spectrum disorder (ASD); in particular, they allow for empirical evidence in support of practices that improve socially significant outcomes for individuals diagnosed with ASD. To determine a functional relation in SCRDs, a time-series graph is constructed and visual analysts evaluate data patterns. Preliminary evidence suggests that the approach used to scale the ordinate (i.e., y-axis), the proportion of x-axis length to y-axis height (the X:Y ratio), and the number of data points per X:Y ratio (DPPXYR) affect visual analysts' decisions regarding a functional relation and the magnitude of treatment effect, resulting in an increased likelihood of Type I errors. The purpose of this systematic review was to evaluate all time-series graphs published in the last decade (i.e., 2010-2020) in four premier journals in the field of ASD: Journal of Autism and Developmental Disorders, Research in Autism Spectrum Disorders, Autism, and Focus on Autism and Other Developmental Disabilities. The systematic search yielded 348 articles including 2,675 graphs. We identified large variation across and within types of SCRDs in the standardized X:Y ratio and the DPPXYR. In addition, 73% of graphs fell below a DPPXYR of 0.14, suggesting an elevated risk of Type I errors. A majority of graphs used an appropriate ordinate scaling method that would not increase Type I error rates. Implications for future research and practice are provided.


1991 ◽ Vol 16 (1) ◽ pp. 53-76 ◽ Author(s): Lynne K. Edwards

When repeated observations are taken at equal time intervals, a simple form of stationary time-series structure may be fitted to the observations. Wallenstein and Fleiss (1979) showed that the degrees-of-freedom correction factor for time effects has a higher lower bound for data with a serial correlation pattern (or simplex pattern) than for data without such a structure. A reanalysis of the example data in Hearne, Clark, and Hatch (1983) indicated that the correction factor from a patterned matrix can be smaller than its counterpart computed without fitting a simplex pattern. First, an example from education was used to illustrate the computational steps in obtaining these two correction factors. Second, a simulation study was conducted to determine the conditions under which fitting a simplex pattern is advantageous over not assuming such a pattern. Fitting a serial correlation pattern did not always produce more powerful tests of time effects than not assuming such a pattern; this was particularly true when correlations were high (ρ > .50). Furthermore, it inflated Type I error rates when the simplex hypothesis was not warranted. Indiscriminately fitting a serial correlation pattern should be discouraged.
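The correction factor in question is Box's epsilon for the repeated-measures time effect. The sketch below (my own; the article's computations for the fitted simplex pattern are not reproduced) evaluates epsilon for an AR(1)-type simplex covariance, cov(t, t') = ρ^|t − t'|, showing how it falls from 1 as serial correlation strengthens.

```python
import numpy as np

def box_epsilon(sigma):
    """Box (Greenhouse-Geisser) epsilon from a k x k covariance matrix."""
    k = sigma.shape[0]
    J = np.eye(k) - np.ones((k, k)) / k     # centering matrix
    lam = np.linalg.eigvalsh(J @ sigma @ J) # eigenvalues, double-centered
    return lam.sum() ** 2 / ((k - 1) * (lam ** 2).sum())

def simplex_cov(k, rho, var=1.0):
    """Serial correlation (simplex) pattern: cov(t, t') = var * rho^|t-t'|."""
    idx = np.arange(k)
    return var * rho ** np.abs(idx[:, None] - idx[None, :])

for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, round(box_epsilon(simplex_cov(5, rho)), 3))
# epsilon equals 1 under sphericity (rho = 0) and shrinks as the
# serial correlation strengthens.
```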


2019 ◽ Vol 97 (Supplement 2) ◽ pp. 235-236 ◽ Author(s): Hilda Calderon Cartagena, Christopher I. Vahl, Steve S. Dritz

It is not unusual to come across randomized complete block designs (RCBD) replicated over a small number of sites in swine nutrition trials. For example, pens could be blocked by location or by initial body weight within three rooms or barns. One possibility is to analyze this design under the assumption of no treatment-by-site interaction, which implies that treatment differences are similar across all sites. This assumption might not always be reasonable, and a site-by-treatment interaction could be included in the analysis to account for such differences should they exist. However, the site-by-treatment mean square then becomes the error term for evaluating treatment. The objective of this study was to recommend a practical strategy based on Type I error rates estimated from a simulation study. Scenarios with and without site-by-treatment interaction were considered, with three sites and equal means across four treatments. The variance component for the error was set to 1, and the remaining components were either all equal (σ²site = σ²block = σ²site×trt = 1) or one of them was set to 10. For the scenarios with no site-by-treatment interaction, σ²site×trt = 0, giving a total of 7 scenarios. Each scenario was simulated 10,000 times, and both analysis strategies were applied to each simulation. The Kenward-Roger (KR) approximation to the denominator degrees of freedom was also considered. Type I error rates were estimated as the proportion of simulations with a significant treatment effect at α = 0.05. Overall, there was no evidence that Type I error rates were inflated when the site-by-treatment interaction was omitted, even when σ²site×trt = 10, and KR had no effect in this case. In contrast, including the interaction term led to a highly conservative Type I error rate far below the 5% level, which results in a loss of power; using KR, however, mitigated this conservativeness.
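A rough re-creation of the two competing strategies follows (my own sketch; the block count is an illustrative choice, and the Kenward-Roger adjustment, which requires a mixed-model fit, is not reproduced). The treatment F test uses either the pooled residual error or the site-by-treatment mean square as its denominator.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
s, b, t = 3, 4, 4                      # sites, blocks per site, treatments

def one_trial(vs=1.0, vb=1.0, vst=1.0, ve=1.0):
    """P-values for the treatment F test under both strategies (null true)."""
    site = rng.normal(0, np.sqrt(vs), (s, 1, 1))
    block = rng.normal(0, np.sqrt(vb), (s, b, 1))
    st = rng.normal(0, np.sqrt(vst), (s, 1, t))
    y = site + block + st + rng.normal(0, np.sqrt(ve), (s, b, t))
    g = y.mean()
    trt = y.mean(axis=(0, 1))                      # treatment means
    ss_blk = t * ((y.mean(axis=2) - g) ** 2).sum()
    ss_trt = s * b * ((trt - g) ** 2).sum()
    m = y.mean(axis=1)                             # site-by-treatment means
    ss_st = b * ((m - m.mean(axis=1, keepdims=True) - trt + g) ** 2).sum()
    ss_res = ((y - g) ** 2).sum() - ss_blk - ss_trt - ss_st
    df_trt, df_st, df_res = t - 1, (s - 1) * (t - 1), s * (b - 1) * (t - 1)
    f_omit = (ss_trt / df_trt) / ((ss_st + ss_res) / (df_st + df_res))
    f_incl = (ss_trt / df_trt) / (ss_st / df_st)
    return (stats.f.sf(f_omit, df_trt, df_st + df_res),
            stats.f.sf(f_incl, df_trt, df_st))

def type1(reps=10_000, alpha=0.05, **vc):
    p = np.array([one_trial(**vc) for _ in range(reps)])
    return (p < alpha).mean(axis=0)    # [omit interaction, include it]

print(type1())            # equal variance components
print(type1(vst=10.0))    # large site-by-treatment variance
```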


Author(s): Steven T. Garren, Kate McGann Osborne

Coverage probabilities of the two-sided one-sample t test are simulated for some symmetric and right-skewed distributions. The symmetric distributions analyzed are Normal, Uniform, Laplace, and Student's t with 5, 7, and 10 degrees of freedom. The right-skewed distributions analyzed are Exponential and Chi-square with 1, 2, and 3 degrees of freedom. Left-skewed distributions were omitted without loss of generality. The coverage probabilities for the symmetric distributions tend to achieve or just barely exceed the nominal values. The coverage probabilities for the skewed distributions tend to be too low, indicating high Type I error rates. Percentiles for the skewness and kurtosis statistics are simulated using Normal data. For sample sizes of 5, 10, 15, and 20, the skewness statistic does an excellent job of detecting non-Normal data, except for Uniform data. The kurtosis statistic also does an excellent job of detecting non-Normal data, including Uniform data. Examined herein are Type I error rates, not power calculations. We find that sample skewness is unhelpful when determining whether or not the t test should be used, but low sample kurtosis is reason to avoid using the t test.
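A minimal sketch of the coverage simulation (mine, not the authors' code; the replication count and the two parents shown are illustrative): the empirical coverage of the two-sided 95% t interval is computed for a Normal and an Exponential parent, where one minus the coverage is the empirical Type I error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def coverage(draw, true_mean, n=10, reps=50_000, conf=0.95):
    """Empirical coverage of the two-sided one-sample t interval."""
    t_crit = stats.t.ppf((1.0 + conf) / 2.0, df=n - 1)
    hits = 0
    for _ in range(reps):
        x = draw(n)
        half = t_crit * x.std(ddof=1) / np.sqrt(n)
        hits += abs(x.mean() - true_mean) <= half
    return hits / reps

print(coverage(lambda n: rng.normal(0.0, 1.0, n), 0.0))     # near 0.95
print(coverage(lambda n: rng.exponential(1.0, n), 1.0))     # below 0.95
```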


2019 ◽ Vol 19 (1) ◽ Author(s): Lisa Avery, Nooshin Rotondi, Constance McKnight, Michelle Firestone, Janet Smylie, ...

Background: It is unclear whether weighted or unweighted regression is preferred in the analysis of data derived from respondent-driven sampling (RDS). Our objective was to evaluate the validity of various regression models, with and without weights and with various controls for clustering, in estimating the risk of group membership from data collected using RDS. Methods: Twelve networked populations with varying levels of homophily and prevalence, each based on a known distribution of a continuous predictor, were simulated, and 1000 RDS samples were drawn from each population. Weighted and unweighted binomial and Poisson generalized linear models, with and without various clustering controls and standard-error adjustments, were fitted to each sample and evaluated with respect to validity, bias, and coverage rate. Population prevalence was also estimated. Results: In the regression analysis, the unweighted log-link (Poisson) models maintained the nominal Type I error rate across all populations. Bias was substantial and Type I error rates unacceptably high for weighted binomial regression. Coverage rates for the estimation of prevalence were highest using RDS-weighted logistic regression, except at low prevalence (10%), where unweighted models are recommended. Conclusions: Caution is warranted when undertaking regression analysis of RDS data. Even when reported degree is accurate, low reported degree can unduly influence regression estimates. Unweighted Poisson regression is therefore recommended.
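A schematic of the competing specifications on simulated data (my own sketch; the paper's RDS simulation design is far richer, and freq_weights is used here only as a simple stand-in for survey weighting): membership is generated from a logistic model in a continuous predictor, RDS-II style weights proportional to 1/degree are constructed, and unweighted and weighted log-link (Poisson) models with robust standard errors are fitted.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 500
x = rng.normal(size=n)                         # continuous predictor
p = 1.0 / (1.0 + np.exp(-(-1.5 + 0.8 * x)))    # membership probabilities
y = rng.binomial(1, p)                         # binary group membership
degree = rng.integers(1, 20, size=n)           # reported network degree
w = 1.0 / degree                               # RDS-II style weights
w *= n / w.sum()                               # normalize to sum to n

X = sm.add_constant(x)
unweighted = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
weighted = sm.GLM(y, X, family=sm.families.Poisson(),
                  freq_weights=w).fit(cov_type="HC1")
print(unweighted.params)    # approximate log risk-ratio estimates
print(weighted.params)
```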

