Inference for One-Way ANOVA with Equicorrelation Error Structure

2014 ◽  
Vol 2014 ◽  
pp. 1-6 ◽  
Author(s):  
Weiyan Mu ◽  
Xiaojing Wang

We consider inference in a one-way ANOVA model with an equicorrelated error structure. Hypotheses of the equality of the means are discussed. A generalized F-test has been proposed in the literature to compare the means of all populations; however, the performance of that test was not studied. We propose two methods, a generalized pivotal quantity-based method and a parametric bootstrap method, to test the hypotheses of equality of the means. We compare the empirical performance of the proposed tests with the generalized F-test. The simulation results show that the generalized F-test does not perform well in terms of Type I error rate, while the proposed tests perform much better. We also provide corresponding simultaneous confidence intervals for all pairwise differences of the means, whose coverage probabilities are close to the confidence level.
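
The general idea behind a parametric bootstrap test of equal means can be sketched as follows. This is a minimal illustration under independent, heteroscedastic normal errors, not the paper's equicorrelation model; the data and replication count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_stat(groups):
    # classic one-way ANOVA F statistic (between / within mean squares)
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    ssb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

def parametric_bootstrap_pvalue(groups, B=2000):
    # fit the null model (common mean, group-specific variances),
    # then compare the observed F to its bootstrap distribution
    obs = f_stat(groups)
    mu0 = np.concatenate(groups).mean()
    sds = [g.std(ddof=1) for g in groups]
    exceed = 0
    for _ in range(B):
        boot = [rng.normal(mu0, s, size=len(g)) for g, s in zip(groups, sds)]
        exceed += f_stat(boot) >= obs
    return exceed / B

groups = [rng.normal(0.0, 1.0, 20), rng.normal(0.0, 2.0, 20), rng.normal(0.2, 1.0, 20)]
p_boot = parametric_bootstrap_pvalue(groups)
```

The key design choice is that resampling is done from the fitted null model rather than from the empirical data, which is what distinguishes the parametric bootstrap from its nonparametric counterpart.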

2006 ◽  
Vol 3 (1) ◽  
Author(s):  
Sharipah Syed Yahaya ◽  
Abdul Othman ◽  
Harvey Keselman

Nonnormality and variance heterogeneity affect the validity of the traditional tests for treatment group equality (e.g. the ANOVA F-test and t-test), particularly when group sizes are unequal. Adopting trimmed means instead of the usual least squares estimator has been shown to be largely effective in combating the deleterious effects of nonnormality. There are, however, practical concerns regarding trimmed means, such as the predetermined amount of symmetric trimming that is typically used. Wilcox and Keselman proposed the Modified One-Step M-estimator (MOM), which empirically determines the amount of trimming. Othman et al. found that when this estimator is used with Schrader and Hettmansperger's H statistic, rates of Type I error were well controlled even though data were nonnormal in form. In this paper, we modified the criterion for choosing the sample values for MOM by replacing the default scale estimator, MADn, with two robust scale estimators, Sn and Tn, suggested by Rousseeuw and Croux (1993). To study the robustness of the modified methods, conditions that are known to negatively affect rates of Type I error were manipulated. As well, a bootstrap method was used to generate a better approximate sampling distribution, since the null distribution of MOM-H is intractable. These modified methods resulted in better Type I error control, especially when data were extremely skewed.
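
The MOM idea with the default MADn criterion can be sketched as below; the cutoff constant k = 2.24 follows Wilcox and Keselman, and the paper's modifications would substitute Sn or Tn for MADn in the trimming rule.

```python
import numpy as np

def mom_estimator(x, k=2.24):
    # modified one-step M-estimator: drop observations flagged as outlying
    # by the criterion |x_i - median| > k * MADn, then average the rest
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    madn = 1.4826 * np.median(np.abs(x - med))   # MAD rescaled for normal consistency
    keep = np.abs(x - med) <= k * madn
    return x[keep].mean()
```

For example, `mom_estimator([1, 2, 3, 4, 100])` discards the outlier 100 and averages the remaining four values, because the amount of trimming is determined by the data rather than fixed in advance.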


Mathematics ◽  
2018 ◽  
Vol 6 (11) ◽  
pp. 269 ◽  
Author(s):  
Sergio Camiz ◽  
Valério Pillar

The identification of a reduced dimensional representation of the data is among the main issues of exploratory multidimensional data analysis, and several solutions have been proposed in the literature, depending on the method. Principal Component Analysis (PCA) is the method that has received the most attention thus far, and several identification methods (the so-called stopping rules) have been proposed, giving very different results in practice; some comparative studies have been carried out. Inconsistencies in the previous studies led us to clarify the distinction between signal and noise in PCA, along with its limits, and to propose a new testing method. This consists in generating simulated data according to a predefined eigenvalue structure, including zero eigenvalues. From random populations built according to several such structures, reduced-size samples were extracted, and different levels of random normal noise were added to them. This controlled introduction of noise allows a clear distinction between expected signal and noise, the latter relegated to the non-zero sample eigenvalues that correspond to zero eigenvalues in the population. With this new method, we tested the performance of ten different stopping rules. For every method, every structure, and every noise level, both power (the ability to correctly identify the expected dimension) and type-I error (the detection of a dimension composed only of noise) were measured, by counting the relative frequencies with which the smallest non-zero eigenvalue in the population was recognized as signal in the samples, and those with which the largest zero eigenvalue was recognized as noise, respectively. This way, the behaviour of the examined methods is clear and their comparison and evaluation are possible.
The reported results show that both the generalization of Bartlett's test by Rencher and the Bootstrap method by Pillar perform much better than all the others: both show reasonable power, decreasing with noise, and very good type-I error control. Thus, more than the others, these methods deserve to be adopted.
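
The simulation design can be illustrated roughly as follows; the eigenvalue structure, sample size, and noise level are hypothetical, not the exact settings of the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical structure: three signal eigenvalues and three zero eigenvalues
pop_eigvals = np.array([4.0, 2.0, 1.0, 0.0, 0.0, 0.0])
p, n, noise_sd = len(pop_eigvals), 50, 0.1

# population scores with the prescribed variances; zero-variance directions
# are the zero-eigenvalue dimensions that only noise can inflate in a sample
signal = rng.normal(size=(n, p)) * np.sqrt(pop_eigvals)
sample = signal + rng.normal(scale=noise_sd, size=(n, p))

# sample eigenvalues, sorted in decreasing order
sample_eigvals = np.sort(np.linalg.eigvalsh(np.cov(sample, rowvar=False)))[::-1]
```

A stopping rule applied to `sample_eigvals` should ideally retain the first three dimensions (signal) and reject the last three, whose non-zero sample values are pure noise.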


2017 ◽  
Vol 41 (4) ◽  
pp. 243-263 ◽  
Author(s):  
Xi Wang ◽  
Yang Liu ◽  
Ronald K. Hambleton

Repeatedly using items in high-stakes testing programs provides a chance for test takers to have knowledge of particular items in advance of test administrations. A predictive checking method is proposed to detect whether a person uses preknowledge on repeatedly used items (i.e., possibly compromised items) by using information from secure items that have zero or very low exposure rates. Responses on the secure items are first used to estimate a person’s proficiency distribution, and then the corresponding predictive distribution for the person’s responses on the possibly compromised items is constructed. The use of preknowledge is identified by comparing the observed responses to the predictive distribution. Different estimation methods for obtaining a person’s proficiency distribution and different choices of test statistic in predictive checking are considered. A simulation study was conducted to evaluate the empirical Type I error rate and power of the proposed method. The simulation results suggested that the Type I error of this method is well controlled, and this method is effective in detecting preknowledge when a large proportion of items are compromised, even with a short secure section. An empirical example is also presented to demonstrate its practical use.
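
A rough sketch of the predictive-checking idea under a simple one-parameter logistic model is shown below. The item difficulties, sample sizes, and grid MLE are illustrative assumptions of this sketch; the paper considers full proficiency distributions and several test statistics rather than a point estimate and number-correct score.

```python
import numpy as np

rng = np.random.default_rng(2)

def irf(theta, b):
    # one-parameter logistic item response function
    return 1.0 / (1.0 + np.exp(-(theta - b)))

secure_b = np.linspace(-1.0, 1.0, 20)   # secure-item difficulties (hypothetical)
comp_b = np.linspace(-1.0, 1.0, 10)     # possibly compromised items (hypothetical)

theta_true = 0.0
secure_resp = rng.random(20) < irf(theta_true, secure_b)

# crude proficiency estimate from the secure items (grid MLE)
grid = np.linspace(-4.0, 4.0, 401)
loglik = [np.sum(secure_resp * np.log(irf(t, secure_b)) +
                 (1 - secure_resp) * np.log(1.0 - irf(t, secure_b))) for t in grid]
theta_hat = grid[np.argmax(loglik)]

# predictive distribution of the number-correct score on the compromised items
B = 5000
pred_scores = (rng.random((B, 10)) < irf(theta_hat, comp_b)).sum(axis=1)

# an implausibly high observed score yields a small upper-tail p-value
obs_score = 10
p_value = np.mean(pred_scores >= obs_score)
```

A small `p_value` flags the observed score as inconsistent with the proficiency implied by the secure items, which is the signature of preknowledge.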


1983 ◽  
Vol 8 (4) ◽  
pp. 289-309 ◽  
Author(s):  
Larry E. Toothaker ◽  
Martha Banz ◽  
Cindy Noble ◽  
Jill Camp ◽  
Diana Davis

Several methods have been proposed for the analysis of data from single-subject research settings. This research focuses on the modifications of ANOVA-based tests proposed by Shine and Bower, a procedure that precedes the ANOVA F test with a preliminary test of within-phase lag-one serial correlation, and the one-way ANOVA as presented by Gentile, Roden, and Klein. Monte Carlo simulation is used to investigate these tests with respect to robustness and power. Each test was analyzed under various patterns of serial correlation, various patterns of phase and trial means, normal and exponential distributions, and equal and unequal phase variances. The findings indicate that the probability of a Type I error for these ANOVA-based tests is seriously inflated by nonzero serial correlation. These tests, therefore, cannot be recommended for use with data that have nonzero serial correlation.
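
The Type I error inflation under serial correlation can be illustrated with a small Monte Carlo sketch: a two-phase mean comparison on a single AR(1) series with equal true phase means. The phase length, AR coefficient, and the fixed critical value (approximating t with 38 df) are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1_series(n, rho):
    # stationary AR(1) errors with unit innovation variance
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()
    return x

def two_phase_rejection_rate(rho, reps=2000, n_phase=20, crit=2.024):
    # crit approximates t_{0.975, 38}; both phases share the same true mean,
    # so every rejection is a Type I error
    hits = 0
    for _ in range(reps):
        y = ar1_series(2 * n_phase, rho)
        a, b = y[:n_phase], y[n_phase:]
        pooled = (a.var(ddof=1) + b.var(ddof=1)) / 2.0
        t = (a.mean() - b.mean()) / np.sqrt(pooled * 2.0 / n_phase)
        hits += abs(t) > crit
    return hits / reps

rate_iid = two_phase_rejection_rate(0.0)   # near the nominal 5%
rate_ar = two_phase_rejection_rate(0.6)    # inflated by positive autocorrelation
```

With positive serial correlation, consecutive phases drift together while within-phase variability understates the sampling variance of the phase means, so the rejection rate rises well above the nominal level.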


2016 ◽  
Vol 154 (8) ◽  
pp. 1392-1412 ◽  
Author(s):  
Q. KANG ◽  
C. I. VAHL

SUMMARY: Safety evaluation of a genetically modified crop entails assessing its equivalence to conventional crops under multi-site randomized block field designs. Despite mounting petitions for regulatory approval, there is a lack of a scientifically sound and powerful statistical method for establishing equivalence. The current paper develops and validates two procedures for testing a recently identified class of equivalence uniquely suited to crop safety. One procedure employs the modified large sample (MLS) method; the other is based on generalized pivotal quantities (GPQs). Because both methods were originally created under balanced designs, common issues associated with incomplete and unbalanced field designs were addressed by first identifying unfulfilled theoretical assumptions and then replacing them with user-friendly approximations. Simulation indicated that the MLS procedure can be very conservative on many occasions, irrespective of the balance of the design; the GPQ procedure is mildly liberal, with its type I error rate near the nominal level when the design is balanced. Additional pros and cons of the two procedures are also discussed. Their utility is demonstrated in a case study using summary statistics derived from a real-world dataset.
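
As a hedged illustration of the GPQ idea in a much simpler setting than the paper's equivalence problem, the sketch below builds a generalized confidence interval for a normal mean by substituting independent pivots into the parameter's functional form.

```python
import numpy as np

rng = np.random.default_rng(4)

# observed data and sufficient statistics
x = rng.normal(5.0, 2.0, size=30)
n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)

# GPQ for a normal mean: substitute independent pivots Z ~ N(0,1) and
# U ~ chi-square(n-1) into mu = xbar - Z * sqrt((n-1) s^2 / U) / sqrt(n)
M = 20000
Z = rng.normal(size=M)
U = rng.chisquare(n - 1, size=M)
mu_gpq = xbar - Z * np.sqrt((n - 1) * s2 / U) / np.sqrt(n)

# percentiles of the GPQ sample give a 95% generalized confidence interval
lo, hi = np.percentile(mu_gpq, [2.5, 97.5])
```

For the normal mean this GPQ interval coincides with the classical t-interval; the appeal of the construction is that it extends to parameters, such as equivalence criteria built from variance components, where no exact pivot exists.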


2019 ◽  
Vol 15 (1) ◽  
Author(s):  
Tobias Hepp ◽  
Matthias Schmid ◽  
Andreas Mayr

Abstract Generalized additive models for location, scale and shape (GAMLSS) offer very flexible solutions to a wide range of statistical analysis problems, but can be challenging in terms of proper model specification. This complex task can be simplified using regularization techniques such as gradient boosting algorithms, but the estimates derived from such models are shrunken towards zero, and it is consequently not straightforward to calculate proper confidence intervals or test statistics. In this article, we propose two strategies to obtain p-values for linear effect estimates for Gaussian location and scale models, based on permutation tests and a parametric bootstrap approach. These procedures can provide a solution for one of the remaining problems in the application of gradient boosting algorithms for distributional regression in biostatistical data analyses. Results from extensive simulations indicate that in low-dimensional data both suggested approaches maintain the type-I error threshold and provide reasonable test power, comparable to the Wald-type test for maximum likelihood inference. In high-dimensional data, when gradient boosting is the only feasible inference for this model class, the power decreases but the type-I error is still under control. In addition, we demonstrate the application of both tests in an epidemiological study to analyse the impact of physical exercise on both the average and the stability of the lung function of elderly people in Germany.
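
The permutation strategy can be sketched in a toy setting. Here an ordinary least squares slope stands in for the boosted effect estimate, which is an assumption of this sketch; the principle, shuffling the response to destroy any association and recomputing the estimate, is the same.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 100
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)   # true linear effect of 0.3

def slope(x, y):
    # ordinary least squares slope of y on x
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

obs = slope(x, y)

# permutation null: shuffling y breaks any x-y association
B = 5000
perm = np.array([slope(x, rng.permutation(y)) for _ in range(B)])
p_value = (np.sum(np.abs(perm) >= abs(obs)) + 1) / (B + 1)
```

The add-one correction in the p-value avoids reporting exactly zero, a standard convention for Monte Carlo permutation tests.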


Author(s):  
Steven T. Garren ◽  
Kate McGann Osborne

Coverage probabilities of the two-sided one-sample t-test are simulated for some symmetric and right-skewed distributions. The symmetric distributions analyzed are Normal, Uniform, Laplace, and Student-t with 5, 7, and 10 degrees of freedom. The right-skewed distributions analyzed are Exponential and Chi-square with 1, 2, and 3 degrees of freedom. Without loss of generality, left-skewed distributions were not analyzed. The coverage probabilities for the symmetric distributions tend to achieve or just barely exceed the nominal values. The coverage probabilities for the skewed distributions tend to be too low, indicating high Type I error rates. Percentiles for the skewness and kurtosis statistics are simulated using Normal data. For sample sizes of 5, 10, 15, and 20, the skewness statistic does an excellent job of detecting non-Normal data, except for Uniform data. The kurtosis statistic also does an excellent job of detecting non-Normal data, including Uniform data. Examined herein are Type I error rates, but not power calculations. We find that sample skewness is unhelpful when determining whether or not the t-test should be used, but low sample kurtosis is reason to avoid using the t-test.
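
A minimal coverage simulation in the spirit of the study is sketched below; the sample size, replication count, and the critical value t_{0.975,9} ≈ 2.262 are this sketch's assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def t_interval_coverage(sampler, true_mean, n=10, reps=5000, crit=2.262):
    # crit approximates t_{0.975, 9}; count how often the two-sided
    # 95% t-interval covers the true mean
    hits = 0
    for _ in range(reps):
        x = sampler(n)
        half = crit * x.std(ddof=1) / np.sqrt(n)
        hits += abs(x.mean() - true_mean) <= half
    return hits / reps

cov_normal = t_interval_coverage(lambda m: rng.normal(0.0, 1.0, m), 0.0)
cov_expon = t_interval_coverage(lambda m: rng.exponential(1.0, m), 1.0)
```

Under Normal data the coverage sits near the nominal 95%, while under Exponential data it falls below it, mirroring the undercoverage (and hence inflated Type I error) that the study reports for right-skewed distributions.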


Author(s):  
Guosheng Yin ◽  
Chenyang Zhang ◽  
Huaqing Jin

Abstract
Background: Since the outbreak of the novel coronavirus disease 2019 (COVID-19) in December 2019, it has rapidly spread to more than 200 countries or territories, with over 8 million confirmed cases and 440,000 deaths by June 17, 2020. Recently, three randomized clinical trials on COVID-19 treatments were completed, one for lopinavir-ritonavir and two for remdesivir. One trial reported that remdesivir was superior to placebo in shortening the time to recovery, while the other two showed no benefit of the treatment under investigation. However, several statistical issues in the original design and analysis of the three trials can be identified, which may cast doubt on their findings, and their conclusions should be evaluated with caution.
Objective: From a statistical perspective, we identify several issues in the design and analysis of the three COVID-19 trials and reanalyze the data from the cumulative incidence curves in the three trials using more appropriate statistical methods.
Methods: The lopinavir-ritonavir trial enrolled 39 additional patients due to insignificant results after the sample size reached the planned number, which led to inflation of the type I error rate. The remdesivir trial of Wang et al. failed to reach the planned sample size due to a lack of eligible patients, and the bootstrap method was used to predict the quantity of clinical interest, conditionally and unconditionally, if the trial had continued to reach the originally planned sample size. Moreover, we used a terminal (or cure) rate model and a model-free metric known as the restricted mean survival time, or the restricted mean time to improvement (RMTI) in this context, to analyze the reconstructed data, because death acts as both a competing risk and a terminal event. The remdesivir trial of Beigel et al. reported the median recovery time of the remdesivir and placebo groups and the rate ratio for recovery, but both quantities depend on a particular time point and thus represent only local information. We reanalyzed the data to report other percentiles of the time to recovery and adopted the bootstrap method and permutation test to construct the confidence intervals as well as the P values. The restricted mean time to recovery (RMTR) was also computed as a global and robust measure of efficacy.
Results: For the lopinavir-ritonavir trial, with the increase of the sample size from 160 to 199, the type I error rate was inflated from 0.05 to 0.071. The difference of terminal rates was −8.74% (95% CI [−21.04, 3.55]; P=.16) and the hazard ratio (HR) adjusted for terminal rates was 1.05 (95% CI [0.78, 1.42]; P=.74), indicating no significant difference. The difference of RMTIs between the two groups evaluated at day 28 was −1.67 days (95% CI [−3.62, 0.28]; P=.09), in favor of lopinavir-ritonavir but not statistically significant. For the remdesivir trial of Wang et al., the difference of terminal rates was −0.89% (95% CI [−2.84, 1.06]; P=.19) and the HR adjusted for terminal rates was 0.92 (95% CI [0.63, 1.35]; P=.67). The difference of RMTIs at day 28 was −0.89 day (95% CI [−2.84, 1.06]; P=.37). The planned sample size was 453, yet only 236 patients were enrolled. The conditional prediction shows that the HR estimates would reach statistical significance if the target sample size had been maintained, and both conditional and unconditional predictions delivered significant HR results if the trial had continued to double the target sample size. For the remdesivir trial of Beigel et al., the difference of RMTRs between the remdesivir and placebo groups up to day 30 was −2.7 days (95% CI [−4.0, −1.2]; P<.001), confirming the superiority of remdesivir. The difference in recovery time at the 25th percentile (95% CI [−3, 0]; P=.65) was insignificant, while the differences became statistically significant at larger percentiles.
Conclusions: Based on the statistical issues and lessons learned from these three recent clinical trials on COVID-19 treatments, we suggest more appropriate approaches for the design and analysis of ongoing and future COVID-19 trials.
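
The restricted-mean idea used in the reanalysis can be sketched as an area under a Kaplan-Meier curve up to a truncation time; this is a generic implementation, not the authors' code.

```python
import numpy as np

def rmst(times, events, tau):
    # restricted mean survival time: area under the Kaplan-Meier
    # curve from 0 up to the truncation point tau
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    d = np.asarray(events)[order]          # 1 = event, 0 = censored
    at_risk = len(t)
    surv, prev, area = 1.0, 0.0, 0.0
    for i in range(len(t)):
        if t[i] > tau:
            break
        area += surv * (t[i] - prev)
        if d[i]:
            surv *= 1.0 - 1.0 / at_risk    # step down at an event
        at_risk -= 1
        prev = t[i]
    return area + surv * (tau - prev)
```

Unlike a median or a rate ratio at a single time point, the restricted mean aggregates the whole curve up to tau, which is why the reanalysis treats it as a global measure of efficacy; group differences in RMST (or RMTI/RMTR) can then be compared via bootstrap or permutation.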


Stats ◽  
2020 ◽  
Vol 3 (1) ◽  
pp. 40-55 ◽  
Author(s):  
Sergio Perez-Melo ◽  
B. M. Golam Kibria

Ridge regression is a popular method to solve the multicollinearity problem for both linear and non-linear regression models. This paper studied forty different ridge regression t-type tests of the individual coefficients of a linear regression model. A simulation study was conducted to evaluate the performance of the proposed tests with respect to their empirical sizes and powers under different settings. Our simulation results demonstrated that many of the proposed tests have type I error rates close to the 5% nominal level and, among those, all tests except one have considerable gain in powers over the standard ordinary least squares (OLS) t-type test. It was observed from our simulation results that seven tests based on some ridge estimators performed better than the rest in terms of achieving higher power gains while maintaining a 5% nominal size.
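
One generic ridge t-type test can be sketched as below. The ridge constant k = 1, the simulated collinear design, and the plug-in covariance formula for a fixed k are illustrative assumptions of this sketch; the paper studies forty variants built from different ridge estimators.

```python
import numpy as np

rng = np.random.default_rng(7)

n, p, k = 100, 3, 1.0          # k is a hypothetical ridge constant
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)   # induce multicollinearity
y = X @ np.array([1.0, 0.0, 0.5]) + rng.normal(size=n)

# ridge estimate and a plug-in covariance for fixed k
A = np.linalg.inv(X.T @ X + k * np.eye(p))
beta = A @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - p)
cov = sigma2 * A @ X.T @ X @ A

# t-type statistics: ridge coefficient over its standard error
t_stats = beta / np.sqrt(np.diag(cov))
```

Each statistic is then referred to a t (or normal) critical value; the empirical size and power comparisons in the paper come from repeating this over many simulated datasets and ridge-constant choices.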


2018 ◽  
Author(s):  
Marie Delacre ◽  
Daniel Lakens ◽  
Youri Mora ◽  
Christophe Leys

Student's t-test and the classical F-test ANOVA rely on the assumptions that two or more samples are independent and that the independent and identically distributed residuals are normal and have equal variances between groups. We focus on the assumptions of normality and equality of variances, and argue that these assumptions are often unrealistic in the field of psychology. We underline the current lack of attention to these assumptions through an analysis of researchers' practices. Through Monte Carlo simulations, we illustrate the consequences for the Type I error rate and statistical power of performing the classic parametric F-test for ANOVA when the test assumptions are not met. Under realistic deviations from the assumption of equal variances, the classic F-test can yield severely biased results and lead to invalid statistical inferences. We examine two common alternatives to the F-test, namely Welch's ANOVA (W-test) and the Brown-Forsythe test (F*-test). Our simulations show that under a range of realistic scenarios, the W-test is a better alternative, and we therefore recommend using the W-test by default when comparing means. We provide a detailed example explaining how to perform the W-test in SPSS and R. We summarize our conclusions in practical recommendations that researchers can use to improve their statistical practices.
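
Welch's W statistic, the recommended default, can be sketched as follows; this is a generic implementation with illustrative data, not the authors' simulation code.

```python
import numpy as np

rng = np.random.default_rng(8)

def welch_anova(groups):
    # Welch's W statistic and its approximate denominator df for
    # comparing k means without assuming equal variances
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    w = n / variances                       # precision weights
    grand = np.sum(w * means) / np.sum(w)
    numerator = np.sum(w * (means - grand) ** 2) / (k - 1)
    tmp = np.sum((1.0 - w / np.sum(w)) ** 2 / (n - 1))
    denominator = 1.0 + 2.0 * (k - 2) / (k ** 2 - 1) * tmp
    df2 = (k ** 2 - 1) / (3.0 * tmp)
    return numerator / denominator, df2

# unequal variances and one clearly shifted mean
groups = [rng.normal(0, 1, 30), rng.normal(0, 3, 30), rng.normal(5, 1, 30)]
W, df2 = welch_anova(groups)
```

Because each group mean is weighted by its own precision n/s², a high-variance group cannot dominate the comparison; W is referred to an F distribution with k−1 and df2 degrees of freedom.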

