Controlling the rate of Type I error over a large set of statistical tests

2002 ◽  
Vol 55 (1) ◽  
pp. 27-39 ◽  
Author(s):  
H.J. Keselman ◽  
Robert Cribbie ◽  
Burt Holland

2020 ◽  
Vol 7 (1) ◽  
pp. 1-6
Author(s):  
João Pedro Nunes ◽  
Giovanna F. Frigoli

The online support pages for IBM SPSS propose that users alter the syntax when performing post-hoc analyses for interaction effects in ANOVA tests, and other authors likewise suggest altering the syntax when performing GEE analyses. Doing so changes the number of possible comparisons (the k value), which in turn affects any statistical procedure in which k enters the formula, such as repeated-measures ANOVA and the Bonferroni post-hoc tests of ANOVA and GEE. This alteration also inflates the Type I error rate, producing erroneous results and inviting misinterpretation of the data. The purpose of this paper is therefore to report the misuse and improper handling of syntax for ANOVA and GEE post-hoc analyses in SPSS and to illustrate its consequences for statistical results and data interpretation.
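
A minimal sketch (in Python rather than SPSS syntax, with hypothetical p-values) of why the k value matters: the Bonferroni adjustment multiplies each p-value by k, and the familywise error rate grows with the number of tests actually run.

```python
# Illustrative only: hypothetical p-values, not data from the paper.

def bonferroni_adjust(p_values, k):
    """Bonferroni-adjust each p-value for k comparisons (capped at 1.0)."""
    return [min(1.0, p * k) for p in p_values]

raw_p = [0.004, 0.012, 0.030]          # hypothetical pairwise p-values
correct_k = 3                          # comparisons actually of interest
inflated_k = 15                        # k implied by an altered, too-broad syntax

print(bonferroni_adjust(raw_p, correct_k))   # [0.012, 0.036, 0.09]
print(bonferroni_adjust(raw_p, inflated_k))  # [0.06, 0.18, 0.45] -> decisions change

# Familywise error rate if k independent tests are each run at alpha = .05:
alpha = 0.05
for k in (correct_k, inflated_k):
    print(k, 1 - (1 - alpha) ** k)     # grows quickly with k
```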


1998 ◽  
Vol 10 (7) ◽  
pp. 1895-1923 ◽  
Author(s):  
Thomas G. Dietterich

This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5 × 2 cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5 × 2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
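
A minimal sketch of the 5 × 2 cv paired t statistic described above, assuming the caller supplies the fold-wise error-rate differences from five replications of twofold cross-validation; the function name and the example numbers are illustrative.

```python
import math

def five_by_two_cv_t(diffs):
    """diffs: list of 5 pairs (d1, d2), the two fold-wise error-rate differences
    from each replication of 2-fold cross-validation."""
    assert len(diffs) == 5
    var_sum = 0.0
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2.0
        var_sum += (d1 - mean) ** 2 + (d2 - mean) ** 2
    # The statistic puts the very first fold difference in the numerator and is
    # referred to a t distribution with 5 degrees of freedom.
    return diffs[0][0] / math.sqrt(var_sum / 5.0)

# Hypothetical fold-wise differences (algorithm A error minus algorithm B error):
example = [(0.03, 0.01), (0.02, 0.04), (0.05, 0.02), (0.01, 0.03), (0.04, 0.02)]
print(five_by_two_cv_t(example))  # compare |t| to the t_5 critical value (~2.571 at alpha = .05)
```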


2019 ◽  
Vol 227 (1) ◽  
pp. 83-89 ◽  
Author(s):  
Michael Kossmeier ◽  
Ulrich S. Tran ◽  
Martin Voracek

Abstract. The funnel plot is widely used in meta-analyses to assess potential publication bias. However, experimental evidence suggests that informal, merely visual inspection of funnel plots is frequently prone to incorrect conclusions, and formal statistical tests (Egger regression and others) focus entirely on funnel plot asymmetry. We suggest using the visual inference framework with funnel plots routinely, including for didactic purposes. In this framework, the type I error is controlled by design, while the explorative, holistic, and open nature of visual graph inspection is preserved. Specifically, the funnel plot of the actually observed data is presented simultaneously, in a lineup, with null funnel plots showing data simulated under the null hypothesis. Only when the real-data funnel plot can be identified among all the funnel plots presented might funnel plot-based conclusions be warranted. Software to implement visual funnel plot inference is provided via a tailored R function.
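
The paper supplies a tailored R function; the sketch below is only an illustrative Python analogue of the lineup idea, with all data simulated and the panel layout chosen arbitrarily.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

k = 30                                  # number of studies (hypothetical)
se = rng.uniform(0.05, 0.45, size=k)    # study standard errors
pooled = 0.25                           # assumed pooled effect under the null model
observed = rng.normal(pooled, se)       # stand-in for the actually observed effects

n_panels = 20
real_pos = rng.integers(n_panels)       # hide the real plot at a random position

fig, axes = plt.subplots(4, 5, figsize=(12, 9), sharex=True, sharey=True)
for i, ax in enumerate(axes.ravel()):
    # Null panels: effects re-simulated under the null hypothesis of no bias.
    effects = observed if i == real_pos else rng.normal(pooled, se)
    ax.scatter(effects, se, s=10)
    ax.set_title(str(i + 1), fontsize=8)
axes[0, 0].invert_yaxis()               # funnel convention: precise (small-SE) studies on top

fig.suptitle("Which funnel plot looks different from the rest?")
plt.show()
print("Real data shown in panel", real_pos + 1)
```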


1999 ◽  
Vol 11 (8) ◽  
pp. 1885-1892 ◽  
Author(s):  
Ethem Alpaydın

Dietterich (1998) reviews five statistical tests and proposes the 5 × 2 cv t test for determining whether there is a significant difference between the error rates of two classifiers. In our experiments, we noticed that the 5 × 2 cv t test result may vary depending on factors that should not affect the test, and we propose a variant, the combined 5 × 2 cv F test, that combines multiple statistics to get a more robust test. Simulation results show that this combined version of the test has lower type I error and higher power than the 5 × 2 cv t test proper.
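
A minimal sketch of the combined 5 × 2 cv F statistic, reusing the same illustrative fold-wise differences as in the t-test sketch above; the statistic is referred to an F distribution with 10 and 5 degrees of freedom.

```python
def five_by_two_cv_f(diffs):
    """diffs: list of 5 pairs (d1, d2) of fold-wise error-rate differences."""
    assert len(diffs) == 5
    # Numerator: sum of squares of all ten fold differences.
    num = sum(d1 ** 2 + d2 ** 2 for d1, d2 in diffs)
    # Denominator: twice the sum of the per-replication variance estimates.
    var_sum = sum((d1 - (d1 + d2) / 2) ** 2 + (d2 - (d1 + d2) / 2) ** 2
                  for d1, d2 in diffs)
    return num / (2.0 * var_sum)

example = [(0.03, 0.01), (0.02, 0.04), (0.05, 0.02), (0.01, 0.03), (0.04, 0.02)]
print(five_by_two_cv_f(example))  # compare to the F(10, 5) critical value
```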


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 1129
Author(s):  
Christopher R. Madan

Statistical analyses are often conducted with α=.05. When multiple statistical tests are conducted, this procedure needs to be adjusted to compensate for the otherwise inflated Type I error. In tabletop gaming, it is sometimes desirable to roll a 20-sided die (or `d20') twice and take the greater outcome. Here I draw from probability theory and the case of a d20, where the probability of obtaining any specific outcome is 1/20, to determine the probability of obtaining a specific outcome (Type I error) at least once across repeated, independent statistical tests.
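
A minimal sketch of the calculation the abstract describes: with a per-test probability of 1/20, the chance of at least one such outcome across m independent tests is 1 − (1 − 1/20)^m.

```python
alpha = 1 / 20  # probability of any specific d20 outcome, and the usual per-test alpha

for m in (1, 2, 5, 10, 20):
    p_at_least_one = 1 - (1 - alpha) ** m
    print(f"{m:2d} tests: P(at least one Type I error) = {p_at_least_one:.3f}")
# For m = 2 (the 'roll twice, take the greater' case) this is 1 - 0.95**2 = 0.0975.
```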


Methodology ◽  
2012 ◽  
Vol 8 (1) ◽  
pp. 1-11 ◽  
Author(s):  
John Ruscio ◽  
Brendan Roche

Parametric assumptions for statistical tests include normality and equal variances. Micceri (1989) found that data frequently violate the normality assumption; variances have received less attention. We recorded within-group variances of dependent variables for 455 studies published in leading psychology journals. Sample variances differed, often substantially, suggesting frequent violation of the assumption of equal population variances. Parallel analyses of equal-variance artificial data otherwise matched to the characteristics of the empirical data show that unequal sample variances in the empirical data exceed expectations from normal sampling error and can adversely affect Type I error rates of parametric statistical tests. Variance heterogeneity was unrelated to relative group sizes or total sample size and observed across subdisciplines of psychology in experimental and correlational research. These results underscore the value of examining variances and, when appropriate, using data-analytic methods robust to unequal variances. We provide a standardized index for examining and reporting variance heterogeneity.
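
The paper proposes its own standardized index for examining and reporting variance heterogeneity; as a simple stand-in (not the authors' index), the sketch below merely computes within-group sample variances and their max/min ratio for hypothetical two-group data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical two-group data with unequal population variances.
group_a = rng.normal(0, 1.0, size=40)
group_b = rng.normal(0, 2.0, size=40)

variances = [np.var(g, ddof=1) for g in (group_a, group_b)]  # unbiased sample variances
print("within-group variances:", np.round(variances, 2))
print("max/min variance ratio:", round(max(variances) / min(variances), 2))
```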


2008 ◽  
Vol 65 (4) ◽  
pp. 428-432 ◽  
Author(s):  
Armando Conagin ◽  
Décio Barbin ◽  
Clarice Garcia Borges Demétrio

Multiple pairwise comparison tests of treatment means are of great interest in applied research. Two modifications of the Tukey test are proposed. The power of the unilateral and bilateral Student, Waller-Duncan, Duncan, SNK, REGWF, REGWQ, Tukey, Bonferroni, Sidak, and unilateral Dunnett tests and of the modified tests (Sidak, Bonferroni 1 and 2, Tukey 1 and 2) was compared using the Monte Carlo method. Data were generated for 600 experiments with eight treatments in a randomized block design, of which 400 had four and 200 eight blocks. The differences between the treatment means in relation to the control were 30%, 20%, 15%, 10%, and 5%; two extra treatments did not differ from the control. A coefficient of variation of 10% and a Type I error probability of α = 0.05 were adopted. The power of all tests decreased as the differences from the control decreased. The unilateral and bilateral Student t, Waller-Duncan, and Duncan tests detected the greatest number of significant differences, followed by the unilateral Dunnett, modified Sidak, modified Bonferroni 1 and 2, modified Tukey 1, SNK, REGWF, REGWQ, modified Tukey 2, Tukey, Sidak, and Bonferroni tests. All tests lose considerable efficiency relative to the unilateral Student t test as the differences between the treatment means and the control decrease. The modified tests were always more efficient than their original versions.
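
A minimal Monte Carlo sketch in the spirit of this comparison, not the paper's full randomized-block design with all the listed tests: it estimates the power of unadjusted versus Bonferroni-adjusted two-sample t tests against a control for relative differences of 5-30%, using hypothetical sample sizes and a 10% coefficient of variation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control_mean, cv, n, reps = 100.0, 0.10, 8, 2000
effects = [0.30, 0.20, 0.15, 0.10, 0.05]        # relative differences from the control
sd = control_mean * cv
k = len(effects)                                 # number of comparisons to the control

hits_raw = np.zeros(k)
hits_bonf = np.zeros(k)
for _ in range(reps):
    control = rng.normal(control_mean, sd, n)
    for j, eff in enumerate(effects):
        treat = rng.normal(control_mean * (1 + eff), sd, n)
        p = stats.ttest_ind(treat, control).pvalue
        hits_raw[j] += p < 0.05                  # unadjusted decision
        hits_bonf[j] += p < 0.05 / k             # Bonferroni-adjusted decision

for eff, pr, pb in zip(effects, hits_raw / reps, hits_bonf / reps):
    print(f"+{eff:.0%} vs control: power raw={pr:.2f}, Bonferroni={pb:.2f}")
```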

