Why and How to Replace Statistical Significance Tests with Better Methods to Evaluate Hypotheses

2021 ◽  
Vol 2021 (1) ◽  
pp. 13779
Author(s):  
Sam Holloway ◽  
Andreas Schwab ◽  
William H. Starbuck
1998 ◽  
Vol 21 (2) ◽  
pp. 221-222
Author(s):  
Louis G. Tassinary

Chow (1996) offers a reconceptualization of statistical significance that is reasoned and comprehensive. Despite a somewhat rough presentation, his arguments are compelling and deserve to be taken seriously by the scientific community. It is argued that his characterizations of literal replication, types of research, effect size, and experimental control are in need of revision.


2016 ◽  
Vol 21 (1) ◽  
pp. 102-115 ◽  
Author(s):  
Stephen Gorard

This paper reminds readers of the absurdity of statistical significance testing, despite its continued widespread use as a supposed method for analysing numeric data. There have been complaints about the poor quality of research employing significance tests for a hundred years, and repeated calls for researchers to stop using and reporting them. There have even been attempted bans. Many thousands of papers have now been written, in all areas of research, explaining why significance tests do not work. There are too many for all to be cited here. This paper summarises the logical problems as described in over 100 of these prior pieces. It then presents a series of demonstrations showing that significance tests do not work in practice. In fact, they are more likely to produce the wrong answer than a right one. The confused use of significance testing has practical and damaging consequences for people's lives. Ending the use of significance tests is a pressing ethical issue for research. Anyone knowing the problems, as described over one hundred years, who continues to teach, use or publish significance tests is acting unethically, and knowingly risking the damage that ensues.


2013 ◽  
Vol 12 (3) ◽  
pp. 345-351 ◽  
Author(s):  
Jessica Middlemis Maher ◽  
Jonathan C. Markey ◽  
Diane Ebert-May

Statistical significance testing is the cornerstone of quantitative research, but studies that fail to report measures of effect size are potentially missing a robust part of the analysis. We provide a rationale for why effect size measures should be included in quantitative discipline-based education research. Examples from both biological and educational research demonstrate the utility of effect size for evaluating practical significance. We also provide details about some effect size indices that are paired with common statistical significance tests used in educational research and offer general suggestions for interpreting effect size measures. Finally, we discuss some inherent limitations of effect size measures and provide further recommendations about reporting confidence intervals.
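
As a concrete illustration of an effect size index of the kind this abstract describes, here is a minimal Python sketch of Cohen's d for two independent groups. The abstract does not say which indices the paper covers, so the choice of Cohen's d, the function name `cohens_d`, and the sample scores are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two independent groups.

    Uses the pooled sample standard deviation as the denominator.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up scores, for illustration only.
treatment = [2, 4, 6]
control = [1, 3, 5]
d = cohens_d(treatment, control)
```

Unlike a p-value, d does not shrink or grow with sample size alone. It expresses the size of the difference in standard deviation units, which is why it can be paired with a significance test to judge practical significance.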


Author(s):  
H. S. Styn ◽  
S. M. Ellis

Determining the significance of differences in means and of relationships between variables is important in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. In studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. When a study works with a complete population or a convenience sample, statistical significance is, strictly speaking, no longer relevant, whereas effect size indices can still serve as a basis for judging significance. This article focuses on the use of effect size indices to establish practical significance. It also shows how these indices are used in a few fields of statistical application and how they are treated in the statistical literature and in computer packages. The use of effect sizes is illustrated by a few examples from the research literature.


2006 ◽  
Vol 2 (6) ◽  
pp. 1277-1292
Author(s):  
P. D. Ditlevsen ◽  
K. K. Andersen ◽  
A. Svensson

Abstract. The significance of the apparent 1470-year cycle in the recurrence of the Dansgaard-Oeschger (DO) events, observed in the Greenland ice cores, is debated. Here we present statistical significance tests of this periodicity. The detection of a periodicity relies strongly on the accuracy of the dating of the DO events. We use both the new NGRIP GICC05 time scale, based on multi-parameter annual layer counting, and the GISP2 time scale, where the periodicity is most pronounced. For the NGRIP dating the recurrence times are indistinguishable from a random occurrence. This is also the case for the GISP2 dating, except when the DO9 event is omitted from the record. Whether or not the record shows a truly periodic beating has strong implications for identifying the underlying cause. If the recurrence is periodic, it suggests an external cause. If the recurrence of DO events is not periodic, it points to triggering mechanisms internal to the climate system being manifested at the millennial timescale.


1998 ◽  
Vol 15 (2) ◽  
pp. 103-118 ◽  
Author(s):  
Vinson H. Sutlive ◽  
Dale A. Ulrich

The unqualified use of statistical significance tests for interpreting the results of empirical research has been called into question by researchers in a number of behavioral disciplines. This paper reviews what statistical significance tells us and what it does not, with particular attention paid to criticisms of using the results of these tests as the sole basis for evaluating the overall significance of research findings. In addition, implications for adapted physical activity research are discussed. Based on the recent literature of other disciplines, several recommendations for evaluating and reporting research findings are made. They include calculating and reporting effect sizes, selecting an alpha level larger than the conventional .05 level, placing greater emphasis on replication of results, evaluating results in a sample size context, and employing simple research designs. Adapted physical activity researchers are encouraged to use specific modifiers when describing findings as significant.


2018 ◽  
Vol 5 (1) ◽  
pp. 205316801876487
Author(s):  
Lion Behrens ◽  
Ingo Rohlfing

Based on the statistical analysis of an original survey of young party members from six European democracies, a study concluded that three types of young members differed systematically regarding their membership objectives, activism, efficacy and perceptions of the party and self-perceived political future. We performed a technical replication of the original study, correcting four deficiencies, which led us to a different conclusion. First, we discuss substantive significance in addition to statistical significance. Second, we ran significance tests on all comparisons instead of limiting them to an arbitrary subset. Third, we performed pairwise comparisons between the three types of members instead of using pooled groups. Fourth, we avoided the inflation of the type-I error rate due to multiple testing by using the Bonferroni–Holm correction. We found that most of the differences between the types lacked substantive significance, and that statistical significance only coherently distinguished the types of members in their future membership, but not in their present behaviour and attitudes.
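
The Bonferroni–Holm correction mentioned in this abstract can be sketched in a few lines of Python. This is a generic textbook version of the step-down procedure, not the authors' replication code. The function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original input order.

    Step-down procedure: the k-th smallest raw p-value (k = 0, 1, ...)
    is multiplied by (m - k), capped at 1, and forced to be at least as
    large as the previous adjusted value so the sequence is monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up p-values from three pairwise comparisons, for illustration only.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
# Reject a comparison at the 5% family-wise level if its adjusted p < 0.05.
rejected = [p < 0.05 for p in adjusted]
```

With these illustrative inputs only the first comparison remains significant after adjustment, although all three raw p-values are below .05. This shows how uncorrected multiple testing can inflate the type-I error rate.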


1998 ◽  
Vol 21 (2) ◽  
pp. 205-206 ◽  
Author(s):  
John F. Kihlstrom

Statistical significance testing has its problems, but so do the alternatives that are proposed; and the alternatives may be both more cumbersome and less informative. Significance tests remain legitimate aspects of the rhetoric of scientific persuasion.


1998 ◽  
Vol 21 (2) ◽  
pp. 218-219
Author(s):  
Michael G. Shafto

Chow's book provides a thorough analysis of the confusing array of issues surrounding conventional tests of statistical significance. This book should be required reading for behavioral and social scientists. Chow concludes that the null-hypothesis significance-testing procedure (NHSTP) plays a limited, but necessary, role in the experimental sciences. Another possibility is that – owing in part to its metaphorical underpinnings and convoluted logic – the NHSTP is declining in importance in those few sciences in which it ever played a role.

