When null hypothesis significance testing is unsuitable for research: a reassessment

Mapping Intimacies ◽

10.1101/095570 ◽

2016 ◽

Cited By ~ 4

Author(s):

Denes Szucs ◽

John PA Ioannidis

Keyword(s):

Null Hypothesis ◽

Likelihood Estimation ◽

Psychological Research ◽

Biomedical Science ◽

Significance Testing ◽

Contributing Factors ◽

Null Hypothesis Significance Testing ◽

Educational Approach ◽

Negative Experience ◽

Crisis Of Psychology

Abstract. Null hypothesis significance testing (NHST) has several shortcomings that are likely contributing factors behind the widely debated replication crisis of psychology, cognitive neuroscience and biomedical science in general. We review these shortcomings and suggest that, after about 60 years of negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research. Different inferential methods (NHST, likelihood estimation, Bayesian methods, false-discovery rate control) may be most suitable for different types of research questions. Whenever researchers use NHST they should justify its use, and publish pre-study power calculations and effect sizes, including negative findings. Studies should optimally be pre-registered and raw data published. The current statistics-lite educational approach for students that has sustained the widespread, spurious use of NHST should be phased out. Instead, we should encourage either more in-depth statistical training of more researchers and/or more widespread involvement of professional statisticians in all research.

The Controversy over Null Hypothesis Significance Testing Revisited

Methodology ◽

10.1027/1614-1881.1.2.55 ◽

2005 ◽

Vol 1 (2) ◽

pp. 55-70 ◽

Cited By ~ 32

Author(s):

Nekane Balluerka ◽

Juana Gómez ◽

Dolores Hidalgo

Keyword(s):

Statistical Analysis ◽

Null Hypothesis ◽

Task Force ◽

Data Interpretation ◽

Psychological Research ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Research Activity ◽

Testing Hypotheses ◽

Rigorous Research

Abstract. Null hypothesis significance testing (NHST) is one of the most widely used methods for testing hypotheses in psychological research. However, it has remained shrouded in controversy throughout the almost seventy years of its existence. The present article reviews both the main criticisms of the method and the alternatives which have been put forward to complement or replace it. It focuses mainly on those alternatives whose use is recommended by the Task Force on Statistical Inference (TFSI) of the APA (Wilkinson and TFSI, 1999) in the interests of improving the working methods of researchers with respect to statistical analysis and data interpretation. In addition, the arguments used to reject each of the criticisms levelled against NHST are reviewed and the main problems with each of the alternatives are pointed out. It is concluded that rigorous research activity requires use of NHST in the appropriate context, the complementary use of other methods which provide information about aspects not addressed by NHST, and adherence to a series of recommendations which promote its rational use in psychological research.

Is social psychological research really so negatively biased?

Behavioral and Brain Sciences ◽

10.1017/s0140525x04340082 ◽

2004 ◽

Vol 27 (3) ◽

pp. 340-341 ◽

Cited By ~ 1

Author(s):

Aiden P. Gregg ◽

Constantine Sedikides

Keyword(s):

Social Psychology ◽

Null Hypothesis ◽

Psychological Research ◽

Negativity Bias ◽

Significance Testing ◽

Social Psychological ◽

Null Hypothesis Significance Testing ◽

Mental Abilities ◽

Social Psychological Research

Krueger & Funder (K&F) overstate the defects of Null Hypothesis Significance Testing (NHST), and with it the magnitude of negativity bias within social psychology. We argue that replication matters more than NHST, that the pitfalls of NHST are not always or necessarily realized, and that not all biases are harmless offshoots of adaptive mental abilities.

A Critical Discussion of Null Hypothesis Significance Testing and Statistical Power Analysis within Psychological Research

Nordic Psychology ◽

10.1027/1901-2276.59.3.223 ◽

2007 ◽

Vol 59 (3) ◽

pp. 223-230 ◽

Cited By ~ 2

Author(s):

Allan Jones ◽

Bo Sommerlund

Keyword(s):

Null Hypothesis ◽

Power Analysis ◽

Statistical Power ◽

Psychological Research ◽

Critical Discussion ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Statistical Power Analysis

Tests of Statistical Significance Made Sound

Educational and Psychological Measurement ◽

10.1177/0013164416667981 ◽

2016 ◽

Vol 77 (3) ◽

pp. 489-506 ◽

Cited By ~ 12

Author(s):

Brian D. Haig

Keyword(s):

Null Hypothesis ◽

Statistical Significance ◽

Psychological Research ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Alternative Account ◽

Definite Improvement ◽

Enormous Amount ◽

Uncritical Acceptance ◽

The Subject

This article considers the nature and place of tests of statistical significance (ToSS) in science, with particular reference to psychology. Despite the enormous amount of attention given to this topic, psychology’s understanding of ToSS remains deficient. The major problem stems from a widespread and uncritical acceptance of null hypothesis significance testing (NHST), which is an indefensible amalgam of ideas adapted from Fisher’s thinking on the subject and from Neyman and Pearson’s alternative account. To correct for the deficiencies of the hybrid, it is suggested that psychology avail itself of two important and more recent viewpoints on ToSS, namely the neo-Fisherian and the error-statistical perspectives. The neo-Fisherian perspective endeavors to improve on Fisher’s original account and rejects key elements of Neyman and Pearson’s alternative. In contrast, the error-statistical perspective builds on the strengths of both statistical traditions. It is suggested that these more recent outlooks on ToSS are a definite improvement on NHST, especially the error-statistical position. It is suggested that ToSS can play a useful, if limited, role in psychological research. At the end, some lessons learnt from the extensive debates about ToSS are presented.

A Frequentist Alternative to Significance Testing, p-Values, and Confidence Intervals

Econometrics ◽

10.3390/econometrics7020026 ◽

2019 ◽

Vol 7 (2) ◽

pp. 26 ◽

Cited By ~ 7

Author(s):

David Trafimow

Keyword(s):

Present Article ◽

Confidence Intervals ◽

Null Hypothesis ◽

A Priori ◽

Significance Testing ◽

Population Parameters ◽

Null Hypothesis Significance Testing ◽

P Values ◽

Statistical Procedures ◽

Major Section

There has been much debate about null hypothesis significance testing, p-values without null hypothesis significance testing, and confidence intervals. The first major section of the present article addresses some of the main reasons these procedures are problematic. The conclusion is that none of them are satisfactory. However, there is a new procedure, termed the a priori procedure (APP), that validly aids researchers in obtaining sample statistics that have acceptable probabilities of being close to their corresponding population parameters. The second major section provides a description and review of APP advances. Not only does the APP avoid the problems that plague other inferential statistical procedures, but it is easy to perform too. Although the APP can be performed in conjunction with other procedures, the present recommendation is that it be used alone.
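The abstract above does not give the APP formula, so the following is only a minimal sketch under a stated assumption: for a single mean from a normally distributed population, the required sample size is taken to be n = (z / f)^2, where f is the desired precision as a fraction of a standard deviation and z is the normal quantile for the desired confidence. The function name `app_sample_size` and its parameters are hypothetical names chosen for illustration.

```python
import math
from statistics import NormalDist


def app_sample_size(f: float, confidence: float) -> int:
    """Sample size so that the sample mean falls within f standard
    deviations of the population mean with the given probability.

    Assumption (not stated in the abstract): single mean, normal
    population, n = (z / f) ** 2.
    """
    if f <= 0:
        raise ValueError("f must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided normal quantile for the requested confidence.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # Round up: a fractional participant is not possible.
    return math.ceil((z / f) ** 2)


if __name__ == "__main__":
    for f in (0.1, 0.2, 0.4):
        print(f"f = {f}: n = {app_sample_size(f, 0.95)}")
```

Under this assumption the calculation is done before any data are collected, which is the sense in which the procedure is "a priori": the researcher fixes precision and confidence first and reads off the sample size.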

The Numbers Will Love You Back in Return—I Promise

International Journal of Sports Physiology and Performance ◽

10.1123/ijspp.2016-0214 ◽

2016 ◽

Vol 11 (4) ◽

pp. 551-554 ◽

Cited By ~ 53

Author(s):

Martin Buchheit

Keyword(s):

Sample Size ◽

Null Hypothesis ◽

Clinical Medicine ◽

Statistical Significance ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Sport Science ◽

Size Dependent ◽

Research Questions ◽

Per Se

The first sport-science-oriented and comprehensive paper on magnitude-based inferences (MBI) was published 10 years ago in the first issue of this journal. While debate continues, MBI is today well established in sport science and in other fields, particularly clinical medicine, where practical/clinical significance often takes priority over statistical significance. In this commentary, some reasons why both academics and sport scientists should abandon null-hypothesis significance testing and embrace MBI are reviewed. Apparent limitations and future areas of research are also discussed. The following arguments are presented: P values and, in turn, study conclusions are sample-size dependent, irrespective of the size of the effect; significance does not inform on magnitude of effects, yet magnitude is what matters the most; MBI allows authors to be honest with their sample size and better acknowledge trivial effects; the examination of magnitudes per se helps provide better research questions; MBI can be applied to assess changes in individuals; MBI improves data visualization; and MBI is supported by spreadsheets freely available on the Internet. Finally, recommendations to define the smallest important effect and improve the presentation of standardized effects are presented.
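The claim that P values depend on sample size irrespective of effect size can be illustrated with a small sketch. It assumes a two-group comparison with equal group sizes and known unit variance (a z-test approximation), so that the test statistic is z = d * sqrt(n / 2) for a standardized difference d. The function names and the chosen d = 0.2 are illustrative assumptions, not taken from the commentary.

```python
import math


def two_sided_p(d: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test when the observed
    standardized mean difference is d and each group has n_per_group
    observations (known, equal variances assumed)."""
    z = d * math.sqrt(n_per_group / 2)
    # P(|Z| > z) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


def p_by_sample_size(d: float, sizes: list[int]) -> dict[int, float]:
    """Map each per-group sample size to the p-value for the same d."""
    return {n: two_sided_p(d, n) for n in sizes}


if __name__ == "__main__":
    # Same observed effect, different sample sizes.
    for n, p in p_by_sample_size(0.2, [10, 50, 200, 1000]).items():
        print(f"n per group = {n:5d}  p = {p:.4f}")
```

With the observed effect held at d = 0.2, the p-value is about 0.32 at 50 per group and about 0.046 at 200 per group, so the "significant" verdict changes while the magnitude of the effect does not.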

In support of null hypothesis significance testing

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rsbl.2003.0105 ◽

2004 ◽

Vol 271 (suppl_3) ◽

Cited By ~ 12

Author(s):

Michael Mogie

Keyword(s):

Null Hypothesis ◽

Significance Testing ◽

Null Hypothesis Significance Testing

The Role of the Effect Size Measure in Contemporary Psychological Research

10.52363/dcpp-2021.2.9 ◽

2021 ◽

Author(s):

Валерій Боснюк

Keyword(s):

Null Hypothesis ◽

Significance Testing ◽

P Value ◽

Null Hypothesis Significance Testing

To confirm research results, psychological studies have for many years relied on null hypothesis significance testing (NHST) using dedicated statistical tests. In most cases the p-value is treated as equivalent to the importance of the results obtained and to the strength of the scientific evidence for the practical and theoretical effect of the study. Such incorrect use and interpretation of the p-value casts doubt on the application of statistics in general and threatens the development of psychology as a science. Equating a statistical conclusion with a scientific conclusion, focusing exclusively on novelty in research, researchers' ritual adherence to the 0.05 significance level, and reliance on categorical "yes/no" statistical decisions lead psychology to accumulate only findings that an effect exists, without regard to its magnitude or practical value. This paper analyses the limitations of the p-value in interpreting the results of psychological research and the advantages of reporting effect size information. Using effect sizes makes it possible to move from dichotomous thinking to estimation thinking, to judge the value of results independently of the level of statistical significance, and to make decisions more rationally and on better grounds. The paper argues that, in formulating the conclusions of a study, an author should not be limited to the single indicator of statistical significance. Meaningful conclusions should rest on a reasonable balance between the p-value and other, no less important parameters, one of which is effect size. An effect (difference, relationship, association) may be statistically significant while its practical (clinical) value is negligible or trivial. "Statistically significant" does not mean "useful", "important", "valuable", or "substantial". Psychologists' attention to analysing the observed effect size should therefore become obligatory when interpreting research results.

The Falsificationist Foundation for Null Hypothesis Significance Testing

Statistical and Fuzzy Approaches to Data Processing, with Applications to Econometrics and Other Areas - Studies in Computational Intelligence ◽

10.1007/978-3-030-45619-1_16 ◽

2020 ◽

pp. 219-226

Author(s):

David Trafimow

Keyword(s):

Null Hypothesis ◽

Significance Testing ◽

Null Hypothesis Significance Testing

The Harm Done to Reproducibility by the Culture of Null Hypothesis Significance Testing

American Journal of Epidemiology ◽

10.1093/aje/kwx261 ◽

2017 ◽

Vol 186 (6) ◽

pp. 627-635 ◽

Cited By ~ 30

Author(s):

Timothy L. Lash

Keyword(s):

Null Hypothesis ◽

Significance Testing ◽

Null Hypothesis Significance Testing