Null Hypothesis Significance Testing Defended and Calibrated by Bayesian Model Checking

Classical null hypothesis significance testing is limited to the rejection of the point-null hypothesis; it does not allow the interpretation of non-significant results. Moreover, studies with a sufficiently large sample size will find statistically significant results even when the effect is negligible and may be considered practically equivalent to the null effect. This leads to a publication bias against the null hypothesis. There are two main approaches to assessing null effects: (1) within the frequentist approach, shifting from the point-null to the interval-null hypothesis and considering practical significance; and (2) using Bayesian parameter inference based on posterior probabilities, or Bayesian model inference based on Bayes factors. Herein, we discuss these statistical methods with particular focus on the application of Bayesian parameter inference, as it is conceptually connected to both frequentist and Bayesian model inference. Although Bayesian methods have been theoretically elaborated and implemented in commonly used neuroimaging software, they are not widely used for null effect assessment. To demonstrate the advantages of Bayesian parameter inference, we compared it with classical null hypothesis significance testing for group-level analysis of fMRI data. We also consider the problem of choosing a threshold for a practically significant effect and discuss possible applications of Bayesian parameter inference in fMRI studies. We argue that Bayesian inference, which directly provides evidence for both the null and alternative hypotheses, may be more intuitive and convenient for practical use than frequentist inference, which only provides evidence against the null hypothesis. Moreover, it may indicate that the obtained data are not sufficient to make a confident inference. Because interim analysis is easy to perform using Bayesian inference, one can evaluate the data as the sample size increases and decide to terminate the experiment if the obtained data are sufficient to make a confident inference. To facilitate the application of Bayesian parameter inference to null effect assessment, scripts with a simple GUI were developed.
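As a rough illustration of the three-way outcome described in this abstract (evidence for the null, evidence for an effect, or insufficient data), the sketch below classifies an effect from a posterior distribution and an interval-null region. It is not the authors' implementation: the normal posterior, the function name `rope_decision`, the half-width `gamma` and the 0.95 probability threshold are all assumptions made for the example.

```python
from statistics import NormalDist


def rope_decision(post_mean, post_sd, gamma, prob_threshold=0.95):
    """Classify an effect from an (assumed) normal posterior.

    gamma is the assumed half-width of the interval null [-gamma, gamma],
    i.e. the largest effect still treated as practically equivalent to zero.
    """
    posterior = NormalDist(mu=post_mean, sigma=post_sd)
    p_null = posterior.cdf(gamma) - posterior.cdf(-gamma)  # P(|theta| <= gamma | data)
    p_positive = 1.0 - posterior.cdf(gamma)                # P(theta > gamma | data)
    p_negative = posterior.cdf(-gamma)                     # P(theta < -gamma | data)

    if p_null >= prob_threshold:
        label = "practically null"
    elif p_positive >= prob_threshold:
        label = "positive effect"
    elif p_negative >= prob_threshold:
        label = "negative effect"
    else:
        # No hypothesis reaches the threshold: collect more data.
        label = "insufficient data"
    return label, (p_null, p_positive, p_negative)


if __name__ == "__main__":
    for mean, sd in [(0.0, 0.02), (0.5, 0.1), (0.1, 0.2)]:
        print(mean, sd, rope_decision(mean, sd, gamma=0.1)[0])
```

Because the decision depends only on the current posterior, the same function could be re-run at each interim look as the sample grows, which is the sequential use the abstract mentions.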


A Frequentist Alternative to Significance Testing, p-Values, and Confidence Intervals

Econometrics ◽

10.3390/econometrics7020026 ◽

2019 ◽

Vol 7 (2) ◽

pp. 26 ◽

Cited By ~ 7

Author(s):

David Trafimow

Keyword(s):

Present Article ◽

Confidence Intervals ◽

Null Hypothesis ◽

A Priori ◽

Significance Testing ◽

Population Parameters ◽

Null Hypothesis Significance Testing ◽

P Values ◽

Statistical Procedures ◽

Major Section

There has been much debate about null hypothesis significance testing, p-values without null hypothesis significance testing, and confidence intervals. The first major section of the present article addresses some of the main reasons these procedures are problematic. The conclusion is that none of them are satisfactory. However, there is a new procedure, termed the a priori procedure (APP), that validly aids researchers in obtaining sample statistics that have acceptable probabilities of being close to their corresponding population parameters. The second major section provides a description and review of APP advances. Not only does the APP avoid the problems that plague other inferential statistical procedures, but it is easy to perform too. Although the APP can be performed in conjunction with other procedures, the present recommendation is that it be used alone.
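The abstract does not state a formula, so the following is only a sketch of the kind of calculation the a priori procedure involves, under the simplest assumptions: a single mean, a normally distributed population, and precision specified as a fraction `f` of the population standard deviation. Under those assumptions, the sample size needed for the sample mean to fall within `f` standard deviations of the population mean with probability `confidence` is (z / f)^2, rounded up. The function name `app_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def app_sample_size(f, confidence=0.95):
    """Smallest n such that P(|sample mean - mu| <= f * sigma) >= confidence.

    Assumes one normally distributed mean; f is the desired precision
    expressed as a fraction of the population standard deviation.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if f <= 0.0:
        raise ValueError("f must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    return math.ceil((z / f) ** 2)


if __name__ == "__main__":
    for f in (0.1, 0.2, 0.5):
        print(f, app_sample_size(f))
```

The calculation is made before any data are collected and involves no hypothesis test, which is the sense in which the procedure is "a priori".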


The Numbers Will Love You Back in Return—I Promise

International Journal of Sports Physiology and Performance ◽

10.1123/ijspp.2016-0214 ◽

2016 ◽

Vol 11 (4) ◽

pp. 551-554 ◽

Cited By ~ 53

Author(s):

Martin Buchheit

Keyword(s):

Sample Size ◽

Null Hypothesis ◽

Clinical Medicine ◽

Statistical Significance ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Sport Science ◽

Size Dependent ◽

Research Questions ◽

Per Se

The first sport-science-oriented and comprehensive paper on magnitude-based inferences (MBI) was published 10 y ago in the first issue of this journal. While debate continues, MBI is today well established in sport science and in other fields, particularly clinical medicine, where practical/clinical significance often takes priority over statistical significance. In this commentary, some reasons why both academics and sport scientists should abandon null-hypothesis significance testing and embrace MBI are reviewed. Apparent limitations and future areas of research are also discussed. The following arguments are presented: P values and, in turn, study conclusions are sample-size dependent, irrespective of the size of the effect; significance does not inform on magnitude of effects, yet magnitude is what matters the most; MBI allows authors to be honest with their sample size and better acknowledge trivial effects; the examination of magnitudes per se helps provide better research questions; MBI can be applied to assess changes in individuals; MBI improves data visualization; and MBI is supported by spreadsheets freely available on the Internet. Finally, recommendations to define the smallest important effect and improve the presentation of standardized effects are presented.
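The first argument in this abstract, that P values depend on sample size, can be illustrated with a small calculation. This is a sketch only: it assumes a one-sample z-test with known standard deviation and a fixed standardized effect `d`, and the function name `two_sided_p` is hypothetical. It does not implement MBI.

```python
import math
from statistics import NormalDist


def two_sided_p(d, n):
    """Two-sided p-value of a one-sample z-test (assumed design).

    d is the observed standardized effect (mean / SD), held fixed;
    n is the sample size, so the test statistic is d * sqrt(n).
    """
    z = d * math.sqrt(n)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Same small effect, increasing sample size: only the p-value changes.
    for n in (20, 200, 2000):
        print(n, two_sided_p(0.1, n))
```

With `d` held at 0.1, the p-value falls from well above 0.05 at n = 20 to far below it at n = 2000, although the magnitude of the effect is identical in every case.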


In support of null hypothesis significance testing

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rsbl.2003.0105 ◽

2004 ◽

Vol 271 (suppl_3) ◽

Cited By ~ 12

Author(s):

Michael Mogie

Keyword(s):

Null Hypothesis ◽

Significance Testing ◽

Null Hypothesis Significance Testing


The Role of the Effect Size Indicator in Modern Psychological Research

10.52363/dcpp-2021.2.9 ◽

2021 ◽

Author(s):

Валерій Боснюк

Keyword(s):

Null Hypothesis ◽

Significance Testing ◽

P Value ◽

Null Hypothesis Significance Testing

For many years, psychological research papers have used the null hypothesis significance testing procedure (commonly abbreviated NHST), together with dedicated statistical tests, to confirm study results. In most cases, the value of the "p" statistic (the p-value) is treated as equivalent to the importance of the obtained results and to the strength of the scientific evidence for the practical and theoretical effect of the study. Such incorrect use and interpretation of the p-value calls into question the use of statistics in general and threatens the development of psychology as a science. Equating a statistical conclusion with a scientific conclusion, focusing exclusively on novelty in research, researchers' ritual adherence to the 0.05 significance level, and reliance on categorical "yes/no" statistical decisions lead psychology to accumulate only findings about the presence of an effect, without regard to its magnitude or practical value. This paper analyzes the limitations of the p-value in interpreting the results of psychological research and the advantages of reporting effect size. Using effect sizes makes it possible to move from dichotomous thinking to estimation thinking, to judge the value of results independently of the level of statistical significance, and to make decisions more rationally and on firmer grounds. The paper argues that, when formulating a study's conclusions, an author should not rely on the level of statistical significance as the sole indicator. Meaningful conclusions should rest on a reasonable balance between the p-value and other equally important parameters, one of which is effect size. An effect (a difference, relationship, or association) can be statistically significant while its practical (clinical) value is negligible or trivial. "Statistically significant" does not mean "useful", "important", "valuable", or "substantial". Attention to the observed effect size should therefore become mandatory for psychologists when interpreting research results.


The Falsificationist Foundation for Null Hypothesis Significance Testing

Statistical and Fuzzy Approaches to Data Processing, with Applications to Econometrics and Other Areas - Studies in Computational Intelligence ◽

10.1007/978-3-030-45619-1_16 ◽

2020 ◽

pp. 219-226

Author(s):

David Trafimow

Keyword(s):

Null Hypothesis ◽

Significance Testing ◽

Null Hypothesis Significance Testing


The Harm Done to Reproducibility by the Culture of Null Hypothesis Significance Testing

American Journal of Epidemiology ◽

10.1093/aje/kwx261 ◽

2017 ◽

Vol 186 (6) ◽

pp. 627-635 ◽

Cited By ~ 30

Author(s):

Timothy L. Lash

Keyword(s):

Null Hypothesis ◽

Significance Testing ◽

Null Hypothesis Significance Testing


Will the numbers really love you back: Re-examining Magnitude-based Inference

10.31236/osf.io/e3vs6 ◽

2017 ◽

Cited By ~ 3

Author(s):

Michael Lloyd Butson

Keyword(s):

Sports Medicine ◽

Null Hypothesis ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Full Account ◽

Sports Science ◽

Theoretical Foundations

Many sports medicine and sports science researchers use Null Hypothesis Significance Testing despite it being criticized for being an amalgam of two irreconcilable methodologies. Hopkins and Batterham proposed Magnitude-based Inference as an alternative to Null Hypothesis Significance Testing; however, its validity and utility have also been questioned. Recently, it was suggested that the critics of Magnitude-based Inference lacked vision and that their objections to Magnitude-based Inference should be ignored. However, a re-examination of Hopkins and Batterham’s explanation of their method indicates that they use profoundly different approaches in ways that are at odds with their theoretical foundations and intended purposes. If Hopkins and Batterham were to provide a full account of how their method is implemented, it could be comprehensively assessed. Until then, sports medicine and sports science researchers should use other theoretically valid methods that have had their utility established.


On the Potential Mismatch between the Function of the Bayes Factor and Researchers’ Expectations

10.31234/osf.io/86p4k ◽

2021 ◽

Author(s):

Tsz Keung Wong ◽

Henk Kiers ◽

Jorge Tendeiro

Keyword(s):

Null Hypothesis ◽

Bayes Factor ◽

Survey Study ◽

Significance Testing ◽

Null Hypothesis Significance Testing ◽

Posterior Odds ◽

Statistical Tool ◽

Reporting Practices ◽

Insight Into

The aim of this study is to investigate whether there is a potential mismatch between the usability of a statistical tool and psychology researchers’ expectations of it. Bayesian statistics is often promoted as an ideal substitute for frequentist statistics since it coincides better with researchers’ expectations and needs. A particular instance of this is the proposal to replace Null Hypothesis Significance Testing (NHST) with Null Hypothesis Bayesian Testing (NHBT) using the Bayes factor. This paper studies the extent to which the usability of NHBT matches researchers’ expectations of it. First, a study of the reporting practices in 73 psychological publications was carried out. It was found that eight Questionable Reporting and Interpreting Practices (QRIPs) occur more than once among practitioners when doing NHBT. Specifically, our analysis provides insight into possible mismatches and their occurrence frequencies. A follow-up survey study was conducted to assess such mismatches. The sample (N = 108) consisted of psychology researchers, experts in methodology (and/or statistics), and applied researchers in fields other than psychology. The data show that discrepancies exist among the participants. Interpreting the Bayes factor as posterior odds and not acknowledging the notion of relative evidence in the Bayes factor are arguably the most concerning ones. The results of the paper suggest that a shift of statistical paradigm cannot solve the problem of misinterpretation altogether if the users are not well acquainted with the tools.
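The misinterpretation singled out in this abstract, reading the Bayes factor as posterior odds, can be made concrete with the standard identity posterior odds = Bayes factor × prior odds. The sketch below is illustrative only; the function names and the example numbers are assumptions, not values from the study.

```python
def posterior_odds(bayes_factor_10, prior_odds_10=1.0):
    """Posterior odds of H1 over H0 from a Bayes factor and prior odds."""
    return bayes_factor_10 * prior_odds_10


def odds_to_probability(odds):
    """Convert odds in favour of H1 into a posterior probability of H1."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    bf10 = 6.0  # hypothetical Bayes factor: data 6 times more likely under H1
    for prior in (1.0, 0.1):
        odds = posterior_odds(bf10, prior)
        print(prior, odds, odds_to_probability(odds))
```

The two quantities coincide only when the prior odds are 1. With the same Bayes factor of 6 and prior odds of 1 to 10 against H1, the posterior odds are 0.6, so H0 remains the more probable hypothesis.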


Null Hypothesis Significance Testing: Ramifications, Ruminations and Recommendations

South African Journal of Psychology ◽

10.1177/008124630503500101 ◽

2005 ◽

Vol 35 (1) ◽

pp. 1-20 ◽

Cited By ~ 2

Author(s):

G. K. Huysamen

Keyword(s):

Sample Size ◽

Confidence Intervals ◽

Effect Size ◽

Null Hypothesis ◽

Significance Testing ◽

Population Parameter ◽

Size Estimation ◽

Null Hypothesis Significance Testing ◽

Point Estimates ◽

Size Estimates

Criticisms of traditional null hypothesis significance testing (NHST) became more pronounced during the 1960s and reached a climax during the past decade. Among other things, NHST says nothing about the size of the population parameter of interest, and its result is influenced by sample size. Estimation of confidence intervals around point estimates of the relevant parameters, model fitting and Bayesian statistics represent some major departures from conventional NHST. Testing non-nil null hypotheses, determining optimal sample size to uncover only substantively meaningful effect sizes and reporting effect-size estimates may be regarded as minor extensions of NHST. Although there seems to be growing support for the estimation of confidence intervals around point estimates of the relevant parameters, it is unlikely that NHST-based procedures will disappear in the near future. In the meantime, it is widely accepted that effect-size estimates should be reported as a mandatory adjunct to conventional NHST results.
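A minimal sketch of the reporting practice recommended here, a point estimate accompanied by a confidence interval and a standardized effect size, might look as follows. It assumes a single sample, uses a large-sample normal approximation for the interval (a t-based interval would be wider for small samples), and the function name `summarize` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize(sample, null_value=0.0, confidence=0.95):
    """Point estimate, approximate confidence interval, and Cohen's d.

    The interval uses the normal approximation mean +/- z * s / sqrt(n);
    Cohen's d is the distance from null_value in sample-SD units.
    """
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    half_width = z * s / math.sqrt(n)
    return {
        "mean": m,
        "ci": (m - half_width, m + half_width),
        "cohens_d": (m - null_value) / s,
    }


if __name__ == "__main__":
    print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Reported together, the interval conveys the precision of the estimate and the effect size conveys its magnitude, which are the two pieces of information the abstract notes a bare NHST result leaves out.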
