A Primer on p-Value Thresholds and α-Levels – Two Different Kettles of Fish

2021 ◽  
Vol 70 (2) ◽  
pp. 123-133
Author(s):  
Norbert Hirschauer ◽  
Sven Grüner ◽  
Oliver Mußhoff ◽  
Claudia Becker

It has often been noted that the “null-hypothesis-significance-testing” (NHST) framework is an inconsistent hybrid of Neyman-Pearson’s “hypothesis testing” and Fisher’s “significance testing” that almost inevitably causes misinterpretations. To facilitate a realistic assessment of the potential and the limits of statistical inference, we briefly recall widespread inferential errors and outline the two original approaches of these famous statisticians. Based on the understanding of their irreconcilable perspectives, we propose “going back to the roots” and using the initial evidence in the data in terms of the size and the uncertainty of the estimate for the purpose of statistical inference. Finally, we make six propositions that hopefully contribute to improving the quality of inferences in future research.

2020 ◽  
Author(s):  
Norbert Hirschauer ◽  
Sven Gruener ◽  
Oliver Mußhoff ◽  
Claudia Becker

It has often been noted that the “null-hypothesis-significance-testing” (NHST) framework is an inconsistent hybrid of Neyman-Pearson’s “hypotheses testing” and Fisher’s “significance testing” approach that almost inevitably causes misinterpretations. To facilitate a realistic assessment of the potential and the limits of statistical inference, we briefly recall widespread inferential errors and outline the two original approaches of these famous statisticians. Based on the understanding of their irreconcilable perspectives, we propose “going back to the roots” and using the initial evidence in the data in terms of the size and the uncertainty of the estimate for the purpose of statistical inference. Finally, we make six propositions that hopefully contribute to improving the quality of inferences in future research.


2021 ◽  
Author(s):  
Валерій Боснюк

For many years, psychological research has relied on null hypothesis significance testing (NHST), applied through dedicated statistical tests, to confirm study results. In most cases the p-value is treated as equivalent to the importance of the findings and to the strength of the scientific evidence for a practical and theoretical effect. Such incorrect use and interpretation of the p-value casts doubt on the use of statistics in general and threatens the development of psychology as a science. Equating statistical conclusions with scientific conclusions, focusing exclusively on novelty in research, researchers’ ritual adherence to the 0.05 significance level, and reliance on categorical “yes/no” statistical decisions mean that psychology accumulates only findings that an effect exists, without regard to its magnitude or practical value. This paper analyzes the limitations of the p-value in interpreting the results of psychological research and the advantages of reporting effect sizes. Using effect sizes allows a shift from dichotomous thinking to estimation thinking, makes it possible to judge the value of results independently of the level of statistical significance, and supports more rational and well-founded decisions. The paper argues that authors should not limit themselves to a single indicator, the level of statistical significance, when formulating their conclusions. Meaningful conclusions should rest on a sensible balance between the p-value and other equally important parameters, one of which is effect size. An effect (difference, relationship, association) can be statistically significant while its practical (clinical) value is negligible or trivial. “Statistically significant” does not mean “useful”, “important”, “valuable”, or “substantial”. Attention to the size of the detected effect should therefore become mandatory for psychologists when interpreting research results.
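The gap between statistical and practical significance described in this abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: the function name `two_sample_z_test` and all input numbers are hypothetical, and it assumes two equal-sized groups with a known common standard deviation, so that a simple z-test applies.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean_diff, sd, n_per_group):
    """Return (cohens_d, two_sided_p) for two equal-sized groups.

    Assumes a known common standard deviation `sd` (a simplification
    made for this illustration).
    """
    d = mean_diff / sd                    # standardized effect size (Cohen's d)
    z = d * math.sqrt(n_per_group / 2)    # mean_diff / (sd * sqrt(2 / n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return d, p

# Hypothetical inputs: a tiny raw difference measured in a very large sample.
d, p = two_sample_z_test(mean_diff=0.2, sd=10.0, n_per_group=50_000)
print(f"Cohen's d = {d:.3f}, p = {p:.4f}")
```

With these assumed inputs the p-value falls below 0.05 even though the standardized effect is only 0.02, which is the situation the abstract warns about: the test flags an effect whose magnitude may be practically trivial.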


2017 ◽  
Author(s):  
Ivan Flis

The goal of the study was to describe how Croatian psychology students understand null hypothesis significance testing, given the way it is usually presented in textbooks, a presentation that is subject to Bayesian and interpretative criticism. The thesis also gives a short overview of the discussions on the meaning of significance testing and how it is taught to students. There were 350 participants from undergraduate and graduate programs at five faculties in Croatia (Zagreb: Centre for Croatian Studies and Faculty of Humanities and Social Sciences; Rijeka; Zadar; Osijek). Another goal was to ascertain whether students’ understanding of null hypothesis testing can be predicted by their grades, attitudes and interests. The level of understanding was measured by the Test of statistical significance misinterpretations (NHST test) (Oakes, 1986; Haller and Krauss, 2002). Attitudes toward null hypothesis significance testing were measured by a questionnaire constructed for this study. Grades were operationalized as the grade average of courses taken during undergraduate studies, and as a separate grade average of methodological courses taken during undergraduate and graduate studies. The students showed limited understanding of null hypothesis testing: the percentage of correct answers in the NHST test was not higher than 56% for any of the six items. Croatian students also showed less understanding on each item than the German students in Haller and Krauss’s (2002) study. None of the variables (general grade average, average in the methodological courses, two variables measuring the attitude toward null hypothesis significance testing, failing at least one methodological course, and the variable of main interest in psychology) predicted the odds of answering the items in the NHST test correctly. The study concludes that the way the meaning and interpretation of null hypothesis significance testing are taught at Croatian psychology departments needs to be reexamined.


2019 ◽  
Author(s):  
Jan Sprenger

The replication crisis poses an enormous challenge to the epistemic authority of science and the logic of statistical inference in particular. Two prominent features of Null Hypothesis Significance Testing (NHST) arguably contribute to the crisis: the lack of guidance for interpreting non-significant results and the impossibility of quantifying support for the null hypothesis. In this paper, I argue that popular alternatives to NHST, such as confidence intervals and Bayesian inference, also fail to provide a satisfactory logic for evaluating hypothesis tests. As an alternative, I motivate and explicate the concept of corroboration of the null hypothesis. Finally, I show how degrees of corroboration give an interpretation to non-significant results, combat publication bias and mitigate the replication crisis.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Matthias Haucke ◽  
Jonas Miosga ◽  
Rink Hoekstra ◽  
Don van Ravenzwaaij

A majority of statistically educated scientists draw incorrect conclusions based on the most commonly used statistical technique: null hypothesis significance testing (NHST). Frequentist techniques are often claimed to be incorrectly interpreted as Bayesian outcomes, which suggests that a Bayesian framework may better fit the inferences researchers frequently want to make (Briggs, 2012). The current study set out to test this proposition. First, we investigated whether there is a discrepancy between what researchers think they can conclude and what they want to be able to conclude from NHST. Second, we investigated to what extent researchers want to incorporate prior study results and their personal beliefs into their statistical inference. Results show the expected discrepancy between what researchers think they can conclude from NHST and what they want to be able to conclude. Furthermore, researchers were interested in incorporating prior study results, but not their personal beliefs, into their statistical inference.


F1000Research ◽  
2017 ◽  
Vol 4 ◽  
pp. 621
Author(s):  
Cyril Pernet

Although thoroughly criticized, null hypothesis significance testing (NHST) remains the statistical method of choice used to provide evidence for an effect in the biological, biomedical and social sciences. In this short guide, I first summarize the concepts behind the method, distinguishing tests of significance (Fisher) from tests of acceptance (Neyman-Pearson), and point to common interpretation errors regarding the p-value. I then present the related concepts of confidence intervals and again point to common interpretation errors. Finally, I discuss what should be reported in which context. The goal is to clarify concepts to avoid interpretation errors and propose simple reporting practices.


Author(s):  
David McGiffin ◽  
Geoff Cumming ◽  
Paul Myles

Null hypothesis significance testing (NHST) and p-values are widespread in the cardiac surgical literature but are frequently misunderstood and misused. The purpose of the review is to discuss major disadvantages of p-values and suggest alternatives. We describe diagnostic tests, the prosecutor’s fallacy in the courtroom, and NHST, which involve inter-related conditional probabilities, to help clarify the meaning of p-values, and discuss the enormous sampling variability, or unreliability, of p-values. Finally, we use a cardiac surgical database and simulations to explore further issues involving p-values. In clinical studies, p-values provide a poor summary of the observed treatment effect, whereas the three-number summary provided by effect estimates and confidence intervals is more informative and minimises over-interpretation of a “significant” result. P-values are an unreliable measure of strength of evidence; if used at all they give only, at best, a very rough guide to decision making. Researchers should adopt Open Science practices to improve the trustworthiness of research and, where possible, use estimation (three-number summaries) or other better techniques.
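As a rough illustration of the three-number summary mentioned in this abstract (a point estimate with the lower and upper bounds of its confidence interval), the sketch below computes one for a difference in means. It is not the authors' method: the function name `three_number_summary` and the data are hypothetical, and the interval uses a normal approximation with unpooled variances, which is only a simplification for small samples.

```python
import math
import statistics

def three_number_summary(treated, control, z=1.96):
    """Return (estimate, lower, upper) for the difference in means.

    Uses a normal-approximation interval (z = 1.96 for roughly 95%)
    with unpooled sample variances; an assumption made for simplicity.
    """
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, diff - z * se, diff + z * se

# Hypothetical illustration data, not taken from any study.
treated = [5.1, 6.0, 5.8, 6.4, 5.5, 6.1, 5.9, 6.3]
control = [5.0, 5.2, 5.6, 5.1, 5.4, 5.3, 5.7, 5.5]

estimate, lower, upper = three_number_summary(treated, control)
print(f"difference = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Reporting the three numbers together shows both the size of the observed effect and how uncertain it is, information that a single p-value does not convey.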


2016 ◽  
Vol 77 (4) ◽  
pp. 663-672 ◽  
Author(s):  
Jeff Miller

Critics of null hypothesis significance testing suggest that (a) its basic logic is invalid and (b) it addresses a question that is of no interest. In contrast to (a), I argue that the underlying logic of hypothesis testing is actually extremely straightforward and compelling. To substantiate that, I present examples showing that hypothesis testing logic is routinely used in everyday life. These same examples also refute (b) by showing circumstances in which the logic of hypothesis testing addresses a question of prime interest. Null hypothesis significance testing may sometimes be misunderstood or misapplied, but these problems should be addressed by improved education.


F1000Research ◽  
2016 ◽  
Vol 4 ◽  
pp. 621 ◽  
Author(s):  
Cyril Pernet

Although thoroughly criticized, null hypothesis significance testing (NHST) remains the statistical method of choice used to provide evidence for an effect in the biological, biomedical and social sciences. In this short tutorial, I first summarize the concepts behind the method, distinguishing tests of significance (Fisher) from tests of acceptance (Neyman-Pearson), and point to common interpretation errors regarding the p-value. I then present the related concepts of confidence intervals and again point to common interpretation errors. Finally, I discuss what should be reported in which context. The goal is to clarify concepts to avoid interpretation errors and propose reporting practices.

