The logic of null hypothesis testing

1998 ◽  
Vol 21 (2) ◽  
pp. 197-198 ◽  
Author(s):  
Edward Erwin

In this commentary, I agree with Chow's treatment of null hypothesis significance testing as a noninferential procedure. However, I dispute his reconstruction of the logic of theory corroboration. I also challenge recent criticisms of NHSTP based on power analysis and meta-analysis.

2017 ◽  
Author(s):  
Ivan Flis

The goal of the study was to descriptively analyze how Croatian psychology students understand null hypothesis significance testing, given the way it is usually presented in textbooks, a presentation that has attracted both Bayesian and interpretative criticism. The thesis also gives a short overview of the debates on the meaning of significance testing and on how it is taught to students. There were 350 participants from undergraduate and graduate programs at five faculties in Croatia (Zagreb: Centre for Croatian Studies and Faculty of Humanities and Social Sciences; Rijeka; Zadar; Osijek). A further goal was to ascertain whether psychology students' understanding of null hypothesis testing can be predicted by their grades, attitudes, and interests. The level of understanding was measured with the Test of statistical significance misinterpretations (NHST test; Oakes, 1986; Haller and Krauss, 2002). Attitudes toward null hypothesis significance testing were measured with a questionnaire constructed for this study. Grades were operationalized as the grade average of courses taken during undergraduate studies and, separately, as the grade average of methodological courses taken during undergraduate and graduate studies. The students showed limited understanding of null hypothesis testing: the percentage of correct answers on the NHST test did not exceed 56% for any of the six items. Croatian students also showed less understanding on every item than the German students in Haller and Krauss's (2002) study. None of the predictors (general grade average, average in the methodological courses, the two variables measuring attitudes toward null hypothesis significance testing, having failed at least one methodological course, and main interest within psychology) predicted the odds of answering the NHST test items correctly.
The study concludes that the way the meaning and interpretation of null hypothesis significance testing is taught at Croatian psychology departments needs to be reconsidered.


2000 ◽  
Vol 23 (2) ◽  
pp. 292-293 ◽  
Author(s):  
Brian D. Haig

Chow's endorsement of a limited role for null hypothesis significance testing is a needed corrective of research malpractice, but his decision to place this procedure in a hypothetico-deductive framework of Popperian cast is unwise. Various failures of this version of the hypothetico-deductive method have negative implications for Chow's treatment of significance testing, meta-analysis, and theory evaluation.


Author(s):  
Prathiba Natesan Batley ◽  
Peter Boedeker ◽  
Anthony J. Onwuegbuzie

In this editorial, we introduce the multimethod concept of thinking meta-generatively, which we define as directly integrating findings from the extant literature during the data collection, analysis, and interpretation phases of primary studies. We demonstrate that meta-generative thinking goes further than do other research synthesis techniques (e.g., meta-analysis) because it involves meta-synthesis not only across studies but also within studies, thereby representing a multimethod approach. We describe how meta-generative thinking can be optimized with respect to quantitative research data and findings via Bayesian methodology, which has been shown to be superior to the inherently flawed null hypothesis significance testing. We contend that Bayesian meta-generative thinking is essential, given the potential for divisiveness and the far-reaching sociopolitical, educational, and health policy implications of findings that lack generativity in a post-truth and COVID-19 era.


2016 ◽  
Vol 77 (4) ◽  
pp. 663-672 ◽  
Author(s):  
Jeff Miller

Critics of null hypothesis significance testing suggest that (a) its basic logic is invalid and (b) it addresses a question that is of no interest. In contrast to (a), I argue that the underlying logic of hypothesis testing is actually extremely straightforward and compelling. To substantiate that, I present examples showing that hypothesis testing logic is routinely used in everyday life. These same examples also refute (b) by showing circumstances in which the logic of hypothesis testing addresses a question of prime interest. Null hypothesis significance testing may sometimes be misunderstood or misapplied, but these problems should be addressed by improved education.


2021 ◽  
Vol 70 (2) ◽  
pp. 123-133
Author(s):  
Norbert Hirschauer ◽  
Sven Grüner ◽  
Oliver Mußhoff ◽  
Claudia Becker

It has often been noted that the “null-hypothesis-significance-testing” (NHST) framework is an inconsistent hybrid of Neyman-Pearson’s “hypothesis testing” and Fisher’s “significance testing” that almost inevitably causes misinterpretations. To facilitate a realistic assessment of the potential and the limits of statistical inference, we briefly recall widespread inferential errors and outline the two original approaches of these famous statisticians. Based on an understanding of their irreconcilable perspectives, we propose “going back to the roots” and using the initial evidence in the data, in terms of the size and the uncertainty of the estimate, for the purpose of statistical inference. Finally, we make six propositions that we hope will contribute to improving the quality of inferences in future research.
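The “size and uncertainty of the estimate” that the abstract refers to can be illustrated with a minimal sketch: reporting a point estimate together with its standard error and an interval estimate, rather than a binary significance verdict. The data values below are purely illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical sample of difference scores (illustrative data only)
data = [1.2, 0.4, -0.3, 2.1, 0.8, 1.5, -0.1, 0.9, 1.1, 0.6]

n = len(data)
mean = sum(data) / n                                # point estimate (size)
var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
se = math.sqrt(var / n)                             # standard error (uncertainty)

# Approximate 95% interval estimate. With n = 10, a t-quantile of about
# 2.26 would be more accurate; the normal value 1.96 keeps the sketch simple.
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(f"estimate = {mean:.2f}, SE = {se:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Reporting all three numbers conveys the evidence in the data directly, which is the kind of inferential summary the authors propose in place of the hybrid NHST ritual.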


2021 ◽  
pp. 174569162097055
Author(s):  
Nick J. Broers

One particular weakness of psychology that was left implicit by Meehl is the fact that psychological theories tend to be verbal theories, permitting at best ordinal predictions. Such predictions do not enable the high-risk tests that would strengthen our belief in the verisimilitude of theories but instead lead to the practice of null-hypothesis significance testing, a practice Meehl believed to be a major reason for the slow theoretical progress of soft psychology. The rising popularity of meta-analysis has led some to argue that we should move away from significance testing and focus on the size and stability of effects instead. Proponents of this reform assume that a greater emphasis on quantity can help psychology to develop a cumulative body of knowledge. The crucial question in this endeavor is whether the resulting numbers really have theoretical meaning. Psychological science lacks an undisputed, preexisting domain of observations analogous to the observations in the space-time continuum in physics. It is argued that, for this reason, effect sizes do not really exist independently of the adopted research design that led to their manifestation. Consequently, they can have no bearing on the verisimilitude of a theory.


Psychology ◽  
2020 ◽  
Author(s):  
David Trafimow

There are two main inferential statistical camps in psychology: frequentists and Bayesians. Within the frequentist camp, most researchers support the null hypothesis significance testing procedure but support is growing for using confidence intervals. The Bayesian camp holds a diversity of views that cannot be covered adequately here. Many researchers advocate power analysis to determine sample sizes. Finally, the a priori procedure is a promising new way to think about inferential statistics.


2020 ◽  
Vol Publish Ahead of Print ◽  
Author(s):  
Kristen J. Nicholson ◽  
Ariana A. Reyes ◽  
Matthew Sherman ◽  
Srikanth N. Divi ◽  
Alex R. Vaccaro

2017 ◽  
Author(s):  
Robbie Cornelis Maria van Aert ◽  
Marcel A. L. M. van Assen

The vast majority of published results in the literature are statistically significant, which raises concerns about their reliability. The Reproducibility Project: Psychology (RPP) and the Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. The original study and its replication were both statistically significant in 36.1% of cases in RPP and 68.8% in EE-RP, suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and that quantifies the amount of evidence in favor of a zero, small, medium, and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the method’s performance and demonstrate the necessity of controlling for the original study’s significance to enable the accumulation of evidence for a true zero effect. We then apply the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the studies included in EE-RP are generally larger than those in RPP, but that the sample sizes, especially of the studies included in RPP, are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of a replication, akin to power analysis in null hypothesis significance testing, and we present an easy-to-use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method.
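The core idea of posterior model probabilities over a zero, small, medium, and large effect can be sketched in a few lines. The sketch below is a deliberately simplified illustration under assumed inputs: it uses a normal likelihood and equal prior model probabilities, and it omits the publication-bias correction (the truncation of the original study's likelihood at its significance threshold) that distinguishes the actual snapshot hybrid method. The example estimate and standard error are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_model_probs(estimate, se, effects=(0.0, 0.2, 0.5, 0.8)):
    """Posterior probabilities of zero/small/medium/large true effects,
    assuming a normal likelihood and equal prior model probabilities.
    NOTE: the published snapshot hybrid additionally corrects for the
    original study's statistical significance; that step is omitted here.
    """
    likelihoods = [normal_pdf(estimate, delta, se) for delta in effects]
    total = sum(likelihoods)
    return {delta: lik / total for delta, lik in zip(effects, likelihoods)}

# Hypothetical replication result: observed effect 0.1, standard error 0.15
probs = posterior_model_probs(0.1, 0.15)
```

With these inputs the probability mass concentrates on the zero and small effects, which is exactly the kind of statement about evidence for the null that NHST cannot provide.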

