Degrees of Corroboration: An Antidote to the Replication Crisis

2019 ◽  
Author(s):  
Jan Sprenger

The replication crisis poses an enormous challenge to the epistemic authority of science and the logic of statistical inference in particular. Two prominent features of Null Hypothesis Significance Testing (NHST) arguably contribute to the crisis: the lack of guidance for interpreting non-significant results and the impossibility of quantifying support for the null hypothesis. In this paper, I argue that popular alternatives to NHST, such as confidence intervals and Bayesian inference, also fail to provide a satisfactory logic for evaluating hypothesis tests. As an alternative, I motivate and explicate the concept of corroboration of the null hypothesis. Finally, I show how degrees of corroboration give an interpretation to non-significant results, combat publication bias and mitigate the replication crisis.

2019 ◽  
Author(s):  
Felipe Romero ◽  
Jan Sprenger

The enduring replication crisis in many scientific disciplines casts doubt on the ability of science to self-correct its findings and to produce reliable knowledge. Amongst a variety of possible methodological, social, and statistical reforms to address the crisis, we focus on replacing null hypothesis significance testing (NHST) with Bayesian inference. On the basis of a simulation study for meta-analytic aggregation of effect sizes, we study the relative advantages of this Bayesian reform, and its interaction with widespread limitations in experimental research. Moving to Bayesian statistics would not solve the replication crisis single-handedly, but it would eliminate important sources of effect size overestimation for the conditions we study.
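The meta-analytic aggregation mentioned above typically combines study effect sizes by inverse-variance weighting. The following is a generic sketch of that fixed-effect pooling step, not the authors' simulation code; the study values are hypothetical.

```python
# Fixed-effect meta-analysis: inverse-variance weighted pooling of effect sizes.
# A minimal generic sketch; not the authors' simulation code.

def pooled_effect(effects, variances):
    """Combine study effect sizes using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)  # variance of the pooled estimate
    return pooled, pooled_var

# Three hypothetical studies: more precise studies (smaller variance) get more weight.
effects = [0.40, 0.25, 0.60]
variances = [0.04, 0.01, 0.09]
est, var = pooled_effect(effects, variances)
```

Because the most precise study (variance 0.01) reports the smallest effect, the pooled estimate is pulled toward it; publication bias distorts exactly this aggregation when small significant studies are overrepresented.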


Econometrics ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 26 ◽  
Author(s):  
David Trafimow

There has been much debate about null hypothesis significance testing, p-values without null hypothesis significance testing, and confidence intervals. The first major section of the present article addresses some of the main reasons these procedures are problematic. The conclusion is that none of them are satisfactory. However, there is a new procedure, termed the a priori procedure (APP), that validly aids researchers in obtaining sample statistics that have acceptable probabilities of being close to their corresponding population parameters. The second major section provides a description and review of APP advances. Not only does the APP avoid the problems that plague other inferential statistical procedures; it is also easy to perform. Although the APP can be performed in conjunction with other procedures, the present recommendation is that it be used alone.
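In its simplest published case (estimating a normal mean), the APP answers: how many participants are needed so that the sample mean falls within a chosen fraction f of a standard deviation of the population mean, with a chosen probability c? A minimal sketch of that calculation, assuming the basic normal case only:

```python
import math
from statistics import NormalDist

def app_sample_size(precision_f, confidence_c):
    """A priori procedure, simplest normal case: sample size needed so the
    sample mean lies within `precision_f` standard deviations of the
    population mean with probability `confidence_c`."""
    z = NormalDist().inv_cdf((1 + confidence_c) / 2)  # two-sided critical value
    return math.ceil((z / precision_f) ** 2)

# e.g. to be 95% confident the sample mean is within 0.1 standard
# deviations of the population mean:
n = app_sample_size(0.1, 0.95)  # → 385
```

Note that the decision about precision is made before data collection, which is what distinguishes the APP from post hoc significance testing.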


2005 ◽  
Vol 35 (1) ◽  
pp. 1-20 ◽  
Author(s):  
G. K. Huysamen

Criticisms of traditional null hypothesis significance testing (NHST) became more pronounced during the 1960s and reached a climax during the past decade. Among other criticisms, NHST says nothing about the size of the population parameter of interest, and its result is influenced by sample size. Estimation of confidence intervals around point estimates of the relevant parameters, model fitting and Bayesian statistics represent some major departures from conventional NHST. Testing non-nil null hypotheses, determining optimal sample size to uncover only substantively meaningful effect sizes and reporting effect-size estimates may be regarded as minor extensions of NHST. Although there seems to be growing support for the estimation of confidence intervals around point estimates of the relevant parameters, it is unlikely that NHST-based procedures will disappear in the near future. In the meantime, it is widely accepted that effect-size estimates should be reported as a mandatory adjunct to conventional NHST results.


2009 ◽  
Vol 217 (1) ◽  
pp. 15-26 ◽  
Author(s):  
Geoff Cumming ◽  
Fiona Fidler

Most questions across science call for quantitative answers, ideally, a single best estimate plus information about the precision of that estimate. A confidence interval (CI) expresses both efficiently. Early experimental psychologists sought quantitative answers, but for the last half century psychology has been dominated by the nonquantitative, dichotomous thinking of null hypothesis significance testing (NHST). The authors argue that psychology should rejoin mainstream science by asking better questions – those that demand quantitative answers – and using CIs to answer them. They explain CIs and a range of ways to think about them and use them to interpret data, especially by considering CIs as prediction intervals, which provide information about replication. They explain how to calculate CIs on means, proportions, correlations, and standardized effect sizes, and illustrate symmetric and asymmetric CIs. They also argue that information provided by CIs is more useful than that provided by p values, or by values of Killeen’s prep, the probability of replication.
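A CI on a mean of the kind the authors describe can be computed in a few lines. This sketch uses the normal approximation for simplicity (for small samples a t critical value would be more appropriate); the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def ci_mean(sample, level=0.95):
    """Confidence interval for a mean, using the normal approximation.
    For small samples, a t critical value would widen the interval."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # e.g. 1.96 for a 95% CI
    m = mean(sample)
    se = stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    return m - z * se, m + z * se

data = [4.1, 5.3, 4.8, 5.0, 4.6, 5.2, 4.9, 5.1, 4.7, 4.4]
lo, hi = ci_mean(data)
```

The interval reports both the point estimate (its midpoint) and the precision of that estimate (its width), which is exactly the quantitative answer the authors argue a bare p value cannot supply.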


2021 ◽  
Author(s):  
Mark Rubin

Scientists often adjust their significance threshold (alpha level) during null hypothesis significance testing in order to take into account multiple testing and multiple comparisons. This alpha adjustment has become particularly relevant in the context of the replication crisis in science. The present article considers the conditions in which this alpha adjustment is appropriate and the conditions in which it is inappropriate. A distinction is drawn between three types of multiple testing: disjunction testing, conjunction testing, and individual testing. It is argued that alpha adjustment is only appropriate in the case of disjunction testing, in which at least one test result must be significant in order to reject the associated joint null hypothesis. Alpha adjustment is inappropriate in the case of conjunction testing, in which all relevant results must be significant in order to reject the joint null hypothesis. Alpha adjustment is also inappropriate in the case of individual testing, in which each individual result must be significant in order to reject each associated individual null hypothesis. The conditions under which each of these three types of multiple testing is warranted are examined. It is concluded that researchers should not automatically (mindlessly) assume that alpha adjustment is necessary during multiple testing. Illustrations are provided in relation to joint studywise hypotheses and joint multiway ANOVAwise hypotheses.
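As a concrete illustration of alpha adjustment under disjunction testing, the sketch below applies a Bonferroni correction, one common adjustment, though not necessarily the one the article endorses, to a set of hypothetical p values:

```python
def bonferroni_alpha(alpha, m):
    """Adjusted per-test alpha for disjunction testing: the joint null is
    rejected if at least one of m tests is significant, so each test is
    held to a stricter threshold to keep the familywise error rate at alpha."""
    return alpha / m

# Three tests of one joint null hypothesis, familywise alpha = .05,
# so each individual test is evaluated at .05 / 3 ≈ .0167.
p_values = [0.030, 0.004, 0.020]
adjusted = bonferroni_alpha(0.05, len(p_values))
significant = [p < adjusted for p in p_values]  # only the second test rejects
```

Under individual testing, by contrast, each of these p values would be compared against the unadjusted .05, and the first and third results would also count as significant; that is the distinction the article turns on.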


Author(s):  
Πέτρος Ρούσσος

The rationale of Null Hypothesis Significance Testing (NHST) is described, and the consequences of its hybridism are discussed. The paper presents examples from articles published in “PSYCHOLOGY: The Journal of the HPS” that refer to NHST and interpret its outcomes. We examined the 445 articles published between 1992 and 2010, noted misuses of NHST, and searched for any use of confidence intervals or error bars and whether these were used to support interpretation. Part of the paper focuses on the statistical-reform debate and provides detailed guidance about good statistical practices in the analysis of research data and the interpretation of findings. The proposed guide does not fall into the trap of mandating the use of particular procedures; rather, it aims to support readers’ understanding of research results.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Matthias Haucke ◽  
Jonas Miosga ◽  
Rink Hoekstra ◽  
Don van Ravenzwaaij

A majority of statistically educated scientists draw incorrect conclusions based on the most commonly used statistical technique: null hypothesis significance testing (NHST). Frequentist results are often claimed to be misinterpreted as if they were Bayesian outcomes, which suggests that a Bayesian framework may fit better with the inferences researchers frequently want to make (Briggs, 2012). The current study set out to test this proposition. Firstly, we investigated whether there is a discrepancy between what researchers think they can conclude and what they want to be able to conclude from NHST. Secondly, we investigated to what extent researchers want to incorporate prior study results and their personal beliefs in their statistical inference. Results show the expected discrepancy between what researchers think they can conclude from NHST and what they want to be able to conclude. Furthermore, researchers were interested in incorporating prior study results, but not their personal beliefs, into their statistical inference.


F1000Research ◽  
2017 ◽  
Vol 4 ◽  
pp. 621
Author(s):  
Cyril Pernet

Although thoroughly criticized, null hypothesis significance testing (NHST) remains the statistical method of choice used to provide evidence for an effect in the biological, biomedical and social sciences. In this short guide, I first summarize the concepts behind the method, distinguishing tests of significance (Fisher) from tests of acceptance (Neyman-Pearson), and point to common interpretation errors regarding the p-value. I then present the related concept of confidence intervals and again point to common interpretation errors. Finally, I discuss what should be reported in which context. The goal is to clarify concepts to avoid interpretation errors and propose simple reporting practices.


2012 ◽  
Vol 65 (11) ◽  
pp. 2271-2279 ◽  
Author(s):  
E.J. Masicampo ◽  
Daniel R. Lalande

In null hypothesis significance testing (NHST), p values are judged relative to an arbitrary threshold for significance (.05). The present work examined whether that standard influences the distribution of p values reported in the psychology literature. We examined a large subset of papers from three highly regarded journals. Distributions of p were found to be similar across the different journals. Moreover, p values were much more common immediately below .05 than would be expected based on the number of p values occurring in other ranges. This prevalence of p values just below the arbitrary criterion for significance was observed in all three journals. We discuss potential sources of this pattern, including publication bias and researcher degrees of freedom.
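The authors' analysis amounts to binning reported p values and checking whether the bin just below .05 is overrepresented. A minimal sketch of that binning step, on hypothetical data rather than the authors' corpus:

```python
from collections import Counter

def bin_p_values(p_values, width=0.005):
    """Count reported p values in equal-width bins, to inspect whether
    values just below .05 are overrepresented relative to neighboring bins."""
    return Counter(round((p // width) * width, 3) for p in p_values)

# Hypothetical reported p values; a spike in the [.045, .05) bin would
# mirror the pattern the authors report across three journals.
reported = [0.012, 0.048, 0.046, 0.049, 0.031, 0.047, 0.062, 0.044]
counts = bin_p_values(reported)
```

Here `counts[0.045]` tallies the [.045, .05) bin; comparing it against adjacent bins is the core of the diagnostic, with publication bias and researcher degrees of freedom as candidate explanations for any excess.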


2018 ◽  
Author(s):  
Eike Mark Rinke ◽  
Frank M. Schneider

Across all areas of communication research, the most popular approach to generating insights about communication is the classical significance test (also called null hypothesis significance testing, NHST). The predominance of NHST in communication research is in spite of serious concerns about the ability of researchers to properly interpret its results. We draw on data from a survey of the ICA membership to assess the evidential basis of these concerns. The vast majority of communication researchers misinterpreted NHST (91%) and the most prominent alternative, confidence intervals (96%), while overestimating their competence. Academic seniority and statistical experience did not predict better interpretation outcomes. These findings indicate major problems regarding the generation of knowledge in the field of communication research.

