A Frequentist Alternative to Significance Testing, p-Values, and Confidence Intervals

Econometrics ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 26 ◽  
Author(s):  
David Trafimow

There has been much debate about null hypothesis significance testing, p-values without null hypothesis significance testing, and confidence intervals. The first major section of the present article addresses some of the main reasons these procedures are problematic. The conclusion is that none of them are satisfactory. However, there is a new procedure, termed the a priori procedure (APP), that validly aids researchers in obtaining sample statistics that have acceptable probabilities of being close to their corresponding population parameters. The second major section provides a description and review of APP advances. Not only does the APP avoid the problems that plague other inferential statistical procedures, it is also easy to perform. Although the APP can be performed in conjunction with other procedures, the present recommendation is that it be used alone.
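For the simplest case, a single mean, the APP's core computation reduces to a sample-size formula: to have probability c that the sample mean falls within f population standard deviations of the population mean, n = (z/f)^2, where z is the standard-normal quantile at (1 + c)/2. A minimal sketch (the function name is my own):

```python
from math import ceil
from statistics import NormalDist

def app_sample_size(f: float, c: float) -> int:
    """A priori procedure (APP) sample size for a single mean: the
    smallest n such that the sample mean falls within f population
    standard deviations of the population mean with probability c."""
    z = NormalDist().inv_cdf((1 + c) / 2)  # two-sided critical value
    return ceil((z / f) ** 2)

# e.g. to be 95% confident the sample mean lands within 0.4 sigma:
n = app_sample_size(f=0.4, c=0.95)
```

Halving the desired closeness f roughly quadruples the required sample size, which is why the procedure is run before data collection.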

2009 ◽  
Vol 217 (1) ◽  
pp. 15-26 ◽  
Author(s):  
Geoff Cumming ◽  
Fiona Fidler

Most questions across science call for quantitative answers, ideally, a single best estimate plus information about the precision of that estimate. A confidence interval (CI) expresses both efficiently. Early experimental psychologists sought quantitative answers, but for the last half century psychology has been dominated by the nonquantitative, dichotomous thinking of null hypothesis significance testing (NHST). The authors argue that psychology should rejoin mainstream science by asking better questions – those that demand quantitative answers – and using CIs to answer them. They explain CIs and a range of ways to think about them and use them to interpret data, especially by considering CIs as prediction intervals, which provide information about replication. They explain how to calculate CIs on means, proportions, correlations, and standardized effect sizes, and illustrate symmetric and asymmetric CIs. They also argue that information provided by CIs is more useful than that provided by p values, or by values of Killeen’s p_rep, the probability of replication.
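The "estimate plus precision" reading of a CI can be sketched for the simplest case, a mean. This is a minimal large-sample illustration using the normal approximation (a small sample would call for a t critical value instead); the data are invented:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_mean(data, level=0.95):
    """Normal-approximation confidence interval for a mean:
    point estimate +/- z * standard error (adequate for large n)."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    m, se = mean(data), stdev(data) / sqrt(len(data))
    return m - z * se, m + z * se

lo, hi = ci_mean([4.1, 5.0, 4.4, 5.2, 4.8, 4.6, 5.1, 4.3, 4.9, 4.7])
```

The interval's midpoint is the best estimate and its width conveys the precision, which is exactly the two pieces of information the authors argue a single p value cannot supply.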


2010 ◽  
Vol 3 (2) ◽  
pp. 106-112 ◽  
Author(s):  
Matthew J. Rinella ◽  
Jeremy J. James

Null hypothesis significance testing (NHST) forms the backbone of statistical inference in invasive plant science. Over 95% of research articles in Invasive Plant Science and Management report NHST results such as P-values or statistics closely related to P-values such as least significant differences. Unfortunately, NHST results are less informative than their ubiquity implies. P-values are hard to interpret and are regularly misinterpreted. Also, P-values do not provide estimates of the magnitudes and uncertainties of studied effects, and these effect size estimates are what invasive plant scientists care about most. In this paper, we reanalyze four datasets (two of our own and two of our colleagues'; studies put forth as examples in this paper are used with permission of their authors) to illustrate limitations of NHST. The re-analyses are used to build a case for confidence intervals as preferable alternatives to P-values. Confidence intervals indicate effect sizes, and compared to P-values, confidence intervals provide more complete, intuitively appealing information on what data do and do not indicate.
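The core complaint — that a P-value omits effect magnitude while a CI conveys it — can be illustrated with a hedged two-group sketch on synthetic summary statistics (normal approximation; not the authors' re-analyses): a large, imprecisely estimated effect and a tiny, precisely estimated one can produce the same P-value but very different intervals.

```python
from statistics import NormalDist

def z_test_and_ci(diff, se, level=0.95):
    """Two-sided p-value and confidence interval for an estimated
    difference, using the normal approximation."""
    nd = NormalDist()
    p = 2 * (1 - nd.cdf(abs(diff) / se))
    z = nd.inv_cdf((1 + level) / 2)
    return p, (diff - z * se, diff + z * se)

# Same standardized signal (diff / se = 2), very different magnitudes:
p_small_n, ci_small_n = z_test_and_ci(diff=10.0, se=5.0)  # big, vague effect
p_large_n, ci_large_n = z_test_and_ci(diff=0.2, se=0.1)   # tiny, precise effect
```

Both comparisons are "significant" at the same P-value, yet only the intervals reveal that one effect is fifty times the size of the other and far more uncertain.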


Psychology ◽  
2020 ◽  
Author(s):  
David Trafimow

There are two main inferential statistical camps in psychology: frequentists and Bayesians. Within the frequentist camp, most researchers support the null hypothesis significance testing procedure but support is growing for using confidence intervals. The Bayesian camp holds a diversity of views that cannot be covered adequately here. Many researchers advocate power analysis to determine sample sizes. Finally, the a priori procedure is a promising new way to think about inferential statistics.
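The power analysis mentioned above is, for a two-sample comparison of means, commonly computed as n per group = 2 * ((z_alpha/2 + z_power) / d)^2, with d the standardized effect size. A hedged normal-approximation sketch (the exact t-based answer is slightly larger):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sample comparison of
    means at standardized effect size d (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = nd.inv_cdf(power)          # desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

n = n_per_group(d=0.5)  # a "medium" effect by Cohen's convention
```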


2016 ◽  
Vol 77 (5) ◽  
pp. 831-854 ◽  
Author(s):  
David Trafimow

There has been much controversy over the null hypothesis significance testing procedure, with much of the criticism centered on the problem of inverse inference. Specifically, p gives the probability of the finding (or one more extreme) given the null hypothesis, whereas the null hypothesis significance testing procedure involves drawing a conclusion about the null hypothesis given the finding. Many critics have called for null hypothesis significance tests to be replaced with confidence intervals. However, confidence intervals also suffer from a version of the inverse inference problem. The only known solution to the inverse inference problem is to use the famous theorem by Bayes, but this involves commitments that many researchers are not willing to make. However, it is possible to ask a useful question for which inverse inference is not a problem and that leads to the computation of the coefficient of confidence. In turn, and much more important, using the coefficient of confidence implies the desirability of switching from the current emphasis on a posteriori inferential statistics to an emphasis on a priori inferential statistics.
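The inverse inference gap can be made concrete: p is P(data at least this extreme | H0), whereas the researcher wants P(H0 | data), which requires both a prior and the likelihood of the data under the alternative — the Bayesian commitments the abstract refers to. A toy sketch with entirely made-up inputs:

```python
def posterior_h0(p_data_given_h0, p_data_given_h1, prior_h0):
    """Bayes' theorem: P(H0 | data) from the two likelihoods and a
    prior probability on H0."""
    prior_h1 = 1 - prior_h0
    num = p_data_given_h0 * prior_h0
    return num / (num + p_data_given_h1 * prior_h1)

# Illustrative numbers only: data with likelihood 0.05 under H0 can
# still leave H0 quite probable if H1 predicts the data only weakly.
post = posterior_h0(p_data_given_h0=0.05, p_data_given_h1=0.10, prior_h0=0.5)
```

Here the posterior probability of the null is 1/3 — far from the "reject" verdict a p of .05 is usually taken to license, which is the inverse inference problem in miniature.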


2005 ◽  
Vol 35 (1) ◽  
pp. 1-20 ◽  
Author(s):  
G. K. Huysamen

Criticisms of traditional null hypothesis significance testing (NHST) became more pronounced during the 1960s and reached a climax during the past decade. Among other criticisms, NHST says nothing about the size of the population parameter of interest and its result is influenced by sample size. Estimation of confidence intervals around point estimates of the relevant parameters, model fitting and Bayesian statistics represent some major departures from conventional NHST. Testing non-nil null hypotheses, determining optimal sample size to uncover only substantively meaningful effect sizes and reporting effect-size estimates may be regarded as minor extensions of NHST. Although there seems to be growing support for the estimation of confidence intervals around point estimates of the relevant parameters, it is unlikely that NHST-based procedures will disappear in the near future. In the meantime, it is widely accepted that effect-size estimates should be reported as a mandatory adjunct to conventional NHST results.
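The effect-size reporting recommended above most often means a standardized measure such as Cohen's d, the mean difference expressed in pooled-standard-deviation units. A minimal sketch with invented group data:

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d for two independent groups: the mean difference
    divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

d = cohens_d([5.1, 4.8, 5.6, 5.0, 4.9, 5.3],
             [4.4, 4.7, 4.1, 4.5, 4.3, 4.6])
```

Unlike a p value, d is not inflated by sample size, which is precisely why the abstract treats effect-size reporting as a necessary adjunct to NHST.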


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S773-S773
Author(s):  
Christopher Brydges ◽  
Allison A Bielak

Abstract Objective: Non-significant p values derived from null hypothesis significance testing do not distinguish between true null effects and cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses in gerontological psychology, a statistical technique that can distinguish between conclusive and inconclusive non-significant results, by using Bayes factors (BFs) to reanalyze non-significant results from published gerontological research. Method: Non-significant results mentioned in abstracts of articles published in 2017 volumes of ten top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from non-significant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Non-significant results were directly tested with Bayes factors in 1.22% of studies. Bayesian reanalyses of 195 non-significant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions: Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of non-significant results, and ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.
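One common shortcut for this kind of reanalysis (a sketch, not necessarily the authors' method — published reanalyses typically use default JZS Bayes factors) is the BIC approximation of Wagenmakers (2007), which for a one-sample t-test gives BF01 ≈ √n · (1 + t²/df)^(−n/2):

```python
from math import sqrt

def bf01_ttest(t: float, n: int) -> float:
    """BIC-approximation Bayes factor BF01 for a one-sample t-test
    (Wagenmakers, 2007). Values > 1 favour the null hypothesis;
    values < 1 favour the alternative."""
    df = n - 1
    return sqrt(n) * (1 + t ** 2 / df) ** (-n / 2)

bf_null = bf01_ttest(t=0.5, n=30)  # non-significant t: support for H0?
bf_alt = bf01_ttest(t=4.0, n=30)   # large t: evidence against H0
```

The point of the exercise is visible in the first call: a small t can yield a BF01 only modestly above 1, meaning the non-significant result is merely inconclusive rather than positive evidence for the null.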


Author(s):  
Tamás Ferenci ◽  
Levente Kovács

Null hypothesis significance testing dominates current biostatistical practice. However, this routine has many flaws; in particular, p-values are very often misused and misinterpreted. Several solutions have been suggested to remedy this situation, the application of Bayes Factors being perhaps the most well-known. Nevertheless, even Bayes Factors are very seldom applied in medical research. This paper investigates the application of Bayes Factors in the analysis of a realistic medical problem using actual data from a representative US survey, and compares the results to those obtained with traditional means. Linear regression is used as an example as it is one of the most basic tools in biostatistics. The effect of sample size and sampling variation is investigated (with resampling) as well as the impact of the choice of prior. Results show that there is a strong relationship between p-values and Bayes Factors, especially for large samples. The application of Bayes Factors should be encouraged in spite of this, as the message they convey is much more instructive and scientifically correct than the current typical practice.
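A Bayes factor for a one-predictor linear regression can be sketched with the BIC approximation, BF01 ≈ √n · (1 − R²)^(n/2) (Wagenmakers, 2007) — a simplification, not the paper's prior-based analysis, shown here on invented data:

```python
from math import sqrt

def bf01_regression(x, y):
    """BIC-approximation Bayes factor BF01 comparing an intercept-only
    model against a one-predictor linear regression:
    BF01 = sqrt(n) * (1 - R^2)^(n/2). Values > 1 favour the null."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return sqrt(n) * (1 - r2) ** (n / 2)

x = list(range(1, 11))
y_trend = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9, 18.1, 20.0]  # clear slope
y_flat = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]                        # no slope
```

Because this approximation is a monotone function of the regression R² (and hence of the classical test statistic), it also illustrates the strong p-value/Bayes-factor relationship the paper reports for large samples.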


2019 ◽  
Author(s):  
Jan Sprenger

The replication crisis poses an enormous challenge to the epistemic authority of science and the logic of statistical inference in particular. Two prominent features of Null Hypothesis Significance Testing (NHST) arguably contribute to the crisis: the lack of guidance for interpreting non-significant results and the impossibility of quantifying support for the null hypothesis. In this paper, I argue that popular alternatives to NHST, such as confidence intervals and Bayesian inference, also fail to yield a satisfactory logic of evaluating hypothesis tests. As an alternative, I motivate and explicate the concept of corroboration of the null hypothesis. Finally, I show how degrees of corroboration give an interpretation to non-significant results, combat publication bias and mitigate the replication crisis.


2021 ◽  
Author(s):  
Mark Rubin

Scientists often adjust their significance threshold (alpha level) during null hypothesis significance testing in order to take into account multiple testing and multiple comparisons. This alpha adjustment has become particularly relevant in the context of the replication crisis in science. The present article considers the conditions in which this alpha adjustment is appropriate and the conditions in which it is inappropriate. A distinction is drawn between three types of multiple testing: disjunction testing, conjunction testing, and individual testing. It is argued that alpha adjustment is only appropriate in the case of disjunction testing, in which at least one test result must be significant in order to reject the associated joint null hypothesis. Alpha adjustment is inappropriate in the case of conjunction testing, in which all relevant results must be significant in order to reject the joint null hypothesis. Alpha adjustment is also inappropriate in the case of individual testing, in which each individual result must be significant in order to reject each associated individual null hypothesis. The conditions under which each of these three types of multiple testing is warranted are examined. It is concluded that researchers should not automatically (mindlessly) assume that alpha adjustment is necessary during multiple testing. Illustrations are provided in relation to joint studywise hypotheses and joint multiway ANOVAwise hypotheses.
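The three-way distinction above can be stated as a rule: divide alpha across the m tests only in the disjunction case, where one significant result among m suffices to reject the joint null. A sketch using a Bonferroni-style division (the article discusses when adjustment is warranted, not this particular correction):

```python
def adjusted_alpha(alpha: float, m: int, testing: str) -> float:
    """Per-test significance threshold under the three types of
    multiple testing distinguished above: Bonferroni-style division
    for disjunction testing; the nominal alpha is retained for
    conjunction and individual testing."""
    if testing == "disjunction":
        return alpha / m  # one significant result rejects the joint null
    if testing in ("conjunction", "individual"):
        return alpha      # no familywise inflation to correct
    raise ValueError(f"unknown testing type: {testing!r}")

a_disj = adjusted_alpha(0.05, m=5, testing="disjunction")
a_indiv = adjusted_alpha(0.05, m=5, testing="individual")
```

The point of the rule is exactly the article's conclusion: adjustment is a consequence of which joint hypothesis is being tested, not an automatic accompaniment of running several tests.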

