scholarly journals Witt_Criteria_For_Statistical_Significance_v1

2018 ◽  
Author(s):  
Jessica K. Witt

What is the best criterion for determining statistical significance? In psychology, the criterion has been p < .05. This criterion has been criticized since its inception, and the criticisms have only heightened with recent failures to replicate studies published in top psychology journals. Several replacement criteria have been suggested, including reducing the alpha level to .005 or switching to other types of criteria such as Bayes factors or effect sizes. Here, various criteria for statistical significance were evaluated using signal detection analysis on the outcomes of simulated data. With respect to the ability to discriminate between true effects and null effects, both p-values and Bayes factors resulted in fairly high discriminability, and performance was equivalent across both. Discriminability was better for effect size. With respect to bias, the specific thresholds that produced maximal utility depended on sample size, although this dependency was particularly notable for p-values and less so for Bayes factors. These simulations help illustrate some of the main themes regarding the interpretation of p-values and statistical significance. Importantly, the novel application of signal detection theory to the issue of statistical significance highlights the need to focus on both false alarms and misses, rather than false alarms alone.
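The trade-off between false alarms and misses can be illustrated with a small simulation. The sketch below is not the author's code: it assumes a two-sample z-test with a known standard deviation of 1 (chosen so that only the Python standard library is needed), and the function names, effect size, sample size, and number of simulations are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean

def simulate_p_value(effect_size, n, rng):
    """Two-sided p value from a two-sample z-test with known SD = 1.

    The known-SD z-test is a simplifying assumption, not the test used in the paper.
    """
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group_b = [rng.gauss(effect_size, 1.0) for _ in range(n)]
    # Standard error of the difference between two means when SD = 1.
    z = (mean(group_b) - mean(group_a)) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def rates(alpha, effect_size=0.5, n=32, n_sims=2000, seed=1):
    """Return (false alarm rate, miss rate) for the criterion p < alpha."""
    rng = random.Random(seed)  # fixed seed: every alpha is scored on the same data
    null_p = [simulate_p_value(0.0, n, rng) for _ in range(n_sims)]
    real_p = [simulate_p_value(effect_size, n, rng) for _ in range(n_sims)]
    false_alarm_rate = sum(p < alpha for p in null_p) / n_sims
    miss_rate = sum(p >= alpha for p in real_p) / n_sims
    return false_alarm_rate, miss_rate

if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        fa, miss = rates(alpha)
        print(f"alpha={alpha}: false alarms={fa:.3f}, misses={miss:.3f}")
```

Because both thresholds are applied to the same simulated studies, lowering alpha from .05 to .005 can only reduce false alarms and can only increase misses, which is the shift in bias the abstract describes.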

2019 ◽  
Vol 3 ◽  
Author(s):  
Jessica K. Witt

What is the best criterion for determining statistical significance? In psychology, the criterion has been p < .05. This criterion has been criticized since its inception, and the criticisms have been rejuvenated with recent failures to replicate studies published in top psychology journals. Several replacement criteria have been suggested, including reducing the alpha level to .005 or switching to other types of criteria such as Bayes factors or effect sizes. Here, various decision criteria for statistical significance were evaluated using signal detection analysis on the outcomes of simulated data. The signal detection measure of area under the curve (AUC) is a measure of discriminability with a value of 1 indicating perfect discriminability and 0.5 indicating chance performance. Applied to criteria for statistical significance, it provides an estimate of the decision criterion’s performance in discriminating real effects from null effects. AUCs were high (M = .96, median = .97) for p values, suggesting merit in using p values to discriminate significant effects. AUCs can be used to assess methodological questions such as how much improvement will be gained with increased sample size, how much discriminability will be lost with questionable research practices, and whether it is better to run a single high-powered study or a study plus a replication at lower powers. AUCs were also used to compare performance across p values, Bayes factors, and effect size (Cohen’s d). AUCs were equivalent for p values and Bayes factors and were slightly higher for effect size. Signal detection analysis provides separate measures of discriminability and bias. With respect to bias, the specific thresholds that produced maximal utility depended on sample size, although this dependency was particularly notable for p values and less so for Bayes factors. The application of signal detection theory to the issue of statistical significance highlights the need to focus on both false alarms and misses, rather than false alarms alone.
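The AUC measure described above can be sketched in a few lines. This is a minimal illustration, not the author's simulation: it assumes two-group designs analyzed with a pooled-variance t statistic, and it ranks studies by |t| instead of by p value (for a fixed sample size the two-sided p value is a decreasing function of |t|, so the ranking and therefore the AUC are the same). All function names and parameter values are hypothetical.

```python
import random
from statistics import mean, stdev

def abs_t(effect_size, n, rng):
    """Absolute pooled-variance t statistic for two groups of size n."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(effect_size, 1.0) for _ in range(n)]
    pooled_var = (stdev(a) ** 2 + stdev(b) ** 2) / 2.0
    return abs(mean(b) - mean(a)) / (pooled_var * 2.0 / n) ** 0.5

def auc(signal_scores, noise_scores):
    """Probability that a random signal score beats a random noise score.

    Ties count as half a win; 0.5 is chance and 1.0 is perfect discriminability.
    """
    wins = 0.0
    for s in signal_scores:
        for x in noise_scores:
            if s > x:
                wins += 1.0
            elif s == x:
                wins += 0.5
    return wins / (len(signal_scores) * len(noise_scores))

def simulated_auc(n, effect_size=0.5, n_sims=500, seed=7):
    """AUC for discriminating real effects from null effects at sample size n."""
    rng = random.Random(seed)
    real = [abs_t(effect_size, n, rng) for _ in range(n_sims)]
    null = [abs_t(0.0, n, rng) for _ in range(n_sims)]
    return auc(real, null)

if __name__ == "__main__":
    for n in (16, 64):
        print(f"n per group = {n}: AUC = {simulated_auc(n):.3f}")
```

Running the sketch at two sample sizes shows the kind of methodological question the abstract mentions: how much discriminability is gained by increasing the sample size. The values it prints depend on the assumed effect size and are not expected to match the AUCs reported in the paper.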


2017 ◽  
Vol 32 (3) ◽  
pp. 243-258 ◽  
Author(s):  
Melissa F. Colloff ◽  
Kimberley A. Wade ◽  
John T. Wixted ◽  
Elizabeth A. Maylor
