Witt_Criteria_For_Statistical_Significance_v1
What is the best criterion for determining statistical significance? In psychology, the criterion has been p < .05. This criterion has been criticized since its inception, and the criticisms have only heightened with recent failures to replicate studies published in top psychology journals. Several replacement criteria have been suggested, including reducing the alpha level to .005 or switching to other types of criteria such as Bayes factors or effect sizes. Here, various criteria for statistical significance were evaluated using signal detection analysis on the outcomes of simulated data. With respect to the ability to discriminate between true effects and null effects, p-values and Bayes factors both yielded fairly high discriminability, and the two performed equivalently. Discriminability was better for effect size. With respect to bias, the specific thresholds that maximized utility depended on sample size, although this dependency was particularly notable for p-values and less so for Bayes factors. These simulations help illustrate some of the main themes regarding the interpretation of p-values and statistical significance. Importantly, the novel application of signal detection theory to the issue of statistical significance highlights the need to focus on both false alarms and misses, rather than on false alarms alone.
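The general approach described above can be illustrated with a minimal sketch. It is not the simulation code used in this work: the sample size, effect size, number of simulated studies, and the use of a two-sided z-test with known standard deviation are all assumptions chosen to keep the example short and dependent only on the Python standard library. The function names (`simulate_p_value`, `auc`, `run_simulation`) are hypothetical. The sketch simulates studies with a true effect and studies with a null effect, then summarizes discriminability as the area under the ROC curve (AUC) and bias as the hit, miss, and false alarm rates at a given alpha.

```python
import random
from statistics import NormalDist, fmean


def simulate_p_value(n, true_d, rng):
    """Two-sided z-test p-value for two groups of size n.

    Assumption: both groups are normal with a known SD of 1, so the
    standardized mean difference in the population equals true_d.
    """
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group_b = [rng.gauss(true_d, 1.0) for _ in range(n)]
    z = (fmean(group_b) - fmean(group_a)) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def auc(p_signal, p_noise):
    """Probability that a randomly chosen true-effect study has a smaller
    p-value than a randomly chosen null study (ties count as half)."""
    wins = 0.0
    for ps in p_signal:
        for pn in p_noise:
            if ps < pn:
                wins += 1.0
            elif ps == pn:
                wins += 0.5
    return wins / (len(p_signal) * len(p_noise))


def run_simulation(n=32, true_d=0.5, n_studies=500, alpha=0.05, seed=1):
    """Simulate n_studies true-effect and n_studies null studies.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    p_true = [simulate_p_value(n, true_d, rng) for _ in range(n_studies)]
    p_null = [simulate_p_value(n, 0.0, rng) for _ in range(n_studies)]
    hit_rate = sum(p < alpha for p in p_true) / n_studies
    false_alarm_rate = sum(p < alpha for p in p_null) / n_studies
    return {
        "auc": auc(p_true, p_null),            # discriminability
        "hit_rate": hit_rate,                  # true effects declared significant
        "miss_rate": 1.0 - hit_rate,           # true effects not declared significant
        "false_alarm_rate": false_alarm_rate,  # null effects declared significant
    }


result = run_simulation()
print(result)
```

Rerunning `run_simulation` with a different `alpha` leaves the AUC unchanged and shifts only the trade-off between misses and false alarms, which is the distinction between discriminability and bias drawn above. Changing `n` alters both the AUC and the rates obtained at a fixed `alpha`.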