Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1715 ◽  
Author(s):  
Dorothy V.M. Bishop ◽  
Paul A. Thompson

Background. The p-curve is a plot of the distribution of p-values reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from bias in the selection of variables and analyses for publication, p-hacking. Methods. p-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking. Results. We show that when there is ghost p-hacking, the shape of the p-curve depends on whether dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not give the "p-hacking bump" just below .05 that is regarded as evidence of p-hacking, though there is a negative skew when simulated variables are intercorrelated. The way p-curves vary according to features of underlying data poses problems when automated text mining is used to detect p-values in heterogeneous sets of published papers. Conclusions. The absence of a bump in the p-curve is not indicative of lack of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value, unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis. In particular, p-hacking with ghost variables is likely to be missed.
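As a rough illustration of the ghost-variable simulation the abstract describes, the following R sketch (hypothetical code, not the authors' published script; all parameter values are assumptions) draws several uncorrelated dependent variables per simulated study under a true null effect and reports only the smallest p-value:

```r
# Hypothetical sketch of ghost-variable p-hacking with uncorrelated
# dependent variables; parameters are illustrative assumptions.
set.seed(42)
n_experiments <- 5000   # simulated studies
n_per_group   <- 20     # cases per group
n_dvs         <- 8      # dependent variables measured per study

reported_p <- replicate(n_experiments, {
  # Each DV is an independent draw with no true group difference.
  p_vals <- replicate(n_dvs,
    t.test(rnorm(n_per_group), rnorm(n_per_group))$p.value)
  min(p_vals)  # ghost p-hacking: only the best-looking DV is reported
})

# p-curve: the distribution of "significant" reported p-values
hacked <- reported_p[reported_p < .05]
hist(hacked, breaks = seq(0, .05, by = .005),
     xlab = "p-value", main = "Simulated p-curve, ghost p-hacking")
```

On the abstract's account, a p-curve built this way need not show the bump just below .05, even though every reported value is p-hacked.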

2015 ◽  
Author(s):  
Dorothy V Bishop ◽  
Paul A Thompson

Background: The p-curve is a plot of the distribution of p-values below .05 reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from bias in the selection of variables and analyses for publication, p-hacking. We argue that binomial tests on the p-curve are not robust enough to be used for this purpose. Methods: P-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking. Results: We first show that a p-curve suggestive of p-hacking can be obtained if researchers misapply parametric tests to data that depart from normality, even when no p-hacking occurs. We go on to show that when there is ghost p-hacking, the shape of the p-curve depends on whether dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not give the "p-hacking bump" just below .05 that is regarded as evidence of p-hacking, though there is a negative skew when simulated variables are intercorrelated. The way p-curves vary according to features of underlying data poses problems when automated text mining is used to detect p-values in heterogeneous sets of published papers. Conclusions: A significant bump in the p-curve just below .05 is not necessarily evidence of p-hacking, and lack of a bump is not indicative of lack of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value, unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis.
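The contrast with intercorrelated dependent variables can be sketched in the same way, swapping independent draws for multivariate normal samples. This is again hypothetical code; the correlation value r = .8 is an assumption for illustration, not a value taken from the paper:

```r
# Hypothetical sketch: ghost p-hacking when the dependent variables are
# intercorrelated; the correlation r is an illustrative assumption.
library(MASS)  # for mvrnorm
set.seed(1)
n_experiments <- 5000
n_per_group   <- 20
n_dvs         <- 8
r             <- .8
sigma <- matrix(r, n_dvs, n_dvs)
diag(sigma) <- 1  # unit variances, correlation r between all DV pairs

reported_p <- replicate(n_experiments, {
  # Both groups drawn from the same distribution: a true null effect.
  g1 <- mvrnorm(n_per_group, mu = rep(0, n_dvs), Sigma = sigma)
  g2 <- mvrnorm(n_per_group, mu = rep(0, n_dvs), Sigma = sigma)
  p_vals <- sapply(seq_len(n_dvs),
                   function(i) t.test(g1[, i], g2[, i])$p.value)
  min(p_vals)  # ghost p-hacking: keep only the best DV
})

hist(reported_p[reported_p < .05], breaks = seq(0, .05, by = .005),
     xlab = "p-value", main = "Simulated p-curve, intercorrelated DVs")
```

Comparing the two histograms makes the abstract's point concrete: the same p-hacking strategy yields differently shaped p-curves depending on the correlation structure of the underlying data.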


2015 ◽  
Author(s):  
Dorothy V Bishop ◽  
Paul A Thompson

Background: The p-curve is a plot of the distribution of p-values below .05 reported in a set of scientific studies. It has been used to identify bias in the selection of variables and analyses for publication, p-hacking. A recent study by Head et al. (2015) combined this approach with automated text-mining of p-values from a large corpus of published papers and concluded that although there was evidence of p-hacking, its effect was weak in relation to real effect sizes, and not likely to cause serious distortions in the literature. We argue that the methods used by these authors do not support this inference. Methods: P-hacking can take various forms. For the current paper, we developed R code to simulate the use of ghost variables, where an experimenter gathers data on numerous variables but reports only those with statistically significant effects. We also examined the text-mined dataset used by Head et al. to assess its suitability for investigating p-hacking. Results: For uncorrelated variables, simulated p-hacked data do not give the "p-hacking bump" that is regarded as evidence of p-hacking. The p-curve develops a positive slope when simulated variables are highly intercorrelated, but does not show the excess of p-values just below .05 that has been regarded as indicative of extreme p-hacking. A right-skewed p-curve is obtained, as expected, when there is a true difference between groups, but it was also obtained in p-hacked datasets containing a high proportion of cases with a true null effect. The results of Head et al. are further compromised because their automated text mining detected any p-value mentioned in the Results or Abstract of a paper, including those reported in the course of validation of materials or methods, or confirmation of well-established facts, as opposed to hypothesis-testing. There was no information on the statistical power of studies, nor on the statistical test conducted. Conclusions: We find two problems with the analysis by Head et al. First, though a significant bump in the p-curve just below .05 is good evidence of p-hacking, lack of a bump is not indicative of lack of p-hacking. Furthermore, while studies with evidential value will generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value. This is particularly the case when there is no control over the type of p-values entered into the analysis. The analysis presented here suggests that the potential for systematic bias is substantial. We conclude that the study by Head et al. provides evidence of p-hacking in the scientific literature, but it cannot be used to estimate the extent and consequences of p-hacking. Analysis of meta-analysed datasets avoids some of these problems, but will still miss an important type of p-hacking.
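For concreteness, a bump test of the kind discussed in this abstract can be sketched as a binomial comparison of the two bins closest to .05. The bin boundaries and the uniform example data below are assumptions chosen for illustration, not a reproduction of Head et al.'s pipeline:

```r
# Hypothetical sketch of a "bump" test on a set of mined p-values:
# compare the count in the bin nearest .05 with the adjacent bin.
# Bin boundaries are illustrative assumptions.
bump_test <- function(p_values) {
  upper <- sum(p_values > .045 & p_values < .05)    # bin nearest .05
  lower <- sum(p_values > .04  & p_values <= .045)  # adjacent bin
  # Under a smooth right-skewed (or flat) curve, the bin nearest .05
  # should not hold significantly more than half of these values.
  binom.test(upper, upper + lower, p = .5, alternative = "greater")
}

# Example with simulated "mined" p-values drawn uniformly below .05,
# as under a true null with no selective reporting:
set.seed(7)
mined <- runif(1000, 0, .05)
bump_test(mined)
```

The abstract's argument is that a non-significant result from a test like this cannot be read as an absence of p-hacking, since ghost-variable p-hacking need not produce any excess in the upper bin.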

