Do infants have a sense of numerosity? A p-curve analysis of infant numerosity discrimination studies

2017 ◽  
Author(s):  
Rachael Smyth ◽  
Daniel Ansari

Research demonstrating that infants discriminate between small and large numerosities is central to theories concerning the origins of human numerical abilities. To date, there has been no quantitative meta-analysis of the infant numerical competency data. Here, we quantitatively synthesize the evidential value of the available literature on infant numerosity discrimination using a meta-analytic tool called p-curve, in which the distribution of available p-values is analyzed to determine whether the published literature examining particular hypotheses contains evidential value. P-curves demonstrated evidential value for the hypotheses that infants can discriminate between both small and large numerosities. However, the analyses also revealed that the data on infants' ability to discriminate between large numerosities are less robust and less well powered statistically than the data on their ability to discriminate between small numerosities. We argue that adequately powered replication studies are needed to enable stronger inferences when grounding theories concerning the ontogenesis of numerical cognition.
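For readers unfamiliar with the method, the following is a minimal sketch of the full-curve right-skew test that underlies a p-curve analysis of evidential value, assuming a hand-collected vector of statistically significant p-values; the values shown are hypothetical placeholders, not the infant-study p-values analyzed above, and the computation approximates (rather than reproduces) the published p-curve software.

```r
# Sketch of p-curve's full-curve right-skew test (evidential value).
# The p-values below are hypothetical placeholders, not real study results.
p <- c(0.002, 0.011, 0.024, 0.004, 0.038, 0.017)   # significant p-values from a literature
stopifnot(all(p < 0.05))

# Under the null of no true effect, significant p-values are uniform on (0, .05),
# so pp = p / .05 is uniform on (0, 1).
pp <- p / 0.05

# Stouffer combination: a strongly negative z indicates right skew, i.e. evidential value.
z <- sum(qnorm(pp)) / sqrt(length(pp))
p_right_skew <- pnorm(z)   # one-sided test of right skew
c(z = z, p = p_right_skew)
```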

2015 ◽  
Author(s):  
Dorothy V Bishop ◽  
Paul A Thompson

Background: The p-curve is a plot of the distribution of p-values below .05 reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from bias in the selection of variables and analyses for publication (p-hacking). We argue that binomial tests on the p-curve are not robust enough to be used for this purpose. Methods: P-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking. Results: We first show that a p-curve suggestive of p-hacking can be obtained if researchers misapply parametric tests to data that depart from normality, even when no p-hacking occurs. We go on to show that when there is ghost p-hacking, the shape of the p-curve depends on whether the dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not show the "p-hacking bump" just below .05 that is regarded as evidence of p-hacking, whereas there is a negative skew when the simulated variables are intercorrelated. The way p-curves vary according to features of the underlying data poses problems when automated text mining is used to detect p-values in heterogeneous sets of published papers. Conclusions: A significant bump in the p-curve just below .05 is not necessarily evidence of p-hacking, and the lack of a bump is not indicative of a lack of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis.
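As a concrete illustration of the kind of binomial comparison between ranges of p-values discussed (and critiqued) here, the sketch below applies two common variants to a hypothetical set of significant p-values; the bin boundaries and the values are illustrative assumptions, not the authors' exact analysis.

```r
# Sketch of binomial tests on the p-curve. The example p-values and bin
# boundaries are assumptions chosen for illustration.
p <- c(0.003, 0.012, 0.021, 0.034, 0.041, 0.044, 0.046, 0.048, 0.049)  # hypothetical

# Evidential-value variant: among significant p-values, are more below .025 than above?
binom.test(sum(p < 0.025), sum(p < 0.05), p = 0.5, alternative = "greater")

# "p-hacking bump" variant: compare the two bins just below .05.
lo <- sum(p >= 0.040 & p < 0.045)
hi <- sum(p >= 0.045 & p < 0.050)
binom.test(hi, lo + hi, p = 0.5, alternative = "greater")
```

With only a handful of p-values per bin, such tests have very little power and are sensitive to how the bins are drawn, which is part of the robustness concern raised in the abstract.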


2018 ◽  
Vol 4 (1) ◽  
Author(s):  
Iris van Kuijk ◽  
Peter Verkoeijen ◽  
Katinka Dijkstra ◽  
Rolf A. Zwaan

The results reported by Kidd and Castano (2013) indicated that reading a short passage of literary fiction improves theory of mind (ToM) relative to reading popular fiction. However, when we entered Kidd and Castano's results into a p-curve analysis, it turned out that the evidential value of their findings is low. It is good practice to back up a p-curve analysis of a single paper with an adequately powered direct replication of at least one of the studies included in the p-curve analysis. Therefore, we conducted a direct replication of the literary fiction and popular fiction conditions from Kidd and Castano's Experiment 5 to scrutinize the effect of reading literary fiction on ToM. The results of this replication were largely consistent with Kidd and Castano's original findings. Furthermore, we conducted a small-scale meta-analysis on the findings of the present study, those of Kidd and Castano, and those reported in other published direct replications. The meta-analytic effect of reading literary fiction on ToM was small and non-significant, but there was considerable heterogeneity between the included studies. The results of the present study and of the small-scale meta-analysis are discussed in light of reading-time exclusion criteria as well as the reliability and validity of the ToM measures.
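The following is a minimal sketch of a small-scale random-effects meta-analysis of standardized mean differences of the kind reported here, assuming the metafor package; the group means, standard deviations, sample sizes, and study labels are hypothetical placeholders rather than the actual data from Kidd and Castano or the replications.

```r
# Sketch of a small-scale random-effects meta-analysis (literary vs. popular
# fiction), assuming the metafor package. All numbers are hypothetical placeholders.
library(metafor)

dat <- data.frame(
  study = c("Original study", "Replication A", "Replication B"),
  m1i = c(26.1, 25.4, 25.9), sd1i = c(4.0, 4.2, 4.1), n1i = c(43, 150, 120),  # literary
  m2i = c(24.6, 25.3, 25.8), sd2i = c(4.3, 4.1, 4.0), n2i = c(41, 148, 118)   # popular
)

# Hedges' g per study, then a random-effects model; tau^2 and I^2 index heterogeneity.
dat <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)
res <- rma(yi, vi, data = dat, method = "REML")
summary(res)
forest(res, slab = dat$study)
```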


2016 ◽  
Author(s):  
Dorothy V Bishop ◽  
Paul A Thompson

Background: The p-curve is a plot of the distribution of p-values reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from bias in the selection of variables and analyses for publication (p-hacking). Methods: P-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking. Results: We first show that when there is ghost p-hacking, the shape of the p-curve depends on whether the dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not show the "p-hacking bump" just below .05 that is regarded as evidence of p-hacking, whereas there is a negative skew when the simulated variables are intercorrelated. The way p-curves vary according to features of the underlying data poses problems when automated text mining is used to detect p-values in heterogeneous sets of published papers. Conclusions: The absence of a bump in the p-curve is not indicative of a lack of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis. In particular, p-hacking with ghost variables is likely to be missed.
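A minimal sketch of a ghost-variable simulation in the spirit of the one described here: each simulated experiment measures several dependent variables under the null, only the smallest p-value is "reported" if it is significant, and the resulting p-curves are compared for uncorrelated versus intercorrelated variables. The parameter values (sample size, number of DVs, correlation) are illustrative assumptions, not the authors' exact settings.

```r
# Sketch of ghost-variable p-hacking under the null hypothesis, assuming MASS
# for generating correlated normal DVs. Parameters are illustrative.
library(MASS)

ghost_pcurve <- function(n_sims = 5000, n = 30, n_dv = 5, rho = 0) {
  Sigma <- matrix(rho, n_dv, n_dv); diag(Sigma) <- 1
  reported <- replicate(n_sims, {
    g1 <- mvrnorm(n, mu = rep(0, n_dv), Sigma = Sigma)   # group 1, no true effect
    g2 <- mvrnorm(n, mu = rep(0, n_dv), Sigma = Sigma)   # group 2, no true effect
    pvals <- sapply(seq_len(n_dv), function(j) t.test(g1[, j], g2[, j])$p.value)
    min(pvals)                                           # report only the "best" DV
  })
  reported[reported < 0.05]                              # "published" significant p-values
}

p_uncorr <- ghost_pcurve(rho = 0)
p_corr   <- ghost_pcurve(rho = 0.8)

par(mfrow = c(1, 2))
hist(p_uncorr, breaks = seq(0, 0.05, 0.005), main = "Uncorrelated DVs", xlab = "p")
hist(p_corr,   breaks = seq(0, 0.05, 0.005), main = "Intercorrelated DVs", xlab = "p")
```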


2020 ◽  
Vol 228 (1) ◽  
pp. 43-49 ◽  
Author(s):  
Michael Kossmeier ◽  
Ulrich S. Tran ◽  
Martin Voracek

Abstract. Currently, dedicated graphical displays to depict study-level statistical power in the context of meta-analysis are unavailable. Here, we introduce the sunset (power-enhanced) funnel plot to visualize this relevant information for assessing the credibility, or evidential value, of a set of studies. The sunset funnel plot highlights the statistical power of primary studies to detect an underlying true effect of interest in the well-known funnel display, with color-coded power regions and a second power axis. This graphical display allows meta-analysts to incorporate power considerations into classic funnel plot assessments of small-study effects. Nominally significant but low-powered studies might be seen as less credible and as more likely to be affected by selective reporting. We exemplify the application of the sunset funnel plot with two published meta-analyses from medicine and psychology. Software to create this variation of the funnel plot is provided via a tailored R function. In conclusion, the sunset (power-enhanced) funnel plot is a novel and useful graphical display for critically examining and presenting study-level power in the context of meta-analysis.
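The sketch below illustrates the underlying idea of shading funnel-plot points by study-level power in base R; it is not the authors' tailored plotting function (distributed with their article, e.g., in the metaviz R package), and the effect sizes, standard errors, and assumed true effect are hypothetical placeholders.

```r
# Sketch: color funnel-plot points by their power to detect an assumed true effect
# (two-sided z-test, alpha = .05). yi and sei are hypothetical placeholders.
yi  <- c(0.55, 0.30, 0.62, 0.15, 0.48, 0.22, 0.70, 0.05)   # study effect sizes
sei <- c(0.28, 0.12, 0.30, 0.10, 0.25, 0.15, 0.33, 0.09)   # standard errors

theta <- 0.30            # assumed underlying true effect of interest
zcrit <- qnorm(0.975)

# Power of each primary study: P(|estimate / SE| > 1.96 | true effect = theta).
power <- pnorm(theta / sei - zcrit) + pnorm(-theta / sei - zcrit)

cols <- cut(power, breaks = c(0, 0.3, 0.5, 0.8, 1),
            labels = c("red", "orange", "gold", "forestgreen"))
plot(yi, sei, ylim = rev(range(sei)), pch = 19, col = as.character(cols),
     xlab = "Effect size", ylab = "Standard error (inverted axis)",
     main = "Funnel plot shaded by study-level power")
legend("bottomright", pch = 19, col = c("red", "orange", "gold", "forestgreen"),
       legend = c("<30%", "30-50%", "50-80%", ">80%"), title = "Power")
```

The power formula is the standard two-sided z-test approximation; the published display additionally draws continuous power regions and a second power axis rather than discrete point colors.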


2019 ◽  
Vol 227 (1) ◽  
pp. 64-82 ◽  
Author(s):  
Martin Voracek ◽  
Michael Kossmeier ◽  
Ulrich S. Tran

Abstract. Which data to analyze, and how, are fundamental questions of all empirical research. As there are always numerous flexibilities in data-analytic decisions (a "garden of forking paths"), this poses perennial problems for all empirical research. Specification-curve analysis and multiverse analysis have recently been proposed as solutions to these issues. Building on the structural analogies between primary data analysis and meta-analysis, we transform and adapt these approaches to the meta-analytic level, in tandem with combinatorial meta-analysis. We explain the rationale of this idea, suggest descriptive and inferential statistical procedures as well as graphical displays, provide code for meta-analytic practitioners to generate and use these, and present a fully worked real example from digit ratio (2D:4D) research, totaling 1,592 meta-analytic specifications. Specification-curve and multiverse meta-analysis hold promise to resolve conflicting meta-analyses, contested evidence, controversial empirical literatures, and polarized research, and to mitigate the associated detrimental effects of these phenomena on research progress.
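A minimal sketch of the combinatorial idea behind specification-curve (multiverse) meta-analysis, assuming the metafor package: enumerate a grid of defensible analytic choices, fit one meta-analysis per specification, and plot the sorted pooled estimates. The dataset and the two decision factors below are hypothetical placeholders, far smaller than the 1,592 specifications of the worked 2D:4D example.

```r
# Sketch of specification-curve ("multiverse") meta-analysis, assuming metafor.
# The dataset and the analytic choices are hypothetical placeholders.
library(metafor)

dat <- data.frame(
  yi = c(0.42, 0.10, 0.35, -0.05, 0.28, 0.18, 0.50, 0.02),   # study effect sizes
  vi = c(0.04, 0.01, 0.05, 0.02, 0.03, 0.02, 0.06, 0.01),    # sampling variances
  published = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  adult     = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Grid of analytic decisions ("which data to analyze, and how").
specs <- expand.grid(sample = c("all", "published_only", "adults_only"),
                     model  = c("REML", "FE"),
                     stringsAsFactors = FALSE)

fit_spec <- function(sample, model) {
  d <- switch(sample,
              all            = dat,
              published_only = dat[dat$published, ],
              adults_only    = dat[dat$adult, ])
  res <- rma(yi, vi, data = d, method = model)
  c(estimate = unname(res$b[1]), ci.lb = res$ci.lb, ci.ub = res$ci.ub)
}

spec_results <- cbind(specs, t(mapply(fit_spec, specs$sample, specs$model)))
spec_results <- spec_results[order(spec_results$estimate), ]

# Specification curve: sorted pooled estimates with their confidence intervals.
plot(seq_len(nrow(spec_results)), spec_results$estimate,
     ylim = range(spec_results$ci.lb, spec_results$ci.ub), pch = 19,
     xlab = "Specification (sorted)", ylab = "Pooled effect size")
arrows(seq_len(nrow(spec_results)), spec_results$ci.lb,
       seq_len(nrow(spec_results)), spec_results$ci.ub,
       angle = 90, code = 3, length = 0.03)
abline(h = 0, lty = 2)
```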


2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the results of 15 meta-analyses and compare the adjusted results to 15 large-scale multi-lab replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes that do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false-positive rate of between 57% and 100%.
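The sketch below illustrates the kind of comparison reported here, not the Andrews-Kasy estimator itself (which is a maximum-likelihood selection model with its own software); the adjusted estimates, standard errors, and replication estimates are hypothetical placeholders, and the summary metrics are simple illustrations rather than the authors' exact calculations.

```r
# Sketch of comparing bias-adjusted meta-analytic estimates with replication
# estimates. All numbers are hypothetical placeholders.
adj_est <- c(0.40, 0.25, 0.55, 0.30)   # adjusted meta-analytic effect sizes
adj_se  <- c(0.10, 0.08, 0.15, 0.09)   # their standard errors
rep_est <- c(0.15, 0.02, 0.30, 0.05)   # pre-registered multi-lab replication estimates

# Average factor by which adjusted meta-analyses exceed the replication estimates
# (guarding against division by near-zero replication effects).
mean(adj_est / pmax(abs(rep_est), 1e-6))

# Rough false-positive tally: adjusted estimate significant while the precisely
# estimated replication effect is essentially null (here, |d| < 0.10).
sig_adjusted <- abs(adj_est / adj_se) > qnorm(0.975)
mean(sig_adjusted & abs(rep_est) < 0.10)
```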

