scholarly journals Infants’ Social Evaluation of Helpers and Hinderers: A Large-Scale, Multi-Lab, Coordinated Replication Study

2021 ◽  
Author(s):  
Kelsey Lucca ◽  
Arthur Capelier-Mourguy ◽  
Krista Byers-Heinlein ◽  
Laura Cirelli ◽  
Rodrigo Dal Ben ◽  
...  

Evaluating others’ actions as praiseworthy or blameworthy is a fundamental aspect of human nature. A seminal study published in 2007 suggested that the ability to form social evaluations based on third-party interactions emerges within the first year of life, considerably earlier than previously thought (Hamlin, Wynn, & Bloom, 2007). In this study, infants demonstrated a preference for a character (i.e., a shape with eyes) who helped, over one who hindered, another character who tried but failed to climb a hill. This study sparked a new line of inquiry into infants’ social evaluations; however, numerous attempts to replicate the original findings yielded mixed results, with some reporting effects not reliably different from chance. These failed replications point to at least two possibilities: (1) the original study may have overestimated the true effect size of infants’ preference for helpers, or (2) key methodological or contextual differences from the original study may have compromised the replication attempts. Here we present a pre-registered, closely coordinated, multi-laboratory, standardized study aimed at replicating the helping/hindering finding using a well-controlled video version of the hill show. We intended to (1) provide a precise estimate of the true effect size of infants’ preference for helpers over hinderers, and (2) determine the degree to which infants’ preferences are based on social features of the Helper/Hinderer scenarios. XYZ labs participated in the study yielding a total sample size of XYZ infants between the ages of 5.5 and 10.5 months. Brief summary of results will be added after data collection.

2018 ◽  
Author(s):  
Robbie Cornelis Maria van Aert

More and more scientific research gets published nowadays, asking for statistical methods that enable researchers to get an overview of the literature in a particular research field. For that purpose, meta-analysis methods were developed that can be used for statistically combining the effect sizes from independent primary studies on the same topic. My dissertation focuses on two issues that are crucial when conducting a meta-analysis: publication bias and heterogeneity in primary studies’ true effect sizes. Accurate estimation of both the meta-analytic effect size as well as the between-study variance in true effect size is crucial since the results of meta-analyses are often used for policy making. Publication bias distorts the results of a meta-analysis since it refers to situations where publication of a primary study depends on its results. We developed new meta-analysis methods, p-uniform and p-uniform*, which estimate effect sizes corrected for publication bias and also test for publication bias. Although the methods perform well in many conditions, these and the other existing methods are shown not to perform well when researchers use questionable research practices. Additionally, when publication bias is absent or limited, traditional methods that do not correct for publication bias outperform p¬-uniform and p-uniform*. Surprisingly, we found no strong evidence for the presence of publication bias in our pre-registered study on the presence of publication bias in a large-scale data set consisting of 83 meta-analyses and 499 systematic reviews published in the fields of psychology and medicine. We also developed two methods for meta-analyzing a statistically significant published original study and a replication of that study, which reflects a situation often encountered by researchers. One method is a frequentist whereas the other method is a Bayesian statistical method. Both methods are shown to perform better than traditional meta-analytic methods that do not take the statistical significance of the original study into account. Analytical studies of both methods also show that sometimes the original study is better discarded for optimal estimation of the true effect size. Finally, we developed a program for determining the required sample size in a replication analogous to power analysis in null hypothesis testing. Computing the required sample size with the method revealed that large sample sizes (approximately 650 participants) are required to be able to distinguish a zero from a small true effect.Finally, in the last two chapters we derived a new multi-step estimator for the between-study variance in primary studies’ true effect sizes, and examined the statistical properties of two methods (Q-profile and generalized Q-statistic method) to compute the confidence interval of the between-study variance in true effect size. We proved that the multi-step estimator converges to the Paule-Mandel estimator which is nowadays one of the recommended methods to estimate the between-study variance in true effect sizes. Two Monte-Carlo simulation studies showed that the coverage probabilities of Q-profile and generalized Q-statistic method can be substantially below the nominal coverage rate if the assumptions underlying the random-effects meta-analysis model were violated.


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.


2017 ◽  
Author(s):  
Hilde Augusteijn ◽  
Robbie Cornelis Maria van Aert ◽  
Marcel A. L. M. van Assen

One of the main goals of meta-analysis is to test and estimate the heterogeneity of effect size. We examined the effect of publication bias on the Q-test and assessments of heterogeneity, as a function of true heterogeneity, publication bias, true effect size, number of studies, and variation of sample sizes. The expected values of heterogeneity measures H2 and I2 were analytically derived, and the power and the type I error rate of the Q-test were examined in a Monte-Carlo simulation study. Our results show that the effect of publication bias on the Q-test and assessment of heterogeneity is large, complex, and non-linear. Publication bias can both dramatically decrease and increase heterogeneity. Extreme homogeneity can occur even when the population heterogeneity is large. Particularly if the number of studies is large and population effect size is small, publication bias can cause both extreme type I error rates and power of the Q-test close to 0 or 1. We therefore conclude that the Q-test of homogeneity and heterogeneity measures H2 and I2 are generally not valid in assessing and testing heterogeneity when publication bias is present, especially when the true effect size is small and the number of studies is large. We introduce a web application, Q-sense, which can be used to assess the sensitivity of the Q-test to publication bias, and we apply it to two published meta-analysis. Meta-analytic methods should be enhanced in order to be able to deal with publication bias in their assessment and tests of heterogeneity.


2018 ◽  
Vol 35 (1) ◽  
pp. 156-165
Author(s):  
Laura Redondo ◽  
Francisca Fariña ◽  
Dolores Seijo ◽  
Mercedes Novo ◽  
Ramón Arce

Parental attribute evaluation in relation to child custody comprises psychological and psychopathology. Additionally, defensiveness must be suspected on this setting. The worldwide reference psychometric measurement instrument for this purpose is the MMPI. With the aim of knowing the responses of parents litigating by child custody, a meta-analytic review of the responses to clinical and restructured scales was performed. A total of 21 primary studies (studies with a simulation design i.e., participants were instructed to answer as parents litigating by child custody were found were disregarded) were found, obtaining 291 effect sizes for clinical scales and 1 for restructured scales. The results showed positive, significant and generalizable mean true effect size in the Hy, Pd and Pa scales; a negative, significant and generalizable in the Ma and Si scales, and non-generalizable in the Pt y Sc scales; and a trivial mean true effect size in the Hs and D scales. Parent gender was studied as a moderator having no found differences between the responses of mothers and fathers. The implications of the results for forensic evaluation practices are discussed.


2020 ◽  
Author(s):  
Robbie Cornelis Maria van Aert ◽  
Joris Mulder

Meta-analysis methods are used to synthesize results of multiple studies on the same topic. The most frequently used statistical model in meta-analysis is the random-effects model containing parameters for the average effect, between-study variance in primary study's true effect size, and random effects for the study specific effects. We propose Bayesian hypothesis testing and estimation methods using the marginalized random-effects meta-analysis (MAREMA) model where the study specific true effects are regarded as nuisance parameters which are integrated out of the model. A flat prior distribution is placed on the overall effect size in case of estimation and a proper unit information prior for the overall effect size is proposed in case of hypothesis testing. For the between-study variance in true effect size, a proper uniform prior is placed on the proportion of total variance that can be attributed to between-study variability. Bayes factors are used for hypothesis testing that allow testing point and one-sided hypotheses. The proposed methodology has several attractive properties. First, the proposed MAREMA model encompasses models with a zero, negative, and positive between-study variance, which enables testing a zero between-study variance as it is not a boundary problem. Second, the methodology is suitable for default Bayesian meta-analyses as it requires no prior information about the unknown parameters. Third, the methodology can even be used in the extreme case when only two studies are available, because Bayes factors are not based on large sample theory. We illustrate the developed methods by applying it to two meta-analyses and introduce easy-to-use software in the R package BFpack to compute the proposed Bayes factors.


2018 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates observed in the social sciences. Both of these problems do not only increase the proportion of false positives in the literature but can also lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect and correct such bias in meta-analytic results. We present an evaluation of the performance of six of these tools in detecting bias. To assess the Type I error rate and the statistical power of these tools we simulated a large variety of literatures that differed with regard to underlying true effect size, heterogeneity, number of available primary studies and variation of sample sizes in these primary studies. Furthermore, simulated primary studies were subjected to different degrees of publication bias. Our results show that the power of the detection methods follows a complex pattern. Across all simulated conditions, no method consistently outperformed all others. Hence, choosing an optimal method would require knowledge about parameters (e.g., true effect size, heterogeneity) that meta-analysts cannot have. Additionally, all methods performed badly when true effect sizes were heterogeneous or primary studies had a small chance of being published irrespective of their results. This suggests, that in many actual meta-analyses in psychology bias will remain undiscovered no matter which detection method is used.


1997 ◽  
Vol 85 (2) ◽  
pp. 719-722 ◽  
Author(s):  
M. T. Bradley ◽  
R. D. Gupta

Although meta-analysis appears to be a useful technique to verify the existence of an effect and to summarize large bodies of literature, there are problems associated with its use and interpretation. Amongst difficulties is the “file drawer problem.” With this problem it is assumed that a certain percentage of studies are not published or are not available to be included in any given meta-analysis. We present a cautionary table to quantify the magnitude of this problem. The table shows that distortions exaggerating the effect size are substantial and that the exaggerations of effects are strongest when the true effect size approaches zero. A meta-analysis could be very misleading were the true effect size close to zero.


2017 ◽  
Author(s):  
Robbie Cornelis Maria van Aert ◽  
Marcel A. L. M. van Assen

The vast majority of published results in the literature is statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. The original study and replication were statistically significant in 36.1% in RPP and 68.8% in EE-RP suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and quantifies the amount of evidence in favor of a zero, small, medium and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the methods performance, and demonstrate the necessity to control for the original study’s significance to enable the accumulation of evidence for a true zero effect. Then we applied the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the included studies in EE-RP are generally larger than in RPP, but that the sample sizes of especially the included studies in RPP are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication akin to power analysis in null hypothesis significance testing and present an easy to use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method.


2020 ◽  
Author(s):  
Anton Olsson-Collentine ◽  
Marcel A. L. M. van Assen ◽  
Jelte M. Wicherts

We examined the evidence for heterogeneity (of effect sizes) when only minor changes to sample population and settings were made between studies and explored the association between heterogeneity and average effect size in a sample of 68 meta-analyses from thirteen pre-registered multi-lab direct replication projects in social and cognitive psychology. Amongst the many examined effects, examples include the Stroop effect, the “verbal overshadowing” effect, and various priming effects such as “anchoring” effects. We found limited heterogeneity; 48/68 (71%) meta-analyses had non-significant heterogeneity, and most (49/68; 72%) were most likely to have zero to small heterogeneity. Power to detect small heterogeneity (as defined by Higgins, 2003) was low for all projects (mean 43%), but good to excellent for medium and large heterogeneity. Our findings thus show little evidence of widespread heterogeneity in direct replication studies in social and cognitive psychology, suggesting that minor changes in sample population and settings are unlikely to affect research outcomes in these fields of psychology. We also found strong correlations between observed average effect sizes (standardized mean differences and log odds ratios) and heterogeneity in our sample. Our results suggest that heterogeneity and moderation of effects is unlikely for a zero average true effect size, but increasingly likely for larger average true effect size.


Sign in / Sign up

Export Citation Format

Share Document