Using simulation and resampling to improve the statistical power and reproducibility of psychological research

2019 ◽  
Author(s):  
Roger W. Strong ◽  
George Alvarez

The replication crisis has brought about an increased focus on improving the reproducibility of psychological research (Open Science Collaboration, 2015). Although some failed replications reflect false-positives in original research findings, many are likely the result of low statistical power, which can cause failed replications even when an effect is real, no questionable research practices are used, and an experiment’s methodology is repeated perfectly. The present paper describes a simulation method (bootstrap resampling) that can be used in combination with pilot data or synthetic data to produce highly powered experimental designs. Unlike other commonly used power analysis approaches (e.g., G*Power), bootstrap resampling can be adapted to any experimental design to account for various factors that influence statistical power, including sample size, number of trials per condition, and participant exclusion criteria. Ignoring some of these factors (e.g., by using G*Power) can overestimate the power of a study or replication, increasing the likelihood that your findings will not replicate. By demonstrating how these factors influence the consistency of experimental findings, this paper provides examples of how simulation can be used to improve statistical power and reproducibility. Further, we provide a MATLAB toolbox that can be used to implement these simulation-based methods on existing pilot data (https://harvard-visionlab.github.io/power-sim).
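A minimal sketch of the resampling idea in Python (the authors' toolbox itself is in MATLAB; the function name, the one-sample design, and the pilot-data format below are illustrative assumptions, not the toolbox's interface):

```python
# Sketch of simulation-based power estimation via bootstrap resampling of pilot data.
# Assumes pilot data are stored as one difference score per pilot participant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def bootstrap_power(pilot_scores, n_participants, n_sims=5000, alpha=0.05):
    """Estimate power of a one-sample test for a planned sample size by
    resampling participants (with replacement) from the pilot data."""
    hits = 0
    for _ in range(n_sims):
        sample = rng.choice(pilot_scores, size=n_participants, replace=True)
        res = stats.ttest_1samp(sample, 0.0)
        # count simulated experiments that are significant in the predicted direction
        hits += (res.pvalue < alpha) and (res.statistic > 0)
    return hits / n_sims

# Example: hypothetical pilot difference scores (condition A minus condition B)
pilot = np.array([0.12, 0.30, -0.05, 0.22, 0.18, 0.07, 0.40, 0.02, 0.15, 0.25])
for n in (20, 40, 80):
    print(n, round(bootstrap_power(pilot, n), 3))
```

The same loop can be extended to resample trials within participants or to apply exclusion criteria inside each simulated experiment, which is what lets this approach account for design factors that analytic power calculators ignore.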

2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
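For illustration, a hedged Python sketch of the kind of simulation the paper describes: generate primary studies around a true effect, publish non-significant results only with some probability, and apply one simple detection method (an Egger-style regression). The parameter values and the choice of detector are assumptions, not those evaluated in the paper.

```python
# Simulate a literature subject to publication bias, then test for funnel-plot
# asymmetry with an Egger-style regression (standardized effect on precision).
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(7)

def simulate_literature(true_d=0.2, k=60, n_per_group=30, p_publish_null=0.1):
    effects, ses = [], []
    while len(effects) < k:
        g1 = rng.normal(true_d, 1.0, n_per_group)
        g2 = rng.normal(0.0, 1.0, n_per_group)
        d = (g1.mean() - g2.mean()) / np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
        se = np.sqrt(2 / n_per_group + d**2 / (4 * n_per_group))
        significant = stats.ttest_ind(g1, g2).pvalue < 0.05
        if significant or rng.random() < p_publish_null:  # selective publication
            effects.append(d)
            ses.append(se)
    return np.array(effects), np.array(ses)

d, se = simulate_literature()
X = sm.add_constant(1 / se)           # predictor: precision (1/SE), plus intercept
model = sm.OLS(d / se, X).fit()       # outcome: standardized effect (d/SE)
print("Egger intercept p-value:", round(model.pvalues[0], 4))
```

Repeating such simulations across many conditions (true effect, heterogeneity, number and size of studies, severity of bias) and recording how often each detector flags bias is essentially how the Type I error rates and power reported in the paper are obtained.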


2017 ◽  
Author(s):  
Etienne P. LeBel ◽  
Derek Michael Berger ◽  
Lorne Campbell ◽  
Timothy Loving

Finkel, Eastwick, and Reis (2016; FER2016) argued that the post-2011 methodological reform movement has focused narrowly on replicability, neglecting other essential goals of research. We agree that multiple scientific goals are essential, but argue that a more fine-grained language, conceptualization, and approach to replication is needed to accomplish these goals. Replication is the general empirical mechanism for testing and falsifying theory. Sufficiently methodologically similar replications, also known as direct replications, test the basic existence of phenomena and ensure that cumulative progress is possible a priori. In contrast, increasingly methodologically dissimilar replications, also known as conceptual replications, test the relevance of auxiliary hypotheses (e.g., manipulation and measurement issues, contextual factors) required to productively investigate validity and generalizability. Without prioritizing replicability, a field is not empirically falsifiable. We also disagree with FER2016’s position that “bigger samples are generally better, but … that very large samples could have the downside of commandeering resources that would have been better invested in other studies” (abstract). We identify problematic assumptions in FER2016’s modifications of our original research-economic model and present an improved model that quantifies when (and whether) it is reasonable to worry that increasing statistical power will engender potential trade-offs. Sufficiently powering studies (i.e., to >80% power) maximizes both research efficiency and confidence in the literature (research quality). Given that we agree with FER2016 on all key open science points, we are eager to see the accelerated rate of cumulative knowledge development of social psychological phenomena that such a sufficiently transparent, powered, and falsifiable approach will generate.


2019 ◽  
Vol 6 (12) ◽  
pp. 190738 ◽  
Author(s):  
Jerome Olsen ◽  
Johanna Mosen ◽  
Martin Voracek ◽  
Erich Kirchler

The replicability of research findings has recently been disputed across multiple scientific disciplines. In constructive reaction, the research culture in psychology is facing fundamental changes, but investigations of research practices that led to these improvements have almost exclusively focused on academic researchers. By contrast, we investigated the statistical reporting quality and selected indicators of questionable research practices (QRPs) in psychology students' master's theses. In a total of 250 theses, we investigated the utilization and magnitude of standardized effect sizes, along with statistical power, the consistency and completeness of reported results, and possible indications of p-hacking and further testing. Effect sizes were reported for 36% of focal tests (median r = 0.19), and only a single formal power analysis was reported for sample size determination (median observed power 1 − β = 0.67). Statcheck revealed inconsistent p-values in 18% of cases, while 2% led to decision errors. There were no clear indications of p-hacking or further testing. We discuss our findings in the light of promoting open science standards in teaching and student supervision.
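The consistency check that statcheck performs can be illustrated in a few lines of Python (an assumed sketch, not the statcheck implementation): recompute the p-value from a reported test statistic and its degrees of freedom, then compare it with the reported p-value.

```python
# Recompute a two-tailed p-value from a reported t statistic and degrees of freedom
# and flag reports whose stated p-value differs beyond a tolerance.
from scipy import stats

def check_t_report(t, df, reported_p, two_tailed=True, tol=0.005):
    recomputed = stats.t.sf(abs(t), df)
    if two_tailed:
        recomputed *= 2
    return recomputed, abs(recomputed - reported_p) <= tol

# Example report: "t(28) = 2.15, p = .04"
p, consistent = check_t_report(t=2.15, df=28, reported_p=0.04)
print(round(p, 4), consistent)
```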


Author(s):  
Anđela Keljanović

At a time when social psychologists believed they could be proud of their discipline, the devastating news broke that Diederik Stapel had committed major scientific fraud. This event coincided with the start of a discussion about trust in psychological findings. It was soon followed by a report of a series of nine studies that failed to replicate the 'professor' study. These replication results were astounding given earlier reports of successful replications. In response to the crisis of confidence in the field's findings, the Open Science Collaboration subsequently replicated 100 correlational and experimental studies published in 2008 in Psychological Science, the Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory, and Cognition. Whereas 97% of the original studies reported a significant positive effect, only 36% of the replications did. These findings, too, have been called into question through calculation of Bayes factors. In addition to fraud, questionable research practices and a publication bias that favours false positives undermine confidence in the validity of psychological research findings. Perhaps the most costly of these mistakes is the false positive, that is, the erroneous rejection of a true null hypothesis. Yet had Stapel (2011) confirmed the null hypothesis, had Bargh (1996) found that priming did not affect participants' walking speed, or had Dijksterhuis and van Knippenberg (1998) reported that participants primed with the word 'professor' did not improve their performance on a task, no one would have been interested in their findings. Null findings are interesting only if they contradict the main hypothesis derived from a theory or contradict a body of previous studies. Because good experimental research is usually conducted to test theories, and because researchers can never be sure whether they have chosen the optimal operationalization of a given construct or successfully controlled the third variables that may be responsible for their results, a theory can never be proven true.


2021 ◽  
Author(s):  
Bradley David McAuliff ◽  
Melanie B. Fessinger ◽  
Anthony Perillo ◽  
Jennifer Torkildson Perillo

As the field of psychology and law begins to embrace more transparent and accessible science, many questions arise about what open science actually is and how to do it. In this chapter, we contextualize this reform by examining fundamental concerns about psychological research—irreproducibility and replication failures, false-positive errors, and questionable research practices—that threaten its validity and credibility. Next, we turn to psychology’s response by reviewing the concept of open science and explaining how to implement specific practices—preregistration, registered reports, open materials/data/code, and open access publishing—designed to make research more transparent and accessible. We conclude by weighing the implications of open science for the field of psychology and law, specifically with respect to how we conduct and evaluate research, as well as how we train the next generation of psychological scientists and share scientific findings in applied settings.


2018 ◽  
Vol 5 (9) ◽  
pp. 181190 ◽  
Author(s):  
Michael Ingre ◽  
Gustav Nilsonne

In this paper, we show how Bayes' theorem can be used to better understand the implications of the 36% reproducibility rate of published psychological findings reported by the Open Science Collaboration. We demonstrate a method to assess publication bias and show that the observed reproducibility rate was not consistent with an unbiased literature. We estimate a plausible range for the prior probability of this body of research, suggesting expected statistical power in the original studies of 48–75%, producing (positive) findings that were expected to be true 41–62% of the time. Publication bias was large, assuming a literature with 90% positive findings, indicating that negative evidence was expected to have been observed 55–98 times before one negative result was published. These findings imply that even when studied associations are truly null, we expect the literature to be dominated by statistically significant findings.
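The core Bayes-theorem logic can be reproduced with a short calculation (a simplified sketch, not the authors' full model; the prior and power values below are illustrative, chosen roughly within the ranges the abstract reports):

```python
# Probability that a significant ("positive") finding reflects a true effect,
# given the prior probability of the hypothesis, statistical power, and alpha.
def prob_true_given_positive(prior, power, alpha=0.05):
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Illustrative prior/power pairs (not the paper's estimates)
for prior, power in [(0.08, 0.48), (0.10, 0.60), (0.12, 0.75)]:
    print(prior, power, round(prob_true_given_positive(prior, power), 2))
```

With low priors and modest power, the proportion of positive findings that are true stays well below what the headline significance rate of a literature would suggest, which is the intuition behind the authors' analysis.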


2020 ◽  
Vol 54 (22) ◽  
pp. 1365-1371
Author(s):  
Fionn Büttner ◽  
Elaine Toomey ◽  
Shane McClean ◽  
Mark Roe ◽  
Eamonn Delahunt

Questionable research practices (QRPs) are intentional and unintentional practices that can occur when designing, conducting, analysing, and reporting research, producing biased study results. Sport and exercise medicine (SEM) research is vulnerable to the same QRPs that pervade the biomedical and psychological sciences, producing false-positive results and inflated effect sizes. Approximately 90% of biomedical research reports supported study hypotheses, provoking suspicion about the field-wide presence of systematic biases to facilitate study findings that confirm researchers’ expectations. In this education review, we introduce three common QRPs (ie, HARKing, p-hacking and cherry-picking), perform a cross-sectional study to assess the proportion of original SEM research that reports supported study hypotheses, and draw attention to existing solutions and resources to overcome QRPs that manifest in exploratory research. We hypothesised that ≥85% of original SEM research studies would report supported study hypotheses. Two independent assessors systematically identified, screened, included, and extracted study data from original research articles published between 1 January 2019 and 31 May 2019 in the British Journal of Sports Medicine, Sports Medicine, the American Journal of Sports Medicine, and the Journal of Orthopaedic & Sports Physical Therapy. We extracted data relating to whether studies reported that the primary hypothesis was supported or rejected by the results. Study hypotheses, methodologies, and analysis plans were preregistered at the Open Science Framework. One hundred and twenty-nine original research studies reported at least one study hypothesis, of which 106 (82.2%) reported hypotheses that were supported by study results. Of the 106 studies reporting that primary hypotheses were supported by study results, 75 (70.8%) reported that the primary hypothesis was fully supported, and 28 (26.4%) reported that it was partially supported. We detail open science practices and resources that aim to safeguard against QRPs that belie the credibility and replicability of original research findings.
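As a quick arithmetic check on the reported proportion (the confidence interval below is our own illustration, not something the paper reports):

```python
# Reproduce the headline proportion of supported hypotheses and attach a Wilson
# 95% confidence interval for illustration.
from statsmodels.stats.proportion import proportion_confint

supported, total = 106, 129
print(round(supported / total * 100, 1))                  # 82.2%
low, high = proportion_confint(supported, total, method="wilson")
print(round(low, 3), round(high, 3))                      # roughly 0.75 to 0.88
```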


Methodology ◽  
2017 ◽  
Vol 13 (1) ◽  
pp. 9-22 ◽  
Author(s):  
Pablo Livacic-Rojas ◽  
Guillermo Vallejo ◽  
Paula Fernández ◽  
Ellián Tuero-Herrero

Abstract. Low precision of inferences from data analyzed with univariate or multivariate Analysis of Variance (ANOVA) models in repeated-measures designs is associated with non-normally distributed data, nonspherical covariance structures with freely varying variances and covariances, lack of knowledge of the error structure underlying the data, and the wrong choice of covariance structure among the available selectors. In this study, the statistical power levels of the Modified Brown-Forsythe (MBF) procedure and two mixed-model approaches (Akaike's Criterion and the Correctly Identified Model [CIM]) are compared. The data were analyzed using the Monte Carlo simulation method in the statistical package SAS 9.2, with a split-plot design and six manipulated variables. The results show that the procedures exhibit high statistical power levels for within-subjects and interaction effects, and moderate to low levels for between-groups effects under the different conditions analyzed. For the latter, only the Modified Brown-Forsythe procedure shows a high level of power, mainly for groups of 30 cases with Unstructured (UN) and Heterogeneous Autoregressive (ARH) matrices. For this reason, we recommend using this procedure, since it exhibits higher levels of power for all effects and does not require specifying the matrix type underlying the structure of the data. Future research should compare power with corrected selectors using single-level and multilevel designs for fixed and random effects.
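A heavily simplified Python sketch of the Monte Carlo logic behind such power comparisons (not the SAS procedures the authors evaluate; the design sizes, effect size, AR(1) covariance, and the naive between-groups test are all illustrative assumptions):

```python
# Estimate power by simulating many split-plot datasets with a known between-groups
# effect and an AR(1) within-subject covariance, then counting rejections.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def ar1_cov(k, rho=0.5, sigma=1.0):
    """AR(1) covariance matrix for k repeated measures."""
    idx = np.arange(k)
    return sigma**2 * rho ** np.abs(np.subtract.outer(idx, idx))

def simulate_power(n_per_group=30, k=4, group_diff=0.5, rho=0.5,
                   n_sims=2000, alpha=0.05):
    cov = ar1_cov(k, rho)
    rejections = 0
    for _ in range(n_sims):
        g1 = rng.multivariate_normal(np.full(k, group_diff), cov, n_per_group)
        g2 = rng.multivariate_normal(np.zeros(k), cov, n_per_group)
        # simplified between-groups test on subject means (stands in for MBF/mixed models)
        p = stats.ttest_ind(g1.mean(axis=1), g2.mean(axis=1)).pvalue
        rejections += p < alpha
    return rejections / n_sims

print(round(simulate_power(), 3))
```

Swapping in the actual analysis procedure for each simulated dataset, and looping over the manipulated design factors, yields the kind of power comparison reported in the paper.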


2019 ◽  
Author(s):  
Jennifer L Tackett ◽  
Josh Miller

As psychological research comes under increasing fire for the crisis of replicability, attention has turned to methods and practices that facilitate (or hinder) a more replicable and veridical body of empirical evidence. These trends have focused on “open science” initiatives, including an emphasis on replication, transparency, and data sharing. Despite this broader movement in psychology, clinical psychologists and psychiatrists have been largely absent from the broader conversation on documenting the extent of existing problems as well as generating solutions to problematic methods and practices in our area (Tackett et al., 2017). The goal of the current special section was to bring together psychopathology researchers to explore these and related areas as they pertain to the types of research conducted in clinical psychology and allied disciplines.


2020 ◽  
Author(s):  
Madeleine Pownall

Currently under review at Psychology Teaching Review. Over recent years, Psychology has become increasingly concerned with reproducibility and replicability of research findings (Munafò et al., 2017). One method of ensuring that research is hypothesis driven, as opposed to data driven, is the process of publicly pre-registering a study’s hypotheses, data analysis plan, and procedure prior to data collection (Nosek, Ebersole, DeHaven, & Mellor, 2018). This paper discusses the potential benefits of introducing pre-registration to the undergraduate dissertation. The utility of pre-registration as a pedagogic practice within dissertation supervision is also critically appraised, with reference to open science literature. Here, it is proposed that encouraging pre-registration of undergraduate dissertation work may alleviate some pedagogic challenges, such as statistics anxiety, questionable research practices, and research clarity and structure. Perceived barriers, such as time and resource constraints, are also discussed.

