Statistical pearls: Importance of effect-size, blinding, randomization, publication bias, and the overestimated p-values

2013 ◽  
Vol 4 (4) ◽  
pp. 217-219 ◽  
Author(s):  
Harald Breivik ◽  
Leiv Arne Rosseland ◽  
Audun Stubhaug


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
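The simulation logic can be sketched in a few lines. The following snippet is a minimal illustration under assumptions of my own (true effect, per-group sample size, and a 10% publication rate for nonsignificant results are invented), not the authors' simulation code or any of the six evaluated tools: it generates a publication-biased literature of standardized mean differences and applies a simple Egger-type regression test for funnel-plot asymmetry.

```python
# Minimal sketch: publication bias inflates the pooled effect; an Egger-type
# regression can hint at the asymmetry. All parameter values are illustrative.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
true_d, n, k = 0.2, 30, 200        # assumed true effect, per-group n, number of published studies

d, se = [], []
while len(d) < k:
    g1 = rng.normal(true_d, 1.0, n)
    g2 = rng.normal(0.0, 1.0, n)
    sp = np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
    est = (g1.mean() - g2.mean()) / sp                 # Cohen's d
    s = np.sqrt(2 / n + est**2 / (4 * n))              # approximate standard error of d
    p = 2 * stats.norm.sf(abs(est / s))
    # Assumed bias mechanism: nonsignificant studies are published only 10% of the time.
    if p < 0.05 or rng.random() < 0.10:
        d.append(est)
        se.append(s)

d, se = np.array(d), np.array(se)
w = 1 / se**2
print("inverse-variance pooled d:", (w * d).sum() / w.sum())   # inflated relative to true_d

# Egger-type regression: standardized effect (d / se) on precision (1 / se);
# an intercept far from zero signals small-study / publication bias.
fit = sm.OLS(d / se, sm.add_constant(1 / se)).fit()
print("Egger intercept:", fit.params[0], "p =", fit.pvalues[0])
```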


2020 ◽  
Vol 132 (2) ◽  
pp. 662-670
Author(s):  
Minh-Son To ◽  
Alistair Jukes

OBJECTIVE: The objective of this study was to evaluate the trends in reporting of p values in the neurosurgical literature from 1990 through 2017. METHODS: All abstracts from the Journal of Neurology, Neurosurgery, and Psychiatry (JNNP), the Journal of Neurosurgery (JNS) collection (including Journal of Neurosurgery: Spine and Journal of Neurosurgery: Pediatrics), Neurosurgery (NS), and the Journal of Neurotrauma (JNT) available on PubMed from 1990 through 2017 were retrieved. Automated text mining was performed to extract p values from relevant abstracts. Extracted p values were analyzed for temporal trends and characteristics. RESULTS: The search yielded 47,889 relevant abstracts. A total of 34,324 p values were detected in 11,171 abstracts. Since 1990 there has been a steady, proportionate increase in the number of abstracts containing p values. There were average absolute year-on-year increases of 1.2% (95% CI 1.1%–1.3%; p < 0.001), 0.93% (95% CI 0.75%–1.1%; p < 0.001), 0.70% (95% CI 0.57%–0.83%; p < 0.001), and 0.35% (95% CI 0.095%–0.60%; p = 0.0091) of abstracts reporting p values in JNNP, JNS, NS, and JNT, respectively. There have also been average year-on-year increases of 0.045 (95% CI 0.031–0.059; p < 0.001), 0.052 (95% CI 0.037–0.066; p < 0.001), 0.042 (95% CI 0.030–0.054; p < 0.001), and 0.041 (95% CI 0.026–0.056; p < 0.001) p values reported per abstract for these respective journals. The distribution of p values showed a positive skew and strong clustering of values at rounded decimals (e.g., 0.01, 0.02). Between 83.2% and 89.8% of all reported p values were at or below the "significance" threshold of 0.05 (i.e., p ≤ 0.05). CONCLUSIONS: Trends in the reporting of p values and the distribution of p values suggest that publication bias remains in the neurosurgical literature.
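A rough analogue of the automated extraction step might look like the sketch below. The regular expression and the example string are assumptions for illustration only, not the authors' actual text-mining pipeline, which would need to handle more reporting variants.

```python
# Minimal sketch of extracting p values from abstract text with a regex.
import re

# Hypothetical pattern; real abstracts also use scientific notation, "P-value of", etc.
P_VALUE = re.compile(r"[pP]\s*([<>=\u2264\u2265]=?)\s*(\d*\.?\d+)")

abstract = ("Overall survival improved (p < 0.001); length of stay "
            "did not differ significantly (p = 0.32).")

values = [float(m.group(2)) for m in P_VALUE.finditer(abstract)]
print(values)                                          # [0.001, 0.32]
print(sum(v <= 0.05 for v in values) / len(values))    # share at or below 0.05
```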


2021 ◽  
Author(s):  
Michelle Renee Ellefson ◽  
Daniel Oppenheimer

Failures of replication attempts in experimental psychology might extend beyond p-hacking, publication bias, or hidden moderators: reductions in experimental power can also be caused by violations of fidelity to a set of experimental protocols. In this paper, we run a series of simulations to systematically explore how manipulating fidelity influences effect size. We find statistical patterns that mimic those found in ManyLabs-style replications and meta-analyses, suggesting that fidelity violations are present in many replication attempts in psychology. Scholars in intervention science, medicine, and education have developed methods of improving and measuring fidelity, and as replication becomes more mainstream in psychology, the field would benefit from adopting such approaches as well.
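To illustrate the mechanism, the sketch below (built on my own assumptions, not the authors' simulation code) models a fidelity violation as the proportion of treated participants who actually receive the manipulation and shows how lower fidelity attenuates the observed standardized effect size.

```python
# Minimal sketch: lower protocol fidelity shrinks the observed Cohen's d.
import numpy as np

rng = np.random.default_rng(1)
true_d, n = 0.5, 100          # assumed true effect and per-group sample size

def mean_observed_d(fidelity, n_sims=2000):
    """Average observed d when only a `fidelity` share of the treatment group gets the manipulation."""
    ds = []
    for _ in range(n_sims):
        manipulated = rng.random(n) < fidelity                 # fidelity violation: some get no treatment
        treated = rng.normal(np.where(manipulated, true_d, 0.0), 1.0)
        control = rng.normal(0.0, 1.0, n)
        sp = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
        ds.append((treated.mean() - control.mean()) / sp)
    return float(np.mean(ds))

for f in (1.0, 0.8, 0.5):
    print(f"fidelity {f:.0%}: mean observed d = {mean_observed_d(f):.2f}")
```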


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Fushun Zhang ◽  
Yuanyuan Zhang ◽  
Nan Jiang ◽  
Qiao Zhai ◽  
Juanjuan Hu ◽  
...  

Background. Previous studies have reported a strong association between hypertension and psychological traits, including impulsive emotion and a mindful, relaxed temperament; mindfulness and relaxation in particular may have a benign influence on blood pressure and ameliorate hypertension. However, this conclusion has not been confirmed. Objective. This meta-analysis was performed to investigate, and to confirm or refute, the effect of mindfulness and relaxation interventions on essential hypertension. Methods. Systematic searches were conducted in common English and Chinese electronic databases (i.e., PubMed/MEDLINE, EMBASE, Web of Science, CINAHL, PsycINFO, Cochrane Library, and the Chinese Biomedical Literature Database) from 1980 to 2020. A meta-analysis of 5 studies was performed using RevMan 5.4.1 software to estimate the influence of mindfulness and relaxation on blood pressure. Publication bias and heterogeneity were assessed using a funnel plot, and studies were analyzed with either a random-effects or a fixed-effect model. Results. All 5 studies investigated the influence of mindfulness and relaxation on diastolic and systolic blood pressure, with a total of 205 participants in the control group and 204 in the intervention group. A random-effects model (REM) was used to calculate the pooled effect for diastolic blood pressure (I² = 0%, τ² = 0.000, P = 0.41); the pooled effect size (MD) was 0.30 (95% CI −0.81 to 1.42, P = 0.59). A REM was also used for systolic blood pressure (I² = 49%, τ² = 3.05, P = 0.10); the pooled effect size (MD) was −1.05 (95% CI −3.29 to 1.18, P = 0.36). The results of this meta-analysis were affected by publication bias to some degree. Conclusion. The pooled results suggest that mindfulness and relaxation have little effect on diastolic or systolic blood pressure when used as interventions for cardiovascular disease and hypertension.
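For readers unfamiliar with the pooling step, the sketch below shows a DerSimonian–Laird random-effects pooling of mean differences, the kind of model used above. The five study-level estimates are invented placeholders, not the data from the included trials.

```python
# Minimal sketch of DerSimonian-Laird random-effects pooling of mean differences.
import numpy as np
from scipy import stats

md = np.array([0.8, -0.4, 1.2, 0.1, -0.2])   # hypothetical study mean differences (mmHg)
se = np.array([0.9, 1.1, 0.8, 1.0, 1.2])     # hypothetical standard errors

w = 1 / se**2                                 # fixed-effect (inverse-variance) weights
q = np.sum(w * (md - np.sum(w * md) / w.sum())**2)
df = len(md) - 1
tau2 = max(0.0, (q - df) / (w.sum() - np.sum(w**2) / w.sum()))   # DL between-study variance
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0              # heterogeneity as a percentage

w_re = 1 / (se**2 + tau2)                     # random-effects weights
pooled = np.sum(w_re * md) / w_re.sum()
se_pooled = np.sqrt(1 / w_re.sum())
ci = pooled + np.array([-1, 1]) * 1.96 * se_pooled
p = 2 * stats.norm.sf(abs(pooled / se_pooled))

print(f"tau^2 = {tau2:.3f}, I^2 = {i2:.0f}%")
print(f"pooled MD = {pooled:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f}), p = {p:.2f}")
```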


2020 ◽  
Author(s):  
D. Stephen Lindsay

Psychological scientists strive to advance understanding of how and why we animals do and think and feel as we do. This is difficult, in part because flukes of chance and measurement error obscure researchers' perceptions. Many psychologists use inferential statistical tests to peer through the murk of chance and discern relationships between variables. Those tests are powerful tools, but they must be wielded with skill. Moreover, research reports must convey to readers a detailed and accurate understanding of how the data were obtained and analyzed. Research psychologists often fall short in those regards. This paper attempts to motivate and explain ways to enhance the transparency and replicability of psychological science. Specifically, I speak to how publication bias and p-hacking contribute to effect-size exaggeration in the published literature, and how effect-size exaggeration contributes, in turn, to replication failures. Then I present seven steps toward addressing these problems: telling the truth; upgrading statistical knowledge; standardizing aspects of research practices; documenting lab procedures in a lab manual; making materials, data, and analysis scripts transparent; addressing constraints on generality; and collaborating.


Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (American Statistical Association, Wasserstein & Lazar 2016). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but mistrust results with larger p-values. In either case, p-values can tell us little about the reliability of research, because they are themselves hardly replicable even if an alternative hypothesis is true. Significance itself (p ≤ 0.05) is also hardly replicable: at a realistic statistical power of 40%, given that there is a true effect, only one in six studies will significantly replicate the significant result of another study. Even at a good power of 80%, results from two studies will conflict, in terms of significance, in one third of cases if there is a true effect. This means that a replication cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgement based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis by falsely concluding that 'there is no effect'. Information on the possible true effect sizes that are compatible with the data must be obtained from the observed effect size (e.g., a sample average) together with a measure of uncertainty, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, such as 'we need more stringent decision rules', 'sample sizes will decrease', or 'we need to get rid of p-values'.
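The replication arithmetic behind the "one in six" and "one third" figures is simple to verify directly. The short sketch below assumes two independent studies, each with the same statistical power, and a true effect.

```python
# Minimal sketch of the significance-replication arithmetic under the stated assumptions.
for power in (0.40, 0.80):
    both_significant = power * power        # original and replication both reach p <= 0.05
    conflicting = 2 * power * (1 - power)   # one study significant, the other not
    print(f"power {power:.0%}: both significant = {both_significant:.2f}, conflicting = {conflicting:.2f}")

# power 40%: both significant = 0.16 (about 1 in 6)
# power 80%: conflicting     = 0.32 (about one third)
```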


2021 ◽  
Author(s):  
Alessandro Sparacio ◽  
Ivan Ropovik ◽  
Gabriela M. Jiga-Boy ◽  
Hans IJzerman

This Registered Report presents a meta-analysis exploring whether being in nature and emotional social support are effective in reducing levels of stress. We retrieved all the relevant articles that investigated a connection between one of these two strategies and various components of stress (physiological, affective, and cognitive) as well as affective consequences of stress. We followed a stringent analysis workflow (including permutation-based selection models and multilevel regression-based models) to provide publication bias-corrected estimates. We found [no evidence for the efficacy of either strategy/evidence for one of the two strategies/evidence for both strategies] with an estimated mean effect size of [xx/xx] and we recommend [recommendation will be provided if necessary].


2010 ◽  
Vol 10 (2) ◽  
pp. 545-555 ◽  
Author(s):  
Guillermo Macbeth ◽  
Eugenia Razumiejczyk ◽  
Rubén Daniel Ledesma

The Cliff's Delta statistic is an effect size measure that quantifies the amount of difference between two non-parametric variables beyond the interpretation of p-values. This measure can be understood as a useful complementary analysis for the corresponding hypothesis test. During the last two decades, the use of effect size measures has been strongly encouraged by methodologists and by leading institutions of the behavioral sciences. The aim of this contribution is to introduce the Cliff's Delta Calculator software, which performs such analyses and offers some interpretation tips. Differences and similarities with the parametric case are analysed and illustrated. The implementation of this free program is fully described and compared with other calculators. Alternative algorithmic approaches are mathematically analysed, and a basic linear-algebra proof of their equivalence is formally presented. Two worked examples from cognitive psychology are discussed. A visual interpretation of Cliff's Delta is suggested. Availability, installation, and applications of the program are presented and discussed.
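For reference, Cliff's Delta itself can be computed in a few lines. The sketch below is a generic illustration of the statistic, not the Cliff's Delta Calculator software described in the abstract, and the reaction-time data are hypothetical.

```python
# Minimal sketch: Cliff's Delta as the signed proportion of dominant pairs.
import numpy as np

def cliffs_delta(x, y):
    """delta = (#{x_i > y_j} - #{x_i < y_j}) / (n_x * n_y), ranging from -1 to 1."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    diffs = x[:, None] - y[None, :]          # all pairwise differences
    return ((diffs > 0).sum() - (diffs < 0).sum()) / diffs.size

# Hypothetical reaction times (ms): delta near 0 means heavy overlap,
# delta near +/-1 means the two groups barely overlap at all.
a = [320, 295, 310, 350, 305]
b = [400, 385, 360, 410, 395]
print(cliffs_delta(a, b))   # -1.0: every value in `a` lies below every value in `b`
```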

