A meta-analytical answer to the crisis of confidence of psychology

2019 ◽  
Vol 35 (2) ◽  
pp. 350-356 ◽  
Author(s):  
Juan Botella ◽  
Juan I. Durán

Meta-analysis is a firmly established methodology and an integral part of the process of generating knowledge across the empirical sciences. It has also turned its attention to methodology itself and has become a leading critic of methodological shortcomings. We highlight several problematic aspects of how research is done in psychology: excessive heterogeneity in results and difficulties with replication, publication bias, suboptimal methodological quality, and questionable research practices. These and other problems have led to a "crisis of confidence" in psychology. We discuss how the meta-analytical perspective and its procedures can help to overcome the crisis. A more cooperative perspective, instead of a competitive one, can reframe replication as a more valuable contribution. Knowledge cannot be based on isolated studies. Given the nature of the object of study of psychology, the natural unit for generating knowledge must be the estimated distribution of effect sizes, not the dichotomous decision on statistical significance in specific studies. Some suggestions are offered on how to redirect researchers' practices so that their personal interests and those of science are better aligned.

2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the result of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes, which do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.


2017 ◽  
Author(s):  
Nicholas Alvaro Coles ◽  
Jeff T. Larsen ◽  
Heather Lench

The facial feedback hypothesis suggests that an individual's experience of emotion is influenced by feedback from their facial movements. To evaluate the cumulative evidence for this hypothesis, we conducted a meta-analysis of 286 effect sizes derived from 138 studies that manipulated facial feedback and collected emotion self-reports. Using random-effects meta-regression with robust variance estimates, we found that the overall effect of facial feedback was significant but small. Results also indicated that feedback effects are stronger in some circumstances than others. We examined 12 potential moderators, and three were associated with differences in effect sizes. (1) Type of emotional outcome: facial feedback influenced emotional experience (e.g., reported amusement) and, to a greater degree, affective judgments of a stimulus (e.g., the judged funniness of a cartoon). Three publication bias detection methods did not reveal evidence of publication bias in studies examining the effects of facial feedback on emotional experience, but all three revealed evidence of publication bias in studies examining affective judgments. (2) Presence of emotional stimuli: facial feedback effects on emotional experience were larger in the absence of emotionally evocative stimuli (e.g., cartoons). (3) Type of stimuli: when participants were presented with emotionally evocative stimuli, facial feedback effects were larger for some types of stimuli (e.g., emotional sentences) than for others (e.g., pictures). The available evidence supports the facial feedback hypothesis' central claim that facial feedback influences emotional experience, although these effects tend to be small and heterogeneous.
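For readers who want the mechanics, here is a minimal sketch of the random-effects pooling step underlying such an analysis, using the classic DerSimonian-Laird estimator (a simpler stand-in for the robust-variance meta-regression the authors report; all numbers are hypothetical):

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate via the DerSimonian-Laird method."""
    w = 1.0 / variances                                # fixed-effect weights
    theta_fe = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - theta_fe) ** 2)          # Cochran's Q
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    w_re = 1.0 / (variances + tau2)                    # random-effects weights
    theta_re = np.sum(w_re * effects) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return theta_re, se_re, tau2

# Hypothetical effect sizes (Cohen's d) and their sampling variances.
d = np.array([0.25, 0.10, 0.40, 0.05, 0.30])
v = np.array([0.02, 0.03, 0.05, 0.01, 0.04])
est, se, tau2 = dersimonian_laird(d, v)
print(f"pooled d = {est:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```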


Author(s):  
Mafalda Ferreira ◽  
António Marques ◽  
Paulo Veloso Gomes

Resilience interventions have been gaining importance among researchers due to their potential to promote well-being and reduce the prevalence of mental disorders, which are an increasing concern, especially in Western countries, because of the associated costs. The purpose of this systematic review is to identify the intervention studies carried out in the last decade on adult population samples, evaluate their methodological quality, and highlight the trends in these types of interventions. This review was performed using systematic literature searches in the following electronic databases: B-on, PubMed, PsycNet and Science Direct. The application of eligibility criteria resulted in the inclusion of 38 articles, of which 33 were randomized controlled trials and the other five were nonrandomized controlled studies. Although most studies reported statistically significant results, these were constrained by the great heterogeneity of the studies, underpowered samples and only fair methodological quality. It is therefore important to consolidate the theoretical basis and standardize certain methodologies so that the effects of the interventions can be compared through a meta-analysis.


2021 ◽  
Author(s):  
Yan Yu ◽  
Jiasu Liu

Objectives: This meta-analysis aimed to identify the therapeutic effect of 0.01% atropine on ocular axial elongation in children with myopia. Methods: We searched the PubMed, Cochrane Library, and CBM databases from inception to July 2021. The meta-analysis was conducted using STATA version 14.0 and Review Manager version 5.3 software. We calculated weighted mean differences (WMD) to analyze the change in ocular axial length (AL) between orthokeratology combined with 0.01% atropine (OKA) and orthokeratology alone (OK). Cochran's Q statistic and the I² test were used to evaluate potential heterogeneity between studies. To evaluate the influence of single studies on the overall estimate, a sensitivity analysis was performed. We also performed subgroup and meta-regression analyses to investigate potential sources of heterogeneity, and used Begg's funnel plots and Egger's linear regression tests to investigate publication bias. Results: Nine studies that met all inclusion criteria were included in this meta-analysis, assessing a total of 191 children in the OKA group and 196 children in the OK group. The pooled WMD of AL change was -0.90 (95% CI: -1.25 to -0.55) and statistically significant (t = -5.03, p < 0.01), indicating a clear difference between OKA and OK in myopic children. Subgroup analysis by spherical equivalent refraction (SER) also showed that OKA treatment resulted in significantly less axial elongation than OK treatment alone. We found no evidence of publication bias. Conclusions: Our meta-analysis indicates that 0.01% atropine is effective in slowing axial elongation in children with myopia undergoing orthokeratology.
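As a rough illustration of the heterogeneity statistics named above, the following sketch computes a fixed-effect pooled WMD, Cochran's Q and I² from hypothetical per-study axial-length differences (invented numbers, not the review's data):

```python
import numpy as np
from scipy import stats

# Hypothetical per-study mean differences in axial length change (mm)
# between the OKA and OK groups, with their sampling variances.
wmd = np.array([-1.1, -0.7, -0.9, -1.3, -0.6])
var = np.array([0.04, 0.06, 0.05, 0.09, 0.07])

w = 1.0 / var
pooled = np.sum(w * wmd) / np.sum(w)      # fixed-effect pooled WMD
q = np.sum(w * (wmd - pooled) ** 2)       # Cochran's Q
df = len(wmd) - 1
i2 = max(0.0, (q - df) / q) * 100         # I²: % of variation beyond chance
p_het = stats.chi2.sf(q, df)              # Q ~ chi²(df) under homogeneity
print(f"pooled WMD = {pooled:.2f} mm, Q = {q:.2f} (p = {p_het:.3f}), I² = {i2:.0f}%")
```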


2018 ◽  
Author(s):  
Paquito Bernard ◽  
Romain Ahmed Jérôme ◽  
Johan Caudroit ◽  
Guillaume Chevance ◽  
Carayol Marion ◽  
...  

Objective. The present meta-analysis aimed to determine the overall effect of cognitive behavior therapy combined with physical exercise (CBTEx) interventions on depression, anxiety, fatigue, and pain in adults with chronic illness; to identify potential moderators of efficacy; and to compare the efficacy of CBTEx versus each condition alone (CBT and physical exercise). Methods. Relevant randomized clinical trials, published before July 2017, were identified through database searches in PubMed, PsycArticles, CINAHL, SPORTDiscus and the Cochrane Central Register of Controlled Trials. Results. A total of 30 studies were identified. CBTEx interventions yielded small-to-large effect sizes for depression (SMC = -0.34, 95% CI [-0.53; -0.14]), anxiety (SMC = -0.18, 95% CI [-0.34; -0.03]) and fatigue (SMC = -0.96, 95% CI [-1.43; -0.49]). Moderation analyses revealed that longer interventions were associated with larger effect sizes for depression and anxiety outcomes. Low methodological quality was also associated with increased CBTEx efficacy for depression. When compared directly, CBTEx interventions did not show greater efficacy than CBT alone or physical exercise alone for any of the outcomes. Conclusion. The current literature suggests that CBTEx interventions are effective for decreasing depression, anxiety, and fatigue symptoms, but not pain. However, the findings do not support an additive effect of CBT and exercise on any of the four outcomes compared to each condition alone.
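The moderation analysis can be approximated by a weighted regression of effect sizes on the moderator. Below is a minimal fixed-effect sketch with invented numbers (a published review would typically fit a mixed-effects meta-regression instead):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: standardized mean changes (SMC) for depression and
# intervention length in weeks -- illustrative only, not the review's data.
smc   = np.array([-0.15, -0.30, -0.45, -0.25, -0.60, -0.50])
var   = np.array([0.040, 0.030, 0.025, 0.050, 0.020, 0.030])
weeks = np.array([6, 8, 12, 6, 16, 12])

# Inverse-variance weighted least squares of effect size on the moderator.
X = sm.add_constant(weeks)
fit = sm.WLS(smc, X, weights=1.0 / var).fit()
print(fit.params)  # a negative slope: longer interventions, larger effects
```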


Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (American Statistical Association; Wasserstein & Lazar 2016). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but mistrust results with larger p-values. In either case, p-values can tell little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance (p ≤ 0.05) itself is also hardly replicable: at a realistic statistical power of 40%, given that there is a true effect, only one in six studies will significantly replicate the significant result of another study. Even at a good power of 80%, results from two studies will be conflicting, in terms of significance, in one third of cases if there is a true effect. This means that a replication cannot be interpreted as having failed merely because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research.

Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. Yet current incentives to hunt for significance lead to publication bias against nonsignificant findings. Data dredging, p-hacking and publication bias should be addressed by removing fixed significance thresholds.

Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis; doing so falsely concludes that 'there is no effect'. Information on the possible true effect sizes that are compatible with the data must be obtained from the observed effect size (e.g., a sample average) and from a measure of uncertainty, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, such as 'we need more stringent decision rules', 'sample sizes will decrease' or 'we need to get rid of p-values'.
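The abstract's replication arithmetic is easy to verify: if each of two independent studies reaches significance with probability equal to its power (assuming a true effect), then:

```python
# Probability that a second study "significantly replicates" a first one:
# both must be significant, each with probability equal to the power.
power = 0.40
print(power * power)             # 0.16 -- about one in six

# Probability that two studies conflict in terms of significance:
# one significant, the other not, in either order.
power = 0.80
print(2 * power * (1 - power))   # 0.32 -- about one third
```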


Author(s):  
Bernd Weiß ◽  
Michael Wagner

Summary: Systematic research reviews have become essential in all empirical sciences. However, the validity of research syntheses is threatened by the fact that not all studies on a given topic can be summarized. Research reviews may suffer from missing data, and this is especially crucial in cases where the selectivity of studies and their findings affects the summarized result. So-called publication bias is a type of missing data and a phenomenon that jeopardizes the validity of systematic or quantitative, as well as narrative, reviews. Publication bias exists if the preparation, submission or publication of research findings depends on characteristics of these research results, e.g. their direction or statistical significance. This article describes methods to identify publication bias in the context of meta-analysis. It also reviews empirical studies on the prevalence of publication bias, which appears widespread in the social and economic sciences as well. Several proposals to prevent publication bias are discussed.
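One widely used identification method of the kind reviewed here is Egger's regression test for funnel-plot asymmetry. A minimal sketch with hypothetical study data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study effects and standard errors (illustrative only).
effects = np.array([0.50, 0.42, 0.35, 0.61, 0.28, 0.15, 0.55, 0.40])
se      = np.array([0.30, 0.25, 0.18, 0.35, 0.12, 0.08, 0.32, 0.20])

# Egger's linear regression test: regress the standard normal deviate
# (effect / SE) on precision (1 / SE). An intercept far from zero signals
# funnel-plot asymmetry, one common symptom of publication bias.
X = sm.add_constant(1.0 / se)
fit = sm.OLS(effects / se, X).fit()
print(f"intercept = {fit.params[0]:.2f}, p = {fit.pvalues[0]:.3f}")
```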


2018 ◽  
Vol 28 (03) ◽  
pp. 268-274 ◽  
Author(s):  
T. Munder ◽  
C. Flückiger ◽  
F. Leichsenring ◽  
A. A. Abbass ◽  
M. J. Hilsenroth ◽  
...  

Abstract. Aims: The aim of this study was to reanalyse the data from Cuijpers et al.'s (2018) meta-analysis to examine Eysenck's claim that psychotherapy is not effective. Cuijpers et al., after correcting for bias, concluded that the effect of psychotherapy for depression was small (standardised mean difference, SMD, between 0.20 and 0.30), providing evidence that psychotherapy is not as effective as generally accepted. Methods: The data for this study were the effect sizes included in Cuijpers et al. (2018). We removed outliers from the data set of effects, corrected for publication bias and segregated psychotherapy from other interventions. In our study, we considered wait-list (WL) controls as the most appropriate estimate of the natural history of depression without intervention. Results: The SMD for all interventions and for psychotherapy compared to WL controls was approximately 0.70, a value consistent with past estimates of the effectiveness of psychotherapy. Psychotherapy was also more effective than care-as-usual (SMD = 0.31) and other control groups (SMD = 0.43). Conclusions: The reanalysis reveals that psychotherapy for adult patients diagnosed with depression is effective.
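The SMD at the heart of this debate is simply a standardized difference between arm means. A minimal sketch of how one such effect size would be computed (scores are invented; lower depression scores are better):

```python
import numpy as np

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference with the Hedges small-sample correction."""
    sp = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                # Cohen's d with pooled SD
    j = 1 - 3 / (4 * (n1 + n2) - 9)   # Hedges' correction factor
    return j * d

# Hypothetical post-treatment depression scores: psychotherapy vs wait list.
print(round(hedges_g(m1=12.0, sd1=6.0, n1=40, m2=16.5, sd2=6.5, n2=40), 2))
# about -0.71: treated scores ~0.7 SD lower than wait-list controls
```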


1990 ◽  
Vol 24 (3) ◽  
pp. 405-415 ◽  
Author(s):  
Nathaniel McConaghy

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy over the evaluation of treatment effects. Statistical significance measured the reliability of a treatment effect, not its efficacy, and was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence, but by standardizing the size of the treatment effect it could also distort it. Meta-analyses that combine the results of studies employing different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists with varying experience can be misleading. To ensure these variables are discussed, meta-analyses should be used as an aid to, rather than a substitute for, literature review. Given that meta-analyses can produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment will, it is hoped, render the use of untreated control groups obsolete.


2020 ◽  
Author(s):  
Michael W. Beets ◽  
R. Glenn Weaver ◽  
John P.A. Ioannidis ◽  
Alexis Jones ◽  
Lauren von Klinggraeff ◽  
...  

Abstract. Background: Pilot/feasibility studies or studies with small sample sizes may be associated with inflated effects. This study explores the vibration of effect sizes (VoE) in meta-analyses when different inclusion criteria, based upon sample size or pilot/feasibility status, are considered. Methods: Searches were conducted for meta-analyses of behavioral interventions on topics related to the prevention/treatment of childhood obesity, published from January 2016 to October 2019. The computed summary effect sizes (ES) were extracted from each meta-analysis. Individual studies included in the meta-analyses were classified into one of four categories: self-identified pilot/feasibility studies, or, based upon sample size, N≤100, N>100, and N>370 (the upper 75th percentile of sample size). The VoE was defined as the absolute difference (ABS) between re-estimations of the summary ES restricted to each study classification and the originally reported summary ES. Concordance (kappa) of statistical significance between summary ES was assessed. Fixed and random effects models and meta-regressions were estimated. Three case studies are presented to illustrate the impact of including pilot/feasibility and N≤100 studies on the estimated summary ES. Results: A total of 1,602 effect sizes, representing 145 reported summary ES, were extracted from 48 meta-analyses containing 603 unique studies (average 22 studies per meta-analysis, range 2-108) and 227,217 participants. Pilot/feasibility and N≤100 studies comprised 22% (range 0-58%) and 21% (range 0-83%) of studies, respectively. Meta-regression indicated that the ABS between the re-estimated and original summary ES was 0.29 where N≤100 studies made up ≥40% of the studies contributing to a summary ES, and 0.46 where pilot/feasibility and N≤100 studies together made up >80%. Where ≤40% of the studies comprising a summary ES had N>370, the ABS ES ranged from 0.20 to 0.30. Concordance was low when removing both pilot/feasibility and N≤100 studies (kappa = 0.53) and when restricting analyses to the largest studies (N>370, kappa = 0.35), with 20% and 26% of the originally reported statistically significant ES rendered non-significant. Reanalysis of the three case study meta-analyses resulted in re-estimated ES that were either non-significant or half the size of the originally reported ES. Conclusions: When meta-analyses of behavioral interventions include a substantial proportion of both pilot/feasibility and N≤100 studies, summary ES can be affected markedly and should be interpreted with caution.
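The concordance measure used above can be reproduced with a hand-rolled Cohen's kappa. A minimal sketch with hypothetical significance labels (1 = significant, 0 = not):

```python
import numpy as np

# Hypothetical significance labels for original vs re-estimated summary ES.
orig = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
rest = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1])

po = np.mean(orig == rest)              # observed agreement
pe = (np.mean(orig) * np.mean(rest)     # chance agreement
      + np.mean(1 - orig) * np.mean(1 - rest))
kappa = (po - pe) / (1 - pe)
print(round(kappa, 2))                  # 0.6 for these labels
```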

