Influence of Pilot and Small Trials in Meta-Analyses of Behavioral Interventions: A Meta-epidemiological Study

2020 ◽  
Author(s):  
Michael W. Beets ◽  
R. Glenn Weaver ◽  
John P.A. Ioannidis ◽  
Alexis Jones ◽  
Lauren von Klinggraeff ◽  
...  

Abstract Background: Pilot/feasibility studies and studies with small sample sizes may be associated with inflated effects. This study explores the vibration of effect sizes (VoE) in meta-analyses when different inclusion criteria based upon sample size or pilot/feasibility status are considered. Methods: Searches were conducted for meta-analyses of behavioral interventions on topics related to the prevention/treatment of childhood obesity published from 01-2016 to 10-2019. The computed summary effect sizes (ES) were extracted from each meta-analysis. Individual studies included in the meta-analyses were classified into one of four categories: self-identified pilot/feasibility studies, or categories based upon sample size (N≤100, N>100, and N>370, the upper 75th percentile of sample size). The VoE was defined as the absolute difference (ABS) between the summary ES re-estimated under each study classification and the originally reported summary ES. Concordance (kappa) of statistical significance between summary ES was assessed. Fixed and random effects models and meta-regressions were estimated. Three case studies are presented to illustrate the impact of including pilot/feasibility and N≤100 studies on the estimated summary ES. Results: A total of 1,602 effect sizes, representing 145 reported summary ES, were extracted from 48 meta-analyses containing 603 unique studies (average 22 studies per meta-analysis, range 2-108) and including 227,217 participants. Pilot/feasibility and N≤100 studies comprised 22% (range 0-58%) and 21% (range 0-83%) of studies, respectively. Meta-regression indicated that the ABS between the re-estimated and original summary ES was 0.29 where summary ES comprised ≥40% N≤100 studies, and 0.46 where summary ES comprised >80% of both pilot/feasibility and N≤100 studies. Where ≤40% of the studies comprising a summary ES had N>370, the ABS ES ranged from 0.20-0.30. Concordance was low when removing both pilot/feasibility and N≤100 studies (kappa=0.53) and when restricting analyses to the largest studies (N>370, kappa=0.35), with 20% and 26% of the originally reported statistically significant ES rendered non-significant, respectively. Reanalysis of the three case study meta-analyses yielded re-estimated ES that were either non-significant or half the originally reported ES. Conclusions: When meta-analyses of behavioral interventions include a substantial proportion of both pilot/feasibility and N≤100 studies, summary ES can be affected markedly and should be interpreted with caution.
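
To make the VoE computation concrete, here is a minimal sketch (not the authors' code; all study data below are hypothetical) of re-estimating a summary ES after excluding pilot/feasibility and N≤100 studies, then taking the absolute difference from the original pooled estimate, using simple fixed-effect inverse-variance pooling:

```python
import numpy as np

def pooled_es(es, var):
    """Fixed-effect inverse-variance pooled effect size."""
    w = 1.0 / np.asarray(var)
    return np.sum(w * np.asarray(es)) / np.sum(w)

# Hypothetical studies: effect size, sampling variance, total N, pilot/feasibility flag
es    = np.array([0.60, 0.45, 0.05, 0.10, 0.50])
var   = np.array([0.08, 0.06, 0.01, 0.01, 0.09])
n     = np.array([40, 80, 500, 370, 60])
pilot = np.array([True, False, False, False, True])

original = pooled_es(es, var)

# Re-estimate restricted to studies that are neither pilot/feasibility nor N<=100
keep = (~pilot) & (n > 100)
restricted = pooled_es(es[keep], var[keep])

abs_diff = abs(restricted - original)  # the ABS (vibration-of-effects) measure
print(f"original ES={original:.2f}, restricted ES={restricted:.2f}, ABS={abs_diff:.2f}")
```

In this toy example the small and pilot studies carry the largest effects, so the restricted summary ES drops sharply, which is the pattern the study quantifies across real meta-analyses.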

2020 ◽  
Vol 11 ◽  
Author(s):  
Jonathan Z. Bakdash ◽  
Laura R. Marusich ◽  
Jared B. Kenworthy ◽  
Elyssa Twedt ◽  
Erin G. Zaroukian

Whether in meta-analysis or single experiments, selecting results based on statistical significance leads to overestimated effect sizes, impeding falsification. We critique a quantitative synthesis that used significance to score and select previously published effects for situation awareness-performance associations (Endsley, 2019). How much does selection using statistical significance quantitatively impact results in a meta-analytic context? We evaluate and compare results using significance-filtered effects versus analyses with all effects as reported. Endsley reported high predictiveness scores and large positive mean correlations but used atypical methods: the hypothesis was used to select papers and effects. Papers were assigned the maximum predictiveness score if they contained at least one significant effect, yet most papers reported multiple effects, and the number of non-significant effects did not impact the score. Thus, the predictiveness score was rarely less than the maximum. In addition, only significant effects were included in Endsley's quantitative synthesis. Filtering excluded half of all reported effects, with guaranteed minimum effect sizes based on sample size. Results for filtered compared to as-reported effects clearly diverged. Compared to the mean of as-reported effects, the filtered mean was overestimated by 56%. Furthermore, 92% (222 out of 241) of the as-reported effects were below the mean of filtered effects. We conclude that outcome-dependent selection of effects is circular, predetermining results and running contrary to the purpose of meta-analysis. Instead of using significance to score and filter effects, meta-analyses should follow established research practices.
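
A minimal Monte Carlo sketch of the general mechanism (hypothetical parameters, not Endsley's data): filtering sample correlations by p < .05 inflates the mean of the surviving effects, and the significance threshold imposes a guaranteed minimum effect size that depends only on the sample size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_r, n, reps = 0.3, 30, 10_000  # hypothetical true correlation and study size

def sample_r(true_r, n, rng):
    """Draw one sample correlation from a bivariate normal with correlation true_r."""
    cov = [[1, true_r], [true_r, 1]]
    x = rng.multivariate_normal([0, 0], cov, size=n)
    return np.corrcoef(x[:, 0], x[:, 1])[0, 1]

rs = np.array([sample_r(true_r, n, rng) for _ in range(reps)])

# Two-sided p-value for each r via the t-distribution
t = rs * np.sqrt((n - 2) / (1 - rs**2))
p = 2 * stats.t.sf(np.abs(t), df=n - 2)

filtered = rs[p < 0.05]
print(f"mean of all effects:      {rs.mean():.3f}")
print(f"mean of filtered effects: {filtered.mean():.3f}")  # overestimates true_r

# Critical |r| at p < .05 for this n: the guaranteed minimum surviving effect
t_crit = stats.t.ppf(0.975, df=n - 2)
print(f"minimum significant |r|:  {t_crit / np.sqrt(n - 2 + t_crit**2):.3f}")
```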


2021 ◽  
Vol 5 (1) ◽  
pp. e100135
Author(s):  
Xue Ying Zhang ◽  
Jan Vollert ◽  
Emily S Sena ◽  
Andrew SC Rice ◽  
Nadia Soliman

Objective: Thigmotaxis is an innate predator-avoidance behaviour of rodents and is enhanced when animals are under stress. It is characterised by the preference of a rodent to seek shelter rather than expose itself to the aversive open area. The behaviour has been proposed as a measurable construct that can address the impact of pain on rodent behaviour. This systematic review will assess whether thigmotaxis can be influenced by experimental persistent pain and attenuated by pharmacological interventions in rodents. Search strategy: We will conduct a search of three electronic databases to identify studies in which thigmotaxis was used as an outcome measure in a rodent model associated with persistent pain. All studies published up to the date of the search will be considered. Screening and annotation: Two independent reviewers will screen studies based on (1) titles and abstracts, and (2) full texts. Data management and reporting: For meta-analysis, we will extract thigmotactic behavioural data and calculate effect sizes. Effect sizes will be combined using a random-effects model. We will assess heterogeneity and identify its sources. A risk-of-bias assessment will be conducted to evaluate study quality. Publication bias will be assessed using funnel plots, Egger's regression and trim-and-fill analysis. We will also extract stimulus-evoked limb withdrawal data to assess its correlation with thigmotaxis in the same animals. The evidence obtained will provide a comprehensive understanding of the strengths and limitations of using the thigmotactic outcome measure in animal pain research so that future experimental designs can be optimised. We will follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses reporting guidelines and disseminate the review findings through publication and conference presentations.
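
As a rough illustration of the planned synthesis steps, the sketch below pools hypothetical effect sizes with a DerSimonian-Laird random-effects model and runs Egger's regression for small-study bias. The data and the use of statsmodels are illustrative assumptions, not part of the protocol:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level effect sizes (e.g., SMDs) and sampling variances
es  = np.array([0.8, 0.5, 0.3, 0.9, 0.4, 0.6])
var = np.array([0.10, 0.05, 0.04, 0.20, 0.03, 0.08])

# DerSimonian-Laird estimate of between-study variance tau^2
w = 1 / var
fixed = np.sum(w * es) / np.sum(w)
Q = np.sum(w * (es - fixed) ** 2)
df = len(es) - 1
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects pooled estimate and 95% CI
w_re = 1 / (var + tau2)
pooled = np.sum(w_re * es) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(f"pooled={pooled:.2f}, 95% CI=({pooled-1.96*se:.2f}, {pooled+1.96*se:.2f}), tau^2={tau2:.3f}")

# Egger's regression: standardized effect on precision; a non-zero intercept
# suggests small-study (publication) bias
precision = 1 / np.sqrt(var)
egger = sm.OLS(es / np.sqrt(var), sm.add_constant(precision)).fit()
print(f"Egger intercept={egger.params[0]:.2f}, p={egger.pvalues[0]:.3f}")
```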


1990 ◽  
Vol 24 (3) ◽  
pp. 405-415 ◽  
Author(s):  
Nathaniel McConaghy

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy concerning the evaluation of treatment effects. Statistical significance measured the reliability of the effect of treatment, not its efficacy, and was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence, but by standardizing the size of the treatment effect it could distort it. Meta-analyses which combine the results of studies that employ different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists with varying experience can be misleading. To ensure discussion of these variables, meta-analyses should be used as an aid to, rather than a substitute for, literature review. While meta-analyses produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment will, it is hoped, render the use of untreated control groups obsolete.
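
The reliability-versus-efficacy point can be made concrete with a short simulation: holding the standardized effect (Cohen's d) fixed, the p-value of a t-test shrinks as the number of subjects grows, while the effect size itself does not change. A minimal sketch with hypothetical numbers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
d_true = 0.3  # fixed standardized treatment effect

for n in (20, 80, 320, 1280):
    treated = rng.normal(d_true, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    # Cohen's d: mean difference standardized by the pooled SD
    sd_pooled = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    d = (treated.mean() - control.mean()) / sd_pooled
    print(f"n={n:5d}  d={d:+.2f}  p={p:.4f}")  # d stays ~0.3, p keeps falling
```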


Author(s):  
Michael J. Lambert ◽  
Jason L. Whipple ◽  
Maria Kleinstäuber

This meta-analysis examines the impact of measuring, monitoring, and feeding back information on client progress to clinicians while they deliver psychotherapy. It considers the effects of the two most frequently studied routine outcome monitoring practices: the Partners for Change Outcome System and the Outcome Questionnaire System. Meta-analyses of 24 studies produced effect sizes ranging from small to moderate. Feedback practices reduced deterioration rates and nearly doubled clinically significant/reliable change rates in clients who were predicted to have a poor outcome. Clinical examples, diversity considerations, and therapeutic advances are provided.


Circulation ◽  
2021 ◽  
Vol 143 (Suppl_1) ◽  
Author(s):  
Kimberly L Savin ◽  
Linda C Gallo ◽  
Britta A Larsen

Introduction: Pregnant women with diabetes often show low levels of physical activity (PA) and high sedentary behavior (SED). Longitudinal studies with objective measures are needed to understand the relationships of daily PA with daily and next-day blood glucose (BG). Hypothesis: Increased steps or moderate-to-vigorous PA (MVPA) and decreased SED are linked with lower post-meal BG and next-day fasting BG in pregnant women. Methods: Participants were 10 pregnant women with diabetes [mean age = 29.3 (SD = 3.6); mean gestational age = 21.9 (SD = 3.9); 90% (9 of 10) Latina] enrolled in a 12-week pilot PA intervention. Participants self-reported demographic and BG data (morning fasting BG, up to 3 daily post-meal BGs). Steps, MVPA (mins/day), and SED (mins/day) were measured using a Fitbit Alta HR. Participants had on average 49 (range: 21 to 77) days with valid PA and BG data, for a total of 469 observations. Multi-level models (MLMs) were fit to examine mean and day-level effects of steps, MVPA, and SED on post-meal and next-day fasting BG after adjusting for age, gestational age, education, and participant mean PA or SED. Due to the small sample size, effect sizes are emphasized in the results instead of statistical significance. Results: The mean post-meal BG was 122.5 mg/dL and the mean fasting BG was 92.81 mg/dL. After adjustment, an increase of 1,000 in mean steps was linked to a lower mean post-meal BG by 11.79 mg/dL (p=0.22) and a lower fasting BG by 7.26 mg/dL (p=0.54), though neither between-individual effect was statistically significant. The within-individual effects of daily steps on post-meal and fasting BG were very small and non-significant (b=-1.78, p=0.59; b=0.72, p=0.30, respectively). A 1-minute increase in mean MVPA was associated with a slight increase in mean post-meal BG of 1.53 mg/dL (p=0.07). The within-individual effect of daily MVPA on daily post-meal BG was negligible and non-significant (b=-0.39, p=0.51). Between-individual effects showed SED had small, positive, non-significant associations with post-meal BG: per 60-minute increase in mean SED, mean post-meal BG increased by 1.02 mg/dL (p=0.44). Within-individual daily SED increases of 60 minutes were associated with increases of 1.87 mg/dL (p=0.63) in daily post-meal BG. MVPA and SED were not associated with fasting BG. Conclusions: Greater mean steps were linked to lower post-meal and fasting BG, while greater mean SED and MVPA were linked to greater post-meal BG. However, within-individual daily increases in MVPA and decreases in SED were protective for post-meal BG, controlling for individual mean MVPA and SED. Most effect sizes were small and results were not statistically significant, in part due to the small sample size. Participants generally had well-controlled post-meal and fasting BGs, so results may not be generalizable to larger populations.
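
A minimal sketch of how between- and within-individual effects like these are commonly separated in an MLM: enter each participant's mean alongside the person-mean-centered daily deviation. The data and variable names below are hypothetical, not the study's model code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant-day
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "id": np.repeat(np.arange(10), 49),                  # 10 participants, ~49 days each
    "steps": rng.normal(6000, 2000, 490).clip(0),
})
df["bg"] = 120 - 0.002 * df["steps"] + rng.normal(0, 10, len(df))

# Split steps into between-person (person mean) and within-person (daily deviation)
df["steps_mean"] = df.groupby("id")["steps"].transform("mean")
df["steps_dev"] = df["steps"] - df["steps_mean"]

# Random intercept per participant; steps_mean carries the between-individual
# effect, steps_dev the within-individual (day-level) effect
model = smf.mixedlm("bg ~ steps_mean + steps_dev", df, groups=df["id"]).fit()
print(model.summary())
```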


2019 ◽  
Author(s):  
Francesco Margoni ◽  
Martin Shepperd

Infant research is making considerable progress. However, among infant researchers there is growing concern regarding the widespread habit of undertaking studies that have small sample sizes and employ tests with low statistical power (to detect a wide range of possible effects). For many researchers, issues of confidence may be partially resolved by relying on replications. Here, we bring further evidence that the classical logic of confirmation, according to which the result of a replication study confirms the original finding when it reaches statistical significance, could usefully be abandoned. With real examples taken from the infant literature and Monte Carlo simulations, we show that a very wide range of possible replication results would, in a formal statistical sense, constitute confirmation, as they can be explained simply by sampling error. Thus, often no useful conclusion can be drawn from a single or a small number of replication studies. We suggest that, in order to accumulate and generate new knowledge, the dichotomous view of replication as confirmatory/disconfirmatory be replaced by an approach that emphasizes the estimation of effect sizes via meta-analysis. Moreover, we discuss possible solutions for reducing problems affecting the validity of conclusions drawn from meta-analyses in infant research.
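
A minimal Monte Carlo sketch of the sampling-error argument (hypothetical effect and sample sizes): even when the true effect exactly equals the original estimate, replications at typical small sample sizes yield a very wide range of observed effect sizes:

```python
import numpy as np

rng = np.random.default_rng(11)
d_true, n_per_group, reps = 0.5, 16, 10_000  # hypothetical infant-study sizes

ds = []
for _ in range(reps):
    t = rng.normal(d_true, 1.0, n_per_group)
    c = rng.normal(0.0, 1.0, n_per_group)
    sd = np.sqrt((t.var(ddof=1) + c.var(ddof=1)) / 2)
    ds.append((t.mean() - c.mean()) / sd)  # observed Cohen's d per replication

lo, hi = np.percentile(ds, [2.5, 97.5])
print(f"95% of replication estimates fall in [{lo:.2f}, {hi:.2f}]")
# A 'failed' replication near d=0 and a 'doubled' effect near d=1 are both
# plausible outcomes of pure sampling error at this sample size.
```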


2020 ◽  
Author(s):  
Chang Xu ◽  
Luis Furuya-Kanamori ◽  
Lifeng Lin ◽  
Suhail A. Doi

Abstract: In this study, we examined the discrepancy between large studies and small studies in meta-analyses of rare-event outcomes and the impact of Peto versus classic odds ratios (ORs), using empirical data from the Cochrane Database of Systematic Reviews collected from January 2003 to May 2018. Meta-analyses of binary outcomes with rare events (event rate ≤5%), with at least 5 studies, and with at least one large study (N≥1000) were extracted. The Peto and classic ORs were used as the effect sizes in the meta-analyses, and the magnitude and direction of the pooled ORs of large studies versus small studies were compared. The p-values of the meta-analyses of small studies were examined to assess whether the Peto and classic OR methods gave similar results. In total, 214 meta-analyses were included. Of the 214 pairs of pooled ORs of large studies versus small studies, 66 (30.84%) had a discordant direction (kappa=0.33) when measured by the Peto OR and 69 (32.24%) had a discordant direction (kappa=0.22) when measured by the classic OR. The Peto ORs resulted in smaller p-values than the classic ORs in a substantial proportion (83.18%) of cases. In conclusion, there is considerable discrepancy between the results of large studies and small studies in meta-analyses of sparse data. The use of Peto odds ratios does not improve this situation and is not recommended, as it may result in less conservative error estimation.
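
For a single hypothetical 2×2 table with rare events, the sketch below contrasts the classic OR, ad/bc, with the Peto OR, exp((O − E)/V), where O − E and V are the hypergeometric mean deviation and variance. The counts are made up for illustration:

```python
import math

# Hypothetical 2x2 table with rare events:
#              event   no event
# treatment      a=2     b=998
# control        c=8     d=992
a, b, c, d = 2, 998, 8, 992
n1, n0 = a + b, c + d          # group sizes
m1, m0 = a + c, b + d          # event / non-event totals
N = n1 + n0

classic_or = (a * d) / (b * c)

# Peto: O = observed events in treatment, E = expected events under the null,
# V = hypergeometric variance
O, E = a, n1 * m1 / N
V = n1 * n0 * m1 * m0 / (N**2 * (N - 1))
peto_or = math.exp((O - E) / V)

print(f"classic OR={classic_or:.3f}, Peto OR={peto_or:.3f}")
```

On this table the two estimators disagree noticeably (classic ≈ 0.25, Peto ≈ 0.30), illustrating how the choice of estimator alone can shift results when events are sparse.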


Cephalalgia ◽  
2015 ◽  
Vol 36 (5) ◽  
pp. 474-492 ◽  
Author(s):  
Kerstin Luedtke ◽  
Angie Allers ◽  
Laura H Schulte ◽  
Arne May

Aim: We aimed to conduct a systematic review evaluating the effectiveness of interventions used by physiotherapists on the intensity, frequency and duration of migraine, tension-type headache (TTH) and cervicogenic headache (CGH). Methods: We performed a systematic search of electronic databases and a hand search for controlled trials. A risk of bias analysis was conducted using the Cochrane risk of bias tool (RoB). Meta-analyses present the combined mean effects; sensitivity analyses evaluate the influence of methodological quality. Results: Of 77 eligible trials, 26 were included in the RoB assessment. Twenty trials were included in meta-analyses. Nineteen of 26 trials had a high RoB in >1 domain. Meta-analyses of all trials indicated a reduction of TTH (p < 0.0001; mean reduction −1.11 on a 0–10 visual analog scale (VAS); 95% CI −1.64 to −0.57) and CGH (p = 0.0002; mean reduction −2.52 on a 0–10 VAS; 95% CI −3.86 to −1.19) pain intensity, of CGH frequency (p < 0.00001; mean reduction −1.34 days per month; 95% CI −1.40 to −1.28), and of migraine (p = 0.0001; mean reduction −22.39 hours without relief; 95% CI −33.90 to −10.88) and CGH (p < 0.00001; mean reduction −1.68 hours per day; 95% CI −2.09 to −1.26) duration. Excluding high-RoB trials increased the effect sizes and reached additional statistical significance for migraine pain intensity (p < 0.00001; mean reduction −1.94 on a 0–10 VAS; 95% CI −2.61 to −1.27) and frequency (p < 0.00001; mean reduction −9.07 days per month; 95% CI −9.52 to −8.62). Discussion: Results suggest a statistically significant reduction in the intensity, frequency and duration of migraine, TTH and CGH. Pain reduction and reduction in CGH frequency do not reach clinically relevant effect sizes. Small sample sizes, inadequate use of headache classification, and other methodological shortcomings reduce the confidence in these results. Methodologically sound, randomized controlled trials with adequate sample sizes are required to provide information on whether and which physiotherapy approach is effective. According to Grading of Recommendations Assessment, Development and Evaluation (GRADE), the current level of evidence is low.

