Influence of Pilot and Small Trials in Meta-Analyses of Behavioral Interventions: A Meta-epidemiological Study

2020 ◽  
Author(s):  
Michael W. Beets ◽  
R. Glenn Weaver ◽  
John P.A. Ioannidis ◽  
Alexis Jones ◽  
Lauren von Klinggraeff ◽  
...  

Abstract Background: Pilot/feasibility studies and studies with small sample sizes may be associated with inflated effects. This study explores the vibration of effect sizes (VoE) in meta-analyses when different inclusion criteria based upon sample size or pilot/feasibility status are considered. Methods: Searches were conducted for meta-analyses of behavioral interventions on topics related to the prevention/treatment of childhood obesity published from 01-2016 to 10-2019. The computed summary effect sizes (ES) were extracted from each meta-analysis. Individual studies included in the meta-analyses were classified into one of four categories: self-identified pilot/feasibility studies, or categories based upon sample size (N≤100, N>100, and N>370, the upper 75th percentile of sample size). The VoE was defined as the absolute difference (ABS) between the summary ES re-estimated under each study classification and the originally reported summary ES. Concordance (kappa) of statistical significance between summary ES was assessed. Fixed and random effects models and meta-regressions were estimated. Three case studies are presented to illustrate the impact of including pilot/feasibility and N≤100 studies on the estimated summary ES. Results: A total of 1,602 effect sizes, representing 145 reported summary ES, were extracted from 48 meta-analyses containing 603 unique studies (average 22 studies per meta-analysis, range 2-108) and including 227,217 participants. Pilot/feasibility and N≤100 studies comprised 22% (range 0-58%) and 21% (range 0-83%) of studies, respectively. Meta-regression indicated that the ABS between the re-estimated and original summary ES was 0.29 where summary ES comprised ≥40% N≤100 studies, and 0.46 where summary ES comprised >80% of both pilot/feasibility and N≤100 studies. Where ≤40% of the studies comprising a summary ES had N>370, the ABS ES ranged from 0.20-0.30. Concordance was low when removing both pilot/feasibility and N≤100 studies (kappa=0.53) and when restricting analyses to the largest studies (N>370, kappa=0.35), with 20% and 26% of the originally reported statistically significant ES rendered non-significant, respectively. Reanalysis of the three case study meta-analyses yielded re-estimated ES that were either non-significant or half the originally reported ES. Conclusions: When meta-analyses of behavioral interventions include a substantial proportion of both pilot/feasibility and N≤100 studies, summary ES can be affected markedly and should be interpreted with caution.
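
To make the VoE computation concrete, here is a minimal sketch (not the authors' code; all study data below are hypothetical) of re-estimating a summary ES after excluding pilot/feasibility and N≤100 studies, then taking the absolute difference from the original pooled estimate, using simple fixed-effect inverse-variance pooling:

```python
import numpy as np

def pooled_es(es, var):
    """Fixed-effect inverse-variance pooled effect size."""
    w = 1.0 / np.asarray(var)
    return np.sum(w * np.asarray(es)) / np.sum(w)

# Hypothetical studies: effect size, sampling variance, total N, pilot/feasibility flag
es    = np.array([0.60, 0.45, 0.05, 0.10, 0.50])
var   = np.array([0.08, 0.06, 0.01, 0.01, 0.09])
n     = np.array([40, 80, 500, 370, 60])
pilot = np.array([True, False, False, False, True])

original = pooled_es(es, var)

# Re-estimate restricted to studies that are neither pilot/feasibility nor N<=100
keep = (~pilot) & (n > 100)
restricted = pooled_es(es[keep], var[keep])

abs_diff = abs(restricted - original)  # the ABS (vibration-of-effects) measure
print(f"original ES={original:.2f}, restricted ES={restricted:.2f}, ABS={abs_diff:.2f}")
```

In this toy example the small and pilot studies carry the largest effects, so the restricted summary ES drops sharply, which is the pattern the study quantifies across real meta-analyses.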

2020 ◽  
Vol 11 ◽  
Author(s):  
Jonathan Z. Bakdash ◽  
Laura R. Marusich ◽  
Jared B. Kenworthy ◽  
Elyssa Twedt ◽  
Erin G. Zaroukian

Whether in meta-analysis or single experiments, selecting results based on statistical significance leads to overestimated effect sizes, impeding falsification. We critique a quantitative synthesis that used significance to score and select previously published effects for situation awareness-performance associations (Endsley, 2019). How much does selection using statistical significance quantitatively impact results in a meta-analytic context? We evaluate and compare results using significance-filtered effects versus analyses with all effects as reported. Endsley reported high predictiveness scores and large positive mean correlations but used atypical methods: the hypothesis was used to select papers and effects. Papers were assigned the maximum predictiveness score if they contained at least one significant effect, yet most papers reported multiple effects, and the number of non-significant effects did not impact the score. Thus, the predictiveness score was rarely less than the maximum. In addition, only significant effects were included in Endsley's quantitative synthesis. Filtering excluded half of all reported effects, with guaranteed minimum effect sizes based on sample size. Results for filtered compared to as-reported effects clearly diverged. Compared to the mean of as-reported effects, the filtered mean was overestimated by 56%. Furthermore, 92% (222 out of 241) of the as-reported effects were below the mean of filtered effects. We conclude that outcome-dependent selection of effects is circular, predetermining results and running contrary to the purpose of meta-analysis. Instead of using significance to score and filter effects, meta-analyses should follow established research practices.
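
A minimal Monte Carlo sketch of the general mechanism (hypothetical parameters, not Endsley's data): filtering sample correlations by p < .05 inflates the mean of the surviving effects, and the significance threshold imposes a guaranteed minimum effect size that depends only on the sample size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_r, n, reps = 0.3, 30, 10_000  # hypothetical true correlation and study size

def sample_r(true_r, n, rng):
    """Draw one sample correlation from a bivariate normal with correlation true_r."""
    cov = [[1, true_r], [true_r, 1]]
    x = rng.multivariate_normal([0, 0], cov, size=n)
    return np.corrcoef(x[:, 0], x[:, 1])[0, 1]

rs = np.array([sample_r(true_r, n, rng) for _ in range(reps)])

# Two-sided p-value for each r via the t-distribution
t = rs * np.sqrt((n - 2) / (1 - rs**2))
p = 2 * stats.t.sf(np.abs(t), df=n - 2)

filtered = rs[p < 0.05]
print(f"mean of all effects:      {rs.mean():.3f}")
print(f"mean of filtered effects: {filtered.mean():.3f}")  # overestimates true_r

# Critical |r| at p < .05 for this n: the guaranteed minimum surviving effect
t_crit = stats.t.ppf(0.975, df=n - 2)
print(f"minimum significant |r|:  {t_crit / np.sqrt(n - 2 + t_crit**2):.3f}")
```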


2021 ◽  
Vol 5 (1) ◽  
pp. e100135
Author(s):  
Xue Ying Zhang ◽  
Jan Vollert ◽  
Emily S Sena ◽  
Andrew SC Rice ◽  
Nadia Soliman

Objective: Thigmotaxis is an innate predator-avoidance behaviour of rodents and is enhanced when animals are under stress. It is characterised by the preference of a rodent to seek shelter rather than expose itself to the aversive open area. The behaviour has been proposed as a measurable construct that can address the impact of pain on rodent behaviour. This systematic review will assess whether thigmotaxis can be influenced by experimental persistent pain and attenuated by pharmacological interventions in rodents. Search strategy: We will conduct a search of three electronic databases to identify studies in which thigmotaxis was used as an outcome measure in a rodent model associated with persistent pain. All studies published up to the date of the search will be considered. Screening and annotation: Two independent reviewers will screen studies based on (1) titles and abstracts, and (2) full texts. Data management and reporting: For meta-analysis, we will extract thigmotactic behavioural data and calculate effect sizes. Effect sizes will be combined using a random-effects model. We will assess heterogeneity and identify its sources. A risk-of-bias assessment will be conducted to evaluate study quality. Publication bias will be assessed using funnel plots, Egger's regression and trim-and-fill analysis. We will also extract stimulus-evoked limb withdrawal data to assess its correlation with thigmotaxis in the same animals. The evidence obtained will provide a comprehensive understanding of the strengths and limitations of using the thigmotactic outcome measure in animal pain research so that future experimental designs can be optimised. We will follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses reporting guidelines and disseminate the review findings through publication and conference presentations.
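
As a rough illustration of the planned synthesis steps, the sketch below pools hypothetical effect sizes with a DerSimonian-Laird random-effects model and runs Egger's regression for small-study bias. The data and the use of statsmodels are illustrative assumptions, not part of the protocol:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level effect sizes (e.g., SMDs) and sampling variances
es  = np.array([0.8, 0.5, 0.3, 0.9, 0.4, 0.6])
var = np.array([0.10, 0.05, 0.04, 0.20, 0.03, 0.08])

# DerSimonian-Laird estimate of between-study variance tau^2
w = 1 / var
fixed = np.sum(w * es) / np.sum(w)
Q = np.sum(w * (es - fixed) ** 2)
df = len(es) - 1
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects pooled estimate and 95% CI
w_re = 1 / (var + tau2)
pooled = np.sum(w_re * es) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(f"pooled={pooled:.2f}, 95% CI=({pooled-1.96*se:.2f}, {pooled+1.96*se:.2f}), tau^2={tau2:.3f}")

# Egger's regression: standardized effect on precision; a non-zero intercept
# suggests small-study (publication) bias
precision = 1 / np.sqrt(var)
egger = sm.OLS(es / np.sqrt(var), sm.add_constant(precision)).fit()
print(f"Egger intercept={egger.params[0]:.2f}, p={egger.pvalues[0]:.3f}")
```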


1990 ◽  
Vol 24 (3) ◽  
pp. 405-415 ◽  
Author(s):  
Nathaniel McConaghy

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy concerning the evaluation of treatment effects. Statistical significance measured the reliability of the effect of treatment, not its efficacy, and was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence, but by standardizing the size of the treatment effect it could distort it. Meta-analyses which combine the results of studies that employ different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists with varying experience can be misleading. To ensure discussion of these variables, meta-analyses should be used as an aid to, rather than a substitute for, literature review. While meta-analyses produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment will, it is hoped, render the use of untreated control groups obsolete.
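
The reliability-versus-efficacy point can be made concrete with a short simulation: holding the standardized effect (Cohen's d) fixed, the p-value of a t-test shrinks as the number of subjects grows, while the effect size itself does not change. A minimal sketch with hypothetical numbers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
d_true = 0.3  # fixed standardized treatment effect

for n in (20, 80, 320, 1280):
    treated = rng.normal(d_true, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    # Cohen's d: mean difference standardized by the pooled SD
    sd_pooled = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    d = (treated.mean() - control.mean()) / sd_pooled
    print(f"n={n:5d}  d={d:+.2f}  p={p:.4f}")  # d stays ~0.3, p keeps falling
```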


Author(s):  
Michael J. Lambert ◽  
Jason L. Whipple ◽  
Maria Kleinstäuber

This meta-analysis examines the impact of measuring, monitoring, and feeding back information on client progress to clinicians while they deliver psychotherapy. It considers the effects of the two most frequently studied routine outcome monitoring practices: the Partners for Change Outcome System and the Outcome Questionnaire System. Meta-analyses of 24 studies produced effect sizes ranging from small to moderate. Feedback practices reduced deterioration rates and nearly doubled clinically significant/reliable change rates in clients who were predicted to have a poor outcome. Clinical examples, diversity considerations, and therapeutic advances are provided.


Circulation ◽  
2021 ◽  
Vol 143 (Suppl_1) ◽  
Author(s):  
Kimberly L Savin ◽  
Linda C Gallo ◽  
Britta A Larsen

Introduction: Pregnant women with diabetes often show low levels of physical activity (PA) and high sedentary behavior (SED). Longitudinal studies with objective measures are needed to understand the relationships of daily PA with daily and next-day blood glucose (BG). Hypothesis: Increased steps or moderate-to-vigorous PA (MVPA) and decreased SED are linked with lower post-meal BG and next-day fasting BG in pregnant women. Methods: Participants were 10 pregnant women with diabetes [mean age = 29.3 (SD = 3.6); mean gestational age = 21.9 (SD = 3.9); 90% (9 of 10) Latina] enrolled in a 12-week pilot PA intervention. Participants self-reported demographic and BG data (morning fasting BG, up to 3 daily post-meal BGs). Steps, MVPA (mins/day), and SED (mins/day) were measured using a Fitbit Alta HR. Participants had on average 49 (range: 21 to 77) days with valid PA and BG data, for a total of 469 observations. Multi-level models (MLMs) were fit to examine mean and day-level effects of steps, MVPA, and SED on post-meal and next-day fasting BG after adjusting for age, gestational age, education, and participant mean PA or SED. Due to the small sample size, effect sizes are emphasized in the results instead of statistical significance. Results: The mean post-meal BG was 122.5 mg/dL and the mean fasting BG was 92.81 mg/dL. After adjustment, an increase of 1,000 in mean steps was linked to a lower mean post-meal BG by 11.79 mg/dL (p=0.22) and a lower fasting BG by 7.26 mg/dL (p=0.54), though neither between-individual effect was statistically significant. The within-individual effects of daily steps on post-meal and fasting BG were very small and non-significant (b=-1.78, p=0.59; b=0.72, p=0.30, respectively). A 1-minute increase in mean MVPA was associated with a slight increase in mean post-meal BG of 1.53 mg/dL (p=0.07). The within-individual effect of daily MVPA on daily post-meal BG was negligible and non-significant (b=-0.39, p=0.51). Between-individual effects showed SED had small, positive, non-significant associations with post-meal BG: per 60-minute increase in mean SED, mean post-meal BG increased by 1.02 mg/dL (p=0.44). Within-individual daily SED increases of 60 minutes were associated with increases of 1.87 mg/dL (p=0.63) in daily post-meal BG. MVPA and SED were not associated with fasting BG. Conclusions: Greater mean steps were linked to lower post-meal and fasting BG, while greater mean SED and MVPA were linked to greater post-meal BG. However, within-individual daily increases in MVPA and decreases in SED were protective for post-meal BG, controlling for individual mean MVPA and SED. Most effect sizes were small and results were not statistically significant, in part due to the small sample size. Participants generally had well-controlled post-meal and fasting BGs, so results may not be generalizable to larger populations.
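
A minimal sketch of how between- and within-individual effects like these are commonly separated in an MLM: enter each participant's mean alongside the person-mean-centered daily deviation. The data and variable names below are hypothetical, not the study's model code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant-day
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "id": np.repeat(np.arange(10), 49),                  # 10 participants, ~49 days each
    "steps": rng.normal(6000, 2000, 490).clip(0),
})
df["bg"] = 120 - 0.002 * df["steps"] + rng.normal(0, 10, len(df))

# Split steps into between-person (person mean) and within-person (daily deviation)
df["steps_mean"] = df.groupby("id")["steps"].transform("mean")
df["steps_dev"] = df["steps"] - df["steps_mean"]

# Random intercept per participant; steps_mean carries the between-individual
# effect, steps_dev the within-individual (day-level) effect
model = smf.mixedlm("bg ~ steps_mean + steps_dev", df, groups=df["id"]).fit()
print(model.summary())
```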


2019 ◽  
Author(s):  
Francesco Margoni ◽  
Martin Shepperd

Infant research is making considerable progress. However, among infant researchers there is growing concern regarding the widespread habit of undertaking studies that have small sample sizes and employ tests with low statistical power (to detect a wide range of possible effects). For many researchers, issues of confidence may be partially resolved by relying on replications. Here, we bring further evidence that the classical logic of confirmation, according to which the result of a replication study confirms the original finding when it reaches statistical significance, could usefully be abandoned. With real examples taken from the infant literature and Monte Carlo simulations, we show that a very wide range of possible replication results would, in a formal statistical sense, constitute confirmation, as they can be explained simply by sampling error. Thus, often no useful conclusion can be drawn from a single or a small number of replication studies. We suggest that, in order to accumulate and generate new knowledge, the dichotomous view of replication as confirmatory/disconfirmatory be replaced by an approach that emphasizes the estimation of effect sizes via meta-analysis. Moreover, we discuss possible solutions for reducing problems affecting the validity of conclusions drawn from meta-analyses in infant research.
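
A minimal Monte Carlo sketch of the sampling-error argument (hypothetical effect and sample sizes): even when the true effect exactly equals the original estimate, replications at typical small sample sizes yield a very wide range of observed effect sizes:

```python
import numpy as np

rng = np.random.default_rng(11)
d_true, n_per_group, reps = 0.5, 16, 10_000  # hypothetical infant-study sizes

ds = []
for _ in range(reps):
    t = rng.normal(d_true, 1.0, n_per_group)
    c = rng.normal(0.0, 1.0, n_per_group)
    sd = np.sqrt((t.var(ddof=1) + c.var(ddof=1)) / 2)
    ds.append((t.mean() - c.mean()) / sd)  # observed Cohen's d per replication

lo, hi = np.percentile(ds, [2.5, 97.5])
print(f"95% of replication estimates fall in [{lo:.2f}, {hi:.2f}]")
# A 'failed' replication near d=0 and a 'doubled' effect near d=1 are both
# plausible outcomes of pure sampling error at this sample size.
```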


2020 ◽  
Author(s):  
Chang Xu ◽  
Luis Furuya-Kanamori ◽  
Lifeng Lin ◽  
Suhail A. Doi

Abstract: In this study, we examined the discrepancy between large studies and small studies in meta-analyses of rare-event outcomes and the impact of Peto versus classic odds ratios (ORs), using empirical data from the Cochrane Database of Systematic Reviews collected from January 2003 to May 2018. Meta-analyses of binary outcomes with rare events (event rate ≤5%), with at least 5 studies, and with at least one large study (N≥1000) were extracted. The Peto and classic ORs were used as the effect sizes in the meta-analyses, and the magnitude and direction of the pooled ORs of large studies versus small studies were compared. The p-values of the meta-analyses of small studies were examined to assess whether the Peto and classic OR methods gave similar results. In total, 214 meta-analyses were included. Of the 214 pairs of pooled ORs of large studies versus small studies, 66 (30.84%) had a discordant direction (kappa=0.33) when measured by the Peto OR and 69 (32.24%) had a discordant direction (kappa=0.22) when measured by the classic OR. The Peto ORs resulted in smaller p-values than the classic ORs in a substantial proportion (83.18%) of cases. In conclusion, there is considerable discrepancy between the results of large studies and small studies in meta-analyses of sparse data. The use of Peto odds ratios does not improve this situation and is not recommended, as it may result in less conservative error estimation.
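
For a single hypothetical 2×2 table with rare events, the sketch below contrasts the classic OR, ad/bc, with the Peto OR, exp((O − E)/V), where O − E and V are the hypergeometric mean deviation and variance. The counts are made up for illustration:

```python
import math

# Hypothetical 2x2 table with rare events:
#              event   no event
# treatment      a=2     b=998
# control        c=8     d=992
a, b, c, d = 2, 998, 8, 992
n1, n0 = a + b, c + d          # group sizes
m1, m0 = a + c, b + d          # event / non-event totals
N = n1 + n0

classic_or = (a * d) / (b * c)

# Peto: O = observed events in treatment, E = expected events under the null,
# V = hypergeometric variance
O, E = a, n1 * m1 / N
V = n1 * n0 * m1 * m0 / (N**2 * (N - 1))
peto_or = math.exp((O - E) / V)

print(f"classic OR={classic_or:.3f}, Peto OR={peto_or:.3f}")
```

On this table the two estimators disagree noticeably (classic ≈ 0.25, Peto ≈ 0.30), illustrating how the choice of estimator alone can shift results when events are sparse.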


Cephalalgia ◽  
2015 ◽  
Vol 36 (5) ◽  
pp. 474-492 ◽  
Author(s):  
Kerstin Luedtke ◽  
Angie Allers ◽  
Laura H Schulte ◽  
Arne May

Aim: We aimed to conduct a systematic review evaluating the effectiveness of interventions used by physiotherapists on the intensity, frequency and duration of migraine, tension-type headache (TTH) and cervicogenic headache (CGH). Methods: We performed a systematic search of electronic databases and a hand search for controlled trials. A risk of bias analysis was conducted using the Cochrane risk of bias tool (RoB). Meta-analyses present the combined mean effects; sensitivity analyses evaluate the influence of methodological quality. Results: Of 77 eligible trials, 26 were included in the RoB assessment. Twenty trials were included in meta-analyses. Nineteen of 26 trials had a high RoB in >1 domain. Meta-analyses of all trials indicated a reduction of TTH (p < 0.0001; mean reduction −1.11 on a 0–10 visual analog scale (VAS); 95% CI −1.64 to −0.57) and CGH (p = 0.0002; mean reduction −2.52 on a 0–10 VAS; 95% CI −3.86 to −1.19) pain intensity, of CGH frequency (p < 0.00001; mean reduction −1.34 days per month; 95% CI −1.40 to −1.28), and of migraine (p = 0.0001; mean reduction −22.39 hours without relief; 95% CI −33.90 to −10.88) and CGH (p < 0.00001; mean reduction −1.68 hours per day; 95% CI −2.09 to −1.26) duration. Excluding high-RoB trials increased the effect sizes and reached additional statistical significance for migraine pain intensity (p < 0.00001; mean reduction −1.94 on a 0–10 VAS; 95% CI −2.61 to −1.27) and frequency (p < 0.00001; mean reduction −9.07 days per month; 95% CI −9.52 to −8.62). Discussion: Results suggest a statistically significant reduction in the intensity, frequency and duration of migraine, TTH and CGH. Pain reduction and reduction in CGH frequency do not reach clinically relevant effect sizes. Small sample sizes, inadequate use of headache classification, and other methodological shortcomings reduce the confidence in these results. Methodologically sound, randomized controlled trials with adequate sample sizes are required to provide information on whether and which physiotherapy approach is effective. According to Grading of Recommendations Assessment, Development and Evaluation (GRADE), the current level of evidence is low.

