The Clinical Significance of Effect Sizes for Survival and Tumor Response Endpoints Using the Empirical Rule Effect Size

Abstract. Background. Cancer-related anemia (CRA) is associated with increased symptom burden and reduced quality of life (QoL) compared to non-anemic subjects. Standard treatment guidelines for CRA recommend treatment when hemoglobin (Hb) declines to ≤ 10 g/dL. We compared patients with mild anemia to those with normal Hb level to assess the clinical significance of mild anemia. Methods. 3416 patients from a large community oncology database sorted by gender and Hb were retrospectively reviewed. Patients receiving chemotherapy (< 30 days) and/or growth factor (< 60 days) were excluded. Each case provided one Hb determination and same-day self-reported scores on the Cancer Care Monitor (CCM), a validated measure of symptom burden, functioning, and health-related QoL comprising 6 scales. All CCM items are rated for symptom severity on a 0–10 scale. Effect sizes for male groups are calculated relative to normal males (Hb > 14), and effect sizes for females are calculated relative to normal females (Hb > 12). Positive effect size (Cohen’s exact d) values indicate greater symptom burden, and negative effect size values indicate lower quality of life. Results. Group differences on CCM measures were not accounted for by demographic, cancer diagnosis, and chemotherapy history differences. Table 1 shows a pattern of greater symptom burden, lower functioning, and worse QoL for males and females with mild anemia (p < .05). Compared with standards for minimal clinically important differences such as those found with the SF-36 (0.09 < Cohen’s d < 0.28), patients with mild anemia showed clinically significant differences in terms of effect size (0.11 < Cohen’s d < 0.61). Conclusions. The QoL impact of mild CRA is significant, and failure to treat mild anemia may result in unnecessary symptom burden and noteworthy decrements in health-related quality of life.
Differences in QoL for Patients with Mild Anemia

| Cancer Care Monitor Measure* | Males, Hb 12 to <14 (n = 327) | Males, Hb >14 (n = 471) | Effect Size, Males | Females, Hb 10 to <12 (n = 449) | Females, Hb >12 (n = 2169) | Effect Size, Females |
|---|---|---|---|---|---|---|
| Fatigue Item | 3.10 (0.16) b | 2.27 (0.13) c | 0.30 | 3.47 (0.13) b | 2.65 (0.06) d | 0.30 |
| Physical Symptoms | 49.54 (0.58) b | 45.49 (0.48) c | 0.38 | 49.32 (0.49) b | 46.97 (0.22) d | 0.22 |
| General Distress | 49.51 (0.57) b | 47.07 (0.47) d | 0.19 | 51.23 (0.49) c | 49.85 (0.22) b | 0.11 |
| Despair | 50.74 (0.52) b | 48.58 (0.43) c | 0.25 | 50.37 (0.44) b | 49.13 (0.19) c | 0.14 |
| Impaired Performance | 55.55 (0.87) b | 49.78 (0.70) c | 0.61 | 55.02 (0.73) b | 49.90 (0.32) c | 0.48 |
| Quality of Life | 49.41 (0.79) b | 53.46 (0.64) c | −0.45 | 48.95 (0.67) b | 52.39 (0.29) c | −0.35 |

Note. *Mean (SE) for each item or scale. Adjusted means with different subscripts across rows are significantly different (p < .05) by Bryant-Paulson comparisons.
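As a rough check on how effect sizes of this kind relate to the tabulated summaries, the sketch below computes Cohen's d from group means, standard errors, and sample sizes. It assumes that each SE is a plain standard error of the mean (so SD = SE × √n) and that a pooled SD is used; the abstract's adjusted means and "exact d" may have been computed differently, so small discrepancies from the table should be expected. The function name is illustrative only.

```python
import math

def cohens_d_from_se(mean1, se1, n1, mean2, se2, n2):
    """Cohen's d for two independent groups given mean, SE, and n per group."""
    # Recover each group's SD, assuming SE is the standard error of the mean.
    sd1 = se1 * math.sqrt(n1)
    sd2 = se2 * math.sqrt(n2)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Fatigue item, males: mild anemia (n = 327) vs. normal Hb (n = 471).
d_fatigue_males = cohens_d_from_se(3.10, 0.16, 327, 2.27, 0.13, 471)
print(round(d_fatigue_males, 2))
```

Under these assumptions the result lands close to the reported value of 0.30 for the male fatigue item.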

A clinical significance analysis of manualised psychological interventions for obsessive-compulsive disorder

BJPsych Open ◽ 10.1192/bjo.2021.758 ◽ 2021 ◽ Vol 7 (S1) ◽ pp. S285-S285

Author(s): Jake Rigby ◽ Peter Fisher ◽ Gemma Cherry ◽ Taylor Stuart ◽ James Temple

Keyword(s): Clinical Significance ◽ Effect Size ◽ Post Treatment ◽ Effect Sizes ◽ Group Effect ◽ Obsessive Compulsive ◽ Psychological Treatments ◽ Compulsive Disorder ◽ Treatment Type

Aims. To conduct an individual patient data meta-analysis of randomised controlled trials (RCTs) of manualised psychological treatments for obsessive-compulsive disorder (OCD), and examine the differential efficacy of psychological treatments by treatment type and format. Background. Previous meta-analyses conclude that efficacious psychological treatments for OCD exist. However, determining the efficacy of psychological treatments requires multiple forms of assessment across a range of indexes, yet most previous meta-analyses in OCD are based solely on effect sizes. Method. We evaluated treatment efficacy across 24 RCTs (n = 1,667) by conducting clinical significance analyses (using standardised Jacobson methodology) and standardised mean difference within-group effect-size analyses. Outcomes were Yale-Brown Obsessive Compulsive Scale (Y-BOCS) scores, evaluated at post-treatment and follow-up (3–6 months post-treatment). Result. Post-treatment, there was a large significant within-group effect size for treated patients (g = 1.28) and a small significant effect size for controls (g = 0.30). At follow-up, large within-group effect sizes were found for both treated patients (g = 1.45) and controls (g = 0.90). Clinical significance analyses indicated that treated patients were significantly more likely than controls to recover following an intervention, but recovery rates were low; post-intervention, only 32% of treated patients and 3% of controls recovered, rising to 38% and 21% respectively at follow-up. Regardless of allocation, only approximately 20% of patients were asymptomatic at follow-up. Across the different analysis methods, individual cognitive therapy (CT) was the most effective intervention, followed by group CT plus exposure and response prevention. Self-help interventions were generally less effective. Conclusion. Reliance on aggregated within-group effect sizes may lead to overestimation of the efficacy of psychological treatments for OCD. More research is needed to determine the most effective treatment type and format for patients with OCD.
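To make the contrast between group effect sizes and individual-level clinical significance concrete, here is a minimal sketch of a Jacobson-Truax style classification. The abstract does not say which cut-off criterion or reliability estimate was used, so criterion "c" (the SD-weighted midpoint between clinical and non-clinical means), the reliability value, and all scores below are assumptions chosen for illustration, not values from the trials.

```python
import math

def reliable_change_index(pre, post, sd_pre, reliability):
    """Jacobson-Truax RCI: change divided by the standard error of the difference."""
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement**2)
    return (post - pre) / s_diff

def cutoff_c(mean_clinical, sd_clinical, mean_normal, sd_normal):
    """Criterion c: SD-weighted midpoint between clinical and non-clinical means."""
    return (sd_clinical * mean_normal + sd_normal * mean_clinical) / (sd_clinical + sd_normal)

def classify(pre, post, sd_pre, reliability, cutoff):
    """Label one patient on a scale where lower scores mean fewer symptoms."""
    rci = reliable_change_index(pre, post, sd_pre, reliability)
    if rci <= -1.96 and post < cutoff:
        return "recovered"       # reliable improvement that also crosses the cut-off
    if rci <= -1.96:
        return "improved"        # reliable improvement, still in the clinical range
    if rci >= 1.96:
        return "deteriorated"
    return "unchanged"

# Hypothetical symptom-scale values, for illustration only.
cutoff = cutoff_c(mean_clinical=25.0, sd_clinical=6.0, mean_normal=8.0, sd_normal=5.0)
for post in (10, 18, 24):
    print(post, classify(pre=27, post=post, sd_pre=6.0, reliability=0.85, cutoff=cutoff))
```

Counting the share of patients labelled "recovered" in each arm gives recovery rates of the kind reported above, which can be low even when the within-group effect size is large.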

Clinically Meaningful Change

Methodology ◽ 10.1027/1614-2241/a000168 ◽ 2019 ◽ Vol 15 (3) ◽ pp. 97-105

Author(s): Rodrigo Ferrer ◽ Antonio Pardo

Keyword(s): Effect Size ◽ False Negative ◽ False Negative Rate ◽ Point Of View ◽ Skewed Distribution ◽ Effect Sizes ◽ False Negatives ◽ Large Size ◽ Before And After ◽ Post Test

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform from the point of view of false negatives. For this purpose, we have simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and with different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods with the best performance from the point of view of the false positives. Our results have revealed unacceptable rates of false negatives even with effects of very large size, starting from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Therefore, our results suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. However, we made some considerations regarding the effect size and the cut-off points commonly used which allow us to be more precise in our estimates.
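The idea of a false-negative rate for a reliable-change method can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes normally distributed scores with an observed SD of 1, a test reliability of 0.8, a true change equal to the effect size for every person, and a single method (the classic reliable change index with a 1.96 threshold). The paper examined five methods and skewed distributions, so the numbers produced here are not expected to reproduce the reported rates.

```python
import math
import random

def rci_false_negative_rate(effect_size, reliability, n_people=5000, seed=0):
    """Share of truly changed people whose RCI stays below 1.96."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    rng = random.Random(seed)
    error_sd = math.sqrt(1.0 - reliability)   # measurement error SD (observed SD = 1)
    true_sd = math.sqrt(reliability)          # true-score SD
    s_diff = math.sqrt(2.0) * error_sd        # standard error of the difference
    misses = 0
    for _ in range(n_people):
        true_score = rng.gauss(0.0, true_sd)
        pre = true_score + rng.gauss(0.0, error_sd)
        post = true_score + effect_size + rng.gauss(0.0, error_sd)
        if (post - pre) / s_diff < 1.96:      # real change not flagged as reliable
            misses += 1
    return misses / n_people

rates = {d: rci_false_negative_rate(d, reliability=0.8) for d in (0.2, 0.8, 2.0)}
for d, rate in rates.items():
    print(d, rate)
```

Even in this simplified setting, small true changes are missed most of the time, which is the qualitative pattern the abstract describes.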

Blinding, sham, and treatment effects in randomized controlled trials for back pain in 2000–2019: A review and meta-analytic approach

Clinical Trials ◽ 10.1177/1740774520984870 ◽ 2021 ◽ pp. 174077452098487

Author(s): Brian Freed ◽ Brian Williams ◽ Xiaolu Situ ◽ Victoria Landsman ◽ Jeehyoung Kim ◽ ...

Keyword(s): Back Pain ◽ Active Treatment ◽ Treatment Effect ◽ Effect Size ◽ Treatment Effects ◽ Effect Sizes ◽ Controlled Trials ◽ Sham Treatment ◽ Randomized Controlled ◽ Random Guess

Background: Blinding aims to minimize biases from what participants and investigators know or believe. Randomized controlled trials, despite being the gold standard to evaluate treatment effect, do not generally assess the success of blinding. We investigated the extent of blinding in back pain trials and the associations between participant guesses and treatment effects. Methods: We did a review with PubMed/OvidMedline, 2000–2019. Eligibility criteria were back pain trials with data available on treatment effect and participants’ guess of treatment. For blinding, the blinding index was used as a chance-corrected measure of excessive correct guess (0 for random guess). For treatment effects, within- or between-arm effect sizes were used. Analyses of investigators’ guess/blinding or by treatment modality were performed exploratorily. Results: Forty trials (3899 participants) were included. Active and sham treatment groups had mean blinding index of 0.26 (95% confidence interval: 0.12, 0.41) and 0.01 (−0.11, 0.14), respectively, meaning 26% of participants in active treatment believed they received active treatment, whereas only 1% in sham believed they received sham treatment, beyond chance, that is, random guess. A greater belief of receiving active treatment was associated with a larger within-arm effect size in both arms, and ideal blinding (namely, “random guess,” and “wishful thinking” that signifies both groups believing they received active treatment) showed smaller effect sizes, with correlation of effect size and summary blinding indexes of 0.35 (p = 0.028) for between-arm comparison. We observed uniformly large sham treatment effects for all modalities, and larger correlation for investigator’s (un)blinding, 0.53 (p = 0.046). Conclusion: Participants in active treatments in back pain trials guessed treatment identity more correctly, while those in sham treatments tended to display successful blinding. Excessive correct guesses (that could reflect weaker blinding and/or noticeable effects) by participants and investigators demonstrated larger effect sizes. Blinding and sham treatment effects on back pain need due consideration in individual trials and meta-analyses.
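A chance-corrected blinding index of this kind can be sketched in a few lines. The abstract does not give the formula, so the code assumes a per-arm index of the form proposed by Bang and colleagues, which reduces to (correct guesses minus incorrect guesses) divided by all respondents, with "don't know" answers counted in the denominator. The counts are hypothetical.

```python
def bang_blinding_index(n_correct, n_incorrect, n_dont_know=0):
    """Per-arm blinding index: 0 = random guessing, 1 = all correct, -1 = all incorrect."""
    n_total = n_correct + n_incorrect + n_dont_know
    if n_total == 0:
        raise ValueError("at least one participant response is required")
    return (n_correct - n_incorrect) / n_total

# Hypothetical active arm: 63 guess "active", 37 guess "sham", nobody answers "don't know".
active_bi = bang_blinding_index(63, 37)
print(active_bi)  # 0.26
```

Read this way, an index of 0.26 means that 26% of the arm guessed correctly beyond what random guessing would produce.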

A Statistical Method for Synthesizing Meta-Analyses

Computational and Mathematical Methods in Medicine ◽ 10.1155/2013/732989 ◽ 2013 ◽ Vol 2013 ◽ pp. 1-9 ◽ Cited By ~ 4

Author(s): Liansheng Larry Tang ◽ Michael Caudy ◽ Faye Taxman

Keyword(s): Statistical Method ◽ Effect Size ◽ Meta Analysis ◽ Effect Sizes ◽ Weighted Mean ◽ Summary Effect ◽ Different Types ◽ Meta Analyses ◽ Size Estimates ◽ Similar Search

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, but they may yield different or sometimes discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results from multiple meta-analyses, especially when their results are conflicting. In this paper, we first introduce a method to synthesize the meta-analytic results when multiple meta-analyses use the same type of summary effect estimates. When meta-analyses use different types of effect sizes, the meta-analysis results cannot be directly combined. We propose a two-step frequentist procedure to first convert the effect size estimates to the same metric and then summarize them with a weighted mean estimate. Our proposed method offers several advantages over existing methods by Hemming et al. (2012). First, different types of summary effect sizes are considered. Second, our method provides the same overall effect size as conducting a meta-analysis on all individual studies from multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
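The two-step idea (convert to a common metric, then take a weighted mean) can be sketched as follows. This is not the authors' estimator: the conversion uses the standard logistic approximation d = ln(OR) × √3 / π, the weights are assumed to be inverse variances, and the meta-analyses are treated as independent, which overlapping primary studies would violate. All input values are hypothetical.

```python
import math

def log_odds_ratio_to_d(log_or, var_log_or):
    """Convert a log odds ratio and its variance to the standardized mean difference scale."""
    factor = math.sqrt(3.0) / math.pi
    return log_or * factor, var_log_or * factor**2

def inverse_variance_mean(estimates):
    """Fixed-effect weighted mean of (effect, variance) pairs, plus its variance."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)

# Hypothetical summary effects from three meta-analyses (illustrative values only).
summary_d = [(0.30, 0.010), (0.18, 0.020)]    # already reported as d
summary_log_or = [(0.50, 0.040)]              # reported as a log odds ratio
converted = summary_d + [log_odds_ratio_to_d(e, v) for e, v in summary_log_or]

pooled, pooled_var = inverse_variance_mean(converted)
print(round(pooled, 3), round(math.sqrt(pooled_var), 3))
```

Converting the variance along with the point estimate matters here, because the weights in the second step depend on it.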

The use of effect size indices to determine practical significance

Suid-Afrikaanse Tydskrif vir Natuurwetenskap en Tegnologie ◽ 10.4102/satnt.v25i3.157 ◽ 2006 ◽ Vol 25 (3)

Author(s): H. S. Styn ◽ S. M. Ellis

Keyword(s): Effect Size ◽ Statistical Significance ◽ Empirical Studies ◽ Research Literature ◽ Effect Sizes ◽ Practical Significance ◽ Significance Tests ◽ Statistical Application ◽ Significant Difference

The determination of the significance of differences in means and of relationships between variables is important in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. With studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. Where complete populations or convenience samples are used, the determination of statistical significance is, strictly speaking, no longer relevant, while effect size indices can be used as a basis for judging significance. This article considers the use of effect size indices to establish practical significance. It also shows how these indices are used in a few fields of statistical application and how they are treated in the statistical literature and in computer packages. The use of effect sizes is illustrated by a few examples from the research literature.

Changes of Ankle Dorsiflexion Using Compression Tissue Flossing: A Systematic Review and Meta-Analysis

Journal of Sport Rehabilitation ◽ 10.1123/jsr.2020-0129 ◽ 2020 ◽ pp. 1-9

Author(s): Devin S. Kielur ◽ Cameron J. Powden

Keyword(s): Confidence Interval ◽ Range Of Motion ◽ Effect Size ◽ Evidence Synthesis ◽ Meta Analysis ◽ Effect Sizes ◽ Lower Extremity Injury ◽ Level Of Evidence ◽ Physically Active ◽ Evidence Database

Context: Impaired dorsiflexion range of motion (DFROM) has been established as a predictor of lower-extremity injury. Compression tissue flossing (CTF) may address tissue restrictions associated with impaired DFROM; however, a consensus is yet to support these effects. Objectives: To summarize the available literature regarding CTF on DFROM in physically active individuals. Evidence Acquisition: PubMed and EBSCOhost (CINAHL, MEDLINE, and SPORTDiscus) were searched from 1965 to July 2019 for related articles using combination terms related to CTF and DFROM. Articles were included if they measured the immediate effects of CTF on DFROM. Methodological quality was assessed using the Physiotherapy Evidence Database scale. The level of evidence was assessed using the Strength of Recommendation Taxonomy. The magnitude of CTF effects from pre-CTF to post-CTF and compared with a control of range of motion activities only were examined using Hedges’ g effect sizes and 95% confidence intervals. Random-effects meta-analysis was performed to synthesize DFROM changes. Evidence Synthesis: A total of 6 studies were included in the analysis. The average Physiotherapy Evidence Database score was 60% (range = 30%–80%) with 4 out of 6 studies considered high quality and 2 as low quality. Meta-analysis indicated no DFROM improvements for CTF compared with range of motion activities only (effect size = 0.124; 95% confidence interval, −0.137 to 0.384; P = .352) and moderate improvements from pre-CTF to post-CTF (effect size = 0.455; 95% confidence interval, 0.022 to 0.889; P = .040). Conclusions: There is grade B evidence to suggest CTF may have no effect on DFROM when compared with a control of range of motion activities only and results in moderate improvements from pre-CTF to post-CTF. This suggests that DFROM improvements were most likely due to exercises completed rather than the band application.
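For readers unfamiliar with the pooling step, the following sketch computes Hedges' g for independent groups and combines studies with a DerSimonian-Laird random-effects model. The abstract does not state which random-effects estimator was used or how pre-post (paired) effects were handled, so both choices here are assumptions, and the study data are hypothetical rather than taken from the six included studies.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Hedges' g and its approximate variance for two independent groups."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)   # small-sample bias correction
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2.0 * (n1 + n2))
    return correction * d, correction**2 * var_d

def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooled effect with a 95% confidence interval."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical DFROM data in degrees: (mean, SD, n) for CTF and control groups.
studies = [((12.0, 4.0, 15), (10.5, 4.2, 15)),
           ((11.0, 3.5, 20), (10.8, 3.6, 20)),
           ((13.2, 5.0, 12), (11.0, 4.8, 12))]
results = [hedges_g(*ctf, *ctl) for ctf, ctl in studies]
pooled, ci = random_effects_pool([g for g, _ in results], [v for _, v in results])
print(round(pooled, 3), tuple(round(x, 3) for x in ci))
```

A confidence interval that spans zero, as in the between-group comparison reported above, is what leads to a "no improvement" conclusion despite a positive point estimate.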

Effect sizes and test-retest reliability of the fMRI-based Neurologic Pain Signature

10.1101/2021.05.29.445964 ◽ 2021

Author(s): Xiaochun Han ◽ Yoni K. Ashar ◽ Philip Kragel ◽ Bogdan Petre ◽ Victoria Schelkun ◽ ...

Keyword(s): Individual Differences ◽ Effect Size ◽ Mental States ◽ Effect Sizes ◽ Medium Effect ◽ Retest Reliability ◽ Medium Effect Size ◽ Pain Reports ◽ Nociceptive Input ◽ Test Retest Reliability

Identifying biomarkers that predict mental states with large effect sizes and high test-retest reliability is a growing priority for fMRI research. We examined a well-established multivariate brain measure that tracks pain induced by nociceptive input, the Neurologic Pain Signature (NPS). In N = 295 participants across eight studies, NPS responses showed a very large effect size in predicting within-person single-trial pain reports (d = 1.45) and medium effect size in predicting individual differences in pain reports (d = 0.49, average r = 0.20). The NPS showed excellent short-term (within-day) test-retest reliability (ICC = 0.84, with average 69.5 trials/person). Reliability scaled with the number of trials within-person, with ≥60 trials required for excellent test-retest reliability. Reliability was comparable in two additional studies across 5-day (N = 29, ICC = 0.74, 30 trials/person) and 1-month (N = 40, ICC = 0.46, 5 trials/person) test-retest intervals. The combination of strong within-person correlations and only modest between-person correlations between the NPS and pain reports indicates that the two measures have different sources of between-person variance. The NPS is not a surrogate for individual differences in pain reports, but can serve as a reliable measure of pain-related physiology and mechanistic target for interventions.

Statistical Power in Content Analysis Designs: How Effect Size, Sample Size and Coding Accuracy Jointly Affect Hypothesis Testing ‐ A Monte Carlo Simulation Approach.

Computational Communication Research ◽ 10.5117/ccr2021.1.003.geis ◽ 2021 ◽ Vol 3 (1) ◽ pp. 61-89

Author(s): Stefan Geiß

Keyword(s): Monte Carlo Simulation ◽ Monte Carlo ◽ Content Analysis ◽ Sample Size ◽ Effect Size ◽ Statistical Power ◽ Effect Sizes ◽ Sample Sizes ◽ Expected Effect ◽ Sample Size Effect

Abstract. This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In most widespread sample size/effect size settings, the rule-of-thumb that chance-adjusted agreement should be ≥.80 or ≥.667 corresponds to the simulation results, resulting in acceptable α and β error rates. However, this simulation allows making precise power calculations that can consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables and R functions to facilitate use of this framework, along with example code as an online appendix.
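The interplay of sample size, effect size, and measurement quality can be illustrated with a simplified Monte Carlo power estimate. This is not the author's framework or R code: it replaces categorical coding and chance-adjusted agreement with a continuous variable whose reliability is degraded by added normal noise, and it tests the correlation with a Fisher z approximation. The function name and all parameter values are assumptions for illustration.

```python
import math
import random

def simulated_power(true_r, n, reliability, reps=500, seed=1):
    """Share of simulated studies in which the correlation test rejects at alpha = .05."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    rng = random.Random(seed)
    # Noise SD chosen so that var(x) / var(x + noise) equals the requested reliability.
    noise_sd = math.sqrt((1.0 - reliability) / reliability)
    rejections = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            y = true_r * x + math.sqrt(1.0 - true_r**2) * rng.gauss(0.0, 1.0)
            xs.append(x + rng.gauss(0.0, noise_sd))   # imperfectly coded variable
            ys.append(y)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        r = sxy / math.sqrt(sxx * syy)
        z = math.atanh(r) * math.sqrt(n - 3)          # Fisher z test statistic
        if abs(z) > 1.96:
            rejections += 1
    return rejections / reps

power_clean = simulated_power(true_r=0.3, n=100, reliability=1.0)
power_noisy = simulated_power(true_r=0.3, n=100, reliability=0.5)
print(power_clean, power_noisy)
```

Rerunning the noisy condition with a larger n shows how extra observations can offset weaker measurement, which is the trade-off the abstract points to for pre-registered designs.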

Within-person variability in men’s facial width-to-height ratio

PeerJ ◽ 10.7717/peerj.1801 ◽ 2016 ◽ Vol 4 ◽ pp. e1801 ◽ Cited By ~ 10

Author(s): Robin S.S. Kramer

Keyword(s): Facial Expressions ◽ Effect Size ◽ Focal Length ◽ Effect Sizes ◽ Emotional Expressions ◽ Head Pose ◽ Height Ratio ◽ Camera Parameters ◽ Study Designs ◽ The Relationship

Background. In recent years, researchers have investigated the relationship between facial width-to-height ratio (FWHR) and a variety of threat and dominance behaviours. The majority of methods involved measuring FWHR from 2D photographs of faces. However, individuals can vary dramatically in their appearance across images, which poses an obvious problem for reliable FWHR measurement. Methods. I compared the effect sizes due to the differences between images taken with unconstrained camera parameters (Studies 1 and 2) or varied facial expressions (Study 3) to the effect size due to identity, i.e., the differences between people. In Study 1, images of Hollywood actors were collected from film screenshots, providing the least amount of experimental control. In Study 2, controlled photographs, which only varied in focal length and distance to camera, were analysed. In Study 3, images of different facial expressions, taken in controlled conditions, were measured. Results. Analyses revealed that simply varying the focal length and distance between the camera and face had a relatively small effect on FWHR, and therefore may prove less of a problem if uncontrolled in study designs. In contrast, when all camera parameters (including the camera itself) are allowed to vary, the effect size due to identity was greater than the effect of image selection, but the ranking of the identities was significantly altered by the particular image used. Finally, I found significant changes to FWHR when people posed with four of seven emotional expressions in comparison with neutral, and the effect size due to expression was larger than differences due to identity. Discussion. The results of these three studies demonstrate that even when head pose is limited to forward facing, changes to the camera parameters and a person’s facial expression have sizable effects on FWHR measurement. Therefore, analysing images that fail to constrain some of these variables can lead to noisy and unreliable results, but also relationships caused by previously unconsidered confounds.
