"P Value" or "Number Needed to Treat": Which One Is Better for Evaluating Clinical Treatments’ Benefits?

2021 ◽  
Vol 6 (3) ◽  
pp. 74-75
Author(s):  
Soudabeh Hamedi-Shahraki ◽  
Farshad Amirkhizi

Statistical significance does not necessarily mean clinical significance. A P value less than 0.05 does not guarantee the clinical effectiveness of a treatment. To assess the clinical value of a treatment, the effect size must be calculated. The number needed to treat (NNT) is an example of an effect size measure that can be very helpful in determining the clinical significance of a treatment. Researchers and physicians are therefore encouraged to look beyond the P value and calculate the NNT when assessing the clinical significance of therapeutic measures and agents.
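As a rough illustration of the NNT mentioned above, the sketch below uses the standard definition, NNT = 1 / absolute risk reduction. The event rates (20% in controls, 15% under treatment) and the function name are hypothetical and are not taken from the article.

```python
import math

def number_needed_to_treat(control_event_rate, treated_event_rate):
    """NNT for an adverse event: 1 / absolute risk reduction (ARR)."""
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("no risk reduction; NNT is undefined")
    return 1.0 / arr

# Hypothetical rates: 20% of controls and 15% of treated patients have the event.
nnt = number_needed_to_treat(0.20, 0.15)

# NNT is conventionally rounded up to a whole patient; rounding to 9 decimals
# first keeps floating-point noise from pushing the result up by one.
nnt_rounded = math.ceil(round(nnt, 9))
```

With these made-up rates, about 20 patients would need to be treated to prevent one event, a figure that says more about clinical benefit than a P value alone.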

2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
H Cai ◽  
B W Mol ◽  
S Gordts ◽  
H Wang ◽  
J Shi

Abstract
Study question: Can the elective single-blastocyst transfer (eSBT) strategy be applied to women aged 36 or older?
Summary answer: In women ≥36 years old with at least two blastocysts, eSBT increased the cumulative live birth rate (LBR) while minimizing twins compared with double-blastocyst transfer (DBT).
What is known already: In young women with a good prognosis, an eSBT policy is an accepted strategy to maintain the LBR while decreasing multiple gestation. However, in many areas of the world DBT is still applied in older women.
Study design, size, duration: We performed a retrospective cohort study of 429 women aged ≥36 years who underwent IVF ovarian stimulation cycles between Jan 2015 and Oct 2018 and who had at least two blastocysts. Women were followed up until Oct 2020 for their fertility outcomes, including cumulative live birth and multiple pregnancies. The study was performed at the Northwest Women and Children’s Hospital, Xi’an, China.
Participants/materials, setting, methods: Out of 429 women, 240 underwent a fresh cycle of eSBT and 189 a fresh cycle of DBT. The subsequent frozen-thawed embryo transfer cycles were a combination of single- and double-blastocyst transfers, more commonly the latter. Analysis was stratified by age group (36–37, 38–39 and ≥40 years) and by quality of the blastocyst transferred, as graded by morphological examination. Outcomes were the LBR in the fresh cycle, the cumulative LBR and the multiple birth rate after fresh and frozen embryo transfers.
Main results and the role of chance: The cumulative LBR was 74.2% (178/240) for eSBT versus 63.0% (119/189) for DBT (OR = 1.69, 95%CI 1.12–2.56), irrespective of female age. The multiple birth rate was 9% (16/178) after eSBT versus 29.4% (35/119) after DBT (P value < .001). The total number of children born was 194 after eSBT versus 154 after DBT. Stratified by female age, the cumulative LBRs in women aged 36–37 (78.9 vs 70.5%), 38–39 (68.9 vs 61.1%) and ≥40 years (59.3 vs 47.5%) were higher after eSBT than after DBT; however, the differences did not reach statistical significance in any subgroup. LBRs in the fresh cycles were comparable for eSBT and DBT (52.1% vs. 52.4%, OR = 0.99, 95%CI 0.68–1.45). In women <40 years, DBT resulted in a small non-significant increase in LBR in the fresh transfer (63.2% vs. 61.2%, 95%CI 0.64–1.85, 36–37 years; 48.1% vs. 41.0%, 95%CI 0.64–2.80, 38–39 years) at the expense of a marked increase in twinning rate (0–5.4% vs. 31.7–34.6%). For women ≥40 years, no significant differences were observed in the LBR (37.0% vs 45%, 95%CI 0.47–4.07) or twinning rate (0 vs 7.7%) between the eSBT and DBT groups. The findings persisted with and without accounting for quality of the blastocyst transferred.
Limitations, reasons for caution: This study is limited by its observational character.
Wider implications of the findings: In women ≥36 years with two blastocysts, eSBT should be the preferred treatment, as it maximizes the cumulative LBR while decreasing the rate of multiple pregnancies.
Trial registration number: Not applicable
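The odds ratio for cumulative live birth can be approximately recomputed from the counts reported above (178/240 for eSBT, 119/189 for DBT). The abstract does not say how the confidence interval was obtained; the sketch below assumes an unadjusted Wald (log-scale) interval, and the function name is illustrative.

```python
import math

def odds_ratio_wald_ci(events_1, total_1, events_2, total_2, z=1.96):
    """Unadjusted odds ratio of group 1 vs group 2 with a Wald interval on the log scale."""
    a, b = events_1, total_1 - events_1   # group 1: events, non-events
    c, d = events_2, total_2 - events_2   # group 2: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Cumulative live births: eSBT 178/240, DBT 119/189 (counts from the abstract).
or_lbr, ci_low, ci_high = odds_ratio_wald_ci(178, 240, 119, 189)
```

Under this assumption the result is close to the reported OR of 1.69 and its interval; small differences in the upper bound may reflect a different interval method or rounding in the original analysis.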


2020 ◽  
Author(s):  
Jörn Lötsch ◽  
Alfred Ultsch

Abstract Calculating the magnitude of treatment effects or of differences between two groups is a common task in quantitative science. Standard effect size measures based on differences, such as the commonly used Cohen's d, fail to capture treatment-related effects on the data if the effects are not reflected in the central tendency. “Impact” is a novel nonparametric measure of effect size obtained as the sum of two separate components: (i) the change in the central tendency of the group-specific data, normalized to the overall variability, and (ii) the difference in the probability density of the group-specific data. Results obtained on artificial data and empirical biomedical data showed that “Impact” outperforms Cohen's d owing to this additional component. It is shown that in a multivariate setting, while standard statistical analyses and Cohen's d are not able to identify effects that lead to changes in the form of the data distribution, “Impact” correctly captures them. The proposed effect size measure shares the ability to observe such an effect with machine learning algorithms. It is numerically stable even for degenerate distributions consisting of singular values. Therefore, the proposed effect size measure is particularly well suited for data science and artificial intelligence-based knowledge discovery from big and heterogeneous data.
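The limitation of Cohen's d described above can be seen in a small sketch. It computes the usual pooled-SD form of Cohen's d on two made-up samples that share a mean but differ in shape. It does not implement the authors' “Impact” measure, whose exact formula is not given in the abstract.

```python
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical data: same mean (5), very different distribution shapes.
unimodal = [4, 5, 5, 5, 5, 6]
bimodal = [0, 0, 0, 10, 10, 10]
d_shape_only = cohens_d(unimodal, bimodal)
```

Because both samples have the same mean, `d_shape_only` is zero even though the distributions clearly differ, which is the kind of effect a purely mean-based measure cannot register.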


2010 ◽  
Vol 10 (2) ◽  
pp. 545-555 ◽  
Author(s):  
Guillermo Macbeth ◽  
Eugenia Razumiejczyk ◽  
Rubén Daniel Ledesma

The Cliff's Delta statistic is an effect size measure that quantifies the amount of difference between two non-parametric variables beyond the interpretation of p-values. This measure can be understood as a useful complementary analysis for the corresponding hypothesis testing. During the last two decades the use of effect size measures has been strongly encouraged by methodologists and leading institutions of the behavioral sciences. The aim of this contribution is to introduce the Cliff's Delta Calculator software, which performs such analysis and offers some interpretation tips. Differences and similarities with the parametric case are analysed and illustrated. The implementation of this free program is fully described and compared with other calculators. Alternative algorithmic approaches are mathematically analysed, and a basic linear algebra proof of their equivalence is formally presented. Two worked examples in cognitive psychology are discussed. A visual interpretation of Cliff's Delta is suggested. Availability, installation and applications of the program are presented and discussed.
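For orientation, Cliff's delta is commonly defined as the proportion of cross-group pairs in which the first group's value is larger, minus the proportion in which it is smaller. The sketch below is a direct pairwise implementation of that definition, not the Cliff's Delta Calculator itself, and the sample values are hypothetical.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y  -  #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Hypothetical scores for two groups.
treatment = [5, 6, 7, 8]
control = [3, 4, 5, 6]
delta = cliffs_delta(treatment, control)
```

The statistic ranges from -1 (every value in the first group is smaller) to +1 (every value is larger), with 0 indicating complete overlap. The pairwise loop is quadratic in sample size, which is acceptable for small samples.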


2017 ◽  
Vol 38 (5) ◽  
pp. 551-557 ◽  
Author(s):  
Hiok Yang Chan ◽  
Jerry Yongqiang Chen ◽  
Suraya Zainul-Abidin ◽  
Hao Ying ◽  
Kevin Koo ◽  
...  

Background: The American Orthopaedic Foot & Ankle Society (AOFAS) score is one of the most common and adapted outcome scales in hallux valgus surgery. However, AOFAS is predominantly physician based and not patient based. Although it may be straightforward to derive statistical significance, this may not equate to the true subjective benefit of the patient’s experience. There is a paucity of literature defining the Minimal Clinically Important Difference (MCID) for AOFAS in hallux valgus surgery, although it could have a great impact on the accuracy of analyzing surgical outcomes. Hence, the primary aim of this study was to define the MCID for the AOFAS score in these patients, and the secondary aim was to correlate patients’ demographics with the MCID. Methods: We conducted a retrospective cross-sectional study. A total of 446 patients were reviewed preoperatively and followed up for 2 years. An anchor question was asked 2 years postoperation: “How would you rate the overall results of your treatment for your foot and ankle condition?” (excellent, very good, good, fair, poor, terrible). The MCID was derived using 4 methods, 3 from an anchor-based approach and 1 from a distribution-based approach. Anchor-based approaches were (1) mean difference in 2-year AOFAS scores of patients who answered “good” versus “fair” based on the anchor question; (2) mean change of AOFAS score preoperatively and at 2-year follow-up in patients who answered “good”; (3) receiver operating characteristic (ROC) curves method, where the area under the curve (AUC) represented the likelihood that the scoring system would accurately discriminate these 2 groups of patients. The distribution-based approach used to calculate MCID was the effect size method. There were 405 (90.8%) females and 41 (9.2%) males. Mean age was 51.2 (standard deviation [SD] = 13) years, and mean preoperative BMI was 24.2 (SD = 4.1).
Results: Mean preoperative AOFAS score was 55.6 (SD = 16.8), with significant improvement to 85.7 (SD = 14.4) at 2 years (P < .001). There were no statistical differences between demographics or preoperative AOFAS scores of patients with good versus fair satisfaction levels. At 2 years, patients who had good satisfaction had higher AOFAS scores than those with fair satisfaction (83.9 vs 78.1, P < .001) and a higher mean change (30.2 vs 22.3, P = .015). Mean change in AOFAS score in patients with good satisfaction was 30.2 (SD = 19.8). Mean difference in good versus fair satisfaction was 7.9. Using ROC analysis, the cut-off point was 29.0, with an area under the curve (AUC) of 0.62. The effect size method derived an MCID of 8.4 with a moderate effect size of 0.5. Multiple linear regression demonstrated that increasing age (β = −0.129, CI = −0.245, −0.013, P = .030) and higher preoperative AOFAS score (β = −0.874, CI = −0.644, −0.081, P < .001) significantly decreased the amount of change in the AOFAS score. Conclusion: The MCID of the AOFAS score in hallux valgus surgery was 7.9 to 30.2. The MCID can ensure clinical improvement from a patient’s perspective and also aid in interpreting results from clinical trials and other studies. Level of Evidence: Level III, retrospective comparative series.
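The distribution-based figure reported above is consistent with a common form of the effect size method, in which the MCID is a chosen effect size multiplied by the baseline standard deviation. The sketch below assumes that form; the abstract does not spell out the formula, and the function name is illustrative.

```python
def mcid_effect_size_method(baseline_sd, effect_size=0.5):
    """Distribution-based MCID: chosen effect size times the baseline (preoperative) SD."""
    return effect_size * baseline_sd

# Preoperative AOFAS SD = 16.8 and a moderate effect size of 0.5 (values from the abstract).
mcid = mcid_effect_size_method(16.8, effect_size=0.5)
```

With the reported preoperative SD of 16.8, this assumed formula gives the MCID of 8.4 stated in the Results.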


2014 ◽  
Vol 52 (2) ◽  
pp. 213-230 ◽  
Author(s):  
Hariharan Swaminathan ◽  
H. Jane Rogers ◽  
Robert H. Horner

2021 ◽  
Author(s):  
Mirka Henninger ◽  
Rudolf Debelak ◽  
Carolin Strobl

To detect differential item functioning (DIF), Rasch trees search for optimal split points in covariates and identify subgroups of respondents in a data-driven way. To determine whether and in which covariate a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples. This leads to larger trees, which split the sample into more subgroups. An approach driven by effect size rather than sample size would be more desirable. To achieve this, we suggest implementing an additional stopping criterion: the popular ETS classification scheme based on the Mantel-Haenszel odds ratio. This criterion helps us to evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel-Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy to use for applied researchers, we have implemented the procedure in the statistical software R. Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted and emphasize the importance of purification strategies for the Mantel-Haenszel procedure on tree stopping and DIF item classification.
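To make the effect size criterion above more concrete, the sketch below computes a Mantel-Haenszel common odds ratio across score strata and maps it to an ETS-style A/B/C label via the delta scale (delta = -2.35 ln alpha). It is a simplification under stated assumptions: only the magnitude cut-offs of 1 and 1.5 are used, the significance tests that form part of the full ETS rule are omitted, and it is not the authors' R implementation. All names and counts are hypothetical.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio over 2x2 tables (a, b, c, d), one table per matching stratum.

    a, b: reference group correct / incorrect; c, d: focal group correct / incorrect.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def ets_category_by_magnitude(alpha_mh):
    """Simplified ETS label from |delta| only (significance conditions omitted)."""
    delta = -2.35 * math.log(alpha_mh)
    if abs(delta) < 1.0:
        return "A"   # negligible DIF
    if abs(delta) >= 1.5:
        return "C"   # large DIF
    return "B"       # moderate DIF

# Hypothetical item response counts in two score strata.
strata = [(30, 10, 20, 20), (30, 10, 20, 20)]
alpha = mantel_haenszel_odds_ratio(strata)
label = ets_category_by_magnitude(alpha)
```

In a tree-stopping context, a label of "A" for all items would correspond to the situation the abstract describes, where a split reflects an ignorable difference and the tree need not grow further.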

