Significance tests in clinical research—Challenges and pitfalls

Background: Statistical analyses are used to help understand the practical significance of the findings in a clinical study. Many clinical researchers appear to have limited knowledge of how to perform appropriate statistical analysis and of what the results in fact mean. Methods: This focal review is based on long experience in supervising clinicians on statistical analysis and advising editors of scientific journals on the quality of statistical analysis applied in scientific reports evaluated for publication. Results: Basic facts on elementary statistical analyses are presented, and common misunderstandings are elucidated. Efficacy estimates, the effect of sample size, and confidence intervals for effect estimates are reviewed, and the difference between statistical significance and clinical relevance is highlighted. The weaknesses of p-values and misunderstandings in how to interpret them are illustrated with practical examples. Conclusions and recommendations: Some very important questions need to be answered before initiating a clinical trial. What is the research question? To which patients should the result be generalised? Is the number of patients sufficient to draw a valid conclusion? When data are analysed, the number of (preplanned) significance tests should be kept small and post hoc analyses should be avoided. It should also be remembered that the clinical relevance of a finding cannot be assessed by the p-value. Thus, effect estimates and corresponding 95% confidence intervals should always be reported.
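The point about sample size, p-values, and confidence intervals can be illustrated with a small sketch. It is not taken from the review: the function name `two_sample_z`, the assumption of two equal-sized groups with a known common standard deviation, and all numbers are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_diff, sd, n_per_group, conf=0.95):
    """Two-sided z-test and confidence interval for a difference in means.

    Assumes two equal-sized groups with a common, known standard deviation
    (a simplification chosen for this illustration).
    """
    se = sd * sqrt(2.0 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2.0)
    ci = (mean_diff - z_crit * se, mean_diff + z_crit * se)
    return p_value, ci


# Hypothetical data: the same 2-unit difference (SD 10) at three sample sizes.
for n in (20, 200, 2000):
    p, (lower, upper) = two_sample_z(mean_diff=2.0, sd=10.0, n_per_group=n)
    print(f"n={n:5d}  p={p:.4f}  95% CI=({lower:.2f}, {upper:.2f})")
```

The estimated effect is identical in all three runs; only the p-value and the width of the interval change with the number of patients, which is why the p-value alone says nothing about clinical relevance.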

Effect Size and Effect Uncertainty in Organizational Research Methods

Oxford Research Encyclopedia of Business and Management ◽

10.1093/acrefore/9780190224851.013.238 ◽

2021 ◽

Author(s):

Scott B. Morris ◽

Arash Shokri

Keyword(s):

Confidence Intervals ◽

Effect Size ◽

Sampling Error ◽

Statistical Significance ◽

Scientific Progress ◽

Effect Sizes ◽

Practical Significance ◽

Significance Tests ◽

Wide Range ◽

Research Findings

To understand and communicate research findings, it is important for researchers to consider two types of information provided by research results: the magnitude of the effect and the degree of uncertainty in the outcome. Statistical significance tests have long served as the mainstream method for statistical inferences. However, the widespread misinterpretation and misuse of significance tests has led critics to question their usefulness in evaluating research findings and to raise concerns about the far-reaching effects of this practice on scientific progress. An alternative approach involves reporting and interpreting measures of effect size along with confidence intervals. An effect size is an indicator of magnitude and direction of a statistical observation. Effect size statistics have been developed to represent a wide range of research questions, including indicators of the mean difference between groups, the relative odds of an event, or the degree of correlation among variables. Effect sizes play a key role in evaluating practical significance, conducting power analysis, and conducting meta-analysis. While effect sizes summarize the magnitude of an effect, the confidence intervals represent the degree of uncertainty in the result. By presenting a range of plausible alternate values that might have occurred due to sampling error, confidence intervals provide an intuitive indicator of how strongly researchers should rely on the results from a single study.
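As a rough illustration of reporting an effect size together with its uncertainty, the sketch below computes a standardized mean difference (Cohen's d) and an approximate confidence interval. The function name `cohens_d_with_ci` and the sample data are hypothetical, and the interval uses a common large-sample approximation to the standard error of d rather than any method specified in the article.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d_with_ci(group_a, group_b, conf=0.95):
    """Cohen's d with an approximate (large-sample) confidence interval."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance from the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    # Large-sample approximation to the standard error of d (an assumption).
    se = sqrt((n_a + n_b) / (n_a * n_b) + d * d / (2.0 * (n_a + n_b)))
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2.0)
    return d, (d - z_crit * se, d + z_crit * se)


# Hypothetical scores for two small groups.
d, (lower, upper) = cohens_d_with_ci([2, 4, 6, 8, 10], [1, 3, 5, 7, 9])
print(f"d = {d:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With groups this small the interval is wide and spans zero, which is the kind of information a bare significance test would hide.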

The Other Half of the Story: Effect Size Analysis in Quantitative Research

CBE—Life Sciences Education ◽

10.1187/cbe.13-04-0082 ◽

2013 ◽

Vol 12 (3) ◽

pp. 345-351 ◽

Cited By ~ 134

Author(s):

Jessica Middlemis Maher ◽

Jonathan C. Markey ◽

Diane Ebert-May

Keyword(s):

Educational Research ◽

Effect Size ◽

Quantitative Research ◽

Statistical Significance ◽

The Other ◽

Practical Significance ◽

Significance Testing ◽

Size Analysis ◽

Significance Tests ◽

Statistical Significance Testing

Statistical significance testing is the cornerstone of quantitative research, but studies that fail to report measures of effect size are potentially missing a robust part of the analysis. We provide a rationale for why effect size measures should be included in quantitative discipline-based education research. Examples from both biological and educational research demonstrate the utility of effect size for evaluating practical significance. We also provide details about some effect size indices that are paired with common statistical significance tests used in educational research and offer general suggestions for interpreting effect size measures. Finally, we discuss some inherent limitations of effect size measures and provide further recommendations about reporting confidence intervals.

The use of effect size indices to determine practical significance

Suid-Afrikaanse Tydskrif vir Natuurwetenskap en Tegnologie ◽

10.4102/satnt.v25i3.157 ◽

2006 ◽

Vol 25 (3) ◽

Author(s):

H. S. Steyn ◽

S. M. Ellis

Keyword(s):

Effect Size ◽

Statistical Significance ◽

Empirical Studies ◽

Research Literature ◽

Effect Sizes ◽

Practical Significance ◽

Significance Tests ◽

Statistical Application ◽

Significant Difference

Determining the significance of differences in means and of relationships between variables is important in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. With studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. Where complete populations or convenience samples are used, the determination of statistical significance is, strictly speaking, no longer relevant, while the effect size indices can be used as a basis to judge significance. In this article attention is paid to the use of effect size indices in order to establish practical significance. It is also shown how these indices are utilized in a few fields of statistical application and how they receive attention in the statistical literature and in computer packages. The use of effect sizes is illustrated by a few examples from the research literature.

Association of single-nucleotide polymorphisms in the ESR2 and FSHR genes with poor ovarian response in infertile Jordanian women

Clinical and Experimental Reproductive Medicine ◽

10.5653/cerm.2020.03706 ◽

2021 ◽

Vol 48 (1) ◽

pp. 69-79

Author(s):

Amer Mahmoud Sindiani ◽

Osamah Batiha ◽

Esra’a Al-zoubi ◽

Sara Khadrawi ◽

Ghadeer Alsoukhni ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Statistical Significance ◽

Ovarian Response ◽

Control Group ◽

P Value ◽

Case Group ◽

Nucleotide Polymorphisms ◽

Poor Ovarian Response ◽

Single Nucleotide ◽

Number Of Patients

Objective: Poor ovarian response (POR) refers to a subnormal follicular response that leads to a decrease in the quality and quantity of the eggs retrieved after ovarian stimulation during assisted reproductive treatment (ART). The present study investigated the associations of multiple variants of the estrogen receptor 2 (ESR2) and follicle-stimulating hormone receptor (FSHR) genes with POR in infertile Jordanian women undergoing ART. Methods: Four polymorphisms, namely ESR2 rs1256049, ESR2 rs4986938, FSHR rs6165, and FSHR rs6166, were investigated in 60 infertile Jordanian women undergoing ART (the case group) and 60 age-matched fertile women (the control group), with a mean age of 33.60±6.34 years. Single-nucleotide polymorphisms (SNPs) were detected by restriction fragment length polymorphism and then validated using Sanger sequencing. Results: The p-value of the difference between the case and control groups regarding FSHR rs6166 was very close to 0.05 (p=0.054). However, no significant differences were observed between the two groups in terms of the other three SNPs, namely ESR2 rs1256049, ESR2 rs4986938, and FSHR rs6165 (p=0.561, p=0.433, and p=0.696, respectively). Conclusion: The association between FSHR rs6166 and POR was not statistically meaningful in the present study, but the near-significant result of this experiment suggests that statistical significance might be found in a future study with a larger number of patients.

Common pitfalls in statistical analysis: "P" values, statistical significance and confidence intervals

Perspectives in Clinical Research ◽

10.4103/2229-3485.154016 ◽

2015 ◽

Vol 6 (2) ◽

pp. 116 ◽

Cited By ~ 3

Author(s):

Priya Ranganathan ◽

Marc Buyse ◽

CS Pramesh

Keyword(s):

Statistical Analysis ◽

Confidence Intervals ◽

Statistical Significance ◽

P Values

Endovascular Versus Open Versus Hybrid Revascularisation In Infra Inguinal Disease – 2 Years Prospective Study in A Tertiary Care Center in South India

International Journal of Research in Pharmaceutical Sciences ◽

10.26452/ijrps.v11ispl2.2134 ◽

2020 ◽

Vol 11 (SPL2) ◽

pp. 97-101

Author(s):

Senthilnathan T. T. ◽

Manoj Prabakar R. ◽

Subramaniyan S. R. ◽

Marunraj G. ◽

Saravanan B. ◽

...

Keyword(s):

Risk Factors ◽

Prospective Study ◽

Statistical Significance ◽

Care Center ◽

Tertiary Care ◽

Individual Risk ◽

P Value ◽

Acute Limb Ischemia ◽

Number Of Patients ◽

Treatment Procedures

Our aim is to share the clinical experience of endovascular, open and combined hybrid revascularisation in infra inguinal disease and compare the results. A prospective study of 150 patients undergoing infra inguinal procedures was conducted over a period from October 2017 to June 2019, with 3 months of follow-up. The numbers of patients undergoing CT and digital subtraction angiography (DSA) were recorded. The numbers of cases undergoing angioplasty, catheter-directed thrombolysis (CDT), and open surgical bypass were noted. Cases of acute limb ischemia were excluded and chronic cases were included in our study; age ranged from 35 to 85 years, with 134 male (89.3%) and 16 female (10.7%) cases. Individual risk factors were similarly stratified. CT was done in 60 (40%) and DSA in 90 (60%) cases. Diagnostic variables: left occlusion 42 (28%), right occlusion 55 (36.7%), left tibial occlusion 18 (12%) and right tibial occlusion 35 (23.3%). Treatment procedure variables: CDT and angioplasty 1 (0.7%), angioplasty 87 (58%), angioplasty and bypass 8 (5.3%), bypass 35 (23.3%), CDT 15 (10%), CDT and bypass 4 (2.7%). The results were compared and p-values for statistical significance were calculated by chi-square tests in SPSS software. Statistical significance was seen for the risk factors CAD (p = 0.001), smoking (p = 0.008), and hypertension (p < 0.001) on comparison to treatment procedures, and for corresponding clinical diagnosis (p = 0.002), investigation modality (p < 0.001) and treatment procedures.

Visualization Strategies for Regression Estimates with Randomization Inference

10.31235/osf.io/bsd7g ◽

2019 ◽

Author(s):

Marshall A. Taylor

Keyword(s):

Confidence Interval ◽

Confidence Intervals ◽

Regression Models ◽

Statistical Significance ◽

Permutation Tests ◽

P Value ◽

P Values ◽

Alpha Level ◽

Significance Levels ◽

Nonprobability Sample

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate whose confidence interval crosses zero is statistically non-significant at least at the alpha level used to construct the intervals. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence interval for the estimate itself, these plots appear less useful. In this paper, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate's p-value and its associated confidence interval in relation to a specified alpha level. These plots can help the analyst interpret and report both the statistical and substantive significance of their models. Illustrations are provided using a nonprobability sample of activists and participants at a 1962 anti-Communism school.
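To make the idea of a permutation-based p-value concrete, here is a minimal sketch. It deliberately swaps in a simpler setting than the paper's regression models: a Monte Carlo permutation test for a difference in means between two groups. The function name `permutation_p_value`, the number of permutations, the seed, and the data are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical, clearly separated groups.
p = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(f"permutation p-value: {p:.4f}")
```

No standard error is involved at any point, which is why a conventional coefficient plot built on standard-error-based intervals does not carry over directly to this kind of inference.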

High Dose and Low Dose Oxytocin Regimens as Determinants of Successful Labor Induction: A multicenter comparative study

10.21203/rs.2.17285/v2 ◽

2019 ◽

Author(s):

Melese Gezahegn Tesemma ◽

Demisew Amenu Sori ◽

Desta Hiko Gemeda

Keyword(s):

Low Dose ◽

Statistical Significance ◽

Labor Induction ◽

Induction Of Labor ◽

P Value ◽

High Dose ◽

Bishop Score ◽

Delivery Time ◽

Number Of Patients ◽

Successful Induction

Background: Induction of labor by oxytocin is a routine obstetric procedure. However, little is known regarding the optimal dose of oxytocin for successful induction. This study aimed to compare the effects of high-dose versus low-dose oxytocin regimens on the success of induction. Methods: A hospital-based comparative cross-sectional study was conducted in four selected hospitals in Ethiopia from October 1, 2017 to May 30, 2018. A total of 216 pregnant women who underwent induction of labor at a gestational age of 37 weeks or more were included. Data were entered into Epi-data version 3.1 and then exported to SPSS version 20 for cleaning and analysis. Chi-square tests and logistic regression were used to identify determinants of successful induction. Results are presented as crude and adjusted odds ratios with 95% confidence intervals. A p-value < 0.05 was used to declare statistical significance. Results: The mean induction-to-delivery time was 5.9 hours and 6.3 hours for participants who received high-dose and low-dose oxytocin, respectively. A higher rate of successful induction (72.2% versus 61.1%) and a lower Cesarean section rate (27.8% vs. 38.9%) were observed among participants who received low-dose oxytocin compared to high-dose. Favourable Bishop score [AOR 4.0, 95% CI 1.9, 8.5], elective induction [AOR 0.2, 95% CI 0.1, 0.4], performing artificial rupture of membranes [AOR 10.1, 95% CI 3.2, 32.2], neonatal birth weight of < 4 kg [AOR 4.3, 95% CI 1.6, 11.6] and being parous [AOR 2.1, 95% CI 1.1, 4.0] were significantly associated with success of induction. Conclusions: In this study, different oxytocin regimens did not show a significant association with success of induction. However, the high-dose oxytocin regimen was significantly associated with a slightly shorter induction-to-delivery time. Favourable Bishop score, emergency induction, performing artificial rupture of membranes and delivery of non-macrosomic fetuses were positive determinants of successful induction. We recommend that researchers conduct multicenter research on a large number of patients, controlling for confounders, to see the real effects of different oxytocin regimens on the success of labor induction.

Some Methodological Deficiencies in Empirical Research Articles in Accounting

Accounting Horizons ◽

10.2308/acch-50818 ◽

2014 ◽

Vol 28 (3) ◽

pp. 695-712 ◽

Cited By ~ 47

Author(s):

Thomas R. Dyckman ◽

Stephen A. Zeff

Keyword(s):

Empirical Research ◽

Statistical Significance ◽

Reward System ◽

Statistical Analyses ◽

Research Articles ◽

Significance Tests ◽

Research Results ◽

Replication Studies ◽

Economic Significance ◽

Time Period

SYNOPSIS This paper uses a sample of the regression and behavioral papers published in The Accounting Review and the Journal of Accounting Research from September 2012 through May 2013. We argue first that the current research results reported in empirical regression papers fail adequately to justify the time period adopted for the study. Second, we maintain that the statistical analyses used in these papers as well as in the behavioral papers have produced flawed results. We further maintain that their tests of statistical significance are not appropriate and, more importantly, that these studies do not—and cannot—properly address the economic significance of the work. In other words, significance tests are not tests of the economic meaningfulness of the results. We suggest ways to avoid some but not all of these problems. We also argue that replication studies, which have been essentially abandoned by accounting researchers, can contribute to our search for truth, but few will be forthcoming unless the academic reward system is modified.

Effect size: commentary to the study on the Factors associated with the practice of exclusive breastfeeding

Colombia Medica ◽

10.25100/cm.v51i1.4102 ◽

2020 ◽

Author(s):

Marisol Angulo-Ramos ◽

César Merino-Soto

Keyword(s):

Statistical Significance ◽

Numerical Data ◽

Practical Significance ◽

Logistic Distribution ◽

P Value ◽

Knowledge And Skills ◽

Instructional Content ◽

The Difference ◽

In Pregnancy

This letter focuses on recent and interesting work on breastfeeding, to emphasize two observations. The first observation refers to the fact that, in Mateus and Cabrera’s manuscript, it was hardly discussed whether the referred knowledge and skills may be relevant to understanding the mothers’ behavior regarding their commitment to breastfeeding. The relevance of these cognitive aspects requires more attention due to their relationship with breastfeeding practices, and in general with the long-term mother-infant dyad. Because the knowledge and skills to maintain successful breastfeeding also have implications for developing instructional content in interventions for mothers, great attention needs to be paid to the size of the effect of differences between reported frequencies in pregnancy and the immediate puerperium. In Table 3, these differences were examined by the McNemar statistical test, which gives the statistical significance of the rejection of the null hypothesis of no differences. But neither this test nor the size of its p-value informs about the degree or size of the differences. An estimate of the size or magnitude of the differences, represented as point values or confidence intervals (as reported in Tables 4 and 5), tends to better specify tests of statistical significance. Therefore, we estimate the practical significance of the difference between the percentages obtained in pregnancy and the immediate puerperium for each of the knowledge and basic skills reported in Table 3. Using only the numerical data presented in this table, we calculated the McNemar odds ratio (McNemar OR) and two standardized difference measures d: d Cox [8] and d probit. These show less bias with respect to their population values than other estimators [5]. Because they rest on different statistical assumptions (e.g., a logistic distribution for McNemar OR and d Cox, and a normal distribution for d probit), calculating both shows whether these estimates converge or diverge.
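The effect size measures named above can be sketched as follows. This is an illustration under stated assumptions, not the letter's own computation: the McNemar OR is taken as the ratio of discordant pairs, d Cox as ln(OR)/1.65, and d probit as the difference between the probit-transformed marginal proportions. The function name `mcnemar_effect_sizes` and the cell counts are hypothetical.

```python
from math import log
from statistics import NormalDist


def mcnemar_effect_sizes(both_yes, no_to_yes, yes_to_no, both_no):
    """Effect sizes for a paired 2x2 table (before vs. after).

    no_to_yes and yes_to_no are the discordant pairs used by McNemar's test.
    The formulas are the assumptions stated in the accompanying text.
    """
    if yes_to_no == 0:
        raise ValueError("McNemar OR is undefined when yes_to_no is zero")
    n = both_yes + no_to_yes + yes_to_no + both_no
    odds_ratio = no_to_yes / yes_to_no           # ratio of discordant pairs
    d_cox = log(odds_ratio) / 1.65               # logistic-based rescaling
    p_before = (both_yes + yes_to_no) / n        # marginal "yes" before
    p_after = (both_yes + no_to_yes) / n         # marginal "yes" after
    probit = NormalDist().inv_cdf                # requires 0 < p < 1
    d_probit = probit(p_after) - probit(p_before)
    return odds_ratio, d_cox, d_probit


# Hypothetical counts: 100 mothers assessed in pregnancy and in the puerperium.
or_, d_cox, d_probit = mcnemar_effect_sizes(40, 30, 10, 20)
print(f"McNemar OR = {or_:.2f}, d_Cox = {d_cox:.2f}, d_probit = {d_probit:.2f}")
```

Comparing the two d values side by side is one way to see whether the logistic-based and normal-based estimates agree for a given item.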
