Claims of ‘no difference’ or ‘no effect’ in Cochrane and other systematic reviews

2020 ◽  
pp. bmjebm-2019-111257 ◽  
Author(s):  
Phoebe Rose Marson Smith ◽  
Lynda Ware ◽  
Clive Adams ◽  
Iain Chalmers

Estimates of treatment effects/differences derived from controlled comparisons are subject to uncertainty, both because of the quality of the data and the play of chance. Despite this, authors sometimes use statistical significance testing to make definitive statements that ‘no difference exists between’ treatments. A survey of abstracts of Cochrane reviews published in 2001/2002 identified unqualified claims of ‘no difference’ or ‘no effect’ in 259 (21.3%) of the 1212 review abstracts surveyed. We have repeated the survey to assess the frequency of such claims among the abstracts of Cochrane and other systematic reviews published in 2017. We surveyed the 643 Cochrane review abstracts published in 2017 and a random sample of 643 abstracts of other systematic reviews published in the same year. We excluded review abstracts that referred only to a protocol, lacked a conclusion or did not contain any relevant information, and we took steps to reduce biases during our survey. ‘No difference/no effect’ was claimed in the abstracts of 36 (7.8%) of 460 Cochrane reviews and in the abstracts of 13 (6.0%) of 218 other systematic reviews. Incorrect claims of no difference/no effect of treatments were thus substantially less common in abstracts of Cochrane reviews published in 2017 than in those published in 2001/2002. We hope that this reflects greater efforts to reduce biases and inconsistent judgements in the later survey, as well as more careful wording of review abstracts. There are numerous other ways of wording treatment claims incorrectly. These must be addressed because they can have adverse effects on healthcare and health research.
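The review's caution about unqualified ‘no difference’ claims is easy to illustrate numerically: a statistically non-significant comparison can still have a confidence interval wide enough to include clinically important effects in either direction. The sketch below uses invented event counts and a plain Wald interval (an assumption; the papers surveyed do not specify a method) to show a comparison whose 95% CI includes zero yet spans a large range of plausible differences.

```python
import math

def risk_difference_ci(events1, n1, events2, n2, z=1.96):
    """Wald 95% confidence interval for the difference between two proportions."""
    p1, p2 = events1 / n1, events2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Hypothetical small trial: 12/50 events on treatment vs 9/50 on control.
diff, lo, hi = risk_difference_ci(12, 50, 9, 50)
# The interval includes zero (non-significant at p < 0.05), but it is far too
# wide to justify a claim of 'no difference' between the treatments.
print(f"risk difference = {diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Absence of evidence is not evidence of absence: the honest summary of such a result is an interval, not a declaration of equivalence.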

2016 ◽  
Vol 21 (1) ◽  
pp. 102-115 ◽  
Author(s):  
Stephen Gorard

This paper reminds readers of the absurdity of statistical significance testing, despite its continued widespread use as a supposed method for analysing numeric data. There have been complaints about the poor quality of research employing significance tests for a hundred years, and repeated calls for researchers to stop using and reporting them. There have even been attempted bans. Many thousands of papers have now been written, in all areas of research, explaining why significance tests do not work. There are too many for all to be cited here. This paper summarises the logical problems as described in over 100 of these prior pieces. It then presents a series of demonstrations showing that significance tests do not work in practice. In fact, they are more likely to produce a wrong answer than a right one. The confused use of significance testing has practical and damaging consequences for people's lives. Ending the use of significance tests is therefore a pressing ethical issue for research. Anyone who knows these problems, as described for over one hundred years, and who continues to teach, use or publish significance tests is acting unethically and knowingly risking the damage that ensues.


2021 ◽  
pp. 174569162097060
Author(s):  
Klaus Fiedler ◽  
Linda McCaughey ◽  
Johannes Prager

The current debate about how to improve the quality of psychological science revolves, almost exclusively, around the subordinate level of statistical significance testing. In contrast, research design and strict theorizing, which are superordinate to statistics in the methods hierarchy, are sorely neglected. The present article is devoted to the key role assigned to manipulation checks (MCs) for scientific quality control. MCs not only afford a critical test of the premises of hypothesis testing but also (a) prompt clever research design and validity control, (b) carry over to refined theorizing, and (c) have important implications for other facets of methodology, such as replication science. On the basis of an analysis of the reality of MCs reported in current issues of the Journal of Personality and Social Psychology, we propose a future methodology for the post-p < .05 era that replaces scrutiny in significance testing with refined validity control and diagnostic research designs.


2020 ◽  
Author(s):  
Jan Benjamin Vornhagen ◽  
April Tyack ◽  
Elisa D Mekler

Statistical Significance Testing -- or Null Hypothesis Significance Testing (NHST) -- is common in quantitative CHI PLAY research. Drawing from recent work in HCI and psychology promoting transparent statistics and the reduction of questionable research practices, we systematically review the reporting quality of 119 CHI PLAY papers using NHST (data and analysis plan at https://osf.io/4mcbn/). We find that over half of these papers employ NHST without specific statistical hypotheses or research questions, which may risk the proliferation of false positive findings. Moreover, we observe inconsistencies in the reporting of sample sizes and statistical tests. These issues reflect fundamental incompatibilities between NHST and the frequently exploratory work common to CHI PLAY. We discuss the complementary roles of exploratory and confirmatory research, and provide a template for more transparent research and reporting practices.


2021 ◽  
pp. 204589402110249
Author(s):  
David D Ivy ◽  
Damien Bonnet ◽  
Rolf MF Berger ◽  
Gisela Meyer ◽  
Simin Baygani ◽  
...  

Objective: This study evaluated the efficacy and safety of tadalafil in pediatric patients with pulmonary arterial hypertension (PAH). Methods: This phase 3, international, randomized, multicenter study (24-week double-blind placebo-controlled period; 2-year open-label extension period), with tadalafil added on to the patient’s current endothelin receptor antagonist therapy, included pediatric patients aged <18 years with PAH. Patients received tadalafil 20 mg or 40 mg based on their weight (heavy-weight: ≥40 kg; middle-weight: ≥25 to <40 kg) or placebo orally QD for 24 weeks. The primary endpoint was change from baseline in 6-minute walk (6MW) distance in patients aged ≥6 years at Week 24. The planned sample size was amended from 134 to ≥34 patients due to serious recruitment challenges; therefore, statistical significance testing was not performed between treatment groups. Results: Patient demographics and baseline characteristics (N=35; tadalafil=17; placebo=18) were comparable between treatment groups; median age was 14.2 years (range 6.2 to 17.9 years) and the majority (71.4%, n=25) of patients were in the heavy-weight cohort. Least squares mean (SE) change from baseline in 6MW distance at Week 24 was numerically greater with tadalafil versus placebo (60.48 [20.41] vs 36.60 [20.78] meters; placebo-adjusted mean difference [SD] 23.88 [29.11]). Safety of tadalafil treatment was as expected, without any new safety concerns. During study period 1, two patients (one in each group) discontinued due to investigator-reported clinical worsening, and no deaths were reported. Conclusions: Statistical significance testing was not performed between the treatment groups due to the small sample size; however, the study results show a positive trend of improvement in non-invasive measurements commonly utilized by clinicians to evaluate disease status in children with PAH. Safety of tadalafil treatment was as expected, without any new safety signals.


2020 ◽  
Vol 2020 (4) ◽  
Author(s):  
Mariano Mascarenhas ◽  
Theodoros Kalampokas ◽  
Sesh Kamal Sunkara ◽  
Mohan S Kamath

Abstract STUDY QUESTION Are systematic reviews published within a 3-year period on interventions in ART concordant in their conclusions? SUMMARY ANSWER The majority of the systematic reviews published within a 3-year period in the field of assisted reproduction on the same topic had discordant conclusions. WHAT IS KNOWN ALREADY Systematic reviews and meta-analyses have now replaced individual randomized controlled trials (RCTs) at the top of the evidence pyramid. There has been a proliferation of systematic reviews and meta-analyses, many of which suffer from methodological issues and provide varying conclusions. STUDY DESIGN, SIZE, DURATION We assessed nine interventions in women undergoing ART with at least three systematic reviews each, published from January 2015 to December 2017. PARTICIPANTS/MATERIALS, SETTING, METHODS The systematic reviews which included RCTs were considered eligible for inclusion. The primary outcome was extent of concordance between systematic reviews on the same topic. Secondary outcomes included assessment of quality of systematic reviews, differences in included studies in meta-analyses covering the same search period, selective reporting and reporting the quality of evidence. MAIN RESULTS AND THE ROLE OF CHANCE Concordant results and conclusions were found in only one topic, with reviews in the remaining eight topics displaying partial discordance. The AMSTAR grading for the majority of the non-Cochrane reviews was critically low whilst it was categorized as high for all of the Cochrane reviews. For three of the nine topics, none of the included systematic reviews assessed the quality of evidence. We were unable to assess selective reporting as most of the reviews did not have a pre-specified published protocol. LIMITATIONS, REASONS FOR CAUTION We were limited by the high proportion of reviews lacking a pre-specified protocol, which made it impossible to assess for selective reporting. 
Furthermore, many reviews did not specify primary and secondary outcomes which made it difficult to assess reporting bias. All the authors of this review were Cochrane review authors which may introduce some assessment bias. The categorization of the review’s conclusions as beneficial, harmful or neutral was subjective, depending on the tone and wording of the conclusion section of the review. WIDER IMPLICATIONS OF THE FINDINGS The majority of the systematic reviews published within a 3-year period on the same topic in the field of assisted reproduction revealed discordant conclusions and suffered from serious methodological issues, hindering the process of informed healthcare decision-making. STUDY FUNDING/COMPETING INTEREST(S) All the authors are Cochrane authors. M.S.K. is an editorial board member of Cochrane Gynaecology and Fertility group. No grant from funding agencies in the public, commercial or not-for-profit sectors was obtained.


2013 ◽  
Vol 12 (3) ◽  
pp. 345-351 ◽  
Author(s):  
Jessica Middlemis Maher ◽  
Jonathan C. Markey ◽  
Diane Ebert-May

Statistical significance testing is the cornerstone of quantitative research, but studies that fail to report measures of effect size are potentially missing a robust part of the analysis. We provide a rationale for why effect size measures should be included in quantitative discipline-based education research. Examples from both biological and educational research demonstrate the utility of effect size for evaluating practical significance. We also provide details about some effect size indices that are paired with common statistical significance tests used in educational research and offer general suggestions for interpreting effect size measures. Finally, we discuss some inherent limitations of effect size measures and provide further recommendations about reporting confidence intervals.
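As a concrete illustration of pairing an effect size with a significance test, here is a minimal sketch of Cohen's d computed with a pooled standard deviation. The two score lists are invented for illustration, and the equal-variance pooling is an assumption; the paper itself discusses several effect size indices, of which this is only one.

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d: standardized mean difference using a pooled SD
    (assumes the two groups have roughly equal variances)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

# Hypothetical exam scores from a treatment and a control section.
treatment = [82, 87, 90, 78, 85, 88]
control = [75, 80, 84, 72, 79, 81]
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

Reporting d alongside the p value lets readers judge practical significance: with samples this small a difference can fail a significance test while the standardized effect is large, and with very large samples a trivial effect can be highly significant.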

