NO EVIDENCE, OR NOT ENOUGH EVIDENCE?

PEDIATRICS ◽  
1996 ◽  
Vol 98 (6) ◽  
pp. A22-A22
Author(s):  
Student

When we are told that "there's no evidence that A causes B," we should first ask whether absence of evidence means simply that there is no information at all. If there are data, we should look for quantification of the association rather than just a P value. Where risks are small, P values may well mislead: confidence intervals are likely to be wide, indicating considerable uncertainty.
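The point about small risks can be made concrete with a short sketch. The counts below are invented for illustration and do not come from the article; the interval is a standard Wald interval on the log risk ratio, which is one common choice among several.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a Wald interval on the log scale.

    Assumes both event counts are greater than zero.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical data: 6 events among 2000 exposed, 3 among 2000 unexposed.
rr, lower, upper = risk_ratio_ci(6, 2000, 3, 2000)
print(f"risk ratio = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With counts this small the interval includes 1, so a test would report "no significant association", yet the same interval also extends to a several-fold increase in risk. The data are too sparse to distinguish between the two.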

2016 ◽  
Vol 156 (6) ◽  
pp. 978-980 ◽  
Author(s):  
Peter M. Vila ◽  
Melanie Elizabeth Townsend ◽  
Neel K. Bhatt ◽  
W. Katherine Kao ◽  
Parul Sinha ◽  
...  

Effect sizes and confidence intervals are underreported in the current biomedical literature. The objective of this article is to discuss the recent paradigm shift encouraging the reporting of effect sizes and confidence intervals. Although P values help to inform us about whether an observed effect could be due to chance, effect sizes inform us about the magnitude of the effect (clinical significance), and confidence intervals inform us about the range of plausible estimates for the general population mean (precision). Reporting effect sizes and confidence intervals is a necessary addition to the biomedical literature, and these concepts are reviewed in this article.


2019 ◽  
Author(s):  
Marshall A. Taylor

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate whose interval crosses zero is statistically non-significant at least at the alpha level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence interval for the estimate itself, these plots appear less useful. In this paper, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate's p-value and its associated confidence interval in relation to a specified alpha level. These plots can help the analyst interpret and report both the statistical and substantive significance of their models. Illustrations are provided using a nonprobability sample of activists and participants at a 1962 anti-Communism school.
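To show where a confidence interval around a permutation p-value can come from, here is a minimal sketch for a two-group difference in means rather than a regression coefficient. It is not the author's procedure: the function name, the add-one correction, and the Wald interval for the Monte Carlo error are all assumptions chosen for brevity.

```python
import math
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0, z=1.96):
    """Monte Carlo permutation p-value for a difference in means.

    Returns the p-value and an approximate interval reflecting the Monte Carlo
    error from sampling only n_permutations random relabelings.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of the pooled observations
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p = (extreme + 1) / (n_permutations + 1)
    se = math.sqrt(p * (1 - p) / n_permutations)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical data for illustration only.
p, p_lower, p_upper = permutation_p_value([1, 2, 3, 4, 5, 6, 7, 8],
                                          [11, 12, 13, 14, 15, 16, 17, 18])
alpha = 0.05
print(f"p = {p:.4f} (interval {p_lower:.4f} to {p_upper:.4f}); "
      f"whole interval below alpha: {p_upper < alpha}")
```

Comparing the whole interval with alpha, rather than the point p-value alone, is the kind of reading the plots described above are meant to support.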


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Agustín Ciapponi ◽  
José M. Belizán ◽  
Gilda Piaggio ◽  
Sanni Yaya

Abstract: This article challenges the “tyranny of P-value” and promotes more valuable and applicable interpretations of the results of research on health care delivery. We provide here solid arguments to retire statistical significance as the unique way to interpret results, after presenting the current state of the debate inside the scientific community. Instead, we promote reporting the much more informative confidence intervals and eventually adding exact P-values. We also provide some clues to integrate statistical and clinical significance by referring to minimal important differences and integrating the effect size of an intervention and the certainty of evidence, ideally using the GRADE approach. We have argued against interpreting or reporting results as statistically significant or statistically non-significant. We recommend showing important clinical benefits with their confidence intervals in cases of point estimates compatible with results benefits and even important harms. It seems fair to report the point estimate and the more likely values along with a very clear statement of the implications of the extremes of the intervals. We recommend drawing conclusions that consider the multiple factors besides P-values, such as certainty of the evidence for each outcome, net benefit, economic considerations, and values and preferences. We use several examples and figures to illustrate different scenarios and further suggest a wording to standardize the reporting. Several statistical measures have a role in the scientific communication of studies, but it is time to understand that there is life beyond statistical significance. There is a great opportunity for improvement toward more complete interpretation and more standardized reporting.


PEDIATRICS ◽  
1996 ◽  
Vol 97 (2) ◽  
pp. A42-A42
Author(s):  
Student

To evaluate the extent of prediction error we must discard hypotheses testing in favor of estimation ... The use of confidence intervals as summaries of the effect of an intervention enables the correct conclusions to be drawn from meta-analyses; reliance on whether a P value is more or less than 0.05 is a dangerous way of making decisions ...


Author(s):  
Marshall A. Taylor

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate that crosses zero is statistically nonsignificant at least at the alpha level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence intervals for the estimate itself, these plots appear less useful. In this article, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate’s p-value and its associated confidence interval in relation to a specified alpha level. These plots can help the analyst interpret and report the statistical and substantive significances of their models. I illustrate using a nonprobability sample of activists and participants at a 1962 anticommunism school.


2021 ◽  
Author(s):  
Willem M Otte ◽  
Christiaan H Vinkers ◽  
Philippe Habets ◽  
David G P van IJzendoorn ◽  
Joeri K Tijdink

Abstract
Objective: To quantitatively map how non-significant outcomes are reported in randomised controlled trials (RCTs) over the last thirty years.
Design: Quantitative analysis of English full-texts containing 567,758 RCTs recorded in PubMed (81.5% of all published RCTs).
Methods: We determined the exact presence of 505 pre-defined phrases denoting results that do not reach formal statistical significance (P<0.05) in 567,758 RCT full texts between 1990 and 2020 and manually extracted associated P values. Phrase data was modeled with Bayesian linear regression. Evidence for temporal change was obtained through Bayes-factor analysis. In a randomly sampled subset, the associated P values were manually extracted.
Results: We identified 61,741 phrases indicating close to significant results in 49,134 (8.65%; 95% confidence interval (CI): 8.58–8.73) RCTs. The overall prevalence of these phrases remained stable over time, with the most prevalent phrases being ‘marginally significant’ (in 7,735 RCTs), ‘all but significant’ (7,015), ‘a nonsignificant trend’ (3,442), ‘failed to reach statistical significance’ (2,578) and ‘a strong trend’ (1,700). The strongest evidence for a temporal prevalence increase was found for ‘a numerical trend’, ‘a positive trend’, ‘an increasing trend’ and ‘nominally significant’. The phrases ‘all but significant’, ‘approaches statistical significance’, ‘did not quite reach statistical significance’, ‘difference was apparent’, ‘failed to reach statistical significance’ and ‘not quite significant’ decreased over time. In the randomly sampled subset, the 11,926 identified P values ranged between 0.05 and 0.15 (68.1%; CI: 67.3–69.0; median 0.06).
Conclusions: Our results demonstrate that phrases describing marginally significant results are regularly used in RCTs to report P values close to but above the dominant 0.05 cut-off. The phrase prevalence remained stable over time, despite all efforts to change the focus from P < 0.05 to reporting effect sizes and corresponding confidence intervals. To improve transparency and enhance responsible interpretation of RCT results, researchers, clinicians, reviewers, and editors need to abandon the focus on formal statistical significance thresholds and stimulate reporting of exact P values with corresponding effect sizes and confidence intervals.
Significance statement: The power of language to modify the reader’s perception of how to interpret biomedical results should not be underestimated. Misreporting and misinterpretation are urgent problems in RCT output. This may be at least partially related to the statistical paradigm of the 0.05 significance threshold. Sometimes clinical researchers use creative and inventive strategies, such as describing their clinical results as ‘almost significant’, to get their data published. This phrasing may convince readers about the value of their work. Since 2005 there has been increasing concern that most current published research findings are false, and it has been generally advised to switch from null hypothesis significance testing to using effect sizes, estimation, and cumulation of evidence. Whether this ‘new statistics’ approach has worked out well should be reflected in the phrases describing non-significant results of RCTs, in particular in changing patterns of phrases describing P values just above 0.05. More than five hundred phrases potentially suited to report or discuss non-significant results were searched in over half a million published RCTs. A stable overall prevalence of these phrases (10.87%, CI: 10.79–10.96; N: 61,741), with associated P values close to 0.05, was found in the last three decades, with strong increases or decreases in individual phrases describing these near-significant results. The pressure to pass the barrier of scientific peer review may function as an incentive to use effective phrases to mask non-significant results in RCTs. However, this keeps researchers preoccupied with hypothesis testing rather than presenting outcome estimations with uncertainty. The effect of language on getting RCT results published should ideally be minimal, to steer evidence-based medicine away from overselling of research results and unsubstantiated claims about the efficacy of certain RCTs, and to prevent an over-reliance on P value cutoffs. Our exhaustive search suggests that presenting RCT findings remains a struggle when P values approach the carved-in-stone threshold of 0.05.
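As a worked check on how an interval around a prevalence such as 49,134 of 567,758 RCTs can be computed, the sketch below uses a simple Wald (normal-approximation) interval. The abstract does not say which interval method the authors used, so the method here is an assumption. Under it, the result is about 8.58% to 8.73%, matching the reported interval.

```python
import math


def wald_proportion_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, p - z * se, p + z * se


# Counts taken from the abstract above; the interval method is assumed.
p, lower, upper = wald_proportion_ci(49_134, 567_758)
print(f"{100 * p:.2f}% (95% CI {100 * lower:.2f} to {100 * upper:.2f})")
```

With a sample this large the interval is narrow, which is the opposite of the small-risk situation in which P values are most likely to mislead.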


Author(s):  
Peter Wills ◽  
Emanuel Knill ◽  
Kevin Coakley ◽  
Yanbao Zhang

Given a composite null hypothesis H0, test supermartingales are non-negative supermartingales with respect to H0 with an initial value of 1. Large values of test supermartingales provide evidence against H0. As a result, test supermartingales are an effective tool for rejecting H0, particularly when the p-values obtained are very small and serve as certificates against the null hypothesis. Examples include the rejection of local realism as an explanation of Bell test experiments in the foundations of physics and the certification of entanglement in quantum information science. Test supermartingales have the advantage of being adaptable during an experiment and allowing for arbitrary stopping rules. By inversion of acceptance regions, they can also be used to determine confidence sets. We used an example to compare the performance of test supermartingales for computing p-values and confidence intervals to Chernoff-Hoeffding bounds and the “exact” p-value. The example is the problem of inferring the probability of success in a sequence of Bernoulli trials. There is a cost in using a technique that has no restriction on stopping rules, and, for a particular test supermartingale, our study quantifies this cost.
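For the Bernoulli example, one simple test supermartingale is a running likelihood ratio against a fixed alternative. The sketch below assumes a composite null of the form "success probability at most p0" and a hand-picked alternative q greater than p0. Both names are hypothetical, and this is not necessarily the particular supermartingale the authors studied.

```python
def bernoulli_test_supermartingale(outcomes, p0, q):
    """Likelihood-ratio test supermartingale for H0: success probability <= p0.

    Starts at 1 and multiplies by q/p0 on a success and (1-q)/(1-p0) on a
    failure. For any true success probability <= p0 the expected factor is at
    most 1, so the process is a non-negative supermartingale under H0.
    Returns the final value and a p-value bound 1/max (Ville's inequality),
    which remains valid under arbitrary stopping rules.
    """
    if not 0.0 < p0 < q < 1.0:
        raise ValueError("require 0 < p0 < q < 1")
    value = 1.0
    running_max = 1.0
    for x in outcomes:
        value *= (q / p0) if x else ((1.0 - q) / (1.0 - p0))
        running_max = max(running_max, value)
    return value, min(1.0, 1.0 / running_max)


# Hypothetical run: ten successes in a row against H0: p <= 0.5.
value, p_bound = bernoulli_test_supermartingale([1] * 10, p0=0.5, q=0.75)
print(f"supermartingale value = {value:.2f}, p-value bound = {p_bound:.4f}")
```

The bound is generally looser than an exact binomial p-value for the same data, which illustrates the cost of placing no restriction on stopping rules.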


2021 ◽  
pp. 1-2
Author(s):  
Sukhvinder Singh Oberoi ◽  
Mansi Atri

The interpretation of the p-value has long been debated, and many researchers find it difficult. The p-value was introduced in 1900 by Pearson. However, the demerits of p-values and significance testing are difficult to discuss and have long gone unaddressed, because the p-value is so widely applied as a measure of interpretation in clinical research. Confidence intervals around sample statistics and effect sizes should be given more weight than statistical significance alone. Researchers should consult a statistician in the early stages of study planning to avoid misinterpreting the p-value, especially if they use statistical software for their data analysis.

