True and False Positive Rates for Different Criteria of Evaluating Statistical Evidence from Clinical Trials

2019 ◽  
Author(s):  
Don van Ravenzwaaij ◽  
John P A Ioannidis

Abstract Background: Until recently, a typical rule for the endorsement of new medications by the Food and Drug Administration has been the existence of at least two statistically significant clinical trials favoring the new medication. This rule has consequences for the true positive (endorsement of an effective treatment) and false positive (endorsement of an ineffective treatment) rates. Methods: In this paper, we compare true positive and false positive rates for different evaluation criteria through simulations that rely on (1) conventional p-values; (2) confidence intervals based on meta-analyses assuming fixed or random effects; and (3) Bayes factors. We varied threshold levels for statistical evidence, thresholds for what constitutes a clinically meaningful treatment effect, and the number of trials conducted. Results: Our results show that Bayes factors, meta-analytic confidence intervals, and p-values often have similar performance. Bayes factors may perform better when the number of trials conducted is high and when trials have small sample sizes and clinically meaningful effects are not small, particularly in fields where the number of non-zero effects is relatively large. Conclusions: Thinking about realistic effect sizes in conjunction with desirable levels of statistical evidence, as well as quantifying statistical evidence with Bayes factors, may help improve decision-making in some circumstances.
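The simulation logic described in the Methods can be sketched as follows. This is a hypothetical illustration, not the paper's actual code: the effect size (d = 0.5), sample size (50 per arm), number of trials (2), significance threshold, and simulation count are all assumed settings chosen for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def two_trial_rule(effect, n_per_arm=50, n_trials=2, alpha=0.05, n_sims=2000):
    """Proportion of simulated drug programmes in which every trial is
    individually significant (two-sided t-test) in favour of the treatment."""
    endorsed = 0
    for _ in range(n_sims):
        all_significant = True
        for _ in range(n_trials):
            treatment = rng.normal(effect, 1.0, n_per_arm)
            control = rng.normal(0.0, 1.0, n_per_arm)
            t, p = stats.ttest_ind(treatment, control)
            # Endorsement requires significance in the right direction
            if not (p < alpha and t > 0):
                all_significant = False
                break
        endorsed += all_significant
    return endorsed / n_sims

# True positive rate: a real standardized effect of d = 0.5
tpr = two_trial_rule(effect=0.5)
# False positive rate: the treatment does nothing (d = 0)
fpr = two_trial_rule(effect=0.0)
```

Requiring two independently significant trials drives the false positive rate far below the nominal alpha (roughly 0.025 squared for one-sided endorsement), at the cost of reduced power per programme.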


2021 ◽  
pp. bmjebm-2020-111603
Author(s):  
John Ferguson

Commonly accepted statistical advice dictates that large, highly powered clinical trials generate more reliable evidence than trials with smaller sample sizes. This advice is generally sound: treatment effect estimates from larger trials tend to be more accurate, as witnessed by tighter confidence intervals and reduced publication bias. Consider, then, two clinical trials testing the same treatment that yield the same p value, the trials being identical apart from their sample sizes. Assuming statistical significance, one might at first suspect that the larger trial offers stronger evidence that the treatment in question is truly effective. Yet often precisely the opposite is true. Here, we illustrate and explain this somewhat counterintuitive result and suggest some ramifications for the interpretation and analysis of clinical trial results.
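This counterintuitive result (often discussed as Lindley's paradox) can be made concrete with a simple Bayes-factor calculation. The sketch below is illustrative, not taken from the paper: the normal prior on the effect (sd tau = 0.5) and the sample sizes are assumed choices.

```python
from scipy.stats import norm

def bf01(z, n, sigma=1.0, tau=0.5):
    """Bayes factor for H0: mu = 0 versus H1: mu ~ N(0, tau^2), given a
    sample mean corresponding to z-statistic z from n observations."""
    se = sigma / n ** 0.5
    xbar = z * se                      # observed mean implied by z
    like_h0 = norm.pdf(xbar, 0.0, se)  # marginal likelihood under H0
    # Under H1, xbar ~ N(0, tau^2 + se^2) after integrating out mu
    like_h1 = norm.pdf(xbar, 0.0, (tau**2 + se**2) ** 0.5)
    return like_h0 / like_h1

small_trial = bf01(z=2.0, n=50)     # p ~ 0.046 in a trial of 50
large_trial = bf01(z=2.0, n=5000)   # the same p-value in a trial of 5000
```

With the z-statistic held fixed, the Bayes factor in the larger trial moves toward (and here past) 1, i.e. toward the null: an identical p value carries weaker evidence against H0 as the sample size grows.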


2018 ◽  
Author(s):  
Hyemin Han ◽  
Joonsuk Park

We composed an R-based script for image-based Bayesian random-effects meta-analysis of previous fMRI studies. It meta-analyzes the second-level test results of the studies and calculates Bayes factors indicating whether the effect in each voxel differs from zero. We compared the results of the Bayesian and classical meta-analyses by examining the overlap between the result from each method and a target map created by NeuroSynth. As an example, we analyzed previous fMRI studies of working memory extracted from NeuroSynth. The result from our Bayesian method showed greater overlap with the target than the classical method. In addition, Bayes factors proved to be a better way than p-values to examine whether the evidence supports a hypothesis. Given these findings, Bayesian meta-analysis provides neuroscientists with a better meta-analysis method for fMRI studies, owing to the improved overlap with the NeuroSynth result and the practical and epistemological value of Bayes factors, which can directly test for the presence of an effect.
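A toy Python sketch of the voxel-wise idea follows. This is not the authors' R script: the DerSimonian-Laird estimate of between-study variance and the normal prior on the pooled effect (sd = 0.5) are assumed choices, and the per-study effect estimates are invented.

```python
import numpy as np
from scipy.stats import norm

def voxel_bf10(effects, variances, prior_sd=0.5):
    """Random-effects meta-analysis of per-study effect estimates at one
    voxel, returning BF10 for H1: mu ~ N(0, prior_sd^2) versus H0: mu = 0."""
    effects = np.asarray(effects)
    w = 1.0 / np.asarray(variances)
    mu_fixed = np.sum(w * effects) / np.sum(w)
    # DerSimonian-Laird between-study variance estimate
    q = np.sum(w * (effects - mu_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Random-effects pooled estimate and its standard error
    w_re = 1.0 / (np.asarray(variances) + tau2)
    mu = np.sum(w_re * effects) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    # Bayes factor from the marginal likelihoods of the pooled estimate
    bf01 = norm.pdf(mu, 0.0, se) / norm.pdf(mu, 0.0, np.sqrt(prior_sd**2 + se**2))
    return 1.0 / bf01

# Five hypothetical studies at one voxel with a consistent positive effect:
bf = voxel_bf10(effects=[0.4, 0.5, 0.35, 0.6, 0.45],
                variances=[0.01, 0.02, 0.015, 0.01, 0.02])
```

Applied voxel by voxel over the second-level maps, such Bayes factors quantify evidence for an effect directly, rather than only the improbability of the data under the null.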


2021 ◽  
Vol 11 ◽  
Author(s):  
Mingming Zhao ◽  
Yi Yu ◽  
Rumeng Wang ◽  
Meiying Chang ◽  
Sijia Ma ◽  
...  

As current treatments for chronic kidney disease (CKD) are limited, it is necessary to seek more effective and safer treatment methods, such as Chinese herbal medicines (CHMs). In order to clarify the modern theoretical basis and molecular mechanisms of CHMs, we reviewed the knowledge base of publications in peer-reviewed English-language journals, focusing on the anti-inflammatory, antioxidative, anti-apoptotic, autophagy-mediated, and antifibrotic effects of CHMs commonly used in kidney disease. We also discussed recently published clinical trials and meta-analyses in this field. Based on recent in vivo and in vitro studies of the mechanisms of kidney disease, CHMs have anti-inflammatory, antioxidative, anti-apoptotic, autophagy-mediated, and antifibrotic effects. Several well-designed randomized controlled trials (RCTs) and meta-analyses demonstrated that the use of CHMs as an adjuvant to conventional medicines may benefit patients with CKD. Unknown active ingredients, the low quality and small sample sizes of some clinical trials, and unresolved safety questions have restricted the development of CHMs. CHMs are a potential treatment option for CKD. Further mechanistic studies and well-conducted RCTs are urgently needed to evaluate their efficacy and safety.


PEDIATRICS ◽  
1996 ◽  
Vol 97 (2) ◽  
pp. A42-A42
Author(s):  
Student

To evaluate the extent of prediction error we must discard hypotheses testing in favor of estimation ... The use of confidence intervals as summaries of the effect of an intervention enables the correct conclusions to be drawn from meta-analyses; reliance on whether a P value is more or less than 0.05 is a dangerous way of making decisions ...


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5318 ◽  
Author(s):  
Felicity A. Braithwaite ◽  
Julie L. Walters ◽  
Lok Sze Katrina Li ◽  
G. Lorimer Moseley ◽  
Marie T. Williams ◽  
...  

Background Blinding is critical to clinical trials because it allows for separation of specific intervention effects from bias, by equalising all factors between groups except for the proposed mechanism of action. Absent or inadequate blinding in clinical trials has consistently been shown in large meta-analyses to result in overestimation of intervention effects. Blinding in dry needling trials, particularly blinding of participants and therapists, is a practical challenge; therefore, specific effects of dry needling have yet to be determined. Despite this, dry needling is widely used by health practitioners internationally for the treatment of pain. This review presents the first empirical account of the influence of blinding on intervention effect estimates in dry needling trials. The aim of this systematic review was to determine whether participant beliefs about group allocation relative to actual allocation (blinding effectiveness), and/or adequacy of blinding procedures, moderated pain outcomes in dry needling trials. Methods Twelve databases (MEDLINE, EMBASE, AMED, Scopus, CINAHL, PEDro, The Cochrane Library, Trove, ProQuest, trial registries) were searched from inception to February 2016. Trials that compared active dry needling with a sham that simulated dry needling were included. Two independent reviewers performed screening, data extraction, and critical appraisal. Available blinding effectiveness data were converted to a blinding index, a quantitative measurement of blinding, and meta-regression was used to investigate the influence of the blinding index on pain. Adequacy of blinding procedures was based on critical appraisal, and subgroup meta-analyses were used to investigate the influence of blinding adequacy on pain. Meta-analytical techniques used inverse-variance random-effects models. Results The search identified 4,894 individual publications with 24 eligible for inclusion in the quantitative syntheses. 
In 19 trials, the risk of methodological bias was high or unclear. Five trials were adequately blinded, and blinding was assessed and sufficiently reported to compute the blinding index in 10 trials. There was no evidence of a moderating effect of the blinding index on pain. For short-term and long-term pain assessments, pooled effects for inadequately blinded trials were statistically significant in favour of active dry needling, whereas there was no evidence of a difference between active and sham groups for adequately blinded trials. Discussion The small number and size of included trials meant there was insufficient evidence to conclusively determine whether a moderating effect of blinding effectiveness or adequacy existed. However, with the caveats of small sample sizes, generally unclear risk of bias, statistical heterogeneity, potential publication bias, and the limitations of subgroup analyses, the available evidence suggests that inadequate blinding procedures could lead to exaggerated intervention effects in dry needling trials.
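For concreteness, an arm-level blinding index of the kind used in the review can be sketched as below (modelled on a Bang-style index; all counts are invented for illustration). A value of 0 indicates random guessing (good blinding), 1 complete unblinding, and -1 systematic opposite guessing.

```python
def blinding_index(n_correct, n_incorrect, n_dont_know):
    """Arm-level blinding index from participants' guesses about their
    group allocation: (2 * proportion correct among guessers - 1),
    weighted by the proportion of participants who ventured a guess."""
    n_total = n_correct + n_incorrect + n_dont_know
    n_guess = n_correct + n_incorrect
    if n_guess == 0:
        return 0.0  # nobody guessed: no evidence of unblinding
    return (2 * n_correct / n_guess - 1) * (n_guess / n_total)

# An active-needling arm where most participants identify their group:
bi_unblinded = blinding_index(40, 5, 5)    # close to 1: poor blinding
# An arm where guesses are near chance:
bi_blinded = blinding_index(18, 17, 15)    # close to 0: adequate blinding
```

Converting heterogeneous blinding-assessment reports into a common quantitative index like this is what allows blinding effectiveness to enter a meta-regression as a moderator.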

