Sample size and effect size calculations are necessary in clinical studies in order to avoid false positive and false negative conclusions

2013 ◽  
Vol 4 (3) ◽  
pp. 163-164 ◽  
Author(s):  
Mads U. Werner
2021 ◽  
Author(s):  
Ravi Jandhyala

Abstract Background: Previous research assessed the accuracy of disease-severity measurement in clinical studies as a mathematical relationship between the set of endpoints selected and the disease-severity scale (DSS), a surrogate for the theoretical Neutral list of indicators representing the disease phenotype. New DSSs are continually developed, so clinical studies’ operationalisation of the Neutral list and resulting relative neutrality may vary over time. We assessed variation in the neutrality of clinical studies over time and the probability of false positive and false negative classifications at different disease prevalence rates.
Methods: We used search strings extracted from the Orphanet Register of Rare Diseases using a proprietary algorithm to conduct a systematic review of studies published until January 2021 per Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Overall, 483 studies and 12 rare diseases met inclusion criteria. We extracted all indicators from clinical studies and calculated neutrality and its components, sensitivity and specificity, as well as the probability of misclassifications at 20%, 50% and 80% disease prevalence rates at two time points, the times of publication of the first and last DSS. Surrogate Neutral lists were the first DSS and a composite of all later DSSs.
Results: Over time, the neutrality of clinical studies increased for six diseases and decreased for five diseases, driven by sensitivity for all but Friedreich ataxia. The neutrality of clinical studies in encephalitis decreased, but sensitivity remained constant at zero. At both timepoints, the likely false negative rate increased and the likely false positive rate decreased with increasing disease prevalence. The probability that the least neutral clinical study for most diseases would yield a false positive result was equal to one at all disease prevalence rates.
Conclusions: The potential for accurate clinical trial disease-severity measurement increases over time. Neutral theory showed that endpoint selection and DSSs may need improvement in Charcot-Marie-Tooth disease, Gaucher disease Type I, Huntington’s disease, Sjögren’s syndrome and Tourette syndrome. Using Neutral theory to benchmark disease-severity measurement in rare disease clinical trials may reduce the risk of misclassification, ensuring that recruitment and treatment effect assessment optimise medicine adoption and benefit patients.
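
The dependence of misclassification on prevalence reported in the abstract above can be illustrated with a minimal sketch. The abstract does not state the formulas used, so this assumes the misclassification probabilities are the complements of the positive and negative predictive values obtained from Bayes' rule. The function names and the example sensitivity and specificity values are hypothetical.

```python
import math


def misclassification_probabilities(sensitivity, specificity, prevalence):
    """Return (P(false positive | positive), P(false negative | negative)).

    Assumption (not stated in the abstract): these are 1 - PPV and 1 - NPV.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)

    positives = true_pos + false_pos
    negatives = false_neg + true_neg
    # The probabilities are undefined when a classification never occurs.
    p_false_pos = false_pos / positives if positives > 0 else math.nan
    p_false_neg = false_neg / negatives if negatives > 0 else math.nan
    return p_false_pos, p_false_neg


def misclassification_table(sensitivity, specificity,
                            prevalences=(0.2, 0.5, 0.8)):
    """Evaluate both probabilities at the prevalence rates used in the study."""
    return {p: misclassification_probabilities(sensitivity, specificity, p)
            for p in prevalences}


if __name__ == "__main__":
    # Hypothetical sensitivity and specificity, chosen only for illustration.
    for prevalence, (fp, fn) in misclassification_table(0.7, 0.9).items():
        print(f"prevalence={prevalence:.0%}  "
              f"P(false positive)={fp:.3f}  P(false negative)={fn:.3f}")
```

Under this assumption, a sensitivity of zero makes every positive classification a false positive regardless of prevalence, which is consistent with the probability of one reported for the least neutral studies.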


2021 ◽  
Vol 19 (9) ◽  
pp. 1072-1078
Author(s):  
Changyu Shen ◽  
Enrico G. Ferro ◽  
Huiping Xu ◽  
Daniel B. Kramer ◽  
Rushad Patell ◽  
...  

Background: Statistical testing in phase III clinical trials is subject to chance errors, which can lead to false conclusions with substantial clinical and economic consequences for patients and society. Methods: We collected summary data for the primary endpoints of overall survival (OS) and progression-related survival (PRS) (eg, time to other type of event) for industry-sponsored, randomized, phase III superiority oncology trials from 2008 through 2017. Using an empirical Bayes methodology, we estimated the number of false-positive and false-negative errors in these trials and the errors under alternative P value thresholds and/or sample sizes. Results: We analyzed 187 OS and 216 PRS endpoints from 362 trials. Among 56 OS endpoints that achieved statistical significance, the true efficacy of experimental therapies failed to reach the projected effect size in 33 cases (58.4% false-positives). Among 131 OS endpoints that did not achieve statistical significance, the true efficacy of experimental therapies reached the projected effect size in 1 case (0.9% false-negatives). For PRS endpoints, there were 34 (24.5%) false-positives and 3 (4.2%) false-negatives. Applying an alternative P value threshold and/or sample size could reduce false-positive errors and slightly increase false-negative errors. Conclusions: Current statistical approaches detect almost all truly effective oncologic therapies studied in phase III trials, but they generate many false-positives. Adjusting testing procedures in phase III trials is numerically favorable but practically infeasible. The root of the problem is the large number of ineffective therapies being studied in phase III trials. Innovative strategies are needed to efficiently identify which new therapies merit phase III testing.
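
The trade-off between the P value threshold, sample size and false-negative risk mentioned above can be sketched with Schoenfeld's approximation for the log-rank test under 1:1 allocation. This is not the authors' empirical Bayes method. The hazard ratio of 0.75, the two-sided thresholds and the function names are hypothetical choices made for illustration.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def required_events(hazard_ratio, alpha=0.05, power=0.8):
    """Number of events needed by Schoenfeld's approximation
    (two-sided alpha, 1:1 allocation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(4.0 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def power_for_events(events, hazard_ratio, alpha=0.05):
    """Approximate power of the log-rank test for a given number of events."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(sqrt(events) / 2.0 * abs(log(hazard_ratio)) - z_alpha)


if __name__ == "__main__":
    hazard_ratio = 0.75  # hypothetical projected effect size
    for alpha in (0.05, 0.005):
        events = required_events(hazard_ratio, alpha=alpha)
        print(f"alpha={alpha}: {events} events for 80% power")
    # Keeping the events fixed while tightening the threshold lowers power,
    # which means a higher false-negative risk.
    fixed = required_events(hazard_ratio, alpha=0.05)
    print(f"power at alpha=0.005 with {fixed} events: "
          f"{power_for_events(fixed, hazard_ratio, alpha=0.005):.3f}")
```

A stricter threshold reduces false positives but requires more events to keep the same power, which is one way to read the abstract's remark that adjusted procedures are numerically favorable yet hard to implement.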


Methodology ◽  
2019 ◽  
Vol 15 (3) ◽  
pp. 97-105
Author(s):  
Rodrigo Ferrer ◽  
Antonio Pardo

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform from the point of view of false negatives. For this purpose, we have simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and with different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods with the best performance from the point of view of the false positives. Our results have revealed unacceptable rates of false negatives even with effects of very large size, starting from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Therefore, our results suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. However, we made some considerations regarding the effect size and the cut-off points commonly used which allow us to be more precise in our estimates.
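
A simulation of this kind can be sketched as follows. The abstract does not name the five methods, so this minimal example assumes the Jacobson-Truax reliable change index (RCI), normally distributed scores, a known test reliability of 0.8 and a cut-off of 1.96. All of these, along with the function name, are illustrative assumptions and do not reproduce the authors' scenarios or rates.

```python
import random
from math import sqrt
from statistics import stdev


def simulate_false_negative_rate(effect_size, n=1000, reliability=0.8,
                                 seed=0, critical=1.96):
    """Share of truly changed individuals that the RCI fails to flag.

    Every simulated person changes by `effect_size` pre-test standard
    deviations, so each unflagged person counts as a false negative.
    """
    rng = random.Random(seed)
    true_sd = sqrt(reliability)         # SD of true scores
    error_sd = sqrt(1.0 - reliability)  # SD of measurement error
    pre, post = [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, true_sd)
        pre.append(true_score + rng.gauss(0.0, error_sd))
        post.append(true_score + effect_size + rng.gauss(0.0, error_sd))

    # Jacobson-Truax: SEM = SD_pre * sqrt(1 - r), S_diff = sqrt(2) * SEM.
    sem = stdev(pre) * sqrt(1.0 - reliability)
    s_diff = sqrt(2.0) * sem
    missed = sum(1 for a, b in zip(pre, post)
                 if abs((b - a) / s_diff) <= critical)
    return missed / n


if __name__ == "__main__":
    for effect in (0.2, 0.5, 0.8, 2.0):
        rate = simulate_false_negative_rate(effect)
        print(f"effect size {effect}: false-negative rate {rate:.1%}")
```

Non-normal score distributions, as studied in the paper, could be explored by replacing the `rng.gauss` draws with a skewed generator.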


1974 ◽  
Vol 31 (02) ◽  
pp. 273-278
Author(s):  
Kenneth K Wu ◽  
John C Hoak ◽  
Robert W Barnes ◽  
Stuart L Frankel

Summary: In order to evaluate its daily variability and reliability, impedance phlebography was performed daily or on alternate days on 61 patients with deep vein thrombosis, of whom 47 also had ¹²⁵I-fibrinogen uptake tests and 22 had radiographic venography. The results showed that impedance phlebography was highly variable and poorly reliable. False positive results were noted in 8 limbs (18%) and false negative results in 3 limbs (7%). Despite its being simple, rapid and noninvasive, its clinical usefulness is doubtful when performed according to the original method.

