Precision of sensitivity estimations in diagnostic test evaluations. Power functions for comparisons of sensitivities of two tests.

1985 ◽  
Vol 31 (4) ◽  
pp. 574-580 ◽  
Author(s):  
K Linnet

Abstract: The precision of estimates of the sensitivity of diagnostic tests is evaluated. "Sensitivity" is defined as the fraction of diseased subjects with test values exceeding the 0.975-fractile of the distribution of control values. An estimate of the sensitivity is subject to sample variation because of variation in both control observations and patient observations. If Gaussian distributions are assumed, the 0.95-confidence interval for a sensitivity estimate is up to ±0.15 for a sample of 100 controls and 100 patients. For the same sample size, minimum differences of 0.08 to 0.32 between the sensitivities of two tests are established as significant with a power of 0.90. For some published diagnostic test evaluations the median sample sizes for controls and patients were 63 and 33, respectively. I show that, to obtain a reasonable precision of sensitivity estimates and a reasonable power when two tests are being compared, the number of samples should in general be considerably larger.
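A minimal Monte Carlo sketch of the sampling variation described above, assuming Gaussian control and patient distributions; the patient shift of 2.0 SD, the replication count, and the seed are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_sensitivity(n_controls=100, n_patients=100, patient_shift=2.0, n_rep=5000):
    """Sampling variation of a sensitivity estimate when the cutoff is the
    estimated 0.975-fractile of the control distribution (illustrative)."""
    estimates = np.empty(n_rep)
    for i in range(n_rep):
        controls = rng.normal(0.0, 1.0, n_controls)             # healthy subjects
        patients = rng.normal(patient_shift, 1.0, n_patients)   # diseased subjects
        cutoff = np.quantile(controls, 0.975)                   # estimated 0.975-fractile
        estimates[i] = np.mean(patients > cutoff)               # observed sensitivity
    return estimates

est = simulated_sensitivity()
lo, hi = np.quantile(est, [0.025, 0.975])
print(f"mean sensitivity estimate: {est.mean():.3f}")
print(f"central 0.95 interval of the estimates: [{lo:.3f}, {hi:.3f}]")
```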

2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Louis M. Houston

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. In doing so, we provide a format compatible with a confidence interval centered about the mean that is naturally independent of the sample size. The equation is derived by interpolating theoretical results for extreme sample sizes. The intermediate value of the equation is confirmed with a computational test.
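As an illustration of the quantity involved, a small simulation (a sketch under assumed Gaussian data, not the paper's interpolated equation) can estimate the probability that a new measurement falls within n sample standard deviations of a sample mean and compare it with the large-sample Gaussian limit 2Φ(n) − 1:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def coverage(n_sd, sample_size, n_rep=20000):
    """Fraction of new measurements falling within n_sd sample standard
    deviations of the sample mean (illustrative simulation only)."""
    hits = 0
    for _ in range(n_rep):
        x = rng.normal(size=sample_size)          # sample used to estimate mean and SD
        new = rng.normal()                        # a new, independent measurement
        hits += abs(new - x.mean()) <= n_sd * x.std(ddof=1)
    return hits / n_rep

for m in (3, 10, 100):                            # small, moderate, large sample sizes
    print(f"sample size {m:3d}: coverage {coverage(2.0, m):.3f}")
print(f"large-sample Gaussian limit: {2 * norm.cdf(2.0) - 1:.3f}")
```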


Methodology ◽  
2014 ◽  
Vol 10 (1) ◽  
pp. 1-11 ◽  
Author(s):  
Bethany A. Bell ◽  
Grant B. Morgan ◽  
Jason A. Schoeneberger ◽  
Jeffrey D. Kromrey ◽  
John M. Ferron

Whereas general sample size guidelines have been suggested for estimating multilevel models, they are generalizable to only a relatively limited number of data conditions and model structures, which restricts their usefulness for applied researchers. In an effort to expand our understanding of two-level multilevel models under less-than-ideal conditions, Monte Carlo methods, implemented in SAS/IML, were used to examine model convergence rates, parameter point estimates (statistical bias), parameter interval estimates (confidence interval accuracy and precision), and both Type I error control and statistical power of tests associated with the fixed effects from linear two-level models estimated with PROC MIXED. These outcomes were analyzed as a function of: (a) level-1 sample size, (b) level-2 sample size, (c) intercept variance, (d) slope variance, (e) collinearity, and (f) model complexity. Bias was minimal across nearly all conditions simulated. The 95% confidence interval coverage and Type I error rate tended to be slightly conservative. Statistical power was related to sample sizes and the level of the fixed effects; higher power was observed with larger sample sizes and with level-1 fixed effects.
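The study's simulations were run in SAS/IML with models fit by PROC MIXED; purely as a stand-in, a single replicate of a two-level random-intercept model can be generated and fit in Python with statsmodels. The sample sizes and variance components below are illustrative assumptions, not the study's simulated conditions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# One replicate of a two-level random-intercept model:
#   y_ij = gamma0 + gamma1 * x_ij + u_j + e_ij
n_groups, n_per_group = 30, 10          # level-2 and level-1 sample sizes (assumed)
gamma0, gamma1 = 1.0, 0.5               # fixed effects (assumed)
tau, sigma = 1.0, 1.0                   # random-intercept SD and residual SD (assumed)

group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=n_groups * n_per_group)
u = rng.normal(0, tau, n_groups)[group]
y = gamma0 + gamma1 * x + u + rng.normal(0, sigma, len(x))

data = pd.DataFrame({"y": y, "x": x, "group": group})
fit = smf.mixedlm("y ~ x", data, groups="group").fit()
print(fit.summary())   # bias, coverage, and power would be tallied over many replicates
```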


Author(s):  
Rand Wilcox

There is an extensive literature dealing with inferences about the probability of success. A minor goal in this note is to point out that certain recommended methods can be unsatisfactory when the sample size is small. The main goal is to report results on the two-sample case. Extant results suggest using one of four methods. The results indicate that, when computing a 0.95 confidence interval, two of these methods can be more satisfactory when dealing with small sample sizes.
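For concreteness, one well-known interval for the two-sample case is the Agresti-Caffo adjusted Wald interval for the difference of two proportions; the sketch below is illustrative only and is not asserted to be one of the four methods compared in the note:

```python
import math

def agresti_caffo_ci(x1, n1, x2, n2, z=1.96):
    """0.95 CI for p1 - p2 using the Agresti-Caffo adjustment: add one
    success and one failure to each sample before the Wald interval."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    d = p1 - p2
    return d - z * se, d + z * se

# Hypothetical small-sample counts for illustration
print(agresti_caffo_ci(7, 10, 3, 12))
```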


1986 ◽  
Vol 32 (7) ◽  
pp. 1341-1346 ◽  
Author(s):  
K Linnet ◽  
E Brandt

Abstract: The specificity and sensitivity of a quantitative diagnostic test depend on the chosen cutoff point. The common practice of selecting a cutoff point that maximizes the specificity plus the sensitivity, as judged from the observed test results, is studied here by simulation. Test performance is, on average, assessed too optimistically by this procedure, a phenomenon of importance when sample sizes are small. For example, the average positive bias is up to 15% of the test performance for sample sizes of 25. Furthermore, binomially calculated standard errors of specificity and sensitivity estimates are incorrect. A Monte Carlo statistical method, the "bootstrap procedure," is applied to correct for bias and to estimate standard errors, including the standard error of the optimal cutoff point. Independent and paired comparisons of two diagnostic tests are also considered when optimal cutoff points have been selected. For this purpose, binomial statistical tests behave satisfactorily. Examples of power functions are presented.
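A minimal sketch of the optimistic-bias problem and a bootstrap assessment, with synthetic Gaussian data and an exhaustive search for the cutoff maximizing specificity plus sensitivity; the sample sizes and distribution parameters are assumptions, and the simple bootstrap bias correction shown is not necessarily the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

def optimal_cutoff(controls, patients):
    """Cutoff maximizing specificity + sensitivity over the observed values."""
    best_c, best_score = None, -np.inf
    for c in np.concatenate([controls, patients]):
        score = np.mean(controls <= c) + np.mean(patients > c)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

# Illustrative samples (n = 25 each, where the optimistic bias is pronounced)
controls = rng.normal(0.0, 1.0, 25)
patients = rng.normal(1.5, 1.0, 25)

cut = optimal_cutoff(controls, patients)
sens_obs = np.mean(patients > cut)

# Bootstrap: resample both groups, redo cutoff selection, collect sensitivities
boot = []
for _ in range(2000):
    c_b = rng.choice(controls, size=controls.size, replace=True)
    p_b = rng.choice(patients, size=patients.size, replace=True)
    boot.append(np.mean(p_b > optimal_cutoff(c_b, p_b)))
boot = np.array(boot)

print(f"apparent sensitivity at the optimal cutoff: {sens_obs:.3f}")
print(f"bootstrap standard error: {boot.std(ddof=1):.3f}")
print(f"bias-corrected estimate: {2 * sens_obs - boot.mean():.3f}")
```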


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1462
Author(s):  
José Antonio Roldán-Nofuentes ◽  
Saad Bouh Regad

A binary diagnostic test is a medical test that is applied to an individual in order to determine the presence or the absence of a certain disease and whose result can be positive or negative. A positive result indicates the presence of the disease, and a negative result indicates its absence. Positive and negative predictive values represent the accuracy of a binary diagnostic test when it is applied to a cohort of individuals, and they are measures of the clinical accuracy of the binary diagnostic test. In this manuscript, we use confidence intervals to study the comparison of the positive (negative) predictive values of two binary diagnostic tests subject to a paired design. We have studied confidence intervals for the difference and for the ratio of the two positive (negative) predictive values. Simulation experiments have been carried out to study the asymptotic behavior of the confidence intervals, giving some general rules for application. We also study a method to calculate the sample size needed to compare the parameters using confidence intervals. We have written a program in R to solve the problems studied in this manuscript. The results have been applied to the diagnosis of colorectal cancer.
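The authors provide an R program and derive asymptotic intervals; purely as an illustration of the quantity being compared, the following sketch estimates the difference of paired positive predictive values with a subject-level bootstrap (the data-generating settings are assumptions, not the colorectal-cancer data, and the bootstrap is a stand-in for the paper's asymptotic intervals):

```python
import numpy as np

rng = np.random.default_rng(4)

def ppv_difference_ci(disease, test1, test2, n_boot=5000, alpha=0.05):
    """Bootstrap CI for PPV(test1) - PPV(test2) under a paired design,
    i.e., both tests applied to the same cohort (illustrative sketch)."""
    disease, test1, test2 = map(np.asarray, (disease, test1, test2))
    n = disease.size

    def ppv(d, t):
        pos = t == 1
        return d[pos].mean() if pos.any() else np.nan

    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample subjects, preserving the pairing
        diffs.append(ppv(disease[idx], test1[idx]) - ppv(disease[idx], test2[idx]))
    lo, hi = np.nanquantile(diffs, [alpha / 2, 1 - alpha / 2])
    return ppv(disease, test1) - ppv(disease, test2), (lo, hi)

# Tiny synthetic paired cohort for illustration
d  = rng.binomial(1, 0.3, 500)
t1 = np.where(d == 1, rng.binomial(1, 0.9, 500), rng.binomial(1, 0.1, 500))
t2 = np.where(d == 1, rng.binomial(1, 0.8, 500), rng.binomial(1, 0.2, 500))
print(ppv_difference_ci(d, t1, t2))
```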


2021 ◽  
Author(s):  
Xuan Deng ◽  
Silvia Tanumiharjo ◽  
Qianyin Chen ◽  
Shengnan Li ◽  
Huimin Lin ◽  
...  

Aims: To investigate the evaluation indices (diagnostic test accuracy and agreement) of 15 combinations of ultrawide-field scanning laser ophthalmoscopy (UWF SLO) images in myopic retinal changes (MRC) screening, and to determine the combination of imaging that yields the highest evaluation indices in screening for MRC. Methods: This is a retrospective study of UWF SLO images obtained from myopes and analyzed independently by two retinal specialists. Five-field UWF SLO images that included the posterior (B), superior (S), inferior (I), nasal (N) and temporal (T) regions were obtained for analysis, and their results were used as a reference standard. The evaluation indices of different combinations comprising one to four fields of the retina were compared to determine the ability of each combination to screen for MRC. Results: UWF SLO images obtained from 823 myopic patients (1646 eyes) were included in the study. Sensitivities ranged from 50.0% to 98.9% (95% confidence interval (CI), 43.8-99.7%); the combinations of B+S+I (97.3%; 95% CI, 94.4-98.8%), B+T+S+I (98.5%; 95% CI, 95.9-99.5%), and B+S+N+I (98.9%; 95% CI, 96.4-99.7%) ranked highest. Furthermore, the combinations of B+S+I, B+T+S+I and B+S+N+I also showed the highest accuracy (97.7%; 95% CI, 95.1-100.0%; 98.6%; 95% CI, 96.7-100.0%; and 98.8%; 95% CI, 96.9-100.0%, respectively) and agreement (kappa = 0.968, 0.980 and 0.980). For the various combinations, specificities were all higher than 99.5% (95% CI, 99.3-100.0%). Conclusion: In our study, the screening combinations of B+S+I, B+T+S+I and B+S+N+I stood out with the best-performing evaluation indices. However, when time is limited, B+S+I may be more applicable for primary screening of MRC.
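A minimal sketch of how such evaluation indices can be computed for one field combination against the 5-field reference standard; the Wilson interval and the toy labels are assumptions for illustration, since the abstract does not state the interval method used:

```python
import numpy as np

def wilson_ci(k, n, z=1.96):
    """Wilson score 0.95 CI for a proportion (one common choice)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def screening_indices(reference, screened):
    """Sensitivity, specificity and Cohen's kappa of a screening combination
    against the reference standard (illustrative computation)."""
    reference, screened = np.asarray(reference), np.asarray(screened)
    tp = np.sum((reference == 1) & (screened == 1))
    tn = np.sum((reference == 0) & (screened == 0))
    fp = np.sum((reference == 0) & (screened == 1))
    fn = np.sum((reference == 1) & (screened == 0))
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    n = tp + tn + fp + fn
    po = (tp + tn) / n                                            # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2   # chance agreement
    kappa = (po - pe) / (1 - pe)
    return sens, wilson_ci(tp, tp + fn), spec, wilson_ci(tn, tn + fp), kappa

# Toy labels: 1 = MRC present, 0 = absent
ref  = [1, 1, 0, 0, 1, 0, 0, 1]
comb = [1, 0, 0, 0, 1, 0, 1, 1]
print(screening_indices(ref, comb))
```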


2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
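As a rough illustration of the kind of experiment described (not the study's GEOBIA features, data, or tuning), the effect of training set size on random forest overall accuracy can be sketched with scikit-learn on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for image-object features; the study used GEOBIA-derived
# spectral, geometric, and textural variables.
X, y = make_classification(n_samples=12000, n_features=20, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=2000, random_state=0)

for n_train in (10000, 315, 40):          # sample sizes highlighted in the abstract
    idx = np.random.default_rng(0).choice(len(X_pool), n_train, replace=False)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_pool[idx], y_pool[idx])
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"n_train={n_train:5d}  overall accuracy={acc:.3f}")
```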

