Estimating completion rates from small samples using binomial confidence intervals: Comparisons and recommendations

Author(s):  
Jeff Sauro ◽  
James R. Lewis

The completion rate — the proportion of participants who successfully complete a task — is a common usability measurement. As is true for any point measurement, practitioners should compute appropriate confidence intervals for completion rate data. For proportions such as the completion rate, the appropriate interval is a binomial confidence interval. The most widely-taught method for calculating binomial confidence intervals (the “Wald Method,” discussed both in introductory statistics texts and in the human factors literature) grossly understates the width of the true interval when sample sizes are small. Alternative “exact” methods over-correct the problem by providing intervals that are too conservative. This can result in practitioners unintentionally accepting interfaces that are unusable or rejecting interfaces that are usable. We examined alternative methods for building confidence intervals from small sample completion rates, using Monte Carlo methods to sample data from a number of real, large-sample usability tests. It appears that the best method for practitioners to compute 95% confidence intervals for small-sample completion rates is to add two successes and two failures to the observed completion rate, then compute the confidence interval using the Wald method (the “Adjusted Wald Method”). This simple approach provides the best coverage, is fairly easy to compute, and agrees with other analyses in the statistics literature.
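The adjustment described above is simple enough to sketch directly. A minimal illustration in Python (the function name is ours; it uses the 95% critical value z = 1.96, for which z²/2 ≈ 2, giving the "add two successes and two failures" rule):

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Adjusted Wald (Agresti-Coull style) confidence interval for a
    completion rate: add z^2/2 successes and z^2/2 failures (~2 each
    for a 95% interval), then apply the standard Wald formula."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1], since the Wald formula can overshoot the bounds.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# e.g. 4 of 5 participants completed the task
lo, hi = adjusted_wald_ci(4, 5)
```

Note how wide the interval is for n = 5: roughly 0.36 to 0.98, a useful corrective to over-confidence in small-sample completion rates.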


PEDIATRICS ◽  
1989 ◽  
Vol 83 (3) ◽  
pp. A72-A72
Author(s):  
Student

The believer in the law of small numbers practices science as follows: 1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power. 2. He has undue confidence in early trends (e.g., the data of the first few subjects) and in the stability of observed patterns (e.g., the number and identity of significant results). He overestimates significance. 3. In evaluating replications, his or others', he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals. 4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal "explanation" for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.


SIMULATION ◽  
1991 ◽  
Vol 56 (2) ◽  
pp. 119-127 ◽  
Author(s):  
Krishnamurty Muralidhar ◽  
Gary Adna Ames ◽  
Rathindra Sarathy

Author(s):  
Daniel Fryer ◽  
Inga Strümke ◽  
Hien Nguyen

The coefficient of determination, R2, is often used to measure the variance explained by an affine combination of multiple explanatory covariates. An attribution of this explanatory contribution to each of the individual covariates is often sought in order to draw inference regarding the importance of each covariate with respect to the response phenomenon. A recent method for ascertaining such an attribution is the game-theoretic Shapley value decomposition of the coefficient of determination. Such a decomposition has the desirable efficiency, monotonicity, and equal treatment properties. Under the weak assumption that the joint distribution is pseudo-elliptical, we obtain the asymptotic normality of the Shapley values. We then utilize this result to construct confidence intervals and hypothesis tests for Shapley values. Monte Carlo studies regarding our results are provided. We found that our asymptotic confidence intervals require less computational time than competing bootstrap methods and exhibit improved coverage, especially in small samples. In an expository application to Australian real estate price modeling, we employ Shapley value confidence intervals to identify significant differences between the explanatory contributions of covariates across models that otherwise share approximately the same R2 value. These models are based on real estate data from the same periods in 2019 and 2020, the latter covering the early stages of the arrival of the novel coronavirus, COVID-19.
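The Shapley decomposition of R2 can be computed directly from its definition. A brute-force sketch in Python (function names are ours; this exact enumeration is exponential in the number of covariates and illustrates only the decomposition itself, not the paper's asymptotic intervals):

```python
import itertools
import math
import numpy as np

def r2(X, y, cols):
    """R^2 of an OLS fit of y on the given columns of X (with intercept)."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def shapley_r2(X, y):
    """Exact Shapley attribution of R^2 across the covariates of X.

    Each covariate j receives the weighted average of its marginal
    contribution r2(S + {j}) - r2(S) over all coalitions S not containing j.
    """
    p = X.shape[1]
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for r in range(p):
            w = math.factorial(r) * math.factorial(p - r - 1) / math.factorial(p)
            for S in itertools.combinations(others, r):
                phi[j] += w * (r2(X, y, list(S) + [j]) - r2(X, y, list(S)))
    return phi
```

The efficiency property mentioned in the abstract means the attributions sum exactly to the full-model R2, which makes a convenient sanity check.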


Author(s):  
The Tien Mai

In this paper we perform numerous numerical studies of the problem of low-rank matrix completion. We compare Bayesian approaches with a recently introduced de-biased estimator, which provides a useful way to build confidence intervals for the entries of interest. From a theoretical viewpoint, the de-biased estimator comes with a sharp minimax-optimal rate of estimation error, whereas the Bayesian approach reaches this rate only up to an additional logarithmic factor. Our simulation studies show the interesting result that, in practice, the de-biased estimator is merely comparable to the Bayesian estimators. Moreover, the Bayesian approaches are much more stable and can outperform the de-biased estimator in the case of small samples. However, we also find that the confidence intervals produced by the de-biased estimator for an entry are consistently shorter than the corresponding credible intervals. These findings suggest further theoretical study of the estimation error and concentration properties of Bayesian methods, which remain quite limited at present.


2018 ◽  
Author(s):  
Robert Calin-Jageman ◽  
Geoff Cumming

Now published in The American Statistician: https://amstat.tandfonline.com/doi/full/10.1080/00031305.2018.1518266

The "New Statistics" emphasizes effect sizes, confidence intervals, meta-analysis, and the use of Open Science practices. We present three specific ways in which a New Statistics approach can help improve scientific practice: by reducing over-confidence in small samples, by reducing confirmation bias, and by fostering more cautious judgements of consistency.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Michael Perakis

Purpose: The purpose of the paper is the construction of confidence intervals for the ratio of the values of the process capability index Cpm for two processes. These confidence intervals can be used for comparing the capability of any pair of competing processes.

Design/methodology/approach: Two methods for constructing confidence intervals for the ratio of the values of the process capability index Cpm for two processes are proposed. The suggested techniques are based on a two-step approximation of the doubly non-central F distribution. Their performance is tested via simulation.

Findings: The performance of the suggested techniques seems to be rather satisfactory even for small samples, as illustrated through the use of simulated data.

Practical implications: The suggested techniques can be implemented in real-world applications, since they can be used for comparing the capability of any pair of competing processes.

Originality/value: The paper presents two new methods for constructing confidence intervals for the ratio of the values of the process capability index Cpm for two processes.
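For reference, the quantity being compared can be computed directly from sample data. A minimal sketch in Python of the Taguchi capability index, Cpm = (USL - LSL) / (6 * sqrt(s^2 + (xbar - T)^2)), for a single process (the function name is ours; the paper's interval construction for the ratio of two such indices, based on the doubly non-central F approximation, is not reproduced here):

```python
import math

def cpm(x, lsl, usl, target):
    """Sample estimate of the Taguchi capability index Cpm.

    Penalizes both spread and off-target centering via the denominator
    6 * sqrt(s^2 + (xbar - target)^2).
    """
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)  # sample variance
    return (usl - lsl) / (6 * math.sqrt(s2 + (xbar - target) ** 2))
```

A ratio of two such estimates, cpm(x1, ...) / cpm(x2, ...), is the point estimate whose sampling uncertainty the paper's confidence intervals quantify.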

