On Small-Sample Confidence Intervals for Parameters in Discrete Distributions

Biometrics ◽  
2001 ◽  
Vol 57 (3) ◽  
pp. 963-971 ◽  
Author(s):  
Alan Agresti ◽  
Yongyi Min


PEDIATRICS ◽  
1989 ◽  
Vol 83 (3) ◽  
pp. A72-A72 ◽  
Author(s):  
Student

The believer in the law of small numbers practices science as follows:

1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power.
2. He has undue confidence in early trends (e.g., the data of the first few subjects) and in the stability of observed patterns (e.g., the number and identity of significant results). He overestimates significance.
3. In evaluating replications, his or others', he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals.
4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal "explanation" for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action.

His belief in the law of small numbers, therefore, will forever remain intact.


2014 ◽  
Vol 26 (2) ◽  
pp. 598-614 ◽  
Author(s):  
Julia Poirier ◽  
GY Zou ◽  
John Koval

Cluster randomization trials, in which intact social units are randomized to different interventions, have become popular in the last 25 years. Outcomes from these trials are in many cases positively skewed, following approximately lognormal distributions. When inference focuses on the difference between treatment arm arithmetic means, existing confidence interval procedures either make restrictive assumptions or are complex to implement. We approach this problem by assuming that log-transformed outcomes from each treatment arm follow a one-way random effects model. The treatment arm means are functions of multiple parameters for which separate confidence intervals are readily available, suggesting that the method of variance estimates recovery may be applied to obtain closed-form confidence intervals. A simulation study showed that this simple approach performs well at small sample sizes in terms of empirical coverage, relatively balanced tail errors, and interval widths, compared with existing methods. The methods are illustrated using data from a cluster randomization trial investigating a critical pathway for the treatment of community-acquired pneumonia.
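The core of the method of variance estimates recovery can be sketched in a few lines: given point estimates and separate confidence intervals for two parameters, it recovers variance estimates from the interval half-widths to build a closed-form interval for the difference. The sketch below uses hypothetical inputs and Zou's general MOVER formula for a difference; the paper's actual application combines intervals for the components of the lognormal treatment arm means.

```python
import math

def mover_diff_ci(est1, l1, u1, est2, l2, u2):
    """MOVER confidence interval for theta1 - theta2.

    (l1, u1) and (l2, u2) are separate confidence intervals for the
    two parameters; the interval half-widths stand in for recovered
    variance estimates near each limit.
    """
    diff = est1 - est2
    lower = diff - math.sqrt((est1 - l1) ** 2 + (u2 - est2) ** 2)
    upper = diff + math.sqrt((u1 - est1) ** 2 + (l2 - est2) ** 2)
    return lower, upper
```

For symmetric Wald-type component intervals this reduces to the familiar root-sum-of-squared-margins interval; the value of the construction is that it also accepts asymmetric component intervals, as arise for variance components.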


Entropy ◽  
2018 ◽  
Vol 20 (8) ◽  
pp. 601 ◽  
Author(s):  
Paul Darscheid ◽  
Anneli Guthke ◽  
Uwe Ehret

When constructing discrete (binned) distributions from samples of a data set, applications exist where it is desirable to ensure that all bins of the sample distribution have nonzero probability. For example, the sample distribution may be part of a predictive model that must return a response over the entire codomain, or the Kullback–Leibler divergence may be used to measure the (dis-)agreement between the sample distribution and the original distribution of the variable, and this divergence is inconveniently infinite whenever a bin of the sample distribution is empty. Several sample-based distribution estimators exist which ensure nonzero bin probability, such as adding one counter to each zero-probability bin of the sample histogram, adding a small probability to the sample pdf, smoothing methods such as kernel-density smoothing, or Bayesian approaches based on the Dirichlet and multinomial distributions. Here, we suggest and test an approach based on the Clopper–Pearson method, which makes use of the binomial distribution. Based on the sample distribution, confidence intervals for the bin-occupation probability are calculated. The mean of each confidence interval is a strictly positive estimator of the true bin-occupation probability and is convergent with increasing sample size. For small samples, it converges towards a uniform distribution, i.e., the method effectively applies a maximum entropy approach. We apply this nonzero method and four alternative sample-based distribution estimators to a range of typical distributions (uniform, Dirac, normal, multimodal, and irregular) and measure the effect with Kullback–Leibler divergence. While the performance of each method strongly depends on the distribution type it is applied to, on average, and especially for small sample sizes, the nonzero method, the simple "add one counter" method, and the Bayesian Dirichlet-multinomial model show very similar behavior and perform best. We conclude that, when estimating distributions without an a priori idea of their shape, applying one of these methods is favorable.
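A minimal sketch of this nonzero approach, under two assumptions of ours made for illustration: the "mean of each confidence interval" is taken as the interval midpoint, and the per-bin estimates are renormalized to sum to one (the paper's exact recipe may differ). The Clopper–Pearson limits are found here by bisection on the binomial CDF, so nothing beyond the standard library is needed.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion,
    computed by bisection rather than beta quantiles."""
    def solve(too_small):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisect to ~1e-18 interval width
            mid = (lo + hi) / 2
            if too_small(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower limit: P(X >= k) = alpha/2; upper limit: P(X <= k) = alpha/2
    lower = 0.0 if k == 0 else solve(
        lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2)
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p) >= alpha / 2)
    return lower, upper

def nonzero_bin_estimates(counts, alpha=0.05):
    """Strictly positive bin-probability estimates: midpoint of each
    bin's Clopper-Pearson interval, renormalized to sum to one."""
    n = sum(counts)
    mids = [sum(clopper_pearson(k, n, alpha)) / 2 for k in counts]
    total = sum(mids)
    return [m / total for m in mids]
```

Note the key property from the abstract: a bin with zero observed counts still has a strictly positive upper limit, so its midpoint estimate is strictly positive, and for very small samples the midpoints are nearly equal across bins, i.e., close to uniform.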


Author(s):  
Yalin Jiao ◽  
Yongmin Zhong ◽  
Shesheng Gao ◽  
Bijan Shirinzadeh

This paper presents a new random weighting method for estimating one-sided confidence intervals in discrete distributions. It establishes random weighting estimations for the Wald and score intervals. A coverage-probability theorem for the random weighting estimation of the Wald interval is then rigorously proved using an Edgeworth expansion. Experimental results demonstrate that the proposed random weighting method can effectively estimate one-sided confidence intervals, with estimation accuracy much higher than that of the bootstrap method.
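For context, a sketch of the two one-sided interval forms the paper targets, together with a generic random-weighting resampler. The resampler implements the standard random-weighting idea (Dirichlet(1,…,1) weights obtained as normalized exponentials, replacing with-replacement resampling), not the paper's specific estimator.

```python
import math
import random
from statistics import NormalDist

def wald_lower(k, n, conf=0.95):
    """One-sided Wald lower confidence bound for a binomial proportion."""
    z = NormalDist().inv_cdf(conf)
    phat = k / n
    return phat - z * math.sqrt(phat * (1 - phat) / n)

def score_lower(k, n, conf=0.95):
    """One-sided Wilson score lower confidence bound."""
    z = NormalDist().inv_cdf(conf)
    phat = k / n
    center = phat + z * z / (2 * n)
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return (center - half) / (1 + z * z / n)

def random_weighting_dist(data, n_rep=2000, seed=0):
    """Random-weighting analogue of the bootstrap: each replicate
    reweights the sample with Dirichlet(1,...,1) weights (normalized
    i.i.d. exponentials) instead of resampling with replacement."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_rep):
        w = [rng.expovariate(1.0) for _ in range(len(data))]
        s = sum(w)
        reps.append(sum(wi * xi for wi, xi in zip(w, data)) / s)
    return reps
```

The random-weighting replicates are smoother than bootstrap replicates (every observation gets a positive weight in every replicate), which is one motivation for the accuracy comparison in the abstract.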


1992 ◽  
Vol 29 (04) ◽  
pp. 904-920 ◽  
Author(s):  
William P. McCormick ◽  
You Sung Park

It is well known that most commonly used discrete distributions fail to belong to the domain of maximal attraction for any extreme value distribution. Despite this negative finding, C. W. Anderson showed that for a class of discrete distributions including the negative binomial class, it is possible to asymptotically bound the distribution of the maximum. In this paper we extend Anderson's result to discrete-valued processes satisfying the usual mixing conditions for extreme value results for dependent stationary processes. We apply our result to obtain bounds for the distribution of the maximum based on negative binomial autoregressive processes introduced by E. McKenzie and Al-Osh and Alzaid. A simulation study illustrates the bounds for small sample sizes.
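A small simulation in the spirit of the paper's illustration, with hypothetical parameters: the empirical distribution of the maximum of n i.i.d. negative binomial draws concentrates on a handful of integer values, the discreteness that prevents convergence to an extreme value law and motivates bounding the maximum's distribution instead.

```python
import random
from collections import Counter

def neg_binomial(r, p, rng):
    """Number of failures before the r-th success (NB(r, p)),
    simulated directly from Bernoulli(p) trials."""
    failures = 0
    successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def max_distribution(r, p, n, n_rep=5000, seed=2):
    """Empirical distribution of the maximum of n i.i.d. NB(r, p)
    draws, over n_rep replications."""
    rng = random.Random(seed)
    maxima = [max(neg_binomial(r, p, rng) for _ in range(n))
              for _ in range(n_rep)]
    return Counter(maxima)
```

Tabulating `max_distribution(2, 0.5, 20)` shows most of the probability mass on just a few adjacent integers, which is the phenomenon Anderson's two-sided asymptotic bounds capture.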


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Don van Ravenzwaaij ◽  
John P. A. Ioannidis

Background: Until recently, a typical rule used for the endorsement of new medications by the Food and Drug Administration has been the existence of at least two statistically significant clinical trials favoring the new medication. This rule has consequences for the true positive rate (endorsement of an effective treatment) and the false positive rate (endorsement of an ineffective treatment).

Methods: In this paper, we compare true positive and false positive rates for different evaluation criteria through simulations that rely on (1) conventional p-values; (2) confidence intervals based on meta-analyses assuming fixed or random effects; and (3) Bayes factors. We varied the threshold levels for statistical evidence, the thresholds for what constitutes a clinically meaningful treatment effect, and the number of trials conducted.

Results: Our results show that Bayes factors, meta-analytic confidence intervals, and p-values often have similar performance. Bayes factors may perform better when the number of trials conducted is high, when trials have small sample sizes and clinically meaningful effects are not small, particularly in fields where the number of non-zero effects is relatively large.

Conclusions: Thinking about realistic effect sizes in conjunction with desirable levels of statistical evidence, as well as quantifying statistical evidence with Bayes factors, may help improve decision-making in some circumstances.
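A deliberately simplified analytic sketch of the two-trial rule (two-arm z-test with known unit variance and independent trials; the paper's simulations are considerably richer): the probability that a single trial is significant in the favorable direction, and the probability that at least two of n trials are.

```python
import math
from statistics import NormalDist

def trial_sig_prob(effect, n_per_arm, alpha=0.05):
    """Probability that a single two-arm trial (z-test, outcome SD = 1,
    two-sided level alpha) is significant in favor of the treatment."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n_per_arm)  # SE of the mean difference
    return 1 - NormalDist().cdf(z_crit - effect / se)

def two_trial_rule_prob(p_sig, n_trials):
    """P(at least 2 of n_trials are significant), trials independent."""
    p_none = (1 - p_sig) ** n_trials
    p_one = n_trials * p_sig * (1 - p_sig) ** (n_trials - 1)
    return 1 - p_none - p_one
```

Under the null with exactly two trials, the rule's false positive rate is (0.05/2)^2 ≈ 0.000625; running more trials until two succeed raises both the true positive rate and, more sharply in relative terms, the false positive rate, which is one driver of the comparisons in the paper.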


2016 ◽  
Vol 27 (5) ◽  
pp. 1559-1574 ◽  
Author(s):  
Andrew Carkeet ◽  
Yee Teng Goh

Bland and Altman described approximate methods in 1986 and 1999 for calculating confidence limits for their 95% limits of agreement, approximations which assume large subject numbers. In this paper, these approximations are compared with exact confidence intervals calculated using two-sided tolerance intervals for a normal distribution. The approximations are compared in terms of the tolerance factors themselves, but also in terms of the exact confidence limits and the exact limits-of-agreement coverage corresponding to the approximate confidence interval methods. Using similar methods, the 50th percentile of the tolerance interval is compared with the k values of 1.96 and 2 that Bland and Altman used to define the limits of agreement (i.e., mean difference ± 1.96 Sd and mean difference ± 2 Sd). For the outer confidence intervals of the limits of agreement, Bland and Altman's approximations are too permissive for sample sizes < 40 (1999 approximation) and < 76 (1986 approximation). For the inner confidence limits the approximations are poorer, being permissive for sample sizes < 490 (1986 approximation) and for all practical sample sizes (1999 approximation). Exact confidence intervals for 95% limits of agreement, based on two-sided tolerance factors, can be calculated easily from tables and should be used in preference to the approximate methods, especially for small sample sizes.
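A sketch of the limits of agreement with approximate 1999-style confidence limits of the kind the paper evaluates, assuming the variance approximation Var(limit) ≈ s²(1/n + z²/(2(n−1))). The exact tolerance-factor intervals the authors recommend instead require separate tables or noncentral-t computations and are not reproduced here.

```python
import math
from statistics import mean, stdev

def limits_of_agreement(diffs, z=1.96):
    """Bland-Altman 95% limits of agreement plus approximate
    large-sample confidence intervals for each limit."""
    n = len(diffs)
    d_bar, s = mean(diffs), stdev(diffs)
    loa = (d_bar - z * s, d_bar + z * s)
    # approximate standard error of a limit of agreement
    se_loa = math.sqrt(s * s * (1 / n + z * z / (2 * (n - 1))))
    ci_lower_limit = (loa[0] - z * se_loa, loa[0] + z * se_loa)
    ci_upper_limit = (loa[1] - z * se_loa, loa[1] + z * se_loa)
    return loa, ci_lower_limit, ci_upper_limit
```

The paper's point is that intervals built this way are too permissive at small n (the normal quantile z understates the uncertainty that an exact two-sided tolerance factor would capture), so this sketch is best read as the method being critiqued rather than the recommended one.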


1997 ◽  
Vol 54 (3) ◽  
pp. 616-630 ◽  
Author(s):  
S J Smith

Trawl surveys using stratified random designs are widely used on the east coast of North America to monitor groundfish populations. Statistical quantities estimated from these surveys are derived on a randomization basis and do not require that a probability model be postulated for the data. However, the large-sample properties of these estimates may not be appropriate for the small sample sizes and skewed data characteristic of bottom trawl surveys. In this paper, three bootstrap resampling strategies that incorporate complex sampling designs are used to explore the properties of estimates for small sample situations. A new form of the bias-corrected and accelerated confidence intervals is introduced for stratified random surveys. Simulation results indicate that the bias-corrected and accelerated confidence limits may overcorrect for the trawl survey data and that percentile limits were closer to the expected values. Nonparametric density estimates were used to investigate the effects of unusually large catches of fish on the bootstrap estimates and confidence intervals. Bootstrap variance estimates decreased as increasingly smoother distributions were assumed for the observations in the stratum with the large catch. Lower confidence limits generally increased with increasing smoothness, but the upper bound depended upon assumptions about the shape of the distribution.
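The resampling backbone common to the strategies compared above can be sketched as a within-stratum percentile bootstrap for a weighted stratified mean (stratum weights here are hypothetical; the paper's bias-corrected and accelerated variant adds bias and acceleration corrections on top of the same resampling loop).

```python
import random
from statistics import mean

def stratified_bootstrap_ci(strata, weights, n_boot=2000,
                            alpha=0.05, seed=1):
    """Percentile bootstrap CI for a weighted stratified mean:
    resample with replacement within each stratum so the survey
    design is held fixed across replicates."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        est = sum(w * mean(rng.choices(s, k=len(s)))
                  for w, s in zip(weights, strata))
        reps.append(est)
    reps.sort()
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Resampling within strata, rather than pooling, is what makes the replicates honor the stratified design; the percentile limits read off here are the ones the simulation study found closest to expected values for these data.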

