Normalization of the Kolmogorov–Smirnov and Shapiro–Wilk tests of normality

2015
Vol 52 (2)
pp. 85-93
Author(s):
Zofia Hanusz
Joanna Tarasińska

Abstract Two very well-known tests for normality, the Kolmogorov–Smirnov and the Shapiro–Wilk tests, are considered. Both of them may be normalized using Johnson's (1949) SB distribution. In this paper, functions for the normalizing constants, dependent on the sample size, are given. These functions eliminate the need for non-standard statistical tables of normalizing constants and make it easy to obtain p-values for testing normality.
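The paper's fitted normalizing-constant functions are not reproduced in the abstract, so the sketch below only illustrates the two tests as commonly run in Python with SciPy; SciPy's shapiro already transforms W to an approximately normal deviate (Royston's approximation), the same normalize-the-statistic idea the paper pursues via Johnson's SB distribution. The sample is made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # made-up sample

# Shapiro-Wilk: the statistic W is internally mapped to an approximately
# normal deviate to produce the p-value.
w_stat, w_p = stats.shapiro(x)

# Kolmogorov-Smirnov against a normal with plugged-in mean and SD. Note:
# estimating the parameters from the same data biases this p-value upward
# (see the Steinskog et al. abstract below).
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

print(f"Shapiro-Wilk:       W = {w_stat:.4f}, p = {w_p:.4f}")
print(f"Kolmogorov-Smirnov: D = {ks_stat:.4f}, p = {ks_p:.4f}")
```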

Mathematics
2021
Vol 9 (6)
pp. 603
Author(s):
Leonid Hanin

I uncover previously underappreciated systematic sources of false and irreproducible results in the natural, biomedical, and social sciences that are rooted in statistical methodology. These include inevitably occurring deviations from the basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and are therefore generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when the random sample size and the observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance, essentially making the pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of the one-sided Z test. This article could serve as cautionary guidance for scientists and practitioners employing statistical methods in their work.
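A hedged toy simulation of point (c), not taken from the article: when a data-driven stopping rule makes the sample size depend on the observations, the sample mean is biased even though every individual observation has mean zero. The stopping threshold below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, cap = 20_000, 100
means = []
for _ in range(n_trials):
    total, n = 0.0, 0
    while n < cap:
        total += rng.normal()   # i.i.d. observations with true mean 0
        n += 1
        # Data-driven stopping rule: quit as soon as the running mean drifts
        # above half a standard error (a caricature of optional stopping);
        # this makes the final sample size random and data-dependent.
        if n >= 10 and total / n > 0.5 / np.sqrt(n):
            break
    means.append(total / n)

print(f"average of the stopped sample means: {np.mean(means):+.4f}  (true mean: 0)")
```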


2018
Vol 7 (3)
pp. 63-69
Author(s):
Suzanne L. Havstad
George W. Divine

ABSTRACT In this first of a two-part series on introductory biostatistics, we briefly describe common designs. The advantages and disadvantages of six design types are highlighted. The randomized clinical trial is the gold standard to which other designs are compared. We present the benefits of randomization and discuss the importance of power and sample size. Sample size and power calculations for any design need to be based on meaningful effects of interest. We give examples of how the effect of interest and the sample size interrelate. We also define concepts helpful to the statistical inference process. When drawing conclusions from a completed study, P values, point estimates, and confidence intervals will all assist the researcher. Finally, the issue of multiple comparisons is briefly explored. The second paper in this series will describe basic analytical techniques and discuss some common mistakes in the interpretation of data.
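As a hedged sketch of how the effect of interest and the sample size interrelate, the snippet below applies the standard normal-approximation formula for a two-arm comparison of means; the effect sizes, 5% alpha, and 80% power are illustrative assumptions, not values from the article.

```python
from scipy.stats import norm

def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal-approximation n per arm for a two-sided two-sample comparison."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

for delta in (0.2, 0.5, 0.8):
    print(f"standardized effect = {delta:.1f} -> n per arm ≈ {n_per_arm(delta):.0f}")
# Halving the effect of interest roughly quadruples the required sample size.
```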


2007
Vol 135 (3)
pp. 1151-1157
Author(s):
Dag J. Steinskog
Dag B. Tjøstheim
Nils G. Kvamstø

Abstract The Kolmogorov–Smirnov goodness-of-fit test is used in many applications for testing normality in climate research. This note shows that the test usually leads to systematic and drastic errors. When the mean and the standard deviation are estimated, it is much too conservative in the sense that its p values are strongly biased upward. One may think that this is a small sample problem, but it is not. There is a correction of the Kolmogorov–Smirnov test by Lilliefors, which is in fact sometimes confused with the original Kolmogorov–Smirnov test. Both the Jarque–Bera and the Shapiro–Wilk tests for normality are good alternatives to the Kolmogorov–Smirnov test. A power comparison of eight different tests has been undertaken, favoring the Jarque–Bera and the Shapiro–Wilk tests. The Jarque–Bera and the Kolmogorov–Smirnov tests are also applied to a monthly mean dataset of geopotential height at 500 hPa. The two tests give very different results and illustrate the danger of using the Kolmogorov–Smirnov test.
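The note's central point is easy to reproduce in a hedged Monte Carlo sketch (the sample size, replication count, and seed below are arbitrary): under a true normal null, the ordinary KS test with estimated mean and SD rejects far less often than the nominal 5%, while the Lilliefors correction and Shapiro–Wilk behave approximately as advertised.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
n, reps, alpha = 50, 2000, 0.05
rej = {"KS (estimated params)": 0, "Lilliefors": 0, "Shapiro-Wilk": 0}

for _ in range(reps):
    x = rng.normal(size=n)  # H0 (normality) is true
    p_ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue
    p_lf = lilliefors(x, dist="norm")[1]
    p_sw = stats.shapiro(x).pvalue
    rej["KS (estimated params)"] += p_ks < alpha
    rej["Lilliefors"] += p_lf < alpha
    rej["Shapiro-Wilk"] += p_sw < alpha

for name, count in rej.items():
    print(f"{name}: rejection rate {count / reps:.3f} (nominal {alpha})")
# The naive KS rate falls far below 0.05: the test is much too conservative.
```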


2015
Vol 36 (6Supl2)
pp. 4151
Author(s):
Giovani Facco
Alberto Cargnelutti Filho
Alessandro Dal’Col Lúcio
Gustavo Oliveira dos Santos
Réges Bellé Stefanello
...

The objectives of this study were to determine the sample size (i.e., number of plants) required to accurately estimate the average of morphological traits of pigeonpea (Cajanus cajan L.) and to check for variability in sample size between evaluation periods and seasons. Two uniformity trials (i.e., experiments without treatment) were conducted over two growing seasons. In the first season (2011/2012), the seeds were sown by broadcast seeding, and in the second season (2012/2013), the seeds were sown in rows spaced 0.50 m apart. The ground area in each experiment was 1,848 m², and 360 plants were marked in the central area, in a 2 m × 2 m grid. Three morphological traits (number of nodes, plant height, and stem diameter) were evaluated 13 times during the first season and 22 times in the second season. Normality of the measurements of all three traits was confirmed with the Kolmogorov–Smirnov test, randomness was confirmed with the run test, and descriptive statistics were calculated. For each trait, the sample size (n) was calculated for semiamplitudes of the confidence interval (i.e., estimation errors) equal to 2, 4, 6, ..., 20% of the estimated mean, with a confidence coefficient (1 − α) of 95%. Subsequently, n was fixed at 360 plants, and the estimation error, as a percentage of the average, was calculated for each trait. The sample size varied between the morphological traits evaluated, among the evaluation periods, and between seasons. Therefore, to estimate the average of these traits (number of nodes, plant height, and stem diameter) with an accuracy of 6% of the mean across the different evaluation periods and seasons, at least 136 plants must be evaluated throughout the pigeonpea crop cycle.
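A minimal sketch of the sample-size formula implied by the abstract: choose n so that the CI semiamplitude equals D% of the mean, n = (t · CV / D)², iterated because t depends on n. The coefficient of variation below is an assumed illustrative value, not one reported by the study.

```python
import numpy as np
from scipy.stats import t as t_dist

def n_for_precision(cv_percent: float, d_percent: float, conf: float = 0.95) -> int:
    """Sample size so the CI semiamplitude equals d_percent of the mean."""
    n = 30  # starting guess; t depends on n, so iterate toward a fixed point
    for _ in range(100):
        t_val = t_dist.ppf(1 - (1 - conf) / 2, df=n - 1)
        n_new = max(int(np.ceil((t_val * cv_percent / d_percent) ** 2)), 2)
        if n_new == n:
            break
        n = n_new
    return n

for d in (2, 6, 10, 20):
    print(f"error = {d:>2}% of mean -> n ≈ {n_for_precision(cv_percent=35, d_percent=d)}")
# Tightening the allowed error from 20% to 2% of the mean inflates n roughly 100-fold.
```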


2021
pp. bmjebm-2020-111603
Author(s):  
John Ferguson

Commonly accepted statistical advice dictates that large, highly powered clinical trials generate more reliable evidence than trials with smaller sample sizes. This advice is generally sound: treatment effect estimates from larger trials tend to be more accurate, as witnessed by tighter confidence intervals and reduced publication bias. Consider, then, two clinical trials testing the same treatment that yield the same p values, the trials being identical apart from their sample sizes. Assuming statistical significance, one might at first suspect that the larger trial offers stronger evidence that the treatment in question is truly effective. Yet often precisely the opposite is true. Here, we illustrate and explain this somewhat counterintuitive result and suggest some ramifications regarding the interpretation and analysis of clinical trial results.
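A hedged numerical sketch of the counterintuitive point, not drawn from the article itself: fixing the test statistic (and hence the p value) of a one-sample Z test while growing n shrinks the estimated effect toward zero, so the larger "significant" trial is compatible only with a tiny effect.

```python
import numpy as np
from scipy.stats import norm

z = 1.96   # identical test statistic in every trial, hence identical p value
p = 1 - norm.cdf(z)
for n in (100, 1_000, 10_000):
    effect = z / np.sqrt(n)   # standardized effect implied by z in a one-sample Z test
    print(f"n = {n:>6}: p = {p:.4f}, implied effect ≈ {effect:.4f} SD")
# The same p value in the larger trial corresponds to a much smaller estimated
# effect, hence weaker evidence of a clinically meaningful benefit.
```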


2021
Author(s):
Andrea Madella
Christoph Glotzbach
Todd A. Ehlers

Abstract. Detrital tracer thermochronology exploits the relationship between bedrock thermochronometric age-elevation profiles and the distribution of detrital grain ages collected from river, glacial, or other sediment to study spatial changes in the distribution of catchment erosion. If ages increase linearly with elevation, spatially uniform erosion is expected to yield a detrital age distribution that mirrors the catchment's hypsometric curve. Alternatively, a mismatch between the detrital and hypsometric distributions may indicate non-uniform erosion within a catchment. For studies seeking to identify the pattern of erosion, measured grain-age populations rarely exceed 100 grains, largely because of the time and cost of the individual measurements. With sample sizes of this order, discerning between two detrital age distributions produced by different catchment erosion scenarios can be difficult at a high level of statistical confidence. However, there is no established method to quantify the sample-size-dependent uncertainty inherent to detrital tracer thermochronology, and practitioners are often left wondering "how many grains are enough?". Here, we investigate how sample size affects the uncertainty of detrital age distributions and how this uncertainty affects the ability to uniquely infer the erosional pattern of the upstream area. We do this using the Kolmogorov–Smirnov statistic as a metric of dissimilarity among distributions, from which the statistical confidence of detecting an erosional pattern is determined through Monte Carlo sampling. The techniques are implemented in a new tool (ESD_thermotrace) that consistently reports confidence levels as a function of sample size and application-specific variables; the tool is made available as an open-source Python-based script along with test data. Testing between different hypothesized erosion scenarios with this tool provides thermochronologists with the minimum sample size (i.e., number of bedrock and detrital grain ages) required to answer their specific scientific question at their desired level of statistical confidence. Furthermore, in cases of unavoidably small sample size (e.g., due to poor grain quality or low sample volume), we provide a means of calculating the confidence level of interpretations made from the data.
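The sketch below mimics the Monte Carlo logic described above; it is not the ESD_thermotrace code, and the synthetic catchment (ages increasing linearly with elevation, erosion either uniform or weighted toward high elevations) is an assumption for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Assumed synthetic catchment: bedrock age increases linearly with elevation.
elev = rng.uniform(0.0, 3000.0, size=20_000)   # catchment "pixels" (m)
ages = 1.0 + elev / 1000.0                     # bedrock ages (Ma)
w_focused = elev / elev.sum()                  # erosion focused at high elevation

def detection_rate(n_grains: int, trials: int = 400, alpha: float = 0.05) -> float:
    """Fraction of trials in which n grains distinguish the two erosion scenarios."""
    hits = 0
    for _ in range(trials):
        uniform_sample = rng.choice(ages, size=n_grains)             # uniform erosion
        focused_sample = rng.choice(ages, size=n_grains, p=w_focused)
        if stats.ks_2samp(uniform_sample, focused_sample).pvalue < alpha:
            hits += 1
    return hits / trials

for n in (30, 60, 100, 200):
    print(f"{n:>3} grains: scenarios distinguished in {detection_rate(n):.0%} of trials")
```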


Methodology
2015
Vol 11 (2)
pp. 65-79
Author(s):
Geert H. van Kollenburg
Joris Mulder
Jeroen K. Vermunt

The application of latent class (LC) analysis involves evaluating the LC model using goodness-of-fit statistics. To assess the misfit of a specified model, say with the Pearson chi-squared statistic, a p-value can be obtained using an asymptotic reference distribution. However, asymptotic p-values are not valid when the sample size is not large and/or the analyzed contingency table is sparse. Another problem is that for various other conceivable global and local fit measures, asymptotic distributions are not readily available. An alternative way to obtain the p-value for the statistic of interest is by constructing its empirical reference distribution using resampling techniques such as the parametric bootstrap or the posterior predictive check (PPC). In the current paper, we show how to apply the parametric bootstrap and two versions of the PPC to obtain empirical p-values for a number of commonly used global and local fit statistics within the context of LC analysis. The main difference between the PPC using test statistics and the parametric bootstrap is that the former takes into account parameter uncertainty. The PPC using discrepancies has the advantage that it is computationally much less intensive than the other two resampling methods. In a Monte Carlo study we evaluated the Type I error rates and power of these resampling methods when used for global and local goodness-of-fit testing in LC analysis. Results show that both the bootstrap and the PPC using test statistics are generally good alternatives to asymptotic p-values and can also be used when (asymptotic) distributions are not known. Nominal Type I error rates were not met when the sample size was small and the contingency table had many cells. Overall, the PPC using test statistics was somewhat more conservative than the parametric bootstrap. We also replicated previous research suggesting that the Pearson χ² statistic should in many cases be preferred over the likelihood-ratio G² statistic. Power to reject a model with one fewer LC than in the population was very high, unless the sample size was small. When the contingency tables were very sparse, the total bivariate residual (TBVR) statistic, which is based on bivariate relationships, still had very high power, signifying its usefulness in assessing model fit.
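A minimal sketch of the parametric-bootstrap p-value idea, using a simple fully specified multinomial model in place of a fitted LC model (the cell counts are made up; in LC analysis the model would be refitted to every bootstrap sample): simulate B tables under the model, then locate the observed Pearson statistic in the simulated reference distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

observed = np.array([38, 22, 25, 15])   # assumed observed counts for a 4-cell table
n = observed.sum()
p_model = np.full(4, 0.25)              # hypothesized model: equiprobable cells

def pearson_x2(counts: np.ndarray, probs: np.ndarray) -> float:
    """Pearson chi-squared statistic for observed counts against model probabilities."""
    expected = counts.sum() * probs
    return float(((counts - expected) ** 2 / expected).sum())

x2_obs = pearson_x2(observed, p_model)

# Parametric bootstrap: the empirical reference distribution replaces the
# asymptotic chi-squared distribution, which fails for small or sparse tables.
B = 5000
x2_boot = np.array([pearson_x2(rng.multinomial(n, p_model), p_model) for _ in range(B)])
p_boot = float(np.mean(x2_boot >= x2_obs))

print(f"Pearson X² = {x2_obs:.2f}, bootstrap p = {p_boot:.4f}")
```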

