Normality for Non-normal Distributions

2020 ◽  
Vol 8 (2) ◽  
pp. 51-60
Author(s):  
Khong Liang Koh ◽  
Nor Aishah Ahad

It is commonly assumed that sample data are normally distributed once the sample size reaches 30. This is the general rule of thumb for invoking the central limit theorem: the approximation is taken to hold when the sample size is greater than or equal to 30. Much of the literature likewise assumes normality when the sample size is at least 30. This study aims to determine the smallest sample size that satisfies the normality assumption for three non-normal distributions: the Poisson, Gamma, and Exponential distributions. Computer simulations were carried out to determine the minimum required sample size for each distribution. The study found that samples from the Poisson and Gamma distributions require sample sizes of less than 30 to achieve normality, while the Exponential distribution requires more than 30.
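A minimal sketch of this kind of simulation (not the authors' code): for each distribution, repeatedly draw samples of size n, compute their means, and apply a normality test to those means, increasing n until the test no longer rejects. The Shapiro-Wilk test, the 5% level, the grid of sample sizes, and the distribution parameters below are all assumptions.

```python
# Minimal sketch: smallest n at which sample means from a non-normal population
# pass a normality test. Test choice, level, n-grid, and parameters are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def means_look_normal(draw, n, n_means=500, alpha=0.05):
    """Simulate n_means sample means of size n and Shapiro-Wilk test them."""
    means = np.array([draw(n).mean() for _ in range(n_means)])
    return stats.shapiro(means).pvalue > alpha

populations = {
    "Poisson(3)":     lambda n: rng.poisson(3, n),
    "Gamma(2, 2)":    lambda n: rng.gamma(2.0, 2.0, n),
    "Exponential(1)": lambda n: rng.exponential(1.0, n),
}

for name, draw in populations.items():
    for n in range(5, 101, 5):
        if means_look_normal(draw, n):
            print(f"{name}: sample means look normal from n = {n}")
            break
```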

Symmetry ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 19 ◽  
Author(s):  
Roser Bono ◽  
Jaume Arnau ◽  
Rafael Alarcón ◽  
Maria J. Blanca

Several measures of skewness and kurtosis were proposed by Hogg (1974) in order to reduce the bias of conventional estimators when the distribution is non-normal. Here we conducted a Monte Carlo simulation study to compare the performance of conventional and Hogg’s estimators, considering the most frequent continuous distributions used in health, education, and social sciences (gamma, lognormal, and exponential distributions). In order to determine the bias, precision, and accuracy of the skewness and kurtosis estimators for each distribution, we calculated the relative bias, the coefficient of variation, and the scaled root mean square error. The effect of sample size on the estimators was also analyzed. In addition, a SAS program for calculating both conventional and Hogg’s estimators is presented. The results indicated that for the non-normal distributions investigated, the estimators of skewness and kurtosis which best reflect the shape of the distribution are Hogg’s estimators. It should also be noted that Hogg’s estimators are not as affected by sample size as are conventional estimators.
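A minimal sketch of the evaluation metrics applied to the conventional moment-based estimators only (Hogg’s estimators are defined in the paper and its SAS program and are omitted here). The gamma shape parameter, the number of replications, and the exact scalings of the metrics are assumptions.

```python
# Minimal sketch: relative bias, coefficient of variation, and scaled RMSE of the
# conventional skewness and excess-kurtosis estimators for a gamma population.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
shape = 2.0                                  # Gamma(shape): skewness = 2/sqrt(shape), excess kurtosis = 6/shape
true_skew, true_kurt = 2 / np.sqrt(shape), 6 / shape

def metrics(estimates, true_value):
    """Relative bias, coefficient of variation, and scaled root mean square error."""
    est = np.asarray(estimates)
    rel_bias = (est.mean() - true_value) / true_value
    cv = est.std(ddof=1) / est.mean()
    srmse = np.sqrt(np.mean((est - true_value) ** 2)) / true_value
    return rel_bias, cv, srmse

for n in (20, 50, 200):
    samples = rng.gamma(shape, size=(5000, n))
    skews = stats.skew(samples, axis=1)      # conventional (moment-based) skewness
    kurts = stats.kurtosis(samples, axis=1)  # conventional excess kurtosis
    print(f"n={n:4d} skewness:", [round(m, 3) for m in metrics(skews, true_skew)],
          " kurtosis:", [round(m, 3) for m in metrics(kurts, true_kurt)])
```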


1995 ◽  
Vol 20 (4) ◽  
pp. 337-348 ◽  
Author(s):  
Lingjia Zeng ◽  
Ronald T. Cope

Large-sample standard errors of linear equating for the counterbalanced design are derived using the general delta method. Computer simulations were conducted to compare the standard errors derived by Lord under the normality assumption with those derived in this article without such an assumption. The standard errors derived without the normality assumption were found to be more accurate than those derived with it in an example using a large sample size and moderately skewed score distributions. In an example using nearly symmetric distributions, the standard errors computed with the normality assumption were found to be at least as accurate as those derived without it.
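A minimal sketch of the quantity whose standard error is at issue: the linear equating function matches the means and standard deviations of the two forms, and a bootstrap standard error is shown as a generic numerical stand-in for the article's delta-method derivation under the counterbalanced design. The simulated score distributions and the target score point are assumptions.

```python
# Minimal sketch: linear equating of a score from form X to the scale of form Y,
# with a bootstrap standard error. The data below are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
x_scores = rng.normal(50, 10, 400)           # form X scores (hypothetical)
y_scores = rng.normal(52, 11, 400)           # form Y scores (hypothetical)

def linear_equate(x, x_sample, y_sample):
    """Linear equating: match the means and standard deviations of the two forms."""
    mx, sx = x_sample.mean(), x_sample.std(ddof=1)
    my, sy = y_sample.mean(), y_sample.std(ddof=1)
    return my + (sy / sx) * (x - mx)

point = 60.0
boot = [linear_equate(point,
                      rng.choice(x_scores, x_scores.size, replace=True),
                      rng.choice(y_scores, y_scores.size, replace=True))
        for _ in range(2000)]
print("equated score:", round(linear_equate(point, x_scores, y_scores), 3),
      " bootstrap SE:", round(np.std(boot, ddof=1), 3))
```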


A wide class of stochastic processes, called regenerative, is defined, and it is shown that under general conditions the instantaneous probability distribution of such a process tends with time to a unique limiting distribution, whatever the initial conditions. The general results are then applied to ‘S.M.-processes’, a generalization of Markov chains, and it is shown that the limiting distribution of the process may always be obtained by assuming negative exponential distributions for the ‘waits’ in the different ‘states’. Lastly, the behaviour of integrals of regenerative processes is considered and, amongst other results, an ergodic and a multi-dimensional central limit theorem are proved.
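A minimal sketch illustrating the stated property for a two-state semi-Markov (‘S.M.’) process: the long-run fraction of time spent in each state is the same for gamma-distributed waits and for exponential waits with the same means, so the limiting distribution can be computed as if the waits were exponential. The transition matrix, wait distributions, and simulation length are assumptions.

```python
# Minimal sketch: long-run state occupancy of a two-state semi-Markov process under
# gamma waits vs. exponential waits with the same means. Parameters are assumptions.
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.0, 1.0],            # embedded two-state chain: always switch states
              [1.0, 0.0]])
mean_wait = np.array([1.0, 3.0])     # mean wait in state 0 and in state 1

def time_in_states(wait_sampler, n_jumps=100_000):
    """Simulate jumps and accumulate the time spent in each state."""
    state, occupancy = 0, np.zeros(2)
    for _ in range(n_jumps):
        occupancy[state] += wait_sampler(state)
        state = rng.choice(2, p=P[state])
    return occupancy / occupancy.sum()

gamma_waits = lambda s: rng.gamma(4.0, mean_wait[s] / 4.0)   # non-exponential waits
expon_waits = lambda s: rng.exponential(mean_wait[s])        # exponential waits, same means

print("gamma waits:      ", time_in_states(gamma_waits))
print("exponential waits:", time_in_states(expon_waits))
print("theory (mean-wait ratio):", mean_wait / mean_wait.sum())
```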


2008 ◽  
Vol 65 (3) ◽  
pp. 1077-1086 ◽  
Author(s):  
Maarten H. P. Ambaum

A novel statistic for local wave amplitude of the 500-hPa geopotential height field is introduced. The statistic uses a Hilbert transform to define a longitudinal wave envelope and dynamical latitude weighting to define the latitudes of interest. Here it is used to detect the existence, or otherwise, of multimodality in its distribution function. The empirical distribution function for the 1960–2000 period is close to a Weibull distribution with shape parameters between 2 and 3. There is substantial interdecadal variability but no apparent local multimodality or bimodality. The zonally averaged wave amplitude, akin to the more usual wave amplitude index, is close to being normally distributed. This is consistent with the central limit theorem, which applies to the construction of the wave amplitude index. For the period 1960–70 it is found that there is apparent bimodality in this index. However, the different amplitudes are realized at different longitudes, so there is no bimodality at any single longitude. As a corollary, it is found that many commonly used statistics for detecting multimodality in atmospheric fields potentially satisfy the assumptions underlying the central limit theorem and therefore can only show approximately normal distributions. The author concludes that these techniques may therefore be suboptimal for detecting multimodality.
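A minimal sketch of the envelope construction: a Hilbert transform along longitude yields a local wave amplitude, whose values can then be fitted with a Weibull distribution. The synthetic anomaly field below stands in for the 500-hPa geopotential height data, and the dynamical latitude weighting is omitted; both are assumptions.

```python
# Minimal sketch: Hilbert-transform wave envelope along longitude and a Weibull fit.
# The synthetic field and wavenumbers are assumptions, not the paper's data.
import numpy as np
from scipy.signal import hilbert
from scipy.stats import weibull_min

rng = np.random.default_rng(4)
n_days, n_lon = 1000, 144
lon = np.linspace(0, 2 * np.pi, n_lon, endpoint=False)

# Synthetic anomaly field: a few zonal wavenumbers with random amplitudes and phases.
field = sum(rng.normal(0, 1, (n_days, 1)) *
            np.cos(k * lon + rng.uniform(0, 2 * np.pi, (n_days, 1)))
            for k in (4, 5, 6))

envelope = np.abs(hilbert(field, axis=1))    # local wave amplitude along longitude
shape, loc, scale = weibull_min.fit(envelope.ravel(), floc=0)
print(f"fitted Weibull shape parameter: {shape:.2f}")
```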


2017 ◽  
Vol 17 (3) ◽  
pp. 196-222 ◽  
Author(s):  
Rasim M. Musal ◽  
Tahir Ekin

Overpayment estimation using a sample of audited medical claims is an often-used method to determine recoupment amounts. The current practice, based on the central limit theorem, may not be efficient for certain kinds of claims data, including skewed payment populations with partial overpayments. As an alternative, we propose a novel Bayesian inflated mixture model. We provide an analysis of the validity and efficiency of the model estimates for a number of payment populations and overpayment scenarios. In addition, learning about the parameters of the overpayment distribution with increasing sample size may provide insights for medical investigators. We present a discussion of model selection and potential modelling extensions.
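A minimal sketch of the setting rather than the authors' Bayesian model: a payment population in which most audited claims have no overpayment (a point mass at zero) and the rest carry partial overpayments of a skewed payment amount, followed by the conventional CLT-based lower-bound estimate of the total overpayment. All distributions, parameters, and the 90% lower-bound convention are assumptions.

```python
# Minimal sketch: zero-inflated, partial-overpayment population and the conventional
# CLT-based estimate from an audit sample. All parameters are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_population, n_sample = 20_000, 100

paid = rng.gamma(2.0, 500.0, n_population)                # skewed payment amounts
has_overpayment = rng.random(n_population) < 0.3          # 30% of claims overpaid
fraction_overpaid = rng.beta(2.0, 5.0, n_population)      # partial overpayment fraction
overpayment = paid * fraction_overpaid * has_overpayment  # zero-inflated overpayments

sample = rng.choice(overpayment, n_sample, replace=False)
mean, se = sample.mean(), sample.std(ddof=1) / np.sqrt(n_sample)
z = stats.norm.ppf(0.95)
print(f"CLT-based 90% lower bound on total overpayment: {n_population * (mean - z * se):,.0f}")
print(f"true total overpayment: {overpayment.sum():,.0f}")
```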


2014 ◽  
Vol 37 (1) ◽  
pp. 95-100
Author(s):  
A. Martínez-Abrain ◽  

A common practice of experimental design in field ecology, which relies on the Central Limit Theorem, is the use of the ‘n = 30 rule of thumb’. I show here that papers published in Animal Biodiversity and Conservation during the period 2010–2013 conform to this rule. Samples collected around this relatively small size have the advantage of coupling statistically significant results with large effect sizes, which is positive because field researchers are commonly interested in large ecological effects. However, the power to detect a large effect size depends not only on sample size but, importantly, also on between-population variability. By means of a hypothetical example, I show here that statistical power is little affected by small to medium changes in variance between populations. However, power decreases abruptly beyond a certain threshold, which I identify roughly as a five-fold difference in variance between populations. Hence, researchers should explore the variance profiles of their study populations beforehand to make sure that they lie within the safe zone for using the ‘n = 30 rule of thumb’. Otherwise, the sample size should be increased beyond 30, even to detect large effect sizes.
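A minimal sketch of the kind of power calculation involved (not the paper's exact scenario): Monte Carlo power of a two-sample Welch t-test with n = 30 per group as the variance of one population is inflated relative to the other. The mean difference, baseline variance, and variance multipliers are assumptions.

```python
# Minimal sketch: simulated power of a two-sample Welch t-test at n = 30 per group
# for increasing between-population variance ratios. Parameters are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, diff, n_sim, alpha = 30, 1.0, 5000, 0.05

for var_ratio in (1, 2, 5, 10):
    a = rng.normal(0.0, 1.0, (n_sim, n))
    b = rng.normal(diff, np.sqrt(var_ratio), (n_sim, n))
    p = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue
    print(f"variance ratio {var_ratio:2d}: power = {(p < alpha).mean():.2f}")
```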


Author(s):  
Mbuba Morris Mwiti ◽  
Samson W. Wanyonyi ◽  
Davis Mwenda Marangu

The central limit theorem is a very powerful tool in statistical inference and in mathematics in general, with numerous applications in topology and many other areas. For the case of probability theory, it states that, given certain conditions, the sample mean of a sufficiently large number of iterates of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed. In this research paper, three different statements of the theorem (CLT) are given. The paper uses data on the shoe size and gender of university students. It aims to find whether the shoe sizes converge to a normal distribution, to find the modal shoe size of university students, and to apply the results of the central limit theorem to test the hypothesis that most university students wear shoe size seven. Shoe sizes are treated as discretely distributed random variables, allowing the calculation of the mean and standard deviation of the shoe sizes. The sample data used in this research paper were drawn from different areas of Kibabii University, which was divided into five strata. From two strata, a sample of 74 respondents was drawn from each, and from the remaining three strata, a sample of 73 students per stratum was drawn at random, giving a total sample size of 367 respondents. By analyzing the data using SPSS and Microsoft Excel, it was evident that the shoe sizes are normally distributed with a well-defined mean and standard deviation. We also confirmed the hypothesis that most university students wear shoe size seven by testing it using the p-value. The modal shoe size for university students was found to be seven, since it had the highest frequency (97/367). This research was aimed at informing shoe investors, whose main market is university students, about the shoe sizes that are in high demand among them.
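A minimal sketch of the CLT illustration only: sample means of a discrete shoe-size variable become approximately normal as the sample size grows. The shoe-size probabilities below are hypothetical; only the total sample size of 367 and the modal proportion of roughly 97/367 for size seven are taken from the abstract.

```python
# Minimal sketch: normality of sample means of a discrete shoe-size variable as n grows.
# The probability table is hypothetical, with the mode placed at size seven.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sizes = np.arange(4, 12)                                               # shoe sizes 4..11
probs = np.array([0.03, 0.08, 0.15, 0.264, 0.20, 0.15, 0.08, 0.046])  # mode at size 7 (~97/367)

for n in (5, 30, 367):
    means = rng.choice(sizes, size=(2000, n), p=probs).mean(axis=1)
    stat, pvalue = stats.normaltest(means)
    print(f"n = {n:3d}: normality-test p-value of the sample means = {pvalue:.3f}")
```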

