Testing for goodness rather than lack of fit of continuous probability distributions

PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0256499
Author(s):  
Stefan Wellek

The vast majority of testing procedures presented in the literature as goodness-of-fit tests fail to accomplish what the term promises. In fact, a significant result of such a test indicates that the true distribution underlying the data differs substantially from the assumed model, whereas the true objective is usually to establish that the model fits the data sufficiently well. Meeting that objective requires carrying out a testing procedure for a problem in which the statement that the deviations between model and true distribution are small plays the role of the alternative hypothesis. Testing procedures of this kind, for which the term tests for equivalence has been coined in statistical usage, are available for establishing goodness-of-fit of discrete distributions. We show how this methodology can be extended to settings where interest is in establishing goodness-of-fit of distributions of the continuous type.
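
The reversal of hypotheses described in this abstract can be written schematically as follows. Here $d$ stands for some measure of distance between the true distribution function $F$ and the model $F_0$, and $\varepsilon > 0$ is a tolerance fixed in advance. Both symbols are placeholders chosen for illustration and are not necessarily the paper's own choices.

```latex
\[
H_0:\ d(F, F_0) \ge \varepsilon
\quad \text{versus} \quad
H_1:\ d(F, F_0) < \varepsilon
\]
```

Under this formulation, rejecting $H_0$ supports the claim that the model fits, which is the opposite of what a classical goodness-of-fit test delivers.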

In Chapter 2, probability distributions are presented. The distributions covered are those most relevant to the analysis and study of waiting lines: the discrete binomial, geometric, and Poisson distributions, and the continuous uniform, exponential, Erlang, and normal distributions. Confidence intervals are calculated for some of the parameters of the distributions. A brief example of generating pseudorandom exponential times using a spreadsheet is presented. The chapter closes with goodness-of-fit tests for probability distributions, especially the Anderson-Darling test. The R statistical programming language is used in the exercises, and several R scripts are proposed to perform the calculations automatically.
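
Generating pseudorandom exponential times is commonly done with the inverse transform method. The sketch below illustrates that idea in Python rather than in the chapter's spreadsheet or R code; the function name, the rate, and the seed are assumptions made for the example.

```python
import math
import random

def exponential_times(rate, n, seed=None):
    """Draw n exponential times with the given rate (events per unit time)."""
    rng = random.Random(seed)
    # Inverse transform: if U ~ Uniform[0, 1), then -ln(1 - U) / rate
    # follows an exponential distribution with the given rate.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

# Hypothetical arrival rate of 2 customers per minute.
times = exponential_times(rate=2.0, n=10_000, seed=1)
sample_mean = sum(times) / len(times)  # should be close to 1 / rate = 0.5
```

A quick sanity check is to compare the sample mean with `1 / rate`, since the mean of an exponential distribution is the reciprocal of its rate.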


2016 ◽  
Vol 11 (1) ◽  
pp. 432-440 ◽  
Author(s):  
M. T. Amin ◽  
M. Rizwan ◽  
A. A. Alazba

This study was designed to find the best-fit probability distribution of annual maximum rainfall based on a twenty-four-hour sample in the northern regions of Pakistan using four probability distributions: normal, log-normal, log-Pearson type-III and Gumbel max. Based on the scores of goodness of fit tests, the normal distribution was found to be the best-fit probability distribution at the Mardan rainfall gauging station. The log-Pearson type-III distribution was found to be the best-fit probability distribution at the rest of the rainfall gauging stations. The maximum values of expected rainfall were calculated using the best-fit probability distributions and can be used by design engineers in future research.
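
Once a distribution has been fitted, expected maxima for a given return period follow from its quantile function. The sketch below does this for the Gumbel max distribution, one of the four candidates named above. It assumes method-of-moments estimates, which the abstract does not specify, and the rainfall values are hypothetical, not data from the study.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_fit_moments(sample):
    """Method-of-moments estimates (location, scale) for the Gumbel max distribution."""
    # For Gumbel max: variance = (pi * scale)^2 / 6, mean = location + gamma * scale.
    scale = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    location = statistics.mean(sample) - EULER_GAMMA * scale
    return location, scale

def gumbel_return_level(location, scale, return_period):
    """Value exceeded on average once every `return_period` years."""
    p = 1.0 - 1.0 / return_period  # non-exceedance probability
    return location - scale * math.log(-math.log(p))

# Hypothetical annual maximum 24-hour rainfall totals in mm.
annual_max_mm = [62.0, 85.5, 71.2, 110.4, 95.0, 78.3, 130.1, 88.8, 69.9, 102.6]
loc, scale = gumbel_fit_moments(annual_max_mm)
rain_50yr = gumbel_return_level(loc, scale, return_period=50)
```

The same pattern applies to the other candidate distributions by swapping in their own estimators and quantile functions.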


2021 ◽  
Vol 2 (2) ◽  
pp. 60-67
Author(s):  
Rashidul Hasan

The estimation of a suitable probability model depends mainly on the features of the temperature data available at a particular place. As a result, existing probability distributions must be evaluated to establish an appropriate probability model that can deliver precise temperature estimates. The study aimed to identify the best-fitted probability model for the monthly maximum temperature at the Sylhet station in Bangladesh from January 2002 to December 2012 using several statistical analyses. Ten continuous probability distributions (Exponential, Gamma, Log-Gamma, Beta, Normal, Log-Normal, Erlang, Power Function, Rayleigh, and Weibull) were fitted using the maximum likelihood technique. To determine each model's fit to the temperature data, several goodness-of-fit tests were applied, including the Kolmogorov-Smirnov test, Anderson-Darling test, and Chi-square test. The Beta distribution was found to be the best-fitted probability distribution for the monthly maximum temperature data at the Sylhet station, based on the largest overall score derived from the three goodness-of-fit tests.
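
The fit-then-compare workflow described here can be sketched for two of the ten candidates whose maximum likelihood estimates have closed forms. The code below is an illustration under stated assumptions: the temperatures are hypothetical, only the Kolmogorov-Smirnov statistic is computed, and candidates are ranked by that statistic alone rather than by the study's overall score.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Check the gap just after and just before each jump of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical monthly maximum temperatures in degrees Celsius (not the Sylhet data).
temps = [25.1, 27.4, 30.8, 32.5, 33.0, 32.2, 31.9, 32.4, 31.7, 30.9, 29.0, 26.3]
n = len(temps)

# Closed-form maximum likelihood estimates.
mu_hat = sum(temps) / n
sigma_hat = math.sqrt(sum((t - mu_hat) ** 2 for t in temps) / n)  # MLE divides by n
rate_hat = 1.0 / mu_hat

scores = {
    "normal": ks_statistic(temps, lambda x: normal_cdf(x, mu_hat, sigma_hat)),
    "exponential": ks_statistic(temps, lambda x: exponential_cdf(x, rate_hat)),
}
best = min(scores, key=scores.get)  # smaller statistic means closer fit
```

Note that when parameters are estimated from the same data, standard Kolmogorov-Smirnov critical values no longer apply exactly, so the statistic is used here only for ranking.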


2021 ◽  
Vol 3 (1) ◽  
pp. 16-25
Author(s):  
Siti Mariam Norrulashikin ◽  
Fadhilah Yusof ◽  
Siti Rohani Mohd Nor ◽  
Nur Arina Bazilah Kamisan

Modeling meteorological variables is a vital aspect of climate change studies. Awareness of the frequency and magnitude of climate change is critical for mitigating the associated risks. Probability distribution models are valuable tools for frequency studies of climate variables because they show how well a given probability distribution fits the data series. Monthly meteorological data, including average temperature, wind speed, and rainfall, were analyzed to determine the most suitable probability distribution model for the Kuala Krai district. The probability distributions used in the analysis were the Beta, Burr, Gamma, Lognormal, and Weibull distributions. The parameters of each distribution were estimated by maximum likelihood estimation (MLE). Goodness-of-fit tests such as the Kolmogorov-Smirnov and Anderson-Darling tests were conducted to identify the best-suited model and to assess the reliability of the tests. The results indicate that Burr distributions better characterize the meteorological data in this study. Graphs of the probability density function, the cumulative distribution function, and the Q-Q plot are presented.
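
The Anderson-Darling statistic mentioned above can be computed directly from the ordered sample. The following sketch evaluates it against a Weibull distribution, one of the five candidates. The wind speeds and the Weibull shape and scale are hypothetical values chosen for illustration, not estimates from the study.

```python
import math

def anderson_darling(sample, cdf):
    """A^2 statistic for a sample against a fully specified continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf(xs[i - 1])   # CDF at the i-th smallest value
        f_high = cdf(xs[n - i])  # CDF at the i-th largest value
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n

def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape))

# Hypothetical monthly mean wind speeds in m/s and assumed Weibull parameters.
wind = [1.2, 2.5, 3.1, 2.8, 4.0, 3.6, 1.9, 2.2, 5.1, 3.3]
a2 = anderson_darling(wind, lambda x: weibull_cdf(x, shape=2.0, scale=3.5))
```

Compared with the Kolmogorov-Smirnov statistic, this statistic gives more weight to discrepancies in the tails, and smaller values indicate a closer fit.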


Proceedings ◽  
2018 ◽  
Vol 2 (11) ◽  
pp. 579
Author(s):  
Thomas Papalaskaris ◽  
Theologos Panagiotidis

Only a few scientific research studies, especially ones dealing with extremely low flow conditions, have been compiled so far in Greece. The present study, aiming to contribute to this specific area of hydrologic investigation, generates synthetic low stream flow time series for an entire calendar year from the stream flow data recorded during a center interval period of the year 2015. We examined the goodness of fit of eleven theoretical probability distributions to daily low stream flow data acquired at a location on the fully channelized urban stream that crosses the junction of Iokastis road and Chrisostomou Smirnis road, Agios Loukas residential area, Kavala city, NE Greece, using a conventional portable 3-inch Parshall flume, and calculated the corresponding distribution parameters. The Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared GOF tests were employed to show how well the probability distributions fitted the recorded data, and the results were presented in interactive tables that make it possible to decide which model best fits the observed data. Finally, the observed low flow data are plotted against the calculated data on a log-log scale chart, and statistics comparing the recorded and forecasted low flow data are calculated.
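
Of the three tests named above, the Chi-Squared test works on binned counts. The sketch below shows one way to compute its statistic for a continuous candidate distribution. The flow values, the bin edges, and the exponential rate are hypothetical and are not taken from the Kavala measurements.

```python
import math

def expected_counts(cdf, edges, n):
    """Expected number of observations per bin under the candidate CDF."""
    return [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges, edges[1:])]

def observed_counts(sample, edges):
    """Count observations falling in each half-open bin [lo, hi)."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        for k, (lo, hi) in enumerate(zip(edges, edges[1:])):
            if lo <= x < hi:
                counts[k] += 1
                break
    return counts

def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic for binned counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical daily low flows in L/s and an assumed exponential rate.
flows = [0.3, 0.8, 1.1, 0.5, 2.4, 1.7, 0.2, 3.9, 0.9, 1.3,
         4.6, 0.6, 2.1, 0.4, 1.5, 5.2, 0.7, 2.8, 1.0, 0.1]
edges = [0.0, 1.0, 2.0, 4.0, math.inf]
rate = 0.5

observed = observed_counts(flows, edges)
expected = expected_counts(lambda x: 1.0 - math.exp(-rate * x), edges, len(flows))
chi2 = chi_squared_statistic(observed, expected)
```

The statistic is then compared with a chi-squared distribution whose degrees of freedom equal the number of bins minus one, minus the number of parameters estimated from the data.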


Risks ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 55
Author(s):  
Vytaras Brazauskas ◽  
Sahadeb Upretee

Quantiles of probability distributions play a central role in the definition of risk measures (e.g., value-at-risk, conditional tail expectation) which in turn are used to capture the riskiness of the distribution tail. Estimates of risk measures are needed in many practical situations such as in pricing of extreme events, developing reserve estimates, designing risk transfer strategies, and allocating capital. In this paper, we present the empirical nonparametric and two types of parametric estimators of quantiles at various levels. For parametric estimation, we employ the maximum likelihood and percentile-matching approaches. Asymptotic distributions of all the estimators under consideration are derived when data are left-truncated and right-censored, which is a typical loss variable modification in insurance. Then, we construct relative efficiency curves (REC) for all the parametric estimators. Specific examples of such curves are provided for exponential and single-parameter Pareto distributions for a few data truncation and censoring cases. Additionally, using simulated data we examine how wrong quantile estimates can be when one makes incorrect modeling assumptions. The numerical analysis is also supplemented with standard model diagnostics and validation (e.g., quantile-quantile plots, goodness-of-fit tests, information criteria) and presents an example of when those methods can mislead the decision maker. These findings pave the way for further work on RECs with potential for them being developed into an effective diagnostic tool in this context.
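
The three kinds of quantile estimator named in this abstract can be contrasted in a few lines for the exponential model. This is a simplified sketch that assumes complete data, so it does not handle the left-truncated and right-censored setting the paper analyzes, and it uses the plain empirical percentile where other definitions smooth between order statistics. All function names are hypothetical.

```python
import math

def empirical_quantile(sample, p):
    """Smallest observation whose empirical CDF value is at least p."""
    xs = sorted(sample)
    k = max(1, math.ceil(p * len(xs)))
    return xs[k - 1]

def exponential_quantile_mle(sample, p):
    """Plug the MLE of the exponential mean into the quantile function."""
    mean_hat = sum(sample) / len(sample)
    return -mean_hat * math.log(1.0 - p)

def exponential_mean_percentile_matching(sample, p=0.5):
    """Choose the exponential mean so the model's p-quantile equals the empirical one."""
    return -empirical_quantile(sample, p) / math.log(1.0 - p)

# Hypothetical loss amounts.
losses = [4.0, 1.0, 3.0, 2.0]
q_emp = empirical_quantile(losses, 0.5)
q_mle = exponential_quantile_mle(losses, 0.5)
mean_pm = exponential_mean_percentile_matching(losses, 0.5)
```

When the exponential assumption is wrong, the two parametric estimates can differ noticeably from the empirical one, which is the kind of modeling error the abstract examines by simulation.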


2011 ◽  
Vol 478 ◽  
pp. 54-63 ◽  
Author(s):  
Antony T. McTigue ◽  
Annette M. Harte

This paper presents the results from an experimental test program conducted on commercially available oriented strandboard (OSB) panels and statistical analyses of the results. Standardised testing was used to determine the short-term behaviour of OSB/3 panels subjected to tension loading. A variety of thicknesses sourced from three different producers were used. Analysis of the results indicates that a quadratic expression of the form $\sigma = a\varepsilon^2 + b\varepsilon$ provides the best description of the relationship between stress ($\sigma$) and strain ($\varepsilon$) up to the point of failure. It has also been shown that the coefficients a and b of the quadratic regression equations are negatively correlated with each other. Anderson-Darling goodness-of-fit tests were conducted on the results for tension strength and modulus of elasticity (MOE). The results indicate that the tension strength and MOE come from populations that follow either normal or lognormal probability distributions.
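
Assuming the quadratic relationship has the no-intercept form σ = aε² + bε, with σ the stress and ε the strain, the coefficients a and b can be estimated by ordinary least squares from the two normal equations. The sketch below is one way to do that. The strain and stress values are synthetic, generated from assumed coefficients, and are not test results from the paper.

```python
def fit_quadratic_through_origin(strain, stress):
    """Least-squares estimates (a, b) for stress = a * strain**2 + b * strain."""
    s2 = sum(e ** 2 for e in strain)
    s3 = sum(e ** 3 for e in strain)
    s4 = sum(e ** 4 for e in strain)
    s1y = sum(e * s for e, s in zip(strain, stress))
    s2y = sum(e ** 2 * s for e, s in zip(strain, stress))
    # Normal equations:
    #   a * s4 + b * s3 = s2y
    #   a * s3 + b * s2 = s1y
    det = s4 * s2 - s3 ** 2
    if det == 0:
        raise ValueError("strain values do not determine a unique fit")
    a = (s2y * s2 - s3 * s1y) / det
    b = (s4 * s1y - s3 * s2y) / det
    return a, b

# Synthetic data from assumed coefficients a = -150000 and b = 3500 (stress in MPa).
strain = [0.001, 0.002, 0.003, 0.004, 0.005]
stress = [-150000.0 * e ** 2 + 3500.0 * e for e in strain]
a_hat, b_hat = fit_quadratic_through_origin(strain, stress)
```

In this form, b is the initial slope of the stress-strain curve and a negative a captures the loss of stiffness as strain increases.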


2019 ◽  
Vol 65 (3) ◽  
pp. 296-313
Author(s):  
Agnieszka Lach ◽  
Łukasz Smaga

The aim of this paper is to investigate the finite sample behavior of seven goodness-of-fit tests for left truncated distributions of Chernobai et al. (2015) in terms of size and power. Simulation experiments are based on artificial data generated from distributions that have been used in the past or are used today to describe the tails of asset returns. The study was conducted for different tail thicknesses and for varying truncation points. Simulation results indicate that the testing procedures do not work equally well in finite samples, and some of them require a fairly large number of observations to perform satisfactorily.
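
A finite sample size study of this kind can be outlined with a small Monte Carlo experiment. The sketch below is a deliberately simplified stand-in: it uses the ordinary Kolmogorov-Smirnov test on untruncated standard exponential data with the asymptotic 5% critical value, not the seven tests for left truncated distributions studied in the paper. The sample size, replication count, and seed are arbitrary choices.

```python
import math
import random

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def estimate_size(n, replications, seed=0):
    """Share of replications in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    critical = 1.358 / math.sqrt(n)  # asymptotic critical value at the 5% level

    def null_cdf(x):
        return 1.0 - math.exp(-x)

    rejections = 0
    for _ in range(replications):
        # Data are drawn from the null distribution, so every rejection is a false one.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if ks_statistic(sample, null_cdf) > critical:
            rejections += 1
    return rejections / replications

size = estimate_size(n=50, replications=2000, seed=0)
```

An estimated size well away from the nominal 5% level would suggest that the critical value is unreliable at that sample size. Power is estimated the same way by drawing the data from an alternative distribution instead.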


1988 ◽  
Vol 32 (7) ◽  
pp. 460-464
Author(s):  
Mari Berry ◽  
Brian Peacock ◽  
Bobbie Foote ◽  
Lawrence Leemis

Statistical tests are used to identify the parent distribution corresponding to a data set. A human observer looking at a histogram can also identify a probability distribution that models the parent distribution. The accuracy of a human observer was compared to the chi-square test for discrete data and the Kolmogorov-Smirnov and chi-square tests for continuous data. The human observer proved more accurate in identifying continuous distributions and the chi-square test proved to be superior in identifying discrete distributions. The effect of sample size and number of intervals in the histogram was included in the experimental design.

