A sketch for the KS test for Big Data

2021 ◽  
Author(s):  
Thalis D. Galeno ◽  
João Gama ◽  
Douglas O. Cardoso

Motivated by the challenges of Big Data, this paper presents an approximate algorithm for evaluating the Kolmogorov-Smirnov test. This goodness-of-fit test is widely used because it is non-parametric. This work focuses on the one-sample test, which considers the hypothesis that a given univariate sample follows some reference distribution. The method evaluates the departure of an input stream from such a distribution while remaining space- and time-efficient. We assess the accuracy of our algorithm through experiments in different scenarios: varying the reference distribution and its parameters, the sample size, and the available memory. The performance of rival methods, some of which are considered state of the art, was also compared. Our algorithm is shown to be superior in most cases with respect to the absolute error of the test statistic.
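For orientation, the quantity being approximated can be computed exactly when the whole sample fits in memory. The following is a minimal sketch of that exact one-sample statistic, not the authors' streaming sketch; the function names and the choice of a normal reference distribution are illustrative assumptions.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution (illustrative reference distribution)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Exact one-sample KS statistic: sup |F_n(x) - F(x)|.

    Requires sorting the full sample, i.e. O(n) memory, which is the
    cost a streaming approximation tries to avoid.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so both sides of the jump must be checked.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
    print(ks_statistic(data, normal_cdf))
```

Any approximation that keeps only a bounded summary of the stream can be judged by the absolute difference between its output and this exact value, which is the error measure the abstract refers to.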

2021 ◽  
Vol 5 (1) ◽  
pp. 10
Author(s):  
Mark Levene

A bootstrap-based hypothesis test of the goodness-of-fit for the marginal distribution of a time series is presented. Two metrics, the empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2), are compared on four data sets—three stablecoin time series and a Bitcoin time series. We demonstrate that, after applying first-order differencing, all the data sets fit heavy-tailed α-stable distributions with 1<α<2 at the 95% confidence level. Moreover, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for KS2 are, proportionately, much larger than those of ESJS.
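As a rough illustration of two ingredients named above, the sketch below applies first-order differencing to a series and computes the two-sample Kolmogorov–Smirnov statistic between two samples. It does not reproduce the paper's bootstrap procedure or the ESJS metric; the function names are hypothetical.

```python
from bisect import bisect_right


def first_difference(series):
    """First-order differencing: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def ks_two_sample(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    # The supremum is attained at one of the observed values.
    for v in xs + ys:
        gap = abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        d = max(d, gap)
    return d


if __name__ == "__main__":
    prices = [1.00, 1.01, 0.99, 1.00, 1.02, 1.00]
    returns = first_difference(prices)
    print(ks_two_sample(returns[:3], returns[2:]))
```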


Author(s):  
ZHENMIN CHEN ◽  
CHUNMIAO YE

Improving the power of goodness-of-fit tests is an important research topic in statistics. The goal of a goodness-of-fit test is to check whether the underlying probability distribution, from which a sample is drawn, differs from a hypothesized distribution. Numerous research papers have been published in this area. It has been shown that the power of the existing goodness-of-fit tests in the literature is unsatisfactory when the alternative distributions are V-shaped or when the sample sizes are small. This motivates the development of more powerful test statistics. In this research, a new test statistic is proposed. The result can be used to test whether the underlying probability distribution differs from a uniform distribution. By applying the probability integral transformation, the proposed test statistic can be used to check whether the underlying distribution differs from any hypothesized distribution. The performance of the proposed method is compared with that of the Kolmogorov–Smirnov test, which is a widely adopted statistical test in the literature. It is shown that the proposed test is more powerful than the Kolmogorov–Smirnov test in some cases.
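The probability integral transformation mentioned above can be sketched as follows: if a sample really comes from a continuous hypothesized distribution with CDF F0, then the values F0(x) are uniform on (0, 1), so any uniformity test can be reused. The exponential reference distribution and the function names here are assumptions chosen for illustration; the proposed test statistic itself is not reproduced.

```python
import math


def exponential_cdf(x, rate=1.0):
    """CDF of an exponential distribution (illustrative hypothesized F0)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def probability_integral_transform(sample, cdf):
    """Map each observation x to u = F0(x); u is uniform on (0, 1) under H0."""
    return [cdf(x) for x in sample]


if __name__ == "__main__":
    data = [0.1, 0.7, 1.5, 2.3]
    # The transformed values can now be fed to any test of uniformity.
    print(probability_integral_transform(data, exponential_cdf))
```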


1994 ◽  
Vol 10 (2) ◽  
pp. 316-356 ◽  
Author(s):  
Yanqin Fan

Let F denote a distribution function defined on the probability space (Ω,,P), which is absolutely continuous with respect to the Lebesgue measure in R^d with probability density function f. Let f0(·,β) be a parametric density function that depends on an unknown p × 1 vector β. In this paper, we consider tests of the goodness-of-fit of f0(·,β) for f(·) for some β based on (i) the integrated squared difference between a kernel estimate of f(·) and the quasimaximum likelihood estimate of f0(·,β), denoted by I_n, and (ii) the integrated squared difference between a kernel estimate of f(·) and the corresponding kernel smoothed estimate of f0(·,β), denoted by J_n. It is shown in this paper that the amount of smoothing applied to the data in constructing the kernel estimate of f(·) determines the form of the test statistic based on I_n. For each test developed, we also examine its asymptotic properties including consistency and the local power property. In particular, we show that tests developed in this paper, except the first one, are more powerful than the Kolmogorov-Smirnov test under the sequence of local alternatives introduced in Rosenblatt [12], although they are less powerful than the Kolmogorov-Smirnov test under the sequence of Pitman alternatives. A small simulation study is carried out to examine the finite sample performance of one of these tests.


2007 ◽  
Vol 135 (3) ◽  
pp. 1151-1157 ◽  
Author(s):  
Dag J. Steinskog ◽  
Dag B. Tjøstheim ◽  
Nils G. Kvamstø

The Kolmogorov–Smirnov goodness-of-fit test is used in many applications for testing normality in climate research. This note shows that the test usually leads to systematic and drastic errors. When the mean and the standard deviation are estimated, it is much too conservative in the sense that its p values are strongly biased upward. One may think that this is a small sample problem, but it is not. There is a correction of the Kolmogorov–Smirnov test by Lilliefors, which is in fact sometimes confused with the original Kolmogorov–Smirnov test. Both the Jarque–Bera and the Shapiro–Wilk tests for normality are good alternatives to the Kolmogorov–Smirnov test. A power comparison of eight different tests has been undertaken, favoring the Jarque–Bera and the Shapiro–Wilk tests. The Jarque–Bera and the Kolmogorov–Smirnov tests are also applied to a monthly mean dataset of geopotential height at 500 hPa. The two tests give very different results and illustrate the danger of using the Kolmogorov–Smirnov test.
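The conservativeness described above can be checked with a small Monte Carlo experiment. The sketch below draws normal samples and applies the standard asymptotic 5% Kolmogorov critical value, 1.358/√n, twice: once with the true mean and standard deviation, and once with values estimated from the same sample. The sample size, replication count, and seed are arbitrary assumptions, and this is not the simulation design used in the note.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Exact one-sample KS statistic for a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def rejection_rates(n=50, reps=2000, seed=0):
    """Return (rate with known parameters, rate with estimated parameters)."""
    rng = random.Random(seed)
    crit = 1.358 / math.sqrt(n)  # asymptotic 5% critical value for the KS test
    rej_known = rej_est = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correct use: the reference distribution is fixed in advance.
        d_known = ks_statistic(xs, lambda x: normal_cdf(x, 0.0, 1.0))
        # Misuse: parameters are fitted to the very sample being tested.
        m, s = statistics.fmean(xs), statistics.stdev(xs)
        d_est = ks_statistic(xs, lambda x: normal_cdf(x, m, s))
        rej_known += d_known > crit
        rej_est += d_est > crit
    return rej_known / reps, rej_est / reps


if __name__ == "__main__":
    known, estimated = rejection_rates()
    print(f"known parameters: {known:.3f}, estimated parameters: {estimated:.3f}")
```

With known parameters the rejection rate under the null should sit near the nominal 5%, while with estimated parameters it falls far below it. That is the sense in which the uncorrected test is too conservative, and it is the situation the Lilliefors correction addresses.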


2015 ◽  
Vol 32 (2) ◽  
pp. 132-143
Author(s):  
Mohammad Saleh Owlia ◽  
Mohammad Saber Fallah Nezhad ◽  
Mohesn Sheikh Sajadieh

Purpose – The purpose of this paper is to propose a new method based on goodness-of-fit tests for shift detection problems. Design/methodology/approach – In this method, although the distribution of the data gathered from the process is the subject of control, any out-of-control signal can also be generalized to the overall state of the process, including the parameters of the distribution. Findings – The results of a simulation study indicate that, among goodness-of-fit tests, the χ² test performs better than the Kolmogorov-Smirnov test in detecting process shifts. A comparison of the proposed method with traditional methods also indicates that the proposed method generally has smaller probabilities of type I and type II errors. Originality/value – To the best of the authors' knowledge, no attention has previously been paid to the application of goodness-of-fit tests in process control.

