nonparametric algorithm
Recently Published Documents


TOTAL DOCUMENTS: 35 (FIVE YEARS: 11)

H-INDEX: 8 (FIVE YEARS: 1)

2021 ◽ Vol 45 (2) ◽ pp. 253-260
Author(s): I.V. Zenkov ◽ A.V. Lapko ◽ V.A. Lapko ◽ S.T. Im ◽ V.P. Tuboltsev ◽ ...

A nonparametric algorithm for the automatic classification of large statistical data sets is proposed. The algorithm is based on a procedure for optimal discretization of the range of values of a random variable. A class is a compact group of observations of a random variable corresponding to a unimodal fragment of the probability density. The algorithm "compresses" the initial information by decomposing the multidimensional attribute space: a large statistical sample is transformed into a data array composed of the centers of multidimensional sampling intervals and the corresponding frequencies with which observations fall into them. To substantiate the optimal discretization procedure, the asymptotic properties of a kernel-type regression estimate of the probability density are used. The optimal number of sampling intervals for the range of values of one- and two-dimensional random variables is determined from the condition of the minimum root-mean-square deviation of the regression probability density estimate. These results are generalized to the discretization of the range of values of a multidimensional random variable. The optimal discretization formula contains a component characterized by a nonlinear functional of the probability density. An analytical dependence of this component on the antikurtosis coefficient of a one-dimensional random variable is established. For independent components of a multidimensional random variable, a methodology is developed for estimating the optimal number of sampling intervals and their lengths. On this basis, a nonparametric algorithm for automatic classification is developed. It is based on a sequential procedure that checks the proximity of the centers of multidimensional sampling intervals and the relationships between the frequencies with which observations from the original sample fall into these intervals. To further increase the computational efficiency of the proposed automatic classification algorithm, a multithreaded software implementation is used. The practical significance of the developed algorithms is confirmed by the results of their application in processing remote sensing data.
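The compression step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `compress_sample` and `group_adjacent_cells` are hypothetical, the number of intervals `n_bins` is simply passed in (the abstract's optimal discretization formula is not reproduced here), and the grouping rule is a simplified stand-in that merges neighboring non-empty cells and ignores the frequency relationships the abstract mentions.

```python
from itertools import product

import numpy as np


def compress_sample(x, n_bins):
    """Replace an (n, k) sample by the non-empty grid cells, their centers and frequencies.

    n_bins is the number of sampling intervals per axis; it is assumed to be given.
    """
    x = np.asarray(x, dtype=float)
    counts, edges = np.histogramdd(x, bins=n_bins)
    occupied = np.argwhere(counts > 0)                 # integer indices of non-empty cells
    mids = [(e[:-1] + e[1:]) / 2.0 for e in edges]     # interval midpoints per axis
    centers = np.stack(
        [mids[d][occupied[:, d]] for d in range(x.shape[1])], axis=1
    )
    freqs = counts[tuple(occupied.T)].astype(int)      # observations per non-empty cell
    return occupied, centers, freqs


def group_adjacent_cells(occupied):
    """Give the same label to non-empty cells that touch (indices differ by at most 1 per axis).

    Simplified proximity check: cell frequencies are not compared here.
    """
    index_of = {tuple(c): i for i, c in enumerate(occupied.tolist())}
    labels = [-1] * len(occupied)
    offsets = list(product((-1, 0, 1), repeat=occupied.shape[1]))
    next_label = 0
    for start in range(len(occupied)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:                                    # flood fill over neighboring cells
            i = stack.pop()
            for off in offsets:
                j = index_of.get(tuple((occupied[i] + off).tolist()))
                if j is not None and labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return np.array(labels)
```

After compression, the classification step works only on the non-empty cells rather than on the full sample, which is where the saving for large data sets comes from. Each observation can then inherit the label of the cell it falls into.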


2020 ◽ Vol 223 ◽ pp. 02012
Author(s): Ekaterina Chzhan

The article deals with the problem of modeling stochastic processes under uncertainty. The processes under consideration are distinctive in that the researcher has no information about the mathematical structure of the object, which is therefore treated as a black box. The article proposes a nonparametric modeling algorithm based on a nonparametric estimate of the regression function built from observations. To improve modeling accuracy, an algorithm for generating training samples is proposed. It differs from its previous modification in how the essential variables are determined. Computational experiments have shown the effectiveness of the proposed algorithms.
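A nonparametric estimate of a regression function from observations can be sketched as follows. The abstract does not name the estimator, so this example assumes the Nadaraya-Watson form with a Gaussian kernel. The function name `kernel_regression` and the parameter `bandwidth` are hypothetical, and the training-sample generation and the selection of essential variables are not covered.

```python
import numpy as np


def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (an assumed choice of estimator).

    x_train: (n, k) inputs, y_train: (n,) outputs, x_query: (m, k) points to predict.
    bandwidth: positive smoothing parameter, a scalar or a length-k array.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Scaled differences between every query point and every observation: (m, n, k)
    diff = (x_query[:, None, :] - x_train[None, :, :]) / bandwidth
    weights = np.exp(-0.5 * np.sum(diff**2, axis=2))   # kernel weights, shape (m, n)
    # Weighted average of observed outputs; the weights can underflow to zero
    # for query points far from all observations relative to the bandwidth.
    return weights @ y_train / weights.sum(axis=1)
```

The estimate uses only the observed input-output pairs and a smoothing parameter, which matches the black-box setting: no parametric structure of the object is assumed.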


Author(s): N V Koplyarova ◽ E A Chzhan ◽ A V Medvedev ◽ A A Korneeva ◽ A V Raskina ◽ ...
