Comparison of the Efficiency of Methods of Digitizing the Range of Values of Dependent Random Variables During Synthesis of a Nonparametric Assessment of Two-Dimensional Probability Density

2017 ◽  
Vol 60 (4) ◽  
pp. 325-330
Author(s):  
A. V. Lapko ◽  
V. A. Lapko
2021 ◽  
pp. 9-14
Author(s):  
Aleksandr V. Lapko ◽  
Vasiliy A. Lapko

The influence of information about the dependence of random variables on the approximation properties of a nonparametric probability density estimate of the Rosenblatt-Parzen type is determined. The ratio of the asymptotic expressions for the mean square deviations of the kernel estimates for independent and dependent random variables is obtained. For a two-dimensional random variable, this ratio serves as a quantitative measure of how information about the dependence of its components affects the approximation properties of the kernel probability density estimate. The ratio is determined by the form of the probability density and by the volumes of the initial statistical data used in estimating the probability densities of dependent and independent random variables. The general results are examined in detail for two-dimensional linearly dependent random variables with normal distribution laws. The functional dependence of this ratio on the correlation coefficient is determined, and its dependence on the volume of statistical data is analyzed. A method is developed for estimating the functional of the second derivatives of the probability density of two-dimensional random variables with normal distribution laws. The results provide a basis for developing modifications of “fast” procedures for optimizing kernel estimates of probability densities for large samples.
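
As a minimal sketch of the kind of estimator discussed in this abstract, the following Python code evaluates a two-dimensional Rosenblatt-Parzen estimate with a product Gaussian kernel. The function names, the Gaussian kernel choice, and the bandwidth arguments `c1` and `c2` are illustrative assumptions; the abstract does not specify them.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel function (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_density_2d(sample, x1, x2, c1, c2):
    """Product-kernel estimate of a 2D density at the point (x1, x2).

    sample: list of (x1_i, x2_i) observations
    c1, c2: positive bandwidths for each coordinate (assumed to be given)
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    if c1 <= 0 or c2 <= 0:
        raise ValueError("bandwidths must be positive")
    total = 0.0
    for s1, s2 in sample:
        # Each observation contributes a product of one-dimensional kernels.
        total += gaussian_kernel((x1 - s1) / c1) * gaussian_kernel((x2 - s2) / c2)
    return total / (len(sample) * c1 * c2)
```

A product kernel treats each coordinate separately, so any dependence between the components enters only through the joint placement of the observations.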


2021 ◽  
Vol 45 (2) ◽  
pp. 253-260
Author(s):  
I.V. Zenkov ◽  
A.V. Lapko ◽  
V.A. Lapko ◽  
S.T. Im ◽  
V.P. Tuboltsev ◽  
...  

A nonparametric algorithm for the automatic classification of large statistical data sets is proposed. The algorithm is based on a procedure for optimal discretization of the range of values of a random variable. A class is a compact group of observations of a random variable corresponding to a unimodal fragment of the probability density. The algorithm relies on “compression” of the initial information through decomposition of the multidimensional attribute space. As a result, a large statistical sample is transformed into a data array composed of the centers of multidimensional sampling intervals and the corresponding frequencies of random variables. To substantiate the optimal discretization procedure, we use the results of a study of the asymptotic properties of a kernel-type regression estimate of the probability density. The optimal number of sampling intervals for the range of values of one- and two-dimensional random variables is determined from the condition of the minimum root-mean-square deviation of the regression probability density estimate. The results are generalized to the discretization of the range of values of a multidimensional random variable. The optimal discretization formula contains a component characterized by a nonlinear functional of the probability density. An analytical dependence of this component on the antikurtosis coefficient of a one-dimensional random variable is established. For independent components of a multidimensional random variable, a methodology is developed for estimating the optimal number of sampling intervals and their lengths. On this basis, a nonparametric algorithm for automatic classification is developed. It is based on a sequential procedure that checks the proximity of the centers of multidimensional sampling intervals and the relationships between the frequencies with which random variables from the original sample fall into these intervals. To further increase the computational efficiency of the proposed algorithm, a multithreaded software implementation is used. The practical significance of the developed algorithms is confirmed by the results of their application in processing remote sensing data.
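
The “compression” step described in this abstract can be sketched as follows for the two-dimensional case. This is only an illustration under stated assumptions: the function name `compress_sample` is hypothetical, the grid is uniform, and the number of intervals `n_bins` is supplied by the caller rather than computed from the optimal discretization formula, which the abstract does not reproduce.

```python
from collections import Counter


def compress_sample(sample, n_bins):
    """Replace a 2D sample by the centers of occupied grid cells and their counts.

    sample: list of (x, y) observations
    n_bins: number of intervals per axis (assumed supplied by the caller)
    Returns a dict mapping cell center (cx, cy) -> frequency.
    """
    if not sample or n_bins < 1:
        raise ValueError("need a non-empty sample and n_bins >= 1")
    xs = [p[0] for p in sample]
    ys = [p[1] for p in sample]
    x_min, y_min = min(xs), min(ys)
    # Guard against zero-width ranges so the interval length stays positive.
    dx = (max(xs) - x_min) / n_bins or 1.0
    dy = (max(ys) - y_min) / n_bins or 1.0

    counts = Counter()
    for x, y in sample:
        # Clamp the index so the maximum value falls into the last interval.
        i = min(int((x - x_min) / dx), n_bins - 1)
        j = min(int((y - y_min) / dy), n_bins - 1)
        counts[(i, j)] += 1

    return {
        (x_min + (i + 0.5) * dx, y_min + (j + 0.5) * dy): freq
        for (i, j), freq in counts.items()
    }
```

Only occupied cells are stored, so the size of the compressed array is bounded by the number of cells rather than by the sample size.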


2021 ◽  
pp. 14-20
Author(s):  
Aleksandr V. Lapko ◽  
Vasiliy A. Lapko

A method for estimating the nonlinear functional of the probability density of a two-dimensional random variable is proposed. It is relevant to procedures for fast bandwidth selection in the optimization of kernel probability density estimates. Solving this problem makes it possible to significantly improve the computational efficiency of nonparametric decision rules. The proposed approach is based on an analysis of the formula for the optimal bandwidth of the kernel probability density estimate. Here, the bandwidth of each kernel function is represented as the product of an undefined parameter and the standard deviation of the corresponding random variable. The main component of the undefined parameter is a nonlinear functional of the probability density. This functional is determined by the type of probability density and does not depend on the density parameters. For a family of two-dimensional lognormal distribution laws of independent random variables, the approximation errors of the considered nonlinear functional of the probability density are determined. The possibility of applying the proposed methodology to nonlinear functionals of probability densities that differ from the lognormal distribution laws is investigated. The effect of the resulting approximation errors on the root-mean-square criteria for restoring a nonparametric estimate of the probability density of a two-dimensional random variable is analyzed.
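
The bandwidth representation mentioned in this abstract, a common parameter multiplied by the spread of each variable, can be sketched as below. The names `bandwidths_2d` and `normal_reference_parameter` are hypothetical. When no parameter `b` is passed, the code falls back to the well-known normal reference rule for a Gaussian kernel; this is a stand-in assumption, not the authors' estimate of the nonlinear functional.

```python
import math


def sample_std(values):
    """Unbiased sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def normal_reference_parameter(n, dim=2):
    """Normal reference value of the common parameter b (a stand-in assumption)."""
    return (4.0 / ((dim + 2) * n)) ** (1.0 / (dim + 4))


def bandwidths_2d(sample, b=None):
    """Return (c1, c2) with c_v = b * sigma_v for a list of (x1, x2) pairs."""
    if b is None:
        b = normal_reference_parameter(len(sample), dim=2)
    s1 = sample_std([p[0] for p in sample])
    s2 = sample_std([p[1] for p in sample])
    return b * s1, b * s2
```

Separating the scale of each variable from the single parameter `b` reduces bandwidth selection to a one-dimensional problem, which is what makes a fast selection rule plausible.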


2006 ◽  
Vol 38 (3) ◽  
pp. 693-728 ◽  
Author(s):  
A. D. Barbour ◽  
Aihua Xia

In this paper, we adapt the very effective Berry-Esseen theorems of Chen and Shao (2004), which apply to sums of locally dependent random variables, for use with randomly indexed sums. Our particular interest is in random variables resulting from integrating a random field with respect to a point process. We illustrate the use of our theorems in three examples: in a rather general model of the insurance collective; in problems in geometrical probability involving stabilizing functionals; and in counting the maximal points in a two-dimensional region.


2020 ◽  
pp. 9-13
Author(s):  
A. V. Lapko ◽  
V. A. Lapko

An original technique is substantiated for the fast bandwidth selection of kernel functions in a nonparametric estimate of the multidimensional probability density of the Rosenblatt–Parzen type. Compared with traditional approaches, the proposed method makes it possible to significantly increase the computational efficiency of the optimization procedure for kernel probability density estimates when the volume of statistical data is large. The proposed approach is based on an analysis of the formula for the optimal bandwidths of a multidimensional kernel probability density estimate. The dependence of the nonlinear functional of the probability density and its derivatives, up to and including the second order, on the antikurtosis coefficients of random variables is found. The bandwidth for each random variable is represented as the product of an undefined parameter and the standard deviation of that variable. The influence of the error in restoring the established functional dependencies on the approximation properties of the kernel probability density estimate is determined. The results are implemented as a method for the synthesis and analysis of fast bandwidth selection for the kernel estimate of the two-dimensional probability density of independent random variables. This method uses data on the quantitative characteristics of a family of lognormal distribution laws.
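
For illustration, a sample antikurtosis coefficient can be computed as below. The abstract does not state the definition, so this sketch assumes the common one: the variance divided by the square root of the fourth central moment, which equals the reciprocal square root of the kurtosis. The mapping from this coefficient to the nonlinear functional is specific to the paper and is not reproduced here.

```python
import math


def antikurtosis(values):
    """Sample antikurtosis, assumed here to be sigma^2 / sqrt(mu4)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m4 == 0.0:
        raise ValueError("degenerate sample: all values are equal")
    return m2 / math.sqrt(m4)
```

Under this definition the coefficient lies between 0 and 1, with heavier-tailed samples giving smaller values.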

