An Improved Variable Kernel Density Estimator Based on L2 Regularization

Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 2004
Author(s):  
Yi Jin ◽  
Yulin He ◽  
Defa Huang

The kernel density estimator (KDE) aims to recover the underlying probability density function (p.d.f.) of a given dataset. The key to training a KDE is determining the optimal bandwidth, or Parzen window. In the fixed KDE (FKDE), all data points share one fixed bandwidth (a scalar for univariate KDE and a vector for multivariate KDE). In this paper, we propose an improved variable KDE (IVKDE) that determines the optimal bandwidth for each data point in the given dataset based on the integrated squared error (ISE) criterion with an L2 regularization term. An effective optimization algorithm is developed to solve the improved objective function. We compare the estimation performance of IVKDE with that of FKDE and of VKDE based on the ISE criterion without L2 regularization, on four univariate and four multivariate probability distributions. The experimental results show that IVKDE obtains lower estimation errors, demonstrating its effectiveness.
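To make the distinction between a fixed and a variable bandwidth concrete, the following is a minimal univariate sketch. It assumes a Gaussian kernel, and the function names and bandwidth values are hypothetical choices for illustration; the bandwidths are picked by hand and are not the result of the ISE-with-L2 optimization described in the abstract.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def variable_kde(x, data, bandwidths):
    """Univariate KDE in which data point i has its own bandwidth h_i."""
    if len(data) != len(bandwidths):
        raise ValueError("need exactly one bandwidth per data point")
    total = 0.0
    for xi, hi in zip(data, bandwidths):
        total += gaussian_kernel((x - xi) / hi) / hi
    return total / len(data)


def fixed_kde(x, data, h):
    """Fixed KDE: the special case where every point shares the bandwidth h."""
    return variable_kde(x, data, [h] * len(data))


# Illustrative data and hand-picked bandwidths (not optimized).
data = [-1.2, -0.4, 0.0, 0.3, 2.5]
per_point_h = [0.5, 0.4, 0.3, 0.4, 1.0]

density_fixed = fixed_kde(0.0, data, 0.5)
density_variable = variable_kde(0.0, data, per_point_h)
```

In this sketch the isolated point at 2.5 is given a wider bandwidth than the clustered points, which is the kind of per-point adaptation a variable KDE allows and a fixed KDE does not.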

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-17 ◽  
Author(s):  
Yulin He ◽  
Jie Jiang ◽  
Dexin Dai ◽  
Klohoun Fabrice

Probability density function (p.d.f.) estimation plays a very important role in the field of data mining. The kernel density estimator (KDE) is the most widely used technique for estimating the unknown p.d.f. of a given dataset. Existing KDEs are usually inefficient when handling p.d.f. estimation for stream data, because a brand-new KDE has to be retrained on the combination of the current data and the newly arriving data. This process increases the training time and wastes computational resources. This article proposes an incremental kernel density estimator (I-KDE) that handles p.d.f. estimation in a data-stream manner. The I-KDE updates the current KDE dynamically and gradually with the newly arriving data, rather than retraining a brand-new KDE on the combination of the current data and the newly arriving data. The theoretical analysis proves the convergence of the I-KDE only if the estimated p.d.f. of the newly arriving data converges to its true p.d.f. To guarantee the convergence of the I-KDE, a new multivariate fixed-point iteration algorithm based on the unbiased cross validation (UCV) method is developed to determine the optimal bandwidth of the KDE. The experimental results on 10 univariate and 4 multivariate probability distributions demonstrate the feasibility and effectiveness of the I-KDE.
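The idea of updating an existing estimate instead of retraining can be sketched as follows. This is only one plausible reading, under stated assumptions: each arriving batch gets its own Gaussian KDE, and the running estimate is the size-weighted mixture of the per-batch estimates. The class name `IncrementalKDE` is hypothetical, and Silverman's rule of thumb stands in for the UCV-based fixed-point bandwidth selection named in the abstract, which is not reproduced here.

```python
import math
import statistics


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def silverman_bandwidth(batch):
    """Rule-of-thumb bandwidth; a stand-in for the paper's UCV-based method."""
    if len(batch) < 2:
        raise ValueError("need at least two points to estimate a bandwidth")
    sigma = statistics.stdev(batch)
    if sigma == 0.0:
        raise ValueError("batch has zero spread")
    return 1.06 * sigma * len(batch) ** (-0.2)


class IncrementalKDE:
    """Hypothetical sketch: a mixture of per-batch KDEs, weighted by batch size."""

    def __init__(self):
        self.components = []  # list of (points, bandwidth) pairs, one per batch
        self.n = 0            # total number of points seen so far

    def update(self, batch):
        """Fold a new batch in without touching earlier batches."""
        h = silverman_bandwidth(batch)
        self.components.append((list(batch), h))
        self.n += len(batch)

    def pdf(self, x):
        """Evaluate the current mixture estimate at x."""
        if self.n == 0:
            raise ValueError("no data seen yet")
        total = 0.0
        for points, h in self.components:
            for xi in points:
                total += gaussian_kernel((x - xi) / h) / h
        return total / self.n


# Illustrative stream of two batches (made-up values).
estimator = IncrementalKDE()
estimator.update([0.1, 0.5, 0.9, 1.4])
estimator.update([2.0, 2.2, 3.1])
density_at_one = estimator.pdf(1.0)
```

Each call to `update` costs only a bandwidth computation on the new batch, so earlier batches are never revisited; the trade-off in this sketch is that the stored components grow with the stream.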

