probability density estimation
Recently Published Documents


TOTAL DOCUMENTS: 202 (FIVE YEARS: 23)

H-INDEX: 25 (FIVE YEARS: 1)

PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259111
Author(s):  
Frank Kwasniok

A comprehensive methodology for semiparametric probability density estimation is introduced and explored. The probability density is modelled by sequences of mostly regular or steep exponential families generated by flexible sets of basis functions, possibly including boundary terms. Parameters are estimated by global maximum likelihood without any roughness penalty. A statistically orthogonal formulation of the inference problem and a numerically stable and fast convex optimization algorithm for its solution are presented. Automatic model selection over the type and number of basis functions is performed with the Bayesian information criterion. The methodology can naturally be applied to densities supported on bounded, infinite or semi-infinite domains without boundary bias. Relationships to the truncated moment problem and the moment-constrained maximum entropy principle are discussed and a new theorem on the existence of solutions is contributed. The new technique compares very favourably to kernel density estimation, the diffusion estimator, finite mixture models and local likelihood density estimation across a diverse range of simulation and observation data sets. The semiparametric estimator combines a very small mean integrated squared error with a high degree of smoothness which allows for a robust and reliable detection of the modality of the probability density in terms of the number of modes and bumps.
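A minimal sketch of the core idea, exponential-family density estimation by global maximum likelihood, can clarify the abstract. The polynomial basis, the bounded support [0, 1] and the sample data below are illustrative assumptions, not the paper's actual basis sets or optimization algorithm:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=500)          # samples supported on [0, 1]

def basis(t, degree=3):
    # simple polynomial basis t, t^2, ..., t^degree (a stand-in for the
    # paper's flexible basis sets, possibly including boundary terms)
    return np.vstack([t**k for k in range(1, degree + 1)])

grid = np.linspace(0.0, 1.0, 2001)

def neg_log_likelihood(theta, x):
    # exponential-family density p(t) = exp(theta . phi(t)) / Z(theta)
    log_unnorm = theta @ basis(grid)
    Z = trapezoid(np.exp(log_unnorm), grid)   # numerical normalizer
    return -(theta @ basis(x)).mean() + np.log(Z)

# global maximum likelihood, no roughness penalty
res = minimize(neg_log_likelihood, np.zeros(3), args=(x,), method="BFGS")
theta_hat = res.x
log_unnorm = theta_hat @ basis(grid)
p_hat = np.exp(log_unnorm) / trapezoid(np.exp(log_unnorm), grid)
```

Model selection in the spirit of the paper would then compare BIC values, d * log(n) + 2 * n * NLL, across candidate basis types and degrees d.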


2021 ◽  
Vol 2125 (1) ◽  
pp. 012032
Author(s):  
Ning Li ◽  
Junfeng Duan ◽  
Jun Ma ◽  
Wei Qiu ◽  
Wei Zhang ◽  
...  

Abstract Electric energy metering equipment (EEME) operating in extreme environments can fail earlier than designed. A multi-kernel Gaussian process regression model that uses measurement error data to predict the remaining useful life (RUL) of EEME is proposed. First, the Gaussian kernel and the periodic kernel are used to match the health index trend of EEME under a variety of typical environmental stresses. Then, the Bayesian method and the Markov chain Monte Carlo method are used to solve the model, and the Weibull distribution is used to fit the posterior trajectory to obtain a probability density estimate of the RUL.
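A hedged sketch of the multi-kernel ingredient: a Gaussian (RBF) kernel plus a periodic kernel fitted to a synthetic drifting health index. The kernel settings, time scale and data are illustrative; the paper's Bayesian/MCMC solution and the Weibull RUL fit are not reproduced here:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 80)[:, None]   # time in service (arbitrary units)
# drifting health index: trend + seasonal component + measurement noise
y = 0.5 * t.ravel() + 0.1 * np.sin(2 * np.pi * t.ravel()) \
    + 0.02 * rng.standard_normal(80)

# multi-kernel covariance: Gaussian kernel + periodic kernel + noise term
kernel = RBF(length_scale=3.0) \
    + ExpSineSquared(length_scale=1.0, periodicity=1.0) \
    + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)

# extrapolate the health index beyond the observed horizon
t_future = np.linspace(10, 15, 50)[:, None]
mean, std = gp.predict(t_future, return_std=True)
```

From posterior trajectories like these, RUL samples could be obtained by recording when each trajectory crosses a failure threshold and then fitting a Weibull distribution (e.g. with `scipy.stats.weibull_min.fit`) to those crossing times.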


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Ruihan Cheng ◽  
Longfei Zhang ◽  
Shiqi Wu ◽  
Sen Xu ◽  
Shang Gao ◽  
...  

Class imbalance learning (CIL) is an important branch of machine learning because, in general, classification models struggle to learn from imbalanced data, while skewed data distributions frequently arise in real-world applications. In this paper, we introduce a novel CIL solution called the Probability Density Machine (PDM). First, in the context of the Gaussian Naive Bayes (GNB) predictive model, we analyze theoretically why an imbalanced data distribution degrades predictive performance, and conclude that the impact of class imbalance is associated only with the prior probabilities, not with the class-conditional probabilities of the training data. In this context, we then show the rationality of several traditional CIL techniques, and point out the drawback of combining GNB with them. Next, borrowing the idea of K-nearest-neighbors probability density estimation (KNN-PDE), we propose the PDM, an improved GNB-based CIL algorithm. Finally, we conduct experiments on numerous class-imbalanced data sets, and the proposed PDM algorithm shows promising results.
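The KNN-PDE ingredient that PDM borrows is easy to sketch: the density at a point is estimated as k over n times the volume of the smallest ball containing the k nearest neighbors. The 1-D Gaussian data below are an illustrative assumption, not the paper's experimental setup:

```python
import numpy as np

def knn_density(query, sample, k=10):
    # KNN density estimate: p(x) ~ k / (n * V), where V is the volume
    # (in 1-D, the length 2*r) of the ball reaching the k-th nearest
    # neighbour of x
    sample = np.asarray(sample)
    n = len(sample)
    dens = []
    for x in np.atleast_1d(query):
        r_k = np.sort(np.abs(sample - x))[k - 1]  # distance to k-th NN
        dens.append(k / (n * 2.0 * r_k))
    return np.array(dens)

rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, 2000)
p_hat = knn_density([0.0, 2.0], data, k=50)
# the estimate should be larger at the mode (x=0) than in the tail (x=2)
```

In a CIL setting, such per-class density estimates are useful precisely because they bypass the prior probabilities that, as the paper argues, carry the damage of class imbalance in GNB.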


Author(s):  
Hien Duy Nguyen ◽  
TrungTin Nguyen ◽  
Faicel Chamroukhi ◽  
Geoffrey John McLachlan

Abstract Mixture of experts (MoE) models are widely applied to conditional probability density estimation problems. We demonstrate the richness of the class of MoE models by proving denseness results in Lebesgue spaces when the input and output variables are both compactly supported. We further prove an almost uniform convergence result when the input is univariate. Auxiliary lemmas are proved regarding the richness of the soft-max gating function class and its relationship to the class of Gaussian gating functions.
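A small numeric sketch of the model class studied here: the conditional density of a soft-max-gated mixture of Gaussian experts. The two experts and their parameters below are arbitrary illustrations, not constructions from the paper's proofs:

```python
import numpy as np

def moe_conditional_density(y, x, gate_w, gate_b, mean_w, mean_b, sigmas):
    # soft-max gating weights pi_k(x), numerically stabilized
    logits = gate_w * x + gate_b
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    # Gaussian experts with input-dependent means mu_k(x)
    mu = mean_w * x + mean_b
    norm = np.exp(-0.5 * ((y - mu) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    # conditional density p(y | x) = sum_k pi_k(x) N(y; mu_k(x), sigma_k^2)
    return float(pi @ norm)

# a two-expert example
p = moe_conditional_density(
    y=0.5, x=1.0,
    gate_w=np.array([1.0, -1.0]), gate_b=np.array([0.0, 0.0]),
    mean_w=np.array([0.5, -0.5]), mean_b=np.array([0.0, 0.0]),
    sigmas=np.array([0.3, 0.3]),
)
```

For any fixed x this is a proper density in y; the denseness results say that, with enough such experts, densities of this form can approximate a broad class of compactly supported conditional densities.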


2021 ◽  
Author(s):  
Antoni Torres-Signes ◽  
M. Pilar Frías ◽  
Maria Dolores Ruiz-Medina

Abstract A multiple-objective space-time forecasting approach is presented, involving cyclical curve log-regression and multivariate time series spatial residual correlation analysis. Specifically, the mean quadratic loss function is minimized in the framework of trigonometric regression, while in the subsequent spatial residual correlation analysis, maximization of the likelihood allows us to compute the posterior mode in a Bayesian multivariate time series soft-data framework. The presented approach is applied to the analysis of COVID-19 mortality in the first wave affecting the Spanish Communities, from March 8, 2020 until May 13, 2020. An empirical comparative study with Machine Learning (ML) regression, based on random k-fold cross-validation, bootstrapped confidence intervals and probability density estimation, is carried out. This empirical analysis also investigates the performance of ML regression models in hard- and soft-data frameworks. The results could be extrapolated to other counts, countries and subsequent COVID-19 waves.
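The trigonometric-regression step, minimizing the mean quadratic loss over sine/cosine terms, can be sketched by ordinary least squares. The weekly period, the trend and the synthetic series below are illustrative assumptions standing in for the paper's mortality curves:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(60, dtype=float)   # days since the start of the wave
period = 7.0                     # assumed weekly reporting cycle
y = 50 + 0.5 * t + 10 * np.sin(2 * np.pi * t / period) \
    + rng.normal(0.0, 1.0, 60)

# design matrix: intercept, linear trend, first harmonic (sin and cos)
X = np.column_stack([
    np.ones_like(t),
    t,
    np.sin(2 * np.pi * t / period),
    np.cos(2 * np.pi * t / period),
])
# least squares = minimizing the mean quadratic loss
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
```

Cyclical curve log-regression would apply the same design to log counts; the paper's Bayesian spatial residual analysis then operates on the residuals y - fitted across regions.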


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1080
Author(s):  
Namuk Park ◽  
Songkuk Kim

Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., when its probability distribution changes over time. Statistical models for non-stationary data streams demand agile adaptation to concept drift while tolerating temporal fluctuations; to this end, a statistical model needs to forget old data samples and to detect concept drift swiftly. In this paper, we propose FlexSketch, an online probability density estimation algorithm for data streams. Our algorithm uses an ensemble of histograms, each of which represents a different length of data history. FlexSketch updates each histogram for every new data sample and generates a probability distribution by combining the ensemble of histograms, while periodically monitoring the discrepancy between recent data and the existing models. When it detects concept drift, a new histogram is added to the ensemble and the oldest histogram is removed. This allows the probability density function to be estimated with high update speed and high accuracy using only limited memory. Experimental results demonstrate that our algorithm improves on existing methods in both speed and accuracy for stationary and non-stationary data streams.
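A much-simplified sketch of the histogram-ensemble idea: keep several histograms over different history lengths and average them into one density estimate. Drift detection, histogram replacement and the paper's actual combination rule are omitted; the bin count, value range and data stream are illustrative assumptions:

```python
import numpy as np
from collections import deque

class HistogramEnsemble:
    """Ensemble of fixed-bin histograms over different history lengths."""

    def __init__(self, n_hists=3, bins=20, lo=0.0, hi=1.0, window=200):
        self.edges = np.linspace(lo, hi, bins + 1)
        # each member remembers a progressively longer slice of the stream
        self.windows = [deque(maxlen=window * (i + 1)) for i in range(n_hists)]

    def update(self, x):
        # append the new sample to every member's sliding window
        for w in self.windows:
            w.append(x)

    def pdf(self, x):
        # combine the members' normalized histograms with equal weights
        dens = []
        for w in self.windows:
            counts, _ = np.histogram(list(w), bins=self.edges, density=True)
            idx = np.clip(np.searchsorted(self.edges, x, side="right") - 1,
                          0, len(counts) - 1)
            dens.append(counts[idx])
        return float(np.mean(dens))

est = HistogramEnsemble()
rng = np.random.default_rng(4)
for v in rng.uniform(0.2, 0.4, 500):   # stream concentrated on [0.2, 0.4]
    est.update(v)
```

Because the members span different history lengths, short windows react quickly after a drift while long windows keep estimates smooth when the stream is stationary, which is the trade-off FlexSketch manages explicitly.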

