histogram estimation
Recently Published Documents


TOTAL DOCUMENTS

40
(FIVE YEARS 8)

H-INDEX

7
(FIVE YEARS 2)

Author(s):  
Qibin Zhou ◽  
Qingang Su ◽  
Dingyu Yang

Real-time traffic estimation focuses on predicting the travel time of a given travel path, which helps drivers select an appropriate or preferred path. Statistical analysis and neural network approaches have been explored to predict travel time from massive volumes of traffic data. These methods need to be updated when the traffic varies frequently, which incurs tremendous overhead. We build a system, RealTE, implemented on Storm, a popular open-source streaming system, to quickly process high-speed trajectory data. In RealTE, we propose a locality-sensitive partition and deployment algorithm for a large road network. A histogram estimation approach is adopted to predict the traffic. This approach is general and can be updated incrementally in parallel. Extensive experiments are conducted on six real road networks, and the results show that RealTE achieves higher throughput and lower prediction error than existing methods. The runtime of a traffic estimation is less than 11 seconds over a large road network, and it takes only 619 microseconds for model updates.
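
To illustrate why a histogram model can be updated incrementally and in parallel, here is a minimal Python sketch. It is not the RealTE implementation: the class name `TravelTimeHistogram`, the fixed bin edges, and the midpoint-based estimate are all assumptions made for illustration only.

```python
from bisect import bisect_right


class TravelTimeHistogram:
    """Fixed-edge histogram of travel times (seconds) for one road segment.

    Hypothetical helper for illustration; not taken from RealTE.
    """

    def __init__(self, edges):
        # edges: strictly increasing bin boundaries (an assumed design choice).
        self.edges = list(edges)
        self.counts = [0] * (len(self.edges) - 1)

    def update(self, travel_time):
        # Incremental update: one binary search and one counter increment.
        # Out-of-range values are clamped into the first or last bin.
        i = bisect_right(self.edges, travel_time) - 1
        i = max(0, min(i, len(self.counts) - 1))
        self.counts[i] += 1

    def merge(self, other):
        # Histograms built in parallel over the same edges combine by
        # adding their counts, so workers never need to share raw data.
        if self.edges != other.edges:
            raise ValueError("histograms must share bin edges")
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]

    def estimate(self):
        # Point estimate: count-weighted mean of the bin midpoints.
        total = sum(self.counts)
        if total == 0:
            return None
        mids = [(lo + hi) / 2 for lo, hi in zip(self.edges, self.edges[1:])]
        return sum(m * c for m, c in zip(mids, self.counts)) / total


# Two workers observe different trajectories on the same segment.
a = TravelTimeHistogram([0, 30, 60, 120])
b = TravelTimeHistogram([0, 30, 60, 120])
for t in (12, 41, 47):
    a.update(t)
for t in (55, 90):
    b.update(t)
a.merge(b)
```

Because merging is plain count addition, the order in which partial histograms are combined does not matter, which is what makes this kind of model convenient in a streaming topology.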


Mathematics ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. 1090 ◽  
Author(s):  
Branislav Panić ◽  
Jernej Klemenc ◽  
Marko Nagode

Maximum-likelihood estimation of a multivariate mixture model’s parameters is a difficult problem. One approach is to combine the REBMIX and EM algorithms. However, the REBMIX algorithm requires the use of histogram estimation, which is the most rudimentary approach to empirical density estimation and has many drawbacks. Nevertheless, because of its simplicity, it is still one of the most commonly used techniques. The main problem is to estimate the optimum histogram-bin width, which is usually set by the number of non-overlapping, regularly spaced bins. For univariate problems it is usually denoted by an integer value; i.e., the number of bins. However, for multivariate problems, in order to obtain a histogram estimation, a regular grid must be formed. Thus, to obtain the optimum histogram estimation, an integer-optimization problem must be solved. The aim is therefore the estimation of optimum histogram binning, alone and in application to the mixture model parameter estimation with the REBMIX&EM strategy. As an estimator, the Knuth rule was used. For the optimization algorithm, a variant of coordinate-descent optimization was constructed. These proposals yielded promising results. The optimization algorithm was efficient and the results were accurate. When applied to the multivariate, Gaussian-mixture-model parameter estimation, the results were competitive. All the improvements were implemented in the rebmix R package.
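
The integer-optimization problem described above can be sketched in Python as follows. This is an illustrative sketch and not the rebmix code: the function names `knuth_score` and `coordinate_descent_bins`, the exhaustive search along each coordinate, and the `max_bins` limit are assumptions. The score is the Knuth log-posterior up to an additive constant, N log M + log Γ(M/2) − M log Γ(1/2) − log Γ(N + M/2) + Σ log Γ(n_k + 1/2), where M is the total number of grid cells and n_k are the cell counts.

```python
from collections import Counter
from math import lgamma, log


def knuth_score(data, bins):
    """Knuth log-posterior (up to a constant) for a regular grid.

    data: list of equal-length tuples; bins: bin count per dimension.
    """
    n, dims = len(data), len(bins)
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    def cell(p):
        idx = []
        for d in range(dims):
            span = highs[d] - lows[d]
            k = int((p[d] - lows[d]) / span * bins[d]) if span > 0 else 0
            idx.append(min(k, bins[d] - 1))  # the maximum falls in the last bin
        return tuple(idx)

    counts = Counter(cell(p) for p in data)
    m = 1
    for b in bins:
        m *= b
    # Empty cells contribute lgamma(0.5) each, which cancels against the
    # -m * lgamma(0.5) term, so only occupied cells need to be visited.
    occupied = sum(lgamma(c + 0.5) - lgamma(0.5) for c in counts.values())
    return n * log(m) + lgamma(m / 2) - lgamma(n + m / 2) + occupied


def coordinate_descent_bins(data, max_bins=30, start=1):
    """Optimize one dimension's bin count at a time until no score improves."""
    bins = [start] * len(data[0])
    best = knuth_score(data, bins)
    improved = True
    while improved:
        improved = False
        for d in range(len(bins)):
            for candidate in range(1, max_bins + 1):
                trial = bins[:d] + [candidate] + bins[d + 1:]
                score = knuth_score(data, trial)
                if score > best:
                    best, bins, improved = score, trial, True
    return tuple(bins), best
```

Searching one dimension at a time costs on the order of `max_bins` score evaluations per dimension per sweep, instead of `max_bins` raised to the number of dimensions for a full grid search, at the price of possibly stopping at a local optimum.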


2019 ◽  
Vol 31 (4) ◽  
pp. 655-669 ◽  
Author(s):  
Yiwen Nie ◽  
Wei Yang ◽  
Liusheng Huang ◽  
Xike Xie ◽  
Zhenhua Zhao ◽  
...  

2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Bai Li ◽  
Vishesh Karwa ◽  
Aleksandra Slavković ◽  
Rebecca Carter Steorts

Differential privacy has emerged as a popular model to provably limit privacy risks associated with a given data release. However, releasing high-dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the form of a high-dimensional histogram under the constraint of differential privacy. We develop an $(\epsilon, \delta)$-differentially private categorical data synthesizer called Stability Based Hashed Gibbs Sampler (SBHG). SBHG works by combining a stability-based sparse histogram estimation algorithm with Gibbs sampling and feature selection to approximate the empirical joint distribution of a discrete dataset. SBHG offers a competitive alternative to state-of-the-art synthetic data generators while preserving the sparsity structure of the original dataset, which leads to improved statistical utility, as illustrated on simulated data. Finally, to study the utility of the synthetic datasets generated by SBHG, we also perform logistic regression using the synthetic datasets and compare the classification accuracy with that obtained using the original dataset.
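
The stability-based sparse histogram idea can be sketched in Python as below. This is the generic textbook construction (Laplace noise on occupied cells followed by a threshold), not SBHG itself, which additionally uses hashing, Gibbs sampling, and feature selection. The function name `stability_histogram`, the noise scale 2/ε, and the threshold 1 + 2 ln(2/δ)/ε are assumptions taken from that standard construction rather than from the paper.

```python
import math
import random
from collections import Counter


def stability_histogram(records, epsilon, delta, rng=None):
    """Release noisy counts for occupied cells only, dropping small ones.

    records: iterable of hashable cells (for example, tuples of categories).
    """
    if rng is None:
        rng = random.Random()
    scale = 2.0 / epsilon  # assumed Laplace scale for raw counts
    threshold = 1.0 + 2.0 * math.log(2.0 / delta) / epsilon
    released = {}
    for cell, count in Counter(records).items():
        # The difference of two exponentials with mean `scale`
        # is a Laplace(0, scale) sample.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy = count + noise
        # Cells never observed are never touched, which keeps the output
        # sparse; occupied cells with small noisy counts are suppressed.
        if noisy > threshold:
            released[cell] = noisy
    return released
```

Only cells that actually occur in the data are visited, so the cost scales with the number of distinct records and not with the size of the full high-dimensional domain. A production release would also need care with floating-point noise sampling and output ordering, which this sketch ignores.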


2018 ◽  
Vol 69 ◽  
pp. 426-434 ◽  
Author(s):  
Amira S. Ashour ◽  
Yanhui Guo ◽  
Enver Kucukkulahli ◽  
Pakize Erdogmus ◽  
Kemal Polat
