Minimum distance histograms with universal performance guarantees

2019 ◽  
Vol 2 (2) ◽  
pp. 507-527 ◽  
Author(s):  
Raazesh Sainudiin ◽  
Gloria Teng

Abstract: We present a data-adaptive multivariate histogram estimator of an unknown density f based on n independent samples from it. Such histograms are based on binary trees called regular pavings (RPs). RPs represent a computationally convenient class of simple functions that remain closed under addition and scalar multiplication. Unlike other density estimation methods, including various regularization and Bayesian methods based on the likelihood, the minimum distance estimate (MDE) is guaranteed to be within an $L_1$ distance bound from f for a given n, no matter what the underlying f happens to be, and is thus said to have universal performance guarantees (Devroye and Lugosi, Combinatorial methods in density estimation. Springer, New York, 2001). Using a form of tree matrix arithmetic with RPs, we obtain the first generic constructions of an MDE, prove that it has universal performance guarantees and demonstrate its performance with simulated and real-world data. Our main contribution is a constructive implementation of an MDE histogram that can handle large multivariate data bursts using a tree-based partition that is computationally conducive to subsequent statistical operations.
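To make the $L_1$ distance mentioned in the abstract concrete, the following minimal Python sketch computes it between two piecewise-constant densities on a shared one-dimensional partition. It is not the regular-paving MDE construction of the paper; the function names `histogram_density` and `l1_distance`, the fixed bin edges, and the sample values are all assumptions chosen for illustration.

```python
def histogram_density(samples, edges):
    """Piecewise-constant density estimate on the bins defined by `edges`.

    Each bin is half-open, [lo, hi), so a sample equal to the last edge
    is not counted. Heights are count / (n * bin width).
    """
    n = len(samples)
    heights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        count = sum(1 for x in samples if lo <= x < hi)
        heights.append(count / (n * (hi - lo)))
    return heights


def l1_distance(heights_a, heights_b, edges):
    """L1 distance between two densities that are constant on the same bins."""
    return sum(
        abs(a - b) * (hi - lo)
        for a, b, lo, hi in zip(heights_a, heights_b, edges[:-1], edges[1:])
    )


# Hypothetical data: four samples on [0, 1) that all fall in the first of two bins.
edges = [0.0, 0.5, 1.0]
estimate = histogram_density([0.1, 0.2, 0.3, 0.4], edges)
uniform = [1.0, 1.0]  # the uniform density on [0, 1) expressed on the same bins
distance = l1_distance(estimate, uniform, edges)
```

For any two densities the $L_1$ distance lies between 0 and 2, which is why a bound on it that holds for every f is a meaningful guarantee.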

2021 ◽  
Vol 11 (15) ◽  
pp. 6748
Author(s):  
Hsun-Ping Hsieh ◽  
Fandel Lin ◽  
Jiawei Jiang ◽  
Tzu-Ying Kuo ◽  
Yu-En Chang

Public bike-sharing systems have been widely studied in recent years. Most existing work focuses on accurate short-term prediction for individual stations. This work, by contrast, aims to predict long-term bike rental/drop-off demands at given bike station locations in expansion areas, where real-world bike stations are mainly built in batches. To address the problem, we propose LDA (Long-Term Demand Advisor), a framework to estimate the long-term characteristics of newly established stations. In LDA, several engineering strategies are proposed to extract discriminative and representative features for long-term demands. Moreover, for original and newly established stations, we propose several feature extraction methods and an algorithm to model the correlations between urban dynamics and long-term demands. Our work is the first to address the long-term demand of new stations, providing the government with a tool to pre-evaluate the bike flow of new stations before deployment; this can avoid wasting resources such as personnel expense or budget. We evaluate on real-world data from New York City’s bike-sharing system, and show that our LDA framework outperforms baseline approaches.


2016 ◽  
Vol 91 (1-2) ◽  
pp. 141-159 ◽  
Author(s):  
Arthur Charpentier ◽  
Emmanuel Flachaire

Standard kernel density estimation methods are very often used in practice to estimate density functions. They work well in numerous cases. However, they are known not to work so well with skewed, multimodal and heavy-tailed distributions. Such features are common in income distributions, which are defined over the positive support. In this paper, we show that a preliminary logarithmic transformation of the data, combined with standard kernel density estimation methods, can provide a much better fit of the density.
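The log-transform idea can be sketched with the change-of-variables formula: if Y = log X has density $f_Y$, then X has density $f_X(x) = f_Y(\log x)/x$ for x > 0. The Python sketch below applies a plain Gaussian kernel estimator to the logged data and maps it back. The function names, the fixed bandwidth, and the sample values are assumptions for illustration, not the authors' implementation or their bandwidth choice.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a standard Gaussian kernel density estimate as a function."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - p) / bandwidth) ** 2) for p in points
        )

    return density


def log_transform_kde(samples, bandwidth):
    """Estimate a density on (0, inf) by running a KDE on log(samples).

    Uses the change of variables f_X(x) = f_Y(log x) / x with Y = log X.
    All samples must be strictly positive.
    """
    kde_log = gaussian_kde([math.log(x) for x in samples], bandwidth)

    def density(x):
        if x <= 0:
            return 0.0  # the estimate puts no mass outside the positive support
        return kde_log(math.log(x)) / x

    return density


# Hypothetical positive, right-skewed data (e.g. incomes in arbitrary units).
incomes = [0.5, 1.0, 2.0, 4.0, 8.0]
f_hat = log_transform_kde(incomes, bandwidth=0.5)
```

Because the back-transformed estimate is zero for x <= 0 by construction, it avoids the boundary leakage that a Gaussian kernel applied directly to positive data would show near zero.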


Author(s):  
M. P. Enright ◽  
R. C. McClung ◽  
S. J. Hudak ◽  
H. R. Millwater

The risk of fracture associated with high energy rotating components in aircraft gas turbine engines can be sensitive to small changes in applied stress values which are often difficult to measure and predict. Although a parametric approach is often used to characterize random variables, it is difficult to apply to multimodal densities. Nonparametric methods provide a direct fit to the data, and can be used to estimate the multimodal densities often associated with rainflow stress data. In this paper, a comparison of parametric and nonparametric methods is presented for density estimation of rainflow stress profiles associated with military aircraft gas turbine engine usages. A nonparametric adaptive kernel density estimator algorithm is illustrated for standard parametric probability density functions and for rainflow stress pairs associated with F-16/F100 engine usages. The kernel estimates are compared to parametric estimates, including a hybrid approach based on separate treatment of maximum stress pairs. The results provide some insight regarding the strengths and weaknesses of parametric and nonparametric density estimation methods for gas turbine engines, and can be used to develop improved stress estimates for probabilistic life predictions.


Author(s):  
Jian Li ◽  
Yuming Wang ◽  
Jing Wu ◽  
Jing-Wen Ai ◽  
Hao-Cheng Zhang ◽  
...  

Abstract: Public health interventions have been implemented to contain the outbreak of COVID-19 in New York City. However, an assessment of those interventions, e.g. social distancing and cloth face coverings, based on real-world data from field studies is lacking. The SEIR compartmental model was used to evaluate the effect of social distancing and cloth face coverings on the daily cumulative laboratory-confirmed cases in NYC and on COVID-19 transmissibility. The latter was measured by the reproduction number Rt in three phases, delimited by the two interventions implemented over the timeline. The transmissibility decreased from phase 1 to phase 3. The initial R0 was 4.60 in phase 1, without any intervention. After social distancing, the Rt value was reduced by 68%, while after the mask recommendation, it was further reduced by ~60%. The interventions resulted in a significant reduction of confirmed case numbers, relative to the values predicted by the SEIR model without intervention. Our findings highlight the effectiveness of social distancing and cloth face coverings in slowing down the spread of SARS-CoV-2 in NYC.
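As a rough illustration of how an SEIR compartmental model turns a reproduction number into case counts, the Python sketch below integrates the standard SEIR equations with a forward Euler step. Only R0 = 4.60 and the 68% reduction come from the abstract; the incubation and infectious periods, the population size, the initial case count, the time horizon, and the function name `simulate_seir` are assumptions made here, and this is not the authors' fitted model.

```python
def simulate_seir(beta, sigma, gamma, population, infectious0, days, dt=0.1):
    """Forward-Euler integration of the standard SEIR equations.

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period. Returns (S, E, I, R, cumulative cases).
    """
    s = float(population - infectious0)
    e = 0.0
    i = float(infectious0)
    r = 0.0
    cumulative = float(infectious0)
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        cumulative += new_infectious  # every newly infectious person is a case
    return s, e, i, r, cumulative


R0 = 4.60                 # phase 1 value reported in the abstract
SIGMA = 1.0 / 3.0         # assumed 3-day incubation period (illustrative)
GAMMA = 1.0 / 5.0         # assumed 5-day infectious period (illustrative)
POPULATION = 1_000_000    # hypothetical population, not NYC's actual size

# In this model R0 = beta / gamma, so a 68% cut in Rt scales beta by 0.32.
rt_after_distancing = R0 * (1.0 - 0.68)

baseline = simulate_seir(R0 * GAMMA, SIGMA, GAMMA, POPULATION, 10, days=120)
distanced = simulate_seir(
    rt_after_distancing * GAMMA, SIGMA, GAMMA, POPULATION, 10, days=120
)
```

Comparing `baseline[4]` with `distanced[4]` gives the kind of "with versus without intervention" gap in cumulative cases that the abstract describes, under the assumed parameters.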


2012 ◽  
Vol 2012 ◽  
pp. 1-24 ◽  
Author(s):  
Long Yu ◽  
Zhongqing Su

The present work concerns the estimation of the probability density function (p.d.f.) of measured data in Lamb wave-based damage detection. Although a number of studies have focused on consensus algorithms for combining the results of individual sensors, the p.d.f. of measured data, which is the fundamental part of the probability-based method, has still been chosen from experience in existing work. An analysis of the noise-induced errors in measured data showed that the type of distribution is related to the level of noise. In the case of weak noise, the p.d.f. of measured data can be treated as a normal distribution, and the empirical methods give satisfactory estimates. However, in the case of strong noise, the p.d.f. is complex and does not belong to any common type of distribution function. Nonparametric methods are therefore needed. Kernel density estimation, the most popular nonparametric method, was introduced. To demonstrate the performance of kernel density estimation methods, a numerical model was built to generate Lamb wave signals. Three levels of white Gaussian noise were intentionally added to the simulated signals. The estimation results showed that the nonparametric methods outperformed the empirical methods in terms of accuracy.

