AbstractWe present a data-adaptive multivariate histogram estimator of an unknown density f based on n independent samples from it. Such histograms are based on binary trees called regular pavings (RPs). RPs represent a computationally convenient class of simple functions that remain closed under addition and scalar multiplication. Unlike other density estimation methods, including various regularization and Bayesian methods based on the likelihood, the minimum distance estimate (MDE) is guaranteed to be within an $$L_1$$
L
1
distance bound from f for a given n, no matter what the underlying f happens to be, and is thus said to have universal performance guarantees (Devroye and Lugosi, Combinatorial methods in density estimation. Springer, New York, 2001). Using a form of tree matrix arithmetic with RPs, we obtain the first generic constructions of an MDE, prove that it has universal performance guarantees and demonstrate its performance with simulated and real-world data. Our main contribution is a constructive implementation of an MDE histogram that can handle large multivariate data bursts using a tree-based partition that is computationally conducive to subsequent statistical operations.