Fair near neighbor search via sampling

2021 ◽  
Vol 50 (1) ◽  
pp. 42-49
Author(s):  
Martin Aumuller ◽  
Sariel Har-Peled ◽  
Sepideh Mahabadi ◽  
Rasmus Pagh ◽  
Francesco Silvestri

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points S and a radius parameter r > 0, the r-near neighbor (r-NN) problem asks for a data structure that, given any query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee.
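To make the fairness requirement concrete, the following is a minimal sketch of a naive baseline, not the data structure studied in the paper: it scans all of S, collects every point within distance r of q, and returns one of them uniformly at random. The function name `fair_near_neighbor`, the use of Euclidean distance, and the tuple representation of points are assumptions made for illustration only.

```python
import math
import random


def fair_near_neighbor(points, q, r, rng=random):
    """Naive fair r-NN baseline (illustrative, linear scan).

    Returns a point of `points` within Euclidean distance r of q,
    chosen uniformly at random among all such points, or None if
    no point lies within distance r.
    """
    # Collect the full r-neighborhood of q by brute force.
    neighbors = [p for p in points if math.dist(p, q) <= r]
    if not neighbors:
        return None
    # Uniform choice gives every near neighbor the same probability.
    return rng.choice(neighbors)


if __name__ == "__main__":
    S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    print(fair_near_neighbor(S, (0.0, 0.0), 1.0))
```

This baseline takes time linear in the size of S per query, which is the cost that a sublinear structure such as LSH is meant to avoid; the abstract's point is that standard LSH does not by itself return near neighbors with equal probability.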

2009 ◽  
Vol 130 (13) ◽  
pp. 134102 ◽  
Author(s):  
Sandeep Somani ◽  
Benjamin J. Killian ◽  
Michael K. Gilson

Biometrika ◽  
2019 ◽  
Vol 106 (4) ◽  
pp. 781-801 ◽  
Author(s):  
Miles E Lopes ◽  
Andrew Blandino ◽  
Alexander Aue

Summary: Statistics derived from the eigenvalues of sample covariance matrices are called spectral statistics, and they play a central role in multivariate testing. Although bootstrap methods are an established approach to approximating the laws of spectral statistics in low-dimensional problems, such methods are relatively unexplored in the high-dimensional setting. The aim of this article is to focus on linear spectral statistics as a class of prototypes for developing a new bootstrap in high dimensions, a method we refer to as the spectral bootstrap. In essence, the proposed method originates from the parametric bootstrap and is motivated by the fact that in high dimensions it is difficult to obtain a nonparametric approximation to the full data-generating distribution. From a practical standpoint, the method is easy to use and allows the user to circumvent the difficulties of complex asymptotic formulas for linear spectral statistics. In addition to proving the consistency of the proposed method, we present encouraging empirical results in a variety of settings. Lastly, and perhaps most interestingly, we show through simulations that the method can be applied successfully to statistics outside the class of linear spectral statistics, such as the largest sample eigenvalue and others.
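For orientation, a linear spectral statistic can be written as an average of a fixed function over the sample eigenvalues. The notation below (dimension p, sample covariance matrix \widehat{\Sigma}_n, test function f, statistic T_n(f)) and the 1/p normalization are assumptions chosen for illustration and are not taken from the abstract.

```latex
% Linear spectral statistic of the sample covariance matrix (assumed notation)
T_n(f) \;=\; \frac{1}{p} \sum_{j=1}^{p} f\bigl(\lambda_j(\widehat{\Sigma}_n)\bigr),
\qquad
\lambda_1(\widehat{\Sigma}_n) \ge \dots \ge \lambda_p(\widehat{\Sigma}_n)
\ \text{the eigenvalues of } \widehat{\Sigma}_n .
```

With f(x) = x this gives the normalized trace of the sample covariance matrix, and with f(x) = log x it gives the normalized log-determinant. The largest sample eigenvalue cannot be written in this form, which is why the abstract treats it as lying outside the class.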

