A Projection-Based Locality-Sensitive Hashing Technique for Reducing False Negatives

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.1341 ◽

2012 ◽

Vol 263-266 ◽

pp. 1341-1346 ◽

Cited By ~ 1

Author(s):

Keon Myung Lee

Keyword(s):

Euclidean Distance ◽

Experimental Studies ◽

Hash Functions ◽

Identification Problem ◽

Locality Sensitive Hashing ◽

False Negatives ◽

Map Objects ◽

Similar Pair

Efficiently finding similar pairs of objects is challenging when the number of objects is huge. Locality-sensitive hashing techniques have been developed to address this issue. They use hash functions to map objects into buckets so that similar objects have a high probability of falling into the same bucket. This paper is concerned with one locality-sensitive hashing technique, the projection-based method, which applies to the problem of identifying similar pairs under the Euclidean distance. It proposes an extended method that allows an object to be hashed into more than one bucket by introducing additional hash functions. Experimental studies show that the proposed method can perform better than the original projection-based method.
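For readers unfamiliar with the projection-based family, the following is a minimal sketch, not the paper's method. It uses the standard Euclidean hash floor((a·v + b)/w), with Gaussian a and a uniform offset b. The abstract does not say how the additional hash functions are defined, so the `margin` rule below (also hash an object into the neighbouring bucket when it lies close to a boundary) is a hypothetical stand-in that only illustrates why multi-bucket hashing can reduce false negatives. The names `make_projection_hash`, `buckets`, and `margin` are assumptions.

```python
import math
import random


def make_projection_hash(dim, w, seed=0):
    """Return x(v) = (a . v + b) / w; floor(x(v)) is the projection-based bucket id.

    a has i.i.d. standard Gaussian entries and b is a uniform offset in [0, w).
    """
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def project(v):
        return (sum(ai * vi for ai, vi in zip(a, v)) + b) / w

    return project


def buckets(project, v, margin=0.0):
    """Bucket ids for v: floor(project(v)), plus the neighbouring bucket when v
    lies within `margin` (a fraction of w) of a bucket boundary.

    margin=0.0 reproduces plain projection-based LSH (one bucket per object).
    The margin rule is a hypothetical illustration, not the paper's scheme.
    """
    x = project(v)
    k = math.floor(x)
    frac = x - k
    ids = {k}
    if frac < margin:
        ids.add(k - 1)
    elif frac > 1.0 - margin:
        ids.add(k + 1)
    return ids
```

Two objects become candidates when their bucket sets intersect. With an identity projection `f = lambda v: v[0]`, the points 2.95 and 3.02 fall into disjoint buckets {2} and {3} under plain hashing (a false negative), but both map to {2, 3} with `margin=0.1`. The price is extra candidates to verify.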

MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data

Computational Intelligence and Neuroscience ◽

10.1155/2015/217216 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Jingjing Wang ◽

Chen Lin

Keyword(s):

Large Scale ◽

False Negative ◽

Experimental Studies ◽

Simulated Data ◽

False Positives ◽

Locality Sensitive Hashing ◽

False Negatives ◽

Large Scale Data ◽

Similarity Joins ◽

Scale Data

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the numbers of false positives and false negatives it generates. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is preferred. To address these problems, this paper proposes Personalized Locality Sensitive Hashing (PLSH), which embeds a new banding scheme to tailor the number of false positives, the number of false negatives, and the sum of both. PLSH is implemented in parallel on the MapReduce framework to handle similarity joins on large-scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of PLSH compared with state-of-the-art methods.
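The abstract does not describe PLSH's new banding scheme, so the sketch below shows only the standard banding scheme such variants build on, as background. A signature of length r·b is split into b bands of r rows, and a pair becomes a candidate if it agrees on every row of at least one band. The function names and toy signatures are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_probability(s, r, b):
    """P(pair becomes a candidate) = 1 - (1 - s**r)**b for b bands of r rows,
    where s is the probability that a single signature row agrees."""
    return 1.0 - (1.0 - s ** r) ** b


def banded_candidates(signatures, r, b):
    """Standard LSH banding: pairs that agree on every row of at least one
    band are reported as candidate pairs."""
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        if len(sig) != r * b:
            raise ValueError("signature length must equal r * b")
        for band in range(b):
            chunk = tuple(sig[band * r:(band + 1) * r])
            buckets[(band, chunk)].add(key)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

A truly similar pair is missed with probability (1 - s^r)^b, which is the false-negative side of the trade-off the abstract mentions. More bands (larger b) lower that probability but admit more false positives, while longer bands (larger r) do the opposite. Choosing r and b is the knob that a tailored banding scheme adjusts.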

Similar pair identification using locality-sensitive hashing technique

The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems ◽

10.1109/scis-isis.2012.6505385 ◽

2012 ◽

Cited By ~ 6

Author(s):

Kyung Mi Lee ◽

Keon Myung Lee

Keyword(s):

Locality Sensitive Hashing ◽

Similar Pair

Stochastically Robust Personalized Ranking for LSH Recommendation Retrieval

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5889 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4594-4601

Author(s):

Dung D. Le ◽

Hady W. Lauw

Keyword(s):

Nearest Neighbor ◽

Hash Functions ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Neighbor Search ◽

Data Points ◽

Recommendation Accuracy ◽

Prohibitive Cost ◽

The Cost

Locality Sensitive Hashing (LSH) has become one of the most commonly used approximate nearest neighbor search techniques for avoiding the prohibitive cost of scanning through all data points. In recommender systems, LSH enables efficient recommendation retrieval by encoding user and item vectors into binary hash codes, avoiding an exhaustive examination of all item vectors to identify the top-k items. However, conventional matrix factorization models may suffer performance degradation caused by randomly drawn LSH hash functions, which directly affects the final quality of the recommendations. In this paper, we propose Stochastically Robust Personalized Ranking, a framework that factors in the stochasticity of LSH hash functions when learning real-valued user and item latent vectors, thereby improving recommendation accuracy after LSH indexing. Experiments on publicly available datasets show that the proposed framework not only learns users' preferences effectively for prediction but also achieves high compatibility with LSH stochasticity, yielding better post-LSH-indexing performance than state-of-the-art baselines.
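To make "encoding user and item vectors into binary hash codes" concrete, here is a minimal sketch using sign random projections (random-hyperplane hashing for angular similarity). This is an assumed hash family for illustration only. The paper may use a different one, and inner-product retrieval is often preceded by a vector transformation that is omitted here. All names are hypothetical.

```python
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """n_bits random Gaussian normal vectors, one per code bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """Bit i is 1 if vec lies on the non-negative side of hyperplane i."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes)


def hamming(c1, c2):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(c1, c2))


def top_k_by_code(user_vec, item_vecs, planes, k):
    """Rank item ids by Hamming distance between binary codes, not by exact scores."""
    user_code = hash_code(user_vec, planes)
    item_codes = {item: hash_code(vec, planes) for item, vec in item_vecs.items()}
    return sorted(item_codes, key=lambda item: hamming(user_code, item_codes[item]))[:k]
```

Because each bit depends on a randomly drawn hyperplane, two latent vectors that score similarly under exact inner products can still receive different codes. That randomness is the stochasticity the abstract says the learned vectors should be robust to.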

Efficient encrypted image retrieval in IoT-cloud with multi-user authentication

International Journal of Distributed Sensor Networks ◽

10.1177/1550147718761814 ◽

2018 ◽

Vol 14 (2) ◽

pp. 155014771876181 ◽

Cited By ~ 3

Author(s):

Mustafa A Al Sibahee ◽

Songfeng Lu ◽

Zaid Ameen Abduljabbar ◽

Ayad Ibrahim ◽

Zaid Alaa Hussien ◽

...

Keyword(s):

Euclidean Distance ◽

User Authentication ◽

Locality Sensitive Hashing ◽

Security And Privacy ◽

Smart Devices ◽

Encrypted Data ◽

Minimum Number ◽

Search Speed ◽

Encrypted Images ◽

And Storage

Encryption is one of the best methods for safeguarding the security and privacy of an image. However, searching through encrypted data is difficult. A number of techniques for searching encrypted data have been devised, but some of these security solutions cannot be used on smart devices in an IoT-cloud setting because they are not lightweight. In this article, we present a lightweight scheme that enables content-based search over encrypted images. In particular, images are represented using local features, and we develop and validate a secure scheme for measuring the Euclidean distance between two feature vectors. In addition, we use locality-sensitive hashing to build the searchable index. A locality-sensitive hashing index improves the efficiency and effectiveness of the system, allowing only relevant images to be retrieved with a minimal number of distance evaluations. Refining vector techniques are used to refine the relevant results efficiently and securely. Our index construction process keeps stored data and trapdoors private. The system also efficiently supports multi-user authentication, which lets data owners define who can search for a specific image, while avoiding the expensive traditional method. Compared with other searchable, similarity-based encryption methods, the proposed scheme offers superior search speed and storage efficiency.
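The claim that an LSH index limits the number of distance evaluations can be illustrated with a plaintext multi-table index. This is a minimal sketch under stated assumptions. Each of `n_tables` tables keys a vector by `k` concatenated projection hashes floor((a·v + b)/w), a query gathers the items that collide with it in any table, and exact Euclidean distances are computed only for those candidates. Encryption, the secure distance protocol, trapdoors, and authentication are not modeled, and the class and parameter names are hypothetical.

```python
import math
import random
from collections import defaultdict


class EuclideanLSHIndex:
    """Plaintext multi-table index: each table keys a vector by k concatenated
    projection hashes floor((a . v + b) / w)."""

    def __init__(self, dim, k=4, n_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w)) for _ in range(k)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def _key(self, t, v):
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[t]
        )

    def add(self, item_id, v):
        self.vectors[item_id] = v
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(item_id)

    def query(self, q, top=1):
        """Return the `top` nearest candidates and how many exact distances were computed."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), []))
        # Exact distances only for colliding items, not for the whole collection.
        ranked = sorted(candidates, key=lambda i: math.dist(q, self.vectors[i]))
        return ranked[:top], len(candidates)
```

Concatenating k hashes per table makes collisions of distant items rare, while using several tables keeps the chance of missing a near neighbour low. The second return value counts the distance evaluations actually performed.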

Locality-sensitive hashing for the edit distance

Bioinformatics ◽

10.1093/bioinformatics/btz354 ◽

2019 ◽

Vol 35 (14) ◽

pp. i127-i135 ◽

Cited By ~ 6

Author(s):

Guillaume Marçais ◽

Dan DeBlasio ◽

Prashant Pandey ◽

Carl Kingsford

Keyword(s):

High Probability ◽

Edit Distance ◽

Locality Sensitive Hashing ◽

Supplementary Information ◽

Challenging Problem ◽

Jaccard Similarity ◽

Bioinformatics Pipeline ◽

High Quality ◽

False Negatives ◽

Computational Requirement

Motivation: Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality-sensitive hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have a high-quality alignment from those that may. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e. omitting to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. In addition, due to the lack of a practical LSH method for edit distance, LSH methods for Jaccard similarity or Hamming similarity are used as a proxy in practice.

Results: We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH.

Availability and implementation: The code to generate the results is available at http://github.com/Kingsford-Group/omhismb2019.

Supplementary information: Supplementary data are available at Bioinformatics online.
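The following minimal sketch illustrates how an order-aware min-hash differs from a plain bag-of-k-mers min-hash. For each hash seed, repeated k-mers are first made distinct with an occurrence counter. The `l` k-mers with the smallest hash values are then kept and listed in the order in which they occur in the sequence, so two sequences match only if they share both the selected k-mers and their relative order. This follows our reading of the method's published description and is not the authors' code; the repository linked above is the reference. The SHA-256-based hash and the names `omh_sketch`, `omh_similarity`, and `l` are assumptions.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, k-mer, occurrence number)."""
    kmer, occ = item
    digest = hashlib.sha256(f"{seed}|{kmer}|{occ}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def omh_sketch(seq, k, l, seed):
    """Pick the l k-mers with the smallest hash values, then list them in the
    order they occur in seq. Repeated k-mers are made distinct by an
    occurrence counter, so (k-mer, 0), (k-mer, 1), ... hash differently."""
    seen = {}
    items = []  # (position, (kmer, occurrence))
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        occ = seen.get(kmer, 0)
        seen[kmer] = occ + 1
        items.append((i, (kmer, occ)))
    chosen = sorted(items, key=lambda it: _hash(seed, it[1]))[:l]
    chosen.sort(key=lambda it: it[0])  # restore sequence order
    return tuple(it[1][0] for it in chosen)


def omh_similarity(s1, s2, k, l, n_hashes):
    """Fraction of the n_hashes seeds for which the two ordered sketches match."""
    matches = sum(
        omh_sketch(s1, k, l, seed) == omh_sketch(s2, k, l, seed) for seed in range(n_hashes)
    )
    return matches / n_hashes
```

With `l=1`, the sketch reduces to an ordinary min-hash over occurrence-tagged k-mers. Larger `l` makes the sketch sensitive to the relative order of the selected k-mers.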

A joint structure of multi-distance based metric learning for person re-identification

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210505 ◽

2021 ◽

pp. 1-11

Author(s):

Bin Li ◽

Qiyu He ◽

Xiaopeng Liu ◽

Yajun Jiang ◽

Zhigang Hu

Keyword(s):

Mahalanobis Distance ◽

Euclidean Distance ◽

Distance Measure ◽

Metric Learning ◽

Identification Problem ◽

Training Dataset ◽

The Novel ◽

Learning Method ◽

Surveillance Network ◽

Joint Structure

Person re-identification is a valuable computer vision task that aims to match pedestrian images from different cameras in a non-overlapping surveillance network. Existing metric-learning-based methods address this problem by learning a robust distance function. These methods learn a mapping subspace through a strict constraint that forces the distance of a positive pair to be much smaller than that of a negative pair. As a result, the metric model overfits the training dataset. Moreover, because of drastic appearance variations, handcrafted features have weak representational power for person re-identification. To address these problems, we propose a joint-distance-measure approach that jointly learns a Mahalanobis distance and a Euclidean distance on a novel feature. The novel feature is obtained with a dictionary-representation method that represents pedestrian images from different camera views with the same dictionary. The joint distance combines the Mahalanobis distance from the metric learning method with the Euclidean distance on the novel feature to measure the similarity between matching pairs. Extensive experiments on the publicly available benchmark datasets VIPeR and CUHK01 show that the proposed method achieves better identification performance than the comparison methods.
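The abstract says that the joint distance combines a learned Mahalanobis distance with a Euclidean distance on the dictionary-based feature, but it does not give the combination rule. The sketch below assumes a simple weighted sum with a hypothetical weight `alpha`, purely to show how the two terms fit together. The matrix `M` stands in for whatever positive semi-definite matrix the metric learning step produces.

```python
import math


def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semi-definite M given as a list of rows."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(m * dj for m, dj in zip(row, d)) for row in M]
    quad = sum(di * mdi for di, mdi in zip(d, Md))
    return math.sqrt(max(quad, 0.0))  # clamp tiny negative round-off


def joint_distance(x_metric, y_metric, x_dict, y_dict, M, alpha=0.5):
    """Hypothetical weighted sum: alpha * Mahalanobis distance on the
    metric-learning features plus (1 - alpha) * Euclidean distance on the
    dictionary-based features."""
    return alpha * mahalanobis(x_metric, y_metric, M) + (1.0 - alpha) * math.dist(x_dict, y_dict)
```

With `M` set to the identity matrix, the Mahalanobis term reduces to the ordinary Euclidean distance. Metric learning changes only `M`.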

Locality-Sensitive Hashing Without False Negatives for $l_p$

Lecture Notes in Computer Science - Computing and Combinatorics ◽

10.1007/978-3-319-42634-1_9 ◽

2016 ◽

pp. 105-118 ◽

Cited By ~ 2

Author(s):

Andrzej Pacuk ◽

Piotr Sankowski ◽

Karol Wegrzycki ◽

Piotr Wygocki

Keyword(s):

Locality Sensitive Hashing ◽

False Negatives

Practical Federated Gradient Boosting Decision Trees

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5895 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4642-4649 ◽

Cited By ~ 1

Author(s):

Qinbin Li ◽

Zeyi Wen ◽

Bingsheng He

Keyword(s):

Decision Trees ◽

Differential Privacy ◽

Predictive Accuracy ◽

Homomorphic Encryption ◽

Experimental Studies ◽

Locality Sensitive Hashing ◽

Gradient Boosting ◽

Local Data ◽

Original Record ◽

Privacy Constraints

Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, winning many awards in machine learning and data mining competitions. Several recent studies have addressed how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from inefficiency due to costly data transformations such as secret sharing and homomorphic encryption, or from low model accuracy due to differential privacy designs. We study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original records to other parties, while keeping the computation overhead of training low. Our experimental studies show that, compared with normal training on each party's local data, our approach can significantly improve predictive accuracy and achieves accuracy comparable to the original GBDT trained on the data from all parties.
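As a rough illustration of "similarity information based on locality-sensitive hashing", the toy sketch below covers only the similarity step. The parties agree on a seed, so they draw the same projection hashes. Each party publishes only hash codes, and each local instance is matched with the remote instance whose code agrees on the most positions. The boosting procedure, how similar instances influence tree training, and the paper's exact hashing setup are not shown; all function names are hypothetical.

```python
import math
import random


def shared_hash_functions(dim, n_funcs, w, seed):
    """Projection hashes (a, b) drawn from a seed that all parties agree on."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w)) for _ in range(n_funcs)]


def lsh_code(x, funcs, w):
    """Tuple of bucket ids floor((a . x + b) / w), one per shared hash function."""
    return tuple(math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w) for a, b in funcs)


def most_similar(local_codes, remote_codes):
    """For each local instance, the remote id whose code agrees on the most positions.
    Only hash codes are compared; raw feature vectors are never exchanged."""
    return {
        lid: max(remote_codes, key=lambda rid: sum(p == q for p, q in zip(lcode, remote_codes[rid])))
        for lid, lcode in local_codes.items()
    }
```

Sharing codes rather than raw features mirrors the relaxed privacy model in the abstract. A party learns which instances are similar, not their actual values.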

Locality-sensitive Hashing without False Negatives

Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms ◽

10.1137/1.9781611974331.ch1 ◽

2015 ◽

Cited By ~ 10

Author(s):

Rasmus Pagh

Keyword(s):

Locality Sensitive Hashing ◽

False Negatives

Locality sensitive hashing for the edit distance

10.1101/534446 ◽

2019 ◽

Cited By ~ 2

Author(s):

Guillaume Marçais ◽

Dan DeBlasio ◽

Prashant Pandey ◽

Carl Kingsford

Keyword(s):

High Probability ◽

Edit Distance ◽

Hamming Distance ◽

Locality Sensitive Hashing ◽

Relative Order ◽

Challenging Problem ◽

Jaccard Similarity ◽

Bioinformatics Pipeline ◽

False Negatives ◽

Computational Requirement

Motivation: Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality Sensitive Hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have an alignment from those that may have an alignment. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e., omitting to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. In addition, due to the lack of a practical LSH method for edit distance, LSH methods for Jaccard similarity or Hamming distance are used as a proxy in practice.

Results: We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH.
