A Projection-Based Locality-Sensitive Hashing Technique for Reducing False Negatives

2012 ◽  
Vol 263-266 ◽  
pp. 1341-1346 ◽  
Author(s):  
Keon Myung Lee

It is challenging to efficiently find similar pairs of objects when the number of objects is huge. Locality-sensitive hashing techniques have been developed to address this issue. They employ hash functions to map objects into buckets such that similar objects have a high probability of falling into the same bucket. This paper is concerned with one locality-sensitive hashing technique, the projection-based method, which is applicable to identifying similar pairs under the Euclidean distance. It proposes an extended method that allows an object to be hashed to more than one bucket by introducing additional hash functions. Experimental studies show that the proposed method can perform better than the original projection-based method.
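The projection-based hashing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it uses the common form h(v) = floor((a·v + b) / w) with a Gaussian vector a and an offset b drawn from [0, w), and it models the "additional hash functions" as a hypothetical second copy shifted by half a bucket width, so each object is stored in two buckets. The helper names `make_projection_hash` and `multi_bucket_keys` are illustrative only.

```python
import math
import random


def make_projection_hash(dim, w, offset=0.0, seed=None):
    """Return h(v) = floor((a . v + b + offset) / w), a ~ N(0, I), b ~ U[0, w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        proj = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((proj + b + offset) / w)

    return h


def multi_bucket_keys(v, dim, w, seed):
    """Hypothetical extension (an assumption, not the paper's exact scheme):
    hash with a base function and with the same projection shifted by w / 2,
    so every object is placed in two buckets instead of one."""
    base = make_projection_hash(dim, w, offset=0.0, seed=seed)
    shifted = make_projection_hash(dim, w, offset=w / 2, seed=seed)
    return {("base", base(v)), ("shifted", shifted(v))}
```

With this particular shift, two points whose projections differ by less than w/2 always share at least one of their two buckets, because an interval shorter than w/2 can cross at most one boundary of the combined grid. A single projection hash gives no such guarantee, which illustrates how extra buckets can reduce false negatives at the cost of more candidates.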

2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Jingjing Wang ◽  
Chen Lin

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the numbers of false positives and false negatives it generates. In many domains, reducing the number of false positives is crucial; in some application scenarios, a balance between false positives and false negatives is preferred. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), which embeds a new banding scheme to tailor the number of false positives, the number of false negatives, and the sum of both. PLSH is implemented in parallel using the MapReduce framework to handle similarity joins on large-scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of the proposed PLSH technique compared with state-of-the-art methods.
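The false-positive/false-negative trade-off that a banding scheme controls can be illustrated with the standard MinHash banding construction, in which a pair with similarity s becomes a candidate with probability 1 - (1 - s^r)^b for b bands of r rows each. This is not PLSH's personalized scheme; it is a sketch that additionally assumes similarities are uniformly distributed on [0, 1] in order to estimate false-positive and false-negative mass for a chosen threshold. The function names are hypothetical.

```python
def candidate_probability(s, r, b):
    """Probability that a pair with MinHash similarity s agrees on all r rows
    of at least one of b bands: 1 - (1 - s**r)**b."""
    return 1.0 - (1.0 - s ** r) ** b


def fp_fn_mass(threshold, r, b, steps=10_000):
    """Approximate false-positive and false-negative areas under the S-curve,
    assuming (for illustration only) uniformly distributed similarities."""
    fp = fn = 0.0
    ds = 1.0 / steps
    for i in range(steps):
        s = (i + 0.5) * ds  # midpoint rule over [0, 1]
        p = candidate_probability(s, r, b)
        if s < threshold:
            fp += p * ds          # dissimilar pair still becomes a candidate
        else:
            fn += (1.0 - p) * ds  # similar pair is never compared
    return fp, fn
```

For example, with the same budget of r·b = 16 hash values, `fp_fn_mass(0.5, 2, 8)` reports more false positives but fewer false negatives than `fp_fn_mass(0.5, 4, 4)`; choosing among such configurations is the kind of balance a tailored banding scheme targets.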


2020 ◽  
Vol 34 (04) ◽  
pp. 4594-4601
Author(s):  
Dung D. Le ◽  
Hady W. Lauw

Locality Sensitive Hashing (LSH) has become one of the most commonly used approximate nearest neighbor search techniques for avoiding the prohibitive cost of scanning through all data points. For recommender systems, LSH achieves efficient recommendation retrieval by encoding user and item vectors into binary hash codes, which avoids exhaustively examining all item vectors to identify the top-k items. However, conventional matrix factorization models may suffer from performance degradation caused by the randomly drawn LSH hash functions, which directly affects the ultimate quality of the recommendations. In this paper, we propose a framework that factors in the stochasticity of LSH hash functions when learning real-valued user and item latent vectors, thereby improving recommendation accuracy after LSH indexing. Experiments on publicly available datasets show that the proposed framework not only effectively learns users' preferences for prediction but also achieves high compatibility with LSH stochasticity, producing superior post-LSH-indexing performance compared with state-of-the-art baselines.
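The retrieval step described above (binary codes for user and item vectors, then ranking items by code similarity) can be sketched with sign random projections, one common LSH family. This is an illustrative assumption: sign random projections approximate angular (cosine) similarity rather than the raw inner product, and nothing here models the paper's training framework. All function names are hypothetical.

```python
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals (the LSH hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, vec)) >= 0) for p in planes)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def top_k_items(user_vec, item_vecs, planes, k):
    """Rank items by Hamming distance between binary codes instead of
    computing exact scores against every item vector (ties broken by name)."""
    u = hash_code(user_vec, planes)
    codes = {item: hash_code(v, planes) for item, v in item_vecs.items()}
    return sorted(codes, key=lambda item: (hamming(u, codes[item]), item))[:k]
```

Because the hash functions are drawn at random, two runs with different seeds can rank items differently; that stochasticity is what the abstract says the proposed framework accounts for during learning.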


2018 ◽  
Vol 14 (2) ◽  
pp. 155014771876181 ◽  
Author(s):  
Mustafa A Al Sibahee ◽  
Songfeng Lu ◽  
Zaid Ameen Abduljabbar ◽  
Ayad Ibrahim ◽  
Zaid Alaa Hussien ◽  
...  

Encryption is one of the best methods to safeguard the security and privacy of an image. However, searching through encrypted data is difficult. A number of techniques for searching encrypted data have been devised, but some of these security solutions cannot be used on smart devices in an IoT-cloud setting because they are not lightweight. In this article, we present a lightweight scheme that enables content-based search over encrypted images. In particular, images are represented using local features. We develop and validate a secure scheme for measuring the Euclidean distance between two feature vectors. In addition, we use locality-sensitive hashing to build the searchable index. Using a locality-sensitive hashing index increases the efficiency and effectiveness of the system, allowing only relevant images to be retrieved with a minimal number of distance evaluations. Vector refinement techniques are then used to refine the relevant results efficiently and securely. Our index construction process ensures that stored data and trapdoors are kept private. Our system also efficiently supports multi-user authentication, avoiding the expensive traditional approach, so that data owners can define who may search for a specific image. Compared with other similarity-based searchable encryption methods, the proposed scheme offers superior search speed and storage efficiency.
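A plaintext sketch of an LSH index with candidate refinement: points are bucketed in several hash tables, a query collects candidates from its own buckets, and exact Euclidean distances are computed only for those candidates rather than for the whole collection. The encryption, trapdoor, and multi-user authentication layers of the scheme are deliberately omitted, and the class name, table count, and bucket width are illustrative assumptions. `math.dist` requires Python 3.8 or later.

```python
import math
import random
from collections import defaultdict


class EuclideanLSHIndex:
    """Hypothetical plaintext multi-table LSH index: each table keys a point
    by k concatenated projection hashes floor((a . v + b) / w)."""

    def __init__(self, dim, n_tables=4, k=3, w=4.0, seed=0):
        rng = random.Random(seed)
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w)) for _ in range(k)]
            for _ in range(n_tables)
        ]
        self.w = w
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = {}

    def _key(self, table_idx, v):
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[table_idx]
        )

    def add(self, point_id, v):
        self.points[point_id] = v
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(point_id)

    def query(self, q, radius):
        """Return (sorted hits within radius, number of distance evaluations)."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), []))
        # Refinement: exact distances only for the colliding candidates.
        hits = [(math.dist(q, self.points[c]), c) for c in candidates]
        return sorted((d, c) for d, c in hits if d <= radius), len(candidates)
```

The second return value makes the benefit visible: the number of exact distance evaluations is bounded by the candidate set size rather than by the number of stored images.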


2019 ◽  
Vol 35 (14) ◽  
pp. i127-i135 ◽  
Author(s):  
Guillaume Marçais ◽  
Dan DeBlasio ◽  
Prashant Pandey ◽  
Carl Kingsford

Abstract
Motivation: Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality-sensitive hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have a high-quality alignment from those that may. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e. failing to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. In addition, because no practical LSH method exists for edit distance, LSH methods for Jaccard similarity or Hamming similarity are used as a proxy in practice.
Results: We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH.
Availability and implementation: The code to generate the results is available at http://github.com/Kingsford-Group/omhismb2019.
Supplementary information: Supplementary data are available at Bioinformatics online.
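A simplified sketch of the Order Min Hash idea as described in this abstract: for each of m seeded hash functions, keep the l k-mers with the smallest hash values and record them in the order they appear in the sequence, so the sketch depends on k-mer order as well as k-mer content. Repeated k-mers are distinguished by an occurrence index. The parameter names, the blake2b-based hash family, and the helper names are assumptions for illustration; the authors' repository is the reference implementation.

```python
import hashlib


def _h(seed, item):
    """Deterministic seeded hash of a (k-mer, occurrence) pair (an assumption;
    any well-mixed hash family would do)."""
    data = f"{seed}|{item[0]}|{item[1]}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def omh_sketch(seq, k, l, m):
    """For each of m hash functions, keep the l k-mers with the smallest hash
    values, then list them in the order they occur in seq."""
    kmers = []
    seen = {}
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        occ = seen.get(kmer, 0)  # distinguish repeated k-mers by occurrence index
        seen[kmer] = occ + 1
        kmers.append((pos, (kmer, occ)))
    sketch = []
    for seed in range(m):
        chosen = sorted(kmers, key=lambda pk: _h(seed, pk[1]))[:l]
        chosen.sort(key=lambda pk: pk[0])  # restore sequence order
        sketch.append(tuple(item for _, item in chosen))
    return sketch


def omh_similarity(s1, s2):
    """Fraction of hash functions whose ordered k-mer tuples match exactly."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)
```

For example, with k = 1 and l = 2, the sequences "AB" and "BA" contain exactly the same k-mers, so a set-based MinHash would treat them as identical, yet `omh_similarity(omh_sketch("AB", 1, 2, 8), omh_sketch("BA", 1, 2, 8))` is 0.0 because the ordered tuples differ for every hash function.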


2021 ◽  
pp. 1-11
Author(s):  
Bin Li ◽  
Qiyu He ◽  
Xiaopeng Liu ◽  
Yajun Jiang ◽  
Zhigang Hu

The person re-identification problem is a valuable computer vision task that aims to match pedestrian images captured by different cameras in a non-overlapping surveillance network. Existing metric-learning-based methods address this problem by learning a robust distance function. These methods learn a mapping subspace by imposing a strict constraint that forces the distance of a positive pair to be much smaller than that of a negative pair, so the metric model overfits the training dataset. In addition, because of drastic appearance variations, handcrafted features have weak representational power for person re-identification. To address these problems, we propose an approach based on a joint distance measure, which jointly learns a Mahalanobis distance and a Euclidean distance over a novel feature. The novel feature is obtained with a dictionary-representation-based method that represents pedestrian images from different camera views with the same dictionary. The joint distance combines the Mahalanobis distance from the metric learning method with the Euclidean distance on the novel feature to measure the similarity between matching pairs. Extensive experiments are conducted on the publicly available benchmark datasets VIPeR and CUHK01. The identification results show that the proposed method outperforms the comparison methods.
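One way to read the joint distance is as a weighted sum of a learned Mahalanobis distance on the original features and a Euclidean distance on the dictionary-based features. The abstract does not state how the two terms are combined, so the weight `alpha`, the function names, and the use of a plain weighted sum are assumptions; the matrix `M` would come from the metric learning step.

```python
import math


def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M (list of rows)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    Md = [sum(m_ij * d_j for m_ij, d_j in zip(row, d)) for row in M]
    # max() guards against tiny negative values from floating-point error.
    return math.sqrt(max(0.0, sum(di * mdi for di, mdi in zip(d, Md))))


def joint_distance(x, y, fx, fy, M, alpha=0.5):
    """Hypothetical combination: alpha * Mahalanobis distance on the original
    features + (1 - alpha) * Euclidean distance on the dictionary features."""
    return alpha * mahalanobis(x, y, M) + (1 - alpha) * math.dist(fx, fy)
```

With `M` set to the identity matrix the Mahalanobis term reduces to the ordinary Euclidean distance, which is a convenient sanity check before plugging in a learned metric.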


Author(s):  
Andrzej Pacuk ◽  
Piotr Sankowski ◽  
Karol Wegrzycki ◽  
Piotr Wygocki

2020 ◽  
Vol 34 (04) ◽  
pp. 4642-4649 ◽  
Author(s):  
Qinbin Li ◽  
Zeyi Wen ◽  
Bingsheng He

Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, winning many awards in machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from inefficiency due to costly data transformations such as secret sharing and homomorphic encryption, or from low model accuracy due to differential privacy designs. Here, we study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original records to other parties, while the computation overhead in the training process is kept low. Our experimental studies show that, compared with training on each party's local data alone, our approach can significantly improve predictive accuracy and achieve accuracy comparable to the original GBDT trained on the data from all parties.
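A hypothetical illustration of how LSH can supply cross-party similarity information without sharing raw records: parties agree on a shared seed, build identical projection hashes, exchange only hash signatures, and match each local instance to the remote instance with the most equal hash values. The seed-sharing arrangement and the function names are assumptions, and the tree-boosting step itself is omitted.

```python
import random


def make_hash_family(dim, n_hashes, w=4.0, seed=0):
    """Every party uses the same seed, so all build identical projection hashes."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w), w)
            for _ in range(n_hashes)]


def lsh_signature(x, family):
    """Vector of bucket ids floor((a . x + b) / w); only this leaves the party."""
    return tuple(int((sum(ai * xi for ai, xi in zip(a, x)) + b) // w) for a, b, w in family)


def most_similar(local_sigs, remote_sigs):
    """For each local instance, the remote instance sharing the most hash values.
    On ties, max() keeps the first remote instance in dict order."""
    return {
        i: max(remote_sigs, key=lambda j: sum(a == b for a, b in zip(s, remote_sigs[j])))
        for i, s in local_sigs.items()
    }
```

The remote party only ever sees bucket ids, which leak coarse similarity but not the raw feature values; that matches the relaxed privacy model the abstract describes, where some information may be inferred but not the actual raw data.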


2019 ◽  
Author(s):  
Guillaume Marçais ◽  
Dan DeBlasio ◽  
Prashant Pandey ◽  
Carl Kingsford

Abstract
Motivation: Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality Sensitive Hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have an alignment from those that may have one. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e., failing to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. Moreover, because no practical LSH method exists for edit distance, LSH methods for Jaccard similarity or Hamming distance are used as a proxy in practice.
Results: We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH.

