Locality Sensitive Hashing: Latest Research Papers

High-scale random access on DNA storage systems

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab126 ◽

2022 ◽

Vol 4 (1) ◽

Author(s):

Alex El-Shaikh ◽

Marius Welzel ◽

Dominik Heider ◽

Bernhard Seeger

Keyword(s):

Storage Systems ◽

High Capacity ◽

Random Access ◽

General Purpose ◽

Information Storage ◽

Locality Sensitive Hashing ◽

Probe Design ◽

Dna Storage ◽

Data Objects ◽

Dna Pool

Due to the rapid cost decline of synthesizing and sequencing deoxyribonucleic acid (DNA), high information density, and its durability of up to centuries, utilizing DNA as an information storage medium has received the attention of many scientists. State-of-the-art DNA storage systems exploit the high capacity of DNA and enable random access (predominantly random reads) by primers, which serve as unique identifiers for directly accessing data. However, primers come with a significant limitation regarding the maximum available number per DNA library. The number of different primers within a library is typically very small (e.g. ≈10). We propose a method to overcome this deficiency and present a general-purpose technique for addressing and directly accessing thousands to potentially millions of different data objects within the same DNA pool. Our approach utilizes a fountain code, sophisticated probe design, and microarray technologies. A key component is locality-sensitive hashing, making checks for dissimilarity among such a large number of probes and data objects feasible.
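To illustrate why locality-sensitive hashing makes dissimilarity checks among many sequences feasible, here is a minimal sketch that uses MinHash with banding over k-mers. This is an assumption chosen for illustration, not the scheme from the paper; the function names, the k-mer length, and the band settings are all hypothetical. Sequences that share a bucket become candidate pairs for an exact check, so the remaining pairs never need to be compared directly.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of overlapping k-mers of a DNA string."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(seed, token):
    """Seeded 64-bit hash of a token (one hash function per seed)."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def candidate_pairs(seqs, k=4, bands=16, rows=2):
    """Return pairs of sequence names that share at least one LSH bucket."""
    buckets = defaultdict(set)
    for name, seq in seqs.items():
        sig = minhash_signature(kmers(seq, k), bands * rows)
        for b in range(bands):
            # Each band of the signature acts as one bucket key.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


if __name__ == "__main__":
    probes = {
        "probe_1": "ACACACACACACACAC",
        "probe_2": "ACACACACACACACAC",  # identical to probe_1
        "probe_3": "GTGTGTGTGTGTGTGT",  # shares no 4-mer with the others
    }
    print(candidate_pairs(probes))
```

Only the candidate pairs would then go through an exact similarity test. More bands with fewer rows catch weaker similarities at the cost of more candidates.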

P-QALSH: Parallelizing Query Aware Locality-Sensitive Hashing for Big Data

10.1109/bigdata52589.2021.9671881 ◽

2021 ◽

Author(s):

Yikai Huang ◽

Zhili Yao ◽

Jianlin Feng

Keyword(s):

Big Data ◽

Locality Sensitive Hashing

Time-Aware Cross-Platform IoT Service Recommendation with Privacy Preservation

Security and Communication Networks ◽

10.1155/2021/5648168 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Can Zhang ◽

Junhua Wu ◽

Chao Yan ◽

Guangshun Li

Keyword(s):

Real World ◽

Data Privacy ◽

Privacy Preservation ◽

Locality Sensitive Hashing ◽

Service Recommendation ◽

Cross Platform ◽

Time Aware ◽

Temporal Feature

IoT service recommendation techniques can help a user select appropriate IoT services efficiently. To improve recommendation efficiency and preserve data privacy, the locality-sensitive hashing (LSH) technique has been adopted in service recommendation. However, existing LSH-based service recommendation methods ignore the intrinsic temporal feature of IoT services. In light of this challenge, we integrate the temporal feature into the conventional LSH-based method and present a time-aware approach with the capability of privacy preservation for IoT service recommendation across multiple platforms. Experiments on a real-world dataset are conducted to validate the advantage of our proposed approach in terms of accuracy and efficiency in recommendation.
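The abstract does not say how the temporal feature is folded into LSH, so the following is only a sketch of the general idea under stated assumptions: each user's quality-of-service observations are flattened into one vector over (service, time slot) pairs, and a random-hyperplane LSH index is computed from it. The names `make_hyperplanes`, `lsh_index`, and `similar_users` are hypothetical. In such a scheme a platform could share only the compact index rather than the raw observations.

```python
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes; all platforms share the seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_index(vector, hyperplanes):
    """Pack the sign of each projection into an integer bucket index."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def similar_users(target, indices):
    """Users whose index equals the target's are treated as similar."""
    return sorted(u for u, idx in indices.items()
                  if u != target and idx == indices[target])


if __name__ == "__main__":
    # Assumed layout: 2 services x 2 time slots, values mean-centered.
    observations = {
        "alice": [0.4, -1.2, 0.7, 2.0],
        "bob": [0.4, -1.2, 0.7, 2.0],
        "carol": [-0.4, 1.2, -0.7, -2.0],
    }
    planes = make_hyperplanes(dim=4, num_bits=8, seed=1)
    indices = {u: lsh_index(v, planes) for u, v in observations.items()}
    print(similar_users("alice", indices))
```

A single hash table misses some similar users, so practical systems usually repeat the hashing with several independent sets of hyperplanes and merge the results.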

Splitting chemical structure data sets for federated privacy-preserving machine learning

Journal of Cheminformatics ◽

10.1186/s13321-021-00576-2 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Jaak Simm ◽

Lina Humbeck ◽

Adam Zalewski ◽

Noe Sturm ◽

Wouter Heyndrickx ◽

...

Keyword(s):

Machine Learning ◽

Quality Criteria ◽

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Data Sets ◽

Data Set ◽

Test Set ◽

Chemical Structures ◽

Multiple Partners ◽

Applications Of Machine Learning

With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal of this challenge is to have a realistic split of chemical structures (compounds) between training, validation and test set such that the performance on the test set is meaningful to infer the performance in a prospective application. This challenge is on its own very interesting and relevant, but is even more complex in a federated machine learning approach where multiple partners jointly train a model under privacy-preserving conditions where chemical structures must not be shared between the different participating parties. In this work we discuss three methods which provide a splitting of a data set and are applicable in a federated privacy-preserving setting, namely: a. locality-sensitive hashing (LSH), b. sphere exclusion clustering, c. scaffold-based binning (scaffold network). For evaluation of these splitting methods we consider the following quality criteria (compared to random splitting): bias in prediction performance, classification label and data imbalance, similarity distance between the test and training set compounds. The main findings of the paper are a. both sphere exclusion clustering and scaffold-based binning result in high quality splitting of the data sets, b. in terms of compute costs sphere exclusion clustering is very expensive in the case of federated privacy-preserving setting.
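As a rough illustration of how an LSH-based split could work without sharing structures, the sketch below assigns each compound to a fold from a few sampled bits of its binary fingerprint. It assumes bit-sampling LSH and a seed shared by all partners; `choose_bit_positions` and `lsh_fold` are hypothetical names, and the paper's actual procedure may differ. Compounds that agree on the sampled bits land in the same fold, and every partner can compute the assignment locally.

```python
import hashlib
import random


def choose_bit_positions(fp_len, num_bits, seed=0):
    """Pick the fingerprint positions to sample; the seed is shared by all partners."""
    return sorted(random.Random(seed).sample(range(fp_len), num_bits))


def lsh_fold(fingerprint, positions, num_folds=5):
    """Map a binary fingerprint to a fold via its sampled bits."""
    key = "".join("1" if fingerprint[p] else "0" for p in positions)
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_folds


if __name__ == "__main__":
    positions = choose_bit_positions(fp_len=16, num_bits=4, seed=42)
    compound = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    print(positions, lsh_fold(compound, positions))
```

Because the fold depends only on the fingerprint and the shared seed, the same compound held by two partners receives the same fold without either partner revealing it.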

Using Inverted Index for Fingerprint Search

Journal of Information and Data Management ◽

10.5753/jidm.2021.1918 ◽

2021 ◽

Vol 12 (5) ◽

Author(s):

Johnny Marcos S. Soares ◽

Luciano Barbosa ◽

Paulo Antonio Leal Rego ◽

Regis Pires Magalhães ◽

Jose Antônio F. de Macêdo

Keyword(s):

Information Retrieval ◽

Penetration Rate ◽

Locality Sensitive Hashing ◽

Inverted Index ◽

Text Documents ◽

Data Set ◽

Textual Information ◽

Data Indexing ◽

Biometric Information ◽

Fingerprint Data

Fingerprints are the most widely used biometric information for identifying people. With the increase in fingerprint data, indexing techniques are essential to perform an efficient search. In this work, we devise a solution that applies a traditional inverted index, widely used in textual information retrieval, to fingerprint search. To do so, it first converts fingerprints to text documents using techniques such as Minutia Cylinder-Code and Locality-Sensitive Hashing, and then indexes them in inverted files. In the experimental evaluation, our approach obtained an error rate of 0.42% at a penetration rate of 10% on the FVC2002 DB1a data set, surpassing some established methods.
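The pipeline described above (binary features, then LSH tokens, then an inverted index) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes one binary vector per fingerprint standing in for the Minutia Cylinder-Code representation, uses bit-sampling LSH to produce text tokens, and ranks candidates by the number of shared tokens. All names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def make_samplers(vec_len, num_tokens, bits_per_token, seed=0):
    """One random set of bit positions per LSH hash function."""
    rng = random.Random(seed)
    return [rng.sample(range(vec_len), bits_per_token) for _ in range(num_tokens)]


def to_tokens(vector, samplers):
    """Turn a binary vector into text tokens, one per hash function."""
    return [f"h{i}_" + "".join(str(vector[p]) for p in pos)
            for i, pos in enumerate(samplers)]


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # token -> ids of documents containing it

    def add(self, doc_id, tokens):
        for token in tokens:
            self.postings[token].add(doc_id)

    def search(self, tokens, top_k=3):
        """Rank documents by the number of tokens shared with the query."""
        scores = Counter()
        for token in tokens:
            for doc_id in self.postings.get(token, ()):
                scores[doc_id] += 1
        return scores.most_common(top_k)


if __name__ == "__main__":
    samplers = make_samplers(vec_len=12, num_tokens=8, bits_per_token=3, seed=7)
    index = InvertedIndex()
    index.add("finger_a", to_tokens([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1], samplers))
    index.add("finger_b", to_tokens([0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0], samplers))
    query = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]  # finger_a with one bit flipped
    print(index.search(to_tokens(query, samplers)))
```

Only documents that share at least one token with the query are touched, which is what keeps the fraction of the database examined (the penetration rate) low.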

Splitting chemical structure data sets for federated privacy-preserving machine learning

10.33774/chemrxiv-2021-xd440-v3 ◽

2021 ◽

Author(s):

Jaak Simm ◽

Lina Humbeck ◽

Adam Zalewski ◽

Noe Sturm ◽

Wouter Heyndrickx ◽

...

Keyword(s):

Machine Learning ◽

Quality Criteria ◽

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Data Sets ◽

Data Set ◽

Test Set ◽

Chemical Structures ◽

Multiple Partners ◽

Applications Of Machine Learning

With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal of this challenge is to have a realistic split of chemical structures (compounds) between training, validation and test set such that the performance on the test set is meaningful to infer the performance in a prospective application. This challenge is on its own very interesting and relevant, but is even more complex in a federated machine learning approach where multiple partners jointly train a model under privacy-preserving conditions where chemical structures must not be shared between the different participating parties in the federated learning. In this work we discuss three methods which provide a splitting of the data set and are applicable in a federated privacy-preserving setting, namely: a. locality-sensitive hashing (LSH), b. sphere exclusion clustering, c. scaffold-based binning (scaffold network). For evaluation of these splitting methods we consider the following quality criteria: bias in prediction performance, label and data imbalance, and distance of the test set compounds to the training set, and compare them to a random splitting. The main findings of the paper are a. both sphere exclusion clustering and scaffold-based binning result in high quality splitting of the data sets, b. in terms of compute costs sphere exclusion clustering is very expensive in the case of federated privacy-preserving setting.

VBLSH: Volume-Balancing Locality-Sensitive Hashing Algorithm for K-Nearest Neighbors Search

Information Sciences ◽

10.1016/j.ins.2021.11.006 ◽

2021 ◽

Author(s):

Shi Zhang ◽

Huixia Lai ◽

Weilin Chen ◽

Lulu Zhang ◽

Xinhong Lin ◽

...

Keyword(s):

Nearest Neighbors ◽

Locality Sensitive Hashing ◽

K Nearest Neighbors ◽

Hashing Algorithm

Splitting chemical structure data sets for federated privacy-preserving machine learning

10.33774/chemrxiv-2021-xd440-v2 ◽

2021 ◽

Author(s):

Jaak Simm ◽

Lina Humbeck ◽

Adam Zalewski ◽

Noe Sturm ◽

Wouter Heyndrickx ◽

...

Keyword(s):

Machine Learning ◽

Quality Criteria ◽

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Data Sets ◽

Data Set ◽

Test Set ◽

Chemical Structures ◽

Multiple Partners ◽

Applications Of Machine Learning

With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal of this challenge is to have a realistic split of chemical structures (compounds) between training, validation and test set such that the performance on the test set is meaningful to infer the performance in a prospective application. This challenge is on its own very interesting and relevant, but is even more complex in a federated machine learning approach where multiple partners jointly train a model under privacy-preserving conditions where chemical structures must not be shared between the different participating parties in the federated learning. In this work we discuss three methods which provide a splitting of the data set and are applicable in a federated privacy-preserving setting, namely: a. locality-sensitive hashing (LSH), b. sphere exclusion clustering, c. scaffold-based binning (scaffold network). For evaluation of these splitting methods we consider the following quality criteria: bias in prediction performance, label and data imbalance, and distance of the test set compounds to the training set, and compare them to a random splitting. The main findings of the paper are a. both sphere exclusion clustering and scaffold-based binning result in high quality splitting of the data sets, b. in terms of compute costs sphere exclusion clustering is very expensive in the case of federated privacy-preserving setting.

Kinematic-Based Classification of Social Gestures and Grasping by Humans and Machine Learning Techniques

Frontiers in Robotics and AI ◽

10.3389/frobt.2021.699505 ◽

2021 ◽

Vol 8 ◽

Author(s):

Paul Hemeren ◽

Peter Veto ◽

Serge Thill ◽

Cai Li ◽

Jiong Sun

Keyword(s):

Machine Learning ◽

Strong Association ◽

Critical Role ◽

Locality Sensitive Hashing ◽

Machine Learning Techniques ◽

Support Vector ◽

Social Quality ◽

Learning Techniques ◽

The Social

The affective motion of humans conveys messages that other humans perceive and understand without conventional linguistic processing. This ability to classify human movement into meaningful gestures or segments also plays a critical role in creating social interaction between humans and robots. In the research presented here, grasping and social gesture recognition by humans and four machine learning techniques (k-Nearest Neighbor, Locality-Sensitive Hashing Forest, Random Forest and Support Vector Machine) is assessed by using human classification data as a reference for evaluating the classification performance of machine learning techniques for thirty hand/arm gestures. The gestures are rated according to the extent of grasping motion in one task and the extent to which the same gestures are perceived as social in another task. The results indicate that humans clearly rate differently according to the two different tasks. The machine learning techniques provide a similar classification of the actions according to grasping kinematics and social quality. Furthermore, there is a strong association between gesture kinematics and judgments of grasping and the social quality of the hand/arm gestures. Our results support previous research on intention-from-movement understanding that demonstrates the reliance on kinematic information for perceiving the social aspects and intentions in different grasping actions as well as communicative point-light actions.

A Clustering-based Method for Business Hall Efficiency Analysis

Scientific Programming ◽

10.1155/2021/7622576 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Tianlin Huang ◽

Ning Wang

Keyword(s):

Empirical Study ◽

Prediction Method ◽

Final Analysis ◽

Locality Sensitive Hashing ◽

Characteristic Analysis ◽

Decision Optimization ◽

Economic Operation ◽

Initial Cluster ◽

Load Intensity ◽

Stability And Accuracy

Excessive or insufficient business hall resources may result in unreasonable resource allocation, adversely affecting the value of an entity business hall. Therefore, proper characteristic parameters are the key factors for analyzing the business hall, which strongly affect the final analysis results. In this study, a characteristic analysis method for the economic operation of a business hall is developed and the feature engineering is established. Because of its simplicity and versatility, the k-means algorithm has been widely used since it was first proposed around 50 years ago. However, the classical k-means algorithm has poor stability and accuracy. In particular, it is difficult to achieve a suitable balance between the centroid initialization and the clustering number k. We propose a new initialization algorithm (LSH-k-means) for k-means clustering. This algorithm is mainly based on locality-sensitive hashing (LSH) as an index for computing the initial cluster centroids, and it reduces the range of the clustering number. Furthermore, an empirical study is conducted. According to the load intensity and time change of the business hall, an index system reflecting the optimization analysis of the business hall is established, and the LSH-k-means algorithm is used to analyze the economic operation of the business hall. The results of the empirical study show that the LSH-k-means clustering method outperforms the direct prediction method, provides expected analysis results as well as decision optimization recommendations for the business hall, and serves as a basis for the optimal layout of the business hall.
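The abstract does not spell out how LSH produces the initial centroids, so the sketch below shows only one plausible reading under stated assumptions: points are hashed with random hyperplanes through the data mean, and the means of the k most populated buckets serve as starting centroids. The function name and the "largest buckets" heuristic are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def lsh_initial_centroids(points, k, num_bits=4, seed=0):
    """Return up to k initial centroids taken from the most populated LSH buckets."""
    dim = len(points[0])
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
    # Center the data so that hyperplanes through the origin actually split it.
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    buckets = defaultdict(list)
    for p in points:
        key = tuple(
            sum(w * (x - m) for w, x, m in zip(plane, p, mean)) >= 0
            for plane in planes
        )
        buckets[key].append(p)
    # Dense buckets approximate dense regions; use their means as centroids.
    largest = sorted(buckets.values(), key=len, reverse=True)[:k]
    return [[sum(p[d] for p in b) / len(b) for d in range(dim)] for b in largest]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(50)]
    data += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(50)]
    print(lsh_initial_centroids(data, k=2))
```

The function can return fewer than k centroids when fewer than k buckets are occupied, which in this sketch also bounds the number of clusters worth trying. The centroids would then be passed to a standard k-means loop in place of random initialization.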
