Locality-Sensitive Hashing Without False Negatives for $l_p$

Author(s):  
Andrzej Pacuk ◽  
Piotr Sankowski ◽  
Karol Wegrzycki ◽  
Piotr Wygocki

2012 ◽  
Vol 263-266 ◽  
pp. 1341-1346 ◽  
Author(s):  
Keon Myung Lee

It is challenging to efficiently find similar pairs of objects when the number of objects is huge. Locality-sensitive hashing techniques have been developed to address this issue. They employ hash functions to map objects into buckets so that similar objects have a high chance of falling into the same bucket. This paper is concerned with one locality-sensitive hashing technique, the projection-based method, which is applicable to the Euclidean distance-based similar-pair identification problem. It proposes an extended method that allows an object to be hashed to more than one bucket by introducing additional hash functions. Experimental studies show that the proposed method provides better performance than the projection-based method.
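For context, the projection-based method referenced above follows the standard p-stable LSH scheme for Euclidean distance. The Python sketch below is a minimal illustration of that scheme only, not the paper's extended multi-bucket variant; parameter names such as `num_hashes` and the bucket width `w` are assumptions made for the example.

```python
import numpy as np

class ProjectionLSH:
    """Minimal sketch of projection-based (p-stable) LSH for Euclidean distance.

    Each hash function projects a vector onto a random Gaussian direction,
    shifts it by a random offset, and quantizes into buckets of width w.
    Nearby points are likely to share a bucket.
    """

    def __init__(self, dim, num_hashes=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(num_hashes, dim))    # random projection directions
        self.b = rng.uniform(0.0, w, size=num_hashes)  # random offsets in [0, w)
        self.w = w

    def hash(self, x):
        # h_i(x) = floor((a_i . x + b_i) / w); the tuple of values is the bucket key
        return tuple(np.floor((self.a @ x + self.b) / self.w).astype(int))

# Toy usage: similar vectors tend to collide in the same bucket.
lsh = ProjectionLSH(dim=3)
print(lsh.hash(np.array([1.0, 2.0, 3.0])))
print(lsh.hash(np.array([1.01, 2.0, 3.0])))
```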


2019 ◽  
Vol 35 (14) ◽  
pp. i127-i135 ◽  
Author(s):  
Guillaume Marçais ◽  
Dan DeBlasio ◽  
Prashant Pandey ◽  
Carl Kingsford

Abstract Motivation Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality-sensitive hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have a high-quality alignment from those that may. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e. omitting to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. In addition, due to the lack of a practical LSH method for edit distance, in practice, LSH methods for Jaccard similarity or Hamming similarity are used as a proxy. Results We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH. Availability and implementation The code to generate the results is available at http://github.com/Kingsford-Group/omhismb2019. Supplementary information Supplementary data are available at Bioinformatics online.
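To make the order-sensitivity concrete, here is a simplified, illustrative Order Min Hash sketch in Python. It follows the idea described in the abstract (select a few k-mer occurrences per hash function and keep them in sequence order) but is a toy under stated assumptions, not the authors' implementation; the parameters k, l, and m and the SHA-1-based hashing are choices made for the example.

```python
import hashlib

def omh_sketch(seq, k=3, l=2, m=4):
    """Simplified Order Min Hash sketch (illustrative, not the authors' code).

    For each of m hash functions, pick the l k-mer occurrences with the
    smallest hash values, then record those k-mers in the order they appear
    in the sequence, so the sketch reflects k-mer order as well as content.
    """
    # Enumerate k-mer occurrences as (position, k-mer, occurrence count)
    # so repeated k-mers hash to distinct items.
    occ, seen = [], {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        seen[kmer] = seen.get(kmer, 0) + 1
        occ.append((i, kmer, seen[kmer]))

    sketch = []
    for j in range(m):
        def h(item):
            _, kmer, cnt = item
            return hashlib.sha1(f"{j}:{kmer}:{cnt}".encode()).hexdigest()
        # l smallest items by hash value, re-sorted by position in the sequence.
        chosen = sorted(sorted(occ, key=h)[:l], key=lambda t: t[0])
        sketch.append(tuple(kmer for _, kmer, _ in chosen))
    return sketch

# The sketch depends on which k-mers occur and on the order they occur in.
print(omh_sketch("ACGTACGT"))
print(omh_sketch("TACGACGT"))
```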


2019 ◽  
Author(s):  
Guillaume Marçais ◽  
Dan DeBlasio ◽  
Prashant Pandey ◽  
Carl Kingsford

Abstract Motivation Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality Sensitive Hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have an alignment from those that may have an alignment. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e., omitting to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. In addition, due to the lack of a practical LSH method for edit distance, in practice, LSH methods for Jaccard similarity or Hamming distance are used as a proxy. Results We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH.


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Jingjing Wang ◽  
Chen Lin

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), in which a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using the MapReduce framework to handle similarity joins on large-scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of our proposed PLSH technique compared with state-of-the-art methods.
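For readers unfamiliar with banding, the sketch below shows the standard LSH banding calculation that schemes such as PLSH tailor: a pair becomes a candidate if all rows in at least one band of its signature match, which yields an S-shaped curve in similarity. This is background illustration only; the band/row values are arbitrary and PLSH's personalized adjustment is not reproduced here.

```python
def candidate_probability(s, bands, rows):
    """Standard LSH banding: probability that a pair with signature similarity s
    becomes a candidate when the signature is split into `bands` bands of
    `rows` rows each: P = 1 - (1 - s**rows)**bands."""
    return 1.0 - (1.0 - s ** rows) ** bands

# More rows per band pushes the curve right (fewer false positives, more false
# negatives); more bands pushes it left (fewer false negatives, more false positives).
for bands, rows in [(20, 5), (10, 10), (50, 2)]:
    probs = [round(candidate_probability(s / 10, bands, rows), 3) for s in range(1, 10)]
    print(f"b={bands:2d}, r={rows:2d}:", probs)
```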


2020 ◽  
Vol 29 (4) ◽  
pp. 1944-1955 ◽  
Author(s):  
Maria Schwarz ◽  
Elizabeth C. Ward ◽  
Petrea Cornwell ◽  
Anne Coccetti ◽  
Pamela D'Netto ◽  
...  

Purpose The purpose of this study was to examine (a) the agreement between allied health assistants (AHAs) and speech-language pathologists (SLPs) when completing dysphagia screening for low-risk referrals and at-risk patients under a delegation model and (b) the operational impact of this delegation model. Method All AHAs worked in the adult acute inpatient settings across three hospitals and completed training and competency evaluation prior to conducting independent screening. Screening (pass/fail) was based on results from pre-screening exclusionary questions in combination with a water swallow test and the Eating Assessment Tool. To examine the agreement of AHAs' decision making with SLPs, AHAs (n = 7) and SLPs (n = 8) conducted independent, simultaneous dysphagia screening on 51 adult inpatients classified as low-risk/at-risk referrals. To examine operational impact, AHAs independently completed screening on 48 low-risk/at-risk patients, with subsequent clinical swallow evaluation conducted by an SLP for patients who failed screening. Results Exact agreement between AHAs and SLPs on the overall pass/fail screening criteria for the first 51 patients was 100%. Exact agreement for the two tools was 100% for the Eating Assessment Tool and 96% for the water swallow test. In the operational impact phase (n = 48), 58% of patients failed AHA screening, with only 10% false positives on subjective SLP assessment and no identified false negatives. Conclusion AHAs demonstrated the ability to reliably conduct dysphagia screening on a cohort of low-risk patients, with a low rate of false negatives. The data support a high level of agreement and a positive operational impact of using trained AHAs to perform dysphagia screening in low-risk patients.


Methodology ◽  
2019 ◽  
Vol 15 (3) ◽  
pp. 97-105
Author(s):  
Rodrigo Ferrer ◽  
Antonio Pardo

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform from the point of view of false negatives. For this purpose, we simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and with different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods with the best performance from the point of view of false positives. Our results reveal unacceptable rates of false negatives even for very large effect sizes, ranging from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Therefore, our results suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. However, we offer some considerations regarding effect size and the commonly used cut-off points that allow for more precise estimates.
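As a rough illustration of how such false-negative rates can be estimated, the following Monte Carlo sketch simulates one widely used distribution-based method, the Jacobson-Truax Reliable Change Index, under a normal distribution. The simulation design (sample size, reliability, standardized scores) is assumed for the example and will not reproduce the paper's exact figures.

```python
import numpy as np

def rci_false_negative_rate(effect_size, n=30, reliability=0.8,
                            n_sims=1000, seed=0):
    """Monte Carlo sketch of the false-negative rate of the Jacobson-Truax
    Reliable Change Index (one common distribution-based method; the paper
    compares several, so treat this as illustrative only).

    A true pre-post change of `effect_size` SD units is simulated; a case is
    a false negative when |RCI| <= 1.96, i.e. the real change goes undetected.
    """
    rng = np.random.default_rng(seed)
    # Pre-test scores are standardized (SD = 1), so SEdiff = sqrt(2 * (1 - r)).
    se_diff = np.sqrt(2.0) * np.sqrt(1.0 - reliability)
    misses = 0.0
    for _ in range(n_sims):
        pre = rng.normal(0.0, 1.0, size=n)
        post = pre + effect_size + rng.normal(0.0, se_diff, size=n)
        rci = (post - pre) / se_diff
        misses += np.mean(np.abs(rci) <= 1.96)  # fraction of undetected true changes
    return misses / n_sims

for d in (0.2, 0.5, 0.8, 2.0):
    print(f"effect size {d}: false-negative rate ~ {rci_false_negative_rate(d):.2f}")
```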


2020 ◽  
Vol 2020 (14) ◽  
pp. 378-1-378-7
Author(s):  
Tyler Nuanes ◽  
Matt Elsey ◽  
Radek Grzeszczuk ◽  
John Paul Shen

We present a high-quality sky segmentation model for depth refinement and investigate residual architecture performance to inform optimal shrinking of the network. We describe a model that runs in near real-time on a mobile device, present a new, high-quality dataset, and detail a unique weighting to trade off false positives and false negatives in binary classifiers. We show how the optimizations improve bokeh rendering by correcting stereo depth misprediction in sky regions. We detail techniques used to preserve edges, reject false positives, and ensure generalization to the diversity of sky scenes. Finally, we present a compact model and compare the performance of four popular residual architectures (ShuffleNet, MobileNetV2, ResNet-101, and ResNet-34-like) at constant computational cost.
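One generic way to realize such a false-positive/false-negative trade-off in a binary segmentation loss is class-weighted binary cross-entropy, sketched below. The weighting shown is an assumption for illustration, not the paper's specific scheme, and the weight values are arbitrary.

```python
import numpy as np

def weighted_bce(y_true, y_pred, fn_weight=2.0, fp_weight=1.0, eps=1e-7):
    """Class-weighted binary cross-entropy: a generic way to trade off false
    negatives against false positives (not the paper's exact weighting).

    fn_weight scales the penalty for missing positives (e.g. sky pixels
    predicted as non-sky); fp_weight scales the penalty for spurious positives.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    loss = -(fn_weight * y_true * np.log(y_pred)
             + fp_weight * (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

# Raising fn_weight makes under-segmenting the positive class costlier
# than over-segmenting it.
y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([0.3, 0.9, 0.2, 0.6])
print(weighted_bce(y_true, y_pred, fn_weight=3.0, fp_weight=1.0))
```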


2020 ◽  
Author(s):  
Nathaniel Park ◽  
Dmitry Yu. Zubarev ◽  
James L. Hedrick ◽  
Vivien Kiyek ◽  
Christiaan Corbet ◽  
...  

The convergence of artificial intelligence and machine learning with material science holds significant promise to rapidly accelerate development timelines of new high-performance polymeric materials. Within this context, we report an inverse design strategy for polycarbonate and polyester discovery based on a recommendation system that proposes polymerization experiments that are likely to produce materials with targeted properties. Following recommendations of the system driven by the historical ring-opening polymerization results, we carried out experiments targeting specific ranges of monomer conversion and dispersity of the polymers obtained from cyclic lactones and carbonates. The results of the experiments were in close agreement with the recommendation targets, with few false negatives or false positives obtained for each class.


2020 ◽  
Author(s):  
Stuart Yeates

A brief introduction to acronyms is given and motivation for extracting them in a digital library environment is discussed. A technique for extracting acronyms is given with an analysis of the results. The technique is found to have a low number of false negatives and a high number of false positives. Introduction Digital library research seeks to build tools to enable access to content, while making as few assumptions about the content as possible, since assumptions limit the range of applicability of the tools. Generally, the broader the assumptions, the more widely applicable the tools. For example, keyword-based indexing [5] is based on communications theory and applies to all natural human textual languages (allowances for differences in character sets and similar localisation issues notwithstanding). The algorithm described in this paper makes much stronger assumptions about the content. It assumes textual content that contains acronyms, an assumption which is known to hold for...
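Since the abstract is truncated before the technique itself is described, the snippet below is only a naive stand-in: a capital-letter heuristic that flags acronym candidates and checks the immediately preceding words for an expansion. It mimics the recall-over-precision behaviour reported above (few false negatives, many false positives) but is not the paper's algorithm; the regex and the word window are assumptions.

```python
import re

def acronym_candidates(text):
    """Naive acronym finder (illustrative only; not the algorithm in the paper).

    Flags any token of 2-6 capital letters and checks whether the initials of
    the immediately preceding words spell it out. Errs toward recall: few
    false negatives, many false positives.
    """
    results = []
    for match in re.finditer(r"\b[A-Z]{2,6}\b", text):
        acro = match.group()
        preceding = re.findall(r"[A-Za-z]+", text[:match.start()])[-len(acro):]
        initials = "".join(w[0].upper() for w in preceding)
        results.append((acro, initials == acro, " ".join(preceding)))
    return results

sample = "The Digital Library (DL) field uses TF and IDF weighting."
for acro, has_expansion, context in acronym_candidates(sample):
    print(acro, has_expansion, "<-", context)
```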

