Locality-Sensitive Hashing Without False Negatives for $l_p$

Author(s):  
Andrzej Pacuk ◽  
Piotr Sankowski ◽  
Karol Wegrzycki ◽  
Piotr Wygocki

2012 ◽  
Vol 263-266 ◽  
pp. 1341-1346 ◽  
Author(s):  
Keon Myung Lee

It is challenging to efficiently find similar pairs of objects when the number of objects is huge. Locality-sensitive hashing techniques have been developed to address this issue. They employ hash functions to map objects into buckets so that similar objects have a high chance of falling into the same bucket. This paper is concerned with one locality-sensitive hashing technique, the projection-based method, which is applicable to the Euclidean distance-based similar-pair identification problem. It proposes an extended method that allows an object to be hashed to more than one bucket by introducing additional hash functions. Experimental studies show that the proposed method provides better performance than the projection-based method.
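For context, the projection-based method referenced above follows the standard p-stable LSH scheme for Euclidean distance. The Python sketch below is a minimal illustration of that scheme only, not the paper's extended multi-bucket variant; parameter names such as `num_hashes` and the bucket width `w` are assumptions made for the example.

```python
import numpy as np

class ProjectionLSH:
    """Minimal sketch of projection-based (p-stable) LSH for Euclidean distance.

    Each hash function projects a vector onto a random Gaussian direction,
    shifts it by a random offset, and quantizes into buckets of width w.
    Nearby points are likely to share a bucket.
    """

    def __init__(self, dim, num_hashes=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(num_hashes, dim))    # random projection directions
        self.b = rng.uniform(0.0, w, size=num_hashes)  # random offsets in [0, w)
        self.w = w

    def hash(self, x):
        # h_i(x) = floor((a_i . x + b_i) / w); the tuple of values is the bucket key
        return tuple(np.floor((self.a @ x + self.b) / self.w).astype(int))

# Toy usage: similar vectors tend to collide in the same bucket.
lsh = ProjectionLSH(dim=3)
print(lsh.hash(np.array([1.0, 2.0, 3.0])))
print(lsh.hash(np.array([1.01, 2.0, 3.0])))
```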


2019 ◽  
Vol 35 (14) ◽  
pp. i127-i135 ◽  
Author(s):  
Guillaume Marçais ◽  
Dan DeBlasio ◽  
Prashant Pandey ◽  
Carl Kingsford

Abstract Motivation Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality-sensitive hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have a high-quality alignment from those that may. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e. omitting to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. In addition, due to the lack of a practical LSH method for edit distance, in practice, LSH methods for Jaccard similarity or Hamming similarity are used as a proxy. Results We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH. Availability and implementation The code to generate the results is available at http://github.com/Kingsford-Group/omhismb2019. Supplementary information Supplementary data are available at Bioinformatics online.
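To make the order-sensitivity concrete, here is a simplified, illustrative Order Min Hash sketch in Python. It follows the idea described in the abstract (select a few k-mer occurrences per hash function and keep them in sequence order) but is a toy under stated assumptions, not the authors' implementation; the parameters k, l, and m and the SHA-1-based hashing are choices made for the example.

```python
import hashlib

def omh_sketch(seq, k=3, l=2, m=4):
    """Simplified Order Min Hash sketch (illustrative, not the authors' code).

    For each of m hash functions, pick the l k-mer occurrences with the
    smallest hash values, then record those k-mers in the order they appear
    in the sequence, so the sketch reflects k-mer order as well as content.
    """
    # Enumerate k-mer occurrences as (position, k-mer, occurrence count)
    # so repeated k-mers hash to distinct items.
    occ, seen = [], {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        seen[kmer] = seen.get(kmer, 0) + 1
        occ.append((i, kmer, seen[kmer]))

    sketch = []
    for j in range(m):
        def h(item):
            _, kmer, cnt = item
            return hashlib.sha1(f"{j}:{kmer}:{cnt}".encode()).hexdigest()
        # l smallest items by hash value, re-sorted by position in the sequence.
        chosen = sorted(sorted(occ, key=h)[:l], key=lambda t: t[0])
        sketch.append(tuple(kmer for _, kmer, _ in chosen))
    return sketch

# The sketch depends on which k-mers occur and on the order they occur in.
print(omh_sketch("ACGTACGT"))
print(omh_sketch("TACGACGT"))
```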


2019 ◽  
Author(s):  
Guillaume Marçais ◽  
Dan DeBlasio ◽  
Prashant Pandey ◽  
Carl Kingsford

Abstract Motivation Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality Sensitive Hashing (LSH) is one method used to estimate the likelihood that two sequences have a proper alignment. Using an LSH, it is possible to separate, with high probability and relatively low computation, the pairs of sequences that do not have an alignment from those that may have an alignment. Therefore, an LSH reduces the overall computational requirement while not introducing many false negatives (i.e., omitting to report a valid alignment). However, current LSH methods treat sequences as a bag of k-mers and do not take into account the relative ordering of k-mers in sequences. In addition, due to the lack of a practical LSH method for edit distance, in practice, LSH methods for Jaccard similarity or Hamming distance are used as a proxy. Results We present an LSH method, called Order Min Hash (OMH), for the edit distance. This method is a refinement of the minHash LSH used to approximate the Jaccard similarity, in that OMH is sensitive not only to the k-mer contents of the sequences but also to the relative order of the k-mers in the sequences. We present theoretical guarantees of the OMH as a gapped LSH.


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Jingjing Wang ◽  
Chen Lin

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), in which a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using the MapReduce framework to handle similarity joins on large-scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of our proposed PLSH technique compared with state-of-the-art methods.
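For readers unfamiliar with banding, the sketch below shows the standard LSH banding calculation that schemes such as PLSH tailor: a pair becomes a candidate if all rows in at least one band of its signature match, which yields an S-shaped curve in similarity. This is background illustration only; the band/row values are arbitrary and PLSH's personalized adjustment is not reproduced here.

```python
def candidate_probability(s, bands, rows):
    """Standard LSH banding: probability that a pair with signature similarity s
    becomes a candidate when the signature is split into `bands` bands of
    `rows` rows each: P = 1 - (1 - s**rows)**bands."""
    return 1.0 - (1.0 - s ** rows) ** bands

# More rows per band pushes the curve right (fewer false positives, more false
# negatives); more bands pushes it left (fewer false negatives, more false positives).
for bands, rows in [(20, 5), (10, 10), (50, 2)]:
    probs = [round(candidate_probability(s / 10, bands, rows), 3) for s in range(1, 10)]
    print(f"b={bands:2d}, r={rows:2d}:", probs)
```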


2020 ◽  
Vol 29 (4) ◽  
pp. 1944-1955 ◽  
Author(s):  
Maria Schwarz ◽  
Elizabeth C. Ward ◽  
Petrea Cornwell ◽  
Anne Coccetti ◽  
Pamela D'Netto ◽  
...  

Purpose The purpose of this study was to examine (a) the agreement between allied health assistants (AHAs) and speech-language pathologists (SLPs) when completing dysphagia screening for low-risk referrals and at-risk patients under a delegation model and (b) the operational impact of this delegation model. Method All AHAs worked in the adult acute inpatient settings across three hospitals and completed training and competency evaluation prior to conducting independent screening. Screening (pass/fail) was based on results from pre-screening exclusionary questions in combination with a water swallow test and the Eating Assessment Tool. To examine the agreement of AHAs' decision making with SLPs, AHAs (n = 7) and SLPs (n = 8) conducted independent, simultaneous dysphagia screening on 51 adult inpatients classified as low-risk/at-risk referrals. To examine operational impact, AHAs independently completed screening on 48 low-risk/at-risk patients, with subsequent clinical swallow evaluation conducted by an SLP for patients who failed screening. Results Exact agreement between AHAs and SLPs on the overall pass/fail screening criteria for the first 51 patients was 100%. Exact agreement for the two tools was 100% for the Eating Assessment Tool and 96% for the water swallow test. In the operational impact phase (n = 48), 58% of patients failed AHA screening, with only 10% false positives on subjective SLP assessment and no identified false negatives. Conclusion AHAs demonstrated the ability to reliably conduct dysphagia screening on a cohort of low-risk patients, with a low rate of false negatives. The data support a high level of agreement and a positive operational impact of using trained AHAs to perform dysphagia screening in low-risk patients.


Methodology ◽  
2019 ◽  
Vol 15 (3) ◽  
pp. 97-105
Author(s):  
Rodrigo Ferrer ◽  
Antonio Pardo

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform from the point of view of false negatives. For this purpose, we simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and with different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods with the best performance from the point of view of false positives. Our results reveal unacceptable rates of false negatives even for very large effect sizes, ranging from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Therefore, our results suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. However, we offer some considerations regarding effect size and the commonly used cut-off points that allow for more precise estimates.
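As a rough illustration of how such false-negative rates can be estimated, the following Monte Carlo sketch simulates one widely used distribution-based method, the Jacobson-Truax Reliable Change Index, under a normal distribution. The simulation design (sample size, reliability, standardized scores) is assumed for the example and will not reproduce the paper's exact figures.

```python
import numpy as np

def rci_false_negative_rate(effect_size, n=30, reliability=0.8,
                            n_sims=1000, seed=0):
    """Monte Carlo sketch of the false-negative rate of the Jacobson-Truax
    Reliable Change Index (one common distribution-based method; the paper
    compares several, so treat this as illustrative only).

    A true pre-post change of `effect_size` SD units is simulated; a case is
    a false negative when |RCI| <= 1.96, i.e. the real change goes undetected.
    """
    rng = np.random.default_rng(seed)
    # Pre-test scores are standardized (SD = 1), so SEdiff = sqrt(2 * (1 - r)).
    se_diff = np.sqrt(2.0) * np.sqrt(1.0 - reliability)
    misses = 0.0
    for _ in range(n_sims):
        pre = rng.normal(0.0, 1.0, size=n)
        post = pre + effect_size + rng.normal(0.0, se_diff, size=n)
        rci = (post - pre) / se_diff
        misses += np.mean(np.abs(rci) <= 1.96)  # fraction of undetected true changes
    return misses / n_sims

for d in (0.2, 0.5, 0.8, 2.0):
    print(f"effect size {d}: false-negative rate ~ {rci_false_negative_rate(d):.2f}")
```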


2020 ◽  
Vol 2020 (14) ◽  
pp. 378-1-378-7
Author(s):  
Tyler Nuanes ◽  
Matt Elsey ◽  
Radek Grzeszczuk ◽  
John Paul Shen

We present a high-quality sky segmentation model for depth refinement and investigate residual architecture performance to inform optimal shrinking of the network. We describe a model that runs in near real-time on a mobile device, present a new, high-quality dataset, and detail a unique weighting to trade off false positives and false negatives in binary classifiers. We show how the optimizations improve bokeh rendering by correcting stereo depth misprediction in sky regions. We detail techniques used to preserve edges, reject false positives, and ensure generalization to the diversity of sky scenes. Finally, we present a compact model and compare the performance of four popular residual architectures (ShuffleNet, MobileNetV2, ResNet-101, and ResNet-34-like) at constant computational cost.
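One generic way to realize such a false-positive/false-negative trade-off in a binary segmentation loss is class-weighted binary cross-entropy, sketched below. The weighting shown is an assumption for illustration, not the paper's specific scheme, and the weight values are arbitrary.

```python
import numpy as np

def weighted_bce(y_true, y_pred, fn_weight=2.0, fp_weight=1.0, eps=1e-7):
    """Class-weighted binary cross-entropy: a generic way to trade off false
    negatives against false positives (not the paper's exact weighting).

    fn_weight scales the penalty for missing positives (e.g. sky pixels
    predicted as non-sky); fp_weight scales the penalty for spurious positives.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    loss = -(fn_weight * y_true * np.log(y_pred)
             + fp_weight * (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

# Raising fn_weight makes under-segmenting the positive class costlier
# than over-segmenting it.
y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([0.3, 0.9, 0.2, 0.6])
print(weighted_bce(y_true, y_pred, fn_weight=3.0, fp_weight=1.0))
```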


2020 ◽  
Author(s):  
Nathaniel Park ◽  
Dmitry Yu. Zubarev ◽  
James L. Hedrick ◽  
Vivien Kiyek ◽  
Christiaan Corbet ◽  
...  

The convergence of artificial intelligence and machine learning with material science holds significant promise to rapidly accelerate development timelines of new high-performance polymeric materials. Within this context, we report an inverse design strategy for polycarbonate and polyester discovery based on a recommendation system that proposes polymerization experiments that are likely to produce materials with targeted properties. Following recommendations of the system driven by the historical ring-opening polymerization results, we carried out experiments targeting specific ranges of monomer conversion and dispersity of the polymers obtained from cyclic lactones and carbonates. The results of the experiments were in close agreement with the recommendation targets, with few false negatives or false positives obtained for each class.


2020 ◽  
Author(s):  
Stuart Yeates

A brief introduction to acronyms is given and motivation for extracting them in a digital library environment is discussed. A technique for extracting acronyms is given with an analysis of the results. The technique is found to have a low number of false negatives and a high number of false positives. Introduction Digital library research seeks to build tools to enable access to content, while making as few assumptions about the content as possible, since assumptions limit the range of applicability of the tools. Generally, the broader the assumptions, the more widely applicable the tools. For example, keyword-based indexing [5] is based on communications theory and applies to all natural human textual languages (allowances for differences in character sets and similar localisation issues notwithstanding). The algorithm described in this paper makes much stronger assumptions about the content. It assumes textual content that contains acronyms, an assumption which is known to hold for...
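Since the abstract is truncated before the technique itself is described, the snippet below is only a naive stand-in: a capital-letter heuristic that flags acronym candidates and checks the immediately preceding words for an expansion. It mimics the recall-over-precision behaviour reported above (few false negatives, many false positives) but is not the paper's algorithm; the regex and the word window are assumptions.

```python
import re

def acronym_candidates(text):
    """Naive acronym finder (illustrative only; not the algorithm in the paper).

    Flags any token of 2-6 capital letters and checks whether the initials of
    the immediately preceding words spell it out. Errs toward recall: few
    false negatives, many false positives.
    """
    results = []
    for match in re.finditer(r"\b[A-Z]{2,6}\b", text):
        acro = match.group()
        preceding = re.findall(r"[A-Za-z]+", text[:match.start()])[-len(acro):]
        initials = "".join(w[0].upper() for w in preceding)
        results.append((acro, initials == acro, " ".join(preceding)))
    return results

sample = "The Digital Library (DL) field uses TF and IDF weighting."
for acro, has_expansion, context in acronym_candidates(sample):
    print(acro, has_expansion, "<-", context)
```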

