Bounds for discrepancies in the Hamming space

2021 ◽  
pp. 101552
Author(s):  
Alexander Barg ◽  
Maxim Skriganov
Author(s):  
Zhansheng Jiang ◽  
Lingxi Xie ◽  
Xiaotie Deng ◽  
Weiwei Xu ◽  
Jingdong Wang

2021 ◽  
pp. 772-784
Author(s):  
Lifang Wu ◽  
Yukun Chen ◽  
Wenjin Hu ◽  
Ge Shi

Author(s):  
Vladimir Mic ◽  
Pavel Zezula

This chapter focuses on data searching, which is nowadays mostly based on similarity. Similarity search is challenging due to its computational complexity and the fact that similarity is subjective and context-dependent. The authors assume the metric space model of similarity, defined by a domain of objects and a metric function that measures the dissimilarity of object pairs. The volume of contemporary data is large, so the time efficiency of similarity query execution is essential. This chapter investigates transformations of the metric space to the Hamming space that decrease the memory and computational cost of the search. Various challenges of similarity search with sketches in the Hamming space are addressed, including the definition of the sketching transformation and efficient search algorithms that exploit sketches to speed up searching. The indexing of the Hamming space and a heuristic to facilitate the selection of a suitable sketching technique for a given application are also considered.
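To make the sketching idea concrete, here is a minimal Python sketch of one common transformation of this kind: random-hyperplane sketching of a Euclidean domain into the Hamming space. The function names and parameters are illustrative assumptions; the chapter's own sketching techniques may differ.

```python
import numpy as np

def build_sketcher(dim, n_bits, seed=0):
    """Random-hyperplane sketching: one illustrative way to map a
    metric (here Euclidean) space into the Hamming space."""
    rng = np.random.default_rng(seed)
    hyperplanes = rng.standard_normal((n_bits, dim))
    def sketch(x):
        # One bit per hyperplane: which side of the hyperplane x falls on.
        return (hyperplanes @ x > 0).astype(np.uint8)
    return sketch

def hamming(a, b):
    # Number of differing bits; a cheap proxy for the original metric.
    return int(np.count_nonzero(a != b))

# Usage: sketches let us rank and filter candidates cheaply before any
# exact (expensive) metric evaluations.
sketch = build_sketcher(dim=128, n_bits=64)
rng = np.random.default_rng(1)
query = rng.standard_normal(128)
data = rng.standard_normal((1000, 128))
sk_q = sketch(query)
candidates = sorted(range(len(data)),
                    key=lambda i: hamming(sk_q, sketch(data[i])))[:10]
```

The design trade-off is the usual one: more bits give a sketch that better preserves the original neighborhood structure, at the cost of memory and Hamming-distance evaluation time.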


Author(s):  
Jianbin Qin ◽  
Chuan Xiao ◽  
Yaoshu Wang ◽  
Wei Wang ◽  
Xuemin Lin ◽  
...  

2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
David S. Campo ◽  
Yury Khudyakov

Abstract

Background: In molecular epidemiology, comparison of intra-host viral variants among infected persons is frequently used for tracing transmissions in the human population and detecting viral infection outbreaks. Application of Ultra-Deep Sequencing (UDS) immensely increases the sensitivity of transmission detection but brings considerable computational challenges when comparing all pairs of sequences. We developed a new population comparison method based on convex hulls in Hamming space. We applied this method to a large set of UDS samples obtained from unrelated cases infected with hepatitis C virus (HCV) and compared its performance with three previously published methods.

Results: The convex hull in Hamming space is a data structure that provides information on: (1) the average Hamming distance within a set; (2) the average Hamming distance between two sets; (3) the closeness centrality of each sequence; and (4) the lower and upper bounds of all pairwise distances among the members of two sets. This filtering strategy rapidly and correctly removes 96.2% of all pairwise HCV sample comparisons, outperforming all previous methods. The convex hull distance (CHD) algorithm showed variable performance depending on the sequence heterogeneity of the studied populations in real and simulated datasets, suggesting the possibility of using clustering methods to improve performance. To address this issue, we developed a new clustering algorithm, k-hulls, that reduces the heterogeneity of the convex hull. This efficient algorithm is an extension of the k-means algorithm and can be used with any type of categorical data. It is 6.8 times more accurate than k-modes, a previously developed clustering algorithm for categorical data.

Conclusions: CHD is a fast and efficient filtering strategy that massively reduces the computational burden of pairwise comparison among large samples of sequences, thus aiding the calculation of transmission links among infected individuals using threshold-based methods. In addition, the convex hull efficiently yields important summary metrics for intra-host viral populations.
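The filtering idea can be illustrated with a simplified Python sketch: summarize each sequence set by a per-position consensus and a Hamming radius, then use the triangle inequality to bound all pairwise distances between two sets, skipping the full all-pairs comparison when the lower bound already exceeds the transmission threshold. This consensus-plus-radius summary is an assumption made for illustration, not the paper's exact convex hull construction.

```python
def hamming(a, b):
    # Hamming distance between two equal-length sequences.
    return sum(x != y for x, y in zip(a, b))

def summarize(seqs):
    """Summarize a set of equal-length sequences by a per-position
    majority consensus and the maximum Hamming radius around it."""
    consensus = "".join(max(set(col), key=col.count) for col in zip(*seqs))
    radius = max(hamming(s, consensus) for s in seqs)
    return consensus, radius

def bounds(sum_a, sum_b):
    """Triangle-inequality bounds on every pairwise distance between
    a member of set A and a member of set B."""
    (ca, ra), (cb, rb) = sum_a, sum_b
    d = hamming(ca, cb)
    return max(0, d - ra - rb), d + ra + rb

# Usage: if even the most similar possible pair of sequences is farther
# apart than the threshold, the sample pair can be filtered out without
# computing any of the individual pairwise distances.
A = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
B = ["TGCATGCA", "TGCATGCT", "TGCTTGCA"]
lo, hi = bounds(summarize(A), summarize(B))
threshold = 3
if lo > threshold:
    print("filtered: no pair can be within the threshold")
```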


2014 ◽  
Vol 651-653 ◽  
pp. 2168-2171
Author(s):  
Qin Zhen Guo ◽  
Zhi Zeng ◽  
Shu Wu Zhang ◽  
Yuan Zhang ◽  
Gui Xuan Zhang

Hashing, which maps data into binary codes in Hamming space, has attracted increasing attention for approximate nearest neighbor search due to its high efficiency and reduced storage cost. K-means hashing (KH) is a novel hashing method that first quantizes the data by codewords and then uses the indices of the codewords to encode the data. However, in KH, only the codewords are updated to minimize the quantization error and the affinity error, while the indices of the codewords remain unchanged after initialization. In this paper, we propose an optimized k-means hashing (OKH) method to encode data by binary codes. In our method, we simultaneously optimize the codewords and their indices to minimize the quantization error and the affinity error. Our OKH method can find both the optimal codewords and the optimal indices, and the resulting binary codes in Hamming space better preserve the original neighborhood structure of the data. Moreover, OKH can be generalized to a product space. Extensive experiments have verified the superiority of OKH over KH and other state-of-the-art hashing methods.
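As a rough illustration of the encoding step that KH and OKH share, the Python sketch below runs plain k-means and encodes each point by the binary index of its nearest codeword, so that Hamming distance between codes stands in for distance in the original space. All names are illustrative, and the index optimization that distinguishes OKH is only noted in a comment, not implemented.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k codewords."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest codeword, then recenter.
        assign = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = X[assign == j].mean(0)
    return C

def encode(X, C):
    """Encode each point by the binary index of its nearest codeword.
    With k = 2**b codewords, each code is a b-bit Hamming-space point."""
    idx = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
    b = int(np.log2(len(C)))
    return ((idx[:, None] >> np.arange(b)) & 1).astype(np.uint8)

# Usage: 16 codewords yield 4-bit codes. In KH only the codewords are
# refined; OKH would additionally permute the index-to-codeword mapping
# so Hamming distance between codes tracks codeword distance more closely.
X = np.random.default_rng(1).standard_normal((500, 16))
C = kmeans(X, k=16)
codes = encode(X, C)
```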

