Complementary Binary Quantization for Joint Multiple Indexing

Building multiple hash tables has proven a successful technique for indexing massive databases, one that can guarantee a desired level of overall performance. However, existing hash-based multi-indexing methods suffer from heavy redundancy, lacking both strong table complementarity and effective hash code learning. To address these problems, this paper proposes a complementary binary quantization (CBQ) method that jointly learns multiple hash tables. It exploits the power of prototype-based incomplete binary coding to align the original space with the Hamming space, and further utilizes the nature of multi-indexing search to jointly reduce the quantization loss under the prototype-based hash function. Our alternating optimization efficiently and adaptively discovers complementary prototype sets and the corresponding code sets of varying size, which together robustly approximate the data relations. Our method generalizes naturally to the product space for long hash codes. Extensive experiments on two popular large-scale tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed CBQ method enjoys strong table complementarity and significantly outperforms the state of the art, with relative performance gains of up to 57.76%.
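The idea of a prototype-based hash function used across several tables can be illustrated with a minimal sketch. This is not the CBQ algorithm: the prototypes and codes below are fixed by hand rather than learned by alternating optimization, and all names (`prototype_hash`, `build_tables`, `query`) and the toy data are assumptions made for illustration. Each point takes the binary code of its nearest prototype, and a query collects the union of the buckets it falls into across tables.

```python
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototype_hash(point, prototypes, codes):
    """Return the binary code attached to the prototype nearest to `point`."""
    nearest = min(range(len(prototypes)), key=lambda i: sq_dist(point, prototypes[i]))
    return codes[nearest]

def build_tables(data, tables):
    """Build one bucket map per (prototypes, codes) pair."""
    index = []
    for prototypes, codes in tables:
        buckets = defaultdict(list)
        for item_id, point in enumerate(data):
            buckets[prototype_hash(point, prototypes, codes)].append(item_id)
        index.append(buckets)
    return index

def query(point, tables, index):
    """Union of the buckets the query hashes to, over all tables."""
    candidates = set()
    for (prototypes, codes), buckets in zip(tables, index):
        code = prototype_hash(point, prototypes, codes)
        candidates.update(buckets.get(code, []))
    return candidates

# Hypothetical toy data: five 2-D points, indexed 0..4.
data = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (0.0, 0.9)]

# Two hand-picked tables; each uses only 2 of the 4 possible 2-bit codes.
tables = [
    ([(0.0, 0.0), (1.0, 1.0)], ["00", "11"]),
    ([(0.0, 1.0), (0.5, 0.0)], ["01", "10"]),
]

index = build_tables(data, tables)
print(sorted(query((0.05, 0.1), tables, index)))
```

In this toy setup the second table alone returns only items 0 and 1 for the query `(0.05, 0.1)`, while the first table also contributes item 4, so the tables partition the data differently and the union covers more candidates than either table does alone.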


A Meta-Algorithm for Improving Top-N Prediction Efficiency of Matrix Factorization Models in Collaborative Filtering

International Journal of Pattern Recognition and Artificial Intelligence, 2019, Vol 34 (03), pp. 2059007

DOI: 10.1142/s0218001420590077

Author(s): A. Murat Yagci, Tevfik Aytekin, Fikret S. Gurgen

Keyword(s): Collaborative Filtering, Matrix Factorization, Large Scale, Nearest Neighbor, Nearest Neighbor Search, Space Efficiency, Neighbor Search, Prediction Time, Low Dimensional, Prediction Efficiency

Matrix factorization models often reveal the low-dimensional latent structure in high-dimensional spaces while bringing space efficiency to large-scale collaborative filtering problems. Improving the training and prediction time efficiency of these models is also important, since an accurate model may raise practical concerns if it is slow to capture the changing dynamics of the system. For the training task, powerful improvements have been proposed, especially using SGD, ALS, and their parallel versions. In this paper, we focus on the prediction task and combine matrix factorization with approximate nearest neighbor search methods to improve the efficiency of top-N prediction queries. Our efforts result in a meta-algorithm, MMFNN, which can employ various common matrix factorization models, drastically improve their prediction efficiency, and still perform comparably to standard prediction approaches, or sometimes even better, in terms of predictive power. Using various batch, online, and incremental matrix factorization models, we present a detailed empirical analysis on many large implicit feedback datasets from different application domains.
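The general pattern of pairing latent factors with a nearest neighbor index for top-N queries can be sketched as follows. This is not MMFNN itself, whose details the abstract does not give: the sign-based hashing, the fixed hyperplanes, the function names, and the toy factors are all assumptions chosen for illustration. The exact approach scores every item by its inner product with the user vector; the approximate approach scores only the items in the user's hash bucket.

```python
import heapq
from collections import defaultdict

def dot(u, v):
    """Inner product of two equal-length tuples."""
    return sum(x * y for x, y in zip(u, v))

def sign_code(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(1 if dot(vec, h) >= 0 else 0 for h in hyperplanes)

def top_n_exact(user, items, n):
    """Score every item against the user vector (brute force)."""
    return heapq.nlargest(n, items, key=lambda i: dot(user, items[i]))

def build_index(items, hyperplanes):
    """Group item ids by the sign code of their latent vector."""
    buckets = defaultdict(list)
    for item_id, vec in items.items():
        buckets[sign_code(vec, hyperplanes)].append(item_id)
    return buckets

def top_n_approx(user, items, buckets, hyperplanes, n):
    """Score only the items that share the user's sign code."""
    candidates = buckets.get(sign_code(user, hyperplanes), [])
    return heapq.nlargest(n, candidates, key=lambda i: dot(user, items[i]))

# Hypothetical 2-D item factors, as if produced by a trained model.
items = {
    "a": (0.9, 0.1),
    "b": (0.8, 0.3),
    "c": (-0.7, 0.6),
    "d": (0.1, 0.9),
    "e": (-0.5, -0.8),
}
hyperplanes = [(1.0, 0.0), (0.0, 1.0)]  # fixed here; usually drawn at random
user = (1.0, 0.2)

buckets = build_index(items, hyperplanes)
print(top_n_exact(user, items, 2))
print(top_n_approx(user, items, buckets, hyperplanes, 2))
```

On this toy input both calls return the same two items, but the approximate version scores three candidates instead of five. Sign-based hashing targets angular similarity rather than the inner product directly, so on real factors the candidate set can miss high-scoring items with large norms; that trade-off is what an empirical comparison against standard prediction would measure.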
