Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search

2020 ◽  
Vol 34 (01) ◽  
pp. 51-58 ◽  
Author(s):  
Xinyan Dai ◽  
Xiao Yan ◽  
Kelvin K. W. Ng ◽  
Jiu Liu ◽  
James Cheng

Vector quantization (VQ) techniques are widely used in similarity search for data compression, computation acceleration, etc. Originally designed for Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or implicitly minimize the quantization error. In this paper, we present a new angle on the quantization error, decomposing it into norm error and direction error. We show that quantization errors in norm have a much higher influence on inner products than quantization errors in direction, and that a small quantization error does not necessarily lead to good performance in maximum inner product search (MIPS). Based on this observation, we propose norm-explicit quantization (NEQ) — a general paradigm that improves existing VQ techniques for MIPS. NEQ quantizes the norms of the items in a dataset explicitly to reduce errors in norm, which is crucial for MIPS. For the direction vectors, NEQ can simply reuse an existing VQ technique without modification. We conducted extensive experiments on a variety of datasets and parameter configurations. The results show that NEQ improves the performance of various VQ techniques for MIPS, including PQ, OPQ, RQ and AQ.
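The norm/direction decomposition above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the norm codebook here is a simple 16-level quantile quantizer, and the direction quantizer is left as a pass-through stub where NEQ would plug in an existing VQ technique (PQ, OPQ, RQ, AQ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: decompose each item into its norm and unit direction.
X = rng.normal(size=(1000, 32))
norms = np.linalg.norm(X, axis=1)
dirs = X / norms[:, None]

# Explicit scalar quantizer for the norms (quantile codewords).
codebook = np.quantile(norms, np.linspace(0, 1, 16))
q_norms = codebook[np.argmin(np.abs(norms[:, None] - codebook), axis=1)]

# Direction quantization is a stub here: NEQ reuses any existing VQ
# technique unmodified for the direction vectors.
q_dirs = dirs

# Inner products are estimated as quantized_norm * (direction . query).
query = rng.normal(size=32)
approx_ip = q_norms * (q_dirs @ query)
exact_ip = X @ query
```

Because the norm is quantized with its own dedicated codebook, the dominant source of inner-product error is controlled directly, independent of whichever VQ handles the directions.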

2014 ◽  
Vol 644-650 ◽  
pp. 2185-2188
Author(s):  
Qiang Li ◽  
Xiao Hong Zhang ◽  
Qing Yu Niu

To reduce the bit rate while maintaining good distortion performance, this paper proposes an LSF quantization method based on unvoiced/voiced speech classification. The method trains codebooks on differential LSF parameters drawn from separate unvoiced and voiced databases, which suppresses the quantization error propagation caused by direct vector quantization of the LSF parameters. Experimental results show that, at the same bit allocation, this method quantizes the LSF parameters with better quality.
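A minimal sketch of the split-codebook idea, under stated assumptions: the "LSF" frames are synthetic, the differential parameters are taken as frame-to-frame differences (the paper may define them differently), and the codebooks are trained with a tiny hand-rolled k-means rather than the authors' training procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans_codebook(data, k, iters=20):
    # Minimal k-means used here to train a VQ codebook.
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def vq(vec, codebook):
    # Nearest-codeword vector quantization.
    idx = np.argmin(((codebook - vec) ** 2).sum(-1))
    return idx, codebook[idx]

# Synthetic 10-dim "LSF" frames (monotonically increasing, as LSFs are),
# each tagged with a voiced/unvoiced flag.
lsf = np.cumsum(rng.uniform(0.05, 0.3, size=(500, 10)), axis=1)
voiced = rng.random(500) < 0.5
diff_lsf = np.diff(lsf, axis=0)   # differential LSF parameters
v_flag = voiced[1:]

# Separate codebooks trained on voiced vs. unvoiced differential frames.
cb_voiced = kmeans_codebook(diff_lsf[v_flag], k=32)
cb_unvoiced = kmeans_codebook(diff_lsf[~v_flag], k=32)

# Quantize one frame with the codebook matching its classification.
frame = diff_lsf[0]
idx, rec = vq(frame, cb_voiced if v_flag[0] else cb_unvoiced)
```

Splitting the training data by voicing lets each codebook specialize, so the same total bit budget (here, 5 bits per frame plus the voicing flag) covers each class more densely.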


2020 ◽  
Vol 12 (2) ◽  
pp. 206-215
Author(s):  
Hang Zou ◽  
Fengjun Zhao ◽  
Xiaoxue Jia ◽  
Heng Zhang ◽  
Wei Wang

2013 ◽  
Vol 333-335 ◽  
pp. 1106-1109
Author(s):  
Wei Wu

Palm vein pattern recognition is one of the newest biometric techniques under research today. This paper proposes projecting the palm vein image matrix directly with independent component analysis, computing the Euclidean distances between projection matrices, and classifying by the nearest distance. Experiments were conducted on a self-built palm vein database. The results show that independent component analysis is suitable for palm vein recognition and that its recognition performance is practical.
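The pipeline (ICA projection, then nearest-neighbor classification by Euclidean distance) can be sketched as below. Everything here is an assumption-laden stand-in: the "images" are synthetic 64-dim vectors for two identities, and the ICA is a minimal symmetric FastICA (tanh nonlinearity) written in numpy, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def fast_ica_basis(X, k, iters=100):
    # Minimal symmetric FastICA: whiten with PCA, then fixed-point
    # iteration with tanh nonlinearity and symmetric decorrelation.
    Xc = X - X.mean(axis=0)
    d, E = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    top = np.argsort(d)[::-1][:k]
    K = E[:, top] / np.sqrt(d[top])           # whitening matrix (p, k)
    Z = Xc @ K                                # whitened data (n, k)
    W = np.linalg.qr(rng.normal(size=(k, k)))[0]
    for _ in range(iters):
        G = np.tanh(Z @ W.T)
        W_new = G.T @ Z / len(Z) - np.diag((1 - G ** 2).mean(axis=0)) @ W
        U, _, Vt = np.linalg.svd(W_new)
        W = U @ Vt                            # symmetric decorrelation
    return K @ W.T                            # projection matrix (p, k)

# Synthetic stand-in for flattened palm vein images: two identities.
means = rng.normal(size=(2, 64)) * 3.0
X = np.vstack([rng.normal(m, 1.0, size=(20, 64)) for m in means])
y = np.repeat([0, 1], 20)

P = fast_ica_basis(X, k=8)
mu = X.mean(axis=0)
feats = (X - mu) @ P                          # gallery projections

def classify(img):
    # Project the probe with the learned ICA basis, then return the
    # label of the nearest gallery sample under Euclidean distance.
    f = (img - mu) @ P
    return y[np.argmin(np.linalg.norm(feats - f, axis=1))]

probe = rng.normal(means[1], 1.0, size=64)
pred = classify(probe)
```

Real use would flatten each palm vein image to a row of `X`; the nearest-distance rule then works unchanged on the ICA projections.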


2021 ◽  
Author(s):  
Changyi Ma ◽  
Fangchen Yu ◽  
Yueyao Yu ◽  
Wenye Li

2018 ◽  
Vol 8 (4) ◽  
pp. 3203-3208
Author(s):  
P. N. Smyrlis ◽  
D. C. Tsouros ◽  
M. G. Tsipouras

Classification-via-clustering (CvC) is a widely used approach that performs classification tasks through a clustering procedure. In this paper, a novel K-Means-based CvC algorithm is presented, analysed and evaluated. Two additional techniques are employed to mitigate the limitations of K-Means: a hypercube of constraints is defined for each centroid, and weights are acquired for each attribute of each class so that a weighted Euclidean distance serves as the similarity criterion in the clustering procedure. Experiments were conducted on 42 well-known classification datasets. The experimental results demonstrate that the proposed algorithm outperforms CvC with plain K-Means.
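The two additions (per-centroid hypercube constraints and per-class attribute weights in a weighted Euclidean distance) can be sketched as follows. This is a simplified illustration with one centroid per class and inverse-variance weights; the paper's actual constraint and weight-learning schemes may differ.

```python
import numpy as np

rng = np.random.default_rng(3)

def cvc_fit(X, y, iters=10, margin=1.0):
    classes = np.unique(y)
    cents, weights, boxes = [], [], []
    for c in classes:
        Xc = X[y == c]
        # Per-class attribute weights: inverse in-class variance, so
        # stable attributes count more in the weighted distance.
        w = 1.0 / (Xc.var(axis=0) + 1e-6)
        weights.append(w / w.sum())
        # Hypercube constraint: the centroid must stay inside the
        # class's bounding box, expanded by a small margin.
        boxes.append((Xc.min(0) - margin, Xc.max(0) + margin))
        cents.append(Xc.mean(0))
    cents, weights = np.array(cents), np.array(weights)
    for _ in range(iters):
        # Assign each point to the nearest centroid under that
        # centroid's own weighted Euclidean distance.
        d = np.array([((X - cents[j]) ** 2 * weights[j]).sum(1)
                      for j in range(len(classes))])
        lab = d.argmin(0)
        for j in range(len(classes)):
            members = X[lab == j]
            if len(members):
                lo, hi = boxes[j]
                cents[j] = np.clip(members.mean(0), lo, hi)  # enforce hypercube
    return cents, weights, classes

def cvc_predict(x, cents, weights, classes):
    d = ((x - cents) ** 2 * weights).sum(1)
    return classes[d.argmin()]

# Toy 2-class, 2-attribute data: attribute 1 is noisy, attribute 0 informative.
X = np.vstack([rng.normal([0, 0], [1, 3], size=(50, 2)),
               rng.normal([5, 0], [1, 3], size=(50, 2))])
y = np.repeat([0, 1], 50)
cents, weights, classes = cvc_fit(X, y)
pred = cvc_predict(np.array([4.8, 0.5]), cents, weights, classes)
```

The hypercube clamp stops a centroid from drifting outside its class's region during the clustering iterations, and the weights down-rank the high-variance attribute in the distance computation.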


Author(s):  
Jun Zhou ◽  
Longfei Li ◽  
Ziqi Liu ◽  
Chaochao Chen

Recently, the Factorization Machine (FM) has become increasingly popular for recommendation systems due to its effectiveness in finding informative interactions between features. Usually, the weights of the interactions are learned as a low-rank weight matrix, formulated as the inner product of two low-rank matrices. This low-rank matrix helps improve the generalization ability of the Factorization Machine. However, choosing the rank properly usually requires running the algorithm multiple times with different ranks, which is clearly inefficient for large-scale datasets. To alleviate this issue, we propose an Adaptive Boosting framework for Factorization Machines (AdaFM), which can adaptively search for proper ranks on different datasets without re-training. Instead of using a fixed rank, the proposed algorithm gradually increases the rank according to its performance, until the performance stops improving. Extensive experiments are conducted to validate the proposed method on multiple large-scale datasets. The experimental results demonstrate that the proposed method can be more effective than state-of-the-art Factorization Machines.
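The rank-growing loop can be sketched as follows. The key property that makes it work is that the FM second-order term is additive over factor columns, so a new rank-1 component can be fit to the residual, boosting-style, without re-training the existing columns. This is a minimal sketch on synthetic data with plain gradient descent, not the AdaFM algorithm itself; the stopping rule and optimizer are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def fm_pairwise(X, V):
    # FM second-order term 0.5 * sum_f ((X V_f)^2 - X^2 V_f^2);
    # additive over the columns of V, which is what lets the rank
    # be grown one column at a time.
    return 0.5 * (((X @ V) ** 2).sum(1) - ((X ** 2) @ (V ** 2)).sum(1))

def fit_one_column(X, resid, steps=500, lr=0.02):
    # Fit a single rank-1 interaction factor to the current residual
    # by gradient descent on squared error.
    v = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(steps):
        err = fm_pairwise(X, v[:, None]) - resid
        Xv = X @ v
        grad = (err[:, None] * (Xv[:, None] * X - (X ** 2) * v)).mean(0)
        v -= lr * grad
    return v

# Synthetic regression target generated by a rank-2 FM interaction.
n, d = 400, 6
X = rng.normal(size=(n, d))
y = fm_pairwise(X, rng.normal(size=(d, 2)))
Xtr, Xva, ytr, yva = X[:300], X[300:], y[:300], y[300:]

# Grow the rank until validation error stops improving.
V = np.zeros((d, 0))
best = np.mean(yva ** 2)            # error of the empty (rank-0) model
for _ in range(5):                   # rank cap
    resid = ytr - fm_pairwise(Xtr, V)
    cand = np.column_stack([V, fit_one_column(Xtr, resid)])
    err = np.mean((yva - fm_pairwise(Xva, cand)) ** 2)
    if err >= best * 0.99:           # no meaningful improvement: stop
        break
    V, best = cand, err
```

Each accepted column increases the effective rank by one; when a new column no longer reduces the validation error, the search stops, so the rank adapts to the dataset in a single training pass.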

