matrix sketching
Recently Published Documents

TOTAL DOCUMENTS: 19 (FIVE YEARS: 8)
H-INDEX: 6 (FIVE YEARS: 0)

2022 ◽  
Vol 16 (5) ◽  
Author(s):  
Cheng Chen ◽  
Weinan Zhang ◽  
Yong Yu

Author(s):  
Roberta Falcone ◽  
Laura Anderlucci ◽  
Angela Montanari

Abstract: The presence of imbalanced classes is increasingly common in practical applications and is known to heavily compromise the learning process. In this paper we propose a new method aimed at addressing this issue in binary supervised classification. Re-balancing the class sizes has proven to be a fruitful strategy for overcoming this problem; our proposal performs the re-balancing through matrix sketching. Matrix sketching is a recently developed data compression technique characterized by the property of preserving most of the linear information present in the data. This property is guaranteed by the Johnson-Lindenstrauss lemma (1984) and allows an n-dimensional space to be embedded into a reduced one without distorting, within an $\epsilon$-size interval, the distances between any pair of points. We propose matrix sketching as an alternative to the standard re-balancing strategies based on randomly under-sampling the majority class or randomly over-sampling the minority one. We assess the properties of our method when combined with linear discriminant analysis (LDA), classification trees (C4.5), and support vector machines (SVM) on simulated and real data. Results show that sketching can be a sound alternative to the most widely used re-balancing methods.
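A minimal illustration of the idea (not the authors' exact procedure): a Gaussian Johnson-Lindenstrauss sketch compresses the majority class down to the size of the minority class while approximately preserving its covariance structure. All names and sizes below are made up for the demo.

```python
import numpy as np

def sketch_majority(X_maj, k, seed=None):
    """Compress the n x d majority-class matrix into k synthetic rows
    with a Gaussian Johnson-Lindenstrauss projection; with the 1/sqrt(k)
    scaling, (S @ X)^T (S @ X) approximates X^T X in expectation."""
    rng = np.random.default_rng(seed)
    n = X_maj.shape[0]
    S = rng.standard_normal((k, n)) / np.sqrt(k)   # k x n sketching matrix
    return S @ X_maj                               # k x d sketched class

# Re-balance by sketching the majority class down to the minority size,
# then train any classifier (e.g. LDA, C4.5, SVM) on the result.
rng = np.random.default_rng(0)
X_maj = rng.standard_normal((2000, 10))            # majority class
X_min = rng.standard_normal((100, 10)) + 1.0       # shifted minority class
X_bal = np.vstack([sketch_majority(X_maj, k=100, seed=1), X_min])
y_bal = np.r_[np.zeros(100), np.ones(100)]         # re-balanced labels
```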


2021 ◽  
Author(s):  
Fumito Tagashira ◽  
Tomoyuki Obuchi ◽  
Toshiyuki Tanaka
Keyword(s):  
Rank One ◽  

2021 ◽  
Vol 14 (6) ◽  
pp. 1102-1110
Author(s):  
Anton Tsitsulin ◽  
Marina Munkhoeva ◽  
Davide Mottin ◽  
Panagiotis Karras ◽  
Ivan Oseledets ◽  
...  

Low-dimensional representations, or embeddings, of a graph's nodes facilitate several practical data science and data engineering tasks. As such embeddings rely, explicitly or implicitly, on a similarity measure among nodes, they require the computation of a quadratic similarity matrix, inducing a tradeoff between space complexity and embedding quality. To date, no graph embedding work combines (i) linear space complexity, (ii) a nonlinear transform as its basis, and (iii) nontrivial quality guarantees. In this paper we introduce FREDE (FREquent Directions Embedding), a graph embedding based on matrix sketching that combines all three desiderata. Starting from the observation that embedding methods aim to preserve the covariance among the rows of a similarity matrix, FREDE iteratively improves in quality while individually processing rows of a nonlinearly transformed PPR similarity matrix derived from a state-of-the-art graph embedding method, and provides, at any iteration, column-covariance approximation guarantees that become almost indistinguishable from those of the optimal SVD approximation. Our experimental evaluation on networks of varying size shows that FREDE performs almost as well as SVD and competitively against state-of-the-art embedding methods on diverse data science tasks, even when it is based on as little as 10% of node similarities.
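A minimal sketch of the core loop, assuming plain Frequent Directions as the sketcher and a random matrix standing in for the nonlinearly transformed PPR similarities that FREDE actually consumes; the anytime embedding is read off the SVD of the current sketch.

```python
import numpy as np

def fd_append(B, row, ell):
    """One Frequent Directions step: append a similarity row to the
    sketch and, once it holds ell rows, shrink every squared singular
    value by the smallest one, zeroing out (and dropping) one row."""
    B = np.vstack([B, row])
    if B.shape[0] < ell:
        return B
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    s = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
    return (s[:, None] * Vt)[:-1]

def embedding(B, dim):
    """Anytime embedding: top right singular directions of the sketch,
    scaled by their singular values."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:dim].T * s[:dim]        # one dim-vector per node

rng = np.random.default_rng(0)
n, ell = 200, 16
B = np.empty((0, n))
for _ in range(n):
    sim_row = rng.standard_normal(n)   # stand-in for one transformed PPR row
    B = fd_append(B, sim_row, ell)
emb = embedding(B, dim=8)              # usable after any number of rows
```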


Author(s):  
Changsheng Li ◽  
Rongqing Li ◽  
Ye Yuan ◽  
Guoren Wang ◽  
Dong Xu

2020 ◽  
Author(s):  
Qianli Liao

We consider the task of matrix sketching: obtaining a significantly smaller representation of a matrix A that retains most of its information (in other words, approximates A well). In particular, we investigate a recent approach called Frequent Directions (FD), initially proposed by Liberty [5] in 2013, which has drawn wide attention due to its elegance, strong theoretical guarantees, and outstanding performance in practice. Two follow-up papers, [3] and [2] in 2014, further refined the theoretical bounds and improved the practical performance. In this report, we summarize the three papers and propose a Generalized Frequent Directions (GFD) algorithm for matrix sketching that captures all the previous FD algorithms as special cases without losing any of the theoretical bounds. Interestingly, our additive error bound appears to apply even to iSVD, a well-performing heuristic that previously lacked guarantees.
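The report's exact GFD parameterization is not reproduced here; as a hedged illustration, one natural way to capture both FD and iSVD as special cases is a single shrink weight alpha (this parameterization is an assumption for the demo, not necessarily the one the report proposes).

```python
import numpy as np

def gfd_step(B, row, ell, alpha=1.0):
    """One generalized shrinkage step on the sketch B. A hypothetical
    parameterization: alpha = 1.0 recovers the Frequent Directions
    shrink (subtract the smallest squared singular value from all of
    them); alpha = 0.0 recovers the iSVD heuristic (truncate the
    smallest direction, no shrinking)."""
    B = np.vstack([B, row])
    if B.shape[0] < ell:
        return B
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    s_new = np.sqrt(np.maximum(s**2 - alpha * s[-1]**2, 0.0))
    return (s_new[:, None] * Vt)[:ell - 1]

# Sketch a random 500 x 50 matrix A with ell = 10 and compare the
# covariance error ||A^T A - B^T B||_2 at the two extremes.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 50)) @ np.diag(np.linspace(1, 0.01, 50))
for alpha in (1.0, 0.0):
    B = np.empty((0, 50))
    for a in A:
        B = gfd_step(B, a, ell=10, alpha=alpha)
    err = np.linalg.norm(A.T @ A - B.T @ B, 2)
    print(f"alpha={alpha}: spectral covariance error {err:.1f}")
```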


Author(s):  
Cheng Chen ◽  
Luo Luo ◽  
Weinan Zhang ◽  
Yong Yu ◽  
Yijiang Lian

The linear contextual bandit is a sequential decision-making problem in which an agent chooses among actions given their corresponding contexts. Since large-scale data sets are becoming increasingly common, we study linear contextual bandits in high-dimensional settings. Recent works focus on employing matrix sketching methods to accelerate contextual bandits; however, the matrix approximation error introduces additional terms into the regret bound. In this paper we first propose a novel matrix sketching method called Spectral Compensation Frequent Directions (SCFD). We then propose an efficient approach to contextual bandits that adopts SCFD to approximate the covariance matrices. By maintaining and manipulating sketched matrices, our method needs only O(md) space and O(md) updating time per round, where d is the dimensionality of the data and m is the sketching size. Theoretical analysis reveals that our method has better regret bounds than previous methods in high-dimensional cases. Experimental results demonstrate the effectiveness of our algorithm and verify our theoretical guarantees.
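SCFD's spectral-compensation term is specific to the paper and not reproduced here; below is a hedged sketch of the generic idea it builds on: maintain an FD sketch of the context rows so that B^T B tracks the covariance sum, and invert the sketched regularized covariance via the Woodbury identity. The sizes and the reward model are made up for the demo.

```python
import numpy as np

def fd_update(B, x, m):
    """FD update so that B.T @ B approximates the running sum of the
    outer products x x^T. (Plain per-step SVD for clarity; the O(md)
    amortized variants batch rows before shrinking.)"""
    B = np.vstack([B, x])
    if B.shape[0] < m:
        return B
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    s = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
    return (s[:, None] * Vt)[:-1]

def solve_sketched(B, b, lam):
    """Approximate (lam*I + A^T A)^{-1} b from the sketch alone, via
    the Woodbury identity on B's SVD."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    V = Vt.T                           # d x m right singular vectors
    scale = s**2 / (s**2 + lam)        # m shrinkage weights
    return (b - V @ (scale * (V.T @ b))) / lam

# Sketched ridge regression over a stream of rounds: the core of a
# LinUCB-style bandit with a sketched covariance matrix.
rng = np.random.default_rng(0)
d, m, lam = 100, 10, 1.0
B, bvec = np.empty((0, d)), np.zeros(d)
for t in range(500):
    x = rng.standard_normal(d)                 # context of the chosen arm
    r = x[0] + 0.1 * rng.standard_normal()     # toy observed reward
    B = fd_update(B, x, m)
    bvec += r * x
theta_hat = solve_sketched(B, bvec, lam)       # approximate ridge estimate
```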


2018 ◽  
Vol 25 (7) ◽  
pp. 1069-1073 ◽  
Author(s):  
Zilin Zhang ◽  
Yan Li ◽  
Zhengwen Zhang ◽  
Cheng Jin ◽  
Meiguo Gao
