Efficient Adaptive Online Learning via Frequent Directions

Author(s): Yuanyu Wan, Nan Wei, Lijun Zhang

By employing time-varying proximal functions, adaptive subgradient methods (ADAGRAD) improve the regret bound and have been widely used in online learning and optimization. However, ADAGRAD with full-matrix proximal functions (ADA-FULL) cannot handle large-scale problems due to its impractical time and space complexities, even though it performs better when gradients are correlated. In this paper, we propose ADA-FD, an efficient variant of ADA-FULL based on a deterministic matrix sketching technique called frequent directions. Following ADA-FULL, we incorporate ADA-FD into both the primal-dual subgradient method and the composite mirror descent method to develop two efficient algorithms. By maintaining and manipulating low-rank matrices, the space complexity at each iteration is reduced from $O(d^2)$ to $O(\tau d)$ and the time complexity from $O(d^3)$ to $O(\tau^2 d)$, where $d$ is the dimensionality of the data and $\tau \ll d$ is the sketching size. Theoretical analysis reveals that the regret of our methods is close to that of ADA-FULL as long as the outer product matrix of the gradients is approximately low-rank. Experimental results show that ADA-FD is comparable to ADA-FULL and outperforms other state-of-the-art algorithms in online convex optimization as well as in training convolutional neural networks (CNNs).
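The frequent directions technique at the core of ADA-FD can be illustrated in a few lines. The following is a minimal NumPy sketch of the standard frequent-directions algorithm (Liberty's deterministic sketch), not the authors' ADA-FD code; the function name and the per-step-SVD formulation are illustrative choices. It maintains a $\tau \times d$ sketch $B$ whose Gram matrix $B^\top B$ approximates the outer-product matrix of the gradient stream, which is what gives the $O(\tau d)$ space and $O(\tau^2 d)$ per-step time mentioned above.

```python
import numpy as np

def frequent_directions(gradients, sketch_size):
    """Deterministic frequent-directions sketch of a row stream.

    Maintains a small matrix B (sketch_size x d, assuming sketch_size <= d)
    such that B^T B approximates G^T G for the stream of gradient rows G,
    using only O(sketch_size * d) space.
    """
    d = gradients.shape[1]
    B = np.zeros((sketch_size, d))
    for g in gradients:
        # Insert the incoming gradient into the (zeroed) last row.
        B[-1] = g
        # Shrink: SVD of the small sketch costs O(sketch_size^2 * d);
        # subtracting the smallest squared singular value from all of
        # them re-zeroes the last row for the next insertion.
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        s_shrunk = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
        B = s_shrunk[:, None] * Vt
    return B
```

The shrink step only ever decreases the sketch's Gram matrix, so $B^\top B \preceq G^\top G$ always holds, and the spectral error is bounded by $\|G\|_F^2/\tau$; when the gradient outer-product matrix is approximately low-rank, this error is small, which matches the regret condition stated in the abstract.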

2018, Vol 58 (11), pp. 1728-1736
Author(s): A. S. Bayandina, A. V. Gasnikov, E. V. Gasnikova, S. V. Matsievskii

2016, Vol 177, pp. 643-650
Author(s): Jueyou Li, Guo Chen, Zhaoyang Dong, Zhiyou Wu

2016, Vol 12 (6), pp. 1179-1197
Author(s): Jueyou Li, Guoquan Li, Zhiyou Wu, Changzhi Wu

2020, Vol 34 (04), pp. 5093-5100
Author(s): Wenye Ma

This paper considers online convex optimization (OCO) problems, the paramount framework for online learning algorithm design. In the OCO setting, the loss function of a learning task is defined over streaming data, which makes OCO a powerful tool for modeling large-scale applications such as online recommender systems. Meanwhile, real-world data are usually extremely high-dimensional due to modern feature engineering techniques, so quadratic regression is impractical. Factorization Machines and their variants are efficient models for capturing feature interactions via a low-rank matrix model, but they cannot fulfill the OCO setting due to their non-convexity. In this paper, we propose a projective quadratic regression (PQR) model. First, it can capture the important second-order feature information. Second, it is a convex model, so the requirements of OCO are fulfilled and the globally optimal solution can be achieved. Moreover, existing online optimization methods such as Online Gradient Descent (OGD) and Follow-The-Regularized-Leader (FTRL) can be applied directly. In addition, by choosing a proper hyper-parameter, we show that it has the same order of space and time complexity as the linear model and thus can handle high-dimensional data. Experimental results demonstrate the accuracy and efficiency of the proposed PQR model in comparison with state-of-the-art methods.
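Since convexity is what lets standard OCO solvers such as OGD apply directly to the PQR model, a minimal sketch of projected online gradient descent may help fix ideas. This is a generic textbook OGD loop, not the paper's implementation; the function, its signature, the $\eta_0/\sqrt{t+1}$ step-size schedule, and the L2-ball feasible set are illustrative assumptions.

```python
import numpy as np

def online_gradient_descent(grad_fn, d, T, eta0=1.0, radius=1.0):
    """Projected online gradient descent for OCO.

    grad_fn(t, w) returns a subgradient of the round-t convex loss at w.
    Uses step size eta0 / sqrt(t + 1) and projects each iterate onto an
    L2 ball of the given radius, which yields O(sqrt(T)) regret for
    convex losses with bounded gradients.
    """
    w = np.zeros(d)
    iterates = []
    for t in range(T):
        g = grad_fn(t, w)
        w = w - eta0 / np.sqrt(t + 1) * g
        # Euclidean projection onto {w : ||w||_2 <= radius}.
        n = np.linalg.norm(w)
        if n > radius:
            w *= radius / n
        iterates.append(w.copy())
    return np.array(iterates)
```

For the convex PQR loss, `grad_fn` would evaluate the gradient of the round-$t$ loss on the current streaming example; because the model is convex, the same loop (or FTRL in its place) inherits the usual sublinear-regret guarantees.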


Complexity, 2016, Vol 21 (S2), pp. 178-190
Author(s): Jueyou Li, Guo Chen, Zhaoyang Dong, Zhiyou Wu, Minghai Yao

2019, Vol 80 (9), pp. 1607-1627
Author(s): A. V. Nazin, A. S. Nemirovsky, A. B. Tsybakov, A. B. Juditsky

2020, Vol 4 (3), pp. 548-553
Author(s): Yue Yu, Behcet Acikmese
