The Strength of Nesterov's Extrapolation 2019

10.36227/techrxiv.11653218.v1

2020

Author(s):

Qing Tao

Keyword(s):

Machine Learning, Large Scale, Convergence Rates, Optimization Problems, Gradient Methods, Learning Problems, Smooth Convex, Simple Modification, Convex Problems, Hinge Loss

The extrapolation strategy introduced by Nesterov can accelerate the convergence rate of gradient descent methods by orders of magnitude on smooth convex objectives, and it has led to tremendous success in training machine learning models. In this paper, we theoretically study its strength in the convergence of individual iterates of general non-smooth convex optimization problems, which we call individual convergence. We prove that Nesterov's extrapolation can make the individual convergence of projected gradient methods optimal for general convex problems, which is currently a challenging problem in the machine learning community. In light of this, a simple modification of the gradient operation suffices to achieve optimal individual convergence for strongly convex problems. This can be regarded as an interesting step towards the open question about SGD posed by Shamir (2012). Furthermore, the derived algorithms are extended to solve regularized non-smooth learning problems in stochastic settings. They can serve as an alternative to basic SGD, especially for machine learning problems where an individual output is needed to preserve the regularization structure while keeping an optimal rate of convergence. In particular, our method is an efficient tool for solving large-scale ℓ1-regularized hinge-loss learning problems. Several real-world experiments demonstrate that the derived algorithms not only achieve optimal individual convergence rates but also guarantee better sparsity than the averaged solution.
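
The Python sketch below makes the idea of an individual (last-iterate) output concrete. It is a stochastic proximal subgradient method with Nesterov-style extrapolation for an ℓ1-regularized hinge loss, projected onto an ℓ2 ball. It is not the paper's algorithm. The following are illustrative assumptions:

- the momentum schedule (t - 1)/(t + 2);
- the step size step0/sqrt(t);
- the ball radius;
- every function name.

The sketch only shows where the extrapolation enters, and why returning the last iterate rather than an average keeps the exact zeros produced by soft-thresholding.

```python
import math
import random

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]

def project_l2_ball(v, radius):
    """Euclidean projection onto {w : ||w||_2 <= radius}."""
    norm = math.sqrt(sum(x * x for x in v))
    return v if norm <= radius else [x * radius / norm for x in v]

def hinge_subgradient(w, x, y):
    """A subgradient of max(0, 1 - y * <w, x>) with respect to w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return [-y * xi for xi in x] if margin < 1.0 else [0.0] * len(x)

def extrapolated_prox_sgd(data, lam, radius=10.0, n_iters=2000, step0=0.5, seed=0):
    """Hypothetical helper: returns the last iterate, not an average."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w_prev, w = [0.0] * dim, [0.0] * dim
    for t in range(1, n_iters + 1):
        beta = (t - 1) / (t + 2)        # ASSUMPTION: illustrative momentum schedule
        alpha = step0 / math.sqrt(t)    # ASSUMPTION: illustrative step-size schedule
        # Nesterov-style extrapolation along the previous step.
        v = [wi + beta * (wi - pi) for wi, pi in zip(w, w_prev)]
        x, y = data[rng.randrange(len(data))]  # sample one training example
        g = hinge_subgradient(v, x, y)
        w_prev = w
        step = [vi - alpha * gi for vi, gi in zip(v, g)]
        # Soft-thresholding creates exact zeros; the projection keeps iterates bounded.
        w = project_l2_ball(soft_threshold(step, alpha * lam), radius)
    return w

data = [([1.0, 0.0, 0.3], 1), ([0.9, 0.1, -0.2], 1),
        ([-1.0, 0.0, 0.1], -1), ([-0.8, -0.1, 0.0], -1)]
print(extrapolated_prox_sgd(data, lam=0.05))
```

Averaging the iterates would typically fill in coordinates that individual iterates set exactly to zero. This is the sparsity contrast with the averaged solution that the abstract describes.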

Stochastic sub-sampled Newton method with variance reduction

International Journal of Wavelets Multiresolution and Information Processing

10.1142/s0219691319500413

2019

Vol 17 (06)

pp. 1950041

Author(s):

Zhijian Luo, Yuntao Qian

Keyword(s):

Machine Learning, Newton Method, Large Scale, Variance Reduction, Linear Time, Computational Cost, Gradient Methods, Nonlinear Problems, Learning Problems, Reduced Gradient

Stochastic optimization for large-scale machine learning problems has developed dramatically since stochastic gradient methods with variance reduction were introduced. Several stochastic second-order methods, which approximate curvature information through the Hessian in the stochastic setting, have been proposed as improvements. In this paper, we introduce a Stochastic Sub-Sampled Newton method with Variance Reduction (S2NMVR), which combines the sub-sampled Newton method with the stochastic variance-reduced gradient. For many machine learning problems, Hessian-vector products can be computed in linear time, which underlies the computational efficiency of S2NMVR. We then develop two variants of S2NMVR that preserve the estimate of the inverse Hessian and reduce the computational cost of Hessian-vector products for nonlinear problems.
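
The sketch below shows the two ingredients the abstract combines. It rests on these assumptions:

- the model is ℓ2-regularized logistic regression;
- the gradient is an SVRG-style variance-reduced estimate;
- the Newton system on a Hessian sub-sample is solved approximately by conjugate gradient, using only Hessian-vector products.

The batch sizes, the damping factor `lr`, and all function names are hypothetical choices, and this is not the authors' S2NMVR. The `hess_vec` routine shows why a Hessian-vector product costs time linear in the data. It needs two matrix-vector products with the sampled rows and never forms the d × d Hessian.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w, X, y, mu):
    """Gradient of mean logistic loss (labels in {-1, +1}) plus (mu/2)||w||^2."""
    z = y * (X @ w)
    return -(X.T @ (y * sigmoid(-z))) / len(y) + mu * w

def hess_vec(w, X, y, mu, v):
    """Hessian-vector product in O(nnz(X)) time, without forming the Hessian."""
    s = sigmoid(y * (X @ w))
    return X.T @ (s * (1.0 - s) * (X @ v)) / len(y) + mu * v

def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    """Approximately solve A x = b for symmetric positive definite A given as a matvec."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        a = rs / (p @ Ap)
        x += a * p
        r -= a * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def s2n_vr_sketch(X, y, mu=1e-2, epochs=5, inner=20, batch=32,
                  hess_batch=64, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    w = np.zeros(dim)
    for _ in range(epochs):
        w_snap = w.copy()
        full_g = grad(w_snap, X, y, mu)  # full gradient at the snapshot
        for _ in range(inner):
            S = rng.choice(n, size=batch, replace=False)
            # SVRG-style variance-reduced gradient estimate.
            g = grad(w, X[S], y[S], mu) - grad(w_snap, X[S], y[S], mu) + full_g
            T = rng.choice(n, size=hess_batch, replace=False)
            # Sub-sampled Newton direction: solve H_T p = g using only H_T v products.
            p = conjugate_gradient(lambda v: hess_vec(w, X[T], y[T], mu, v), g)
            w = w - lr * p  # ASSUMPTION: fixed damping instead of a line search
    return w
```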

Entropy-Penalized Semidefinite Programming

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2019/157

2019

Cited By ~ 2

Author(s):

Mikhail Krechetov, Jakub Marecek, Yury Maximov, Martin Takac

Keyword(s):

Machine Learning, Time Complexity, Optimization Problems, Linear Time, Broad Class, Low Rank, Learning Problems, Unified Framework, Gradient Computation, Machine Learning Applications

Low-rank methods for semidefinite programming (SDP) have recently attracted considerable interest, especially in machine learning applications. Their analysis often involves determinant-based or Schatten-norm penalties, which are difficult to implement in practice because of their high computational cost. In this paper, we propose Entropy-Penalized Semi-Definite Programming (EP-SDP), which provides a unified framework for a broad class of penalty functions used in practice to promote a low-rank solution. We show that EP-SDP problems admit an efficient numerical algorithm whose gradient computation has (almost) linear time complexity, which makes it useful for many machine learning and optimization problems. We illustrate the practical efficiency of our approach on several combinatorial optimization and machine learning problems.
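
The sketch below shows why an entropy penalty promotes low rank, and why it can be cheap in a low-rank factorization X = V Vᵀ with V of size n × k. We assume the penalty is the von Neumann entropy of the trace-normalized matrix; the paper's exact penalty family may differ, and the function name is hypothetical. The entropy is 0 exactly when X has rank one and is at most log k otherwise. Because the nonzero eigenvalues of V Vᵀ coincide with those of the k × k Gram matrix Vᵀ V, the entropy can be evaluated in O(n k²) time, which is linear in n.

```python
import numpy as np

def von_neumann_entropy_lowrank(V, eps=1e-12):
    """Entropy -sum(l * log l) of X = V V^T / trace(V V^T), computed from V^T V.

    ASSUMPTION: the penalty of interest is the von Neumann entropy.
    """
    G = V.T @ V                      # k x k Gram matrix: O(n k^2), linear in n
    lam = np.linalg.eigvalsh(G)      # nonzero eigenvalues of V V^T equal those of V^T V
    lam = np.clip(lam, 0.0, None)    # remove tiny negative round-off
    lam = lam / lam.sum()            # trace normalization
    lam = lam[lam > eps]             # 0 * log 0 is taken as 0
    return float(-(lam * np.log(lam)).sum())

n = 50
print(von_neumann_entropy_lowrank(np.eye(n)[:, :3]))  # rank 3 with equal eigenvalues: log 3
```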

Sufficient descent conjugate gradient methods for large-scale optimization problems

International Journal of Computer Mathematics

10.1080/00207160.2011.592938

2011

Vol 88 (16)

pp. 3436-3447

Cited By ~ 2

Author(s):

Xiuyun Zheng, Hongwei Liu, Aiguo Lu

Keyword(s):

Conjugate Gradient, Large Scale, Optimization Problems, Gradient Methods, Conjugate Gradient Methods, Large Scale Optimization, Sufficient Descent, Scale Optimization

Randomized sketch descent methods for non-separable linearly constrained optimization

IMA Journal of Numerical Analysis

10.1093/imanum/draa018

2020

Author(s):

Ion Necoara, Martin Takáč

Keyword(s):

Objective Function, Large Scale, Convergence Rates, Optimization Problems, Sufficient Conditions, Linear Constraints, Descent Methods, Constrained Problems, Special Cases, Linearly Constrained

In this paper we consider large-scale smooth optimization problems with multiple linear coupled constraints. Because the constraints are non-separable, arbitrary random sketching is not guaranteed to work. We therefore first investigate necessary and sufficient conditions on the sketch sampling under which the algorithms are well defined. Based on these sampling conditions, we develop new sketch descent methods for solving general smooth linearly constrained problems, in particular random sketch descent (RSD) and accelerated random sketch descent (A-RSD). To our knowledge, this is the first convergence analysis of RSD algorithms for optimization problems with multiple non-separable linear constraints. In the general case, where the objective function is smooth and non-convex, we prove a sublinear rate in expectation for the non-accelerated variant with respect to an appropriate optimality measure. In the smooth convex case, we derive sublinear convergence rates in the expected objective value for both algorithms, the non-accelerated RSD and A-RSD. Additionally, if the objective function satisfies a strong-convexity-type condition, both algorithms converge linearly in expectation. In special cases where complexity bounds are known for particular sketching algorithms, such as coordinate descent methods for optimization problems with a single linear coupled constraint, our theory recovers the best known bounds. Finally, we present several numerical examples to illustrate the performance of our new algorithms.
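
The single-coupled-constraint special case mentioned above can be illustrated with randomized two-coordinate descent. Moving along e_i − e_j keeps ∑ x_i = b satisfied exactly, which makes it a simple instance of a feasibility-preserving sketch. This minimal sketch assumes a separable quadratic objective, so that the exact step along each pair has a closed form. It is not the paper's RSD or A-RSD method, which handle general sketches and multiple constraints, and the function name is hypothetical.

```python
import random

def pairwise_coordinate_descent(a, c, b, n_iters=5000, seed=0):
    """Minimize f(x) = 0.5 * sum_i a_i (x_i - c_i)^2 subject to sum_i x_i = b.

    ASSUMPTION: separable quadratic objective with a_i > 0, chosen for a closed-form step.
    """
    rng = random.Random(seed)
    n = len(a)
    x = [b / n] * n                      # feasible starting point
    for _ in range(n_iters):
        i, j = rng.sample(range(n), 2)   # random pair defines the direction e_i - e_j
        gi = a[i] * (x[i] - c[i])
        gj = a[j] * (x[j] - c[j])
        t = -(gi - gj) / (a[i] + a[j])   # exact minimizer of f along e_i - e_j
        x[i] += t
        x[j] -= t                        # the sum of x is unchanged
    return x
```

For this objective, the optimum has the closed form x_i* = c_i + μ/a_i with μ = (b − ∑ c_i) / ∑ (1/a_i), which gives an easy correctness check.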

Predicting solutions of large-scale optimization problems via machine learning: A case study in blood supply chain management

Computers & Operations Research

10.1016/j.cor.2020.104941

2020

Vol 119

pp. 104941

Cited By ~ 4

Author(s):

Babak Abbasi, Toktam Babaei, Zahra Hosseinifard, Kate Smith-Miles, Maryam Dehghani

Keyword(s):

Machine Learning, Supply Chain, Supply Chain Management, Blood Supply, Large Scale, Optimization Problems, Large Scale Optimization, Chain Management, Scale Optimization

Asynchronous Delay-Aware Accelerated Proximal Coordinate Descent for Nonconvex Nonsmooth Problems

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v33i01.33011528

2019

Vol 33

pp. 1528-1535

Author(s):

Ehsan Kazemi, Liqiang Wang

Keyword(s):

Large Scale, Convergence Rates, Optimization Problems, Coordinate Descent, Performance Guarantee, Nonsmooth Problems, Descent Property, Large Scale Problems, Nonconvex And Nonsmooth Optimization, Sufficient Descent

Nonconvex and nonsmooth problems have recently attracted considerable attention in machine learning. However, developing efficient methods with performance guarantees for nonconvex and nonsmooth optimization problems remains a challenge. Proximal coordinate descent (PCD) has been widely used for solving optimization problems, but knowledge of PCD methods in the nonconvex setting is very limited. Asynchronous proximal coordinate descent (APCD) has recently received much attention for solving large-scale problems, yet accelerated variants of APCD algorithms are rarely studied. In this paper, we extend APCD to an accelerated algorithm (AAPCD) for nonsmooth and nonconvex problems that satisfies the sufficient descent property. It does so by comparing the function values at the proximal update and at a linearly extrapolated point computed with a delay-aware momentum value. To the best of our knowledge, we are the first to provide stochastic and deterministic accelerated extensions of APCD algorithms for general nonconvex and nonsmooth problems that ensure every limit point is a critical point, for both bounded and unbounded delays. By leveraging the Kurdyka-Łojasiewicz property, we show linear and sublinear convergence rates for the deterministic AAPCD with bounded delays. Numerical results demonstrate the practical speed of our algorithm.
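
One reading of the acceptance rule described above can be sketched in a synchronous setting. The rule computes a plain proximal coordinate update and a proximal coordinate update taken from a linearly extrapolated point, then keeps whichever has the smaller objective. This is a simplification under stated assumptions:

- there are no delays or asynchrony;
- the momentum β is fixed rather than delay-aware;
- the example objective is the convex lasso 0.5‖Ax − b‖² + λ‖x‖₁ rather than a nonconvex one.

All names are hypothetical. The plain update alone never increases the objective, so keeping the better candidate yields a monotone sequence.

```python
import random

def soft(v, tau):
    """Scalar soft-thresholding: the proximal map of tau * |.|."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def residual(A, b, z):
    return [sum(aij * zj for aij, zj in zip(row, z)) - bi for row, bi in zip(A, b)]

def objective(A, b, lam, z):
    """F(z) = 0.5 * ||A z - b||^2 + lam * ||z||_1."""
    r = residual(A, b, z)
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(zj) for zj in z)

def coord_prox_step(A, b, lam, z, i, L_i):
    """Proximal gradient step on coordinate i of z with step size 1 / L_i."""
    r = residual(A, b, z)
    g_i = sum(row[i] * ri for row, ri in zip(A, r))
    out = list(z)
    out[i] = soft(z[i] - g_i / L_i, lam / L_i)
    return out

def monotone_extrapolated_pcd(A, b, lam, beta=0.5, n_iters=300, seed=0):
    rng = random.Random(seed)
    n = len(A[0])
    # Coordinate-wise Lipschitz constants ||A[:, i]||^2 (assumed nonzero).
    L = [sum(row[i] ** 2 for row in A) for i in range(n)]
    x_prev, x = [0.0] * n, [0.0] * n
    history = [objective(A, b, lam, x)]
    for _ in range(n_iters):
        i = rng.randrange(n)
        # ASSUMPTION: fixed momentum beta instead of a delay-aware value.
        y = [xj + beta * (xj - pj) for xj, pj in zip(x, x_prev)]
        u = coord_prox_step(A, b, lam, y, i, L[i])  # update from the extrapolated point
        v = coord_prox_step(A, b, lam, x, i, L[i])  # plain proximal update
        x_prev = x
        # Keep the candidate with the smaller objective; F(v) <= F(x) always holds.
        x = u if objective(A, b, lam, u) <= objective(A, b, lam, v) else v
        history.append(objective(A, b, lam, x))
    return x, history
```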

A New Homotopy Proximal Variable-Metric Framework for Composite Convex Minimization

Mathematics of Operations Research

10.1287/moor.2021.1138

2021

Author(s):

Quoc Tran-Dinh, Ling Liang, Kim-Chuan Toh

Keyword(s):

Convergence Rates, Optimization Problems, Linear Convergence, Convex Minimization, Variable Metric, Complexity Bounds, Convex Problems, Convex Minimization Problems, Global Iteration, Primal Dual

This paper proposes two novel ideas for developing new proximal variable-metric methods for a class of composite convex optimization problems. The first idea is a new parameterization strategy for the optimality condition, used to design a class of homotopy proximal variable-metric algorithms that can achieve linear convergence and finite global iteration-complexity bounds. We identify at least three subclasses of convex problems to which our approach applies with linear convergence rates. The second idea is a new primal-dual-primal framework for implementing proximal Newton methods, which has attractive computational features for a subclass of nonsmooth composite convex minimization problems. We specialize the proposed algorithm to a covariance estimation problem to demonstrate its computational advantages. Numerical experiments on four concrete applications illustrate the theoretical and computational advances of the new methods compared with other state-of-the-art algorithms.
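
A proximal variable-metric step replaces the scalar step size of proximal gradient with a matrix H:

x⁺ = argmin_z ⟨∇f(x), z − x⟩ + ½‖z − x‖²_H + g(z).

The sketch below uses a fixed diagonal H on a lasso problem, so the step reduces to coordinate-wise soft-thresholding. It does not implement the paper's homotopy parameterization or its primal-dual-primal proximal Newton scheme. The choice H = diag(∑_j |Q_ij|) with Q = AᵀA is an assumption. It is made because it dominates Q, which turns each step into a majorize-minimize step with monotone decrease. Function names are hypothetical.

```python
def soft(v, tau):
    """Scalar soft-thresholding: the proximal map of tau * |.|."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def objective(A, b, lam, x):
    """F(x) = 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(xj) for xj in x)

def diag_metric_prox_grad(A, b, lam, n_iters=200):
    n = len(A[0])
    Q = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]  # A^T A
    q = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]             # A^T b
    # ASSUMPTION: H = diag(sum_j |Q_ij|); H - Q is diagonally dominant, hence H >= Q,
    # so each step minimizes a quadratic upper bound of the smooth part.
    H = [sum(abs(Qij) for Qij in Q[i]) for i in range(n)]
    x = [0.0] * n
    history = [objective(A, b, lam, x)]
    for _ in range(n_iters):
        g = [sum(Q[i][j] * x[j] for j in range(n)) - q[i] for i in range(n)]  # grad of smooth part
        # Scaled proximal step; with diagonal H it separates across coordinates.
        x = [soft(x[i] - g[i] / H[i], lam / H[i]) for i in range(n)]
        history.append(objective(A, b, lam, x))
    return x, history
```

A full (non-diagonal) H, such as a proximal Newton metric, would no longer separate across coordinates. That is where an inner solver, or a scheme like the paper's primal-dual-primal framework, becomes necessary.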

A Conjugate Gradient Type Method for the Nonnegative Constraints Optimization Problems

Journal of Applied Mathematics

10.1155/2013/986317

2013

Vol 2013

pp. 1-6

Author(s):

Can Li

Keyword(s):

Conjugate Gradient, Large Scale, Optimization Problems, Gradient Methods, Type Method, Unconstrained Optimization Problems, Feasible Direction Method, Gradient Type, Large Scale Unconstrained Optimization, Nonnegative Constraints

We consider optimization problems with nonnegativity constraints. It is well known that conjugate gradient methods are efficient for solving large-scale unconstrained optimization problems because of their simplicity and low storage requirements. Combining the modified Polak-Ribière-Polyak method proposed by Zhang, Zhou, and Li with the Zoutendijk feasible direction method, we propose a conjugate gradient type method for solving nonnegatively constrained optimization problems. If the current iterate is feasible, the direction generated by the proposed method is always a feasible descent direction at that iterate. Under appropriate conditions, we show that the proposed method is globally convergent. We also present numerical results that show the efficiency of the proposed method.
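
As we recall it, the modified Polak-Ribière-Polyak direction of Zhang, Zhou, and Li is

d_k = −g_k + β_k d_{k−1} − θ_k y_{k−1},

where y_{k−1} = g_k − g_{k−1}, β_k = g_kᵀ y_{k−1} / ‖g_{k−1}‖², and θ_k = g_kᵀ d_{k−1} / ‖g_{k−1}‖². Substituting gives g_kᵀ d_k = −‖g_k‖², so every direction is a descent direction regardless of the line search. The sketch below checks that identity and uses the direction with simple Armijo backtracking on an unconstrained problem. It does not reproduce the paper's Zoutendijk-based handling of the nonnegativity constraints. The function names and the Armijo constant are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mprp_direction(g, g_prev, d_prev):
    """Three-term modified PRP direction; satisfies dot(g, d) == -dot(g, g)."""
    y = [a - b for a, b in zip(g, g_prev)]
    denom = dot(g_prev, g_prev)
    beta = dot(g, y) / denom          # PRP coefficient
    theta = dot(g, d_prev) / denom    # correction term that enforces sufficient descent
    return [-gi + beta * di - theta * yi for gi, di, yi in zip(g, d_prev, y)]

def mprp_minimize(f, grad_f, x0, n_iters=100, c1=1e-4, tol=1e-10):
    """Unconstrained MPRP iteration with Armijo backtracking (illustration only)."""
    x = list(x0)
    g = grad_f(x)
    d = [-gi for gi in g]
    for _ in range(n_iters):
        if dot(g, g) < tol ** 2:
            break
        fx, slope, t = f(x), dot(g, d), 1.0   # slope = -||g||^2 < 0
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + c1 * t * slope:
            t *= 0.5                          # backtrack until the Armijo condition holds
        x = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad_f(x)
        d = mprp_direction(g_new, g, d)
        g = g_new
    return x
```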

Distributed Gradient Methods for Convex Machine Learning Problems in Networks: Distributed Optimization

IEEE Signal Processing Magazine

10.1109/msp.2020.2975210

2020

Vol 37 (3)

pp. 92-101

Cited By ~ 3

Author(s):

Angelia Nedic

Keyword(s):

Machine Learning, Distributed Optimization, Gradient Methods, Learning Problems
