Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity

2017 ◽ Vol 32 (4) ◽ pp. 963-992
Author(s): Anthony Man-Cho So, Zirui Zhou

2020
Author(s): Qing Tao

The extrapolation strategy introduced by Nesterov, which accelerates the convergence rate of gradient descent methods by orders of magnitude for smooth convex objectives, has led to tremendous success in training machine learning models. In this paper, we theoretically study its strength for the convergence of the individual iterates of general non-smooth convex optimization problems, which we call individual convergence. We prove that Nesterov's extrapolation makes the individual convergence of projected gradient methods optimal for general convex problems, which has been a challenging problem in the machine learning community. Building on this, a simple modification of the gradient operation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as a step towards the open question about SGD posed by Shamir (2012). Furthermore, the derived algorithms are extended to solve regularized non-smooth learning problems in stochastic settings. They can serve as an alternative to the most basic SGD, especially for machine learning problems where an individual output is needed to preserve the regularization structure while keeping an optimal rate of convergence. In particular, our method serves as an efficient tool for solving large-scale $l_1$-regularized hinge-loss learning problems. Experiments on real data demonstrate that the derived algorithms not only achieve optimal individual convergence rates but also guarantee better sparsity than the averaged solution.
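No pseudocode accompanies the abstract above; as a rough illustration of the idea, the sketch below combines a projected subgradient step with Nesterov-style extrapolation and returns the individual (last) iterate rather than an average. The momentum schedule, the 1/sqrt(t) step sizes, the l2-ball feasible set, and the hinge-plus-l1 toy objective are all assumptions made here for illustration, not the paper's exact construction.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto an l2 ball (a stand-in for a generic feasible set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def nesterov_projected_subgradient(subgrad, x0, steps=1000, radius=1.0):
    """Projected subgradient method with Nesterov-style extrapolation.

    subgrad(x) returns any subgradient of the (possibly non-smooth) convex
    objective at x. The momentum weights and 1/sqrt(t) step sizes below are
    illustrative choices, not the parameters analyzed in the paper.
    """
    x_prev = x = np.asarray(x0, dtype=float)
    for t in range(1, steps + 1):
        beta = (t - 1) / (t + 2)            # Nesterov-type extrapolation weight
        y = x + beta * (x - x_prev)         # extrapolated (look-ahead) point
        eta = 1.0 / np.sqrt(t)              # diminishing step size for the non-smooth case
        x_prev, x = x, project_l2_ball(y - eta * subgrad(y), radius)
    return x                                # individual (last) iterate, not an average

# Toy example: f(x) = max(0, 1 - a.x) + 0.1 * ||x||_1 (hinge loss with l1 regularization)
a = np.array([1.0, -2.0, 0.5])
subgrad = lambda x: (-a if 1 - a @ x > 0 else np.zeros_like(a)) + 0.1 * np.sign(x)
x_last = nesterov_projected_subgradient(subgrad, np.zeros(3))
```

Returning the last iterate rather than an iterate average is the point emphasized in the abstract: averaging tends to wash out sparsity, whereas an individual output keeps the structure induced by the $l_1$ term.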


2018 ◽ Vol 34 (3) ◽ pp. 449-457
Author(s): Huijuan Wang, Hong-Kun Xu
We improve a recent accelerated proximal gradient (APG) method for nonconvex optimization [Li, Q., Zhou, Y., Liang, Y. and Varshney, P. K., Convergence analysis of proximal gradient with momentum for nonconvex optimization, in Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017] by allowing variable stepsizes. We prove convergence of the APG method for a composite nonconvex optimization problem under the assumption that the composite objective function satisfies the Kurdyka-Łojasiewicz property.
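For orientation only, here is a minimal sketch of a proximal gradient step with momentum and a caller-supplied (variable) stepsize schedule, using an $l_1$ term as the non-smooth part of the composite objective. The momentum weight, the least-squares toy problem, and the 1/L stepsize are illustrative assumptions; the sketch does not reproduce the monitored APG scheme of Li et al. or the specific stepsize conditions analyzed in the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (the non-smooth part of the composite objective)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def apg_variable_steps(grad_f, x0, step_fn, lam=0.1, steps=500):
    """Proximal gradient with momentum; step_fn(k) supplies the (variable) stepsize at iteration k.

    grad_f : gradient of the smooth (possibly nonconvex) part f
    lam    : weight of the l1 term playing the role of the non-smooth part g
    """
    x_prev = x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        beta = (k - 1) / (k + 2)                 # momentum weight (illustrative choice)
        y = x + beta * (x - x_prev)              # extrapolated point
        alpha = step_fn(k)                       # variable stepsize
        x_prev, x = x, soft_threshold(y - alpha * grad_f(y), alpha * lam)
    return x

# Toy example: f(x) = 0.5 * ||A x - b||^2, g(x) = lam * ||x||_1
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of grad f
x_hat = apg_variable_steps(lambda x: A.T @ (A @ x - b), np.zeros(5), step_fn=lambda k: 1.0 / L)
```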


2021 ◽ pp. 1-26
Author(s): Richard C. Gerum, Achim Schilling

Up to now, modern machine learning (ML) has been based on approximating big data sets with high-dimensional functions, taking advantage of huge computational resources. We show that biologically inspired neuron models such as the leaky integrate-and-fire (LIF) neuron provide novel and efficient ways of information processing. They can be integrated into machine learning models and are a potential target for improving ML performance. To this end, we derive simple update rules for LIF units that numerically integrate the underlying differential equations. We apply a surrogate gradient approach to train the LIF units via backpropagation. We demonstrate that tuning the leak term of the LIF neurons allows them to be run in different operating modes, such as simple signal integrators or coincidence detectors. Furthermore, we show that a constant surrogate gradient, in combination with tuning the leak term of the LIF units, can be used to reproduce the learning dynamics of more complex surrogate gradients. To validate our method, we apply it to established image data sets (the Oxford 102 Flower data set and MNIST), implement various network architectures, use several input data encodings, and demonstrate that the method achieves state-of-the-art classification performance. We provide our method, together with further surrogate gradient methods for training spiking neural networks via backpropagation, as an open-source Keras package to make it available to the neuroscience and machine learning communities. To increase the interpretability of the underlying effects, and thus take a small step toward opening the black box of machine learning, we provide interactive illustrations that make it possible to systematically monitor the effects of parameter changes on the learning characteristics.
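The authors release their update rules and surrogate gradients as an open-source Keras package; independently of that package, the following is a minimal TensorFlow sketch of a discrete-time LIF update trained through a constant ("straight-through") surrogate gradient. The leak factor, threshold, reset rule, and input currents are assumptions chosen here for illustration.

```python
import tensorflow as tf

@tf.custom_gradient
def spike(v):
    """Heaviside spike nonlinearity with a constant surrogate gradient."""
    out = tf.cast(v > 0.0, tf.float32)
    def grad(dy):
        # Constant surrogate: pass the upstream gradient straight through
        # instead of the true derivative, which is zero almost everywhere.
        return dy
    return out, grad

def lif_step(v, x, leak=0.9, v_th=1.0):
    """One discrete-time LIF update: leak, integrate the input, spike, reset."""
    v = leak * v + x             # leaky integration of the input current
    s = spike(v - v_th)          # emit a spike where the threshold is crossed
    v = v * (1.0 - s)            # reset the membrane potential of units that spiked
    return v, s

# Toy run: a small batch of LIF units over a few time steps
v = tf.zeros((4, 8))                             # membrane potentials (batch, units)
for t in range(10):
    x = tf.random.uniform((4, 8), 0.0, 0.3)      # illustrative input currents
    v, s = lif_step(v, x, leak=0.9)
```

A leak factor close to 1 lets the units accumulate input over many time steps (signal integrators), while a small leak factor forces near-simultaneous inputs to drive spiking (coincidence detectors), matching the two operating modes described above.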

