Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity

2017 ◽ Vol 32 (4) ◽ pp. 963-992
Author(s): Anthony Man-Cho So, Zirui Zhou

2020
Author(s): Qing Tao

The extrapolation strategy introduced by Nesterov, which accelerates the convergence rate of gradient descent methods by orders of magnitude for smooth convex objectives, has led to tremendous success in training machine learning models. In this paper, we theoretically study its strength for the convergence of the individual iterates of general non-smooth convex optimization problems, which we call individual convergence. We prove that Nesterov's extrapolation makes the individual convergence of projected gradient methods optimal for general convex problems, which has been a challenging problem in the machine learning community. Building on this, a simple modification of the gradient operation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as a step towards the open question about SGD posed by Shamir (2012). Furthermore, the derived algorithms are extended to solve regularized non-smooth learning problems in stochastic settings. They can serve as an alternative to the most basic SGD, especially for machine learning problems where an individual output is needed to preserve the regularization structure while keeping an optimal rate of convergence. In particular, our method serves as an efficient tool for solving large-scale $l_1$-regularized hinge-loss learning problems. Experiments on real data demonstrate that the derived algorithms not only achieve optimal individual convergence rates but also guarantee better sparsity than the averaged solution.
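No pseudocode accompanies the abstract above; as a rough illustration of the idea, the sketch below combines a projected subgradient step with Nesterov-style extrapolation and returns the individual (last) iterate rather than an average. The momentum schedule, the 1/sqrt(t) step sizes, the l2-ball feasible set, and the hinge-plus-l1 toy objective are all assumptions made here for illustration, not the paper's exact construction.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto an l2 ball (a stand-in for a generic feasible set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def nesterov_projected_subgradient(subgrad, x0, steps=1000, radius=1.0):
    """Projected subgradient method with Nesterov-style extrapolation.

    subgrad(x) returns any subgradient of the (possibly non-smooth) convex
    objective at x. The momentum weights and 1/sqrt(t) step sizes below are
    illustrative choices, not the parameters analyzed in the paper.
    """
    x_prev = x = np.asarray(x0, dtype=float)
    for t in range(1, steps + 1):
        beta = (t - 1) / (t + 2)            # Nesterov-type extrapolation weight
        y = x + beta * (x - x_prev)         # extrapolated (look-ahead) point
        eta = 1.0 / np.sqrt(t)              # diminishing step size for the non-smooth case
        x_prev, x = x, project_l2_ball(y - eta * subgrad(y), radius)
    return x                                # individual (last) iterate, not an average

# Toy example: f(x) = max(0, 1 - a.x) + 0.1 * ||x||_1 (hinge loss with l1 regularization)
a = np.array([1.0, -2.0, 0.5])
subgrad = lambda x: (-a if 1 - a @ x > 0 else np.zeros_like(a)) + 0.1 * np.sign(x)
x_last = nesterov_projected_subgradient(subgrad, np.zeros(3))
```

Returning the last iterate rather than an iterate average is the point emphasized in the abstract: averaging tends to wash out sparsity, whereas an individual output keeps the structure induced by the $l_1$ term.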


2018 ◽ Vol 34 (3) ◽ pp. 449-457
Author(s): Huijuan Wang, Hong-Kun Xu
We improve a recent accelerated proximal gradient (APG) method for nonconvex optimization [Li, Q., Zhou, Y., Liang, Y. and Varshney, P. K., Convergence analysis of proximal gradient with momentum for nonconvex optimization, in Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017] by allowing variable stepsizes. We prove convergence of the APG method for a composite nonconvex optimization problem under the assumption that the composite objective function satisfies the Kurdyka-Łojasiewicz property.
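For orientation only, here is a minimal sketch of a proximal gradient step with momentum and a caller-supplied (variable) stepsize schedule, using an $l_1$ term as the non-smooth part of the composite objective. The momentum weight, the least-squares toy problem, and the 1/L stepsize are illustrative assumptions; the sketch does not reproduce the monitored APG scheme of Li et al. or the specific stepsize conditions analyzed in the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (the non-smooth part of the composite objective)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def apg_variable_steps(grad_f, x0, step_fn, lam=0.1, steps=500):
    """Proximal gradient with momentum; step_fn(k) supplies the (variable) stepsize at iteration k.

    grad_f : gradient of the smooth (possibly nonconvex) part f
    lam    : weight of the l1 term playing the role of the non-smooth part g
    """
    x_prev = x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        beta = (k - 1) / (k + 2)                 # momentum weight (illustrative choice)
        y = x + beta * (x - x_prev)              # extrapolated point
        alpha = step_fn(k)                       # variable stepsize
        x_prev, x = x, soft_threshold(y - alpha * grad_f(y), alpha * lam)
    return x

# Toy example: f(x) = 0.5 * ||A x - b||^2, g(x) = lam * ||x||_1
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of grad f
x_hat = apg_variable_steps(lambda x: A.T @ (A @ x - b), np.zeros(5), step_fn=lambda k: 1.0 / L)
```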


2021 ◽ pp. 1-26
Author(s): Richard C. Gerum, Achim Schilling

Up to now, modern machine learning (ML) has been based on approximating big data sets with high-dimensional functions, taking advantage of huge computational resources. We show that biologically inspired neuron models such as the leaky integrate-and-fire (LIF) neuron provide novel and efficient ways of information processing. They can be integrated into machine learning models and are a potential target for improving ML performance. To this end, we derive simple update rules for LIF units that numerically integrate the underlying differential equations. We apply a surrogate gradient approach to train the LIF units via backpropagation. We demonstrate that tuning the leak term of the LIF neurons allows them to be run in different operating modes, such as simple signal integrators or coincidence detectors. Furthermore, we show that a constant surrogate gradient, in combination with tuning the leak term of the LIF units, can be used to reproduce the learning dynamics of more complex surrogate gradients. To validate our method, we apply it to established image data sets (the Oxford 102 Flower data set and MNIST), implement various network architectures, use several input data encodings, and demonstrate that the method achieves state-of-the-art classification performance. We provide our method, together with further surrogate gradient methods for training spiking neural networks via backpropagation, as an open-source Keras package to make it available to the neuroscience and machine learning communities. To increase the interpretability of the underlying effects, and thus take a small step toward opening the black box of machine learning, we provide interactive illustrations that make it possible to systematically monitor the effects of parameter changes on the learning characteristics.
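The authors release their update rules and surrogate gradients as an open-source Keras package; independently of that package, the following is a minimal TensorFlow sketch of a discrete-time LIF update trained through a constant ("straight-through") surrogate gradient. The leak factor, threshold, reset rule, and input currents are assumptions chosen here for illustration.

```python
import tensorflow as tf

@tf.custom_gradient
def spike(v):
    """Heaviside spike nonlinearity with a constant surrogate gradient."""
    out = tf.cast(v > 0.0, tf.float32)
    def grad(dy):
        # Constant surrogate: pass the upstream gradient straight through
        # instead of the true derivative, which is zero almost everywhere.
        return dy
    return out, grad

def lif_step(v, x, leak=0.9, v_th=1.0):
    """One discrete-time LIF update: leak, integrate the input, spike, reset."""
    v = leak * v + x             # leaky integration of the input current
    s = spike(v - v_th)          # emit a spike where the threshold is crossed
    v = v * (1.0 - s)            # reset the membrane potential of units that spiked
    return v, s

# Toy run: a small batch of LIF units over a few time steps
v = tf.zeros((4, 8))                             # membrane potentials (batch, units)
for t in range(10):
    x = tf.random.uniform((4, 8), 0.0, 0.3)      # illustrative input currents
    v, s = lif_step(v, x, leak=0.9)
```

A leak factor close to 1 lets the units accumulate input over many time steps (signal integrators), while a small leak factor forces near-simultaneous inputs to drive spiking (coincidence detectors), matching the two operating modes described above.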

