Proximal Gradient Methods for Machine Learning and Imaging

Author(s):  
Saverio Salzo ◽  
Silvia Villa
2020 ◽  
Author(s):  
Qing Tao

The extrapolation strategy introduced by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude for smooth convex objectives, has led to tremendous success in training machine learning models. In this paper, we theoretically study its strength for the convergence of the individual iterates of general non-smooth convex optimization problems, which we call individual convergence. We prove that Nesterov's extrapolation makes the individual convergence of projected gradient methods optimal for general convex problems, which has been a challenging problem in the machine learning community. In light of this, a simple modification of the gradient operation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as a step towards the open question about SGD posed by Shamir (2012). Furthermore, the derived algorithms are extended to solve regularized non-smooth learning problems in stochastic settings. They can serve as an alternative to the most basic SGD, especially for machine learning problems where an individual output is needed to preserve the regularization structure while keeping an optimal rate of convergence. In particular, our method is an efficient tool for solving large-scale l1-regularized hinge-loss learning problems. Experiments on real data sets demonstrate that the derived algorithms not only achieve optimal individual convergence rates but also yield better sparsity than the averaged solution.
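As a concrete illustration of the kind of method discussed above, the sketch below combines a stochastic subgradient step for the hinge loss with a Nesterov-style extrapolation and a soft-thresholding (proximal) step for the l1 regularizer. It is a minimal sketch, not the paper's exact algorithm; the extrapolation weight, step-size schedule, and data are illustrative assumptions.

import numpy as np

def soft_threshold(v, tau):
    # proximal operator of tau * ||.||_1 (keeps the iterate sparse)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def hinge_subgradient(w, x, y):
    # subgradient of max(0, 1 - y * <w, x>) with respect to w
    return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(w)

def extrapolated_prox_sgd(X, y, lam=0.01, T=1000):
    n, d = X.shape
    w = np.zeros(d)                       # current iterate (the "individual" output)
    w_prev = np.zeros(d)
    for t in range(1, T + 1):
        beta = (t - 1.0) / (t + 2.0)      # Nesterov-style extrapolation weight
        v = w + beta * (w - w_prev)       # extrapolated point
        i = np.random.randint(n)          # stochastic sample
        g = hinge_subgradient(v, X[i], y[i])
        eta = 1.0 / np.sqrt(t)            # illustrative step-size schedule
        w_prev, w = w, soft_threshold(v - eta * g, eta * lam)
    return w                              # last (individual) iterate

# usage with random data (labels in {-1, +1}):
# w = extrapolated_prox_sgd(np.random.randn(200, 50), np.sign(np.random.randn(200)))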


2021 ◽  
pp. 1-26
Author(s):  
Richard C. Gerum ◽  
Achim Schilling

Up to now, modern machine learning (ML) has been based on approximating big data sets with high-dimensional functions, taking advantage of huge computational resources. We show that biologically inspired neuron models such as the leaky integrate-and-fire (LIF) neuron provide novel and efficient ways of information processing. They can be integrated into machine learning models and are a potential target for improving ML performance. To this end, we derive simple update rules for LIF units that numerically integrate their differential equations, and we apply a surrogate gradient approach to train the LIF units via backpropagation. We demonstrate that tuning the leak term of the LIF neurons can be used to run the neurons in different operating modes, such as simple signal integrators or coincidence detectors. Furthermore, we show that a constant surrogate gradient, combined with tuning the leak term of the LIF units, can reproduce the learning dynamics of more complex surrogate gradients. To validate our method, we applied it to established image data sets (the Oxford 102 flower data set, MNIST), implemented various network architectures, used several input data encodings, and demonstrated that the method achieves state-of-the-art classification performance. We provide our method, as well as further surrogate gradient methods for training spiking neural networks via backpropagation, as an open-source Keras package to make it available to the neuroscience and machine learning community. To increase the interpretability of the underlying effects, and thus take a small step toward opening the black box of machine learning, we provide interactive illustrations with the possibility of systematically monitoring the effects of parameter changes on the learning characteristics.
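The following sketch illustrates the general idea of a discrete-time LIF update trained with a constant surrogate gradient; the constants, reset rule, and function names are assumptions and may differ from the authors' Keras package.

import tensorflow as tf

@tf.custom_gradient
def spike(v_minus_threshold):
    # forward pass: Heaviside step (spike if the membrane potential exceeds threshold)
    out = tf.cast(v_minus_threshold > 0.0, tf.float32)
    def grad(dy):
        # backward pass: constant surrogate gradient instead of the true zero/undefined one
        return dy * 1.0
    return out, grad

def lif_step(v, x, leak=0.9, threshold=1.0):
    # one discrete-time update of a leaky integrate-and-fire unit:
    # the membrane potential decays by the leak factor, integrates the input,
    # emits a spike when it crosses the threshold, and is reset afterwards
    v = leak * v + x
    s = spike(v - threshold)
    v = v * (1.0 - s)
    return v, s

# usage: unroll lif_step over the time dimension of an input sequence,
# e.g. inside a custom Keras layer, and train with standard backpropagation.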


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kapil Manoharan ◽  
Mohd. Tahir Anwar ◽  
Shantanu Bhattacharya

Low-energy surface coatings have found a wide range of applications in generating hydrophobic and superhydrophobic surfaces. Most studies, however, have considered a single coating material on a single substrate or a single application technique. The degree of hydrophobicity depends strongly on the fabrication process as well as on the coated material, which warrants a systematic study, based on experimental optimization, of the parametric behavior of coatings and their application techniques. A single platform that can predict the set of parameters required to generate a hydrophobic surface of the desired character on a given substrate is also needed. This work applies machine learning algorithms (Levenberg–Marquardt, combining Gauss–Newton and gradient methods) to evaluate the various processes affecting the anti-wetting behavior of coated printable paper substrates, with the capability of predicting the most suitable coating method and materials for a desired surface contact angle. The application techniques considered in this study are dip coating, spray coating, spin coating, and inkjet printing, with silane- and sol–gel-based coating materials.
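For context, the sketch below shows the basic Levenberg–Marquardt update, which interpolates between a Gauss–Newton step (small damping) and a gradient-descent step (large damping). The contact-angle model and data here are purely hypothetical placeholders, not the study's actual model.

import numpy as np

def levenberg_marquardt(residual, jacobian, theta, n_iter=50, lam=1e-2):
    # damped least squares: small lam behaves like Gauss-Newton, large lam like gradient descent
    for _ in range(n_iter):
        r = residual(theta)
        J = jacobian(theta)
        A = J.T @ J + lam * np.eye(theta.size)
        step = np.linalg.solve(A, J.T @ r)
        theta_new = theta - step
        if np.sum(residual(theta_new) ** 2) < np.sum(r ** 2):
            theta, lam = theta_new, lam * 0.7   # accept: shift toward Gauss-Newton
        else:
            lam *= 2.0                          # reject: shift toward gradient descent
    return theta

# hypothetical linear contact-angle model: angle ~ X @ theta,
# where the columns of X would encode coating/process parameters
X = np.random.rand(100, 4)
y = X @ np.array([30.0, 20.0, 10.0, 5.0]) + 0.5 * np.random.randn(100)
theta_fit = levenberg_marquardt(lambda t: X @ t - y, lambda t: X, np.zeros(4))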


Author(s):  
Ki-Young Kwon ◽  
◽  
Keun-Woo Jung ◽  
Dong-Su Yang ◽  
Jooyoung Park ◽  
...  

Recently, reinforcement learning and evolution strategies have become major tools in the field of machine learning and have shown excellent performance on various engineering problems. In particular, the Natural Actor-Critic (NAC) approach and Natural Evolution Strategies (NES) have generated considerable interest in natural-gradient-based machine learning methods, with many successful applications. In this paper, we apply the NAC and the NES to path-tracking control problems for autonomous vehicles. Simulation results show that these methods can yield better performance than conventional PID controllers.
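As an illustration of the search-gradient idea behind NES, the sketch below implements a simplified evolution-strategies update on the mean of a Gaussian search distribution with fixed variance (in this special case the vanilla search gradient coincides, up to scaling, with the natural gradient). The quadratic cost stands in for the actual path-tracking error, and all parameters are assumptions.

import numpy as np

def nes_sketch(cost, mu, sigma=0.1, pop=50, lr=0.05, n_iter=200):
    d = mu.size
    for _ in range(n_iter):
        eps = np.random.randn(pop, d)                             # sampled search directions
        costs = np.array([cost(mu + sigma * e) for e in eps])
        shaped = (costs - costs.mean()) / (costs.std() + 1e-8)    # fitness shaping
        grad = eps.T @ shaped / (pop * sigma)                     # search-gradient estimate
        mu = mu - lr * grad                                       # descend on the expected cost
    return mu

# hypothetical quadratic "tracking error" with optimal controller gains [1.0, 0.5, 0.1]
cost = lambda k: float(np.sum((k - np.array([1.0, 0.5, 0.1])) ** 2))
gains = nes_sketch(cost, mu=np.zeros(3))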


Author(s):  
Zhijian Luo ◽  
Yuntao Qian

Stochastic optimization for large-scale machine learning problems has developed rapidly since stochastic gradient methods with variance reduction techniques were introduced. Several stochastic second-order methods, which approximate curvature information through the Hessian in the stochastic setting, have been proposed as further improvements. In this paper, we introduce a Stochastic Sub-Sampled Newton method with Variance Reduction (S2NMVR), which combines the sub-sampled Newton method with the stochastic variance-reduced gradient. For many machine learning problems, the linear-time Hessian-vector product underpins the computational efficiency of S2NMVR. We then develop two variations of S2NMVR that maintain an estimate of the Hessian inverse and reduce the computational cost of the Hessian-vector product for nonlinear problems.
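The sketch below conveys the flavor of such a method: an SVRG-style variance-reduced gradient combined with a Newton-type step whose Hessian is estimated from a small sub-sample, here for l2-regularized logistic regression. It is only a sketch under assumptions; the paper's actual S2NMVR algorithm (sampling scheme, Hessian-vector products, step sizes) differs in detail.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def s2nmvr_sketch(X, y, lam=1e-2, eta=0.5, epochs=10, inner=50, hess_batch=20):
    # X: (n, d) features, y: (n,) labels in {0, 1}
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        w_ref = w.copy()                                            # snapshot iterate
        full_grad = X.T @ (sigmoid(X @ w_ref) - y) / n + lam * w_ref
        for _ in range(inner):
            i = np.random.randint(n)
            # SVRG-style variance-reduced stochastic gradient
            g = (sigmoid(X[i] @ w) - y[i]) * X[i] + lam * w \
                - ((sigmoid(X[i] @ w_ref) - y[i]) * X[i] + lam * w_ref) \
                + full_grad
            # sub-sampled Hessian of the regularized logistic loss
            S = np.random.choice(n, hess_batch, replace=False)
            p = sigmoid(X[S] @ w)
            H = (X[S].T * (p * (1.0 - p))) @ X[S] / hess_batch + lam * np.eye(d)
            w = w - eta * np.linalg.solve(H, g)                     # Newton-type step
    return w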


2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.

