Second-Order Methods for Neural Networks

Author(s):  
Adrian J. Shepherd


Algorithms ◽  
2021 ◽  
Vol 15 (1) ◽  
pp. 6
Author(s):  
S. Indrapriyadarsini ◽  
Shahrzad Mahboubi ◽  
Hiroshi Ninomiya ◽  
Takeshi Kamio ◽  
Hideki Asai

Gradient-based methods are popularly used in training neural networks and can be broadly categorized into first-order and second-order methods. Second-order methods have been shown to converge better than first-order methods, especially on highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training. Recent methods have been shown to speed up the convergence of the BFGS method using Nesterov's accelerated gradient and momentum terms. The SR1 quasi-Newton method, though less commonly used in training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. This paper therefore investigates accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov's gradient for training neural networks, and briefly discusses its convergence. The performance of the proposed method is evaluated on function approximation and image classification problems.
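
The core ingredients are an SR1 secant update of the inverse Hessian approximation and gradients evaluated at a Nesterov-style look-ahead point. The NumPy sketch below illustrates that combination on a toy quadratic; the step size, momentum coefficient, skip rule, and the use of a plain scaled step instead of the paper's trust-region subproblem are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def sr1_nesterov_minimize(grad, w0, lr=0.1, mu=0.9, iters=200, skip_tol=1e-8):
    """Minimize a smooth function given its gradient, using an SR1
    inverse-Hessian approximation driven by Nesterov look-ahead gradients.
    Illustrative sketch only (no trust region, fixed step size)."""
    w = w0.astype(float).copy()
    v = np.zeros_like(w)                 # momentum / velocity term
    H = np.eye(w.size)                   # inverse Hessian approximation
    g_prev = grad(w + mu * v)            # gradient at the look-ahead point
    for _ in range(iters):
        d = -H @ g_prev                  # quasi-Newton search direction
        v = mu * v + lr * d              # Nesterov-style momentum update
        w_new = w + v
        g_new = grad(w_new + mu * v)     # next look-ahead gradient
        # SR1 update of the inverse Hessian from the secant pair (s, y)
        s = w_new - w
        y = g_new - g_prev
        u = s - H @ y
        denom = u @ y
        # Standard SR1 safeguard: skip the update when the denominator is tiny
        if abs(denom) > skip_tol * np.linalg.norm(u) * np.linalg.norm(y):
            H = H + np.outer(u, u) / denom
        w, g_prev = w_new, g_new
    return w

# Toy usage: an ill-scaled quadratic bowl as a stand-in for a training loss
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w
w_star = sr1_nesterov_minimize(grad, np.array([3.0, -2.0]))
print(w_star)  # should approach the minimizer at the origin
```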


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1533
Author(s):  
Jingcheng Zhou ◽  
Wei Wei ◽  
Ruizhi Zhang ◽  
Zhiming Zheng

First-order methods such as stochastic gradient descent (SGD) have recently become popular for training deep neural networks (DNNs) with good generalization; however, they require long training times. Second-order methods, which can lower the training time, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to reduce this cost, although the approximate Hessian can deviate substantially from the true one. In this paper, we exploit the convexity of the loss with respect to part of the parameters and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods that estimate the Hessian matrix of all parameters, our methods compute exact second-order information only for a small subset of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets verify the effectiveness of our methods for regression and classification problems.
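
To illustrate the idea of mixing a damped Newton step on a "partial" set of parameters with SGD on the rest, the sketch below trains a tiny two-layer regression network under MSE: the hidden-layer weights W1 are updated by plain SGD, while the output-layer weights W2, in which the MSE loss is convex once the hidden activations are fixed, receive an exact damped Newton step. Treating the output layer as the Newton-updated subset, and all hyperparameters, are assumptions made for illustration rather than the paper's exact DN-SGD/SGD-DN procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: x -> tanh(x W1) -> a W2, trained with MSE
X = rng.normal(size=(256, 4))
y = np.sin(X @ rng.normal(size=4))

W1 = rng.normal(scale=0.5, size=(4, 16))   # hidden layer: plain SGD
W2 = rng.normal(scale=0.1, size=16)        # output layer: damped Newton
lr, damping, batch = 0.05, 1e-2, 32

for _ in range(50):                        # epochs
    for i in range(0, len(X), batch):      # mini-batches
        xb, yb = X[i:i + batch], y[i:i + batch]
        a = np.tanh(xb @ W1)               # hidden activations
        err = a @ W2 - yb                  # MSE residual
        n = len(xb)

        # First-order (SGD) update of the hidden-layer weights
        grad_a = np.outer(err, W2) / n             # dLoss/da
        grad_W1 = xb.T @ (grad_a * (1.0 - a ** 2)) # backprop through tanh
        W1 -= lr * grad_W1

        # Damped Newton update of the output-layer weights:
        # exact Hessian of the batch MSE in W2, plus damping for stability
        H = a.T @ a / n + damping * np.eye(W2.size)
        grad_W2 = a.T @ err / n
        W2 -= np.linalg.solve(H, grad_W2)

print("final MSE:", np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))
```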


Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1159
Author(s):  
Shyam Sundar Santra ◽  
Omar Bazighifan ◽  
Mihai Postolache

Neutral differential equations appear in models of many problems and phenomena in continuous applications such as electrodynamics, neural networks, quantum mechanics, electromagnetism, time symmetry, and fluid dynamics, so it is of interest to study the qualitative behavior of their solutions. In this study, we obtain new sufficient conditions for the oscillation of solutions of second-order delay differential equations with sub-linear neutral terms. The results obtained improve and complement relevant results in the literature. Finally, an example is given to validate the main results, and an open problem is included.
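
For concreteness, equations in this line of work are typically written in a form like the one below, with a sub-linear exponent on the neutral term; this is a representative form only, and the paper's exact hypotheses on the coefficients and delays may differ.

```latex
% Representative second-order delay differential equation with a sub-linear
% neutral term (illustrative form; the paper's exact assumptions may differ).
\begin{equation*}
  \bigl( r(t)\, z'(t) \bigr)' + q(t)\, x^{\beta}\!\bigl(\sigma(t)\bigr) = 0,
  \qquad
  z(t) = x(t) + p(t)\, x^{\alpha}\!\bigl(\tau(t)\bigr),
  \quad 0 < \alpha < 1,
\end{equation*}
% where \tau(t) \le t and \sigma(t) \le t are the delay arguments and the
% exponent \alpha < 1 is what makes the neutral term sub-linear.
```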


2015 ◽  
Vol 164 ◽  
pp. 252-261 ◽  
Author(s):  
Wu Yang ◽  
Yan-Wu Wang ◽  
Zhi-Gang Zeng ◽  
Ding-Fu Zheng

2019 ◽  
Vol 49 (1) ◽  
pp. 14-26 ◽  
Author(s):  
Honggui Han ◽  
Lu Zhang ◽  
Xiaolong Wu ◽  
Junfei Qiao
