Second-Order Methods for Neural Networks

Author(s):  
Adrian J. Shepherd


Algorithms ◽  
2021 ◽  
Vol 15 (1) ◽  
pp. 6
Author(s):  
S. Indrapriyadarsini ◽  
Shahrzad Mahboubi ◽  
Hiroshi Ninomiya ◽  
Takeshi Kamio ◽  
Hideki Asai

Gradient-based methods are popularly used in training neural networks and can be broadly categorized into first-order and second-order methods. Second-order methods have been shown to converge better than first-order methods, especially on highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training. Recent methods have been shown to speed up the convergence of the BFGS method using Nesterov's accelerated gradient and momentum terms. The SR1 quasi-Newton method, though less commonly used in training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. This paper therefore investigates accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov's gradient for training neural networks, and briefly discusses its convergence. The performance of the proposed method is evaluated on function approximation and image classification problems.
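
The core ingredients are an SR1 secant update of the inverse Hessian approximation and gradients evaluated at a Nesterov-style look-ahead point. The NumPy sketch below illustrates that combination on a toy quadratic; the step size, momentum coefficient, skip rule, and the use of a plain scaled step instead of the paper's trust-region subproblem are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def sr1_nesterov_minimize(grad, w0, lr=0.1, mu=0.9, iters=200, skip_tol=1e-8):
    """Minimize a smooth function given its gradient, using an SR1
    inverse-Hessian approximation driven by Nesterov look-ahead gradients.
    Illustrative sketch only (no trust region, fixed step size)."""
    w = w0.astype(float).copy()
    v = np.zeros_like(w)                 # momentum / velocity term
    H = np.eye(w.size)                   # inverse Hessian approximation
    g_prev = grad(w + mu * v)            # gradient at the look-ahead point
    for _ in range(iters):
        d = -H @ g_prev                  # quasi-Newton search direction
        v = mu * v + lr * d              # Nesterov-style momentum update
        w_new = w + v
        g_new = grad(w_new + mu * v)     # next look-ahead gradient
        # SR1 update of the inverse Hessian from the secant pair (s, y)
        s = w_new - w
        y = g_new - g_prev
        u = s - H @ y
        denom = u @ y
        # Standard SR1 safeguard: skip the update when the denominator is tiny
        if abs(denom) > skip_tol * np.linalg.norm(u) * np.linalg.norm(y):
            H = H + np.outer(u, u) / denom
        w, g_prev = w_new, g_new
    return w

# Toy usage: an ill-scaled quadratic bowl as a stand-in for a training loss
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w
w_star = sr1_nesterov_minimize(grad, np.array([3.0, -2.0]))
print(w_star)  # should approach the minimizer at the origin
```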


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1533
Author(s):  
Jingcheng Zhou ◽  
Wei Wei ◽  
Ruizhi Zhang ◽  
Zhiming Zheng

First-order methods such as stochastic gradient descent (SGD) have recently become popular for training deep neural networks (DNNs) with good generalization; however, they require long training times. Second-order methods, which can lower the training time, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to reduce this cost, although the approximate Hessian can deviate substantially from the true one. In this paper, we exploit the convexity of the loss with respect to part of the parameters and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods that estimate the Hessian matrix of all parameters, our methods compute exact second-order information only for a small subset of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets verify the effectiveness of our methods for regression and classification problems.
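
To illustrate the idea of mixing a damped Newton step on a "partial" set of parameters with SGD on the rest, the sketch below trains a tiny two-layer regression network under MSE: the hidden-layer weights W1 are updated by plain SGD, while the output-layer weights W2, in which the MSE loss is convex once the hidden activations are fixed, receive an exact damped Newton step. Treating the output layer as the Newton-updated subset, and all hyperparameters, are assumptions made for illustration rather than the paper's exact DN-SGD/SGD-DN procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: x -> tanh(x W1) -> a W2, trained with MSE
X = rng.normal(size=(256, 4))
y = np.sin(X @ rng.normal(size=4))

W1 = rng.normal(scale=0.5, size=(4, 16))   # hidden layer: plain SGD
W2 = rng.normal(scale=0.1, size=16)        # output layer: damped Newton
lr, damping, batch = 0.05, 1e-2, 32

for _ in range(50):                        # epochs
    for i in range(0, len(X), batch):      # mini-batches
        xb, yb = X[i:i + batch], y[i:i + batch]
        a = np.tanh(xb @ W1)               # hidden activations
        err = a @ W2 - yb                  # MSE residual
        n = len(xb)

        # First-order (SGD) update of the hidden-layer weights
        grad_a = np.outer(err, W2) / n             # dLoss/da
        grad_W1 = xb.T @ (grad_a * (1.0 - a ** 2)) # backprop through tanh
        W1 -= lr * grad_W1

        # Damped Newton update of the output-layer weights:
        # exact Hessian of the batch MSE in W2, plus damping for stability
        H = a.T @ a / n + damping * np.eye(W2.size)
        grad_W2 = a.T @ err / n
        W2 -= np.linalg.solve(H, grad_W2)

print("final MSE:", np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))
```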


Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1159
Author(s):  
Shyam Sundar Santra ◽  
Omar Bazighifan ◽  
Mihai Postolache

Neutral differential equations appear in models of many problems and phenomena in continuous applications such as electrodynamics, neural networks, quantum mechanics, electromagnetism, time symmetry, and fluid dynamics, so it is of interest to study the qualitative behavior of their solutions. In this study, we obtain new sufficient conditions for the oscillation of solutions of second-order delay differential equations with sub-linear neutral terms. The results obtained improve and complement relevant results in the literature. Finally, an example is given to validate the main results, and an open problem is included.
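
For concreteness, equations in this line of work are typically written in a form like the one below, with a sub-linear exponent on the neutral term; this is a representative form only, and the paper's exact hypotheses on the coefficients and delays may differ.

```latex
% Representative second-order delay differential equation with a sub-linear
% neutral term (illustrative form; the paper's exact assumptions may differ).
\begin{equation*}
  \bigl( r(t)\, z'(t) \bigr)' + q(t)\, x^{\beta}\!\bigl(\sigma(t)\bigr) = 0,
  \qquad
  z(t) = x(t) + p(t)\, x^{\alpha}\!\bigl(\tau(t)\bigr),
  \quad 0 < \alpha < 1,
\end{equation*}
% where \tau(t) \le t and \sigma(t) \le t are the delay arguments and the
% exponent \alpha < 1 is what makes the neutral term sub-linear.
```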


2015 ◽  
Vol 164 ◽  
pp. 252-261 ◽  
Author(s):  
Wu Yang ◽  
Yan-Wu Wang ◽  
Zhi-Gang Zeng ◽  
Ding-Fu Zheng

2019 ◽  
Vol 49 (1) ◽  
pp. 14-26 ◽  
Author(s):  
Honggui Han ◽  
Lu Zhang ◽  
Xiaolong Wu ◽  
Junfei Qiao
