Accelerating Symmetric Rank-1 Quasi-Newton Method with Nesterov’s Gradient for Training Neural Networks

Algorithms, 2021, Vol 15 (1), pp. 6
Author(s): S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Takeshi Kamio, Hideki Asai

Gradient-based methods are popularly used in training neural networks and can be broadly categorized into first- and second-order methods. Second-order methods have been shown to have better convergence than first-order methods, especially in solving highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training. Recent methods have been shown to speed up the convergence of the BFGS method using Nesterov’s accelerated gradient and momentum terms. The SR1 quasi-Newton method, though less commonly used in training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. Thus, this paper aims to investigate accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov’s gradient for training neural networks, and to briefly discuss its convergence. The performance of the proposed method is evaluated on function approximation and image classification problems.
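As a rough illustration of the idea, the sketch below combines an SR1 update of the inverse-Hessian approximation with gradients evaluated at a Nesterov look-ahead point. This is not the authors' exact algorithm; the momentum coefficient `mu`, the learning rate, the skip tolerance, and the toy quadratic objective are all illustrative choices.

```python
# Minimal sketch (not the paper's exact method): SR1 quasi-Newton steps driven by
# gradients at a Nesterov look-ahead point, with the standard SR1 skip safeguard.
import numpy as np

def nesterov_sr1(grad, w0, lr=0.1, mu=0.8, iters=200, tol=1e-8):
    n = len(w0)
    w, v = w0.copy(), np.zeros(n)      # parameters and momentum term
    H = np.eye(n)                      # inverse-Hessian approximation
    g = grad(w + mu * v)               # gradient at the look-ahead point
    for _ in range(iters):
        v = mu * v - lr * H @ g        # step along the look-ahead quasi-Newton direction
        w_new = w + v
        g_new = grad(w_new + mu * v)   # next look-ahead gradient
        s, y = w_new - w, g_new - g
        r = s - H @ y                  # SR1 residual
        denom = r @ y
        # Skip the rank-1 update when the denominator is too small (standard SR1 safeguard).
        if abs(denom) > tol * np.linalg.norm(r) * np.linalg.norm(y):
            H += np.outer(r, r) / denom
        w, g = w_new, g_new
    return w

# Toy usage: quadratic objective f(w) = 0.5 * w' A w - b' w, whose minimizer is A^{-1} b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
w_opt = nesterov_sr1(lambda w: A @ w - b, np.zeros(2))
print(w_opt, np.linalg.solve(A, b))    # the two should be close
```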


Author(s): Po Ting Lin, Wei-Hao Lu, Shu-Ping Lin

In the past few years, researchers have begun to investigate the existence of arbitrary uncertainties in design optimization problems. Most traditional reliability-based design optimization (RBDO) methods transform the design space to the standard normal space for reliability analysis, but they may not work well when the random variables are arbitrarily distributed, because the transformation to the standard normal space cannot be determined or the distribution type is unknown. The methods of Ensemble of Gaussian-based Reliability Analyses (EoGRA) and Ensemble of Gradient-based Transformed Reliability Analyses (EGTRA) have been developed to estimate the joint probability density function using an ensemble of kernel functions. EoGRA performs a series of Gaussian-based kernel reliability analyses and merges them to compute the reliability of the design point. EGTRA transforms the design space to a single-variate design space along the constraint gradient, where the kernel reliability analyses become much less costly. In this paper, a series of comprehensive investigations was performed to study the similarities and differences between EoGRA and EGTRA. The results showed that EGTRA performs accurate and effective reliability analyses for both linear and nonlinear problems. When the constraints are highly nonlinear, EGTRA may encounter minor difficulties but can still be effective when starting from deterministic optimal points. On the other hand, the sensitivity analyses of EoGRA may be ineffective when the random distribution lies completely inside the feasible or infeasible space. However, EoGRA can find acceptable design points when starting from deterministic optimal points. Moreover, EoGRA is capable of delivering the estimated failure probability of each constraint during the optimization process, which may be convenient for some applications.
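As a loose illustration of the kernel-density idea underlying these methods (not the EoGRA or EGTRA algorithms themselves), the sketch below estimates a failure probability for arbitrarily distributed variables by fitting a Gaussian kernel density estimate to samples and drawing from it; the limit-state function g and the sample distributions are assumptions made purely for illustration.

```python
# Hedged illustration: failure-probability estimate for non-normal variables via a
# Gaussian kernel density estimate (an ensemble of Gaussian kernels, one per sample).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Arbitrarily distributed design samples (skewed, non-normal) -- illustrative only.
samples = np.column_stack([rng.gamma(2.0, 1.0, 5000), rng.lognormal(0.0, 0.3, 5000)])

kde = gaussian_kde(samples.T)               # estimate the joint PDF from the samples
resampled = kde.resample(100000, seed=1).T  # draw from the estimated joint PDF

def g(x):                                   # illustrative limit state: failure when g(x) > 0
    return x[:, 0] + x[:, 1] - 6.0

p_fail = np.mean(g(resampled) > 0.0)
print(f"Estimated failure probability: {p_fail:.4f}")
```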


2003, Vol 03 (04), pp. 443-460
Author(s): S. L. CHAN, A. Y. T. CHU, F. G. ALBERMANI

A robust computer procedure for the reliable design of scaffolding systems is proposed. The design of scaffolding is not detailed in design codes and is considered by many researchers and engineers to be intractable. The proposed method is based on the classical stability function, which performs excellently in highly nonlinear problems. The method is employed to predict the ultimate design load capacities of four tested 3-storey steel scaffolding units, and for the design of a 30 m × 20 m × 1.3 m 3-dimensional scaffolding system. As the approach is based on a rigorous second-order analysis allowing for the P-δ and P-Δ effects and for notional disturbance forces, no assumption of effective length is required. It is superior to the conventional second-order analysis, which plots only the bending moment diagram with allowance for the P-Δ effect, since it considers both the P-Δ and P-δ effects so that a section capacity check is adequate for both strength and stability checking. The proposed method can be applied to large-deflection and stability analysis and the design of practical scaffolding systems, in place of the conventional and unreliable effective length method, which carries the disadvantage of an uncertain assumption of the effective length factor (Le/L).


Geophysics, 2003, Vol 68 (4), pp. 1310-1319
Author(s): Antoine Guitton, William W. Symes

The “Huber function” (or “Huber norm”) is one of several robust error measures; it interpolates between smooth (ℓ2) treatment of small residuals and robust (ℓ1) treatment of large residuals. Since the Huber function is differentiable, it may be minimized reliably with a standard gradient-based optimizer. We propose to minimize the Huber function with a quasi-Newton method that has the potential of being faster and more robust than conjugate-gradient methods when solving nonlinear problems. Tests with a linear inverse problem for velocity analysis, with both synthetic and field data, suggest that the Huber function gives far more robust model estimates than does a least-squares fit with or without damping.
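To make the robustness argument concrete, here is a minimal sketch (not the authors' velocity-analysis code) that fits a linear model by minimizing the Huber misfit with a quasi-Newton optimizer (SciPy's L-BFGS-B); the threshold delta separating the ℓ2 and ℓ1 regimes and the toy data with outliers are illustrative.

```python
# Hedged sketch: robust linear fit by quasi-Newton minimization of the Huber misfit.
import numpy as np
from scipy.optimize import minimize

def huber_objective(m, G, d, delta=1.0):
    r = G @ m - d
    quad = np.abs(r) <= delta
    # Huber: 0.5 r^2 for small residuals, delta*(|r| - 0.5*delta) for large ones.
    loss = np.where(quad, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))
    grad = G.T @ np.where(quad, r, delta * np.sign(r))   # differentiable everywhere
    return loss.sum(), grad

rng = np.random.default_rng(0)
G = rng.normal(size=(200, 5))
m_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
d = G @ m_true + 0.1 * rng.normal(size=200)
d[:10] += 20.0                                 # a few large outliers in the data

res = minimize(huber_objective, np.zeros(5), args=(G, d), jac=True, method="L-BFGS-B")
print(res.x)                                    # close to m_true despite the outliers
```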


Mathematics, 2021, Vol 9 (13), pp. 1533
Author(s): Jingcheng Zhou, Wei Wei, Ruizhi Zhang, Zhiming Zheng

First-order methods such as stochastic gradient descent (SGD) have recently become popular optimization methods for training deep neural networks (DNNs) with good generalization; however, they need a long training time. Second-order methods, which can lower the training time, are scarcely used because of the prohibitive computational cost of obtaining the second-order information. Thus, many works approximate the Hessian matrix to cut the computational cost, although the approximate Hessian matrix can deviate substantially from the true one. In this paper, we explore the convexity of the Hessian matrix of partial parameters and propose the damped Newton stochastic gradient descent (DN-SGD) method and the stochastic gradient descent damped Newton (SGD-DN) method to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods that estimate the Hessian matrix of all parameters, our methods compute the Hessian accurately only for a small part of the parameters, which greatly reduces the computational cost and makes convergence of the learning process much faster and more accurate than SGD and Adagrad. Several numerical experiments on real datasets were performed to verify the effectiveness of our methods for regression and classification problems.
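A hedged sketch of the underlying idea (not the authors' exact DN-SGD algorithm): for a regression network with MSE loss, the Hessian with respect to the output-layer weights alone is cheap to form and positive semi-definite, so those weights can take a damped Newton step while the remaining parameters follow plain SGD. The network sizes, damping term `lam`, and learning rate below are illustrative.

```python
# Hedged sketch: damped Newton update for the output layer, SGD for the hidden layer.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 10))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(512, 1))

W1 = rng.normal(size=(10, 32)) * 0.1   # hidden layer (updated with SGD)
w2 = np.zeros((32, 1))                 # output layer (updated with damped Newton)
lr, lam = 0.05, 1e-2

for epoch in range(200):
    idx = rng.choice(512, 64, replace=False)          # mini-batch
    Xb, yb = X[idx], y[idx]
    H = np.tanh(Xb @ W1)                              # hidden activations
    err = H @ w2 - yb                                 # batch residual
    m = len(idx)
    # Damped Newton step for the output layer: Hessian of MSE w.r.t. w2 is H'H / m.
    Hess = H.T @ H / m + lam * np.eye(32)
    grad_w2 = H.T @ err / m
    w2 -= np.linalg.solve(Hess, grad_w2)
    # Plain SGD step for the hidden layer.
    grad_H = (err @ w2.T) * (1 - H**2) / m            # backprop through tanh
    W1 -= lr * Xb.T @ grad_H

print("final MSE:", float(np.mean((np.tanh(X @ W1) @ w2 - y) ** 2)))
```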

