Second Order Methods
Recently Published Documents

TOTAL DOCUMENTS: 84 (last five years: 14)
H-INDEX: 17 (last five years: 2)

Algorithms, 2021, Vol. 15(1), pp. 6
Author(s): S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Takeshi Kamio, Hideki Asai

Gradient-based methods are widely used for training neural networks and can be broadly categorized into first- and second-order methods. Second-order methods have been shown to converge better than first-order methods, especially on highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training, and recent work has accelerated its convergence using Nesterov's accelerated gradient and momentum terms. The SR1 quasi-Newton method, though less commonly used for training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. This paper therefore investigates accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov's gradient for training neural networks and briefly discusses its convergence. The performance of the proposed method is evaluated on a function approximation problem and an image classification problem.
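
A minimal sketch of how such a Nesterov-accelerated SR1 update could look: a lookahead (Nesterov) step combined with the standard SR1 inverse-Hessian update and its usual safeguard. The fixed step size, momentum constant, and choice of curvature pair below are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def sr1_nesterov(grad, x0, alpha=0.1, mu=0.8, iters=100, r=1e-8):
    """Illustrative SR1 quasi-Newton iteration with a Nesterov-style lookahead."""
    x = x0.astype(float)
    v = np.zeros_like(x)
    H = np.eye(x.size)                      # inverse-Hessian approximation
    for _ in range(iters):
        xl = x + mu * v                     # Nesterov lookahead point
        g = grad(xl)
        v = mu * v - alpha * H @ g          # accelerated quasi-Newton step
        x_new = x + v
        s = x_new - xl                      # curvature pair measured from the lookahead
        y = grad(x_new) - g
        u = s - H @ y
        denom = u @ y                       # SR1 safeguard: skip near-singular updates
        if abs(denom) > r * np.linalg.norm(u) * np.linalg.norm(y):
            H = H + np.outer(u, u) / denom
        x = x_new
    return x
```

For example, `sr1_nesterov(lambda x: 2 * x, np.ones(5))` drives a simple quadratic toward its minimizer; in a practical trainer a line search or trust region would replace the fixed alpha.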


2021, Vol. 191(1), pp. 1-30
Author(s): Yurii Nesterov

Abstract: In this paper, we present new second-order methods with convergence rate $O(k^{-4})$, where $k$ is the iteration counter. This is faster than the existing lower bound for this type of scheme (Agarwal and Hazan in Proceedings of the 31st Conference on Learning Theory, PMLR, pp. 774–792, 2018; Arjevani and Shiff in Math Program 178(1–2):327–360, 2019), which is $O(k^{-7/2})$. Our progress can be explained by a finer specification of the problem class. The main idea of the approach is to implement the third-order scheme from Nesterov (Math Program 186:157–183, 2021) using a second-order oracle. At each iteration of our method, we solve a nontrivial auxiliary problem by a linearly convergent scheme based on the relative non-degeneracy condition (Bauschke et al. in Math Oper Res 42:330–348, 2016; Lu et al. in SIOPT 28(1):333–354, 2018). During this process, the Hessian of the objective function is computed once, and the gradient is computed $O(\ln\frac{1}{\epsilon})$ times, where $\epsilon$ is the desired accuracy of the solution to our problem.
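
Taken at face value, the stated rates admit a simple oracle-call accounting; this back-of-envelope count is inferred from the abstract (ignoring constants and assuming the inner and outer accuracy targets coincide), not quoted from the paper:

$$
f(x_k) - f^* \le \frac{C}{k^{4}} \le \epsilon
\quad\Longrightarrow\quad
k = O\!\left(\epsilon^{-1/4}\right),
$$

so reaching accuracy $\epsilon$ takes $O\left(\epsilon^{-1/4}\right)$ Hessian evaluations (one per iteration) and $O\left(\epsilon^{-1/4}\ln\frac{1}{\epsilon}\right)$ gradient evaluations in total.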


Mathematics, 2021, Vol. 9(13), pp. 1533
Author(s): Jingcheng Zhou, Wei Wei, Ruizhi Zhang, Zhiming Zheng

First-order methods such as stochastic gradient descent (SGD) have recently become popular for training deep neural networks (DNNs) and generalize well, but they require long training times. Second-order methods, which can shorten training, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to cut the computational cost, but the approximate Hessian can deviate substantially from the true one. In this paper, we exploit the convexity of the loss with respect to a subset of the parameters and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods that estimate the Hessian matrix of all parameters, our methods compute it exactly only for a small part of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets verify the effectiveness of our methods for regression and classification problems.
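
A minimal sketch of the general idea described above: a plain SGD update on most parameters, and a damped Newton update on a small block of parameters whose Hessian is cheap to form and (for MSE or cross-entropy in the output layer) positive semi-definite. The parameter partition, learning rate, and damping value are assumptions for illustration, not the paper's exact DN-SGD/SGD-DN recipes.

```python
import numpy as np

def dn_sgd_step(w_rest, g_rest, w_last, g_last, H_last, lr=0.01, damping=1e-2):
    """One hybrid step: first-order update on w_rest, damped Newton on w_last."""
    w_rest = w_rest - lr * g_rest                        # SGD part
    newton_dir = np.linalg.solve(H_last + damping * np.eye(H_last.shape[0]),
                                 g_last)                 # (H + damping*I)^{-1} g
    w_last = w_last - newton_dir                         # damped Newton part
    return w_rest, w_last
```

Here `H_last` would be the exact Hessian of the loss with respect to the chosen block (e.g., output-layer weights), which is inexpensive to compute for MSE or cross-entropy losses.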


2020, Vol. 28(3), pp. 175-192
Author(s): William Layton, Michael McLaughlin

Abstract: This report presents adaptive artificial compression methods in which the time step and the artificial compression parameter ε are adapted independently. The resulting algorithms are supported by analysis and numerical tests. The first- and second-order methods are embedded. As a result, the computational, cognitive, and space complexities of the adaptive ε, k algorithms are only negligibly greater than those of the simplest first-order, constant-ε, constant-k artificial compression method.
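
As a generic illustration of such a controller (not the report's formulas): the time step k can be adapted from the embedded first/second-order error estimate, while ε gets its own estimator and its own adjustment rule, so the two parameters move independently. The estimators, exponents, and limiter constants below are standard textbook choices used only as a sketch.

```python
def adapt_k_eps(err_time, err_eps, k, eps, tol_time, tol_eps,
                order=1, safety=0.9, grow=2.0, shrink=0.5):
    """Adapt the time step and the artificial-compression parameter independently."""
    # Time step: classical embedded-pair controller for a method of the given order.
    k_new = k * min(grow, max(shrink,
                              safety * (tol_time / err_time) ** (1.0 / (order + 1))))
    # Compression parameter: simple proportional controller on its own estimator.
    eps_new = eps * min(grow, max(shrink, safety * (tol_eps / err_eps)))
    return k_new, eps_new
```

A step would typically be rejected and retried with the smaller k (and/or ε) whenever its error estimate exceeds the corresponding tolerance.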


2020, Vol. 65(2), pp. 846-853
Author(s): Natasa Krklec Jerinkic, Dusan Jakovetic, Natasa Krejic, Dragana Bajovic
