Second Order Methods
Recently Published Documents

TOTAL DOCUMENTS: 84 (last five years: 14)
H-INDEX: 17 (last five years: 2)

Algorithms, 2021, Vol. 15(1), pp. 6
Author(s): S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Takeshi Kamio, Hideki Asai

Gradient-based methods are widely used for training neural networks and can be broadly categorized into first- and second-order methods. Second-order methods have been shown to converge better than first-order methods, especially on highly nonlinear problems. The BFGS quasi-Newton method is the most commonly studied second-order method for neural network training, and recent work has accelerated its convergence using Nesterov's accelerated gradient and momentum terms. The SR1 quasi-Newton method, though less commonly used for training neural networks, is known to have interesting properties and to provide good Hessian approximations when used with a trust-region approach. This paper therefore investigates accelerating the Symmetric Rank-1 (SR1) quasi-Newton method with Nesterov's gradient for training neural networks and briefly discusses its convergence. The performance of the proposed method is evaluated on a function approximation problem and an image classification problem.
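
A minimal sketch of how such a Nesterov-accelerated SR1 update could look: a lookahead (Nesterov) step combined with the standard SR1 inverse-Hessian update and its usual safeguard. The fixed step size, momentum constant, and choice of curvature pair below are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def sr1_nesterov(grad, x0, alpha=0.1, mu=0.8, iters=100, r=1e-8):
    """Illustrative SR1 quasi-Newton iteration with a Nesterov-style lookahead."""
    x = x0.astype(float)
    v = np.zeros_like(x)
    H = np.eye(x.size)                      # inverse-Hessian approximation
    for _ in range(iters):
        xl = x + mu * v                     # Nesterov lookahead point
        g = grad(xl)
        v = mu * v - alpha * H @ g          # accelerated quasi-Newton step
        x_new = x + v
        s = x_new - xl                      # curvature pair measured from the lookahead
        y = grad(x_new) - g
        u = s - H @ y
        denom = u @ y                       # SR1 safeguard: skip near-singular updates
        if abs(denom) > r * np.linalg.norm(u) * np.linalg.norm(y):
            H = H + np.outer(u, u) / denom
        x = x_new
    return x
```

For example, `sr1_nesterov(lambda x: 2 * x, np.ones(5))` drives a simple quadratic toward its minimizer; in a practical trainer a line search or trust region would replace the fixed alpha.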


2021, Vol. 191(1), pp. 1-30
Author(s): Yurii Nesterov

Abstract: In this paper, we present new second-order methods with convergence rate $O(k^{-4})$, where $k$ is the iteration counter. This is faster than the existing lower bound for this type of scheme (Agarwal and Hazan in Proceedings of the 31st Conference on Learning Theory, PMLR, pp. 774–792, 2018; Arjevani and Shiff in Math Program 178(1–2):327–360, 2019), which is $O(k^{-7/2})$. Our progress can be explained by a finer specification of the problem class. The main idea of the approach is to implement the third-order scheme from Nesterov (Math Program 186:157–183, 2021) using a second-order oracle. At each iteration of our method, we solve a nontrivial auxiliary problem by a linearly convergent scheme based on the relative non-degeneracy condition (Bauschke et al. in Math Oper Res 42:330–348, 2016; Lu et al. in SIOPT 28(1):333–354, 2018). During this process, the Hessian of the objective function is computed once, and the gradient is computed $O(\ln\frac{1}{\epsilon})$ times, where $\epsilon$ is the desired accuracy of the solution to our problem.
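
Taken at face value, the stated rates admit a simple oracle-call accounting; this back-of-envelope count is inferred from the abstract (ignoring constants and assuming the inner and outer accuracy targets coincide), not quoted from the paper:

$$
f(x_k) - f^* \le \frac{C}{k^{4}} \le \epsilon
\quad\Longrightarrow\quad
k = O\!\left(\epsilon^{-1/4}\right),
$$

so reaching accuracy $\epsilon$ takes $O\left(\epsilon^{-1/4}\right)$ Hessian evaluations (one per iteration) and $O\left(\epsilon^{-1/4}\ln\frac{1}{\epsilon}\right)$ gradient evaluations in total.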


Mathematics, 2021, Vol. 9(13), pp. 1533
Author(s): Jingcheng Zhou, Wei Wei, Ruizhi Zhang, Zhiming Zheng

First-order methods such as stochastic gradient descent (SGD) have recently become popular for training deep neural networks (DNNs) and generalize well, but they require long training times. Second-order methods, which can shorten training, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to cut the computational cost, but the approximate Hessian can deviate substantially from the true one. In this paper, we exploit the convexity of the loss with respect to a subset of the parameters and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods that estimate the Hessian matrix of all parameters, our methods compute it exactly only for a small part of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets verify the effectiveness of our methods for regression and classification problems.
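
A minimal sketch of the general idea described above: a plain SGD update on most parameters, and a damped Newton update on a small block of parameters whose Hessian is cheap to form and (for MSE or cross-entropy in the output layer) positive semi-definite. The parameter partition, learning rate, and damping value are assumptions for illustration, not the paper's exact DN-SGD/SGD-DN recipes.

```python
import numpy as np

def dn_sgd_step(w_rest, g_rest, w_last, g_last, H_last, lr=0.01, damping=1e-2):
    """One hybrid step: first-order update on w_rest, damped Newton on w_last."""
    w_rest = w_rest - lr * g_rest                        # SGD part
    newton_dir = np.linalg.solve(H_last + damping * np.eye(H_last.shape[0]),
                                 g_last)                 # (H + damping*I)^{-1} g
    w_last = w_last - newton_dir                         # damped Newton part
    return w_rest, w_last
```

Here `H_last` would be the exact Hessian of the loss with respect to the chosen block (e.g., output-layer weights), which is inexpensive to compute for MSE or cross-entropy losses.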


2020, Vol. 28(3), pp. 175-192
Author(s): William Layton, Michael McLaughlin

Abstract: This report presents adaptive artificial compression methods in which the time step and the artificial compression parameter ε are adapted independently. The resulting algorithms are supported by analysis and numerical tests. The first- and second-order methods are embedded. As a result, the computational, cognitive, and space complexities of the adaptive ε, k algorithms are only negligibly greater than those of the simplest first-order, constant-ε, constant-k artificial compression method.
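
As a generic illustration of such a controller (not the report's formulas): the time step k can be adapted from the embedded first/second-order error estimate, while ε gets its own estimator and its own adjustment rule, so the two parameters move independently. The estimators, exponents, and limiter constants below are standard textbook choices used only as a sketch.

```python
def adapt_k_eps(err_time, err_eps, k, eps, tol_time, tol_eps,
                order=1, safety=0.9, grow=2.0, shrink=0.5):
    """Adapt the time step and the artificial-compression parameter independently."""
    # Time step: classical embedded-pair controller for a method of the given order.
    k_new = k * min(grow, max(shrink,
                              safety * (tol_time / err_time) ** (1.0 / (order + 1))))
    # Compression parameter: simple proportional controller on its own estimator.
    eps_new = eps * min(grow, max(shrink, safety * (tol_eps / err_eps)))
    return k_new, eps_new
```

A step would typically be rejected and retried with the smaller k (and/or ε) whenever its error estimate exceeds the corresponding tolerance.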


2020, Vol. 65(2), pp. 846-853
Author(s): Natasa Krklec Jerinkic, Dusan Jakovetic, Natasa Krejic, Dragana Bajovic
