EA-CG: An Approximate Second-Order Method for Training Fully-Connected Neural Networks

Author(s):  
Sheng-Wei Chen ◽  
Chun-Nan Chou ◽  
Edward Y. Chang

For training fully-connected neural networks (FCNNs), we propose a practical approximate second-order method comprising 1) an approximation of the Hessian matrix and 2) a conjugate gradient (CG) based method. Our proposed approximate Hessian matrix is memory-efficient and can be applied to any FCNN whose activation and criterion functions are twice differentiable. We devise a CG-based method incorporating a rank-one approximation to derive Newton directions for training FCNNs, which significantly reduces both space and time complexity. This CG-based method can be employed to solve any linear system whose coefficient matrix is Kronecker-factored, symmetric, and positive definite. Empirical studies show the efficacy and efficiency of our proposed method.
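As a rough illustration of the second building block described above, the sketch below (an illustrative assumption, not the authors' EA-CG algorithm itself) applies plain conjugate gradient to a linear system whose coefficient matrix is Kronecker-factored, symmetric, and positive definite, forming matrix-vector products without ever materializing the Kronecker product.

```python
# Minimal sketch: CG for (A ⊗ B) x = b with SPD factors A and B.
# Illustrative only; not the authors' EA-CG method.
import numpy as np

def kron_matvec(A, B, x):
    """(A ⊗ B) x via the identity (A ⊗ B) vec(X) = vec(B X A^T)."""
    n, m = A.shape[0], B.shape[0]
    X = x.reshape((m, n), order="F")        # column-major vec-inverse
    return (B @ X @ A.T).reshape(-1, order="F")

def cg_kron(A, B, b, tol=1e-10, max_iter=500):
    """Plain conjugate gradient; assumes A and B are SPD."""
    x = np.zeros_like(b)
    r = b - kron_matvec(A, B, x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = kron_matvec(A, B, p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Only the factors A and B are stored, so memory stays at O(n² + m²) for the factors instead of O((nm)²) for the assembled matrix.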

Author(s):  
Shin-ichi Ito ◽  
Takeru Matsuda ◽  
Yuto Miyatake

We consider a scalar function that depends on the numerical solution of an initial value problem, and its second-derivative (Hessian) matrix with respect to the initial value. The need to extract information from the Hessian, or to solve a linear system whose coefficient matrix is the Hessian, arises in many research fields such as optimization, Bayesian estimation, and uncertainty quantification. For memory efficiency, these tasks often employ a Krylov subspace method, which does not need to hold the Hessian matrix explicitly and only requires computing the product of the Hessian with a given vector. One way to obtain an approximation of such a Hessian-vector product is to integrate the so-called second-order adjoint system numerically. However, the error in the approximation can be significant even if the numerical integration of the second-order adjoint system is sufficiently accurate. This paper presents a novel algorithm that computes the intended Hessian-vector product exactly and efficiently. To this end, we give a new concise derivation of the second-order adjoint system and show that the intended product can be computed exactly by applying a particular numerical method to the second-order adjoint system. In this discussion, symplectic partitioned Runge–Kutta methods play an essential role.
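For orientation, one common way to write the objects involved (an illustrative formulation under assumed notation, not necessarily that of the paper) is the following: for an initial value problem with a scalar function of the final state, the Hessian-vector product with respect to the initial value can be obtained from a forward tangent equation and a backward second-order adjoint equation.

```latex
% Illustrative formulation (assumed notation): forward tangent + second-order adjoint
% for \dot{x} = f(x),\; x(0) = x_0, and a scalar function C(x(T)).
\begin{align*}
  \dot{\delta}(t)  &= \partial_x f(x(t))\,\delta(t), & \delta(0)  &= v,\\
  \dot{\lambda}(t) &= -\partial_x f(x(t))^{\top}\lambda(t), & \lambda(T) &= \nabla C(x(T)),\\
  \dot{\mu}(t)     &= -\partial_x f(x(t))^{\top}\mu(t)
                      - \bigl(\partial_x^2 f(x(t))[\delta(t)]\bigr)^{\top}\lambda(t),
                    & \mu(T)    &= \nabla^2 C(x(T))\,\delta(T),
\end{align*}
```

where $\partial_x^2 f(x)[\delta]$ denotes the derivative of the Jacobian $\partial_x f$ in the direction $\delta$, so that $\nabla_{x_0} C = \lambda(0)$ and $\nabla^2_{x_0} C\, v = \mu(0)$. The abstract's message is that only a particular discretization of these systems, built on symplectic partitioned Runge–Kutta pairs, reproduces the intended Hessian-vector product exactly.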


Author(s):  
V. Jeyakumar ◽  
X. Wang

In this paper, we present generalizations of the Jacobian matrix to continuous maps and of the Hessian matrix to continuously differentiable functions. We then establish second-order optimality conditions for mathematical programming problems with continuously differentiable functions. The results also sharpen the corresponding results for problems involving C^{1,1} functions.
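For context, the classical smooth case that such results generalize (standard background, not a result of the paper) reads:

```latex
% Classical second-order sufficient condition for twice continuously differentiable f.
\[
  \nabla f(x^{*}) = 0 \quad\text{and}\quad \nabla^{2} f(x^{*}) \succ 0
  \;\Longrightarrow\; x^{*} \text{ is a strict local minimizer of } f .
\]
```

For C^{1,1} functions the Hessian need not exist everywhere, which is why a generalized (set-valued) Hessian is used in its place.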


Author(s):  
Liheng Wu ◽  
Andreas Müller ◽  
Jian S. Dai

Higher-order loop constraints play a key role in the local mobility, singularity, and dynamic analysis of closed-loop linkages. Recently, closed forms of higher-order kinematic constraints have been derived as nested Lie products in screw coordinates, which are purely algebraic operations. However, the complexity of the expressions makes higher-order analysis complicated and highly reliant on computer implementations. In this paper, matrix expressions of the first- and second-order kinematic constraints, i.e., those involving the Jacobian and the Hessian matrix, are formulated explicitly for single-loop linkages in terms of screw coordinates. For overconstrained linkages, which possess self-stress, the first- and second-order constraints reduce to a set of quadratic forms. The test for the order of mobility relies on solutions of the higher-order constraints. Second-order mobility analysis then boils down to testing the definiteness of the coefficient matrix of the quadratic forms (i.e., the Hessian) rather than solving them. Thus, the second-order analysis is simplified.
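The definiteness test mentioned above can be illustrated with a small sketch (a generic setup assumed for illustration, not the paper's specific linkage computations): given the symmetric coefficient matrix of a quadratic form, its eigenvalues decide whether the form is sign-definite.

```python
# Minimal sketch: classify a symmetric quadratic-form coefficient matrix
# (e.g., a reduced Hessian) by its eigenvalues. Illustrative only.
import numpy as np

def classify_quadratic_form(H, tol=1e-10):
    """Return 'positive definite', 'negative definite', 'indefinite' or 'semi-definite'."""
    H = 0.5 * (H + H.T)                  # symmetrize against round-off
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "indefinite"
    return "semi-definite"

print(classify_quadratic_form(np.array([[2.0, 0.5], [0.5, 1.0]])))
```

A sign-definite form vanishes only at the origin, while an indefinite one has nontrivial zeros; this is the sense in which testing the coefficient matrix can replace solving the quadratic constraints.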


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1533
Author(s):  
Jingcheng Zhou ◽  
Wei Wei ◽  
Ruizhi Zhang ◽  
Zhiming Zheng

First-order methods such as stochastic gradient descent (SGD) are popular choices for training deep neural networks (DNNs) and generalize well, but they require long training times. Second-order methods, which can reduce training time, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to reduce this cost, although the approximation can deviate substantially from the true Hessian. In this paper, we explore the convexity of the loss with respect to part of the parameters and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods that estimate the Hessian matrix of all parameters, our methods compute second-order information accurately for only a small subset of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets were performed to verify the effectiveness of our methods for regression and classification problems.
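To make the split concrete, here is a minimal sketch (a toy under simplifying assumptions, not the exact DN-SGD/SGD-DN algorithms) that takes a damped Newton step on the output-layer weights, whose MSE Hessian is cheap and positive semi-definite, and a plain SGD step on the hidden layer.

```python
# Toy sketch: damped Newton on the output layer, SGD on the hidden layer.
# Illustrative only; not the paper's exact DN-SGD / SGD-DN algorithms.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a one-hidden-layer network y ≈ tanh(x W1^T) w2.
X = rng.normal(size=(256, 10))
y = np.sin(X[:, :1])                      # targets, shape (256, 1)
W1 = rng.normal(scale=0.3, size=(32, 10))
w2 = rng.normal(scale=0.3, size=(32, 1))

lr, damping = 1e-2, 1e-1
for step in range(200):
    idx = rng.choice(len(X), size=64, replace=False)
    xb, yb = X[idx], y[idx]

    H = np.tanh(xb @ W1.T)                # hidden activations, (64, 32)
    err = H @ w2 - yb                     # residuals, (64, 1)

    # Damped Newton step for the output layer: under MSE its Hessian is
    # (1/n) H^T H, positive semi-definite, hence the damping term.
    g2 = H.T @ err / len(xb)
    hess = H.T @ H / len(xb) + damping * np.eye(H.shape[1])
    w2 -= np.linalg.solve(hess, g2)

    # Plain SGD step for the hidden layer.
    dH = (err @ w2.T) * (1.0 - H**2)      # backprop through tanh
    W1 -= lr * (dH.T @ xb / len(xb))
```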


Author(s):  
Jaya Pratha Sebastiyar ◽  
Martin Sahayaraj Joseph

Distributed joint congestion control and routing optimization has received significant attention recently. To date, however, most existing schemes follow a key idea called the back-pressure algorithm. Despite many salient features, the first-order subgradient nature of back-pressure-based schemes results in slow convergence and poor delay performance. To overcome these limitations, this study makes a first attempt at developing a second-order joint congestion control and routing optimization framework that offers utility optimality, queue stability, fast convergence, and low delay. The contributions of this work are threefold: we propose a new second-order joint congestion control and routing framework based on a primal-dual interior-point approach, we establish the utility optimality and queue stability of the proposed second-order method, and we show how to implement it in a distributed fashion.
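As a toy illustration of second-order machinery in this setting (a centralized sketch with assumed data, not the paper's distributed primal-dual interior-point algorithm), the snippet below takes backtracked Newton steps on a log-barrier formulation of network utility maximization with link-capacity constraints, the classic model behind congestion control.

```python
# Toy, centralized sketch: barrier-Newton steps for network utility
# maximization  max sum(log x)  s.t.  R x <= c, x > 0.  Illustrative only.
import numpy as np

R = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])      # link-route incidence: 2 links, 3 flows
c = np.array([1.0, 2.0])             # link capacities
t = 10.0                             # barrier parameter
x = np.full(3, 0.1)                  # strictly feasible starting rates

def barrier(x):
    slack = c - R @ x
    if np.any(x <= 0) or np.any(slack <= 0):
        return np.inf
    return -np.sum(np.log(x)) - np.sum(np.log(slack)) / t

for _ in range(50):
    slack = c - R @ x
    g = -1.0 / x + (R.T @ (1.0 / slack)) / t
    H = np.diag(1.0 / x**2) + (R.T * (1.0 / slack**2)) @ R / t
    d = np.linalg.solve(H, -g)       # Newton direction
    step = 1.0
    # Backtrack until the step stays feasible and decreases the barrier objective.
    while barrier(x + step * d) > barrier(x) + 0.25 * step * (g @ d):
        step *= 0.5
    x = x + step * d

print("rates:", x, "link loads:", R @ x)
```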


Author(s):  
Xuan Chen ◽  
Dongyun Lin

This paper tackles the issue of global stabilization for a class of delayed switched inertial neural networks (SINNs). In contrast to the frequently employed reduced-order technique, this paper studies SINNs directly through a non-reduced-order method. By constructing a novel Lyapunov functional and using Barbalat's lemma, sufficient conditions for global asymptotic stabilization and global exponential stabilization of the considered SINNs are established. Numerical simulations further confirm the feasibility of the main results. A comparative study shows that the global stabilization results of this paper complement and improve some existing work.
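For readers unfamiliar with the model class, a typical delayed switched inertial neural network can be written (an illustrative form under assumed notation, not necessarily the exact system of the paper) as a second-order system whose coefficients switch with a signal σ(t):

```latex
% Illustrative model (assumed notation): delayed switched inertial neural network.
\[
  \frac{d^{2}x_{i}(t)}{dt^{2}}
    = -a_{i}^{\sigma(t)}\,\frac{dx_{i}(t)}{dt}
      - b_{i}^{\sigma(t)}\,x_{i}(t)
      + \sum_{j=1}^{n} c_{ij}^{\sigma(t)} f_{j}\!\bigl(x_{j}(t)\bigr)
      + \sum_{j=1}^{n} d_{ij}^{\sigma(t)} f_{j}\!\bigl(x_{j}(t-\tau_{j})\bigr)
      + u_{i}(t), \qquad i = 1,\dots,n.
\]
```

The "inertial" term is the second derivative: the reduced-order technique removes it by a variable substitution that yields a first-order system, whereas a non-reduced-order approach builds the Lyapunov functional directly on the second-order form.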


Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1159
Author(s):  
Shyam Sundar Santra ◽  
Omar Bazighifan ◽  
Mihai Postolache

Neutral differential equations arise when modeling many problems and phenomena in electrodynamics, neural networks, quantum mechanics, electromagnetism, time-symmetric fields, and fluid dynamics, so it is interesting to study the qualitative behavior of their solutions. In this study, we obtain new sufficient conditions for the oscillation of solutions of second-order delay differential equations with sublinear neutral terms. The results obtained improve and complement the relevant results in the literature. Finally, we give an example to validate the main results and state an open problem.
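A representative equation of this class (an illustrative form, not necessarily the exact equation studied) pairs a second-order delay equation with a neutral term raised to a sublinear power:

```latex
% Illustrative form (assumed): second-order delay equation with a sublinear neutral term.
\[
  \bigl(r(t)\, z'(t)\bigr)' + q(t)\, x^{\beta}\!\bigl(\sigma(t)\bigr) = 0,
  \qquad z(t) = x(t) + p(t)\, x^{\alpha}\!\bigl(\tau(t)\bigr), \quad 0 < \alpha < 1.
\]
```

Oscillation results for such equations give conditions on the coefficients and delays under which every solution changes sign infinitely often rather than eventually keeping a fixed sign.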

