EA-CG: An Approximate Second-Order Method for Training Fully-Connected Neural Networks

Author(s):  
Sheng-Wei Chen ◽  
Chun-Nan Chou ◽  
Edward Y. Chang

For training fully-connected neural networks (FCNNs), we propose a practical approximate second-order method comprising 1) an approximation of the Hessian matrix and 2) a conjugate gradient (CG) based method. Our proposed approximate Hessian matrix is memory-efficient and can be applied to any FCNN whose activation and criterion functions are twice differentiable. We devise a CG-based method incorporating a rank-one approximation to derive Newton directions for training FCNNs, which significantly reduces both space and time complexity. This CG-based method can be employed to solve any linear system whose coefficient matrix is Kronecker-factored, symmetric, and positive definite. Empirical studies show the efficacy and efficiency of our proposed method.
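As a rough illustration of the second building block described above, the sketch below (an illustrative assumption, not the authors' EA-CG algorithm itself) applies plain conjugate gradient to a linear system whose coefficient matrix is Kronecker-factored, symmetric, and positive definite, forming matrix-vector products without ever materializing the Kronecker product.

```python
# Minimal sketch: CG for (A ⊗ B) x = b with SPD factors A and B.
# Illustrative only; not the authors' EA-CG method.
import numpy as np

def kron_matvec(A, B, x):
    """(A ⊗ B) x via the identity (A ⊗ B) vec(X) = vec(B X A^T)."""
    n, m = A.shape[0], B.shape[0]
    X = x.reshape((m, n), order="F")        # column-major vec-inverse
    return (B @ X @ A.T).reshape(-1, order="F")

def cg_kron(A, B, b, tol=1e-10, max_iter=500):
    """Plain conjugate gradient; assumes A and B are SPD."""
    x = np.zeros_like(b)
    r = b - kron_matvec(A, B, x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = kron_matvec(A, B, p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Only the factors A and B are stored, so memory stays at O(n² + m²) for the factors instead of O((nm)²) for the assembled matrix.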

Author(s):  
Shin-ichi Ito ◽  
Takeru Matsuda ◽  
Yuto Miyatake

We consider a scalar function that depends on the numerical solution of an initial value problem, and its second-derivative (Hessian) matrix with respect to the initial value. The need to extract information from the Hessian, or to solve a linear system whose coefficient matrix is the Hessian, arises in many research fields such as optimization, Bayesian estimation, and uncertainty quantification. For memory efficiency, these tasks often employ a Krylov subspace method, which does not need to hold the Hessian matrix explicitly and only requires computing the product of the Hessian with a given vector. One way to obtain an approximation of such a Hessian-vector product is to integrate the so-called second-order adjoint system numerically. However, the error in the approximation can be significant even if the numerical integration of the second-order adjoint system is sufficiently accurate. This paper presents a novel algorithm that computes the intended Hessian-vector product exactly and efficiently. To this end, we give a new concise derivation of the second-order adjoint system and show that the intended product can be computed exactly by applying a particular numerical method to the second-order adjoint system. In this discussion, symplectic partitioned Runge–Kutta methods play an essential role.
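For orientation, one common way to write the objects involved (an illustrative formulation under assumed notation, not necessarily that of the paper) is the following: for an initial value problem with a scalar function of the final state, the Hessian-vector product with respect to the initial value can be obtained from a forward tangent equation and a backward second-order adjoint equation.

```latex
% Illustrative formulation (assumed notation): forward tangent + second-order adjoint
% for \dot{x} = f(x),\; x(0) = x_0, and a scalar function C(x(T)).
\begin{align*}
  \dot{\delta}(t)  &= \partial_x f(x(t))\,\delta(t), & \delta(0)  &= v,\\
  \dot{\lambda}(t) &= -\partial_x f(x(t))^{\top}\lambda(t), & \lambda(T) &= \nabla C(x(T)),\\
  \dot{\mu}(t)     &= -\partial_x f(x(t))^{\top}\mu(t)
                      - \bigl(\partial_x^2 f(x(t))[\delta(t)]\bigr)^{\top}\lambda(t),
                    & \mu(T)    &= \nabla^2 C(x(T))\,\delta(T),
\end{align*}
```

where $\partial_x^2 f(x)[\delta]$ denotes the derivative of the Jacobian $\partial_x f$ in the direction $\delta$, so that $\nabla_{x_0} C = \lambda(0)$ and $\nabla^2_{x_0} C\, v = \mu(0)$. The abstract's message is that only a particular discretization of these systems, built on symplectic partitioned Runge–Kutta pairs, reproduces the intended Hessian-vector product exactly.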


Author(s):  
V. Jeyakumar ◽  
X. Wang

In this paper, we present generalizations of the Jacobian matrix to continuous maps and of the Hessian matrix to continuously differentiable functions. We then establish second-order optimality conditions for mathematical programming problems with continuously differentiable functions. The results also sharpen the corresponding results for problems involving C^{1,1} functions.
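For context, the classical smooth case that such results generalize (standard background, not a result of the paper) reads:

```latex
% Classical second-order sufficient condition for twice continuously differentiable f.
\[
  \nabla f(x^{*}) = 0 \quad\text{and}\quad \nabla^{2} f(x^{*}) \succ 0
  \;\Longrightarrow\; x^{*} \text{ is a strict local minimizer of } f .
\]
```

For C^{1,1} functions the Hessian need not exist everywhere, which is why a generalized (set-valued) Hessian is used in its place.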


Author(s):  
Liheng Wu ◽  
Andreas Müller ◽  
Jian S. Dai

Higher-order loop constraints play a key role in the local mobility, singularity, and dynamic analysis of closed-loop linkages. Recently, closed forms of higher-order kinematic constraints have been derived as nested Lie products in screw coordinates, which are purely algebraic operations. However, the complexity of the expressions makes higher-order analysis complicated and highly reliant on computer implementations. In this paper, matrix expressions of the first- and second-order kinematic constraints, i.e., those involving the Jacobian and the Hessian matrix, are formulated explicitly for single-loop linkages in terms of screw coordinates. For overconstrained linkages, which possess self-stress, the first- and second-order constraints reduce to a set of quadratic forms. The test for the order of mobility relies on solutions of the higher-order constraints. Second-order mobility analysis then boils down to testing the definiteness of the coefficient matrix of the quadratic forms (i.e., the Hessian) rather than solving them. Thus, the second-order analysis is simplified.
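The definiteness test mentioned above can be illustrated with a small sketch (a generic setup assumed for illustration, not the paper's specific linkage computations): given the symmetric coefficient matrix of a quadratic form, its eigenvalues decide whether the form is sign-definite.

```python
# Minimal sketch: classify a symmetric quadratic-form coefficient matrix
# (e.g., a reduced Hessian) by its eigenvalues. Illustrative only.
import numpy as np

def classify_quadratic_form(H, tol=1e-10):
    """Return 'positive definite', 'negative definite', 'indefinite' or 'semi-definite'."""
    H = 0.5 * (H + H.T)                  # symmetrize against round-off
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "indefinite"
    return "semi-definite"

print(classify_quadratic_form(np.array([[2.0, 0.5], [0.5, 1.0]])))
```

A sign-definite form vanishes only at the origin, while an indefinite one has nontrivial zeros; this is the sense in which testing the coefficient matrix can replace solving the quadratic constraints.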


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1533
Author(s):  
Jingcheng Zhou ◽  
Wei Wei ◽  
Ruizhi Zhang ◽  
Zhiming Zheng

First-order methods such as stochastic gradient descent (SGD) are popular choices for training deep neural networks (DNNs) and generalize well, but they require long training times. Second-order methods, which can reduce training time, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to reduce this cost, although the approximation can deviate substantially from the true Hessian. In this paper, we explore the convexity of the loss with respect to part of the parameters and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods that estimate the Hessian matrix of all parameters, our methods compute second-order information accurately for only a small subset of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets were performed to verify the effectiveness of our methods for regression and classification problems.
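To make the split concrete, here is a minimal sketch (a toy under simplifying assumptions, not the exact DN-SGD/SGD-DN algorithms) that takes a damped Newton step on the output-layer weights, whose MSE Hessian is cheap and positive semi-definite, and a plain SGD step on the hidden layer.

```python
# Toy sketch: damped Newton on the output layer, SGD on the hidden layer.
# Illustrative only; not the paper's exact DN-SGD / SGD-DN algorithms.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a one-hidden-layer network y ≈ tanh(x W1^T) w2.
X = rng.normal(size=(256, 10))
y = np.sin(X[:, :1])                      # targets, shape (256, 1)
W1 = rng.normal(scale=0.3, size=(32, 10))
w2 = rng.normal(scale=0.3, size=(32, 1))

lr, damping = 1e-2, 1e-1
for step in range(200):
    idx = rng.choice(len(X), size=64, replace=False)
    xb, yb = X[idx], y[idx]

    H = np.tanh(xb @ W1.T)                # hidden activations, (64, 32)
    err = H @ w2 - yb                     # residuals, (64, 1)

    # Damped Newton step for the output layer: under MSE its Hessian is
    # (1/n) H^T H, positive semi-definite, hence the damping term.
    g2 = H.T @ err / len(xb)
    hess = H.T @ H / len(xb) + damping * np.eye(H.shape[1])
    w2 -= np.linalg.solve(hess, g2)

    # Plain SGD step for the hidden layer.
    dH = (err @ w2.T) * (1.0 - H**2)      # backprop through tanh
    W1 -= lr * (dH.T @ xb / len(xb))
```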


Author(s):  
Jaya Pratha Sebastiyar ◽  
Martin Sahayaraj Joseph

Distributed joint congestion control and routing optimization has received significant attention recently. To date, however, most existing schemes follow a key idea called the back-pressure algorithm. Despite many salient features, the first-order subgradient nature of back-pressure-based schemes results in slow convergence and poor delay performance. To overcome these limitations, this study makes a first attempt at developing a second-order joint congestion control and routing optimization framework that offers utility optimality, queue stability, fast convergence, and low delay. The contributions of this work are threefold: we propose a new second-order joint congestion control and routing framework based on a primal-dual interior-point approach, we establish the utility optimality and queue stability of the proposed second-order method, and we show how to implement it in a distributed fashion.
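As a toy illustration of second-order machinery in this setting (a centralized sketch with assumed data, not the paper's distributed primal-dual interior-point algorithm), the snippet below takes backtracked Newton steps on a log-barrier formulation of network utility maximization with link-capacity constraints, the classic model behind congestion control.

```python
# Toy, centralized sketch: barrier-Newton steps for network utility
# maximization  max sum(log x)  s.t.  R x <= c, x > 0.  Illustrative only.
import numpy as np

R = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])      # link-route incidence: 2 links, 3 flows
c = np.array([1.0, 2.0])             # link capacities
t = 10.0                             # barrier parameter
x = np.full(3, 0.1)                  # strictly feasible starting rates

def barrier(x):
    slack = c - R @ x
    if np.any(x <= 0) or np.any(slack <= 0):
        return np.inf
    return -np.sum(np.log(x)) - np.sum(np.log(slack)) / t

for _ in range(50):
    slack = c - R @ x
    g = -1.0 / x + (R.T @ (1.0 / slack)) / t
    H = np.diag(1.0 / x**2) + (R.T * (1.0 / slack**2)) @ R / t
    d = np.linalg.solve(H, -g)       # Newton direction
    step = 1.0
    # Backtrack until the step stays feasible and decreases the barrier objective.
    while barrier(x + step * d) > barrier(x) + 0.25 * step * (g @ d):
        step *= 0.5
    x = x + step * d

print("rates:", x, "link loads:", R @ x)
```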


Author(s):  
Xuan Chen ◽  
Dongyun Lin

This paper tackles the issue of global stabilization for a class of delayed switched inertial neural networks (SINNs). In contrast to the frequently employed reduced-order technique, this paper studies SINNs directly through a non-reduced-order method. By constructing a novel Lyapunov functional and using Barbalat's lemma, sufficient conditions for global asymptotic stabilization and global exponential stabilization of the considered SINNs are established. Numerical simulations further confirm the feasibility of the main results. A comparative study shows that the global stabilization results of this paper complement and improve some existing work.
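For readers unfamiliar with the model class, a typical delayed switched inertial neural network can be written (an illustrative form under assumed notation, not necessarily the exact system of the paper) as a second-order system whose coefficients switch with a signal σ(t):

```latex
% Illustrative model (assumed notation): delayed switched inertial neural network.
\[
  \frac{d^{2}x_{i}(t)}{dt^{2}}
    = -a_{i}^{\sigma(t)}\,\frac{dx_{i}(t)}{dt}
      - b_{i}^{\sigma(t)}\,x_{i}(t)
      + \sum_{j=1}^{n} c_{ij}^{\sigma(t)} f_{j}\!\bigl(x_{j}(t)\bigr)
      + \sum_{j=1}^{n} d_{ij}^{\sigma(t)} f_{j}\!\bigl(x_{j}(t-\tau_{j})\bigr)
      + u_{i}(t), \qquad i = 1,\dots,n.
\]
```

The "inertial" term is the second derivative: the reduced-order technique removes it by a variable substitution that yields a first-order system, whereas a non-reduced-order approach builds the Lyapunov functional directly on the second-order form.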


Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1159
Author(s):  
Shyam Sundar Santra ◽  
Omar Bazighifan ◽  
Mihai Postolache

Neutral differential equations arise when modeling many problems and phenomena in electrodynamics, neural networks, quantum mechanics, electromagnetism, time-symmetric fields, and fluid dynamics, so it is interesting to study the qualitative behavior of their solutions. In this study, we obtain new sufficient conditions for the oscillation of solutions of second-order delay differential equations with sublinear neutral terms. The results obtained improve and complement the relevant results in the literature. Finally, we give an example to validate the main results and state an open problem.
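A representative equation of this class (an illustrative form, not necessarily the exact equation studied) pairs a second-order delay equation with a neutral term raised to a sublinear power:

```latex
% Illustrative form (assumed): second-order delay equation with a sublinear neutral term.
\[
  \bigl(r(t)\, z'(t)\bigr)' + q(t)\, x^{\beta}\!\bigl(\sigma(t)\bigr) = 0,
  \qquad z(t) = x(t) + p(t)\, x^{\alpha}\!\bigl(\tau(t)\bigr), \quad 0 < \alpha < 1.
\]
```

Oscillation results for such equations give conditions on the coefficients and delays under which every solution changes sign infinitely often rather than eventually keeping a fixed sign.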

