Adjoint-based exact Hessian computation

Author(s):  
Shin-ichi Ito ◽  
Takeru Matsuda ◽  
Yuto Miyatake

Abstract
We consider a scalar function depending on a numerical solution of an initial value problem, and its second-derivative (Hessian) matrix with respect to the initial value. The need to extract information from the Hessian or to solve a linear system having the Hessian as its coefficient matrix arises in many research fields such as optimization, Bayesian estimation, and uncertainty quantification. For memory efficiency, these tasks often employ a Krylov subspace method that does not need to hold the Hessian matrix explicitly and only requires the multiplication of the Hessian with a given vector. One way to obtain an approximation of such a Hessian-vector multiplication is to integrate the so-called second-order adjoint system numerically. However, the error in the approximation can be significant even if the numerical integration of the second-order adjoint system is sufficiently accurate. This paper presents a novel algorithm that computes the intended Hessian-vector multiplication exactly and efficiently. To this end, we give a new, concise derivation of the second-order adjoint system and show that the intended multiplication can be computed exactly by applying a particular numerical method to the second-order adjoint system. In the discussion, symplectic partitioned Runge–Kutta methods play an essential role.
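For contrast with the exact computation this abstract describes, the following is a minimal sketch of the naive matrix-free alternative: approximating the Hessian-vector product by a central difference of gradients. The quadratic-plus-sine objective and all names are illustrative, not from the paper, and the sketch exhibits exactly the kind of discretization error that an exact adjoint-based scheme avoids.

```python
import numpy as np

# Toy objective f(x) = 0.5 x^T A x + sin(x_0); its gradient and Hessian
# are known in closed form, so the matrix-free product can be checked.
A = np.array([[4.0, 1.0], [1.0, 3.0]])

def grad(x):
    return A @ x + np.array([np.cos(x[0]), 0.0])

def hessian_vector_product(x, v, eps=1e-6):
    # Central difference of the gradient along v:
    #   H(x) v ~ (grad(x + eps*v) - grad(x - eps*v)) / (2*eps)
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

x = np.array([0.3, -0.7])
v = np.array([1.0, 2.0])
H_exact = A + np.diag([-np.sin(x[0]), 0.0])
hv = hessian_vector_product(x, v)
# hv agrees with H_exact @ v only up to finite-difference truncation error
```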

Author(s):  
Liheng Wu ◽  
Andreas Müller ◽  
Jian S. Dai

Higher order loop constraints play a key role in the local mobility, singularity, and dynamic analysis of closed-loop linkages. Recently, closed forms of higher order kinematic constraints have been obtained using nested Lie products in screw coordinates, which are purely algebraic operations. However, the complexity of the expressions makes higher order analysis complicated and highly reliant on computer implementations. In this paper, matrix expressions of the first- and second-order kinematic constraints, i.e., those involving the Jacobian and Hessian matrices, are formulated explicitly for single-loop linkages in terms of screw coordinates. For overconstrained linkages, which possess self-stress, the first- and second-order constraints reduce to a set of quadratic forms. The test for the order of mobility relies on solutions of the higher order constraints. Second-order mobility analysis boils down to testing properties of the coefficient matrix of the quadratic forms (i.e., the Hessian) rather than solving them. Thus, the second-order analysis is simplified.
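The reduction from solving quadratic forms to inspecting the Hessian's properties can be illustrated with a toy definiteness test (plain linear algebra, not the paper's screw-coordinate formulation; the helper name is hypothetical): a symmetric coefficient matrix that is definite admits no nontrivial real solution of v^T H v = 0.

```python
import numpy as np

def has_nontrivial_zero(H, tol=1e-10):
    # v^T H v = 0 has a nontrivial real solution iff the symmetric part
    # of H is NOT definite (i.e., it is indefinite or singular).
    eigs = np.linalg.eigvalsh((H + H.T) / 2)
    definite = eigs.min() > tol or eigs.max() < -tol
    return not definite

H_pd  = np.array([[2.0, 0.0], [0.0, 1.0]])   # positive definite
H_ind = np.array([[1.0, 0.0], [0.0, -1.0]])  # indefinite
print(has_nontrivial_zero(H_pd), has_nontrivial_zero(H_ind))  # False True
```

Checking eigenvalue signs replaces solving the quadratic equations themselves, which mirrors the simplification the abstract describes.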


Author(s):  
Sheng-Wei Chen ◽  
Chun-Nan Chou ◽  
Edward Y. Chang

For training fully-connected neural networks (FCNNs), we propose a practical approximate second-order method comprising: 1) an approximation of the Hessian matrix and 2) a conjugate gradient (CG) based method. Our proposed approximate Hessian matrix is memory-efficient and can be applied to any FCNN whose activation and criterion functions are twice differentiable. We devise a CG-based method incorporating a rank-one approximation to derive Newton directions for training FCNNs, which significantly reduces both space and time complexity. This CG-based method can be employed to solve any linear system whose coefficient matrix is Kronecker-factored, symmetric, and positive definite. Empirical studies show the efficacy and efficiency of our proposed method.
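The role of the CG solver can be sketched in a few lines: CG finds the Newton direction from Hx = b using only matrix-vector products, never materializing H. This is generic textbook CG with a small SPD stand-in matrix, not the paper's Kronecker-factored variant.

```python
import numpy as np

def cg(matvec, b, tol=1e-10, max_iter=100):
    # Conjugate gradient needing only products `matvec(v)` with a
    # symmetric positive definite matrix, never the matrix itself.
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = matvec(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

H = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD stand-in for the Hessian
b = np.array([1.0, 2.0])
x = cg(lambda v: H @ v, b)  # solves H x = b matrix-free
```

For an n-by-n system, CG converges in at most n iterations in exact arithmetic, which is why avoiding an explicit Hessian dominates the space savings.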


2015 ◽  
Vol 12 (11) ◽  
pp. 4584-4592
Author(s):  
Zhongming Teng ◽  
Linzhang Lu ◽  
Xiaoqian Niu

2018 ◽  
Vol 5 (1) ◽  
pp. 102-112 ◽  
Author(s):  
Shekhar Singh Negi ◽  
Syed Abbas ◽  
Muslim Malik

Abstract
By using a generalized Opial-type inequality on time scales, a new oscillation criterion is given for a singular initial-value problem of a second-order dynamic equation on time scales. Some oscillatory results for its generalizations are also presented. An example with various time scales is given to illustrate the analytical findings.


Author(s):  
Yuka Hashimoto ◽  
Takashi Nodera

Abstract
The Krylov subspace method has been investigated and refined for approximating the behaviors of finite- or infinite-dimensional linear operators. It has been used for approximating eigenvalues, solutions of linear equations, and operator functions acting on vectors. Recently, for time-series data analysis, much attention has been paid to the Krylov subspace method as a viable method for estimating the multiplication of a vector by an unknown linear operator, referred to as a transfer operator. In this paper, we investigate a convergence analysis of Krylov subspace methods for estimating operator-vector multiplications.
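The basic building block behind such estimates is the Arnoldi iteration, which constructs an orthonormal Krylov basis from repeated operator-vector multiplications. Below is a generic sketch of standard Arnoldi (not the paper's specific estimator); the relation matvec(Q[:, j]) = Q @ H[:, j] is what lets a small Hessenberg matrix stand in for the large operator.

```python
import numpy as np

def arnoldi(matvec, b, k):
    # Build an orthonormal basis Q of the Krylov subspace
    # span{b, A b, ..., A^k b} and the (k+1)-by-k Hessenberg matrix H
    # with A Q[:, :k] = Q H (modified Gram-Schmidt orthogonalization).
    n = b.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = matvec(Q[:, j])
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:       # breakdown: invariant subspace found
            return Q[:, :j + 1], H[:j + 1, :j]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
Q, H = arnoldi(lambda v: A @ v, b, 3)
```

Projecting onto Q reduces questions about A (eigenvalues, linear solves, operator functions) to questions about the small matrix H, which is the mechanism the convergence analysis studies.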


Author(s):  
Jonas Dünnebacke ◽  
Stefan Turek ◽  
Christoph Lohmann ◽  
Andriy Sokolov ◽  
Peter Zajac

We discuss how “parallel-in-space & simultaneous-in-time” Newton-multigrid approaches can be designed to improve the scaling behavior of the spatial parallelism by reducing latency costs. The idea is to solve many time steps at once, thereby solving fewer but larger systems. These large systems are reordered and interpreted as a space-only problem, leading to a multigrid algorithm with semi-coarsening in space and line smoothing in the time direction. The smoother is further improved by embedding it as a preconditioner in a Krylov subspace method. As a prototypical application, we concentrate on scalar partial differential equations (PDEs) with up to many thousands of time steps, which are discretized in time and space by finite difference and finite element methods, respectively. For linear PDEs, the resulting method is closely related to multigrid waveform relaxation and its theoretical framework. In our parabolic test problems, the numerical behavior of this multigrid approach is robust w.r.t. the spatial and temporal grid sizes and the number of simultaneously treated time steps. Moreover, we illustrate how corresponding time-simultaneous fixed-point and Newton-type solvers can be derived for nonlinear nonstationary problems that require the described solution of linearized problems in each outer nonlinear step. As the main result, we are able to generate much larger problem sizes to be treated by a large number of cores, so that the combination of the robustly scaling multigrid solvers with a larger degree of parallelism allows a faster solution procedure for nonstationary problems.
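The "solve many time steps at once" idea can be sketched for implicit Euler on the 1D heat equation: stacking N steps of (I - dt*L) u^{n+1} = u^n yields one block lower-bidiagonal system. The toy dense solve below only illustrates the reordering; the paper applies multigrid with semi-coarsening and line smoothing to such systems, and all sizes here are illustrative.

```python
import numpy as np

m, N, dt = 8, 4, 0.01
# 1D Dirichlet Laplacian on m interior points of the unit interval
L = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
     + np.diag(np.ones(m - 1), -1)) * (m + 1) ** 2
A = np.eye(m) - dt * L                 # one implicit Euler step matrix
u0 = np.sin(np.pi * np.arange(1, m + 1) / (m + 1))

# Block system: A on the diagonal blocks, -I on the subdiagonal blocks,
# right-hand side carries the initial value only in the first block row.
B = np.kron(np.eye(N), A) - np.kron(np.eye(N, k=-1), np.eye(m))
rhs = np.concatenate([u0] + [np.zeros(m)] * (N - 1))
U = np.linalg.solve(B, rhs).reshape(N, m)   # all N time steps at once
```

One large solve replaces N small sequential solves; in the parallel setting this trades N rounds of communication latency for a single, larger system that the time-simultaneous multigrid handles.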

