gradient computation
Recently Published Documents



2022 ◽  
Vol 41 (2) ◽  
pp. 1-21
Author(s):  
Tao Du ◽  
Kui Wu ◽  
Pingchuan Ma ◽  
Sebastien Wah ◽  
Andrew Spielberg ◽  
...  

We present a novel, fast differentiable simulator for soft-body learning and control applications. Existing differentiable soft-body simulators can be classified into two categories based on their time integration methods: Simulators using explicit timestepping schemes require tiny timesteps to avoid numerical instabilities in gradient computation, and simulators using implicit time integration typically compute gradients by employing the adjoint method and solving the expensive linearized dynamics. Inspired by Projective Dynamics (PD), we present Differentiable Projective Dynamics (DiffPD), an efficient differentiable soft-body simulator based on PD with implicit time integration. The key idea in DiffPD is to speed up backpropagation by exploiting the prefactorized Cholesky decomposition in forward PD simulation. In terms of contact handling, DiffPD supports two types of contacts: a penalty-based model describing contact and friction forces and a complementarity-based model enforcing non-penetration conditions and static friction. We evaluate the performance of DiffPD and observe it is 4–19 times faster compared with the standard Newton’s method in various applications including system identification, inverse design problems, trajectory optimization, and closed-loop control. We also apply DiffPD in a reality-to-simulation (real-to-sim) example with contact and collisions and show its capability of reconstructing a digital twin of real-world scenes.
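
The idea of reusing a prefactorized Cholesky decomposition during backpropagation can be illustrated on a toy linear problem. The sketch below is not the DiffPD algorithm: it assumes a fixed symmetric positive definite matrix `A`, a forward pass `x = A^{-1} b`, and a hypothetical loss `0.5 * ||x - target||^2`. All names (`cholesky`, `cho_solve`, `loss`, `target`) are illustrative. Because `A` is symmetric, the adjoint solve for the gradient with respect to `b` can use the same factor as the forward solve.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def loss(x, target):
    """Hypothetical quadratic loss on the solution (an assumption for this sketch)."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))

# Toy data (assumed): a small SPD system matrix, right-hand side, and target.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
target = [0.0, 0.5, 1.0]

L = cholesky(A)                    # factorize once
x = cho_solve(L, b)                # forward pass: x = A^{-1} b
residual = [xi - ti for xi, ti in zip(x, target)]
grad_b = cho_solve(L, residual)    # backward pass: dLoss/db = A^{-1} (x - target), same factor
```

In this toy setting the backward pass costs only two triangular solves, since the expensive factorization is shared with the forward pass.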


Author(s):  
Jingyan Xu ◽  
Frédéric Noo

We are interested in learning the hyperparameters in a convex objective function in a supervised setting. The complex relationship between the input data to the convex problem and the desirable hyperparameters can be modeled by a neural network; the hyperparameters and the data then drive the convex minimization problem, whose solution is then compared to training labels. In our previous work [1], we evaluated a prototype of this learning strategy in an optimization-based sinogram smoothing plus FBP reconstruction framework. A question arising in this setting is how to efficiently compute (backpropagate) the gradient from the solution of the optimization problem to the hyperparameters to enable end-to-end training. In this work, we first develop general formulas for gradient backpropagation for a subset of convex problems, namely the proximal mapping. To illustrate the value of the general formulas and to demonstrate how to use them, we consider the specific instance of 1-D quadratic smoothing (denoising) whose solution admits a dynamic programming (DP) algorithm. The general formulas lead to another DP algorithm for exact computation of the gradient of the hyperparameters. Our numerical studies demonstrate computation time savings of 55%-65% by providing a custom gradient instead of relying on automatic differentiation in deep learning libraries. While our discussion focuses on 1-D quadratic smoothing, our initial results (not presented) support the statement that the general formulas and the computational strategy apply equally well to TV or Huber smoothing problems on simple graphs whose solutions can be computed exactly via DP.
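
A hand-written gradient for 1-D quadratic smoothing can be sketched with implicit differentiation. This is an illustration under assumptions, not the DP algorithm from the paper: it assumes the objective `0.5 * sum((x_i - y_i)^2) + 0.5 * lam * sum((x_{i+1} - x_i)^2)` with a single scalar hyperparameter `lam`, a hypothetical training loss `0.5 * ||x - label||^2`, and a standard tridiagonal (Thomas) solver. The function names and data are made up for the example.

```python
def solve_smoothing(y, lam):
    """Solve (I + lam * D^T D) x = y, where D is the first-difference operator.

    The matrix is tridiagonal, so the Thomas algorithm solves it in O(n).
    Assumes len(y) >= 2 and lam >= 0.
    """
    n = len(y)
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def loss_and_grad(y, label, lam):
    """Return the training loss and its derivative with respect to lam.

    With M = I + lam * D^T D and M x = y, implicit differentiation gives
    dx/dlam = -M^{-1} D^T D x, so dLoss/dlam = -(D z) . (D x) with z = M^{-1} (x - label).
    """
    x = solve_smoothing(y, lam)
    r = [xi - li for xi, li in zip(x, label)]
    z = solve_smoothing(r, lam)  # adjoint solve; M is symmetric
    grad = -sum((z[i + 1] - z[i]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))
    loss = 0.5 * sum(ri * ri for ri in r)
    return loss, grad

# Toy data (assumed for illustration).
y = [1.0, 3.0, 2.0, 5.0, 4.0]
label = [1.5, 2.5, 3.0, 4.0, 4.5]
lam = 0.7
loss, grad = loss_and_grad(y, label, lam)
```

The gradient costs one extra linear-time solve, which is the kind of saving a custom gradient can offer over differentiating through every operation of the forward solver.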


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Stephan Thaler ◽  
Julija Zavadlav

In molecular dynamics (MD), neural network (NN) potentials trained bottom-up on quantum mechanical data have seen tremendous success recently. Top-down approaches that learn NN potentials directly from experimental data have received less attention, typically facing numerical and computational challenges when backpropagating through MD simulations. We present the Differentiable Trajectory Reweighting (DiffTRe) method, which bypasses differentiation through the MD simulation for time-independent observables. Leveraging thermodynamic perturbation theory, we avoid exploding gradients and achieve around 2 orders of magnitude speed-up in gradient computation for top-down learning. We show the effectiveness of DiffTRe in learning NN potentials for an atomistic model of diamond and a coarse-grained model of water based on diverse experimental observables including thermodynamic, structural and mechanical properties. Importantly, DiffTRe also generalizes bottom-up structural coarse-graining methods such as iterative Boltzmann inversion to arbitrary potentials. The presented method constitutes an important milestone towards enriching NN potentials with experimental data, particularly when accurate bottom-up data is unavailable.
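
The reweighting idea from thermodynamic perturbation theory can be sketched on a toy system. The example below is an assumption-laden illustration, not the DiffTRe implementation: it replaces the NN potential with a hypothetical 1-D harmonic potential `U(theta, s) = 0.5 * theta * s^2`, draws samples directly from the reference Boltzmann distribution instead of running MD, and uses the observable `s^2`. Samples generated at `theta_ref` are reweighted to a nearby `theta`, and the gradient of the reweighted average follows from the weights alone, without differentiating through any trajectory.

```python
import math
import random

BETA = 1.0  # inverse temperature (assumed units)

def potential(theta, s):
    """Hypothetical stand-in for a learned potential: harmonic well with stiffness theta."""
    return 0.5 * theta * s * s

def dpotential_dtheta(theta, s):
    """Derivative of the toy potential with respect to theta."""
    return 0.5 * s * s

def weights(theta, theta_ref, samples):
    """Normalized weights w_i proportional to exp(-beta * (U_theta(s_i) - U_ref(s_i)))."""
    logw = [-BETA * (potential(theta, s) - potential(theta_ref, s)) for s in samples]
    m = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(l - m) for l in logw]
    total = sum(w)
    return [wi / total for wi in w]

def reweighted_mean_and_grad(theta, theta_ref, samples, observable):
    """Reweighted average of a theta-independent observable and its theta-derivative.

    d<O>/dtheta = -beta * (<O dU/dtheta> - <O> <dU/dtheta>), all averages weighted.
    """
    w = weights(theta, theta_ref, samples)
    o = [observable(s) for s in samples]
    du = [dpotential_dtheta(theta, s) for s in samples]
    mean_o = sum(wi * oi for wi, oi in zip(w, o))
    mean_du = sum(wi * di for wi, di in zip(w, du))
    mean_o_du = sum(wi * oi * di for wi, oi, di in zip(w, o, du))
    return mean_o, -BETA * (mean_o_du - mean_o * mean_du)

def square(s):
    return s * s

random.seed(0)
theta_ref = 1.0
# Reference samples: for this toy potential the Boltzmann distribution is Gaussian.
samples = [random.gauss(0.0, 1.0 / math.sqrt(BETA * theta_ref)) for _ in range(20000)]
mean_o, grad = reweighted_mean_and_grad(1.2, theta_ref, samples, square)
```

For this toy potential the exact ensemble average of `s^2` is `1 / (BETA * theta)`, so `mean_o` should land near 0.83 up to sampling noise. Reweighting of this kind is only reliable while `theta` stays close enough to `theta_ref` that the weights do not collapse onto a few samples.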


Quantum ◽  
2021 ◽  
Vol 5 ◽  
pp. 529
Author(s):  
Chenyi Zhang ◽  
Jiaqi Leng ◽  
Tongyang Li

We initiate the study of quantum algorithms for escaping from saddle points with provable guarantee. Given a function $f\colon \mathbb{R}^n \to \mathbb{R}$, our quantum algorithm outputs an $\epsilon$-approximate second-order stationary point using $\tilde{O}(\log^2(n)/\epsilon^{1.75})$ queries to the quantum evaluation oracle (i.e., the zeroth-order oracle). Compared to the classical state-of-the-art algorithm by Jin et al. with $\tilde{O}(\log^6(n)/\epsilon^{1.75})$ queries to the gradient oracle (i.e., the first-order oracle), our quantum algorithm is polynomially better in terms of $\log n$ and matches its complexity in terms of $1/\epsilon$. Technically, our main contribution is the idea of replacing the classical perturbations in gradient descent methods by simulating quantum wave equations, which constitutes the improvement in the quantum query complexity with $\log n$ factors for escaping from saddle points. We also show how to use a quantum gradient computation algorithm due to Jordan to replace the classical gradient queries by quantum evaluation queries with the same complexity. Finally, we also perform numerical experiments that support our theoretical findings.
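
For context, the classical perturbations that the quantum algorithm replaces can be sketched in a few lines. This is a simplified classical baseline under assumptions, not the quantum algorithm and not the exact procedure of Jin et al.: the test function, step size, thresholds, and the per-coordinate uniform noise are all choices made for illustration. The function `f(x, y) = 0.5*x^2 - 0.5*y^2 + 0.25*y^4` has a saddle at the origin and minima at `(0, 1)` and `(0, -1)`.

```python
import math
import random

def grad_f(x, y):
    """Gradient of the toy function f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4."""
    return x, y ** 3 - y

def descend(x, y, steps=1000, lr=0.1, perturb=False,
            g_thresh=1e-3, radius=1e-2, t_thresh=50, seed=0):
    """Gradient descent, optionally with random perturbations near stationary points.

    When the gradient norm drops below g_thresh (and no perturbation happened in
    the last t_thresh steps), uniform noise in [-radius, radius] is added to each
    coordinate. All parameter values are assumptions for this sketch.
    """
    rng = random.Random(seed)
    last_perturb = -t_thresh
    for t in range(steps):
        gx, gy = grad_f(x, y)
        if perturb and math.hypot(gx, gy) < g_thresh and t - last_perturb >= t_thresh:
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            last_perturb = t
            gx, gy = grad_f(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y

# Starting on the x-axis, plain descent slides into the saddle and stays there.
plain = descend(1.0, 0.0)
# With perturbations, the iterate leaves the saddle along the y direction.
perturbed = descend(1.0, 0.0, perturb=True)
```

Plain descent never leaves the line `y = 0` because the gradient has no `y` component there, while the perturbed variant ends near one of the two minima.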


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5124
Author(s):  
Haijie Pan ◽  
Lirong Zheng

Machine learning models trained with SGD often converge slowly and are unstable because gradient estimates computed from random samples have significant variance. To increase the speed of convergence and improve stability, a distributed SGD algorithm based on variance reduction, named DisSAGD, is proposed in this study. DisSAGD corrects the gradient estimate for each iteration by using the gradient variance of historical iterations without full gradient computation or additional storage, i.e., it reduces the mean variance of historical gradients in order to reduce the error in updating parameters. We implemented DisSAGD in distributed clusters in order to train a machine learning model by sharing parameters among nodes using an asynchronous communication protocol. We also propose an adaptive learning rate strategy, as well as a sampling strategy, to address the update lag of the overall parameter distribution, which helps to improve the convergence speed when the parameters deviate from the optimal value: when one working node is faster than another, this node will have more time to compute the local gradient and draw more samples for the next iteration. Our experiments demonstrate that DisSAGD significantly reduces waiting times during loop iterations and improves convergence speed when compared to traditional methods, and that our method can achieve speed increases for distributed clusters.

