Gradient Methods
Recently Published Documents

TOTAL DOCUMENTS: 1166 (FIVE YEARS: 300)
H-INDEX: 62 (FIVE YEARS: 7)

2022 · Vol 2022 (1)
Author(s): Zabidin Salleh, Adel Almarashi, Ahmad Alhawarat

Abstract: The conjugate gradient method can be applied in many fields, such as neural networks, image restoration, machine learning, and deep learning, among others. The Polak–Ribière–Polyak (PRP) and Hestenes–Stiefel conjugate gradient methods are considered among the most efficient methods for solving nonlinear optimization problems. However, neither method is guaranteed to satisfy the descent property or the global convergence property for general nonlinear functions. In this paper, we present two new modifications of the PRP method with restart conditions. The proposed conjugate gradient methods satisfy the global convergence property and the descent property for general nonlinear functions. The numerical results show that the new modifications are more efficient than recent CG methods in terms of the number of iterations, function evaluations, gradient evaluations, and CPU time.
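
As a concrete illustration of the PRP update and of the kind of restart condition the abstract refers to, here is a minimal NumPy sketch of a PRP conjugate gradient loop that falls back to steepest descent whenever the generated direction fails a descent check. The Armijo line search and the particular restart test are illustrative choices, not the authors' proposed modifications.

```python
# Minimal PRP conjugate gradient sketch with a simple restart condition
# (illustrative only; not the modified methods proposed in the paper).
import numpy as np

def prp_cg(f, grad, x0, max_iter=1000, tol=1e-6):
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    d = -g                                    # start with steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # backtracking (Armijo) line search -- an illustrative step-size rule
        alpha, c, rho = 1.0, 1e-4, 0.5
        while f(x + alpha * d) > f(x) + c * alpha * g.dot(d):
            alpha *= rho
        x_new = x + alpha * d
        g_new = grad(x_new)
        # PRP coefficient: beta_k = g_{k+1}^T (g_{k+1} - g_k) / ||g_k||^2
        beta = g_new.dot(g_new - g) / g.dot(g)
        d_new = -g_new + beta * d
        # restart: fall back to steepest descent if d_new is not a descent direction
        if g_new.dot(d_new) >= 0:
            d_new = -g_new
        x, g, d = x_new, g_new, d_new
    return x

# usage: minimize the Rosenbrock function from a standard starting point
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(prp_cg(f, grad, np.array([-1.2, 1.0])))
```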


2022 · Vol 73 · pp. 117-171
Author(s): Adrien Bolland, Ioannis Boukas, Mathias Berger, Damien Ernst

We consider the joint design and control of discrete-time stochastic dynamical systems over a finite time horizon. We formulate the problem as a multi-step optimization problem under uncertainty, seeking to identify a system design and a control policy that jointly maximize the expected sum of rewards collected over the time horizon considered. The transition function, the reward function, and the policy are all parametrized, assumed known, and differentiable with respect to their parameters. We then introduce a deep reinforcement learning algorithm combining policy gradient methods with model-based optimization techniques to solve this problem. In essence, our algorithm iteratively approximates the gradient of the expected return via Monte Carlo sampling and automatic differentiation and takes projected gradient ascent steps in the space of environment and policy parameters. This algorithm is referred to as Direct Environment and Policy Search (DEPS). We assess the performance of our algorithm in three environments concerned with the design and control of a mass-spring-damper system, a small-scale off-grid power system, and a drone, respectively. In addition, our algorithm is benchmarked against a state-of-the-art deep reinforcement learning algorithm used to tackle joint design and control problems. We show that DEPS performs at least as well as, or better than, this benchmark in all three environments, consistently yielding solutions with higher returns in fewer iterations. Finally, solutions produced by our algorithm are also compared with solutions produced by an algorithm that does not jointly optimize environment and policy parameters, highlighting the fact that higher returns can be achieved when joint optimization is performed.
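
The core mechanics described above, differentiating a Monte Carlo estimate of the return through a known parametrized model and taking projected gradient ascent steps jointly over design and policy parameters, can be sketched as follows. This is a hypothetical toy setup in PyTorch with a made-up unit-mass mass-spring-damper model, a linear state-feedback policy, a quadratic cost, and box constraints on the design; it is not the DEPS implementation.

```python
# Toy sketch of joint design-and-control optimization: Monte Carlo return estimate,
# automatic differentiation, and projected gradient ascent on design + policy parameters.
# All modeling choices below are illustrative assumptions.
import torch

T, dt, n_rollouts = 50, 0.05, 32
design = torch.tensor([1.0, 0.5], requires_grad=True)    # [spring stiffness k, damping c]
policy = torch.zeros(2, requires_grad=True)               # linear state-feedback gains
opt = torch.optim.Adam([design, policy], lr=1e-2)

def expected_return():
    # Monte Carlo estimate over random initial states (the source of stochasticity here)
    pos = torch.randn(n_rollouts)
    vel = torch.zeros(n_rollouts)
    ret = 0.0
    k, c = design[0], design[1]
    for _ in range(T):
        u = policy[0] * pos + policy[1] * vel              # control action
        acc = -k * pos - c * vel + u                        # unit-mass dynamics
        vel = vel + dt * acc
        pos = pos + dt * vel
        ret = ret - (pos**2 + 0.1 * u**2).mean()            # negative quadratic cost as reward
    return ret

for step in range(200):
    opt.zero_grad()
    loss = -expected_return()                               # ascent on return = descent on -return
    loss.backward()
    opt.step()
    with torch.no_grad():                                    # projection onto a feasible design box
        design.clamp_(min=0.1, max=10.0)

print(design.detach(), policy.detach())
```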


2022 · pp. 61-82
Author(s): J J McKeown, D Meegan, D Sprevak
Keyword(s):

2022 · pp. 83-104
Author(s): J J McKeown, D Meegan, D Sprevak
Keyword(s):

Author(s): Eghbal Hosseini, Line Reinhardt, Danda B. Rawat

2021 · Vol 1 (2) · pp. 33-39
Author(s): Mónika Farsang, Luca Szegletes

Learning the optimal behavior is the ultimate goal in reinforcement learning. This can be achieved by many different approaches, the most successful of them being policy gradient methods. However, these methods can suffer from undesirably large policy updates, leading to poor performance. In recent years there has been a clear trend toward designing more reliable algorithms. This paper examines different restriction strategies applied to the widely used Proximal Policy Optimization (PPO-Clip) technique. We also investigate whether the analyzed methods are able to adapt not only to low-dimensional tasks but also to complex, high-dimensional problems in control and robotic domains. The analysis of the learned behavior shows that these methods can lead to better performance than the original PPO-Clip algorithm; moreover, they are also able to achieve complex behaviors and policies in high-dimensional environments.
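
For reference, here is a minimal PyTorch sketch of the standard PPO-Clip surrogate objective that such restriction strategies modify; the clipping range eps=0.2 and the dummy batch are illustrative assumptions, and any alternative restriction would replace the clipping step.

```python
# Minimal sketch of the PPO-Clip surrogate loss (illustrative, not the paper's variants).
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=0.2):
    ratio = torch.exp(log_probs_new - log_probs_old)        # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # pessimistic bound: take the minimum, then negate because optimizers minimize
    return -torch.min(unclipped, clipped).mean()

# usage with a dummy batch
lp_new = torch.randn(64, requires_grad=True)
lp_old = torch.randn(64)
adv = torch.randn(64)
loss = ppo_clip_loss(lp_new, lp_old, adv)
loss.backward()
```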


2021
Author(s): Shicong Cen, Chen Cheng, Yuxin Chen, Yuting Wei, Yuejie Chi

Preconditioning and Regularization Enable Faster Reinforcement Learning
Natural policy gradient (NPG) methods, in conjunction with entropy regularization to encourage exploration, are among the most popular policy optimization algorithms in contemporary reinforcement learning. Despite the empirical success, the theoretical underpinnings for NPG methods remain severely limited. In “Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization”, Cen, Cheng, Chen, Wei, and Chi develop nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on tabular discounted Markov decision processes. Assuming access to exact policy evaluation, the authors demonstrate that the algorithm converges linearly at a rate that is independent of the dimension of the state-action space. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Accommodating a wide range of learning rates, this convergence result highlights the role of preconditioning and regularization in enabling fast convergence.
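
To make the setting concrete, below is a minimal NumPy sketch of entropy-regularized NPG under softmax parameterization on a made-up random tabular MDP, assuming exact evaluation of the soft Q-function. The multiplicative update form and the step-size range used here follow the standard softmax-parameterization derivation; treat them as an illustration of the setting rather than a verbatim restatement of the paper.

```python
# Toy sketch: entropy-regularized NPG (softmax parameterization) on a random tabular MDP,
# with exact soft-Q evaluation. MDP, tau, and eta below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, tau = 5, 3, 0.9, 0.1
eta = (1 - gamma) / (2 * tau)                 # a step size in the assumed range eta <= (1-gamma)/tau
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] is the next-state distribution
R = rng.random((S, A))                        # rewards in [0, 1)

def soft_q(pi, iters=500):
    # exact evaluation of the entropy-regularized Q-function by fixed-point iteration
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = (pi * (Q - tau * np.log(pi))).sum(axis=1)      # soft state value
        Q = R + gamma * P @ V
    return Q

pi = np.full((S, A), 1.0 / A)                 # uniform initial policy
for t in range(50):
    Q = soft_q(pi)
    # multiplicative NPG update: pi^(1 - eta*tau/(1-gamma)) * exp(eta * Q / (1-gamma)), renormalized
    logits = (1 - eta * tau / (1 - gamma)) * np.log(pi) + eta * Q / (1 - gamma)
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)

print(np.round(pi, 3))                        # converges to the regularized optimal policy
```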

