convergence and optimality
Recently Published Documents


TOTAL DOCUMENTS: 33 (FIVE YEARS: 7)
H-INDEX: 11 (FIVE YEARS: 1)

2021
pp. 027836492098587
Author(s): Jonathan N. Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg

On-policy imitation learning algorithms such as DAgger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for DAgger. However, in more realistic models for robotics, the underlying trajectory distribution is dynamic because it is a function of the policy. Recent results show it is possible to prove convergence of DAgger when a regularity condition on the rate of change of the trajectory distributions is satisfied. In this article, we reframe this result using dynamic regret theory from the field of online optimization and show that dynamic regret can be applied to any on-policy algorithm to analyze its convergence and optimality. These results inspire a new algorithm, Adaptive On-Policy Regularization (AOR), that ensures the conditions for convergence. We present simulation results with cart–pole balancing and locomotion benchmarks that suggest AOR can significantly decrease dynamic regret and chattering as the robot learns. To the best of the authors’ knowledge, this is the first application of dynamic regret theory to imitation learning.
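To make the loop structure concrete, below is a minimal sketch of a DAgger-style on-policy iteration with an adaptive regularizer. The toy 1-D dynamics, the linear policy, the supervisor function, and the rule for growing or shrinking the regularization weight are illustrative assumptions for this sketch, not the AOR algorithm from the article.

# Minimal sketch of an on-policy imitation loop (DAgger-style) with an
# adaptive regularizer, in the spirit of the AOR idea described above.
# The dynamics, supervisor, and adaptation rule are assumptions, not the
# authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def supervisor(state):
    """Hypothetical expert: drives the state toward zero."""
    return -0.8 * state

def rollout(theta, horizon=50):
    """Run the current linear policy a = theta * s on toy dynamics s' = s + a + noise."""
    states, s = [], 1.0
    for _ in range(horizon):
        states.append(s)
        s = s + theta * s + 0.01 * rng.standard_normal()
    return np.array(states)

theta, lam, prev_loss = 0.0, 1e-3, None
X, Y = [], []                       # aggregated dataset of (state, expert action)

for it in range(20):
    states = rollout(theta)         # execute the current policy (on-policy data)
    labels = supervisor(states)     # corrective feedback from the supervisor
    X.append(states); Y.append(labels)
    xs, ys = np.concatenate(X), np.concatenate(Y)

    # surrogate loss of the current policy on the newly collected trajectory
    loss = np.mean((theta * states - labels) ** 2)

    # Adaptive-regularization heuristic (assumption): if the loss changed a lot
    # between iterations, increase the regularizer to slow policy change,
    # otherwise relax it.
    if prev_loss is not None:
        lam = lam * 2.0 if abs(loss - prev_loss) > 0.1 * (prev_loss + 1e-8) else lam * 0.5
    prev_loss = loss

    # ridge-regularized least squares that pulls theta toward its previous value
    theta = (xs @ ys + lam * theta) / (xs @ xs + lam)
    print(f"iter {it:2d}  loss {loss:.4f}  lambda {lam:.2e}  theta {theta:+.3f}")

The regularization toward the previous parameter is what damps the policy-to-policy change; in this toy setting the loss between iterations settles quickly once the regularizer stops growing.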


2018
Vol 2018, pp. 1-8
Author(s): Yingfeng Zhao, Ting Zhao

Generalized linear multiplicative programming (LMP) problems arise frequently in many areas of engineering practice and management science. In this paper, we present a simple global optimization algorithm for solving the linear multiplicative programming problem (LMP). The algorithm combines a new convex relaxation method with a branch-and-bound scheme and several accelerating techniques. Global convergence and optimality of the algorithm are established, and extensive computational results are reported on a wide range of problems from the recent literature and GLOBALLib. Numerical experiments show that the proposed algorithm with the new convex relaxation method is more efficient than the usual branch-and-bound algorithm that uses linear relaxation for solving the LMP.
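As a rough illustration of the branch-and-bound skeleton described here, the sketch below minimizes a single product of two affine functions over a box, using a simple interval bound in place of the paper's convex relaxation; the instance, bounding rule, and tolerance are assumptions made only for the example.

# Toy branch-and-bound loop for a two-term linear multiplicative program:
# minimize (c1.x + d1) * (c2.x + d2) over a box. The interval bound below is a
# stand-in for the paper's convex relaxation; the problem data are made up.
import heapq
import numpy as np

c1, d1 = np.array([1.0, 1.0]), 0.0     # first affine factor  c1.x + d1
c2, d2 = np.array([1.0, -1.0]), 3.0    # second affine factor c2.x + d2

def affine_range(c, d, lo, hi):
    """Tight range of c.x + d over the coordinate-wise box [lo, hi]."""
    low = d + np.sum(np.where(c > 0, c * lo, c * hi))
    high = d + np.sum(np.where(c > 0, c * hi, c * lo))
    return low, high

def lower_bound(lo, hi):
    """Interval lower bound on the product of the two affine factors."""
    a_lo, a_hi = affine_range(c1, d1, lo, hi)
    b_lo, b_hi = affine_range(c2, d2, lo, hi)
    return min(a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi)

def objective(x):
    return (c1 @ x + d1) * (c2 @ x + d2)

lo0, hi0 = np.zeros(2), np.full(2, 2.0)
best_x = (lo0 + hi0) / 2
best_val = objective(best_x)
heap = [(lower_bound(lo0, hi0), tuple(lo0), tuple(hi0))]
eps = 1e-4

while heap:
    lb, lo, hi = heapq.heappop(heap)
    lo, hi = np.array(lo), np.array(hi)
    if lb >= best_val - eps:           # prune: this box cannot beat the incumbent
        continue
    mid = (lo + hi) / 2
    val = objective(mid)               # feasible point -> candidate upper bound
    if val < best_val:
        best_val, best_x = val, mid
    j = int(np.argmax(hi - lo))        # branch by bisecting the longest edge
    for child_lo, child_hi in ((lo, np.where(np.arange(2) == j, mid, hi)),
                               (np.where(np.arange(2) == j, mid, lo), hi)):
        heapq.heappush(heap, (lower_bound(child_lo, child_hi),
                              tuple(child_lo), tuple(child_hi)))

print("approximate minimum", best_val, "at", best_x)

A tighter relaxation, such as the convex one the paper proposes, would shrink the gap between the lower bound and the incumbent faster and therefore prune far more of the search tree than this interval bound does.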

