Applying statistical generalization to determine search direction for reinforcement learning of movement primitives

Author(s):
Bojan Nemec ◽
Denis Forte ◽
Rok Vuga ◽
Minija Tamosiunaite ◽
Florentin Worgotter ◽
...

2019 ◽
Vol 38 (14) ◽
pp. 1560-1580
Author(s):
Carlos Celemin ◽
Guilherme Maeda ◽
Javier Ruiz-del-Solar ◽
Jan Peters ◽
Jens Kober

Robot learning problems are limited by physical constraints, which can make learning successful policies for complex motor skills on real systems infeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agent. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in the action domain for learning motor trajectories. Additionally, we combine this human feedback with reward functions in a Policy Search learning scheme. Using both sources of information speeds up the learning process, since the intuitive knowledge of the human teacher is easily transferred to the agent, while the Policy Search method with its cost/reward function supervises the process and reduces the influence of occasional wrong human corrections. This interactive approach has been validated for learning movement primitives with simulated arms with several degrees of freedom in reaching via-point movements, and also with real robots on tasks such as writing characters and the ball-in-a-cup game. Compared with standard reinforcement learning without human advice, the results show that the proposed method not only converges to higher rewards when learning movement primitives, but also speeds up learning by a factor of 4 to 40, depending on the task.
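The combination the abstract describes, a policy-search update on movement-primitive parameters nudged by human corrective advice, can be sketched in a toy form. This is not the authors' algorithm: the quadratic reward, the PoWER-style reward-weighted update, and the simulated advice signal (a noisy sign pointing toward a known optimum) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_reward(w, target):
    # Toy reward: negative squared distance of the parameters to an optimum.
    return -np.sum((w - target) ** 2)

def policy_search_with_advice(w, target, advice_gain=0.5, n_samples=8,
                              sigma=0.1, n_iters=60):
    """Reward-weighted policy search (PoWER-style) plus a corrective-advice
    term that nudges the parameters a small step each iteration."""
    for _ in range(n_iters):
        # Exploration: sample perturbed parameter vectors around w.
        eps = rng.normal(0.0, sigma, size=(n_samples, w.size))
        rewards = np.array([rollout_reward(w + e, target) for e in eps])
        # Reward-weighted averaging of the perturbations.
        weights = np.exp(rewards - rewards.max())
        w = w + weights @ eps / weights.sum()
        # Simulated human advice: noisy direction toward the optimum.
        advice = np.sign(target - w) + rng.normal(0.0, 0.1, size=w.size)
        w = w + advice_gain * sigma * advice
    return w

w0 = np.zeros(3)
target = np.array([1.0, -0.5, 0.3])
w_learned = policy_search_with_advice(w0, target)
```

In this sketch the advice term plays the role described in the abstract: it injects the teacher's directional knowledge cheaply, while the reward-weighted update averages out occasional wrong corrections.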


2018 ◽  
Vol 23 (1) ◽  
pp. 121-131 ◽  
Author(s):  
Zhijun Li ◽  
Ting Zhao ◽  
Fei Chen ◽  
Yingbai Hu ◽  
Chun-Yi Su ◽  
...  

Robotica ◽  
2022 ◽  
pp. 1-16
Author(s):  
Peng Zhang ◽  
Junxia Zhang

Abstract In order to assist patients with lower limb disabilities in walking normally, a new trajectory learning scheme for a lower limb exoskeleton robot based on dynamic movement primitives (DMP) combined with reinforcement learning (RL) is proposed. The developed exoskeleton robot has six degrees of freedom (DOFs). The hip and knee of each artificial leg provide two electrically powered DOFs for flexion/extension, and two passively installed DOFs at the ankle achieve inversion/eversion and plantarflexion/dorsiflexion. A five-point segmented gait planning strategy is proposed to generate gait trajectories. The Zero Moment Point stability margin of the gait is used as a parameter to construct a stability criterion that ensures the stability of the human-exoskeleton system. Based on the segmented gait trajectory planning strategy, multiple DMP sequences are proposed to model the generated trajectories. Meanwhile, in order to eliminate the effect of uncertainties in joint space, RL is adopted to learn the trajectories. Experiments demonstrate that the proposed scheme can effectively suppress interferences and uncertainties.
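The DMP sequences the abstract builds on can be illustrated with a minimal single-DOF discrete DMP rollout. This is the generic textbook formulation (canonical system plus transformation system), not the authors' exoskeleton implementation; the gains, the basis-width heuristic, and the zero-weight forcing term are assumptions.

```python
import numpy as np

def dmp_rollout(y0, g, w, tau=1.0, dt=0.001,
                alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
    """Integrate one discrete DMP from start y0 to goal g.
    w holds the forcing-term weights (zero w gives a plain point attractor)."""
    n = len(w)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))  # basis centres in phase
    h = n ** 1.5 / c                                  # widths (common heuristic)
    y, z, x = float(y0), 0.0, 1.0                     # state and phase variable
    path = [y]
    for _ in range(int(tau / dt)):
        psi = np.exp(-h * (x - c) ** 2)               # Gaussian basis activations
        f = x * (g - y0) * (psi @ w) / (psi.sum() + 1e-10)  # forcing term
        dz = (alpha_z * (beta_z * (g - y) - z) + f) / tau   # transformation system
        dy = z / tau
        dx = -alpha_x * x / tau                       # canonical system
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        path.append(y)
    return np.array(path)

traj = dmp_rollout(y0=0.0, g=1.0, w=np.zeros(10))
```

Because the forcing term vanishes as the phase x decays, the trajectory is guaranteed to converge to the goal; learning (here, the RL step the abstract describes) only shapes the transient through w.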


2021 ◽  
Vol 11 (23) ◽  
pp. 11184
Author(s):  
Ang Li ◽  
Zhenze Liu ◽  
Wenrui Wang ◽  
Mingchao Zhu ◽  
Yanhui Li ◽  
...  

Dynamic movement primitives (DMPs) are a robust framework for movement generation from demonstrations. The framework can be extended with a perturbing term to achieve obstacle avoidance without sacrificing stability; this additional term is usually constructed from potential functions. Although different potentials have been adopted to improve obstacle avoidance, their profiles are rarely incorporated into a reinforcement learning (RL) framework. In this contribution, we present an RL-based method that learns not only the profiles of the potentials but also the shape parameters of a motion. The algorithm employed is PI2 (Policy Improvement with Path Integrals), a model-free, sampling-based learning method. Using PI2, the profiles of the potentials and the parameters of the DMPs are learned simultaneously; we can therefore optimize obstacle avoidance while completing the specified tasks. We validate the presented method in simulation and in experiments with a redundant robot arm.
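The potential-based perturbing term can be sketched for a 2-D point attractor. This is a hedged illustration only: the classic repulsive potential, its gains (eta, d0), and the obstacle and goal placement are assumptions, and the PI2 learning of the potential profile is omitted; the sketch shows just the coupling term that such an RL scheme would tune.

```python
import numpy as np

def avoid_rollout(y0, g, obstacle, eta=0.5, d0=0.5,
                  alpha=25.0, beta=6.25, tau=1.0, dt=0.001, t_end=2.0):
    """DMP transformation system (zero forcing term) driving a 2-D point to
    the goal g, plus the gradient of a repulsive potential that is active
    only within distance d0 of the obstacle."""
    y = np.array(y0, dtype=float)
    z = np.zeros_like(y)
    path = [y.copy()]
    for _ in range(int(t_end / dt)):
        diff = y - obstacle
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:
            # Gradient of the repulsive potential 0.5*eta*(1/d - 1/d0)^2.
            rep = eta * (1.0 / d - 1.0 / d0) / d ** 2 * (diff / d)
        else:
            rep = np.zeros_like(y)
        dz = (alpha * (beta * (g - y) - z) + rep) / tau
        z = z + dz * dt
        y = y + (z / tau) * dt
        path.append(y.copy())
    return np.array(path)

path = avoid_rollout(y0=(0.0, 0.0), g=np.array([1.0, 0.0]),
                     obstacle=np.array([0.5, 0.05]))
```

Since the perturbing term vanishes outside d0 (and at the goal), the attractor's stability is preserved, which is the property the abstract highlights; what PI2 would learn is the shape of the potential profile, here fixed by eta and d0.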

