Offline reinforcement learning with Anderson acceleration for robotic tasks

Author(s):  
Guoyu Zuo ◽  
Shuai Huang ◽  
Jiangeng Li ◽  
Daoxiong Gong
2019 ◽  
Vol 4 (2) ◽  
pp. 1838-1845 ◽  
Author(s):  
Fabio Amadio ◽  
Adria Colome ◽  
Carme Torras

2020 ◽  
Vol 17 (1) ◽  
pp. 172988141989834
Author(s):  
Guoyu Zuo ◽  
Qishen Zhao ◽  
Jiahao Lu ◽  
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is difficult. In this article, we propose a general, model-free reinforcement learning approach for robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the Twin Delayed Deep Deterministic (TD3) policy gradient algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize the vibration of the output action while maximizing its value. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. The results show that our method effectively solves the sparse-reward problem and achieves a high learning speed.
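The action-loss term described in the abstract can be sketched as a penalty added to the usual deterministic actor objective: maximize the critic's value of the policy's action while penalizing large actions to damp output vibration. A minimal NumPy illustration (the function name, penalty weight, and array shapes are assumptions for illustration, not the paper's code):

```python
import numpy as np

def actor_loss(q_values, actions, action_penalty=1.0):
    """Combined actor loss: maximize Q while penalizing large actions.

    q_values: critic estimates Q(s, pi(s)) for a batch, shape (N,)
    actions:  policy outputs pi(s), shape (N, action_dim)

    The squared-norm term on the actions plays the role of the
    "action loss" that damps vibration of the output action.
    """
    q_term = -np.mean(q_values)  # minimize -Q, i.e. maximize value
    smooth_term = action_penalty * np.mean(np.sum(actions ** 2, axis=1))
    return q_term + smooth_term
```

A larger `action_penalty` trades value maximization for smoother actions; the abstract does not specify the weighting actually used.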


Robotica ◽  
2019 ◽  
Vol 38 (9) ◽  
pp. 1558-1575
Author(s):  
Vahid Azimirad ◽  
Mohammad Fattahi Sani

SUMMARY In this paper, the behavioral learning of robots through spiking neural networks is studied, in which the architecture of the network is based on the thalamo-cortico-thalamic circuitry of the mammalian brain. Among the variety of neuron models, the Izhikevich single-neuron model is used to represent neuronal behaviors. One thousand and ninety spiking neurons are considered in the network. The spiking model of the proposed architecture is derived and prepared for the robot learning problem. The reinforcement learning algorithm is based on spike-timing-dependent plasticity, with dopamine release as the reward; it strengthens the synaptic weights of the neurons involved in the robot's proper performance. Sensory and motor neurons are placed in the thalamus and cortical module, respectively. The inputs of the thalamo-cortico-thalamic circuitry are signals related to the distance of the target from the robot, and the outputs are the actuator velocities. A target-attraction task, in which dopamine is released when the robot catches the target, is used as an example to validate the proposed method. Simulation studies, as well as an experimental implementation, are carried out on a mobile robot named Tabrizbot. The experiments show that after successful learning, the mean time to catch the target decreases by about 36%. These results show that, through the proposed method, the thalamo-cortical structure can be trained successfully to perform various robotic tasks.
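The Izhikevich model mentioned above is a standard two-variable spiking-neuron model: a fast membrane potential v and a slow recovery variable u, with a reset when v crosses the spike threshold. A minimal Euler-integration sketch (regular-spiking parameters and a constant input current are assumptions; the paper's 1090-neuron network and its connectivity are not reproduced here):

```python
def izhikevich(I, T=1000, dt=1.0, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron for T steps of dt ms under a
    constant input current I; returns the spike times in ms.

    dv/dt = 0.04 v^2 + 5 v + 140 - u + I
    du/dt = a (b v - u)
    On a spike (v >= 30 mV): v is reset to c and u is incremented by d.
    a, b, c, d defaults are the standard regular-spiking parameters.
    """
    v, u = c, b * c
    spikes = []
    for t in range(T):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:              # spike: record time, then reset
            spikes.append(t * dt)
            v, u = c, u + d
    return spikes
```

With zero input the neuron settles near its resting potential and stays silent; a sufficiently large current produces regular spiking.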


2016 ◽  
Vol 51 (3) ◽  
pp. 911-940 ◽  
Author(s):  
Javier García ◽  
Roberto Iglesias ◽  
Miguel A. Rodríguez ◽  
Carlos V. Regueiro

Author(s):  
Qiangxing Tian ◽  
Guanchu Wang ◽  
Jinxin Liu ◽  
Donglin Wang ◽  
Yachen Kang

Recently, diverse primitive skills have been learned by adopting entropy as an intrinsic reward, and it has further been shown that new practical skills can be produced by combining a variety of primitive skills. This is essentially skill transfer, which is very useful for learning high-level skills but quite challenging due to the low efficiency of transferring primitive skills. In this paper, we propose a novel, efficient skill-transfer method in which we learn independent skills and transfer only the independent components of the skills instead of the whole set. More concretely, the independent components are obtained through independent component analysis (ICA), and they are always fewer in number (of lower dimension) than their mixtures. With this lower dimension, independent skill transfer (IST) learns a given task more efficiently. Extensive experiments including three robotic tasks demonstrate the effectiveness and high efficiency of our proposed IST method in comparison to direct primitive-skill transfer and conventional reinforcement learning.
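The dimensionality claim above, that the independent components underlying a set of skills are fewer than the mixed skill signals themselves, can be illustrated by estimating the rank of the mixtures' covariance. A hedged NumPy sketch (the helper name and tolerance are assumptions; the paper applies full ICA to recover the components, not merely a rank estimate):

```python
import numpy as np

def independent_components_dim(skill_trajs, tol=1e-8):
    """Estimate how many independent components underlie a set of
    (possibly mixed) skill signals.

    skill_trajs: array of shape (n_skills, T), one signal per row.
    Returns the numerical rank of the covariance matrix, i.e. the
    number of components one would transfer instead of all n_skills
    mixtures (hypothetical helper, not the paper's code).
    """
    X = skill_trajs - skill_trajs.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    eigvals = np.linalg.eigvalsh(cov)
    return int(np.sum(eigvals > tol * eigvals.max()))
```

For example, five skills that are all linear mixtures of two independent sources yield an estimate of 2, which is the lower dimension that makes IST cheaper to transfer.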


2018 ◽  
Vol 100 ◽  
pp. 246-259 ◽  
Author(s):  
Angel Martínez-Tenor ◽  
Juan Antonio Fernández-Madrigal ◽  
Ana Cruz-Martín ◽  
Javier González-Jiménez

Decision ◽  
2016 ◽  
Vol 3 (2) ◽  
pp. 115-131 ◽  
Author(s):  
Helen Steingroever ◽  
Ruud Wetzels ◽  
Eric-Jan Wagenmakers
