Adaptive Reinforcement Learning and Its Application to Robot Compliance Learning

1995 ◽  
Vol 7 (3) ◽  
pp. 250-262 ◽  
Author(s):  
Boo-Ho Yang ◽  
Haruhiko Asada

A new learning algorithm for connectionist networks that solves a class of optimal control problems is presented. The algorithm, called the Adaptive Reinforcement Learning Algorithm, employs a second network to model the immediate reinforcement provided by the task environment and adaptively identifies it through repeated experience. Output perturbation and correlation techniques are used to translate mere critic signals into useful learning signals for the connectionist controller. Compared with direct approaches to reinforcement learning, this algorithm shows faster and guaranteed improvement in control performance. Robustness against inaccuracy of the model is also discussed. Simulation demonstrates that the adaptive reinforcement learning method is efficient and useful for learning a compliance control law in a class of robotic assembly tasks. A simple box palletizing task is used as an example, in which a robot is required to move a rectangular part to the corner of a box. In the simulation, the robot is initially provided with only a predetermined velocity command to follow the nominal trajectory. At each attempt, the box is randomly located and the part is randomly oriented within the grasp of the end-effector. Compliant motion control is therefore necessary to guide the part to the corner of the box while avoiding excessive reaction forces caused by collision with a wall. After repeated failures at the task, the robot successfully learns force feedback gains that modify its nominal motion. Our results show that the new learning method can learn a compliance control law effectively.
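
The abstract describes the mechanism only at a high level. A minimal sketch of such a loop is given below, assuming a linear controller and a linear reinforcement model; all shapes, learning rates, and the `reinforcement(s, a)` interface are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_action = 4, 2
W_ctrl = rng.normal(scale=0.1, size=(n_action, n_state))   # controller network (linear here)
w_model = rng.normal(scale=0.1, size=n_state + n_action)   # second network: reinforcement model

alpha_model, alpha_ctrl, sigma = 0.05, 0.01, 0.1

def r_hat(s, a):
    """Modeled immediate reinforcement."""
    return w_model @ np.concatenate([s, a])

def learn_step(s, reinforcement):
    """One trial: perturb the output, observe reinforcement, update both nets.

    `reinforcement(s, a)` is the scalar critic signal from the task
    environment (an assumed interface)."""
    global w_model, W_ctrl
    a_nom = W_ctrl @ s
    noise = rng.normal(scale=sigma, size=n_action)   # output perturbation
    a = a_nom + noise
    r = reinforcement(s, a)
    # adaptively identify the reinforcement model from repeated experience
    w_model += alpha_model * (r - r_hat(s, a)) * np.concatenate([s, a])
    # correlate the perturbation with the model's prediction advantage to
    # turn the mere critic signal into a gradient-like learning signal
    advantage = r - r_hat(s, a_nom)
    W_ctrl += alpha_ctrl * advantage * np.outer(noise, s) / sigma**2
    return r
```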

2010 ◽  
Vol 44-47 ◽  
pp. 3611-3615 ◽  
Author(s):  
Zhi Cong Zhang ◽  
Kai Shun Hu ◽  
Hui Yu Huang ◽  
Shuai Li ◽  
Shao Yong Zhao

Reinforcement learning (RL) is a state-value or action-value based machine learning method that approximately solves large-scale Markov Decision Processes (MDPs) or Semi-Markov Decision Processes (SMDPs). A multi-step RL algorithm called Sarsa(λ, k) is proposed as a compromise between Sarsa and Sarsa(λ): it is equivalent to Sarsa when k is 1 and equivalent to Sarsa(λ) when k is infinite, and its behaviour is adjusted by the setting of k. Two forms of Sarsa(λ, k), the forward view and the backward view, are constructed and proved equivalent under off-line updating.
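
As one concrete reading of the backward view, the tabular sketch below keeps eligibility traces only for the last k visited state-action pairs, so k = 1 degenerates to one-step Sarsa and k → ∞ recovers Sarsa(λ). The environment interface (`reset`, `step`) and hyperparameters are assumptions.

```python
from collections import deque
import numpy as np

def sarsa_lambda_k(env, episodes, alpha=0.1, gamma=0.99, lam=0.9, k=5,
                   epsilon=0.1, n_states=64, n_actions=4):
    """Backward-view Sarsa(lambda, k): traces kept for the last k pairs only."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)

    def policy(s):
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        trace = deque(maxlen=k)              # truncated eligibility window
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = policy(s2)
            delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
            trace.append((s, a))
            # decay and update every (state, action) still in the window;
            # the most recent pair gets weight 1, older pairs (gamma*lam)**i
            for i, (ts, ta) in enumerate(reversed(trace)):
                Q[ts, ta] += alpha * delta * (gamma * lam) ** i
            s, a = s2, a2
    return Q
```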


2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Yuchen Fu ◽  
Quan Liu ◽  
Xionghong Ling ◽  
Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are “trial and error” and “related reward.” A hierarchical reinforcement learning method based on action subrewards is proposed to address the “curse of dimensionality,” in which the state space grows exponentially with the number of features and convergence becomes slow. The method reduces the state space greatly and chooses actions purposefully and efficiently, so as to optimize the reward function and speed up convergence. Applied to online learning in the Tetris game, the experiments show that convergence is evidently faster with the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards; the “curse of dimensionality” is also alleviated to a certain extent by the hierarchical decomposition. Performance under different parameter settings is compared and analyzed as well.
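
The subreward idea can be pictured with an ordinary tabular update in which each action also earns a shaping term scoring its contribution to the current subtask. This is a minimal sketch; `subreward`, the weight `beta`, and the environment interface are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

def q_learning_with_subrewards(env, subreward, episodes, alpha=0.1,
                               gamma=0.95, epsilon=0.1, beta=0.5,
                               n_states=200, n_actions=6):
    """Tabular sketch: each action also earns subreward(s, a), a shaping
    term scoring how well it serves the current subtask."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (int(rng.integers(n_actions)) if rng.random() < epsilon
                 else int(np.argmax(Q[s])))
            s2, r, done = env.step(a)
            shaped = r + beta * subreward(s, a)   # subreward guides the action
            target = shaped + gamma * np.max(Q[s2]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```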


Author(s):  
Zhen Yu ◽  
Yimin Feng ◽  
Lijun Liu

In reinforcement learning tasks, formulating the reward function is a very important step, yet in many systems the reward function is not easy to formulate: network training is sensitive to it, and different reward functions yield different results. For a class of systems that meet specific conditions, the traditional reinforcement learning method is improved. A state quantity function is designed to replace the reward function and is more efficient than a hand-formulated reward. At the same time, a predictive network is designed so that the network can learn the value of general states from special ones. The overall network structure is built on the Deep Deterministic Policy Gradient (DDPG) algorithm. Finally, the algorithm is successfully applied in the FrozenLake environment and achieves good performance. The experiments prove the effectiveness of the algorithm and realize reward-free reinforcement learning for this class of systems.
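
The central substitution, a designed state quantity function standing in for the reward, might be sketched as follows. The particular `phi` and the potential-difference form below are our assumptions, not the authors' design.

```python
import numpy as np

GOAL = np.array([1.0, 1.0])   # hypothetical goal state

def phi(state):
    """State quantity function: scores how good a state is
    (here, negative distance to the goal)."""
    return -np.linalg.norm(np.asarray(state, dtype=float) - GOAL)

def surrogate_reward(s, s_next, gamma=0.99):
    """Potential-style difference of state quantities replaces r(s, a)."""
    return gamma * phi(s_next) - phi(s)

# Inside a DDPG training loop the critic target would simply use the
# surrogate, so no hand-formulated reward is ever queried:
#   y = surrogate_reward(s, s2) + gamma * Q_target(s2, actor_target(s2))
```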


2022 ◽  
Vol 73 ◽  
pp. 102227
Author(s):  
Rong Zhang ◽  
Qibing Lv ◽  
Jie Li ◽  
Jinsong Bao ◽  
Tianyuan Liu ◽  
...  

2014 ◽  
Vol 981 ◽  
pp. 258-261
Author(s):  
Jin Long Du ◽  
Yan Qian

The Boosting algorithm emerged over the last ten years. It raises the accuracy of a learning algorithm through multiple rounds of learning and clearly improves its efficiency by adopting a principle of “comprehensive optimization”: it can develop a low-efficiency “weak learning algorithm” into a high-efficiency “strong learning algorithm.” Boosting, an ensemble learning method, is grounded in learning theory and has shown good qualities in many fields. This paper elaborates and summarizes the basic ideas of the Boosting algorithm and applies it to Data Mining (DM). In recent years, DM has been extensively adopted outside the laboratory, in commerce, scientific research, and engineering. Against this background, the paper improves a traditional DM algorithm to solve the problems that arise in industrial applications.
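
For concreteness, the reweighting-and-voting mechanism described above is the classic AdaBoost scheme. Below is a minimal sketch; the `weak_learner(X, y, w)` interface, returning a classifier whose `predict` outputs labels in {-1, +1}, is an assumption.

```python
import numpy as np

def adaboost(X, y, weak_learner, rounds=50):
    """Reweight the training set so each new weak learner focuses on the
    examples its predecessors got wrong, then combine by weighted vote."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # example weights
    learners, alphas = [], []
    for _ in range(rounds):
        h = weak_learner(X, y, w)
        pred = h.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:                   # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        w *= np.exp(-alpha * y * pred)   # boost misclassified examples
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)

    def predict(Xq):
        votes = sum(a * h.predict(Xq) for a, h in zip(alphas, learners))
        return np.sign(votes)
    return predict
```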


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 198
Author(s):  
Xinhua Wang ◽  
Yuchen Wang ◽  
Lei Guo ◽  
Liancheng Xu ◽  
Baozhong Gao ◽  
...  

The digital library, as one of the most important ways of helping students acquire professional knowledge and improve their professional level, has gained great attention in recent years. However, its large collection (especially the book resources) hinders students from finding the resources they are interested in. To overcome this challenge, many researchers have turned to recommendation algorithms. Compared with traditional recommendation tasks, book recommendation in the digital library faces two challenges. The first is that users may borrow books they are not interested in (i.e., noisy borrowing behaviours), such as borrowing books for classmates. The second is that the number of books in a digital library is usually very large, which means one student can borrow only a small set of books (i.e., the data sparsity issue). As the noisy interactions in students’ borrowing sequences may harm the performance of a book recommender, we focus on refining recommendations by filtering out data noise. Moreover, due to the lack of direct supervision information, we treat noise filtering in sequences as a decision-making process and innovatively introduce a reinforcement learning method as our recommendation framework. Furthermore, to overcome the sparsity of students’ borrowing behaviours, a clustering-based reinforcement learning algorithm is further developed. Experimental results on two real-world datasets demonstrate the superiority of our proposed method over several state-of-the-art recommendation methods.
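
A minimal sketch of noise filtering as a decision-making process, under assumed interfaces: an agent walks a borrowing sequence, chooses KEEP or DROP for each book, and receives a delayed reward from the downstream recommender's accuracy on the cleaned sequence. Every name below (`encode`, `policy`, `recommender.score`) is hypothetical.

```python
KEEP, DROP = 0, 1

def encode(kept, book):
    # hypothetical state: length of the cleaned prefix and the candidate book
    return (len(kept), book)

def filter_episode(seq, policy, recommender, next_borrow):
    """One episode of the filtering agent over a borrowing sequence.

    `policy(state)` returns KEEP or DROP; `recommender.score(kept, target)`
    returns how well the cleaned history predicts the student's next real
    borrow. Both interfaces are illustrative assumptions."""
    kept = []
    for book in seq:
        if policy(encode(kept, book)) == KEEP:
            kept.append(book)
    reward = recommender.score(kept, next_borrow)   # delayed episode reward
    return kept, reward
```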


Robotica ◽  
2004 ◽  
Vol 22 (1) ◽  
pp. 29-39 ◽  
Author(s):  
Chee-Meng Chew ◽  
Gill A. Pratt

This paper presents two frontal plane algorithms for 3D dynamic bipedal walking: one is based on the notion of symmetry, and the other uses a reinforcement learning algorithm to learn lateral foot placement. The algorithms are combined with a sagittal plane algorithm and successfully applied to a simulated 3D bipedal robot to achieve level-ground walking. The simulation results show that the choice of the local control law for the stance-ankle roll joint can significantly affect the performance of the frontal plane algorithms.
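
A tabular reading of the learned frontal-plane component might look as follows: the lateral state of the body at support exchange is discretized and mapped to a lateral foot-placement offset. The discretization bounds, candidate offsets, and reward are all our assumptions.

```python
import numpy as np

offsets = np.linspace(-0.10, 0.10, 9)   # candidate lateral placements (m)
y_bins = np.linspace(-0.10, 0.10, 15)   # lateral body position bins
v_bins = np.linspace(-0.50, 0.50, 15)   # lateral body velocity bins
Q = np.zeros((len(y_bins) + 1, len(v_bins) + 1, len(offsets)))
rng = np.random.default_rng(0)

def discretize(y, ydot):
    return np.digitize(y, y_bins), np.digitize(ydot, v_bins)

def choose_offset(y, ydot, epsilon=0.1):
    """Epsilon-greedy lateral foot placement from the learned table."""
    yi, vi = discretize(y, ydot)
    if rng.random() < epsilon:
        return int(rng.integers(len(offsets)))
    return int(np.argmax(Q[yi, vi]))

def update(y, ydot, a, reward, alpha=0.1):
    """Reward could, e.g., penalize lateral velocity at the next exchange."""
    yi, vi = discretize(y, ydot)
    Q[yi, vi, a] += alpha * (reward - Q[yi, vi, a])
```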

