Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Author(s):  
Farzan Memarian ◽  
Wonjoon Goo ◽  
Rudolf Lioutikov ◽  
Scott Niekum ◽  
Ufuk Topcu

AI Magazine ◽  
2014 ◽  
Vol 35 (4) ◽  
pp. 61-74 ◽  
Author(s):  
Logan Yliniemi ◽  
Adrian K. Agogino ◽  
Kagan Tumer

Teams of artificially intelligent planetary rovers have tremendous potential for space exploration, allowing for reduced cost, increased flexibility, and increased reliability. However, having multiple autonomous devices acting simultaneously leads to a problem of coordination: to achieve the best results, they must work together. This is not a simple task. Because of the large distances and harsh environments involved, a rover must be able to perform a wide variety of tasks with a wide variety of potential teammates in uncertain and unsafe environments. Directly coding all the rules needed to reliably handle this coordination and uncertainty is problematic. Instead, this article examines tackling the problem through coordinated reinforcement learning: rather than being programmed what to do, the rovers iteratively learn through trial and error to take actions that lead to high overall system return. To allow for coordination while still letting each agent learn and act independently, we employ state-of-the-art reward shaping techniques. The article uses visualization techniques to break complex performance indicators down into an accessible form and identifies key future research directions.
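The shaping technique most closely associated with this research group's multi-rover work is the difference reward, which credits each rover with its marginal contribution to the global return, so agents can learn independently while staying aligned with the team objective. Below is a minimal sketch; the toy observation model and the global_return objective are illustrative assumptions, not the article's actual rover domain.

    import numpy as np

    def global_return(observations):
        # Toy system-level objective: reward covering many distinct
        # points of interest, with diminishing returns for redundancy.
        _, counts = np.unique(observations, return_counts=True)
        return float(np.sum(1.0 - 0.5 ** counts))

    def difference_reward(observations, i):
        # Difference reward D_i = G(z) - G(z_{-i}): the global return
        # minus the return with rover i's observation removed, i.e.
        # rover i's marginal contribution to the team.
        counterfactual = observations[:i] + observations[i + 1:]
        return global_return(observations) - global_return(counterfactual)

    # Three rovers observe points of interest 0, 0, and 1. The rover
    # covering POI 1 alone earns more credit than the redundant pair.
    obs = [0, 0, 1]
    print([difference_reward(obs, i) for i in range(3)])  # [0.25, 0.25, 0.5]

Because each rover maximizes only its own marginal contribution, redundant behaviour is discouraged without any rover needing to model its teammates explicitly.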


2011 ◽  
Vol 14 (02) ◽  
pp. 251-278 ◽  
Author(s):  
Sam Devlin ◽  
Daniel Kudenko ◽  
Marek Grześ

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash equilibria of a stochastic game; it changes only the exploration of the shaped agent. We empirically evaluate reward shaping in two problem domains within the context of RoboCup KeepAway, designing three reward-shaping schemes that encourage specific behaviour, such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
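For reference, the potential-based shaping the paper refers to (in the formulation of Ng, Harada, and Russell) adds to the environment reward a term derived from a potential function Φ over states:

    F(s, a, s') = \gamma \Phi(s') - \Phi(s)
    r_{shaped}(s, a, s') = r(s, a, s') + F(s, a, s')

Because F telescopes along any trajectory, it shifts every policy's return by the same state-dependent amount, which is why the Nash equilibria of the stochastic game are preserved and only exploration is affected.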


2016 ◽  
Vol 31 (1) ◽  
pp. 44-58 ◽  
Author(s):  
Sam Devlin ◽  
Daniel Kudenko

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function.

Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL.

Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
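The standard way to turn plan knowledge into a potential function in this line of work is to let Φ grow with an agent's progress through its plan, so the shaping term rewards advancing a step and penalizes regressing. A minimal sketch with a toy STRIPS-style encoding; the plan format and the plan_progress helper are hypothetical illustrations, not the paper's implementation:

    def plan_progress(state, plan):
        # Count the consecutive plan steps whose effects already hold in
        # `state`. `plan` is a list of (action_name, effects) pairs, where
        # effects is a set of ground facts (a toy STRIPS encoding).
        completed = 0
        for _, effects in plan:
            if effects <= state:  # every effect of this step holds
                completed += 1
            else:
                break
        return completed

    def potential(state, plan, scale=1.0):
        # Phi(s) is proportional to plan progress, so the shaping term
        # F = gamma * Phi(s') - Phi(s) pays the agent for advancing
        # through its plan.
        return scale * plan_progress(state, plan)

    # Example: a two-step delivery plan for a single agent.
    plan = [("pick_up", {"holding_box"}),
            ("deliver", {"box_at_goal"})]
    print(potential({"holding_box"}, plan))                 # 1.0
    print(potential({"holding_box", "box_at_goal"}, plan))  # 2.0

A joint-plan potential would instead score progress through the team's joint plan; the conflicts the paper investigates arise when one agent's individual plan assumes subgoals or resources that another agent's plan invalidates.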


Robotics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 105
Author(s):  
Andrew Lobbezoo ◽  
Yanjun Qian ◽  
Hyock-Ju Kwon

The field of robotics has developed rapidly in recent years, and training robotic agents with reinforcement learning has been a major focus of research. This survey reviews the application of reinforcement learning to pick-and-place operations, a task that a logistics robot can be trained to complete without support from a robotics engineer. To introduce the topic, we first review the fundamentals of reinforcement learning and various methods of policy optimization, such as value iteration and policy search. Next, we examine factors that affect the pick-and-place task, such as reward shaping, imitation learning, pose estimation, and the simulation environment. Following this review of the fundamentals and key factors, we present an extensive survey of the methods implemented by researchers in the field to date. The strengths and weaknesses of each method are discussed, and the contribution of each manuscript to the field is reviewed. The concluding critical discussion of the available literature and the summary of open problems indicate that experiment validation, model generalization, and grasp pose selection are topics that require additional research.
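Of the policy-optimization methods named above, value iteration is the easiest to show compactly. The following is a generic tabular sketch; the transition-table format and the toy two-state MDP are illustrative assumptions, not taken from the survey:

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-8):
        # Tabular value iteration.
        # P: (S, A, S) array of transition probabilities P[s, a, s'].
        # R: (S, A) array of expected immediate rewards.
        # Returns the optimal value function V and a greedy policy.
        S, A, _ = P.shape
        V = np.zeros(S)
        while True:
            # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
            Q = R + gamma * (P @ V)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        return V, Q.argmax(axis=1)

    # Toy two-state MDP: action 1 moves toward (or stays in) the
    # rewarding state 1; action 0 moves toward state 0.
    P = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[1.0, 0.0], [0.0, 1.0]]])
    R = np.array([[0.0, 0.0],
                  [0.0, 1.0]])
    V, policy = value_iteration(P, R)
    print(V, policy)  # policy chooses action 1 in both states

Policy-search methods, by contrast, optimize the policy parameters directly and scale to the continuous state and action spaces typical of pick-and-place, which is why the surveyed work largely relies on them rather than on tabular methods like this one.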

