Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Author(s):  
Farzan Memarian ◽  
Wonjoon Goo ◽  
Rudolf Lioutikov ◽  
Scott Niekum ◽  
Ufuk Topcu

AI Magazine ◽  
2014 ◽  
Vol 35 (4) ◽  
pp. 61-74 ◽  
Author(s):  
Logan Yliniemi ◽  
Adrian K. Agogino ◽  
Kagan Tumer

Teams of artificially intelligent planetary rovers have tremendous potential for space exploration, allowing for reduced cost, increased flexibility, and increased reliability. However, having multiple autonomous devices acting simultaneously leads to a problem of coordination: to achieve the best results, they must work together. This is not a simple task. Because of the large distances and harsh environments involved, a rover must be able to perform a wide variety of tasks with a wide variety of potential teammates in uncertain and unsafe environments. Directly coding all the rules needed to reliably handle this coordination and uncertainty is problematic. Instead, this article examines tackling the problem through coordinated reinforcement learning: rather than being programmed what to do, the rovers iteratively learn through trial and error to take actions that lead to high overall system return. To allow for coordination while still letting each agent learn and act independently, we employ state-of-the-art reward shaping techniques. The article uses visualization techniques to break complex performance indicators down into an accessible form and identifies key future research directions.
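The shaping technique most closely associated with this research group's multi-rover work is the difference reward, which credits each rover with its marginal contribution to the global return, so agents can learn independently while staying aligned with the team objective. Below is a minimal sketch; the toy observation model and the global_return objective are illustrative assumptions, not the article's actual rover domain.

    import numpy as np

    def global_return(observations):
        # Toy system-level objective: reward covering many distinct
        # points of interest, with diminishing returns for redundancy.
        _, counts = np.unique(observations, return_counts=True)
        return float(np.sum(1.0 - 0.5 ** counts))

    def difference_reward(observations, i):
        # Difference reward D_i = G(z) - G(z_{-i}): the global return
        # minus the return with rover i's observation removed, i.e.
        # rover i's marginal contribution to the team.
        counterfactual = observations[:i] + observations[i + 1:]
        return global_return(observations) - global_return(counterfactual)

    # Three rovers observe points of interest 0, 0, and 1. The rover
    # covering POI 1 alone earns more credit than the redundant pair.
    obs = [0, 0, 1]
    print([difference_reward(obs, i) for i in range(3)])  # [0.25, 0.25, 0.5]

Because each rover maximizes only its own marginal contribution, redundant behaviour is discouraged without any rover needing to model its teammates explicitly.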


2011 ◽  
Vol 14 (02) ◽  
pp. 251-278 ◽  
Author(s):  
Sam Devlin ◽  
Daniel Kudenko ◽  
Marek Grześ

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash equilibria of a stochastic game; it changes only the exploration of the shaped agent. We empirically evaluate reward shaping in two problem domains within the context of RoboCup KeepAway, designing three reward-shaping schemes that encourage specific behaviour, such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
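For reference, the potential-based shaping the paper refers to (in the formulation of Ng, Harada, and Russell) adds to the environment reward a term derived from a potential function Φ over states:

    F(s, a, s') = \gamma \Phi(s') - \Phi(s)
    r_{shaped}(s, a, s') = r(s, a, s') + F(s, a, s')

Because F telescopes along any trajectory, it shifts every policy's return by the same state-dependent amount, which is why the Nash equilibria of the stochastic game are preserved and only exploration is affected.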


2016 ◽  
Vol 31 (1) ◽  
pp. 44-58 ◽  
Author(s):  
Sam Devlin ◽  
Daniel Kudenko

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function.

Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL.

Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
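The standard way to turn plan knowledge into a potential function in this line of work is to let Φ grow with an agent's progress through its plan, so the shaping term rewards advancing a step and penalizes regressing. A minimal sketch with a toy STRIPS-style encoding; the plan format and the plan_progress helper are hypothetical illustrations, not the paper's implementation:

    def plan_progress(state, plan):
        # Count the consecutive plan steps whose effects already hold in
        # `state`. `plan` is a list of (action_name, effects) pairs, where
        # effects is a set of ground facts (a toy STRIPS encoding).
        completed = 0
        for _, effects in plan:
            if effects <= state:  # every effect of this step holds
                completed += 1
            else:
                break
        return completed

    def potential(state, plan, scale=1.0):
        # Phi(s) is proportional to plan progress, so the shaping term
        # F = gamma * Phi(s') - Phi(s) pays the agent for advancing
        # through its plan.
        return scale * plan_progress(state, plan)

    # Example: a two-step delivery plan for a single agent.
    plan = [("pick_up", {"holding_box"}),
            ("deliver", {"box_at_goal"})]
    print(potential({"holding_box"}, plan))                 # 1.0
    print(potential({"holding_box", "box_at_goal"}, plan))  # 2.0

A joint-plan potential would instead score progress through the team's joint plan; the conflicts the paper investigates arise when one agent's individual plan assumes subgoals or resources that another agent's plan invalidates.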


Robotics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 105
Author(s):  
Andrew Lobbezoo ◽  
Yanjun Qian ◽  
Hyock-Ju Kwon

The field of robotics has developed rapidly in recent years, and training robotic agents with reinforcement learning has been a major focus of research. This survey reviews the application of reinforcement learning to pick-and-place operations, a task that a logistics robot can be trained to complete without support from a robotics engineer. To introduce the topic, we first review the fundamentals of reinforcement learning and various methods of policy optimization, such as value iteration and policy search. Next, we examine factors that affect the pick-and-place task, such as reward shaping, imitation learning, pose estimation, and the simulation environment. Following this review of the fundamentals and key factors, we present an extensive survey of the methods implemented by researchers in the field to date. The strengths and weaknesses of each method are discussed, and the contribution of each manuscript to the field is reviewed. The concluding critical discussion of the available literature and the summary of open problems indicate that experiment validation, model generalization, and grasp pose selection are topics that require additional research.
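Of the policy-optimization methods named above, value iteration is the easiest to show compactly. The following is a generic tabular sketch; the transition-table format and the toy two-state MDP are illustrative assumptions, not taken from the survey:

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-8):
        # Tabular value iteration.
        # P: (S, A, S) array of transition probabilities P[s, a, s'].
        # R: (S, A) array of expected immediate rewards.
        # Returns the optimal value function V and a greedy policy.
        S, A, _ = P.shape
        V = np.zeros(S)
        while True:
            # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
            Q = R + gamma * (P @ V)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        return V, Q.argmax(axis=1)

    # Toy two-state MDP: action 1 moves toward (or stays in) the
    # rewarding state 1; action 0 moves toward state 0.
    P = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[1.0, 0.0], [0.0, 1.0]]])
    R = np.array([[0.0, 0.0],
                  [0.0, 1.0]])
    V, policy = value_iteration(P, R)
    print(V, policy)  # policy chooses action 1 in both states

Policy-search methods, by contrast, optimize the policy parameters directly and scale to the continuous state and action spaces typical of pick-and-place, which is why the surveyed work largely relies on them rather than on tabular methods like this one.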

