Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning

Author(s):  
Patrick Mannion ◽  
Sam Devlin ◽  
Jim Duggan ◽  
Enda Howley

Abstract: The majority of multi-agent reinforcement learning (MARL) implementations aim to optimize systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown to alter the intended goals of a domain if misused, leading to unintended behaviour. Two popular shaping methods are potential-based reward shaping and difference rewards, both of which have been repeatedly shown to improve learning speed and the quality of joint policies learned by agents in single-objective MARL domains. This work discusses the theoretical implications of applying these shaping approaches to cooperative multi-objective MARL problems and evaluates their efficacy in two benchmark domains. Our results constitute the first empirical evidence that agents using these shaping methods can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.
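A minimal sketch of the two shaping signals discussed above, written for a vector-reward (multi-objective) setting; the potential function `phi`, the fixed default counterfactual action, and the `evaluate` re-scoring hook are illustrative assumptions, not the authors' implementations.

```python
GAMMA = 0.95  # discount factor (illustrative)

def potential_based_shaping(phi, state, next_state):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).
    The same term can be added to the reward of each objective."""
    return GAMMA * phi(next_state) - phi(state)

def difference_reward(global_reward, joint_action, agent_idx, default_action, evaluate):
    """Difference reward D_i = G(z) - G(z_-i): the system reward minus the
    reward obtained when agent i's action is replaced by a fixed default
    (counterfactual) action. `evaluate` re-scores the counterfactual joint
    action and is assumed to be provided by the environment model."""
    counterfactual = list(joint_action)
    counterfactual[agent_idx] = default_action
    return global_reward - evaluate(counterfactual)
```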

Author(s):  
Akkhachai Phuphanin ◽  
Wipawee Usaha

Coverage control is crucial for the deployment of wireless sensor networks (WSNs). However, most coverage control schemes are based on single-objective optimization, such as coverage area only, and do not consider other, contradicting objectives such as energy consumption, the number of working nodes, and wasteful overlapping areas. This paper proposes a Multi-Objective Optimization (MOO) coverage control scheme called Scalarized Q Multi-Objective Reinforcement Learning (SQMORL). The two objectives are to maximize area coverage and to minimize the overlapping area so as to reduce energy consumption. Performance evaluation is conducted in both simulation and multi-agent lighting control testbed experiments. Simulation results show that SQMORL can obtain more efficient area coverage with fewer working nodes than other existing schemes. The hardware testbed results show that the SQMORL algorithm can find the optimal policy with good accuracy over repeated runs.
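A compact sketch of the scalarized Q-learning idea that SQMORL builds on, assuming a linear weighting of the coverage and overlap objectives; the weights, learning parameters, and state/action encoding below are illustrative assumptions rather than the paper's exact formulation.

```python
import random
from collections import defaultdict
import numpy as np

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative learning parameters
WEIGHTS = np.array([0.7, 0.3])          # scalarization weights: [coverage, overlap penalty]

# vector-valued Q-table: Q[state][action] holds one estimate per objective
Q = defaultdict(lambda: defaultdict(lambda: np.zeros(len(WEIGHTS))))

def scalarize(q_vec):
    """Linear scalarization of an objective vector into a single scalar."""
    return float(np.dot(WEIGHTS, q_vec))

def select_action(state, actions):
    """Epsilon-greedy selection over the scalarized Q-values."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: scalarize(Q[state][a]))

def update(state, action, reward_vec, next_state, actions):
    """Q-update on the objective vector; the greedy successor action is the
    one that maximizes the scalarized value."""
    best_next = max(actions, key=lambda a: scalarize(Q[next_state][a]))
    target = np.asarray(reward_vec, dtype=float) + GAMMA * Q[next_state][best_next]
    Q[state][action] += ALPHA * (target - Q[state][action])
```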


2016 ◽  
Vol 31 (1) ◽  
pp. 44-58 ◽  
Author(s):  
Sam Devlin ◽  
Daniel Kudenko

Abstract: Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
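One way to turn plan knowledge into a potential function, roughly in the spirit described above: the potential of a state grows with how far along the (joint or individual) plan the agent has progressed. The plan representation and the step-satisfaction test below are illustrative assumptions, not the authors' STRIPS encoding.

```python
GAMMA = 0.99  # discount factor (illustrative)

def plan_potential(state, plan, satisfies, scale=1.0):
    """Potential proportional to progress through an ordered plan: count the
    consecutive plan steps (e.g. abstract STRIPS facts) already achieved in
    `state`. `plan` and the domain test `satisfies(state, step)` are assumed
    inputs supplied by the planner and domain designer."""
    achieved = 0
    for step in plan:
        if satisfies(state, step):
            achieved += 1
        else:
            break
    return scale * achieved

def shaping_reward(state, next_state, plan, satisfies):
    """Standard potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return (GAMMA * plan_potential(next_state, plan, satisfies)
            - plan_potential(state, plan, satisfies))
```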


Author(s):  
Battilotti Stefano ◽  
Delli Priscoli Francesco ◽  
Gori Giorgi Claudio ◽  
Monaco Salvatore ◽  
Panfili Martina ◽  
...  

2021 ◽  
Vol 12 (4) ◽  
pp. 125-145
Author(s):  
Wafa Aouadj ◽  
Mohamed-Rida Abdessemed ◽  
Rachid Seghir

This study concerns a swarm of autonomous reactive mobile robots, described as naïve because of their simple constitution, whose mission is to gather randomly distributed objects while respecting two contradictory objectives: maximizing the quality of the emergent heap formation and minimizing the energy consumed by the robots. This problem poses two challenges: it is a multi-objective optimization problem, and it is a hard problem. To solve it, a well-known multi-objective evolutionary algorithm is used. The solution obtained through simulation reveals a close relationship between the behavioral rules and the energy consumed; it represents the sought behavioral model, optimizing both grouping quality and energy consumption. Its reliability is shown by evaluating its robustness, scalability, and flexibility, and it is compared with a single-objective behavioral model. Analysis of the results demonstrates its high robustness, its superiority in terms of scalability and flexibility, and its longevity, measured by the activity time of the robotic system into which it is integrated.
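For concreteness, a minimal sketch of how such a bi-objective evaluation might be wired into a multi-objective evolutionary algorithm; the `simulate` hook and the parameter encoding are illustrative assumptions, and the dominance test is the generic Pareto criterion rather than the specific algorithm used in the study.

```python
def evaluate(rule_params, simulate):
    """Evaluate one candidate behavioral-rule parameter vector by simulation,
    returning the two objectives: heap-formation quality (to maximize) and
    energy consumed (to minimize). `simulate` is an assumed swarm-simulation
    hook, not part of the study's published code."""
    quality, energy = simulate(rule_params)
    return quality, energy

def dominates(a, b):
    """Pareto dominance for (quality, energy): at least as good on both
    objectives and strictly better on at least one."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])
```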


2019 ◽  
Vol 22 (2) ◽  
pp. 402-422 ◽  
Author(s):  
Matthew B. Johns ◽  
Edward Keedwell ◽  
Dragan Savic

Abstract: Water system design problems are complex and difficult to optimise. It has been demonstrated that engineering expertise must be involved to tackle real-world problems. This paper presents two engineering-inspired hybrid evolutionary algorithms (EAs) for the multi-objective design of water distribution networks. The heuristics are developed from the traditional design approaches of practicing engineers and integrated into the mutation operator of a multi-objective EA. The first engineering-inspired heuristic is designed to identify hydraulic bottlenecks within the network and eliminate them, with a view to speeding up the algorithm's search towards the feasible solution space. The second heuristic is based on the notion that pipe diameters transition smoothly from large at the source to small at the extremities of the network. The performance of the engineering-inspired hybrid EAs is compared with the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) and assessed on three networks of varying complexity: two benchmarks and one real-world network. The experiments presented in this paper demonstrate that the incorporation of engineering expertise can improve EA performance, often producing solutions that are superior in terms of both mathematical optimality and engineering feasibility.
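A rough sketch of what the first, bottleneck-oriented heuristic could look like inside a mutation operator; the diameter catalogue, the headloss input, and the selection fraction are illustrative assumptions, not the operator defined in the paper.

```python
import random

PIPE_DIAMETERS = [100, 150, 200, 250, 300, 400, 500]   # mm, illustrative catalogue

def bottleneck_mutation(solution, unit_headloss, fraction=0.1, prob=0.5):
    """Engineering-inspired mutation sketch: pipes with the highest unit
    headloss (hydraulic bottlenecks) are nudged one diameter size up.
    `solution` maps pipe id -> index into PIPE_DIAMETERS; `unit_headloss`
    maps pipe id -> headloss per unit length, obtained from a hydraulic
    solver (assumed available, not shown here)."""
    mutated = dict(solution)
    worst_first = sorted(unit_headloss, key=unit_headloss.get, reverse=True)
    for pipe in worst_first[: max(1, int(fraction * len(worst_first)))]:
        if random.random() < prob and mutated[pipe] < len(PIPE_DIAMETERS) - 1:
            mutated[pipe] += 1   # upsize to relieve the bottleneck
    return mutated
```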


Robotics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 104
Author(s):  
Joris De Winter ◽  
Albert De Beir ◽  
Ilias El Makrini ◽  
Greet Van de Perre ◽  
Ann Nowé ◽  
...  

The assembly industry is shifting more towards customizable products, or towards assembly of small batches. This requires a lot of reprogramming, which is expensive because a specialized engineer is needed. It would be an improvement if untrained workers could help a cobot learn an assembly sequence by giving advice. Learning an assembly sequence is a hard task for a cobot, because the solution space grows drastically as the complexity of the task increases. This work introduces a novel method in which human knowledge is used to reduce this solution space and, as a result, increase the learning speed. The proposed method, IRL-PBRS, uses Interactive Reinforcement Learning (IRL) to learn from human advice in an interactive way, and uses Potential-Based Reward Shaping (PBRS), in a simulated environment, to focus learning on a smaller part of the solution space. The method was compared in simulation to two other feedback strategies. The results show that IRL-PBRS converges more quickly to a valid assembly sequence policy and does so with the fewest human interactions. Finally, a use case is presented in which participants were asked to program an assembly task. Here, the results show that IRL-PBRS learns quickly enough to keep up with the advice given by a user and is able to adapt online to a changing knowledge base.
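A small sketch of turning human advice into a potential function, in the spirit of combining IRL with PBRS as described above; the advice encoding (a set of advised states or partial assembly sequences) and the bonus value are illustrative assumptions, not the paper's formulation.

```python
GAMMA = 0.95  # discount factor (illustrative)

class AdvicePotential:
    """Advised states receive a higher potential, biasing exploration toward
    the advised region of the solution space while leaving the environment
    reward itself unchanged."""

    def __init__(self, bonus=1.0):
        self.advised = set()
        self.bonus = bonus

    def add_advice(self, state):
        """Record a state (or partial assembly sequence) the human marked as good."""
        self.advised.add(state)

    def phi(self, state):
        return self.bonus if state in self.advised else 0.0

    def shaping(self, state, next_state):
        """Potential-based shaping term added to the environment reward."""
        return GAMMA * self.phi(next_state) - self.phi(state)
```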


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 39974-39982 ◽  
Author(s):  
Yuandou Wang ◽  
Hang Liu ◽  
Wanbo Zheng ◽  
Yunni Xia ◽  
Yawen Li ◽  
...  

2011 ◽  
Vol 14 (02) ◽  
pp. 279-305 ◽  
Author(s):  
Bikramjit Banerjee ◽  
Landon Kraemer

The design of reinforcement learning solutions to many problems artificially constrains the action set available to an agent in order to limit the exploration/sample complexity. While exploring, if an agent can discover new actions that break through the constraints of its basic/atomic action set, then the quality of the learned decision policy could improve. On the flip side, considering all possible non-atomic actions might cause the exploration complexity to explode. We present a novel heuristic solution to this dilemma and evaluate it empirically in grid navigation tasks. In particular, we show that both the solution quality and the sample complexity improve significantly when basic reinforcement learning is coupled with action discovery. Our approach relies on reducing the number of decision points, which is particularly suited to multi-agent coordination learning, since agents tend to learn more easily with fewer coordination problems (CPs). To demonstrate this, we extend action discovery to multi-agent reinforcement learning and show that Joint Action Learners (JALs) indeed learn coordination policies of higher quality with lower sample complexity when coupled with action discovery in a multi-agent box-pushing task.
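A very rough sketch of one form of action discovery consistent with the idea above: mine frequently recurring action sub-sequences from successful trajectories and promote them to composite (non-atomic) actions, which reduces the number of decision points an agent must learn over. The mining criterion and parameters are illustrative assumptions, not the authors' heuristic.

```python
from collections import Counter

def discover_macros(successful_trajectories, length=3, min_count=5):
    """Return fixed-length action sub-sequences that recur at least `min_count`
    times across successful trajectories, as candidate composite actions.
    Each trajectory is assumed to be a list of atomic actions."""
    counts = Counter()
    for trajectory in successful_trajectories:
        for i in range(len(trajectory) - length + 1):
            counts[tuple(trajectory[i:i + length])] += 1
    return [macro for macro, n in counts.items() if n >= min_count]
```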

