Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning

Author(s):  
Patrick Mannion ◽  
Sam Devlin ◽  
Jim Duggan ◽  
Enda Howley

Abstract: The majority of multi-agent reinforcement learning (MARL) implementations aim to optimize systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown to alter the intended goals of a domain if misused, leading to unintended behaviour. Two popular shaping methods are potential-based reward shaping and difference rewards, both of which have been repeatedly shown to improve learning speed and the quality of joint policies learned by agents in single-objective MARL domains. This work discusses the theoretical implications of applying these shaping approaches to cooperative multi-objective MARL problems and evaluates their efficacy in two benchmark domains. Our results constitute the first empirical evidence that agents using these shaping methods can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.
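A minimal sketch of the two shaping signals discussed above, written for a vector-reward (multi-objective) setting; the potential function `phi`, the fixed default counterfactual action, and the `evaluate` re-scoring hook are illustrative assumptions, not the authors' implementations.

```python
GAMMA = 0.95  # discount factor (illustrative)

def potential_based_shaping(phi, state, next_state):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).
    The same term can be added to the reward of each objective."""
    return GAMMA * phi(next_state) - phi(state)

def difference_reward(global_reward, joint_action, agent_idx, default_action, evaluate):
    """Difference reward D_i = G(z) - G(z_-i): the system reward minus the
    reward obtained when agent i's action is replaced by a fixed default
    (counterfactual) action. `evaluate` re-scores the counterfactual joint
    action and is assumed to be provided by the environment model."""
    counterfactual = list(joint_action)
    counterfactual[agent_idx] = default_action
    return global_reward - evaluate(counterfactual)
```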

Author(s):  
Akkhachai Phuphanin ◽  
Wipawee Usaha

Coverage control is crucial for the deployment of wireless sensor networks (WSNs). However, most coverage control schemes are based on single-objective optimization, such as coverage area only, and do not consider other, contradicting objectives such as energy consumption, the number of working nodes, and wasteful overlapping areas. This paper proposes a Multi-Objective Optimization (MOO) coverage control scheme called Scalarized Q Multi-Objective Reinforcement Learning (SQMORL). The two objectives are to maximize area coverage and to minimize the overlapping area so as to reduce energy consumption. Performance evaluation is conducted in both simulation and multi-agent lighting control testbed experiments. Simulation results show that SQMORL can obtain more efficient area coverage with fewer working nodes than other existing schemes. The hardware testbed results show that the SQMORL algorithm can find the optimal policy with good accuracy over repeated runs.
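A compact sketch of the scalarized Q-learning idea that SQMORL builds on, assuming a linear weighting of the coverage and overlap objectives; the weights, learning parameters, and state/action encoding below are illustrative assumptions rather than the paper's exact formulation.

```python
import random
from collections import defaultdict
import numpy as np

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative learning parameters
WEIGHTS = np.array([0.7, 0.3])          # scalarization weights: [coverage, overlap penalty]

# vector-valued Q-table: Q[state][action] holds one estimate per objective
Q = defaultdict(lambda: defaultdict(lambda: np.zeros(len(WEIGHTS))))

def scalarize(q_vec):
    """Linear scalarization of an objective vector into a single scalar."""
    return float(np.dot(WEIGHTS, q_vec))

def select_action(state, actions):
    """Epsilon-greedy selection over the scalarized Q-values."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: scalarize(Q[state][a]))

def update(state, action, reward_vec, next_state, actions):
    """Q-update on the objective vector; the greedy successor action is the
    one that maximizes the scalarized value."""
    best_next = max(actions, key=lambda a: scalarize(Q[next_state][a]))
    target = np.asarray(reward_vec, dtype=float) + GAMMA * Q[next_state][best_next]
    Q[state][action] += ALPHA * (target - Q[state][action])
```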


2016 ◽  
Vol 31 (1) ◽  
pp. 44-58 ◽  
Author(s):  
Sam Devlin ◽  
Daniel Kudenko

Abstract: Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
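One way to turn plan knowledge into a potential function, roughly in the spirit described above: the potential of a state grows with how far along the (joint or individual) plan the agent has progressed. The plan representation and the step-satisfaction test below are illustrative assumptions, not the authors' STRIPS encoding.

```python
GAMMA = 0.99  # discount factor (illustrative)

def plan_potential(state, plan, satisfies, scale=1.0):
    """Potential proportional to progress through an ordered plan: count the
    consecutive plan steps (e.g. abstract STRIPS facts) already achieved in
    `state`. `plan` and the domain test `satisfies(state, step)` are assumed
    inputs supplied by the planner and domain designer."""
    achieved = 0
    for step in plan:
        if satisfies(state, step):
            achieved += 1
        else:
            break
    return scale * achieved

def shaping_reward(state, next_state, plan, satisfies):
    """Standard potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return (GAMMA * plan_potential(next_state, plan, satisfies)
            - plan_potential(state, plan, satisfies))
```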


Author(s):  
Battilotti Stefano ◽  
Delli Priscoli Francesco ◽  
Gori Giorgi Claudio ◽  
Monaco Salvatore ◽  
Panfili Martina ◽  
...  

2021 ◽  
Vol 12 (4) ◽  
pp. 125-145
Author(s):  
Wafa Aouadj ◽  
Mohamed-Rida Abdessemed ◽  
Rachid Seghir

This study concerns a swarm of autonomous reactive mobile robots, described as naïve because of their simple constitution, whose mission is to gather randomly distributed objects while respecting two contradictory objectives: maximizing the quality of the emergent heap formation and minimizing the energy consumed by the robots. This problem poses two challenges: it is a multi-objective optimization problem, and it is a hard problem. To solve it, a well-known multi-objective evolutionary algorithm is used. The solution obtained through simulation reveals a close relationship between the behavioral rules and the energy consumed; it represents the sought behavioral model, optimizing both grouping quality and energy consumption. Its reliability is shown by evaluating its robustness, scalability, and flexibility, and it is compared with a single-objective behavioral model. Analysis of the results demonstrates its high robustness, its superiority in terms of scalability and flexibility, and its longevity, measured by the activity time of the robotic system into which it is integrated.
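For concreteness, a minimal sketch of how such a bi-objective evaluation might be wired into a multi-objective evolutionary algorithm; the `simulate` hook and the parameter encoding are illustrative assumptions, and the dominance test is the generic Pareto criterion rather than the specific algorithm used in the study.

```python
def evaluate(rule_params, simulate):
    """Evaluate one candidate behavioral-rule parameter vector by simulation,
    returning the two objectives: heap-formation quality (to maximize) and
    energy consumed (to minimize). `simulate` is an assumed swarm-simulation
    hook, not part of the study's published code."""
    quality, energy = simulate(rule_params)
    return quality, energy

def dominates(a, b):
    """Pareto dominance for (quality, energy): at least as good on both
    objectives and strictly better on at least one."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])
```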


2019 ◽  
Vol 22 (2) ◽  
pp. 402-422 ◽  
Author(s):  
Matthew B. Johns ◽  
Edward Keedwell ◽  
Dragan Savic

Abstract: Water system design problems are complex and difficult to optimise. It has been demonstrated that engineering expertise must be involved to tackle real-world problems. This paper presents two engineering-inspired hybrid evolutionary algorithms (EAs) for the multi-objective design of water distribution networks. The heuristics are developed from the traditional design approaches of practicing engineers and integrated into the mutation operator of a multi-objective EA. The first engineering-inspired heuristic is designed to identify hydraulic bottlenecks within the network and eliminate them, with a view to speeding up the algorithm's search towards the feasible solution space. The second heuristic is based on the notion that pipe diameters transition smoothly from large at the source to small at the extremities of the network. The performance of the engineering-inspired hybrid EAs is compared with the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) and assessed on three networks of varying complexity: two benchmarks and one real-world network. The experiments presented in this paper demonstrate that the incorporation of engineering expertise can improve EA performance, often producing solutions that are superior in terms of both mathematical optimality and engineering feasibility.
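A rough sketch of what the first, bottleneck-oriented heuristic could look like inside a mutation operator; the diameter catalogue, the headloss input, and the selection fraction are illustrative assumptions, not the operator defined in the paper.

```python
import random

PIPE_DIAMETERS = [100, 150, 200, 250, 300, 400, 500]   # mm, illustrative catalogue

def bottleneck_mutation(solution, unit_headloss, fraction=0.1, prob=0.5):
    """Engineering-inspired mutation sketch: pipes with the highest unit
    headloss (hydraulic bottlenecks) are nudged one diameter size up.
    `solution` maps pipe id -> index into PIPE_DIAMETERS; `unit_headloss`
    maps pipe id -> headloss per unit length, obtained from a hydraulic
    solver (assumed available, not shown here)."""
    mutated = dict(solution)
    worst_first = sorted(unit_headloss, key=unit_headloss.get, reverse=True)
    for pipe in worst_first[: max(1, int(fraction * len(worst_first)))]:
        if random.random() < prob and mutated[pipe] < len(PIPE_DIAMETERS) - 1:
            mutated[pipe] += 1   # upsize to relieve the bottleneck
    return mutated
```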


Robotics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 104
Author(s):  
Joris De Winter ◽  
Albert De Beir ◽  
Ilias El Makrini ◽  
Greet Van de Perre ◽  
Ann Nowé ◽  
...  

The assembly industry is shifting more towards customizable products, or towards assembly of small batches. This requires a lot of reprogramming, which is expensive because a specialized engineer is needed. It would be an improvement if untrained workers could help a cobot learn an assembly sequence by giving advice. Learning an assembly sequence is a hard task for a cobot, because the solution space grows drastically as the complexity of the task increases. This work introduces a novel method in which human knowledge is used to reduce this solution space and, as a result, increase the learning speed. The proposed method, IRL-PBRS, uses Interactive Reinforcement Learning (IRL) to learn from human advice in an interactive way, and uses Potential-Based Reward Shaping (PBRS), in a simulated environment, to focus learning on a smaller part of the solution space. The method was compared in simulation to two other feedback strategies. The results show that IRL-PBRS converges more quickly to a valid assembly sequence policy and does so with the fewest human interactions. Finally, a use case is presented in which participants were asked to program an assembly task. Here, the results show that IRL-PBRS learns quickly enough to keep up with the advice given by a user and is able to adapt online to a changing knowledge base.
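A small sketch of turning human advice into a potential function, in the spirit of combining IRL with PBRS as described above; the advice encoding (a set of advised states or partial assembly sequences) and the bonus value are illustrative assumptions, not the paper's formulation.

```python
GAMMA = 0.95  # discount factor (illustrative)

class AdvicePotential:
    """Advised states receive a higher potential, biasing exploration toward
    the advised region of the solution space while leaving the environment
    reward itself unchanged."""

    def __init__(self, bonus=1.0):
        self.advised = set()
        self.bonus = bonus

    def add_advice(self, state):
        """Record a state (or partial assembly sequence) the human marked as good."""
        self.advised.add(state)

    def phi(self, state):
        return self.bonus if state in self.advised else 0.0

    def shaping(self, state, next_state):
        """Potential-based shaping term added to the environment reward."""
        return GAMMA * self.phi(next_state) - self.phi(state)
```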


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 39974-39982 ◽  
Author(s):  
Yuandou Wang ◽  
Hang Liu ◽  
Wanbo Zheng ◽  
Yunni Xia ◽  
Yawen Li ◽  
...  

2011 ◽  
Vol 14 (02) ◽  
pp. 279-305 ◽  
Author(s):  
Bikramjit Banerjee ◽  
Landon Kraemer

The design of reinforcement learning solutions to many problems artificially constrains the action set available to an agent in order to limit the exploration/sample complexity. While exploring, if an agent can discover new actions that break through the constraints of its basic/atomic action set, then the quality of the learned decision policy could improve. On the flip side, considering all possible non-atomic actions might cause the exploration complexity to explode. We present a novel heuristic solution to this dilemma and evaluate it empirically in grid navigation tasks. In particular, we show that both the solution quality and the sample complexity improve significantly when basic reinforcement learning is coupled with action discovery. Our approach relies on reducing the number of decision points, which is particularly suited to multi-agent coordination learning, since agents tend to learn more easily with fewer coordination problems (CPs). To demonstrate this, we extend action discovery to multi-agent reinforcement learning and show that Joint Action Learners (JALs) indeed learn coordination policies of higher quality with lower sample complexity when coupled with action discovery in a multi-agent box-pushing task.
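A very rough sketch of one form of action discovery consistent with the idea above: mine frequently recurring action sub-sequences from successful trajectories and promote them to composite (non-atomic) actions, which reduces the number of decision points an agent must learn over. The mining criterion and parameters are illustrative assumptions, not the authors' heuristic.

```python
from collections import Counter

def discover_macros(successful_trajectories, length=3, min_count=5):
    """Return fixed-length action sub-sequences that recur at least `min_count`
    times across successful trajectories, as candidate composite actions.
    Each trajectory is assumed to be a list of atomic actions."""
    counts = Counter()
    for trajectory in successful_trajectories:
        for i in range(len(trajectory) - length + 1):
            counts[tuple(trajectory[i:i + length])] += 1
    return [macro for macro, n in counts.items() if n >= min_count]
```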

