Enforcing Signal Temporal Logic Specifications in Multi-Agent Adversarial Environments: A Deep Q-Learning Approach

Author(s):  
Devaprakash Muniraj ◽  
Kyriakos G. Vamvoudakis ◽  
Mazen Farhood
Author(s):  
Tianhao Zhang ◽  
Qiwei Ye ◽  
Jiang Bian ◽  
Guangming Xie ◽  
Tie-Yan Liu

Value function decomposition (VFD) methods under the popular paradigm of centralized training and decentralized execution (CTDE) have promoted multi-agent reinforcement learning progress. However, existing VFD methods proceed from a group's value function decomposition to only solve cooperative tasks. With the individual value function decomposition, we propose MFVFD, a novel multi-agent Q-learning approach for solving cooperative and non-cooperative tasks based on mean-field theory. Our analysis on the Hawk-Dove and Nonmonotonic Cooperation matrix games evaluate MFVFD's convergent solution. Empirical studies on the challenging mixed cooperative-competitive tasks where hundreds of agents coexist demonstrate that MFVFD significantly outperforms existing baselines.


Games ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 8
Author(s):  
Gustavo Chica-Pedraza ◽  
Eduardo Mojica-Nava ◽  
Ernesto Cadena-Muñoz

Multi-Agent Systems (MAS) have been used to solve several optimization problems in control systems. MAS allow understanding the interactions between agents and the complexity of the system, thus generating functional models that are closer to reality. However, these approaches assume that information between agents is always available, which means the employment of a full-information model. Some tendencies have been growing in importance to tackle scenarios where information constraints are relevant issues. In this sense, game theory approaches appear as a useful technique that use a strategy concept to analyze the interactions of the agents and achieve the maximization of agent outcomes. In this paper, we propose a distributed control method of learning that allows analyzing the effect of the exploration concept in MAS. The dynamics obtained use Q-learning from reinforcement learning as a way to include the concept of exploration into the classic exploration-less Replicator Dynamics equation. Then, the Boltzmann distribution is used to introduce the Boltzmann-Based Distributed Replicator Dynamics as a tool for controlling agents behaviors. This distributed approach can be used in several engineering applications, where communications constraints between agents are considered. The behavior of the proposed method is analyzed using a smart grid application for validation purposes. Results show that despite the lack of full information of the system, by controlling some parameters of the method, it has similar behavior to the traditional centralized approaches.


Sign in / Sign up

Export Citation Format

Share Document