Discounted Semi-Markov Games and Algorithms for Solving Two Structured Classes

Author(s): Prasenjit Mondal

2020, Vol. 309, pp. 02012
Author(s): Yan Sun, Weifeng Ji, Jiang Weng, Beiying Zhao

Moving target defense (MTD) is a research hotspot in the field of network security. Game-theoretic decision methods for network defense are an important technique for guiding MTD toward optimal defense behavior in different network environments (GT-MTD), and a substantial body of related work has been produced in this field. In this paper, we focus on the scope of GT-MTD and systematically introduce the application scenarios of MTD under four game-theoretic models: classical games (static games, signaling games), Markov games, differential games, and evolutionary games. We then propose future development directions, offering new views and explanations on GT-MTD research.
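As a concrete illustration of the simplest model in this taxonomy, a static (matrix) game, the sketch below computes a defender's optimal mixed strategy for a hypothetical zero-sum attacker-defender game by linear programming. The payoff matrix, the strategy labels, and all numbers are illustrative assumptions, not taken from the surveyed work.

```python
# Minimal sketch: maximin solution of an assumed zero-sum
# attacker-defender matrix game via linear programming.
import numpy as np
from scipy.optimize import linprog

# Rows: hypothetical defender strategies (e.g., rotate IPs, shuffle ports);
# columns: hypothetical attacker strategies (e.g., scan, exploit).
# Entries: defender's payoff (the attacker's payoff is the negative).
A = np.array([[ 2.0, -1.0],
              [-1.0,  1.0]])
m, n = A.shape

# Maximin LP: maximize v subject to x^T A >= v for every attacker column,
# sum(x) = 1, x >= 0. Variables are [x_1, ..., x_m, v]; linprog minimizes,
# so the objective is -v.
c = np.concatenate([np.zeros(m), [-1.0]])
A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (x^T A)_j <= 0 for each column j
b_ub = np.zeros(n)
A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
b_eq = np.array([1.0])
bounds = [(0, 1)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[m]
print("defender mixed strategy:", x)   # optimal randomization over defenses
print("game value:", v)                # defender's guaranteed expected payoff
```

Markov-game models generalize this picture by making the payoff matrix state-dependent and adding transitions between states, which is what ties the survey's static-game and stochastic-game scenarios together.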


Author(s): Daxue Liu, Jun Wu, Xin Xu

Multi-agent reinforcement learning (MARL) provides a useful and flexible framework for multi-agent coordination in uncertain dynamic environments. However, generalization and scalability to large problem sizes, already problematic in single-agent RL, are an even more formidable obstacle in MARL applications. In this paper, a new MARL method based on ordinal action selection and approximate policy iteration, called OAPI (Ordinal Approximate Policy Iteration), is presented to address the scalability of MARL algorithms in common-interest Markov games. In OAPI, an ordinal action selection and learning strategy is integrated with distributed approximate policy iteration, both to simplify the policy space and eliminate conflicts in multi-agent coordination, and to approximate near-optimal policies for Markov games with large state spaces. On the policy space simplified by ordinal action selection, OAPI implements distributed approximate policy iteration using online least-squares policy iteration (LSPI). This yields multi-agent coordination with good convergence properties and reduced computational complexity. Simulation results on a coordinated multi-robot navigation task illustrate the feasibility and effectiveness of the proposed approach.
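As a rough illustration of the LSPI building block that OAPI distributes across agents, here is a minimal single-agent Python sketch. The feature map, the sampled transitions, and the point where an ordinal restriction of the action set would enter are all assumptions for illustration, not the paper's implementation.

```python
# Minimal LSPI sketch: LSTD-Q policy evaluation plus greedy improvement.
import numpy as np

def lstdq(samples, phi, policy, gamma, k):
    """LSTD-Q: least-squares fit of Q^pi from (s, a, r, s') samples."""
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))   # action the current policy takes
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)   # ridge term for stability

def lspi(samples, phi, actions, gamma=0.95, k=4, iters=20):
    w = np.zeros(k)
    for _ in range(iters):
        # Greedy policy w.r.t. the current Q estimate. Under an ordinal
        # action-selection scheme, `actions` would be the reduced, ranked
        # subset of joint actions rather than the full joint-action space.
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, policy, gamma, k)
        if np.linalg.norm(w_new - w) < 1e-4:
            break
        w = w_new
    return w

# Tiny smoke test on an assumed 2-state, 2-action problem with one-hot features.
states, actions = [0, 1], [0, 1]
phi = lambda s, a: np.eye(4)[2 * s + a]
rng = np.random.default_rng(0)
samples = [(s, a, float(s == 1), rng.integers(2))
           for s in states for a in actions for _ in range(25)]
w = lspi(samples, phi, actions, k=4)
print("learned Q-weights:", w)
```

In OAPI, each agent would run such a least-squares evaluation/improvement loop over its ordinally reduced action set, which is how the policy-space simplification and the approximation of near-optimal policies combine as the abstract describes.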


2020, Vol. 122, pp. 83-104
Author(s): Galit Ashkenazi-Golan, Catherine Rainer, Eilon Solan
