Discounted Semi-Markov Games and Algorithms for Solving Two Structured Classes

Author(s): Prasenjit Mondal

2020, Vol. 309, pp. 02012
Author(s): Yan Sun, Weifeng Ji, Jiang Weng, Beiying Zhao

Moving target defense (MTD) is a research hotspot in the field of network security. Game-theoretic decision methods for network defense are an important technique for guiding MTD toward optimal defense behavior in different network environments (GT-MTD), and a substantial body of related work has been produced in this field. In this paper, we focus on the scope of GT-MTD and systematically introduce the application scenarios of MTD under four game-theoretic models: classical games (static games, signaling games), Markov games, differential games, and evolutionary games. We then propose future development directions, offering new views and explanations on GT-MTD research.
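As a concrete illustration of the simplest model in this taxonomy, a static (matrix) game, the sketch below computes a defender's optimal mixed strategy for a hypothetical zero-sum attacker-defender game by linear programming. The payoff matrix, the strategy labels, and all numbers are illustrative assumptions, not taken from the surveyed work.

```python
# Minimal sketch: maximin solution of an assumed zero-sum
# attacker-defender matrix game via linear programming.
import numpy as np
from scipy.optimize import linprog

# Rows: hypothetical defender strategies (e.g., rotate IPs, shuffle ports);
# columns: hypothetical attacker strategies (e.g., scan, exploit).
# Entries: defender's payoff (the attacker's payoff is the negative).
A = np.array([[ 2.0, -1.0],
              [-1.0,  1.0]])
m, n = A.shape

# Maximin LP: maximize v subject to x^T A >= v for every attacker column,
# sum(x) = 1, x >= 0. Variables are [x_1, ..., x_m, v]; linprog minimizes,
# so the objective is -v.
c = np.concatenate([np.zeros(m), [-1.0]])
A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (x^T A)_j <= 0 for each column j
b_ub = np.zeros(n)
A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
b_eq = np.array([1.0])
bounds = [(0, 1)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[m]
print("defender mixed strategy:", x)   # optimal randomization over defenses
print("game value:", v)                # defender's guaranteed expected payoff
```

Markov-game models generalize this picture by making the payoff matrix state-dependent and adding transitions between states, which is what ties the survey's static-game and stochastic-game scenarios together.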


Author(s): Daxue Liu, Jun Wu, Xin Xu

Multi-agent reinforcement learning (MARL) provides a useful and flexible framework for multi-agent coordination in uncertain dynamic environments. However, generalization and scalability to large problem sizes, already problematic in single-agent RL, are an even more formidable obstacle in MARL applications. In this paper, a new MARL method based on ordinal action selection and approximate policy iteration, called OAPI (Ordinal Approximate Policy Iteration), is presented to address the scalability of MARL algorithms in common-interest Markov games. In OAPI, an ordinal action selection and learning strategy is integrated with distributed approximate policy iteration, both to simplify the policy space and eliminate conflicts in multi-agent coordination, and to approximate near-optimal policies for Markov games with large state spaces. On the policy space simplified by ordinal action selection, OAPI implements distributed approximate policy iteration using online least-squares policy iteration (LSPI). This yields multi-agent coordination with good convergence properties and reduced computational complexity. Simulation results on a coordinated multi-robot navigation task illustrate the feasibility and effectiveness of the proposed approach.
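As a rough illustration of the LSPI building block that OAPI distributes across agents, here is a minimal single-agent Python sketch. The feature map, the sampled transitions, and the point where an ordinal restriction of the action set would enter are all assumptions for illustration, not the paper's implementation.

```python
# Minimal LSPI sketch: LSTD-Q policy evaluation plus greedy improvement.
import numpy as np

def lstdq(samples, phi, policy, gamma, k):
    """LSTD-Q: least-squares fit of Q^pi from (s, a, r, s') samples."""
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))   # action the current policy takes
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)   # ridge term for stability

def lspi(samples, phi, actions, gamma=0.95, k=4, iters=20):
    w = np.zeros(k)
    for _ in range(iters):
        # Greedy policy w.r.t. the current Q estimate. Under an ordinal
        # action-selection scheme, `actions` would be the reduced, ranked
        # subset of joint actions rather than the full joint-action space.
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, policy, gamma, k)
        if np.linalg.norm(w_new - w) < 1e-4:
            break
        w = w_new
    return w

# Tiny smoke test on an assumed 2-state, 2-action problem with one-hot features.
states, actions = [0, 1], [0, 1]
phi = lambda s, a: np.eye(4)[2 * s + a]
rng = np.random.default_rng(0)
samples = [(s, a, float(s == 1), rng.integers(2))
           for s in states for a in actions for _ in range(25)]
w = lspi(samples, phi, actions, k=4)
print("learned Q-weights:", w)
```

In OAPI, each agent would run such a least-squares evaluation/improvement loop over its ordinally reduced action set, which is how the policy-space simplification and the approximation of near-optimal policies combine as the abstract describes.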


2020, Vol. 122, pp. 83-104
Author(s): Galit Ashkenazi-Golan, Catherine Rainer, Eilon Solan
