opponent modeling
Recently Published Documents

Total documents: 42 (last five years: 9)
H-index: 8 (last five years: 1)

2022 · Vol. 73 · pp. 277-327
Author(s): Samer Nashed, Shlomo Zilberstein

Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines.
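To make the components discussed in the survey concrete, the sketch below shows a minimal, illustrative opponent-model interface: observations of the opponent are accumulated and then used to predict its next action. The class and method names are assumptions for illustration and are not taken from the survey.

```python
from collections import Counter, defaultdict


class TabularOpponentModel:
    """Minimal illustrative opponent model: per-state empirical action frequencies."""

    def __init__(self):
        # state -> Counter of observed opponent actions in that state
        self.counts = defaultdict(Counter)

    def observe(self, state, opponent_action):
        """Record one observed opponent action (the learning step)."""
        self.counts[state][opponent_action] += 1

    def predict(self, state):
        """Return a dict mapping opponent actions to estimated probabilities."""
        seen = self.counts[state]
        total = sum(seen.values())
        if total == 0:
            return {}  # no observations yet; the caller should fall back to a prior
        return {action: n / total for action, n in seen.items()}
```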


Games · 2021 · Vol. 12 (3) · pp. 70
Author(s): Erik Brockbank, Edward Vul

In simple dyadic games such as rock, paper, scissors (RPS), people exhibit peculiar sequential dependencies across repeated interactions with a stable opponent. These regularities seem to arise from a mutually adversarial process in which each player tries to outwit the other. What underlies this process, and what are its limits? Here, we offer a novel framework for formally describing and quantifying human adversarial reasoning in the rock, paper, scissors game. We show that this framework enables a precise characterization both of the complexity of the patterned behaviors that people exhibit themselves and of those they appear to exploit in others, which together allow for a quantitative understanding of human opponent modeling abilities. We apply these tools to an experiment in which people played 300 rounds of RPS in stable dyads. We find that although people exhibit very complex move dependencies, they cannot exploit these dependencies in their opponents, indicating a fundamental limitation in people's capacity for adversarial reasoning. Taken together, the results presented here show how the rock, paper, scissors game allows for a precise formalization of human adaptive reasoning abilities.
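As a concrete illustration of the kind of sequential dependency an opponent model could exploit in repeated RPS, the sketch below predicts the opponent's next move from first-order transition frequencies and plays the counter. The predictors analyzed in the paper are richer; this simplified example is an illustrative assumption, not the study's method.

```python
import random
from collections import Counter, defaultdict

# Maps each move to the move that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def exploit_transitions(opponent_moves):
    """Predict the opponent's next move from first-order transition counts and counter it."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(opponent_moves, opponent_moves[1:]):
        transitions[prev][nxt] += 1
    last = opponent_moves[-1] if opponent_moves else None
    if last is None or not transitions[last]:
        return random.choice(list(BEATS))   # no usable history: play uniformly at random
    predicted = transitions[last].most_common(1)[0][0]
    return BEATS[predicted]                 # play the move that beats the predicted move
```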


2021 · Vol. 11 (13) · pp. 6022
Author(s): Victor Sanchez-Anguix, Okan Tunalı, Reyhan Aydoğan, Vicente Julian

In the last few years, we have witnessed a growing body of literature on automated negotiation. Most negotiating agents are either purely self-interested, maximizing their own utility function, or assume a cooperative stance on the part of all parties involved in the negotiation. We argue that, while optimizing one's own utility function is essential, agents in a society should not ignore the opponent's utility in the final agreement, as accounting for it improves the agent's long-term prospects in the system. This article investigates whether it is possible to design a social agent (i.e., one that aims to optimize both sides' utility functions) that still performs efficiently in an agent society. Accordingly, we propose a social agent supported by a portfolio of strategies, a novel tit-for-tat concession mechanism, and a frequency-based opponent modeling mechanism, capable of adapting its behavior according to the opponent's behavior and the state of the negotiation. The results show that the proposed social agent not only performs well on social metrics such as the distance to the Nash bargaining point or the Kalai point, but also constitutes a pure and mixed equilibrium strategy in some realistic agent societies.
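As one possible reading of the frequency-based opponent modeling component, the sketch below scores how attractive a candidate bid is to the opponent from how often its issue values have appeared in the opponent's past offers. The exact weighting used by the proposed agent may differ; the function name and scoring rule are illustrative assumptions.

```python
from collections import Counter, defaultdict


def estimate_opponent_utility(bid, opponent_bids):
    """Score a candidate bid (dict issue -> value) by the relative frequency of its
    issue values in the opponent's past offers; higher scores suggest the opponent
    would find the bid more acceptable."""
    value_counts = defaultdict(Counter)
    for past_bid in opponent_bids:
        for issue, value in past_bid.items():
            value_counts[issue][value] += 1
    n = max(len(opponent_bids), 1)
    score = sum(value_counts[issue][value] / n for issue, value in bid.items())
    return score / max(len(bid), 1)   # normalize to [0, 1]
```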


Author(s): Ying Wen, Yaodong Yang, Jun Wang

Most multi-agent reinforcement learning (MARL) models assume perfectly rational agents, a property rarely met in real-world decision making owing to individuals' cognitive limitations and/or the intractability of the decision problem. In this paper, we introduce generalized recursive reasoning (GR2) as a novel framework for modeling agents with different hierarchical levels of rationality; the framework enables agents to exhibit varying levels of "thinking" ability, thereby allowing higher-level agents to best respond to various less sophisticated learners. We contribute both theoretically and empirically. On the theory side, we formulate the hierarchical GR2 framework through probabilistic graphical models and prove the existence of a perfect Bayesian equilibrium. Within GR2, we propose a practical actor-critic solver and, through Lyapunov analysis, demonstrate that it converges to a stationary point in two-player games. On the empirical side, we validate our findings on a variety of MARL benchmarks: we first illustrate the hierarchical thinking process on the Keynes Beauty Contest and then demonstrate significant improvements over state-of-the-art opponent modeling baselines on normal-form games and the cooperative navigation benchmark.
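GR2 models agents that reason about opponents who themselves reason at lower levels. A hard-max, level-k caricature of this recursion for a two-player normal-form game is sketched below; GR2 itself uses probabilistic (soft) responses within a graphical-model formulation, so this simplification is an illustrative assumption only.

```python
import numpy as np


def level_k_policy(payoff_self, payoff_opponent, k):
    """Return (own action distribution, assumed opponent distribution).

    payoff_self[i, j] / payoff_opponent[i, j] are the payoffs to self / opponent
    when self plays action i and the opponent plays action j.
    """
    n_self, n_opp = payoff_self.shape
    if k == 0:
        # Level 0: no strategic reasoning, play (and assume) uniform random.
        return np.full(n_self, 1.0 / n_self), np.full(n_opp, 1.0 / n_opp)
    # Model the opponent as a level-(k-1) reasoner of the same kind
    # (transpose swaps the role of "self" and "opponent").
    opp_policy, _ = level_k_policy(payoff_opponent.T, payoff_self.T, k - 1)
    expected = payoff_self @ opp_policy        # expected payoff of each own action
    best = np.zeros(n_self)
    best[np.argmax(expected)] = 1.0            # deterministic best response
    return best, opp_policy
```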


Author(s): Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, ...

In the single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o that stands for "optimality". In this paper, we redefine this binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, both theoretically and empirically, how it can improve the performance of learning agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate the two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they outperform strong MARL baselines.
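The abstract names, but does not spell out, the ROMMEO bound. Schematically, a maximum-entropy objective with a regularized opponent model has the shape below, where π is the agent's policy conditioned on the modeled opponent action, ρ is the learned opponent model, and P is the observed opponent behavior; this is an illustrative sketch of the general form, not the paper's exact expression.

```latex
% Schematic form only (illustrative); see the paper for the exact bound.
J(\pi,\rho) \;=\; \mathbb{E}\left[\sum_{t} r_t
  \;+\; \mathcal{H}\big(\pi(a_t \mid s_t, a^{-}_t)\big)
  \;-\; D_{\mathrm{KL}}\big(\rho(a^{-}_t \mid s_t) \,\|\, P(a^{-}_t \mid s_t)\big)\right]
```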


Author(s): Victor Gallego, Roi Naveiro, David Rios Insua

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward-generating process. However, in such non-stationary environments, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretic approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we address the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework for dealing with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
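To illustrate the general idea of augmenting Q-learning with an explicit opponent model (rather than the paper's specific level-k scheme), the sketch below keeps a Q-table indexed by both the DM's and the adversary's actions and acts on the expectation under a frequency-based estimate of the adversary's behavior. All names, the Laplace smoothing, and the update rule are illustrative assumptions.

```python
# Q: defaultdict(float) keyed by (state, own_action, adversary_action);
# counts: defaultdict(Counter) recording the adversary's observed actions per state.
import random


def opponent_probs(counts, state, opp_actions):
    """Laplace-smoothed estimate of the adversary's action distribution in a state."""
    total = sum(counts[state].values())
    return {b: (counts[state][b] + 1) / (total + len(opp_actions)) for b in opp_actions}


def choose_action(Q, counts, state, actions, opp_actions, epsilon=0.1):
    """Epsilon-greedy choice of the action maximizing expected Q under the opponent model."""
    if random.random() < epsilon:
        return random.choice(actions)
    probs = opponent_probs(counts, state, opp_actions)
    return max(actions, key=lambda a: sum(probs[b] * Q[(state, a, b)] for b in opp_actions))


def q_update(Q, counts, state, a, b, reward, next_state, actions, opp_actions,
             alpha=0.1, gamma=0.95):
    """One tabular update of Q(state, a, b) after observing the adversary's action b."""
    counts[state][b] += 1
    probs = opponent_probs(counts, next_state, opp_actions)
    next_value = max(
        sum(probs[nb] * Q[(next_state, na, nb)] for nb in opp_actions) for na in actions
    )
    Q[(state, a, b)] += alpha * (reward + gamma * next_value - Q[(state, a, b)])
```

Here `actions` and `opp_actions` are the finite action sets of the DM and the adversary; with a fixed, fully predictable adversary this reduces to ordinary tabular Q-learning over an augmented table.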


Author(s): José Antonio Iglesias, Agapito Ledezma, Araceli Sanchis
