Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation

Games are a challenging domain for reinforcement learning (RL) because many of them involve multiple players and many unobservable variables in a large state space. Such realistic multiagent problems with partial observability are hard to solve mainly because estimation and prediction over the whole state space, including the unobservable variables, are computationally too expensive. To overcome this intractability and enable an agent to learn in an unknown environment, an effective approximation method that explicitly learns the environmental model is required. We present a model-based RL scheme for large-scale multiagent problems with partial observability and apply it to the card game Hearts. This game is a well-defined example of an imperfect-information game and can be approximately formulated as a partially observable Markov decision process (POMDP) for a single learning agent. To reduce the computational cost, we use a sampling technique in which the heavy integration required for estimation and prediction is approximated with a reasonable number of samples. Computer simulation results show that our method is effective in solving such a difficult, partially observable multiagent problem.
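The sampling idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform dealing of unseen cards, and the toy `value_fn` are all assumptions made for illustration, whereas the paper describes a learned environmental model. The sketch only shows how an expectation over unobservable states can be replaced by an average over sampled states.

```python
import random


def sample_hidden_state(unseen_cards, hand_sizes, rng):
    """Deal the unseen cards at random into the opponents' hands.

    Assumption: every deal is equally likely. A learned model would
    weight the deals instead.
    """
    cards = list(unseen_cards)
    rng.shuffle(cards)
    hands, start = [], 0
    for size in hand_sizes:
        hands.append(cards[start:start + size])
        start += size
    return hands


def estimate_action_value(action, unseen_cards, hand_sizes, value_fn,
                          n_samples, rng):
    """Approximate the expected value of `action` by a Monte Carlo average
    over sampled hidden states, instead of summing over all possible deals."""
    total = 0.0
    for _ in range(n_samples):
        hands = sample_hidden_state(unseen_cards, hand_sizes, rng)
        total += value_fn(action, hands)
    return total / n_samples


def toy_value_fn(action, hands):
    """Hypothetical payoff: 1.0 if no opponent holds a card above `action`."""
    highest = max(card for hand in hands for card in hand)
    return 1.0 if action > highest else 0.0


rng = random.Random(0)
estimate = estimate_action_value(
    action=7, unseen_cards=range(1, 11), hand_sizes=[2, 2, 2],
    value_fn=toy_value_fn, n_samples=1000, rng=rng)
```

With 1000 samples the cost stays fixed while the number of possible deals grows combinatorially with the number of unseen cards, which is the trade the abstract describes.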

On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability

Journal of Artificial Intelligence Research ◽ 10.1613/jair.1.11478 ◽ 2019 ◽ Vol 65 ◽ pp. 1-30 ◽ Cited By ~ 2

Author(s): Vincent Francois-Lavet ◽ Guillaume Rabusseau ◽ Joelle Pineau ◽ Damien Ernst ◽ Raphael Fonteneau

Keyword(s): Reinforcement Learning ◽ Large Scale ◽ Asymptotic Bias ◽ State Representation ◽ Real World Data ◽ Partial Observability ◽ History Of ◽ Batch Reinforcement Learning ◽ Partially Observable ◽ Belief States

This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally shows that a smaller state representation decreases the risk of overfitting, although it may increase the asymptotic bias. This analysis relies on expressing the quality of a state representation by bounding $L_1$ error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, as with known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may improve the tradeoff between asymptotic bias and overfitting in the partially observable context.
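As a rough illustration of the state representation mentioned in this abstract, the Python sketch below keeps only the last `h` observations and uses that tuple as the agent's state. The class name and interface are hypothetical and are not taken from the paper; the sketch only shows what "truncated history of observations" means in practice.

```python
from collections import deque


class TruncatedHistory:
    """State representation made of the last `h` observations.

    A smaller `h` gives fewer distinct states (less room to overfit),
    at the price of discarding older information.
    """

    def __init__(self, h):
        # deque with maxlen drops the oldest item automatically
        self.buffer = deque(maxlen=h)

    def update(self, observation):
        """Append a new observation and return the current state."""
        self.buffer.append(observation)
        return tuple(self.buffer)


history = TruncatedHistory(h=2)
states = [history.update(obs) for obs in ["a", "b", "c"]]
```

Because the returned state is a hashable tuple, it can be used directly as a key in a tabular value function.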

A Model-Based Factored Bayesian Reinforcement Learning Approach

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.513-517.1092 ◽ 2014 ◽ Vol 513-517 ◽ pp. 1092-1095

Author(s): Bo Wu ◽ Yan Peng Feng ◽ Hong Yan Zheng

Keyword(s): Reinforcement Learning ◽ Large Scale ◽ Iteration Algorithm ◽ Value Iteration ◽ Practical Applications ◽ Model Based ◽ Online Planning ◽ Bayesian Reinforcement Learning ◽ Bayesian Inference Method ◽ Unknown Structure

Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the exponential growth in the number of learning parameters is the main impediment to online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. First, we exploit a factored representation to describe the states and so reduce the number of learning parameters, and we adopt a Bayesian inference method to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach is an effective way to improve learning efficiency in large-scale state spaces.
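To make the parameter-count argument concrete, the following Python sketch compares a flat tabular transition model with a factored one. The setting is an assumption chosen for illustration and is not taken from the paper: the state consists of `n` binary variables, and in the factored model each next-state variable depends on at most `k` current variables per action.

```python
def flat_transition_params(n_vars, n_actions):
    """Free parameters of a tabular transition model over n binary variables.

    There are 2**n states, and for every (state, action) pair a distribution
    over 2**n next states has 2**n - 1 free parameters.
    """
    n_states = 2 ** n_vars
    return n_actions * n_states * (n_states - 1)


def factored_transition_params(n_vars, n_actions, max_parents):
    """Free parameters when each next-state variable has at most
    `max_parents` binary parents (one Bernoulli parameter per parent setting)."""
    return n_actions * n_vars * (2 ** max_parents)


flat = flat_transition_params(n_vars=10, n_actions=4)
factored = factored_transition_params(n_vars=10, n_actions=4, max_parents=2)
```

Under these assumptions the flat model needs 4,190,208 parameters and the factored model 160, which shows why the flat count grows exponentially in `n` while the factored count grows only linearly.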

Abstraction in Model Based Partially Observable Reinforcement Learning Using Extended Sequence Trees

2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology ◽ 10.1109/wi-iat.2012.161 ◽ 2012 ◽ Cited By ~ 1

Author(s): Erkin Cilden ◽ Faruk Polat

Keyword(s): Reinforcement Learning ◽ Model Based ◽ Extended Sequence ◽ Partially Observable

Playing Card-Based RTS Games with Deep Reinforcement Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/631 ◽ 2019

Author(s): Tianyu Liu ◽ Zijie Zheng ◽ Hongchang Li ◽ Kaigui Bian ◽ Lingyang Song

Keyword(s): Reinforcement Learning ◽ Decision Tree ◽ State Space ◽ Imperfect Information ◽ Rule Based ◽ Playing Card ◽ Board Games ◽ Deep Model ◽ Game AI ◽ RTS Game

Game AI is of great importance as games are simulations of reality. Recent research on game AI has shown much progress in various kinds of games, such as console games, board games and MOBA games. However, exploration in RTS games remains a challenge because of their huge state space, imperfect information, sparse rewards and varied strategies. In addition, typical card-based RTS games have complex card features and still lack solutions. We present a deep model, SEAT (selection-attention), to play card-based RTS games. The SEAT model includes two parts, a selection part for card choice and an attention part for card usage, and it learns from scratch via deep reinforcement learning. Comprehensive experiments are performed on Clash Royale, a popular mobile card-based RTS game. Empirical results show that the SEAT agent achieves a high win rate against rule-based and decision-tree-based agents.

A View on Deep Reinforcement Learning in Imperfect Information Games

Studia Universitatis Babeș-Bolyai Informatica ◽ 10.24193/subbi.2020.2.03 ◽ 2020 ◽ Vol 65 (2) ◽ pp. 31

Author(s): T.V. Pricope

Keyword(s): Reinforcement Learning ◽ Imperfect Information ◽ Large Scale ◽ Traditional Approach ◽ Search Space ◽ Fictitious Play ◽ Learning Agents ◽ Real World Applications ◽ Imperfect Information Games ◽ Human Player

Many real-world applications can be described as large-scale games of imperfect information. Games of this kind are considerably harder than deterministic ones because the search space is even larger. In this paper, I explore the power of reinforcement learning in such an environment by examining one of the most popular games of this type, no-limit Texas Hold’em Poker, which is still unsolved. I develop multiple agents with different learning paradigms and techniques and then compare their performance. When applied to no-limit Hold’em Poker, deep reinforcement learning agents clearly outperform agents with a more traditional approach. Moreover, while the latter rival a human beginner’s level of play, the agents based on reinforcement learning are comparable to an amateur human player. The main algorithm uses Fictitious Play in combination with ANNs and some handcrafted metrics. We also applied the main algorithm to another game of imperfect information, less complex than Poker, to show the scalability of this solution and the increase in performance when put neck and neck with established classical approaches from the reinforcement learning literature.

Hyperspace Neighbor Penetration Approach to Dynamic Programming for Model-Based Reinforcement Learning Problems with Slowly Changing Variables in a Continuous State Space

10.1109/iccma53594.2021.00018 ◽ 2021

Author(s): Vincent Zha ◽ Ivey Chiu

Keyword(s): Dynamic Programming ◽ Reinforcement Learning ◽ State Space ◽ Learning Problems ◽ Model Based ◽ Continuous State Space ◽ Continuous State

Model-based reinforcement learning for a multi-player card game with partial observability

IEEE/WIC/ACM International Conference on Intelligent Agent Technology ◽ 10.1109/iat.2005.99 ◽ 2006

Author(s): H. Fujita ◽ Shin Ishii

Keyword(s): Reinforcement Learning ◽ Card Game ◽ Partial Observability ◽ Model Based

Efficient Opponent Exploitation in No-Limit Texas Hold’em Poker: A Neuroevolutionary Method Combined with Reinforcement Learning

Electronics ◽ 10.3390/electronics10172087 ◽ 2021 ◽ Vol 10 (17) ◽ pp. 2087

Author(s): Jiahui Xu ◽ Jing Chen ◽ Shaofei Chen

Keyword(s): Reinforcement Learning ◽ Imperfect Information ◽ Large Scale ◽ Higher Learning ◽ New Approach ◽ Hybrid Framework ◽ Gradient Based ◽ Novel Method ◽ Imperfect Information Games

In the development of artificial intelligence (AI), games have often served as benchmarks to promote remarkable breakthroughs in models and algorithms. No-limit Texas Hold’em (NLTH) is one of the most popular and challenging poker games. Although numerous studies have been conducted on this subject, some important problems remain to be solved, such as opponent exploitation, that is, adaptively and effectively exploiting specific opponent strategies; this is acknowledged as a vital issue, especially in NLTH and many real-world scenarios. Previous researchers tried to use an off-policy reinforcement learning (RL) method to train agents that learn directly from historical strategy interactions, but they suffered from the challenge of sparse rewards. Other researchers instead adopted a neuroevolutionary (NE) method to replace RL for policy parameter updates, but they suffered from high sample complexity due to the large-scale nature of NLTH. In this work, we propose NE_RL, a novel method combining NE with RL for opponent exploitation in NLTH. Our method contains a hybrid framework that uses NE’s advantage of evolutionary computation with a long-term fitness metric to address the sparse reward feedback in NLTH and retains RL’s gradient-based method for higher learning efficiency. Experimental results against multiple baseline opponents demonstrate the feasibility of our method, with significant improvement over previous methods. We hope this paper provides an effective new approach for opponent exploitation in NLTH and other large-scale imperfect information games.
