Improving RTS Game AI by Supervised Policy Learning, Tactical Search, and Deep Reinforcement Learning

2019 ◽  
Vol 14 (3) ◽  
pp. 8-18 ◽  
Author(s):  
Nicolas A. Barriga ◽  
Marius Stanescu ◽  
Felipe Besoain ◽  
Michael Buro


Author(s):  
Tianyu Liu ◽  
Zijie Zheng ◽  
Hongchang Li ◽  
Kaigui Bian ◽  
Lingyang Song

Game AI is of great importance, as games are simulations of reality. Recent research on game AI has shown much progress in various kinds of games, such as console games, board games, and MOBA games. However, RTS games remain a challenge because of their huge state spaces, imperfect information, sparse rewards, and variety of strategies. Moreover, typical card-based RTS games have complex card features and still lack effective solutions. We present SEAT (selection-attention), a deep model for playing card-based RTS games. The SEAT model consists of two parts, a selection part for card choice and an attention part for card usage, and it learns from scratch via deep reinforcement learning. Comprehensive experiments are performed on Clash Royale, a popular mobile card-based RTS game. Empirical results show that the SEAT agent achieves a high win rate against both rule-based and decision-tree-based agents.
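
A minimal sketch of how such a two-part selection-attention policy might be wired up (all module names, layer sizes, and feature dimensions below are illustrative assumptions, not the paper's architecture):

    import torch
    import torch.nn as nn

    CARD_FEATS = 32        # per-card feature vector size (assumed)
    STATE_FEATS = 128      # encoded game-state size (assumed)
    GRID_CELLS = 18 * 32   # candidate placement cells (assumed)

    class SEATSketch(nn.Module):
        """Two heads: selection (which card) and attention (where to use it)."""
        def __init__(self):
            super().__init__()
            self.state_enc = nn.Linear(STATE_FEATS, 64)
            self.select = nn.Linear(64 + CARD_FEATS, 1)           # scores a card
            self.attend = nn.Linear(64 + CARD_FEATS, GRID_CELLS)  # scores cells

        def forward(self, state, hand):
            # state: (B, STATE_FEATS); hand: (B, hand_size, CARD_FEATS)
            s = torch.relu(self.state_enc(state))
            s_tiled = s.unsqueeze(1).expand(-1, hand.size(1), -1)
            card_logits = self.select(torch.cat([s_tiled, hand], -1)).squeeze(-1)
            card_probs = torch.softmax(card_logits, dim=-1)        # card choice
            # Weight the hand by the selection distribution to condition placement.
            chosen = (card_probs.unsqueeze(-1) * hand).sum(dim=1)
            pos_logits = self.attend(torch.cat([s, chosen], -1))
            return card_probs, torch.softmax(pos_logits, dim=-1)   # card usage

Both heads output distributions, so a policy over the joint (card, position) action can be trained from scratch with a standard policy-gradient objective.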


2021 ◽  
pp. 503-562
Author(s):  
Adil Khan ◽  
Muhammad Naeem ◽  
Asad Masood Khattak ◽  
Muhammad Zubair Asghar ◽  
Abdul Haseeb Malik

Author(s):  
Carles Gelada ◽  
Marc G. Bellemare

In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pioneered by Hallak et al. (2017). Under this method, online updates to the value function are reweighted to avoid divergence issues typical of off-policy learning. While Hallak et al.'s solution is appealing, it cannot easily be transferred to nonlinear function approximation. First, it requires a projection step onto the probability simplex; second, even though the operator describing the expected behavior of the off-policy learning algorithm is convergent, it is not known to be a contraction mapping, and hence may be unstable in practice. We address these two issues by introducing a discount factor into COP-TD. We analyze the behavior of discounted COP-TD and find it better behaved from a theoretical perspective. We also propose an alternative soft normalization penalty that can be minimized online and obviates the need for an explicit projection step. We complement our analysis with an empirical evaluation of the two techniques in an off-policy setting on the game Pong from the Atari domain, where we find discounted COP-TD to be better behaved in practice than the soft normalization penalty. Finally, we perform a more extensive evaluation of discounted COP-TD on five Atari games, where we find performance gains for our approach.
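
A minimal tabular sketch of the discounted update described above (in the paper the ratio model is a deep network trained with a semi-gradient step; the lookup table, learning rate, and variable names here are simplifying assumptions):

    import numpy as np

    def discounted_cop_td_update(c, s, a, s_next, pi, mu,
                                 gamma_hat=0.99, lr=0.1):
        """One online update of the ratio estimate c(x) ~ d_pi(x) / d_mu(x).

        pi, mu are (state, action) probability tables for the target and
        behavior policies; gamma_hat is the discount introduced to make
        the COP-TD operator better behaved.
        """
        # Discounted COP-TD target: gamma_hat * (pi/mu) * c(s) + (1 - gamma_hat)
        target = gamma_hat * (pi[s, a] / mu[s, a]) * c[s] + (1.0 - gamma_hat)
        c[s_next] += lr * (target - c[s_next])
        return c

    # Tiny usage example with 3 states and 2 actions (illustrative numbers):
    c = np.ones(3)
    pi = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
    mu = np.full((3, 2), 0.5)
    c = discounted_cop_td_update(c, s=0, a=0, s_next=1, pi=pi, mu=mu)

The soft normalization penalty would instead nudge the average of c toward 1 during online minimization, avoiding the explicit projection step.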


Author(s):  
Supaphon Kamon ◽  
Tung Due Nguyen ◽  
Tomohiro Harada ◽  
Ruck Thawonmas ◽  
Ikuko Nishikawa

2020 ◽  
Author(s):  
Ao Chen ◽  
Taresh Dewan ◽  
Manva Trivedi ◽  
Danning Jiang ◽  
Aloukik Aditya ◽  
...  

This paper provides a comparative analysis of the Deep Q-Network (DQN) and Double Deep Q-Network (DDQN) algorithms based on their hit rates, with DDQN proving better for the game Breakout. DQN is chosen over basic Q-learning because its neural network can learn a policy in complex environments, and DDQN is chosen because it mitigates the overestimation problem found in basic Q-learning, in which the agent chooses a non-optimal action for a state simply because it has the maximum Q-value.
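
For reference, a minimal sketch of the target computation that distinguishes the two algorithms (network classes, tensor shapes, and names are assumptions for illustration):

    import torch

    def ddqn_target(reward, next_state, done, online_net, target_net,
                    gamma=0.99):
        """Double DQN: the online net picks the action, the target net rates it.

        Plain DQN would instead take target_net(next_state).max(dim=1) here,
        which is the source of the overestimation bias mentioned above.
        """
        with torch.no_grad():
            best_action = online_net(next_state).argmax(dim=1, keepdim=True)
            next_q = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done.float()) * next_q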


Author(s):  
Andrew Anderson ◽  
Jonathan Dodge ◽  
Amrita Sadarangani ◽  
Zoe Juozapaitis ◽  
Evan Newman ◽  
...  

We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124-participant, four-treatment experiment to compare participants' mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency maps and reward bars was needed to achieve a statistically significant improvement in mental-model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.
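
For context, reward-decomposition bars rest on a simple identity: if the reward is split into types (r = sum_c r_c), each type gets its own value estimate and the ordinary Q-value is the sum of the bars. A toy sketch with made-up reward types and numbers (not the study's actual game rewards):

    # Hypothetical per-type Q estimates for one (state, action) pair.
    bars = {"damage_dealt": 1.8, "damage_taken": -0.6, "resources": 0.4}

    def total_q(decomposed):
        """Standard Q(s, a) is recovered as the sum of the per-type bars."""
        return sum(decomposed.values())

    print(total_q(bars))  # 1.6, the value shown alongside the bars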

