Learning Macromanagement in Starcraft by Deep Reinforcement Learning

Wenzhen Huang; Qiyue Yin; Junge Zhang; Kaiqi Huang

doi:10.3390/s21103332

Learning Macromanagement in Starcraft by Deep Reinforcement Learning

Sensors ◽

10.3390/s21103332 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3332

Author(s):

Wenzhen Huang ◽

Qiyue Yin ◽

Junge Zhang ◽

Kaiqi Huang

Keyword(s):

Reinforcement Learning ◽

Imperfect Information ◽

Expert Knowledge ◽

High Rate ◽

Current State ◽

Policy Gradient ◽

Novel Method ◽

Strategy Game ◽

Real Time Strategy Game ◽

Very High

StarCraft is a real-time strategy game that provides a complex environment for AI research. Macromanagement, i.e., selecting appropriate units to build depending on the current state, is one of the most important problems in this game. To reduce the requirements for expert knowledge and enhance the coordination of the systematic bot, we select reinforcement learning (RL) to tackle the problem of macromanagement. We propose a novel deep RL method, Mean Asynchronous Advantage Actor-Critic (MA3C), which computes the approximate expected policy gradient instead of the gradient of sampled action to reduce the variance of the gradient, and encode the history queue with recurrent neural network to tackle the problem of imperfect information. The experimental results show that MA3C achieves a very high rate of winning, approximately 90%, against the weaker opponents and it improves the win rate about 30% against the stronger opponents. We also propose a novel method to visualize and interpret the policy learned by MA3C. Combined with the visualized results and the snapshots of games, we find that the learned macromanagement not only adapts to the game rules and the policy of the opponent bot, but also cooperates well with the other modules of MA3C-Bot.

Combining Case-Based Reasoning and Reinforcement Learning for Unit Navigation in Real-Time Strategy Game AI

Case-Based Reasoning Research and Development - Lecture Notes in Computer Science ◽

10.1007/978-3-319-11209-1_36 ◽

2014 ◽

pp. 511-525 ◽

Cited By ~ 10

Author(s):

Stefan Wender ◽

Ian Watson

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Case Based Reasoning ◽

Game Ai ◽

Strategy Game ◽

Real Time Strategy Game ◽

Case Based

Integrating Case-Based Reasoning with Reinforcement Learning for Real-Time Strategy Game Micromanagement

Lecture Notes in Computer Science - PRICAI 2014: Trends in Artificial Intelligence ◽

10.1007/978-3-319-13560-1_6 ◽

2014 ◽

pp. 64-76 ◽

Cited By ~ 3

Author(s):

Stefan Wender ◽

Ian Watson

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Case Based Reasoning ◽

Strategy Game ◽

Real Time Strategy Game ◽

Case Based

Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft:Broodwar

2012 IEEE Conference on Computational Intelligence and Games (CIG) ◽

10.1109/cig.2012.6374183 ◽

2012 ◽

Cited By ~ 33

Author(s):

Stefan Wender ◽

Ian Watson

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Small Scale ◽

The Real ◽

Strategy Game ◽

Real Time Strategy Game

ANALYSIS OF THE APPLICATION OF REINFORCEMENT LEARNING ALGORITHMS ON THE STARCRAFT II VIDEO GAME

Revista Destaques Acadêmicos ◽

10.22410/issn.2176-3070.v11i4a2019.2403 ◽

2020 ◽

Vol 11 (4) ◽

Author(s):

Leandro Vian ◽

Marcelo De Gomensoro Malheiros

Keyword(s):

Reinforcement Learning ◽

Video Game ◽

Cost Effective ◽

Machine Learning Techniques ◽

Specific Training ◽

Board Games ◽

Learning Techniques ◽

Technical Issues ◽

Strategy Game ◽

Real Time Strategy Game

In recent years Machine Learning techniques have become the driving force behind the worldwide emergence of Artificial Intelligence, producing cost-effective and precise tools for pattern recognition and data analysis. A particular approach for the training of neural networks, Reinforcement Learning (RL), achieved prominence creating almost unbeatable artificial opponents in board games like Chess or Go, and also on video games. This paper gives an overview of Reinforcement Learning and tests this approach against a very popular real-time strategy game, Starcraft II. Our goal is to examine the tools and algorithms readily available for RL, also addressing different scenarios where a neural network can be linked to Starcraft II to learn by itself. This work describes both the technical issues involved and the preliminary results obtained by the application of two specific training strategies, A2C and DQN.

Automated Design of Energy Efficient Control Strategies for Building Clusters Using Reinforcement Learning

Journal of Mechanical Design ◽

10.1115/1.4041629 ◽

2018 ◽

Vol 141 (2) ◽

Cited By ~ 2

Author(s):

Philip Odonkor ◽

Kemper Lewis

Keyword(s):

Reinforcement Learning ◽

Control Strategies ◽

Action Space ◽

Continuous Control ◽

Energy Demands ◽

Current State ◽

Decision Cycle ◽

Policy Gradient ◽

Discrete Action ◽

Energy Assets

The control of shared energy assets within building clusters has traditionally been confined to a discrete action space, owing in part to a computationally intractable decision space. In this work, we leverage the current state of the art in reinforcement learning (RL) for continuous control tasks, the deep deterministic policy gradient (DDPG) algorithm, toward addressing this limitation. The goals of this paper are twofold: (i) to design an efficient charged/discharged dispatch policy for a shared battery system within a building cluster and (ii) to address the continuous domain task of determining how much energy should be charged/discharged at each decision cycle. Experimentally, our results demonstrate an ability to exploit factors such as energy arbitrage, along with the continuous action space toward demand peak minimization. This approach is shown to be computationally tractable, achieving efficient results after only 5 h of simulation. Additionally, the agent showed an ability to adapt to different building clusters, designing unique control strategies to address the energy demands of the clusters studied.

Efficient Opponent Exploitation in No-Limit Texas Hold’em Poker: A Neuroevolutionary Method Combined with Reinforcement Learning

Electronics ◽

10.3390/electronics10172087 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2087

Author(s):

Jiahui Xu ◽

Jing Chen ◽

Shaofei Chen

Keyword(s):

Reinforcement Learning ◽

Imperfect Information ◽

Large Scale ◽

Higher Learning ◽

New Approach ◽

Hybrid Framework ◽

Gradient Based ◽

Novel Method ◽

Imperfect Information Games

In the development of artificial intelligence (AI), games have often served as benchmarks to promote remarkable breakthroughs in models and algorithms. No-limit Texas Hold’em (NLTH) is one of the most popular and challenging poker games. Despite numerous studies having been conducted on this subject, there are still some important problems that remain to be solved, such as opponent exploitation, which means to adaptively and effectively exploit specific opponent strategies; this is acknowledged as a vital issue especially in NLTH and many real-world scenarios. Previous researchers tried to use an off-policy reinforcement learning (RL) method to train agents that directly learn from historical strategy interactions but suffered from challenges of sparse rewards. Other researchers instead adopted neuroevolutionary (NE) method to replace RL for policy parameter updates but suffered from high sample complexity due to the large-scale problem of NLTH. In this work, we propose NE_RL, a novel method combing NE with RL for opponent exploitation in NLTH. Our method contains a hybrid framework that uses NE’s advantage of evolutionary computation with a long-term fitness metric to address the sparse rewards feedback in NLTH and retains RL’s gradient-based method for higher learning efficiency. Experimental results against multiple baseline opponents have proved the feasibility of our method with significant improvement compared to previous methods. We hope this paper provides an effective new approach for opponent exploitation in NLTH and other large-scale imperfect information games.

Modelling Sentence Pairs via Reinforcement Learning: An Actor-Critic Approach to Learn the Irrelevant Words

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6230 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7358-7366

Author(s):

Mahtab Ahmed ◽

Robert E. Mercer

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Language Processing ◽

Learning Task ◽

Strong Impact ◽

Sequential Decision ◽

Sentence Pair ◽

Current State ◽

Policy Gradient ◽

The Rich

Learning sentence representation is a fundamental task in Natural Language Processing. Most of the existing sentence pair modelling architectures focus only on extracting and using the rich sentence pair features. The drawback of utilizing all of these features makes the learning process much harder. In this study, we propose a reinforcement learning (RL) method to learn a sentence pair representation when performing tasks like semantic similarity, paraphrase identification, and question-answer pair modelling. We formulate this learning problem as a sequential decision making task where the decision made in the current state will have a strong impact on the following decisions. We address this decision making with a policy gradient RL method which chooses the irrelevant words to delete by looking at the sub-optimal representation of the sentences being compared. With this policy, extensive experiments show that our model achieves on par performance when learning task-specific representations of sentence pairs without needing any further knowledge like parse trees. We suggest that the simplicity of each task inference provided by our RL model makes it easier to explain.

Combining Case-Based Reasoning and Reinforcement Learning for Tactical Unit Selection in Real-Time Strategy Game AI

Case-Based Reasoning Research and Development - Lecture Notes in Computer Science ◽

10.1007/978-3-319-47096-2_28 ◽

2016 ◽

pp. 413-429 ◽

Cited By ~ 2

Author(s):

Stefan Wender ◽

Ian Watson

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Case Based Reasoning ◽

Unit Selection ◽

Game Ai ◽

Strategy Game ◽

Real Time Strategy Game ◽

Case Based

A Multiagent Potential Field-Based Bot for Real-Time Strategy Games

International Journal of Computer Games Technology ◽

10.1155/2009/910819 ◽

2009 ◽

Vol 2009 ◽

pp. 1-10 ◽

Cited By ~ 16

Author(s):

Johan Hagelbäck ◽

Stefan J. Johansson

Keyword(s):

Real Time ◽

Potential Field ◽

Imperfect Information ◽

State Of The Art ◽

Performance Standards ◽

Dynamic Environments ◽

Potential Fields ◽

Strategy Games ◽

Strategy Game ◽

Real Time Strategy Game

Bots for real-time strategy (RTS) games may be very challenging to implement. A bot controls a number of units that will have to navigate in a partially unknown environment, while at the same time avoid each other, search for enemies, and coordinate attacks to fight them down. Potential fields are a technique originating from the area of robotics where it is used in controlling the navigation of robots in dynamic environments. Although attempts have been made to transfer the technology to the gaming sector, assumed problems with efficiency and high costs for implementation have made the industry reluctant to adopt it. We present a multiagent potential field-based bot architecture that is evaluated in two different real-time strategy game settings and compare them, both in terms of performance, and in terms of softer attributes such as configurability with other state-of-the-art solutions. We show that the solution is a highly configurable bot that can match the performance standards of traditional RTS bots. Furthermore, we show that our approach deals with Fog of War (imperfect information about the opponent units) surprisingly well. We also show that a multiagent potential field-based bot is highly competitive in a resource gathering scenario.

Explainable Reinforcement Learning through a Causal Lens

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5631 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2493-2500

Author(s):

Prashan Madumal ◽

Tim Miller ◽

Liz Sonenberg ◽

Frank Vetere

Keyword(s):

Reinforcement Learning ◽

Causal Model ◽

Causal Models ◽

Causal Relationships ◽

Causal Explanations ◽

Making Sense ◽

Model Free ◽

The World ◽

Strategy Game ◽

Real Time Strategy Game

Prominent theories in cognitive science propose that humans understand and represent the knowledge of the world through causal relationships. In making sense of the world, we build causal models in our mind to encode cause-effect relations of events and use these to explain why new events happen by referring to counterfactuals — things that did not happen. In this paper, we use causal models to derive causal explanations of the behaviour of model-free reinforcement learning agents. We present an approach that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest. This model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model. We computationally evaluate the model in 6 domains and measure performance and task prediction accuracy. We report on a study with 120 participants who observe agents playing a real-time strategy game (Starcraft II) and then receive explanations of the agents' behaviour. We investigate: 1) participants' understanding gained by explanations through task prediction; 2) explanation satisfaction and 3) trust. Our results show that causal model explanations perform better on these measures compared to two other baseline explanation models.