Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

2020 ◽  
Vol 34 (04) ◽  
pp. 6672-6679 ◽  
Author(s):  
Deheng Ye ◽  
Zhao Liu ◽  
Mingfei Sun ◽  
Bei Shi ◽  
Peilin Zhao ◽  
...  

We study the reinforcement learning problem of complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari, which makes it very difficult to search for any policy with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system features low coupling and high scalability, which enables efficient exploration at large scale. Our algorithm includes several novel strategies: control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be trained effectively in our system. Tested on the MOBA game Honor of Kings, the trained AI agents can defeat top professional human players in full 1v1 games.
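The dual-clip PPO objective named in this abstract has a simple closed form: when the advantage is negative, the standard clipped surrogate is additionally bounded from below by c times the advantage. A minimal sketch follows; the hyperparameter values for eps and c are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def dual_clip_ppo_loss(ratio, advantage, eps=0.2, c=3.0):
    """Dual-clip PPO surrogate loss (to be minimized).

    Standard PPO clips the probability ratio to [1 - eps, 1 + eps];
    dual-clip additionally bounds the objective from below by
    c * advantage when the advantage is negative, limiting the size
    of updates on strongly off-policy samples.
    """
    standard = np.minimum(ratio * advantage,
                          np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
    dual = np.maximum(standard, c * advantage)
    # The extra lower clip applies only where the advantage is negative.
    objective = np.where(advantage < 0, dual, standard)
    return -objective.mean()

# Example batch of probability ratios and advantage estimates.
print(dual_clip_ppo_loss(np.array([0.5, 1.0, 2.5, 4.0]),
                         np.array([1.0, -1.0, 0.5, -2.0])))
```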

AI Magazine ◽  
2011 ◽  
Vol 32 (1) ◽  
pp. 15 ◽  
Author(s):  
Matthew E. Taylor ◽  
Peter Stone

Transfer learning has recently gained popularity due to the development of algorithms that can successfully generalize information across multiple tasks. This article focuses on transfer in the context of reinforcement learning domains, a general learning framework where an agent acts in an environment to maximize a reward signal. The goals of this article are to (1) familiarize readers with the transfer learning problem in reinforcement learning domains, (2) explain why the problem is both interesting and difficult, (3) present a selection of existing techniques that demonstrate different solutions, and (4) provide representative open problems in the hope of encouraging additional research in this exciting area.
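As a concrete illustration of the transfer problem (a generic warm-start scheme, not a method from the article): one common approach initializes the learner for a target task from a source task's learned values through inter-task mappings. The mappings and values below are hypothetical.

```python
def transfer_q_values(source_q, state_map, action_map, target_shape):
    """Initialize a target-task Q-table from a learned source-task one.

    source_q: dict (source_state, source_action) -> value
    state_map / action_map: hypothetical inter-task mappings from
        target states/actions to source states/actions.
    """
    n_states, n_actions = target_shape
    return {(s, a): source_q.get((state_map(s), action_map(a)), 0.0)
            for s in range(n_states) for a in range(n_actions)}

# A learned 2-state, 2-action source task...
source_q = {(0, 0): 1.0, (0, 1): -0.5, (1, 0): 0.2, (1, 1): 0.8}
# ...warm-starts a 4-state target task through a simple mapping.
q0 = transfer_q_values(source_q, state_map=lambda s: s % 2,
                       action_map=lambda a: a, target_shape=(4, 2))
print(q0[(2, 1)])  # inherits the source value for (0, 1): -0.5
```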


Author(s):  
Alberto Maria Metelli

Reinforcement Learning (RL) has emerged as an effective approach to address a variety of complex control tasks. In a typical RL problem, an agent interacts with the environment by perceiving observations and performing actions, with the ultimate goal of maximizing the cumulative reward. In the traditional formulation, the environment is assumed to be a fixed entity that cannot be externally controlled. However, there exist several real-world scenarios in which the environment offers the opportunity to configure some of its parameters, with diverse effects on the agent's learning process. In this contribution, we provide an overview of the main aspects of environment configurability. We start by introducing the formalism of Configurable Markov Decision Processes (Conf-MDPs) and illustrating the solution concepts. Then, we review the algorithms for solving the learning problem in Conf-MDPs. Finally, we present two applications of Conf-MDPs: policy space identification and control frequency adaptation.
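A toy sketch of the Conf-MDP idea, assuming a single configurable transition parameter (the corridor dynamics and the "slip" parameter below are illustrative, not from the contribution): the environment configuration is chosen jointly with the agent's behavior to maximize return.

```python
import random

def rollout(slip, policy, steps=50):
    """Return of one episode in a toy corridor with configurable slip."""
    state, ret = 0, 0.0
    for _ in range(steps):
        action = policy(state)
        if random.random() < slip:         # configured transition noise
            action = 1 - action
        state = max(0, min(4, state + (1 if action else -1)))
        ret += 1.0 if state == 4 else 0.0  # reward at the goal cell
    return ret

policy = lambda s: 1  # the agent always moves right
# Configuration search: pick the slip level under which the agent's
# current policy earns the highest average return.
best = max([0.0, 0.1, 0.3],
           key=lambda p: sum(rollout(p, policy) for _ in range(200)))
print("best slip configuration:", best)
```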


2018 ◽  
Vol 8 (12) ◽  
pp. 2453 ◽  
Author(s):  
Christian Arzate Cruz ◽  
Jorge Ramirez Uresti

The creation of believable behaviors for Non-Player Characters (NPCs) is key to improving the players' experience of a game. To achieve this objective, we need to design NPCs that appear to be controlled by a human player. In this paper, we propose a hierarchical reinforcement learning framework for believable bots (HRLB^2). This novel approach has been designed to overcome two main challenges currently faced in the creation of human-like NPCs. The first difficulty is exploring domains with high-dimensional state–action spaces while satisfying constraints imposed by traits that characterize human-like behavior. The second problem is generating behavior diversity, by adapting to the opponent's playing style. We evaluated the effectiveness of our framework in the domain of the 2D fighting game Street Fighter IV. The results of our tests demonstrate that our bot behaves in a human-like manner.
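A minimal sketch of the kind of two-level hierarchy such a framework implies (the sub-behavior names and the adaptation rule are hypothetical, not the HRLB^2 design): a meta-controller selects a sub-behavior conditioned on the opponent's style, and the chosen sub-behavior emits primitive actions.

```python
import random

# Hypothetical sub-behaviors; each maps a game state to a primitive action.
SUB_BEHAVIORS = {
    "pressure": lambda state: "attack",
    "zone":     lambda state: "keep_distance",
    "bait":     lambda state: "feint",
}

def meta_policy(opponent_style):
    """High level: pick a sub-behavior suited to the opponent's style."""
    if opponent_style == "aggressive":
        return "zone"
    return random.choice(["pressure", "bait"])  # vary for behavior diversity

opponent_style = "aggressive"
behavior = meta_policy(opponent_style)  # high-level decision
action = SUB_BEHAVIORS[behavior]({})    # low-level primitive action
print(behavior, "->", action)
```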


2020 ◽  
Vol 34 (04) ◽  
pp. 4328-4336
Author(s):  
Vishal Jain ◽  
William Fedus ◽  
Hugo Larochelle ◽  
Doina Precup ◽  
Marc G. Bellemare

Text-based games are a natural challenge domain for deep reinforcement learning algorithms. Their state and action spaces are combinatorially large, their reward function is sparse, and they are partially observable: the agent is informed of the consequences of its actions through textual feedback. In this paper we emphasize this latter point and consider the design of a deep reinforcement learning agent that can play from feedback alone. Our design recognizes and takes advantage of the structural characteristics of text-based games. We first propose a contextualisation mechanism, based on accumulated reward, which simplifies the learning problem and mitigates partial observability. We then study different methods that rely on the notion that most actions are ineffectual in any given situation, following Zahavy et al.'s idea of an admissible action. We evaluate these techniques in a series of text-based games of increasing difficulty based on the TextWorld framework, as well as the iconic game Zork. Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.
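A minimal sketch of admissible-action filtering in a text-based game, under the simplifying assumption that an action is inadmissible when the game answers with a generic failure message (the messages and the feedback helper below are hypothetical):

```python
# Generic failure responses, assumed to signal an inadmissible action.
FAILURE_MESSAGES = {"You can't do that.", "Nothing happens."}

def admissible_actions(candidates, feedback_of):
    """Keep actions whose textual feedback suggests they had an effect.

    feedback_of: hypothetical callable mapping an action string to the
    game's textual response in the current state.
    """
    return [a for a in candidates if feedback_of(a) not in FAILURE_MESSAGES]

feedback = {"open mailbox": "Opening the mailbox reveals a leaflet.",
            "eat house": "You can't do that."}
print(admissible_actions(feedback, feedback.get))  # ['open mailbox']
```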


Author(s):  
Anderson Rocha Tavares ◽  
Sivasubramanian Anbalagan ◽  
Leandro Soriano Marcolino ◽  
Luiz Chaimowicz

Large state and action spaces are very challenging for reinforcement learning. However, in many domains there is a set of algorithms available that estimate the best action given a state. Hence, agents can learn a performance-maximizing mapping either from states to actions directly, or from states to algorithms. We investigate several aspects of this dilemma, showing sufficient conditions for learning over algorithms to outperform learning over actions for a finite number of training iterations. We present synthetic experiments to further study such systems. Finally, we propose a function approximation approach, demonstrating the effectiveness of learning over algorithms in real-time strategy games.
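A small sketch of learning over algorithms rather than actions: the agent's choice set is a portfolio of existing algorithms, each mapping a state to an action, and an epsilon-greedy learner estimates the value of each algorithm. The algorithm names and reward signal below are invented for illustration.

```python
import random

# The "actions" of the meta-agent are whole algorithms.
ALGORITHMS = {
    "rush":   lambda state: "attack_base",
    "defend": lambda state: "build_turret",
}

def reward_of(action):
    # Stand-in for the game's outcome signal.
    return 1.0 if action == "attack_base" else 0.3

q = {name: 0.0 for name in ALGORITHMS}
counts = {name: 0 for name in ALGORITHMS}

for _ in range(500):
    # Epsilon-greedy selection over algorithms instead of raw actions.
    if random.random() < 0.1:
        name = random.choice(list(ALGORITHMS))
    else:
        name = max(q, key=q.get)
    r = reward_of(ALGORITHMS[name](None))
    counts[name] += 1
    q[name] += (r - q[name]) / counts[name]  # incremental mean update

print(max(q, key=q.get))  # expected: "rush"
```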


2020 ◽  
Vol 48 (3) ◽  
pp. 129-136
Author(s):  
Qihang Wu ◽  
Daifeng Li ◽  
Lu Huang ◽  
Biyun Ye

Purpose – Entity relation extraction is an important research direction for obtaining structured information. However, most current methods determine the relations between entities in a given sentence in a stepwise manner, seldom considering entities and relations within a unified framework. Joint learning, which extracts entities and relations together, is a promising solution. This paper aims to optimize a hierarchical reinforcement learning framework and provide an efficient model for entity relation extraction.

Design/methodology/approach – This paper builds on a hierarchical reinforcement learning framework for joint learning and combines the model with BERT, a state-of-the-art language representation model, to optimize the word embedding and encoding process. In addition, this paper adjusts some punctuation marks to make the data set more standardized and introduces positional information to improve the performance of the model.

Findings – Experiments show that the model proposed in this paper outperforms the baseline model by 13%, achieving an F1 score of 0.742 on the NYT10 data set. The model can effectively extract entities and relations from large-scale unstructured text and can be applied to multi-domain information retrieval, intelligent understanding and intelligent interaction.

Originality/value – The research provides an efficient solution for researchers in different domains to use artificial intelligence (AI) technologies to process unstructured text more accurately.
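A minimal sketch of the encoding step described above, assuming the Hugging Face transformers API and plain-string entity markers as a stand-in for the paper's positional information (the marker strings and model choice are assumptions, not the paper's exact setup):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def encode_with_entity_markers(sentence, head, tail):
    """Encode a sentence with plain-string entity position markers."""
    marked = (sentence.replace(head, f"[E1] {head} [/E1]")
                      .replace(tail, f"[E2] {tail} [/E2]"))
    inputs = tokenizer(marked, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state  # one contextual vector per token

h = encode_with_entity_markers(
    "Steve Jobs founded Apple in California.", "Steve Jobs", "Apple")
print(h.shape)  # (1, sequence_length, 768)
```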


2020 ◽  
Vol 5 (46) ◽  
pp. eabb9764
Author(s):  
Dong-Ok Won ◽  
Klaus-Robert Müller ◽  
Seong-Whan Lee

The game of curling can be considered a good test bed for studying the interaction between artificial intelligence systems and the real world. In curling, the environmental characteristics change at every moment, and every throw has an impact on the outcome of the match. Furthermore, there is no time for relearning during a curling match due to the timing rules of the game. Here, we report a curling robot that can achieve human-level performance in the game of curling using an adaptive deep reinforcement learning framework. Our proposed adaptation framework extends standard deep reinforcement learning with temporal features that learn to compensate for the uncertainties and nonstationarities that are an unavoidable part of curling. Our curling robot, Curly, was able to win three of four official matches against expert human teams [top-ranked women's curling teams and the Korea national wheelchair curling team (reserve team)]. These results indicate that the gap between physics-based simulators and the real world can be narrowed.
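One way to picture adaptation through temporal features, as a loose sketch only (the feature choices below are assumptions, not the published design): summarize recent throw errors into a drift estimate and let the controller offset its aim accordingly.

```python
from collections import deque
import numpy as np

recent_errors = deque(maxlen=5)  # lateral error of the last few throws

def temporal_features():
    """Summarize recent errors into a drift estimate and a spread."""
    if not recent_errors:
        return np.zeros(2)
    e = np.array(recent_errors)
    return np.array([e.mean(), e.std()])

def adjusted_aim(target, drift):
    # Offset the aim point by the estimated drift of the ice.
    return target - drift

for observed_error in [0.10, 0.12, 0.15]:  # throws curl further each end
    recent_errors.append(observed_error)

drift, spread = temporal_features()
print(adjusted_aim(target=1.0, drift=drift))
```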


Author(s):  
Tianpei Yang ◽  
Jianye Hao ◽  
Zhaopeng Meng ◽  
Zongzhang Zhang ◽  
Yujing Hu ◽  
...  

Transfer learning has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks. Existing approaches either transfer previous knowledge by explicitly computing similarities between tasks or select appropriate source policies to provide guided exploration. However, an approach that directly optimizes the target policy by selectively utilizing knowledge from appropriate source policies, without explicitly measuring task similarity, has been missing. In this paper, we propose a novel Policy Transfer Framework (PTF) that takes advantage of this idea. PTF learns when and which source policy is best to reuse for the target policy, and when to terminate it, by modeling multi-policy transfer as an option learning problem. PTF can be easily combined with existing deep RL methods, and experimental results show that it significantly accelerates RL and surpasses state-of-the-art policy transfer methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.
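A minimal sketch of the option view of policy transfer described above: each source policy becomes an option with a termination probability, and the agent estimates a value per option to decide which one to reuse. All names, rewards, and probabilities below are illustrative.

```python
import random

source_policies = {
    "task_A": lambda s: 0,  # reused policies from earlier tasks
    "task_B": lambda s: 1,
}
termination = {"task_A": 0.2, "task_B": 0.2}  # P(option ends per step)
option_value = {k: 0.0 for k in source_policies}
visits = {k: 0 for k in source_policies}

def env_reward(action):
    # Stand-in for the target task's reward signal.
    return 1.0 if action == 1 else 0.0

for _ in range(300):
    # Epsilon-greedy choice of which source policy (option) to reuse.
    if random.random() < 0.2:
        opt = random.choice(list(source_policies))
    else:
        opt = max(option_value, key=option_value.get)
    ret = 0.0
    while True:  # follow the chosen source policy...
        ret += env_reward(source_policies[opt](0))
        if random.random() < termination[opt]:
            break  # ...until the option terminates
    visits[opt] += 1
    option_value[opt] += (ret - option_value[opt]) / visits[opt]

print(max(option_value, key=option_value.get))  # expected: "task_B"
```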


Author(s):  
B. Aparna ◽  
S. Madhavi ◽  
G. Mounika ◽  
P. Avinash ◽  
S. Chakravarthi

We propose a new design for large-scale multimedia content protection systems. Our design leverages cloud infrastructures to provide cost efficiency, rapid deployment, scalability, and elasticity to accommodate varying workloads. The proposed system can be used to protect different multimedia content types, including videos, images, audio clips, songs, and music clips. The system can be deployed on private and/or public clouds. Our system has two novel components: (i) a method for creating signatures of videos, and (ii) a distributed matching engine for multimedia objects. The signature method creates robust and representative signatures that capture the depth signals in videos; these signatures are computationally efficient to compute and compare, and they require little storage. The distributed matching engine achieves high scalability and is designed to support different multimedia objects. We implemented the proposed system and deployed it on two clouds: the Amazon cloud and our private cloud. Our experiments with more than 11,000 videos and 1 million images show the high accuracy and scalability of the proposed system. In addition, we compared our system to the protection system used by YouTube; our results show that the YouTube protection system fails to detect most copies of videos, while our system detects more than 98% of them.
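As a rough sketch of signature-based copy detection (a generic frame-hashing scheme, not the authors' depth-signal method): each video is reduced once to a small signature set, and candidate pairs are scored by set overlap.

```python
import hashlib

def video_signature(frames, every=10):
    """Hash every Nth frame's raw bytes into a compact signature set."""
    return {hashlib.md5(f).hexdigest()[:8] for f in frames[::every]}

def similarity(sig_a, sig_b):
    # Jaccard overlap between two signature sets.
    return len(sig_a & sig_b) / max(1, len(sig_a | sig_b))

original = [bytes([i]) * 16 for i in range(100)]  # stand-in frames
partial_copy = original[:80] + [b"x" * 16] * 20   # last fifth replaced
print(similarity(video_signature(original), video_signature(partial_copy)))
```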

