HRLB⌃2: A Reinforcement Learning Based Framework for Believable Bots

The creation of believable behaviors for Non-Player Characters (NPCs) is key to improve the players’ experience while playing a game. To achieve this objective, we need to design NPCs that appear to be controlled by a human player. In this paper, we propose a hierarchical reinforcement learning framework for believable bots (HRLB⌃2). This novel approach has been designed so it can overcome two main challenges currently faced in the creation of human-like NPCs. The first difficulty is exploring domains with high-dimensional state–action spaces, while satisfying constraints imposed by traits that characterize human-like behavior. The second problem is generating behavior diversity, by also adapting to the opponent’s playing style. We evaluated the effectiveness of our framework in the domain of the 2D fighting game named Street Fighter IV. The results of our tests demonstrate that our bot behaves in a human-like manner.

Download Full-text

Safe Exploration of State and Action Spaces in Reinforcement Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.3761 ◽

2012 ◽

Vol 45 ◽

pp. 515-564 ◽

Cited By ~ 20

Author(s):

J. Garcia ◽

F. Fernandez

Keyword(s):

Reinforcement Learning ◽

Learning System ◽

Action Space ◽

High Dimensional ◽

State Action ◽

Continuous State ◽

Additional Challenge ◽

Efficient Exploration ◽

Action Spaces ◽

Selection Of

In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises; namely, that of how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.

Download Full-text

Hierarchical Reinforcement Learning Framework for Secure UAV Communication in the Presence of Multiple UAV Adaptive Eavesdroppers

2020 IEEE 6th International Conference on Computer and Communications (ICCC) ◽

10.1109/iccc51575.2020.9344970 ◽

2020 ◽

Author(s):

Liu Jue ◽

Yang Weiwei

Keyword(s):

Reinforcement Learning ◽

Hierarchical Reinforcement Learning ◽

Learning Framework

Download Full-text

Developing reinforcement learning for adaptive co-construction of continuous high-dimensional state and action spaces

Artificial Life and Robotics ◽

10.1007/s10015-012-0041-5 ◽

2012 ◽

Vol 17 (2) ◽

pp. 204-210 ◽

Cited By ~ 4

Author(s):

Masato Nagayoshi ◽

Hajime Murao ◽

Hisashi Tamaki

Keyword(s):

Reinforcement Learning ◽

High Dimensional ◽

Action Spaces

Download Full-text

Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

Neural Computation ◽

10.1162/neco_a_00199 ◽

2011 ◽

Vol 23 (11) ◽

pp. 2798-2832 ◽

Cited By ~ 11

Author(s):

Hirotaka Hachiya ◽

Jan Peters ◽

Masashi Sugiyama

Keyword(s):

Reinforcement Learning ◽

Expectation Maximization ◽

Search Method ◽

Weighted Regression ◽

High Dimensional ◽

Extended Version ◽

Conference Paper ◽

Policy Search ◽

Learning Framework ◽

Sampling Cost

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R[Formula: see text]), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009 .)

Download Full-text

Selective exploration exploiting skills in hierarchical reinforcement learning framework

2013 IEEE/RSJ International Conference on Intelligent Robots and Systems ◽

10.1109/iros.2013.6696426 ◽

2013 ◽

Cited By ~ 1

Author(s):

Gakuto Masuyama ◽

Atsushi Yamashita ◽

Hajime Asama

Keyword(s):

Reinforcement Learning ◽

Hierarchical Reinforcement Learning ◽

Learning Framework

Download Full-text

Improving energy efficiency in Green femtocell networks: A hierarchical reinforcement learning framework

2013 IEEE International Conference on Communications (ICC) ◽

10.1109/icc.2013.6654861 ◽

2013 ◽

Cited By ~ 17

Author(s):

Xianfu Chen ◽

Honggang Zhang ◽

Tao Chen ◽

Mika Lasanen

Keyword(s):

Energy Efficiency ◽

Reinforcement Learning ◽

Femtocell Networks ◽

Hierarchical Reinforcement Learning ◽

Learning Framework

Download Full-text

Hierarchical reinforcement learning for situated natural language generation

Natural Language Engineering ◽

10.1017/s1351324913000375 ◽

2014 ◽

Vol 21 (3) ◽

pp. 391-435 ◽

Cited By ~ 9

Author(s):

NINA DETHLEFS ◽

HERIBERTO CUAYÁHUITL

Keyword(s):

Reinforcement Learning ◽

Natural Language ◽

Natural Language Generation ◽

Evaluation Study ◽

Sufficient Information ◽

Language Generation ◽

Hierarchical Reinforcement Learning ◽

Novel Approach ◽

Large State Space ◽

Performance Results

AbstractNatural Language Generation systems in interactive settings often face a multitude of choices, given that the communicative effect of each utterance they generate depends crucially on the interplay between its physical circumstances, addressee and interaction history. This is particularly true in interactive and situated settings. In this paper we present a novel approach forsituated Natural Language Generationin dialogue that is based onhierarchical reinforcement learningand learns the best utterance for a context by optimisation through trial and error. The model is trained from human–human corpus data and learns particularly to balance the trade-off betweenefficiencyanddetailin giving instructions: the user needs to be given sufficient information to execute their task, but without exceeding their cognitive load. We present results from simulation and a task-based human evaluation study comparing two different versions of hierarchical reinforcement learning: One operates using a hierarchy of policies with a large state space and local knowledge, and the other additionally shares knowledge across generation subtasks to enhance performance. Results show that sharing knowledge across subtasks achieves better performance than learning in isolation, leading to smoother and more successful interactions that are better perceived by human users.

Download Full-text

New Approach in Human-AI Interaction by Reinforcement-Imitation Learning

Applied Sciences ◽

10.3390/app11073068 ◽

2021 ◽

Vol 11 (7) ◽

pp. 3068

Author(s):

Neda Navidi ◽

Rene Landry

Keyword(s):

Imitation Learning ◽

Sequential Decision ◽

Exploration Process ◽

State Action ◽

Reward Function ◽

Novel Approach ◽

Specific Objective ◽

Agent Learning ◽

Promising Solution ◽

Action Spaces

Reinforcement Learning (RL) provides effective results with an agent learning from a stand-alone reward function. However, it presents unique challenges with large amounts of environment states and action spaces, as well as in the determination of rewards. Imitation Learning (IL) offers a promising solution for those challenges using a teacher. In IL, the learning process can take advantage of human-sourced assistance and/or control over the agent and environment. A human teacher and an agent learner are considered in this study. The teacher takes part in the agent’s training towards dealing with the environment, tackling a specific objective, and achieving a predefined goal. This paper proposes a novel approach combining IL with different types of RL methods, namely, state-action-reward-state-action (SARSA) and Asynchronous Advantage Actor–Critic Agents (A3C), to overcome the problems of both stand-alone systems. How to effectively leverage the teacher’s feedback—be it direct binary or indirect detailed—for the agent learner to learn sequential decision-making policies is addressed. The results of this study on various OpenAI-Gym environments show that this algorithmic method can be incorporated with different combinations, and significantly decreases both human endeavors and tedious exploration process.

Download Full-text

Predicting Game Difficulty and Engagement Using AI Players

Proceedings of the ACM on Human-Computer Interaction ◽

10.1145/3474658 ◽

2021 ◽

Vol 5 (CHI PLAY) ◽

pp. 1-17

Author(s):

Shaghayegh Roohi ◽

Christian Guckelsberger ◽

Asko Relas ◽

Henri Heiskanen ◽

Jari Takatalo ◽

...

Keyword(s):

Monte Carlo ◽

Reinforcement Learning ◽

Prediction Accuracy ◽

Selection Strategy ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Average Performance ◽

Novel Approach ◽

Player Modelling ◽

Human Player

This paper presents a novel approach to automated playtesting for the prediction of human player behavior and experience. We have previously demonstrated that Deep Reinforcement Learning (DRL) game-playing agents can predict both game difficulty and player engagement, operationalized as average pass and churn rates. We improve this approach by enhancing DRL with Monte Carlo Tree Search (MCTS). We also motivate an enhanced selection strategy for predictor features, based on the observation that an AI agent's best-case performance can yield stronger correlations with human data than the agent's average performance. Both additions consistently improve the prediction accuracy, and the DRL-enhanced MCTS outperforms both DRL and vanilla MCTS in the hardest levels. We conclude that player modelling via automated playtesting can benefit from combining DRL and MCTS. Moreover, it can be worthwhile to investigate a subset of repeated best AI agent runs, if AI gameplay does not yield good predictions on average.

Download Full-text

Count-Based Exploration in Feature Space for Reinforcement Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/344 ◽

2017 ◽

Cited By ~ 7

Author(s):

Jarryd Martin ◽

Suraj Narayanan S. ◽

Tom Everitt ◽

Marcus Hutter

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Function Approximation ◽

Feature Space ◽

Feature Representation ◽

High Dimensional ◽

Training Experience ◽

Approximation Techniques ◽

State Action ◽

Efficient Exploration

We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our \phi-pseudocount achieves generalisation by exploiting same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The \phi-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.

Download Full-text