Value Function Dynamic Estimation in Reinforcement Learning based on Data Adequacy

Reinforcement learning is one of the most promising machine learning techniques to get intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms adopts the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This invalidates direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a Reinforcement Learning classic algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors adjust significantly better to the real trajectories.

Download Full-text

Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6036 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5792-5799

Author(s):

Felipe Leno Da Silva ◽

Pablo Hernandez-Leal ◽

Bilal Kartal ◽

Matthew E. Taylor

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Epistemic Uncertainty ◽

Policy Advice ◽

Sequential Decision ◽

Practical Applications ◽

Learning Agents ◽

Standard Value ◽

Random States ◽

Better Than

Although Reinforcement Learning (RL) has been one of the most successful approaches for learning in sequential decision making problems, the sample-complexity of RL techniques still represents a major challenge for practical applications. To combat this challenge, whenever a competent policy (e.g., either a legacy system or a human demonstrator) is available, the agent could leverage samples from this policy (advice) to improve sample-efficiency. However, advice is normally limited, hence it should ideally be directed to states where the agent is uncertain on the best action to execute. In this work, we propose Requesting Confidence-Moderated Policy advice (RCMP), an action-advising framework where the agent asks for advice when its epistemic uncertainty is high for a certain state. RCMP takes into account that the advice is limited and might be suboptimal. We also describe a technique to estimate the agent uncertainty by performing minor modifications in standard value-function-based RL methods. Our empirical evaluations show that RCMP performs better than Importance Advising, not receiving advice, and receiving it at random states in Gridworld and Atari Pong scenarios.

Download Full-text

Incremental State Aggregation for Value Function Estimation in Reinforcement Learning

IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics) ◽

10.1109/tsmcb.2011.2148710 ◽

2011 ◽

Vol 41 (5) ◽

pp. 1407-1416 ◽

Cited By ~ 10

Author(s):

T. Mori ◽

S. Ishii

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Function Estimation ◽

State Aggregation

Download Full-text

Restricted gradient-descent algorithm for value-function approximation in reinforcement learning

Artificial Intelligence ◽

10.1016/j.artint.2007.08.001 ◽

2008 ◽

Vol 172 (4-5) ◽

pp. 454-482 ◽

Cited By ~ 27

Author(s):

André da Motta Salles Barreto ◽

Charles W. Anderson

Keyword(s):

Reinforcement Learning ◽

Function Approximation ◽

Gradient Descent ◽

Value Function ◽

Value Function Approximation ◽

Descent Algorithm ◽

Gradient Descent Algorithm

Download Full-text

A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms

Neural Computation ◽

10.1162/089976699300016070 ◽

1999 ◽

Vol 11 (8) ◽

pp. 2017-2060 ◽

Cited By ~ 70

Author(s):

Csaba Szepesvári ◽

Michael L. Littman

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Learning Algorithm ◽

Learning Algorithms ◽

Sequential Decision ◽

Q Learning ◽

Markov Games ◽

Optimal Behavior ◽

Risk Sensitive ◽

Optimal Value

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.

Download Full-text

Coordinating SON instances: Reinforcement learning with distributed value function

2014 IEEE 25th Annual International Symposium on Personal, Indoor, and Mobile Radio Communication (PIMRC) ◽

10.1109/pimrc.2014.7136431 ◽

2014 ◽

Cited By ~ 2

Author(s):

Ovidiu Iacoboaiea ◽

Berna Sayrac ◽

Sana Ben Jemaa ◽

Pascal Bianchi

Keyword(s):

Reinforcement Learning ◽

Value Function

Download Full-text

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/65 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yong Liu ◽

Yujing Hu ◽

Yang Gao ◽

Yingfeng Chen ◽

Changjie Fan

Keyword(s):

Reinforcement Learning ◽

Knowledge Transfer ◽

Value Function ◽

Single Agent ◽

Multi Agent Systems ◽

Agent Systems ◽

Markov Decision ◽

Dimensional State Space ◽

Multi Agent ◽

Function Transfer

Many real-world problems, such as robot control and soccer game, are naturally modeled as sparse-interaction multi-agent systems. Reutilizing single-agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process. Previous works rely on bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, bisimulation metric is costly to compute and is not suitable for high-dimensional state space problems. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining the MDP similarity based on the N-step return (NSR) values of an MDP. Then, we propose two knowledge transfer methods based on deep neural networks called direct value function transfer and NSR-based value function transfer. We conduct experiments in image-based grid world, multi-agent particle environment (MPE) and Ms. Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning and meanwhile get better asymptotic performance.

Download Full-text

Upper Bounds on the Performance of Discretisation in Reinforcement Learning

South African Computer Journal ◽

10.18489/sacj.v0i57.284 ◽

2015 ◽

Author(s):

Michael Robin Mitchley

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Value Function Approximation ◽

Learning Framework ◽

A Value ◽

Continuous State Space ◽

Policy Representation ◽

Continuous State ◽

Tile Coding ◽

Policy Mapping

Reinforcement learning is a machine learning framework whereby an agent learns to perform a task by maximising its total reward received for selecting actions in each state. The policy mapping states to actions that the agent learns is either represented explicitly, or implicitly through a value function. It is common in reinforcement learning to discretise a continuous state space using tile coding or binary features. We prove an upper bound on the performance of discretisation for direct policy representation or value function approximation.

Download Full-text