Upper Bounds on the Performance of Discretisation in Reinforcement Learning

South African Computer Journal ◽

10.18489/sacj.v0i57.284 ◽

2015 ◽

Author(s):

Michael Robin Mitchley

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Value Function Approximation ◽

Learning Framework ◽

A Value ◽

Continuous State Space ◽

Policy Representation ◽

Continuous State ◽

Tile Coding ◽

Policy Mapping

Reinforcement learning is a machine learning framework whereby an agent learns to perform a task by maximising its total reward received for selecting actions in each state. The policy mapping states to actions that the agent learns is either represented explicitly, or implicitly through a value function. It is common in reinforcement learning to discretise a continuous state space using tile coding or binary features. We prove an upper bound on the performance of discretisation for direct policy representation or value function approximation.
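As an illustration of the discretisation this abstract refers to, the following is a minimal sketch of one-dimensional tile coding. It is not taken from the paper: the function name `tile_indices`, its parameters, and the uniform-offset scheme are assumptions chosen for clarity.

```python
import math


def tile_indices(x, low, high, tiles_per_dim, num_tilings):
    """Return one active tile index per tiling for a scalar state x in [low, high].

    Hypothetical helper for illustration: each tiling is the same uniform grid,
    shifted by a different fraction of one tile width.
    """
    width = (high - low) / tiles_per_dim
    active = []
    for t in range(num_tilings):
        # Shift tiling t by t/num_tilings of a tile width.
        offset = t * width / num_tilings
        idx = int(math.floor((x - low + offset) / width))
        # A shifted tiling needs one extra tile to cover the whole range.
        idx = min(max(idx, 0), tiles_per_dim)
        # Give every tiling its own block of indices.
        active.append(t * (tiles_per_dim + 1) + idx)
    return active


def binary_features(active, tiles_per_dim, num_tilings):
    """Turn active tile indices into a binary feature vector."""
    features = [0] * (num_tilings * (tiles_per_dim + 1))
    for i in active:
        features[i] = 1
    return features


# Two nearby states share a tile in tiling 0 but fall in different tiles of tiling 1.
print(tile_indices(0.0, 0.0, 1.0, 4, 2))  # [0, 5]
print(tile_indices(0.2, 0.0, 1.0, 4, 2))  # [0, 6]
```

Every state inside the same set of tiles maps to the same binary features, so a policy or linear value function defined on those features cannot distinguish between them. That loss of resolution is the kind of effect a performance bound for discretisation has to account for.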


Reinforcement learning versus swarm intelligence for autonomous multi-HAPS coordination

SN Applied Sciences ◽

10.1007/s42452-021-04658-6 ◽

2021 ◽

Vol 3 (6) ◽

Author(s):

Ogbonnaya Anicho ◽

Philip B. Charlesworth ◽

Gurvinder S. Baicher ◽

Atulya K. Nagar

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Swarm Intelligence ◽

Performance Indicators ◽

Convergence Rates ◽

Tuning Parameters ◽

Continuous State Space ◽

Continuous State ◽

User Coverage ◽

Better Than

This work analyses the performance of Reinforcement Learning (RL) versus Swarm Intelligence (SI) for coordinating multiple unmanned High Altitude Platform Stations (HAPS) for communications area coverage. It builds upon previous work which looked at various elements of both algorithms. The main aim of this paper is to address the continuous state-space challenge within this work by using partitioning to manage the high dimensionality problem. This enabled a comparison of the classical cases of both RL and SI, establishing a baseline for future comparisons of improved versions. In previous work, SI was observed to perform better across various key performance indicators. After tuning parameters and empirically choosing a suitable partitioning ratio for the RL state space, the SI algorithm still maintained superior coordination capability, achieving higher mean overall user coverage (about 20% better than the RL algorithm) in addition to faster convergence rates. Though the RL technique showed better average peak user coverage, its unpredictable coverage dip was a key weakness, making SI a more suitable algorithm within the context of this work.


Reinforcement Learning for Control Using Value Function Approximation

Encyclopedia of Systems and Control ◽

10.1007/978-3-030-44184-5_100067 ◽

2021 ◽

pp. 1868-1873

Author(s):

Konstantinos Gatsis ◽

George J. Pappas

Keyword(s):

Reinforcement Learning ◽

Function Approximation ◽

Value Function ◽

Value Function Approximation


Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations

Mathematics ◽

10.3390/math8091479 ◽

2020 ◽

Vol 8 (9) ◽

pp. 1479

Author(s):

Francisco Martinez-Gil ◽

Miguel Lozano ◽

Ignacio García-Fernández ◽

Pau Romero ◽

Dolors Serra ◽

...

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Machine Learning Techniques ◽

Inverse Reinforcement Learning ◽

The Real ◽

Q Learning ◽

Learning Framework ◽

Entropy Principle ◽

Real Behavior ◽

Function Approximator

Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This invalidates direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors fit the real trajectories significantly better.
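To make the distinction in this abstract concrete, the sketch below contrasts a greedy policy derived from a table of action values with a softmax (Boltzmann) policy of the kind commonly associated with maximum-entropy methods. It is an illustrative assumption, not the MARL-Ped or Soft Q-learning implementation from the paper: the function names, the `temperature` parameter, and the example values are all hypothetical.

```python
import math


def greedy_action(q_values):
    """Pick the action with the highest value (first one wins on ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature).

    Assumed form of a 'soft' policy; subtracting the maximum keeps exp() stable.
    """
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical action values for a single state.
q = [0.1, 0.5, 0.2]
print(greedy_action(q))        # index of the best action
print(softmax_policy(q, 0.5))  # probabilities that favour, but do not force, that action
```

The greedy policy changes abruptly when a single value crosses another, which matches the abstract's point that editing a learned value function by hand has hard-to-predict effects on behavior. The softmax form varies smoothly with the values and approaches the greedy choice as the temperature decreases.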
