A Study of Continuous Maximum Entropy Deep Inverse Reinforcement Learning

Inverse reinforcement learning (IRL) assumes that the demonstrations come from an agent acting optimally in an environment. Most prior work on IRL has needed to compute optimal policies for different reward functions, a requirement that is difficult to satisfy in tasks with large or continuous state spaces, let alone continuous action spaces. We propose a continuous maximum entropy deep inverse reinforcement learning algorithm for continuous state and continuous action spaces. It builds a deep understanding of the environment model by reconstructing the reward function from the demonstrations, and it uses a demonstration-based hot start mechanism to make training faster and better. We compare this approach with well-known algorithms, including Maximum Entropy IRL, DDPG, and hot start DDPG. Empirical results on the classical control environment MountainCarContinuous-v0 from OpenAI Gym show that our approach learns policies faster and better.
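For orientation, the maximum entropy formulation that this abstract builds on can be written as follows. This is the standard Maximum Entropy IRL objective, not necessarily the exact form used in the paper; the symbols \theta (reward parameters), \tau (a trajectory), R_\theta (the reward model, here assumed to be a neural network), Z (the partition function), and \mathcal{D} (the demonstration set) are notation introduced here for illustration.

```latex
% Trajectory distribution under the maximum entropy model
P(\tau \mid \theta) = \frac{\exp\big(R_\theta(\tau)\big)}{Z(\theta)},
\qquad
Z(\theta) = \int \exp\big(R_\theta(\tau)\big)\, d\tau

% Log-likelihood of the demonstrations
\mathcal{L}(\theta) = \frac{1}{|\mathcal{D}|} \sum_{\tau \in \mathcal{D}} R_\theta(\tau) - \log Z(\theta)

% Gradient: demonstration average minus model expectation
\nabla_\theta \mathcal{L}(\theta)
  = \frac{1}{|\mathcal{D}|} \sum_{\tau \in \mathcal{D}} \nabla_\theta R_\theta(\tau)
  - \mathbb{E}_{\tau \sim P(\cdot \mid \theta)}\big[\nabla_\theta R_\theta(\tau)\big]
```

In continuous state and action spaces the integral in Z(\theta) has no closed form, so the second term of the gradient generally has to be estimated from sampled trajectories. That is one reason a policy learner such as DDPG is a natural companion to the reward model.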


Pursuit-evasion with Decentralized Robotic Swarm in Continuous State Space and Action Space via Deep Reinforcement Learning

Proceedings of the 12th International Conference on Agents and Artificial Intelligence ◽

10.5220/0008971502260233 ◽

2020 ◽

Author(s):

Gurpreet Singh ◽

Daniel Lofaro ◽

Donald Sofge

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Action Space ◽

Pursuit Evasion ◽

Continuous State Space ◽

Continuous State ◽

Robotic Swarm


Reinforcement learning versus swarm intelligence for autonomous multi-HAPS coordination

SN Applied Sciences ◽

10.1007/s42452-021-04658-6 ◽

2021 ◽

Vol 3 (6) ◽

Author(s):

Ogbonnaya Anicho ◽

Philip B. Charlesworth ◽

Gurvinder S. Baicher ◽

Atulya K. Nagar

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Swarm Intelligence ◽

Performance Indicators ◽

Convergence Rates ◽

Tuning Parameters ◽

Continuous State Space ◽

Continuous State ◽

User Coverage ◽

Better Than

Abstract: This work analyses the performance of Reinforcement Learning (RL) versus Swarm Intelligence (SI) for coordinating multiple unmanned High Altitude Platform Stations (HAPS) for communications area coverage. It builds upon previous work which looked at various elements of both algorithms. The main aim of this paper is to address the continuous state-space challenge within this work by using partitioning to manage the high dimensionality problem. This enabled a comparison of the classical cases of both RL and SI, establishing a baseline for future comparisons of improved versions. In previous work, SI was observed to perform better across various key performance indicators. Even after tuning parameters and empirically choosing a suitable partitioning ratio for the RL state space, the SI algorithm still maintained superior coordination capability, achieving higher mean overall user coverage (about 20% better than the RL algorithm) in addition to faster convergence rates. Though the RL technique showed better average peak user coverage, its unpredictable coverage dip was a key weakness, making SI the more suitable algorithm within the context of this work.
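The partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the state space is partitioned, so uniform binning per dimension is an assumption here, and the names `make_partitioner`, `to_cell`, the 100 x 100 area, and the five-action Q-table are all hypothetical.

```python
import numpy as np


def make_partitioner(low, high, bins_per_dim):
    """Return a function mapping a continuous state to a tuple of bin indices.

    Assumption: each dimension is split into equal-width bins; the paper's
    actual partitioning ratio and scheme are not given in the abstract.
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    bins = np.asarray(bins_per_dim, dtype=int)
    # Interior bin edges only, so indices run from 0 to bins - 1.
    edges = [np.linspace(l, h, b + 1)[1:-1] for l, h, b in zip(low, high, bins)]

    def to_cell(state):
        # Clip so out-of-range states fall into the outermost bins.
        clipped = np.clip(np.asarray(state, dtype=float), low, high)
        return tuple(int(np.digitize(s, e)) for s, e in zip(clipped, edges))

    return to_cell


# Hypothetical 2-D position in a 100 x 100 area, 4 bins per dimension.
to_cell = make_partitioner(low=[0.0, 0.0], high=[100.0, 100.0], bins_per_dim=[4, 4])

# A tabular Q-function indexed by cell, with 5 hypothetical discrete actions.
q_table = np.zeros((4, 4, 5))
cell = to_cell([37.5, 99.0])
action_values = q_table[cell]
```

A coarser partition shrinks the table and speeds up learning but blurs distinct states together, which is the trade-off behind choosing a partitioning ratio empirically.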


Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals

2011 IEEE/RSJ International Conference on Intelligent Robots and Systems ◽

10.1109/iros.2011.6048804 ◽

2011 ◽

Cited By ~ 1

Author(s):

N. Aghasadeghi ◽

T. Bretl

Keyword(s):

Reinforcement Learning ◽

Maximum Entropy ◽

Path Integrals ◽

Inverse Reinforcement Learning ◽

State Spaces ◽

Continuous State


Safety-aware Adversarial Inverse Reinforcement Learning (S-AIRL) for Highway Autonomous Driving

Journal of Autonomous Vehicles and Systems ◽

10.1115/1.4053427 ◽

2022 ◽

pp. 1-14

Author(s):

Fangjian Li ◽

John R Wagner ◽

Yue Wang

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Risky Behaviors ◽

Autonomous Driving ◽

Inverse Reinforcement Learning ◽

Safety Issues ◽

Reward Function ◽

Sampling Process ◽

Safety Awareness ◽

Driving Scenario

Abstract: Inverse reinforcement learning (IRL) has been successfully applied in many robotics and autonomous driving studies without the need for hand-tuning a reward function. However, it suffers from safety issues. Compared to reinforcement learning (RL) algorithms, IRL is even more vulnerable to unsafe situations, as it can only infer the importance of safety from expert demonstrations. In this paper, we propose a safety-aware adversarial inverse reinforcement learning algorithm (S-AIRL). First, the control barrier function (CBF) is used to guide the training of a safety critic, which leverages knowledge of the system dynamics in the sampling process without training an additional guiding policy. The trained safety critic is then integrated into the discriminator to help distinguish generated data from expert demonstrations from the standpoint of safety. Finally, to further improve safety awareness, a regulator is introduced into the discriminator's training loss to prevent the recovered reward function from assigning high rewards to risky behaviors. We tested S-AIRL in a highway autonomous driving scenario. Compared to the original AIRL algorithm, at the same level of imitation learning (IL) performance, the proposed S-AIRL reduces the collision rate by 32.6%.
