A Deep Reinforcement Learning Based Mapless Navigation Algorithm Using Continuous Actions

Author(s): Nanxun Duo, Qinzhao Wang, Qiang Lv, Heng Wei, Pei Zhang

Author(s): Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, ...

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum with respect to its training partners: the learned policy may be optimal only against the other agents' current policies. In this paper, we focus on training robust DRL agents with continuous actions in the multi-agent setting so that the trained agents still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes the minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
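The inner minimization of the minimax objective cannot be solved exactly over continuous actions; the abstract's MAAL approximates it by locally perturbing the other agents' actions in the direction that lowers the learning agent's Q-value. A minimal PyTorch sketch of such a one-step worst-case perturbation is given below; the centralized critic interface critic_i(obs, actions) and the step size eps are illustrative assumptions, not the authors' implementation.

import torch

def worst_case_actions(critic_i, obs, actions, agent_i, eps=0.01):
    # Copy the joint action; only the OTHER agents' actions receive gradients.
    perturbed = [a.clone().detach().requires_grad_(j != agent_i)
                 for j, a in enumerate(actions)]
    # Assumed centralized critic: Q_i(o_1..o_N, a_1..a_N) -> (batch, 1)
    q = critic_i(obs, perturbed).sum()
    q.backward()
    with torch.no_grad():
        for j, a in enumerate(perturbed):
            if j != agent_i:
                a -= eps * a.grad  # one descent step on Q_i: locally worst case for agent i
    return [a.detach() for a in perturbed]

The perturbed joint action would then be fed into the critic target in place of the partners' actual actions, which is how a minimax-style update can stay tractable with deterministic continuous policies.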


Author(s): Wenjie Shi, Shiji Song, Cheng Wu

Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous control tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality, such as 3D humanoid locomotion. Moreover, the optimality of the Boltzmann policy induced by a non-optimal soft value function is not well justified. In this paper, we first derive a soft policy gradient based on the entropy-regularized expected reward objective for RL with continuous actions. We then present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm, deep soft policy gradient (DSPG), which combines the soft policy gradient with the soft Bellman equation. To ensure stable learning while eliminating the need for two separate critics for the soft value functions, we use a double sampling approach to make the soft Bellman equation tractable. Experimental results demonstrate that our method outperforms prior off-policy methods.
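For reference, the entropy-regularized objective and the soft Bellman backup mentioned above are the standard maximum entropy RL relations; they are stated here in the usual notation (with temperature \alpha weighting the entropy bonus) as background only, not as the paper's exact formulation:

J(\pi) = \sum_t \mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\left[ r(s_t,a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big) \right]

Q_{\mathrm{soft}}(s_t,a_t) = r(s_t,a_t) + \gamma\,\mathbb{E}_{s_{t+1}}\left[ V_{\mathrm{soft}}(s_{t+1}) \right], \qquad V_{\mathrm{soft}}(s_t) = \mathbb{E}_{a_t\sim\pi}\left[ Q_{\mathrm{soft}}(s_t,a_t) - \alpha \log \pi(a_t\mid s_t) \right]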


2021, Vol. 40 (1), pp. 349-361
Author(s): Junior Costa de Jesus, Jair Augusto Bottega, Marco Antonio de Souza Leite Cuadros, Daniel Fernando Tello Gamarra

This article describes the use of the Deep Deterministic Policy Gradient network, a deep reinforcement learning algorithm, for mobile robot navigation. The neural network takes as inputs the laser range findings, the angular and linear velocities of the robot, and the position and orientation of the mobile robot with respect to a goal position. The outputs of the network are the angular and linear velocities used as control signals for the robot. The experiments demonstrate that deep reinforcement learning techniques using continuous actions are effective for decision-making in a mobile robot. Nevertheless, the design of the reward function remains an important factor in the performance of deep reinforcement learning algorithms. To demonstrate the performance of the deep reinforcement learning algorithm, we successfully applied the proposed architecture in simulated environments and in experiments with a real robot.
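As an illustration of the input/output layout described above, a minimal PyTorch actor sketch is given below; the number of laser readings, the hidden-layer sizes, and the velocity limits are assumptions made for the example and are not taken from the article.

import torch
import torch.nn as nn

class NavigationActor(nn.Module):
    def __init__(self, n_laser=10, hidden=256, max_lin_vel=0.5, max_ang_vel=1.0):
        super().__init__()
        # state = laser ranges + (linear vel, angular vel) + (distance, heading to goal)
        state_dim = n_laser + 2 + 2
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )
        self.max_lin_vel = max_lin_vel
        self.max_ang_vel = max_ang_vel

    def forward(self, state):
        out = self.net(state)
        lin_vel = torch.sigmoid(out[..., 0:1]) * self.max_lin_vel  # forward-only linear velocity
        ang_vel = torch.tanh(out[..., 1:2]) * self.max_ang_vel     # symmetric angular velocity
        return torch.cat([lin_vel, ang_vel], dim=-1)

# Example usage: actor = NavigationActor(); vel_cmd = actor(torch.randn(1, 14))

Squashing the linear velocity with a sigmoid keeps the robot moving forward while the tanh allows turning in either direction; both are common design choices for differential-drive control, not details reported in the article.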

