A Deep Reinforcement Learning Based Mapless Navigation Algorithm Using Continuous Actions

Author(s): Nanxun Duo, Qinzhao Wang, Qiang Lv, Heng Wei, Pei Zhang

Author(s): Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, ...

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum with respect to its training partners: the learned policy may be optimal only against the other agents' current policies. In this paper, we focus on training robust DRL agents with continuous actions in the multi-agent setting so that the trained agents still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes the minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
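The inner minimization of the minimax objective cannot be solved exactly over continuous actions; the abstract's MAAL approximates it by locally perturbing the other agents' actions in the direction that lowers the learning agent's Q-value. A minimal PyTorch sketch of such a one-step worst-case perturbation is given below; the centralized critic interface critic_i(obs, actions) and the step size eps are illustrative assumptions, not the authors' implementation.

import torch

def worst_case_actions(critic_i, obs, actions, agent_i, eps=0.01):
    # Copy the joint action; only the OTHER agents' actions receive gradients.
    perturbed = [a.clone().detach().requires_grad_(j != agent_i)
                 for j, a in enumerate(actions)]
    # Assumed centralized critic: Q_i(o_1..o_N, a_1..a_N) -> (batch, 1)
    q = critic_i(obs, perturbed).sum()
    q.backward()
    with torch.no_grad():
        for j, a in enumerate(perturbed):
            if j != agent_i:
                a -= eps * a.grad  # one descent step on Q_i: locally worst case for agent i
    return [a.detach() for a in perturbed]

The perturbed joint action would then be fed into the critic target in place of the partners' actual actions, which is how a minimax-style update can stay tractable with deterministic continuous policies.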


Author(s): Wenjie Shi, Shiji Song, Cheng Wu

Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous control tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality, such as 3D humanoid locomotion. Moreover, the optimality of the Boltzmann policy induced by a non-optimal soft value function is not well justified. In this paper, we first derive a soft policy gradient based on the entropy-regularized expected reward objective for RL with continuous actions. We then present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm, deep soft policy gradient (DSPG), which combines the soft policy gradient with the soft Bellman equation. To ensure stable learning while eliminating the need for two separate critics for the soft value functions, we use a double sampling approach to make the soft Bellman equation tractable. Experimental results demonstrate that our method outperforms prior off-policy methods.
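For reference, the entropy-regularized objective and the soft Bellman backup mentioned above are the standard maximum entropy RL relations; they are stated here in the usual notation (with temperature \alpha weighting the entropy bonus) as background only, not as the paper's exact formulation:

J(\pi) = \sum_t \mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\left[ r(s_t,a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big) \right]

Q_{\mathrm{soft}}(s_t,a_t) = r(s_t,a_t) + \gamma\,\mathbb{E}_{s_{t+1}}\left[ V_{\mathrm{soft}}(s_{t+1}) \right], \qquad V_{\mathrm{soft}}(s_t) = \mathbb{E}_{a_t\sim\pi}\left[ Q_{\mathrm{soft}}(s_t,a_t) - \alpha \log \pi(a_t\mid s_t) \right]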


2021, Vol. 40 (1), pp. 349-361
Author(s): Junior Costa de Jesus, Jair Augusto Bottega, Marco Antonio de Souza Leite Cuadros, Daniel Fernando Tello Gamarra

This article describes the use of the Deep Deterministic Policy Gradient network, a deep reinforcement learning algorithm, for mobile robot navigation. The neural network takes as inputs the laser range findings, the angular and linear velocities of the robot, and the position and orientation of the mobile robot with respect to a goal position. The outputs of the network are the angular and linear velocities used as control signals for the robot. The experiments demonstrate that deep reinforcement learning techniques using continuous actions are effective for decision-making in a mobile robot. Nevertheless, the design of the reward function remains an important factor in the performance of deep reinforcement learning algorithms. To demonstrate the performance of the deep reinforcement learning algorithm, we successfully applied the proposed architecture in simulated environments and in experiments with a real robot.
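As an illustration of the input/output layout described above, a minimal PyTorch actor sketch is given below; the number of laser readings, the hidden-layer sizes, and the velocity limits are assumptions made for the example and are not taken from the article.

import torch
import torch.nn as nn

class NavigationActor(nn.Module):
    def __init__(self, n_laser=10, hidden=256, max_lin_vel=0.5, max_ang_vel=1.0):
        super().__init__()
        # state = laser ranges + (linear vel, angular vel) + (distance, heading to goal)
        state_dim = n_laser + 2 + 2
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )
        self.max_lin_vel = max_lin_vel
        self.max_ang_vel = max_ang_vel

    def forward(self, state):
        out = self.net(state)
        lin_vel = torch.sigmoid(out[..., 0:1]) * self.max_lin_vel  # forward-only linear velocity
        ang_vel = torch.tanh(out[..., 1:2]) * self.max_ang_vel     # symmetric angular velocity
        return torch.cat([lin_vel, ang_vel], dim=-1)

# Example usage: actor = NavigationActor(); vel_cmd = actor(torch.randn(1, 14))

Squashing the linear velocity with a sigmoid keeps the robot moving forward while the tanh allows turning in either direction; both are common design choices for differential-drive control, not details reported in the article.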

