Goal-Oriented Obstacle Avoidance with Deep Reinforcement Learning in Continuous Action Space

Electronics, 2020, Vol. 9 (3), p. 411
Author(s): Reinis Cimurs, Jin Han Lee, Il Hong Suh

In this paper, we propose a goal-oriented obstacle avoidance navigation system based on deep reinforcement learning that uses depth information from the scene, as well as the goal position in polar coordinates, as state inputs. The control signals for robot motion are output in a continuous action space. We devise a deep deterministic policy gradient network that includes depth-wise separable convolution layers to process the large amounts of sequential depth image data. The goal-oriented obstacle avoidance navigation is performed without prior knowledge of the environment or a map. We show that with the proposed deep reinforcement learning network, a goal-oriented collision avoidance model can be trained end-to-end without manual tuning or supervision by a human operator. We train our model in simulation, and the resulting network transfers directly to other environments. Experiments show that the trained network navigates safely around obstacles and arrives at the designated goal positions in simulation as well as in the real world. The proposed method exhibits higher reliability than the compared approaches when navigating around obstacles with complex shapes, and the experiments show that it avoids not only static but also dynamic obstacles.
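A minimal PyTorch sketch (not the authors' released code) of the kind of actor the abstract describes: depth-wise separable convolutions compress stacked depth frames, the polar goal coordinates are concatenated in, and a tanh head emits continuous velocity commands. The layer sizes and the four-frame stack are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=2):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return torch.relu(self.pointwise(self.depthwise(x)))

class Actor(nn.Module):
    """DDPG-style actor: stacked depth frames + polar goal -> continuous (v, w)."""
    def __init__(self, frames=4, goal_dim=2):
        super().__init__()
        self.conv = nn.Sequential(
            DepthwiseSeparableConv(frames, 32),
            DepthwiseSeparableConv(32, 64),
            DepthwiseSeparableConv(64, 64),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, 2), nn.Tanh(),   # bounded linear/angular velocity
        )

    def forward(self, depth, goal):
        z = self.conv(depth)                # depth: (B, frames, H, W)
        return self.head(torch.cat([z, goal], dim=1))
```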

2019, Vol. 9 (24), p. 5571
Author(s): Sang-Yun Shin, Yong-Won Kang, Yong-Guk Kim

Drones with obstacle avoidance capabilities have recently attracted much attention from researchers. They typically adopt either supervised learning or reinforcement learning (RL) for training their networks. The drawback of supervised learning is that labeling a massive dataset is laborious and time-consuming, whereas RL aims to overcome this problem by letting an agent learn from data gathered in its environment. The present study utilizes diverse RL algorithms in two categories: (1) discrete action space and (2) continuous action space. The former has an advantage in optimization for vision datasets, but its actions can lead to unnatural flight behavior. For the latter, we propose a U-net based segmentation model with an actor-critic network. Performance is compared between these RL algorithms in three environments, a woodland, a block world, and an arena world, as well as in racing against human pilots. Results suggest that our best continuous-action algorithm easily outperformed the discrete ones and performed on par with an expert pilot.
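A toy PyTorch sketch of the pairing the abstract proposes: a U-net style encoder (skip connection between down and up paths) whose pooled features feed actor and critic heads for continuous control. The single down/up level, feature widths, and three-dimensional action are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-net: one down/up level with a skip connection."""
    def __init__(self, in_ch=3, feat=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(feat, feat * 2, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(feat * 2, feat, 2, stride=2)
        self.out = nn.Conv2d(feat * 2, feat, 3, padding=1)  # fuse skip + upsampled

    def forward(self, x):
        e = self.enc(x)
        d = self.down(e)
        u = self.up(d)
        return torch.relu(self.out(torch.cat([e, u], dim=1)))

class ActorCritic(nn.Module):
    """Actor-critic heads on pooled U-net features; continuous action outputs."""
    def __init__(self, feat=16, act_dim=3):
        super().__init__()
        self.backbone = TinyUNet(feat=feat)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.actor = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(),
                                   nn.Linear(64, act_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, obs):
        z = self.pool(self.backbone(obs))
        return self.actor(z), self.critic(z)
```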


Author(s): Yuntao Han, Qibin Zhou, Fuqing Duan

The digital curling game is a two-player zero-sum extensive game in a continuous action space. Several challenging problems remain unsolved, such as the uncertainty of strategy, searching the large game tree, and the reliance on large amounts of supervised data. In this work, we combine NFSP and KR-UCT for digital curling games: NFSP uses two adversarial learning networks and can automatically produce supervised data, while KR-UCT handles the large game-tree search in the continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can reach a Nash equilibrium.
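A hedged NumPy sketch of the kernel-regression UCB rule that KR-UCT is built on: visited actions share value and visit statistics through a kernel, so nearby untried shots inherit estimates instead of starting from scratch. The Gaussian kernel, bandwidth, and exploration constant here are illustrative choices, not the paper's settings.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.5):
    """Similarity between two continuous actions (e.g. curling shot parameters)."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * bandwidth ** 2))

def kr_uct_score(candidate, actions, values, visits, c=1.4):
    """Kernel-regression UCB: smooth value/visit statistics over nearby actions."""
    w = np.array([gaussian_kernel(candidate, a) for a in actions])
    w_sum = w.sum() + 1e-8
    v_hat = np.dot(w, values) / w_sum   # kernel-regression value estimate
    n_eff = np.dot(w, visits)           # effective visit count near the candidate
    total = np.sum(visits)
    return v_hat + c * np.sqrt(np.log(total + 1) / (n_eff + 1e-8))

# Selection step: pick the expanded action with the highest score, e.g.
# best = max(candidates, key=lambda a: kr_uct_score(a, actions, values, visits))
```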


Author(s): Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, ...

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in poor local optima with respect to its training partners: the learned policy may be only locally optimal against the other agents' current policies. In this paper, we focus on training robust DRL agents with continuous actions in the multi-agent setting, so that the trained agents still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate M3DDPG in four mixed cooperative-competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
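A hedged PyTorch sketch of the MAAL idea as the abstract describes it: the intractable inner minimization over opponents' continuous actions is approximated by a single gradient step that perturbs each opponent's action to lower agent i's Q-value. The centralized critic callable `q_i` and the step size `eps` are assumptions for illustration.

```python
import torch

def maal_perturb(q_i, obs, actions, i, eps=0.01):
    """One-step adversarial approximation of the minimax inner loop:
    nudge every other agent's action in the direction that lowers
    agent i's Q-value, then train against these worst-case actions."""
    actions = [a.detach().requires_grad_(j != i) for j, a in enumerate(actions)]
    q = q_i(obs, actions)          # hypothetical centralized critic of agent i
    q.sum().backward()
    perturbed = []
    for j, a in enumerate(actions):
        if j == i:
            perturbed.append(a.detach())
        else:
            # gradient descent on Q_i w.r.t. an opponent action = worst-case step
            perturbed.append((a - eps * a.grad).detach())
    return perturbed
```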


2021
Author(s): Laha Ale, Scott King, Ning Zhang, Abdul Sattar, Janahan Skandaraniyam

Mobile Edge Computing (MEC) has been regarded as a promising paradigm to reduce service latency for data processing in the Internet of Things by provisioning computing resources at the network edge. In this work, we jointly optimize task partitioning and computational power allocation for computation offloading in a dynamic environment with multiple IoT devices and multiple edge servers. We formulate the problem as a Markov decision process with a constrained hybrid action space, which cannot be handled well by existing deep reinforcement learning (DRL) algorithms. Therefore, we develop a novel DRL algorithm called Dirichlet Deep Deterministic Policy Gradient (D3PG), built on Deep Deterministic Policy Gradient (DDPG), to solve the problem. The developed model can learn to solve multi-objective optimization problems, including maximizing the number of tasks processed before expiration and minimizing the energy cost and service latency. More importantly, D3PG can effectively deal with the constrained distribution-continuous hybrid action space, where the distribution variables handle task partitioning and offloading while the continuous variables handle computational frequency control. Moreover, D3PG can address many similar issues in MEC and in general reinforcement learning problems. Extensive simulation results show that the proposed D3PG outperforms state-of-the-art methods.
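A hedged PyTorch sketch of the hybrid actor head the abstract implies: a Dirichlet head produces simplex-constrained task-partitioning weights (they are non-negative and sum to one by construction), while a sigmoid head produces continuous frequency controls. Network sizes and the +1 concentration offset are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class D3PGActor(nn.Module):
    """Hybrid actor: Dirichlet head for task partitioning over offloading
    targets, plus a continuous head for normalized CPU frequencies."""
    def __init__(self, obs_dim, n_targets, n_freqs):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.alpha = nn.Linear(128, n_targets)   # Dirichlet concentrations
        self.freq = nn.Linear(128, n_freqs)      # continuous frequency controls

    def forward(self, obs):
        h = self.trunk(obs)
        # softplus keeps concentrations positive; +1 avoids degenerate corners
        alpha = nn.functional.softplus(self.alpha(h)) + 1.0
        partition = torch.distributions.Dirichlet(alpha).rsample()  # sums to 1
        freq = torch.sigmoid(self.freq(h))       # in (0, 1); scale by f_max
        return partition, freq
```

The reparameterized sample (`rsample`) keeps the partitioning differentiable, so the deterministic-policy-gradient update can flow through both heads.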


2011, Vol. 131 (5), pp. 976–982
Author(s): Masato Nagayoshi, Hajime Murao, Hisashi Tamaki

2012, Vol. 95 (3), pp. 37–44
Author(s): Masato Nagayoshi, Hajime Murao, Hisashi Tamaki
