Obstacle Avoidance Drone by Deep Reinforcement Learning and Its Racing with Human Pilot

Drones with obstacle avoidance capabilities have attracted much attention from researchers recently. They typically adopt either supervised learning or reinforcement learning (RL) for training their networks. The drawback of supervised learning is that labeling of the massive dataset is laborious and time-consuming, whereas RL aims to overcome such a problem by letting an agent learn with the data from its environment. The present study aims to utilize diverse RL within two categories: (1) discrete action space and (2) continuous action space. The former has the advantage in optimization for vision datasets, but such actions can lead to unnatural behavior. For the latter, we propose a U-net based segmentation model with an actor-critic network. Performance is compared between these RL algorithms with three different environments such as the woodland, block world, and the arena world, as well as racing with human pilots. Results suggest that our best continuous algorithm easily outperformed the discrete ones and yet was similar to an expert pilot.

Download Full-text

Goal-Oriented Obstacle Avoidance with Deep Reinforcement Learning in Continuous Action Space

Electronics ◽

10.3390/electronics9030411 ◽

2020 ◽

Vol 9 (3) ◽

pp. 411

Author(s):

Reinis Cimurs ◽

Jin Han Lee ◽

Il Hong Suh

Keyword(s):

Reinforcement Learning ◽

Obstacle Avoidance ◽

Action Space ◽

Polar Coordinates ◽

Depth Image ◽

Depth Information ◽

Continuous Action ◽

Learning Network ◽

Complex Shapes ◽

Policy Gradient

In this paper, we propose a goal-oriented obstacle avoidance navigation system based on deep reinforcement learning that uses depth information in scenes, as well as goal position in polar coordinates as state inputs. The control signals for robot motion are output in a continuous action space. We devise a deep deterministic policy gradient network with the inclusion of depth-wise separable convolution layers to process the large amounts of sequential depth image information. The goal-oriented obstacle avoidance navigation is performed without prior knowledge of the environment or a map. We show that through the proposed deep reinforcement learning network, a goal-oriented collision avoidance model can be trained end-to-end without manual tuning or supervision by a human operator. We train our model in a simulation, and the resulting network is directly transferred to other environments. Experiments show the capability of the trained network to navigate safely around obstacles and arrive at the designated goal positions in the simulation, as well as in the real world. The proposed method exhibits higher reliability than the compared approaches when navigating around obstacles with complex shapes. The experiments show that the approach is capable of avoiding not only static, but also dynamic obstacles.

Download Full-text

Vision Based Drone Obstacle Avoidance by Deep Reinforcement Learning

AI ◽

10.3390/ai2030023 ◽

2021 ◽

Vol 2 (3) ◽

pp. 366-382

Author(s):

Zhihan Xue ◽

Tad Gonsalves

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Obstacle Avoidance ◽

Image Data ◽

Depth Map ◽

Training Environment ◽

Time To Build ◽

Discrete Action ◽

Single Dataset ◽

Action Spaces

Research on autonomous obstacle avoidance of drones has recently received widespread attention from researchers. Among them, an increasing number of researchers are using machine learning to train drones. These studies typically adopt supervised learning or reinforcement learning to train the networks. Supervised learning has a disadvantage in that it takes a significant amount of time to build the datasets, because it is difficult to cover the complex and changeable drone flight environment in a single dataset. Reinforcement learning can overcome this problem by using drones to learn data in the environment. However, the current research results based on reinforcement learning are mainly focused on discrete action spaces. In this way, the movement of drones lacks precision and has somewhat unnatural flying behavior. This study aims to use the soft-actor-critic algorithm to train a drone to perform autonomous obstacle avoidance in continuous action space using only the image data. The algorithm is trained and tested in a simulation environment built by Airsim. The results show that our algorithm enables the UAV to avoid obstacles in the training environment only by inputting the depth map. Moreover, it also has a higher obstacle avoidance rate in the reconfigured environment without retraining.

Download Full-text

A game strategy model in the digital curling system based on NFSP

Complex & Intelligent Systems ◽

10.1007/s40747-021-00345-6 ◽

2021 ◽

Author(s):

Yuntao Han ◽

Qibin Zhou ◽

Fuqing Duan

Keyword(s):

Reinforcement Learning ◽

Nash Equilibrium ◽

Action Space ◽

Learning Networks ◽

Game Tree ◽

Continuous Action ◽

Extensive Game ◽

Strategy Model ◽

Zero Sum ◽

Tree Searching

AbstractThe digital curling game is a two-player zero-sum extensive game in a continuous action space. There are some challenging problems that are still not solved well, such as the uncertainty of strategy, the large game tree searching, and the use of large amounts of supervised data, etc. In this work, we combine NFSP and KR-UCT for digital curling games, where NFSP uses two adversary learning networks and can automatically produce supervised data, and KR-UCT can be used for large game tree searching in continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method, and show the strategy model can reach the Nash equilibrium.

Download Full-text

DDPG Agent to Swing Up and Balance Cart- Pole System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-943 ◽

2021 ◽

pp. 102-116

Author(s):

Buvanesh Pandian V

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Learning Algorithm ◽

Current Approach ◽

Control Problems ◽

Mathematical Framework ◽

Test Environment ◽

Continuous Action ◽

Action Spaces

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns with the help of labeled data, reinforcement learning agents learn how to act by trial and error only receiving a reward signal from their environments. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high- dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been successfully used to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems like Atari games and Go. Mnih et al. [1] built a system with fixed hyper parameters able to learn to play 49 different Atari games only from raw pixel inputs. However, in order to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces would scale poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, having a parametrized policy can be advantageous because it can generalize in the action space. Therefore with this thesis we study state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients. We provide a theoretical comparison to other popular methods, an evaluation of its performance, identify its limitations and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention of deep learning and reinforcement learning. We continue by describing in details the two main algorithms, core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementatory details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.

Download Full-text

Constructing continuous action space from basis functions for fast and stable reinforcement learning

RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication ◽

10.1109/roman.2009.5326234 ◽

2009 ◽

Cited By ~ 2

Author(s):

Akihiko Yamaguchi ◽

Jun Takamatsu ◽

Tsukasa Ogasawara

Keyword(s):

Reinforcement Learning ◽

Basis Functions ◽

Action Space ◽

Continuous Action

Download Full-text

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/323 ◽

2019 ◽

Cited By ~ 5

Author(s):

Haotian Fu ◽

Hongyao Tang ◽

Jianye Hao ◽

Zihan Lei ◽

Yingfeng Chen ◽

...

Keyword(s):

Reinforcement Learning ◽

Continuous Action ◽

Q Learning ◽

Challenging Tasks ◽

Discrete Action ◽

Multi Agent ◽

Decentralized Execution ◽

Novel Algorithms ◽

Action Spaces ◽

Different Levels

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has ever succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces which is very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform existing independent deep parameterized Q-learning method.

Download Full-text

Reinforcement learning in discrete action space applied to inverse defect design

Journal of Physics Communications ◽

10.1088/2399-6528/abe591 ◽

2021 ◽

Author(s):

Troy Loeffler ◽

Suvo Banik ◽

Tarak Patra ◽

Michael Sternberg ◽

Subramanian Sankaranarayanan

Keyword(s):

Reinforcement Learning ◽

Action Space ◽

Discrete Action

Download Full-text

Soft Action Particle Deep Reinforcement Learning for a Continuous Action Space

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros40897.2019.8967959 ◽

2019 ◽

Author(s):

Minjae Kang ◽

Kyungjae Lee ◽

Songhwai Oh

Keyword(s):

Reinforcement Learning ◽

Action Space ◽

Continuous Action

Download Full-text

Social Learning with Coarse Inference

American Economic Journal Microeconomics ◽

10.1257/mic.5.1.147 ◽

2013 ◽

Vol 5 (1) ◽

pp. 147-174 ◽

Cited By ~ 12

Author(s):

Antonio Guarino ◽

Philippe Jehiel

Keyword(s):

Social Learning ◽

The State ◽

Action Space ◽

Continuous Action ◽

Rational Agents ◽

Behavioral Bias ◽

Aggregate Distribution ◽

Rational Case ◽

The World ◽

Discrete Action

We study social learning by boundedly rational agents. Agents take a decision in sequence, after observing their predecessors and a private signal. They are unable to make perfect inferences from their predecessors' decisions: they only understand the relation between the aggregate distribution of actions and the state of nature, and make their inferences accordingly. We show that, in a discrete action space, even if agents receive signals of unbounded precision, there are asymptotic inefficiencies. In a continuous action space, compared to the rational case, agents overweight early signals. Despite this behavioral bias, eventually agents learn the realized state of the world and choose the correct action. (JEL D82, D83)

Download Full-text

Collaborative Multi-agent Reinforcement Learning for Landmark Localization Using Continuous Action Space

Lecture Notes in Computer Science - Information Processing in Medical Imaging ◽

10.1007/978-3-030-78191-0_59 ◽

2021 ◽

pp. 767-778

Author(s):

Klemens Kasseroller ◽

Franz Thaler ◽

Christian Payer ◽

Darko Štern

Keyword(s):

Reinforcement Learning ◽

Action Space ◽

Continuous Action ◽

Multi Agent ◽

Landmark Localization

Download Full-text