Vision Based Drone Obstacle Avoidance by Deep Reinforcement Learning

Research on autonomous obstacle avoidance of drones has recently received widespread attention from researchers. Among them, an increasing number of researchers are using machine learning to train drones. These studies typically adopt supervised learning or reinforcement learning to train the networks. Supervised learning has a disadvantage in that it takes a significant amount of time to build the datasets, because it is difficult to cover the complex and changeable drone flight environment in a single dataset. Reinforcement learning can overcome this problem by using drones to learn data in the environment. However, the current research results based on reinforcement learning are mainly focused on discrete action spaces. In this way, the movement of drones lacks precision and has somewhat unnatural flying behavior. This study aims to use the soft-actor-critic algorithm to train a drone to perform autonomous obstacle avoidance in continuous action space using only the image data. The algorithm is trained and tested in a simulation environment built by Airsim. The results show that our algorithm enables the UAV to avoid obstacles in the training environment only by inputting the depth map. Moreover, it also has a higher obstacle avoidance rate in the reconfigured environment without retraining.

Download Full-text

Obstacle Avoidance Drone by Deep Reinforcement Learning and Its Racing with Human Pilot

Applied Sciences ◽

10.3390/app9245571 ◽

2019 ◽

Vol 9 (24) ◽

pp. 5571 ◽

Cited By ~ 1

Author(s):

Sang-Yun Shin ◽

Yong-Won Kang ◽

Yong-Guk Kim

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Obstacle Avoidance ◽

Network Performance ◽

Action Space ◽

Continuous Action ◽

Discrete Action

Drones with obstacle avoidance capabilities have attracted much attention from researchers recently. They typically adopt either supervised learning or reinforcement learning (RL) for training their networks. The drawback of supervised learning is that labeling of the massive dataset is laborious and time-consuming, whereas RL aims to overcome such a problem by letting an agent learn with the data from its environment. The present study aims to utilize diverse RL within two categories: (1) discrete action space and (2) continuous action space. The former has the advantage in optimization for vision datasets, but such actions can lead to unnatural behavior. For the latter, we propose a U-net based segmentation model with an actor-critic network. Performance is compared between these RL algorithms with three different environments such as the woodland, block world, and the arena world, as well as racing with human pilots. Results suggest that our best continuous algorithm easily outperformed the discrete ones and yet was similar to an expert pilot.

Download Full-text

DDPG Agent to Swing Up and Balance Cart- Pole System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-943 ◽

2021 ◽

pp. 102-116

Author(s):

Buvanesh Pandian V

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Learning Algorithm ◽

Current Approach ◽

Control Problems ◽

Mathematical Framework ◽

Test Environment ◽

Continuous Action ◽

Action Spaces

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns with the help of labeled data, reinforcement learning agents learn how to act by trial and error only receiving a reward signal from their environments. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high- dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been successfully used to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems like Atari games and Go. Mnih et al. [1] built a system with fixed hyper parameters able to learn to play 49 different Atari games only from raw pixel inputs. However, in order to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces would scale poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, having a parametrized policy can be advantageous because it can generalize in the action space. Therefore with this thesis we study state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients. We provide a theoretical comparison to other popular methods, an evaluation of its performance, identify its limitations and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention of deep learning and reinforcement learning. We continue by describing in details the two main algorithms, core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementatory details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.

Download Full-text

MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/430 ◽

2020 ◽

Author(s):

Mohammadamin Barekatain ◽

Ryo Yonetani ◽

Masashi Hamaya

Keyword(s):

Reinforcement Learning ◽

Task Performance ◽

Experimental Evaluation ◽

Control Problems ◽

Target Task ◽

Learning Efficiency ◽

Simulated Environments ◽

Discrete Action ◽

Key Techniques ◽

Action Spaces

Transfer reinforcement learning (RL) aims at improving the learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under diverse unknown dynamics is available for learning a target task efficiently. To address this problem, the proposed approach, MULTI-source POLicy AggRegation (MULTIPOLAR), comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly. We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces. The demo videos and code are available on the project webpage: https://omron-sinicx.github.io/multipolar/.

Download Full-text

Binary Black-Box Attacks Against Static Malware Detectors with Reinforcement Learning in Discrete Action Spaces

2021 IEEE Security and Privacy Workshops (SPW) ◽

10.1109/spw53761.2021.00021 ◽

2021 ◽

Author(s):

Mohammadreza Ebrahimi ◽

Jason Pacheco ◽

Weifeng Li ◽

James Lee Hu ◽

Hsinchun Chen

Keyword(s):

Reinforcement Learning ◽

Black Box ◽

Discrete Action ◽

Action Spaces

Download Full-text

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/323 ◽

2019 ◽

Cited By ~ 5

Author(s):

Haotian Fu ◽

Hongyao Tang ◽

Jianye Hao ◽

Zihan Lei ◽

Yingfeng Chen ◽

...

Keyword(s):

Reinforcement Learning ◽

Continuous Action ◽

Q Learning ◽

Challenging Tasks ◽

Discrete Action ◽

Multi Agent ◽

Decentralized Execution ◽

Novel Algorithms ◽

Action Spaces ◽

Different Levels

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has ever succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces which is very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform existing independent deep parameterized Q-learning method.

Download Full-text

Ubiquitous Distributed Deep Reinforcement Learning at the Edge: Analyzing Byzantine Agents in Discrete Action Spaces

Procedia Computer Science ◽

10.1016/j.procs.2020.10.043 ◽

2020 ◽

Vol 177 ◽

pp. 324-329

Author(s):

Wenshuai Zhao ◽

Jorge Peña Queralta ◽

Li Qingqing ◽

Tomi Westerlund

Keyword(s):

Reinforcement Learning ◽

Discrete Action ◽

Action Spaces

Download Full-text

The Multi-Dimensional Actions Control Approach for Obstacle Avoidance Based on Reinforcement Learning

Symmetry ◽

10.3390/sym13081335 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1335

Author(s):

Menghao Wu ◽

Yanbin Gao ◽

Pengfei Wang ◽

Fan Zhang ◽

Zhejun Liu

Keyword(s):

Reinforcement Learning ◽

Obstacle Avoidance ◽

Control Policy ◽

Continuous Action ◽

Control Approach ◽

Low Level ◽

Learning Technique ◽

Distance Sensor ◽

High Level ◽

Action Spaces

In robotics, obstacle avoidance is an essential ability for distance sensor-based robots. This type of robot has axisymmetrically distributed distance sensors to acquire obstacle distance, so the state is symmetrical. Training the control policy with a reinforcement learning method is a trend. Considering the complexity of environments, such as narrow paths and right-angle turns, robots will have a better ability if the control policy can control the steering direction and speed simultaneously. This paper proposes the multi-dimensional action control (MDAC) approach based on a reinforcement learning technique, which can be used in multiple continuous action space tasks. It adopts a hierarchical structure, which has high and low-level modules. Low-level policies output concrete actions and the high-level policy determines when to invoke low-level modules according to the environment’s features. We design robot navigation experiments with continuous action spaces to test the method’s performance. It is an end-to-end approach and can solve complex obstacle avoidance tasks in navigation.

Download Full-text

A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance

IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics) ◽

10.1109/tsmcb.2003.808179 ◽

2003 ◽

Vol 33 (1) ◽

pp. 17-27 ◽

Cited By ~ 90

Author(s):

Cang Ye ◽

N.H.C. Yung ◽

Danwei Wang

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Obstacle Avoidance ◽

Fuzzy Controller ◽

Learning Algorithm ◽

Reinforcement Learning Algorithm

Download Full-text

Deep Reinforcement Learning in Linear Discrete Action Spaces

2020 Winter Simulation Conference (WSC) ◽

10.1109/wsc48552.2020.9384078 ◽

2020 ◽

Author(s):

Wouter van Heeswijk ◽

Han La Poutre

Keyword(s):

Reinforcement Learning ◽

Discrete Action ◽

Action Spaces

Download Full-text

Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics

Complex & Intelligent Systems ◽

10.1007/s40747-021-00366-1 ◽

2021 ◽

Author(s):

Jie Zhong ◽

Tao Wang ◽

Lianglun Cheng

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Free Path ◽

Inverse Kinematics ◽

Multiple Dimensions ◽

Continuous State ◽

Planning Algorithm ◽

Convergence Performance ◽

Path Planner ◽

Action Spaces

AbstractIn actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for the welding manipulator with obstacles around. However, as a state-of-the-art method, the sampling-based planner only satisfies the probability completeness and its computational complexity is sensitive with state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and is less sensitive with state dimension. In detail, to improve the learning efficiency, we introduce the inverse kinematics module to provide prior knowledge while a gain module is also designed to avoid the local optimal policy, we integrate them into the training algorithm. To evaluate our proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves the convergence performance but also is superior in terms of optimality and robustness of planning compared with most other planning algorithms.

Download Full-text