Quadrotor Path Following and Reactive Obstacle Avoidance with Deep Reinforcement Learning

2021 ◽  
Vol 103 (4) ◽  
Author(s):  
Bartomeu Rubí ◽  
Bernardo Morcego ◽  
Ramon Pérez

Abstract A deep reinforcement learning approach for solving the quadrotor path following and obstacle avoidance problem is proposed in this paper. The problem is solved with two agents: one for the path following task and another for the obstacle avoidance task. A novel structure is proposed, in which the action computed by the obstacle avoidance agent becomes the state of the path following agent. Compared to traditional deep reinforcement learning approaches, the proposed method makes the training process outcomes interpretable, is faster, and can be safely trained on the real quadrotor. Both agents implement the Deep Deterministic Policy Gradient (DDPG) algorithm. The path following agent was developed in a previous work. The obstacle avoidance agent uses the information provided by a low-cost LIDAR to detect obstacles around the vehicle. Since the LIDAR has a narrow field of view, an approach for providing the agent with a memory of previously seen obstacles is developed. A detailed description of the process of defining the state vector, the reward function and the action of this agent is given. The agents are programmed in Python/TensorFlow and are trained and tested on the RotorS/Gazebo platform. Simulation results prove the validity of the proposed approach.
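As a rough illustration of the two-agent structure (not the authors' implementation), the sketch below assumes the obstacle avoidance agent's continuous action is fed into the path following agent's state vector; all function names and signal choices are hypothetical.

```python
import numpy as np

def obstacle_avoidance_policy(lidar_ranges):
    """Hypothetical stand-in for the trained DDPG obstacle avoidance actor:
    maps LIDAR ranges to a single avoidance command (e.g. a lateral shift
    of the reference)."""
    closest = int(np.argmin(lidar_ranges))
    angle = np.deg2rad(closest * 360.0 / len(lidar_ranges))
    # Push away from the nearest obstacle, scaled by its proximity.
    return -np.sign(np.sin(angle)) * max(0.0, 1.0 - lidar_ranges[closest] / 5.0)

def path_following_state(path_error, velocity, avoidance_action):
    """The obstacle avoidance action is assumed here to become part of the
    path following agent's state vector."""
    return np.array([path_error, velocity, avoidance_action], dtype=np.float32)

# One control step: obstacle detected on one side of the vehicle.
lidar = np.full(36, 10.0)
lidar[9] = 1.2
a_oa = obstacle_avoidance_policy(lidar)
s_pf = path_following_state(path_error=0.3, velocity=1.0, avoidance_action=a_oa)
print(s_pf)
```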

2021 ◽  
Author(s):  
Xiaoliang Zheng ◽  
Gongping Wu

Abstract Robot intelligence includes motion intelligence and cognitive intelligence. Targeting motion intelligence, a hierarchical reinforcement learning architecture that accounts for stochastic wind disturbance is proposed for the decision-making of an autonomously operating power line maintenance robot. This architecture uses prior information from mechanism knowledge and empirical data to improve the safety and efficiency of robot operation. In this architecture, high-level policy selection and low-level motion control at the global and local levels are considered jointly under stochastic wind disturbance. First, the operation task is decomposed into three sub-policies: global obstacle avoidance, local approach and local tightening, and each sub-policy is learned. Then, a master policy is learned to select the appropriate operation sub-policy in the current state. The dual deep Q network algorithm is used for the master policy, while the deep deterministic policy gradient algorithm is used for the operation sub-policies. To improve training efficiency, the global obstacle avoidance sub-policy uses a random forest composed of dynamic environmental decision trees as the expert algorithm for imitation learning. The architecture is applied to a power line maintenance scenario, the state function and reward function of each policy are designed, and all policies are trained in an asynchronous, parallel computing environment. It is shown that this architecture achieves stable and safe autonomous operating decisions for a power line maintenance robot subjected to stochastic wind disturbance.
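A minimal sketch of the hierarchical selection step described above, assuming an epsilon-greedy master policy over the three named sub-policies; the Q network, state layout and epsilon value are placeholders, not taken from the paper.

```python
import numpy as np

SUB_POLICIES = ["global_obstacle_avoidance", "local_approach", "local_tightening"]

def select_sub_policy(state, q_net, epsilon=0.05):
    """Master policy: a deep Q network scores each sub-policy for the current
    state; selection is epsilon-greedy during training."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(SUB_POLICIES))
    return int(np.argmax(q_net(state)))

# Toy stand-in for a trained Q network: favours the local approach sub-policy
# when the robot is already close to the power line.
fake_q_net = lambda s: np.array([1.0 - s[0], s[0], 0.1])
state = np.array([0.8, 0.2])  # e.g. [proximity_to_line, normalized_wind_speed]
idx = select_sub_policy(state, fake_q_net)
print("Selected sub-policy:", SUB_POLICIES[idx])
```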


2020 ◽  
Vol 10 (12) ◽  
pp. 4088
Author(s):  
Andreas Verleysen ◽  
Thomas Holvoet ◽  
Remko Proesmans ◽  
Cedric Den Haese ◽  
Francis wyffels

Deformable objects such as ropes, wires, and clothing are omnipresent in society and industry but have received little attention in robotics research. This is due to the practically infinite number of state configurations caused by the deformations of the object. Engineered approaches try to cope with this by implementing highly complex operations to estimate the state of the deformable object. This complexity can be circumvented by learning-based approaches, such as reinforcement learning, which can deal with the intrinsically high-dimensional state space of deformable objects. However, the reward function in reinforcement learning needs to measure the state configuration of a highly deformable object. Vision-based reward functions are difficult to implement, given the high dimensionality of the state and the complex dynamic behavior. In this work, we propose looking beyond vision and incorporating other modalities that can be extracted from deformable objects. By integrating tactile sensor cells into a textile piece, proprioceptive capabilities are gained that provide a reward function to a reinforcement learning agent. We demonstrate on a low-cost dual robotic arm setup that a physical agent can learn, on a single CPU core, to fold a rectangular patch of textile in the real world based on a reward function learned from tactile information.
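The following toy sketch illustrates the general idea of deriving a reward from tactile cells integrated into the textile; the paper learns its reward from data, whereas the proxy below is hand-coded and its threshold is an assumption.

```python
import numpy as np

def fold_reward(tactile_cells, threshold=0.5):
    """Hypothetical tactile reward: when the textile is folded, cells in
    opposite halves press against each other, so the fraction of activated
    cells serves as a crude proxy for fold quality. A real system would learn
    this mapping from tactile data rather than hand-code it."""
    activated = (np.asarray(tactile_cells) > threshold).mean()
    return float(activated)

print(fold_reward([0.1, 0.9, 0.8, 0.7]))  # three of four cells pressed -> 0.75
```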


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1292
Author(s):  
Neziha Akalin ◽  
Amy Loutfi

This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial and error with its environment to discover an optimal behavior. Since interaction is a key component of both reinforcement learning and social robotics, it is a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is focused particularly on studies that include social physical robots and real-world human-robot interactions with users. We present a thorough analysis of reinforcement learning approaches in social robotics. In addition to a survey, we categorize existing reinforcement learning approaches based on the method used and the design of the reward mechanisms. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers based on the communication medium used for reward formulation. Considering the importance of designing the reward function, we also categorize the papers based on the nature of the reward. This categorization includes three major themes: interactive reinforcement learning, intrinsically motivated methods, and task performance-driven methods. The paper also covers the benefits and challenges of reinforcement learning in social robotics, the evaluation methods of the surveyed papers with respect to subjective and algorithmic measures, a discussion of real-world reinforcement learning challenges and proposed solutions, and the points that remain to be explored, including the approaches that have thus far received less attention. Thus, this paper aims to serve as a starting point for researchers interested in applying reinforcement learning methods in this particular research field.


Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following are vectorized into a reward vector, and the reward function is defined as the inner product of the reward vector and a weight vector. Driving data from human drivers were collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model continuously approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance: the agent followed the preceding vehicle safely and smoothly.
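A worked example of the inner-product reward described above; the feature names and weight values below are illustrative and are not the ones identified from the human driving data.

```python
import numpy as np

def following_reward(features, weights):
    """Reward defined as the inner product of a feature (reward) vector and a
    weight vector, in the spirit of inverse-RL reward design."""
    return float(np.dot(features, weights))

# Illustrative features: [negative gap error, negative relative speed, comfort]
phi = np.array([-0.4, -0.1, 0.8])
w = np.array([1.0, 0.5, 0.2])        # weights would be tuned toward human driving data
print(following_reward(phi, w))      # -0.4*1.0 + -0.1*0.5 + 0.8*0.2 = -0.29
```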


Author(s):  
Qingyuan Zheng ◽  
Duo Wang ◽  
Zhang Chen ◽  
Yiyong Sun ◽  
Bin Liang

Single-track two-wheeled robots have become an important research topic in recent years owing to their simple structure, energy savings and ability to run on narrow roads. However, the ramp jump remains a challenging task. In this study, we propose a method to realize the ramp jump of a single-track two-wheeled robot. We present a control method that employs continuous-action reinforcement learning for single-track two-wheeled robot control. We design a novel reward function for reinforcement learning, optimize the dimensions of the action space, and train the controller with the deep deterministic policy gradient algorithm. Finally, we validate the control method through simulation experiments and successfully realize the single-track two-wheeled robot ramp jump task. Simulation results show that the control method is effective and has several advantages over control with a high-dimensional action space, reinforcement learning control with a sparse reward function, and discrete-action reinforcement learning control.
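As a hedged illustration of what a dense (non-sparse) ramp-jump reward might look like, the sketch below uses hypothetical pitch and speed references and terminal bonuses; the paper's exact reward terms are not reproduced here.

```python
def ramp_jump_reward(pitch, pitch_ref, speed, speed_ref, landed_upright, crashed):
    """Hypothetical dense reward: track a reference pitch and approach speed,
    with a terminal bonus for landing upright and a penalty for crashing."""
    tracking = -abs(pitch - pitch_ref) - 0.1 * abs(speed - speed_ref)
    if crashed:
        return tracking - 10.0
    if landed_upright:
        return tracking + 10.0
    return tracking

print(ramp_jump_reward(0.15, 0.2, 2.8, 3.0, landed_upright=False, crashed=False))
```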


2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Alexandra Vedeler ◽  
Narada Warakagoda

The task of obstacle avoidance for maritime vessels, such as Unmanned Surface Vehicles (USVs), has traditionally been solved using specialized modules that are designed and optimized separately. However, this approach requires deep insight into the environment, the vessel, and their complex dynamics. We propose an alternative method using Imitation Learning (IL) through Deep Reinforcement Learning (RL) and Deep Inverse Reinforcement Learning (IRL) and present a system that learns an end-to-end steering model capable of mapping radar-like images directly to steering actions in an obstacle avoidance scenario. The USV used in this work is equipped with a radar sensor, and we studied the problem of generating a single action parameter, heading. We apply an IL algorithm known as generative adversarial imitation learning (GAIL) to develop an end-to-end steering model for a scenario where avoiding an obstacle is the goal. The performance of the system was studied for different design choices and compared to that of a system based on pure RL. The IL system produces results that indicate it is able to grasp the concept of the task and that are in many ways on par with those of the RL system. We consider this promising for future use in tasks that are not as easily described by a reward function.
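For readers unfamiliar with GAIL, the snippet below sketches how a discriminator trained to separate expert from policy behavior can be turned into a surrogate reward for the steering policy; the discriminator here is a toy placeholder rather than the trained network used in the paper.

```python
import numpy as np

def gail_reward(discriminator, state, action, eps=1e-8):
    """A common GAIL surrogate reward: -log(1 - D(s, a)), where D is a
    discriminator scoring how expert-like a state-action pair looks."""
    d = discriminator(state, action)
    return float(-np.log(1.0 - d + eps))

# Toy discriminator: considers small heading changes more expert-like.
fake_disc = lambda s, a: 1.0 / (1.0 + abs(a))
print(gail_reward(fake_disc, state=None, action=0.1))
```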


2021 ◽  
Vol 8 ◽  
Author(s):  
Thomas Nakken Larsen ◽  
Halvor Ødegård Teigen ◽  
Torkel Laache ◽  
Damiano Varagnolo ◽  
Adil Rasheed

Reinforcement Learning (RL) controllers have proved effective at tackling the dual objectives of path following and collision avoidance. However, finding the RL algorithm and setup that optimally trades off these two tasks is not necessarily easy. This work proposes a methodology to explore this trade-off by analyzing the performance and task-specific behavioral characteristics of a range of RL algorithms applied to path following and collision avoidance for underactuated surface vehicles in environments of increasing complexity. The results show that, among the evaluated RL algorithms, the Proximal Policy Optimization (PPO) algorithm exhibits superior robustness to changes in environment complexity and in the reward function, and when generalized to environments with a considerable domain gap from the training environment. Whereas the proposed reward function significantly improves the competing algorithms' ability to solve the training environment, the dimensionality reduction in the sensor suite, combined with the domain gap, is unexpectedly identified as the source of their impaired generalization performance.
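A minimal sketch of this kind of cross-algorithm comparison, assuming stable-baselines3 and a stand-in Gym environment rather than the authors' vessel simulator and evaluation protocol:

```python
import gymnasium as gym
from stable_baselines3 import PPO, DDPG

env_id = "Pendulum-v1"  # placeholder for the path-following/collision-avoidance environment
results = {}
for name, algo in [("PPO", PPO), ("DDPG", DDPG)]:
    env = gym.make(env_id)
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=5_000)  # far fewer steps than a real study would use
    # Evaluate mean return over a few episodes.
    total = 0.0
    for _ in range(3):
        obs, _ = env.reset()
        done = False
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    results[name] = total / 3
print(results)
```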


Robotica ◽  
2021 ◽  
pp. 1-24
Author(s):  
Hosein Houshyari ◽  
Volkan Sezer

Abstract One of the most challenging tasks for autonomous robots is avoiding unexpected obstacles during path following. The follow-the-gap method (FGM) is one of the most popular obstacle avoidance algorithms; it recursively guides the robot to the goal state by considering the angle to the goal point and the distance to the closest obstacles. It selects the largest gap around the robot, and the gap angle is given by the vector to the midpoint of that gap. In this paper, a novel obstacle avoidance procedure is developed and applied to a real, fully autonomous wheelchair. The proposed algorithm improves the FGM's travel safety and brings a new solution to the obstacle avoidance task. In the proposed algorithm, the largest gap is selected based on gap width. Moreover, the avoidance angle (analogous to the gap center angle of FGM) is calculated considering the locus of points equidistant from the obstacles, which form obstacle circles. Monte Carlo simulations are used to test the proposed algorithm, and according to the results, the new procedure guides the robot along safer trajectories than classical FGM. The real experimental test results are consistent with the simulations and demonstrate the real-time performance of the proposed approach.
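A simplified sketch of selecting the gap by physical width from a range scan, as opposed to angular extent; the chord-based width estimate and thresholds below are assumptions, not the paper's exact geometry.

```python
import numpy as np

def widest_gap(ranges, angles, obstacle_dist=1.5):
    """Find contiguous runs of beams whose range exceeds obstacle_dist and
    return the heading toward the centre of the physically widest run."""
    free = np.asarray(ranges) > obstacle_dist
    best = None
    i = 0
    while i < len(free):
        if free[i]:
            j = i
            while j + 1 < len(free) and free[j + 1]:
                j += 1
            # Approximate gap width by the chord between the gap's edge beams.
            width = 2 * obstacle_dist * np.sin((angles[j] - angles[i]) / 2)
            if best is None or width > best[0]:
                best = (width, (angles[i] + angles[j]) / 2)
            i = j + 1
        else:
            i += 1
    return best  # (estimated width, heading toward gap centre) or None

angles = np.linspace(-np.pi / 2, np.pi / 2, 19)
ranges = np.where(np.abs(angles) < 0.3, 1.0, 3.0)  # obstacle straight ahead
print(widest_gap(ranges, angles))
```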


2021 ◽  
pp. 563-633
Author(s):  
Bartomeu Rubí ◽  
Bernardo Morcego ◽  
Ramon Pérez

Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3664 ◽  
Author(s):  
Qichen Zhang ◽  
Meiqiang Zhu ◽  
Liang Zou ◽  
Ming Li ◽  
Yong Zhang

Deep reinforcement learning (DRL) has been successfully applied to mapless navigation. An important issue in DRL is designing a reward function for evaluating the actions of agents. However, designing a robust and suitable reward function depends greatly on the designer's experience and intuition. To address this concern, we consider employing reward shaping from trajectories of similar navigation tasks without human supervision, and propose a general reward function based on a matching network (MN). The MN-based reward function gains experience by pre-training on trajectories from different navigation tasks and accelerates the training of DRL on new tasks. The proposed reward function leaves the optimal strategy of DRL unchanged. Simulation results on two static maps show that DRL converges in fewer iterations with the learned reward function than with state-of-the-art mapless navigation methods. The proposed method also performs well on dynamic maps with partially moving obstacles. Even when the test maps differ from the training maps, the proposed strategy is able to complete the navigation tasks without additional training.
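The property that the shaped reward leaves the optimal strategy unchanged is the hallmark of potential-based reward shaping; the sketch below shows that mechanism with a hand-written distance potential standing in for the matching-network signal used in the paper.

```python
import numpy as np

def shaped_reward(r_env, s, s_next, potential, gamma=0.99):
    """Potential-based reward shaping: adding gamma*phi(s') - phi(s) to the
    environment reward does not change the optimal policy. Here phi is a
    hand-written stand-in; in the paper the shaping signal comes from a
    matching network pre-trained on prior navigation trajectories."""
    return r_env + gamma * potential(s_next) - potential(s)

goal = np.array([5.0, 5.0])
phi = lambda s: -np.linalg.norm(np.asarray(s) - goal)  # closer to goal = higher potential
print(shaped_reward(-0.01, s=[0.0, 0.0], s_next=[0.5, 0.5], potential=phi))
```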

