Hierarchical Reinforcement Learning Considering Stochastic Wind Disturbance for Power Line Maintenance Robot

Author(s):  
Xiaoliang Zheng ◽  
Gongping Wu

Abstract: Robot intelligence comprises motion intelligence and cognitive intelligence. Targeting motion intelligence, a hierarchical reinforcement learning architecture that accounts for stochastic wind disturbance is proposed for the decision-making of a power line maintenance robot operating autonomously. This architecture uses prior information from mechanism knowledge and empirical data to improve the safety and efficiency of robot operation, and it jointly considers high-level policy selection and low-level motion control at the global and local levels under stochastic wind disturbance. First, the operation task is decomposed into three sub-policies: global obstacle avoidance, local approach, and local tightening, and each sub-policy is learned. A master policy is then learned to select the appropriate operation sub-policy in the current state. The double deep Q-network algorithm is used for the master policy, while the deep deterministic policy gradient algorithm is used for the operation sub-policies. To improve training efficiency, the global obstacle avoidance sub-policy uses a random forest composed of dynamic-environment decision trees as the expert algorithm for imitation learning. The architecture is applied to a power line maintenance scenario, the state function and reward function of each policy are designed, and all policies are trained in an asynchronous, parallel computing environment. It is shown that this architecture enables stable and safe autonomous operating decisions for a power line maintenance robot subjected to stochastic wind disturbance.
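The hierarchical selection described in this abstract, a master policy choosing among three learned sub-policies, can be sketched in a few lines of Python. The sub-policy stubs and the epsilon-greedy Q-value lookup below are illustrative assumptions, not the authors' implementation:

```python
import random

# Illustrative sub-policy stubs: each maps a state to a low-level action.
def global_obstacle_avoidance(state): return ("avoid", state)
def local_approach(state):            return ("approach", state)
def local_tightening(state):          return ("tighten", state)

SUB_POLICIES = [global_obstacle_avoidance, local_approach, local_tightening]

def master_select(q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over sub-policy Q-values (DQN-style master)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Greedy case: the sub-policy with the highest Q-value is dispatched.
idx = master_select([0.2, 0.9, 0.1], epsilon=0.0)
action = SUB_POLICIES[idx]("state_t")
```

In the paper the master's Q-values come from a trained double deep Q-network and each sub-policy is a trained DDPG actor; the dispatch pattern stays the same.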

2021 ◽  
Vol 103 (4) ◽  
Author(s):  
Bartomeu Rubí ◽  
Bernardo Morcego ◽  
Ramon Pérez

Abstract: A deep reinforcement learning approach for solving the quadrotor path following and obstacle avoidance problem is proposed in this paper. The problem is solved with two agents: one for the path following task and another for the obstacle avoidance task. A novel structure is proposed in which the action computed by the obstacle avoidance agent becomes the state of the path following agent. Compared to traditional deep reinforcement learning approaches, the proposed method allows the training process outcomes to be interpreted, is faster, and can be safely trained on the real quadrotor. Both agents implement the Deep Deterministic Policy Gradient algorithm. The path following agent was developed in a previous work. The obstacle avoidance agent uses information provided by a low-cost LIDAR to detect obstacles around the vehicle. Since the LIDAR has a narrow field of view, an approach is developed for providing the agent with a memory of previously seen obstacles. A detailed description of the process of defining the state vector, the reward function, and the action of this agent is given. The agents are programmed in Python/TensorFlow and are trained and tested on the RotorS/Gazebo platform. Simulation results demonstrate the validity of the proposed approach.
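The coupling described here, where the obstacle avoidance agent's output is fed into the path following agent's state, can be sketched as follows. The proportional avoidance rule and the two-element state are hypothetical placeholders for the trained DDPG agents:

```python
def obstacle_avoidance_action(lidar_ranges):
    """Toy avoidance policy: steer away from the side with the nearer obstacle."""
    half = len(lidar_ranges) // 2
    left, right = min(lidar_ranges[:half]), min(lidar_ranges[half:])
    # Command in [-1, 1]; positive steers toward the clearer (left) side.
    return max(-1.0, min(1.0, left - right))

def path_following_state(path_error, avoidance_cmd):
    """The avoidance agent's action is appended to the follower's state vector."""
    return [path_error, avoidance_cmd]

# Nearest obstacle on the right half -> positive (leftward) avoidance command.
s = path_following_state(0.5, obstacle_avoidance_action([3.0, 2.0, 1.0, 4.0]))
```

The design choice the authors highlight is that this intermediate action is human-interpretable, unlike a monolithic end-to-end policy.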


Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model: the factors that need to be weighed in vehicle following are vectorized into a reward vector, and the reward function is defined as the inner product of the reward vector and a weight vector. Driving data from human drivers was collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model continuously approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance for the task of an agent following the preceding vehicle safely and smoothly.
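The reward construction described above, an inner product of a reward (feature) vector and a weight vector, can be sketched directly. The feature names and values below are hypothetical, not the paper's actual factors:

```python
def reward(features, weights):
    """Reward as the inner product of a reward (feature) vector and weights."""
    assert len(features) == len(weights)
    return sum(f * w for f, w in zip(features, weights))

# Hypothetical vehicle-following features: gap error, relative speed, jerk,
# with weights tuned so the learned value vector approaches the human one.
r = reward([-0.2, 0.1, -0.05], [1.0, 0.5, 2.0])
```

Adjusting only `weights` while keeping the feature definitions fixed is what lets the training loop steer the policy toward human-like trade-offs.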


Author(s):  
Qingyuan Zheng ◽  
Duo Wang ◽  
Zhang Chen ◽  
Yiyong Sun ◽  
Bin Liang

Single-track two-wheeled robots have become an important research topic in recent years owing to their simple structure, energy efficiency, and ability to run on narrow roads. However, the ramp jump remains a challenging task. In this study, we address the ramp jump for a single-track two-wheeled robot, presenting a control method that employs continuous-action reinforcement learning techniques for single-track two-wheeled robot control. We design a novel reward function for reinforcement learning, optimize the dimensions of the action space, and train under the deep deterministic policy gradient algorithm. Finally, we validate the control method through simulation experiments and successfully realize the single-track two-wheeled robot ramp jump task. Simulation results show that the control method is effective and has several advantages over high-dimensional action space control, reinforcement learning control with a sparse reward function, and discrete-action reinforcement learning control.


2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Alexandra Vedeler ◽  
Narada Warakagoda

The task of obstacle avoidance for maritime vessels, such as unmanned surface vehicles (USVs), has traditionally been solved using specialized modules that are designed and optimized separately. However, this approach requires deep insight into the environment, the vessel, and their complex dynamics. We propose an alternative method using Imitation Learning (IL) through Deep Reinforcement Learning (RL) and Deep Inverse Reinforcement Learning (IRL) and present a system that learns an end-to-end steering model capable of mapping radar-like images directly to steering actions in an obstacle avoidance scenario. The USV used in this work is equipped with a radar sensor, and we studied the problem of generating a single action parameter, heading. We apply an IL algorithm known as generative adversarial imitation learning (GAIL) to develop an end-to-end steering model for a scenario where avoidance of an obstacle is the goal. The performance of the system was studied for different design choices and compared to that of a system based on pure RL. The IL system produces results that indicate it is able to grasp the concept of the task and that are in many ways on par with those of the RL system. We deem this promising for future use in tasks that are not as easily described by a reward function.


2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Yuchen Fu ◽  
Quan Liu ◽  
Xionghong Ling ◽  
Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially in the number of features, and the resulting low convergence speed. The method can greatly reduce the state space and choose actions purposefully and efficiently, so as to optimize the reward function and improve convergence speed. Applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is evidently improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also solved to a certain extent by the hierarchical method. Performance under different parameters is compared and analyzed as well.
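The action-subreward idea can be sketched as a shaping term added to the sparse environment reward. The Tetris subrewards and the weight `beta` below are illustrative assumptions, not the paper's formulation:

```python
def shaped_return(main_reward, sub_rewards, beta=0.1):
    """Total step reward: sparse environment reward plus weighted action subrewards."""
    return main_reward + beta * sum(sub_rewards)

# Hypothetical Tetris step: small subrewards for lowering the stack height and
# filling row gaps, layered on top of the sparse line-clear reward (here 0.0).
r = shaped_return(0.0, [0.5, 0.3], beta=0.1)
```

The subreward terms give the agent a dense learning signal between the rare main rewards, which is what speeds up convergence.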


2016 ◽  
Vol 10 (1) ◽  
pp. 69-79 ◽  
Author(s):  
Juan Yan ◽  
Huibin Yang

Self-balancing control is the basis for applications of two-wheeled robots. To improve the self-balancing of two-wheeled robots, we propose a hierarchical reinforcement learning algorithm for controlling their balance. After describing the subgoals of hierarchical reinforcement learning, we extract features for the subgoals, define a feature value vector and its corresponding weight vector, and propose a reward function augmented with a subgoal reward term. Finally, we give a hierarchical reinforcement learning algorithm for finding the optimal strategy. Simulation experiments show that the proposed algorithm converges faster than the traditional reinforcement learning algorithm, so the robots in our system achieve self-balance very quickly.
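The subgoal structure described above can be sketched as a simple selector that returns the first unmet subgoal. The subgoal names, targets, and tolerance are hypothetical, not taken from the paper:

```python
def next_subgoal(state, subgoals, tol=0.05):
    """Return the first unmet subgoal; None once the robot is balanced."""
    for name, target, key in subgoals:
        if abs(state[key] - target) > tol:
            return name
    return None

# Hypothetical ordering for self-balancing: first damp the tilt angle,
# then the angular velocity; each active subgoal contributes its own reward term.
state = {"tilt": 0.30, "tilt_rate": 0.20}
goals = [("reduce_tilt", 0.0, "tilt"), ("damp_rate", 0.0, "tilt_rate")]
g = next_subgoal(state, goals)
# g == "reduce_tilt"
```

In the paper's scheme, the active subgoal determines which feature values are weighted into the additional subgoal reward.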


10.29007/hx4d ◽  
2018 ◽  
Author(s):  
Abhiram Mullapudi ◽  
Branko Kerkez

We investigate the real-time and autonomous operation of a 12 km2 urban storm water network that has been retrofitted with sensors and control valves. Specifically, we evaluate reinforcement learning, a technique rooted in deep learning, as a system-level control methodology. The controller opens and closes valves in the system, which enhances performance in the storm water network by coordinating discharges among spatially distributed storm water assets (i.e., detention basins and wetlands). A reinforcement learning control algorithm is implemented to control the storm water network across an urban watershed. Results show that valve control using reinforcement learning has great potential, but extensive research still needs to be conducted to develop a fundamental understanding of control robustness. We specifically discuss the role and importance of the reward function (i.e., the heuristic control objective), which guides the autonomous controller towards achieving the desired watershed-scale response.
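A heuristic control objective of the kind discussed here might, for example, penalize flooding risk and deviation from a target discharge. The depths, flows, thresholds, and weights below are illustrative assumptions, not the study's actual reward:

```python
def stormwater_reward(depths, flows, max_depth=2.0, flow_target=0.5):
    """Toy watershed-scale objective: penalize overfull basins and off-target flows."""
    flood_penalty = sum(max(0.0, d - max_depth) for d in depths)   # m over capacity
    flow_penalty = sum(abs(q - flow_target) for q in flows)        # m^3/s deviation
    return -(10.0 * flood_penalty + flow_penalty)

# Two basins, one over its safe depth; both outflows near the target rate.
r = stormwater_reward(depths=[1.5, 2.3], flows=[0.4, 0.6])
```

The relative weight on flooding versus flow deviation is exactly the kind of design choice the authors argue deserves careful study, since the learned valve behavior follows directly from it.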


Author(s):  
Chang Zhou ◽  
Lei Wang ◽  
Shangyu Yu ◽  
Huacheng He

Abstract: The obstacle avoidance problem for autonomous surface vessels (ASVs) has attracted the attention of the marine control research community for many years. For safety, it is important for an ASV to avoid all kinds of obstacles, such as shores, cliffs, floaters, and other vessels. Developing a heading and path planning strategy for the ASV is the main task and the remaining challenge. Traditional obstacle avoidance algorithms require too much computation in the working environment. This computation cost can be reduced by training obstacle avoidance models with reinforcement learning (RL): using the RL method, the ASV chooses the most efficient action according to the experience it has learned from the past. In this paper, RL is adopted to design a decision-making agent for obstacle avoidance. To train the obstacle avoidance model under a sparse-feedback environment, a hierarchical reinforcement learning (HRL) method is applied. Using this algorithm, better obstacle avoidance performance and a longer survival time can be achieved. Memory pool and target network modifications are also used to smooth the training process of the ASV. Simulation results demonstrate that HRL makes the learning process of the unmanned ship's obstacle avoidance smoother and more effective, and the survival time of the ASVs is improved.
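The memory pool and target network modifications mentioned above correspond to the standard replay buffer and soft target update, which can be sketched in pure Python. The capacity and `tau` values are illustrative:

```python
import random
from collections import deque

class ReplayMemory:
    """Memory pool: store transitions, sample uncorrelated minibatches."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off
    def push(self, transition):
        self.buffer.append(transition)
    def sample(self, batch_size, rng=random):
        return rng.sample(self.buffer, batch_size)

def soft_update(target_weights, online_weights, tau=0.01):
    """Target-network modification: slowly track the online network's weights."""
    return [(1 - tau) * t + tau * o for t, o in zip(target_weights, online_weights)]

mem = ReplayMemory()
for i in range(5):
    mem.push((i, "turn", 0.0, i + 1))   # (state, action, reward, next_state)
batch = mem.sample(2)
w = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

Both tricks reduce the variance of the TD targets, which is what smooths the training curves the authors report.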

