Path Following Control for UAV Using Deep Reinforcement Learning Approach

2021 ◽  
Vol 01 (01) ◽  
pp. 2150005
Author(s):  
Yintao Zhang ◽  
Youmin Zhang ◽  
Ziquan Yu

Unmanned aerial vehicles (UAVs) have been extensively used in civil and industrial applications thanks to the rapid development of guidance, navigation and control (GNC) technologies. In particular, deep reinforcement learning methods for motion control have made major progress recently, since the deep Q-learning algorithm has been successfully applied to continuous action domains. This paper proposes an improved deep deterministic policy gradient (DDPG) algorithm for the UAV path following control problem. A specific reward function is designed to minimize the cross-track error of the path following problem. In the training phase, a double experience replay buffer (DERB) is used to increase learning efficiency and accelerate convergence. First, the model of the UAV path following problem is established. After that, the framework of the DDPG algorithm is constructed. Then the state space, action space and reward function of the UAV path following algorithm are designed, and the DERB is introduced to accelerate the training phase. Finally, simulations are carried out to show the effectiveness of the proposed DERB–DDPG method.
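As an illustration of the double experience replay buffer idea mentioned above, the following Python sketch shows one plausible way to maintain two buffers and mix samples from both. The split rule (high-reward transitions go to a separate buffer) and all hyperparameters are assumptions for illustration, not the authors' design.

```python
# Minimal, hypothetical sketch of a double experience replay buffer (DERB).
# The split criterion and sampling ratio are assumptions, not the paper's rule.
import random
from collections import deque

class DoubleReplayBuffer:
    def __init__(self, capacity=10000, priority_ratio=0.5):
        self.regular = deque(maxlen=capacity)
        self.priority = deque(maxlen=capacity)
        self.priority_ratio = priority_ratio  # fraction of each batch drawn from the priority buffer

    def add(self, state, action, reward, next_state, done, reward_threshold=0.0):
        transition = (state, action, reward, next_state, done)
        # Assumed rule: store high-reward transitions separately so they are replayed more often.
        if reward > reward_threshold:
            self.priority.append(transition)
        else:
            self.regular.append(transition)

    def sample(self, batch_size):
        n_prio = min(int(batch_size * self.priority_ratio), len(self.priority))
        n_reg = min(batch_size - n_prio, len(self.regular))
        batch = random.sample(self.priority, n_prio) + random.sample(self.regular, n_reg)
        random.shuffle(batch)
        return batch
```

The sampled batch would then feed the usual DDPG critic and actor updates in place of a single uniform replay buffer.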

2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Chunyu Nie ◽  
Zewei Zheng ◽  
Ming Zhu

This paper proposes an adaptive three-dimensional (3D) path-following control design for a robotic airship based on reinforcement learning. The airship 3D path-following control is decomposed into altitude control and planar path-following control, and Markov decision process (MDP) models of the control problems are established, in which the scale of the state space is reduced by parameter simplification and coordinate transformation. To ensure control adaptability without dependence on an accurate airship dynamic model, a Q-learning algorithm is adopted directly to learn the action policy for actuator commands, and the controller is trained online based on actual motion. A cerebellar model articulation controller (CMAC) neural network is employed for experience generalization to accelerate the training process. Simulation results demonstrate that the proposed controllers can achieve performance comparable to well-tuned proportional-integral-derivative (PID) controllers and have a more intelligent decision-making ability.
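To make the Q-learning-with-CMAC combination concrete, here is a minimal Python sketch of tabular-style Q-learning with CMAC/tile-coding generalization over a scalar state. The state encoding, action set and learning parameters are illustrative assumptions, not the paper's MDP design.

```python
# Sketch of Q-learning with CMAC-style tile coding for experience generalization.
# State range, tiling counts and actions are assumptions for illustration only.
import numpy as np

N_TILINGS, N_TILES = 8, 16            # CMAC: several offset tilings over the state
ACTIONS = [-1.0, 0.0, 1.0]            # e.g. elevator command: down / hold / up
ALPHA, GAMMA = 0.1, 0.95

weights = np.zeros((N_TILINGS, N_TILES, len(ACTIONS)))

def active_tiles(s):
    """Map a scalar state in [0, 1) to one active tile per offset tiling."""
    idx = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * N_TILES)
        idx.append(int((s + offset) * N_TILES) % N_TILES)
    return idx

def q_values(s):
    tiles = active_tiles(s)
    return np.array([sum(weights[t, tiles[t], a] for t in range(N_TILINGS))
                     for a in range(len(ACTIONS))])

def update(s, a, r, s_next):
    """One Q-learning step; the CMAC spreads the correction over all active tiles."""
    target = r + GAMMA * np.max(q_values(s_next))
    td_error = target - q_values(s)[a]
    for t, tile in enumerate(active_tiles(s)):
        weights[t, tile, a] += ALPHA / N_TILINGS * td_error
```

Because neighbouring states share tiles, each online update generalizes to nearby states, which is the mechanism that accelerates training.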


Author(s):  
Fangjian Li ◽  
John R Wagner ◽  
Yue Wang

Inverse reinforcement learning (IRL) has been successfully applied in many robotics and autonomous driving studies without the need for hand-tuning a reward function. However, it suffers from safety issues. Compared to reinforcement learning (RL) algorithms, IRL is even more vulnerable to unsafe situations, as it can only infer the importance of safety from expert demonstrations. In this paper, we propose a safety-aware adversarial inverse reinforcement learning algorithm (S-AIRL). First, the control barrier function (CBF) is used to guide the training of a safety critic, which leverages knowledge of the system dynamics in the sampling process without training an additional guiding policy. The trained safety critic is then integrated into the discriminator to help distinguish the generated data from expert demonstrations from the standpoint of safety. Finally, to further improve safety awareness, a regulator is introduced in the loss function of the discriminator training to prevent the recovered reward function from assigning high rewards to risky behaviors. We tested our S-AIRL in a highway autonomous driving scenario. Compared to the original AIRL algorithm, at the same level of imitation learning (IL) performance, the proposed S-AIRL reduces the collision rate by 32.6%.
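The following Python (PyTorch) sketch illustrates, in hedged form, how a safety regulator could be added to an adversarial-IRL discriminator loss. The penalty form, the use of a risk score from the safety critic, and the weight lam are assumptions made for illustration; they are not the exact loss proposed in the paper.

```python
# Hypothetical sketch: AIRL-style discriminator loss with an added safety regulator.
# risk_policy is assumed to be a per-transition risk score in [0, 1] from the
# CBF-guided safety critic; the penalty form and weight are illustrative only.
import torch

def discriminator_loss(logits_expert, logits_policy, reward_policy, risk_policy, lam=1.0):
    bce = torch.nn.functional.binary_cross_entropy_with_logits
    # Standard adversarial objective: classify expert vs. policy-generated transitions.
    loss_expert = bce(logits_expert, torch.ones_like(logits_expert))
    loss_policy = bce(logits_policy, torch.zeros_like(logits_policy))
    # Assumed regulator: penalize high recovered rewards on transitions flagged as risky.
    safety_penalty = (risk_policy * torch.relu(reward_policy)).mean()
    return loss_expert + loss_policy + lam * safety_penalty
```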


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1818
Author(s):  
Jaein Song ◽  
Yun Ji Cho ◽  
Min Hee Kang ◽  
Kee Yeon Hwang

As ridesharing services (including taxis) are often run by private companies, profitability is the top priority in operation. This leads drivers to refuse passengers bound for low-demand areas, where finding subsequent passengers is difficult, causing problems such as extended waiting times for passengers hailing a vehicle to these regions. To resolve this problem during the worst time period, this study used Seoul's taxi data and a reinforcement learning algorithm to find appropriate regional surge rates for ridesharing services between 10:00 p.m. and 4:00 a.m. In the reinforcement learning, the outcome of a centrality analysis was applied as a weight affecting drivers' destination choice probability. Furthermore, the reward function used in learning was adjusted according to whether a passenger waiting time term was included; profit was used as the reward value. By using a negative reward for passenger waiting time, the study was able to identify a more appropriate surge level. Across the regions, the surge averaged 1.6; areas on the outskirts of the city and in residential districts showed a higher surge, while central areas had a lower surge. With these differentiated surge rates, drivers' refusals to take passengers can be lessened and passenger waiting times shortened. The supply of ridesharing services in low-demand regions can be increased by as much as 7.5%, allowing regional equity problems related to ridesharing services in Seoul to be reduced to a greater extent.
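The reward shaping described above (profit as the base reward, passenger waiting time as a negative term) can be sketched as below. The functional form, the waiting-time weight and the variable names are illustrative assumptions, not the study's calibrated values.

```python
# Illustrative sketch of the reward used in the surge-rate learning: profit minus a
# penalty for passenger waiting time. Weights and form are assumptions, not the
# study's actual parameters.
def surge_reward(base_fare, operating_cost, surge, passenger_wait_min, wait_weight=0.1):
    profit = base_fare * surge - operating_cost        # driver profit under the chosen surge level
    wait_penalty = wait_weight * passenger_wait_min    # negative reward for passenger waiting time
    return profit - wait_penalty
```

Switching the waiting-time term on or off corresponds to the two reward variants compared in the study.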


2021 ◽  
Vol 103 (4) ◽  
Author(s):  
Bartomeu Rubí ◽  
Bernardo Morcego ◽  
Ramon Pérez

A deep reinforcement learning approach for solving the quadrotor path following and obstacle avoidance problem is proposed in this paper. The problem is solved with two agents: one for the path following task and another one for the obstacle avoidance task. A novel structure is proposed, in which the action computed by the obstacle avoidance agent becomes part of the state of the path following agent. Compared to traditional deep reinforcement learning approaches, the proposed method allows the training process outcomes to be interpreted, is faster, and can be trained safely on the real quadrotor. Both agents implement the Deep Deterministic Policy Gradient algorithm. The path following agent was developed in a previous work. The obstacle avoidance agent uses the information provided by a low-cost LIDAR to detect obstacles around the vehicle. Since the LIDAR has a narrow field of view, an approach for providing the agent with a memory of previously seen obstacles is developed. A detailed description of the process of defining the state vector, the reward function and the action of this agent is given. The agents are programmed in Python/TensorFlow and are trained and tested on the RotorS/Gazebo platform. Simulation results demonstrate the validity of the proposed approach.
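The two-agent cascade described above can be sketched schematically in Python as follows. The class and method names, and the exact composition of the state vectors, are placeholders for illustration rather than the authors' implementation.

```python
# Schematic sketch of the cascaded two-agent structure: the obstacle-avoidance
# agent's action is appended to the path-following agent's state. Names and state
# layouts are illustrative assumptions.
class CascadedController:
    def __init__(self, oa_agent, pf_agent):
        self.oa_agent = oa_agent    # DDPG agent for obstacle avoidance
        self.pf_agent = pf_agent    # DDPG agent for path following

    def act(self, lidar_memory, path_state):
        # 1) Obstacle-avoidance agent reads the memory-augmented LIDAR observation.
        oa_action = self.oa_agent.act(lidar_memory)       # e.g. a lateral deviation command
        # 2) Its action becomes part of the path-following agent's state vector.
        pf_state = list(path_state) + list(oa_action)
        # 3) The path-following agent outputs the final velocity/attitude command.
        return self.pf_agent.act(pf_state)
```

This separation is what makes the outcome of training interpretable: each agent's output has a clear physical meaning before the commands are combined.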


Author(s):  
Zhenhai Gao ◽  
Xiangtong Yan ◽  
Fei Gao ◽  
Lei He

Decision-making is one of the key parts of research on longitudinal autonomous driving, and considering the behavior of human drivers when designing decision-making strategies is a current research hotspot. Traditional rule-based decision-making strategies are difficult to apply to complex scenarios. Current decision-making methods based on reinforcement learning and deep reinforcement learning construct reward functions around safety, comfort, and economy, yet the resulting decision strategies still differ considerably from those of human drivers. Focusing on these problems, this paper uses driver behavior data to design the reward function of the deep reinforcement learning algorithm through BP neural network fitting, and uses the DQN and DDPG deep reinforcement learning algorithms to establish two driver-like longitudinal autonomous driving decision-making models. Simulation experiments compare the decisions of the two models with the driver curve. The results show that both algorithms can realize driver-like decision-making, and that the DDPG algorithm is more consistent with human driver behavior and performs better than the DQN algorithm.
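As a hedged illustration of the reward-design step described above, the sketch below fits a small feedforward (BP-style) network to driver data and uses the gap between the agent's action and the fitted driver action as a negative reward. The input features, network size and placeholder data are assumptions, not the paper's setup.

```python
# Hypothetical sketch: fit a BP (feedforward) network to driver behaviour data and
# derive a driver-like reward from it. Features, sizes and data are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Driver data: [gap to lead vehicle (m), relative speed (m/s), ego speed (m/s)] -> acceleration (m/s^2)
X_driver = np.random.rand(1000, 3)    # placeholder for recorded driver features
y_driver = np.random.rand(1000)       # placeholder for recorded driver accelerations

driver_model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500)
driver_model.fit(X_driver, y_driver)

def driver_like_reward(state, agent_accel):
    """Reward is higher the closer the agent's acceleration is to the fitted driver's."""
    driver_accel = driver_model.predict(np.asarray(state).reshape(1, -1))[0]
    return -abs(agent_accel - driver_accel)
```

The same reward can then drive either a DQN agent (over a discretized acceleration set) or a DDPG agent (continuous acceleration).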


2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Yuchen Fu ◽  
Quan Liu ◽  
Xionghong Ling ◽  
Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, and the resulting low convergence speed. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and improve convergence speed. The method is applied to online learning in the game of Tetris, and the experimental results show that convergence speed is evidently improved by the new approach, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameter settings is compared and analyzed as well.
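To illustrate the action-subreward idea in a Tetris-like setting, the sketch below decomposes the global reward into small per-action rewards and feeds them into an ordinary tabular Q-learning update over a reduced state. The specific features and weights are assumptions for illustration, not the paper's design.

```python
# Illustrative sketch: per-action subrewards (e.g. penalizing new holes, rewarding
# cleared lines) feeding a tabular Q-learning update. Features and weights are
# assumptions, not the paper's values.
def action_subreward(lines_cleared, new_holes, height_increase):
    return 10.0 * lines_cleared - 2.0 * new_holes - 0.5 * height_increase

def q_update(Q, s, a, sub_r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step over a reduced (hierarchical) state, using the subreward."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (sub_r + gamma * best_next - Q.get((s, a), 0.0))
```

Because each primitive action receives immediate feedback, credit assignment is shorter-range, which is what speeds up convergence relative to a single sparse game-over reward.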

