Path-following control of underactuated ships using actor-critic reinforcement learning with MLP neural networks

Author(s): Haiqing Shen, Chen Guo
2020, Vol 53 (2), pp. 8163-8168
Author(s): Tianhao Zhang, Runyu Tian, Chen Wang, Guangming Xie
2019, Vol 2019, pp. 1-12
Author(s): Chunyu Nie, Zewei Zheng, Ming Zhu

This paper proposes an adaptive three-dimensional (3D) path-following control design for a robotic airship based on reinforcement learning. The airship 3D path-following control is decomposed into altitude control and planar path-following control, and Markov decision process (MDP) models of the two control problems are established, in which the scale of the state space is reduced by parameter simplification and coordinate transformation. To ensure control adaptability without dependence on an accurate airship dynamic model, a Q-learning algorithm is adopted directly to learn the action policy for actuator commands, and the controller is trained online from the actual motion. A cerebellar model articulation controller (CMAC) neural network is employed for experience generalization to accelerate the training process. Simulation results demonstrate that the proposed controllers achieve performance comparable to well-tuned proportional-integral-derivative (PID) controllers and exhibit a more intelligent decision-making ability.
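As a rough illustration of the Q-learning-with-CMAC combination described above (a minimal sketch, not the authors' implementation), the following Python fragment approximates Q(s, a) as a sum of weights over several offset tilings and applies the one-step Q-learning update to the active tiles. The state bounds, tiling resolution, discrete action set and learning parameters are all illustrative assumptions.

import numpy as np

# Q-learning with CMAC (tile-coding) generalization: a hedged sketch.
# All constants below are illustrative assumptions, not values from the paper.
N_TILINGS = 8            # number of offset tilings in the CMAC
TILES_PER_DIM = 10       # grid resolution per state dimension
N_ACTIONS = 3            # hypothetical discrete actuator commands
STATE_LOW = np.array([-1.0, -1.0])    # normalized (error, error rate)
STATE_HIGH = np.array([1.0, 1.0])
ALPHA = 0.1 / N_TILINGS  # learning rate shared across the active tiles
GAMMA = 0.95             # discount factor
EPSILON = 0.1            # exploration rate

# One weight grid per tiling: shape (tilings, tiles, tiles, actions).
weights = np.zeros((N_TILINGS, TILES_PER_DIM, TILES_PER_DIM, N_ACTIONS))

def active_tiles(state):
    # Return the (tiling, i, j) index of the active tile in each tiling.
    scaled = (state - STATE_LOW) / (STATE_HIGH - STATE_LOW)   # map to [0, 1]
    indices = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * TILES_PER_DIM)              # sub-tile offsets
        cells = np.clip(((scaled + offset) * TILES_PER_DIM).astype(int),
                        0, TILES_PER_DIM - 1)
        indices.append((t, cells[0], cells[1]))
    return indices

def q_values(state):
    # Q(s, .) is the sum of the active tiles' weight vectors across tilings.
    return sum(weights[t, i, j] for t, i, j in active_tiles(state))

def update(state, action, reward, next_state):
    # One-step Q-learning update distributed over the active tiles.
    target = reward + GAMMA * np.max(q_values(next_state))
    error = target - q_values(state)[action]
    for t, i, j in active_tiles(state):
        weights[t, i, j, action] += ALPHA * error

def act(state, rng=np.random.default_rng()):
    # Epsilon-greedy action selection over the CMAC-approximated Q.
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(state)))

Because each state activates one tile per tiling, nearby states share tiles, which is what gives the CMAC its experience generalization and faster online training.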


2021, Vol 01 (01), pp. 2150005
Author(s): Yintao Zhang, Youmin Zhang, Ziquan Yu

Unmanned aerial vehicles (UAVs) have been extensively used in civil and industrial applications due to the rapid development of guidance, navigation and control (GNC) technologies. In particular, deep reinforcement learning methods for motion control have made major progress recently, since the deep Q-learning algorithm has been successfully extended to continuous action domains. This paper proposes an improved deep deterministic policy gradient (DDPG) algorithm for the UAV path-following control problem. A specific reward function is designed to minimize the cross-track error of the path-following problem. In the training phase, a double experience replay buffer (DERB) is used to increase learning efficiency and accelerate convergence. First, the model of the UAV path-following problem is established. After that, the framework of the DDPG algorithm is constructed. Then the state space, action space and reward function of the UAV path-following algorithm are designed, and the DERB is introduced to accelerate the training phase. Finally, simulations demonstrate the effectiveness of the proposed DERB-DDPG method.
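The abstract does not specify how the double experience replay buffer routes or samples transitions. As a hedged Python sketch (one plausible reading, not the authors' implementation), the fragment below keeps two buffers and mixes each minibatch from both; the reward-based routing rule, threshold and mixing ratio are assumptions for illustration only.

import random
from collections import deque

class DoubleReplayBuffer:
    # Hypothetical DERB: one buffer for high-reward transitions, one for the
    # rest, sampled at a fixed mixing ratio. Routing rule and ratio are assumed.
    def __init__(self, capacity=100_000, reward_threshold=0.0, good_fraction=0.5):
        self.good = deque(maxlen=capacity)   # high-reward transitions
        self.rest = deque(maxlen=capacity)   # everything else
        self.reward_threshold = reward_threshold
        self.good_fraction = good_fraction

    def push(self, state, action, reward, next_state, done):
        # Route the transition by its immediate reward (assumed criterion).
        transition = (state, action, reward, next_state, done)
        if reward > self.reward_threshold:
            self.good.append(transition)
        else:
            self.rest.append(transition)

    def sample(self, batch_size):
        # Draw a mixed minibatch, shrinking gracefully if a buffer is small.
        n_good = min(int(batch_size * self.good_fraction), len(self.good))
        n_rest = min(batch_size - n_good, len(self.rest))
        batch = random.sample(self.good, n_good) + random.sample(self.rest, n_rest)
        random.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.good) + len(self.rest)

In a DDPG training loop, push would be called after every environment step and sample would supply the minibatch for each critic and actor update; a cross-track-error penalty such as r = -|e_ct| would be one reward design consistent with what the abstract describes.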

