Primary side optimization control of heating station based on double-delay deep deterministic policy gradient

Author(s): Meng Xiangran, Li Qi, Tan Mengyuan
2021, Vol 9 (3), pp. 252
Author(s): Yushan Sun, Xiaokun Luo, Xiangrui Ran, Guocheng Zhang

This research aims to solve the safe navigation problem of autonomous underwater vehicles (AUVs) in the deep ocean, a complex and changeable environment with many seamounts. When an AUV navigates in the deep sea, it encounters many underwater canyons whose hard valley walls pose a serious threat to its safety. To address the problem of safe AUV operation in underwater canyons and to exploit the potential of autonomous obstacle avoidance in uncertain environments, an improved AUV path planning algorithm based on the deep deterministic policy gradient (DDPG) algorithm is proposed in this work. The method is an end-to-end path planning algorithm that optimizes the policy directly: it takes sensor information as input and outputs the driving speed and yaw angle. The planner can reach a predetermined target point while avoiding large-scale static obstacles, such as the valley walls of the simulated underwater canyon environment, as well as sudden small-scale dynamic obstacles, such as marine life and other vehicles. In addition, to handle the multi-objective structure of obstacle avoidance in path planning, the reward function is designed in a modular fashion and combined with the artificial potential field method to provide continuous rewards. This research also proposes a new algorithm, the SumTree deep deterministic policy gradient algorithm (SumTree-DDPG), which improves the uniform random storage and sampling strategy of the DDPG experience replay. Experience samples are classified and stored according to their importance using a SumTree structure, high-quality samples are drawn preferentially, and the SumTree-DDPG algorithm thereby accelerates convergence of the model. Finally, an underwater canyon simulation environment is written in Python, and a deep reinforcement learning simulation platform is built on a high-performance computer to train the AUV. Simulations verify that the proposed path planning method can guide the under-actuated underwater vehicle to the target without colliding with any obstacles. Compared with the DDPG algorithm, the improved SumTree-DDPG planner achieves better stability, higher total training reward, and greater robustness.
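The abstract describes SumTree-based storage and importance sampling of experience but does not include code. The following Python sketch shows a minimal SumTree replay structure of the kind referred to; the class and method names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

class SumTree:
    """Binary sum tree for priority-proportional sampling of replay data.

    Leaves hold sample priorities; each internal node holds the sum of its
    children, so drawing a uniform value in [0, total) and descending the
    tree selects a leaf with probability proportional to its priority.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes + leaves
        self.data = [None] * capacity            # stored transitions
        self.write = 0                           # next leaf slot to overwrite

    def add(self, priority, transition):
        idx = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                          # propagate the change upward
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def sample(self, value):
        idx = 0
        while True:                              # descend until a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if left >= len(self.tree):
                break
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = right
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    @property
    def total(self):
        return self.tree[0]
```

A prioritized minibatch would then be drawn by splitting the interval [0, total) into equal segments and calling sample once per segment, so transitions with larger priorities are replayed more often than under the uniform sampling of plain DDPG.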


Author(s): Xuanyu Liu, Wentao Wang, Yudong Wang, Cheng Shao, Qiumei Cong

During shield machine tunneling, the earth pressure in the sealed cabin must be kept balanced to ensure construction safety. Because of the strong nonlinear coupling among the tunneling parameters, it is difficult to balance the amount of soil entering and the amount discharged from the sealed cabin, so the stability of the excavation face is poorly controlled. For this purpose, a coordinated optimization control method for the shield machine based on dynamic fuzzy neural network (D-FNN) direct inverse control is proposed. The cutterhead torque, advance speed, thrust, screw conveyor speed, and earth pressure difference in the sealed cabin are selected as inputs, and a D-FNN model of the control parameters is established whose outputs are the screw conveyor speed and advance speed at the next moment. The error reduction rate method is introduced to prune and identify the network structure and thereby optimize the control model. On this basis, an optimal control system for the earth pressure balance (EPB) of the shield machine is established using the direct inverse control method. Simulation results show that the method can coordinate the optimization of the control parameters according to changes in the construction environment, effectively reduce earth pressure fluctuations during shield tunneling, and better maintain the stability of the excavation face.
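The paper's controller is a dynamic fuzzy neural network pruned with the error reduction rate method; no implementation is given in the abstract. The Python sketch below substitutes a plain feedforward network and synthetic data purely to illustrate the direct inverse control idea of mapping the five measured inputs to the next-step screw conveyor speed and advance speed; all names, layer sizes, and data are assumptions.

```python
import torch
import torch.nn as nn

class InverseController(nn.Module):
    """Illustrative stand-in for the D-FNN inverse model: maps the five
    measured quantities (cutterhead torque, advance speed, thrust, screw
    conveyor speed, chamber earth-pressure difference) to the next-step
    screw conveyor speed and advance speed."""

    def __init__(self, n_inputs=5, n_hidden=32, n_outputs=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_outputs),
        )

    def forward(self, x):
        return self.net(x)

# One supervised training step on logged excavation data; the tensors
# below are random placeholders standing in for recorded measurements
# and the control actions actually applied at the next moment.
model = InverseController()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

features = torch.randn(64, 5)   # batch of measured input vectors
targets = torch.randn(64, 2)    # next-step screw speed and advance speed
optimizer.zero_grad()
loss = loss_fn(model(features), targets)
loss.backward()
optimizer.step()
```

Once such an inverse model is trained, direct inverse control amounts to feeding it the current measurements together with the desired earth-pressure difference and applying its predicted actuator settings at the next control step.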


2021, Vol 36
Author(s): Arushi Jain, Khimya Khetarpal, Doina Precup

Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent's decisions. We tackle this problem in the options framework (Sutton, Precup & Singh, 1999), a particular way of specifying temporally abstract actions that allow an agent to use sub-policies with start and end conditions. We consider behaviour safe if it avoids regions of the state space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty on the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the quantitative and qualitative behaviour of the proposed approach in a tabular grid world, a continuous-state puddle world, and three games from the Arcade Learning Environment: Ms. Pac-Man, Amidar, and Q*bert. Our approach reduces the variance of the return, boosts performance in environments with intrinsic variability in the reward structure, and compares favourably both with primitive actions and with risk-neutral options.
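The abstract describes a constrained objective that trades expected return against model uncertainty but gives no formulation. The Python sketch below is a loose illustration of such a penalized policy gradient, not the authors' objective; the uncertainty estimates, the weight psi, and the function name are assumptions made for the example.

```python
import torch

def safe_policy_gradient_loss(log_probs, returns, uncertainty, psi=0.5):
    """Illustrative surrogate: a policy-gradient loss whose effective
    return is the sampled return minus psi times a per-state uncertainty
    estimate, so the optimizer trades expected return against visiting
    states with unpredictable outcomes.

    log_probs   : (T,) log-probabilities of the actions taken
    returns     : (T,) Monte Carlo returns from the visited states
    uncertainty : (T,) per-state uncertainty estimates
    psi         : trade-off weight between return and the penalty
    """
    penalized_return = returns - psi * uncertainty
    # Negate so that minimizing this loss ascends the penalized objective.
    return -(log_probs * penalized_return.detach()).mean()

# Toy usage with made-up trajectory data: three steps of log-probabilities,
# returns, and uncertainty estimates for the visited states.
log_probs = torch.log(torch.tensor([0.6, 0.3, 0.8]))
returns = torch.tensor([1.0, 0.5, 2.0])
uncertainty = torch.tensor([0.2, 1.5, 0.1])
loss = safe_policy_gradient_loss(log_probs, returns, uncertainty)
```

Setting psi to zero recovers a standard risk-neutral policy gradient, while larger values push the policy away from high-uncertainty regions of the state space.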

