Model-Free Dual Heuristic Dynamic Programming

2015 ◽  
Vol 26 (8) ◽  
pp. 1834-1839 ◽  
Author(s):  
Zhen Ni ◽  
Haibo He ◽  
Xiangnan Zhong ◽  
Danil V. Prokhorov


Author(s):  
Hossein Nejatbakhsh Esfahani ◽  
Rafal Szlapczynski

Abstract: This paper proposes a hybrid robust-adaptive learning-based control scheme based on Approximate Dynamic Programming (ADP) for the tracking control of autonomous ship maneuvering. We adopt a Time-Delay Control (TDC) approach, which is known as a simple, practical, model-free and fairly robust strategy, combined with an Actor-Critic Approximate Dynamic Programming (ACADP) algorithm as the adaptive part of the proposed hybrid control algorithm. Based on this integration, Actor-Critic Time-Delay Control (AC-TDC) is proposed. It offers a high-performance robust-adaptive control approach for path following of autonomous ships under deterministic and stochastic disturbances induced by winds, waves, and ocean currents. Computer simulations have been conducted under two different conditions, with deterministic and with stochastic disturbances, and all simulation results indicate acceptable path-tracking performance for the proposed control algorithm in comparison with the conventional TDC approach.
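The time-delay estimation idea underlying TDC can be illustrated with a minimal sketch: the controller cancels the unknown dynamics and disturbance by reusing the one-step-delayed acceleration measurement. The 1-D plant, gains, and disturbance below are illustrative assumptions, not taken from the paper (which addresses full ship maneuvering and adds an actor-critic adaptive term).

```python
import numpy as np

# Hypothetical 1-D tracking sketch of time-delay control (TDC):
# the plant x_ddot = f(x, v) + d(t) + b*u has unknown dynamics f and
# disturbance d. TDC estimates (f + d) from the one-step-delayed
# acceleration, x_ddot(t-L) - b*u(t-L), and cancels it. All gains and
# the plant itself are illustrative assumptions, not from the paper.

dt = 0.01
b_hat = 1.0          # assumed estimate of the control gain b
kp, kd = 25.0, 10.0  # PD gains shaping the desired error dynamics

def simulate(T=10.0):
    x, v = 0.0, 0.0
    u_prev, a_prev = 0.0, 0.0
    errs = []
    for k in range(int(T / dt)):
        t = k * dt
        ref, ref_v, ref_a = np.sin(t), np.cos(t), -np.sin(t)
        e, ev = ref - x, ref_v - v
        h_hat = a_prev - b_hat * u_prev        # time-delay estimate of f + d
        u = (ref_a + kd * ev + kp * e - h_hat) / b_hat
        # true plant (unknown to the controller): drag + wave-like disturbance
        a = -0.5 * v + 0.3 * np.sin(2 * t) + u
        v += a * dt                            # semi-implicit Euler step
        x += v * dt
        u_prev, a_prev = u, a                  # a_prev: measured acceleration
        errs.append(abs(e))
    return float(np.mean(errs))

print(simulate())
```

Because the estimate lags by one step, the cancellation error scales with how fast the unknown terms change over `dt`, which is why TDC is described as only roughly robust rather than exact.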


2016 ◽  
Vol 817 ◽  
pp. 150-161 ◽  
Author(s):  
Marcin Szuster ◽  
Piotr Gierlak

The article focuses on the implementation of the globalized dual-heuristic dynamic programming algorithm in the discrete tracking control system of a three-degrees-of-freedom robotic manipulator. The globalized dual-heuristic dynamic programming algorithm belongs to the family of approximate dynamic programming algorithms, which are based on Bellman's dynamic programming idea. These algorithms generally consist of actor and critic structures realized in the form of artificial neural networks. Moreover, the control system includes a PD controller, a supervisory term and an additional control signal. The structure of the supervisory term derives from a stability analysis carried out using the Lyapunov stability theorem. The control system works on-line, and the neural networks' weight adaptation is realized in every iteration step. A series of computer simulations was carried out in Matlab/Simulink software to confirm the performance of the control system.
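The per-iteration actor-critic weight adaptation described above can be sketched in its simplest form: a DHP-style critic that learns the derivative of the value function (the core ingredient that GDHP combines with a plain value critic), paired with an actor updated by gradient descent. The scalar linear plant, linear approximators, and learning rates are illustrative assumptions, not the article's neural-network setup.

```python
import numpy as np

# Minimal DHP-style actor-critic sketch on a scalar plant x' = A x + B u
# with stage cost U = x^2 + u^2. The critic learns the costate
# lambda(x) ~ dV/dx = w_c * x; the actor learns u = w_a * x.
# Plant, cost, and rates are illustrative assumptions, not from the article.

A, B, gamma = 0.9, 0.5, 0.95
w_c, w_a = 0.0, 0.0
lr_c, lr_a = 0.01, 0.01

rng = np.random.default_rng(0)
for it in range(5000):
    x = rng.uniform(-1, 1)                     # sample a training state
    u = w_a * x
    x_next = A * x + B * u
    # critic target: derivative of the Bellman equation w.r.t. x
    dU_dx = 2 * x + 2 * u * w_a                # d(x^2+u^2)/dx along u = w_a x
    dxn_dx = A + B * w_a
    lam_target = dU_dx + gamma * (w_c * x_next) * dxn_dx
    w_c += lr_c * (lam_target - w_c * x) * x   # gradient step on critic error
    # actor descends dJ/du = dU/du + gamma * lambda(x') * dx'/du
    dJ_du = 2 * u + gamma * (w_c * x_next) * B
    w_a -= lr_a * dJ_du * x

print(w_a, abs(A + B * w_a) < 1.0)
```

The learned feedback gain `w_a` turns negative and keeps the closed loop `A + B*w_a` stable; in the article this adaptation runs on-line at every control step rather than on sampled states.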


2021 ◽  
Author(s):  
Yunfan Su

Vehicular ad hoc networks (VANETs) are a promising technology that improves traffic safety and transportation efficiency and provides a comfortable driving experience. However, due to the rapid growth of applications that demand channel resources, efficient channel allocation schemes are required to fully exploit the performance of vehicular networks. In this thesis, two reinforcement learning (RL)-based channel allocation methods are proposed for a cognitive-enabled VANET environment to maximize a long-term average system reward. First, we present a model-based dynamic programming method, which requires calculating the transition probabilities and the time intervals between decision epochs. After obtaining the transition probabilities and time intervals, a relative value iteration (RVI) algorithm is used to find the asymptotically optimal policy. Then, we propose a model-free reinforcement learning method, in which an agent interacts with the environment iteratively and learns from the feedback to approximate the optimal policy. Simulation results show that the reinforcement learning method achieves performance similar to that of dynamic programming, while both outperform the greedy method.
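The model-based step can be sketched with relative value iteration on a toy average-reward MDP. The two-state "channel idle/busy" model, rewards, and transition probabilities below are made-up illustrations, not the thesis's actual VANET model; only the RVI mechanics (subtracting the reference-state value each sweep to keep iterates bounded) follow the standard algorithm.

```python
import numpy as np

# Toy channel-allocation MDP: states {0: idle, 1: busy},
# actions {0: wait, 1: transmit}. Numbers are illustrative assumptions.
# P[a][s][s'] : transition probabilities, R[a][s] : expected rewards.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],    # action 0: wait
              [[0.6, 0.4], [0.5, 0.5]]])   # action 1: transmit
R = np.array([[0.0, 0.0],                  # waiting earns nothing
              [1.0, -0.5]])                # transmit: reward if idle, penalty if busy

def rvi(P, R, ref=0, tol=1e-9, max_iter=10_000):
    """Relative value iteration for the optimal long-run average reward."""
    h = np.zeros(P.shape[1])
    for _ in range(max_iter):
        q = R + P @ h                      # Q[a, s] = r(s,a) + sum_s' p * h(s')
        h_new = q.max(axis=0)
        g = h_new[ref]                     # gain estimate at the reference state
        h_new -= g                         # the "relative" step: keep h bounded
        if np.abs(h_new - h).max() < tol:
            return g, h_new, q.argmax(axis=0)
        h = h_new
    return g, h, q.argmax(axis=0)

gain, bias, policy = rvi(P, R)
print(gain, policy)    # optimal average reward and the greedy policy w.r.t. h
```

For these made-up numbers the resulting policy transmits only when the channel is idle; the model-free method in the thesis learns a comparable policy without ever forming `P` and `R` explicitly.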

