Efficient approximate dynamic programming based on design and analysis of computer experiments for infinite-horizon optimization

2020 ◽ Vol 124 ◽ pp. 105032
Author(s):  
Ying Chen ◽  
Feng Liu ◽  
Jay M. Rosenberger ◽  
Victoria C.P. Chen ◽  
Asama Kulvanitchaiyanunt ◽  
...  


Author(s):  
Tohid Sardarmehni ◽  
Ali Heydari

Approximate dynamic programming, also known as reinforcement learning, is applied for optimal control of antilock brake systems (ABS) in ground vehicles. A quarter-vehicle model with a hydraulic brake system is selected as an accurate, control-oriented model of the brake system. Owing to the switching nature of the hydraulic brake system in ABS, an optimal switching solution is generated by minimizing a performance index that penalizes the braking distance and drives the vehicle velocity to zero while preventing wheel lock-up. Toward this objective, a value iteration algorithm is selected for 'learning' the infinite-horizon solution. Artificial neural networks, as powerful function approximators, are used to approximate the value function; the training is conducted offline using least squares. Once trained, the converged neural network determines optimal decisions for the actuators on the fly. Numerical simulations show that this approach is very promising while carrying a low real-time computational burden, and it therefore outperforms many existing solutions in the literature.
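
As an illustration of the kind of offline value iteration with a least-squares-trained, linear-in-parameters value function described above, the following sketch runs fitted value iteration on a toy switched system. The two-mode dynamics, stage cost, polynomial features, and discount factor are hypothetical placeholders; this is not the paper's quarter-vehicle ABS model.

```python
import numpy as np

# Sketch: fitted value iteration with a linear-in-parameters value function
# trained by least squares, on a hypothetical two-mode switched system.
# (Illustrative only -- not the paper's quarter-vehicle ABS model.)

def features(x):
    # Polynomial basis standing in for the approximator's hidden layer.
    return np.array([1.0, x, x**2, x**3])

def step(x, u):
    # Hypothetical switched dynamics: mode u = 0 coasts, u = 1 brakes harder.
    return (0.95 if u == 0 else 0.7) * x

def cost(x, u):
    # Stage cost penalizing the state (a stand-in for braking distance)
    # plus a small penalty for engaging the actuator.
    return x**2 + 0.05 * u

gamma, actions = 0.95, (0, 1)
w = np.zeros(4)                                    # value-function weights
samples = np.random.uniform(-1.0, 1.0, size=200)   # offline training states

for _ in range(100):                               # value iteration sweeps
    Phi = np.array([features(x) for x in samples])
    targets = np.array([
        min(cost(x, u) + gamma * features(step(x, u)) @ w for u in actions)
        for x in samples
    ])
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)   # least-squares fit

def policy(x):
    # Online decision: pick the mode minimizing the estimated cost-to-go.
    return min(actions, key=lambda u: cost(x, u) + gamma * features(step(x, u)) @ w)

print("action at x=0.5:", policy(0.5))
```

Once the weights converge, the learned value function is queried online, as in the `policy` function above, so the real-time burden reduces to evaluating a few basis functions per candidate action.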


Author(s):  
Phillip R. Jenkins ◽  
Matthew J. Robbins ◽  
Brian J. Lunday

Military medical planners must consider how aerial medical evacuation (MEDEVAC) assets will be dispatched when preparing for and supporting high-intensity combat operations. The dispatching authority seeks to dispatch MEDEVAC assets to prioritized requests for service, such that battlefield casualties are effectively and efficiently transported to nearby medical-treatment facilities. We formulate and solve a discounted, infinite-horizon Markov decision process (MDP) model of the MEDEVAC dispatching problem. Because the high dimensionality and uncountable state space of our MDP model render classical dynamic programming solution methods intractable, we instead apply approximate dynamic programming (ADP) solution methods to produce high-quality dispatching policies relative to the currently practiced closest-available dispatching policy. We develop, test, and compare two distinct ADP solution techniques, both of which utilize an approximate policy iteration (API) algorithmic framework. The first algorithm uses least-squares temporal differences (LSTD) learning for policy evaluation, whereas the second algorithm uses neural network (NN) learning. We construct a notional, yet representative planning scenario based on high-intensity combat operations in southern Azerbaijan to demonstrate the applicability of our MDP model and to compare the efficacies of our proposed ADP solution techniques. We generate 30 problem instances via a designed experiment to examine how selected problem features and algorithmic features affect the quality of solutions attained by our ADP policies. Results show that the respective policies determined by the NN-API and LSTD-API algorithms significantly outperform the closest-available benchmark policies in 27 (90%) and 24 (80%) of the problem instances examined. Moreover, the NN-API policies significantly outperform the LSTD-API policies in each of the problem instances examined. Compared with the closest-available policy for the baseline problem instance, the NN-API policy decreases the average response time of important urgent (i.e., life-threatening) requests by 39 minutes. These research models, methodologies, and results inform the implementation and modification of current and future MEDEVAC tactics, techniques, and procedures, as well as the design and purchase of future aerial MEDEVAC assets.
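
As a rough illustration of the LSTD-based approximate policy iteration (API) machinery mentioned above, the sketch below alternates least-squares temporal-difference policy evaluation with greedy policy improvement on a small synthetic MDP. The transition and reward tables, one-hot features, trajectory length, and regularization are assumptions made for the example and bear no relation to the MEDEVAC dispatching model.

```python
import numpy as np

# Sketch: approximate policy iteration (API) with least-squares temporal
# differences (LSTD) policy evaluation on a small synthetic MDP.
# (Illustrative only -- not the MEDEVAC dispatching model.)

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards

def phi(s):
    # One-hot state features; a real application would use a compact basis.
    v = np.zeros(n_states)
    v[s] = 1.0
    return v

policy = rng.integers(n_actions, size=n_states)    # arbitrary initial policy

for _ in range(10):                                # API: evaluate, then improve
    # LSTD policy evaluation from a simulated trajectory under `policy`.
    A = np.zeros((n_states, n_states))
    b = np.zeros(n_states)
    s = 0
    for _ in range(5000):
        a = policy[s]
        s_next = rng.choice(n_states, p=P[s, a])
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += f * R[s, a]
        s = s_next
    w = np.linalg.solve(A + 1e-6 * np.eye(n_states), b)   # value weights
    # Greedy policy improvement against the fitted value function.
    Q = R + gamma * np.einsum('sat,t->sa', P, w)
    policy = Q.argmax(axis=1)

print("greedy policy:", policy)
```

In a large-scale application such as the one described above, the one-hot features would be replaced by a compact basis (or by a neural network, as in the NN-API variant), since enumerating the state space is exactly what the approximation is meant to avoid.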


Author(s):  
Hossein Nejatbakhsh Esfahani ◽  
Rafal Szlapczynski

This paper proposes a hybrid robust-adaptive learning-based control scheme based on Approximate Dynamic Programming (ADP) for the tracking control of autonomous ship maneuvering. We adopt a Time-Delay Control (TDC) approach, known as a simple, practical, model-free, and fairly robust strategy, combined with an Actor-Critic Approximate Dynamic Programming (ACADP) algorithm as the adaptive part of the proposed hybrid control algorithm. Based on this integration, Actor-Critic Time-Delay Control (AC-TDC) is proposed, offering a high-performance robust-adaptive control approach for path following of autonomous ships under deterministic and stochastic disturbances induced by winds, waves, and ocean currents. Computer simulations have been conducted under both deterministic and stochastic disturbances, and all results indicate acceptable path-tracking performance for the proposed control algorithm in comparison with the conventional TDC approach.
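
The sketch below illustrates a generic actor-critic ADP loop of the kind referenced above, applied to a scalar tracking task with linear-in-features actor and critic. The toy plant, reference signal, features, exploration noise, and learning rates are hypothetical; the paper's AC-TDC controller and ship dynamics are not reproduced here.

```python
import numpy as np

# Sketch: a generic actor-critic ADP loop for a scalar tracking task with
# linear-in-features actor and critic and Gaussian exploration.
# (Illustrative only -- not the paper's AC-TDC ship controller.)

rng = np.random.default_rng(1)
gamma, alpha_c, alpha_a, sigma = 0.95, 0.05, 0.01, 0.05

def feat(e):
    # Features of the tracking error shared by the critic and the actor.
    return np.array([e, e**2, 1.0])

def reference(t):
    return np.sin(0.05 * t)          # hypothetical reference path

w_c = np.zeros(3)   # critic weights: estimated cost-to-go of the error
w_a = np.zeros(3)   # actor weights: mean control as a feature combination

x = 0.0
for t in range(2000):
    e = x - reference(t)
    noise = rng.normal(scale=sigma)
    u = np.clip(float(w_a @ feat(e)) + noise, -2.0, 2.0)   # exploratory action
    x_next = 0.95 * x + 0.1 * u + rng.normal(scale=0.01)   # toy disturbed plant
    e_next = x_next - reference(t + 1)
    cost = e**2 + 0.01 * u**2        # penalize error and control effort

    # Critic: temporal-difference update of the value estimate.
    delta = cost + gamma * (w_c @ feat(e_next)) - (w_c @ feat(e))
    w_c += alpha_c * delta * feat(e)

    # Actor: policy-gradient-style step using the TD error and the
    # exploration noise (moves against directions that raised the cost).
    w_a -= alpha_a * delta * noise * feat(e)
    x = x_next

print("actor weights:", w_a)
```

The critic learns a value of the tracking error from temporal-difference errors, while the actor is nudged in the direction that reduces that value; in a hybrid scheme like the one described above, such an adaptive component would be layered on top of the baseline TDC law.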

