Using Control Theory for Analysis of Reinforcement Learning and Optimal Policy Properties in Grid-World Problems

Author(s):  
S. Mostapha Kalami Heris ◽  
Mohammad-Bagher Naghibi Sistani ◽  
Naser Pariz
Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) that helps UAVs choose the correct action in each state according to the policy. In an unknown environment, formulating rules by hand for UAV action selection is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible alternative. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV following the policy learned by the proposed algorithm chooses the optimal action with greater probability than one following the policy learned by the classic Q-learning algorithm. The proposed algorithm is implemented and tested on evenly distributed datasets built from real UAV parameters and real farm information, and its performance evaluation is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
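
The abstract does not spell out the matching mechanism, but as a loose illustration of the idea, the sketch below shows tabular Q-learning in which an unvisited state borrows its initial action-values from the most similar previously visited state. The similarity metric, state encoding, and hyperparameters are assumptions made here for illustration, not details taken from the paper.

```python
import numpy as np

def similarity(s1, s2):
    """Illustrative metric: negative Euclidean distance between states."""
    return -np.linalg.norm(np.asarray(s1, dtype=float) - np.asarray(s2, dtype=float))

class SimilarStateQLearner:
    """Tabular Q-learning with a similar-state fallback for unseen states."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = {}                       # state tuple -> array of action values
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def q_values(self, s):
        if s not in self.Q:
            if self.Q:                    # copy values from the nearest known state
                nearest = max(self.Q, key=lambda v: similarity(s, v))
                self.Q[s] = self.Q[nearest].copy()
            else:
                self.Q[s] = np.zeros(self.n_actions)
        return self.Q[s]

    def act(self, s):
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_values(s)))

    def update(self, s, a, r, s_next):
        q_s = self.q_values(s)
        target = r + self.gamma * np.max(self.q_values(s_next))
        q_s[a] += self.alpha * (target - q_s[a])
```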


2021 ◽  
Author(s):  
Yunfan Su

Vehicular ad hoc networks (VANETs) are a promising technique that improves traffic safety and transportation efficiency and provides a comfortable driving experience. However, due to the rapid growth of applications that demand channel resources, efficient channel allocation schemes are required to fully utilize the performance of vehicular networks. In this thesis, two reinforcement learning (RL)-based channel allocation methods are proposed for a cognitive-enabled VANET environment to maximize a long-term average system reward. First, we present a model-based dynamic programming method, which requires calculating the transition probabilities and the time intervals between decision epochs. After obtaining the transition probabilities and time intervals, a relative value iteration (RVI) algorithm is used to find the asymptotically optimal policy. Then, we propose a model-free reinforcement learning method, in which an agent interacts with the environment iteratively and learns from the feedback to approximate the optimal policy. Simulation results show that our reinforcement learning method achieves performance similar to that of the dynamic programming method, while both outperform the greedy method.
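
As a rough sketch of the model-based component, the snippet below implements relative value iteration for an average-reward MDP once the transition matrices and expected rewards are known; the actual VANET state space, reward structure, and decision-epoch timing come from the thesis and are not reproduced here.

```python
import numpy as np

def relative_value_iteration(P, R, tol=1e-8, max_iter=10_000):
    """P[a] is an (S, S) transition matrix, R an (S, A) expected-reward table.

    Returns a greedy policy, the estimated average reward (gain), and the
    relative value function.
    """
    n_states, n_actions = R.shape
    h = np.zeros(n_states)               # relative values
    ref = 0                              # reference state for normalisation
    for _ in range(max_iter):
        # One-step lookahead for every (state, action) pair.
        q = R + np.stack([P[a] @ h for a in range(n_actions)], axis=1)
        h_new = q.max(axis=1)
        gain = h_new[ref]                # average-reward estimate
        h_new -= gain                    # keep the relative values bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return q.argmax(axis=1), gain, h
```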


2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Bo Dong ◽  
Yuanchun Li

A novel decentralized reinforcement learning robust optimal tracking control theory for time-varying constrained reconfigurable modular robots, based on an action-critic-identifier (ACI) structure and a state-action value function (Q-function), is presented to solve the continuous-time nonlinear optimal control problem for strongly coupled robotic systems with uncertainty. The dynamics of the time-varying constrained reconfigurable modular robot are described as a synthesis of interconnected subsystems, and a continuous-time state equation and Q-function are designed in this paper. Combining the ACI with an RBF network, the global uncertainty of each subsystem and the HJB (Hamilton-Jacobi-Bellman) equation are estimated: the critic-NN and action-NN are used to approximate the optimal Q-function and the optimal control policy, respectively, and the identifier is adopted to identify the global uncertainty, with an RBF-NN used to update the weights of the ACI-NN. On this basis, a novel decentralized robust optimal tracking controller for each subsystem is proposed, so that the subsystem can track the desired trajectory and the tracking error converges to zero in finite time. The stability of the ACI and of the robust optimal tracking controller is confirmed by Lyapunov theory. Finally, comparative simulation examples are presented to illustrate the effectiveness of the proposed ACI and decentralized control theory.
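
The paper's ACI structure, identifier dynamics, and Lyapunov analysis are far more involved than can be shown here, but the following toy sketch conveys the basic critic-NN / action-NN idea: linear-in-weights RBF networks approximate the cost-to-go and the control policy, and both are adapted from a temporal-difference-style residual. The scalar system, RBF centres, and learning rates are invented for illustration.

```python
import numpy as np

centres = np.linspace(-2.0, 2.0, 9)          # RBF centres over the state space

def rbf(x, width=0.5):
    """Gaussian RBF feature vector for a scalar state."""
    return np.exp(-(x - centres) ** 2 / (2 * width ** 2))

w_critic = np.zeros(len(centres))             # critic-NN weights (cost-to-go)
w_actor = np.zeros(len(centres))              # action-NN weights (policy)
gamma, lr_c, lr_a = 0.98, 0.05, 0.01

x = 1.0
for _ in range(5000):
    phi = rbf(x)
    noise = 0.1 * np.random.randn()           # exploration around the policy
    u = float(w_actor @ phi) + noise
    x_next = 0.9 * x + 0.1 * u                # toy scalar dynamics
    cost = x ** 2 + 0.1 * u ** 2              # quadratic regulation cost
    # Critic: one-step TD residual on the approximated cost-to-go.
    delta = cost + gamma * float(w_critic @ rbf(x_next)) - float(w_critic @ phi)
    w_critic += lr_c * delta * phi
    # Actor: if the perturbed action did better than expected (negative
    # residual for a cost), move the policy toward the perturbation.
    w_actor += lr_a * (-delta) * noise * phi
    x = x_next
```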


2020 ◽  
Author(s):  
Sean Nolan ◽  
Matthew Lanier ◽  
Andrew Haines ◽  
Linus Mockus ◽  
Kristopher Ezra ◽  
...  

2021 ◽  
pp. 1-12
Author(s):  
Zhiyou Yang ◽  
Hong Qu ◽  
Mingsheng Fu ◽  
Wang Hu ◽  
Yongze Zhao

Energies ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 2830
Author(s):  
David Domínguez-Barbero ◽  
Javier García-González ◽  
Miguel A. Sanz-Bobi ◽  
Eugenio F. Sánchez-Úbeda

The deployment of microgrids could be fostered by control systems that do not require very complex modelling, calibration, prediction and/or optimisation processes. This paper explores the application of Reinforcement Learning (RL) techniques to the operation of a microgrid. The implemented Deep Q-Network (DQN) can learn an optimal policy for operating the elements of an isolated microgrid, based on the agent-environment interaction that arises when particular operation actions are taken on the microgrid components. In order to facilitate the scaling-up of this solution, the algorithm relies exclusively on historical data from past events, and therefore it does not require forecasts of the demand or the renewable generation. The objective is to minimise the cost of operating the microgrid, including the penalty for non-served power. This paper analyses the effect of considering different definitions of the system state by expanding the set of variables that define it. The results are very satisfactory, as can be concluded from their comparison with the perfect-information optimal operation computed with a traditional optimisation model and with a Naive model.
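
A minimal sketch of the kind of DQN loop described here might look as follows. The state dimension, action set, and the placeholder environment are illustrative assumptions; in the paper, the agent instead interacts with a microgrid model and receives rewards derived from operating cost, including the non-served-power penalty.

```python
import copy
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 4, 3, 0.99

def env_step(state, action):
    """Placeholder transition; the paper uses a microgrid model instead."""
    next_state = np.random.rand(STATE_DIM).astype(np.float32)
    reward = -np.random.rand()                # negative operating cost
    return next_state, reward

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, N_ACTIONS))
target_net = copy.deepcopy(q_net)
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)

state = np.random.rand(STATE_DIM).astype(np.float32)
eps = 0.1
for step in range(20_000):
    # Epsilon-greedy action selection from the online network.
    if random.random() < eps:
        action = random.randrange(N_ACTIONS)
    else:
        with torch.no_grad():
            action = int(q_net(torch.from_numpy(state)).argmax())
    next_state, reward = env_step(state, action)
    buffer.append((state, action, reward, next_state))
    state = next_state

    if len(buffer) >= 64:
        s, a, r, s2 = map(np.array, zip(*random.sample(list(buffer), 64)))
        s, s2 = torch.from_numpy(s), torch.from_numpy(s2)
        a, r = torch.tensor(a), torch.tensor(r, dtype=torch.float32)
        # Standard DQN target using the frozen target network.
        with torch.no_grad():
            target = r + GAMMA * target_net(s2).max(dim=1).values
        pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad(); loss.backward(); opt.step()

    if step % 500 == 0:                        # periodically sync the target net
        target_net.load_state_dict(q_net.state_dict())
```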


Author(s):  
Swetasudha Panda ◽  
Yevgeniy Vorobeychik

We propose a novel Stackelberg game model of MDP interdiction in which the defender modifies the initial state of the planner, who then responds by computing an optimal policy starting with that state. We first develop a novel approach for MDP interdiction in factored state space that allows the defender to modify the initial state. The resulting approach can be computationally expensive for large factored MDPs. To address this, we develop several interdiction algorithms that leverage variations of reinforcement learning using both linear and non-linear function approximation. Finally, we extend the interdiction framework to consider a Bayesian interdiction problem in which the interdictor is uncertain about some of the planner's initial state features. Extensive experiments demonstrate the effectiveness of our approaches.
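
On a small tabular MDP, the basic interdiction loop can be sketched as below: the defender enumerates bounded modifications of the planner's binary initial-state features, the planner responds with an optimal policy computed by value iteration, and the defender keeps the modification that minimises the planner's value. The random MDP and binary encoding are invented for illustration; the paper's factored-MDP formulation, RL-based approximations, and Bayesian extension are omitted.

```python
import itertools
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """P[a] is an (S, S) transition matrix, R an (S, A) reward table."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def interdict(P, R, s0_bits, budget=1):
    """Flip up to `budget` initial-state features to minimise the planner's value."""
    V = value_iteration(P, R)
    best_val, best_bits = np.inf, None
    for k in range(budget + 1):
        for flips in itertools.combinations(range(len(s0_bits)), k):
            bits = list(s0_bits)
            for i in flips:
                bits[i] ^= 1                    # defender's modification
            s = int(''.join(map(str, bits)), 2) # factored state -> index
            if V[s] < best_val:
                best_val, best_bits = V[s], bits
    return best_bits, best_val

# Toy usage: 4 states (2 binary features), 2 actions, random MDP.
rng = np.random.default_rng(0)
P = [rng.dirichlet(np.ones(4), size=4) for _ in range(2)]
R = rng.random((4, 2))
print(interdict(P, R, s0_bits=[1, 1], budget=1))
```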

