Off-policy integral reinforcement learning algorithm in dealing with nonzero sum game for nonlinear distributed parameter systems

2020 ◽  
Vol 42 (15) ◽  
pp. 2919-2928
Author(s):  
He Ren ◽  
Jing Dai ◽  
Huaguang Zhang ◽  
Kun Zhang

Benefiting from the technique of integral reinforcement learning (IRL), this paper effectively solves the nonzero-sum (NZS) game for distributed parameter systems when the system dynamics are unavailable. The Karhunen-Loève decomposition (KLD) is employed to convert the partial differential equation (PDE) system into a high-order ordinary differential equation (ODE) system. Moreover, the off-policy IRL technique is introduced to design the optimal strategies for the NZS game. To confirm that the presented algorithm converges to the optimal value functions, the traditional adaptive dynamic programming (ADP) method is discussed first, and the equivalence between the traditional ADP method and the presented off-policy method is then proved. To implement the presented off-policy IRL method, critic and actor neural networks are utilized to approximate the value functions and the control strategies during the iteration process, respectively. Finally, a numerical simulation is shown to illustrate the effectiveness of the proposed off-policy algorithm.
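For readers unfamiliar with the Karhunen-Loève decomposition step, the following Python sketch illustrates the standard method of snapshots; all names, dimensions, and data are illustrative assumptions, not the paper's implementation.

import numpy as np

def kl_modes(snapshots, n_modes):
    # snapshots: (n_space, n_time) array of sampled PDE solutions
    mean = snapshots.mean(axis=1, keepdims=True)
    fluct = snapshots - mean                       # work with fluctuations about the mean
    U, s, _ = np.linalg.svd(fluct, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)        # fraction of energy captured per mode count
    return U[:, :n_modes], energy[n_modes - 1]

rng = np.random.default_rng(0)
snaps = rng.standard_normal((200, 500))            # placeholder for real PDE snapshots
modes, captured = kl_modes(snaps, n_modes=5)
print(modes.shape, f"energy captured: {captured:.2%}")

Projecting the PDE state onto such empirical modes yields a finite-dimensional ODE system on which an off-policy IRL design of the kind described above can then operate.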

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Xiaoyi Long ◽  
Zheng He ◽  
Zhongyuan Wang

This paper suggests an online solution for the optimal tracking control of robotic systems based on a single-critic neural network (NN)-based reinforcement learning (RL) method. To this end, we rewrite the robotic system model in state-space form, which facilitates the synthesis of the optimal tracking control. To maintain the tracking response, a steady-state control is designed, and an adaptive optimal tracking control is then used to ensure that the tracking error converges in an optimal sense. To solve the resulting optimal control problem within the framework of adaptive dynamic programming (ADP), the command trajectory to be tracked and the modified tracking Hamilton-Jacobi-Bellman (HJB) equation are formulated. An online RL algorithm is then developed to solve the HJB equation using a critic NN with an online learning algorithm. Simulation results are given to verify the effectiveness of the proposed method.
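As a rough illustration of the single-critic ADP idea described above (not the authors' controller), the following Python sketch trains a quadratic critic on a toy tracking-error system; the basis functions, gains, and dynamics are assumptions made for the example.

import numpy as np

phi  = lambda e: np.array([e[0]**2, e[0]*e[1], e[1]**2])         # critic basis (assumed)
dphi = lambda e: np.array([[2*e[0], 0.0],
                           [e[1],   e[0]],
                           [0.0,    2*e[1]]])                    # Jacobian of the basis
f = lambda e: np.array([e[1], -e[0] - 0.5*e[1]])                 # toy tracking-error drift
g = np.array([[0.0], [1.0]])                                     # toy input map
Q, R = np.eye(2), np.array([[1.0]])                              # quadratic cost weights

W, alpha, dt = np.zeros(3), 0.5, 0.01
e = np.array([1.0, -0.5])
rng = np.random.default_rng(1)
for _ in range(20000):
    grad_V = dphi(e).T @ W                                       # approximate dV/de
    u = -0.5 * np.linalg.solve(R, g.T @ grad_V)                  # greedy policy from the critic
    hjb = grad_V @ (f(e) + g @ u) + e @ Q @ e + u @ R @ u        # HJB residual
    sigma = dphi(e) @ (f(e) + g @ u)
    W -= alpha * dt * hjb * sigma / (1.0 + sigma @ sigma)        # normalized gradient step
    e = e + dt * (f(e) + g @ u) + 0.01 * rng.standard_normal(2)  # simulate with mild excitation
print("critic weights:", W)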


2020 ◽  
Vol 70 (3) ◽  
pp. 34-44
Author(s):  
Kamen Perev

The paper considers the problem of modeling distributed parameter systems. The basic model types are presented, depending on the partial differential equation that determines the dynamics of the physical processes. The similarities and differences with models described by ordinary differential equations are discussed. Special attention is paid to the problem of heat flow in a rod. The problem setup is presented and methods for its solution are discussed. The main characteristics from a system point of view, namely the Green function and the transfer function, are presented. Different special cases of these characteristics are discussed, depending on the specific partial differential equation as well as the initial and boundary conditions.
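As a point of reference for the heat-flow example, a standard one-dimensional formulation with homogeneous Dirichlet boundary conditions and its Green-function solution (textbook notation, not necessarily the paper's) reads:

\begin{aligned}
&\frac{\partial T}{\partial t}(x,t) = a\,\frac{\partial^2 T}{\partial x^2}(x,t) + u(x,t),
\qquad 0 < x < L,\; t > 0,\\
&T(0,t) = T(L,t) = 0, \qquad T(x,0) = T_0(x),\\
&T(x,t) = \int_0^L G(x,\xi,t)\,T_0(\xi)\,d\xi
        + \int_0^t\!\!\int_0^L G(x,\xi,t-\tau)\,u(\xi,\tau)\,d\xi\,d\tau,\\
&G(x,\xi,t) = \frac{2}{L}\sum_{n=1}^{\infty}
   e^{-a\left(\frac{n\pi}{L}\right)^2 t}\,
   \sin\frac{n\pi x}{L}\,\sin\frac{n\pi \xi}{L}.
\end{aligned}

Taking Laplace transforms of the same modal expansion yields the corresponding transfer-function description from a distributed input to a measured output.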


Author(s):  
Zhen Yu ◽  
Yimin Feng ◽  
Lijun Liu

In reinforcement learning tasks, formulating the reward function is a very important step, yet in a large number of systems the reward function is not easy to formulate. The training effect of the network is sensitive to the reward function, and different reward functions yield different results. For a class of systems that meet specific conditions, the traditional reinforcement learning method is improved: a state quantity function is designed to replace the reward function, which is more efficient than a traditional reward function. At the same time, a predictive network link is designed so that the network can learn the value of general states from special states. The overall structure of the network is improved based on the Deep Deterministic Policy Gradient (DDPG) algorithm. Finally, the algorithm is applied successfully in the FrozenLake environment and achieves good performance. The experiments prove the effectiveness of the algorithm and realize reward-free reinforcement learning for a class of systems.
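A minimal sketch of the central idea, namely substituting a hand-designed state quantity for the environment reward in a DDPG-style critic target, is shown below; the grid size, goal cell, and distance measure are assumptions for a FrozenLake-like task, not the authors' exact state quantity function.

import numpy as np

GOAL = np.array([3, 3])                        # assumed goal cell on a 4x4 grid

def state_quantity(state):
    # Reward-free learning signal: negative Manhattan distance to the goal.
    return -np.abs(np.asarray(state) - GOAL).sum()

def critic_target(next_state, q_next, gamma=0.99):
    # DDPG-style bootstrapped target with the state quantity in place of the reward r.
    return state_quantity(next_state) + gamma * q_next

# Example transition drawn from a replay buffer.
print(critic_target(next_state=(2, 3), q_next=-1.5))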


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 159037-159047 ◽  
Author(s):  
Jianxiang Zhang ◽  
Baotong Cui ◽  
Zhengxian Jiang ◽  
Juan Chen

Author(s):  
Chunyang HU ◽  
Heng WANG ◽  
Haobin SHI

Traditional robotic arm control methods are often based on artificially preset fixed trajectories for completing specific tasks; they rely on accurate environmental models, and the control process lacks self-adaptability. To address these problems, we propose an end-to-end intelligent control method for robotic arms that combines machine vision and reinforcement learning. The visual perception module uses the YOLO algorithm, and the strategy control module uses the DDPG reinforcement learning algorithm, which enables the robotic arm to learn autonomous control strategies in a complex environment. In addition, we used imitation learning and the hindsight experience replay algorithm during training, which accelerated the learning process of the robotic arm. The experimental results show that the algorithm converges in a shorter time and performs well both in autonomously perceiving the target position and in overall strategy control in the simulation environment.
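One of the training accelerations mentioned above, hindsight experience replay, can be sketched as follows in Python; the transition fields, the distance threshold, and the sparse reward convention are assumptions made for illustration rather than the authors' implementation.

import numpy as np

def her_relabel(episode, k=4, threshold=0.05):
    # episode: list of transitions, each a dict with keys
    # 'state', 'action', 'next_state', 'achieved_goal', 'goal'
    relabeled = []
    for t, tr in enumerate(episode):
        future = episode[t:]                                    # goals actually reached later on
        for _ in range(min(k, len(future))):
            new_goal = future[np.random.randint(len(future))]["achieved_goal"]
            hit = np.linalg.norm(tr["achieved_goal"] - new_goal) < threshold
            relabeled.append({**tr, "goal": new_goal, "reward": 0.0 if hit else -1.0})
    return relabeled

# Tiny example: two transitions with 3-D end-effector positions as achieved goals.
ep = [{"state": None, "action": None, "next_state": None,
       "achieved_goal": np.array([0.1, 0.2, 0.3]), "goal": np.array([0.5, 0.5, 0.5])},
      {"state": None, "action": None, "next_state": None,
       "achieved_goal": np.array([0.4, 0.5, 0.5]), "goal": np.array([0.5, 0.5, 0.5])}]
print(len(her_relabel(ep)))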


2008 ◽  
Vol 363 (1511) ◽  
pp. 3845-3857 ◽  
Author(s):  
Hyojung Seo ◽  
Daeyeol Lee

Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in the dorsolateral prefrontal cortex often encoded the previous choices of the animal and its opponent as well as the animal's reward history. By contrast, the neurons in the anterior cingulate cortex predominantly encoded the animal's reward history. Using simple competitive games, therefore, we have demonstrated functional specialization between different areas of the primate frontal cortex involved in outcome monitoring and action selection. Temporally extended signals related to the animal's previous choices might facilitate the association between choices and their delayed outcomes, whereas information about the choices of the opponent might be used to estimate the reward expected from a particular action. Finally, signals related to the reward history might be used to monitor the overall success of the animal's current decision-making strategy.
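A sketch of the kind of simple reinforcement-learning model referred to above, for a matching-pennies-style zero-sum game, might look as follows in Python; the learning rate and softmax temperature are illustrative, not values fitted in the study.

import numpy as np

def simulate_session(payoff_fns, alpha=0.2, beta=3.0, n_actions=2, seed=0):
    # payoff_fns: one callable per trial mapping the chosen action to its payoff
    rng = np.random.default_rng(seed)
    q = np.zeros(n_actions)                       # action values built from reward history
    choices = []
    for payoff in payoff_fns:
        p = np.exp(beta * q); p /= p.sum()        # softmax choice probabilities
        a = rng.choice(n_actions, p=p)
        r = payoff(a)
        q[a] += alpha * (r - q[a])                # incremental value update from the outcome
        choices.append(a)
    return choices

# Example: an opponent that pays off action 1 on every trial.
print(simulate_session([lambda a: 1.0 if a == 1 else 0.0] * 50)[-5:])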


2000 ◽  
Vol 13 ◽  
pp. 227-303 ◽  
Author(s):  
T. G. Dietterich

This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics---as a subroutine hierarchy---and a declarative semantics---as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consistent with the given hierarchy. The decomposition also creates opportunities to exploit state abstractions, so that individual MDPs within the hierarchy can ignore large parts of the state space. This is important for the practical application of the method. This paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.
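For reference, the core MAXQ value-function decomposition for a subtask i with child action a under a hierarchical policy pi can be summarized as follows (standard notation from the MAXQ literature, reproduced from memory and best checked against the original paper):

\begin{aligned}
Q^{\pi}(i, s, a) &= V^{\pi}(a, s) + C^{\pi}(i, s, a),\\
C^{\pi}(i, s, a) &= \sum_{s', N} P_i^{\pi}(s', N \mid s, a)\,\gamma^{N}\, Q^{\pi}\bigl(i, s', \pi_i(s')\bigr),\\
V^{\pi}(i, s) &=
\begin{cases}
Q^{\pi}\bigl(i, s, \pi_i(s)\bigr) & \text{if } i \text{ is composite},\\[2pt]
\sum_{s'} P(s' \mid s, i)\, R(s' \mid s, i) & \text{if } i \text{ is primitive},
\end{cases}
\end{aligned}

so the value of the root task decomposes additively into the values of its subtasks plus completion terms, which is what allows the subtask value functions to be learned and abstracted separately.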


Acta Numerica ◽  
1994 ◽  
Vol 3 ◽  
pp. 269-378 ◽  
Author(s):  
R. Glowinski ◽  
J.L. Lions

We consider a system whose state is given by the solution y to a Partial Differential Equation (PDE) of evolution, and which contains control functions, denoted by v.
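Schematically, and in a generic form rather than the paper's exact notation, such a system can be written as an abstract evolution equation with a control term,

\frac{\partial y}{\partial t} + \mathcal{A}\,y = \mathcal{B}\,v \quad \text{in } \Omega \times (0,T), \qquad y(0) = y_0,

and the associated controllability question asks whether, for a prescribed target state \(y_T\) and horizon \(T\), a control \(v\) can be found so that \(y(T) = y_T\) exactly, or \(\|y(T) - y_T\| \le \varepsilon\) approximately.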

