Design and Comparison of Reinforcement-Learning-Based Time-Varying PID Controllers with Gain-Scheduled Actions

This paper presents reinforcement learning methods for automatically tuning the parameters of a proportional-integral-derivative (PID) controller. A primary drawback of conventional reinforcement learning implementations is the high dimension of the Q-table. To overcome this obstacle, this paper uses the idea underlying the n-armed bandit problem. Moreover, gain-scheduled actions are introduced to tune the algorithms and improve the overall system behavior, so that the proposed controllers fulfill multiple performance requirements. An experiment conducted on a piezo-actuated stage illustrates the effectiveness of the proposed control designs relative to competing algorithms.
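
The n-armed bandit idea can be illustrated with a minimal sketch. It assumes that each arm is one hypothetical (kp, ki, kd) triple, that the reward is the negative integral of absolute error (IAE) of a unit-step response, and that the plant is a made-up first-order lag `tau*dy/dt + y = u`. Because a bandit keeps one action value per arm rather than one per state-action pair, it needs no Q-table indexed by state. The sketch does not reproduce the gain-scheduled actions described in the abstract, and none of the names or values below come from the paper.

```python
import random

TAU, DT, STEPS = 0.5, 0.01, 400  # hypothetical plant time constant [s], time step [s], horizon

# Each arm is one candidate (kp, ki, kd) triple; hypothetical values.
ARMS = [(0.5, 0.0, 0.0), (2.0, 4.0, 0.01), (5.0, 10.0, 0.0), (10.0, 20.0, 0.0)]


def step_response_iae(kp, ki, kd):
    """IAE of a unit step on tau*dy/dt + y = u under PID control (forward Euler)."""
    y, integral, prev_err, iae = 0.0, 0.0, 1.0, 0.0
    for _ in range(STEPS):
        err = 1.0 - y
        integral += err * DT
        u = kp * err + ki * integral + kd * (err - prev_err) / DT
        y += DT * (u - y) / TAU
        prev_err = err
        iae += abs(err) * DT
    return iae


def tune(episodes=300, epsilon=0.1, noise=0.01, seed=0):
    """Epsilon-greedy n-armed bandit: a single action value per arm, no state table."""
    rng = random.Random(seed)
    q = [0.0] * len(ARMS)  # rewards are negative, so untried arms (q = 0) win greedy choices
    n = [0] * len(ARMS)
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.randrange(len(ARMS))                  # explore
        else:
            a = max(range(len(ARMS)), key=q.__getitem__)  # exploit
        # Simulated measurement noise makes each trial a stochastic bandit reward.
        reward = -step_response_iae(*ARMS[a]) + rng.gauss(0.0, noise)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]                    # incremental sample average
    best = max(range(len(ARMS)), key=q.__getitem__)
    return best, q, n
```

After training, the greedy arm `ARMS[best]` should be the candidate with the lowest simulated IAE, and `n` shows how often each candidate was tried.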

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6125 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6518-6525

Author(s):

Xiao Xu ◽

Fang Dong ◽

Yanghua Li ◽

Shaojian He ◽

Xin Li

Keyword(s):

Learning Algorithm ◽

General Setting ◽

Personalized Recommendation ◽

Time Varying ◽

Bandit Problem ◽

User Interests ◽

Specific Preference ◽

Coefficient Vector ◽

Real World Datasets ◽

Efficient Learning

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in recommender systems because users' interests vary over time. Two models, with disjoint and hybrid payoffs, are considered to capture the phenomenon that users' preferences towards different items change differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that adapts to abrupt reward changes is proposed, and a theoretical regret analysis shows that the regret scales sublinearly in the time length T. The algorithm is further extended to a more general setting with hybrid payoffs, where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments on real-world datasets verify the advantages of the proposed learning algorithms over baseline algorithms in both settings.
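
The abstract does not spell out the proposed algorithm, so the sketch below is a point of comparison only. It uses a different, well-known technique for abrupt reward changes: a sliding-window variant of disjoint LinUCB. Each arm fits its own ridge-regression preference vector on observations from the last `window` rounds only, so data from before a change is eventually forgotten. The class name, window length, `alpha`, and the simulated two-arm environment, where only arm 0's preference vector changes, are all assumptions.

```python
from collections import deque

import numpy as np


class SlidingWindowLinUCB:
    """Disjoint LinUCB that forgets observations older than `window` rounds.

    Each arm keeps its own ridge-regression estimate of a preference vector,
    fitted only on that arm's observations inside the sliding window.
    """

    def __init__(self, n_arms, dim, window=100, alpha=0.5, reg=1.0):
        self.n_arms, self.dim = n_arms, dim
        self.alpha, self.reg = alpha, reg
        self.recent = deque(maxlen=window)  # (arm, context, reward), one entry per round

    def _fit(self, arm):
        A = self.reg * np.eye(self.dim)
        b = np.zeros(self.dim)
        for a, x, r in self.recent:
            if a == arm:
                A += np.outer(x, x)
                b += r * x
        A_inv = np.linalg.inv(A)
        return A_inv @ b, A_inv

    def select(self, x):
        scores = []
        for arm in range(self.n_arms):
            theta, A_inv = self._fit(arm)
            # Upper confidence bound: estimated payoff plus exploration bonus.
            scores.append(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.recent.append((arm, np.asarray(x, dtype=float), float(reward)))


def simulate_abrupt_change(rounds=1000, change_at=500, seed=0):
    """Two arms; only arm 0's preference vector changes, abruptly, at `change_at`."""
    rng = np.random.default_rng(seed)
    bandit = SlidingWindowLinUCB(n_arms=2, dim=2)
    choices = []
    for t in range(rounds):
        x = np.array([1.0, rng.uniform(-1.0, 1.0)])
        theta0 = np.array([0.9, 0.0]) if t < change_at else np.array([0.0, 0.0])
        theta1 = np.array([0.4, 0.0])
        arm = bandit.select(x)
        reward = x @ (theta0 if arm == 0 else theta1) + rng.normal(0.0, 0.05)
        bandit.update(arm, x, reward)
        choices.append(arm)
    return np.array(choices)
```

In this simulation, arm 0 should dominate before the change and arm 1 should dominate once the window holds only post-change data. The window trades tracking speed against estimation noise.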

The Application of a New PID Autotuning Method for the Steam/Water Loop in Large Scale Ships

Processes ◽

10.3390/pr8020196 ◽

2020 ◽

Vol 8 (2) ◽

pp. 196 ◽

Cited By ~ 1

Author(s):

Shiquan Zhao ◽

Sheng Liu ◽

Robain De Keyser ◽

Clara-Mihaela Ionescu

Keyword(s):

Predictive Control ◽

Large Scale ◽

PID Controllers ◽

Empirical Knowledge ◽

Control Performance ◽

Proportional Integral Derivative ◽

Controlled System ◽

Performance Requirements ◽

Gain Margin ◽

Forbidden Region

In large-scale ships, proportional-integral-derivative (PID) controllers are still the most widely used controllers for the steam/water loop. However, the tuning rules for the PID parameters are based on empirical knowledge, and the performance of these loops is unsatisfactory. To improve the control performance of the steam/water loop, this paper studies the application of a recently developed PID autotuning method. First, a ‘forbidden region’ on the Nyquist plane is obtained from user-defined performance requirements such as robustness, or gain margin and phase margin. Second, the dynamics of the system are obtained with a sine test around the operating point. Finally, the PID controller’s parameters are obtained by locating the frequency response of the controlled system at the edge of the ‘forbidden region’. To verify the effectiveness of the new PID autotuning method, comparisons are presented with other PID autotuning methods, as well as with model predictive control. The results show the superiority of the new PID autotuning method.
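
The first and last steps can be sketched under several assumptions. The forbidden region is taken to be the disk of radius 1/Ms centred at -1 on the Nyquist plane, which is a common robustness requirement. A hypothetical first-order-plus-dead-time plant model stands in for the sine-test data. The PID shape is fixed, so only the overall gain `kc` is tuned. Bisection then finds a `kc` for which the Nyquist curve of the loop just touches the edge of the disk. The plant parameters, `TI`, `TD`, `N`, and `ms` are illustrative and are not taken from the paper.

```python
import numpy as np

# Hypothetical first-order-plus-dead-time plant G(s) = K exp(-THETA s) / (TAU s + 1).
K, TAU, THETA = 1.0, 10.0, 2.0
# Hypothetical PID shape C(s) = kc (1 + 1/(TI s) + TD s / (1 + TD s / N)); only kc is tuned.
TI, TD, N = 10.0, 1.0, 10.0
W = np.logspace(-3, 2, 5000)  # frequency grid [rad/s]


def open_loop(w):
    """L0(jw) = C(jw) G(jw) with kc = 1."""
    s = 1j * w
    plant = K * np.exp(-THETA * s) / (TAU * s + 1.0)
    pid = 1.0 + 1.0 / (TI * s) + TD * s / (1.0 + TD * s / N)
    return pid * plant


L0 = open_loop(W)


def distance_to_minus_one(kc):
    """Closest approach of the Nyquist curve kc * L0(jw) to the critical point -1."""
    return float(np.min(np.abs(1.0 + kc * L0)))


def tune_gain(ms=1.4, iterations=60):
    """Gain kc that puts the Nyquist curve on the edge of the disk |1 + L| < 1/ms."""
    radius = 1.0 / ms
    lo, hi = 0.0, 0.01
    while distance_to_minus_one(hi) > radius:  # grow until the curve enters the disk
        hi *= 1.2
    for _ in range(iterations):  # invariant: lo keeps the curve outside, hi touches or enters
        mid = 0.5 * (lo + hi)
        if distance_to_minus_one(mid) > radius:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, `kc = tune_gain(1.4)` returns a gain whose closest approach to -1 is 1/1.4, which corresponds to a maximum sensitivity of 1.4 on this frequency grid. A gain-margin/phase-margin forbidden region would change only the `distance_to_minus_one` test.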

Gain-scheduled Smith proportional–integral derivative controllers for linear parameter varying first-order plus time-varying delay systems

IET Control Theory and Applications ◽

10.1049/iet-cta.2010.0088 ◽

2011 ◽

Vol 5 (18) ◽

pp. 2142-2155 ◽

Cited By ~ 18

Author(s):

Y. Bolea ◽

J. Blesa ◽

V. Puig

Keyword(s):

Delay Systems ◽

Time Varying ◽

Linear Parameter ◽

Proportional Integral Derivative ◽

Linear Parameter Varying ◽

Proportional Integral ◽

First Order ◽

Time Varying Delay ◽

Gain Scheduled ◽

Varying Delay

Gain-scheduled Smith PID controllers for LPV first order plus time varying delay systems

2007 European Control Conference (ECC) ◽

10.23919/ecc.2007.7068416 ◽

2007 ◽

Author(s):

Yolanda Bolea ◽

Vicenc Puig ◽

R. Sanchez-Pena

Keyword(s):

Delay Systems ◽

PID Controllers ◽

Time Varying ◽

First Order ◽

Time Varying Delay ◽

Gain Scheduled ◽

Varying Delay

Switching Gain-Scheduled Proportional–Integral–Derivative Electronic Throttle Control for Automotive Engines

Journal of Dynamic Systems Measurement and Control ◽

10.1115/1.4039152 ◽

2018 ◽

Vol 140 (7) ◽

Cited By ~ 5

Author(s):

Arman Zandi Nia ◽

Ryozo Nagamune

Keyword(s):

PID Controller ◽

Controller Design ◽

PID Controllers ◽

Proportional Integral Derivative ◽

Battery Voltage ◽

Electronic Throttle ◽

Automotive Engines ◽

Throttle Valve ◽

PID Controller Design ◽

Gain Scheduled

This paper proposes an application of the switching gain-scheduled (S-GS) proportional–integral–derivative (PID) control technique to the electronic throttle control (ETC) problem in automotive engines. For the S-GS PID controller design, a published linear parameter-varying (LPV) model of the electronic throttle valve (ETV) is adopted, whose dynamics vary with both the throttle valve velocity and the battery voltage. The designed controller consists of multiple GS PID controllers assigned to local subregions defined over throttle valve velocity and battery voltage. Hysteresis switching logic is employed to switch between the local GS PID controllers based on the operating point. The S-GS PID controller design problem is formulated as a nonconvex optimization problem and tackled by iteratively solving its convex subproblems. Experimental results demonstrate the overall superiority of the S-GS PID controller over conventional controllers in throttle valve reference tracking under various scenarios.
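
The switching logic can be sketched under the following assumptions. There are two local subregions, split by a single valve-speed threshold. Inside each subregion, the PID gains vary affinely with battery voltage. The integrator state is shared across subregions. The published design obtains its gains from an iterative convex optimization, which is not shown here. Every class name and number below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalGSPID:
    """Local PID whose gains vary affinely with battery voltage (hypothetical form)."""
    nominal: tuple  # (kp, ki, kd) at the nominal voltage
    slope: tuple    # change of (kp, ki, kd) per volt of deviation
    v_nominal: float = 12.0

    def gains(self, v_battery):
        dv = v_battery - self.v_nominal
        return tuple(g + s * dv for g, s in zip(self.nominal, self.slope))


class HysteresisSwitch:
    """Chooses subregion 0 (slow valve) or 1 (fast valve); the band prevents chattering."""

    def __init__(self, threshold, band):
        self.upper = threshold + band / 2
        self.lower = threshold - band / 2
        self.region = 0

    def update(self, speed):
        if self.region == 0 and speed > self.upper:
            self.region = 1
        elif self.region == 1 and speed < self.lower:
            self.region = 0
        return self.region


class SwitchingGSPID:
    """Runs the local GS PID of the active subregion; the integrator state is shared."""

    def __init__(self, local_controllers, switch):
        self.local = local_controllers
        self.switch = switch
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, speed, v_battery, dt):
        region = self.switch.update(abs(speed))  # schedule on valve speed magnitude
        kp, ki, kd = self.local[region].gains(v_battery)
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * deriv
```

With `HysteresisSwitch(threshold=1.0, band=0.2)`, a speed of 1.05 leaves the controller in subregion 0. The controller moves to subregion 1 only above 1.1 and returns to subregion 0 only below 0.9.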

Robust Gain-Scheduled Smith PID Controllers for Second Order LPV Systems with Time Varying Delay

IFAC Proceedings Volumes ◽

10.3182/20120328-3-it-3014.00034 ◽

2012 ◽

Vol 45 (3) ◽

pp. 199-204 ◽

Cited By ~ 4

Author(s):

Vicenç Puig ◽

Yolanda Bolea ◽

Joaquim Blesa

Keyword(s):

Second Order ◽

PID Controllers ◽

Time Varying ◽

Time Varying Delay ◽

LPV Systems ◽

Gain Scheduled ◽

Varying Delay

Gain-Scheduled Smith PID Controllers for LPV Systems with Time Varying Delay: Application to an Open-flow Canal

IFAC Proceedings Volumes ◽

10.3182/20080706-5-kr-1001.02467 ◽

2008 ◽

Vol 41 (2) ◽

pp. 14564-14569 ◽

Cited By ~ 2

Author(s):

Yolanda Bolea ◽

Vicenç Puig ◽

Joaquim Blesa

Keyword(s):

PID Controllers ◽

Time Varying ◽

Open Flow ◽

Time Varying Delay ◽

LPV Systems ◽

Gain Scheduled ◽

Varying Delay

Model dependent reinforcement learning algorithm for reservoir operation stochastic optimization

International Journal of Hydrology ◽

10.15406/ijh.2018.02.00129 ◽

2018 ◽

Vol 2 (5) ◽

Author(s):

Li Wenwu

Keyword(s):

Reinforcement Learning ◽

Stochastic Optimization ◽

Reservoir Operation ◽

Learning Algorithm ◽

Reinforcement Learning Algorithm

Reinforcement learning algorithm for one-warehouse multi-retailer inventory problem

Automation, Mechanical and Electrical Engineering ◽

10.2495/amee140161 ◽

2014 ◽

Author(s):

C.Y. Li ◽

X.T. Wang ◽

T.W. Zhang

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Inventory Problem ◽

Reinforcement Learning Algorithm

Computational Design of Modular Robots Based on Genetic Algorithm and Reinforcement Learning

Symmetry ◽

10.3390/sym13030471 ◽

2021 ◽

Vol 13 (3) ◽

pp. 471

Author(s):

Jai Hoon Park ◽

Kang Hoon Lee

Keyword(s):

Genetic Algorithm ◽

Reinforcement Learning ◽

Design Space ◽

Learning Algorithm ◽

Computational Design ◽

Computational Method ◽

Learning Ability ◽

Modular Robots ◽

Control Mechanisms ◽

Candidate Structure

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space, which involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Compared with evolving both structure and behavior simultaneously, evolving only the robotic structure and optimizing behavior with a separate training algorithm significantly reduces the size of the design space. Mutual dependence between evolution and learning is achieved by using the mean cumulative reward of a candidate structure in reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated while experimenting with an actual modular robotics kit.
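
The nested structure, with evolution outside, learning inside, and mean training reward as fitness, can be sketched on a toy problem. The assumptions, none of which come from the paper, are as follows. A robot structure is a bit vector recording which of eight hypothetical module types are attached. Each module offers one action with a made-up mean reward `MU[i]`. The inner reinforcement learner is an epsilon-greedy bandit. The outer genetic algorithm uses tournament selection, one-point crossover, bit-flip mutation, and elitism.

```python
import random

MU = [0.1, 0.9, 0.3, 0.5, 0.2, 0.7, 0.4, 0.6]  # hypothetical mean reward of each module's action
N_MODULES = len(MU)


def train_and_evaluate(structure, rng, steps=100, epsilon=0.1):
    """Inner loop: epsilon-greedy learning over the actions a structure provides.
    Returns the mean reward collected while learning (the fitness)."""
    actions = [i for i, present in enumerate(structure) if present]
    if not actions:
        return float("-inf")           # a robot with no modules cannot act
    q = {a: 1.0 for a in actions}      # optimistic start encourages trying each module
    n = {a: 0 for a in actions}
    total = 0.0
    for _ in range(steps):
        explore = rng.random() < epsilon
        a = rng.choice(actions) if explore else max(actions, key=q.__getitem__)
        r = MU[a] + rng.gauss(0.0, 0.1)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]
        total += r
    return total / steps


def evolve(pop_size=20, generations=15, elite=2, seed=0):
    """Outer loop: genetic algorithm over structures, scored by the inner learner."""
    rng = random.Random(seed)
    cache = {}

    def fitness(s):
        if s not in cache:             # cache so elites keep their score across generations
            cache[s] = train_and_evaluate(s, rng)
        return cache[s]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [tuple(rng.randint(0, 1) for _ in range(N_MODULES)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        children = ranked[:elite]      # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, N_MODULES)
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = tuple(bit ^ (rng.random() < 1.0 / N_MODULES) for bit in child)  # bit flips
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best), history
```

Because fitness values are cached per structure and the elites are copied unchanged, the best fitness per generation in `history` never decreases.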
