Reinforcement Learning Based Adaptive Optimal Strategy in Robotic Control Systems

Controller synthesis for multichannel systems is a relevant and difficult problem. One possible approach is to use neural networks. A neural controller is either trained on precomputed data or used to tune the parameters of a PID controller, starting from an initially stable state of the closed-loop system. We propose using neural networks to control a two-channel object, with training carried out from an unstable (arbitrary) initial state using reinforcement learning methods. We also propose a structure for the neural network and the closed-loop system in which the set point is specified through an input parameter of the controller's neural network.

Synthesizing automatic control systems is hard, especially for multichannel objects. One approach is to use neural networks. Approaches based on reinforcement learning face an additional issue: supporting a range of set-point values. We propose a method for synthesizing automatic control systems with neural networks, together with a reinforcement learning training process that lets the network learn to regulate over a predefined range of set points. The main steps of the method are: 1) form the neural network input from the state of the object and the system set point; 2) simulate the system with a set of randomly generated set points from the desired range; 3) perform one learning step using the Deterministic Policy Gradient method. The originality of the proposed method is that, in contrast to existing methods of using a neural network to synthesize a controller, it allows a controller to be trained from an unstable initial state in a closed-loop system and over a range of set points.
The method was applied to the problem of stabilizing the outputs of a two-channel object, which requires stabilizing both outputs, with the first held near the input set point.
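
The three steps listed in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: a scalar open-loop unstable plant stands in for the two-channel object, a two-weight linear policy stands in for the neural network, and the names `rollout`, `train` and `evaluate` are hypothetical. The update is also a simplification: it uses the exact gradient of the tracking cost through an assumed known plant model, in place of the critic-based Deterministic Policy Gradient update named in the abstract.

```python
import random

A, B = 1.05, 0.5   # assumed plant x[k+1] = A*x[k] + B*u[k]; |A| > 1 means open-loop unstable
HORIZON = 20       # assumed number of simulation steps per rollout


def rollout(w, setpoint, x0=0.0):
    """Simulate the closed loop once; return (tracking cost, gradient of cost w.r.t. w)."""
    x = x0
    sens = [0.0, 0.0]              # dx/dw, propagated forward through the plant model
    cost, grad = 0.0, [0.0, 0.0]
    for _ in range(HORIZON):
        features = (x, setpoint)   # step 1: controller input = plant state + set point
        u = w[0] * features[0] + w[1] * features[1]
        # Chain rule through u = w . features and x_next = A*x + B*u.
        sens = [A * sens[i] + B * (features[i] + w[0] * sens[i]) for i in range(2)]
        x = A * x + B * u
        err = x - setpoint
        cost += err * err
        grad = [grad[i] + 2.0 * err * sens[i] for i in range(2)]
    return cost, grad


def train(steps=3000, lr=0.02, low=0.5, high=1.5, seed=0):
    """Train from zero weights, for which the closed loop is still unstable."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        setpoint = rng.uniform(low, high)   # step 2: random set point from the desired range
        _, grad = rollout(w, setpoint)
        norm = (grad[0] ** 2 + grad[1] ** 2) ** 0.5
        scale = lr / max(1.0, norm)         # clip the step length to at most lr
        w = [w[i] - scale * grad[i] for i in range(2)]   # step 3: one gradient update
    return w


def evaluate(w, setpoints=(0.5, 1.0, 1.5)):
    """Total tracking cost over a fixed grid of set points."""
    return sum(rollout(w, r)[0] for r in setpoints)
```

Comparing `evaluate([0.0, 0.0])` with `evaluate(train())` gives a rough check that the policy learned to track set points across the sampled range rather than a single value. Gradient clipping is used because sensitivities grow quickly while the closed loop is still unstable.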

REINFORCEMENT LEARNING IN CONTROL SYSTEMS OF OBJECTS WITH A TRANSPORT DELAY

Автометрия ◽ 10.15372/aut20210306 ◽ 2021 ◽ Vol 57 (3) ◽ pp. 48-57

Author(s): V.S. Borovik ◽ S.V. Shidlovskiy

Keyword(s): Reinforcement Learning ◽ Control Systems ◽ Transport Delay

P.C. Based Robotic Control Systems

IFAC Proceedings Volumes ◽ 10.1016/s1474-6670(17)48780-5 ◽ 1993 ◽ Vol 26 (2) ◽ pp. 515-518

Author(s): E.R. Fielding ◽ E.D. Illos

Keyword(s): Control Systems ◽ Robotic Control

ALFA: a language for programming reactive robotic control systems

Proceedings. 1991 IEEE International Conference on Robotics and Automation ◽ 10.1109/robot.1991.131743 ◽ 2002 ◽ Cited By ~ 23

Author(s): E. Gat

Keyword(s): Control Systems ◽ Robotic Control

An Idea of Using Reinforcement Learning in Adaptive Control Systems

International Conference on Networking, International Conference on Systems and International Conference on Mobile Communications and Learning Technologies (ICNICONSMCL'06) ◽ 10.1109/icniconsmcl.2006.52 ◽ 2006 ◽ Cited By ~ 2

Author(s): L. Koszaka ◽ R. Rudek ◽ I. Pozniak-Koszalka

Keyword(s): Adaptive Control ◽ Reinforcement Learning ◽ Control Systems ◽ Adaptive Control Systems

BOOSTR: A Dataset for Accelerator Control Systems

Data ◽ 10.3390/data6040042 ◽ 2021 ◽ Vol 6 (4) ◽ pp. 42

Author(s): Diana Kafkes ◽ Jason St. John

Keyword(s): Artificial Intelligence ◽ Time Series ◽ Reinforcement Learning ◽ Control Systems ◽ Power Supply ◽ Cycle Time ◽ Rapid Cycling ◽ Operation Optimization ◽ Advanced Control ◽ Rapid Cycling Synchrotron

The Booster Operation Optimization Sequential Time-series for Regression (BOOSTR) dataset was created to provide a cycle-by-cycle time series of readings and settings from instruments and controllable devices of the Booster, Fermilab’s Rapid-Cycling Synchrotron (RCS) operating at 15 Hz. BOOSTR provides a time series from 55 device readings and settings that pertain most directly to the high-precision regulation of the Booster’s gradient magnet power supply (GMPS). To our knowledge, this is one of the first well-documented datasets of accelerator device parameters made publicly available. We are releasing it in the hopes that it can be used to demonstrate aspects of artificial intelligence for advanced control systems, such as reinforcement learning and autonomous anomaly detection.

VERIFICATION OF A MARINE POLLUTANT SURFACE PLUME MODEL FOR USE IN THE DEVELOPMENT OF AUTONOMOUS VEHICLE TRACKING SYSTEMS

International Oil Spill Conference Proceedings ◽ 10.7901/2169-3358-2017.1.1612 ◽ 2017 ◽ Vol 2017 (1) ◽ pp. 1612-1628

Author(s): Laura M. Fitzpatrick ◽ A Zachary Trimble ◽ Brian S. Bingham

Keyword(s): Control Systems ◽ Surface Concentration ◽ Computational Effort ◽ Computational Time ◽ Type Model ◽ Fine Scale ◽ Multiple Model ◽ Robotic Control ◽ Statistical Parameters ◽ Tuning Parameters

ABSTRACT A marine pollutant spill environmental model that can accurately predict fine-scale pollutant concentration variations on a free surface is needed in the early stages of testing robotic control systems for tracking pollutant spills. The model must reproduce, for use in a robotic control system simulation environment, the fine-scale surface concentration variations observed by a robot. Furthermore, to facilitate development of robotic control systems, the model must reproduce sample spill distributions in minimal computational time. A combined Eulerian-Lagrangian type model with two tuning parameters was developed to produce, with minimal computational effort, the fine-scale concentrations that would be observed by a robot. Multiple model scenarios were run with different tuning parameters to determine the effects of those parameters on the model’s ability to reproduce the structure of an experimentally measured pollutant plume. A qualitative method for analyzing the concentration variations was established using amplitude and temporal statistical parameters. The differences in the statistical parameters between the model and the experiment range from 69% to 316%. After tuning, the model produces a sample spill that includes a high-frequency concentration component not observed in the experimental data, but that generally represents the real-time, fine-scale pollutant plume structure and can be used for testing control algorithms.
