Reinforcement Learning Based Adaptive Optimal Strategy in Robotic Control Systems

Author(s):  
Phuong Nam Dao ◽  
Hong Quang Nguyen
Author(s):  
Александр Александрович Воевода ◽  
Дмитрий Олегович Романников

Synthesizing controllers for multichannel systems is a pressing and difficult problem. One possible approach is to use neural networks. A neural controller is either trained on precomputed data or used to tune the parameters of a PID controller starting from an initial stable state of the closed-loop system. It is proposed to use neural networks to control a two-channel plant, with training carried out from an unstable (arbitrary) initial state using reinforcement learning methods. A structure for the neural network and the closed-loop system is proposed in which the set point is supplied as an input parameter of the controller's neural network.

Synthesizing automatic control systems is hard, especially for multichannel plants. One approach is to use neural networks. Approaches based on reinforcement learning face an additional issue: supporting a range of set-point values. A method is proposed for synthesizing automatic control systems with neural networks, together with a reinforcement learning training procedure that lets the network regulate over a predefined range of set points. The main steps of the method are: 1) form the neural network input from the plant state and the system set point; 2) simulate the system with a set of randomly generated set points from the desired range; 3) perform one learning step using the Deterministic Policy Gradient method. The originality of the proposed method is that, in contrast to existing methods that use a neural network to synthesize a controller, it allows a controller to be trained from an unstable initial state of the closed-loop system and over a range of set points. The method was applied to the problem of stabilizing the outputs of a two-channel plant, where both outputs must be stabilized and the first must be held near the input set point.
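The three steps listed in the abstract above can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the plant matrix `A`, the linear policy `W` standing in for the neural network, the quadratic cost, and the learning rate. The plant is also stable, unlike the unstable starting point the abstract emphasizes. The update is a simplified stand-in for the Deterministic Policy Gradient step: because the toy plant is known, the gradient of the one-step cost with respect to the control is computed analytically rather than by a learned critic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-channel linear plant: x_next = A x + u (assumed, not from the paper).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])


def policy_input(state, set_point):
    """Step 1: the controller input is the plant state plus the set point."""
    return np.concatenate([state, [set_point]])


def policy(W, state, set_point):
    """Deterministic linear policy u = W z, a stand-in for the neural network."""
    return W @ policy_input(state, set_point)


def update(W, state, set_point, lr=0.1):
    """Step 3 (simplified): one gradient step on the one-step tracking cost."""
    z = policy_input(state, set_point)
    x_next = A @ state + W @ z
    # Cost J = ||x_next - target||^2: output 1 tracks the set point, output 2 goes to 0.
    err = x_next - np.array([set_point, 0.0])
    # Chain rule: dJ/dW = (dJ/du) z^T, with dJ/du = 2 * err for this plant.
    W_new = W - lr * 2.0 * np.outer(err, z)
    return W_new, x_next


def train(episodes=500, steps=10, sp_range=(-1.0, 1.0)):
    W = np.zeros((2, 3))
    for _ in range(episodes):
        # Step 2: simulate with a randomly drawn set point and initial state.
        set_point = rng.uniform(*sp_range)
        state = rng.uniform(-1.0, 1.0, size=2)
        for _ in range(steps):
            W, state = update(W, state, set_point)
    return W
```

Because the set point is part of the policy input, a single trained `W` serves every set point in the sampled range; calling `policy(train(), state, 0.5)` and `policy(train(), state, -0.3)` uses the same parameters for both targets.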


1993 ◽  
Vol 26 (2) ◽  
pp. 515-518
Author(s):  
E.R. Fielding ◽  
E.D. Illos

Data ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 42
Author(s):  
Diana Kafkes ◽  
Jason St. John

The Booster Operation Optimization Sequential Time-series for Regression (BOOSTR) dataset was created to provide a cycle-by-cycle time series of readings and settings from instruments and controllable devices of the Booster, Fermilab’s Rapid-Cycling Synchrotron (RCS) operating at 15 Hz. BOOSTR provides a time series from 55 device readings and settings that pertain most directly to the high-precision regulation of the Booster’s gradient magnet power supply (GMPS). To our knowledge, this is one of the first well-documented datasets of accelerator device parameters made publicly available. We are releasing it in the hopes that it can be used to demonstrate aspects of artificial intelligence for advanced control systems, such as reinforcement learning and autonomous anomaly detection.


2017 ◽  
Vol 2017 (1) ◽  
pp. 1612-1628
Author(s):  
Laura M. Fitzpatrick ◽  
A Zachary Trimble ◽  
Brian S. Bingham

A marine pollutant spill environmental model that can accurately predict fine-scale pollutant concentration variations on a free surface is needed in the early stages of testing robotic control systems for tracking pollutant spills. The model must reproduce, for use in a robotic control system simulation environment, the fine-scale surface concentration variations observed by a robot. Furthermore, to facilitate development of robotic control systems, the model must reproduce sample spill distributions in minimal computational time. A combined Eulerian-Lagrangian model with two tuning parameters was developed to produce, with minimal computational effort, the fine-scale concentrations that would be observed by a robot. Multiple model scenarios were run with different tuning parameters to determine the effects of those parameters on the model's ability to reproduce the structure of an experimentally measured pollutant plume. A qualitative method for analyzing the concentration variations was established using amplitude and temporal statistical parameters. The differences in the statistical parameters between the model and the experiment range from 69% to 316%. After tuning, the model produces a sample spill that includes a high-frequency concentration component not observed in the experimental data, but that generally represents the real-time, fine-scale pollutant plume structure and can be used for testing control algorithms.

