Training effective deep reinforcement learning agents for real-time life-cycle production optimization

Abstract We propose a new measure of intelligence for general reinforcement learning agents, based on the notion that an agent’s environment can change at any step of execution of the agent. That is, an agent is considered to be interacting with its environment in real-time. In this sense, the resulting intelligence measure is more general than the universal intelligence measure (Legg and Hutter, 2007) and the anytime universal intelligence test (Hernández-Orallo and Dowe, 2010). A major advantage of the measure is that an agent’s computational complexity is factored into the measure in a natural manner. We show that there exist agents with intelligence arbitrarily close to the theoretical maximum, and that the intelligence of agents depends on their parallel processing capability. We thus believe that the measure can provide a better evaluation of agents and guidance for building practical agents with high intelligence.

Download Full-text

A Normative Supervisor for Reinforcement Learning Agents

Automated Deduction – CADE 28 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-79876-5_32 ◽

2021 ◽

pp. 565-576

Author(s):

Emery Neufeld ◽

Ezio Bartocci ◽

Agata Ciabattoni ◽

Guido Governatori

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Deontic Logic ◽

Theorem Prover ◽

Event Recorder ◽

Learning Agents ◽

Learning Agent

AbstractWe introduce a modular and transparent approach for augmenting the ability of reinforcement learning agents to comply with a given norm base. The normative supervisor module functions as both an event recorder and real-time compliance checker w.r.t. an external norm base. We have implemented this module with a theorem prover for defeasible deontic logic, in a reinforcement learning agent that we task with playing a “vegan” version of the arcade game Pac-Man.

Download Full-text

Design of a Reinforcement Learning-Based Lane Keeping Planning Agent for Automated Vehicles

Applied Sciences ◽

10.3390/app10207171 ◽

2020 ◽

Vol 10 (20) ◽

pp. 7171

Author(s):

Bálint Kővári ◽

Ferenc Hegedüs ◽

Tamás Bécsi

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Autonomous Vehicles ◽

High Performance ◽

Search Algorithm ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Learning Agents ◽

Combined Solution ◽

Lane Keeping

Reinforcement learning-based approaches are widely studied in the literature for solving different control tasks for Connected and Autonomous Vehicles, from which this paper deals with the problem of lateral control of a dynamic nonlinear vehicle model, performing the task of lane-keeping. In this area, the appropriate formulation of the goals and environment information is crucial, for which the research outlines the importance of lookahead information, enabling to accomplish maneuvers with complex trajectories. Another critical part is the real-time manner of the problem. On the one hand, optimization or search based methods, such as the presented Monte Carlo Tree Search method, can solve the problem with the trade-off of high numerical complexity. On the other hand, single Reinforcement Learning agents struggle to learn these tasks with high performance, though they have the advantage that after the training process, they can operate in a real-time manner. Two planning agent structures are proposed in the paper to resolve this duality, where the machine learning agents aid the tree search algorithm. As a result, the combined solution provides high performance and low computational needs.

Download Full-text

Integration of Asset Management with Real Time Simulation to Improve Reliability, Preventive and Corrective Maintenance and Reduce Life Cycle Cost in Wastewater Treatment for Reuse

Proceedings of the Water Environment Federation ◽

10.2175/193864715819542214 ◽

2015 ◽

Vol 2015 (10) ◽

pp. 5152-5178 ◽

Cited By ~ 1

Author(s):

D Sen ◽

A Lodhi ◽

M Brooks ◽

R Angelotti ◽

A Godrej

Keyword(s):

Wastewater Treatment ◽

Life Cycle ◽

Real Time ◽

Asset Management ◽

Life Cycle Cost ◽

Corrective Maintenance ◽

Real Time Simulation ◽

Time Simulation ◽

Improve Reliability

Download Full-text

An Evaluation Methodology for Interactive Reinforcement Learning with Simulated Users

Biomimetics ◽

10.3390/biomimetics6010013 ◽

2021 ◽

Vol 6 (1) ◽

pp. 13

Author(s):

Adam Bignold ◽

Francisco Cruz ◽

Richard Dazeley ◽

Peter Vamplew ◽

Cameron Foale

Keyword(s):

Reinforcement Learning ◽

Information Source ◽

Human Interaction ◽

Evaluation Methodology ◽

External Information ◽

Preliminary Evaluation ◽

Learning Agents ◽

Learning Agent ◽

Knowledge Bias ◽

The Impact

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.

Download Full-text

Real-time Energy Management of Microgrid Using Reinforcement Learning

2020 19th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES) ◽

10.1109/dcabes50732.2020.00019 ◽

2020 ◽

Author(s):

Wenzheng Bi ◽

Yuankai Shu ◽

Wei Dong ◽

Qiang Yang

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Energy Management

Download Full-text

FPGA Acceleration of ROS2-Based Reinforcement Learning Agents

2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) ◽

10.1109/candarw51189.2020.00031 ◽

2020 ◽

Author(s):

Daniel Pinheiro Leal ◽

Midori Sugaya ◽

Hideharu Amano ◽

Takeshi Ohkawa

Keyword(s):

Reinforcement Learning ◽

Learning Agents ◽

Fpga Acceleration

Download Full-text

Performance Study of Minimax and Reinforcement Learning Agents Playing the Turn-based Game Iwoki

Applied Artificial Intelligence ◽

10.1080/08839514.2021.1934265 ◽

2021 ◽

pp. 1-28

Author(s):

Santiago Videgaín ◽

Pablo García Sánchez

Keyword(s):

Reinforcement Learning ◽

Performance Study ◽

Learning Agents

Download Full-text

Real-Time Safety Optimization of Connected Vehicle Trajectories Using Reinforcement Learning

Sensors ◽

10.3390/s21113864 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3864

Author(s):

Tarek Ghoul ◽

Tarek Sayed

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Low Cost ◽

Safety Evaluation ◽

Traffic Volume ◽

Connected Vehicles ◽

Connected Vehicle ◽

Real World Data ◽

Physical Constraints ◽

Traffic Conflicts

Speed advisories are used on highways to inform vehicles of upcoming changes in traffic conditions and apply a variable speed limit to reduce traffic conflicts and delays. This study applies a similar concept to intersections with respect to connected vehicles to provide dynamic speed advisories in real-time that guide vehicles towards an optimum speed. Real-time safety evaluation models for signalized intersections that depend on dynamic traffic parameters such as traffic volume and shock wave characteristics were used for this purpose. The proposed algorithm incorporates a rule-based approach alongside a Deep Deterministic Policy Gradient reinforcement learning technique (DDPG) to assign ideal speeds for connected vehicles at intersections and improve safety. The system was tested on two intersections using real-world data and yielded an average reduction in traffic conflicts ranging from 9% to 23%. Further analysis was performed to show that the algorithm yields tangible results even at lower market penetration rates (MPR). The algorithm was tested on the same intersection with different traffic volume conditions as well as on another intersection with different physical constraints and characteristics. The proposed algorithm provides a low-cost approach that is not computationally intensive and works towards optimizing for safety by reducing rear-end traffic conflicts.

Download Full-text

Real-Time Production Optimization - Applying a Digital Twin Model to Optimize the Entire Upstream Value Chain

10.2118/197693-ms ◽

2019 ◽

Cited By ~ 1

Author(s):

Bob Okhuijsen ◽

Kevin Wade

Keyword(s):

Real Time ◽

Value Chain ◽

Production Optimization ◽

Time Production ◽

Digital Twin ◽

Twin Model

Download Full-text