Integrating Production Planning with Truck-Dispatching Decisions through Reinforcement Learning While Managing Uncertainty

This paper presents a new truck-dispatching policy that adapts to different mining complex configurations when delivering material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator emulates the interactions arising from these mining operations. Repeated runs of this simulator, together with a reward function that assigns a score to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, so that when a new dispatching decision is required, a well-informed one can be made quickly. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.

Extended Q-Learning: Reinforcement Learning Using Self-Organized State Space

RoboCup 2000: Robot Soccer World Cup IV - Lecture Notes in Computer Science ◽

10.1007/3-540-45324-5_11 ◽

2001 ◽

pp. 129-138 ◽

Cited By ~ 2

Author(s):

Shuichi Enokida ◽

Takeshi Ohasi ◽

Takaichi Yoshida ◽

Toshiaki Ejima

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Q Learning ◽

Self Organized ◽

Learning Reinforcement

LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/840 ◽

2019 ◽

Cited By ~ 4

Author(s):

Alberto Camacho ◽

Rodrigo Toro Icarte ◽

Toryn Q. Klassen ◽

Richard Valenzano ◽

Sheila A. McIlraith

Keyword(s):

Reinforcement Learning ◽

Normal Form ◽

State Of The Art ◽

Formal Languages ◽

Function Structure ◽

Q Learning ◽

Reward Function ◽

Form Representation ◽

Reward Shaping ◽

Reward Functions

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
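
To make the idea of a reward machine concrete, the following is a minimal sketch, not the authors' implementation. It assumes a reward machine can be modelled as a finite automaton whose transitions are triggered by high-level events and each carry a reward. The class name `RewardMachine`, the event names `coffee` and `office`, and the state names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Illustrative automaton-based reward function (hypothetical API)."""
    initial: str
    # Maps (state, event) -> (next_state, reward).
    transitions: dict
    terminal: frozenset = frozenset()

    def step(self, state, event):
        # Pairs that are not listed self-loop with zero reward.
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Feed a sequence of events; return (final_state, total_reward)."""
        state, total = self.initial, 0.0
        for event in events:
            if state in self.terminal:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical task: "get coffee, then deliver it to the office".
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u_done", 1.0),
    },
    terminal=frozenset({"u_done"}),
)

final_state, total_reward = rm.run(["office", "coffee", "office"])
```

In this sketch, reaching the office before picking up coffee earns nothing, because the machine is still in `u0`. The current machine state is the kind of exposed structure that a learner could use, for example by keeping one Q-function per machine state.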

Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents

Journal of Information Security and Applications ◽

10.1016/j.jisa.2021.102903 ◽

2021 ◽

Vol 61 ◽

pp. 102903

Author(s):

László Erdődi ◽

Åvald Åslaugson Sommervoll ◽

Fabio Massimo Zennaro

Keyword(s):

Reinforcement Learning ◽

Sql Injection ◽

Q Learning ◽

Learning Agents ◽

Learning Reinforcement

Reinforcement and Imitation Learning Applied to Autonomous Aerial Robot Control

10.5753/wtdr_ctdr.2020.14956 ◽

2020 ◽

Author(s):

Gabriel Moraes Barros ◽

Esther Colombini

Keyword(s):

Reinforcement Learning ◽

Autonomous Systems ◽

Control Policy ◽

Primary Objective ◽

Imitation Learning ◽

Level Control ◽

Reward Function ◽

Long Time ◽

Learning Reinforcement ◽

Function Approximator

In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt, and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. Reinforcement Learning (RL) aims at addressing this problem by enabling a robot to learn behaviors through trial and error. With RL, a neural network can be trained as a function approximator to map states directly to actuator commands, making any predefined control structure unnecessary for training. However, the knowledge required for these methods to converge is usually built from scratch. Learning may take a long time, and RL algorithms need an explicitly stated reward function, which is not always trivial to define. Often it is easier for a teacher, human or intelligent agent, to demonstrate the desired behavior or how to accomplish a given task. Humans and other animals have a natural ability to learn skills from observation, often from merely seeing these skills’ effects, without direct knowledge of the underlying actions. The same principle exists in Imitation Learning, a practical approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. In this scenario, this work’s primary objective is to design an agent that can successfully imitate a previously acquired control policy using Imitation Learning. The chosen algorithm is GAIL, since we consider it well suited to this problem because it uses expert (state, action) trajectories. As reference expert trajectories, we implement the state-of-the-art on-policy and off-policy methods PPO and SAC. Results show that the learned policies for all three methods can solve the task of low-level control of a quadrotor and that all of them generalize on the original tasks.

Optimizing Hadoop parameter for speedup using Q-Learning Reinforcement Learning

10.1109/icecct52121.2021.9616965 ◽

2021 ◽

Author(s):

Nandita Yambem ◽

A. N. Nandakumar

Keyword(s):

Reinforcement Learning ◽

Q Learning ◽

Learning Reinforcement

Decision-Making for the Autonomous Navigation of Maritime Autonomous Surface Ships Based on Scene Division and Deep Reinforcement Learning

Sensors ◽

10.3390/s19184055 ◽

2019 ◽

Vol 19 (18) ◽

pp. 4055 ◽

Cited By ~ 9

Author(s):

Zhang ◽

Wang ◽

Liu ◽

Chen

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Collision Avoidance ◽

Autonomous Navigation ◽

Learning Algorithm ◽

Q Learning ◽

Reward Function ◽

International Regulations ◽

Convergence Trend ◽

Decision Making Model

This research focuses on the adaptive navigation of maritime autonomous surface ships (MASSs) in an uncertain environment. To achieve intelligent obstacle avoidance of MASSs in a port, an autonomous navigation decision-making model based on hierarchical deep reinforcement learning is proposed. The model is mainly composed of two layers: the scene division layer and an autonomous navigation decision-making layer. The scene division layer mainly quantifies the sub-scenarios according to the International Regulations for Preventing Collisions at Sea (COLREG). This research divides the navigational situation of a ship into entities and attributes based on the ontology model and Protégé language. In the decision-making layer, we designed a deep Q-learning algorithm utilizing the environmental model, ship motion space, reward function, and search strategy to learn the environmental state in a quantized sub-scenario to train the navigation strategy. Finally, two sets of verification experiments of the deep reinforcement learning (DRL) and improved DRL algorithms were designed with Rizhao port as a study case. Moreover, the experimental data were analyzed in terms of the convergence trend, iterative path, and collision avoidance effect. The results indicate that the improved DRL algorithm could effectively improve the navigation safety and collision avoidance.

Split Q Learning: Reinforcement Learning with Two-Stream Rewards

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/913 ◽

2019 ◽

Author(s):

Baihan Lin ◽

Djallel Bouneffouf ◽

Guillermo Cecchi

Keyword(s):

Reinforcement Learning ◽

Wide Spectrum ◽

User Preferences ◽

Reward Processing ◽

Q Learning ◽

Agent Interactions ◽

Behavioral Studies ◽

Human Decision ◽

Multi Agent ◽

Learning Reinforcement

Drawing inspiration from behavioral studies of human decision making, we propose here a general parametric framework for a reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems.
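
As a rough illustration of what a two-stream extension of Q-learning could look like, the sketch below keeps separate value tables for positive and negative rewards and weights each stream before the update. This is an assumption-laden toy, not the update rule from the paper: the class name `TwoStreamQ`, the weights `w_pos` and `w_neg`, and the choice to act greedily on the sum of the two tables are all hypothetical.

```python
from collections import defaultdict


class TwoStreamQ:
    """Toy tabular agent with separate positive and negative reward streams."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, w_pos=1.0, w_neg=1.0):
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        # Hypothetical bias parameters scaling each reward stream.
        self.w_pos, self.w_neg = w_pos, w_neg
        self.q_pos = defaultdict(lambda: [0.0] * n_actions)
        self.q_neg = defaultdict(lambda: [0.0] * n_actions)

    def combined(self, state):
        # Assumed combination rule: sum of the two streams.
        return [p + n for p, n in zip(self.q_pos[state], self.q_neg[state])]

    def greedy(self, state):
        q = self.combined(state)
        return max(range(self.n_actions), key=q.__getitem__)

    def update(self, state, action, reward, next_state):
        next_action = self.greedy(next_state)
        # Split the scalar reward into its positive and negative parts.
        r_pos = self.w_pos * max(reward, 0.0)
        r_neg = self.w_neg * min(reward, 0.0)
        for table, r in ((self.q_pos, r_pos), (self.q_neg, r_neg)):
            target = r + self.gamma * table[next_state][next_action]
            table[state][action] += self.alpha * (target - table[state][action])


# An agent that weights losses twice as heavily as gains.
agent = TwoStreamQ(n_actions=2, alpha=0.5, gamma=0.0, w_pos=1.0, w_neg=2.0)
agent.update("s", 0, 1.0, "t")
agent.update("s", 1, -1.0, "t")
```

Varying `w_pos` and `w_neg` in this toy changes how strongly gains and losses shape behavior, which is the kind of bias the abstract associates with different conditions.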

Aero-engine acceleration control using deep reinforcement learning with phase-based reward function

Proceedings of the Institution of Mechanical Engineers Part G Journal of Aerospace Engineering ◽

10.1177/09544100211046225 ◽

2021 ◽

pp. 095441002110462

Author(s):

Qian-Kun Hu ◽

Yong-Ping Zhao

Keyword(s):

Reinforcement Learning ◽

Trust Region ◽

Engine Control ◽

Control Task ◽

Q Learning ◽

Reward Function ◽

Engine Control System ◽

Aero Engine ◽

Markov Decision ◽

Policy Optimization

In this paper, the conventional aero-engine acceleration control task is formulated as a Markov Decision Process (MDP). Then, a novel phase-based reward function is proposed to enhance the performance of deep reinforcement learning (DRL) in solving feedback control tasks. With that reward function, an aero-engine controller based on Trust Region Policy Optimization (TRPO) is developed to improve the aero-engine acceleration performance. Four comparison simulations were conducted to verify the effectiveness of the proposed methods. The simulation results show that the phase-based reward function helps to eliminate the oscillation problem of the aero-engine control system, which is caused by the traditional goal-based reward function when DRL is applied to aero-engine control. The TRPO controller also outperforms deep Q-learning (DQN) and proportional-integral-derivative (PID) control in the aero-engine acceleration control task. Compared with the DQN and PID controllers, the acceleration time of the aero-engine is decreased by 0.6 s and 2.58 s, respectively, and the aero-engine acceleration performance is improved by 16.8% and 46.4%, respectively.

Model-Free IRL Using Maximum Likelihood Estimation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013951 ◽

2019 ◽

Vol 33 ◽

pp. 3951-3958

Author(s):

Vinamra Jain ◽

Prashant Doshi ◽

Bikramjit Banerjee

Keyword(s):

Reinforcement Learning ◽

Maximum Likelihood ◽

Likelihood Estimation ◽

Transition Function ◽

Inverse Reinforcement Learning ◽

Q Learning ◽

Reward Function ◽

Model Free ◽

Model Free Approach ◽

Q Function

The problem of learning an expert’s unknown reward function using a limited number of demonstrations recorded from the expert’s behavior is investigated in the area of inverse reinforcement learning (IRL). To gain traction in this challenging and underconstrained problem, IRL methods predominantly represent the reward function of the expert as a linear combination of known features. Most of the existing IRL algorithms either assume the availability of a transition function or provide a complex and inefficient approach to learn it. In this paper, we present a model-free approach to IRL, which casts IRL in the maximum likelihood framework. We present modifications of the model-free Q-learning that replace its maximization to allow computing the gradient of the Q-function. We use gradient ascent to update the feature weights to maximize the likelihood of expert’s trajectories. We demonstrate on two problem domains that our approach improves the likelihood compared to previous methods.
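
The abstract does not say which operator replaces the maximization in Q-learning. One common differentiable substitute is the log-sum-exp "soft maximum", and the sketch below uses it purely as an assumption to show why such a replacement makes the Q-update smooth. The function names and the temperature parameter `beta` are hypothetical.

```python
import math


def soft_max_value(q_values, beta=5.0):
    """Smooth stand-in for max(q_values): (1/beta) * log(sum(exp(beta * q))).

    Differentiable in every q, and approaches the hard max as beta grows.
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(beta * (q - m)) for q in q_values)) / beta


def soft_q_update(q, state, action, reward, next_state,
                  alpha=0.1, gamma=0.9, beta=5.0):
    """One tabular Q-learning step with the hard max replaced by soft_max_value."""
    target = reward + gamma * soft_max_value(q[next_state], beta)
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Tiny two-state, two-action table for illustration.
q_table = {"s": [0.0, 0.0], "t": [0.0, 0.0]}
updated = soft_q_update(q_table, "s", 0, 1.0, "t", alpha=1.0, gamma=0.0)
```

Because every term in the target is now a smooth function of the Q-values, and the Q-values depend on the reward through the feature weights, gradients with respect to those weights can be propagated through the update.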

Leveraging Conventional Control to Improve Performance of Systems Using Reinforcement Learning

Volume 2: Intelligent Transportation/Vehicles; Manufacturing; Mechatronics; Engine/After-Treatment Systems; Soft Actuators/Manipulators; Modeling/Validation; Motion/Vibration Control Applications; Multi-Agent/Networked Systems; Path Planning/Motion Control; Renewable/Smart Energy Systems; Security/Privacy of Cyber-Physical Systems; Sensors/Actuators; Tracking Control Systems; Unmanned Ground/Aerial Vehicles; Vehicle Dynamics, Estimation, Control; Vibration/Control Systems; Vibrations ◽

10.1115/dscc2020-3307 ◽

2020 ◽

Author(s):

Gerald Eaglin ◽

Joshua Vaughan

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Domain Knowledge ◽

Training Time ◽

Reward Function ◽

Model Based ◽

Model Free ◽

Optimal Controllers ◽

Dynamics And Control ◽

Learning Reinforcement

While many model-based methods have been proposed for optimal control, it is often difficult to generate model-based optimal controllers for nonlinear systems. One model-free method to solve for optimal control policies is reinforcement learning. Reinforcement learning iteratively trains an agent to optimize a reward function. However, agents often perform poorly at the beginning of training and require a large number of trials to converge to a successful policy. A method is proposed to incorporate domain knowledge of dynamics and control into controllers that use reinforcement learning, in order to reduce the training time needed. Simulations are presented to compare the performance of agents that use domain knowledge with that of agents that do not. The results show that the agents with domain knowledge can accomplish the desired task with less training time than those without domain knowledge.
