Optimizing Hadoop parameter for speedup using Q-Learning Reinforcement Learning

This paper presents a new truck dispatching policy approach that is adaptive given different mining complex configurations in order to deliver supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator model emulates the interaction arising from these mining operations. The continuous repetition of this simulator and a reward function, associating a score value to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task is required, a well-informed decision can be quickly taken. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.


Extended Q-Learning: Reinforcement Learning Using Self-Organized State Space

RoboCup 2000: Robot Soccer World Cup IV - Lecture Notes in Computer Science ◽

10.1007/3-540-45324-5_11 ◽

2001 ◽

pp. 129-138 ◽

Cited By ~ 2

Author(s):

Shuichi Enokida ◽

Takeshi Ohasi ◽

Takaichi Yoshida ◽

Toshiaki Ejima

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Q Learning ◽

Self Organized ◽

Learning Reinforcement


Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents

Journal of Information Security and Applications ◽

10.1016/j.jisa.2021.102903 ◽

2021 ◽

Vol 61 ◽

pp. 102903

Author(s):

László Erdődi ◽

Åvald Åslaugson Sommervoll ◽

Fabio Massimo Zennaro

Keyword(s):

Reinforcement Learning ◽

Sql Injection ◽

Q Learning ◽

Learning Agents ◽

Learning Reinforcement


Split Q Learning: Reinforcement Learning with Two-Stream Rewards

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/913 ◽

2019 ◽

Author(s):

Baihan Lin ◽

Djallel Bouneffouf ◽

Guillermo Cecchi

Keyword(s):

Reinforcement Learning ◽

Wide Spectrum ◽

User Preferences ◽

Reward Processing ◽

Q Learning ◽

Agent Interactions ◽

Behavioral Studies ◽

Human Decision ◽

Multi Agent ◽

Learning Reinforcement

Drawing inspiration from behavioral studies of human decision making, we propose a general parametric framework for reinforcement learning that extends the standard Q-learning approach with a two-stream model of reward processing, with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, developing agents that react differently to different types of rewards can help us understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model that captures reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems.
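The two-stream idea in this abstract can be illustrated with a minimal sketch. It assumes that positive and negative rewards are tracked in two separate Q-tables, each with its own weight, and that actions are chosen from their sum. The class name `TwoStreamQ` and the parameters `w_pos` and `w_neg` are hypothetical and are not taken from the abstract; the paper's actual update rule may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TwoStreamQ:
    """Illustrative two-stream tabular Q-learner (not the authors' exact method)."""
    n_states: int
    n_actions: int
    alpha: float = 0.1   # learning rate
    gamma: float = 0.9   # discount factor
    w_pos: float = 1.0   # hypothetical weight on the positive reward stream
    w_neg: float = 1.0   # hypothetical weight on the negative reward stream
    q_pos: list = field(init=False)
    q_neg: list = field(init=False)

    def __post_init__(self):
        self.q_pos = [[0.0] * self.n_actions for _ in range(self.n_states)]
        self.q_neg = [[0.0] * self.n_actions for _ in range(self.n_states)]

    def combined(self, state):
        # Action values used for decisions: sum of the two streams.
        return [p + n for p, n in zip(self.q_pos[state], self.q_neg[state])]

    def greedy_action(self, state):
        values = self.combined(state)
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done=False):
        # Split the scalar reward into its positive and negative parts.
        r_pos = self.w_pos * max(reward, 0.0)
        r_neg = self.w_neg * min(reward, 0.0)
        # Apply a standard Q-learning update to each stream separately.
        for table, r in ((self.q_pos, r_pos), (self.q_neg, r_neg)):
            bootstrap = 0.0 if done else self.gamma * max(table[next_state])
            table[state][action] += self.alpha * (r + bootstrap - table[state][action])
```

Setting `w_neg` larger than `w_pos` makes the sketched agent more sensitive to losses than to gains, which is one way such a bias could be parameterized.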


Concurrent Q-learning: Reinforcement learning for dynamic goals and environments

International Journal of Intelligent Systems ◽

10.1002/int.20105 ◽

2005 ◽

Vol 20 (10) ◽

pp. 1037-1052 ◽

Cited By ~ 7

Author(s):

Robert B. Ollington ◽

Peter W. Vamplew

Keyword(s):

Reinforcement Learning ◽

Q Learning ◽

Learning Reinforcement


Personalized project recommendations: using reinforcement learning

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-019-1619-6 ◽

2019 ◽

Vol 2019 (1) ◽

Cited By ~ 1

Author(s):

Faxin Qi ◽

Xiangrong Tong ◽

Lei Yu ◽

Yingjie Wang

Keyword(s):

Reinforcement Learning ◽

User Behavior ◽

Collaborative Work ◽

Recursive Least Squares ◽

The Internet ◽

Dynamic Impact ◽

Rls Algorithm ◽

Trust Value ◽

Q Learning ◽

Actual Evaluation

With the development of the Internet and the progress of human-centered computing (HCC), man-machine collaborative work has become increasingly popular. Valuable information on the Internet, such as user behavior and social labels, is often provided by users. Trust-based recommendation is an important human-computer interaction application in social networks. However, previous studies generally assume that the trust value between users is static, and so cannot respond to dynamic changes in user trust and preferences in a timely manner. In fact, after a recommendation is received, there is a difference between the actual evaluation and the expected evaluation, which is correlated with the trust value. Based on the dynamics of trust and the changing process of trust between users, this paper proposes a trust boost method through reinforcement learning. The recursive least squares (RLS) algorithm is used to learn the dynamic impact of the evaluation difference on a user’s trust. In addition, the reinforcement learning method Deep Q-Learning (DQN) is studied to simulate the process of learning a user’s preferences and boosting the trust value. Experiments indicate that our method, applied to recommendation systems, responds quickly to changes in user preferences. Compared with other methods, our method achieves better recommendation accuracy.


Aircraft Maintenance Check Scheduling Using Reinforcement Learning

Aerospace ◽

10.3390/aerospace8040113 ◽

2021 ◽

Vol 8 (4) ◽

pp. 113

Author(s):

Pedro Andrade ◽

Catarina Silva ◽

Bernardete Ribeiro ◽

Bruno F. Santos

Keyword(s):

Reinforcement Learning ◽

Time Horizon ◽

Learning Algorithm ◽

Initial Conditions ◽

Q Learning ◽

Scheduling Policy ◽

Real Scenario ◽

Maintenance Plan ◽

Small Disturbances

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due date. In doing so, the number of checks is reduced, and the fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan that is generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach and airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.


Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

Entropy ◽

10.3390/e23060737 ◽

2021 ◽

Vol 23 (6) ◽

pp. 737

Author(s):

Fengjie Sun ◽

Xianchang Wang ◽

Rui Zhang

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Optimal Policy ◽

Feasible Solution ◽

Learning Algorithm ◽

Plant Protection ◽

Agricultural Plant ◽

Q Learning ◽

Aerial Vehicle ◽

Optimal Action

An Unmanned Aerial Vehicle (UAV) can greatly reduce the manpower needed for agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in each state according to the policy. In an unknown environment, formulating rules that tell UAVs which actions to choose is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV following the policy learned by the proposed algorithm has a greater probability of choosing the optimal action than one following the policy learned by the classic Q-learning algorithm. The proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
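As a rough illustration of what matching similar states could look like, the sketch below assumes states are numeric tuples and that an unvisited state borrows the Q-values of the nearest visited state within a distance threshold. The function names, the Euclidean distance, and the `max_distance` and `default_action` parameters are all assumptions made for this example; the abstract does not specify the matching rule.

```python
import math


def nearest_known_state(state, q_table, max_distance):
    """Return the visited state closest to `state`, or None if none is within max_distance."""
    best, best_dist = None, max_distance
    for known in q_table:
        dist = math.dist(state, known)  # Euclidean distance between coordinate tuples
        if dist <= best_dist:
            best, best_dist = known, dist
    return best


def choose_action(state, q_table, n_actions, max_distance=1.5, default_action=0):
    """Greedy action from the exact state if visited, otherwise from a similar state."""
    if state in q_table:
        key = state
    else:
        key = nearest_known_state(state, q_table, max_distance)
    if key is None:
        # No sufficiently similar state has been visited: fall back to a default.
        return default_action
    values = q_table[key]
    return max(range(n_actions), key=values.__getitem__)


# Hypothetical Q-table keyed by (x, y) grid positions.
q_table = {(0, 0): [0.1, 0.9], (5, 5): [0.7, 0.2]}
action_for_unseen = choose_action((0, 1), q_table, n_actions=2)
```

Here the unseen state `(0, 1)` lies at distance 1 from `(0, 0)`, so it reuses that state's Q-values.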


Comparing quantum hybrid reinforcement learning to classical methods

Human-Intelligent Systems Integration ◽

10.1007/s42454-021-00025-3 ◽

2021 ◽

Author(s):

Maximilian Moll ◽

Leonhard Kunczik

Keyword(s):

Reinforcement Learning ◽

Quantum Circuits ◽

Memory Usage ◽

Computational Power ◽

Classical Domain ◽

Q Learning ◽

Optimal Behavior ◽

Computational Speed ◽

Complex Decision ◽

Hybrid Reinforcement

In recent history, reinforcement learning (RL) has proved its capability to solve complex decision problems by mastering several games. Increased computational power and the advances in approximation with neural networks (NN) paved the path to RL’s successful applications. Even though RL can tackle more complex problems nowadays, it still relies on computational power and runtime. Quantum computing promises to solve these issues by its capability to encode information and the potential quadratic speedup in runtime. We compare tabular Q-learning and Q-learning using either a quantum or a classical approximation architecture on the frozen lake problem. Furthermore, the three algorithms are analyzed in terms of iterations until convergence to the optimal behavior, memory usage, and runtime. Within the paper, NNs are utilized for approximation in the classical domain, while in the quantum domain variational quantum circuits, as a quantum hybrid approximation method, have been used. Our simulations show that a quantum approximator is beneficial in terms of memory usage and provides a better sample complexity than NNs; however, it still lacks the computational speed to be competitive.
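For readers unfamiliar with the tabular baseline mentioned in this abstract, the following is a minimal sketch of tabular Q-learning on a small grid in the style of the frozen lake problem. The 4x4 map, the deterministic (non-slippery) transitions, and all hyperparameters are assumptions chosen for illustration and are not the paper's experimental setup.

```python
import random

# 4x4 map in the style of the frozen lake problem: S start, F frozen, H hole, G goal.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
N = 4
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (d_row, d_col)


def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)  # moves off the grid leave the agent in place
    col = min(max(col + d_col, 0), N - 1)
    cell = GRID[row][col]
    return row * N + col, (1.0 if cell == "G" else 0.0), cell in "HG"


def train(episodes=5000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a, v in enumerate(q[state]) if v == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state, steps = nxt, steps + 1
    return q


def greedy_rollout(q, max_steps=20):
    """Follow the greedy policy from the start state and return the visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_rollout(train())` returns the sequence of grid cells visited under the learned policy. The table stores one value per state-action pair, which is the memory cost that function approximators aim to reduce.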


Optimization of Multilayer Optical Films with Unsupervised Learning, reinforcement learning and genetic algorithm

Frontiers in Optics / Laser Science ◽

10.1364/fio.2020.jm6a.5 ◽

2020 ◽

Author(s):

Jiang Anqing ◽

Osamu Yoshie

Keyword(s):

Genetic Algorithm ◽

Reinforcement Learning ◽

Unsupervised Learning ◽

Optical Films ◽

Learning Reinforcement
