Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents

2021 ◽  
Vol 61 ◽  
pp. 102903
Author(s):  
László Erdődi ◽  
Åvald Åslaugson Sommervoll ◽  
Fabio Massimo Zennaro


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that adapts to different mining complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator emulates the interactions arising from these mining operations. Repeated runs of this simulator, together with a reward function that associates a score with each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, so that when a new task is required, a well-informed decision can be taken quickly. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.
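As a rough illustration of the dispatching loop described in this abstract, the sketch below trains a tabular Q-learning policy against a toy stand-in for the discrete event simulator; the simulator interface, state features, and reward shaping are illustrative assumptions, and the paper itself uses a deep network over a much richer mining-complex state.

```python
# Minimal sketch, assuming a toy simulator interface; a plain Q-table
# stands in for the deep Q-network used in the paper.
import random
from collections import defaultdict

class ToyDispatchSimulator:
    """Hypothetical stand-in for the discrete event simulator: a truck
    finishing a cycle must be dispatched to one of `n_shovels` shovels."""
    def __init__(self, n_shovels=3, seed=0):
        self.n_shovels = n_shovels
        self.rng = random.Random(seed)

    def reset(self):
        # toy state feature: index of the shovel with the shortest queue
        return self.rng.randrange(self.n_shovels)

    def step(self, state, action):
        # toy reward: favour sending the truck to the least-loaded shovel
        reward = 1.0 if action == state else -0.2
        next_state = self.rng.randrange(self.n_shovels)
        return next_state, reward

def train(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    sim = ToyDispatchSimulator()
    q = defaultdict(lambda: [0.0] * sim.n_shovels)
    for _ in range(episodes):
        s = sim.reset()
        for _ in range(50):  # dispatch decisions per simulated shift
            a = (random.randrange(sim.n_shovels) if random.random() < eps
                 else max(range(sim.n_shovels), key=lambda i: q[s][i]))
            s2, r = sim.step(s, a)
            # standard Q-learning update from the simulated experience
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

if __name__ == "__main__":
    q = train()
    print({s: [round(v, 2) for v in vals] for s, vals in q.items()})
```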


2009 ◽  
Vol 10 (4) ◽  
pp. 329-341 ◽  
Author(s):  
Aleksandras Vytautas Rutkauskas ◽  
Tomas Ramanauskas

In this paper we propose an artificial stock market model based on the interaction of heterogeneous agents whose forward-looking behaviour is driven by a reinforcement-learning algorithm combined with an evolutionary selection mechanism. We use the model to analyse the market's self-regulation abilities, market efficiency and the determinants of emergent properties of the financial market. Distinctive and novel features of the model include a strong emphasis on the economic content of individual decision-making, the application of the Q-learning algorithm for driving individual behaviour, and a rich market setup. In addition, a parallel version of the model is presented; it focuses on analysing current market changes and searching for newly emerged consistent patterns, and it has been used repeatedly in experiments searching for optimal decisions in various capital markets.
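For readers unfamiliar with Q-learning-driven agents, the sketch below shows one toy trading agent whose buy/hold/sell choice is driven by a Q-table; the discretised market state, price process, and reward are assumptions made purely for illustration and do not reproduce the economic detail or evolutionary selection of the authors' model.

```python
# Minimal sketch of a single Q-learning trading agent, under toy assumptions.
import random

ACTIONS = ("buy", "hold", "sell")

def discretize(ret):
    # toy market state: sign of the last price return
    return "up" if ret > 0 else "down" if ret < 0 else "flat"

def run(steps=5000, alpha=0.05, gamma=0.9, eps=0.1, seed=1):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in ("up", "down", "flat") for a in ACTIONS}
    position, state = 0, "flat"
    for _ in range(steps):
        a = (rng.choice(ACTIONS) if rng.random() < eps
             else max(ACTIONS, key=lambda x: q[(state, x)]))
        # position clamped to -1/0/+1 for simplicity
        position = max(-1, min(1, position + {"buy": 1, "hold": 0, "sell": -1}[a]))
        ret = rng.gauss(0.0, 1.0)          # toy price innovation
        reward = position * ret            # mark-to-market profit or loss
        nxt = discretize(ret)
        q[(state, a)] += alpha * (reward + gamma * max(q[(nxt, x)] for x in ACTIONS)
                                  - q[(state, a)])
        state = nxt
    return q

if __name__ == "__main__":
    print(run())
```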


2021 ◽  
pp. 1-39
Author(s):  
Noor Sajid ◽  
Philip J. Ball ◽  
Thomas Parr ◽  
Karl J. Friston

Active inference is a first-principles account of how autonomous agents operate in dynamic, nonstationary environments. This problem is also considered in reinforcement learning, but limited work exists on comparing the two approaches on the same discrete-state environments. In this letter, we provide (1) an accessible overview of the discrete-state formulation of active inference, highlighting natural behaviors in active inference that are generally engineered in reinforcement learning, and (2) an explicit discrete-state comparison between active inference and reinforcement learning on an OpenAI Gym baseline. We begin by providing a condensed overview of the active inference literature, in particular viewing the various natural behaviors of active inference agents through the lens of reinforcement learning. We show that by operating in a pure belief-based setting, active inference agents can carry out epistemic exploration, and account for uncertainty about their environment, in a Bayes-optimal fashion. Furthermore, we show that the reliance on an explicit reward signal in reinforcement learning is removed in active inference, where reward can simply be treated as another observation we have a preference over; even in the total absence of rewards, agent behaviors are learned through preference learning. We make these properties explicit by showing two scenarios in which active inference agents can infer behaviors in reward-free environments, compared to both Q-learning and Bayesian model-based reinforcement learning agents: by placing zero prior preferences over rewards, and by learning the prior preferences over the observations corresponding to reward. We conclude by noting that this formalism can be applied to more complex settings (e.g., robotic arm movement, Atari games) if appropriate generative models can be formulated. In short, we aim to demystify the behavior of active inference agents by presenting an accessible discrete state-space and time formulation, and demonstrate these behaviors in an OpenAI Gym environment alongside reinforcement learning agents.
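The belief-based, preference-driven behaviour described here rests on minimising expected free energy, which for discrete-state generative models is commonly decomposed into a risk term plus an ambiguity term. The small sketch below illustrates that decomposition; the likelihood matrix A, the predicted state belief qs, and the prior preference C are made-up values, not quantities taken from the paper.

```python
# Illustrative risk + ambiguity decomposition of expected free energy for
# one policy in a discrete-state model; all numbers are assumptions.
import numpy as np

A = np.array([[0.9, 0.1],      # P(o | s): rows = outcomes, cols = states
              [0.1, 0.9]])
qs = np.array([0.7, 0.3])      # Q(s | policy): predicted state distribution
C = np.array([0.8, 0.2])       # prior preference over outcomes P(o)

qo = A @ qs                                               # predicted outcomes Q(o)
risk = np.sum(qo * (np.log(qo) - np.log(C)))              # KL[Q(o) || P(o)]
ambiguity = -np.sum(qs * np.sum(A * np.log(A), axis=0))   # expected entropy of P(o|s)
G = risk + ambiguity                                      # expected free energy

print(round(risk, 3), round(ambiguity, 3), round(G, 3))   # lower G is preferred
```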


Author(s):  
Baihan Lin ◽  
Djallel Bouneffouf ◽  
Guillermo Cecchi

Drawing inspiration from behavioral studies of human decision making, we propose a general parametric framework for the reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward-processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems.
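A minimal sketch of a two-stream Q-learning update in the spirit of this framework follows; the way the reward is split into gain and loss streams and the bias weights w_pos and w_neg are illustrative assumptions, not the paper's exact parameterization.

```python
# Sketch of a two-stream (positive/negative reward) Q-learning agent;
# the bias parameters and their placement are assumptions.
import random
from collections import defaultdict

class TwoStreamQ:
    def __init__(self, n_actions, alpha=0.1, gamma=0.9, w_pos=1.0, w_neg=1.0):
        # w_pos / w_neg scale how strongly gains vs. losses are weighted,
        # mimicking condition-specific reward-processing biases.
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma
        self.w_pos, self.w_neg = w_pos, w_neg
        self.q_pos = defaultdict(lambda: [0.0] * n_actions)
        self.q_neg = defaultdict(lambda: [0.0] * n_actions)

    def value(self, s, a):
        # the action value combines the two streams with the bias weights
        return self.w_pos * self.q_pos[s][a] + self.w_neg * self.q_neg[s][a]

    def act(self, s, eps=0.1):
        if random.random() < eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.value(s, a))

    def update(self, s, a, r, s2):
        # split the reward into its positive and negative parts
        r_pos, r_neg = max(r, 0.0), min(r, 0.0)
        best = max(range(self.n_actions), key=lambda a2: self.value(s2, a2))
        self.q_pos[s][a] += self.alpha * (r_pos + self.gamma * self.q_pos[s2][best]
                                          - self.q_pos[s][a])
        self.q_neg[s][a] += self.alpha * (r_neg + self.gamma * self.q_neg[s2][best]
                                          - self.q_neg[s][a])
```

Setting, for example, w_neg higher than w_pos makes the agent loss-averse, while lowering it makes the agent comparatively insensitive to punishments; this is the kind of behavioural knob the parametric framework exposes.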


2020 ◽  
Author(s):  
Guinther K. da Costa ◽  
Leandro Dos S. Coelho ◽  
Roberto Z. Freire

The availability of diverse data has increased the demand for expertise in algorithmic trading strategies. Reinforcement learning has shown interesting applicability in a wide range of tasks, especially in challenging problems such as trading, where slow model convergence, inference speed, and reduced model accuracy appear as barriers. In this paper, we propose transforming time series into images and combining transfer learning based on a semi-supervised model with deep Q-learning agents, where labels are generated by an evolutionary algorithm to improve both training speed and performance measures.
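The abstract does not state which image encoding the authors use; one common way to turn a time series into a CNN-ready image is the Gramian Angular Summation Field, sketched below purely for illustration.

```python
# Gramian Angular Summation Field (GASF) encoding of a toy price series;
# the encoding choice and the sample prices are assumptions.
import numpy as np

def gasf(series):
    x = np.asarray(series, dtype=float)
    # rescale to [-1, 1] so the values can be read as cosines of angles
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # GASF image: cos(phi_i + phi_j) for every pair of time steps
    return np.cos(phi[:, None] + phi[None, :])

prices = [101.2, 100.8, 101.5, 102.3, 101.9, 102.7, 103.1, 102.5]
image = gasf(prices)           # an 8x8 array usable as a CNN input channel
print(image.shape)
```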


Author(s):  
Yasuyo Hatcho ◽  
Kiyohiko Hattori ◽  
Keiki Takadama

This paper focuses on generalization in reinforcement learning from the time-horizon viewpoint, exploring a method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time-horizon generalization for reinforcement learning, which consists of (1) a Q-table selection method and (2) a Q-table merge-timing method, enabling agents to (1) select which Q-tables can be generalized from among many Q-tables and (2) determine when the selected Q-tables should be generalized. Intensive simulations on the bargaining game, a sequential interaction game, have revealed the following implications: (1) both the Q-table selection and merge-timing methods help replicate the subject experimental results without ad hoc parameter setting; and (2) such replication is achieved by agents using the proposed methods with smaller numbers of Q-tables.
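As a toy illustration of generalizing Q-tables, the sketch below merges two tables by averaging when they are similar enough; the similarity measure, threshold, and averaging rule are assumptions, since the abstract does not give the actual selection and merge-timing criteria.

```python
# Toy Q-table selection-and-merge sketch; criteria are illustrative only.
import numpy as np

def similarity(q_a, q_b):
    # mean absolute difference over all state-action entries (lower = closer)
    return float(np.mean(np.abs(q_a - q_b)))

def maybe_merge(q_a, q_b, threshold=0.1):
    """Merge two Q-tables by averaging if they are similar enough."""
    if similarity(q_a, q_b) <= threshold:
        merged = (q_a + q_b) / 2.0
        return merged, merged          # both agents share the generalized table
    return q_a, q_b                    # otherwise keep them separate

q1 = np.array([[0.2, 0.5], [0.1, 0.9]])
q2 = np.array([[0.25, 0.45], [0.15, 0.85]])
print(maybe_merge(q1, q2))
```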


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent's performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. Their use allows the development and testing of reinforcement learning agents and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users, showing how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated and present an experiment illustrating the applicability of simulated users in evaluating agent performance when the agent is assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
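A minimal sketch of a simulated user follows; the availability and accuracy parameters stand in for the kind of defined human constraints the paper varies, but their exact form here is an assumption.

```python
# Hypothetical simulated user/trainer that sometimes advises the agent.
import random

class SimulatedUser:
    def __init__(self, availability=0.3, accuracy=0.8, n_actions=4, seed=0):
        self.availability = availability   # chance of offering advice at a step
        self.accuracy = accuracy           # chance the advice is the best action
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def advise(self, optimal_action):
        """Return an advised action, or None if the user stays silent."""
        if self.rng.random() > self.availability:
            return None
        if self.rng.random() < self.accuracy:
            return optimal_action
        return self.rng.randrange(self.n_actions)

# Example: the agent consults the simulated user before falling back on
# its own (here random, in practice epsilon-greedy) choice.
user = SimulatedUser(availability=0.5, accuracy=0.9)
advice = user.advise(optimal_action=2)
action = advice if advice is not None else random.randrange(4)
```

Varying availability and accuracy across simulated users is one way to study how different trainer types affect the learning agent without recruiting new human participants for every run.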

