Extended Q-Learning: Reinforcement Learning Using Self-Organized State Space

This paper presents a new truck dispatching policy approach that is adaptive given different mining complex configurations in order to deliver supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator model emulates the interaction arising from these mining operations. The continuous repetition of this simulator and a reward function, associating a score value to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task is required, a well-informed decision can be quickly taken. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.

Download Full-text

Multi-Agent Reinforcement Learning Based on K-Means Clustering in Multi-Robot Cooperative Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.216.75 ◽

2011 ◽

Vol 216 ◽

pp. 75-80 ◽

Cited By ~ 1

Author(s):

Chang An Liu ◽

Fei Liu ◽

Chun Yang Liu ◽

Hua Wu

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Experimental Results ◽

Learning Ability ◽

Learning Method ◽

Q Learning ◽

State Space Explosion ◽

Multi Agent ◽

Robot Cooperation ◽

Multi Robot

To solve the curse of dimensionality problem in multi-agent reinforcement learning, a learning method based on k-means is presented in this paper. In this method, the environmental state is represented as key state factors. The state space explosion is avoided by classifying states into different clusters using k-means. The learning rate is improved by assigning different states to existent clusters, as well as corresponding strategy. Compared to traditional Q-learning, our experimental results of the multi-robot cooperation show that our scheme improves the team learning ability efficiently. Meanwhile, the cooperation efficiency can be enhanced successfully.

Download Full-text

Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents

Journal of Information Security and Applications ◽

10.1016/j.jisa.2021.102903 ◽

2021 ◽

Vol 61 ◽

pp. 102903

Author(s):

László Erdődi ◽

Åvald Åslaugson Sommervoll ◽

Fabio Massimo Zennaro

Keyword(s):

Reinforcement Learning ◽

Sql Injection ◽

Q Learning ◽

Learning Agents ◽

Learning Reinforcement

Download Full-text

A Self-Organized Fuzzy-Neuro Reinforcement Learning System for Continuous State Space for Autonomous Robots

2008 International Conference on Computational Intelligence for Modelling Control & Automation ◽

10.1109/cimca.2008.25 ◽

2008 ◽

Cited By ~ 7

Author(s):

Masanao Obayashi ◽

Takashi Kuremoto ◽

Kunikazu Kobayashi

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Autonomous Robots ◽

Learning System ◽

Self Organized ◽

Continuous State Space ◽

Continuous State

Download Full-text

Optimizing Hadoop parameter for speedup using Q-Learning Reinforcement Learning

10.1109/icecct52121.2021.9616965 ◽

2021 ◽

Author(s):

Nandita Yambem ◽

A. N. Nandakumar

Keyword(s):

Reinforcement Learning ◽

Q Learning ◽

Learning Reinforcement

Download Full-text

The Knowledge Sharing Based Reinforcement Learning Algorithm for Collective Behaviors of Mobile Robots

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.588-589.1515 ◽

2012 ◽

Vol 588-589 ◽

pp. 1515-1518

Author(s):

Yong Song ◽

Bing Liu ◽

Yi Bin Li

Keyword(s):

Reinforcement Learning ◽

Mobile Robots ◽

Knowledge Sharing ◽

State Space ◽

Learning Algorithm ◽

Collective Behaviors ◽

Q Learning ◽

Exponential Increase ◽

Multi Robot ◽

Reinforcement Learning Algorithm

Reinforcement learning algorithm for multi-robot may will become very slow when the number of robots is increasing resulting in an exponential increase of state space. A sequential Q-learning base on knowledge sharing is presented. The rule repository of robots behaviors is firstly initialized in the process of reinforcement learning. Mobile robots obtain present environmental state by sensors. Then the state will be matched to determine if the relevant behavior rule has been stored in database. If the rule is present, an action will be chosen in accordance with the knowledge and the rules, and the matching weight will be refined. Otherwise the new rule will be joined in the database. The robots learn according to a given sequence and share the behavior database. We examine the algorithm by multi-robot following-surrounding behavior, and find that the improved algorithm can effectively accelerate the convergence speed.

Download Full-text

Split Q Learning: Reinforcement Learning with Two-Stream Rewards

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/913 ◽

2019 ◽

Author(s):

Baihan Lin ◽

Djallel Bouneffouf ◽

Guillermo Cecchi

Keyword(s):

Reinforcement Learning ◽

Wide Spectrum ◽

User Preferences ◽

Reward Processing ◽

Q Learning ◽

Agent Interactions ◽

Behavioral Studies ◽

Human Decision ◽

Multi Agent ◽

Learning Reinforcement

Drawing an inspiration from behavioral studies of human decision making, we propose here a general parametric framework for a reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems.

Download Full-text

GA-Based Q-CMAC Applied to Airship Evasion Problem

Journal of Robotics and Mechatronics ◽

10.20965/jrm.1998.p0431 ◽

1998 ◽

Vol 10 (5) ◽

pp. 431-438 ◽

Cited By ~ 1

Author(s):

Yuka Akisato ◽

◽

Keiji Suzuki ◽

Azuma Ohuchi

Keyword(s):

Adaptive Control ◽

Reinforcement Learning ◽

State Space ◽

Control Policy ◽

The State ◽

Space Layer ◽

Construction Simulation ◽

Q Learning ◽

Evolutionary State ◽

Evasion Problem

The purpose of this research is to acquire an adaptive control policy of an airship in a dynamic, continuous environment based on reinforcement learning combined with evolutionary construction. The state space for reinforcement learning becomes huge because the airship has great inertia and must sense huge amounts of information from a continuous environment to behave appropriately. To reduce and suitably segment state space, we propose combining CMAC-based Q-learning and its evolutionary state space layer construction. Simulation showed the acquisition of state space segmentation enabling airships to learn effectively.

Download Full-text

Concurrent Q-learning: Reinforcement learning for dynamic goals and environments

International Journal of Intelligent Systems ◽

10.1002/int.20105 ◽

2005 ◽

Vol 20 (10) ◽

pp. 1037-1052 ◽

Cited By ~ 7

Author(s):

Robert B. Ollington ◽

Peter W. Vamplew

Keyword(s):

Reinforcement Learning ◽

Q Learning ◽

Learning Reinforcement

Download Full-text

Personalized project recommendations: using reinforcement learning

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-019-1619-6 ◽

2019 ◽

Vol 2019 (1) ◽

Cited By ~ 1

Author(s):

Faxin Qi ◽

Xiangrong Tong ◽

Lei Yu ◽

Yingjie Wang

Keyword(s):

Reinforcement Learning ◽

User Behavior ◽

Collaborative Work ◽

Recursive Least Squares ◽

The Internet ◽

Dynamic Impact ◽

Rls Algorithm ◽

Trust Value ◽

Q Learning ◽

Actual Evaluation

AbstractWith the development of the Internet and the progress of human-centered computing (HCC), the mode of man-machine collaborative work has become more and more popular. Valuable information in the Internet, such as user behavior and social labels, is often provided by users. A recommendation based on trust is an important human-computer interaction recommendation application in a social network. However, previous studies generally assume that the trust value between users is static, unable to respond to the dynamic changes of user trust and preferences in a timely manner. In fact, after receiving the recommendation, there is a difference between actual evaluation and expected evaluation which is correlated with trust value. Based on the dynamics of trust and the changing process of trust between users, this paper proposes a trust boost method through reinforcement learning. Recursive least squares (RLS) algorithm is used to learn the dynamic impact of evaluation difference on user’s trust. In addition, a reinforcement learning method Deep Q-Learning (DQN) is studied to simulate the process of learning user’s preferences and boosting trust value. Experiments indicate that our method applied to recommendation systems could respond to the changes quickly on user’s preferences. Compared with other methods, our method has better accuracy on recommendation.

Download Full-text