An Analysis of Rule Deletion Scheme in XCS on Reinforcement Learning Problem

Author(s):  
Masaya Nakata ◽  
Tomoki Hamagami

The XCS classifier system is an evolutionary rule-based learning technique powered by a Q-learning-like learning mechanism. It employs a global deletion scheme that selects rules for deletion from the entire population, i.e., from the rules covering all state-action pairs. However, the optimality of this scheme remains unclear owing to the lack of intensive analysis. Here we introduce two alternative deletion schemes: 1) local deletion, which is applied to the subset of rules covering each state (a match set), and 2) stronger local deletion, which is applied to the more specific subset covering each state-action pair (an action set). The aim of this paper is to reveal how these three deletion schemes affect the performance of XCS. Our analysis shows that the local deletion schemes promote the elimination of inaccurate rules compared with the global deletion scheme, although the stronger local deletion scheme occasionally deletes a good rule. We further show that the two local deletion schemes greatly improve the performance of XCS on a set of noisy maze problems. Although the localization strength of the proposed deletion schemes requires consideration, they can be more adequate for XCS than the original global deletion scheme.
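
The following is a minimal Python sketch, not the authors' implementation, contrasting the three deletion scopes described above; the rule representation and the deletion-vote heuristic are simplified placeholders.

```python
# Minimal sketch (not the authors' implementation) contrasting global,
# match-set (local), and action-set (stronger local) deletion scopes.
import random

class Rule:
    def __init__(self, condition, action, fitness=1.0, numerosity=1):
        self.condition = condition    # e.g. "01#1" with '#' as wildcard
        self.action = action
        self.fitness = fitness
        self.numerosity = numerosity

    def matches(self, state):
        return all(c in ('#', s) for c, s in zip(self.condition, state))

def deletion_vote(rule):
    # Real XCS uses action-set size estimates and fitness; this is a stand-in.
    return rule.numerosity / max(rule.fitness, 1e-9)

def select_for_deletion(population, state=None, action=None):
    """Global: candidates = whole population.
    Local: candidates = match set for `state`.
    Stronger local: candidates = action set for (`state`, `action`)."""
    candidates = population
    if state is not None:
        candidates = [r for r in candidates if r.matches(state)]
    if action is not None:
        candidates = [r for r in candidates if r.action == action]
    votes = [deletion_vote(r) for r in candidates]
    return random.choices(candidates, weights=votes, k=1)[0]
```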

Author(s):  
Atsushi Wada ◽  
Keiki Takadama

Learning Classifier Systems (LCSs) are rule-based adaptive systems that have both Reinforcement Learning (RL) and rule-discovery mechanisms for effective and practical on-line learning. With the aim of establishing a common theoretical basis between LCSs and RL algorithms to share each field's findings, a detailed analysis was performed to compare the learning processes of these two approaches. Based on our previous work on deriving an equivalence between the Zeroth-level Classifier System (ZCS) and Q-learning with Function Approximation (FA), this paper extends the analysis to the influence of actually applying the conditions for this equivalence. Comparative experiments have revealed interesting implications: (1) ZCS's original parameter, the deduction rate, plays a role in stabilizing the action selection, but (2) from the Reinforcement Learning perspective, such a process inhibits the ability to accurately estimate values for the entire state-action space, thus limiting the performance of ZCS in problems requiring accurate value estimation.
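
The equivalence discussed above relates ZCS to Q-learning with linear function approximation. Below is a minimal, illustrative Python sketch of the RL side of that correspondence; the feature construction and parameter values are assumptions, not the paper's implementation.

```python
# Minimal sketch of Q-learning with linear function approximation. In the
# LCS view, each classifier acts roughly as one binary feature of (s, a).
import numpy as np

def q_value(theta, phi):
    """Q(s, a) as a linear combination of binary features phi(s, a)."""
    return float(theta @ phi)

def td_update(theta, phi, reward, phi_next_best, alpha=0.1, gamma=0.9):
    """One Q-learning step on the weight vector theta."""
    target = reward + gamma * q_value(theta, phi_next_best)
    delta = target - q_value(theta, phi)
    return theta + alpha * delta * phi
```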


1999 ◽  
Vol 7 (2) ◽  
pp. 125-149 ◽  
Author(s):  
Pier Luca Lanzi

The XCS classifier system represents a major advance in learning classifier systems research because (1) it has a sound and accurate generalization mechanism, and (2) its learning mechanism is based on Q-learning, a recognized learning technique. In taking XCS beyond its very first environments and parameter settings, we show that, in certain difficult sequential (“animat”) environments, performance is poor. We suggest that this occurs because in the chosen environments, some conditions for proper functioning of the generalization mechanism do not hold, resulting in overly general classifiers that cause reduced performance. We hypothesize that one such condition is a lack of sufficiently wide exploration of the environment during learning. We show that if XCS is forced to explore its environment more completely, performance improves dramatically. We propose a technique, based on Sutton's Dyna concept, through which wider exploration would occur naturally. Separately, we demonstrate that the compactness of the representation evolved by XCS is limited by the number of instances of each generalization actually present in the environment. The paper shows that XCS's generalization mechanism is effective, but that the conditions under which it works must be clearly understood.


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial Intelligence makes it possible to create engines that explore and learn environments and thereby derive policies that control them in real time without human intervention. Through its Reinforcement Learning component, using frameworks such as temporal-difference learning, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be modeled as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud Load Balancing, so that load can be dispatched dynamically to a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
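
As an illustration of the kind of engine described above, here is a minimal tabular Q-learning loop for dispatching load to one of several servers; the state encoding, reward, and parameters are hypothetical placeholders, not the authors' design.

```python
# Illustrative sketch of tabular Q-learning for cloud load dispatch.
import random
from collections import defaultdict

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def choose_server(state, n_servers, epsilon=0.1):
    # Epsilon-greedy selection over candidate servers.
    if random.random() < epsilon:
        return random.randrange(n_servers)
    return max(range(n_servers), key=lambda a: Q[(state, a)])

def q_learning_step(state, action, reward, next_state, n_servers,
                    alpha=0.1, gamma=0.9):
    best_next = max(Q[(next_state, a)] for a in range(n_servers))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```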


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1685 ◽  
Author(s):  
Chayoung Kim

Owing to the complexity involved in training an agent in a real-time environment, e.g., one using the Internet of Things (IoT), reinforcement learning (RL) with a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted for online learning without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents can be trained competently in real-world applications. The proposed model considers combinations of basic RL algorithms, used online and offline, based on this empirical bias–variance balance. We therefore exploit the balance between the offline Monte Carlo (MC) technique and online temporal-difference (TD) learning, with an on-policy method (state-action-reward-state-action, SARSA) and an off-policy method (Q-learning), within a DRL setting. The proposed balance of MC (offline) and TD (online) updates, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrate that, for a simple control task, balancing online and offline updates without distinguishing on- and off-policy methods already yields satisfactory results. In complex tasks, however, the results clearly indicate the effectiveness of the combined method in improving the convergence speed and performance of a deep Q-network.
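
To make the contrast concrete, here is a minimal Python sketch, not the paper's code, of the three learning targets involved: the offline Monte Carlo return and the online TD targets of SARSA (on-policy) and Q-learning (off-policy).

```python
# Illustrative comparison of MC and TD learning targets for one transition.
def mc_target(rewards, gamma=0.99):
    """Monte Carlo: discounted return computed offline over a full episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def sarsa_target(reward, q_next_taken, gamma=0.99):
    """On-policy TD target: bootstraps on the action actually taken next."""
    return reward + gamma * q_next_taken

def q_learning_target(reward, q_next_all, gamma=0.99):
    """Off-policy TD target: bootstraps on the greedy next action."""
    return reward + gamma * max(q_next_all)
```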


Electronics ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 231 ◽  
Author(s):  
Panagiotis Kofinas ◽  
Anastasios I. Dounis

This paper proposes a hybrid Ziegler-Nichols (Z-N) fuzzy reinforcement learning MAS (Multi-Agent System) approach for online tuning of a Proportional-Integral-Derivative (PID) controller in order to control the flow rate of a desalination unit. The PID gains are initialized by the Z-N method and then adapted online through a fuzzy Q-learning MAS. Fuzzy Q-learning is introduced in each agent in order to cope with the continuous state-action space. The global state of the MAS is defined by the value of the error and the derivative of the error. The MAS consists of three agents, and the output signal of each agent defines the percentage change of one gain. The increment or reduction of each gain can range from 0% to 100% of its initial value. The simulation results highlight the performance of the suggested hybrid control strategy through comparison with a conventional PID controller tuned by the Z-N method.
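
A minimal Python sketch of the multi-agent structure described above follows; it is not the authors' implementation, and the starting gain values are hypothetical.

```python
# Illustrative sketch: one agent per PID gain, each scaling its gain by a
# learned percentage of the Ziegler-Nichols initial value.
class GainAgent:
    def __init__(self, initial_gain):
        self.initial_gain = initial_gain   # Z-N starting point
        self.gain = initial_gain

    def apply_action(self, percent_change):
        """percent_change in [-1.0, 1.0]: 0% to 100% of the initial value."""
        self.gain = self.initial_gain * (1.0 + percent_change)
        return self.gain

# Hypothetical Z-N starting gains for Kp, Ki, Kd.
agents = {name: GainAgent(g) for name, g in
          [("Kp", 1.2), ("Ki", 0.6), ("Kd", 0.075)]}

def pid_output(error, integral, derivative):
    # Control signal assembled from the three agents' current gains.
    return (agents["Kp"].gain * error
            + agents["Ki"].gain * integral
            + agents["Kd"].gain * derivative)
```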


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 2782-2798 ◽  
Author(s):  
Lucileide M. D. Da Silva ◽  
Matheus F. Torquato ◽  
Marcelo A. C. Fernandes

2020 ◽  
Vol 17 (2) ◽  
pp. 647-664
Author(s):  
Yangyang Ge ◽  
Fei Zhu ◽  
Wei Huang ◽  
Peiyao Zhao ◽  
Quan Liu

Multi-Agent systems have broad applications in the real world, yet their safety is barely considered. Reinforcement learning is one of the most important methods for solving Multi-Agent problems. At present, progress has been made in applying Multi-Agent reinforcement learning to robot systems, human-machine matches, automation, and so on. However, in these areas an agent may fall into unsafe states in which it finds it difficult to bypass obstacles, to receive information from other agents, and so on. Ensuring the safety of a Multi-Agent system is of great importance in such areas, where an agent may fall into dangerous, irreversible states and cause great damage. To solve this safety problem, we introduce a Multi-Agent Cooperation Q-Learning Algorithm based on a Constrained Markov Game. In this method, safety constraints are added to the set of actions, and each agent, when interacting with the environment to search for optimal values, is restricted by the safety rules, so as to obtain an optimal policy that satisfies the safety requirements. Since the traditional Multi-Agent reinforcement learning algorithm is no longer suitable for the proposed model, a new solution is introduced for calculating the global optimal state-action function that satisfies the safety constraints. We take advantage of the Lagrange multiplier method to determine the optimal action that can be performed in the current state, on the premise of linearizing the constraint functions and under the condition that both the state-action function and the constraint function are differentiable. This not only improves the efficiency and accuracy of the algorithm but also guarantees a global optimal solution. Experiments verify the effectiveness of the algorithm.
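
The sketch below illustrates the general idea of Lagrange-multiplier-based constrained action selection in Python; it is not the paper's formulation, and the Q-table, cost function, and threshold are placeholders.

```python
# Illustrative constrained greedy action selection: maximize Q while
# penalizing constraint violation through a Lagrange multiplier.
def constrained_action(Q, cost, state, actions, lam, cost_limit):
    """Pick the action maximizing the Lagrangian Q(s,a) - lam*(c(s,a) - d)."""
    def lagrangian(a):
        return Q[(state, a)] - lam * (cost[(state, a)] - cost_limit)
    return max(actions, key=lagrangian)

def update_multiplier(lam, observed_cost, cost_limit, lr=0.01):
    """Dual ascent on the multiplier: grow lam when the constraint is violated."""
    return max(0.0, lam + lr * (observed_cost - cost_limit))
```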


Telecom ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 255-270
Author(s):  
Saeid Pourroostaei Ardakani ◽  
Ali Cheshmehzangi

UAV path planning for remote sensing aims to find the best-fitted routes to complete a data collection mission. UAVs plan the routes and move through them to remotely collect environmental data from particular target zones by using sensory devices such as cameras. Route planning may utilize machine learning techniques to autonomously find or select cost-effective and best-fitted routes and achieve optimized results, including minimized data collection delay, reduced UAV power consumption, decreased flight distance, and a maximized number of collected data samples. This paper utilizes a reinforcement learning technique (location- and energy-aware Q-learning) to plan UAV routes for remote sensing in smart farms. Through this, the UAV avoids moving heuristically or blindly throughout a farm; instead, it takes advantage of environment exploration and exploitation to explore the farm and find the shortest and most cost-effective paths to target locations with interesting data samples to collect. According to the simulation results, utilizing the Q-learning technique increases data collection robustness and reduces UAV resource consumption (e.g., power), traversed paths, and remote sensing latency as compared to two well-known benchmarks, IEMF and TBID, especially when the target locations are densely crowded in a farm.
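
As an illustration of what "location- and energy-aware" can mean for a Q-learning planner, here is a minimal Python sketch; the grid abstraction, reward terms, and weights are hypothetical, not the authors' implementation.

```python
# Illustrative location- and energy-aware reward for Q-learning route
# planning over a grid of farm cells.
def step_reward(distance_moved, energy_used, reached_target,
                w_dist=1.0, w_energy=0.5, target_bonus=10.0):
    """Penalize distance and energy; reward reaching a sampling target."""
    reward = -w_dist * distance_moved - w_energy * energy_used
    if reached_target:
        reward += target_bonus
    return reward

def greedy_next_cell(Q, cell, neighbours):
    """Choose the neighbouring cell with the highest learned Q-value."""
    return max(neighbours, key=lambda n: Q[(cell, n)])
```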


Author(s):  
Yusuke Taguchi ◽  
Hideitsu Hino ◽  
Keisuke Kameyama

There are many situations in supervised learning where the acquisition of data is very expensive and sometimes determined by a user's budget. One way to address this limitation is active learning. In this study, we focus on a fixed-budget regime and propose a novel active learning algorithm for the pool-based active learning problem. The proposed method performs active learning with a pre-trained acquisition function so that the maximum performance can be achieved when the number of data points that can be acquired is fixed. To implement this, the proposed method uses reinforcement learning based on deep neural networks as a pre-trained acquisition function tailored to the fixed-budget situation. By using the pre-trained deep Q-learning-based acquisition function, we realize an active learner that selects a sample for annotation from the pool of unlabeled samples while taking the fixed-budget situation into account. The proposed method is experimentally shown to be comparable with or superior to existing active learning methods, suggesting the effectiveness of the proposed approach for fixed-budget active learning.
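
Below is a minimal Python sketch of how a pre-trained Q-network might be used as an acquisition function over an unlabeled pool; `q_network`, the state encoding, and the budget feature are hypothetical placeholders, not the paper's architecture.

```python
# Illustrative use of a Q-network as an acquisition function: score every
# unlabeled pool sample in the current state and query the highest scorer.
import numpy as np

def select_query(q_network, learner_state, pool_features, budget_left):
    """Return the index of the pool sample with the highest predicted Q-value."""
    scores = []
    for x in pool_features:
        # The acquisition state combines the learner's state, the candidate
        # sample's features, and the remaining annotation budget.
        state_action = np.concatenate([learner_state, x, [budget_left]])
        scores.append(q_network.predict(state_action[None, :])[0])
    return int(np.argmax(scores))
```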


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Xiali Li ◽  
Zhengyu Lv ◽  
Licheng Wu ◽  
Yue Zhao ◽  
Xiaona Xu

In this study, hybrid state-action-reward-state-action (SARSA(λ)) and Q-learning algorithms are applied to different stages of an Upper Confidence bounds applied to Trees (UCT) search for Tibetan Jiu chess. Q-learning is also used to update all the nodes on the search path when each game ends. A learning strategy is proposed that uses the SARSA(λ) and Q-learning algorithms, combined with domain knowledge, as a feedback function for the layout and battle stages. An improved deep neural network based on ResNet18 is used for self-play training. Experimental results show that hybrid online and offline reinforcement learning with a deep neural network can improve the game program's learning efficiency and understanding ability for Tibetan Jiu chess.
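
The following minimal Python sketch, not the authors' code, illustrates the two kinds of updates mentioned above: a SARSA(λ) step with eligibility traces during play, and a Q-learning-style backup of every node on the search path at game end; all parameters are placeholder assumptions.

```python
# Illustrative hybrid of an online SARSA(lambda) step and an offline
# end-of-game backup over the visited search path.
from collections import defaultdict

Q = defaultdict(float)   # state-action values
E = defaultdict(float)   # eligibility traces

def sarsa_lambda_step(s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.95, lam=0.8):
    """Online on-policy update with decaying eligibility traces."""
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    E[(s, a)] += 1.0
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam

def backup_search_path(path, final_reward, alpha=0.1, gamma=0.95):
    """Backup of all visited (state, action) pairs when the game ends."""
    g = final_reward
    for s, a in reversed(path):
        Q[(s, a)] += alpha * (g - Q[(s, a)])
        g *= gamma
```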

