Classifier Fitness Based on Accuracy

1995 ◽  
Vol 3 (2) ◽  
pp. 149-175 ◽  
Author(s):  
Stewart W. Wilson

In many classifier systems, the classifier strength parameter serves as a predictor of future payoff and as the classifier's fitness for the genetic algorithm. We investigate a classifier system, XCS, in which each classifier maintains a prediction of expected payoff, but the classifier's fitness is given by a measure of the prediction's accuracy. The system executes the genetic algorithm in niches defined by the match sets, instead of panmictically. These aspects of XCS result in its population tending to form a complete and accurate mapping X × A → P from inputs and actions to payoff predictions. Further, XCS tends to evolve classifiers that are maximally general, subject to an accuracy criterion. Besides introducing a new direction for classifier system research, these properties of XCS make it suitable for a wide range of reinforcement learning situations where generalization over states is desirable.
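
Below is a minimal sketch of the accuracy-based fitness idea described in this abstract. The parameter names (beta, eps0, alpha, nu) follow common XCS notation, but the exact update schedule is an illustrative assumption rather than the paper's specification.

```python
# Sketch of accuracy-based fitness for a set of matching classifiers.
from dataclasses import dataclass

@dataclass
class Classifier:
    prediction: float = 10.0   # estimate of expected payoff
    error: float = 0.0         # running estimate of prediction error
    fitness: float = 0.1       # accuracy-based fitness used by the GA

def update_action_set(action_set, payoff, beta=0.2, eps0=1.0, alpha=0.1, nu=5.0):
    """Update predictions and errors, then derive fitness from relative accuracy."""
    accuracies = []
    for cl in action_set:
        cl.error += beta * (abs(payoff - cl.prediction) - cl.error)
        cl.prediction += beta * (payoff - cl.prediction)
        # Accuracy is 1 inside the error tolerance eps0 and decays outside it.
        kappa = 1.0 if cl.error < eps0 else alpha * (cl.error / eps0) ** (-nu)
        accuracies.append(kappa)
    total = sum(accuracies)
    for cl, kappa in zip(action_set, accuracies):
        # Fitness tracks accuracy *relative* to the other classifiers in the set.
        cl.fitness += beta * (kappa / total - cl.fitness)
    return action_set
```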

Author(s):  
Atsushi Wada ◽  
Keiki Takadama

Learning Classifier Systems (LCSs) are rule-based adaptive systems that have both Reinforcement Learning (RL) and rule-discovery mechanisms for effective and practical on-line learning. With the aim of establishing a common theoretical basis between LCSs and RL algorithms to share each field's findings, a detailed analysis was performed to compare the learning processes of these two approaches. Based on our previous work on deriving an equivalence between the Zeroth-level Classifier System (ZCS) and Q-learning with Function Approximation (FA), this paper extends the analysis to the influence of actually applying the conditions for this equivalence. Comparative experiments have revealed interesting implications: (1) ZCS's original parameter, the deduction rate, plays a role in stabilizing the action selection, but (2) from the Reinforcement Learning perspective, such a process inhibits the ability to accurately estimate values for the entire state-action space, thus limiting the performance of ZCS in problems requiring accurate value estimation.
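
For reference, here is a minimal sketch of the Q-learning-with-function-approximation update that the equivalence analysis relates ZCS to; the feature vectors and parameter values are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def q_learning_fa_update(w, phi_s_a, reward, phi_next_best, alpha=0.1, gamma=0.9):
    """One Q-learning step with linear function approximation:
    Q(s, a) = w . phi(s, a), updated toward r + gamma * max_a' Q(s', a')."""
    td_error = reward + gamma * np.dot(w, phi_next_best) - np.dot(w, phi_s_a)
    return w + alpha * td_error * phi_s_a

# Usage with made-up binary features (analogous to classifier match indicators):
w = np.zeros(4)
phi = np.array([1.0, 0.0, 1.0, 0.0])       # features of the current (state, action)
phi_next = np.array([0.0, 1.0, 1.0, 0.0])  # features of the greedy next (state, action)
w = q_learning_fa_update(w, phi, reward=1.0, phi_next_best=phi_next)
```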


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 471
Author(s):  
Jai Hoon Park ◽  
Kang Hoon Lee

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space that involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Compared with evolving both structure and behavior simultaneously, the size of the design space is reduced significantly by evolving only the robotic structure and performing behavioral optimization with a separate training algorithm. Mutual dependence between evolution and learning is achieved by regarding the mean cumulative reward of a candidate structure in reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in the course of experiments with an actual modular robotics kit.
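
A schematic sketch of the evolution-plus-learning loop described above, in which the mean cumulative reward from inner RL training serves as the GA fitness; the genome encoding, mutation operator, and the train_policy_and_get_mean_reward stub are placeholders, not the authors' implementation.

```python
import random

def train_policy_and_get_mean_reward(structure, episodes=100):
    """Placeholder for the inner RL loop: train a controller for this candidate
    structure and return its mean cumulative reward over training episodes."""
    rng = random.Random(hash(tuple(structure)))  # deterministic stand-in for real training
    return rng.uniform(0.0, 1.0)

def mutate(structure, rate=0.1):
    """Randomly replace module genes with small probability."""
    return [g if random.random() > rate else random.randint(0, 3) for g in structure]

def evolve_structures(pop_size=20, genome_len=8, generations=10):
    population = [[random.randint(0, 3) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness of a structure = mean cumulative reward after inner RL training.
        scored = sorted(population, key=train_policy_and_get_mean_reward, reverse=True)
        parents = scored[: pop_size // 2]
        population = parents + [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
    return max(population, key=train_policy_and_get_mean_reward)

best_structure = evolve_structures()
```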


2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Karim El-Laithy ◽  
Martin Bogdan

An integration of Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework lets the Hebbian rule update the hidden synaptic model parameters that regulate the synaptic response, rather than the synaptic weights. This is done using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are computed from the distance between the network's output spike train and a reference target train. Results show that the network is able to capture the required dynamics and that the proposed framework can indeed be viewed as an integrated version of Hebbian and RL learning. The framework is tractable, less computationally expensive, applicable to a wide class of synaptic models, and not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to benefit from biologically plausible synaptic models in a wide range of intuitive signal processing tasks.
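
A toy sketch of gating a Hebbian-style update of a hidden synaptic parameter by the value and sign of the trial-to-trial reward difference; the parameter names and exact update form are illustrative assumptions, not the authors' rule.

```python
def update_synapse_param(param, pre_activity, post_activity,
                         reward, prev_reward, eta=0.01):
    """Hebbian-style correlation term, modulated by the temporal difference
    in reward between consecutive trials (both its value and its sign)."""
    reward_delta = reward - prev_reward            # TD of the reward signal
    hebbian_term = pre_activity * post_activity    # correlation of pre/post activity
    # The sign of reward_delta sets the direction of the change and its magnitude
    # scales the step; param stands for a hidden dynamic-synapse parameter
    # (e.g., a facilitation/depression constant), not a synaptic weight.
    return param + eta * reward_delta * hebbian_term

# Illustrative usage across two consecutive trials:
p = 0.5
p = update_synapse_param(p, pre_activity=0.8, post_activity=0.6,
                         reward=0.7, prev_reward=0.4)
```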


2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch, i.e., when the learner's goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing the sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
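
A minimal sketch of the prioritization idea: sample goals where the disagreement between the expert demonstration and the current policy is largest, so demonstration queries concentrate on hard-to-learn regions. The disagreement measure and data layout are assumptions for illustration.

```python
import numpy as np

def disagreement(policy_action, expert_action):
    """Assumed disagreement measure: L2 distance between the two actions."""
    return float(np.linalg.norm(np.asarray(policy_action) - np.asarray(expert_action)))

def sample_goal_by_disagreement(goals, policy_actions, expert_actions, temperature=1.0):
    """Sample a goal with probability increasing in expert-policy disagreement."""
    scores = np.array([disagreement(p, e) for p, e in zip(policy_actions, expert_actions)])
    probs = np.exp(scores / temperature)
    probs /= probs.sum()                       # softmax over candidate goals
    return goals[np.random.choice(len(goals), p=probs)]

# Illustrative usage with three candidate goals:
goals = ["goal_a", "goal_b", "goal_c"]
policy_actions = [[0.1, 0.0], [0.5, 0.5], [0.9, 0.2]]
expert_actions = [[0.1, 0.1], [0.9, 0.1], [0.0, 0.0]]
picked = sample_goal_by_disagreement(goals, policy_actions, expert_actions)
```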


Author(s):  
Sandip K Lahiri ◽  
Kartik Chandra Ghanta

Four distinct regimes (namely sliding bed, saltation, heterogeneous suspension, and homogeneous suspension) are found to exist in slurry flow through pipelines, depending on the average flow velocity. In the literature, only a few correlations have been proposed for identifying these regimes in slurry pipelines. Regime identification is important for slurry pipeline design because it is a prerequisite for applying the appropriate pressure drop correlation in each regime. However, the available correlations fail to predict the regime over a wide range of conditions. Based on a databank of around 800 measurements collected from the open literature, a method has been proposed to identify the regime using artificial neural network (ANN) modeling. The method incorporates a hybrid artificial neural network and genetic algorithm technique (ANN-GA) for efficient tuning of ANN meta-parameters. Statistical analysis showed that the proposed method has an average misclassification error of 0.03%. A comparison with selected correlations from the literature showed that the developed ANN-GA method noticeably improves regime prediction over a wide range of operating conditions, physical properties, and pipe diameters.
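
A simplified sketch of using a GA to tune ANN meta-parameters (here, hidden-layer size and learning rate) for a regime classifier; the search ranges and the train_and_score stub are assumptions, not the paper's setup.

```python
import random

def train_and_score(hidden_units, learning_rate):
    """Placeholder: train an ANN regime classifier with these meta-parameters and
    return validation accuracy; a real run would fit on the ~800-point databank."""
    rng = random.Random(hash((hidden_units, round(learning_rate, 6))))
    return rng.uniform(0.8, 1.0)

def ga_tune_ann(pop_size=12, generations=15):
    # Each genome is (hidden_units, learning_rate); the ranges are assumptions.
    pop = [(random.randint(4, 64), random.uniform(1e-4, 1e-1)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: train_and_score(*g), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            child = (random.choice([a[0], b[0]]),                              # crossover
                     random.choice([a[1], b[1]]) * random.uniform(0.8, 1.2))   # mutation
            children.append(child)
        pop = parents + children
    return pop[0]

best_hidden, best_lr = ga_tune_ann()
```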


2011 ◽  
Vol 133 (4) ◽  
Author(s):  
Raed I. Bourisli ◽  
Adnan A. AlAnzi

This work aims at developing a closed-form correlation between key building design variables and a building's energy use. The results can be utilized during the initial design stages to assess different building shapes and designs according to their expected energy use. Prototypical, 20-floor office buildings were used. The relative compactness, footprint area, projection factor, and window-to-wall ratio were varied, and the resulting buildings' performances were simulated. In total, 729 different office buildings were developed and simulated to provide the training cases for optimizing the correlation's coefficients. Simulations were performed with the VisualDOE™ software using a Typical Meteorological Year data file for Kuwait City, Kuwait. A real-coded genetic algorithm (GA) was used to optimize the coefficients of a proposed function that relates the energy use of a building to its four key parameters. The figure of merit was the difference in the ratio of the annual energy use of a building normalized by that of a reference building; the objective was to minimize the difference between the simulated results and the predictions of the four-variable function. Results show that the real-coded GA was able to produce a function that estimates the thermal performance of a proposed design with an accuracy of around 96%, based on the buildings tested. The goodness of fit, roughly represented by R², ranged from 0.950 to 0.994. Among the design parameters, the footprint area was found to have the smallest effect. It was also found that the accuracy of the function suffers the most when high window-to-wall ratios are combined with low projection factors; in such cases, the energy use exhibits a potential optimum in compactness. The proposed function (and methodology) will be a valuable tool for designers to inexpensively explore a wide range of alternatives and assess them in terms of their energy-use efficiency. It will also be of great use to municipality officials and building code authors.
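
A sketch of the fitting task described above: a real-coded GA searches the coefficients of a function of the four design variables so that it reproduces the simulated normalized energy-use ratios. The functional form, GA operators, and sample cases here are assumptions, not the paper's correlation.

```python
import random

def predicted_ratio(coeffs, rc, area, pf, wwr):
    """Assumed linear form of the four-variable correlation (illustrative only)."""
    c0, c1, c2, c3, c4 = coeffs
    return c0 + c1 * rc + c2 * area + c3 * pf + c4 * wwr

def fitness(coeffs, cases):
    """Sum of squared differences between simulated and predicted energy ratios."""
    return sum((sim - predicted_ratio(coeffs, *x)) ** 2 for x, sim in cases)

def real_coded_ga(cases, pop_size=30, generations=200, sigma=0.05):
    pop = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, cases))
        parents = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            # Blend crossover plus Gaussian mutation on real-valued genes.
            children.append([(x + y) / 2 + random.gauss(0, sigma) for x, y in zip(a, b)])
        pop = parents + children
    return pop[0]

# Illustrative cases: ((relative compactness, area, projection factor, WWR), simulated ratio)
cases = [((0.9, 1.0, 0.3, 0.4), 1.02), ((0.8, 1.2, 0.5, 0.6), 1.10)]
best_coeffs = real_coded_ga(cases)
```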


2012 ◽  
Vol 2012 ◽  
pp. 1-12 ◽  
Author(s):  
An Liu ◽  
Erwie Zahara ◽  
Ming-Ta Yang

Ordinary differential equations usefully describe the behavior of a wide range of dynamic physical systems. The particle swarm optimization (PSO) method has been considered an effective tool for solving engineering optimization problems involving ordinary differential equations. This paper proposes a modified hybrid Nelder-Mead simplex search and particle swarm optimization (M-NM-PSO) method for solving parameter estimation problems. The M-NM-PSO method improves on the efficiency of the PSO method and the conventional NM-PSO method through faster convergence and better objective function values. Studies are made for three well-known cases, and the solutions of the M-NM-PSO method are compared with those obtained by other methods published in the literature. The results demonstrate that the proposed M-NM-PSO method yields better estimation results than the genetic algorithm, the modified genetic algorithm (real-coded GA, RCGA), the conventional particle swarm optimization (PSO) method, and the conventional NM-PSO method.
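
A compact sketch of the hybrid idea: a standard PSO loop that periodically refines the global best with a Nelder-Mead simplex search. The toy objective (least-squares fit of a first-order ODE's parameters) and the hybridization schedule are assumptions, not the published M-NM-PSO algorithm.

```python
import numpy as np
from scipy.optimize import minimize

# Toy parameter-estimation objective: fit (y0, k) in y' = -k*y, whose solution is
# y(t) = y0 * exp(-k*t), to synthetic "observations" generated with y0 = 2.0, k = 0.7.
t_obs = np.linspace(0.0, 5.0, 20)
y_obs = 2.0 * np.exp(-0.7 * t_obs)

def sse(params):
    y0, k = params
    return float(np.sum((y0 * np.exp(-k * t_obs) - y_obs) ** 2))

def hybrid_pso_nm(n_particles=20, iters=50, bounds=(0.0, 3.0)):
    rng = np.random.default_rng(0)
    x = rng.uniform(*bounds, size=(n_particles, 2))   # particle positions
    v = np.zeros_like(x)                              # particle velocities
    pbest, pbest_f = x.copy(), np.array([sse(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for it in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, *bounds)
        f = np.array([sse(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
        if it % 10 == 0:  # periodic Nelder-Mead refinement of the global best
            res = minimize(sse, gbest, method="Nelder-Mead")
            if res.fun < pbest_f.min():
                i = int(np.argmin(pbest_f))
                pbest[i], pbest_f[i] = res.x, res.fun
                gbest = res.x.copy()
    return gbest

y0_est, k_est = hybrid_pso_nm()
```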


2021 ◽  
Vol 12 (6) ◽  
pp. 1-23
Author(s):  
Shuo Tao ◽  
Jingang Jiang ◽  
Defu Lian ◽  
Kai Zheng ◽  
Enhong Chen

Mobility prediction plays an important role in a wide range of location-based applications and services. However, there are three problems in the existing literature: (1) explicit high-order interactions of spatio-temporal features are not systematically modeled; (2) most existing algorithms place attention mechanisms on top of a recurrent network, so they cannot allow for full parallelism and are inferior to self-attention for capturing long-range dependence; (3) most of the literature does not make good use of long-term historical information and does not effectively model the long-term periodicity of users. To this end, we propose MoveNet and RLMoveNet. MoveNet is a self-attention-based sequential model that predicts each user's next destination based on her most recent visits and historical trajectory. MoveNet first introduces a cross-based learning framework for modeling feature interactions. With self-attention on both the most recent visits and the historical trajectory, MoveNet can use an attention mechanism to capture the user's long-term regularity more efficiently. Based on MoveNet, to model long-term periodicity more effectively, we add a reinforcement learning layer and name the result RLMoveNet. RLMoveNet regards human mobility prediction as a reinforcement learning problem, using the reinforcement learning layer as a regularization term that drives the model to pay attention to periodic behaviors, which makes the algorithm more effective. We evaluate both models with three real-world mobility datasets. MoveNet outperforms the state-of-the-art mobility predictor by around 10% in terms of accuracy, and simultaneously achieves faster convergence and over 4x training speedup. Moreover, RLMoveNet achieves higher prediction accuracy than MoveNet, which shows that modeling periodicity explicitly from the perspective of reinforcement learning is more effective.
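
A minimal sketch of the self-attention step over a user's recent visit sequence; the embedding sizes, weight initialization, and scoring head are illustrative assumptions, not MoveNet's architecture.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of visit embeddings.
    x: (seq_len, d_model); each row embeds one visited location and time slot."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over sequence positions
    return weights @ v                               # contextualized visit embeddings

# Illustrative usage: 5 recent visits, 8-dimensional embeddings.
rng = np.random.default_rng(0)
d = 8
visits = rng.normal(size=(5, d))
wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
context = self_attention(visits, wq, wk, wv)
# A next-destination score could then be context[-1] @ location_embeddings.T (assumption).
```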

