Anticipatory Classifier System with Average Reward Criterion in Discretized Multi-Step Environments

2021, Vol 11 (3), pp. 1098
Author(s): Norbert Kozłowski, Olgierd Unold

Initially, Anticipatory Classifier Systems (ACS) were designed to address both single-step and multi-step decision problems. In the latter case, the objective was to maximize the total discounted reward, usually by means of Q-learning-style algorithms. Studies on other Learning Classifier Systems (LCS) revealed many real-world sequential decision problems where the preferred objective is maximizing the average of successive rewards. This paper proposes a modification of the learning component that allows the system to address such problems. The modified system is called AACS2 (Averaged ACS2) and is tested on three multi-step benchmark problems.
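
As a point of reference for the shift described in the abstract, the sketch below contrasts a discounted Q-learning update with an average-reward update in the style of R-learning. This is a hedged tabular illustration, not the AACS2 paper's actual rule; all names (q, rho, alpha, beta, gamma) are assumptions.

    from collections import defaultdict

    # Tabular sketch of the two criteria named in the abstract. This is NOT
    # the AACS2 update rule; it follows standard Q-learning and R-learning.

    def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
        """Discounted criterion: maximize the expected sum of gamma**t * r_t."""
        best_next = max(q[(s_next, a2)] for a2 in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    def r_learning_update(q, rho, s, a, r, s_next, actions, alpha=0.1, beta=0.01):
        """Average-reward criterion: maximize the long-run average of r_t.

        rho is a running estimate of the average reward per step; values are
        learned relative to it, so no discount factor is needed.
        """
        was_greedy = q[(s, a)] == max(q[(s, a2)] for a2 in actions)
        best_next = max(q[(s_next, a2)] for a2 in actions)
        q[(s, a)] += alpha * (r - rho + best_next - q[(s, a)])
        if was_greedy:  # Schwartz-style rho update on greedy steps only
            best_here = max(q[(s, a2)] for a2 in actions)
            rho += beta * (r - rho + best_next - best_here)
        return rho

    q, rho = defaultdict(float), 0.0
    rho = r_learning_update(q, rho, s=0, a=1, r=1.0, s_next=2, actions=[0, 1])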

2015, Vol 52 (2), pp. 419-440
Author(s): Rolando Cavazos-Cadena, Raúl Montes-De-Oca, Karel Sladký

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
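
In common MDP notation (a sketch using standard symbols, which may differ from the paper's own), the sample-path average reward of a policy and the average-reward optimality equation referred to above read:

    % A sketch in standard notation, not the paper's exact formulation.
    J(\pi, x) = \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t),
    \qquad X_0 = x,

    g + h(x) = \max_{a \in A(x)} \Big[ r(x, a) + \sum_{y} p(y \mid x, a)\, h(y) \Big],

where g is the optimal average reward and h is a relative value (bias) function; the stationary policy in question selects a maximizing action in the second display.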


Author(s): Atsushi Wada, Keiki Takadama

Learning Classifier Systems (LCSs) are rule-based adaptive systems that combine Reinforcement Learning (RL) and rule-discovery mechanisms for effective and practical on-line learning. With the aim of establishing a common theoretical basis between LCSs and RL algorithms so that each field's findings can be shared, a detailed analysis was performed to compare the learning processes of the two approaches. Building on our previous work deriving an equivalence between the Zeroth-level Classifier System (ZCS) and Q-learning with Function Approximation (FA), this paper extends the analysis to the effect of actually enforcing the conditions under which this equivalence holds. Comparative experiments revealed interesting implications: (1) ZCS's original parameter, the deduction rate, plays a role in stabilizing action selection; but (2) from the Reinforcement Learning perspective, this process inhibits accurate value estimation over the entire state-action space, limiting the performance of ZCS on problems that require accurate value estimation.
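
For reference, here is a minimal sketch of Q-learning with linear function approximation, the RL side of the equivalence discussed above. The one-hot feature map and problem sizes are illustrative assumptions, not the paper's setup; with one-hot features the update reduces to ordinary tabular Q-learning, while classifier-like overlapping features would make the weights behave like ZCS rule strengths.

    import numpy as np

    N_STATES, N_ACTIONS = 4, 2

    def phi(s, a):
        # One-hot features (an assumption for illustration). In ZCS terms,
        # each weight plays the role of a matching classifier's strength.
        v = np.zeros(N_STATES * N_ACTIONS)
        v[s * N_ACTIONS + a] = 1.0
        return v

    def q_value(w, s, a):
        # Q(s, a) approximated as a linear function of the features.
        return float(w @ phi(s, a))

    def q_learning_fa_step(w, s, a, r, s_next, alpha=0.1, gamma=0.9):
        # One gradient step on the TD error, applied along the active features.
        target = r + gamma * max(q_value(w, s_next, b) for b in range(N_ACTIONS))
        td_error = target - q_value(w, s, a)
        return w + alpha * td_error * phi(s, a)

    w = np.zeros(N_STATES * N_ACTIONS)
    w = q_learning_fa_step(w, s=0, a=1, r=1.0, s_next=2)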


1999, Vol 30 (7-8), pp. 7-20
Author(s): M. Kurano, M. Yasuda, J.-I. Nakagami, Y. Yoshida

1996, Vol 28 (4), pp. 1123-1144
Author(s): K. D. Glazebrook

A single machine is available to process a collection of jobs J, each of which evolves stochastically under processing. Jobs incur costs at a state-dependent rate while awaiting the machine, and processing must respect a set of precedence constraints Γ. Index policies are optimal in a variety of scenarios. The indices concerned are characterised as the values of restart problems with the average reward criterion. This characterisation yields a range of efficient approaches to their computation. Index-based suboptimality bounds are derived for general processing policies. These bounds enable us to develop sensitivity analyses and to evaluate scheduling heuristics.
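
One way to sketch the restart characterisation in standard notation (an illustration following the familiar restart-in-state formulation; the paper's precise definition may differ) is to define the index of a job in state x as the optimal average reward of a two-action problem:

    % A sketch, not the paper's exact definition.
    \nu(x) = \sup_{\pi} \liminf_{n \to \infty} \frac{1}{n}\,
             \mathbb{E}^{\pi}_{x}\Big[ \sum_{t=0}^{n-1} r(X_t) \Big],

where, at each decision epoch, the policy π either continues the job from its current state or restarts it in the reference state x.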


1982, Vol 19 (2), pp. 301-309
Author(s): Zvi Rosberg

A semi-Markov decision process with a denumerable multidimensional state space is considered. At any given state, only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is assumed only to be bounded by a polynomial, and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that, under an ergodicity assumption, there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.
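
In common semi-Markov notation (a sketch; the paper's own symbols may differ), the long-run average reward criterion is the ratio of expected accumulated rewards to expected accumulated transition times:

    % A sketch in standard SMDP notation.
    g(\pi, x) = \liminf_{n \to \infty}
        \frac{\mathbb{E}^{\pi}_{x}\big[ \sum_{k=0}^{n-1} r(X_k, A_k) \big]}
             {\mathbb{E}^{\pi}_{x}\big[ \sum_{k=0}^{n-1} \tau_k \big]},

where \tau_k is the (random) length of the k-th transition period; a stationary optimal policy attains the supremum of g over all policies.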

