Anticipatory Classifier System with Average Reward Criterion in Discretized Multi-Step Environments

2021, Vol 11 (3), pp. 1098
Author(s): Norbert Kozłowski, Olgierd Unold

Initially, Anticipatory Classifier Systems (ACS) were designed to address both single-step and multi-step decision problems. In the latter case, the objective was to maximize the total discounted reward, usually by means of Q-learning-style algorithms. Studies on other Learning Classifier Systems (LCS) revealed many real-world sequential decision problems where the preferred objective is maximizing the average of successive rewards. This paper proposes a modification of the learning component that allows the system to address such problems. The modified system is called AACS2 (Averaged ACS2) and is tested on three multi-step benchmark problems.
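
As a point of reference for the shift described in the abstract, the sketch below contrasts a discounted Q-learning update with an average-reward update in the style of R-learning. This is a hedged tabular illustration, not the AACS2 paper's actual rule; all names (q, rho, alpha, beta, gamma) are assumptions.

    from collections import defaultdict

    # Tabular sketch of the two criteria named in the abstract. This is NOT
    # the AACS2 update rule; it follows standard Q-learning and R-learning.

    def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
        """Discounted criterion: maximize the expected sum of gamma**t * r_t."""
        best_next = max(q[(s_next, a2)] for a2 in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    def r_learning_update(q, rho, s, a, r, s_next, actions, alpha=0.1, beta=0.01):
        """Average-reward criterion: maximize the long-run average of r_t.

        rho is a running estimate of the average reward per step; values are
        learned relative to it, so no discount factor is needed.
        """
        was_greedy = q[(s, a)] == max(q[(s, a2)] for a2 in actions)
        best_next = max(q[(s_next, a2)] for a2 in actions)
        q[(s, a)] += alpha * (r - rho + best_next - q[(s, a)])
        if was_greedy:  # Schwartz-style rho update on greedy steps only
            best_here = max(q[(s, a2)] for a2 in actions)
            rho += beta * (r - rho + best_next - best_here)
        return rho

    q, rho = defaultdict(float), 0.0
    rho = r_learning_update(q, rho, s=0, a=1, r=1.0, s_next=2, actions=[0, 1])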

2015, Vol 52 (2), pp. 419-440
Author(s): Rolando Cavazos-Cadena, Raúl Montes-De-Oca, Karel Sladký

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
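
In common MDP notation (a sketch using standard symbols, which may differ from the paper's own), the sample-path average reward of a policy and the average-reward optimality equation referred to above read:

    % A sketch in standard notation, not the paper's exact formulation.
    J(\pi, x) = \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t),
    \qquad X_0 = x,

    g + h(x) = \max_{a \in A(x)} \Big[ r(x, a) + \sum_{y} p(y \mid x, a)\, h(y) \Big],

where g is the optimal average reward and h is a relative value (bias) function; the stationary policy in question selects a maximizing action in the second display.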


Author(s): Atsushi Wada, Keiki Takadama

Learning Classifier Systems (LCSs) are rule-based adaptive systems that combine Reinforcement Learning (RL) and rule-discovery mechanisms for effective and practical on-line learning. With the aim of establishing a common theoretical basis between LCSs and RL algorithms so that each field's findings can be shared, a detailed analysis was performed to compare the learning processes of the two approaches. Building on our previous work deriving an equivalence between the Zeroth-level Classifier System (ZCS) and Q-learning with Function Approximation (FA), this paper extends the analysis to the effect of actually enforcing the conditions under which this equivalence holds. Comparative experiments revealed interesting implications: (1) ZCS's original parameter, the deduction rate, plays a role in stabilizing action selection; but (2) from the Reinforcement Learning perspective, this process inhibits accurate value estimation over the entire state-action space, limiting the performance of ZCS on problems that require accurate value estimation.
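
For reference, here is a minimal sketch of Q-learning with linear function approximation, the RL side of the equivalence discussed above. The one-hot feature map and problem sizes are illustrative assumptions, not the paper's setup; with one-hot features the update reduces to ordinary tabular Q-learning, while classifier-like overlapping features would make the weights behave like ZCS rule strengths.

    import numpy as np

    N_STATES, N_ACTIONS = 4, 2

    def phi(s, a):
        # One-hot features (an assumption for illustration). In ZCS terms,
        # each weight plays the role of a matching classifier's strength.
        v = np.zeros(N_STATES * N_ACTIONS)
        v[s * N_ACTIONS + a] = 1.0
        return v

    def q_value(w, s, a):
        # Q(s, a) approximated as a linear function of the features.
        return float(w @ phi(s, a))

    def q_learning_fa_step(w, s, a, r, s_next, alpha=0.1, gamma=0.9):
        # One gradient step on the TD error, applied along the active features.
        target = r + gamma * max(q_value(w, s_next, b) for b in range(N_ACTIONS))
        td_error = target - q_value(w, s, a)
        return w + alpha * td_error * phi(s, a)

    w = np.zeros(N_STATES * N_ACTIONS)
    w = q_learning_fa_step(w, s=0, a=1, r=1.0, s_next=2)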


1999, Vol 30 (7-8), pp. 7-20
Author(s): M. Kurano, M. Yasuda, J.-I. Nakagami, Y. Yoshida

1996, Vol 28 (4), pp. 1123-1144
Author(s): K. D. Glazebrook

A single machine is available to process a collection of jobs J, each of which evolves stochastically under processing. Jobs incur costs at a state-dependent rate while awaiting the machine, and processing must respect a set of precedence constraints Γ. Index policies are optimal in a variety of scenarios. The indices concerned are characterised as the values of restart problems with the average reward criterion. This characterisation yields a range of efficient approaches to their computation. Index-based suboptimality bounds are derived for general processing policies. These bounds enable us to develop sensitivity analyses and to evaluate scheduling heuristics.
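
One way to sketch the restart characterisation in standard notation (an illustration following the familiar restart-in-state formulation; the paper's precise definition may differ) is to define the index of a job in state x as the optimal average reward of a two-action problem:

    % A sketch, not the paper's exact definition.
    \nu(x) = \sup_{\pi} \liminf_{n \to \infty} \frac{1}{n}\,
             \mathbb{E}^{\pi}_{x}\Big[ \sum_{t=0}^{n-1} r(X_t) \Big],

where, at each decision epoch, the policy π either continues the job from its current state or restarts it in the reference state x.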


1982, Vol 19 (2), pp. 301-309
Author(s): Zvi Rosberg

A semi-Markov decision process with a denumerable multidimensional state space is considered. At any given state, only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is assumed only to be bounded by a polynomial, and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that, under an ergodicity assumption, there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.
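
In common semi-Markov notation (a sketch; the paper's own symbols may differ), the long-run average reward criterion is the ratio of expected accumulated rewards to expected accumulated transition times:

    % A sketch in standard SMDP notation.
    g(\pi, x) = \liminf_{n \to \infty}
        \frac{\mathbb{E}^{\pi}_{x}\big[ \sum_{k=0}^{n-1} r(X_k, A_k) \big]}
             {\mathbb{E}^{\pi}_{x}\big[ \sum_{k=0}^{n-1} \tau_k \big]},

where \tau_k is the (random) length of the k-th transition period; a stationary optimal policy attains the supremum of g over all policies.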

