Anticipatory Classifier System with Average Reward Criterion in Discretized Multi-Step Environments
Anticipatory Classifier Systems (ACS) were originally designed to address both single-step and multistep decision problems. In the multistep case, the objective was to maximize the total discounted reward, usually with a Q-learning-style update. Studies on other Learning Classifier Systems (LCS) have identified many real-world sequential decision problems in which the preferred objective is instead to maximize the average of successive rewards. This paper proposes a modification to the learning component that allows the system to address such problems. The modified system is called AACS2 (Averaged ACS2) and is tested on three multistep benchmark problems.
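To make the contrast between the two objectives concrete, the sketch below places a discounted, Q-learning-style update next to an average-reward update in the style of R-learning. It is only an illustration under stated assumptions: the abstract does not give the exact rule used in AACS2, and the function names, the learning rates `beta` and `zeta`, the discount `gamma`, and the average-reward estimate `rho` are all hypothetical names chosen for this example.

```python
# Illustrative sketch only; not taken from the AACS2 paper.
# All names and default values below are assumptions.


def discounted_update(q, reward, max_next_q, beta=0.1, gamma=0.95):
    """Q-learning-style step: target is reward plus discounted best next value."""
    return q + beta * (reward + gamma * max_next_q - q)


def average_reward_update(q, reward, max_next_q, rho, beta=0.1):
    """R-learning-style step: no discount; reward is measured relative to
    rho, the current estimate of the average reward per step."""
    return q + beta * (reward - rho + max_next_q - q)


def update_average_reward_estimate(rho, reward, max_next_q, max_q, zeta=0.01):
    """Move rho toward the reward observed on this step, corrected by the
    change in the best value between the current and the next state.
    In R-learning this is typically applied only after greedy actions."""
    return rho + zeta * (reward + max_next_q - max_q - rho)
```

In the discounted form, distant rewards shrink geometrically with `gamma`. In the average-reward form there is no discount, and a value expresses how much better than the long-run average `rho` an action is expected to be.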