Average reward criterion
Recently Published Documents

TOTAL DOCUMENTS: 38 (last five years: 3)
H-INDEX: 8 (last five years: 0)

2021 · Vol 11 (3) · pp. 1098 · Author(s): Norbert Kozłowski, Olgierd Unold

Initially, Anticipatory Classifier Systems (ACS) were designed to address both single-step and multistep decision problems. In the latter case, the objective was to maximize the total discounted reward, usually with Q-learning-based algorithms. Studies on other Learning Classifier Systems (LCS) revealed many real-world sequential decision problems where the preferred objective is the maximization of the average of successive rewards. This paper proposes a corresponding modification of the learning component that allows such problems to be addressed. The modified system is called AACS2 (Averaged ACS2) and is tested on three multistep benchmark problems.
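
For orientation, the shift from a discounted to an averaged objective can be seen in a tabular sketch: the first update below is plain Q-learning, the second is an R-learning-style rule in which a running estimate of the average reward per step replaces discounting. This is only an illustration of the criterion, not the actual AACS2 learning component; the action set, step sizes, and function names are assumptions.

```python
from collections import defaultdict

ACTIONS = [0, 1]            # illustrative action set
Q = defaultdict(float)      # tabular action values, keyed by (state, action)

def discounted_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard Q-learning step: optimizes the total discounted reward."""
    target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def averaged_q_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """R-learning-style step: optimizes the average of successive rewards.

    Instead of discounting, each immediate reward is compared against rho,
    a running estimate of the average reward per step.
    """
    greedy = Q[(s, a)] == max(Q[(s, b)] for b in ACTIONS)
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r - rho + best_next - Q[(s, a)])
    if greedy:
        # Refine the average-reward estimate only after greedy actions.
        rho += beta * (r + best_next - max(Q[(s, b)] for b in ACTIONS) - rho)
    return rho
```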


2020 · Vol 22 (02) · pp. 2040002 · Author(s): Reinoud Joosten, Llea Samuel

Games with endogenous transition probabilities and endogenous stage payoffs (or ETP–ESP games for short) are stochastic games in which both the transition probabilities and the payoffs at any stage are continuous functions of the relative frequencies of all past action combinations chosen. We present methods to compute large sets of jointly-convergent pure-strategy rewards in two-player ETP–ESP games with communicating states under the limiting average reward criterion. Such sets are useful in determining feasible rewards in a game, and instrumental in obtaining the set of (Nash) equilibrium rewards.
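
For reference, the limiting average reward criterion under which these feasible and equilibrium rewards are computed can be written as follows; the notation is generic rather than the paper's own.

```latex
% Limiting average reward of a strategy pair (\pi^1, \pi^2) of the two
% players, starting from state s (generic notation):
\[
  \gamma^i(s, \pi^1, \pi^2)
    \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}_{s,\pi^1,\pi^2}\!\left[ \sum_{t=1}^{T} r^i_t \right],
  \qquad i = 1, 2,
\]
% where r^i_t is player i's stage payoff at time t; in an ETP-ESP game both
% these payoffs and the transition law depend on past action frequencies.
```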


2020 · Vol 34 (10) · pp. 13777-13778 · Author(s): Akshay Dharmavaram, Matthew Riemer, Shalabh Bhatnagar

Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long-term credit assignment by leveraging temporal abstractions. However, when dealing with extended timescales, discounting future rewards can lead to incorrect credit assignments. In this work, we address this issue by extending the hierarchical option-critic policy gradient theorem to the average reward criterion. Our proposed framework aims to maximize the long-term reward obtained in the steady state of the Markov chain defined by the agent's policy. Furthermore, we use an approach based on ordinary differential equations for our convergence analysis and prove that the parameters of the intra-option policies, termination functions, and value functions converge to their corresponding optimal values with probability one. Finally, we illustrate the competitive advantage of learning options in the average reward setting on a grid-world environment with sparse rewards.
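
A minimal sketch of the underlying average-reward (differential) actor-critic update may help fix ideas: the discount factor disappears and a running estimate of the steady-state reward is subtracted from each immediate reward. This is a flat, tabular illustration of the criterion only, not the hierarchical option-critic algorithm of the paper; the state and action counts and step sizes are assumptions.

```python
import numpy as np

n_states, n_actions = 5, 2
V = np.zeros(n_states)                     # critic: differential value function
theta = np.zeros((n_states, n_actions))    # actor: softmax policy preferences
rho = 0.0                                  # running estimate of average reward

def policy(s):
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

def update(s, a, r, s_next, alpha_v=0.1, alpha_pi=0.05, alpha_rho=0.01):
    """One differential actor-critic step: rho replaces discounting."""
    global rho
    delta = r - rho + V[s_next] - V[s]     # differential TD error
    rho += alpha_rho * delta               # track the long-run average reward
    V[s] += alpha_v * delta                # critic update
    grad_log = -policy(s)
    grad_log[a] += 1.0                     # grad of log softmax w.r.t. theta[s]
    theta[s] += alpha_pi * delta * grad_log  # actor ascends the average reward
    return delta
```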


2017 · Vol 62 (11) · pp. 6032-6038 · Author(s): Xiaofeng Jiang, Xiaodong Wang, Hongsheng Xi, Falin Liu

2015 · Vol 52 (2) · pp. 419-440 · Author(s): Rolando Cavazos-Cadena, Raúl Montes-De-Oca, Karel Sladký

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
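
In generic notation (not the paper's), the expected and sample-path versions of the criterion differ in where the limit is taken; sample-path optimality asks the stationary policy to attain the optimal average along almost every trajectory rather than only in expectation.

```latex
% Expected vs. sample-path average reward of a policy \pi from state x
% (generic notation):
\[
  J(\pi, x) \;=\; \liminf_{n \to \infty} \frac{1}{n}\,
      \mathbb{E}^{\pi}_{x}\!\left[ \sum_{t=0}^{n-1} r(X_t, A_t) \right],
  \qquad
  J_{\mathrm{sp}}(\pi, x) \;=\; \liminf_{n \to \infty} \frac{1}{n}
      \sum_{t=0}^{n-1} r(X_t, A_t) \quad \text{(pathwise, a.s.)}.
\]
```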


2015 · Vol 17 (02) · pp. 1540014 · Author(s): Reinoud Joosten

We model and analyze strategic interaction over time in a duopoly. Each period the firms independently and simultaneously take two sequential decisions: first, they decide whether or not to advertise; then they set prices for goods that are imperfect substitutes. Each firm's current "sales potential" is affected not only by its own but also by the other firm's past advertising efforts. How much of this potential materializes as immediate sales depends on the current advertising decisions. If both firms advertise, the "sales potential" turns into demand; otherwise part of it "evaporates" and does not materialize. We determine feasible rewards and equilibria for the limiting average reward criterion. Uniqueness of equilibrium is by no means guaranteed, but Pareto efficiency may serve very well as a refinement criterion for wide ranges of the advertising costs.

