Automatic construction of Markov decision process models for multi-agent reinforcement learning

Reinforcement learning (RL) is a state-value or action-value based machine learning method that approximately solves large-scale Markov Decision Processes (MDPs) or Semi-Markov Decision Processes (SMDPs). A multi-step RL algorithm called Sarsa(λ,k) is proposed as a compromise between Sarsa and Sarsa(λ): it is equivalent to Sarsa when k is 1 and to Sarsa(λ) when k is infinite, so its behavior can be tuned through the value of k. Two forms of Sarsa(λ,k), forward-view Sarsa(λ,k) and backward-view Sarsa(λ,k), are constructed and proved equivalent under off-line updating.
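The abstract does not give the update rule. One plausible reading of the forward view, assumed here and not taken from the paper, is the truncated λ-return: a λ-weighted average of the 1- to k-step Sarsa returns, with the leftover weight placed on the k-step return. This reduces to the one-step Sarsa target when k = 1 and to the full λ-return (the forward-view Sarsa(λ) target) once k reaches the end of the episode. All function names below are hypothetical.

```python
def n_step_return(rewards, q, t, n, gamma):
    """n-step Sarsa return G_{t:t+n} for an episode of length T = len(rewards).

    rewards[i] is the reward R_{i+1} received after taking A_i in S_i;
    q[i] is the current estimate Q(S_i, A_i). The episode terminates after
    step T - 1, so no bootstrap term is added once t + n >= T.
    """
    T = len(rewards)
    g = sum(gamma ** (i - t) * rewards[i] for i in range(t, min(t + n, T)))
    if t + n < T:
        g += gamma ** n * q[t + n]
    return g


def truncated_lambda_return(rewards, q, t, k, lam, gamma):
    """Lambda-weighted mix of the 1..k-step returns, truncated at horizon k (k >= 1).

    The n-step return gets weight (1 - lam) * lam**(n - 1) for n < k, and the
    k-step return gets the remaining mass lam**(k - 1), so the weights sum to 1.
    """
    g = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, q, t, n, gamma)
            for n in range(1, k))
    return g + lam ** (k - 1) * n_step_return(rewards, q, t, k, gamma)


def sarsa_lambda_k_update(q, rewards, t, k, lam, gamma, alpha):
    """Off-line forward-view update: move Q(S_t, A_t) toward the truncated target.

    Off-line means every target in the episode is computed from the same,
    unchanged q list; the new value is returned rather than written back.
    """
    target = truncated_lambda_return(rewards, q, t, k, lam, gamma)
    return q[t] + alpha * (target - q[t])
```

For example, with `rewards = [1.0, 0.0, 2.0]`, `q = [0.5, 0.4, 0.3]`, `gamma = 0.9` and `lam = 0.8`, setting `k = 1` yields the one-step Sarsa target 1.0 + 0.9 × 0.4 = 1.36, and any `k` of 3 or more yields the same full λ-return, which illustrates the two limiting cases the abstract describes.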

Risk aversion and risk seeking in multicriteria forest management: a Markov decision process approach

Canadian Journal of Forest Research, Vol 47 (6), pp. 800-807, 2017. DOI: 10.1139/cjfr-2016-0502. Cited By ~ 9

Author(s): Joseph Buongiorno, Mo Zhou, Craig Johnston

Keyword(s): Risk Aversion, Markov Decision Process, Decision Process, Weighted Average, Basal Area, Risk Attitude, Expected Value, Process Models, Risk Seeking, Markov Decision

Markov decision process models were extended to reflect some consequences of the risk attitude of forestry decision makers. One approach consisted of maximizing the expected value of a criterion subject to an upper bound on the variance or, symmetrically, minimizing the variance subject to a lower bound on the expected value. The other method used the certainty equivalent criterion, a weighted average of the expected value and variance. The two approaches were applied to data for mixed softwood–hardwood forests in the southern United States with multiple financial and ecological criteria. Compared with risk neutrality or risk seeking, financial risk aversion reduced expected annual financial returns and production and led to shorter cutting cycles that lowered the expected diversity of tree species and size, stand basal area, stored CO2e, and old-growth area.
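The two risk criteria can be illustrated on a deliberately tiny, hypothetical stand model. The sketch assumes a steady-state formulation: each deterministic policy induces a Markov chain, and its stationary distribution gives the mean and variance of a per-period criterion. The states, transition probabilities, returns, and the certainty-equivalent form E - beta * Var are all illustrative assumptions, not the paper's data or exact formulation.

```python
from itertools import product

# Hypothetical two-state stand model (0 = young, 1 = mature); all numbers are made up.
STATES = [0, 1]
ACTIONS = ["wait", "cut"]
# P[s][a][s2]: probability of moving from state s to s2 under action a.
P = {
    0: {"wait": [0.2, 0.8], "cut": [0.5, 0.5]},
    1: {"wait": [0.1, 0.9], "cut": [1.0, 0.0]},
}
# R[s][a]: per-period criterion value (e.g. financial return) in state s under action a.
R = {
    0: {"wait": 0.0, "cut": 1.0},
    1: {"wait": 0.0, "cut": 9.0},
}
# Every deterministic stationary policy, as a dict state -> action.
POLICIES = [dict(zip(STATES, acts)) for acts in product(ACTIONS, repeat=len(STATES))]


def stationary(policy, iters=1000):
    """Steady-state distribution of the chain induced by a policy (power iteration)."""
    pi = [1.0 / len(STATES)] * len(STATES)
    for _ in range(iters):
        pi = [sum(pi[s] * P[s][policy[s]][s2] for s in STATES) for s2 in STATES]
    return pi


def mean_var(policy):
    """Steady-state expected value and variance of the per-period criterion."""
    pi = stationary(policy)
    r = [R[s][policy[s]] for s in STATES]
    mean = sum(p * x for p, x in zip(pi, r))
    second = sum(p * x * x for p, x in zip(pi, r))
    return mean, second - mean * mean


def max_mean_with_var_bound(var_bound):
    """Approach 1: maximize the expected value subject to an upper bound on the variance."""
    feasible = [p for p in POLICIES if mean_var(p)[1] <= var_bound]
    return max(feasible, key=lambda p: mean_var(p)[0])


def min_var_with_mean_bound(mean_bound):
    """Symmetric form: minimize the variance subject to a lower bound on the expected value."""
    feasible = [p for p in POLICIES if mean_var(p)[0] >= mean_bound]
    return min(feasible, key=lambda p: mean_var(p)[1])


def max_certainty_equivalent(beta):
    """Approach 2 (assumed form): maximize E - beta * Var.

    beta > 0 models risk aversion, beta = 0 risk neutrality, beta < 0 risk seeking.
    """
    return max(POLICIES, key=lambda p: mean_var(p)[0] - beta * mean_var(p)[1])
```

With these made-up numbers, the risk-neutral choice (`beta = 0`) waits in young stands and cuts mature ones, with mean 4 and variance 20. A moderate variance penalty (`beta = 0.1`) or a variance bound of 15 switches to cutting in both states, which loosely mirrors the shorter cutting cycles the abstract reports under financial risk aversion.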

Cooperative retransmissions using Markov decision process with reinforcement learning

2009 IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications, 2009. DOI: 10.1109/pimrc.2009.5450098. Cited By ~ 1

Author(s): Ghasem Naddafzadeh Shirazi, Peng-Yong Kong, Chen-Khong Tham

Keyword(s): Reinforcement Learning, Markov Decision Process, Decision Process, Markov Decision

Using Intelligent Multi-Agent Systems to Model and Foster Self-Regulated Learning: A Theoretically-Based Approach Using Markov Decision Process

2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA), 2013. DOI: 10.1109/aina.2013.70. Cited By ~ 1

Author(s): B. Khosravifar, F. Bouchet, R. Feyzi-Behnagh, R. Azevedo, J. M. Harley

Keyword(s): Markov Decision Process, Decision Process, Multi Agent Systems, Agent Systems, Self Regulated Learning, Regulated Learning, Markov Decision, Multi Agent

Continuous-time Markov decision process with average reward: Using reinforcement learning method

2015 34th Chinese Control Conference (CCC), 2015. DOI: 10.1109/chicc.2015.7260117

Author(s): Shengde Jia, Lincheng Shen, Hongtao Xue

Keyword(s): Reinforcement Learning, Markov Decision Process, Continuous Time, Decision Process, Learning Method, Average Reward, Markov Decision

COG-DICE: An Algorithm for Solving Continuous-Observation Dec-POMDPs

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017. DOI: 10.24963/ijcai.2017/638

Author(s): Madison Clark-Turner, Christopher Amato

Keyword(s): Markov Decision Process, Real World, Decision Process, Extended Version, Continuous Observation, Solution Methods, Markov Decision, Multi Agent, Partially Observable Markov, Partially Observable

The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need for finding an adequate discretization.
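The abstract does not describe COG-DICE's internal policy representation or search procedure. As a hedged illustration of what a policy that explicitly includes continuous observations can look like, the sketch below gives each agent a finite-state controller whose node transitions depend on which interval a one-dimensional continuous observation falls into. Each node carries its own cut points, so the partition of the observation space is part of the policy rather than a fixed discretization chosen beforehand. The class names, the interval-threshold representation, and the toy tagging controller are assumptions for illustration only.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Node:
    """One controller node: the action to take and how to branch on the next observation."""
    action: str
    thresholds: list  # sorted cut points on a 1-D continuous observation
    successors: list  # len(thresholds) + 1 node ids, one per observation interval


@dataclass
class Controller:
    """Finite-state controller for a single agent (hypothetical representation)."""
    nodes: dict
    current: int = 0

    def act(self) -> str:
        return self.nodes[self.current].action

    def observe(self, obs: float) -> None:
        # Locate the interval containing the continuous observation, then move
        # to the successor node assigned to that interval.
        node = self.nodes[self.current]
        region = bisect_right(node.thresholds, obs)
        self.current = node.successors[region]


def make_tagger(tag_range: float) -> Controller:
    """Toy controller: scan until the observed distance drops below tag_range, then tag."""
    return Controller({
        0: Node("scan", [tag_range], [1, 0]),
        1: Node("tag", [], [0]),
    })
```

In the decentralized setting, each agent runs its own controller, for example `agents = [make_tagger(1.0), make_tagger(0.5)]`, and each one is updated only with that agent's private observation. A policy search method would then adjust the actions, cut points, and successor links of these controllers.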

Universal Reinforcement Learning Algorithms: Survey and Experiments

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017. DOI: 10.24963/ijcai.2017/194

Author(s): John Aslanides, Jan Leike, Marcus Hutter

Keyword(s): Reinforcement Learning, Open Source, Markov Decision Process, Decision Process, Empirical Investigation, State Of The Art, Learning Algorithms, Markov Decision, Reference Implementation, Partially Observable

Many state-of-the-art reinforcement learning (RL) algorithms assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with experiments that qualitatively illustrate properties of the resulting policies and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms, which we hope will facilitate further understanding of, and experimentation with, these ideas.
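AIXI itself is incomputable, but the Bayes-mixture prediction at its core can be sketched for a finite class of environment models. The snippet below assumes a hypothetical class of two i.i.d. binary-percept models (`fair` and `biased`) and omits actions and planning entirely. It only shows how the posterior weights w(ν | h) and the mixture probability ξ(e | h) = Σ_ν w(ν | h) ν(e | h) are updated. It is not the paper's reference implementation.

```python
from fractions import Fraction

# Hypothetical model class: each model gives the probability of a binary percept e.
# Real URL agents condition on the full history and on actions; these models are
# i.i.d. for brevity.
MODELS = {
    "fair": lambda e: Fraction(1, 2),
    "biased": lambda e: Fraction(3, 4) if e == 1 else Fraction(1, 4),
}


def mixture_prob(weights, percept):
    """Bayes-mixture predictive probability xi(e | h) = sum_nu w(nu | h) * nu(e | h)."""
    return sum(w * MODELS[name](percept) for name, w in weights.items())


def posterior(weights, percept):
    """Bayes rule: w(nu | h e) = w(nu | h) * nu(e | h) / xi(e | h)."""
    z = mixture_prob(weights, percept)
    return {name: w * MODELS[name](percept) / z for name, w in weights.items()}
```

Starting from a uniform prior `{"fair": Fraction(1, 2), "biased": Fraction(1, 2)}`, the mixture assigns probability 5/8 to percept 1. After observing a 1, the weights become 2/5 for `fair` and 3/5 for `biased`, so the mixture shifts toward the model that predicted the data better. Exact fractions are used only to make the arithmetic easy to check.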
