ALGORITHMS FOR FINDING OPTIMAL POLICY FOR INTELLIGENT AGENTS BASED ON MARKOV DECISION-MAKING PROCESSES

Author(s):  
A. V. Lachikhin

Currently, the paradigm of intelligent agents and multi-agent systems is actively developing. The policy of an agent's actions can be represented as a Markov decision process, and such agents need methods for finding optimal policies. The purpose of this study is to review existing techniques and determine the possibility and conditions of their application. The main approaches, based on linear and dynamic programming, are considered, along with the specific algorithms used to find the extreme value of utility: the simplex method from linear programming and value iteration from dynamic programming. The equations necessary to find the optimal policy of intelligent agent actions are given, and the restrictions on applying the various algorithms are discussed. The conclusion is that the most suitable method for finding the optimal policy is value iteration.
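As a concrete illustration of the value-iteration approach favored in the review, the following is a minimal Python sketch (not taken from the paper) of the Bellman optimality backup for a finite MDP; the array shapes, discount factor, and tolerance are assumptions made for the example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Minimal value iteration for a finite MDP (illustrative sketch).

    P: transition probabilities, shape (S, A, S)
    R: expected immediate rewards, shape (S, A)
    Returns the converged value function and a greedy policy.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s')
        Q = R + gamma * (P @ V)        # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)          # greedy policy with respect to the converged values
    return V, policy
```

For comparison, the linear-programming route obtains the same value function by minimizing the sum of V(s) subject to V(s) >= R(s, a) + gamma * sum_s' P(s, a, s') V(s') for all state-action pairs, which is the problem a simplex solver would handle.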

2019, Vol. 1 (2), pp. 590-610
Author(s):  
Zohreh Akbari ◽  
Rainer Unland

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems that either consist of agents with individual goals and decision-making capabilities, which are influenced by other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most studies in this area focus on homogeneous swarms; so far, systems introduced as Heterogeneous Swarms (HetSs) include only a few, i.e., two or three, sub-swarms of homogeneous agents, which, according to their capabilities, either deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents, which are originally designed to solve different problems and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures the compatibility of agents to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.
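The abstract does not spell out how affinity is computed; as a purely hypothetical illustration, the sketch below groups agents into a swarm for a given sub-problem when their pairwise affinity (here, a made-up overlap of capabilities relevant to that sub-problem) exceeds a threshold. All names and the affinity measure itself are assumptions, not the HetSRL definition.

```python
from itertools import combinations

def affinity(agent_a, agent_b, subproblem):
    """Hypothetical affinity: fraction of the sub-problem's required skills
    that both agents can contribute (the paper's actual measure is not
    reproduced here)."""
    shared = agent_a["skills"] & agent_b["skills"] & subproblem["skills"]
    return len(shared) / max(len(subproblem["skills"]), 1)

def form_swarm(agents, subproblem, threshold=0.5):
    """Group agents whose pairwise affinity for the sub-problem exceeds a
    threshold, so that they can learn on that sub-problem as one swarm."""
    swarm = set()
    for a, b in combinations(agents, 2):
        if affinity(a, b, subproblem) >= threshold:
            swarm.update({a["name"], b["name"]})
    return swarm

# Illustrative agents with different capabilities
agents = [
    {"name": "planner",   "skills": {"routing", "scheduling"}},
    {"name": "navigator", "skills": {"routing", "mapping"}},
    {"name": "logger",    "skills": {"storage"}},
]
print(form_swarm(agents, {"skills": {"routing"}}))  # {'planner', 'navigator'}
```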


2014, Vol. 2014, pp. 1-11
Author(s):  
Wei Zeng ◽  
Hongtao Zhou ◽  
Mingshan You

In high-stakes situations, decision makers are often risk-averse, and decision-making processes often take place in group settings. This paper studies multiagent decision-theoretic planning within the Markov decision process (MDP) framework while accounting for the change in an agent's risk attitude as its wealth level varies. Based on a one-switch utility function that describes how an agent's risk attitude changes with its wealth level, we give additive and multiplicative aggregation models of group utility and adopt maximization of expected group utility as the planning objective. The characteristics of the optimal policy as the wealth level approaches infinity are analyzed for the additive and multiplicative aggregation models, respectively. A backward-induction method is then proposed to divide the wealth-level interval from negative infinity to the initial wealth level into subintervals and to determine the optimal policy for each state and subinterval. The proposed method is illustrated by numerical examples, and the influence of the agents' risk-aversion parameters and weights on group decision-making is also analyzed.
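The paper's construction is not reproduced here, but a small sketch can make the ingredients concrete. The snippet below uses the standard linear-plus-exponential form of a one-switch utility, u(w) = w - b*exp(-c*w), and the additive aggregation of group utility as a weighted sum of member utilities; the parameter values and weights are invented for illustration.

```python
import numpy as np

def one_switch_utility(w, b=1.0, c=0.1):
    """Linear-plus-exponential utility, a standard one-switch family:
    u(w) = w - b * exp(-c * w), with b, c > 0.
    Risk attitude shifts as wealth w grows and the exponential term fades."""
    return w - b * np.exp(-c * w)

def additive_group_utility(w, weights, params):
    """Additive aggregation: weighted sum of the members' utilities at wealth w."""
    return sum(lam * one_switch_utility(w, b, c)
               for lam, (b, c) in zip(weights, params))

# Example: three decision makers with different risk parameters and weights
weights = [0.5, 0.3, 0.2]
params = [(1.0, 0.05), (2.0, 0.10), (0.5, 0.20)]
print(additive_group_utility(10.0, weights, params))
```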


2021, Vol. 2094 (3), pp. 032033
Author(s):  
I A Kirikov ◽  
S V Listopad ◽  
A S Luchko

Abstract The paper proposes a model for negotiating intelligent agents' ontologies in cohesive hybrid intelligent multi-agent systems. In this study, an intelligent agent is a relatively autonomous software entity with developed domain models and goal-setting mechanisms. When such agents have to work together within a single hybrid intelligent multi-agent system to solve some problem, the working process breaks down if there are significant differences between the agents' "points of view" on the domain, the goals, and the rules of joint work. In this regard, in order to reduce the labor costs of integrating intelligent agents into a single system, the concept of cohesive hybrid intelligent multi-agent systems was proposed, which implements mechanisms for negotiating goals and domain models and for building a protocol for solving the posed problems. The presence of these mechanisms is especially important when building intelligent systems from intelligent agents created by various independent development teams.
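As a very rough, hypothetical illustration of what negotiating domain models might involve, the toy sketch below compares two agents' concept definitions and separates the concepts they already agree on from those that would require negotiation; it is not the model proposed in the paper.

```python
def negotiate_ontologies(ontology_a, ontology_b):
    """Toy illustration (not the paper's model): find concepts the two agents
    already agree on and concepts that need negotiation before joint work."""
    shared = {c: d for c, d in ontology_a.items()
              if c in ontology_b and ontology_b[c] == d}
    conflicts = {c: (ontology_a[c], ontology_b[c])
                 for c in ontology_a.keys() & ontology_b.keys()
                 if ontology_a[c] != ontology_b[c]}
    return shared, conflicts

# Two agents with slightly different domain models
agent_a = {"task": "assignment of work", "resource": "machine"}
agent_b = {"task": "assignment of work", "resource": "worker"}
print(negotiate_ontologies(agent_a, agent_b))
```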


2021
Author(s):  
Yunfan Su

A vehicular ad hoc network (VANET) is a promising technique that improves traffic safety and transportation efficiency and provides a comfortable driving experience. However, due to the rapid growth of applications that demand channel resources, efficient channel allocation schemes are required to fully exploit the performance of vehicular networks. In this thesis, two reinforcement learning (RL)-based channel allocation methods are proposed for a cognitive-enabled VANET environment to maximize a long-term average system reward. First, we present a model-based dynamic programming method, which requires calculating the transition probabilities and the time intervals between decision epochs. After obtaining the transition probabilities and time intervals, a relative value iteration (RVI) algorithm is used to find the asymptotically optimal policy. Then, we propose a model-free reinforcement learning method, in which an agent interacts with the environment iteratively and learns from the feedback to approximate the optimal policy. Simulation results show that our reinforcement learning method achieves performance similar to that of the dynamic programming method, while both outperform the greedy method.
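The thesis's semi-Markov setting (with explicit time intervals between decision epochs) is not reproduced here, but the core of the model-based step can be sketched as relative value iteration for a discrete-time average-reward MDP; the shapes, reference state, and stopping rule below are assumptions for the example.

```python
import numpy as np

def relative_value_iteration(P, R, ref_state=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration (RVI) sketch for an average-reward finite MDP.

    P: transition probabilities, shape (S, A, S)
    R: expected immediate rewards, shape (S, A)
    Returns an estimate of the gain (average reward), the bias values,
    and a greedy policy.
    """
    S, A, _ = P.shape
    h = np.zeros(S)
    for _ in range(max_iter):
        Q = R + P @ h                    # one-step lookahead, shape (S, A)
        g = Q[ref_state].max()           # value of the reference state
        h_new = Q.max(axis=1) - g        # subtract it to keep the values bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmax(axis=1)
    return g, h, policy
```

The model-free method mentioned in the abstract would replace the known P and R with estimates learned from sampled interactions, for example through a Q-learning-style update rule.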

