ALGORITHMS FOR FINDING OPTIMAL POLICY FOR INTELLIGENT AGENTS BASED ON MARKOV DECISION-MAKING PROCESSES

Author(s):  
A. V. Lachikhin

Currently, the paradigm of intelligent agents and multi-agent systems is actively developing. The policy of an agent's actions can be represented as a Markov decision process, and such agents need methods for finding optimal policies. The purpose of this study is to review existing techniques and determine the possibility and conditions of their application. The main approaches, based on linear and dynamic programming, are considered, along with the specific algorithms used to find the extreme value of utility: the simplex method from linear programming and value iteration from dynamic programming. The equations necessary to find the optimal policy of intelligent agent actions are given, and the restrictions on applying the various algorithms are discussed. The conclusion is that the most suitable method for finding the optimal policy is value iteration.
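As a concrete illustration of the value-iteration approach favored in the review, the following is a minimal Python sketch (not taken from the paper) of the Bellman optimality backup for a finite MDP; the array shapes, discount factor, and tolerance are assumptions made for the example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Minimal value iteration for a finite MDP (illustrative sketch).

    P: transition probabilities, shape (S, A, S)
    R: expected immediate rewards, shape (S, A)
    Returns the converged value function and a greedy policy.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s')
        Q = R + gamma * (P @ V)        # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)          # greedy policy with respect to the converged values
    return V, policy
```

For comparison, the linear-programming route obtains the same value function by minimizing the sum of V(s) subject to V(s) >= R(s, a) + gamma * sum_s' P(s, a, s') V(s') for all state-action pairs, which is the problem a simplex solver would handle.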

2019, Vol. 1 (2), pp. 590-610
Author(s):  
Zohreh Akbari ◽  
Rainer Unland

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems that either consist of agents with individual goals and decision-making capabilities, which are influenced by other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most studies in this area focus on homogeneous swarms; so far, systems introduced as Heterogeneous Swarms (HetSs) include only a few, i.e., two or three, sub-swarms of homogeneous agents, which, according to their capabilities, either deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents, which are originally designed to solve different problems and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures the compatibility of agents to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.
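The abstract does not spell out how affinity is computed; as a purely hypothetical illustration, the sketch below groups agents into a swarm for a given sub-problem when their pairwise affinity (here, a made-up overlap of capabilities relevant to that sub-problem) exceeds a threshold. All names and the affinity measure itself are assumptions, not the HetSRL definition.

```python
from itertools import combinations

def affinity(agent_a, agent_b, subproblem):
    """Hypothetical affinity: fraction of the sub-problem's required skills
    that both agents can contribute (the paper's actual measure is not
    reproduced here)."""
    shared = agent_a["skills"] & agent_b["skills"] & subproblem["skills"]
    return len(shared) / max(len(subproblem["skills"]), 1)

def form_swarm(agents, subproblem, threshold=0.5):
    """Group agents whose pairwise affinity for the sub-problem exceeds a
    threshold, so that they can learn on that sub-problem as one swarm."""
    swarm = set()
    for a, b in combinations(agents, 2):
        if affinity(a, b, subproblem) >= threshold:
            swarm.update({a["name"], b["name"]})
    return swarm

# Illustrative agents with different capabilities
agents = [
    {"name": "planner",   "skills": {"routing", "scheduling"}},
    {"name": "navigator", "skills": {"routing", "mapping"}},
    {"name": "logger",    "skills": {"storage"}},
]
print(form_swarm(agents, {"skills": {"routing"}}))  # {'planner', 'navigator'}
```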


2014, Vol. 2014, pp. 1-11
Author(s):  
Wei Zeng ◽  
Hongtao Zhou ◽  
Mingshan You

In high-stakes situations, decision makers are often risk-averse, and decision-making processes often take place in group settings. This paper studies multiagent decision-theoretic planning within the Markov decision process (MDP) framework while accounting for the change in an agent's risk attitude as its wealth level varies. Based on a one-switch utility function that describes how an agent's risk attitude changes with its wealth level, we give additive and multiplicative aggregation models of group utility and adopt maximization of expected group utility as the planning objective. The characteristics of the optimal policy as the wealth level approaches infinity are analyzed for the additive and multiplicative aggregation models, respectively. A backward-induction method is then proposed to divide the wealth-level interval from negative infinity to the initial wealth level into subintervals and to determine the optimal policy for each state and subinterval. The proposed method is illustrated by numerical examples, and the influence of the agents' risk-aversion parameters and weights on group decision-making is also analyzed.
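The paper's construction is not reproduced here, but a small sketch can make the ingredients concrete. The snippet below uses the standard linear-plus-exponential form of a one-switch utility, u(w) = w - b*exp(-c*w), and the additive aggregation of group utility as a weighted sum of member utilities; the parameter values and weights are invented for illustration.

```python
import numpy as np

def one_switch_utility(w, b=1.0, c=0.1):
    """Linear-plus-exponential utility, a standard one-switch family:
    u(w) = w - b * exp(-c * w), with b, c > 0.
    Risk attitude shifts as wealth w grows and the exponential term fades."""
    return w - b * np.exp(-c * w)

def additive_group_utility(w, weights, params):
    """Additive aggregation: weighted sum of the members' utilities at wealth w."""
    return sum(lam * one_switch_utility(w, b, c)
               for lam, (b, c) in zip(weights, params))

# Example: three decision makers with different risk parameters and weights
weights = [0.5, 0.3, 0.2]
params = [(1.0, 0.05), (2.0, 0.10), (0.5, 0.20)]
print(additive_group_utility(10.0, weights, params))
```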


2021, Vol. 2094 (3), pp. 032033
Author(s):  
I A Kirikov ◽  
S V Listopad ◽  
A S Luchko

Abstract The paper proposes a model for negotiating intelligent agents' ontologies in cohesive hybrid intelligent multi-agent systems. In this study, an intelligent agent is a relatively autonomous software entity with developed domain models and goal-setting mechanisms. When such agents have to work together within a single hybrid intelligent multi-agent system to solve some problem, the working process breaks down if there are significant differences between the agents' "points of view" on the domain, the goals, and the rules of joint work. In this regard, in order to reduce the labor costs of integrating intelligent agents into a single system, the concept of cohesive hybrid intelligent multi-agent systems was proposed, which implements mechanisms for negotiating goals and domain models and for building a protocol for solving the posed problems. The presence of these mechanisms is especially important when building intelligent systems from intelligent agents created by various independent development teams.
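As a very rough, hypothetical illustration of what negotiating domain models might involve, the toy sketch below compares two agents' concept definitions and separates the concepts they already agree on from those that would require negotiation; it is not the model proposed in the paper.

```python
def negotiate_ontologies(ontology_a, ontology_b):
    """Toy illustration (not the paper's model): find concepts the two agents
    already agree on and concepts that need negotiation before joint work."""
    shared = {c: d for c, d in ontology_a.items()
              if c in ontology_b and ontology_b[c] == d}
    conflicts = {c: (ontology_a[c], ontology_b[c])
                 for c in ontology_a.keys() & ontology_b.keys()
                 if ontology_a[c] != ontology_b[c]}
    return shared, conflicts

# Two agents with slightly different domain models
agent_a = {"task": "assignment of work", "resource": "machine"}
agent_b = {"task": "assignment of work", "resource": "worker"}
print(negotiate_ontologies(agent_a, agent_b))
```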


2021
Author(s):  
Yunfan Su

A vehicular ad hoc network (VANET) is a promising technique that improves traffic safety and transportation efficiency and provides a comfortable driving experience. However, due to the rapid growth of applications that demand channel resources, efficient channel allocation schemes are required to fully exploit the performance of vehicular networks. In this thesis, two reinforcement learning (RL)-based channel allocation methods are proposed for a cognitive-enabled VANET environment to maximize a long-term average system reward. First, we present a model-based dynamic programming method, which requires calculating the transition probabilities and the time intervals between decision epochs. After obtaining the transition probabilities and time intervals, a relative value iteration (RVI) algorithm is used to find the asymptotically optimal policy. Then, we propose a model-free reinforcement learning method, in which an agent interacts with the environment iteratively and learns from the feedback to approximate the optimal policy. Simulation results show that our reinforcement learning method achieves performance similar to that of the dynamic programming method, while both outperform the greedy method.
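The thesis's semi-Markov setting (with explicit time intervals between decision epochs) is not reproduced here, but the core of the model-based step can be sketched as relative value iteration for a discrete-time average-reward MDP; the shapes, reference state, and stopping rule below are assumptions for the example.

```python
import numpy as np

def relative_value_iteration(P, R, ref_state=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration (RVI) sketch for an average-reward finite MDP.

    P: transition probabilities, shape (S, A, S)
    R: expected immediate rewards, shape (S, A)
    Returns an estimate of the gain (average reward), the bias values,
    and a greedy policy.
    """
    S, A, _ = P.shape
    h = np.zeros(S)
    for _ in range(max_iter):
        Q = R + P @ h                    # one-step lookahead, shape (S, A)
        g = Q[ref_state].max()           # value of the reference state
        h_new = Q.max(axis=1) - g        # subtract it to keep the values bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmax(axis=1)
    return g, h, policy
```

The model-free method mentioned in the abstract would replace the known P and R with estimates learned from sampled interactions, for example through a Q-learning-style update rule.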

