ALGORITHMS FOR FINDING AN OPTIMAL POLICY FOR INTELLIGENT AGENTS BASED ON MARKOV DECISION PROCESSES
The paradigm of intelligent agents and multi-agent systems is under active development. The policy governing an agent's actions can be represented as a Markov decision process, and such agents need methods for deriving optimal policies. The purpose of this study is to review existing techniques and to determine whether, and under what conditions, they can be applied. The main approaches, based on linear programming and dynamic programming, are considered, together with the specific algorithms used to find the extreme value of utility: the simplex method for linear programming and value iteration for dynamic programming. The equations needed to find the optimal policy of an intelligent agent's actions are given, and the restrictions on applying each algorithm are discussed. It is concluded that value iteration is the most suitable method for finding the optimal policy.
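As an illustration of the dynamic programming approach named above, the following is a minimal sketch of value iteration on a small, invented MDP. It is not taken from the study itself: the two states, two actions, transition probabilities, rewards, the discount factor `gamma`, and the tolerance `tol` are all hypothetical, and the reward is assumed to depend only on the current state, so the update used is U(s) = R(s) + gamma * max over a of the sum over s' of P(s' | s, a) * U(s').

```python
def expected_utility(state, action, utility, transition):
    """Expected utility of the successor state when taking `action` in `state`."""
    return sum(p * utility[nxt] for nxt, p in transition[state][action].items())


def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Repeat the Bellman update until the largest change falls below `tol`."""
    utility = {s: 0.0 for s in states}
    while True:
        new_utility = {
            s: reward[s] + gamma * max(
                expected_utility(s, a, utility, transition) for a in actions
            )
            for s in states
        }
        delta = max(abs(new_utility[s] - utility[s]) for s in states)
        utility = new_utility
        if delta < tol:
            break
    # Greedy policy: in each state pick the action with the best expected utility.
    policy = {
        s: max(actions, key=lambda a: expected_utility(s, a, utility, transition))
        for s in states
    }
    return utility, policy


# Hypothetical two-state MDP used only for illustration.
states = ["low", "high"]
actions = ["wait", "work"]
reward = {"low": 0.0, "high": 1.0}
transition = {
    "low": {"wait": {"low": 1.0}, "work": {"low": 0.2, "high": 0.8}},
    "high": {"wait": {"low": 0.5, "high": 0.5}, "work": {"low": 0.1, "high": 0.9}},
}

utility, policy = value_iteration(states, actions, transition, reward)
print(policy)
```

With a discount factor below 1 the update is a contraction, so the loop terminates for any finite MDP of this form. A linear programming formulation would instead pose the same utility equations as constraints and pass them to a simplex solver.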