scholarly journals An Action Selection Method Using Degree of Cooperation in a Multi-agent Reinforcement Learning System

2014 ◽  
Vol 1 (3) ◽  
pp. 231 ◽  
Author(s):  
Masanori Kawamura ◽  
Kunikazu Kobayashi
Author(s):  
Daxue Liu ◽  
Jun Wu ◽  
Xin Xu

Multi-agent reinforcement learning (MARL) provides a useful and flexible framework for multi-agent coordination in uncertain dynamic environments. However, the generalization ability and scalability of algorithms to large problem sizes, already problematic in single-agent RL, is an even more formidable obstacle in MARL applications. In this paper, a new MARL method based on ordinal action selection and approximate policy iteration called OAPI (Ordinal Approximate Policy Iteration), is presented to address the scalability issue of MARL algorithms in common-interest Markov Games. In OAPI, an ordinal action selection and learning strategy is integrated with distributed approximate policy iteration not only to simplify the policy space and eliminate the conflicts in multi-agent coordination, but also to realize the approximation of near-optimal policies for Markov Games with large state spaces. Based on the simplified policy space using ordinal action selection, the OAPI algorithm implements distributed approximate policy iteration utilizing online least-squares policy iteration (LSPI). This resulted in multi-agent coordination with good convergence properties with reduced computational complexity. The simulation results of a coordinated multi-robot navigation task illustrate the feasibility and effectiveness of the proposed approach.


Author(s):  
Masahiro Nakayama ◽  
Takeshi Kamio ◽  
Kunihiko Mitsubori ◽  
Takahiro Tanaka ◽  
Hisato Fujisaka

2006 ◽  
Vol 04 (06) ◽  
pp. 1071-1083 ◽  
Author(s):  
C. L. CHEN ◽  
D. Y. DONG ◽  
Z. H. CHEN

This paper proposes a novel action selection method based on quantum computation and reinforcement learning (RL). Inspired by the advantages of quantum computation, the state/action in a RL system is represented with quantum superposition state. The probability of action eigenvalue is denoted by probability amplitude, which is updated according to rewards. And the action selection is carried out by observing quantum state according to collapse postulate of quantum measurement. The results of simulated experiments show that quantum computation can be effectively used to action selection and decision making through speeding up learning. This method also makes a good tradeoff between exploration and exploitation for RL using probability characteristics of quantum theory.


2021 ◽  
Vol 11 (1) ◽  
pp. 6637-6644
Author(s):  
H. El Fazazi ◽  
M. Elgarej ◽  
M. Qbadou ◽  
K. Mansouri

Adaptive e-learning systems are created to facilitate the learning process. These systems are able to suggest the student the most suitable pedagogical strategy and to extract the information and characteristics of the learners. A multi-agent system is a collection of organized and independent agents that communicate with each other to resolve a problem or complete a well-defined objective. These agents are always in communication and they can be homogeneous or heterogeneous and may or may not have common objectives. The application of the multi-agent approach in adaptive e-learning systems can enhance the learning process quality by customizing the contents to students’ needs. The agents in these systems collaborate to provide a personalized learning experience. In this paper, a design of an adaptative e-learning system based on a multi-agent approach and reinforcement learning is presented. The main objective of this system is the recommendation to the students of a learning path that meets their characteristics and preferences using the Q-learning algorithm. The proposed system is focused on three principal characteristics, the learning style according to the Felder-Silverman learning style model, the knowledge level, and the student's possible disabilities. Three types of disabilities were taken into account, namely hearing impairments, visual impairments, and dyslexia. The system will be able to provide the students with a sequence of learning objects that matches their profiles for a personalized learning experience.


Author(s):  
Kazuteru Miyazaki ◽  
Koudai Furukawa ◽  
Hiroaki Kobayashi ◽  
◽  
◽  
...  

When multiple agents learn a task simultaneously in an environment, the learning results often become unstable. This problem is known as the concurrent learning problem and to date, several methods have been proposed to resolve it. In this paper, we propose a new method that incorporates expected failure probability (EFP) into the action selection strategy to give agents a kind of mutual adaptability. The effectiveness of the proposed method is confirmed using Keepaway task.


Sign in / Sign up

Export Citation Format

Share Document