Two-stage training algorithm for AI robot soccer

2021 ◽  
Vol 7 ◽  
pp. e718
Author(s):  
Taeyoung Kim ◽  
Luiz Felipe Vecchietti ◽  
Kyujin Choi ◽  
Sanem Sariel ◽  
Dongsoo Har

In multi-agent reinforcement learning, the cooperative learning behavior of agents is very important. In the field of heterogeneous multi-agent reinforcement learning, cooperative behavior among different types of agents in a group is pursued. Learning a joint-action set during centralized training is an attractive way to obtain such cooperative behavior; however, this method yields limited learning performance with heterogeneous agents. To improve the learning performance of heterogeneous agents during centralized training, a two-stage heterogeneous centralized training method, which allows the training of multiple roles of heterogeneous agents, is proposed. During training, two training processes are conducted in series. One stage trains each agent according to its role, aiming to maximize its individual role reward. The other trains the agents as a whole so that they learn cooperative behaviors while maximizing shared collective rewards, e.g., team rewards. Because these two training processes are conducted in series at every time step, agents learn how to maximize role rewards and team rewards simultaneously. The proposed method is applied to 5 versus 5 AI robot soccer for validation. The experiments are performed in a robot soccer environment using the Webots robot simulation software. Simulation results show that the proposed method trains the robots of the robot soccer team effectively, achieving higher role rewards and higher team rewards than three other approaches that can be used to train cooperative multi-agent systems. Quantitatively, a team trained by the proposed method improves the score concede rate by 5% to 30% compared to teams trained with the other approaches in matches against evaluation teams.
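Concretely, the two-stage update can be read as two gradient steps applied in series at every training step: one on each agent's role-reward objective, then one on the shared team-reward objective. The following is a minimal REINFORCE-style sketch of that pattern, with hypothetical batch fields and network sizes; it is not the authors' implementation.

```python
# Minimal sketch of a two-stage (role reward, then team reward) update.
# Batch fields ("obs", "actions", "role_rewards", "team_reward") are assumptions.
import torch
import torch.nn as nn

class AgentPolicy(nn.Module):
    """Tiny policy network for one heterogeneous agent (e.g., striker or goalkeeper)."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def two_stage_update(agents, optimizers, batch):
    """Run the two training stages in series for a single training step.

    Stage 1: each agent is updated to maximize its own role reward.
    Stage 2: all agents are updated together to maximize the shared team reward.
    `batch["actions"][i]` is assumed to be a LongTensor of action indices, shape (B, 1).
    """
    # Stage 1: individual role-reward maximization, one update per agent.
    for i, (agent, opt) in enumerate(zip(agents, optimizers)):
        logits = agent(batch["obs"][i])
        log_prob = torch.log_softmax(logits, dim=-1).gather(-1, batch["actions"][i]).squeeze(-1)
        role_loss = -(log_prob * batch["role_rewards"][i]).mean()
        opt.zero_grad()
        role_loss.backward()
        opt.step()

    # Stage 2: shared team-reward maximization across all agents at once.
    team_loss = 0.0
    for i, agent in enumerate(agents):
        logits = agent(batch["obs"][i])
        log_prob = torch.log_softmax(logits, dim=-1).gather(-1, batch["actions"][i]).squeeze(-1)
        team_loss = team_loss - (log_prob * batch["team_reward"]).mean()
    for opt in optimizers:
        opt.zero_grad()
    team_loss.backward()
    for opt in optimizers:
        opt.step()
```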

Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1133
Author(s):  
Shanzhi Gu ◽  
Mingyang Geng ◽  
Long Lan

The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives private observations that provide a partial view of the true state of the environment. However, in realistic settings, a harsh environment might cause one or more agents to show arbitrarily faulty or malicious behavior, which may be enough to make the current coordination mechanisms fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems, considering the security issues that arise in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work that coped with extremely noisy environments was designed on the assumption that the noise intensity in the environment was known in advance. However, when the noise intensity changes, the existing method has to adjust the configuration of the model to learn in new environments, which limits its practical application. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which can select not only correct but also relevant information for each agent at every time step in noisy environments. The multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results show that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault-tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.
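As an illustration of the core idea, the sketch below uses a standard multi-head attention layer to let each agent attend over all agents' encoded observations, so that noisy or faulty messages can be down-weighted; the dimensions and wiring are assumptions, not the FT-Attn architecture itself.

```python
# Sketch of attention-based message filtering in the spirit of FT-Attn
# (hypothetical sizes and wiring; not the authors' exact architecture).
import torch
import torch.nn as nn

class AttentionCommLayer(nn.Module):
    """Each agent attends over all agents' encoded observations, so that
    irrelevant or faulty messages can receive low attention weights."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, obs_all: torch.Tensor) -> torch.Tensor:
        # obs_all: (batch, n_agents, obs_dim), possibly containing noisy/faulty entries.
        h = self.encoder(obs_all)                 # (batch, n_agents, hidden_dim)
        filtered, weights = self.attn(h, h, h)    # queries, keys, values are the agent encodings
        return filtered                           # per-agent messages after attention filtering

# Example: 3 agents with 10-dimensional observations, one of which may be corrupted by noise.
layer = AttentionCommLayer(obs_dim=10)
msgs = layer(torch.randn(32, 3, 10))
print(msgs.shape)  # torch.Size([32, 3, 64])
```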


Author(s):  
Takuya Okano ◽  
Itsuki Noda

In this paper, we propose a method to adapt the exploration ratio in multi-agent reinforcement learning. The adaptation of the exploration ratio is important in multi-agent learning, as it is one of the key parameters that affect learning performance. In our observation, the adaptation method can adjust the exploration ratio suitably (but not optimally) according to the characteristics of the environment. We investigated the evolutionary adaptation of the exploration ratio in multi-agent learning. We conducted several experiments to adapt the exploration ratio in a simple evolutionary way, namely, mimicking the advantageous exploration ratio (MAER), and confirmed that MAER always acquires a lower exploration ratio than the optimal value for the change ratio of the environment. In this paper, we propose a second evolutionary adaptation method, namely, win or update exploration ratio (WoUE). The results of the experiments showed that WoUE can acquire a more suitable exploration ratio than MAER, and the obtained ratio was near-optimal.
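A rough sketch of how such evolutionary adaptation of the exploration ratio could look is given below; the concrete MAER and WoUE update rules here are paraphrased assumptions (copy a better-performing peer's ratio with mutation, and keep the ratio only when the agent beats the average reward, respectively), not the paper's exact definitions.

```python
# Illustrative sketch of evolutionary epsilon (exploration-ratio) adaptation.
# Both update rules below are hedged paraphrases of MAER and WoUE, not the
# published algorithms.
import random

def maer_step(epsilons, rewards, mutation=0.01):
    """MAER-style step: each agent mimics the exploration ratio of a randomly
    chosen peer that obtained a higher reward, with a small mutation."""
    new_eps = []
    for eps, r in zip(epsilons, rewards):
        better = [j for j in range(len(rewards)) if rewards[j] > r]
        if better:
            eps = epsilons[random.choice(better)]
        eps += random.gauss(0.0, mutation)          # small mutation keeps diversity
        new_eps.append(min(max(eps, 0.0), 1.0))     # clip to [0, 1]
    return new_eps

def woue_step(epsilons, rewards, mutation=0.05):
    """WoUE-style step: an agent keeps its ratio if it 'wins' (beats the mean
    reward), otherwise perturbs it."""
    mean_r = sum(rewards) / len(rewards)
    return [eps if r >= mean_r else min(max(eps + random.gauss(0.0, mutation), 0.0), 1.0)
            for eps, r in zip(epsilons, rewards)]
```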


Author(s):  
Dawei Qiu ◽  
Jianhong Wang ◽  
Junkai Wang ◽  
Goran Strbac

With an increasing number of prosumers equipped with distributed energy resources (DER), advanced energy management has become increasingly important. To this end, integrating demand-side DER into the electricity market is a trend for future smart grids. The double-side auction (DA) market is viewed as a promising peer-to-peer (P2P) energy trading mechanism that enables interactions among prosumers in a distributed manner. To achieve maximum profit in a dynamic electricity market, prosumers act as price makers to simultaneously optimize their operations and trading strategies. However, the traditional DA market is difficult to model explicitly due to its complex clearing algorithm and the stochastic bidding behaviors of the participants. For this reason, in this paper we model this task as a multi-agent reinforcement learning (MARL) problem and propose an algorithm called DA-MADDPG, a modification of MADDPG in which the other agents' observations and actions are abstracted through the DA market's public information for each agent's critic. The experiments show that 1) prosumers obtain more economic benefit in P2P energy trading than by independently trading with the utility company in the conventional electricity market; and 2) DA-MADDPG performs better than the traditional Zero Intelligence (ZI) strategy and other MARL algorithms, e.g., IQL, IDDPG, IPPO, and MADDPG.
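The key modification, abstracting the other agents' observations and actions through the market's public information, can be sketched as a critic whose input is the agent's own observation and action plus a fixed-size market summary; the feature sizes below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the critic-input abstraction described above (hypothetical feature
# sizes; the DA market public information is represented as a flat vector).
import torch
import torch.nn as nn

class DAPublicInfoCritic(nn.Module):
    """Critic that replaces the other agents' observations and actions with the
    double-auction market's public information (e.g., cleared price/quantity,
    aggregate bid/ask summaries)."""
    def __init__(self, obs_dim: int, act_dim: int, market_dim: int, hidden: int = 128):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(obs_dim + act_dim + market_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, action, market_info):
        return self.q(torch.cat([obs, action, market_info], dim=-1))

# A vanilla MADDPG critic would instead consume all agents' observations and actions,
# so its input grows with the number of prosumers; a market summary keeps it fixed-size.
critic = DAPublicInfoCritic(obs_dim=8, act_dim=2, market_dim=6)
q_value = critic(torch.randn(32, 8), torch.randn(32, 2), torch.randn(32, 6))
```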


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 232
Author(s):  
Joohyun Kim ◽  
Dongkwan Ryu ◽  
Juyeon Kim ◽  
Jae-Hoon Kim

In Internet-of-Things (IoT) environments, publish (pub)/subscribe (sub) communication is widely employed. The use of the pub/sub operation as a lightweight communication protocol facilitates communication among IoT devices. The protocol consists of network nodes functioning as publishers, subscribers, and brokers, wherein brokers transfer messages from publishers to subscribers. Thus, the communication capability of the brokers is a critical factor in the overall communication performance. In this study, multi-agent reinforcement learning (MARL) is applied to find the best combination of broker nodes. MARL goes through various combinations of broker nodes to find the best one. However, MARL becomes inefficient when the number of broker nodes is excessive. Therefore, Delaunay triangulation is used to select candidate broker nodes from the pool of broker nodes. The selection process operates as a preprocessing step for the MARL. The suggested Delaunay triangulation is further improved by a custom deletion method. Consequently, the two-stage hybrid approach outperforms methods employing single-agent reinforcement learning (SARL). The MARL stage eliminates the performance fluctuation of SARL caused by the iterative selection of broker nodes. Furthermore, the proposed approach requires fewer candidate broker nodes and converges faster.
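A minimal sketch of the Delaunay-based preprocessing is given below; the criterion used to rank candidates here (how many triangles a node belongs to) is an illustrative assumption, not the paper's custom deletion method.

```python
# Sketch of Delaunay-based candidate broker selection; the ranking rule is an
# illustrative assumption, not the paper's exact preprocessing.
import numpy as np
from scipy.spatial import Delaunay

def candidate_brokers(node_coords: np.ndarray, k: int):
    """Triangulate broker-node positions and keep the k nodes that appear in the
    most triangles as candidates for the MARL selection stage."""
    tri = Delaunay(node_coords)
    triangle_count = np.zeros(len(node_coords), dtype=int)
    for simplex in tri.simplices:        # each simplex is a triangle of node indices
        for i in simplex:
            triangle_count[i] += 1       # count how many triangles each node belongs to
    return list(np.argsort(triangle_count)[::-1][:k])

# Example: 50 broker nodes at random 2-D positions, keep 10 candidates for MARL.
coords = np.random.rand(50, 2)
print(candidate_brokers(coords, k=10))
```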

