Two-stage training algorithm for AI robot soccer

2021 ◽  
Vol 7 ◽  
pp. e718
Author(s):  
Taeyoung Kim ◽  
Luiz Felipe Vecchietti ◽  
Kyujin Choi ◽  
Sanem Sariel ◽  
Dongsoo Har

In multi-agent reinforcement learning, the cooperative learning behavior of agents is very important. In the field of heterogeneous multi-agent reinforcement learning, cooperative behavior among different types of agents in a group is pursued. Learning a joint-action set during centralized training is an attractive way to obtain such cooperative behavior; however, this method yields limited learning performance with heterogeneous agents. To improve the learning performance of heterogeneous agents during centralized training, a two-stage heterogeneous centralized training method, which allows the training of multiple roles of heterogeneous agents, is proposed. During training, two training processes are conducted in series. One stage trains each agent according to its role, aiming to maximize its individual role reward. The other trains the agents as a whole so that they learn cooperative behaviors while maximizing shared collective rewards, e.g., team rewards. Because these two training processes are conducted in series at every time step, agents learn how to maximize role rewards and team rewards simultaneously. The proposed method is applied to 5 versus 5 AI robot soccer for validation. The experiments are performed in a robot soccer environment using the Webots robot simulation software. Simulation results show that the proposed method trains the robots of the robot soccer team effectively, achieving higher role rewards and higher team rewards than three other approaches that can be used to train cooperative multi-agent systems. Quantitatively, a team trained by the proposed method improves the score concede rate by 5% to 30% compared to teams trained with the other approaches in matches against evaluation teams.
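Concretely, the two-stage update can be read as two gradient steps applied in series at every training step: one on each agent's role-reward objective, then one on the shared team-reward objective. The following is a minimal REINFORCE-style sketch of that pattern, with hypothetical batch fields and network sizes; it is not the authors' implementation.

```python
# Minimal sketch of a two-stage (role reward, then team reward) update.
# Batch fields ("obs", "actions", "role_rewards", "team_reward") are assumptions.
import torch
import torch.nn as nn

class AgentPolicy(nn.Module):
    """Tiny policy network for one heterogeneous agent (e.g., striker or goalkeeper)."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def two_stage_update(agents, optimizers, batch):
    """Run the two training stages in series for a single training step.

    Stage 1: each agent is updated to maximize its own role reward.
    Stage 2: all agents are updated together to maximize the shared team reward.
    `batch["actions"][i]` is assumed to be a LongTensor of action indices, shape (B, 1).
    """
    # Stage 1: individual role-reward maximization, one update per agent.
    for i, (agent, opt) in enumerate(zip(agents, optimizers)):
        logits = agent(batch["obs"][i])
        log_prob = torch.log_softmax(logits, dim=-1).gather(-1, batch["actions"][i]).squeeze(-1)
        role_loss = -(log_prob * batch["role_rewards"][i]).mean()
        opt.zero_grad()
        role_loss.backward()
        opt.step()

    # Stage 2: shared team-reward maximization across all agents at once.
    team_loss = 0.0
    for i, agent in enumerate(agents):
        logits = agent(batch["obs"][i])
        log_prob = torch.log_softmax(logits, dim=-1).gather(-1, batch["actions"][i]).squeeze(-1)
        team_loss = team_loss - (log_prob * batch["team_reward"]).mean()
    for opt in optimizers:
        opt.zero_grad()
    team_loss.backward()
    for opt in optimizers:
        opt.step()
```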

Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1133
Author(s):  
Shanzhi Gu ◽  
Mingyang Geng ◽  
Long Lan

The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives private observations that provide a partial view of the true state of the environment. However, in realistic settings, a harsh environment might cause one or more agents to show arbitrarily faulty or malicious behavior, which may be enough to make the current coordination mechanisms fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems, considering the security issues that arise in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work that coped with extremely noisy environments was designed on the assumption that the noise intensity in the environment was known in advance. However, when the noise intensity changes, the existing method has to adjust the configuration of the model to learn in new environments, which limits its practical application. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which can select not only correct but also relevant information for each agent at every time step in noisy environments. The multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results show that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault-tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.
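As an illustration of the core idea, the sketch below uses a standard multi-head attention layer to let each agent attend over all agents' encoded observations, so that noisy or faulty messages can be down-weighted; the dimensions and wiring are assumptions, not the FT-Attn architecture itself.

```python
# Sketch of attention-based message filtering in the spirit of FT-Attn
# (hypothetical sizes and wiring; not the authors' exact architecture).
import torch
import torch.nn as nn

class AttentionCommLayer(nn.Module):
    """Each agent attends over all agents' encoded observations, so that
    irrelevant or faulty messages can receive low attention weights."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, obs_all: torch.Tensor) -> torch.Tensor:
        # obs_all: (batch, n_agents, obs_dim), possibly containing noisy/faulty entries.
        h = self.encoder(obs_all)                 # (batch, n_agents, hidden_dim)
        filtered, weights = self.attn(h, h, h)    # queries, keys, values are the agent encodings
        return filtered                           # per-agent messages after attention filtering

# Example: 3 agents with 10-dimensional observations, one of which may be corrupted by noise.
layer = AttentionCommLayer(obs_dim=10)
msgs = layer(torch.randn(32, 3, 10))
print(msgs.shape)  # torch.Size([32, 3, 64])
```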


Author(s):  
Takuya Okano ◽  
Itsuki Noda

In this paper, we propose a method to adapt the exploration ratio in multi-agent reinforcement learning. The adaptation of the exploration ratio is important in multi-agent learning, as it is one of the key parameters that affect learning performance. In our observation, the adaptation method can adjust the exploration ratio suitably (but not optimally) according to the characteristics of the environment. We investigated the evolutionary adaptation of the exploration ratio in multi-agent learning. We conducted several experiments to adapt the exploration ratio in a simple evolutionary way, namely, mimicking the advantageous exploration ratio (MAER), and confirmed that MAER always acquires a lower exploration ratio than the optimal value for the change ratio of the environment. In this paper, we propose a second evolutionary adaptation method, namely, win or update exploration ratio (WoUE). The results of the experiments showed that WoUE can acquire a more suitable exploration ratio than MAER, and the obtained ratio was near-optimal.
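A rough sketch of how such evolutionary adaptation of the exploration ratio could look is given below; the concrete MAER and WoUE update rules here are paraphrased assumptions (copy a better-performing peer's ratio with mutation, and keep the ratio only when the agent beats the average reward, respectively), not the paper's exact definitions.

```python
# Illustrative sketch of evolutionary epsilon (exploration-ratio) adaptation.
# Both update rules below are hedged paraphrases of MAER and WoUE, not the
# published algorithms.
import random

def maer_step(epsilons, rewards, mutation=0.01):
    """MAER-style step: each agent mimics the exploration ratio of a randomly
    chosen peer that obtained a higher reward, with a small mutation."""
    new_eps = []
    for eps, r in zip(epsilons, rewards):
        better = [j for j in range(len(rewards)) if rewards[j] > r]
        if better:
            eps = epsilons[random.choice(better)]
        eps += random.gauss(0.0, mutation)          # small mutation keeps diversity
        new_eps.append(min(max(eps, 0.0), 1.0))     # clip to [0, 1]
    return new_eps

def woue_step(epsilons, rewards, mutation=0.05):
    """WoUE-style step: an agent keeps its ratio if it 'wins' (beats the mean
    reward), otherwise perturbs it."""
    mean_r = sum(rewards) / len(rewards)
    return [eps if r >= mean_r else min(max(eps + random.gauss(0.0, mutation), 0.0), 1.0)
            for eps, r in zip(epsilons, rewards)]
```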


Author(s):  
Dawei Qiu ◽  
Jianhong Wang ◽  
Junkai Wang ◽  
Goran Strbac

With an increasing number of prosumers equipped with distributed energy resources (DER), advanced energy management has become increasingly important. To this end, integrating demand-side DER into the electricity market is a trend for future smart grids. The double-side auction (DA) market is viewed as a promising peer-to-peer (P2P) energy trading mechanism that enables interactions among prosumers in a distributed manner. To achieve maximum profit in a dynamic electricity market, prosumers act as price makers to simultaneously optimize their operations and trading strategies. However, the traditional DA market is difficult to model explicitly due to its complex clearing algorithm and the stochastic bidding behaviors of the participants. For this reason, in this paper we model this task as a multi-agent reinforcement learning (MARL) problem and propose an algorithm called DA-MADDPG, a modification of MADDPG in which the other agents' observations and actions are abstracted through the DA market's public information for each agent's critic. The experiments show that 1) prosumers obtain more economic benefit in P2P energy trading than by independently trading with the utility company in the conventional electricity market; and 2) DA-MADDPG performs better than the traditional Zero Intelligence (ZI) strategy and other MARL algorithms, e.g., IQL, IDDPG, IPPO, and MADDPG.
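The key modification, abstracting the other agents' observations and actions through the market's public information, can be sketched as a critic whose input is the agent's own observation and action plus a fixed-size market summary; the feature sizes below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the critic-input abstraction described above (hypothetical feature
# sizes; the DA market public information is represented as a flat vector).
import torch
import torch.nn as nn

class DAPublicInfoCritic(nn.Module):
    """Critic that replaces the other agents' observations and actions with the
    double-auction market's public information (e.g., cleared price/quantity,
    aggregate bid/ask summaries)."""
    def __init__(self, obs_dim: int, act_dim: int, market_dim: int, hidden: int = 128):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(obs_dim + act_dim + market_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, action, market_info):
        return self.q(torch.cat([obs, action, market_info], dim=-1))

# A vanilla MADDPG critic would instead consume all agents' observations and actions,
# so its input grows with the number of prosumers; a market summary keeps it fixed-size.
critic = DAPublicInfoCritic(obs_dim=8, act_dim=2, market_dim=6)
q_value = critic(torch.randn(32, 8), torch.randn(32, 2), torch.randn(32, 6))
```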


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 232
Author(s):  
Joohyun Kim ◽  
Dongkwan Ryu ◽  
Juyeon Kim ◽  
Jae-Hoon Kim

In Internet-of-Things (IoT) environments, publish (pub)/subscribe (sub) communication is widely employed. The use of the pub/sub operation as a lightweight communication protocol facilitates communication among IoT devices. The protocol consists of network nodes functioning as publishers, subscribers, and brokers, wherein brokers transfer messages from publishers to subscribers. Thus, the communication capability of the brokers is a critical factor in the overall communication performance. In this study, multi-agent reinforcement learning (MARL) is applied to find the best combination of broker nodes. MARL goes through various combinations of broker nodes to find the best one. However, MARL becomes inefficient when the number of broker nodes is excessive. Therefore, Delaunay triangulation is used to select candidate broker nodes from the pool of broker nodes. The selection process operates as a preprocessing step for the MARL. The suggested Delaunay triangulation is further improved by a custom deletion method. Consequently, the two-stage hybrid approach outperforms methods employing single-agent reinforcement learning (SARL). The MARL stage eliminates the performance fluctuation of SARL caused by the iterative selection of broker nodes. Furthermore, the proposed approach requires fewer candidate broker nodes and converges faster.
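A minimal sketch of the Delaunay-based preprocessing is given below; the criterion used to rank candidates here (how many triangles a node belongs to) is an illustrative assumption, not the paper's custom deletion method.

```python
# Sketch of Delaunay-based candidate broker selection; the ranking rule is an
# illustrative assumption, not the paper's exact preprocessing.
import numpy as np
from scipy.spatial import Delaunay

def candidate_brokers(node_coords: np.ndarray, k: int):
    """Triangulate broker-node positions and keep the k nodes that appear in the
    most triangles as candidates for the MARL selection stage."""
    tri = Delaunay(node_coords)
    triangle_count = np.zeros(len(node_coords), dtype=int)
    for simplex in tri.simplices:        # each simplex is a triangle of node indices
        for i in simplex:
            triangle_count[i] += 1       # count how many triangles each node belongs to
    return list(np.argsort(triangle_count)[::-1][:k])

# Example: 50 broker nodes at random 2-D positions, keep 10 candidates for MARL.
coords = np.random.rand(50, 2)
print(candidate_brokers(coords, k=10))
```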

