The Important Role of Global State for Multi-Agent Reinforcement Learning

2021 ◽  
Vol 14 (1) ◽  
pp. 17
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Yuquan Leng ◽  
Xiaohui Wang

Environmental information plays an important role in deep reinforcement learning (DRL), yet many algorithms pay little attention to it. In multi-agent reinforcement learning, an agent must make decisions in combination with the information of the other agents in the environment, which makes environmental information even more important. To demonstrate this importance, we added environmental (global state) information to existing algorithms and evaluated them on a challenging set of StarCraft II micromanagement tasks. Compared with the original algorithms, the standard deviation of our variants was smaller (except for the VDN algorithm), which shows that they are more stable, and their average score was higher (except for VDN and COMA), which shows that our approach outperforms the existing multi-agent RL methods in most cases.
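As a rough illustration of the idea, and not the authors' implementation, the sketch below shows one common way to make a centralized value estimate aware of the global state: the state vector is simply concatenated with the joint observations and actions fed to the critic. The class name, dimensions, and the use of PyTorch are assumptions.

```python
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    def __init__(self, n_agents, obs_dim, act_dim, state_dim, hidden=128):
        super().__init__()
        # The input combines every agent's observation and action with the
        # global environment state, so the value estimate is state-aware.
        in_dim = n_agents * (obs_dim + act_dim) + state_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act, global_state):
        x = torch.cat([joint_obs, joint_act, global_state], dim=-1)
        return self.net(x)

# Example with hypothetical dimensions: 3 agents, 32-dimensional global state.
critic = CentralizedCritic(n_agents=3, obs_dim=10, act_dim=4, state_dim=32)
```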

Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 631
Author(s):  
Chunyang Hu

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve confrontation decision-making among multiple agents. During training, the information of the other agents is introduced into the critic network to improve the confrontation strategy, while the parameter-sharing mechanism reduces the cost of experience storage. In the DDPG algorithm, we use four neural networks to generate real-time actions and Q-value estimates, respectively, and a momentum mechanism to optimize the training process and accelerate the convergence of the neural networks. Secondly, this paper introduces an auxiliary controller that uses a policy-based reinforcement learning (RL) method to provide assistant decision-making for the game agent. In addition, an effective reward function is used to help agents balance the losses of the enemy side against those of our own side. Furthermore, this paper uses a knowledge-transfer method to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight the competitors and achieves a good winning rate. For large-scale confrontation scenarios, the knowledge-transfer method gradually improves the decision-making level of the learning agent.
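A minimal sketch of two ingredients named above, parameter sharing and momentum-based optimization, is given below, assuming PyTorch and hypothetical network sizes; it is not the paper's code. A single actor network is reused by all (homogeneous) agents, the critic additionally receives the other agents' observations and actions, and training uses SGD with momentum.

```python
import torch
import torch.nn as nn

class SharedActor(nn.Module):
    """One actor whose parameters are shared by every agent."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Critic that also sees the other agents' observations and actions."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

actor = SharedActor(obs_dim=8, act_dim=2)
critic = CentralCritic(n_agents=3, obs_dim=8, act_dim=2)
# Momentum mechanism: plain SGD with momentum, standing in for whatever
# momentum-based optimizer the paper actually used.
optimizer = torch.optim.SGD(
    list(actor.parameters()) + list(critic.parameters()),
    lr=1e-3, momentum=0.9,
)
```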


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1433
Author(s):  
Kaifang Wan ◽  
Dingwei Wu ◽  
Yiwei Zhai ◽  
Bo Li ◽  
Xiaoguang Gao ◽  
...  

A pursuit–evasion game is a classical maneuver-confrontation problem in the multi-agent systems (MASs) domain. In this paper, an online decision technique based on deep reinforcement learning (DRL) is developed to address environment sensing and decision-making in pursuit–evasion games. A control-oriented framework built on the multi-agent deep deterministic policy gradient (MADDPG) algorithm is used to implement multi-agent cooperative decision-making, avoiding the tedious state-variable modeling required by the traditionally complicated modeling process. To address the discrepancy between the model and the real scenario, this paper introduces adversarial disturbances and proposes a novel adversarial attack trick together with an adversarial-learning MADDPG (A2-MADDPG) algorithm. By applying the adversarial attack trick to the agents themselves, real-world uncertainties are modeled, which makes training more robust. During training, adversarial learning is incorporated into the algorithm to preprocess the actions of the multiple agents, enabling them to respond properly to uncertain dynamic changes in the MAS. Experimental results verify that the proposed approach provides superior performance and effectiveness for both pursuers and evaders, and that both can learn the corresponding confrontational strategy during training.
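The sketch below illustrates one plausible form such adversarial preprocessing of actions could take: a small FGSM-style perturbation pushed against the critic's Q-value gradient before the update. It is a hedged illustration only; the perturbation rule, the `critic(obs, actions)` signature, and the hyperparameters are assumptions, not the paper's A2-MADDPG implementation.

```python
import torch

def adversarial_actions(critic, obs, actions, epsilon=0.05):
    """Perturb the joint action in the direction that most lowers the Q-value,
    simulating a worst-case disturbance before the critic/actor update."""
    actions = actions.clone().detach().requires_grad_(True)
    q = critic(obs, actions).mean()
    q.backward()
    perturbed = actions - epsilon * actions.grad.sign()
    # Keep the perturbed actions inside the valid action range.
    return perturbed.detach().clamp(-1.0, 1.0)
```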


2019 ◽  
Vol 1 (2) ◽  
pp. 590-610
Author(s):  
Zohreh Akbari ◽  
Rainer Unland

Sequential Decision-Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenario and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems that either consist of agents with individual goals and decision-making capabilities, influenced by the other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most studies focus on homogeneous swarms, and so far the systems introduced as Heterogeneous Swarms (HetSs) include only very few, i.e., two or three, sub-swarms of homogeneous agents, which, according to their capabilities, either deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents that were originally designed to solve different problems, and hence have a higher degree of heterogeneity, to behave as a swarm when addressing identical sub-problems. Specifically, the affinity between two agents, which measures their compatibility to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.
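To make the affinity idea concrete, here is a toy sketch in which affinity is taken as the overlap between the capabilities two agents can offer; agents whose affinity with a sub-problem's requirements is high enough are grouped into a temporary swarm. The Jaccard-style measure, the threshold, and all names are illustrative assumptions, not the paper's definition.

```python
def affinity(skills_a: set, skills_b: set) -> float:
    """Toy compatibility score in [0, 1] between two capability sets."""
    if not skills_a or not skills_b:
        return 0.0
    return len(skills_a & skills_b) / len(skills_a | skills_b)

# Hypothetical heterogeneous agents and one sub-problem's required capabilities.
agent_skills = {
    "planner": {"plan", "move"},
    "manipulator": {"grasp", "move"},
    "observer": {"detect"},
}
sub_problem = {"move", "plan"}

# Agents compatible enough with the sub-problem act as one swarm on it.
swarm = [name for name, skills in agent_skills.items()
         if affinity(skills, sub_problem) >= 0.5]
print(swarm)  # ['planner']
```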


2020 ◽  
Author(s):  
Milena Rmus ◽  
Samuel McDougle ◽  
Anne Collins

Reinforcement learning (RL) models have advanced our understanding of how animals learn and make decisions, and how the brain supports some aspects of learning. However, the neural computations that are explained by RL algorithms fall short of explaining many sophisticated aspects of human decision making, including the generalization of learned information, one-shot learning, and the synthesis of task information in complex environments. Instead, these aspects of instrumental behavior are assumed to be supported by the brain’s executive functions (EF). We review recent findings that highlight the importance of EF in learning. Specifically, we advance the theory that EF sets the stage for canonical RL computations in the brain, providing inputs that broaden their flexibility and applicability. Our theory has important implications for how to interpret RL computations in the brain and behavior.


2012 ◽  
Vol 11 (05) ◽  
pp. 935-960 ◽  
Author(s):  
JAVIER GARCÍA ◽  
FERNANDO BORRAJO ◽  
FERNANDO FERNÁNDEZ

Business simulators are powerful tools both for supporting the decision-making process of business managers and for business education. An example is SIMBA (SIMulator for Business Administration), a powerful simulator that is currently used as a web-based platform for business education in different institutions. In this paper, we propose the application of reinforcement learning (RL) to the creation of intelligent agents that can manage virtual companies in SIMBA. This application is not trivial, given the particular intrinsic characteristics of SIMBA: it is a generalized domain where hundreds of parameters modify the domain's behavior; it is a multi-agent domain where cooperation and competition among different agents can coexist; and each business decision requires setting dozens of continuous decision variables, which is done only after studying hundreds of continuous variables. We demonstrate empirically that all of these challenges can be overcome through the use of RL, showing results for different learning scenarios.
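One generic way to cope with dozens of continuous decision variables, sketched below purely for illustration and not necessarily the approach taken in the paper, is to discretize each variable into a few relative levels so that a standard discrete-action RL agent can act on them. All names and ranges are hypothetical.

```python
import itertools

# Hypothetical continuous decision variables with their allowed ranges.
decision_ranges = {
    "price": (50.0, 150.0),
    "marketing_budget": (0.0, 1e6),
    "production_volume": (0.0, 1e4),
}
LEVELS = 3  # e.g. low / medium / high for every variable

def level_to_value(name, level):
    """Map a discrete level index back to a concrete continuous value."""
    lo, hi = decision_ranges[name]
    return lo + (hi - lo) * level / (LEVELS - 1)

# The discrete joint-action space an RL agent would choose from.
action_space = list(itertools.product(range(LEVELS), repeat=len(decision_ranges)))

def decode(action):
    """Turn one discrete joint action into a concrete business decision."""
    return {name: level_to_value(name, lvl)
            for name, lvl in zip(decision_ranges, action)}

print(len(action_space))        # 27 joint actions for 3 variables x 3 levels
print(decode(action_space[0]))  # all variables at their lowest level
```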


2020 ◽  
Vol 1 (1) ◽  
Author(s):  
Graham Findlay ◽  
Giulio Tononi ◽  
Chiara Cirelli

The term hippocampal replay originally referred to the temporally compressed reinstantiation, during rest, of sequential neural activity observed during prior active wake. Since its description in the 1990s, hippocampal replay has often been viewed as the key mechanism by which a memory trace is repeatedly rehearsed at high speeds during sleep and gradually transferred to neocortical circuits. However, the methods used to measure the occurrence of replay remain debated, and it is now clear that the underlying neural events are considerably more complicated than the traditional narratives had suggested. “Replay-like” activity happens during wake, can play out in reverse order, may represent trajectories never taken by the animal, and may have additional functions beyond memory consolidation, from learning values and solving the problem of credit assignment to decision-making and planning. Still, we know little about the role of replay in cognition and to what extent it differs between wake and sleep. This may soon change, however, because decades-long efforts to explain replay in terms of reinforcement learning (RL) have started to yield testable predictions and possible explanations for a diverse set of observations. Here, we (1) survey the diverse features of replay, focusing especially on the latest findings; (2) discuss recent attempts at unifying disparate experimental results and putatively different cognitive functions under the banner of RL; (3) discuss methodological issues and theoretical biases that impede progress or may warrant a partial re-evaluation of the current literature; and, finally, (4) highlight areas of considerable uncertainty and promising avenues of inquiry.
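As a purely illustrative aside, the RL account mentioned above is often made concrete with Dyna-style algorithms, in which stored experience is "replayed" offline to update values; the toy sketch below shows that pattern. It is an assumption-laden analogy, not a model taken from the review, and all parameters are hypothetical.

```python
import random
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action)]
model = {}                      # model[(state, action)] = (reward, next_state)
alpha, gamma, n_replay = 0.1, 0.95, 10
actions = [0, 1]

def update(s, a, r, s_next):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def step(s, a, r, s_next):
    update(s, a, r, s_next)          # online ("wake") update
    model[(s, a)] = (r, s_next)      # store the experience
    for _ in range(n_replay):        # offline "replay" of stored experience
        (ps, pa), (pr, pns) = random.choice(list(model.items()))
        update(ps, pa, pr, pns)
```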


Author(s):  
Paulo Trigo

The key motivation for this chapter is the perception that, in the near future, markets will be composed of individuals who may simultaneously undertake the roles of consumers, producers and traders. Those individuals are economically motivated "prosumer" (producer-consumer) agents that not only consume but can also produce, store and trade assets. This chapter describes the most relevant aspects of a simulation tool that provides (human and virtual) prosumer agents with an interactive, real-time, game-like environment where they can explore long-term and short-term strategic behaviour and experience the effects of social influence on their decision-making processes. The game-like environment focuses on the simulation of electricity markets; it is named ITEM-game ("Investment and Trading in Electricity Markets") and is publicly available (ITEM-Game, 2013) for any player to explore the role of a prosumer agent.


2020 ◽  
Vol 17 (3) ◽  
pp. 172988142091696
Author(s):  
Xiaoli Liu

This article studies a multi-agent reinforcement learning algorithm based on agent action prediction. In a multi-agent system, the action selected by a learning agent is inevitably affected by the actions of the other agents, so the reinforcement learning system needs to consider the agents' joint state and joint action. In addition, the application of this method to cooperative strategy learning for soccer robots is studied: to realize the division of labour and cooperation among multiple robots, interactive learning is used to master the behaviour strategy. Taking into account the decision-making characteristics of soccer robots, this article analyses role transformation and experience sharing in multi-agent reinforcement learning, applies the algorithm to the local attack strategy of soccer robots, uses it to learn the action-selection strategy of the main robot in the team, and carries out simulation verification on the Matlab platform. The experimental results demonstrate the effectiveness of the research method, and its superiority over several simple baseline methods is validated.
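A minimal sketch of the action-prediction idea is given below, assuming a simple joint-action Q-learner in which each agent keeps a frequency model of the other agent's behaviour and bootstraps against its predicted action. The state and action names and the frequency model are illustrative assumptions, not the article's algorithm.

```python
from collections import defaultdict

Q = defaultdict(float)                          # Q[(state, my_action, other_action)]
counts = defaultdict(lambda: defaultdict(int))  # counts[state][other_action]
actions = ["pass", "shoot", "dribble"]
alpha, gamma = 0.1, 0.9

def predict_other(state):
    """Most frequently observed action of the other agent in this state."""
    seen = counts[state]
    return max(actions, key=lambda a: seen[a]) if seen else actions[0]

def choose(state):
    """Greedy action given the predicted action of the other agent."""
    other = predict_other(state)
    return max(actions, key=lambda a: Q[(state, a, other)])

def learn(state, my_a, other_a, reward, next_state):
    counts[state][other_a] += 1
    predicted = predict_other(next_state)
    best_next = max(Q[(next_state, a, predicted)] for a in actions)
    Q[(state, my_a, other_a)] += alpha * (reward + gamma * best_next
                                          - Q[(state, my_a, other_a)])
```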

