Reinforcement Learning Based Hierarchical Multi-Agent Robotic Search Team in Uncertain Environment

Field of robotics has been under the limelight because of recent advances in Artificial Intelligence (AI). Due to increased diversity in multi-agent systems, new models are being developed to handle complexity of such systems. However, most of these models do not address problems such as; uncertainty handling, efficient learning, agent coordination and fault detection. This paper presents a novel approach of implementing Reinforcement Learning (RL) on hierarchical robotic search teams. The proposed algorithm handles uncertainties in the system by implementing Q-learning and depicts enhanced efficiency as well as better time consumption compared to prior models. The reason for that is each agent can take action on its own thus there is less dependency on leader agent for RL policy. The performance of this algorithm is measured by introducing agents in an unknown environment with both Markov Decision Process (MDP) and RL policies at their disposal. Simulation-based comparison of the agent motion is presented using the results from of MDP and RL policies. Furthermore, qualitative comparison of the proposed model with prior models is also presented.

Download Full-text

A Novel Heterogeneous Swarm Reinforcement Learning Method for Sequential Decision Making Problems

Machine Learning and Knowledge Extraction ◽

10.3390/make1020035 ◽

2019 ◽

Vol 1 (2) ◽

pp. 590-610

Author(s):

Zohreh Akbari ◽

Rainer Unland

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Single Agent ◽

Sequential Decision Making ◽

Multi Agent Systems ◽

Sequential Decision ◽

Agent Systems ◽

Novel Approach ◽

Markov Decision ◽

Multi Agent

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or multi-agent systems that either consist of agents with individual goals and decision making capabilities, which are influenced by other agent’s decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most of the studies in this area focus on homogeneous swarms and so far, systems introduced as Heterogeneous Swarms (HetSs) merely include very few, i.e., two or three sub-swarms of homogeneous agents, which either, according to their capabilities, deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents, which are originally designed to solve different problems and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures the compatibility of agents to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.

Download Full-text

Improvement on Supporting Machine Learning Algorithm for Solving Problem in Immediate Decision Making

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.566.572 ◽

2012 ◽

Vol 566 ◽

pp. 572-579

Author(s):

Abdolkarim Niazi ◽

Norizah Redzuan ◽

Raja Ishak Raja Hamzah ◽

Sara Esfandiari

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Multi Agent Systems ◽

Combined Model ◽

Q Learning ◽

Agent Systems ◽

Multi Agent ◽

Case Base ◽

Case Base Reasoning ◽

Robotic Tool

In this paper, a new algorithm based on case base reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of the reinforcement learning algorithms. RL algorithms are very useful for solving wide variety decision problems when their models are not available and they must make decision correctly in every state of system, such as multi agent systems, artificial control systems, robotic, tool condition monitoring and etc. In the propose method, we investigate how making improved action selection in reinforcement learning (RL) algorithm. In the proposed method, the new combined model using case base reasoning systems and a new optimized function is proposed to select the action, which led to an increase in algorithms based on Q-learning. The algorithm mentioned was used for solving the problem of cooperative Markov’s games as one of the models of Markov based multi-agent systems. The results of experiments Indicated that the proposed algorithms perform better than the existing algorithms in terms of speed and accuracy of reaching the optimal policy.

Download Full-text

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/65 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yong Liu ◽

Yujing Hu ◽

Yang Gao ◽

Yingfeng Chen ◽

Changjie Fan

Keyword(s):

Reinforcement Learning ◽

Knowledge Transfer ◽

Value Function ◽

Single Agent ◽

Multi Agent Systems ◽

Agent Systems ◽

Markov Decision ◽

Dimensional State Space ◽

Multi Agent ◽

Function Transfer

Many real-world problems, such as robot control and soccer game, are naturally modeled as sparse-interaction multi-agent systems. Reutilizing single-agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process. Previous works rely on bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, bisimulation metric is costly to compute and is not suitable for high-dimensional state space problems. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining the MDP similarity based on the N-step return (NSR) values of an MDP. Then, we propose two knowledge transfer methods based on deep neural networks called direct value function transfer and NSR-based value function transfer. We conduct experiments in image-based grid world, multi-agent particle environment (MPE) and Ms. Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning and meanwhile get better asymptotic performance.

Download Full-text

Multiagent reinforcement learning using Non-Parametric Approximation

Respuestas ◽

10.22463/0122820x.1738 ◽

2018 ◽

Vol 23 (2) ◽

pp. 53-61

Author(s):

David Luviano Cruz ◽

Francesco José García Luna ◽

Luis Asunción Pérez Domínguez

Keyword(s):

Reinforcement Learning ◽

Hybrid Control ◽

Learning Algorithm ◽

Multi Agent Systems ◽

Generation Task ◽

Q Learning ◽

Agent Systems ◽

Multi Agent ◽

Optimal Set ◽

Parametric Approximation

This paper presents a hybrid control proposal for multi-agent systems, where the advantages of the reinforcement learning and nonparametric functions are exploited. A modified version of the Q-learning algorithm is used which will provide data training for a Kernel, this approach will provide a sub optimal set of actions to be used by the agents. The proposed algorithm is experimentally tested in a path generation task in an unknown environment for mobile robots.

Download Full-text

Reinforcement Learning under Threats

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019939 ◽

2019 ◽

Vol 33 ◽

pp. 9939-9940 ◽

Cited By ~ 1

Author(s):

Victor Gallego ◽

Roi Naveiro ◽

David Rios Insua

Keyword(s):

Reinforcement Learning ◽

Single Agent ◽

Potential Threat ◽

Q Learning ◽

Learning Framework ◽

Opponent Modeling ◽

Theoretical Approaches ◽

New Learning ◽

Markov Decision ◽

Multi Agent

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. However, when non-stationary environments as such are considered, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretical approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we shall face the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.

Download Full-text

Training Coordination Proxy Agents Using Reinforcement Learning

Handbook of Research on Agent-Based Societies ◽

10.4018/978-1-60566-236-7.ch011 ◽

2009 ◽

pp. 158-172 ◽

Cited By ~ 1

Author(s):

Myriam Abramson

Keyword(s):

Reinforcement Learning ◽

Single Agent ◽

Machine Learning Techniques ◽

Multi Agent Systems ◽

Adjustable Autonomy ◽

Novel Approach ◽

Learning Techniques ◽

Multi Agent ◽

Mixed Initiative ◽

Learning Team

In heterogeneous multi-agent systems, where human and non-human agents coexist, intelligent proxy agents can help smooth out fundamental differences. In this context, delegating the coordination role to proxy agents can improve the overall outcome of a task at the expense of human cognitive overload due to switching subtasks. Stability and commitment are characteristics of human teamwork, but must not prevent the detection of better opportunities. In addition, coordination proxy agents must be trained from examples as a single agent, but must interact with multiple agents. We apply machine learning techniques to the task of learning team preferences from mixed-initiative interactions and compare the outcome results of different simulated user patterns. This chapter introduces a novel approach for the adjustable autonomy of coordination proxies based on the reinforcement learning of abstract actions. In conclusion, some consequences of the symbiotic relationship that such an approach suggests are discussed.

Download Full-text

Learning Agent Communication under Limited Bandwidth by Message Pruning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5957 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5142-5149

Author(s):

Hangyu Mao ◽

Zhengchao Zhang ◽

Zhen Xiao ◽

Zhibo Gong ◽

Yan Ni

Keyword(s):

Reinforcement Learning ◽

Control Policy ◽

Communication Strategy ◽

Multi Agent Systems ◽

Agent Communication ◽

Limited Bandwidth ◽

Agent Systems ◽

Gating Mechanism ◽

Learning Agent ◽

Multi Agent

Communication is a crucial factor for the big multi-agent world to stay organized and productive. Recently, Deep Reinforcement Learning (DRL) has been applied to learn the communication strategy and the control policy for multiple agents. However, the practical limited bandwidth in multi-agent communication has been largely ignored by the existing DRL methods. Specifically, many methods keep sending messages incessantly, which consumes too much bandwidth. As a result, they are inapplicable to multi-agent systems with limited bandwidth. To handle this problem, we propose a gating mechanism to adaptively prune less beneficial messages. We evaluate the gating mechanism on several tasks. Experiments demonstrate that it can prune a lot of messages with little impact on performance. In fact, the performance may be greatly improved by pruning redundant messages. Moreover, the proposed gating mechanism is applicable to several previous methods, equipping them the ability to address bandwidth restricted settings.

Download Full-text

An Approach for Fault Tolerance in Multi-Agent Systems using Learning Agents

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2015070103 ◽

2015 ◽

Vol 11 (3) ◽

pp. 30-44

Author(s):

Mounira Bouzahzah ◽

Ramdane Maamri

Keyword(s):

Fault Tolerance ◽

Formal Model ◽

Fault Tolerant ◽

Learning Algorithm ◽

Multi Agent Systems ◽

Q Learning ◽

Learning Agents ◽

Agent Systems ◽

Learning Agent ◽

Multi Agent

Through this paper, the authors propose a new approach to get fault tolerant multi-agent systems using learning agents. Generally, the exceptions in the multi-agent system are divided into two main groups: private exceptions that are treated directly by the agents and global exceptions that combine all unexpected exceptions that need handlers to be solved. The proposed approach solves the problem of these global exceptions using learning agents. This work uses a formal model called hierarchical plans to model the activities of the system's agents in order to facilitate the exception detection and to model the communication with the learning agent. This latter uses a modified version of the Q Learning Algorithm in order to choose which handler can be used to solve an exceptions. The paper tries to give a new direction in the field of fault tolerance in multi-agent systems by using learning agents, the proposed solution makes it possible to adapt the handler used in case of failure within the context changes and treat repeated exceptions using learning agent experiences.

Download Full-text

Output feedback reinforcement learning based optimal output synchronisation of heterogeneous discrete-time multi-agent systems

IET Control Theory and Applications ◽

10.1049/iet-cta.2018.6266 ◽

2019 ◽

Vol 13 (17) ◽

pp. 2866-2876

Author(s):

Syed Ali Asad Rizvi ◽

Zongli Lin

Keyword(s):

Reinforcement Learning ◽

Discrete Time ◽

Output Feedback ◽

Multi Agent Systems ◽

Agent Systems ◽

Optimal Output ◽

Multi Agent

Download Full-text

Boltzmann Distributed Replicator Dynamics: Population Games in a Microgrid Context

Games ◽

10.3390/g12010008 ◽

2021 ◽

Vol 12 (1) ◽

pp. 8

Author(s):

Gustavo Chica-Pedraza ◽

Eduardo Mojica-Nava ◽

Ernesto Cadena-Muñoz

Keyword(s):

Control Method ◽

Optimization Problems ◽

Replicator Dynamics ◽

Boltzmann Distribution ◽

Full Information ◽

Multi Agent Systems ◽

Q Learning ◽

Population Games ◽

Distributed Approach ◽

Multi Agent

Multi-Agent Systems (MAS) have been used to solve several optimization problems in control systems. MAS allow understanding the interactions between agents and the complexity of the system, thus generating functional models that are closer to reality. However, these approaches assume that information between agents is always available, which means the employment of a full-information model. Some tendencies have been growing in importance to tackle scenarios where information constraints are relevant issues. In this sense, game theory approaches appear as a useful technique that use a strategy concept to analyze the interactions of the agents and achieve the maximization of agent outcomes. In this paper, we propose a distributed control method of learning that allows analyzing the effect of the exploration concept in MAS. The dynamics obtained use Q-learning from reinforcement learning as a way to include the concept of exploration into the classic exploration-less Replicator Dynamics equation. Then, the Boltzmann distribution is used to introduce the Boltzmann-Based Distributed Replicator Dynamics as a tool for controlling agents behaviors. This distributed approach can be used in several engineering applications, where communications constraints between agents are considered. The behavior of the proposed method is analyzed using a smart grid application for validation purposes. Results show that despite the lack of full information of the system, by controlling some parameters of the method, it has similar behavior to the traditional centralized approaches.

Download Full-text