Team learning from human demonstration with coordination confidence

2019 ◽  
Vol 34 ◽  
Author(s):  
Bikramjit Banerjee ◽  
Syamala Vittanala ◽  
Matthew Edmund Taylor

Among an array of techniques proposed to speed up reinforcement learning (RL), learning from human demonstration has a proven record of success. A related technique, called Human-Agent Transfer, and its confidence-based derivatives have been successfully applied to single-agent RL. This article investigates their application to collaborative multi-agent RL problems. We show that a first-cut extension may leave room for improvement in some domains, and propose a new algorithm called coordination confidence (CC). CC analyzes the difference in perspectives between a human demonstrator (global view) and the learning agents (local view) and informs the agents’ action choices when the difference is critical and simply following the human demonstration can lead to miscoordination. We conduct experiments in three domains to investigate the performance of CC in comparison with relevant baselines.
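The general shape of confidence-based demonstration transfer can be sketched as follows. This is an illustrative sketch only, not the paper's exact CC rule; the names (`own_policy`, `demo_policy`, `confidence`, `threshold`) are hypothetical:

```python
# Hedged sketch: confidence-gated action selection in the spirit of
# Human-Agent Transfer derivatives. All names here are hypothetical.
def choose_action(state, own_policy, demo_policy, confidence, threshold=0.7):
    """Follow the human demonstration only when its estimated confidence
    for this state is high; otherwise fall back to the agent's own
    learned policy."""
    if confidence(state) >= threshold:
        return demo_policy(state)
    return own_policy(state)
```

CC refines this idea by lowering the effective confidence in states where the demonstrator's global view and the agent's local view diverge enough to risk miscoordination.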

2021 ◽  
Vol 11 (11) ◽  
pp. 4948 ◽  
Author(s):  
Lorenzo Canese ◽  
Gian Carlo Cardarilli ◽  
Luca Di Nunzio ◽  
Rocco Fazzolari ◽  
Daniele Giardino ◽  
...  

In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account in their extension to multi-agent scenarios. The analyzed algorithms were grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields, while pointing out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications—namely, nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performances of the considered methods.


Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2789 ◽  
Author(s):  
Hang Qi ◽  
Hao Huang ◽  
Zhiqun Hu ◽  
Xiangming Wen ◽  
Zhaoming Lu

In order to meet the ever-increasing traffic demand of Wireless Local Area Networks (WLANs), channel bonding is introduced in IEEE 802.11 standards. Although channel bonding effectively increases the transmission rate, the wider channel reduces the number of non-overlapping channels and is more susceptible to interference. Meanwhile, the traffic load differs from one access point (AP) to another and changes significantly depending on the time of day. Therefore, the primary channel and channel bonding bandwidth should be carefully selected to meet traffic demand and guarantee the performance gain. In this paper, we propose an On-Demand Channel Bonding (O-DCB) algorithm based on Deep Reinforcement Learning (DRL) for heterogeneous WLANs to reduce transmission delay, where the APs have different channel bonding capabilities. In this problem, the state space is continuous and the action space is discrete. However, the size of the action space increases exponentially with the number of APs under single-agent DRL, which severely degrades the learning rate. To accelerate learning, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is used to train O-DCB. Real traffic traces collected from a campus WLAN are used to train and test O-DCB. Simulation results reveal that the proposed algorithm converges well and achieves lower delay than competing algorithms.
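The scalability point above can be made concrete with a back-of-the-envelope count (the numbers below are hypothetical, not from the paper): a centralized single-agent controller must pick one joint action over all APs at once, while per-agent learners such as MADDPG each keep a small local action space.

```python
# Illustrative only: action-space growth for a centralized controller
# versus per-agent learners. N APs, each with K (primary channel,
# bonding bandwidth) combinations.
def joint_action_space(n_aps, k_per_ap):
    """A single-agent DRL controller chooses one of K**N joint actions."""
    return k_per_ap ** n_aps

def per_agent_action_space(k_per_ap):
    """Each MADDPG-style agent chooses only among its own K actions."""
    return k_per_ap
```

For, say, 10 APs with 8 combinations each, the centralized space already holds 8**10 (over a billion) joint actions, while each per-agent learner still faces only 8.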


2017 ◽  
Vol 13 (1) ◽  
pp. 155014771668484 ◽  
Author(s):  
Huthiafa Q Qadori ◽  
Zuriati A Zulkarnain ◽  
Zurina Mohd Hanapi ◽  
Shamala Subramaniam

Recently, wireless sensor networks have employed the concept of mobile agents to reduce energy consumption and obtain effective data gathering. In mobile-agent-based data gathering, finding the optimal itinerary for the mobile agent is an essential step. However, single-agent itinerary planning suffers from two primary disadvantages as the scale of the network expands: task delay and the growing size of the mobile agent. Multi-agent itinerary planning overcomes these drawbacks. Despite its advantages, finding the optimal number of distributed mobile agents, the grouping of source nodes, and the optimal itinerary of each mobile agent for simultaneous data gathering are still regarded as critical issues in wireless sensor networks. Therefore, in this article, the existing algorithms that have been identified in the literature to address the above issues are reviewed. The review shows that most of the algorithms used only a single parameter to find the optimal number of mobile agents in multi-agent itinerary planning, without utilizing other parameters. More importantly, the review shows that these algorithms did not take into account the security of the data gathered by the mobile agent. Accordingly, we indicate the limitations of each proposed algorithm and provide new directions for future research.


2021 ◽  
Vol 17 (3) ◽  
pp. 88-99
Author(s):  
Roderic A. Girle

Three foundational principles are introduced: intelligent systems such as those that would pass the Turing test should display multi-agent or interactional intelligence; multi-agent systems should be based on conceptual structures common to all interacting agents, machine and human; and multi-agent systems should have an underlying interactional logic such as dialogue logic. In particular, a multi-agent rather than an orthodox analysis of the key concepts of knowledge and belief is discussed. The contrast that matters lies in the different questions and answers about the support for claims to know and claims to believe. A simple multi-agent system based on dialogue theory that provides for this difference is set out.


2021 ◽  
Vol 28 (2) ◽  
pp. 163-182
Author(s):  
José L. Simancas-García ◽  
Kemel George-González

Shannon’s sampling theorem is one of the most important results of modern signal theory. It describes the reconstruction of any band-limited signal from a discrete set of its samples. Less well known is the discrete sampling theorem, proved by Cooley while he was developing an algorithm to speed up the calculation of the discrete Fourier transform. Cooley showed that a sampled signal can be resampled by selecting a smaller number of samples, which reduces computational cost, and that the original sampled signal can then be reconstructed by a reverse process. In principle, the two theorems are not related. However, in this paper we show that in the context of Non-Standard Analysis (NSA) and the hyperreal number system *R, the two theorems are equivalent; the difference between them becomes a matter of scale. With the scale changes that the hyperreal number system allows, discrete variables and functions become continuous, and Shannon’s sampling theorem emerges from the discrete sampling theorem.
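For reference, the classical reconstruction formula behind Shannon's theorem recovers a signal band-limited to $B$ Hz from samples taken every $T \le 1/(2B)$ seconds:

```latex
x(t) = \sum_{n=-\infty}^{\infty} x(nT)\,
       \operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u},
\qquad
T \le \frac{1}{2B}.
```

Cooley's discrete counterpart replaces the integral-style interpolation over all of $\mathbb{R}$ with a finite resampling of an already-sampled signal; the paper's claim is that under NSA scale changes the two become the same statement.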


2016 ◽  
Vol 24 (6) ◽  
pp. 446-463 ◽  
Author(s):  
Mansoor Shaukat ◽  
Mandar Chitre

In this paper, the role of adaptive group cohesion in a cooperative multi-agent source localization problem is investigated. A distributed source localization algorithm is presented for a homogeneous team of simple agents. An agent uses a single sensor to sense the gradient and two sensors to sense its neighbors. The algorithm is a set of individualistic and social behaviors, where the individualistic behavior is as simple as an agent keeping its previous heading and is not self-sufficient in localizing the source. Source localization is achieved as an emergent property through the agent’s adaptive interactions with its neighbors and the environment. Given that a single agent is incapable of localizing the source, maintaining team connectivity at all times is crucial. Two simple temporal sampling behaviors, intensity-based adaptation and connectivity-based adaptation, ensure an efficient localization strategy with minimal agent breakaways. The agent behaviors are simultaneously optimized using a two-phase evolutionary optimization process. The optimized behaviors are estimated with analytical models, and the resulting collective behavior is validated against the agent’s sensor and actuator noise, strong multi-path interference due to environment variability, initialization-distance sensitivity, and loss of the source signal.


Author(s):  
Daxue Liu ◽  
Jun Wu ◽  
Xin Xu

Multi-agent reinforcement learning (MARL) provides a useful and flexible framework for multi-agent coordination in uncertain dynamic environments. However, the generalization ability and scalability of algorithms to large problem sizes, already problematic in single-agent RL, is an even more formidable obstacle in MARL applications. In this paper, a new MARL method based on ordinal action selection and approximate policy iteration, called OAPI (Ordinal Approximate Policy Iteration), is presented to address the scalability issue of MARL algorithms in common-interest Markov Games. In OAPI, an ordinal action selection and learning strategy is integrated with distributed approximate policy iteration, not only to simplify the policy space and eliminate conflicts in multi-agent coordination, but also to approximate near-optimal policies for Markov Games with large state spaces. Based on the policy space simplified by ordinal action selection, the OAPI algorithm implements distributed approximate policy iteration using online least-squares policy iteration (LSPI). This results in multi-agent coordination with good convergence properties and reduced computational complexity. The simulation results of a coordinated multi-robot navigation task illustrate the feasibility and effectiveness of the proposed approach.


Author(s):  
Maryam Ebrahimi

The main purpose of this study is to describe and analyze an agent from a distributed agent-based system (ABS) designed according to the BDI architecture. The agent is capable of autonomous action, proposing general technology strategies (TSs) for renewable energy SMEs based on a set of rules, and interacts with a core agent in the multi-agent ABS. The recognition of internal strengths and weaknesses as well as external opportunities and threats takes place on the basis of a technological SWOT analysis. Proposed TSs are categorized into four types: aggressive strategy, turnaround-oriented strategy, diversification strategy, and defensive strategy. The agent architecture is explained in terms of three abstraction layers: psychological, theoretical, and implementation. After validation of the system by experts, some program code and output results of this agent are presented. The system provides information that facilitates carrying out the TS planning process effectively.


Author(s):  
František Capkovic

A Petri net (PN)-based analytical approach to describing both single-agent behaviour and the cooperation of several agents in multi-agent systems (MAS) is presented. PNs make it possible to express agent behaviour and cooperation by means of a vector state equation in the form of a linear discrete-time system. Hence, a modular approach to building the MAS model can also be used successfully. Three different interconnections of modules (agents, interfaces, environment), expressed as PN subnets, are introduced. The approach makes it possible to use methods of linear algebra. Moreover, it can be applied successfully to system analysis (e.g. reachability of states), to testing system properties, and even to control synthesis.
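The vector state equation referred to above is the standard PN marking update m' = m + C·u, where C = Post − Pre is the incidence matrix and u counts transition firings. A minimal sketch, with a hypothetical two-place, one-transition net (not from the paper):

```python
# Petri net state equation m' = m + (Post - Pre)·u, pure-Python sketch.
# Pre[p][t]: tokens place p must hold (and loses) when transition t fires.
# Post[p][t]: tokens place p gains when transition t fires.
Pre  = [[1], [0]]   # hypothetical example net
Post = [[0], [1]]

def fire(m, u):
    """Return the next marking if every firing in u is enabled,
    i.e. m has at least Pre·u tokens in each place."""
    places, transitions = range(len(m)), range(len(u))
    need = [sum(Pre[p][t] * u[t] for t in transitions) for p in places]
    if any(m[p] < need[p] for p in places):
        raise ValueError("transition not enabled in this marking")
    return [m[p] + sum((Post[p][t] - Pre[p][t]) * u[t] for t in transitions)
            for p in places]

next_marking = fire([1, 0], [1])   # fire the single transition once
```

Reachability questions of the kind the abstract mentions reduce to asking which markings m' are obtainable through sequences of such enabled firings, which is where the linear-algebraic machinery enters.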


2019 ◽  
Vol 23 (01) ◽  
pp. 1950015 ◽  
Author(s):  
Yandong Xiao ◽  
Chuliang Song ◽  
Liang Tian ◽  
Yang-Yu Liu

Our ability to understand and control the emergence of order in swarming systems is a fundamental challenge in contemporary science. The standard Vicsek model (SVM) — a minimal model for swarming systems of self-propelled particles — describes a large population of agents reaching global alignment without the need for central control. Yet the emergence of order in this model takes time and is not robust to noise. In many real-world scenarios, we need a decentralized protocol to guide a swarming system (e.g., unmanned vehicles or nanorobots) to an ordered state in a prompt and noise-robust manner. Here, we find that introducing a simple adaptive rule based on the heading differences of neighboring particles in the Vicsek model can effectively speed up global alignment, mitigate the disturbance of noise to alignment, and maintain robust alignment under predation. This simple adaptive model of swarming systems could offer new insights into the prompt and flexible formation of animal groups and help us design better protocols to achieve fast and robust alignment in multi-agent systems.
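The SVM heading update the abstract builds on is simple enough to sketch directly: each particle adopts the mean heading of all neighbors within radius r (itself included), plus uniform noise in [−η/2, η/2]. This is the standard model only, not the paper's adaptive rule; parameter names are the usual conventions:

```python
import math

def vicsek_step(positions, headings, r=1.0, eta=0.0, rng=None):
    """One synchronous heading update of the standard Vicsek model.
    positions: list of (x, y); headings: list of angles in radians."""
    new_headings = []
    for xi, yi in positions:
        sx = sy = 0.0
        for (xj, yj), theta in zip(positions, headings):
            if (xj - xi) ** 2 + (yj - yi) ** 2 <= r * r:  # neighborhood (incl. self)
                sx += math.cos(theta)
                sy += math.sin(theta)
        noise = rng.uniform(-eta / 2, eta / 2) if rng else 0.0
        new_headings.append(math.atan2(sy, sx) + noise)
    return new_headings
```

The paper's adaptive rule would additionally modulate each particle's response by the heading differences it observes among its neighbors; with η = 0, two nearby particles heading at 0 and π/2 both converge toward the mean heading π/4 in a single step.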
