Team learning from human demonstration with coordination confidence

2019 ◽  
Vol 34 ◽  
Author(s):  
Bikramjit Banerjee ◽  
Syamala Vittanala ◽  
Matthew Edmund Taylor

Among an array of techniques proposed to speed up reinforcement learning (RL), learning from human demonstration has a proven record of success. A related technique, called Human-Agent Transfer, and its confidence-based derivatives have been successfully applied to single-agent RL. This article investigates their application to collaborative multi-agent RL problems. We show that a first-cut extension may leave room for improvement in some domains, and propose a new algorithm called coordination confidence (CC). CC analyzes the difference in perspectives between a human demonstrator (global view) and the learning agents (local view) and informs the agents’ action choices when the difference is critical and simply following the human demonstration can lead to miscoordination. We conduct experiments in three domains to investigate the performance of CC in comparison with relevant baselines.
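The general shape of confidence-based demonstration transfer can be sketched as follows. This is an illustrative sketch only, not the paper's exact CC rule; the names (`own_policy`, `demo_policy`, `confidence`, `threshold`) are hypothetical:

```python
# Hedged sketch: confidence-gated action selection in the spirit of
# Human-Agent Transfer derivatives. All names here are hypothetical.
def choose_action(state, own_policy, demo_policy, confidence, threshold=0.7):
    """Follow the human demonstration only when its estimated confidence
    for this state is high; otherwise fall back to the agent's own
    learned policy."""
    if confidence(state) >= threshold:
        return demo_policy(state)
    return own_policy(state)
```

CC refines this idea by lowering the effective confidence in states where the demonstrator's global view and the agent's local view diverge enough to risk miscoordination.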

2021 ◽  
Vol 11 (11) ◽  
pp. 4948 ◽  
Author(s):  
Lorenzo Canese ◽  
Gian Carlo Cardarilli ◽  
Luca Di Nunzio ◽  
Rocco Fazzolari ◽  
Daniele Giardino ◽  
...  

In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account in their extension to multi-agent scenarios. The analyzed algorithms were grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields, while pointing out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications—namely, nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performances of the considered methods.


Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2789 ◽  
Author(s):  
Hang Qi ◽  
Hao Huang ◽  
Zhiqun Hu ◽  
Xiangming Wen ◽  
Zhaoming Lu

In order to meet the ever-increasing traffic demand of Wireless Local Area Networks (WLANs), channel bonding is introduced in IEEE 802.11 standards. Although channel bonding effectively increases the transmission rate, the wider channel reduces the number of non-overlapping channels and is more susceptible to interference. Meanwhile, the traffic load differs from one access point (AP) to another and changes significantly depending on the time of day. Therefore, the primary channel and channel bonding bandwidth should be carefully selected to meet traffic demand and guarantee the performance gain. In this paper, we propose an On-Demand Channel Bonding (O-DCB) algorithm based on Deep Reinforcement Learning (DRL) for heterogeneous WLANs to reduce transmission delay, where the APs have different channel bonding capabilities. In this problem, the state space is continuous and the action space is discrete. However, the size of the action space increases exponentially with the number of APs under single-agent DRL, which severely degrades the learning rate. To accelerate learning, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is used to train O-DCB. Real traffic traces collected from a campus WLAN are used to train and test O-DCB. Simulation results reveal that the proposed algorithm converges well and achieves lower delay than competing algorithms.
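The scalability point above can be made concrete with a back-of-the-envelope count (the numbers below are hypothetical, not from the paper): a centralized single-agent controller must pick one joint action over all APs at once, while per-agent learners such as MADDPG each keep a small local action space.

```python
# Illustrative only: action-space growth for a centralized controller
# versus per-agent learners. N APs, each with K (primary channel,
# bonding bandwidth) combinations.
def joint_action_space(n_aps, k_per_ap):
    """A single-agent DRL controller chooses one of K**N joint actions."""
    return k_per_ap ** n_aps

def per_agent_action_space(k_per_ap):
    """Each MADDPG-style agent chooses only among its own K actions."""
    return k_per_ap
```

For, say, 10 APs with 8 combinations each, the centralized space already holds 8**10 (over a billion) joint actions, while each per-agent learner still faces only 8.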


2017 ◽  
Vol 13 (1) ◽  
pp. 155014771668484 ◽  
Author(s):  
Huthiafa Q Qadori ◽  
Zuriati A Zulkarnain ◽  
Zurina Mohd Hanapi ◽  
Shamala Subramaniam

Recently, wireless sensor networks have employed the concept of mobile agents to reduce energy consumption and obtain effective data gathering. In mobile-agent-based data gathering, finding the optimal itinerary for the mobile agent is an essential step. However, single-agent itinerary planning suffers from two primary disadvantages as the scale of the network expands: task delay and the growing size of the mobile agent. Multi-agent itinerary planning overcomes these drawbacks. Despite its advantages, finding the optimal number of distributed mobile agents, the grouping of source nodes, and the optimal itinerary of each mobile agent for simultaneous data gathering are still regarded as critical issues in wireless sensor networks. Therefore, in this article, the existing algorithms that have been identified in the literature to address the above issues are reviewed. The review shows that most of the algorithms used only a single parameter to find the optimal number of mobile agents in multi-agent itinerary planning, without utilizing other parameters. More importantly, the review shows that these algorithms did not take into account the security of the data gathered by the mobile agent. Accordingly, we indicate the limitations of each proposed algorithm and provide new directions for future research.


2021 ◽  
Vol 17 (3) ◽  
pp. 88-99
Author(s):  
Roderic A. Girle

Three foundational principles are introduced: intelligent systems such as those that would pass the Turing test should display multi-agent or interactional intelligence; multi-agent systems should be based on conceptual structures common to all interacting agents, machine and human; and multi-agent systems should have an underlying interactional logic such as dialogue logic. In particular, a multi-agent rather than an orthodox analysis of the key concepts of knowledge and belief is discussed. The contrast that matters lies in the different questions and answers about the support for claims to know and claims to believe. A simple multi-agent system based on dialogue theory that provides for this difference is set out.


2021 ◽  
Vol 28 (2) ◽  
pp. 163-182
Author(s):  
José L. Simancas-García ◽  
Kemel George-González

Shannon’s sampling theorem is one of the most important results of modern signal theory. It describes the reconstruction of any band-limited signal from a discrete set of its samples. Less well known is the discrete sampling theorem, proved by Cooley while he was developing an algorithm to speed up the calculation of the discrete Fourier transform. Cooley showed that a sampled signal can be resampled by selecting a smaller number of samples, which reduces computational cost, and that the original sampled signal can then be reconstructed by a reverse process. In principle, the two theorems are not related. However, in this paper we show that in the context of Non-Standard Analysis (NSA) and the hyperreal number system *R, the two theorems are equivalent; the difference between them becomes a matter of scale. With the scale changes that the hyperreal number system allows, discrete variables and functions become continuous, and Shannon’s sampling theorem emerges from the discrete sampling theorem.
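For reference, the classical reconstruction formula behind Shannon's theorem recovers a signal band-limited to $B$ Hz from samples taken every $T \le 1/(2B)$ seconds:

```latex
x(t) = \sum_{n=-\infty}^{\infty} x(nT)\,
       \operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u},
\qquad
T \le \frac{1}{2B}.
```

Cooley's discrete counterpart replaces the integral-style interpolation over all of $\mathbb{R}$ with a finite resampling of an already-sampled signal; the paper's claim is that under NSA scale changes the two become the same statement.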


2016 ◽  
Vol 24 (6) ◽  
pp. 446-463 ◽  
Author(s):  
Mansoor Shaukat ◽  
Mandar Chitre

In this paper, the role of adaptive group cohesion in a cooperative multi-agent source localization problem is investigated. A distributed source localization algorithm is presented for a homogeneous team of simple agents. An agent uses a single sensor to sense the gradient and two sensors to sense its neighbors. The algorithm is a set of individualistic and social behaviors, where the individualistic behavior is as simple as an agent keeping its previous heading and is not self-sufficient in localizing the source. Source localization is achieved as an emergent property through the agent’s adaptive interactions with its neighbors and the environment. Given that a single agent is incapable of localizing the source, maintaining team connectivity at all times is crucial. Two simple temporal sampling behaviors, intensity-based adaptation and connectivity-based adaptation, ensure an efficient localization strategy with minimal agent breakaways. The agent behaviors are simultaneously optimized using a two-phase evolutionary optimization process. The optimized behaviors are estimated with analytical models, and the resulting collective behavior is validated against the agent’s sensor and actuator noise, strong multi-path interference due to environment variability, initialization-distance sensitivity, and loss of the source signal.


Author(s):  
Daxue Liu ◽  
Jun Wu ◽  
Xin Xu

Multi-agent reinforcement learning (MARL) provides a useful and flexible framework for multi-agent coordination in uncertain dynamic environments. However, the generalization ability and scalability of algorithms to large problem sizes, already problematic in single-agent RL, is an even more formidable obstacle in MARL applications. In this paper, a new MARL method based on ordinal action selection and approximate policy iteration, called OAPI (Ordinal Approximate Policy Iteration), is presented to address the scalability issue of MARL algorithms in common-interest Markov Games. In OAPI, an ordinal action selection and learning strategy is integrated with distributed approximate policy iteration, not only to simplify the policy space and eliminate conflicts in multi-agent coordination, but also to approximate near-optimal policies for Markov Games with large state spaces. Based on the policy space simplified by ordinal action selection, the OAPI algorithm implements distributed approximate policy iteration using online least-squares policy iteration (LSPI). This results in multi-agent coordination with good convergence properties and reduced computational complexity. The simulation results of a coordinated multi-robot navigation task illustrate the feasibility and effectiveness of the proposed approach.


Author(s):  
Maryam Ebrahimi

The main purpose of this study is to describe and analyze an agent from a distributed agent-based system (ABS) designed according to the BDI architecture. The agent is capable of autonomous action, proposing general technology strategies (TSs) for renewable energy SMEs based on a set of rules, and interacts with a core agent in the multi-agent ABS. The recognition of internal strengths and weaknesses as well as external opportunities and threats takes place on the basis of a technological SWOT analysis. Proposed TSs are categorized into four types: aggressive strategy, turnaround-oriented strategy, diversification strategy, and defensive strategy. The agent architecture is explained in terms of three abstraction layers: psychological, theoretical, and implementation. After validation of the system by experts, some program code and output results of this agent are presented. The system provides information that facilitates carrying out the TS planning process effectively.


Author(s):  
František Capkovic

A Petri net (PN)-based analytical approach to describing both single-agent behaviour and the cooperation of several agents in multi-agent systems (MAS) is presented. PNs make it possible to express agent behaviour and cooperation by means of a vector state equation in the form of a linear discrete-time system. Hence, a modular approach to building the MAS model can also be used successfully. Three different interconnections of modules (agents, interfaces, environment), expressed as PN subnets, are introduced. The approach makes it possible to use methods of linear algebra. Moreover, it can be applied successfully to system analysis (e.g. reachability of states), to testing system properties, and even to control synthesis.
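The vector state equation referred to above is the standard PN marking update m' = m + C·u, where C = Post − Pre is the incidence matrix and u counts transition firings. A minimal sketch, with a hypothetical two-place, one-transition net (not from the paper):

```python
# Petri net state equation m' = m + (Post - Pre)·u, pure-Python sketch.
# Pre[p][t]: tokens place p must hold (and loses) when transition t fires.
# Post[p][t]: tokens place p gains when transition t fires.
Pre  = [[1], [0]]   # hypothetical example net
Post = [[0], [1]]

def fire(m, u):
    """Return the next marking if every firing in u is enabled,
    i.e. m has at least Pre·u tokens in each place."""
    places, transitions = range(len(m)), range(len(u))
    need = [sum(Pre[p][t] * u[t] for t in transitions) for p in places]
    if any(m[p] < need[p] for p in places):
        raise ValueError("transition not enabled in this marking")
    return [m[p] + sum((Post[p][t] - Pre[p][t]) * u[t] for t in transitions)
            for p in places]

next_marking = fire([1, 0], [1])   # fire the single transition once
```

Reachability questions of the kind the abstract mentions reduce to asking which markings m' are obtainable through sequences of such enabled firings, which is where the linear-algebraic machinery enters.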


2019 ◽  
Vol 23 (01) ◽  
pp. 1950015 ◽  
Author(s):  
Yandong Xiao ◽  
Chuliang Song ◽  
Liang Tian ◽  
Yang-Yu Liu

Our ability to understand and control the emergence of order in swarming systems is a fundamental challenge in contemporary science. The standard Vicsek model (SVM) — a minimal model for swarming systems of self-propelled particles — describes a large population of agents reaching global alignment without the need for central control. Yet the emergence of order in this model takes time and is not robust to noise. In many real-world scenarios, we need a decentralized protocol to guide a swarming system (e.g., unmanned vehicles or nanorobots) to an ordered state in a prompt and noise-robust manner. Here, we find that introducing a simple adaptive rule based on the heading differences of neighboring particles in the Vicsek model can effectively speed up global alignment, mitigate the disturbance of noise to alignment, and maintain robust alignment under predation. This simple adaptive model of swarming systems could offer new insights into the prompt and flexible formation of animal groups and help us design better protocols to achieve fast and robust alignment in multi-agent systems.
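The SVM heading update the abstract builds on is simple enough to sketch directly: each particle adopts the mean heading of all neighbors within radius r (itself included), plus uniform noise in [−η/2, η/2]. This is the standard model only, not the paper's adaptive rule; parameter names are the usual conventions:

```python
import math

def vicsek_step(positions, headings, r=1.0, eta=0.0, rng=None):
    """One synchronous heading update of the standard Vicsek model.
    positions: list of (x, y); headings: list of angles in radians."""
    new_headings = []
    for xi, yi in positions:
        sx = sy = 0.0
        for (xj, yj), theta in zip(positions, headings):
            if (xj - xi) ** 2 + (yj - yi) ** 2 <= r * r:  # neighborhood (incl. self)
                sx += math.cos(theta)
                sy += math.sin(theta)
        noise = rng.uniform(-eta / 2, eta / 2) if rng else 0.0
        new_headings.append(math.atan2(sy, sx) + noise)
    return new_headings
```

The paper's adaptive rule would additionally modulate each particle's response by the heading differences it observes among its neighbors; with η = 0, two nearby particles heading at 0 and π/2 both converge toward the mean heading π/4 in a single step.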
