Decentralized MCTS via Learned Teammate Models

Decentralized online planning can be an attractive paradigm for cooperative multi-agent systems, due to improved scalability and robustness. A key difficulty of such an approach lies in making accurate predictions about the decisions of other agents. In this paper, we present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search, combined with models of teammates learned from previous episodic runs. By only allowing one agent to adapt its models at a time, under the assumption of ideal policy approximation, successive iterations of our method are guaranteed to improve joint policies, and eventually lead to convergence to a Nash equilibrium. We test the efficiency of the algorithm by performing experiments in several scenarios of the spatial task allocation environment introduced in [Claes et al., 2015]. We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators which exploit the spatial features of the problem, and that the proposed algorithm improves over the baseline planning performance for particularly challenging domain configurations.
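The "one agent adapts at a time" argument can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it replaces MCTS and learned teammate models with exact best responses in a two-agent matrix game with a shared payoff, and all names (`alternating_best_response`, `is_pure_nash`, the payoff table) are hypothetical. It only shows why sequential adaptation cannot lower the joint reward and why it stops at a Nash equilibrium.

```python
from typing import List, Tuple


def alternating_best_response(
    payoff: List[List[float]],
    start: Tuple[int, int] = (0, 0),
    max_rounds: int = 100,
) -> Tuple[Tuple[int, int], List[float]]:
    """Toy illustration: payoff[a0][a1] is the reward shared by both agents."""
    a = list(start)
    history = [payoff[a[0]][a[1]]]
    for _ in range(max_rounds):
        changed = False
        for agent in (0, 1):
            # Only this agent adapts; the teammate's action is held fixed.
            if agent == 0:
                candidates = [payoff[x][a[1]] for x in range(len(payoff))]
            else:
                candidates = list(payoff[a[0]])
            best = max(range(len(candidates)), key=candidates.__getitem__)
            # Switch only on strict improvement, so the shared reward never drops.
            if candidates[best] > candidates[a[agent]]:
                a[agent] = best
                changed = True
            history.append(payoff[a[0]][a[1]])
        if not changed:
            break  # no agent can improve unilaterally
    return (a[0], a[1]), history


def is_pure_nash(payoff: List[List[float]], joint: Tuple[int, int]) -> bool:
    """True if neither agent gains by deviating alone from `joint`."""
    a0, a1 = joint
    v = payoff[a0][a1]
    row_ok = all(payoff[x][a1] <= v for x in range(len(payoff)))
    col_ok = all(payoff[a0][y] <= v for y in range(len(payoff[0])))
    return row_ok and col_ok
```

With `payoff = [[1, 3], [2, 0]]` and the default start, the procedure stops at the joint action `(1, 0)`, which is a Nash equilibrium but not the best joint action `(0, 1)`. This matches the abstract's claim of convergence to a Nash equilibrium rather than to a global optimum.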


Online planning for multi-agent systems with consensus protocol

Proceedings of the 33rd Chinese Control Conference ◽

10.1109/chicc.2014.6896786 ◽

2014 ◽

Author(s):

Wenxu Zhang ◽

Xiaolong Chen ◽

Lei Ma

Keyword(s):

Multi Agent Systems ◽

Consensus Protocol ◽

Agent Systems ◽

Online Planning ◽

Multi Agent


Spatial Ontologies in Multi-Agent Environmental Planning

Technologies for Supporting Reasoning Communities and Collaborative Decision Making ◽

10.4018/978-1-60960-091-4.ch015 ◽

2011 ◽

pp. 272-295 ◽

Cited By ~ 3

Author(s):

Dino Borri ◽

Domenico Camarda

Keyword(s):

Environmental Planning ◽

Spatial Behaviour ◽

Multi Agent Systems ◽

Dynamic Complexity ◽

Agent Systems ◽

Spatial Features ◽

Planning Effort ◽

Multi Agent ◽

Over Time ◽

Living Single

Landscapes and townscapes have been studied by many disciplinary areas over time. This study addresses the cognitive and perceptual dimensions of environmental spacescapes in planning by human agents. In fact, because of their dynamic complexity, environmental spacescapes create challenges for the typical spatial behaviour of an agent perceiving and navigating in them. Therefore, environmental planning activities need to identify and manage the ‘fundamentals’ of spacescapes from the viewpoints of living single agents or multi-agent organizations, those to whom the planning effort is addressed. In this framework, the chapter deals with spatial ontologies in multi-agent systems. Some recent experiments are described and discussed here, highlighting spatial features of navigated environments from an environmental planning perspective.


Monte-Carlo Tree Search in Continuous Action Spaces with Value Gradients

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5885 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4561-4568

Author(s):

Jongmin Lee ◽

Wonseok Jeon ◽

Geon-Hyeong Kim ◽

Kee-Eung Kim

Keyword(s):

Monte Carlo ◽

Tree Search ◽

Continuous Action ◽

Monte Carlo Tree Search ◽

Online Planning ◽

Local Improvement ◽

Discrete Action ◽

Planning Algorithm ◽

Gradient Based ◽

Action Spaces

Monte-Carlo Tree Search (MCTS) is the state-of-the-art online planning algorithm for large problems with discrete action spaces. However, many real-world problems involve continuous action spaces, where MCTS is not as effective as in discrete action spaces. This is mainly due to common practices such as coarse discretization of the entire action space and failure to exploit local smoothness. In this paper, we introduce Value-Gradient UCT (VG-UCT), which combines traditional MCTS with gradient-based optimization of action particles. VG-UCT simultaneously performs a global search via UCT with respect to the finitely sampled set of actions and performs a local improvement via action value gradients. In the experiments, we demonstrate that our approach outperforms existing MCTS methods and other strong baseline algorithms for continuous action spaces.
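The combination of global selection and local improvement can be sketched in a heavily simplified form. The code below is not VG-UCT itself: it assumes a single decision node (no tree), a one-dimensional action, a deterministic reward, and an analytically known gradient `dq`, whereas the abstract only says that action value gradients are used. The function names and parameters are hypothetical.

```python
import math
from typing import Callable, List


def ucb1_index(mean: float, n: int, total: int, c: float) -> float:
    """UCB1 score of one action particle; unvisited particles go first."""
    if n == 0:
        return math.inf
    return mean + c * math.sqrt(math.log(total) / n)


def particle_search(
    q: Callable[[float], float],
    dq: Callable[[float], float],
    particles: List[float],
    iters: int = 200,
    c: float = 1.0,
    lr: float = 0.1,
) -> float:
    """Single-node sketch: UCB1 over a finite particle set plus gradient steps."""
    actions = list(particles)
    counts = [0] * len(actions)
    means = [0.0] * len(actions)
    for t in range(1, iters + 1):
        # Global search: pick the particle with the highest UCB1 score.
        i = max(
            range(len(actions)),
            key=lambda k: ucb1_index(means[k], counts[k], t, c),
        )
        reward = q(actions[i])
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # running average
        # Local improvement: move the chosen particle along the value gradient.
        actions[i] += lr * dq(actions[i])
    # Recommend the most visited particle, a common MCTS convention.
    best = max(range(len(actions)), key=lambda k: counts[k])
    return actions[best]
```

For example, with `q = lambda a: -(a - 0.7) ** 2`, `dq = lambda a: -2 * (a - 0.7)`, and starting particles `[-1.0, 0.0, 1.0]`, the recommended action ends up close to the maximizer 0.7 even though no initial particle was placed there, which is the benefit over a fixed coarse discretization.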


Online planning for multi-agent systems with bounded communication

Artificial Intelligence ◽

10.1016/j.artint.2010.09.008 ◽

2011 ◽

Vol 175 (2) ◽

pp. 487-511 ◽

Cited By ~ 46

Author(s):

Feng Wu ◽

Shlomo Zilberstein ◽

Xiaoping Chen

Keyword(s):

Multi Agent Systems ◽

Agent Systems ◽

Online Planning ◽

Multi Agent


Integrating Decision Sharing with Prediction in Decentralized Planning for Multi-Agent Coordination under Uncertainty

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/64 ◽

2019 ◽

Author(s):

Minglong Li ◽

Wenjing Yang ◽

Zhongxuan Cai ◽

Shaowu Yang ◽

Ji Wang

Keyword(s):

Information Sharing ◽

Probability Distributions ◽

Search Tree ◽

Multi Agent Systems ◽

Monte Carlo Tree Search ◽

Decentralized Planning ◽

Markov Decision ◽

Planning Algorithm ◽

Multi Agent ◽

The Cost

The performance of decentralized multi-agent systems tends to benefit from information sharing and its effective utilization. However, excessive or unnecessary sharing may hinder performance due to the delay, instability and additional overhead of communications. To reach satisfactory coordination performance, one would prefer the cost of communications to be as low as possible. In this paper, we propose an approach for improving the sharing utilization by integrating information sharing with prediction in decentralized planning. We present a novel planning algorithm, called Dec-MCTS-SP, that combines decision sharing and prediction based on decentralized Monte Carlo Tree Search. Each agent grows a search tree guided by the rewards calculated from the joint actions, which can not only be sampled from the shared probability distributions over action sequences, but also be predicted by a sufficiently accurate and computationally cheap heuristics-based method. In addition, several policies including sparse and discounted UCT and DIY-bonus are leveraged for performance improvement. We have implemented Dec-MCTS-SP in a case study on multi-agent information gathering under threat and uncertainty, which is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). The factored belief vectors are integrated into Dec-MCTS-SP to handle the uncertainty. In comparison with the random, auction-based algorithm and Dec-MCTS, the evaluation shows that Dec-MCTS-SP can reduce communication cost significantly while still achieving surprisingly higher coordination performance.
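The idea of scoring an agent's own plan against teammates' shared probability distributions over action sequences can be sketched as a plain Monte Carlo estimate. This is an illustrative assumption, not the Dec-MCTS-SP implementation: it omits the search tree, the heuristic prediction step, and the belief vectors, and the names `sample_plan`, `estimate_value`, and `distinct_targets` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Plan = Tuple[str, ...]  # an action sequence, e.g. the targets an agent visits


def sample_plan(dist: Dict[Plan, float], rng: random.Random) -> Plan:
    """Draw one teammate plan from its shared distribution."""
    plans = list(dist)
    weights = [dist[p] for p in plans]
    return rng.choices(plans, weights=weights, k=1)[0]


def estimate_value(
    own_plan: Plan,
    teammate_dists: Sequence[Dict[Plan, float]],
    joint_reward: Callable[[Plan, List[Plan]], float],
    samples: int = 1000,
    seed: int = 0,
) -> float:
    """Average joint reward of `own_plan` over sampled teammate plans."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        others = [sample_plan(d, rng) for d in teammate_dists]
        total += joint_reward(own_plan, others)
    return total / samples


def distinct_targets(own: Plan, others: List[Plan]) -> float:
    """Toy team reward: number of distinct targets covered by all agents."""
    covered = set(own)
    for plan in others:
        covered.update(plan)
    return float(len(covered))
```

If a teammate's shared distribution puts all its mass on visiting target `"A"`, then `estimate_value(("B",), [{("A",): 1.0}], distinct_targets)` is 2.0 while the plan `("A",)` scores 1.0, so the agent prefers the plan that does not duplicate its teammate's work.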
