Decentralized MCTS via Learned Teammate Models

Author(s):  
Aleksander Czechowski ◽  
Frans A. Oliehoek

Decentralized online planning can be an attractive paradigm for cooperative multi-agent systems, due to improved scalability and robustness. A key difficulty of such an approach lies in making accurate predictions about the decisions of other agents. In this paper, we present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search, combined with models of teammates learned from previous episodic runs. By only allowing one agent to adapt its models at a time, under the assumption of ideal policy approximation, successive iterations of our method are guaranteed to improve joint policies, and eventually lead to convergence to a Nash equilibrium. We test the efficiency of the algorithm by performing experiments in several scenarios of the spatial task allocation environment introduced in [Claes et al., 2015]. We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators which exploit the spatial features of the problem, and that the proposed algorithm improves over the baseline planning performance for particularly challenging domain configurations.
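The convergence claim in this abstract rests on letting only one agent adapt at a time. The following is a minimal sketch of that idea in a much simpler setting than the paper's: a two-agent matrix game with a shared payoff, where each agent in turn switches to its best response. The payoff matrix and the function name `iterated_best_response` are hypothetical, chosen for illustration; the paper itself uses MCTS planners and learned neural teammate models, none of which appear here.

```python
import numpy as np

def iterated_best_response(payoff, start=(0, 0), max_rounds=100):
    """Alternate best responses in a two-agent common-payoff matrix game.

    payoff[a0, a1] is the shared team reward. Only one agent changes its
    action per step while the other agent's action is held fixed.
    """
    actions = list(start)
    history = [payoff[tuple(actions)]]
    for _ in range(max_rounds):
        changed = False
        for agent in (0, 1):
            # Shared reward for each candidate action of `agent`,
            # with the teammate's current action held fixed.
            if agent == 0:
                candidates = payoff[:, actions[1]]
            else:
                candidates = payoff[actions[0], :]
            best = int(np.argmax(candidates))
            # Switch only on a strict improvement of the shared reward.
            if candidates[best] > candidates[actions[agent]]:
                actions[agent] = best
                changed = True
                history.append(payoff[tuple(actions)])
        if not changed:
            break  # no agent can improve alone: a pure Nash equilibrium
    return tuple(actions), history

# Hypothetical shared payoff matrix (rows: agent 0, columns: agent 1).
payoff = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 3.0, 5.0]])
joint, history = iterated_best_response(payoff, start=(0, 1))
```

Because every switch strictly raises the shared reward and the game is finite, the loop must stop, and it stops at a joint action from which no single agent can improve. Starting from `(0, 0)` instead, the procedure stays at `(0, 0)` with reward 1, which shows that a Nash equilibrium reached this way need not be the best joint action.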

Author(s):  
Dino Borri ◽  
Domenico Camarda

Landscapes and townscapes have been studied by many disciplinary areas over time. This study addresses the cognitive and perceptual dimensions of environmental spacescapes in planning by human agents. In fact, because of their dynamic complexity, environmental spacescapes create challenges for the typical spatial behaviour of an agent perceiving and navigating in them. Therefore, environmental planning activities need to identify and manage the ‘fundamentals’ of spacescapes from the viewpoints of living single agents or multi-agent organizations, those to whom the planning effort is addressed. In this framework, the chapter deals with spatial ontologies in multi-agent systems. Some recent experiments are described and discussed here, highlighting spatial features of navigated environments from an environmental planning perspective.


2020 ◽  
Vol 34 (04) ◽  
pp. 4561-4568
Author(s):  
Jongmin Lee ◽  
Wonseok Jeon ◽  
Geon-Hyeong Kim ◽  
Kee-Eung Kim

Monte-Carlo Tree Search (MCTS) is the state-of-the-art online planning algorithm for large problems with discrete action spaces. However, many real-world problems involve continuous action spaces, where MCTS is not as effective as in discrete action spaces. This is mainly due to common practices such as coarse discretization of the entire action space and failure to exploit local smoothness. In this paper, we introduce Value-Gradient UCT (VG-UCT), which combines traditional MCTS with gradient-based optimization of action particles. VG-UCT simultaneously performs a global search via UCT with respect to the finitely sampled set of actions and performs a local improvement via action value gradients. In the experiments, we demonstrate that our approach outperforms existing MCTS methods and other strong baseline algorithms for continuous action spaces.
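The combination described in this abstract, a global search over a finite set of sampled actions plus a local gradient-based improvement of those actions, can be sketched at a single decision node. This is an illustrative simplification rather than the authors' VG-UCT: it has no tree, assumes a known, noise-free, differentiable action value, and uses an analytic gradient. The names `q_value`, `q_grad`, and `ucb_with_gradient_steps`, and the constants `c` and `lr`, are all assumptions made for the example.

```python
import math

def q_value(a):
    # Hypothetical smooth action value with its maximum at a = 0.7.
    return -(a - 0.7) ** 2

def q_grad(a):
    # Analytic derivative of q_value with respect to the action.
    return -2.0 * (a - 0.7)

def ucb_with_gradient_steps(particles, iterations=200, c=0.5, lr=0.1):
    """UCB1 over a finite set of action particles, each refined by
    gradient ascent on the action value whenever it is selected."""
    particles = list(particles)
    counts = [0] * len(particles)
    means = [0.0] * len(particles)
    for t in range(1, iterations + 1):
        # Global step: UCB1 score over the finite particle set.
        def score(i):
            if counts[i] == 0:
                return math.inf  # try every particle at least once
            return means[i] + c * math.sqrt(math.log(t) / counts[i])
        i = max(range(len(particles)), key=score)
        # Local step: move the chosen particle uphill along dQ/da.
        particles[i] += lr * q_grad(particles[i])
        reward = q_value(particles[i])
        counts[i] += 1
        # Running mean of rewards seen at this (moving) particle.
        means[i] += (reward - means[i]) / counts[i]
    most_visited = max(range(len(particles)), key=lambda k: counts[k])
    return particles[most_visited], particles

best_action, final_particles = ucb_with_gradient_steps([-1.0, 0.0, 1.0])
```

A coarse fixed grid of three actions would never contain 0.7, whereas here the particles drift toward it as they are visited. One design caveat of this sketch: the running mean mixes rewards from earlier particle positions, so it lags behind the particle's current value.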


2011 ◽  
Vol 175 (2) ◽  
pp. 487-511 ◽  
Author(s):  
Feng Wu ◽  
Shlomo Zilberstein ◽  
Xiaoping Chen

Author(s):  
Minglong Li ◽  
Wenjing Yang ◽  
Zhongxuan Cai ◽  
Shaowu Yang ◽  
Ji Wang

The performance of decentralized multi-agent systems tends to benefit from information sharing and its effective utilization. However, too much or unnecessary sharing may hinder the performance due to the delay, instability and additional overhead of communications. To achieve satisfactory coordination performance, one would prefer the cost of communication to be as low as possible. In this paper, we propose an approach for improving the sharing utilization by integrating information sharing with prediction in decentralized planning. We present a novel planning algorithm by combining decision sharing and prediction based on decentralized Monte Carlo Tree Search called Dec-MCTS-SP. Each agent grows a search tree guided by the rewards calculated by the joint actions, which can not only be sampled from the shared probability distributions over action sequences, but also be predicted by a sufficiently accurate and computationally cheap heuristics-based method. Besides, several policies including sparse and discounted UCT and DIY-bonus are leveraged for performance improvement. We have implemented Dec-MCTS-SP in the case study on multi-agent information gathering under threat and uncertainty, which is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). The factored belief vectors are integrated into Dec-MCTS-SP to handle the uncertainty. Compared with the random and auction-based algorithms and with Dec-MCTS, the evaluation shows that Dec-MCTS-SP can reduce communication cost significantly while still achieving surprisingly higher coordination performance.
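The abstract says a teammate's part of a joint action can either be sampled from a shared probability distribution over action sequences or be predicted by a cheap heuristic. A minimal sketch of that choice follows, assuming the simplest possible rule: sample when a distribution has been received, otherwise predict. The function `teammate_plan`, its arguments, and the fallback rule are hypothetical; the abstract does not specify how Dec-MCTS-SP decides between the two sources.

```python
import random

def teammate_plan(shared_dist, heuristic, rng):
    """Return one action sequence for a teammate.

    shared_dist: dict mapping action sequences (tuples) to probabilities,
                 or None when nothing has been communicated.
    heuristic:   zero-argument callable giving a cheap predicted sequence.
    rng:         a random.Random instance, passed in for reproducibility.
    """
    if shared_dist:
        # Sample from the distribution the teammate shared.
        plans = list(shared_dist)
        weights = [shared_dist[p] for p in plans]
        return rng.choices(plans, weights=weights, k=1)[0]
    # No message available: fall back to the heuristic prediction.
    return heuristic()

rng = random.Random(0)
shared = {("north", "north"): 0.8, ("east", "north"): 0.2}
sampled = teammate_plan(shared, lambda: ("stay", "stay"), rng)
predicted = teammate_plan(None, lambda: ("stay", "stay"), rng)
```

In this sketch, communication is only needed to populate `shared_dist`, so every call that takes the heuristic branch is a message that did not have to be sent.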

