Temporal sampling annealing schemes for receding horizon multi-agent planning

2021, pp. 103823. Author(s): Aaron Ma, Mike Ouimet, Jorge Cortés
2005, Vol. 36(4), pp. 266-272. Author(s): Xu Rui, Cui Pingyuan, Xu Xiaofei

2006, pp. 301-325. Author(s): Michael Bowling, Rune Jensen, Manuela Veloso

2018, Vol. 32(6), pp. 779-821. Author(s): Shlomi Maliah, Guy Shani, Roni Stern

Author(s): Yanlin Han, Piotr Gmytrasiewicz

This paper introduces the IPOMDP-net, a neural network architecture for multi-agent planning under partial observability. It embeds an interactive partially observable Markov decision process (I-POMDP) model, together with a QMDP planning algorithm that solves the model, in a single neural network. The IPOMDP-net is fully differentiable and allows for end-to-end training. In the learning phase, we train an IPOMDP-net on various fixed and randomly generated environments in a reinforcement learning setting, assuming observable reinforcements and unknown (randomly initialized) model functions. In the planning phase, we test the trained network on new, unseen variants of the environments, using the learned model to plan without reinforcements. Empirical results show that our model-based IPOMDP-net outperforms a state-of-the-art model-free network and generalizes better to larger, unseen environments. Our approach provides a general neural computing architecture for multi-agent planning using I-POMDPs. It suggests that, in a multi-agent setting, having a model of other agents benefits decision-making, resulting in a policy of higher quality and better generalizability.
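
The QMDP component referenced above can be illustrated with a small tabular sketch in Python with NumPy. The tensor shapes, function names, and toy problem below are assumptions of this sketch, not the IPOMDP-net's actual differentiable layers, which additionally maintain interactive beliefs over models of other agents.

import numpy as np

def qmdp_plan(T, R, gamma=0.95, iters=50):
    # T: transition tensor, shape (A, S, S), with T[a, s, s2] = P(s2 | s, a)
    # R: reward matrix, shape (A, S)
    # Value iteration on the underlying fully observable MDP, which is the
    # approximation QMDP relies on; returns Q with shape (A, S).
    A, S, _ = T.shape
    Q = np.zeros((A, S))
    for _ in range(iters):
        V = Q.max(axis=0)
        Q = R + gamma * np.einsum('asn,n->as', T, V)
    return Q

def qmdp_action(Q, belief):
    # QMDP action selection: argmax_a sum_s b(s) * Q[a, s].
    return int(np.argmax(Q @ belief))

rng = np.random.default_rng(0)
num_actions, num_states = 2, 3
T = rng.random((num_actions, num_states, num_states))
T /= T.sum(axis=2, keepdims=True)               # normalize into valid transition probabilities
R = rng.random((num_actions, num_states))
Q = qmdp_plan(T, R)
belief = np.full(num_states, 1.0 / num_states)  # uniform belief over hidden states
print("greedy QMDP action:", qmdp_action(Q, belief))

In the IPOMDP-net, computations of this kind are realized as differentiable network layers with learned model parameters, which is what allows the planning step to be trained end-to-end from reinforcements.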


2016, Vol. 24(6), pp. 446-463. Author(s): Mansoor Shaukat, Mandar Chitre

In this paper, the role of adaptive group cohesion in a cooperative multi-agent source localization problem is investigated. A distributed source localization algorithm is presented for a homogeneous team of simple agents. An agent uses a single sensor to sense the gradient and two sensors to sense its neighbors. The algorithm combines individualistic and social behaviors, where the individualistic behavior is as simple as an agent keeping its previous heading and is not, on its own, sufficient to localize the source. Source localization is instead achieved as an emergent property of the agents' adaptive interactions with their neighbors and the environment. Since a single agent is incapable of localizing the source, maintaining team connectivity at all times is crucial. Two simple temporal sampling behaviors, intensity-based adaptation and connectivity-based adaptation, ensure an efficient localization strategy with minimal agent breakaways. The agent behaviors are simultaneously optimized using a two-phase evolutionary optimization process. The optimized behaviors are estimated with analytical models, and the resulting collective behavior is validated for robustness to the agents' sensor and actuator noise, strong multi-path interference due to environmental variability, sensitivity to the initialization distance, and loss of the source signal.
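
A minimal behavioral sketch may help make the description concrete. The Python function below blends an individualistic behavior (keep the previous heading) with a social cohesion behavior (steer toward sensed neighbors) and adapts the temporal sampling period to the locally sensed intensity. The parameter names, blending weight, and adaptation rule are illustrative assumptions of this sketch, not the evolutionarily optimized behaviors reported in the paper.

import numpy as np

def agent_step(pos, heading, neighbors, intensity, prev_intensity,
               speed=1.0, w_social=0.5, dt_min=0.1, dt_max=1.0):
    # pos: agent position, shape (2,); heading: unit heading vector, shape (2,)
    # neighbors: positions of sensed neighbors, shape (N, 2)
    # intensity, prev_intensity: scalar samples of the source signal
    # Returns (new_pos, new_heading, dt), where dt is the adapted sampling period.

    # Individualistic behavior: by default, keep the previous heading.
    desired = heading.copy()

    # Social behavior: steer toward the neighbor centroid to stay connected.
    if len(neighbors) > 0:
        to_centroid = np.asarray(neighbors).mean(axis=0) - pos
        norm = np.linalg.norm(to_centroid)
        if norm > 1e-9:
            desired = (1.0 - w_social) * heading + w_social * to_centroid / norm
    desired = desired / (np.linalg.norm(desired) + 1e-12)

    # Intensity-based adaptation of the sampling period: sample more often
    # when the sensed signal is improving, less often when it is not.
    dt = dt_min if intensity > prev_intensity else dt_max

    return pos + speed * dt * desired, desired, dt

A connectivity-based adaptation could, in the same spirit, shorten the sampling period whenever the number of sensed neighbors drops, so that an agent reacts before it breaks away from the team.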

