Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Author(s): Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou

This paper investigates model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretical analysis of an upper bound on the return discrepancy. To reduce this upper bound, and thereby keep sample complexity low throughout learning, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its own multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with adaptive opponent-wise rollouts. We further prove the theoretical convergence of AORPO under reasonable assumptions. Experiments on competitive and cooperative tasks demonstrate that AORPO achieves improved sample efficiency with asymptotic performance comparable to that of the baseline MARL methods.
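To make the rollout scheme concrete, here is a minimal Python sketch of an opponent-wise model rollout in the spirit of the abstract. Everything in it is assumed for illustration: the `step`, `sample`, and `last_real_action` interfaces are hypothetical placeholders, and falling back to an opponent's last observed real action once its model horizon expires is a simplification, not necessarily the authors' exact mechanism.

```python
def adaptive_rollout(dynamics_model, opponent_models, own_policy,
                     start_state, rollout_lengths):
    """Sketch: roll out a learned multi-agent model from `start_state`.

    `rollout_lengths[j]` is a per-opponent horizon: opponents whose
    learned models are less accurate get shorter horizons, so their
    model error compounds over fewer simulated steps (the "adaptive,"
    opponent-wise part of the scheme).
    """
    state = start_state
    trajectory = []
    for t in range(max(rollout_lengths)):
        own_action = own_policy.sample(state)
        # Once an opponent's horizon expires, stop trusting its model;
        # reusing its last observed real action is an assumed fallback.
        opp_actions = [
            model.sample(state) if t < rollout_lengths[j]
            else model.last_real_action
            for j, model in enumerate(opponent_models)
        ]
        joint_action = (own_action, *opp_actions)
        next_state, reward = dynamics_model.step(state, joint_action)
        trajectory.append((state, joint_action, reward, next_state))
        state = next_state
    return trajectory
```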

Author(s): Wenzhen Huang, Junge Zhang, Kaiqi Huang

Model-based reinforcement learning (RL) methods attempt to learn a dynamics model that simulates the real environment and to use that model to make better decisions. However, the learned simulator inevitably contains model error, which can disturb decision making and reduce performance. We propose a bootstrapped model-based RL method that bootstraps the modules at each depth of the planning tree. This method quantifies the uncertainty of the environment model on different state-action pairs and leads the agent to explore pairs with higher uncertainty, reducing potential model errors. Moreover, we sample target values from their bootstrap distribution to connect the uncertainties at the current and subsequent time steps, and we introduce a prior mechanism to improve exploration efficiency. Experimental results demonstrate that our method efficiently decreases model error and outperforms TreeQN and other state-of-the-art methods on multiple Atari games.
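The bootstrapping-with-priors idea can be sketched in a tabular setting, which is far simpler than the paper's tree-based planner. The sketch below assumes an ensemble of K Q-tables, each with a fixed additive random prior (in the spirit of randomized prior functions); ensemble disagreement serves as the uncertainty signal, and each target is sampled from a randomly chosen ensemble member so that uncertainty propagates across time steps. The class name, tabular setting, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class BootstrappedQEnsemble:
    """Tabular sketch of bootstrapped Q-learning with additive priors."""

    def __init__(self, n_states, n_actions, k=10, prior_scale=1.0):
        self.q = np.zeros((k, n_states, n_actions))
        # Fixed random priors keep members diverse early in training.
        self.prior = prior_scale * rng.normal(size=(k, n_states, n_actions))
        self.k = k

    def value(self, member, s, a):
        return self.q[member, s, a] + self.prior[member, s, a]

    def uncertainty(self, s, a):
        # Ensemble standard deviation as a proxy for value uncertainty;
        # exploration can be directed toward high-uncertainty pairs.
        vals = self.q[:, s, a] + self.prior[:, s, a]
        return vals.std()

    def update(self, s, a, r, s_next, gamma=0.99, lr=0.1):
        n_actions = self.q.shape[2]
        for m in range(self.k):
            # Sample the target from the bootstrap distribution: each
            # member bootstraps from a randomly chosen member, linking
            # uncertainty at the current and subsequent time steps.
            j = rng.integers(self.k)
            target = r + gamma * max(
                self.value(j, s_next, b) for b in range(n_actions))
            self.q[m, s, a] += lr * (target - self.q[m, s, a])
```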


2020, Vol. 34 (04), pp. 6941-6948
Author(s): Qi Zhou, Houqiang Li, Jie Wang

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to reach the same asymptotic performance as model-free methods. In this paper, we propose Policy Optimization with Model-Based Uncertainty (POMBU), a novel model-based approach that effectively improves asymptotic performance by using the uncertainty in Q-values. We derive an upper bound on this uncertainty, based on which the uncertainty can be approximated accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability, which significantly alleviates the overfitting of the policy to inaccurate models. Experiments show that POMBU outperforms existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance, and demonstrate its excellent robustness compared with previous model-based approaches.
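A rough sketch of the conservative-update idea: penalize ensemble-mean Q-value estimates by their ensemble standard deviation, so the policy optimizes a lower confidence bound rather than exploiting model error. This is a generic stand-in, not POMBU's actual uncertainty upper bound or surrogate objective; `q_ensemble` and `beta` are assumptions made for illustration.

```python
import numpy as np

def conservative_values(q_ensemble, beta=1.0):
    """Uncertainty-penalized Q-values for a conservative policy update.

    `q_ensemble` has shape (K, batch): K ensemble estimates of Q(s, a)
    for each sampled state-action pair. Subtracting `beta` times the
    ensemble standard deviation from the mean yields a lower confidence
    bound, encouraging improvement with high probability instead of
    overfitting to inaccurate models.
    """
    mean = q_ensemble.mean(axis=0)
    std = q_ensemble.std(axis=0)
    return mean - beta * std

# Usage sketch: feed the penalized values into any policy-gradient
# loss, e.g. loss = -(log_probs * conservative_values(q_ens)).mean()
```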


2009, Vol. 29 (2), pp. 412-415
Author(s): Qiang LU, Ming CHEN, Zhi-guang WANG

2012, Vol. 457-458, pp. 921-926
Author(s): Jin Zhi Zhao, Yuan Tao Liu, Hui Ying Zhao

A framework for building an EDM collaborative manufacturing system using multi-agent technology is proposed, supporting organizations characterized by physically distributed, enterprise-wide, heterogeneous intelligent manufacturing systems over the Internet. According to the characteristics of the agile EDM collaborative manufacturing system (AEDMCMS), agent technology is combined with Petri nets to analyze the model. The definition of the basic Petri net is extended, and an Agent-oriented Petri net (APN) is proposed. The AEDMCMS is then turned into a Petri net model that is suitable for the analysis and optimization of manufacturing processes.
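Since the abstract does not fully specify the APN extension, the following Python sketch only illustrates the basic idea of attaching agents to Petri-net transitions: a standard place/transition net in which each transition records the agent responsible for firing it, so each step of a manufacturing process can be traced to an agent. The class and the toy dispatch example are hypothetical.

```python
class AgentPetriNet:
    """Minimal place/transition net with agent-labelled transitions."""

    def __init__(self):
        self.marking = {}      # place name -> token count
        self.transitions = {}  # name -> (agent, input places, output places)

    def add_place(self, place, tokens=0):
        self.marking[place] = tokens

    def add_transition(self, name, agent, inputs, outputs):
        self.transitions[name] = (agent, inputs, outputs)

    def enabled(self, name):
        # A transition is enabled when every input place holds a token.
        _, inputs, _ = self.transitions[name]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, name):
        agent, inputs, outputs = self.transitions[name]
        if not self.enabled(name):
            raise RuntimeError(f"{name} is not enabled")
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1
        return agent  # which agent performed this step

# Toy example: a design agent hands a task to a machining stage.
net = AgentPetriNet()
net.add_place("task_ready", tokens=1)
net.add_place("machining")
net.add_transition("dispatch", agent="design_agent",
                   inputs=["task_ready"], outputs=["machining"])
net.fire("dispatch")
```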


2021, Vol. 54 (5), pp. 19-24
Author(s): Tyler Westenbroek, Ayush Agrawal, Fernando Castañeda, S. Shankar Sastry, Koushil Sreenath
