Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Author(s): Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou

This paper investigates model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretical analysis of an upper bound on the return discrepancy. To reduce this upper bound, and thereby keep sample complexity low throughout learning, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its own multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with adaptive opponent-wise rollouts. We further prove the theoretical convergence of AORPO under reasonable assumptions. Experiments on competitive and cooperative tasks demonstrate that AORPO achieves improved sample efficiency with asymptotic performance comparable to that of the baseline MARL methods.
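To make the rollout scheme concrete, here is a minimal Python sketch of an opponent-wise model rollout in the spirit of the abstract. Everything in it is assumed for illustration: the `step`, `sample`, and `last_real_action` interfaces are hypothetical placeholders, and falling back to an opponent's last observed real action once its model horizon expires is a simplification, not necessarily the authors' exact mechanism.

```python
def adaptive_rollout(dynamics_model, opponent_models, own_policy,
                     start_state, rollout_lengths):
    """Sketch: roll out a learned multi-agent model from `start_state`.

    `rollout_lengths[j]` is a per-opponent horizon: opponents whose
    learned models are less accurate get shorter horizons, so their
    model error compounds over fewer simulated steps (the "adaptive,"
    opponent-wise part of the scheme).
    """
    state = start_state
    trajectory = []
    for t in range(max(rollout_lengths)):
        own_action = own_policy.sample(state)
        # Once an opponent's horizon expires, stop trusting its model;
        # reusing its last observed real action is an assumed fallback.
        opp_actions = [
            model.sample(state) if t < rollout_lengths[j]
            else model.last_real_action
            for j, model in enumerate(opponent_models)
        ]
        joint_action = (own_action, *opp_actions)
        next_state, reward = dynamics_model.step(state, joint_action)
        trajectory.append((state, joint_action, reward, next_state))
        state = next_state
    return trajectory
```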

Author(s): Wenzhen Huang, Junge Zhang, Kaiqi Huang

Model-based reinforcement learning (RL) methods attempt to learn a dynamics model that simulates the real environment and to use that model to make better decisions. However, the learned simulator inevitably contains model error, which can disturb decision making and reduce performance. We propose a bootstrapped model-based RL method that bootstraps the modules at each depth of the planning tree. This method quantifies the uncertainty of the environment model on different state-action pairs and leads the agent to explore pairs with higher uncertainty, reducing potential model errors. Moreover, we sample target values from their bootstrap distribution to connect the uncertainties at the current and subsequent time steps, and we introduce a prior mechanism to improve exploration efficiency. Experimental results demonstrate that our method efficiently decreases model error and outperforms TreeQN and other state-of-the-art methods on multiple Atari games.
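The bootstrapping-with-priors idea can be sketched in a tabular setting, which is far simpler than the paper's tree-based planner. The sketch below assumes an ensemble of K Q-tables, each with a fixed additive random prior (in the spirit of randomized prior functions); ensemble disagreement serves as the uncertainty signal, and each target is sampled from a randomly chosen ensemble member so that uncertainty propagates across time steps. The class name, tabular setting, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class BootstrappedQEnsemble:
    """Tabular sketch of bootstrapped Q-learning with additive priors."""

    def __init__(self, n_states, n_actions, k=10, prior_scale=1.0):
        self.q = np.zeros((k, n_states, n_actions))
        # Fixed random priors keep members diverse early in training.
        self.prior = prior_scale * rng.normal(size=(k, n_states, n_actions))
        self.k = k

    def value(self, member, s, a):
        return self.q[member, s, a] + self.prior[member, s, a]

    def uncertainty(self, s, a):
        # Ensemble standard deviation as a proxy for value uncertainty;
        # exploration can be directed toward high-uncertainty pairs.
        vals = self.q[:, s, a] + self.prior[:, s, a]
        return vals.std()

    def update(self, s, a, r, s_next, gamma=0.99, lr=0.1):
        n_actions = self.q.shape[2]
        for m in range(self.k):
            # Sample the target from the bootstrap distribution: each
            # member bootstraps from a randomly chosen member, linking
            # uncertainty at the current and subsequent time steps.
            j = rng.integers(self.k)
            target = r + gamma * max(
                self.value(j, s_next, b) for b in range(n_actions))
            self.q[m, s, a] += lr * (target - self.q[m, s, a])
```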


2020, Vol. 34 (04), pp. 6941-6948
Author(s): Qi Zhou, Houqiang Li, Jie Wang

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to reach the same asymptotic performance as model-free methods. In this paper, we propose Policy Optimization with Model-Based Uncertainty (POMBU), a novel model-based approach that effectively improves asymptotic performance by using the uncertainty in Q-values. We derive an upper bound on this uncertainty, based on which the uncertainty can be approximated accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability, which significantly alleviates the overfitting of the policy to inaccurate models. Experiments show that POMBU outperforms existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance, and demonstrate its excellent robustness compared with previous model-based approaches.
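A rough sketch of the conservative-update idea: penalize ensemble-mean Q-value estimates by their ensemble standard deviation, so the policy optimizes a lower confidence bound rather than exploiting model error. This is a generic stand-in, not POMBU's actual uncertainty upper bound or surrogate objective; `q_ensemble` and `beta` are assumptions made for illustration.

```python
import numpy as np

def conservative_values(q_ensemble, beta=1.0):
    """Uncertainty-penalized Q-values for a conservative policy update.

    `q_ensemble` has shape (K, batch): K ensemble estimates of Q(s, a)
    for each sampled state-action pair. Subtracting `beta` times the
    ensemble standard deviation from the mean yields a lower confidence
    bound, encouraging improvement with high probability instead of
    overfitting to inaccurate models.
    """
    mean = q_ensemble.mean(axis=0)
    std = q_ensemble.std(axis=0)
    return mean - beta * std

# Usage sketch: feed the penalized values into any policy-gradient
# loss, e.g. loss = -(log_probs * conservative_values(q_ens)).mean()
```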


2009, Vol. 29 (2), pp. 412-415
Author(s): Qiang LU, Ming CHEN, Zhi-guang WANG

2012, Vol. 457-458, pp. 921-926
Author(s): Jin Zhi Zhao, Yuan Tao Liu, Hui Ying Zhao

A framework for building an EDM collaborative manufacturing system using multi-agent technology is proposed, supporting organizations characterized by physically distributed, enterprise-wide, heterogeneous intelligent manufacturing systems over the Internet. According to the characteristics of the agile EDM collaborative manufacturing system (AEDMCMS), agent technology is combined with Petri nets to analyze the model. The definition of the basic Petri net is extended, and an Agent-oriented Petri net (APN) is proposed. The AEDMCMS is then turned into a Petri net model that is suitable for the analysis and optimization of manufacturing processes.
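Since the abstract does not fully specify the APN extension, the following Python sketch only illustrates the basic idea of attaching agents to Petri-net transitions: a standard place/transition net in which each transition records the agent responsible for firing it, so each step of a manufacturing process can be traced to an agent. The class and the toy dispatch example are hypothetical.

```python
class AgentPetriNet:
    """Minimal place/transition net with agent-labelled transitions."""

    def __init__(self):
        self.marking = {}      # place name -> token count
        self.transitions = {}  # name -> (agent, input places, output places)

    def add_place(self, place, tokens=0):
        self.marking[place] = tokens

    def add_transition(self, name, agent, inputs, outputs):
        self.transitions[name] = (agent, inputs, outputs)

    def enabled(self, name):
        # A transition is enabled when every input place holds a token.
        _, inputs, _ = self.transitions[name]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, name):
        agent, inputs, outputs = self.transitions[name]
        if not self.enabled(name):
            raise RuntimeError(f"{name} is not enabled")
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1
        return agent  # which agent performed this step

# Toy example: a design agent hands a task to a machining stage.
net = AgentPetriNet()
net.add_place("task_ready", tokens=1)
net.add_place("machining")
net.add_transition("dispatch", agent="design_agent",
                   inputs=["task_ready"], outputs=["machining"])
net.fire("dispatch")
```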


2021, Vol. 54 (5), pp. 19-24
Author(s): Tyler Westenbroek, Ayush Agrawal, Fernando Castañeda, S. Shankar Sastry, Koushil Sreenath
