Implementing an Online Scheduling Approach for Production with Multi Agent Proximal Policy Optimization (MAPPO)

Electric Vehicle (EV) sharing systems have recently experienced unprecedented growth across the world. One of the key challenges in their operation is vehicle rebalancing, i.e., repositioning the EVs across stations to better satisfy future user demand. This is particularly challenging in the shared EV context, because i) the range of EVs is limited while charging time is substantial, which constrains the rebalancing options; and ii) as a new mobility trend, most of the current EV sharing systems are still continuously expanding their station networks, i.e., the targets for rebalancing can change over time. To tackle these challenges, in this paper we model the rebalancing task as a Multi-Agent Reinforcement Learning (MARL) problem, which directly takes the range and charging properties of the EVs into account. We propose a novel approach of policy optimization with action cascading, which isolates the non-stationarity locally, and use two connected networks to solve the formulated MARL. We evaluate the proposed approach using a simulator calibrated with 1-year operation data from a real EV sharing system. Results show that our approach significantly outperforms the state-of-the-art, offering up to 14% gain in order satisfied rate and 12% increase in net revenue.

Download Full-text

Online Scheduling in Multi-project Environments: A Multi-agent Approach

Advances in Intelligent and Soft Computing - 7th International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS 2009) ◽

10.1007/978-3-642-00487-2_31 ◽

2010 ◽

pp. 293-301

Author(s):

José Alberto Arauzo ◽

José Manuel Galán ◽

Javier Pajares ◽

Adolfo López-Paredes

Keyword(s):

Online Scheduling ◽

Multi Agent

Download Full-text

Semi-Online Scheduling Algorithm of Multi-Agent in Network Management

Journal of Computer Research and Development ◽

10.1360/crad20060401 ◽

2006 ◽

Vol 43 (4) ◽

pp. 571 ◽

Cited By ~ 2

Author(s):

Bo Liu

Keyword(s):

Network Management ◽

Scheduling Algorithm ◽

Online Scheduling ◽

Multi Agent

Download Full-text

Data-Driven Online Energy Scheduling of a Microgrid Based on Deep Reinforcement Learning

Energies ◽

10.3390/en14082120 ◽

2021 ◽

Vol 14 (8) ◽

pp. 2120

Author(s):

Ying Ji ◽

Jianhui Wang ◽

Jiacan Xu ◽

Donglin Li

Keyword(s):

Reinforcement Learning ◽

Operating Cost ◽

Online Scheduling ◽

Optimal Scheduling ◽

Data Driven ◽

High Dimensional ◽

Continuous Control ◽

Renewable Energy Resources ◽

Continuous Actions ◽

Policy Optimization

The proliferation of distributed renewable energy resources (RESs) poses major challenges to the operation of microgrids due to uncertainty. Traditional online scheduling approaches relying on accurate forecasts become difficult to implement due to the increase of uncertain RESs. Although several data-driven methods have been proposed recently to overcome the challenge, they generally suffer from a scalability issue due to the limited ability to optimize high-dimensional continuous control variables. To address these issues, we propose a data-driven online scheduling method for microgrid energy optimization based on continuous-control deep reinforcement learning (DRL). We formulate the online scheduling problem as a Markov decision process (MDP). The objective is to minimize the operating cost of the microgrid considering the uncertainty of RESs generation, load demand, and electricity prices. To learn the optimal scheduling strategy, a Gated Recurrent Unit (GRU)-based network is designed to extract temporal features of uncertainty and generate the optimal scheduling decisions in an end-to-end manner. To optimize the policy with high-dimensional and continuous actions, proximal policy optimization (PPO) is employed to train the neural network-based policy in a data-driven fashion. The proposed method does not require any forecasting information on the uncertainty or a prior knowledge of the physical model of the microgrid. Simulation results using realistic power system data of California Independent System Operator (CAISO) demonstrate the effectiveness of the proposed method.

Download Full-text

Markov Decision Process Based Multi-agent System Applied to Aeroengine Maintenance Policy Optimization

2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery ◽

10.1109/fskd.2008.427 ◽

2008 ◽

Cited By ~ 3

Author(s):

Jianrong Wang ◽

Shouming Hou ◽

Yingying Su ◽

Jianwei Du ◽

Wanshan Wang

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Multi Agent System ◽

Maintenance Policy ◽

Agent System ◽

Markov Decision ◽

Multi Agent ◽

Policy Optimization

Download Full-text

Dynamic Scheduling of Multi-Agent in Agent-Based Distributed Network Management

Architectural Design of Multi-Agent Systems ◽

10.4018/978-1-59904-108-7.ch007 ◽

2011 ◽

pp. 125-142

Author(s):

Luo Junzhou

Keyword(s):

Network Management ◽

Performance Test ◽

Dynamic Scheduling ◽

Scheduling Algorithm ◽

Online Scheduling ◽

Distributed Network ◽

Management Scenario ◽

Distributed Network Management ◽

Agent Scheduling ◽

Multi Agent

Agent technology has played an important role in distributed network management, and agent scheduling is an inevitable problem in a multi-agent system. This chapter introduces a network management scenario to support dynamic scheduling decisions. Some algorithms are proposed to decompose the whole network management task into several groups of sub-tasks. During the course of decomposition, different priorities are assigned to sub-tasks. Then, based on the priorities of these sub-tasks, a dynamic multi-agent scheduling algorithm based on dependences of sub-tasks is proposed. An experiment has been done with the decomposition algorithms, the results of which demonstrate the advantage of the algorithms. The performance test demonstrates that the competitive ratio of the dynamic scheduling algorithm is always smaller than that of the existing online scheduling algorithm, which indicates that the performance of the dynamic scheduling algorithm is better than the existing online scheduling algorithm. Finally, as an application example, the process of network stream management is presented. The authors hope that this scheduling method can give a new approach or suggestion for studying dynamic agents scheduling technology.

Download Full-text

Approximating Stackelberg Equilibrium in Anti-UAV Jamming Markov Game with Hierarchical Multi-Agent Deep Reinforcement Learning Algorithm

10.21203/rs.3.rs-1156014/v1 ◽

2021 ◽

Author(s):

Zikai Feng ◽

Yuanyuan Wu ◽

Mengxing Huang ◽

Di Wu

Keyword(s):

Reinforcement Learning ◽

Stackelberg Game ◽

Learning Algorithm ◽

Computational Cost ◽

Stackelberg Equilibrium ◽

Large State Space ◽

Comparable Performance ◽

Aerial Vehicle ◽

Multi Agent ◽

Policy Optimization

Abstract In order to avoid the malicious jamming of the intelligent unmanned aerial vehicle (UAV) to ground users in the downlink communications, a new anti-UAV jamming strategy based on multi-agent deep reinforcement learning is studied in this paper. In this method, ground users aim to learn the best mobile strategies to avoid the jamming of UAV. The problem is modeled as a Stackelberg game to describe the competitive interaction between the UAV jammer (leader) and ground users (followers). To reduce the computational cost of equilibrium solution for the complex game with large state space, a hierarchical multi-agent proximal policy optimization (HMAPPO) algorithm is proposed to decouple the hybrid game into several sub-Markov games, which updates the actor and critic network of the UAV jammer and ground users at different time scales. Simulation results suggest that the hierarchical multi-agent proximal policy optimization -based anti-jamming strategy achieves comparable performance with lower time complexity than the benchmark strategies. The well-trained HMAPPO has the ability to obtain the optimal jamming strategy and the optimal anti-jamming strategies, which can approximate the Stackelberg equilibrium (SE).

Download Full-text

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/466 ◽

2021 ◽

Author(s):

Weinan Zhang ◽

Xihuai Wang ◽

Jian Shen ◽

Ming Zhou

Keyword(s):

Upper Bound ◽

Sample Complexity ◽

Asymptotic Performance ◽

Dynamics Model ◽

Model Based ◽

Environment Model ◽

Cooperative Tasks ◽

Multi Agent ◽

Policy Optimization ◽

Decentralized Model

This paper investigates the model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretic analysis of return discrepancy upper bound. To reduce the upper bound with the intention of low sample complexity during the whole learning process, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with the adaptive opponent-wise rollout. We further prove the theoretic convergence of AORPO under reasonable assumptions. Empirical experiments on competitive and cooperative tasks demonstrate that AORPO can achieve improved sample efficiency with comparable asymptotic performance over the compared MARL methods.

Download Full-text