A cooperative multi-agent deep reinforcement learning framework for real-time residential load scheduling

Author(s): Chi Zhang, Sanmukh R. Kuppannagari, Chuanxiu Xiong, Rajgopal Kannan, Viktor K. Prasanna
2021, pp. 100162

Author(s): Guanghui Wen, Junjie Fu, Pengcheng Dai, Jialing Zhou
2021, pp. 115707

Author(s): Weigui Jair Zhou, Budhitama Subagdja, Ah-Hwee Tan, Darren Wee-Sze Ong

Author(s): Victor Gallego, Roi Naveiro, David Rios Insua

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward-generating process. In such non-stationary environments, however, standard Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretic approaches to this problem have modelled the whole multi-agent system as a game. Instead, we address the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme that yields a new learning framework for TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
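As a minimal illustration of the kind of opponent-aware update such a framework involves, the Python sketch below shows a level-1 agent that maintains an empirical model of the adversary's (level-0) policy and backs up Q-values in expectation over it. The class and variable names (Level1TMDPAgent, adv_counts, and so on) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hedged sketch of opponent-aware Q-learning in a Threatened MDP (TMDP):
# the supported DM estimates the adversary's policy from observed actions
# and computes TD targets in expectation over that model. All names and
# shapes are illustrative, not taken from the paper.

class Level1TMDPAgent:
    def __init__(self, n_states, n_dm_actions, n_adv_actions,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        # Q is indexed by (state, DM action, adversary action).
        self.Q = np.zeros((n_states, n_dm_actions, n_adv_actions))
        # Dirichlet-style counts for the empirical opponent model p(b | s).
        self.adv_counts = np.ones((n_states, n_adv_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def opponent_policy(self, s):
        # Normalised counts give the predicted adversary action distribution.
        return self.adv_counts[s] / self.adv_counts[s].sum()

    def act(self, s, rng):
        # Epsilon-greedy over the Q-values averaged under the opponent model.
        if rng.random() < self.epsilon:
            return int(rng.integers(self.Q.shape[1]))
        expected_q = self.Q[s] @ self.opponent_policy(s)
        return int(np.argmax(expected_q))

    def update(self, s, a, b, r, s_next):
        # Record the adversary action b actually observed in state s.
        self.adv_counts[s, b] += 1
        # Back up against the best DM action under the predicted opponent.
        next_value = np.max(self.Q[s_next] @ self.opponent_policy(s_next))
        td_target = r + self.gamma * next_value
        self.Q[s, a, b] += self.alpha * (td_target - self.Q[s, a, b])
```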


2022, pp. 1-20
Author(s): D. Xu, G. Chen

Abstract In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Current UAV clusters still operate under program control, and fully autonomous, intelligent cooperative combat has not yet been realised. To enable a UAV cluster to plan autonomously in a changing environment and cooperate to accomplish the combat goal, we propose a new MARL framework. It adopts centralised training with decentralised execution and uses an Actor-Critic network to select actions and then evaluate them. The new algorithm makes three key improvements to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. First, it improves the learning framework, making the calculated Q value more accurate. Second, it adds a collision-avoidance setting, which increases the operational safety factor. Third, it adjusts the reward mechanism, which effectively improves the cluster's cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. The simulation results show that learning efficiency is markedly improved and the operational safety factor is further increased compared with the previous algorithm.
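The sketch below illustrates the centralised-training, decentralised-execution pattern the abstract refers to, together with a collision-penalty reward adjustment in the spirit of the second and third improvements. The network sizes, min_dist, and penalty values are assumptions for illustration, not the paper's settings.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the MADDPG-style pattern: each UAV has its own
# actor that sees only its local observation (used at execution time),
# while a centralised critic scores the joint observation-action vector
# (used only during training). Dimensions below are assumed.

N_AGENTS, OBS_DIM, ACT_DIM = 3, 12, 2

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # continuous action in [-1, 1]

    def forward(self, obs):
        # Decentralised execution: local observation only.
        return self.net(obs)

class CentralCritic(nn.Module):
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, all_obs, all_acts):
        # Centralised training: Q-estimate from the joint state and actions.
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

def shaped_reward(task_reward, positions, min_dist=5.0, penalty=10.0):
    """Subtract a collision-avoidance penalty for any UAV that comes
    closer than min_dist to a teammate; a stand-in for the paper's
    reward adjustment, with assumed threshold and penalty values."""
    dists = torch.cdist(positions, positions)  # pairwise distances, (N, N)
    too_close = (dists < min_dist) & ~torch.eye(len(positions), dtype=torch.bool)
    return task_reward - penalty * too_close.any(dim=1).float()
```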

