Approximating Stackelberg Equilibrium in Anti-UAV Jamming Markov Game with Hierarchical Multi-Agent Deep Reinforcement Learning Algorithm

Abstract: To counter malicious jamming of ground users by an intelligent unmanned aerial vehicle (UAV) in downlink communications, this paper studies a new anti-UAV jamming strategy based on multi-agent deep reinforcement learning. In this method, ground users aim to learn the best mobility strategies to avoid the UAV's jamming. The problem is modeled as a Stackelberg game that describes the competitive interaction between the UAV jammer (leader) and the ground users (followers). To reduce the computational cost of solving for the equilibrium of this complex game with a large state space, a hierarchical multi-agent proximal policy optimization (HMAPPO) algorithm is proposed. It decouples the hybrid game into several sub-Markov games and updates the actor and critic networks of the UAV jammer and the ground users at different time scales. Simulation results suggest that the HMAPPO-based anti-jamming strategy achieves performance comparable to the benchmark strategies with lower time complexity. The well-trained HMAPPO can obtain the optimal jamming strategy and the optimal anti-jamming strategies, which can approximate the Stackelberg equilibrium (SE).
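The abstract names proximal policy optimization (PPO) as the base learner. As a rough illustration, the sketch below computes the standard PPO clipped surrogate objective for a batch of samples. It is not the paper's hierarchical scheme: the function name `ppo_clip_objective`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for this example, and the two-time-scale leader/follower updates are not modeled.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names are illustrative.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, clipping caps the gain from moving the ratio above `1 + clip_eps`; with a negative advantage, the unclipped term is kept, so a large ratio is still penalized in full.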

A multi-agent reinforcement learning algorithm based on Stackelberg game

2017 6th Data Driven Control and Learning Systems (DDCLS) ◽

10.1109/ddcls.2017.8068163 ◽

2017 ◽

Cited By ~ 5

Author(s):

Chi Cheng ◽

Zhangqing Zhu ◽

Bo Xin ◽

Chunlin Chen

Keyword(s):

Reinforcement Learning ◽

Stackelberg Game ◽

Learning Algorithm ◽

Multi Agent ◽

Reinforcement Learning Algorithm

Bi-Level Actor-Critic for Multi-Agent Coordination

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6226 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7325-7332

Author(s):

Haifeng Zhang ◽

Weizhe Chen ◽

Zeren Huang ◽

Minne Li ◽

Yaodong Yang ◽

...

Keyword(s):

Reinforcement Learning ◽

Nash Equilibrium ◽

Learning Algorithm ◽

Stackelberg Equilibrium ◽

Multi Agent Systems ◽

Matrix Games ◽

Markov Games ◽

The Arts ◽

Convergence Point ◽

Multi Agent

Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally, and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider the Stackelberg equilibrium as a potentially better convergence point than the Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (thus intelligent), while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against the state of the art. We found that the proposed bi-level actor-critic algorithm successfully converged to the Stackelberg equilibria in matrix games and found an asymmetric solution in a highway merge environment.
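To make the NE-selection point concrete, the following sketch enumerates the pure-strategy Nash equilibria of a small two-player matrix game and compares them with the pure-strategy Stackelberg solution, in which the leader commits first and the follower best-responds. The payoff matrices and function names are hypothetical and chosen for illustration; this is exhaustive search on a toy game, not the paper's bi-level actor-critic method.

```python
def follower_best_response(follower_payoff, a):
    """Follower's best reply to leader action a (ties go to the lowest index)."""
    row = follower_payoff[a]
    return max(range(len(row)), key=lambda b: row[b])

def stackelberg_pure(leader_payoff, follower_payoff):
    """Leader picks the action whose induced best response pays the leader most."""
    best = None
    for a in range(len(leader_payoff)):
        b = follower_best_response(follower_payoff, a)
        if best is None or leader_payoff[a][b] > leader_payoff[best[0]][best[1]]:
            best = (a, b)
    return best

def pure_nash(leader_payoff, follower_payoff):
    """All action pairs from which neither player gains by deviating alone."""
    n_rows, n_cols = len(leader_payoff), len(leader_payoff[0])
    equilibria = []
    for a in range(n_rows):
        for b in range(n_cols):
            leader_ok = all(leader_payoff[a][b] >= leader_payoff[a2][b]
                            for a2 in range(n_rows))
            follower_ok = all(follower_payoff[a][b] >= follower_payoff[a][b2]
                              for b2 in range(n_cols))
            if leader_ok and follower_ok:
                equilibria.append((a, b))
    return equilibria

# Hypothetical cooperative coordination game: both players receive the same payoff.
LEADER = [[2, 0], [0, 1]]
FOLLOWER = [[2, 0], [0, 1]]
```

In this toy game `pure_nash(LEADER, FOLLOWER)` returns two equilibria, `(0, 0)` and `(1, 1)`, while `stackelberg_pure(LEADER, FOLLOWER)` returns only `(0, 0)`, the Pareto-superior one.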

Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

Entropy ◽

10.3390/e23060737 ◽

2021 ◽

Vol 23 (6) ◽

pp. 737

Author(s):

Fengjie Sun ◽

Xianchang Wang ◽

Rui Zhang

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Optimal Policy ◽

Feasible Solution ◽

Learning Algorithm ◽

Plant Protection ◽

Agricultural Plant ◽

Q Learning ◽

Aerial Vehicle ◽

Optimal Action

An unmanned aerial vehicle (UAV) can greatly reduce the manpower needed for agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a decision-making support system (DSS) that helps UAVs choose the correct action in each state according to a policy. In an unknown environment, formulating rules by hand for action selection is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in this environment, a UAV following the policy learned by the proposed algorithm has a greater probability of choosing the optimal action than one following the policy learned by classic Q-learning. The proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
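For reference, the classic Q-learning baseline mentioned in the abstract uses the tabular update sketched below. The similar-state-matching improvement is not described in enough detail here to reproduce, so it is not shown. The function name, the dictionary-based table, and the state and action labels are assumptions made for this example.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one classic tabular Q-learning update and return the new value.

    q maps (state, action) pairs to estimated returns; unseen pairs default to 0.
    """
    best_next = max(q[(next_state, a)] for a in actions)  # greedy bootstrap value
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical usage with made-up state and action labels.
q_table = defaultdict(float)
first = q_learning_update(q_table, "s0", "spray", 1.0, "s1",
                          ["spray", "move"], alpha=0.5)
```

Starting from an all-zero table, the first update moves the estimate halfway toward the reward because `alpha=0.5` and the bootstrap value is still zero.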

A multi-agent reinforcement learning algorithm with fuzzy approximation for Distributed Stochastic Unit Commitment

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-182879 ◽

2019 ◽

Vol 37 (5) ◽

pp. 6613-6628

Author(s):

Ghorbani Farzaneh ◽

Afsharchi Mohsen ◽

Derhami Vali

Keyword(s):

Reinforcement Learning ◽

Unit Commitment ◽

Learning Algorithm ◽

Fuzzy Approximation ◽

Multi Agent ◽

Stochastic Unit Commitment ◽

Reinforcement Learning Algorithm

Improvement on Supporting Machine Learning Algorithm for Solving Problem in Immediate Decision Making

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.566.572 ◽

2012 ◽

Vol 566 ◽

pp. 572-579

Author(s):

Abdolkarim Niazi ◽

Norizah Redzuan ◽

Raja Ishak Raja Hamzah ◽

Sara Esfandiari

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Multi Agent Systems ◽

Combined Model ◽

Q Learning ◽

Agent Systems ◽

Multi Agent ◽

Case Base ◽

Case Base Reasoning ◽

Robotic Tool

In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of reinforcement learning algorithms. RL algorithms are very useful for solving a wide variety of decision problems in which no model is available and a correct decision must be made in every state of the system, such as multi-agent systems, artificial control systems, robotics, and tool condition monitoring. We investigate how to improve action selection in RL algorithms. In the proposed method, a new combined model that uses case-based reasoning systems and a new optimized function selects the action, which led to an improvement in Q-learning-based algorithms. The algorithm was used to solve cooperative Markov games, one of the models of Markov-based multi-agent systems. The experimental results indicate that the proposed algorithm performs better than existing algorithms in terms of speed and accuracy of reaching the optimal policy.

A Novel Distributed Multi-Agent Reinforcement Learning Algorithm Against Jamming Attacks

IEEE Communications Letters ◽

10.1109/lcomm.2021.3097290 ◽

2021 ◽

Vol 25 (10) ◽

pp. 3204-3208

Author(s):

Ibrahim Elleuch ◽

Ali Pourranjbar ◽

Georges Kaddoum

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Jamming Attacks ◽

Multi Agent ◽

Reinforcement Learning Algorithm

Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance

Sensors ◽

10.3390/s20164546 ◽

2020 ◽

Vol 20 (16) ◽

pp. 4546

Author(s):

Weiwei Zhao ◽

Hairong Chu ◽

Xikui Miao ◽

Lihong Guo ◽

Honghai Shen ◽

...

Keyword(s):

Reinforcement Learning ◽

Attitude Control ◽

Cooperative Control ◽

Learning Algorithm ◽

State Equations ◽

Learning Agent ◽

Environmental Adaptability ◽

Decentralized Execution ◽

Policy Optimization ◽

Multi UAV

Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms to multi-UAV cooperative control. In a multi-agent environment, changes in the learning agents' strategies make the environment non-stationary. To address this problem, the paper presents an improved multiagent reinforcement learning algorithm, the multiagent joint proximal policy optimization (MAJPPO) algorithm, with centralized learning and decentralized execution. This algorithm uses the moving window averaging method to give each agent a centralized state value function, so that the agents can achieve better collaboration. The improved algorithm enhances collaboration and increases the sum of reward values obtained by the multiagent system. To evaluate the performance of the algorithm, we use MAJPPO to complete the tasks of multi-UAV formation and crossing multiple-obstacle environments. To simplify the control complexity of the UAV, we use the six-degree-of-freedom, 12-state equations of the UAV dynamics model with an attitude control loop. The experimental results show that the MAJPPO algorithm has better performance and better environmental adaptability.
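The abstract does not say which quantities are averaged or how wide the window is. The sketch below therefore shows only generic fixed-window averaging of a scalar stream, such as successive value estimates. It should not be read as the MAJPPO procedure; the class name `MovingWindowAverage` and its interface are hypothetical.

```python
from collections import deque

class MovingWindowAverage:
    """Running mean over the most recent `window` scalar values."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest value automatically when full.
        self.values = deque(maxlen=window)

    def update(self, value):
        """Add a new value and return the mean of the current window."""
        self.values.append(float(value))
        return sum(self.values) / len(self.values)
```

Averaging over a short window smooths abrupt changes in the incoming estimates, at the cost of reacting more slowly to genuine shifts.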
