Leader-Following Multi-Agent Coordination Control Accompanied With Hierarchical Q(λ)-Learning for Pursuit

2021 ◽  
Vol 2 ◽  
Author(s):  
Zhe-Yang Zhu ◽  
Cheng-Lin Liu

In this paper, we investigate a pursuit problem with multiple pursuers and a single evader in a two-dimensional grid space with obstacles. In contrast to previous studies, this paper addresses a pursuit problem in which only some of the pursuers can directly access the evader’s position. To this end, it proposes a hierarchical Q(λ)-learning algorithm with an improved reward function, and simulation results indicate that the proposed method outperforms Q-learning.
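
As a rough illustration of the learning rule this abstract builds on, the sketch below shows one episode of tabular Watkins's Q(λ) with eligibility traces. The paper's hierarchical decomposition and improved reward are not reproduced here; the grid-world interface (env.reset()/env.step() returning integer states) and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of tabular Watkins's Q(lambda); env is an assumed grid-world
# interface with integer states: env.reset() -> s, env.step(a) -> (s', r, done).
import numpy as np

def epsilon_greedy(Q, s, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(Q.shape[1])
    return int(np.argmax(Q[s]))

def q_lambda_episode(env, Q, alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1):
    """One episode of Watkins's Q(lambda) with accumulating eligibility traces."""
    E = np.zeros_like(Q)                       # eligibility traces
    s = env.reset()
    a = epsilon_greedy(Q, s, epsilon)
    done = False
    while not done:
        s_next, r, done = env.step(a)
        a_next = epsilon_greedy(Q, s_next, epsilon)
        a_star = int(np.argmax(Q[s_next]))     # greedy action in the next state
        delta = r + gamma * Q[s_next, a_star] * (not done) - Q[s, a]
        E[s, a] += 1.0                         # mark the visited state-action pair
        Q += alpha * delta * E                 # propagate the TD error along traces
        if a_next == a_star:
            E *= gamma * lam                   # decay traces after a greedy step
        else:
            E[:] = 0.0                         # cut traces after an exploratory step
        s, a = s_next, a_next
    return Q
```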

2017 ◽  
Vol 40 (5) ◽  
pp. 1529-1537 ◽  
Author(s):  
Muhammad Iqbal ◽  
John Leth ◽  
Trung D Ngo

In this paper, we solve the leader-following consensus problem using a hierarchical nearly cyclic pursuit (HNCP) strategy for multi-agent systems. We extend the nearly cyclic pursuit strategy and the two-layer HNCP to the generalized L-layer HNCP that enables the agents to rendezvous at a point dictated by a beacon. We prove that the convergence rate of the generalized L-layer HNCP for the leader-following consensus problem is faster than that of the nearly cyclic pursuit. Simulation results demonstrate the effectiveness of the proposed method.
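
For intuition, the following is a minimal single-layer sketch of nearly cyclic pursuit: each agent chases its cyclic successor, and one designated agent is additionally attracted to a stationary beacon, which pulls the group's rendezvous point toward the beacon. The L-layer hierarchical extension analysed in the paper is not reproduced; the gains and the Euler step are illustrative assumptions.

```python
# Minimal sketch of single-layer nearly cyclic pursuit toward a beacon.
import numpy as np

def nearly_cyclic_pursuit(x0, beacon, k=1.0, kb=1.0, dt=0.01, steps=5000):
    """x0: (n, 2) initial positions; beacon: (2,) rendezvous target."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        dx = k * (np.roll(x, -1, axis=0) - x)   # each agent chases its cyclic successor
        dx[0] += kb * (beacon - x[0])           # agent 0 is also attracted to the beacon
        x = x + dt * dx
        traj.append(x.copy())
    return np.array(traj)

# Example: four agents starting at random positions converge toward the origin beacon.
trajectory = nearly_cyclic_pursuit(np.random.randn(4, 2), np.zeros(2))
print(trajectory[-1])   # all rows approach [0, 0]
```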


Author(s):  
Mohamed A. Aref ◽  
Sudharman K. Jayaweera

This article presents a design of a wideband autonomous cognitive radio (WACR) for anti-jamming and interference avoidance. The proposed system model allows multiple WACRs to operate simultaneously over the same spectrum range, producing a multi-agent environment. The objective of each radio is to predict and evade a dynamic jammer signal as well as to avoid the transmissions of other WACRs. The proposed cognitive framework consists of two operations: sensing and transmission. Each operation is driven by its own Q-learning-based algorithm, but both experience the same RF environment. The simulation results indicate that the proposed cognitive anti-jamming technique has low computational complexity and significantly outperforms a non-cognitive sub-band selection policy, while being sufficiently robust against the impact of sensing errors.
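
As a hedged sketch of the transmission side only, the snippet below casts sub-band selection as tabular Q-learning in which the state is the sub-band the jammer occupied in the previous slot and the reward penalises colliding with it. The article's sensing operation, WACR-specific state design and reward shaping are not reproduced, and all names and parameters are illustrative assumptions.

```python
# Minimal sketch of sub-band selection against a jammer as tabular Q-learning.
import numpy as np

def train_antijam_policy(jammer_band, n_bands=8, slots=20000,
                         alpha=0.1, gamma=0.9, epsilon=0.1):
    """jammer_band(t) -> index of the sub-band jammed at slot t (assumed oracle)."""
    Q = np.zeros((n_bands, n_bands))          # Q[state, action]
    s = jammer_band(0)                        # state: last observed jammer band
    for t in range(1, slots):
        a = (np.random.randint(n_bands) if np.random.rand() < epsilon
             else int(np.argmax(Q[s])))
        jam = jammer_band(t)
        r = 1.0 if a != jam else -1.0         # colliding with the jammer is penalised
        s_next = jam
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
    return Q

# Example: learn to dodge a jammer that sweeps the sub-bands one by one.
Q = train_antijam_policy(lambda t: t % 8)
```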


2020 ◽  
Vol 73 (4) ◽  
pp. 874-891
Author(s):  
Wenjie Zhao ◽  
Zhou Fang ◽  
Zuqiang Yang

A distributed four-dimensional (4D) trajectory generation method based on multi-agent Q-learning is presented for multiple unmanned aerial vehicles (UAVs). Based on this method, each vehicle can intelligently generate collision-free 4D trajectories for time-constrained cooperative flight tasks. For a single UAV, the 4D trajectory is generated by the bionic improved tau gravity guidance strategy, which can synchronously guide the position and velocity to the desired values at the arrival time. Furthermore, to optimise the trajectory parameters, the continuous state and action wire-fitting neural network Q (WFNNQ) learning method is applied. For multi-UAV applications, the learning is organised by the win-or-learn-fast policy hill climbing (WoLF-PHC) algorithm. Dynamic simulation results show that the proposed method can efficiently provide 4D trajectories for the multi-UAV system in challenging simultaneous-arrival tasks, and that the fully trained method can be used in similar trajectory generation scenarios.
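
Of the components named above, only the WoLF-PHC update lends itself to a compact sketch: it switches between a small and a large policy step size depending on whether the agent is currently "winning". The continuous-state WFNNQ approximation and the tau gravity guidance parameterisation are not reproduced, and the tabular state/action sets and all step sizes below are illustrative assumptions.

```python
# Minimal tabular sketch of the WoLF-PHC (win-or-learn-fast policy hill climbing) update.
import numpy as np

class WoLFPHC:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.Q = np.zeros((n_states, n_actions))
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)      # current policy
        self.pi_avg = np.full((n_states, n_actions), 1.0 / n_actions)  # average policy
        self.counts = np.zeros(n_states)
        self.alpha, self.gamma = alpha, gamma
        self.d_win, self.d_lose = delta_win, delta_lose

    def act(self, s):
        return int(np.random.choice(len(self.pi[s]), p=self.pi[s]))

    def update(self, s, a, r, s_next):
        # 1. ordinary Q-learning update
        self.Q[s, a] += self.alpha * (r + self.gamma * np.max(self.Q[s_next])
                                      - self.Q[s, a])
        # 2. update the running average policy
        self.counts[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.counts[s]
        # 3. "win or learn fast": small step when winning, large step when losing
        winning = self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s]
        delta = self.d_win if winning else self.d_lose
        # 4. hill-climb toward the greedy action, then project back to the simplex
        best = int(np.argmax(self.Q[s]))
        self.pi[s] -= delta / (len(self.pi[s]) - 1)
        self.pi[s, best] += delta + delta / (len(self.pi[s]) - 1)
        self.pi[s] = np.clip(self.pi[s], 0.0, None)
        self.pi[s] /= self.pi[s].sum()
```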


Transport ◽  
2014 ◽  
Vol 29 (3) ◽  
pp. 296-306 ◽  
Author(s):  
Min Yang ◽  
Dounan Tang ◽  
Haoyang Ding ◽  
Wei Wang ◽  
Tianming Luo ◽  
...  

Staggered working hours have the potential to alleviate excessive demands on urban transport networks during the morning and afternoon peak hours and to influence the travel behavior of individuals by affecting their activity schedules and reducing their commuting times. This study proposes a multi-agent-based Q-learning algorithm for evaluating the influence of staggered work hours by simulating travelers’ time and location choices in their activity patterns. Interactions among multiple travelers were also considered. Various types of agents were identified based on real activity–travel data for a mid-sized city in China. Reward functions based on time and location information were constructed using Origin–Destination (OD) survey data to simulate individuals’ temporal and spatial choices simultaneously. Interactions among individuals were then described by introducing a road impedance function to formulate a dynamic environment in which one traveler’s decisions influence the decisions of other travelers. Lastly, by applying the Q-learning algorithm, individuals’ activity–travel patterns under staggered working hours were simulated. Based on the simulation results, the effects of staggered working hours were evaluated on both a macroscopic level, at which the space–time distribution of the traffic volume in the network was determined, and a microscopic level, at which the timing of individuals’ leisure activities and their daily household commuting costs were determined. Based on the simulation results and experimental tests, an optimal scheme for staggering working hours was developed.
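
The abstract does not specify the form of the road impedance or reward functions; the sketch below merely illustrates how a congestion-dependent reward can couple the travelers' departure-time choices, using a BPR-style impedance curve and entirely hypothetical parameters.

```python
# Minimal sketch: a congestion-dependent reward couples the agents, because the
# travel time on a link grows with how many travelers chose the same departure slot.
def link_travel_time(flow, free_flow_time=10.0, capacity=200.0, a=0.15, b=4.0):
    """BPR-style impedance: travel time increases with the flow/capacity ratio."""
    return free_flow_time * (1.0 + a * (flow / capacity) ** b)

def departure_reward(chosen_slot, slot_flows, preferred_slot, schedule_penalty=2.0):
    """Reward = -(congested travel time) - penalty for deviating from the
    preferred departure slot; slot_flows[t] counts agents departing at slot t."""
    delay = link_travel_time(slot_flows[chosen_slot])
    return -(delay + schedule_penalty * abs(chosen_slot - preferred_slot))

# Example: heavy flow in slot 2 makes departing then less attractive than shifting.
flows = {0: 50, 1: 120, 2: 300, 3: 80}
print(departure_reward(2, flows, preferred_slot=2))
print(departure_reward(3, flows, preferred_slot=2))
```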


2021 ◽  
Vol 40 (1) ◽  
pp. 205-219
Author(s):  
Yanbin Zheng ◽  
Wenxin Fan ◽  
Mengyun Han

The multi-agent collaborative hunting problem is a typical problem in multi-agent coordination and collaboration research. To address the multi-agent hunting problem in which the evader has learning ability, a collaborative hunting method based on game theory and Q-learning is proposed. First, a cooperative hunting team is formed and a game model of cooperative hunting is built. Second, by learning the evader’s strategy choices, a trajectory of the evader’s finite T-step cumulative reward is established and incorporated into the hunters’ strategy set. Finally, a Nash equilibrium is obtained by solving the cooperative hunting game, and each hunter executes its equilibrium strategy to complete the hunting task. A C# simulation experiment shows that, under the same conditions, this method can effectively solve the hunting problem for a single learning evader in an environment with obstacles, and comparative analysis of the experimental data shows that it is more efficient than other methods.
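
The paper's game model is not detailed in the abstract; as a simplified stand-in for the equilibrium-solving step, the sketch below computes a mixed Nash equilibrium of a two-player zero-sum matrix game by linear programming, with a hypothetical hunter-versus-evader payoff matrix.

```python
# Minimal sketch: maximin mixed strategy of a zero-sum matrix game via linear programming.
import numpy as np
from scipy.optimize import linprog

def zero_sum_equilibrium(A):
    """Row player's maximin mixed strategy for payoff matrix A (row player maximises)."""
    m, n = A.shape
    # variables: x_1..x_m (strategy) and v (game value); objective: maximise v
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # for every column j of the opponent:  v - sum_i A[i, j] * x_i <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # strategy probabilities sum to one
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Example: hunter (block left / block right) vs evader (go left / go right).
strategy, value = zero_sum_equilibrium(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(strategy, value)   # roughly [0.5, 0.5] with game value 0.0
```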

