Graph neural network and reinforcement learning for multi‐agent cooperative control of connected autonomous vehicles

2021 ◽  
Vol 36 (7) ◽  
pp. 838-857
Author(s):  
Sikai Chen ◽  
Jiqian Dong ◽  
Paul (Young Joun) Ha ◽  
Yujie Li ◽  
Samuel Labi

Author(s):
Aaron Young ◽  
Jay Taves ◽  
Asher Elmquist ◽  
Simone Benatti ◽  
Alessandro Tasora ◽  
...  

Abstract We describe a simulation environment that enables the design and testing of control policies for off-road mobility of autonomous agents. The environment is demonstrated in conjunction with the training and assessment of a reinforcement learning policy that uses sensor fusion and inter-agent communication to enable the movement of mixed convoys of human-driven and autonomous vehicles. Policies learned on rigid terrain are shown to transfer to hard (silt-like) and soft (snow-like) deformable terrains. The environment provides the following: multi-vehicle multibody-dynamics co-simulation in a time/space-coherent infrastructure that relies on the Message Passing Interface standard for low-latency parallel computing; sensor simulation (e.g., camera, GPS, IMU); simulation of a virtual world that can be altered by the agents present in the simulation; and training that uses reinforcement learning to 'teach' the autonomous vehicles to drive through an obstacle-riddled course. The software stack described is open source.
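The reinforcement-learning side of such a pipeline can be sketched in a heavily reduced form. The following is our own toy illustration, not the paper's software stack: a single vehicle on a one-dimensional track learns, via tabular Q-learning, to swerve around an obstacle cell. The track layout, actions, rewards, and hyperparameters are all invented for illustration.

```python
import random

random.seed(0)

# Toy illustration (not the paper's stack): tabular Q-learning on a 1-D track
# with cells 0..5. The goal sits at cell 5 and an obstacle at cell 3.
# Action 0 drives straight ahead; action 1 swerves around the next cell at a
# small cost. Driving straight into the obstacle ends the episode with a crash.
GOAL, OBSTACLE, N_STATES = 5, 3, 6
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(pos, action):
    nxt = pos + 1
    if action == 0 and nxt == OBSTACLE:
        return nxt, -10.0, True                     # crashed into the obstacle
    reward = 10.0 if nxt == GOAL else (-1.0 if action == 1 else 0.0)
    return nxt, reward, nxt == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(500):
    pos, done = 0, False
    while not done:
        if random.random() < EPS:                   # epsilon-greedy exploration
            a = random.randrange(2)
        else:
            a = 0 if Q[pos][0] >= Q[pos][1] else 1
        nxt, r, done = step(pos, a)
        target = r + (0.0 if done else GAMMA * max(Q[nxt]))
        Q[pos][a] += ALPHA * (target - Q[pos][a])   # one-step Q-learning update
        pos = nxt

policy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(N_STATES - 1)]
print(policy)  # → [0, 0, 1, 0, 0]: swerve only just before the obstacle
```

The same loop structure (reset, act, observe reward, update, repeat) carries over when the hand-written `step` is replaced by a physics co-simulation and the table by a neural policy.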


2021 ◽  
Author(s):  
Haitham Afifi

We develop a Deep Reinforcement Learning (DeepRL) based multi-agent algorithm to efficiently control autonomous vehicles in the context of Wireless Sensor Networks (WSNs). In contrast to other applications, WSNs have two metrics for performance evaluation: first, quality of information (QoI), which measures the quality of the sensed data; second, quality of service (QoS), which measures the network's performance. As a use case, we consider wireless acoustic sensor networks: a group of speakers moves inside a room, and microphones installed on vehicles stream the audio data. We formulate an appropriate Markov Decision Process (MDP) and present, besides a centralized solution, a multi-agent Deep Q-learning solution to control the vehicles. We compare the proposed solutions to a naive heuristic and to two different real-world implementations: microphones that are hand-held or preinstalled. We show using simulations that the performance of the autonomous vehicles in terms of QoI and QoS is better than that of the real-world implementations and the proposed heuristic. Additionally, we provide a theoretical analysis of the performance with respect to WSN dynamics, such as speed, room dimensions, and speakers' talking time.
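A centralized solution of this kind can be illustrated, under heavy simplification, with tabular Q-learning over the joint state and action of two microphone-carrying vehicles in a one-dimensional room. The room size, speaker position, rewards, and all constants below are our own invention, not the paper's setup; the collision penalty stands in loosely for QoS, and proximity to the speaker for QoI.

```python
import random
from itertools import product

random.seed(1)

# Hypothetical toy version of a centralized controller: one learner observes
# the JOINT positions of two vehicles in a 5-cell room and picks a JOINT move.
# QoI proxy: closeness to the speaker (cell 2); QoS proxy: a penalty when
# both vehicles occupy the same cell.
CELLS, SPEAKER = 5, 2
ACTIONS = list(product((-1, 0, 1), repeat=2))    # joint moves for both vehicles
ALPHA, GAMMA, EPS = 0.3, 0.9, 0.2

def step(state, joint):
    p = tuple(min(CELLS - 1, max(0, s + m)) for s, m in zip(state, joint))
    r = -abs(p[0] - SPEAKER) - abs(p[1] - SPEAKER) - (3.0 if p[0] == p[1] else 0.0)
    return p, r

Q = {}
def q(s, a):
    return Q.get((s, a), 0.0)

for _ in range(3000):
    s = (0, 4)                                   # vehicles start at the walls
    for _ in range(20):
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda x: q(s, x))
        s2, r = step(s, a)
        target = r + GAMMA * max(q(s2, b) for b in ACTIONS)
        Q[(s, a)] = q(s, a) + ALPHA * (target - q(s, a))
        s = s2

# Greedy rollout: vehicles should end up close to the speaker without colliding.
s = (0, 4)
for _ in range(10):
    s, _ = step(s, max(ACTIONS, key=lambda x: q(s, x)))
print(s)
```

A multi-agent variant would replace the single joint table with one learner per vehicle, each conditioning only on its own observation.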


2019 ◽  
Author(s):  
Matthew Chalk ◽  
Gasper Tkacik ◽  
Olivier Marre

Abstract A central goal in systems neuroscience is to understand the functions performed by neural circuits. Previous top-down models addressed this question by comparing the behaviour of an ideal model circuit, optimised to perform a given function, with neural recordings. However, this requires guessing in advance what function is being performed, which may not be possible for many neural systems. To address this, we propose a new framework for optimising a recurrent network using multi-agent reinforcement learning (RL). In this framework, a reward function quantifies how desirable each state of the network is for performing a given function. Each neuron is treated as an ‘agent’, which optimises its responses so as to drive the network towards rewarded states. Three applications follow from this. First, one can use multi-agent RL algorithms to optimise a recurrent neural network to perform diverse functions (e.g. efficient sensory coding or motor control). Second, one could use inverse RL to infer the function of a recorded neural network from data. Third, the theory predicts how neural networks should adapt their dynamics to maintain the same function when the external environment or network structure changes. This could lead to theoretical predictions about how neural network dynamics adapt to deal with cell death and/or varying sensory stimulus statistics.
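The neuron-as-agent idea can be sketched with a minimal example of our own devising (three binary units, a pattern-matching reward, and REINFORCE-style per-agent updates; none of this is the authors' actual model): the network as a whole is rewarded when its joint state hits a target pattern, yet each "neuron" updates only its own firing parameter.

```python
import math, random

random.seed(2)

# Toy sketch (our illustration): each of three binary "neurons" is an
# independent RL agent with a single parameter, its firing logit. The network
# earns reward 1 when the joint state equals TARGET, and each agent performs
# a REINFORCE update on its own logit using that shared reward.
TARGET = (1, 0, 1)
theta = [0.0, 0.0, 0.0]
baseline, LR = 0.0, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(5000):
    probs = [sigmoid(t) for t in theta]
    state = tuple(1 if random.random() < p else 0 for p in probs)
    r = 1.0 if state == TARGET else 0.0           # reward for the whole network
    for i in range(3):                            # each agent updates itself
        theta[i] += LR * (r - baseline) * (state[i] - probs[i])
    baseline += 0.01 * (r - baseline)             # running-average baseline

print([round(sigmoid(t), 2) for t in theta])      # firing probs approach TARGET
```

Purely local updates driven by a global reward suffice here, which is the mechanism that lets the framework scale to recurrent networks with richer reward functions.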


2022 ◽  
pp. 1-20
Author(s):  
D. Xu ◽  
G. Chen

Abstract In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Current UAV clusters are still at the programmed-control stage, and fully autonomous, intelligent cooperative combat has not yet been realised. To enable a UAV cluster to plan autonomously in response to a changing environment and cooperate to achieve a combat goal, we propose a new MARL framework. It adopts a policy of centralised training with decentralised execution, and uses an Actor-Critic network to select the executed action and then to make the corresponding evaluation. The new algorithm makes three key improvements on the basis of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. The first improves the learning framework, making the calculated Q values more accurate. The second adds a collision-avoidance setting, which increases the operational safety factor. The third adjusts the reward mechanism, which effectively improves the cluster's cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. The simulation results show that learning efficiency is markedly improved, and the operational safety factor is further increased compared with the previous algorithm.
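The collision-avoidance and reward-mechanism improvements can be illustrated with a hypothetical shaped reward of our own design (the safety radius, penalty weight, and all coordinates are invented, not taken from the paper): a progress term toward the goal plus a penalty that grows as another UAV enters a safety radius.

```python
import math

# Hypothetical reward shaping in the spirit of the paper's improvements:
# reward = progress toward the goal, minus a collision-avoidance penalty that
# ramps up as a teammate comes within SAFETY_RADIUS.
SAFETY_RADIUS = 2.0

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def reward(uav, goal, others, prev_goal_dist):
    r = prev_goal_dist - dist(uav, goal)           # progress term
    for o in others:
        d = dist(uav, o)
        if d < SAFETY_RADIUS:                      # collision-avoidance penalty
            r -= (SAFETY_RADIUS - d) * 5.0
    return r

# Sidestepping a teammate earns more than closing distance straight through
# the teammate's safety zone, even though the direct route makes more progress.
safe = reward((0.0, 1.0), (10.0, 0.0), [(0.0, -5.0)], prev_goal_dist=11.0)
risky = reward((1.0, 0.0), (10.0, 0.0), [(1.5, 0.0)], prev_goal_dist=11.0)
print(safe > risky)
```

In a MADDPG-style setup, such a shaped reward feeds the centralised critic during training, while each UAV's actor executes on local observations only.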


2020 ◽  
pp. 1-13
Author(s):  
L.V. Qiangguo

Multi-agent reinforcement learning in football simulation can be built by extending single-agent reinforcement learning. However, compared with a single agent, the learning space of multiple agents grows dramatically with the number of agents, and so does the learning difficulty. Using a BP neural network as the underlying model structure, this research combines it with a PID controller to control the model's operation. To improve calculation accuracy and thereby the control effect, the prediction output obtained from the prediction model is used instead of the actual measured value. In addition, with the football robot as the object of study, this research investigates multi-agent reinforcement learning and its application to football robots. The content covers single-agent reinforcement learning, multi-agent system reinforcement learning, and, building on these, ball hunting, role assignment, and action selection in football-robot decision strategies. The simulation results show that the proposed method is effective.
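The control idea, driving the PID loop with a predicted rather than a directly measured output, can be sketched as follows. A simple linear extrapolator stands in for the trained BP-network predictor, and the first-order plant and all gains are our own choices for illustration.

```python
# Sketch of prediction-driven PID control: the error fed to the controller is
# computed from a one-step-ahead PREDICTED measurement, not the raw one.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def update(self, err):
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def predict(history):
    """One-step-ahead linear extrapolation; a trained BP network would replace this."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])

# First-order plant y' = (u - y) / tau, driven toward setpoint 1.0.
dt, tau, setpoint = 0.05, 0.5, 1.0
pid, y, hist = PID(2.0, 1.0, 0.05, dt), 0.0, [0.0]
for _ in range(400):
    u = pid.update(setpoint - predict(hist))       # error uses predicted output
    y += (u - y) / tau * dt
    hist.append(y)
print(round(y, 3))                                 # settles near the setpoint
```

Swapping `predict` for a learned model changes nothing structurally; the PID loop is agnostic to where its feedback signal comes from.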

