Autonomous Bus Fleet Control Using Multiagent Reinforcement Learning

Autonomous buses are becoming increasingly popular and have been widely developed in many countries. However, autonomous buses must learn to navigate the city efficiently to be integrated into public transport systems. Efficient operation of these buses can be achieved by intelligent agents through reinforcement learning. In this study, we investigate the autonomous bus fleet control problem, which appears noisy to the agents owing to random arrivals and incomplete observation of the environment. We propose a multi-agent reinforcement learning method combined with an advanced policy gradient algorithm for this large-scale dynamic optimization problem. An agent-based simulation platform was developed to model the dynamic system of a fixed stop/station loop route, autonomous bus fleet, and passengers. This platform was also applied to assess the performance of the proposed algorithm. The experimental results indicate that the developed algorithm outperforms other reinforcement learning methods in the multi-agent domain. The simulation results also reveal the effectiveness of our proposed algorithm in outperforming the existing scheduled bus system in terms of the bus fleet size and passenger wait times for bus routes with comparatively lesser number of passengers.

Download Full-text

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014213 ◽

2019 ◽

Vol 33 ◽

pp. 4213-4220 ◽

Cited By ~ 12

Author(s):

Shihui Li ◽

Yi Wu ◽

Xinyue Cui ◽

Honghua Dong ◽

Fei Fang ◽

...

Keyword(s):

Reinforcement Learning ◽

Gradient Algorithm ◽

Training Environment ◽

Local Optima ◽

Continuous Action ◽

Agent Learning ◽

Policy Gradient ◽

Multi Agent ◽

Continuous Actions ◽

Computational Intractability

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in the multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optima w.r.t. its training partners – the learned policy may be only locally optimal to other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting so that the trained agents can still generalize when its opponents’ policies alter. To tackle this problem, we proposed a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG) with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG), for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments and the agents trained by our method significantly outperforms existing baselines.

Download Full-text

Revising the Observation Satellite Scheduling Problem Based on Deep Reinforcement Learning

Remote Sensing ◽

10.3390/rs13122377 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2377

Author(s):

Yixin Huang ◽

Zhongcheng Mu ◽

Shufan Wu ◽

Benjie Cui ◽

Yuxiao Duan

Keyword(s):

Reinforcement Learning ◽

Task Scheduling ◽

Large Scale ◽

Gradient Algorithm ◽

Experimental Simulation ◽

Scheduling Problem ◽

Policy Gradient ◽

Scheduling Method ◽

Satellite Scheduling ◽

Metaheuristic Optimization Algorithms

Earth observation satellite task scheduling research plays a key role in space-based remote sensing services. An effective task scheduling strategy can maximize the utilization of satellite resources and obtain larger objective observation profits. In this paper, inspired by the success of deep reinforcement learning in optimization domains, the deep deterministic policy gradient algorithm is adopted to solve a time-continuous satellite task scheduling problem. Moreover, an improved graph-based minimum clique partition algorithm is proposed for preprocessing in the task clustering phase by considering the maximum task priority and the minimum observation slewing angle under constraint conditions. Experimental simulation results demonstrate that the deep reinforcement learning-based task scheduling method is feasible and performs much better than traditional metaheuristic optimization algorithms, especially in large-scale problems.

Download Full-text

A Deep Reinforcement Learning Algorithm Based on Tetanic Stimulation and Amnesic Mechanisms for Continuous Control of Multi-DOF Manipulator

Actuators ◽

10.3390/act10100254 ◽

2021 ◽

Vol 10 (10) ◽

pp. 254

Author(s):

Yangyang Hou ◽

Huajie Hong ◽

Dasheng Xu ◽

Zhe Zeng ◽

Yaping Chen ◽

...

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Learning Algorithm ◽

Research Area ◽

Gradient Algorithm ◽

Data Sets ◽

Continuous Control ◽

Tetanic Stimulation ◽

Policy Gradient ◽

Active Research

Deep Reinforcement Learning (DRL) has been an active research area in view of its capability in solving large-scale control problems. Until presently, many algorithms have been developed, such as Deep Deterministic Policy Gradient (DDPG), Twin-Delayed Deep Deterministic Policy Gradient (TD3), and so on. However, the converging achievement of DRL often requires extensive collected data sets and training episodes, which is data inefficient and computing resource consuming. Motivated by the above problem, in this paper, we propose a Twin-Delayed Deep Deterministic Policy Gradient algorithm with a Rebirth Mechanism, Tetanic Stimulation and Amnesic Mechanisms (ATRTD3), for continuous control of a multi-DOF manipulator. In the training process of the proposed algorithm, the weighting parameters of the neural network are learned using Tetanic stimulation and Amnesia mechanism. The main contribution of this paper is that we show a biomimetic view to speed up the converging process by biochemical reactions generated by neurons in the biological brain during memory and forgetting. The effectiveness of the proposed algorithm is validated by a simulation example including the comparisons with previously developed DRL algorithms. The results indicate that our approach shows performance improvement in terms of convergence speed and precision.

Download Full-text

A Collaborative Control Method of Dual-Arm Robots Based on Deep Reinforcement Learning

Applied Sciences ◽

10.3390/app11041816 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1816

Author(s):

Luyu Liu ◽

Qianyuan Liu ◽

Yong Song ◽

Bao Pang ◽

Xianfeng Yuan ◽

...

Keyword(s):

Reinforcement Learning ◽

Control Strategy ◽

Control Method ◽

Gradient Algorithm ◽

Robot Arm ◽

Collaborative Control ◽

Policy Gradient ◽

Multi Agent ◽

Dual Arm Robot ◽

Dual Arm

Collaborative control of a dual-arm robot refers to collision avoidance and working together to accomplish a task. To prevent the collision of two arms, the control strategy of a robot arm needs to avoid competition and to cooperate with the other one during motion planning. In this paper, a dual-arm deep deterministic policy gradient (DADDPG) algorithm is proposed based on deep reinforcement learning of multi-agent cooperation. Firstly, the construction method of a replay buffer in a hindsight experience replay algorithm is introduced. The modeling and training method of the multi-agent deep deterministic policy gradient algorithm is explained. Secondly, a control strategy is assigned to each robotic arm. The arms share their observations and actions. The dual-arm robot is trained based on a mechanism of “rewarding cooperation and punishing competition”. Finally, the effectiveness of the algorithm is verified in the Reach, Push, and Pick up simulation environment built in this study. The experiment results show that the robot trained by the DADDPG algorithm can achieve cooperative tasks. The algorithm can make the robots explore the action space autonomously and reduce the level of competition with each other. The collaborative robots have better adaptability to coordination tasks.

Download Full-text

Dynamic Dispatching for Large-Scale Heterogeneous Fleet via Multi-agent Deep Reinforcement Learning

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378191 ◽

2020 ◽

Author(s):

Chi Zhang ◽

Philip Odonkor ◽

Shuai Zheng ◽

Hamed Khorasgani ◽

Susumu Serita ◽

...

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Heterogeneous Fleet ◽

Multi Agent ◽

Dynamic Dispatching

Download Full-text

Adaptive Traffic Signal Control for large-scale scenario with Cooperative Group-based Multi-agent reinforcement learning

Transportation Research Part C Emerging Technologies ◽

10.1016/j.trc.2021.103046 ◽

2021 ◽

Vol 125 ◽

pp. 103046

Author(s):

Tong Wang ◽

Jiahua Cao ◽

Azhar Hussain

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Traffic Signal ◽

Signal Control ◽

Traffic Signal Control ◽

Cooperative Group ◽

Adaptive Traffic Signal Control ◽

Multi Agent

Download Full-text

A Confrontation Decision-Making Method with Deep Reinforcement Learning and Knowledge Transfer for Multi-Agent System

Symmetry ◽

10.3390/sym12040631 ◽

2020 ◽

Vol 12 (4) ◽

pp. 631

Author(s):

Chunyang Hu

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Knowledge Transfer ◽

Large Scale ◽

Effective Control ◽

Small Scale ◽

Learning Agent ◽

Multi Agent ◽

Transfer Method ◽

Parameter Sharing

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve the effective control of the learning agent for the confrontation in the multi-agent systems. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve confrontation decision-making of multi-agent. In the process of training, the information of other agents is introduced to the critic network to improve the strategy of confrontation. The parameter sharing mechanism can reduce the loss of experience storage. In the DDPG algorithm, we use four neural networks to generate real-time action and Q-value function respectively and use a momentum mechanism to optimize the training process to accelerate the convergence rate for the neural network. Secondly, this paper introduces an auxiliary controller using a policy-based reinforcement learning (RL) method to achieve the assistant decision-making for the game agent. In addition, an effective reward function is used to help agents balance losses of enemies and our side. Furthermore, this paper also uses the knowledge transfer method to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent can successfully learn to fight with the competitors and achieve a good winning rate. For large-scale confrontation scenarios, the knowledge transfer method can gradually improve the decision-making level of the learning agent.

Download Full-text

Preceding vehicle following algorithm with human driving characteristics

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/0954407020981546 ◽

2021 ◽

pp. 095440702098154

Author(s):

Feng Pan ◽

Hong Bao

Keyword(s):

Reinforcement Learning ◽

Weight Vector ◽

Gradient Algorithm ◽

Inner Product ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Human Driver ◽

Policy Gradient ◽

Preceding Vehicle ◽

Action Spaces

This paper proposes a new approach of using reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We refer to the ideal of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and weights. Driving data of human drivers was collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model could continuously approach that of a human driver. After dozens of rounds of training, we selected the policy with the nearest value vector to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance for the task of an agent following the preceding vehicle safely and smoothly.

Download Full-text

Deterministic Value-Policy Gradients

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5732 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3316-3323

Author(s):

Qingpeng Cai ◽

Ling Pan ◽

Pingzhong Tang

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Infinite Horizon ◽

Gradient Algorithm ◽

Continuous Control ◽

Model Bias ◽

Model Free ◽

Policy Gradient ◽

Analytical Gradients

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) has been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider the deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with the finite horizon, but it is too myopic compared with infinite horizon. We firstly give a theoretical guarantee of the existence of the value gradients in this infinite setting. Based on this theoretical guarantee, we propose a class of the deterministic value gradient algorithm (DVG) with infinite horizon, and different rollout steps of the analytical gradients by the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.

Download Full-text

Trust in multi-agent systems

The Knowledge Engineering Review ◽

10.1017/s0269888904000116 ◽

2004 ◽

Vol 19 (1) ◽

pp. 1-25 ◽

Cited By ~ 343

Author(s):

SARVAPALI D. RAMCHURN ◽

DONG HUYNH ◽

NICHOLAS R. JENNINGS

Keyword(s):

Large Scale ◽

State Of The Art ◽

Comprehensive Treatment ◽

Multi Agent Systems ◽

Agent Based ◽

Agent Systems ◽

The Core ◽

Open Distributed Systems ◽

Multi Agent

Trust is a fundamental concern in large-scale open distributed systems. It lies at the core of all interactions between the entities that have to operate in such uncertain and constantly changing environments. Given this complexity, these components, and the ensuing system, are increasingly being conceptualised, designed, and built using agent-based techniques and, to this end, this paper examines the specific role of trust in multi-agent systems. In particular, we survey the state of the art and provide an account of the main directions along which research efforts are being focused. In so doing, we critically evaluate the relative strengths and weaknesses of the main models that have been proposed and show how, fundamentally, they all seek to minimise the uncertainty in interactions. Finally, we outline the areas that require further research in order to develop a comprehensive treatment of trust in complex computational settings.

Download Full-text