Revising the Observation Satellite Scheduling Problem Based on Deep Reinforcement Learning

2021, Vol. 13 (12), pp. 2377
Author(s): Yixin Huang, Zhongcheng Mu, Shufan Wu, Benjie Cui, Yuxiao Duan

Earth observation satellite task scheduling plays a key role in space-based remote sensing services. An effective task scheduling strategy can maximize the utilization of satellite resources and obtain greater observation profit. In this paper, inspired by the success of deep reinforcement learning in optimization domains, the deep deterministic policy gradient algorithm is adopted to solve a time-continuous satellite task scheduling problem. Moreover, an improved graph-based minimum clique partition algorithm is proposed for preprocessing in the task clustering phase, considering maximum task priority and minimum observation slewing angle under the given constraints. Simulation results demonstrate that the deep reinforcement learning-based task scheduling method is feasible and performs much better than traditional metaheuristic optimization algorithms, especially on large-scale problems.
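The abstract does not spell out the clustering procedure, but a greedy clique partition over a task-compatibility graph is a natural reading. The sketch below is a minimal illustration under assumed definitions: tasks whose visible time windows overlap and whose required slewing angles are close are merged, seeding each clique with the highest-priority task. The task fields and the slewing threshold are illustrative, not the paper's.

```python
# Hypothetical sketch of graph-based task clustering via greedy clique
# partition. Tasks whose observation windows overlap and whose slewing
# angles are close enough are grouped into one composite observation.
# Task fields and the 5-degree threshold are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Task:
    window: tuple    # (start, end) of the visible time window
    priority: float  # observation profit if the task is imaged
    slew: float      # required slewing angle in degrees

def compatible(a: Task, b: Task, max_slew_diff: float = 5.0) -> bool:
    """Two tasks can share one observation if their windows overlap
    and their slewing angles differ by less than the threshold."""
    overlap = min(a.window[1], b.window[1]) - max(a.window[0], b.window[0])
    return overlap > 0 and abs(a.slew - b.slew) < max_slew_diff

def cluster(tasks: list) -> list:
    """Seed each clique with the highest-priority unassigned task,
    then absorb every remaining task compatible with all members."""
    remaining = sorted(tasks, key=lambda t: -t.priority)
    cliques = []
    while remaining:
        clique = [remaining.pop(0)]  # highest-priority seed
        leftover = []
        for t in remaining:
            if all(compatible(t, m) for m in clique):
                clique.append(t)
            else:
                leftover.append(t)
        remaining = leftover
        cliques.append(clique)
    return cliques
```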

Actuators, 2021, Vol. 10 (10), pp. 254
Author(s): Yangyang Hou, Huajie Hong, Dasheng Xu, Zhe Zeng, Yaping Chen, ...

Deep Reinforcement Learning (DRL) has been an active research area in view of its capability in solving large-scale control problems. To date, many algorithms have been developed, such as Deep Deterministic Policy Gradient (DDPG), Twin-Delayed Deep Deterministic Policy Gradient (TD3), and so on. However, convergence in DRL often requires extensive data collection and many training episodes, which is data-inefficient and consumes considerable computing resources. Motivated by this problem, in this paper we propose a Twin-Delayed Deep Deterministic Policy Gradient algorithm with a Rebirth Mechanism, Tetanic Stimulation and Amnesic Mechanisms (ATRTD3) for continuous control of a multi-DOF manipulator. In the training process of the proposed algorithm, the weighting parameters of the neural network are learned using the tetanic stimulation and amnesia mechanisms. The main contribution of this paper is a biomimetic perspective that speeds up convergence by emulating the biochemical reactions generated by neurons in the biological brain during memory and forgetting. The effectiveness of the proposed algorithm is validated by a simulation example, including comparisons with previously developed DRL algorithms. The results indicate that our approach improves both convergence speed and precision.
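The abstract only names the tetanic stimulation and amnesia mechanisms, so the following is a loose, speculative illustration rather than the authors' algorithm: recently reinforced weights are mildly amplified (stimulation) while stale ones decay toward zero (forgetting). The usage-tracking rule and the gain/decay constants are assumptions.

```python
# Speculative sketch, not the ATRTD3 code: apply a tetanic-stimulation-like
# gain to recently exercised weights and an amnesia-like decay to the rest.
import torch

def stimulate_and_forget(layer: torch.nn.Linear, usage: torch.Tensor,
                         gain: float = 1.01, decay: float = 0.999,
                         threshold: float = 0.5) -> None:
    """usage has the same shape as layer.weight and scores how recently
    each weight contributed to learning (the tracking rule is assumed)."""
    with torch.no_grad():
        active = usage > threshold
        layer.weight[active] *= gain      # "tetanic stimulation": reinforce
        layer.weight[~active] *= decay    # "amnesia": let unused weights fade
```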


2021, Vol. 2021, pp. 1-14
Author(s): Sung-Jung Wang, S. K. Jason Chang

Autonomous buses are becoming increasingly popular and have been widely developed in many countries. However, autonomous buses must learn to navigate the city efficiently to be integrated into public transport systems. Efficient operation of these buses can be achieved by intelligent agents through reinforcement learning. In this study, we investigate the autonomous bus fleet control problem, which appears noisy to the agents owing to random passenger arrivals and incomplete observation of the environment. We propose a multi-agent reinforcement learning method combined with an advanced policy gradient algorithm for this large-scale dynamic optimization problem. An agent-based simulation platform was developed to model the dynamic system of a fixed stop/station loop route, the autonomous bus fleet, and passengers, and was used to assess the performance of the proposed algorithm. The experimental results indicate that the developed algorithm outperforms other reinforcement learning methods in the multi-agent domain. The simulation results also show that our algorithm outperforms the existing scheduled bus system in terms of bus fleet size and passenger wait times on routes with comparatively few passengers.
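As a rough picture of the kind of loop-route platform described (the paper's own simulator is not specified here), the sketch below steps a fleet over a fixed loop of stops with Poisson passenger arrivals; the stop count, arrival rate, and reward shaping are illustrative assumptions.

```python
# Hypothetical loop-route fleet simulation step: passengers arrive at
# random, each bus either holds or advances one stop, and the shared
# reward penalizes total waiting. All constants are assumptions.
import math
import random

N_STOPS, ARRIVAL_RATE = 10, 0.3   # stops on the loop; mean arrivals per stop

def poisson(lam: float) -> int:
    """Knuth's method for drawing Poisson-distributed arrival counts."""
    l, k, p = math.exp(-lam), 0, 1.0
    while p > l:
        k += 1
        p *= random.random()
    return k - 1

def step(positions: list, waiting: list, actions: list):
    """actions[i] == 1 moves bus i to the next stop; each bus boards
    everyone waiting at its stop. Returns new state and shared reward."""
    for s in range(N_STOPS):
        waiting[s] += poisson(ARRIVAL_RATE)
    for i, a in enumerate(actions):
        if a == 1:
            positions[i] = (positions[i] + 1) % N_STOPS
        waiting[positions[i]] = 0
    return positions, waiting, -sum(waiting)
```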


Author(s): Feng Pan, Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and a weight vector. Driving data from human drivers were collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model continuously approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance: the agent followed the preceding vehicle safely and smoothly.
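In symbols, the abstract's reward construction is the linear reward model common in inverse reinforcement learning; only the inner-product form comes from the text, and the example features below (gap error, relative speed, jerk) are illustrative guesses.

```latex
% Linear reward: inner product of the feature (reward) vector and weights.
r(s, a) \;=\; \mathbf{w}^{\top} \boldsymbol{\phi}(s, a)
        \;=\; \sum_{i=1}^{n} w_i\, \phi_i(s, a),
\qquad \text{e.g. } \boldsymbol{\phi} =
  (\text{gap error},\ \text{relative speed},\ \text{jerk})
```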


2020, Vol. 34 (04), pp. 3316-3323
Author(s): Qingpeng Cai, Ling Pan, Pingzhong Tang

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper, we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with the infinite-horizon setting. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient (DVG) algorithms with infinite horizon, in which different rollout steps of the analytical gradients through the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. The results demonstrate that DVPG substantially outperforms the other baselines.
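For context, the model-free estimator that DVPG combines with the model-based value gradients is the standard deterministic policy gradient; the notation below is the textbook form (Silver et al., 2014), not reproduced from the paper.

```latex
% Deterministic policy gradient theorem: the model-free half of the
% DVPG combination described in the abstract.
\nabla_{\theta} J(\mu_{\theta})
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_{\theta}\, \mu_{\theta}(s)\,
      \nabla_{a} Q^{\mu}(s, a)\big\rvert_{a=\mu_{\theta}(s)}
    \right]
```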


Author(s): Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, ...

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be only locally optimal to the other agents' current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, where the agents trained by our method significantly outperform existing baselines.
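The minimax idea can be written compactly; the following is a sketch in standard notation of the worst-case objective the abstract describes (agent i optimizes its policy against adversarially perturbed actions of the others), not the paper's exact equation.

```latex
% Sketch of the M3DDPG-style robust objective for agent i: maximize the
% expected return under worst-case actions a_{-i} of the other agents.
\max_{\theta_i}\; \mathbb{E}_{s}\!\left[
   \min_{a_{-i}}\; Q_i\big(s,\ \mu_{\theta_i}(s),\ a_{-i}\big)
\right]
```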


Author(s): Haining Sun, Xiaoqiang Tang, Jinhao Wei

Specific satellites with ultra-long wings play a crucial role in many fields. However, external disturbances and self-rotation can cause undesired vibrations of the flexible wings, which affect the normal operation of the satellites; in severe cases, the satellites can be damaged. It is therefore imperative to suppress vibration in these flexible structures. Utilizing deep reinforcement learning (DRL), an active control scheme is presented in this paper to rapidly suppress the vibration of flexible structures with a quite small controllable force, based on a cable-driven parallel robot (CDPR). To verify the controller's effectiveness, three groups of simulations with different initial disturbances were implemented; for contrast, a passive pre-tightening scheme was also tested. First, the dynamic model of the CDPR, comprising four cables and a flexible structure, is established using the finite element method. Then, the dynamic behavior of the model under the controllable cable force is analyzed by the Newmark-β method. Furthermore, the DRL agent is trained by the deep deterministic policy gradient (DDPG) algorithm. Finally, the control scheme is run in the Simulink environment to evaluate its performance; the results are satisfactory, validating the controller's ability to suppress vibrations.
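The Newmark-β step used to integrate the finite element model is standard; a generic implementation of the average-acceleration variant for M u'' + C u' + K u = F is sketched below. This is the textbook scheme, not the paper's code; the controllable cable forces would enter through F.

```python
# Generic Newmark-beta time step for M u'' + C u' + K u = F, using the
# average-acceleration parameters beta = 1/4, gamma = 1/2.
import numpy as np

def newmark_step(M, C, K, F, u, v, a, dt, beta=0.25, gamma=0.5):
    """Advance displacement u, velocity v, acceleration a by one step dt."""
    # Effective stiffness and effective load of the implicit scheme
    K_eff = K + (gamma / (beta * dt)) * C + (1.0 / (beta * dt**2)) * M
    F_eff = (F
             + M @ (u / (beta * dt**2) + v / (beta * dt)
                    + (0.5 / beta - 1) * a)
             + C @ ((gamma / (beta * dt)) * u + (gamma / beta - 1) * v
                    + dt * (0.5 * gamma / beta - 1) * a))
    # Solve for the new displacement, then recover acceleration and velocity
    u_new = np.linalg.solve(K_eff, F_eff)
    a_new = ((u_new - u) / (beta * dt**2) - v / (beta * dt)
             - (0.5 / beta - 1) * a)
    v_new = v + dt * ((1 - gamma) * a + gamma * a_new)
    return u_new, v_new, a_new
```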

