A Collaborative Control Method of Dual-Arm Robots Based on Deep Reinforcement Learning

2021 ◽  
Vol 11 (4) ◽  
pp. 1816
Author(s):  
Luyu Liu ◽  
Qianyuan Liu ◽  
Yong Song ◽  
Bao Pang ◽  
Xianfeng Yuan ◽  
...  

Collaborative control of a dual-arm robot means avoiding collisions while working together to accomplish a task. To prevent the two arms from colliding, the control strategy of each arm must avoid competing with, and instead cooperate with, the other arm during motion planning. In this paper, a dual-arm deep deterministic policy gradient (DADDPG) algorithm is proposed, based on deep reinforcement learning for multi-agent cooperation. Firstly, the construction of the replay buffer in the hindsight experience replay algorithm is introduced, and the modeling and training method of the multi-agent deep deterministic policy gradient algorithm is explained. Secondly, a control strategy is assigned to each robotic arm, and the arms share their observations and actions. The dual-arm robot is trained under a mechanism of “rewarding cooperation and punishing competition”. Finally, the effectiveness of the algorithm is verified in the Reach, Push, and Pick-up simulation environments built in this study. The experimental results show that a robot trained with the DADDPG algorithm can accomplish cooperative tasks: the algorithm lets the arms explore the action space autonomously, reduces competition between them, and yields better adaptability to coordination tasks.
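
A minimal sketch of the “rewarding cooperation and punishing competition” idea described above, assuming a shaped per-step reward; the distance thresholds, bonus/penalty magnitudes, and function names are illustrative, not the authors':

```python
import numpy as np

def dual_arm_reward(ee_pos_left, ee_pos_right, goal_left, goal_right,
                    collision_dist=0.05, coop_bonus=1.0, collision_penalty=5.0):
    """Hypothetical 'reward cooperation, punish competition' shaping.

    Each arm is rewarded for approaching its own goal, and both arms are
    penalized when their end effectors get close enough to risk a
    collision (i.e., when they compete for the same workspace)."""
    # Negative distance to goal: closer is better for each arm.
    r_left = -np.linalg.norm(ee_pos_left - goal_left)
    r_right = -np.linalg.norm(ee_pos_right - goal_right)

    # Shared penalty when the arms compete for the same region of space.
    arm_gap = np.linalg.norm(ee_pos_left - ee_pos_right)
    penalty = collision_penalty if arm_gap < collision_dist else 0.0

    # Shared bonus when both arms are within 2 cm of their goals.
    bonus = coop_bonus if (r_left > -0.02 and r_right > -0.02) else 0.0

    return r_left + r_right + bonus - penalty
```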

Author(s):  
Shihui Li ◽  
Yi Wu ◽  
Xinyue Cui ◽  
Honghua Dong ◽  
Fei Fang ◽  
...  

Despite the recent advances in deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optimum w.r.t. its training partners – the learned policy may be only locally optimal with respect to the other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents still generalize when their opponents’ policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes the minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
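
The abstract does not spell out MAAL; the sketch below shows one common way such a minimax objective is approximated for continuous actions, namely a single adversarial gradient step on the other agents' actions. The critic signature and step size are assumptions:

```python
import torch

def maal_perturb(critic, obs, actions, agent_i, eps=0.1):
    """One-step adversarial approximation of the inner minimization.

    Instead of solving the minimax objective exactly (intractable for
    continuous actions), perturb the *other* agents' actions one gradient
    step in the direction that minimizes agent i's Q-value. `critic` is
    assumed to take (obs, concatenated joint actions) and return Q_i."""
    actions = [a.clone().detach().requires_grad_(True) for a in actions]
    q_i = critic(obs, torch.cat(actions, dim=-1))
    q_i.sum().backward()
    with torch.no_grad():
        perturbed = [
            a if j == agent_i                 # keep agent i's own action
            else a - eps * a.grad.sign()      # adversarial step against Q_i
            for j, a in enumerate(actions)
        ]
    return perturbed
```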


2021 ◽  
Vol 9 ◽  
Author(s):  
Jiawen Li ◽  
Yaping Li ◽  
Tao Yu

In order to improve the stability of the output voltage of a proton exchange membrane fuel cell (PEMFC), a data-driven output voltage control strategy based on regulating the duty cycle of the DC-DC converter is proposed in this paper. Specifically, an imitation-oriented twin delayed deep deterministic policy gradient (IO-TD3) algorithm, which offers a more robust voltage control strategy, is presented. The proposed output voltage control method is a distributed deep reinforcement learning training framework whose design is guided by the pedagogic concept of imitation learning. The effectiveness of the proposed control strategy is demonstrated experimentally.
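
The IO-TD3 details are not given in the abstract; as a point of reference, here is a sketch of the standard TD3 target computation, specialized to a duty-cycle action bounded in [0, 1]. The imitation-oriented term is omitted and all names are illustrative:

```python
import torch

def td3_target(critic1_t, critic2_t, actor_t, next_obs, reward, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """TD3-style critic target for a duty-cycle action (a sketch, not the
    paper's IO-TD3). Twin target critics plus clipped noise on the target
    action reduce the overestimation bias of plain DDPG."""
    with torch.no_grad():
        noise = (torch.randn_like(actor_t(next_obs)) * noise_std
                 ).clamp(-noise_clip, noise_clip)
        # The duty cycle is physically bounded in [0, 1].
        next_a = (actor_t(next_obs) + noise).clamp(0.0, 1.0)
        q_next = torch.min(critic1_t(next_obs, next_a),
                           critic2_t(next_obs, next_a))
        return reward + gamma * (1.0 - done) * q_next
```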


2020 ◽  
Vol 2020 ◽  
pp. 1-12 ◽  
Author(s):  
Junta Wu ◽  
Huiyun Li

The deep deterministic policy gradient algorithm, which operates over continuous action spaces, has attracted great attention in reinforcement learning. However, exploration via dynamic programming within the Bayesian belief-state space is rather inefficient even for simple systems. Another problem is that training data gathered from autonomous vehicles is sequential and iterative, subject to the law of causality, which violates the i.i.d. (independent and identically distributed) assumption on training samples. This usually causes the standard bootstrap to fail when learning an optimal policy. In this paper, we propose a framework of m-out-of-n bootstrapped and aggregated multiple deep deterministic policy gradients to accelerate training and increase performance. Experimental results on the 2D robot arm game show that the reward gained by the aggregated policy is 10%–50% higher than those gained by the subpolicies. Experimental results on the open racing car simulator (TORCS) demonstrate that the new algorithm can learn successful control policies with 56.7% less training time. A convergence analysis is also given from the perspective of probability and statistics. These results verify that the proposed method outperforms existing algorithms in both efficiency and performance.
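
A minimal sketch of the m-out-of-n bootstrap and the aggregation step, assuming uniform resampling with replacement and simple action averaging; the paper's exact resampling and aggregation rules may differ:

```python
import numpy as np

def m_out_of_n_bootstrap(buffer, m, k):
    """Draw k bootstrap training sets of size m (m < n) with replacement
    from a replay buffer of n transitions; one set per subpolicy."""
    n = len(buffer)
    idx = np.random.randint(0, n, size=(k, m))
    return [[buffer[i] for i in row] for row in idx]

def aggregate_action(subpolicies, obs):
    """Aggregate the deterministic actions of the k subpolicies by
    averaging (one simple aggregation rule, assumed for illustration)."""
    actions = [pi(obs) for pi in subpolicies]
    return np.mean(actions, axis=0)
```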


Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 310
Author(s):  
Qiuxuan Wu ◽  
Yueqin Gu ◽  
Yancheng Li ◽  
Botao Zhang ◽  
Sergey A. Chepinskiy ◽  
...  

The cable-driven soft arm is made mostly of soft material and is difficult to control because of the material’s characteristics, so traditional robot-arm modeling and control methods cannot be applied to it directly. In this paper, we combine a data-driven modeling method with reinforcement learning control to realize the position control task of a robotic soft arm, using a control strategy based on deep Q-learning. To address the slow convergence and unstable behavior that arise when deep reinforcement learning is migrated from simulation to a real robot control task, a control strategy learning method is designed in which a simulation environment for strategy training is built from experimental data and the trained strategy is then applied in the real environment. Finally, experiments show that the method can effectively control the soft robot arm and is more robust than the traditional method.
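
A minimal sketch of the data-driven simulation idea described above, assuming a forward dynamics model fitted to experimental data; the class, parameter names, and tolerances are illustrative:

```python
import numpy as np

class DataDrivenSim:
    """Minimal stand-in for a data-driven simulator: a forward model
    fitted to experimental (state, action, next_state) data serves as the
    training environment before transfer to the real arm. All names here
    are illustrative, not the authors' API."""
    def __init__(self, forward_model, goal):
        self.forward_model = forward_model  # e.g., a fitted neural network
        self.goal = np.asarray(goal, dtype=float)
        self.state = None

    def reset(self, state0):
        self.state = np.asarray(state0, dtype=float)
        return self.state

    def step(self, action):
        # Predict the next tip position from the learned dynamics.
        self.state = self.forward_model(self.state, action)
        dist = np.linalg.norm(self.state - self.goal)
        reward = -dist              # dense distance-based reward (assumed)
        done = dist < 0.01          # 1 cm success tolerance (assumed)
        return self.state, reward, done
```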


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Sung-Jung Wang ◽  
S. K. Jason Chang

Autonomous buses are becoming increasingly popular and have been widely developed in many countries. However, autonomous buses must learn to navigate the city efficiently to be integrated into public transport systems. Efficient operation of these buses can be achieved by intelligent agents through reinforcement learning. In this study, we investigate the autonomous bus fleet control problem, which appears noisy to the agents owing to random passenger arrivals and incomplete observation of the environment. We propose a multi-agent reinforcement learning method combined with an advanced policy gradient algorithm for this large-scale dynamic optimization problem. An agent-based simulation platform was developed to model the dynamic system of a fixed stop/station loop route, the autonomous bus fleet, and passengers, and was used to assess the performance of the proposed algorithm. The experimental results indicate that the developed algorithm outperforms other reinforcement learning methods in the multi-agent domain. The simulation results also show that our algorithm outperforms the existing scheduled bus system in terms of bus fleet size and passenger wait times on bus routes with comparatively few passengers.
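
The abstract does not name the policy gradient variant used; as a generic reference point, here is a vanilla REINFORCE update for a single bus agent, with passenger wait time folded into the per-step reward. All names and the reward choice are assumptions:

```python
import torch

def reinforce_update(policy, optimizer, log_probs, rewards, gamma=0.99):
    """Vanilla policy-gradient update for one bus agent (a generic
    REINFORCE sketch; the paper's algorithm is more advanced).

    log_probs: log pi(a_t | o_t) collected over one simulated day.
    rewards:   per-step fleet rewards, e.g. negative passenger wait time.
    """
    returns, g = [], 0.0
    for r in reversed(rewards):          # discounted return-to-go
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```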


Symmetry ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1537
Author(s):  
Zixiong Zhu ◽  
Nianhao Xie ◽  
Kang Zong ◽  
Lei Chen

Clusters of unmanned aerial vehicles (UAVs) are often used to perform complex tasks. In such clusters, the reliability of the communication network connecting the UAVs is an essential factor in their collective efficiency. Due to the complex wireless environment, however, communication malfunctions within the cluster are likely during flight. In such cases, it is important to control the cluster so as to rebuild a connected network. The asymmetry of the cluster topology further increases the complexity of the control mechanism. Traditional control methods based on cluster consistency often rely on the motion information of neighboring UAVs, but this information may become unavailable when communications are interrupted. UAV control algorithms based on deep reinforcement learning have achieved outstanding results in many fields. Here, we propose a cluster control method based on the Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG) to rebuild the communication network of a UAV cluster. DE-MADDPG improves on the framework of the traditional multi-agent deep deterministic policy gradient (MADDPG) algorithm by decomposing the reward function. We further introduce a reward-reshaping function to facilitate convergence in sparse-reward environments, and, to address the instability of the state space in the reinforcement learning framework, we propose a virtual leader–follower model. Extensive simulations show that the success rate of DE-MADDPG is higher than that of MADDPG, confirming the effectiveness of the proposed method.
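
A sketch of the reward-decomposition idea described above, assuming one centralized critic for the global (team) reward and one per-agent critic for the local reward; the signatures are illustrative:

```python
import torch

def de_maddpg_targets(global_critic_t, local_critic_t,
                      next_obs_all, next_acts_all, next_obs_i, next_act_i,
                      r_global, r_local, gamma=0.99, done=0.0):
    """Critic targets under a decomposed reward, in the spirit of
    DE-MADDPG: a centralized target critic handles the shared (global)
    reward, a per-agent target critic handles the local reward; the actor
    can then be updated with the sum of both critic gradients."""
    with torch.no_grad():
        y_global = r_global + gamma * (1.0 - done) * global_critic_t(
            next_obs_all, next_acts_all)
        y_local = r_local + gamma * (1.0 - done) * local_critic_t(
            next_obs_i, next_act_i)
    return y_global, y_local
```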


Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model: the factors that need to be weighed in vehicle following are vectorized into a reward vector, and the reward function is defined as the inner product of the reward vector and a weight vector. Driving data from human drivers was collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model continuously approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance on the task of following the preceding vehicle safely and smoothly.
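
A minimal sketch of the inner-product reward described above; the particular features and weights are assumptions for illustration:

```python
import numpy as np

def following_reward(features, weights):
    """Reward defined as the inner product of a reward (feature) vector
    and a weight vector, as described above."""
    return float(np.dot(features, weights))

# Example: hypothetical features [gap error, relative speed, jerk],
# with gap-keeping weighted most heavily and jerk penalized lightly.
features = np.array([-0.3, -0.1, -0.05])
weights = np.array([1.0, 0.5, 0.2])
r = following_reward(features, weights)
```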


Author(s):  
Qingyuan Zheng ◽  
Duo Wang ◽  
Zhang Chen ◽  
Yiyong Sun ◽  
Bin Liang

Single-track two-wheeled robots have become an important research topic in recent years owing to their simple structure, energy efficiency, and ability to run on narrow roads. However, the ramp jump remains a challenging task. In this study, we propose a method for realizing the ramp jump of a single-track two-wheeled robot. We present a control method that employs continuous-action reinforcement learning techniques for single-track two-wheeled robot control: we design a novel reward function, optimize the dimensions of the action space, and train under the deep deterministic policy gradient algorithm. Finally, we validate the control method through simulation experiments and successfully realize the ramp jump task. The simulation results show that the control method is effective and has several advantages over high-dimensional action-space control, reinforcement learning with a sparse reward function, and discrete-action reinforcement learning control.
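
The abstract does not give the reward terms; below is a hypothetical dense reward of the general kind described (attitude stability through flight plus a terminal landing bonus), purely for illustration:

```python
def ramp_jump_reward(pitch_err, roll_err, speed, landed, crashed,
                     w_att=1.0, w_speed=0.1):
    """A dense reward in the spirit of the paper's 'novel reward
    function' (the actual terms are not stated in the abstract): keep the
    body attitude stable, maintain approach speed, and add a terminal
    bonus/penalty so the signal is not purely sparse."""
    r = -w_att * (abs(pitch_err) + abs(roll_err)) + w_speed * speed
    if landed:
        r += 100.0   # terminal success bonus
    if crashed:
        r -= 100.0   # terminal failure penalty
    return r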


2020 ◽  
Vol 34 (04) ◽  
pp. 3316-3323
Author(s):  
Qingpeng Cai ◽  
Ling Pan ◽  
Pingzhong Tang

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider deterministic value gradients as a means to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients over a finite horizon, which is myopic compared with the infinite-horizon setting. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of infinite-horizon deterministic value gradient algorithms (DVG), in which different rollout steps of the analytical gradients through the learned model trade off the variance of the value gradients against the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. The results demonstrate that DVPG substantially outperforms the other baselines.
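
For reference, here is the standard deterministic value-gradient recursion that finite-horizon methods truncate and that the infinite-horizon analysis above extends: a sketch under deterministic dynamics s' = f(s, a) and deterministic policy a = pi_theta(s), not the paper's exact derivation.

```latex
% Bellman recursion under deterministic dynamics and policy:
V^{\pi}(s) = r\big(s, \pi_\theta(s)\big)
           + \gamma\, V^{\pi}\big(f(s, \pi_\theta(s))\big)

% Differentiating through the model, with a = \pi_\theta(s) and s' = f(s, a):
\nabla_s V^{\pi}(s) =
    \nabla_s r + \nabla_s \pi_\theta(s)\, \nabla_a r
  + \gamma \big( \nabla_s f + \nabla_s \pi_\theta(s)\, \nabla_a f \big)\,
    \nabla_{s'} V^{\pi}(s')
```

Unrolling this recursion for more steps through the learned model reduces reliance on the learned value function but compounds model bias, which is the variance-bias trade-off the abstract refers to.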


2019 ◽  
Vol 16 (02) ◽  
pp. 1950008
Author(s):  
Fuhai Zhang ◽  
Jiadi Qu ◽  
He Liu ◽  
Yili Fu

This paper develops a multi-priority control method for asymmetric coordination of a redundant dual-arm robot. A novel dual-arm coordination impedance is introduced into the multi-priority control, and the performance of object tracking and of the redundant joints is then improved by regulating the relative Cartesian errors between the two arms. The asymmetric coordination control is divided into a main task and a secondary task. The main-task controller regulates the two end-effectors’ errors and their relative errors by building a spatial parallel spring and damping model (SPSDM), which establishes the dual-arm coordination impedance relation in Cartesian space. The secondary-task controller optimizes the redundant-joint impedance and joint-limit avoidance in the null space. Finally, a typical asymmetric coordination experiment, peg-in-hole insertion, is carried out to verify the validity and feasibility of the proposed method. The results indicate that the proposed dual-arm coordination impedance can directly regulate the relative tracking errors between the two objects, and that the dual-arm peg-in-hole task can be completed successfully even when an external impact force is applied to the two end-effectors.
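
The abstract does not give the SPSDM equations; below is a generic Cartesian spring-damper (impedance) relation of the kind such a model builds on, with an added relative term coupling the two end-effectors. The symbols and the desired-offset term are assumptions:

```latex
% Per-arm Cartesian impedance on the end-effector error:
F_i = K_i\,\Delta x_i + D_i\,\Delta\dot{x}_i, \qquad i \in \{L, R\}

% Relative term acting between the two end-effectors, with desired
% offset d^{*} between the arms:
F_{rel} = K_{rel}\,\big(x_L - x_R - d^{*}\big)
        + D_{rel}\,\big(\dot{x}_L - \dot{x}_R\big)
```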

