Application of Deep Reinforcement Learning for Tracking Control of 3WD Omnidirectional Mobile Robot

2021 ◽  
Vol 50 (3) ◽  
pp. 507-521
Author(s):  
Atif Mehmood ◽  
Inam ul Hasan Shaikh ◽  
Ahsan Ali

Deep reinforcement learning is a fast-growing technique for solving complex real-world problems within a simple mathematical framework consisting of an agent, actions, an environment, and a reward. The agent interacts with the environment and takes optimal actions with the aim of maximizing the total reward. This paper applies the deep deterministic policy gradient (DDPG) technique to the complex continuous action space of a 3-wheeled omnidirectional mobile robot. Trajectory tracking for a three-wheeled omnidirectional mobile robot is a difficult task because the orientation of the wheels tends to make the robot rotate about its own axis rather than follow the trajectory. A DDPG algorithm is designed to train in an environment with a continuous action space: the neural networks defined for the policy and value function are trained to maximize a reward function defined for tracking the trajectory. The DDPG agent environment is created with the Reinforcement Learning Toolbox in MATLAB 2019, while the deep neural network designer is used to build the actor and critic networks. Results illustrate the effectiveness of the technique, with the tracking error converging approximately to zero.
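To make the setup concrete, below is a minimal Python sketch (the paper itself works in MATLAB's Reinforcement Learning Toolbox) of the kind of omnidirectional kinematics and tracking reward a DDPG agent would be trained against. The wheel placement, geometry, time step, and penalty weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

# Assumed geometry of a 3-wheel omnidirectional robot (illustrative values).
WHEEL_ANGLES = np.deg2rad([0.0, 120.0, 240.0])  # wheel placement around the body
L = 0.2    # distance from robot center to each wheel [m]
DT = 0.05  # control period [s]

def body_velocity(wheel_speeds, theta):
    """Map three wheel speeds to a world-frame velocity (vx, vy, omega)."""
    # Each row maps body-frame (vx, vy, omega) to one wheel's linear speed.
    J = np.stack([[-np.sin(a), np.cos(a), L] for a in WHEEL_ANGLES])
    vx_b, vy_b, omega = np.linalg.solve(J, wheel_speeds)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * vx_b - s * vy_b, s * vx_b + c * vy_b, omega])

def step(pose, wheel_speeds, reference):
    """One simulation step: integrate the pose and return a tracking reward."""
    pose = pose + DT * body_velocity(wheel_speeds, pose[2])
    error = pose[:2] - reference[:2]
    # Penalize squared tracking error plus a small control-effort term.
    reward = -np.dot(error, error) - 0.01 * np.dot(wheel_speeds, wheel_speeds)
    return pose, reward
```

Penalizing the squared tracking error makes the return largest when the error converges to zero, which matches the convergence behavior reported above.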

Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to perform vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and a weight vector. Driving data from human drivers was collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model continuously approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance on the task of following the preceding vehicle safely and smoothly.
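As a rough illustration of the reward structure described here, the sketch below defines a hypothetical feature (reward) vector for vehicle following and computes the reward as its inner product with a weight vector. The specific features, desired headway, and weights are assumptions for illustration only, not the paper's learned quantities.

```python
import numpy as np

def reward_features(gap, relative_speed, ego_accel):
    """Hypothetical factors weighed during vehicle following, stacked as a vector."""
    return np.array([
        -(gap - 30.0) ** 2,    # deviation from an assumed desired headway [m]
        -relative_speed ** 2,  # speed difference to the lead vehicle [m/s]
        -ego_accel ** 2,       # comfort / smoothness penalty [m/s^2]
    ])

def reward(gap, relative_speed, ego_accel, weights):
    """Reward defined as the inner product of the reward vector and the weights."""
    return float(np.dot(weights, reward_features(gap, relative_speed, ego_accel)))

# Example: a weight vector of the kind adjusted over rounds of training.
w = np.array([0.5, 0.3, 0.2])
print(reward(gap=28.0, relative_speed=1.2, ego_accel=0.4, weights=w))
```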


Author(s):  
Buvanesh Pandian V

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns from labeled data, reinforcement learning agents learn how to act by trial and error, receiving only a reward signal from their environment. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been used successfully to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems such as Atari games and Go. Mnih et al. [1] built a system with fixed hyperparameters able to learn to play 49 different Atari games from raw pixel inputs alone. However, in order to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces scales poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, a parametrized policy can be advantageous because it can generalize in the action space. In this thesis we therefore study a state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients (DDPG). We provide a theoretical comparison to other popular methods, evaluate its performance, identify its limitations, and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention on deep learning and reinforcement learning. We continue by describing in detail the two main algorithms at the core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementation details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identify limitations of the current approach, and propose future avenues of research.
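For readers unfamiliar with DDPG, the PyTorch-style sketch below shows one update step of the algorithm studied here: a critic regressed toward a bootstrapped target, an actor updated along the deterministic policy gradient, and Polyak-averaged target networks. Network sizes, learning rates, and variable names are illustrative and are not taken from the thesis.

```python
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 8, 2, 0.99, 0.005  # illustrative dimensions/hyperparameters

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(obs, act, rew, next_obs, done):
    """One DDPG update on a replay-buffer batch (tensors shaped [batch, dim])."""
    # Critic: regress Q(s, a) toward the target r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        next_q = critic_targ(torch.cat([next_obs, actor_targ(next_obs)], dim=-1))
        target = rew + gamma * (1.0 - done) * next_q
    q = critic(torch.cat([obs, act], dim=-1))
    critic_loss = ((q - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, ascend Q(s, mu(s)).
    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks toward the learned networks.
    with torch.no_grad():
        for net, targ in ((actor, actor_targ), (critic, critic_targ)):
            for p, p_targ in zip(net.parameters(), targ.parameters()):
                p_targ.mul_(1 - tau).add_(tau * p)
```

The slowly moving target networks and the replay buffer are what make the otherwise unstable combination of deep function approximation and bootstrapping workable in continuous action spaces.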


Author(s):  
Shihui Li ◽  
Yi Wu ◽  
Xinyue Cui ◽  
Honghua Dong ◽  
Fei Fang ◽  
...  

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be optimal only with respect to the other agents' current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents can still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to solve the proposed formulation efficiently. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
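A minimal sketch of the approximation behind contribution (2): rather than solving the inner minimization over the other agents' continuous actions exactly, one can perturb their actions a single gradient step in the direction that decreases agent i's Q-value and then train against the perturbed actions. The function name, interface, and step size below are illustrative, not the authors' exact formulation.

```python
import torch

def adversarial_other_actions(critic_i, obs, act_i, act_others, eps=0.1):
    """Approximate worst-case actions of the other agents with one gradient step."""
    act_others = act_others.clone().requires_grad_(True)
    q = critic_i(torch.cat([obs, act_i, act_others], dim=-1)).sum()
    grad, = torch.autograd.grad(q, act_others)
    # Move the other agents' actions against agent i (minimax approximation),
    # then detach so the perturbed actions are treated as fixed targets.
    return (act_others - eps * grad.sign()).detach()
```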


Author(s):  
Zhan Shi ◽  
Xinchi Chen ◽  
Xipeng Qiu ◽  
Xuanjing Huang

Text generation is a crucial task in NLP. Recently, several adversarial generative models have been proposed to alleviate the exposure bias problem in text generation. Though these models achieve great success, they still suffer from reward sparsity and mode collapse. To address these two problems, in this paper we employ inverse reinforcement learning (IRL) for text generation. Specifically, the IRL framework learns a reward function on training data and then an optimal policy that maximizes the expected total reward. As in the adversarial models, the reward and policy functions in IRL are optimized alternately. Our method has two advantages: (1) the reward function can produce denser reward signals; (2) the generation policy, trained by an "entropy regularized" policy gradient, is encouraged to generate more diverse texts. Experimental results demonstrate that our proposed method generates higher-quality texts than previous methods.
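As an illustration of an "entropy regularized" policy gradient of the kind described above, the sketch below combines a REINFORCE-style term weighted by dense, learned rewards with a per-step entropy bonus that pushes the generator toward more diverse outputs. Tensor shapes and the coefficient beta are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_pg_loss(logits, sampled_tokens, rewards, beta=0.01):
    """Policy-gradient loss with an entropy bonus.

    logits:          [T, vocab] scores from an autoregressive generator
    sampled_tokens:  [T] token ids sampled from the generator
    rewards:         [T] dense per-step rewards from the learned reward function
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(-1)  # per-step policy entropy
    # Minimize the negative of (reward-weighted log-likelihood + entropy bonus).
    return -(token_logp * rewards + beta * entropy).mean()
```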


2020 ◽  
Vol 14 (1) ◽  
pp. 117-150
Author(s):  
Alberto Maria Metelli ◽  
Matteo Pirotta ◽  
Marcello Restelli

Reinforcement Learning (RL) is an effective approach to sequential decision-making problems when the environment is equipped with a reward function to evaluate the agent's actions. However, in several domains a reward function is unavailable and difficult to estimate. When samples from expert agents are available, Inverse Reinforcement Learning (IRL) allows recovering a reward function that explains the demonstrated behavior. Most classic IRL methods, in addition to the expert's demonstrations, require sampling the environment to evaluate each candidate reward function, which, in turn, is built from a set of engineered features. This paper presents a novel model-free IRL approach that does not require specifying a function space in which to search for the expert's reward function. Leveraging the fact that the policy gradient must be zero for an optimal policy, the algorithm generates an approximation space for the reward function, in which a reward is singled out by a second-order criterion. After introducing our approach for finite domains, we extend it to continuous ones. The empirical results, on both finite and continuous domains, show that the reward function recovered by our algorithm allows learning policies that outperform those obtained with the true reward function, in terms of learning speed.
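The first-order condition the method exploits can be illustrated with a toy example: with a linear reward r(s, a) = phi(s, a)^T w, the policy gradient is linear in w, so weights consistent with an optimal expert lie in the null space of the estimated gradient Jacobian (rows: policy parameters, columns: reward features). The sketch below, with a made-up Jacobian, recovers such candidate weights via an SVD; the paper's actual estimator and its second-order selection criterion are not reproduced here.

```python
import numpy as np

def reward_weights_from_gradient_jacobian(G, tol=1e-8):
    """Return a basis of weight vectors w with G @ w ~= 0 (candidate rewards)."""
    _, s, vt = np.linalg.svd(G)
    # Rows of vt whose singular value is (numerically) zero span the null space.
    singular = np.concatenate([s, np.zeros(vt.shape[0] - len(s))])
    return vt[singular <= tol]  # each row is a candidate w; a second-order
                                # criterion would then single one out

# Toy Jacobian: 2 policy parameters, 3 reward features.
G = np.array([[1.0, -1.0,  0.0],
              [0.0,  2.0, -2.0]])
print(reward_weights_from_gradient_jacobian(G))  # ~ [0.577, 0.577, 0.577]
```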


Author(s):  
Zifei Jiang ◽  
Alan F. Lynch

We present a deep neural net-based controller trained by a model-free reinforcement learning (RL) algorithm to achieve hover stabilization for a quadrotor unmanned aerial vehicle (UAV). With RL, two neural nets are trained. One neural net is used as a stochastic controller which gives the distribution of control inputs. The other maps the UAV state to a scalar which estimates the reward of the controller. A proximal policy optimization (PPO) method, which is an actor-critic policy gradient approach, is used to train the neural nets. Simulation results show that the trained controller achieves a comparable level of performance to a manually-tuned PID controller, despite not depending on any model information. The paper considers different choices of reward function and their influence on controller performance.
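For reference, the clipped surrogate objective at the heart of PPO, used here to train the stochastic hover controller, can be sketched as follows; the tensor shapes and clipping threshold are illustrative, and the value (critic) network is trained separately on estimated returns.

```python
import torch

def ppo_actor_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped PPO surrogate for a batch of (log-prob, advantage) pairs."""
    ratio = torch.exp(new_logp - old_logp)            # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    # Pessimistic (clipped) surrogate keeps each policy update small,
    # which is what makes PPO practical without heavy tuning.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```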


Author(s):  
Yu. V. Dubenko ◽  
Ye. Ye. Dyshkant ◽  
D. A. Gura

The paper evaluates the possibility of using robotic systems (intelligent agents) to monitor complex infrastructure objects, such as buildings, structures, bridges, roads, and other transport infrastructure. Methods and algorithms for implementing behavioral strategies of robots, in particular search algorithms based on decision trees, are examined. The emphasis is placed on the importance of enabling robots to self-learn through reinforcement learning, which is associated with modeling the behavior of living creatures interacting with unknown elements of the environment. The Q-learning method is considered as one type of reinforcement learning that introduces the concept of the value of an action, along with the "hierarchical reinforcement learning" approach and its varieties "Options Framework", "Feudal", and "MaxQ". In the segmentation of macro-actions, the problems of determining parameters such as the value and reward functions of the agents (mobile robots), as well as the mandatory presence of a technical vision subsystem, are identified. Thus, implementing the segmentation of macro-actions requires improving the methodological base by applying intelligent algorithms and methods, including deep clustering methods. The effectiveness of hierarchical reinforcement learning when mobile robots operate with a lack of information about the monitored object can be improved by transmitting visual information across a variety of states, which will also increase the transfer of experience between robots when they later perform tasks on other objects.
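For concreteness, the tabular Q-learning rule mentioned above, which introduces the notion of the value of an action, can be sketched as follows; the state/action counts and hyperparameters are illustrative only.

```python
import numpy as np

n_states, n_actions = 25, 4          # illustrative discretization of the task
Q = np.zeros((n_states, n_actions))  # action-value table
alpha, gamma = 0.1, 0.95             # learning rate and discount factor

def q_update(s, a, r, s_next):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```

Hierarchical variants such as Options, Feudal RL, and MaxQ build on exactly this kind of value update, but apply it over temporally extended macro-actions rather than primitive actions.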

