Application of Deep Reinforcement Learning for Tracking Control of 3WD Omnidirectional Mobile Robot

2021 ◽  
Vol 50 (3) ◽  
pp. 507-521
Author(s):  
Atif Mehmood ◽  
Inam ul Hasan Shaikh ◽  
Ahsan Ali

Deep reinforcement learning is a fast-growing technique for solving complex real-world problems within a simple mathematical framework consisting of an agent, actions, an environment, and a reward. The agent interacts with the environment and takes optimal actions with the aim of maximizing the total reward. This paper applies the deep deterministic policy gradient (DDPG) technique to the complex continuous action space of a 3-wheeled omnidirectional mobile robot. Trajectory tracking for a three-wheeled omnidirectional mobile robot is a difficult task because the orientation of the wheels tends to make the robot rotate about its own axis rather than follow the trajectory. A DDPG algorithm is designed to train in an environment with a continuous action space: the neural networks defined for the policy and value function are trained to maximize a reward function defined for tracking the trajectory. The DDPG agent environment is created with the Reinforcement Learning Toolbox in MATLAB 2019, while the deep neural network designer is used to build the actor and critic networks. Results illustrate the effectiveness of the technique, with the tracking error converging approximately to zero.
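To make the setup concrete, below is a minimal Python sketch (the paper itself works in MATLAB's Reinforcement Learning Toolbox) of the kind of omnidirectional kinematics and tracking reward a DDPG agent would be trained against. The wheel placement, geometry, time step, and penalty weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

# Assumed geometry of a 3-wheel omnidirectional robot (illustrative values).
WHEEL_ANGLES = np.deg2rad([0.0, 120.0, 240.0])  # wheel placement around the body
L = 0.2    # distance from robot center to each wheel [m]
DT = 0.05  # control period [s]

def body_velocity(wheel_speeds, theta):
    """Map three wheel speeds to a world-frame velocity (vx, vy, omega)."""
    # Each row maps body-frame (vx, vy, omega) to one wheel's linear speed.
    J = np.stack([[-np.sin(a), np.cos(a), L] for a in WHEEL_ANGLES])
    vx_b, vy_b, omega = np.linalg.solve(J, wheel_speeds)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * vx_b - s * vy_b, s * vx_b + c * vy_b, omega])

def step(pose, wheel_speeds, reference):
    """One simulation step: integrate the pose and return a tracking reward."""
    pose = pose + DT * body_velocity(wheel_speeds, pose[2])
    error = pose[:2] - reference[:2]
    # Penalize squared tracking error plus a small control-effort term.
    reward = -np.dot(error, error) - 0.01 * np.dot(wheel_speeds, wheel_speeds)
    return pose, reward
```

Penalizing the squared tracking error makes the return largest when the error converges to zero, which matches the convergence behavior reported above.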

Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to perform vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and a weight vector. Driving data from human drivers was collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model continuously approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance on the task of following the preceding vehicle safely and smoothly.
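As a rough illustration of the reward structure described here, the sketch below defines a hypothetical feature (reward) vector for vehicle following and computes the reward as its inner product with a weight vector. The specific features, desired headway, and weights are assumptions for illustration only, not the paper's learned quantities.

```python
import numpy as np

def reward_features(gap, relative_speed, ego_accel):
    """Hypothetical factors weighed during vehicle following, stacked as a vector."""
    return np.array([
        -(gap - 30.0) ** 2,    # deviation from an assumed desired headway [m]
        -relative_speed ** 2,  # speed difference to the lead vehicle [m/s]
        -ego_accel ** 2,       # comfort / smoothness penalty [m/s^2]
    ])

def reward(gap, relative_speed, ego_accel, weights):
    """Reward defined as the inner product of the reward vector and the weights."""
    return float(np.dot(weights, reward_features(gap, relative_speed, ego_accel)))

# Example: a weight vector of the kind adjusted over rounds of training.
w = np.array([0.5, 0.3, 0.2])
print(reward(gap=28.0, relative_speed=1.2, ego_accel=0.4, weights=w))
```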


Author(s):  
Buvanesh Pandian V

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns from labeled data, reinforcement learning agents learn how to act by trial and error, receiving only a reward signal from their environment. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been used successfully to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems such as Atari games and Go. Mnih et al. [1] built a system with fixed hyperparameters able to learn to play 49 different Atari games from raw pixel inputs alone. However, in order to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces scales poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, a parametrized policy can be advantageous because it can generalize in the action space. In this thesis we therefore study a state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients (DDPG). We provide a theoretical comparison to other popular methods, evaluate its performance, identify its limitations, and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention on deep learning and reinforcement learning. We continue by describing in detail the two main algorithms at the core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementation details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identify limitations of the current approach, and propose future avenues of research.
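For readers unfamiliar with DDPG, the PyTorch-style sketch below shows one update step of the algorithm studied here: a critic regressed toward a bootstrapped target, an actor updated along the deterministic policy gradient, and Polyak-averaged target networks. Network sizes, learning rates, and variable names are illustrative and are not taken from the thesis.

```python
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 8, 2, 0.99, 0.005  # illustrative dimensions/hyperparameters

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(obs, act, rew, next_obs, done):
    """One DDPG update on a replay-buffer batch (tensors shaped [batch, dim])."""
    # Critic: regress Q(s, a) toward the target r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        next_q = critic_targ(torch.cat([next_obs, actor_targ(next_obs)], dim=-1))
        target = rew + gamma * (1.0 - done) * next_q
    q = critic(torch.cat([obs, act], dim=-1))
    critic_loss = ((q - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, ascend Q(s, mu(s)).
    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks toward the learned networks.
    with torch.no_grad():
        for net, targ in ((actor, actor_targ), (critic, critic_targ)):
            for p, p_targ in zip(net.parameters(), targ.parameters()):
                p_targ.mul_(1 - tau).add_(tau * p)
```

The slowly moving target networks and the replay buffer are what make the otherwise unstable combination of deep function approximation and bootstrapping workable in continuous action spaces.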


Author(s):  
Shihui Li ◽  
Yi Wu ◽  
Xinyue Cui ◽  
Honghua Dong ◽  
Fei Fang ◽  
...  

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be optimal only with respect to the other agents' current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents can still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to solve the proposed formulation efficiently. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
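A minimal sketch of the approximation behind contribution (2): rather than solving the inner minimization over the other agents' continuous actions exactly, one can perturb their actions a single gradient step in the direction that decreases agent i's Q-value and then train against the perturbed actions. The function name, interface, and step size below are illustrative, not the authors' exact formulation.

```python
import torch

def adversarial_other_actions(critic_i, obs, act_i, act_others, eps=0.1):
    """Approximate worst-case actions of the other agents with one gradient step."""
    act_others = act_others.clone().requires_grad_(True)
    q = critic_i(torch.cat([obs, act_i, act_others], dim=-1)).sum()
    grad, = torch.autograd.grad(q, act_others)
    # Move the other agents' actions against agent i (minimax approximation),
    # then detach so the perturbed actions are treated as fixed targets.
    return (act_others - eps * grad.sign()).detach()
```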


Author(s):  
Zhan Shi ◽  
Xinchi Chen ◽  
Xipeng Qiu ◽  
Xuanjing Huang

Text generation is a crucial task in NLP. Recently, several adversarial generative models have been proposed to alleviate the exposure bias problem in text generation. Though these models achieve great success, they still suffer from reward sparsity and mode collapse. To address these two problems, in this paper we employ inverse reinforcement learning (IRL) for text generation. Specifically, the IRL framework learns a reward function on training data and then an optimal policy that maximizes the expected total reward. As in the adversarial models, the reward and policy functions in IRL are optimized alternately. Our method has two advantages: (1) the reward function can produce denser reward signals; (2) the generation policy, trained by an "entropy regularized" policy gradient, is encouraged to generate more diverse texts. Experimental results demonstrate that our proposed method generates higher-quality texts than previous methods.
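As an illustration of an "entropy regularized" policy gradient of the kind described above, the sketch below combines a REINFORCE-style term weighted by dense, learned rewards with a per-step entropy bonus that pushes the generator toward more diverse outputs. Tensor shapes and the coefficient beta are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_pg_loss(logits, sampled_tokens, rewards, beta=0.01):
    """Policy-gradient loss with an entropy bonus.

    logits:          [T, vocab] scores from an autoregressive generator
    sampled_tokens:  [T] token ids sampled from the generator
    rewards:         [T] dense per-step rewards from the learned reward function
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(-1)  # per-step policy entropy
    # Minimize the negative of (reward-weighted log-likelihood + entropy bonus).
    return -(token_logp * rewards + beta * entropy).mean()
```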


2020 ◽  
Vol 14 (1) ◽  
pp. 117-150
Author(s):  
Alberto Maria Metelli ◽  
Matteo Pirotta ◽  
Marcello Restelli

Reinforcement Learning (RL) is an effective approach to sequential decision-making problems when the environment is equipped with a reward function to evaluate the agent's actions. However, in several domains a reward function is unavailable and difficult to estimate. When samples from expert agents are available, Inverse Reinforcement Learning (IRL) allows recovering a reward function that explains the demonstrated behavior. Most classic IRL methods, in addition to the expert's demonstrations, require sampling the environment to evaluate each candidate reward function, which, in turn, is built from a set of engineered features. This paper presents a novel model-free IRL approach that does not require specifying a function space in which to search for the expert's reward function. Leveraging the fact that the policy gradient must be zero for an optimal policy, the algorithm generates an approximation space for the reward function, in which a reward is singled out by a second-order criterion. After introducing our approach for finite domains, we extend it to continuous ones. The empirical results, on both finite and continuous domains, show that the reward function recovered by our algorithm allows learning policies that outperform those obtained with the true reward function, in terms of learning speed.
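The first-order condition the method exploits can be illustrated with a toy example: with a linear reward r(s, a) = phi(s, a)^T w, the policy gradient is linear in w, so weights consistent with an optimal expert lie in the null space of the estimated gradient Jacobian (rows: policy parameters, columns: reward features). The sketch below, with a made-up Jacobian, recovers such candidate weights via an SVD; the paper's actual estimator and its second-order selection criterion are not reproduced here.

```python
import numpy as np

def reward_weights_from_gradient_jacobian(G, tol=1e-8):
    """Return a basis of weight vectors w with G @ w ~= 0 (candidate rewards)."""
    _, s, vt = np.linalg.svd(G)
    # Rows of vt whose singular value is (numerically) zero span the null space.
    singular = np.concatenate([s, np.zeros(vt.shape[0] - len(s))])
    return vt[singular <= tol]  # each row is a candidate w; a second-order
                                # criterion would then single one out

# Toy Jacobian: 2 policy parameters, 3 reward features.
G = np.array([[1.0, -1.0,  0.0],
              [0.0,  2.0, -2.0]])
print(reward_weights_from_gradient_jacobian(G))  # ~ [0.577, 0.577, 0.577]
```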


Author(s):  
Zifei Jiang ◽  
Alan F. Lynch

We present a deep neural net-based controller trained by a model-free reinforcement learning (RL) algorithm to achieve hover stabilization for a quadrotor unmanned aerial vehicle (UAV). With RL, two neural nets are trained. One neural net is used as a stochastic controller which gives the distribution of control inputs. The other maps the UAV state to a scalar which estimates the reward of the controller. A proximal policy optimization (PPO) method, which is an actor-critic policy gradient approach, is used to train the neural nets. Simulation results show that the trained controller achieves a comparable level of performance to a manually-tuned PID controller, despite not depending on any model information. The paper considers different choices of reward function and their influence on controller performance.
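For reference, the clipped surrogate objective at the heart of PPO, used here to train the stochastic hover controller, can be sketched as follows; the tensor shapes and clipping threshold are illustrative, and the value (critic) network is trained separately on estimated returns.

```python
import torch

def ppo_actor_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped PPO surrogate for a batch of (log-prob, advantage) pairs."""
    ratio = torch.exp(new_logp - old_logp)            # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    # Pessimistic (clipped) surrogate keeps each policy update small,
    # which is what makes PPO practical without heavy tuning.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```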


Author(s):  
Yu. V. Dubenko ◽  
Ye. Ye. Dyshkant ◽  
D. A. Gura

The paper evaluates the possibility of using robotic systems (intelligent agents) to monitor complex infrastructure objects, such as buildings, structures, bridges, roads, and other transport infrastructure. Methods and algorithms for implementing behavioral strategies of robots, in particular search algorithms based on decision trees, are examined. The emphasis is placed on the importance of enabling robots to self-learn through reinforcement learning, which is associated with modeling the behavior of living creatures interacting with unknown elements of the environment. The Q-learning method is considered as one type of reinforcement learning that introduces the concept of the value of an action, along with the "hierarchical reinforcement learning" approach and its varieties "Options Framework", "Feudal", and "MaxQ". In the segmentation of macro-actions, the problems of determining parameters such as the value and reward functions of the agents (mobile robots), as well as the mandatory presence of a technical vision subsystem, are identified. Thus, implementing the segmentation of macro-actions requires improving the methodological base by applying intelligent algorithms and methods, including deep clustering methods. The effectiveness of hierarchical reinforcement learning when mobile robots operate with a lack of information about the monitored object can be improved by transmitting visual information across a variety of states, which will also increase the transfer of experience between robots when they later perform tasks on other objects.
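For concreteness, the tabular Q-learning rule mentioned above, which introduces the notion of the value of an action, can be sketched as follows; the state/action counts and hyperparameters are illustrative only.

```python
import numpy as np

n_states, n_actions = 25, 4          # illustrative discretization of the task
Q = np.zeros((n_states, n_actions))  # action-value table
alpha, gamma = 0.1, 0.95             # learning rate and discount factor

def q_update(s, a, r, s_next):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```

Hierarchical variants such as Options, Feudal RL, and MaxQ build on exactly this kind of value update, but apply it over temporally extended macro-actions rather than primitive actions.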

