Reinforcement learning control of robot manipulator

2021 ◽  
Vol 13 (3) ◽  
pp. 42-53
Author(s):  
Lucas Pereira Cotrim ◽  
Marcos Menon José ◽  
Eduardo Lobo Lustosa Cabral

Since the introduction of robots into industrial applications, robot programming has involved the repetitive and time-consuming process of manually specifying a fixed trajectory, which results in machine idle time in terms of production and the need to completely reprogram the robot for different tasks. The increasing number of robotics applications in unstructured environments requires controllers that are not only intelligent but also reactive, due to the unpredictability of the environment and to safety requirements, respectively. This paper presents a comparative analysis of two classes of Reinforcement Learning algorithms, value iteration (Q-Learning/DQN) and policy iteration (REINFORCE), applied to the discretized task of positioning a robotic manipulator in an obstacle-filled simulated environment, with no previous knowledge of the obstacles’ positions or of the robot arm dynamics. The agent’s performance and algorithm convergence are analyzed under different reward functions and on four increasingly complex test projects: a 1-Degree-of-Freedom (DOF) robot, a 2-DOF robot, a Kuka KR16 industrial robot, and a Kuka KR16 industrial robot with random setpoint/obstacle placement. The DQN algorithm presented significantly better performance and reduced training time across all test projects, and the third reward function generated better agents for both algorithms.
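As a concrete illustration of the value-iteration side of this comparison, the sketch below shows tabular Q-Learning on a discretized positioning task. The environment interface, action set, and reward values are assumptions for illustration, not the setup used in the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=5000, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-Learning on a discretized joint-space positioning task."""
    Q = defaultdict(float)                              # Q[(state, action)] -> value
    for _ in range(episodes):
        state = env.reset()                             # discretized joint configuration
        done = False
        while not done:
            if random.random() < eps:                   # epsilon-greedy exploration
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action) # e.g. -1 per step, penalty on collision
            best_next = max(Q[(next_state, a)] for a in env.actions)
            # one-step temporal-difference update towards the bootstrapped target
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```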

Author(s):  
Thomas Recchia ◽  
Jae Chung ◽  
Kishore Pochiraju

As robotic systems become more prevalent, it is highly desirable for them to operate in dynamic environments. A common approach is to use reinforcement learning to allow an agent controlling the robot to learn and adapt its behavior based on a reward function. This paper presents a novel multi-agent system that cooperates to control a single robot battle tank in a melee battle scenario, with no prior knowledge of its opponents’ strategies. The agents learn through reinforcement learning and are loosely coupled by their reward functions. Each agent controls a different aspect of the robot’s behavior. In addition, the problem of delayed reward is addressed through a time-averaged reward applied to several sequential actions at once. The system was evaluated in a simulated melee combat scenario and was shown to improve its performance over time, with each agent learning to pick specific battle strategies for each opponent it faced.
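One way to read the delayed-reward treatment is sketched below: a delayed reward is averaged over a window of recent actions and credited to all of them at once. The window length and the buffer interface are assumptions for illustration, not the paper's implementation.

```python
from collections import deque

class TimeAveragedRewardBuffer:
    """Credits a delayed reward, averaged over the window, to the last N actions."""

    def __init__(self, window=5):
        self.pending = deque(maxlen=window)             # recent (state, action) pairs

    def record(self, state, action):
        self.pending.append((state, action))

    def assign(self, delayed_reward):
        """Return (state, action, reward) triples sharing the averaged reward."""
        if not self.pending:
            return []
        per_action = delayed_reward / len(self.pending) # spread the reward evenly
        credited = [(s, a, per_action) for (s, a) in self.pending]
        self.pending.clear()
        return credited
```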


2019 ◽  
pp. 32-38
Author(s):  
Sándor Rácz ◽  
Géza Szabó ◽  
József Pető

5G networks provide technology enablers targeting industrial applications. One key enabler is Ultra-Reliable Low-Latency Communication (URLLC). This paper studies the performance impact of network delay on closed-loop control for industrial applications. We investigate the performance of closed-loop control of a UR5 industrial robot arm assuming a fixed network delay. The goal is to stress the system at the upper limit of the possible network delay. We show that, to achieve maximum speed, URLLC is a must-have.
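A rough sketch of one way to stress-test such a loop is shown below: a fixed number of control periods of delay is inserted between the controller and a toy plant, and the tracking error is observed as the delay grows. The plant model, gains, and time step are placeholders, not the UR5 setup from the paper.

```python
from collections import deque

def simulate_with_delay(delay_steps, dt=0.002, steps=2000, kp=8.0, kd=0.5, target=1.0):
    """Closed-loop step response of a toy double integrator with a fixed command delay."""
    pos, vel = 0.0, 0.0
    in_flight = deque([0.0] * delay_steps)              # commands delayed by the network
    errors = []
    for _ in range(steps):
        u = kp * (target - pos) - kd * vel              # PD controller output
        in_flight.append(u)
        applied = in_flight.popleft()                   # command that reaches the robot now
        vel += applied * dt                             # toy plant dynamics
        pos += vel * dt
        errors.append(abs(target - pos))
    return max(errors[steps // 2:])                     # crude steady-state error metric

# Sweep the delay to see where tracking starts to degrade.
for d in (0, 5, 25, 100):
    print(d, simulate_with_delay(d))
```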


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5893
Author(s):  
Xin Yu ◽  
Yushan Sun ◽  
Xiangbin Wang ◽  
Guocheng Zhang

This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks and to overcome certain difficulties, such as multiple constraints and a sparse reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into the control instructions of the AUV. The system is based on the soft actor–critic (SAC) algorithm, which enhances the exploration ability and robustness to the AUV environment. We also use generative adversarial imitation learning (GAIL) to assist training, overcoming the difficulty and high cost of learning a policy from scratch in reinforcement learning. A comprehensive external reward function is then designed to help the AUV reach the target point smoothly while optimizing distance and time as much as possible. Finally, the proposed end-to-end motion planning algorithm is tested and compared on the Unity simulation platform. Results show that the algorithm has optimal decision-making ability during navigation, a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL speeds up AUV training and reduces training time without affecting the planning performance of the SAC algorithm.
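The GAIL-assisted training can be pictured roughly as below: a discriminator scores how expert-like a (state, action) pair is, and its output is turned into a surrogate reward that supplements the environment reward seen by the SAC agent. The network sizes and the exact reward transform are assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert demonstration vs. agent rollout."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # logit: expert vs. agent

def imitation_reward(disc, state, action):
    """Surrogate reward that grows as the pair looks more like the expert data."""
    with torch.no_grad():
        p_expert = torch.sigmoid(disc(state, action))
        return -torch.log(1.0 - p_expert + 1e-8)
```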


Author(s):  
Yang Gao ◽  
Christian M. Meyer ◽  
Mohsen Mesgar ◽  
Iryna Gurevych

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
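A toy sketch of the L2R step is given below: a linear reward over summary features is fitted so that preferred summaries rank above worse ones, and the fitted reward would then drive the input-specific RL policy at test time. The feature representation, the hinge-style pairwise loss, and the update rule are illustrative assumptions, not the RELIS implementation.

```python
import numpy as np

def train_ranker(feature_pairs, lr=0.01, epochs=50):
    """feature_pairs: list of (better_features, worse_features) numpy vectors."""
    w = np.zeros(len(feature_pairs[0][0]))
    for _ in range(epochs):
        for better, worse in feature_pairs:
            margin = w @ better - w @ worse
            if margin < 1.0:                            # hinge-style pairwise ranking loss
                w += lr * (better - worse)
    return w

def learned_reward(w, summary_features):
    """Score used as the reward for the test-time, input-specific RL policy."""
    return float(w @ summary_features)
```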


Author(s):  
Alberto Camacho ◽  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
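To make the idea concrete, the sketch below encodes a tiny reward machine by hand: an automaton whose transitions fire on high-level propositions detected in the environment and which emits the reward. The "coffee then office" task and the labels are illustrative examples, not code from the paper.

```python
class RewardMachine:
    """Automaton over high-level propositions that outputs the reward."""

    def __init__(self, transitions, initial, terminal):
        self.transitions = transitions                  # transitions[(u, prop)] = (next_u, reward)
        self.u = initial
        self.terminal = terminal

    def step(self, true_props):
        """Advance on the propositions that hold this step; return (reward, done)."""
        reward = 0.0
        for prop in true_props:
            if (self.u, prop) in self.transitions:
                self.u, r = self.transitions[(self.u, prop)]
                reward += r
        return reward, self.u in self.terminal

# u0 --coffee--> u1 --office--> u2 (accepting, reward 1)
rm = RewardMachine(
    transitions={("u0", "coffee"): ("u1", 0.0), ("u1", "office"): ("u2", 1.0)},
    initial="u0", terminal={"u2"},
)
```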


Author(s):  
Zhenhai Gao ◽  
Xiangtong Yan ◽  
Fei Gao ◽  
Lei He

Decision-making is one of the key parts of research on longitudinal autonomous driving, and considering the behavior of human drivers when designing autonomous driving decision-making strategies is a current research hotspot. Among longitudinal autonomous driving decision-making strategies, traditional rule-based strategies are difficult to apply to complex scenarios. Current decision-making methods based on reinforcement learning and deep reinforcement learning construct reward functions around safety, comfort, and economy; compared with human drivers, the resulting decision strategies still show large gaps. Focusing on these problems, this paper uses driver behavior data to design the reward function of a deep reinforcement learning algorithm through BP neural network fitting, and uses the DQN and DDPG deep reinforcement learning algorithms to establish two driver-like longitudinal autonomous driving decision-making models. A simulation experiment compares the decision-making behavior of the two models against the driver curve. The results show that both algorithms can realize driver-like decision-making, and that the DDPG algorithm is more consistent with human driver behavior than the DQN algorithm and therefore performs better.
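The reward-fitting step can be sketched as follows: a backpropagation (MLP) network regresses a reward-like score from logged driver data, and the fitted model is then queried as the reward inside DQN/DDPG training. The feature choice, targets, and network size are assumptions for illustration, not the paper's design.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_driver_reward(driver_states, driver_scores):
    """driver_states: (N, d) array of e.g. gap, relative speed, acceleration;
    driver_scores: (N,) array scoring how closely each sample matches human behaviour."""
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    model.fit(driver_states, driver_scores)
    return model

def reward(model, state):
    """Query the fitted network as the reward signal during RL training."""
    return float(model.predict(np.asarray(state).reshape(1, -1))[0])
```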


2021 ◽  
Vol 13 (12) ◽  
pp. 168781402110670
Author(s):  
Xusheng Wang ◽  
Jiexin Xie ◽  
Shijie Guo ◽  
Yue Li ◽  
Pengfei Sun ◽  
...  

Deep reinforcement learning (DRL) provides a new solution for rehabilitation robot trajectory planning in unstructured working environments, which can bring great convenience to patients. Previous research mainly focused on optimization strategies but ignored the construction of reward functions, which leads to low efficiency. Different from the traditional sparse reward function, this paper proposes two dense reward functions. First, an azimuth reward function provides global guidance and reasonable constraints during exploration. To further improve efficiency, a process-oriented aspiration reward function is proposed; it accelerates the exploration process and avoids locally optimal solutions. Experiments show that the proposed reward functions accelerate the convergence rate by 38.4% on average with mainstream DRL methods. The mean convergence value also increases by 9.5%, and the standard deviation decreases by 21.2%–23.3%. The results show that the proposed reward functions can significantly improve the learning efficiency of DRL methods and thus make automatic trajectory planning of rehabilitation robots practically feasible.
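A minimal sketch of an azimuth-style dense reward is given below: the agent is rewarded when the end-effector moves along the bearing towards the goal and penalised otherwise, with a small term for distance progress. The weights and the progress term are assumptions for illustration only.

```python
import numpy as np

def azimuth_reward(prev_pos, pos, goal, w_angle=1.0, w_dist=0.5):
    """Dense reward from the angle between the motion direction and the bearing to the goal.
    prev_pos, pos, goal: numpy arrays of end-effector / target coordinates."""
    move = pos - prev_pos
    to_goal = goal - prev_pos
    if np.linalg.norm(move) < 1e-8 or np.linalg.norm(to_goal) < 1e-8:
        return 0.0
    cos_angle = float(move @ to_goal / (np.linalg.norm(move) * np.linalg.norm(to_goal)))
    progress = float(np.linalg.norm(to_goal) - np.linalg.norm(goal - pos))  # distance gained
    return w_angle * cos_angle + w_dist * progress
```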


2022 ◽  
Vol 73 ◽  
pp. 173-208
Author(s):  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and as such support loops, sequences and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
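One of the ways this structure can be exploited, counterfactual reasoning with off-policy learning, is sketched below: a single environment transition is replayed from every reward machine state, so an off-policy learner receives one experience per automaton state. The code assumes a toy reward-machine object like the one sketched after the earlier reward-machine abstract (a `transitions` dict keyed by (state, proposition) and a `step` method); the replay-tuple format is an assumption.

```python
import copy

def counterfactual_experiences(rm, env_state, action, next_env_state, true_props):
    """Generate one (s, a, r, s', done) experience per reward-machine state."""
    rm_states = {u for (u, _) in rm.transitions} | {v for (v, _) in rm.transitions.values()}
    experiences = []
    for u in rm_states:
        shadow = copy.deepcopy(rm)                      # replay the same transition from state u
        shadow.u = u
        r, done = shadow.step(true_props)
        experiences.append(((env_state, u), action, r, (next_env_state, shadow.u), done))
    return experiences
```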


Author(s):  
Zhen Yu ◽  
Yimin Feng ◽  
Lijun Liu

In general reinforcement learning tasks, the formulation of the reward function is a very important step, yet in many systems the reward function is not easy to formulate. Training is sensitive to the reward function, and different reward functions lead to different results. For a class of systems that meet specific conditions, the traditional reinforcement learning method is improved: a state quantity function is designed to replace the reward function, which is more efficient than the traditional reward function. In addition, a predictive network is designed so that the network can learn the value of general states from special states. The overall network structure is built on the Deep Deterministic Policy Gradient (DDPG) algorithm. Finally, the algorithm was successfully applied to the FrozenLake environment and achieved good performance. The experiments demonstrate the effectiveness of the algorithm and realize reward-free reinforcement learning for a class of systems.
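One possible reading of replacing the reward with a state quantity function is sketched below: the learning signal is the change in a hand-designed measure of state quality, so no per-step reward needs to be specified. The quantity chosen here (grid distance to the FrozenLake goal) is purely an illustrative assumption, not the function used in the paper.

```python
def state_quantity(state, goal, grid_size=4):
    """Hand-designed measure of state quality on a FrozenLake-style grid (higher is better)."""
    row, col = divmod(state, grid_size)
    goal_row, goal_col = divmod(goal, grid_size)
    return -(abs(row - goal_row) + abs(col - goal_col))  # negative Manhattan distance

def pseudo_reward(prev_state, state, goal, grid_size=4):
    """Learning signal: improvement of the state quantity, used in place of a reward."""
    return state_quantity(state, goal, grid_size) - state_quantity(prev_state, goal, grid_size)
```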


Author(s):  
N. Botteghi ◽  
R. Schulte ◽  
B. Sirmacek ◽  
M. Poel ◽  
C. Brune

Autonomously exploring and mapping is one of the open challenges of robotics and artificial intelligence. Especially when the environments are unknown, choosing the optimal navigation directive is not straightforward. In this paper, we propose a reinforcement learning framework for navigating, exploring, and mapping unknown environments. The reinforcement learning agent is in charge of selecting the commands for steering the mobile robot, while a SLAM algorithm estimates the robot pose and maps the environment. To select optimal actions, the agent is trained to be curious about the world. This concept translates into the introduction of a curiosity-driven reward function that encourages the agent to steer the mobile robot towards unknown and unseen areas of the world and the map. We test our approach on exploration challenges in different indoor environments. The agent trained with the proposed reward function outperforms agents trained with reward functions commonly used in the literature for solving such tasks.
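A curiosity-style exploration reward of this kind can be sketched as below: the agent is rewarded in proportion to the number of occupancy-grid cells that the SLAM map reveals at each step. The grid encoding (-1 for unknown cells) and the scale factor are assumptions, not the paper's exact formulation.

```python
import numpy as np

def exploration_reward(prev_grid, grid, scale=0.01):
    """Reward proportional to the occupancy-grid cells newly discovered this step."""
    newly_known = np.sum((prev_grid == -1) & (grid != -1))  # cells that were unknown before
    return scale * float(newly_known)
```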

