Reinforcement learning control of robot manipulator

2021 ◽  
Vol 13 (3) ◽  
pp. 42-53
Author(s):  
Lucas Pereira Cotrim ◽  
Marcos Menon José ◽  
Eduardo Lobo Lustosa Cabral

Since the introduction of robots into industrial applications, robot programming has involved the repetitive and time-consuming process of manually specifying a fixed trajectory, which results in machine idle time in terms of production and the need to completely reprogram the robot for different tasks. The increasing number of robotics applications in unstructured environments requires controllers that are not only intelligent but also reactive, due to the unpredictability of the environment and to safety requirements, respectively. This paper presents a comparative analysis of two classes of Reinforcement Learning algorithms, value iteration (Q-Learning/DQN) and policy iteration (REINFORCE), applied to the discretized task of positioning a robotic manipulator in an obstacle-filled simulated environment, with no previous knowledge of the obstacles’ positions or of the robot arm dynamics. The agent’s performance and algorithm convergence are analyzed under different reward functions and on four increasingly complex test projects: a 1-Degree-of-Freedom (DOF) robot, a 2-DOF robot, a Kuka KR16 industrial robot, and a Kuka KR16 industrial robot with random setpoint/obstacle placement. The DQN algorithm presented significantly better performance and reduced training time across all test projects, and the third reward function generated better agents for both algorithms.
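As a concrete illustration of the value-iteration side of this comparison, the sketch below shows tabular Q-Learning on a discretized positioning task. The environment interface, action set, and reward values are assumptions for illustration, not the setup used in the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=5000, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-Learning on a discretized joint-space positioning task."""
    Q = defaultdict(float)                              # Q[(state, action)] -> value
    for _ in range(episodes):
        state = env.reset()                             # discretized joint configuration
        done = False
        while not done:
            if random.random() < eps:                   # epsilon-greedy exploration
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action) # e.g. -1 per step, penalty on collision
            best_next = max(Q[(next_state, a)] for a in env.actions)
            # one-step temporal-difference update towards the bootstrapped target
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```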

Author(s):  
Thomas Recchia ◽  
Jae Chung ◽  
Kishore Pochiraju

As robotic systems become more prevalent, it is highly desirable for them to operate in dynamic environments. A common approach is to use reinforcement learning to allow an agent controlling the robot to learn and adapt its behavior based on a reward function. This paper presents a novel multi-agent system that cooperates to control a single robot battle tank in a melee battle scenario, with no prior knowledge of its opponents’ strategies. The agents learn through reinforcement learning and are loosely coupled by their reward functions. Each agent controls a different aspect of the robot’s behavior. In addition, the problem of delayed reward is addressed through a time-averaged reward applied to several sequential actions at once. The system was evaluated in a simulated melee combat scenario and was shown to improve its performance over time, with each agent learning to pick specific battle strategies for each opponent it faced.
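One way to read the delayed-reward treatment is sketched below: a delayed reward is averaged over a window of recent actions and credited to all of them at once. The window length and the buffer interface are assumptions for illustration, not the paper's implementation.

```python
from collections import deque

class TimeAveragedRewardBuffer:
    """Credits a delayed reward, averaged over the window, to the last N actions."""

    def __init__(self, window=5):
        self.pending = deque(maxlen=window)             # recent (state, action) pairs

    def record(self, state, action):
        self.pending.append((state, action))

    def assign(self, delayed_reward):
        """Return (state, action, reward) triples sharing the averaged reward."""
        if not self.pending:
            return []
        per_action = delayed_reward / len(self.pending) # spread the reward evenly
        credited = [(s, a, per_action) for (s, a) in self.pending]
        self.pending.clear()
        return credited
```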


2019 ◽  
pp. 32-38
Author(s):  
Sándor Rácz ◽  
Géza Szabó ◽  
József Pető

5G networks provide technology enablers targeting industrial applications. One key enabler is Ultra-Reliable Low-Latency Communication (URLLC). This paper studies the performance impact of network delay on closed-loop control for industrial applications. We investigate the performance of closed-loop control of a UR5 industrial robot arm assuming a fixed network delay. The goal is to stress the system at the upper limit of the possible network delay. We show that, to achieve maximum speed, URLLC is a must-have.
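A rough sketch of one way to stress-test such a loop is shown below: a fixed number of control periods of delay is inserted between the controller and a toy plant, and the tracking error is observed as the delay grows. The plant model, gains, and time step are placeholders, not the UR5 setup from the paper.

```python
from collections import deque

def simulate_with_delay(delay_steps, dt=0.002, steps=2000, kp=8.0, kd=0.5, target=1.0):
    """Closed-loop step response of a toy double integrator with a fixed command delay."""
    pos, vel = 0.0, 0.0
    in_flight = deque([0.0] * delay_steps)              # commands delayed by the network
    errors = []
    for _ in range(steps):
        u = kp * (target - pos) - kd * vel              # PD controller output
        in_flight.append(u)
        applied = in_flight.popleft()                   # command that reaches the robot now
        vel += applied * dt                             # toy plant dynamics
        pos += vel * dt
        errors.append(abs(target - pos))
    return max(errors[steps // 2:])                     # crude steady-state error metric

# Sweep the delay to see where tracking starts to degrade.
for d in (0, 5, 25, 100):
    print(d, simulate_with_delay(d))
```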


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5893
Author(s):  
Xin Yu ◽  
Yushan Sun ◽  
Xiangbin Wang ◽  
Guocheng Zhang

This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks and to overcome certain difficulties, such as multiple constraints and a sparse reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into the control instructions of the AUV. The system is based on the soft actor–critic (SAC) algorithm, which enhances the exploration ability and robustness to the AUV environment. We also use generative adversarial imitation learning (GAIL) to assist training, overcoming the difficulty and high cost of learning a policy from scratch in reinforcement learning. A comprehensive external reward function is then designed to help the AUV reach the target point smoothly while optimizing distance and time as much as possible. Finally, the proposed end-to-end motion planning algorithm is tested and compared on the Unity simulation platform. Results show that the algorithm has optimal decision-making ability during navigation, a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL speeds up AUV training and reduces training time without affecting the planning performance of the SAC algorithm.
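The GAIL-assisted training can be pictured roughly as below: a discriminator scores how expert-like a (state, action) pair is, and its output is turned into a surrogate reward that supplements the environment reward seen by the SAC agent. The network sizes and the exact reward transform are assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert demonstration vs. agent rollout."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # logit: expert vs. agent

def imitation_reward(disc, state, action):
    """Surrogate reward that grows as the pair looks more like the expert data."""
    with torch.no_grad():
        p_expert = torch.sigmoid(disc(state, action))
        return -torch.log(1.0 - p_expert + 1e-8)
```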


Author(s):  
Yang Gao ◽  
Christian M. Meyer ◽  
Mohsen Mesgar ◽  
Iryna Gurevych

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
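A toy sketch of the L2R step is given below: a linear reward over summary features is fitted so that preferred summaries rank above worse ones, and the fitted reward would then drive the input-specific RL policy at test time. The feature representation, the hinge-style pairwise loss, and the update rule are illustrative assumptions, not the RELIS implementation.

```python
import numpy as np

def train_ranker(feature_pairs, lr=0.01, epochs=50):
    """feature_pairs: list of (better_features, worse_features) numpy vectors."""
    w = np.zeros(len(feature_pairs[0][0]))
    for _ in range(epochs):
        for better, worse in feature_pairs:
            margin = w @ better - w @ worse
            if margin < 1.0:                            # hinge-style pairwise ranking loss
                w += lr * (better - worse)
    return w

def learned_reward(w, summary_features):
    """Score used as the reward for the test-time, input-specific RL policy."""
    return float(w @ summary_features)
```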


Author(s):  
Alberto Camacho ◽  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
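To make the idea concrete, the sketch below encodes a tiny reward machine by hand: an automaton whose transitions fire on high-level propositions detected in the environment and which emits the reward. The "coffee then office" task and the labels are illustrative examples, not code from the paper.

```python
class RewardMachine:
    """Automaton over high-level propositions that outputs the reward."""

    def __init__(self, transitions, initial, terminal):
        self.transitions = transitions                  # transitions[(u, prop)] = (next_u, reward)
        self.u = initial
        self.terminal = terminal

    def step(self, true_props):
        """Advance on the propositions that hold this step; return (reward, done)."""
        reward = 0.0
        for prop in true_props:
            if (self.u, prop) in self.transitions:
                self.u, r = self.transitions[(self.u, prop)]
                reward += r
        return reward, self.u in self.terminal

# u0 --coffee--> u1 --office--> u2 (accepting, reward 1)
rm = RewardMachine(
    transitions={("u0", "coffee"): ("u1", 0.0), ("u1", "office"): ("u2", 1.0)},
    initial="u0", terminal={"u2"},
)
```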


Author(s):  
Zhenhai Gao ◽  
Xiangtong Yan ◽  
Fei Gao ◽  
Lei He

Decision-making is one of the key parts of research on longitudinal autonomous driving, and considering the behavior of human drivers when designing autonomous driving decision-making strategies is a current research hotspot. Among longitudinal autonomous driving decision-making strategies, traditional rule-based strategies are difficult to apply to complex scenarios. Current decision-making methods based on reinforcement learning and deep reinforcement learning construct reward functions around safety, comfort, and economy; compared with human drivers, the resulting decision strategies still show large gaps. Focusing on these problems, this paper uses driver behavior data to design the reward function of a deep reinforcement learning algorithm through BP neural network fitting, and uses the DQN and DDPG deep reinforcement learning algorithms to establish two driver-like longitudinal autonomous driving decision-making models. A simulation experiment compares the decision-making behavior of the two models against the driver curve. The results show that both algorithms can realize driver-like decision-making, and that the DDPG algorithm is more consistent with human driver behavior than the DQN algorithm and therefore performs better.
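The reward-fitting step can be sketched as follows: a backpropagation (MLP) network regresses a reward-like score from logged driver data, and the fitted model is then queried as the reward inside DQN/DDPG training. The feature choice, targets, and network size are assumptions for illustration, not the paper's design.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_driver_reward(driver_states, driver_scores):
    """driver_states: (N, d) array of e.g. gap, relative speed, acceleration;
    driver_scores: (N,) array scoring how closely each sample matches human behaviour."""
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    model.fit(driver_states, driver_scores)
    return model

def reward(model, state):
    """Query the fitted network as the reward signal during RL training."""
    return float(model.predict(np.asarray(state).reshape(1, -1))[0])
```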


2021 ◽  
Vol 13 (12) ◽  
pp. 168781402110670
Author(s):  
Xusheng Wang ◽  
Jiexin Xie ◽  
Shijie Guo ◽  
Yue Li ◽  
Pengfei Sun ◽  
...  

Deep reinforcement learning (DRL) provides a new solution for rehabilitation robot trajectory planning in unstructured working environments, which can bring great convenience to patients. Previous research mainly focused on optimization strategies but ignored the construction of reward functions, which leads to low efficiency. Different from the traditional sparse reward function, this paper proposes two dense reward functions. First, an azimuth reward function provides global guidance and reasonable constraints during exploration. To further improve efficiency, a process-oriented aspiration reward function is proposed; it accelerates the exploration process and avoids locally optimal solutions. Experiments show that the proposed reward functions accelerate the convergence rate by 38.4% on average with mainstream DRL methods. The mean convergence value also increases by 9.5%, and the standard deviation decreases by 21.2%–23.3%. The results show that the proposed reward functions can significantly improve the learning efficiency of DRL methods and thus make automatic trajectory planning of rehabilitation robots practically feasible.
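A minimal sketch of an azimuth-style dense reward is given below: the agent is rewarded when the end-effector moves along the bearing towards the goal and penalised otherwise, with a small term for distance progress. The weights and the progress term are assumptions for illustration only.

```python
import numpy as np

def azimuth_reward(prev_pos, pos, goal, w_angle=1.0, w_dist=0.5):
    """Dense reward from the angle between the motion direction and the bearing to the goal.
    prev_pos, pos, goal: numpy arrays of end-effector / target coordinates."""
    move = pos - prev_pos
    to_goal = goal - prev_pos
    if np.linalg.norm(move) < 1e-8 or np.linalg.norm(to_goal) < 1e-8:
        return 0.0
    cos_angle = float(move @ to_goal / (np.linalg.norm(move) * np.linalg.norm(to_goal)))
    progress = float(np.linalg.norm(to_goal) - np.linalg.norm(goal - pos))  # distance gained
    return w_angle * cos_angle + w_dist * progress
```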


2022 ◽  
Vol 73 ◽  
pp. 173-208
Author(s):  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and as such support loops, sequences and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
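One of the ways this structure can be exploited, counterfactual reasoning with off-policy learning, is sketched below: a single environment transition is replayed from every reward machine state, so an off-policy learner receives one experience per automaton state. The code assumes a toy reward-machine object like the one sketched after the earlier reward-machine abstract (a `transitions` dict keyed by (state, proposition) and a `step` method); the replay-tuple format is an assumption.

```python
import copy

def counterfactual_experiences(rm, env_state, action, next_env_state, true_props):
    """Generate one (s, a, r, s', done) experience per reward-machine state."""
    rm_states = {u for (u, _) in rm.transitions} | {v for (v, _) in rm.transitions.values()}
    experiences = []
    for u in rm_states:
        shadow = copy.deepcopy(rm)                      # replay the same transition from state u
        shadow.u = u
        r, done = shadow.step(true_props)
        experiences.append(((env_state, u), action, r, (next_env_state, shadow.u), done))
    return experiences
```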


Author(s):  
Zhen Yu ◽  
Yimin Feng ◽  
Lijun Liu

In general reinforcement learning tasks, the formulation of the reward function is a very important step, yet in many systems the reward function is not easy to formulate. Training is sensitive to the reward function, and different reward functions lead to different results. For a class of systems that meet specific conditions, the traditional reinforcement learning method is improved: a state quantity function is designed to replace the reward function, which is more efficient than the traditional reward function. In addition, a predictive network is designed so that the network can learn the value of general states from special states. The overall network structure is built on the Deep Deterministic Policy Gradient (DDPG) algorithm. Finally, the algorithm was successfully applied to the FrozenLake environment and achieved good performance. The experiments demonstrate the effectiveness of the algorithm and realize reward-free reinforcement learning for a class of systems.
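One possible reading of replacing the reward with a state quantity function is sketched below: the learning signal is the change in a hand-designed measure of state quality, so no per-step reward needs to be specified. The quantity chosen here (grid distance to the FrozenLake goal) is purely an illustrative assumption, not the function used in the paper.

```python
def state_quantity(state, goal, grid_size=4):
    """Hand-designed measure of state quality on a FrozenLake-style grid (higher is better)."""
    row, col = divmod(state, grid_size)
    goal_row, goal_col = divmod(goal, grid_size)
    return -(abs(row - goal_row) + abs(col - goal_col))  # negative Manhattan distance

def pseudo_reward(prev_state, state, goal, grid_size=4):
    """Learning signal: improvement of the state quantity, used in place of a reward."""
    return state_quantity(state, goal, grid_size) - state_quantity(prev_state, goal, grid_size)
```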


Author(s):  
N. Botteghi ◽  
R. Schulte ◽  
B. Sirmacek ◽  
M. Poel ◽  
C. Brune

Autonomously exploring and mapping is one of the open challenges of robotics and artificial intelligence. Especially when the environments are unknown, choosing the optimal navigation directive is not straightforward. In this paper, we propose a reinforcement learning framework for navigating, exploring, and mapping unknown environments. The reinforcement learning agent is in charge of selecting the commands for steering the mobile robot, while a SLAM algorithm estimates the robot pose and maps the environment. To select optimal actions, the agent is trained to be curious about the world. This concept translates into the introduction of a curiosity-driven reward function that encourages the agent to steer the mobile robot towards unknown and unseen areas of the world and the map. We test our approach on exploration challenges in different indoor environments. The agent trained with the proposed reward function outperforms agents trained with reward functions commonly used in the literature for solving such tasks.
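A curiosity-style exploration reward of this kind can be sketched as below: the agent is rewarded in proportion to the number of occupancy-grid cells that the SLAM map reveals at each step. The grid encoding (-1 for unknown cells) and the scale factor are assumptions, not the paper's exact formulation.

```python
import numpy as np

def exploration_reward(prev_grid, grid, scale=0.01):
    """Reward proportional to the occupancy-grid cells newly discovered this step."""
    newly_known = np.sum((prev_grid == -1) & (grid != -1))  # cells that were unknown before
    return scale * float(newly_known)
```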

