Reinforcement Learning for Pick and Place Operations in Robotics: A Survey

Andrew Lobbezoo; Yanjun Qian; Hyock-Ju Kwon

doi:10.3390/robotics10030105

Reinforcement Learning for Pick and Place Operations in Robotics: A Survey

Robotics ◽

10.3390/robotics10030105 ◽

2021 ◽

Vol 10 (3) ◽

pp. 105

Author(s):

Andrew Lobbezoo ◽

Yanjun Qian ◽

Hyock-Ju Kwon

Keyword(s):

Reinforcement Learning ◽

Critical Discussion ◽

Value Iteration ◽

Open Problems ◽

Work Related ◽

Model Generalization ◽

Pick And Place ◽

Reward Shaping ◽

Place Task ◽

Policy Optimization

The field of robotics has been developing rapidly in recent years, and training robotic agents with reinforcement learning has been a major focus of research. This survey reviews the application of reinforcement learning to pick-and-place operations, a task that a logistics robot can be trained to complete without support from a robotics engineer. To introduce this topic, we first review the fundamentals of reinforcement learning and various methods of policy optimization, such as value iteration and policy search. Next, factors that affect the pick-and-place task, such as reward shaping, imitation learning, pose estimation, and the simulation environment, are examined. Following the review of the fundamentals and key factors for reinforcement learning, we present an extensive review of all methods implemented by researchers in the field to date. The strengths and weaknesses of each method from the literature are discussed, and the contribution of each manuscript to the field is reviewed. The concluding critical discussion of the available literature and the summary of open problems indicate that experimental validation, model generalization, and grasp pose selection are topics that require additional research.

Pick-and-Place Task using Wheeled Mobile Manipulator - A Control Design Perspective

2020 International Conference on Computing and Information Technology (ICCIT-1441) ◽

10.1109/iccit-144147971.2020.9213717 ◽

2020 ◽

Author(s):

Muhammad Affan ◽

Syed Umaid Ahmed ◽

Riaz Uddin

Keyword(s):

Control Design ◽

Mobile Manipulator ◽

Pick And Place ◽

Place Task ◽

Design Perspective

Air Learning: a deep reinforcement learning gym for autonomous aerial robot visual navigation

Machine Learning ◽

10.1007/s10994-021-06006-6 ◽

2021 ◽

Author(s):

Srivatsan Krishnan ◽

Behzad Boroujerdian ◽

William Fu ◽

Aleksandra Faust ◽

Vijay Janapa Reddi

Keyword(s):

Reinforcement Learning ◽

Embedded System ◽

Broad Class ◽

Visual Navigation ◽

Raspberry Pi ◽

Latency Distribution ◽

Hardware In The Loop ◽

Resource Constrained ◽

Aerial Robot ◽

Policy Optimization

We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies’ performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute’s choice affects the aerial robot’s performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.
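The latency-injection idea in this abstract can be illustrated with a minimal sketch. It is not the Air Learning implementation: the 1-D point environment, the function names `delay_ticks` and `rollout`, and the choice to model a delay as "the previous action keeps running for the sampled number of simulator ticks" are all assumptions made for illustration. A training loop would collect its experience from rollouts of this kind.

```python
import math
import random

def delay_ticks(latency_s, sim_dt_s):
    """Number of whole simulator ticks spanned by one inference latency."""
    return math.ceil(latency_s / sim_dt_s)

def rollout(policy, latencies_s, sim_dt_s=0.05, decisions=20, seed=0):
    """Roll out a policy on a toy 1-D task with sampled inference delays.

    latencies_s is a list of latencies (seconds) assumed to have been
    measured on the target hardware; sampling uniformly from the list
    draws from the empirical latency distribution.
    """
    rng = random.Random(seed)
    position, goal = 0.0, 1.0
    action = 0.0        # velocity command currently being executed
    total_ticks = 0
    for _ in range(decisions):
        new_action = policy(goal - position)   # policy observes the state now
        latency = rng.choice(latencies_s)      # sampled artificial delay
        for _ in range(delay_ticks(latency, sim_dt_s)):
            position += action * sim_dt_s      # stale action runs during inference
            total_ticks += 1
        action = new_action                    # new action takes effect after the delay
        position += action * sim_dt_s
        total_ticks += 1
    return position, total_ticks
```

With a latency list of all zeros the rollout reduces to an ordinary undelayed loop, so the same code can produce both the "desktop" and the "embedded" condition by swapping the latency list.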

Reinforcement Learning and Adaptive Optimal Control for Continuous-Time Nonlinear Systems: A Value Iteration Approach

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2020.3045087 ◽

2021 ◽

pp. 1-10

Author(s):

Tao Bian ◽

Zhong-Ping Jiang

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Nonlinear Systems ◽

Continuous Time ◽

Value Iteration ◽

Adaptive Optimal Control ◽

A Value

An Efficiency Enhancing Methodology for Multiple Autonomous Vehicles in an Urban Network Adopting Deep Reinforcement Learning

Applied Sciences ◽

10.3390/app11041514 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1514 ◽

Cited By ~ 2

Author(s):

Quang-Duy Tran ◽

Sang-Hoon Bae

Keyword(s):

Reinforcement Learning ◽

Traffic Congestion ◽

Autonomous Vehicles ◽

Penetration Rate ◽

Autonomous Vehicle ◽

Effective Means ◽

Urban Network ◽

Learning Agents ◽

Policy Optimization ◽

The Impact

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of the autonomous vehicle. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning approach that investigates how leading autonomous vehicles affect the urban network in a mixed-traffic environment. We also suggest a set of hyperparameters for achieving better performance. Firstly, we feed a set of hyperparameters into our deep reinforcement learning agents. Secondly, we investigate the leading autonomous vehicle experiment in the urban network with different autonomous vehicle penetration rates. Thirdly, the advantage of leading autonomous vehicles is evaluated using entire manual vehicle and leading manual vehicle experiments. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameters. We demonstrate that full-automation traffic achieved an average speed 1.27 times that of the entire manual vehicle experiment. Our proposed method becomes significantly more effective at a higher autonomous vehicle penetration rate. Furthermore, the leading autonomous vehicles could help to mitigate traffic congestion.
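The two proximal policy optimization variants compared in this abstract can be sketched as follows. This is a minimal, dependency-free illustration of the standard surrogate objectives, not the authors' code: the function names, the default `epsilon=0.2`, and the 1.5 and 2.0 constants in the penalty update are assumptions taken from the commonly cited PPO formulation rather than from this paper.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)

def ppo_kl_penalty_objective(ratios, advantages, kls, beta):
    """Penalized surrogate: mean of r * A - beta * KL(old policy || new policy)."""
    terms = [r * a - beta * kl for r, a, kl in zip(ratios, advantages, kls)]
    return sum(terms) / len(terms)

def adapt_beta(beta, mean_kl, kl_target):
    """Adaptive penalty coefficient: relax when KL is small, tighten when large."""
    if mean_kl < kl_target / 1.5:
        return beta / 2.0
    if mean_kl > kl_target * 1.5:
        return beta * 2.0
    return beta
```

Here `ratios` are per-sample probability ratios between the new and old policy, and `advantages` are the corresponding advantage estimates. The clipped form bounds how far a single update can move the policy through `epsilon`, while the penalty form does so through `beta`, which `adapt_beta` retunes after each update.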

Hierarchical Reinforcement Learning

ACM Computing Surveys ◽

10.1145/3453160 ◽

2021 ◽

Vol 54 (5) ◽

pp. 1-35

Author(s):

Shubham Pateria ◽

Budhitama Subagdja ◽

Ah-hwee Tan ◽

Chai Quek

Keyword(s):

Reinforcement Learning ◽

Future Research ◽

Comprehensive Overview ◽

Open Problems ◽

Practical Applications ◽

Hierarchical Reinforcement Learning ◽

The Past ◽

Agent Learning ◽

Multi Agent ◽

Supplementary Material

Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. In recent years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate future research in HRL. Furthermore, we outline a few suitable task domains for evaluating the HRL approaches and a few interesting examples of the practical applications of HRL in the Supplementary Material.

Multimodal Mixed Reality Impact on a Hand Guiding Task with a Holographic Cobot

Multimodal Technologies and Interaction ◽

10.3390/mti4040078 ◽

2020 ◽

Vol 4 (4) ◽

pp. 78

Author(s):

Andoni Rivera Pinto ◽

Johan Kildal ◽

Elena Lazkano

Keyword(s):

Augmented Reality ◽

Industrial Production ◽

User Study ◽

Haptic Feedback ◽

Mixed Reality ◽

Robot Arm ◽

Pick And Place ◽

Place Task ◽

Guidance Technique ◽

The Impact

In industrial production, a worker who wants to program a robot using the hand-guidance technique needs the robot to be available for programming and not in operation. This means that production with that robot is stopped during that time. A way around this constraint is to perform the same manual guidance steps on a holographic representation of the digital twin of the robot, using augmented reality technologies. However, the visual holograms that the user tries to grab lack tangibility. We present an interface in which some of the tangibility is provided through ultrasound-based mid-air haptic actuation. We report a user study that evaluates the impact of such haptic feedback on a pick-and-place task performed by guiding the wrist of a holographic robot arm; we found the feedback to be beneficial.

Approximate Value Iteration in the Reinforcement Learning Context. Application to Electrical Power System Control.

International Journal of Emerging Electric Power Systems ◽

10.2202/1553-779x.1066 ◽

2005 ◽

Vol 3 (1) ◽

Cited By ~ 14

Author(s):

Damien Ernst ◽

Mevludin Glavic ◽

Pierre Geurts ◽

Louis Wehenkel

Keyword(s):

Reinforcement Learning ◽

Power System ◽

Control Problem ◽

Learning Algorithm ◽

Electrical Power ◽

Complex Case ◽

Iteration Algorithm ◽

Value Iteration ◽

Learning Context ◽

Power System Control

In this paper we explain how to design intelligent agents that process the information acquired from interaction with a system to learn a good control policy, and we show how the methodology can be applied to control devices intended to damp electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem, and the information acquired from interaction with the system is a set of samples, where each sample is composed of four elements: a state, the action taken in this state, the instantaneous reward observed, and the successor state of the system. To process this information we consider reinforcement learning algorithms that determine an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried out on a benchmark power system modeled with two state variables. Then we present a more complex case study on a four-machine power system where the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) intended to damp power system oscillations.
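The sample-based scheme described in this abstract, in which a Q-function is fitted from four-element samples by repeating the value iteration backup, can be sketched in a small tabular form. This is an illustrative assumption, not the paper's algorithm: a per-pair average of backup targets stands in for whatever function approximator the authors use, and the function name `batch_q_iteration` and its parameters are hypothetical.

```python
from collections import defaultdict

def batch_q_iteration(samples, actions, gamma=0.9, sweeps=50):
    """Approximate Q from a fixed batch of (state, action, reward, next_state).

    Each sweep applies the value iteration backup
        target = reward + gamma * max_b Q(next_state, b)
    to every sample and averages the targets per (state, action) pair.
    Pairs that never appear in the batch keep the default value 0.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        target_sum = defaultdict(float)
        counts = defaultdict(int)
        for state, action, reward, next_state in samples:
            best_next = max(q[(next_state, b)] for b in actions)
            target_sum[(state, action)] += reward + gamma * best_next
            counts[(state, action)] += 1
        new_q = defaultdict(float)
        for key, total in target_sum.items():
            new_q[key] = total / counts[key]
        q = new_q
    return q
```

A greedy control policy is then read off by choosing, in each state, the action with the largest Q-value. Because the batch is fixed, no further interaction with the system is needed during the sweeps.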

A Model-Based Factored Bayesian Reinforcement Learning Approach

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1092 ◽

2014 ◽

Vol 513-517 ◽

pp. 1092-1095

Author(s):

Bo Wu ◽

Yan Peng Feng ◽

Hong Yan Zheng

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Iteration Algorithm ◽

Value Iteration ◽

Practical Applications ◽

Model Based ◽

Online Planning ◽

Bayesian Reinforcement Learning ◽

Bayesian Inference Method ◽

Unknown Structure

Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the exponential growth in the number of learning parameters is the main impediment to online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. Firstly, we exploit a factored representation of the states to reduce the number of learning parameters, and adopt a Bayesian inference method to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach is an effective way of improving learning efficiency in large-scale state spaces.

Proximal policy optimization with model-based methods

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211935 ◽

2022 ◽

pp. 1-12

Author(s):

Shuailong Li ◽

Wei Zhang ◽

Huiwen Zhang ◽

Xin Zhang ◽

Yuquan Leng

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Transition Model ◽

Practical Applications ◽

Original Algorithm ◽

Policy Performance ◽

Model Based ◽

Model Free ◽

Future State ◽

Policy Optimization

Model-free reinforcement learning methods have successfully been applied to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only the information of past experience but also the prediction information of the future state. PPOMM adds the information of the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. This method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict the information of the next state. For most games, this method outperforms the state-of-the-art PPO algorithm when we evaluate across 49 Atari games in the Arcade Learning Environment (ALE). The experimental results show that PPOMM performs better than or the same as the original algorithm in 33 games.

Queuing theory based part-flow estimation in a pick-and-place task with a multi-robot system

Journal of Advanced Mechanical Design Systems and Manufacturing ◽

10.1299/jamdsm.2018jamdsm0061 ◽

2018 ◽

Vol 12 (2) ◽

pp. JAMDSM0061-JAMDSM0061

Author(s):

Yanjiang HUANG ◽

Ryosuke CHIBA ◽

Tamio ARAI ◽

Tsuyoshi UEYAMA ◽

Xianmin ZHANG ◽

...

Keyword(s):

Queuing Theory ◽

Robot System ◽

Flow Estimation ◽

Pick And Place ◽

Place Task ◽

Multi Robot
