scholarly journals Reinforcement Learning for Pick and Place Operations in Robotics: A Survey

Robotics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 105
Author(s):  
Andrew Lobbezoo ◽  
Yanjun Qian ◽  
Hyock-Ju Kwon

The field of robotics has been rapidly developing in recent years, and the work related to training robotic agents with reinforcement learning has been a major focus of research. This survey reviews the application of reinforcement learning for pick-and-place operations, a task that a logistics robot can be trained to complete without support from a robotics engineer. To introduce this topic, we first review the fundamentals of reinforcement learning and various methods of policy optimization, such as value iteration and policy search. Next, factors which have an impact on the pick-and-place task, such as reward shaping, imitation learning, pose estimation, and simulation environment are examined. Following the review of the fundamentals and key factors for reinforcement learning, we present an extensive review of all methods implemented by researchers in the field to date. The strengths and weaknesses of each method from literature are discussed, and details about the contribution of each manuscript to the field are reviewed. The concluding critical discussion of the available literature, and the summary of open problems indicates that experiment validation, model generalization, and grasp pose selection are topics that require additional research.

2021 ◽  
Author(s):  
Srivatsan Krishnan ◽  
Behzad Boroujerdian ◽  
William Fu ◽  
Aleksandra Faust ◽  
Vijay Janapa Reddi

AbstractWe introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies’ performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to $$40\%$$ 40 % longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute’s choice affects the aerial robot’s performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.


2021 ◽  
Vol 11 (4) ◽  
pp. 1514 ◽  
Author(s):  
Quang-Duy Tran ◽  
Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of the autonomous vehicle. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we show an advanced deep reinforcement learning that investigates how the leading autonomous vehicles affect the urban network under a mixed-traffic environment. We also suggest a set of hyperparameters for achieving better performance. Firstly, we feed a set of hyperparameters into our deep reinforcement learning agents. Secondly, we investigate the leading autonomous vehicle experiment in the urban network with different autonomous vehicle penetration rates. Thirdly, the advantage of leading autonomous vehicles is evaluated using entire manual vehicle and leading manual vehicle experiments. Finally, the proximal policy optimization with a clipped objective is compared to the proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameter. We demonstrate that full automation traffic increased the average speed 1.27 times greater compared with the entire manual vehicle experiment. Our proposed method becomes significantly more effective at a higher autonomous vehicle penetration rate. Furthermore, the leading autonomous vehicles could help to mitigate traffic congestion.


2021 ◽  
Vol 54 (5) ◽  
pp. 1-35
Author(s):  
Shubham Pateria ◽  
Budhitama Subagdja ◽  
Ah-hwee Tan ◽  
Chai Quek

Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. During the past years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate the future research in HRL. Furthermore, we outline a few suitable task domains for evaluating the HRL approaches and a few interesting examples of the practical applications of HRL in the Supplementary Material.


2020 ◽  
Vol 4 (4) ◽  
pp. 78
Author(s):  
Andoni Rivera Pinto ◽  
Johan Kildal ◽  
Elena Lazkano

In the context of industrial production, a worker that wants to program a robot using the hand-guidance technique needs that the robot is available to be programmed and not in operation. This means that production with that robot is stopped during that time. A way around this constraint is to perform the same manual guidance steps on a holographic representation of the digital twin of the robot, using augmented reality technologies. However, this presents the limitation of a lack of tangibility of the visual holograms that the user tries to grab. We present an interface in which some of the tangibility is provided through ultrasound-based mid-air haptics actuation. We report a user study that evaluates the impact that the presence of such haptic feedback may have on a pick-and-place task of the wrist of a holographic robot arm which we found to be beneficial.


Author(s):  
Damien Ernst ◽  
Mevludin Glavic ◽  
Pierre Geurts ◽  
Louis Wehenkel

In this paper we explain how to design intelligent agents able to process the information acquired from interaction with a system to learn a good control policy and show how the methodology can be applied to control some devices aimed to damp electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem and the information acquired from interaction with the system is a set of samples, where each sample is composed of four elements: a state, the action taken while being in this state, the instantaneous reward observed and the successor state of the system. To process this information we consider reinforcement learning algorithms that determine an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried on a benchmark power system modeled with two state variables. Then we present a more complex case study on a four-machine power system where the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) aimed to damp power system oscillations.


2014 ◽  
Vol 513-517 ◽  
pp. 1092-1095
Author(s):  
Bo Wu ◽  
Yan Peng Feng ◽  
Hong Yan Zheng

Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the learning parameters with exponential growth are the main impediment for online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. Firstly, we exploit a factored representation to describe the states to reduce the size of learning parameters, and adopt Bayesian inference method to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach is an effective way for improving the learning efficiency in large-scale state spaces.


2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have successfully been applied to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as a high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion method of both model-based and model-free reinforcement learning. PPOMM not only considers the information of past experience but also the prediction information of the future state. PPOMM adds the information of the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. This method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict the information of the next state. For most games, this method outperforms the state-of-the-art PPO algorithm when we evaluate across 49 Atari games in the Arcade Learning Environment (ALE). The experimental results show that PPOMM performs better or the same as the original algorithm in 33 games.


2018 ◽  
Vol 12 (2) ◽  
pp. JAMDSM0061-JAMDSM0061
Author(s):  
Yanjiang HUANG ◽  
Ryosuke CHIBA ◽  
Tamio ARAI ◽  
Tsuyoshi UEYAMA ◽  
Xianmin ZHANG ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document