Grounded action transformation for sim-to-real reinforcement learning

AbstractReinforcement learning in simulation is a promising alternative to the prohibitive sample cost of reinforcement learning in the physical world. Unfortunately, policies learned in simulation often perform worse than hand-coded policies when applied on the target, physical system. Grounded simulation learning (gsl) is a general framework that promises to address this issue by altering the simulator to better match the real world (Farchy et al. 2013 in Proceedings of the 12th international conference on autonomous agents and multiagent systems (AAMAS)). This article introduces a new algorithm for gsl—Grounded Action Transformation (GAT)—and applies it to learning control policies for a humanoid robot. We evaluate our algorithm in controlled experiments where we show it to allow policies learned in simulation to transfer to the real world. We then apply our algorithm to learning a fast bipedal walk on a humanoid robot and demonstrate a 43.27% improvement in forward walk velocity compared to a state-of-the art hand-coded walk. This striking empirical success notwithstanding, further empirical analysis shows that gat may struggle when the real world has stochastic state transitions. To address this limitation we generalize gat to the stochasticgat (sgat) algorithm and empirically show that sgat leads to successful real world transfer in situations where gat may fail to find a good policy. Our results contribute to a deeper understanding of grounded simulation learning and demonstrate its effectiveness for applying reinforcement learning to learn robot control policies entirely in simulation.

Download Full-text

Real–Sim–Real Transfer for Real-World Robot Control Policy Learning with Deep Reinforcement Learning

Applied Sciences ◽

10.3390/app10051555 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1555

Author(s):

Naijun Liu ◽

Yinghao Cai ◽

Tao Lu ◽

Rui Wang ◽

Shuo Wang

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Robot Control ◽

Training Phase ◽

Control Policies ◽

Promising Alternative ◽

The Real ◽

Training Costs ◽

Simulated Environments ◽

Simulated Environment

Compared to traditional data-driven learning methods, recently developed deep reinforcement learning (DRL) approaches can be employed to train robot agents to obtain control policies with appealing performance. However, learning control policies for real-world robots through DRL is costly and cumbersome. A promising alternative is to train policies in simulated environments and transfer the learned policies to real-world scenarios. Unfortunately, due to the reality gap between simulated and real-world environments, the policies learned in simulated environments often cannot be generalized well to the real world. Bridging the reality gap is still a challenging problem. In this paper, we propose a novel real–sim–real (RSR) transfer method that includes a real-to-sim training phase and a sim-to-real inference phase. In the real-to-sim training phase, a task-relevant simulated environment is constructed based on semantic information of the real-world scenario and coordinate transformation, and then a policy is trained with the DRL method in the built simulated environment. In the sim-to-real inference phase, the learned policy is directly applied to control the robot in real-world scenarios without any real-world data. Experimental results in two different robot control tasks show that the proposed RSR method can train skill policies with high generalization performance and significantly low training costs.

Download Full-text

How to train your robot with deep reinforcement learning: lessons we have learned

The International Journal of Robotics Research ◽

10.1177/0278364920987859 ◽

2021 ◽

pp. 027836492098785

Author(s):

Julian Ibarz ◽

Jie Tan ◽

Chelsea Finn ◽

Mrinal Kalakrishnan ◽

Peter Pastor ◽

...

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Case Studies ◽

Real World ◽

Review Article ◽

The Real ◽

Complex Skills ◽

Real World Learning ◽

Level Sensor ◽

Embodied Agent

Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low-level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, which does not connect with the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time, real-world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn: as an embodied agent in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building off of these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource both for roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.

Download Full-text

Off-line path integral reinforcement learning using stochastic robot dynamics approximated by sparse pseudo-input Gaussian processes: Application to humanoid robot motor learning in the real environment

2013 IEEE International Conference on Robotics and Automation ◽

10.1109/icra.2013.6630740 ◽

2013 ◽

Cited By ~ 1

Author(s):

Norikazu Sugimoto ◽

Jun Morimoto

Keyword(s):

Reinforcement Learning ◽

Motor Learning ◽

Path Integral ◽

Gaussian Processes ◽

Humanoid Robot ◽

Robot Dynamics ◽

The Real ◽

Real Environment ◽

Line Path

Download Full-text

Real-world reinforcement learning for autonomous humanoid robot docking

Robotics and Autonomous Systems ◽

10.1016/j.robot.2012.05.019 ◽

2012 ◽

Vol 60 (11) ◽

pp. 1400-1407 ◽

Cited By ~ 16

Author(s):

Nicolás Navarro-Guerrero ◽

Cornelius Weber ◽

Pascal Schroeter ◽

Stefan Wermter

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Humanoid Robot ◽

Autonomous Humanoid

Download Full-text

Rule Extraction by Structural Learning with an Immediate Critic

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.1999.p0341 ◽

1999 ◽

Vol 3 (5) ◽

pp. 341-347 ◽

Cited By ~ 1

Author(s):

Masumi Ishikawa ◽

Keyword(s):

Neural Networks ◽

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Back Propagation ◽

Rule Extraction ◽

Distributed Representation ◽

Structural Learning ◽

The Real ◽

Training Samples

Studies on rule extraction using neural networks have exclusively adopted supervised learning, in which correct outputs are always given as training samples. The real world, however, does not always provide correct answers. We advocate the use of learning with an immediate critic, which is simple reinforcement learning. It uses an immediate binary reinforcement signal indicating whether or not an output is correct. This, of course, makes learning more difficult and time-consuming than supervised learning. Learning with an immediate critic alone, however, is not powerful enough in extracting rules from data because distributed representation emerges just as in back propagation learning. We propose to combine learning with an immediate critic and structural learning with forgetting (SLF) - structural learning with an immediate critic and forgetting (SLCF). A procedure of rule extraction from data by SLCF is similar to that by SLF. Applications of the proposed method to rule extraction from lenses data demonstrate its effectiveness.

Download Full-text

Adaptive Reinforcement Learning Integrating Exploitation-and Exploration-oriented Learning

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.1999.p0474 ◽

1999 ◽

Vol 3 (6) ◽

pp. 474-478

Author(s):

Satoshi Kurihara ◽

◽

Rikio Onai ◽

Toshiharu Sugawara ◽

Keyword(s):

Reinforcement Learning ◽

Mobile Robots ◽

Real World ◽

Large Scale ◽

Autonomous Systems ◽

Learning System ◽

The Internet ◽

Changing Environment ◽

The Real ◽

Exploitation And Exploration

We propose and evaluate an adaptive reinforcement learning system that integrates both exploitation- and exploration-oriented learning (ArLee). Compared to conventional reinforcement learning, ArLee is more robust in a dynamically changing environment and conducts exploration-oriented learning efficiently even in a large-scale environment. It is thus well suited for autonomous systems, for example, software agents and mobile robots, that operate in dynamic, large-scale environments, such as the real world and the Internet. Simulation demonstrates the learning system’s basic effectiveness.

Download Full-text

Virtual-Taobao: Virtualizing Real-World Online Retail Environment for Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014902 ◽

2019 ◽

Vol 33 ◽

pp. 4902-4909 ◽

Cited By ~ 6

Author(s):

Jing-Cheng Shi ◽

Yang Yu ◽

Qing Da ◽

Shi-Yong Chen ◽

An-Xiang Zeng

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Physical Environment ◽

Physical World ◽

Online Retail ◽

The Real ◽

Policy Model ◽

Physical Environments ◽

Sampling Cost ◽

Norm Constraint

Applying reinforcement learning in physical-world tasks is extremely challenging. It is commonly infeasible to sample a large number of trials, as required by current reinforcement learning methods, in a physical environment. This paper reports our project on using reinforcement learning for better commodity search in Taobao, one of the largest online retail platforms and meanwhile a physical environment with a high sampling cost. Instead of training reinforcement learning in Taobao directly, we present our environment-building approach: we build Virtual-Taobao, a simulator learned from historical customer behavior data, and then we train policies in Virtual-Taobao with no physical sampling costs. To improve the simulation precision, we propose GAN-SD (GAN for Simulating Distributions) for customer feature generation with better matched distribution; we propose MAIL (Multiagent Adversarial Imitation Learning) for generating better generalizable customer actions. To further avoid overfitting the imperfection of the simulator, we propose ANC (Action Norm Constraint) strategy to regularize the policy model. In experiments, Virtual-Taobao is trained from hundreds of millions of real Taobao customers’ records. Compared with the real Taobao, Virtual-Taobao faithfully recovers important properties of the real environment. We further show that the policies trained purely in Virtual-Taobao, which has zero physical sampling cost, can have significantly superior real-world performance to the traditional supervised approaches, through online A/B tests. We hope this work may shed some light on applying reinforcement learning in complex physical environments.

Download Full-text

Real-World Reinforcement Learning for Autonomous Humanoid Robot Charging in a Home Environment

Towards Autonomous Robotic Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-642-23232-9_21 ◽

2011 ◽

pp. 231-240 ◽

Cited By ~ 7

Author(s):

Nicolás Navarro ◽

Cornelius Weber ◽

Stefan Wermter

Keyword(s):

Reinforcement Learning ◽

Home Environment ◽

Real World ◽

Humanoid Robot ◽

Autonomous Humanoid

Download Full-text

Reinforcement learning in the real world

2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541) ◽

10.1109/ijcnn.2004.1380847 ◽

2005 ◽

Cited By ~ 5

Author(s):

A.G. Barto

Keyword(s):

Reinforcement Learning ◽

Real World ◽

The Real

Download Full-text

Automatic Reconstruction of Building-Scale Indoor 3D Environment with a Deep-Reinforcement-Learning-Based Mobile Robot

International Journal of Robotics and Automation Technology ◽

10.31875/2409-9694.2019.06.2 ◽

2021 ◽

Vol 6 ◽

pp. 11-23

Author(s):

Menglong Yang ◽

Katashi Nagao

Keyword(s):

Reinforcement Learning ◽

Mobile Robot ◽

Real World ◽

Low Cost ◽

Three Dimensional ◽

Point Clouds ◽

Autonomous Agent ◽

Entire Area ◽

The Real ◽

3D Point Clouds

The aim of this paper is to digitize the environments in which humans live, at low cost, and reconstruct highly accurate three-dimensional environments that are based on those in the real world. This three-dimensional content can be used such as for virtual reality environments and three-dimensional maps for automatic driving systems. In general, however, a three-dimensional environment must be carefully reconstructed by manually moving the sensors used to first scan the real environment on which the three-dimensional one is based. This is done so that every corner of an entire area can be measured, but time and costs increase as the area expands. Therefore, a system that creates three-dimensional content that is based on real-world large-scale buildings at low cost is proposed. This involves automatically scanning the indoors with a mobile robot that uses low-cost sensors and generating 3D point clouds. When the robot reaches an appropriate measurement position, it collects the three-dimensional data of shapes observable from that position by using a 3D sensor and 360-degree panoramic camera. The problem of determining an appropriate measurement position is called the “next best view problem,” and it is difficult to solve in a complicated indoor environment. To deal with this problem, a deep reinforcement learning method is employed. It combines reinforcement learning, with which an autonomous agent learns strategies for selecting behavior, and deep learning done using a neural network. As a result, 3D point cloud data can be generated with better quality than the conventional rule-based approach.

Download Full-text