Real–Sim–Real Transfer for Real-World Robot Control Policy Learning with Deep Reinforcement Learning

2020 ◽  
Vol 10 (5) ◽  
pp. 1555
Author(s):  
Naijun Liu ◽  
Yinghao Cai ◽  
Tao Lu ◽  
Rui Wang ◽  
Shuo Wang

Compared to traditional data-driven learning methods, recently developed deep reinforcement learning (DRL) approaches can train robot agents to obtain control policies with appealing performance. However, learning control policies for real-world robots through DRL is costly and cumbersome. A promising alternative is to train policies in simulated environments and transfer the learned policies to real-world scenarios. Unfortunately, due to the reality gap between simulated and real-world environments, policies learned in simulation often do not generalize well to the real world, and bridging this gap remains a challenging problem. In this paper, we propose a novel real–sim–real (RSR) transfer method that includes a real-to-sim training phase and a sim-to-real inference phase. In the real-to-sim training phase, a task-relevant simulated environment is constructed from semantic information about the real-world scenario and a coordinate transformation, and a policy is then trained with a DRL method in this simulated environment. In the sim-to-real inference phase, the learned policy is applied directly to control the robot in real-world scenarios, without any real-world data. Experimental results on two different robot control tasks show that the proposed RSR method can train skill policies with high generalization performance at significantly lower training cost.
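A minimal sketch of the two-phase RSR structure described above. Every name here (build_sim_from_semantics, SimEnv, train_drl, and the detection format) is an illustrative assumption, not the authors' actual API; the paper's environment construction and choice of DRL algorithm are not reproduced.

```python
import numpy as np

class SimEnv:
    """Task-relevant simulated scene (hypothetical stand-in)."""
    def __init__(self, object_poses):
        self.object_poses = object_poses        # poses in the simulator frame

def to_sim_frame(pose_real, T_real_to_sim):
    """Coordinate transformation from the real-world frame to the sim frame."""
    return T_real_to_sim @ pose_real

def build_sim_from_semantics(detections, T_real_to_sim):
    """Real-to-sim phase: map semantic detections of the scene into a simulator."""
    poses = {name: to_sim_frame(p, T_real_to_sim)
             for name, p in detections.items()}
    return SimEnv(poses)

def train_drl(env, episodes=1000):
    """Placeholder for any standard DRL algorithm (e.g., PPO or SAC)."""
    return lambda obs: np.zeros(2)              # trivial stand-in policy

# Real-to-sim: build the simulator from the perceived real scene and train.
T = np.eye(4)                                   # illustrative real-to-sim transform
detections = {"target": np.array([0.4, 0.1, 0.02, 1.0])}  # homogeneous point
policy = train_drl(build_sim_from_semantics(detections, T))

# Sim-to-real: deploy the learned policy on the robot with no real-world data.
action = policy(np.zeros(4))
```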

2021 ◽  
Author(s):  
Josiah P. Hanna ◽  
Siddharth Desai ◽  
Haresh Karnan ◽  
Garrett Warnell ◽  
Peter Stone

Reinforcement learning in simulation is a promising alternative to the prohibitive sample cost of reinforcement learning in the physical world. Unfortunately, policies learned in simulation often perform worse than hand-coded policies when applied on the target physical system. Grounded simulation learning (GSL) is a general framework that promises to address this issue by altering the simulator to better match the real world (Farchy et al. 2013, in Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)). This article introduces a new algorithm for GSL, Grounded Action Transformation (GAT), and applies it to learning control policies for a humanoid robot. We evaluate our algorithm in controlled experiments where we show it to allow policies learned in simulation to transfer to the real world. We then apply our algorithm to learning a fast bipedal walk on a humanoid robot and demonstrate a 43.27% improvement in forward walk velocity compared to a state-of-the-art hand-coded walk. This striking empirical success notwithstanding, further empirical analysis shows that GAT may struggle when the real world has stochastic state transitions. To address this limitation, we generalize GAT to the Stochastic GAT (SGAT) algorithm and empirically show that SGAT leads to successful real-world transfer in situations where GAT may fail to find a good policy. Our results contribute to a deeper understanding of grounded simulation learning and demonstrate its effectiveness for applying reinforcement learning to learn robot control policies entirely in simulation.
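The core GAT step can be sketched compactly: an action chosen in simulation is replaced so that the simulator reproduces the transition the real robot would have produced. The sketch below is a schematic reading of that idea with illustrative model names; f_real would be learned from a small amount of real data and f_sim_inv from simulated data.

```python
def grounded_action(s, a, f_real, f_sim_inv):
    """Grounded action transformation (schematic sketch).

    f_real(s, a)     -- learned forward model of real-world dynamics
    f_sim_inv(s, s2) -- learned inverse model: the sim action driving s to s2
    """
    s_next_real = f_real(s, a)        # where the real system would end up
    return f_sim_inv(s, s_next_real)  # sim action reproducing that transition
```

In SGAT, as described above, real-world transitions are treated as stochastic, so f_real would model a distribution over next states rather than a point prediction.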


2021 ◽  
pp. 027836492098785
Author(s):  
Julian Ibarz ◽  
Jie Tan ◽  
Chelsea Finn ◽  
Mrinal Kalakrishnan ◽  
Peter Pastor ◽  
...  

Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low-level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, domains that do not reflect the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time, real-world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn: as embodied agents in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building on these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource both for roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.


AI Magazine ◽  
2011 ◽  
Vol 32 (2) ◽  
pp. 107 ◽  
Author(s):  
David J. Stracuzzi ◽  
Alan Fern ◽  
Kamal Ali ◽  
Robin Hess ◽  
Jervis Pinto ◽  
...  

Automatic transfer of learned knowledge from one task or domain to another offers great potential to simplify and expedite the construction and deployment of intelligent systems. In practice, however, there are many barriers to achieving this goal. In this article, we present a prototype system for the real-world context of transferring knowledge of American football from video observation to control in a game simulator. We trace an example play from the raw video through execution and adaptation in the simulator, highlighting the system's component algorithms along with issues of complexity, generality, and scale. We conclude with a discussion of the implications of this work for other applications, along with several possible improvements.


2005 ◽  
Vol 17 (6) ◽  
pp. 628-635 ◽  
Author(s):  
Nobutomo Matsunaga ◽  
Shigeyasu Kawaji

Advances in robot development involve autonomous work in the real world, where robots may lift or carry heavy objects. Motion control of autonomous robots is an important issue, in which configurations and motions differ depending on the robot and the object. Isaka et al. showed that lifting configuration is important in realizing efficient lifting that minimizes the burden on the lower back, but their analysis was limited to weight lifting of a fixed object. Biped robot control requires analyzing different lifting motions in diverse situations, so motion analysis is important in clarifying the control strategy. We analyzed the dynamics of human barbell lifting in different situations and found that the lifting can be divided into four motions.


Author(s):  
Masumi Ishikawa

Studies on rule extraction using neural networks have exclusively adopted supervised learning, in which correct outputs are always given as training samples. The real world, however, does not always provide correct answers. We advocate the use of learning with an immediate critic, a simple form of reinforcement learning that uses an immediate binary reinforcement signal indicating whether or not an output is correct. This, of course, makes learning more difficult and time-consuming than supervised learning. Learning with an immediate critic alone, however, is not powerful enough at extracting rules from data, because a distributed representation emerges just as in backpropagation learning. We propose to combine learning with an immediate critic and structural learning with forgetting (SLF) into structural learning with an immediate critic and forgetting (SLCF). The procedure for extracting rules from data by SLCF is similar to that of SLF. Applications of the proposed method to rule extraction from the lenses dataset demonstrate its effectiveness.
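A minimal sketch of the two ingredients named above, under loose assumptions: the immediate critic is modeled as a binary reward for a correct or incorrect output, and forgetting as a constant-magnitude decay that drives unneeded weights toward zero, leaving a sparse, rule-like network. The update rule and all hyperparameters are illustrative, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=4)        # tiny single-output network
lr, forget = 0.05, 1e-3                  # learning rate and forgetting strength

def slcf_step(x, y_true):
    """One update: immediate binary critic plus forgetting (sketch)."""
    global w
    y = 1.0 / (1.0 + np.exp(-x @ w))           # sigmoid output in (0, 1)
    r = 1.0 if (y > 0.5) == y_true else -1.0   # critic: was the output correct?
    # Reward-modulated update: push the output further the way it went if
    # rewarded, back the other way if punished (illustrative rule).
    w += lr * r * (y - 0.5) * x
    w -= forget * np.sign(w)                   # forgetting: constant decay toward zero

slcf_step(np.array([1.0, 0.0, 1.0, 0.0]), True)
```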


Author(s):  
Woodrow Barfield ◽  
Craig Rosenberg

Recent technological advancements in virtual environment equipment have led to the development of augmented reality displays for applications in medicine, manufacturing, and scientific visualization (Bajura et al., 1992; Janin et al., 1993; Milgram et al., 1991; Lion et al., 1993). Even so, augmented reality displays are still at an early stage, primarily demonstrating the possibilities, the use, and the technical realization of the concept. The purpose of this chapter is to review the literature on the design and use of augmented reality displays, to suggest applications for this technology, and to suggest new techniques for creating these displays. In addition, the chapter discusses the technological issues associated with creating augmented realities, such as image registration, update rate, and the range and sensitivity of position sensors. Furthermore, the chapter discusses human-factors issues and visual requirements that should be considered when creating augmented-reality displays. Essentially, an augmented-reality display allows a designer to combine part or all of a real-world visual scene with synthetic imagery. Typically, the real-world visual scene in an augmented-reality display is captured by video or directly viewed. In terms of descriptions of augmented reality found in the literature, Janin et al. (1993) used the term “augmented reality” to signify a see-through head-mounted display (HMD) which allowed the user to view his surroundings with the addition of computer graphics overlaid on the real-world scene. Similarly, Robinett (1992) suggested the term “augmented reality” for a real image that was being enhanced with synthetic parts; he called the result a “merged representation”. Finally, Fuchs and Neuman (1993) observed that an augmented-reality display combines a simulated environment with direct perception of the world, together with the capability to interactively manipulate the real or virtual object(s). Based on the above descriptions, most current augmented-reality displays are designed using see-through HMDs, which allow the observer to view the real world directly with the naked eye. However, if video is used to capture the real world, one may use either an opaque HMD or a screen-based system to view the scene (Lion et al., 1993).
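Image registration, the issue mentioned above, reduces at its core to projecting virtual geometry into the viewer's image so overlays land on the right real-world pixels. A minimal sketch of that projection step, assuming a pinhole camera model with illustrative intrinsics and a pose obtained from a position sensor:

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],    # intrinsics: focal lengths, principal point
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 1.5])   # pose from a position sensor

def project(p_world):
    """Project a virtual 3D point into the camera image (pinhole model)."""
    p_cam = R @ p_world + t                   # world frame -> camera frame
    u, v, w = K @ p_cam                       # camera frame -> image plane
    return u / w, v / w                       # pixel coordinates of the overlay

print(project(np.array([0.1, -0.05, 0.5])))
```

Update rate and sensor accuracy matter here because R and t must track head motion quickly and precisely enough for the overlay to stay registered with the real scene.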


Author(s):  
Satoshi Kurihara ◽  
Rikio Onai ◽  
Toshiharu Sugawara

We propose and evaluate an adaptive reinforcement learning system that integrates both exploitation- and exploration-oriented learning (ArLee). Compared to conventional reinforcement learning, ArLee is more robust in a dynamically changing environment and conducts exploration-oriented learning efficiently even in a large-scale environment. It is thus well suited for autonomous systems, for example, software agents and mobile robots, that operate in dynamic, large-scale environments, such as the real world and the Internet. Simulation demonstrates the learning system’s basic effectiveness.
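As a rough illustration of integrating exploitation- and exploration-oriented learning, the sketch below adapts a standard Q-learner's exploration rate to a running measure of surprise, exploring more when the environment appears to have changed. This is a generic stand-in, not ArLee's actual architecture.

```python
import random
from collections import defaultdict

Q = defaultdict(float)          # exploitation: standard Q-values
eps, err_avg = 0.1, 0.0         # exploration rate adapted to surprise

def act(state, actions):
    if random.random() < eps:                          # exploration-oriented
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])   # exploitation-oriented

def update(state, action, reward, next_value, alpha=0.1, gamma=0.9):
    global eps, err_avg
    td = reward + gamma * next_value - Q[(state, action)]
    Q[(state, action)] += alpha * td
    err_avg = 0.9 * err_avg + 0.1 * abs(td)   # track surprise (TD error)
    eps = min(0.5, 0.05 + err_avg)            # explore more when the world shifts
```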


Author(s):  
Jing-Cheng Shi ◽  
Yang Yu ◽  
Qing Da ◽  
Shi-Yong Chen ◽  
An-Xiang Zeng

Applying reinforcement learning to physical-world tasks is extremely challenging, as it is commonly infeasible to sample the large number of trials required by current reinforcement learning methods in a physical environment. This paper reports our project on using reinforcement learning for better commodity search in Taobao, one of the largest online retail platforms and, at the same time, a physical environment with a high sampling cost. Instead of training reinforcement learning in Taobao directly, we present our environment-building approach: we build Virtual-Taobao, a simulator learned from historical customer behavior data, and then train policies in Virtual-Taobao with no physical sampling cost. To improve simulation precision, we propose GAN-SD (GAN for Simulating Distributions) for customer feature generation with a better-matched distribution, and we propose MAIL (Multiagent Adversarial Imitation Learning) for generating more generalizable customer actions. To further avoid overfitting to the imperfections of the simulator, we propose the ANC (Action Norm Constraint) strategy to regularize the policy model. In experiments, Virtual-Taobao is trained from hundreds of millions of real Taobao customers' records. Compared with the real Taobao, Virtual-Taobao faithfully recovers important properties of the real environment. We further show, through online A/B tests, that policies trained purely in Virtual-Taobao, at zero physical sampling cost, achieve significantly superior real-world performance to traditional supervised approaches. We hope this work sheds some light on applying reinforcement learning in complex physical environments.
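Of the three components, ANC is the simplest to illustrate. A sketch of an action-norm-constraint-style regularizer, under the assumption (not stated in the abstract) that it takes the familiar form of an L2 penalty on action magnitude added to the policy loss, discouraging the policy from exploiting regions where the learned simulator is least trustworthy; the coefficient is illustrative:

```python
import numpy as np

def anc_penalty(actions, lam=0.01):
    """L2 penalty on action magnitude, added to the policy's RL loss."""
    return lam * np.mean(np.sum(np.square(actions), axis=-1))

# total_loss = rl_loss + anc_penalty(batch_actions)
```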


Author(s):  
Kazi Tanvir Ahmed Siddiqui ◽  
David Feil-Seifer ◽  
Tianyi Jiang ◽  
Sonu Jose ◽  
Siming Liu ◽  
...  

Simulation environments for Unmanned Aerial Vehicles (UAVs) can be very useful for prototyping user interfaces and training the personnel who will operate UAVs in the real world. Realistic operation only enhances the value of such training. In this paper, we present the integration of a model-based waypoint navigation controller into the Reno Rescue Simulator, for the purpose of providing a more realistic user interface in simulated environments. We also present potential uses for such simulations, even for real-world operation of UAVs.
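As a rough illustration of what a waypoint navigation controller does (the paper's model-based controller is not specified in this abstract), the sketch below steers a vehicle toward the current waypoint with proportional heading control and advances to the next waypoint inside a capture radius; gains and radius are illustrative.

```python
import math

def waypoint_step(pos, heading, waypoints, idx, k_turn=1.0, radius=1.0):
    """One control step: return a turn-rate command and the active waypoint index."""
    wx, wy = waypoints[idx]
    dx, dy = wx - pos[0], wy - pos[1]
    if math.hypot(dx, dy) < radius:                 # waypoint reached: advance
        idx = min(idx + 1, len(waypoints) - 1)
        wx, wy = waypoints[idx]
        dx, dy = wx - pos[0], wy - pos[1]
    bearing = math.atan2(dy, dx)
    err = math.atan2(math.sin(bearing - heading),   # heading error wrapped to [-pi, pi]
                     math.cos(bearing - heading))
    return k_turn * err, idx                        # proportional steering command
```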

