Off-line path integral reinforcement learning using stochastic robot dynamics approximated by sparse pseudo-input Gaussian processes: Application to humanoid robot motor learning in the real environment

Author(s):  
Norikazu Sugimoto ◽  
Jun Morimoto


2008 ◽
Vol 20 (3) ◽  
pp. 350-357 ◽  
Author(s):  
Kei Senda ◽  
Takayuki Kondo ◽  
Yoshimitsu Iwasaki ◽  
Shinji Fujii ◽  
...  

It is difficult for robots to achieve tasks that involve contact with the environment because of the error between the controller's model and the real environment. To solve this problem, we propose having a robot autonomously acquire skills that are proficient and robust against model error. Numerical simulations and experiments with an autonomous space robot demonstrate the feasibility of our proposal in the real environment.


2021 ◽  
Author(s):  
Josiah P. Hanna ◽  
Siddharth Desai ◽  
Haresh Karnan ◽  
Garrett Warnell ◽  
Peter Stone

Reinforcement learning in simulation is a promising alternative to the prohibitive sample cost of reinforcement learning in the physical world. Unfortunately, policies learned in simulation often perform worse than hand-coded policies when applied on the target physical system. Grounded simulation learning (GSL) is a general framework that promises to address this issue by altering the simulator to better match the real world (Farchy et al., 2013, in Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)). This article introduces a new algorithm for GSL, Grounded Action Transformation (GAT), and applies it to learning control policies for a humanoid robot. We evaluate our algorithm in controlled experiments where we show it to allow policies learned in simulation to transfer to the real world. We then apply our algorithm to learning a fast bipedal walk on a humanoid robot and demonstrate a 43.27% improvement in forward walk velocity compared to a state-of-the-art hand-coded walk. This striking empirical success notwithstanding, further empirical analysis shows that GAT may struggle when the real world has stochastic state transitions. To address this limitation we generalize GAT to the Stochastic GAT (SGAT) algorithm and empirically show that SGAT leads to successful real-world transfer in situations where GAT may fail to find a good policy. Our results contribute to a deeper understanding of grounded simulation learning and demonstrate its effectiveness for applying reinforcement learning to learn robot control policies entirely in simulation.
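As the abstract indicates, GAT grounds the simulator by transforming the agent's actions rather than by modifying the simulator's internals. A minimal sketch of that idea, assuming a forward model of the real robot (`f_real`, learned from physical data) and an inverse dynamics model of the simulator (`h_sim`); both names and the environment interface are placeholders, not the paper's code:

```python
def grounded_step(sim_env, f_real, h_sim, state, action):
    """Execute one grounded step in simulation (sketch).

    Rather than executing `action` directly, predict the next state the
    *real* robot would reach, then query the simulator's inverse dynamics
    for the action that reproduces that transition in simulation.
    """
    predicted_real_next = f_real(state, action)          # real-world forward model
    grounded_action = h_sim(state, predicted_real_next)  # simulator inverse model
    return sim_env.step(grounded_action)
```

Under this transformation, a policy trained in the grounded simulator experiences transitions that track the physical system more closely, which is what enables the reported sim-to-real transfer.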


2014 ◽  
Vol 6 ◽  
pp. 276264 ◽  
Author(s):  
Kei Senda ◽  
Yurika Tani

This paper discusses an autonomous space robot that assembles a truss structure using reinforcement learning. It is difficult for a space robot to complete contact tasks in a real environment, for example a peg-in-hole task, because of the error between the real environment and the controller model. To solve this problem, we propose an autonomous space robot that acquires skills proficient and robust enough to overcome this error and complete the task. The proposed approach develops skills through reinforcement learning that considers plant variation, that is, modeling error. Numerical simulations and experiments show that the proposed method is useful in real environments.
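One common way to make a learned skill robust to plant variation is to resample the uncertain plant parameters during training. A minimal sketch of that pattern (an illustration, not the authors' method), where `PegInHoleSim` and `agent` are hypothetical placeholders:

```python
import random

def train_robust_skill(agent, episodes=1000):
    """Sketch: reinforcement learning under plant variation.

    Each episode resamples the uncertain plant parameters (the modeling
    error), so the learned skill must work across the whole parameter
    range rather than for one nominal model.
    """
    for _ in range(episodes):
        env = PegInHoleSim(                            # hypothetical simulator
            friction=random.uniform(0.1, 0.6),         # uncertain contact friction
            clearance_mm=random.uniform(0.05, 0.5),    # uncertain hole clearance
        )
        state = env.reset()
        done = False
        while not done:
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward, next_state)
            state = next_state
```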


2014 ◽  
Vol 11 (03) ◽  
pp. 1450024 ◽  
Author(s):  
Paweł Wawrzyński

In this paper, a control system for humanoid robot walking is approximately optimized by means of reinforcement learning. The subject is an 18-DOF humanoid whose gait is based on replaying a simple trajectory. This trajectory is translated into a reactive policy. A neural network whose input represents the robot state learns to produce output that additively modifies the initial control. The learning algorithm applied is actor-critic with experience replay. Within 50 min of learning, the slow initial gait turns into dexterous, fast walking. No model of the robot dynamics is required. The methodology is generic and can be applied to optimize control systems for other robots of comparable complexity.
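The additive control scheme described above can be sketched as follows: the replayed nominal trajectory supplies a baseline command, and the learned network adds a state-dependent correction. The class and names are illustrative, not the paper's implementation:

```python
import numpy as np

class AdditivePolicy:
    """Sketch: a learned correction added on top of a replayed nominal gait.

    `actor` is any function approximator (the paper trains a neural network
    by actor-critic with experience replay) mapping the robot state to a
    correction for the 18 joint commands.
    """
    def __init__(self, nominal_trajectory, actor):
        self.nominal = np.asarray(nominal_trajectory)   # shape (T, 18): joint targets
        self.actor = actor

    def act(self, state, t):
        baseline = self.nominal[t % len(self.nominal)]  # replayed initial control
        correction = self.actor(state)                  # learned additive term
        return baseline + correction
```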


2020 ◽  
Vol 17 (2) ◽  
pp. 172988142091625
Author(s):  
Yang Li ◽  
Shijie Guo ◽  
Lishuang Zhu ◽  
Toshiharu Mukai ◽  
Zhongxue Gan

Reinforcement learning has been a promising approach in control and robotics because data-driven learning reduces the need for engineering knowledge. However, it usually requires many interactions with the environment to train a controller, which is a practical limitation in some real environments, for example robots, where interactions with the environment are restricted and time-consuming. Learning is therefore generally conducted in a simulation environment, and the learned policy is afterwards migrated to the real environment. However, differences between the simulation environment and the real environment, for example friction coefficients at joints or changing loads, may cause undesired results during migration. To solve this problem, most learning approaches concentrate on retraining, system or parameter identification, or adaptive policy training. In this article, we propose an approach in which an adaptive policy is learned by extracting more information from the data. An environmental encoder, which indirectly reflects the parameters of an environment, is trained by explicitly incorporating model uncertainties into long-term planning and policy learning. This approach can identify the differences between environments when the learned policy is migrated to a real environment, thus increasing the adaptability of the policy. Moreover, its applicability to autonomous learning in control tasks is also verified.
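The environmental-encoder idea can be sketched as conditioning the policy on a latent vector inferred from recent transitions, so one set of weights can adapt to different friction or load without retraining. A minimal sketch under that reading of the abstract; module names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class EnvEncoder(nn.Module):
    """Sketch: summarize recent (s, a, s') transitions into a latent vector
    that indirectly reflects environment parameters such as friction or load."""
    def __init__(self, obs_dim, act_dim, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, transitions):               # (n, 2*obs_dim + act_dim)
        return self.net(transitions).mean(dim=0)  # average into one descriptor

class ConditionedPolicy(nn.Module):
    """Sketch: a policy that receives the environment descriptor as an extra
    input, so its behavior can vary with the inferred environment."""
    def __init__(self, obs_dim, act_dim, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))
```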


2021 ◽  
pp. 027836492098785
Author(s):  
Julian Ibarz ◽  
Jie Tan ◽  
Chelsea Finn ◽  
Mrinal Kalakrishnan ◽  
Peter Pastor ◽  
...  

Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low-level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, which do not share the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time, real-world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn: as embodied agents in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building on these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource both for roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.


1997 ◽  
Vol 6 (4) ◽  
pp. 413-432 ◽  
Author(s):  
Richard L. Holloway

Augmented reality (AR) systems typically use see-through head-mounted displays (STHMDs) to superimpose images of computer-generated objects onto the user's view of the real environment in order to augment it with additional information. The main failing of current AR systems is that the virtual objects displayed in the STHMD appear in the wrong position relative to the real environment. This registration error has many causes: system delay, tracker error, calibration error, optical distortion, and misalignment of the model, to name only a few. Although some work has been done in the area of system calibration and error correction, very little work has been done on characterizing the nature and sensitivity of the errors that cause misregistration in AR systems. This paper presents the main results of an end-to-end error analysis of an optical STHMD-based tool for surgery planning. The analysis was done with a mathematical model of the system and the main results were checked by taking measurements on a real system under controlled circumstances. The model makes it possible to analyze the sensitivity of the system-registration error to errors in each part of the system. The major results of the analysis are: (1) Even for moderate head velocities, system delay causes more registration error than all other sources combined; (2) eye tracking is probably not necessary; (3) tracker error is a significant problem both in head tracking and in system calibration; (4) the World (or reference) coordinate system adds error and should be omitted when possible; (5) computational correction of optical distortion may introduce more delay-induced registration error than the distortion error it corrects, and (6) there are many small error sources that will make submillimeter registration almost impossible in an optical STHMD system without feedback. Although this model was developed for optical STHMDs for surgical planning, many of the results apply to other HMDs as well.
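Result (1) can be made concrete with simple geometry: while the system renders for a stale head pose, the head keeps rotating, and a virtual object is displaced by roughly the angle swept during the delay. A back-of-the-envelope sketch (the numbers are illustrative, not taken from the paper):

```python
import math

def delay_misregistration(head_rate_deg_s, delay_s, object_distance_m):
    """Approximate the registration error caused by end-to-end system delay.

    Angular error is roughly head rate times delay; the positional error at
    the object is that angle (in radians) times the viewing distance.
    """
    angular_error_deg = head_rate_deg_s * delay_s
    positional_error_m = object_distance_m * math.radians(angular_error_deg)
    return angular_error_deg, positional_error_m

# Moderate head rotation (50 deg/s), 100 ms delay, object at arm's length (0.5 m):
ang, pos = delay_misregistration(50.0, 0.100, 0.5)
print(f"{ang:.1f} deg angular error -> {pos * 1000:.0f} mm at 0.5 m")
# about 5.0 deg and 44 mm: far beyond submillimeter registration
```

Even this crude estimate shows why delay dominates the error budget and why submillimeter registration without feedback is judged nearly impossible.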


Author(s):  
A.I. Zagranichny

The article presents the results of a study of different types of activity in relation to the frequency with which social activity is transferred from the real environment to the virtual environment and vice versa. The study identified the following types of activity: play, education, work, and communication. A total of 214 respondents aged 15 to 24 from Balakovo, Saratov, and Moscow participated; 52% were women, with the social statuses "pupil", "student", and "young specialist". The correlations between these types of activity and the frequency of transferring social activity from one environment to the other were analyzed and interpreted. The study yielded the following results: the frequency of transferring social activity from the real environment to the virtual environment has a direct positive link with play activity (r=0.221; p<0.01), educational activity (r=0.228; p<0.01), and communicative activity (r=0.346; p<0.01). The frequency of transferring social activity from the virtual environment to the real one has a direct positive link with only two types of activity: educational activity (r=0.188; p<0.05) and communicative activity (r=0.331; p<0.01).

