Reinforcement Learning of Shared Control Policies for Dexterous Telemanipulation

Author(s):  
Takahiro Hasegawa ◽  
Takamitsu Matsubara ◽  
Kenji Sugimoto


Author(s):  
James D. Cunningham ◽  
Simon W. Miller ◽  
Michael A. Yukish ◽  
Timothy W. Simpson ◽  
Conrad S. Tucker

Abstract We present a form-aware reinforcement learning (RL) method to extend control knowledge from one design form to another without losing the ability to control the original design. A major challenge in developing control knowledge is the creation of generalized control policies across designs of varying form. Our RL policy is form-aware because, in addition to receiving dynamic state information about the environment, it also receives states that encode the form of the design being controlled. In this paper, we investigate the impact of this mixed state space on transfer learning. We present a transfer learning method for extending a control policy to a different design form while continuing to expose the agent to the original design during training on the new design. To demonstrate this concept, we present a case study of a multi-rotor aircraft simulation in which the designated task is to achieve a stable hover. We show that by introducing form states, an RL agent is able to learn a control policy that achieves the hovering task with both a four-rotor and a three-rotor design simultaneously, whereas without the form states it can only hover with the four-rotor design. We also benchmark our method against a test case that removes the transfer learning component, as well as a test case that removes the continued exposure to the original design, to show the value of each of these components. We find that form states, transfer learning, and parallel learning all contribute to a more robust control policy for the new design, and that parallel learning is especially important for maintaining control knowledge of the original design.
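The core idea above, augmenting the dynamic state with a descriptor of the design's form, can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation; the form vectors, dimensions, and names (form_aware_observation, QUAD_FORM, TRI_FORM) are assumptions made for the example.

```python
import numpy as np

# Minimal sketch of a "form-aware" observation: the policy input concatenates
# the dynamic state with a vector encoding the design form. The form vectors
# (rotor count, arm length) and all dimensions are hypothetical.
def form_aware_observation(dynamic_state, form_state):
    """Concatenate environment dynamics with a descriptor of the design form."""
    return np.concatenate([dynamic_state, form_state])

QUAD_FORM = np.array([4.0, 0.25])  # four-rotor design descriptor (made up)
TRI_FORM = np.array([3.0, 0.25])   # three-rotor design descriptor (made up)

def sample_training_observation(rng, state_dim=12):
    """Draw each episode's design uniformly, so the agent keeps seeing the
    original form while learning the new one (the paper's parallel learning)."""
    form = QUAD_FORM if rng.random() < 0.5 else TRI_FORM
    dynamic_state = rng.standard_normal(state_dim)  # placeholder dynamics
    return form_aware_observation(dynamic_state, form)

rng = np.random.default_rng(0)
print(sample_training_observation(rng).shape)  # (14,): 12 dynamic + 2 form states
```

The 50/50 sampling between designs mirrors the parallel learning component: the agent keeps seeing the original four-rotor design while it learns the three-rotor one.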


Author(s):  
Peyman Moghadas ◽  
Richard Malak ◽  
Darren Hartl

Origami-inspired engineering provides engineers with new means of creating complicated three-dimensional structures through folding and fold-like operations. Motivated by the vision of origami engineering, we have created and modeled a reconfigurable self-folding sheet based on a laminate structure of shape memory alloy (SMA) surrounding a layer of elastomer. Folding behavior is achieved by activating an SMA layer through localized heating. In prior work, we demonstrated localized control of such a sheet using PID and on/off feedback controllers. Implementing these control strategies requires several workarounds to deal with the highly nonlinear and hysteretic behavior of the SMA-based laminate sheet. In the current work, we use a reinforcement learning algorithm to learn control policies that better handle these aspects of the sheet's behavior. We perform learning on a reduced-order model of the sheet based on classical laminate plate theory, which significantly reduces computational costs compared to more complicated finite element modeling options. We demonstrate the effectiveness of the learned control policies in several folding scenarios on the reduced-order model. Our results show that reinforcement learning can be a useful tool for feedback control of SMA-based structures.
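Since the paper's reduced-order laminate-plate model is not reproduced here, the sketch below substitutes a deliberately crude one-fold surrogate with asymmetric heating/cooling responses (a stand-in for hysteresis) and runs tabular Q-learning on it. All dynamics constants, the 5-degree discretization, and the 45-degree target are assumptions for illustration only.

```python
import numpy as np

# Toy stand-in for a reduced-order fold model: one fold angle that responds
# asymmetrically to heating vs. cooling (a crude proxy for the nonlinear,
# hysteretic SMA behavior). Constants are illustrative only.
def step(angle, heat_on):
    delta = 4.0 if heat_on else -1.0  # heating folds fast, cooling relaxes slowly
    return float(np.clip(angle + delta, 0.0, 90.0))

def bin_of(angle, width=5.0, n_bins=19):
    return min(int(angle / width), n_bins - 1)

# Tabular Q-learning over discretized fold angles; actions: 0 = cool, 1 = heat.
target = 45.0
Q = np.zeros((19, 2))
rng = np.random.default_rng(1)
for episode in range(500):
    angle = 0.0
    for t in range(60):
        s = bin_of(angle)
        a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[s].argmax())
        angle = step(angle, bool(a))
        r = -abs(angle - target)  # reward: hold the fold near the target angle
        Q[s, a] += 0.1 * (r + 0.95 * Q[bin_of(angle)].max() - Q[s, a])
```

After training, greedily following Q heats the fold toward the target and then alternates heat on and off to hold it there, which is the qualitative behavior a learned folding policy should exhibit.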


2021 ◽  
Vol 8 ◽  
Author(s):  
S. M. Nahid Mahmud ◽  
Scott A. Nivison ◽  
Zachary I. Bell ◽  
Rushikesh Kamalapurkar

Reinforcement learning has been established over the past decade as an effective tool for finding optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed as state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop a safe reinforcement learning method for deterministic nonlinear systems with parametric uncertainties in the model, one that learns approximate constrained optimal policies without relying on stringent excitation conditions. To that end, this paper develops a model-based reinforcement learning technique that combines a novel filtered concurrent learning method with a barrier transformation to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
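A barrier transformation of the kind referenced above maps a constrained state interval onto the whole real line so that an unconstrained optimal control problem can be solved in the transformed coordinates. Below is a hedged sketch of one common log-type transform for a state x confined to (a, A) with a < 0 < A; the specific bounds and function names are illustrative, not taken from the paper.

```python
import numpy as np

# Hedged sketch of a log-type barrier transformation used in this line of
# work: it maps a constrained state x in (a, A), a < 0 < A, onto the whole
# real line. Bounds and function names are illustrative assumptions.
def barrier(x, a=-2.0, A=2.0):
    """Forward map: (a, A) -> R, with barrier(0) == 0."""
    return np.log(A * (a - x) / (a * (A - x)))

def barrier_inverse(s, a=-2.0, A=2.0):
    """Inverse map: R -> (a, A)."""
    return a * A * (1.0 - np.exp(s)) / (A - a * np.exp(s))

x = np.linspace(-1.9, 1.9, 5)
s = barrier(x)
assert np.allclose(barrier_inverse(s), x)  # the round trip recovers the state
print(s)  # diverges toward +/- inf as x approaches the constraint boundaries
```

Because the transform diverges at the interval endpoints, any policy that keeps the transformed state bounded automatically keeps the original state strictly inside its constraints.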


2021 ◽  
Vol 2042 (1) ◽  
pp. 012004
Author(s):  
L Di Natale ◽  
B Svetozarevic ◽  
P Heer ◽  
C N Jones

Abstract Deep Reinforcement Learning (DRL) has recently emerged as a way to control complex systems without the need to model them. However, since weeks-long experiments are needed to assess the performance of a building controller, practitioners still have to rely on accurate simulation environments to train and tune DRL agents in tractable amounts of time before deploying them, shifting the burden back to the original issue of designing complex models. In this work, we show that it is possible to learn control policies on simple black-box linear room temperature models, thereby alleviating the heavy engineering usually required to build accurate surrogates. We develop a black-box pipeline that takes historical data as input and produces room temperature control policies. The trained DRL agents beat industrial rule-based controllers in terms of both energy consumption and comfort satisfaction, using novel penalties in the reward function to introduce expert knowledge, i.e., to incentivize agents to follow expected behaviors. Moreover, one of the best agents was deployed on a real building for one week and was able to save energy while maintaining adequate comfort levels, indicating that low-complexity models might be enough to learn control policies that perform well on real buildings.
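Two ingredients of this pipeline lend themselves to a short sketch: fitting a black-box linear room temperature model from historical data, and shaping the reward with comfort, energy, and expert-knowledge penalty terms. The sketch below is an assumption-laden illustration, not the authors' code; the model structure, comfort band, and penalty weights are all made up for the example.

```python
import numpy as np

# Illustrative sketch (not the authors' pipeline). The linear model below fits
# T[k+1] ~ a*T[k] + b*u[k] + c*T_out[k] + d by least squares on historical data.
def fit_linear_room_model(T, u, T_out):
    X = np.column_stack([T[:-1], u[:-1], T_out[:-1], np.ones(len(T) - 1)])
    coef, *_ = np.linalg.lstsq(X, T[1:], rcond=None)
    return coef  # (a, b, c, d)

# Reward shaping: comfort-band violation, energy use, and an expert-knowledge
# penalty that discourages heating when it is already warm outside. The band
# (21-23 C) and all weights are assumptions for the example.
def reward(T_room, power, heating_on, T_out, low=21.0, high=23.0):
    comfort = -max(low - T_room, 0.0) - max(T_room - high, 0.0)
    energy = -0.1 * power
    expert = -1.0 if (heating_on and T_out > high) else 0.0
    return comfort + energy + expert

# Fit on synthetic history, then query the shaped reward for one state.
rng = np.random.default_rng(2)
u = rng.random(200)                        # heating power history
T_out = 10.0 + 5.0 * rng.standard_normal(200)
T = 22.0 + np.cumsum(0.02 * u - 0.01)      # fake indoor temperature trace
print(fit_linear_room_model(T, u, T_out))
print(reward(T_room=20.5, power=1.2, heating_on=True, T_out=25.0))
```

The expert term plays the role the abstract ascribes to the novel penalties: it nudges the agent toward expected behavior without hard-coding a rule-based controller.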

