Reinforcement Learning of Shared Control Policies for Dexterous Telemanipulation

Author(s):  
Takahiro Hasegawa ◽  
Takamitsu Matsubara ◽  
Kenji Sugimoto


Author(s):  
James D. Cunningham ◽  
Simon W. Miller ◽  
Michael A. Yukish ◽  
Timothy W. Simpson ◽  
Conrad S. Tucker

Abstract We present a form-aware reinforcement learning (RL) method to extend control knowledge from one design form to another without losing the ability to control the original design. A major challenge in developing control knowledge is the creation of generalized control policies across designs of varying form. Our RL policy is form-aware because, in addition to receiving dynamic state information about the environment, it also receives states that encode the form of the design being controlled. In this paper, we investigate the impact of this mixed state space on transfer learning. We present a transfer learning method for extending a control policy to a different design form while continuing to expose the agent to the original design during training on the new design. To demonstrate this concept, we present a case study of a multi-rotor aircraft simulation in which the designated task is to achieve a stable hover. We show that by introducing form states, an RL agent is able to learn a control policy that achieves the hovering task with both a four-rotor and a three-rotor design simultaneously, whereas without the form states it can only hover with the four-rotor design. We also benchmark our method against a test case that removes the transfer learning component, as well as a test case that removes the continued exposure to the original design, to show the value of each of these components. We find that form states, transfer learning, and parallel learning all contribute to a more robust control policy for the new design, and that parallel learning is especially important for maintaining control knowledge of the original design.
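The core idea above, augmenting the dynamic state with a descriptor of the design's form, can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation; the form vectors, dimensions, and names (form_aware_observation, QUAD_FORM, TRI_FORM) are assumptions made for the example.

```python
import numpy as np

# Minimal sketch of a "form-aware" observation: the policy input concatenates
# the dynamic state with a vector encoding the design form. The form vectors
# (rotor count, arm length) and all dimensions are hypothetical.
def form_aware_observation(dynamic_state, form_state):
    """Concatenate environment dynamics with a descriptor of the design form."""
    return np.concatenate([dynamic_state, form_state])

QUAD_FORM = np.array([4.0, 0.25])  # four-rotor design descriptor (made up)
TRI_FORM = np.array([3.0, 0.25])   # three-rotor design descriptor (made up)

def sample_training_observation(rng, state_dim=12):
    """Draw each episode's design uniformly, so the agent keeps seeing the
    original form while learning the new one (the paper's parallel learning)."""
    form = QUAD_FORM if rng.random() < 0.5 else TRI_FORM
    dynamic_state = rng.standard_normal(state_dim)  # placeholder dynamics
    return form_aware_observation(dynamic_state, form)

rng = np.random.default_rng(0)
print(sample_training_observation(rng).shape)  # (14,): 12 dynamic + 2 form states
```

The 50/50 sampling between designs mirrors the parallel learning component: the agent keeps seeing the original four-rotor design while it learns the three-rotor one.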


Author(s):  
Peyman Moghadas ◽  
Richard Malak ◽  
Darren Hartl

Origami-inspired engineering provides engineers with new means of creating complicated three-dimensional structures through folding and fold-like operations. Motivated by the vision of origami engineering, we have created and modeled a reconfigurable self-folding sheet based on a laminate structure of shape memory alloy (SMA) surrounding a layer of elastomer. Folding behavior is achieved by activating an SMA layer through localized heating. In prior work, we demonstrated localized control of such a sheet using PID and on/off feedback controllers. Implementing these control strategies requires several workarounds to deal with the highly nonlinear and hysteretic behavior of the SMA-based laminate sheet. In the current work, we use a reinforcement learning algorithm to learn control policies that better handle these aspects of the sheet's behavior. We perform learning on a reduced-order model of the sheet based on classical laminate plate theory, which significantly reduces computational costs compared to more complicated finite element modeling options. We demonstrate the effectiveness of the learned control policies in several folding scenarios on the reduced-order model. Our results show that reinforcement learning can be a useful tool for feedback control of SMA-based structures.
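Since the paper's reduced-order laminate-plate model is not reproduced here, the sketch below substitutes a deliberately crude one-fold surrogate with asymmetric heating/cooling responses (a stand-in for hysteresis) and runs tabular Q-learning on it. All dynamics constants, the 5-degree discretization, and the 45-degree target are assumptions for illustration only.

```python
import numpy as np

# Toy stand-in for a reduced-order fold model: one fold angle that responds
# asymmetrically to heating vs. cooling (a crude proxy for the nonlinear,
# hysteretic SMA behavior). Constants are illustrative only.
def step(angle, heat_on):
    delta = 4.0 if heat_on else -1.0  # heating folds fast, cooling relaxes slowly
    return float(np.clip(angle + delta, 0.0, 90.0))

def bin_of(angle, width=5.0, n_bins=19):
    return min(int(angle / width), n_bins - 1)

# Tabular Q-learning over discretized fold angles; actions: 0 = cool, 1 = heat.
target = 45.0
Q = np.zeros((19, 2))
rng = np.random.default_rng(1)
for episode in range(500):
    angle = 0.0
    for t in range(60):
        s = bin_of(angle)
        a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[s].argmax())
        angle = step(angle, bool(a))
        r = -abs(angle - target)  # reward: hold the fold near the target angle
        Q[s, a] += 0.1 * (r + 0.95 * Q[bin_of(angle)].max() - Q[s, a])
```

After training, greedily following Q heats the fold toward the target and then alternates heat on and off to hold it there, which is the qualitative behavior a learned folding policy should exhibit.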


2021 ◽  
Vol 8 ◽  
Author(s):  
S. M. Nahid Mahmud ◽  
Scott A. Nivison ◽  
Zachary I. Bell ◽  
Rushikesh Kamalapurkar

Reinforcement learning has been established over the past decade as an effective tool for finding optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed as state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop a safe reinforcement learning method for deterministic nonlinear systems with parametric uncertainties in the model, one that learns approximate constrained optimal policies without relying on stringent excitation conditions. To that end, this paper develops a model-based reinforcement learning technique that combines a novel filtered concurrent learning method with a barrier transformation to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
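A barrier transformation of the kind referenced above maps a constrained state interval onto the whole real line so that an unconstrained optimal control problem can be solved in the transformed coordinates. Below is a hedged sketch of one common log-type transform for a state x confined to (a, A) with a < 0 < A; the specific bounds and function names are illustrative, not taken from the paper.

```python
import numpy as np

# Hedged sketch of a log-type barrier transformation used in this line of
# work: it maps a constrained state x in (a, A), a < 0 < A, onto the whole
# real line. Bounds and function names are illustrative assumptions.
def barrier(x, a=-2.0, A=2.0):
    """Forward map: (a, A) -> R, with barrier(0) == 0."""
    return np.log(A * (a - x) / (a * (A - x)))

def barrier_inverse(s, a=-2.0, A=2.0):
    """Inverse map: R -> (a, A)."""
    return a * A * (1.0 - np.exp(s)) / (A - a * np.exp(s))

x = np.linspace(-1.9, 1.9, 5)
s = barrier(x)
assert np.allclose(barrier_inverse(s), x)  # the round trip recovers the state
print(s)  # diverges toward +/- inf as x approaches the constraint boundaries
```

Because the transform diverges at the interval endpoints, any policy that keeps the transformed state bounded automatically keeps the original state strictly inside its constraints.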


2021 ◽  
Vol 2042 (1) ◽  
pp. 012004
Author(s):  
L Di Natale ◽  
B Svetozarevic ◽  
P Heer ◽  
C N Jones

Abstract Deep Reinforcement Learning (DRL) has recently emerged as a way to control complex systems without the need to model them. However, since weeks-long experiments are needed to assess the performance of a building controller, practitioners still have to rely on accurate simulation environments to train and tune DRL agents in tractable amounts of time before deploying them, shifting the burden back to the original issue of designing complex models. In this work, we show that it is possible to learn control policies on simple black-box linear room temperature models, thereby alleviating the heavy engineering usually required to build accurate surrogates. We develop a black-box pipeline that takes historical data as input and produces room temperature control policies. The trained DRL agents beat industrial rule-based controllers in terms of both energy consumption and comfort satisfaction, using novel penalties in the reward function to introduce expert knowledge, i.e., to incentivize agents to follow expected behaviors. Moreover, one of the best agents was deployed on a real building for one week and was able to save energy while maintaining adequate comfort levels, indicating that low-complexity models might be enough to learn control policies that perform well on real buildings.
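Two ingredients of this pipeline lend themselves to a short sketch: fitting a black-box linear room temperature model from historical data, and shaping the reward with comfort, energy, and expert-knowledge penalty terms. The sketch below is an assumption-laden illustration, not the authors' code; the model structure, comfort band, and penalty weights are all made up for the example.

```python
import numpy as np

# Illustrative sketch (not the authors' pipeline). The linear model below fits
# T[k+1] ~ a*T[k] + b*u[k] + c*T_out[k] + d by least squares on historical data.
def fit_linear_room_model(T, u, T_out):
    X = np.column_stack([T[:-1], u[:-1], T_out[:-1], np.ones(len(T) - 1)])
    coef, *_ = np.linalg.lstsq(X, T[1:], rcond=None)
    return coef  # (a, b, c, d)

# Reward shaping: comfort-band violation, energy use, and an expert-knowledge
# penalty that discourages heating when it is already warm outside. The band
# (21-23 C) and all weights are assumptions for the example.
def reward(T_room, power, heating_on, T_out, low=21.0, high=23.0):
    comfort = -max(low - T_room, 0.0) - max(T_room - high, 0.0)
    energy = -0.1 * power
    expert = -1.0 if (heating_on and T_out > high) else 0.0
    return comfort + energy + expert

# Fit on synthetic history, then query the shaped reward for one state.
rng = np.random.default_rng(2)
u = rng.random(200)                        # heating power history
T_out = 10.0 + 5.0 * rng.standard_normal(200)
T = 22.0 + np.cumsum(0.02 * u - 0.01)      # fake indoor temperature trace
print(fit_linear_room_model(T, u, T_out))
print(reward(T_room=20.5, power=1.2, heating_on=True, T_out=25.0))
```

The expert term plays the role the abstract ascribes to the novel penalties: it nudges the agent toward expected behavior without hard-coding a rule-based controller.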

