Open Loop Position Control of Soft Continuum Arm Using Deep Reinforcement Learning

Author(s):  
Sreeshankar Satheeshbabu ◽  
Naveen Kumar Uppalapati ◽  
Girish Chowdhary ◽  
Girish Krishnan

Soft robots undergo large nonlinear spatial deformations due to both inherent actuation and external loading. The physics underlying these deformations is complex and often requires intricate analytical and numerical models. The complexity of these models may render traditional model-based control difficult and unsuitable. Model-free methods offer an alternative for analyzing the behavior of such complex systems without the need for elaborate modeling techniques. In this paper, we present a model-free approach for open loop position control of a soft spatial continuum arm, based on deep reinforcement learning. The continuum arm is pneumatically actuated and attains a spatial workspace by a combination of unidirectional bending and bidirectional torsional deformation. We use Deep Q-Learning with experience replay to train the system in simulation. The efficacy and robustness of the control policy obtained from the system are validated both in simulation and on the continuum arm prototype under varying external loading conditions.
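
As a concrete illustration of the training loop the abstract names, here is a minimal Python sketch of Deep Q-Learning with experience replay; the state layout (tip plus goal coordinates), the discretized pressure actions, and the network sizes are assumptions for illustration, not the authors' setup.

    import copy
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    STATE_DIM, N_ACTIONS = 6, 9   # assumed: tip + goal coordinates; discretized pressure steps
    GAMMA, EPS, BATCH = 0.99, 0.1, 64

    q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                          nn.Linear(64, N_ACTIONS))
    target = copy.deepcopy(q_net)            # frozen copy, synced periodically
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    replay = deque(maxlen=100_000)           # experience replay: (s, a, r, s2, done) tuples

    def act(state):
        # epsilon-greedy over the discretized actuation commands
        if random.random() < EPS:
            return random.randrange(N_ACTIONS)
        with torch.no_grad():
            return int(q_net(torch.tensor(state, dtype=torch.float32)).argmax())

    def learn():
        if len(replay) < BATCH:
            return
        batch = random.sample(replay, BATCH)
        s, a, r, s2, d = (torch.tensor([b[i] for b in batch], dtype=torch.float32)
                          for i in range(5))
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():                # TD target from the frozen network
            y = r + GAMMA * (1.0 - d) * target(s2).max(1).values
        loss = nn.functional.mse_loss(q, y)
        opt.zero_grad(); loss.backward(); opt.step()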

2015 ◽  
Vol 787 ◽  
pp. 843-847
Author(s):  
Leo Raju ◽  
R.S. Milton ◽  
S. Sakthiyanandan

In this paper, two solar photovoltaic (PV) systems are considered: one in the department with a capacity of 100 kW and the other in the hostel with a capacity of 200 kW. Each has a battery and a load. The capital cost and energy savings are compared with those of conventional methods, showing that dependency on grid energy is reduced when the solar micro-grid elements operate in a distributed environment. In the smart grid framework, grid energy consumption is further reduced by optimal scheduling of the battery using reinforcement learning. Individual unit optimization is done by a model-free reinforcement learning method, Q-Learning, and it is compared with distributed operation of the solar micro-grid using a multi-agent reinforcement learning method, Joint Q-Learning. The energy planning is designed according to the predicted solar PV energy production and the observed load patterns of the department and the hostel. A simulation model was developed in Python.
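
The tabular Q-learning update behind the battery-scheduling result can be written compactly; the (hour, battery level) state discretization and the charge/idle/discharge action set below are illustrative guesses, not the paper's exact formulation.

    import random

    import numpy as np

    N_HOURS, N_LEVELS, N_ACTIONS = 24, 10, 3  # assumed: hour, battery level, charge/idle/discharge
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

    Q = np.zeros((N_HOURS, N_LEVELS, N_ACTIONS))

    def choose(h, lvl):
        # epsilon-greedy action selection
        if random.random() < EPS:
            return random.randrange(N_ACTIONS)
        return int(Q[h, lvl].argmax())

    def update(h, lvl, a, reward, h2, lvl2):
        # Q-learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        td_target = reward + GAMMA * Q[h2, lvl2].max()
        Q[h, lvl, a] += ALPHA * (td_target - Q[h, lvl, a])

    # Joint Q-Learning extends this by indexing Q on the agents' joint action.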


2021 ◽  
Vol 11 (18) ◽  
pp. 8419
Author(s):  
Jiang Zhao ◽  
Jiaming Sun ◽  
Zhihao Cai ◽  
Longhong Wang ◽  
Yingxun Wang

To achieve perception-based autonomous control of UAVs, schemes with onboard sensing and computing are popular in state-of-the-art work; these often consist of several separate modules, each with its own complicated algorithms. Most methods depend on handcrafted designs and prior models, with little capacity for adaptation and generalization. Inspired by research on deep reinforcement learning, this paper proposes a new end-to-end autonomous control method that collapses the separate modules of the traditional control pipeline into a single neural network. An image-based reinforcement learning framework is established, built around the design of the network architecture and the reward function. Training is performed with model-free algorithms developed for the specific mission, and the control policy network maps the input image directly to the continuous actuator control command. A simulation environment for the UAV landing scenario was built. The results under different typical cases, including both small and large initial lateral or heading angle offsets, show that the proposed end-to-end method is feasible for perception-based autonomous control.
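
A minimal sketch of what such an end-to-end policy network can look like, mapping a camera image directly to continuous actuator commands; the 84x84 grayscale input, the layer sizes, and the four-dimensional action are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class ImagePolicy(nn.Module):
        def __init__(self, act_dim=4):               # assumed: roll, pitch, yaw-rate, thrust
            super().__init__()
            self.encoder = nn.Sequential(            # 84x84 grayscale input assumed
                nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(), nn.Flatten())
            self.head = nn.Sequential(nn.Linear(64 * 7 * 7, 256), nn.ReLU(),
                                      nn.Linear(256, act_dim), nn.Tanh())  # bounded commands

        def forward(self, img):
            return self.head(self.encoder(img))

    policy = ImagePolicy()
    action = policy(torch.zeros(1, 1, 84, 84))       # one forward pass: image -> command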


Author(s):  
Todd M. Gureckis ◽  
Bradley C. Love

Reinforcement learning (RL) refers to the scientific study of how animals and machines adapt their behavior in order to maximize reward. The history of RL research can be traced to early work in psychology on instrumental learning behavior. However, the modern field of RL is a highly interdisciplinary area that lies at the intersection of ideas in computer science, machine learning, psychology, and neuroscience. This chapter summarizes the key mathematical ideas underlying this field, including the exploration/exploitation dilemma, temporal-difference (TD) learning, Q-learning, and model-based versus model-free learning. In addition, open questions in psychology and neuroscience are broadly surveyed.
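
The two updates at the heart of the chapter, TD(0) learning and Q-learning, differ only in the bootstrap term; this toy Python sketch makes that explicit (alpha is the learning rate, gamma the discount).

    alpha, gamma = 0.1, 0.9

    def td0_update(V, s, r, s_next):
        # prediction error: delta = r + gamma*V(s') - V(s)
        delta = r + gamma * V[s_next] - V[s]
        V[s] += alpha * delta            # move the estimate toward the TD target
        return delta                     # the signal linked to dopaminergic firing

    def q_update(Q, s, a, r, s_next):
        # model-free control: bootstrap from the greedy action in s'
        delta = r + gamma * max(Q[s_next].values()) - Q[s][a]
        Q[s][a] += alpha * delta
        return delta

    # toy usage: V as a dict over states, Q as a dict of action-value dicts
    V = {0: 0.0, 1: 0.0}
    print(td0_update(V, 0, 1.0, 1))      # a positive surprise raises V[0]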


Author(s):  
Mingfang Liu ◽  
Zhirui Zhao ◽  
Wei Zhang ◽  
Lina Hao

The humanoid robotic hand actuated by shape memory alloy (SMA) represents a new and emerging technology. SMA has a wide range of potential applications in many different fields, ranging from industrial assembly to biomedicine, due to its high power-to-weight ratio, low driving voltage, and noiselessness. However, the nonlinearities of SMA and the complex dynamic models of SMA-based robotic hands make control difficult. In this paper, a humanoid SMA-based robotic hand composed of five fingers is presented with the ability of adaptive grasping. Reinforcement learning, as a model-free control strategy, can search for optimal control of systems with nonlinearity and uncertainty. Therefore, an adaptive SA-Q-Learning (ASA-Q-Learning) controller is proposed to control the humanoid robotic finger. The performance of the ASA-Q-Learning controller is compared with SA-Q-Learning and PID controllers through experiments. Results show that the ASA-Q-Learning controller can control the humanoid SMA-based robotic hand effectively, with a faster convergence rate and higher control precision than the SA-Q-Learning and PID controllers, and is feasible for implementation in a model-free system.
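
SA-Q-Learning couples Q-learning with simulated-annealing-style exploration; a plausible sketch of that action-selection rule follows, where the Metropolis acceptance test and the geometric cooling schedule are illustrative guesses rather than the paper's adaptive scheme.

    import math
    import random

    import numpy as np

    def sa_select(Q_row, T):
        greedy = int(np.argmax(Q_row))
        candidate = random.randrange(len(Q_row))
        if Q_row[candidate] >= Q_row[greedy]:
            return candidate
        # Metropolis criterion: a worse action is accepted with prob exp(dQ / T)
        if random.random() < math.exp((Q_row[candidate] - Q_row[greedy]) / max(T, 1e-8)):
            return candidate
        return greedy

    T = 1.0
    for episode in range(500):
        # ... run one episode, selecting actions with sa_select(Q[s], T) ...
        T *= 0.99   # cooling; an adaptive variant would tie this to performance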


2020 ◽  
Vol 17 (1) ◽  
pp. 172988141989834
Author(s):  
Guoyu Zuo ◽  
Qishen Zhao ◽  
Jiahao Lu ◽  
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is a difficult undertaking. In this article, we propose a general and model-free approach for reinforcement learning to learn robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up the policy training process. Finally, an action loss is added to the loss function in order to minimize the vibration of the output action while maximizing the value of the action. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. Results show that our method can effectively solve the sparse reward problem and achieve a high learning speed.
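
A hedged sketch of the two ingredients just described: hindsight relabeling and an actor loss augmented with an action-magnitude penalty. The Transition fields, the compute_reward hook, and the 0.1 weight are assumed names and values, not the article's specifics.

    from collections import namedtuple

    import torch

    Transition = namedtuple("Transition", "state action achieved next_state")

    def her_relabel(episode, compute_reward):
        # hindsight: pretend the goal was the state the episode actually reached,
        # so otherwise-failed transitions carry a useful (non-sparse) signal
        new_goal = episode[-1].achieved
        return [(t.state, t.action, compute_reward(t.achieved, new_goal),
                 t.next_state, new_goal) for t in episode]

    def actor_loss(critic, actor, states, action_loss_weight=0.1):
        actions = actor(states)
        # TD3 actor objective (maximize Q) plus a penalty on action magnitude
        # to damp output vibration; the 0.1 weight is an assumed value
        return -critic(states, actions).mean() + action_loss_weight * (actions ** 2).mean()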


2021 ◽  
Author(s):  
William Fernando Villota Jácome ◽  
Oscar Mauricio Caicedo Rendon ◽  
Nelson Luis Saldanha da Fonseca

Network slicing is a promising technology for providing customized logical and virtualized networks for the industry's vertical segments. This paper proposes SARA and DSARA for admission control and resource allocation for network slice requests of the eMBB, URLLC, and MIoT types in the 5G core network. SARA introduces a Q-learning-based algorithm and DSARA a DQN-based algorithm to select the most profitable requests from the set arriving in a given time window. These algorithms are model-free, meaning they make no assumptions about the substrate network, as optimization-based approaches do.
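
An illustrative sketch of the DQN-style admission decision: for each slice request in a time window, the agent accepts or rejects to maximize long-run profit. The state layout and the feasibility check are assumptions, not the SARA/DSARA specifics.

    import torch
    import torch.nn as nn

    # assumed state layout: substrate free resources + request demand/revenue + slice-type flags
    STATE_DIM, REJECT, ACCEPT = 8, 0, 1

    dqn = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                        nn.Linear(128, 2))        # Q-values for reject / accept

    def admit(state, free_resources, demand):
        if torch.any(demand > free_resources):    # infeasible requests rejected outright
            return REJECT
        with torch.no_grad():
            return int(dqn(state).argmax())       # otherwise pick the higher-Q action

    decision = admit(torch.zeros(STATE_DIM),
                     torch.tensor([10.0, 10.0]),  # free CPU, free bandwidth (assumed units)
                     torch.tensor([2.0, 4.0]))    # request demand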


Author(s):  
Ziwei Luo ◽  
Jing Hu ◽  
Xin Wang ◽  
Siwei Lyu ◽  
Bin Kong ◽  
...  

Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult since it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework, designed for challenging continuous control problems, to develop stochastic policies over high-dimensional continuous spaces, covering image representation, generation, and control simultaneously. Central to this method is the Stochastic Actor-Executor-Critic (SAEC), an off-policy actor-critic model with an additional executor that generates realistic images. Specifically, the actor handles the high-level representation and control policy through a stochastic latent action and explicitly directs the executor to generate low-level actions that manipulate the state. Experiments on several image-to-image translation tasks demonstrate the effectiveness and robustness of the proposed SAEC when facing high-dimensional continuous space problems.
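
A structural sketch of the actor-executor split described above: the actor samples a stochastic latent action from the image state, and the executor decodes it into a low-level action, here an output image. All shapes and layers are placeholders, not the paper's networks.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):                      # image state -> stochastic latent action
        def __init__(self, latent=64):
            super().__init__()
            self.enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
            self.mu, self.log_std = nn.Linear(256, latent), nn.Linear(256, latent)

        def forward(self, img):
            h = self.enc(img)
            std = self.log_std(h).clamp(-5, 2).exp()
            return self.mu(h) + std * torch.randn_like(std)   # reparameterized sample

    class Executor(nn.Module):                   # latent action -> low-level action (image)
        def __init__(self, latent=64):
            super().__init__()
            self.dec = nn.Sequential(nn.Linear(latent, 3 * 64 * 64), nn.Tanh())

        def forward(self, z):
            return self.dec(z).view(-1, 3, 64, 64)

    z = Actor()(torch.zeros(1, 3, 64, 64))       # high-level stochastic control
    out = Executor()(z)                          # image-manipulation step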


Author(s):  
Yufei Wei ◽  
Xiaotong Nie ◽  
Motoaki Hiraga ◽  
Kazuhiro Ohkura ◽  
Zlatan Car ◽  
...  

In this study, the use of a popular deep reinforcement learning algorithm, deep Q-learning, in developing end-to-end control policies for robotic swarms is explored. Robots have only limited local sensory capabilities; however, in a swarm, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computational resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as many times as possible. Simulation results show that the proposed approach can learn control policies for robotic swarms directly from high-dimensional raw camera pixel inputs.
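
One plausible reading of this setup, sketched below, is a homogeneous swarm in which every robot runs the same deep Q-network on its own camera image, so a single network is trained from all robots' experience; the network, the 84x84 input, and the three-action set are assumptions.

    import torch
    import torch.nn as nn

    N_ACTIONS = 3     # e.g. turn left / go straight / turn right (assumed)

    shared_q = nn.Sequential(
        nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),   # 84x84 -> 20x20
        nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),  # 20x20 -> 9x9
        nn.Flatten(),
        nn.Linear(32 * 9 * 9, 128), nn.ReLU(),
        nn.Linear(128, N_ACTIONS))

    def swarm_step(images):                         # images: (n_robots, 3, H, W)
        with torch.no_grad():
            return shared_q(images).argmax(dim=1)   # one action per robot

    actions = swarm_step(torch.zeros(8, 3, 84, 84))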


Author(s):  
M. A. Bucci ◽  
O. Semeraro ◽  
A. Allauzen ◽  
G. Wisniewski ◽  
L. Cordier ◽  
...  

Deep reinforcement learning (DRL) is applied to control a nonlinear, chaotic system governed by the one-dimensional Kuramoto–Sivashinsky (KS) equation. DRL uses reinforcement learning principles for the determination of optimal control solutions and deep neural networks for approximating the value function and the control policy. Recent applications have shown that DRL may achieve superhuman performance in complex cognitive tasks. In this work, we show that using restricted localized actuation, partial knowledge of the state based on limited sensor measurements, and model-free DRL controllers, it is possible to stabilize the dynamics of the KS system around its unstable fixed solutions, here considered as target states. The robustness of the controllers is tested by considering several trajectories in the phase space emanating from different initial conditions; we show that DRL is always capable of driving and stabilizing the dynamics around target states. The possibility of controlling the KS system in the chaotic regime using a DRL strategy that relies solely on local measurements suggests extending RL methods to the control of more complex systems, such as drag reduction in bluff-body wakes or the enhancement or suppression of turbulent mixing.
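
A sketch of the control setting described above: the KS state is advanced by a solver, the agent observes only a few sensor readings, and its action enters through localized Gaussian forcing. The ks_step integrator here is an explicit placeholder, and all sizes are illustrative.

    import numpy as np

    N, N_SENSORS, N_ACTUATORS, L = 256, 8, 4, 22.0  # grid, sensors, actuators, domain (assumed)
    x = np.linspace(0.0, L, N, endpoint=False)
    sensor_idx = np.linspace(0, N, N_SENSORS, endpoint=False).astype(int)
    centers = np.linspace(0.0, L, N_ACTUATORS, endpoint=False)

    def forcing(action):
        # restricted localized actuation: one narrow Gaussian bump per actuator
        return sum(a * np.exp(-(x - c) ** 2 / 0.5) for a, c in zip(action, centers))

    def ks_step(u, f, dt=0.05):
        # placeholder for a stiff KS integrator (e.g. ETDRK4); stands in for
        # u_t = -u*u_x - u_xx - u_xxxx + f
        return u + dt * f

    def env_step(u, action, target):
        u_next = ks_step(u, forcing(action))
        obs = u_next[sensor_idx]                    # partial state: sensor readings only
        reward = -np.linalg.norm(u_next - target)   # drive toward the unstable fixed solution
        return obs, reward, u_next

    obs, r, u = env_step(np.zeros(N), np.array([0.1, -0.1, 0.0, 0.2]), np.zeros(N))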

