Open Loop Position Control of Soft Continuum Arm Using Deep Reinforcement Learning

Author(s):  
Sreeshankar Satheeshbabu ◽  
Naveen Kumar Uppalapati ◽  
Girish Chowdhary ◽  
Girish Krishnan

Soft robots undergo large nonlinear spatial deformations due to both inherent actuation and external loading. The physics underlying these deformations is complex and often requires intricate analytical and numerical models. The complexity of these models may render traditional model-based control difficult and unsuitable. Model-free methods offer an alternative for analyzing the behavior of such complex systems without the need for elaborate modeling techniques. In this paper, we present a model-free approach for open loop position control of a soft spatial continuum arm, based on deep reinforcement learning. The continuum arm is pneumatically actuated and attains a spatial workspace by a combination of unidirectional bending and bidirectional torsional deformation. We use Deep Q-Learning with experience replay to train the system in simulation. The efficacy and robustness of the control policy obtained from the system are validated both in simulation and on the continuum arm prototype under varying external loading conditions.
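
As a concrete illustration of the training loop the abstract names, here is a minimal Python sketch of Deep Q-Learning with experience replay; the state layout (tip plus goal coordinates), the discretized pressure actions, and the network sizes are assumptions for illustration, not the authors' setup.

    import copy
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    STATE_DIM, N_ACTIONS = 6, 9   # assumed: tip + goal coordinates; discretized pressure steps
    GAMMA, EPS, BATCH = 0.99, 0.1, 64

    q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                          nn.Linear(64, N_ACTIONS))
    target = copy.deepcopy(q_net)            # frozen copy, synced periodically
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    replay = deque(maxlen=100_000)           # experience replay: (s, a, r, s2, done) tuples

    def act(state):
        # epsilon-greedy over the discretized actuation commands
        if random.random() < EPS:
            return random.randrange(N_ACTIONS)
        with torch.no_grad():
            return int(q_net(torch.tensor(state, dtype=torch.float32)).argmax())

    def learn():
        if len(replay) < BATCH:
            return
        batch = random.sample(replay, BATCH)
        s, a, r, s2, d = (torch.tensor([b[i] for b in batch], dtype=torch.float32)
                          for i in range(5))
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():                # TD target from the frozen network
            y = r + GAMMA * (1.0 - d) * target(s2).max(1).values
        loss = nn.functional.mse_loss(q, y)
        opt.zero_grad(); loss.backward(); opt.step()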

2015 ◽  
Vol 787 ◽  
pp. 843-847
Author(s):  
Leo Raju ◽  
R.S. Milton ◽  
S. Sakthiyanandan

In this paper, two solar photovoltaic (PV) systems are considered: one in the department with a capacity of 100 kW and the other in the hostel with a capacity of 200 kW. Each has a battery and a load. The capital cost and energy savings are compared with those of conventional methods, showing that dependency on grid energy is reduced when the solar micro-grid elements operate in a distributed environment. In the smart grid framework, grid energy consumption is further reduced by optimal scheduling of the battery using reinforcement learning. Individual unit optimization is done by a model-free reinforcement learning method, Q-Learning, and it is compared with distributed operation of the solar micro-grid using a multi-agent reinforcement learning method, Joint Q-Learning. The energy planning is designed according to the predicted solar PV energy production and the observed load patterns of the department and the hostel. A simulation model was developed in Python.
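
The tabular Q-learning update behind the battery-scheduling result can be written compactly; the (hour, battery level) state discretization and the charge/idle/discharge action set below are illustrative guesses, not the paper's exact formulation.

    import random

    import numpy as np

    N_HOURS, N_LEVELS, N_ACTIONS = 24, 10, 3  # assumed: hour, battery level, charge/idle/discharge
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

    Q = np.zeros((N_HOURS, N_LEVELS, N_ACTIONS))

    def choose(h, lvl):
        # epsilon-greedy action selection
        if random.random() < EPS:
            return random.randrange(N_ACTIONS)
        return int(Q[h, lvl].argmax())

    def update(h, lvl, a, reward, h2, lvl2):
        # Q-learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        td_target = reward + GAMMA * Q[h2, lvl2].max()
        Q[h, lvl, a] += ALPHA * (td_target - Q[h, lvl, a])

    # Joint Q-Learning extends this by indexing Q on the agents' joint action.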


2021 ◽  
Vol 11 (18) ◽  
pp. 8419
Author(s):  
Jiang Zhao ◽  
Jiaming Sun ◽  
Zhihao Cai ◽  
Longhong Wang ◽  
Yingxun Wang

To achieve perception-based autonomous control of UAVs, schemes with onboard sensing and computing are popular in state-of-the-art work; these often consist of several separate modules, each with its own complicated algorithms. Most methods depend on handcrafted designs and prior models, with little capacity for adaptation and generalization. Inspired by research on deep reinforcement learning, this paper proposes a new end-to-end autonomous control method that collapses the separate modules of the traditional control pipeline into a single neural network. An image-based reinforcement learning framework is established, built around the design of the network architecture and the reward function. Training is performed with model-free algorithms developed for the specific mission, and the control policy network maps the input image directly to the continuous actuator control command. A simulation environment for the UAV landing scenario was built. The results under different typical cases, including both small and large initial lateral or heading angle offsets, show that the proposed end-to-end method is feasible for perception-based autonomous control.
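
A minimal sketch of what such an end-to-end policy network can look like, mapping a camera image directly to continuous actuator commands; the 84x84 grayscale input, the layer sizes, and the four-dimensional action are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class ImagePolicy(nn.Module):
        def __init__(self, act_dim=4):               # assumed: roll, pitch, yaw-rate, thrust
            super().__init__()
            self.encoder = nn.Sequential(            # 84x84 grayscale input assumed
                nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(), nn.Flatten())
            self.head = nn.Sequential(nn.Linear(64 * 7 * 7, 256), nn.ReLU(),
                                      nn.Linear(256, act_dim), nn.Tanh())  # bounded commands

        def forward(self, img):
            return self.head(self.encoder(img))

    policy = ImagePolicy()
    action = policy(torch.zeros(1, 1, 84, 84))       # one forward pass: image -> command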


Author(s):  
Todd M. Gureckis ◽  
Bradley C. Love

Reinforcement learning (RL) refers to the scientific study of how animals and machines adapt their behavior in order to maximize reward. The history of RL research can be traced to early work in psychology on instrumental learning behavior. However, the modern field of RL is a highly interdisciplinary area that lies at the intersection of ideas in computer science, machine learning, psychology, and neuroscience. This chapter summarizes the key mathematical ideas underlying this field, including the exploration/exploitation dilemma, temporal-difference (TD) learning, Q-learning, and model-based versus model-free learning. In addition, open questions in psychology and neuroscience are broadly surveyed.
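
The two updates at the heart of the chapter, TD(0) learning and Q-learning, differ only in the bootstrap term; this toy Python sketch makes that explicit (alpha is the learning rate, gamma the discount).

    alpha, gamma = 0.1, 0.9

    def td0_update(V, s, r, s_next):
        # prediction error: delta = r + gamma*V(s') - V(s)
        delta = r + gamma * V[s_next] - V[s]
        V[s] += alpha * delta            # move the estimate toward the TD target
        return delta                     # the signal linked to dopaminergic firing

    def q_update(Q, s, a, r, s_next):
        # model-free control: bootstrap from the greedy action in s'
        delta = r + gamma * max(Q[s_next].values()) - Q[s][a]
        Q[s][a] += alpha * delta
        return delta

    # toy usage: V as a dict over states, Q as a dict of action-value dicts
    V = {0: 0.0, 1: 0.0}
    print(td0_update(V, 0, 1.0, 1))      # a positive surprise raises V[0]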


Author(s):  
Mingfang Liu ◽  
Zhirui Zhao ◽  
Wei Zhang ◽  
Lina Hao

The humanoid robotic hand actuated by shape memory alloy (SMA) represents a new and emerging technology. SMA has a wide range of potential applications in many different fields, ranging from industrial assembly to biomedicine, due to its high power-to-weight ratio, low driving voltage, and noiselessness. However, the nonlinearities of SMA and the complex dynamic models of SMA-based robotic hands make control difficult. In this paper, a humanoid SMA-based robotic hand composed of five fingers is presented with the ability of adaptive grasping. Reinforcement learning, as a model-free control strategy, can search for optimal control of systems with nonlinearity and uncertainty. Therefore, an adaptive SA-Q-Learning (ASA-Q-Learning) controller is proposed to control the humanoid robotic finger. The performance of the ASA-Q-Learning controller is compared with SA-Q-Learning and PID controllers through experiments. Results show that the ASA-Q-Learning controller can control the humanoid SMA-based robotic hand effectively, with a faster convergence rate and higher control precision than the SA-Q-Learning and PID controllers, and is feasible for implementation in a model-free system.
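
SA-Q-Learning couples Q-learning with simulated-annealing-style exploration; a plausible sketch of that action-selection rule follows, where the Metropolis acceptance test and the geometric cooling schedule are illustrative guesses rather than the paper's adaptive scheme.

    import math
    import random

    import numpy as np

    def sa_select(Q_row, T):
        greedy = int(np.argmax(Q_row))
        candidate = random.randrange(len(Q_row))
        if Q_row[candidate] >= Q_row[greedy]:
            return candidate
        # Metropolis criterion: a worse action is accepted with prob exp(dQ / T)
        if random.random() < math.exp((Q_row[candidate] - Q_row[greedy]) / max(T, 1e-8)):
            return candidate
        return greedy

    T = 1.0
    for episode in range(500):
        # ... run one episode, selecting actions with sa_select(Q[s], T) ...
        T *= 0.99   # cooling; an adaptive variant would tie this to performance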


2020 ◽  
Vol 17 (1) ◽  
pp. 172988141989834
Author(s):  
Guoyu Zuo ◽  
Qishen Zhao ◽  
Jiahao Lu ◽  
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is a difficult undertaking. In this article, we propose a general and model-free approach for reinforcement learning to learn robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up the policy training process. Finally, an action loss is added to the loss function in order to minimize the vibration of the output action while maximizing the value of the action. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. Results show that our method can effectively solve the sparse reward problem and achieve a high learning speed.
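
A hedged sketch of the two ingredients just described: hindsight relabeling and an actor loss augmented with an action-magnitude penalty. The Transition fields, the compute_reward hook, and the 0.1 weight are assumed names and values, not the article's specifics.

    from collections import namedtuple

    import torch

    Transition = namedtuple("Transition", "state action achieved next_state")

    def her_relabel(episode, compute_reward):
        # hindsight: pretend the goal was the state the episode actually reached,
        # so otherwise-failed transitions carry a useful (non-sparse) signal
        new_goal = episode[-1].achieved
        return [(t.state, t.action, compute_reward(t.achieved, new_goal),
                 t.next_state, new_goal) for t in episode]

    def actor_loss(critic, actor, states, action_loss_weight=0.1):
        actions = actor(states)
        # TD3 actor objective (maximize Q) plus a penalty on action magnitude
        # to damp output vibration; the 0.1 weight is an assumed value
        return -critic(states, actions).mean() + action_loss_weight * (actions ** 2).mean()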


2021 ◽  
Author(s):  
William Fernando Villota Jácome ◽  
Oscar Mauricio Caicedo Rendon ◽  
Nelson Luis Saldanha da Fonseca

Network slicing is a promising technology for providing customized logical and virtualized networks for the industry's vertical segments. This paper proposes SARA and DSARA for admission control and resource allocation for network slice requests of the eMBB, URLLC, and MIoT types in the 5G core network. SARA introduces a Q-learning-based algorithm and DSARA a DQN-based algorithm to select the most profitable requests from the set arriving in a given time window. These algorithms are model-free, meaning they make no assumptions about the substrate network, as optimization-based approaches do.
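
An illustrative sketch of the DQN-style admission decision: for each slice request in a time window, the agent accepts or rejects to maximize long-run profit. The state layout and the feasibility check are assumptions, not the SARA/DSARA specifics.

    import torch
    import torch.nn as nn

    # assumed state layout: substrate free resources + request demand/revenue + slice-type flags
    STATE_DIM, REJECT, ACCEPT = 8, 0, 1

    dqn = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                        nn.Linear(128, 2))        # Q-values for reject / accept

    def admit(state, free_resources, demand):
        if torch.any(demand > free_resources):    # infeasible requests rejected outright
            return REJECT
        with torch.no_grad():
            return int(dqn(state).argmax())       # otherwise pick the higher-Q action

    decision = admit(torch.zeros(STATE_DIM),
                     torch.tensor([10.0, 10.0]),  # free CPU, free bandwidth (assumed units)
                     torch.tensor([2.0, 4.0]))    # request demand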


Author(s):  
Ziwei Luo ◽  
Jing Hu ◽  
Xin Wang ◽  
Siwei Lyu ◽  
Bin Kong ◽  
...  

Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult since it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework, designed for challenging continuous control problems, to develop stochastic policies over high-dimensional continuous spaces, covering image representation, generation, and control simultaneously. Central to this method is the Stochastic Actor-Executor-Critic (SAEC), an off-policy actor-critic model with an additional executor that generates realistic images. Specifically, the actor handles the high-level representation and control policy through a stochastic latent action and explicitly directs the executor to generate low-level actions that manipulate the state. Experiments on several image-to-image translation tasks demonstrate the effectiveness and robustness of the proposed SAEC when facing high-dimensional continuous space problems.
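
A structural sketch of the actor-executor split described above: the actor samples a stochastic latent action from the image state, and the executor decodes it into a low-level action, here an output image. All shapes and layers are placeholders, not the paper's networks.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):                      # image state -> stochastic latent action
        def __init__(self, latent=64):
            super().__init__()
            self.enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
            self.mu, self.log_std = nn.Linear(256, latent), nn.Linear(256, latent)

        def forward(self, img):
            h = self.enc(img)
            std = self.log_std(h).clamp(-5, 2).exp()
            return self.mu(h) + std * torch.randn_like(std)   # reparameterized sample

    class Executor(nn.Module):                   # latent action -> low-level action (image)
        def __init__(self, latent=64):
            super().__init__()
            self.dec = nn.Sequential(nn.Linear(latent, 3 * 64 * 64), nn.Tanh())

        def forward(self, z):
            return self.dec(z).view(-1, 3, 64, 64)

    z = Actor()(torch.zeros(1, 3, 64, 64))       # high-level stochastic control
    out = Executor()(z)                          # image-manipulation step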


Author(s):  
Yufei Wei ◽  
Xiaotong Nie ◽  
Motoaki Hiraga ◽  
Kazuhiro Ohkura ◽  
Zlatan Car ◽  
...  

In this study, the use of a popular deep reinforcement learning algorithm, deep Q-learning, in developing end-to-end control policies for robotic swarms is explored. Robots have only limited local sensory capabilities; however, in a swarm, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computational resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as many times as possible. Simulation results show that the proposed approach can learn control policies for robotic swarms directly from high-dimensional raw camera pixel inputs.
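
One plausible reading of this setup, sketched below, is a homogeneous swarm in which every robot runs the same deep Q-network on its own camera image, so a single network is trained from all robots' experience; the network, the 84x84 input, and the three-action set are assumptions.

    import torch
    import torch.nn as nn

    N_ACTIONS = 3     # e.g. turn left / go straight / turn right (assumed)

    shared_q = nn.Sequential(
        nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),   # 84x84 -> 20x20
        nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),  # 20x20 -> 9x9
        nn.Flatten(),
        nn.Linear(32 * 9 * 9, 128), nn.ReLU(),
        nn.Linear(128, N_ACTIONS))

    def swarm_step(images):                         # images: (n_robots, 3, H, W)
        with torch.no_grad():
            return shared_q(images).argmax(dim=1)   # one action per robot

    actions = swarm_step(torch.zeros(8, 3, 84, 84))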


Author(s):  
M. A. Bucci ◽  
O. Semeraro ◽  
A. Allauzen ◽  
G. Wisniewski ◽  
L. Cordier ◽  
...  

Deep reinforcement learning (DRL) is applied to control a nonlinear, chaotic system governed by the one-dimensional Kuramoto–Sivashinsky (KS) equation. DRL uses reinforcement learning principles for the determination of optimal control solutions and deep neural networks for approximating the value function and the control policy. Recent applications have shown that DRL may achieve superhuman performance in complex cognitive tasks. In this work, we show that using restricted localized actuation, partial knowledge of the state based on limited sensor measurements, and model-free DRL controllers, it is possible to stabilize the dynamics of the KS system around its unstable fixed solutions, here considered as target states. The robustness of the controllers is tested by considering several trajectories in the phase space emanating from different initial conditions; we show that DRL is always capable of driving and stabilizing the dynamics around target states. The possibility of controlling the KS system in the chaotic regime using a DRL strategy that relies solely on local measurements suggests extending RL methods to the control of more complex systems, such as drag reduction in bluff-body wakes or the enhancement or suppression of turbulent mixing.
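
A sketch of the control setting described above: the KS state is advanced by a solver, the agent observes only a few sensor readings, and its action enters through localized Gaussian forcing. The ks_step integrator here is an explicit placeholder, and all sizes are illustrative.

    import numpy as np

    N, N_SENSORS, N_ACTUATORS, L = 256, 8, 4, 22.0  # grid, sensors, actuators, domain (assumed)
    x = np.linspace(0.0, L, N, endpoint=False)
    sensor_idx = np.linspace(0, N, N_SENSORS, endpoint=False).astype(int)
    centers = np.linspace(0.0, L, N_ACTUATORS, endpoint=False)

    def forcing(action):
        # restricted localized actuation: one narrow Gaussian bump per actuator
        return sum(a * np.exp(-(x - c) ** 2 / 0.5) for a, c in zip(action, centers))

    def ks_step(u, f, dt=0.05):
        # placeholder for a stiff KS integrator (e.g. ETDRK4); stands in for
        # u_t = -u*u_x - u_xx - u_xxxx + f
        return u + dt * f

    def env_step(u, action, target):
        u_next = ks_step(u, forcing(action))
        obs = u_next[sensor_idx]                    # partial state: sensor readings only
        reward = -np.linalg.norm(u_next - target)   # drive toward the unstable fixed solution
        return obs, reward, u_next

    obs, r, u = env_step(np.zeros(N), np.array([0.1, -0.1, 0.0, 0.2]), np.zeros(N))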

