Action Selection in Continuous State and Action Spaces by Cooperation and Competition of Extended Kohonen Maps

Author(s):  
Kian Hsiang Low ◽  
Wee Kheng Leow ◽  
Marcelo H. Ang
Author(s):  
Jie Zhong ◽  
Tao Wang ◽  
Lianglun Cheng

In actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for a welding manipulator surrounded by obstacles. However, the sampling-based planner, a state-of-the-art method, only satisfies probabilistic completeness, and its computational complexity is sensitive to the state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and less sensitive to the state dimension. In detail, to improve the learning efficiency, we introduce an inverse kinematics module to provide prior knowledge and design a gain module to avoid locally optimal policies; both are integrated into the training algorithm. To evaluate the proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves convergence but is also superior in terms of optimality and robustness of planning compared with most other planning algorithms.
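As a rough illustration of how an inverse kinematics module could supply prior knowledge to such a learner, the following minimal sketch blends a learned joint-velocity command with a command derived from an IK goal, with the prior's influence annealed over training. It is hypothetical, not the authors' implementation; `ik_solution`, the blending factor `beta`, the 6-DOF arm, and the clipping bounds are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ik_solution(target_pose):
    """Stand-in for an analytic IK solver: returns a goal joint configuration
    for a 6-DOF arm (a fixed placeholder vector here)."""
    return np.array([0.3, -0.8, 1.1, 0.0, 0.6, -0.2])

def blended_action(policy_action, joint_state, goal_joints, beta):
    """Mix the policy's joint-velocity command with a command that moves
    toward the IK goal; beta in [0, 1] is annealed toward 0 over training."""
    prior_action = np.clip(goal_joints - joint_state, -0.1, 0.1)
    return beta * prior_action + (1.0 - beta) * policy_action

# Early in training, the IK prior dominates exploration.
joints = np.zeros(6)
policy_cmd = rng.uniform(-0.1, 0.1, size=6)  # raw output of an untrained actor
print(blended_action(policy_cmd, joints, ik_solution(target_pose=None), beta=0.9))
```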


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3039
Author(s):  
Bao Chau Phan ◽  
Ying-Chih Lai ◽  
Chin E. Lin

On the issue of global environmental protection, renewable energy systems have been widely considered. The photovoltaic (PV) system converts solar power into electricity and significantly reduces the consumption of fossil fuels and the resulting environmental pollution. Besides introducing new materials for solar cells to improve the energy conversion efficiency, maximum power point tracking (MPPT) algorithms have been developed to ensure the efficient operation of PV systems at the maximum power point (MPP) under various weather conditions. The integration of reinforcement learning and deep learning, named deep reinforcement learning (DRL), is proposed in this paper as a tool to deal with such optimization control problems. Following the success of DRL in several fields, the deep Q network (DQN) and deep deterministic policy gradient (DDPG) are proposed to harvest the MPP in PV systems, especially under a partial shading condition (PSC). Different from reinforcement learning (RL)-based methods, which operate only with discrete state and action spaces, the methods adopted in this paper deal with continuous state spaces. In this study, DQN solves the problem with discrete action spaces, while DDPG handles continuous action spaces. The proposed methods are simulated in MATLAB/Simulink for feasibility analysis. Further tests under various input conditions, with comparisons to the classical Perturb and Observe (P&O) MPPT method, are carried out for validation. Based on the simulation results in this study, the performance of the proposed methods is outstanding and efficient, showing their potential for further applications.
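A minimal sketch of how a DQN-style agent can act on a PV converter with a discrete action space follows; the duty-cycle step sizes, reward shaping, and epsilon value are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete actions: step changes applied to the converter duty cycle.
DUTY_STEPS = np.array([-0.01, 0.0, +0.01])

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random duty-cycle step with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def apply_step(duty, action_idx):
    """Perturb the duty cycle and keep it in a valid range."""
    return float(np.clip(duty + DUTY_STEPS[action_idx], 0.0, 1.0))

def reward(p_now, p_prev):
    """Reward the change in harvested PV power between control steps."""
    return p_now - p_prev
```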


2016 ◽  
Vol 7 (3) ◽  
pp. 23-42 ◽  
Author(s):  
Daniel Hein ◽  
Alexander Hentschel ◽  
Thomas A. Runkler ◽  
Steffen Udluft

This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods try to find closed-form policies, the approach taken here employs numerical on-line optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high-dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world-inspired RL benchmarks is made available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on two standard benchmarks: mountain car and cart pole.
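A compact sketch of the idea follows, assuming a toy double-integrator model in place of the benchmark dynamics (the horizon, swarm size, and PSO coefficients are illustrative, not the paper's settings): PSO searches over a finite action sequence, each candidate is scored by a model rollout, and only the first action is executed, receding-horizon style.

```python
import numpy as np

def rollout_return(actions, dt=0.1):
    """Score an action sequence by rolling out a toy double-integrator model
    (position/velocity state driven toward the origin; quadratic cost)."""
    x = np.array([1.0, 0.0])
    total = 0.0
    for a in actions:
        x[0] += dt * x[1]
        x[1] += dt * np.clip(a, -1.0, 1.0)
        total -= x[0] ** 2 + 0.1 * x[1] ** 2
    return total

def pso_first_action(horizon=20, particles=30, iters=50, seed=0):
    """PSO over an action sequence; return only the first action for execution."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, (particles, horizon))  # candidate sequences
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([rollout_return(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()
    w, c1, c2 = 0.7, 1.5, 1.5                           # common PSO coefficients
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -1.0, 1.0)
        vals = np.array([rollout_return(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest[0]

print(pso_first_action())
```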


2019 ◽  
Vol 2019 ◽  
pp. 1-16
Author(s):  
Zhenyu Zhao ◽  
Shuguang Yuan ◽  
Qingyun Nie ◽  
Weishang Guo

In a spot wholesale electricity market containing strategic bidding interactions among wind power producers and other participants, such as fossil generation companies and distribution companies, the randomly fluctuating nature of wind power hinders not only the modeling and simulation of the dynamic bidding process and equilibrium of the electricity market but also the effectiveness of maintaining economy and reliability in market clearing (economic dispatch) performed by the independent system operator. Because the gradient descent continuous actor-critic algorithm has been demonstrated to be an effective method for Markov decision problems with continuous state and action spaces, and the robust economic dispatch model can optimize the permitted real-time wind power deviation intervals based on wind power producers' bid power output, in this paper we propose a gradient descent continuous actor-critic algorithm-based hour-ahead electricity market modeling approach with the robust economic dispatch model embedded, considering the bidding interactions among wind power producers and other participants. Simulations implemented on the IEEE 30-bus test system verify, to some extent, the market operation economy and the robustness against wind power fluctuations of the proposed modeling approach.
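For readers unfamiliar with the learner named above, here is a minimal continuous actor-critic with linear function approximation and a Gaussian policy on a toy one-dimensional problem. It is a generic sketch of the algorithm family, not the paper's market model; the features, learning rates, and toy dynamics are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(s):
    """Simple polynomial features of the scalar state."""
    return np.array([1.0, s, s * s])

theta = np.zeros(3)                      # actor weights (policy mean)
w = np.zeros(3)                          # critic weights (state value)
sigma, alpha, beta, gamma = 0.5, 1e-3, 1e-2, 0.95

def step(s, a):
    """Toy dynamics: the action nudges the state; reward penalizes distance
    from zero (a stand-in for an actual market environment)."""
    s_next = 0.9 * s + 0.1 * np.clip(a, -2.0, 2.0) + 0.01 * rng.normal()
    return s_next, -(s_next ** 2)

s = rng.normal()
for t in range(5000):
    phi = features(s)
    mu = float(theta @ phi)
    a = mu + sigma * rng.normal()                             # Gaussian policy
    s_next, r = step(s, a)
    delta = r + gamma * (w @ features(s_next)) - (w @ phi)    # TD error
    w += beta * delta * phi                                   # critic: TD(0)
    theta += alpha * delta * ((a - mu) / sigma**2) * phi      # actor: policy gradient
    s = s_next
```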


2019 ◽  
Vol 9 (17) ◽  
pp. 3456 ◽  
Author(s):  
Enrico Anderlini ◽  
Gordon G. Parker ◽  
Giles Thomas

To achieve persistent systems in the future, autonomous underwater vehicles (AUVs) will need to autonomously dock onto a charging station. Here, reinforcement learning strategies were applied for the first time to control the docking of an AUV onto a fixed platform in a simulation environment. Two reinforcement learning schemes were investigated: one with continuous state and action spaces, deep deterministic policy gradient (DDPG), and one with continuous state but discrete action spaces, deep Q network (DQN). For DQN, the discrete actions were selected as step changes in the control input signals. The performance of the reinforcement learning strategies was compared with classical and optimal control techniques. The control actions selected by DDPG suffer from chattering effects due to a hyperbolic tangent layer in the actor. Conversely, DQN presents the best compromise between short docking time and low control effort, whilst meeting the docking requirements. Whereas the reinforcement learning algorithms present a very high computational cost at training time, they are five orders of magnitude faster than optimal control at deployment time, thus enabling an on-line implementation. Therefore, reinforcement learning achieves a performance similar to optimal control at a much lower computational cost at deployment, whilst also presenting a more general framework.
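As an illustration of the DQN action parameterization described above (discrete actions as step changes in the control input signals), here is a hedged sketch; the step sizes and the two control channels are assumed for concreteness and are not taken from the paper.

```python
import numpy as np
from itertools import product

# Each discrete action is a pair of step changes applied to two control
# inputs (e.g., a thrust command and a rudder command).
STEPS = [-0.1, 0.0, +0.1]
ACTION_TABLE = list(product(STEPS, STEPS))  # 9 combinations of step changes

def apply_action(controls, action_idx, lo=-1.0, hi=1.0):
    """Apply the chosen step changes and saturate the control inputs."""
    delta = np.array(ACTION_TABLE[action_idx])
    return np.clip(np.asarray(controls) + delta, lo, hi)

# Example: action 8 increments both control inputs by 0.1.
print(apply_action([0.0, 0.5], 8))
```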


2013 ◽  
Vol 39 (2) ◽  
pp. 267-278 ◽  
Author(s):  
Ngo Anh Vien ◽  
Wolfgang Ertel ◽  
Tae Choong Chung
