Action Selection in Continuous State and Action Spaces by Cooperation and Competition of Extended Kohonen Maps

Author(s):  
Kian Hsiang Low ◽  
Wee Kheng Leow ◽  
Marcelo H. Ang
Author(s):  
Jie Zhong ◽  
Tao Wang ◽  
Lianglun Cheng

In actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for a welding manipulator surrounded by obstacles. However, the sampling-based planner, a state-of-the-art method, only satisfies probabilistic completeness, and its computational complexity is sensitive to the state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and less sensitive to the state dimension. In detail, to improve the learning efficiency, we introduce an inverse kinematics module to provide prior knowledge and design a gain module to avoid locally optimal policies; both are integrated into the training algorithm. To evaluate the proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves convergence but is also superior in terms of optimality and robustness of planning compared with most other planning algorithms.
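As a rough illustration of how an inverse kinematics module could supply prior knowledge to such a learner, the following minimal sketch blends a learned joint-velocity command with a command derived from an IK goal, with the prior's influence annealed over training. It is hypothetical, not the authors' implementation; `ik_solution`, the blending factor `beta`, the 6-DOF arm, and the clipping bounds are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ik_solution(target_pose):
    """Stand-in for an analytic IK solver: returns a goal joint configuration
    for a 6-DOF arm (a fixed placeholder vector here)."""
    return np.array([0.3, -0.8, 1.1, 0.0, 0.6, -0.2])

def blended_action(policy_action, joint_state, goal_joints, beta):
    """Mix the policy's joint-velocity command with a command that moves
    toward the IK goal; beta in [0, 1] is annealed toward 0 over training."""
    prior_action = np.clip(goal_joints - joint_state, -0.1, 0.1)
    return beta * prior_action + (1.0 - beta) * policy_action

# Early in training, the IK prior dominates exploration.
joints = np.zeros(6)
policy_cmd = rng.uniform(-0.1, 0.1, size=6)  # raw output of an untrained actor
print(blended_action(policy_cmd, joints, ik_solution(target_pose=None), beta=0.9))
```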


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3039
Author(s):  
Bao Chau Phan ◽  
Ying-Chih Lai ◽  
Chin E. Lin

On the issue of global environmental protection, renewable energy systems have been widely considered. The photovoltaic (PV) system converts solar power into electricity and significantly reduces the consumption of fossil fuels and the resulting environmental pollution. Besides introducing new materials for solar cells to improve the energy conversion efficiency, maximum power point tracking (MPPT) algorithms have been developed to ensure the efficient operation of PV systems at the maximum power point (MPP) under various weather conditions. The integration of reinforcement learning and deep learning, named deep reinforcement learning (DRL), is proposed in this paper as a tool to deal with such optimization control problems. Following the success of DRL in several fields, the deep Q network (DQN) and deep deterministic policy gradient (DDPG) are proposed to harvest the MPP in PV systems, especially under a partial shading condition (PSC). Different from reinforcement learning (RL)-based methods, which operate only with discrete state and action spaces, the methods adopted in this paper deal with continuous state spaces. In this study, DQN solves the problem with discrete action spaces, while DDPG handles continuous action spaces. The proposed methods are simulated in MATLAB/Simulink for feasibility analysis. Further tests under various input conditions, with comparisons to the classical Perturb and Observe (P&O) MPPT method, are carried out for validation. Based on the simulation results in this study, the performance of the proposed methods is outstanding and efficient, showing their potential for further applications.
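A minimal sketch of how a DQN-style agent can act on a PV converter with a discrete action space follows; the duty-cycle step sizes, reward shaping, and epsilon value are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete actions: step changes applied to the converter duty cycle.
DUTY_STEPS = np.array([-0.01, 0.0, +0.01])

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random duty-cycle step with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def apply_step(duty, action_idx):
    """Perturb the duty cycle and keep it in a valid range."""
    return float(np.clip(duty + DUTY_STEPS[action_idx], 0.0, 1.0))

def reward(p_now, p_prev):
    """Reward the change in harvested PV power between control steps."""
    return p_now - p_prev
```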


2016 ◽  
Vol 7 (3) ◽  
pp. 23-42 ◽  
Author(s):  
Daniel Hein ◽  
Alexander Hentschel ◽  
Thomas A. Runkler ◽  
Steffen Udluft

This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods try to find closed-form policies, the approach taken here employs numerical on-line optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high-dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world-inspired RL benchmarks is made available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on two standard benchmarks: mountain car and cart pole.
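A compact sketch of the idea follows, assuming a toy double-integrator model in place of the benchmark dynamics (the horizon, swarm size, and PSO coefficients are illustrative, not the paper's settings): PSO searches over a finite action sequence, each candidate is scored by a model rollout, and only the first action is executed, receding-horizon style.

```python
import numpy as np

def rollout_return(actions, dt=0.1):
    """Score an action sequence by rolling out a toy double-integrator model
    (position/velocity state driven toward the origin; quadratic cost)."""
    x = np.array([1.0, 0.0])
    total = 0.0
    for a in actions:
        x[0] += dt * x[1]
        x[1] += dt * np.clip(a, -1.0, 1.0)
        total -= x[0] ** 2 + 0.1 * x[1] ** 2
    return total

def pso_first_action(horizon=20, particles=30, iters=50, seed=0):
    """PSO over an action sequence; return only the first action for execution."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, (particles, horizon))  # candidate sequences
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([rollout_return(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()
    w, c1, c2 = 0.7, 1.5, 1.5                           # common PSO coefficients
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -1.0, 1.0)
        vals = np.array([rollout_return(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest[0]

print(pso_first_action())
```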


2019 ◽  
Vol 2019 ◽  
pp. 1-16
Author(s):  
Zhenyu Zhao ◽  
Shuguang Yuan ◽  
Qingyun Nie ◽  
Weishang Guo

In a spot wholesale electricity market containing strategic bidding interactions among wind power producers and other participants, such as fossil generation companies and distribution companies, the randomly fluctuating nature of wind power hinders not only the modeling and simulation of the dynamic bidding process and equilibrium of the electricity market but also the effectiveness of maintaining economy and reliability in market clearing (economic dispatch) performed by the independent system operator. Because the gradient descent continuous actor-critic algorithm has been demonstrated to be an effective method for Markov decision problems with continuous state and action spaces, and the robust economic dispatch model can optimize the permitted real-time wind power deviation intervals based on wind power producers' bid power output, in this paper we propose a gradient descent continuous actor-critic algorithm-based hour-ahead electricity market modeling approach with the robust economic dispatch model embedded, considering the bidding interactions among wind power producers and other participants. Simulations implemented on the IEEE 30-bus test system verify, to some extent, the market operation economy and the robustness against wind power fluctuations of the proposed modeling approach.
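For readers unfamiliar with the learner named above, here is a minimal continuous actor-critic with linear function approximation and a Gaussian policy on a toy one-dimensional problem. It is a generic sketch of the algorithm family, not the paper's market model; the features, learning rates, and toy dynamics are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(s):
    """Simple polynomial features of the scalar state."""
    return np.array([1.0, s, s * s])

theta = np.zeros(3)                      # actor weights (policy mean)
w = np.zeros(3)                          # critic weights (state value)
sigma, alpha, beta, gamma = 0.5, 1e-3, 1e-2, 0.95

def step(s, a):
    """Toy dynamics: the action nudges the state; reward penalizes distance
    from zero (a stand-in for an actual market environment)."""
    s_next = 0.9 * s + 0.1 * np.clip(a, -2.0, 2.0) + 0.01 * rng.normal()
    return s_next, -(s_next ** 2)

s = rng.normal()
for t in range(5000):
    phi = features(s)
    mu = float(theta @ phi)
    a = mu + sigma * rng.normal()                             # Gaussian policy
    s_next, r = step(s, a)
    delta = r + gamma * (w @ features(s_next)) - (w @ phi)    # TD error
    w += beta * delta * phi                                   # critic: TD(0)
    theta += alpha * delta * ((a - mu) / sigma**2) * phi      # actor: policy gradient
    s = s_next
```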


2019 ◽  
Vol 9 (17) ◽  
pp. 3456 ◽  
Author(s):  
Enrico Anderlini ◽  
Gordon G. Parker ◽  
Giles Thomas

To achieve persistent systems in the future, autonomous underwater vehicles (AUVs) will need to autonomously dock onto a charging station. Here, reinforcement learning strategies were applied for the first time to control the docking of an AUV onto a fixed platform in a simulation environment. Two reinforcement learning schemes were investigated: one with continuous state and action spaces, deep deterministic policy gradient (DDPG), and one with continuous state but discrete action spaces, deep Q network (DQN). For DQN, the discrete actions were selected as step changes in the control input signals. The performance of the reinforcement learning strategies was compared with classical and optimal control techniques. The control actions selected by DDPG suffer from chattering effects due to a hyperbolic tangent layer in the actor. Conversely, DQN presents the best compromise between short docking time and low control effort, whilst meeting the docking requirements. Whereas the reinforcement learning algorithms present a very high computational cost at training time, they are five orders of magnitude faster than optimal control at deployment time, thus enabling an on-line implementation. Therefore, reinforcement learning achieves a performance similar to optimal control at a much lower computational cost at deployment, whilst also presenting a more general framework.
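As an illustration of the DQN action parameterization described above (discrete actions as step changes in the control input signals), here is a hedged sketch; the step sizes and the two control channels are assumed for concreteness and are not taken from the paper.

```python
import numpy as np
from itertools import product

# Each discrete action is a pair of step changes applied to two control
# inputs (e.g., a thrust command and a rudder command).
STEPS = [-0.1, 0.0, +0.1]
ACTION_TABLE = list(product(STEPS, STEPS))  # 9 combinations of step changes

def apply_action(controls, action_idx, lo=-1.0, hi=1.0):
    """Apply the chosen step changes and saturate the control inputs."""
    delta = np.array(ACTION_TABLE[action_idx])
    return np.clip(np.asarray(controls) + delta, lo, hi)

# Example: action 8 increments both control inputs by 0.1.
print(apply_action([0.0, 0.5], 8))
```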


2013 ◽  
Vol 39 (2) ◽  
pp. 267-278 ◽  
Author(s):  
Ngo Anh Vien ◽  
Wolfgang Ertel ◽  
Tae Choong Chung
