Reinforcement learning with reference tracking control in continuous state spaces

Author(s):  
Joseph Hall ◽  
Carl Edward Rasmussen ◽  
Jan Maciejowski

Author(s):  
Takumi Umemoto ◽  
Tohgoroh Matsui ◽  
Atsuko Mutoh ◽  
Koichi Moriyama ◽  
Nobuhiro Inuzuka

Author(s):  
Kazuteru Miyazaki ◽  
Shigenobu Kobayashi

Reinforcement learning involves learning to adapt to an environment through the presentation of rewards – special inputs that serve as clues. To obtain rational policies quickly, profit sharing (PS) [6], the rational policy making algorithm (RPM) [7], the penalty avoiding rational policy making algorithm (PARP) [8], and PS-r* [9] are used; these are collectively called PS-based methods. When applying reinforcement learning to actual problems, continuous-valued input must sometimes be handled. A method [10] based on RPM has been proposed as a PS-based method for continuous-valued input, but it handles only rewards and cannot suitably treat penalties. We studied the treatment of continuous-valued input in a PS-based method for environments that contain both rewards and penalties. Specifically, we extend PARP to continuous-valued input while simultaneously targeting the attainment of rewards and the avoidance of penalties. We applied our proposal to the pole-cart balancing problem and confirmed its validity.
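As a rough illustration of the credit-assignment idea shared by the PS-based methods above, the sketch below distributes a terminal reward backwards over an episode with a geometrically decaying credit function; the function names, the decay value, and the penalty-handling comment are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of profit-sharing (PS) style credit assignment.
# Illustrative only: names and the decay constant are not from the paper.
from collections import defaultdict

def profit_sharing_update(episode, reward, weights, decay=0.5):
    """Distribute a terminal reward backwards over the (state, action)
    pairs of an episode with a geometrically decaying credit function."""
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay          # rules fired earlier receive less credit
    return weights

# Usage: an episode is a list of (state, action) pairs ending in a reward.
weights = defaultdict(float)
episode = [("s0", "right"), ("s1", "left"), ("s2", "right")]
profit_sharing_update(episode, reward=1.0, weights=weights)
# A penalty-avoiding variant (in the spirit of PARP) would additionally mark
# rules that can lead to penalty states and exclude them from the policy.
```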


Author(s):  
Jie Zhong ◽  
Tao Wang ◽  
Lianglun Cheng

Abstract
In actual welding scenarios, an effective path planner is needed to find a collision-free path in configuration space for a welding manipulator surrounded by obstacles. However, the state-of-the-art sampling-based planner satisfies only probabilistic completeness, and its computational complexity is sensitive to the state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning, which solves path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and less sensitive to the state dimension. Specifically, to improve learning efficiency we introduce an inverse kinematics module that provides prior knowledge, and we design a gain module to avoid locally optimal policies; both are integrated into the training algorithm. To evaluate the proposed planning algorithm across multiple dimensions, we conducted several sets of path planning experiments for welding manipulators. The results show that our method not only improves convergence but is also superior to most other planning algorithms in terms of the optimality and robustness of the planned paths.
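To illustrate how an inverse-kinematics module could supply prior knowledge during training, the sketch below blends an IK-derived action hint with the policy's action and anneals the hint away over episodes; the blending scheme and every name in it (ik_step, blended_action, warmup) are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch: mixing an inverse-kinematics (IK) hint into the learned
# policy's action early in training. Names and the schedule are illustrative
# assumptions, not the algorithm from the paper.
import numpy as np

def ik_step(joint_angles, goal, step=0.05):
    """Toy IK hint: nudge each joint toward a goal configuration."""
    return np.clip(goal - joint_angles, -step, step)

def blended_action(policy_action, joint_angles, goal, episode, warmup=200):
    """Blend the policy's action with the IK hint; the IK weight decays
    over episodes so the learned policy gradually takes over."""
    beta = max(0.0, 1.0 - episode / warmup)   # IK weight, annealed to 0
    return (1.0 - beta) * policy_action + beta * ik_step(joint_angles, goal)

# Example: early in training the action is dominated by the IK hint.
q = np.zeros(6)                               # 6-DOF joint angles
goal = np.full(6, 0.3)                        # goal configuration from IK
a_policy = np.random.uniform(-0.05, 0.05, size=6)
print(blended_action(a_policy, q, goal, episode=10))
```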


2021 ◽  
Vol 3 (6)
Author(s):  
Ogbonnaya Anicho ◽  
Philip B. Charlesworth ◽  
Gurvinder S. Baicher ◽  
Atulya K. Nagar

Abstract
This work analyses the performance of Reinforcement Learning (RL) versus Swarm Intelligence (SI) for coordinating multiple unmanned High Altitude Platform Stations (HAPS) for communications area coverage. It builds on previous work that examined various elements of both algorithms. The main aim of this paper is to address the continuous state-space challenge by partitioning the state space to manage its high dimensionality. This enables a comparison of the classical forms of both RL and SI, establishing a baseline for future comparisons of improved versions. In previous work, SI was observed to perform better across various key performance indicators. Even after tuning parameters and empirically choosing a suitable partitioning ratio for the RL state space, the SI algorithm maintained superior coordination capability, achieving higher mean overall user coverage (about 20% better than the RL algorithm) and faster convergence. Although the RL technique showed better average peak user coverage, its unpredictable coverage dips were a key weakness, making SI the more suitable algorithm within the context of this work.
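As a minimal sketch of the partitioning idea used to tame the continuous state space, the snippet below maps a continuous state vector onto a grid cell that a tabular method can index; the bounds, bin counts, and names are illustrative assumptions rather than the partitioning ratio chosen in the paper.

```python
# Minimal sketch: partition a continuous state space into uniform bins so a
# tabular RL method can be applied. Bounds and bin counts are illustrative.
import numpy as np

def make_partitioner(lows, highs, bins):
    lows, highs = np.asarray(lows, float), np.asarray(highs, float)
    bins = np.asarray(bins)
    widths = (highs - lows) / bins            # width of one cell per dimension

    def to_cell(state):
        """Map a continuous state vector to a tuple of bin indices."""
        idx = np.floor((np.asarray(state, float) - lows) / widths).astype(int)
        return tuple(np.clip(idx, 0, bins - 1))

    return to_cell

# Example: a 2-D state (e.g. a platform's position) split into a 10x10 grid.
to_cell = make_partitioner(lows=[0, 0], highs=[100, 100], bins=[10, 10])
q_table = {}                                  # {(cell, action): value}
cell = to_cell([37.5, 82.1])                  # -> (3, 8)
q_table[(cell, "move_east")] = 0.0
```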

