Reinforcement learning-based optimization of locomotion controller using multiple coupled CPG oscillators for elongated undulating fin propulsion

2021, Vol 19 (1), pp. 738-758
Author(s): Van Dong Nguyen, Dinh Quoc Vo, Van Tu Duong, Huy Hung Nguyen, ...

This article proposes a locomotion controller inspired by the black knifefish for an undulating elongated fin robot. The controller is built on a modified CPG network of sixteen coupled Hopf oscillators with feedback of the angle of each fin-ray. The convergence rate of the modified CPG network is optimized by a reinforcement learning algorithm. With the proposed controller, the undulating elongated fin robot can realize swimming pattern transformations naturally. Additionally, the controller allows the swimming pattern parameters, namely the amplitude envelope and the oscillatory frequency, to be configured so as to perform various swimming patterns. The implementation of the reinforcement learning-based optimization is discussed. Simulation and experimental results show the capability and effectiveness of the proposed controller through the performance of several swimming patterns under varying oscillatory frequency and amplitude envelope of each fin-ray.
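As background for the controller described above, the following is a minimal sketch of a chain of coupled Hopf oscillators generating phase-locked fin-ray commands. It is illustrative only: the coupling gain, phase lag, amplitude envelope, and the omission of the fin-ray angle feedback and of the reinforcement-learning tuning are all assumptions, not details taken from the article.

    import numpy as np

    def cpg_step(x, y, mu, omega, k, dphi, dt):
        # One Euler step of a chain of coupled Hopf oscillators.
        # x, y  : oscillator states (one pair per fin-ray)
        # mu    : squared target amplitude of each oscillator (amplitude envelope)
        # omega : common oscillation frequency in rad/s
        # k     : coupling gain between neighbouring oscillators
        # dphi  : desired phase lag between neighbouring fin-rays
        r2 = x**2 + y**2
        dx = (mu - r2) * x - omega * y
        dy = (mu - r2) * y + omega * x
        n = len(x)
        for i in range(n):
            # Rotate each neighbour's state by the desired phase lag before
            # coupling it in; this locks the phase difference along the fin.
            for j, s in ((i - 1, +1), (i + 1, -1)):
                if 0 <= j < n:
                    dx[i] += k * (np.cos(s * dphi) * x[j] - np.sin(s * dphi) * y[j])
                    dy[i] += k * (np.sin(s * dphi) * x[j] + np.cos(s * dphi) * y[j])
        return x + dx * dt, y + dy * dt

    # Example: 16 fin-rays, 1 Hz undulation, a quarter-wave phase lag per ray.
    x = np.random.uniform(-0.1, 0.1, 16)
    y = np.random.uniform(-0.1, 0.1, 16)
    envelope = np.linspace(0.2, 0.4, 16)        # assumed amplitude envelope
    for _ in range(2000):
        x, y = cpg_step(x, y, envelope**2, 2 * np.pi, 0.5, np.pi / 4, 1e-3)
    fin_ray_angles = x                          # the x-state drives each fin-ray angle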

2009, Vol 3 (6), pp. 671-680
Author(s): Tetsuya Morizono, Yoji Yamada, Masatake Higashi, ...

Controlling the operational "feel" of a power-assist robot is important for improving robot operability, user satisfaction, and task performance efficiency. This work considers autonomous adjustment of "feel" for robots under impedance control and discusses reinforcement learning of the adjustment when a task includes repetitive positioning. Experimental results demonstrate that an operational "feel" pattern appropriate for positioning at a goal is developed by the adjustment. The adjustment, which initially assumes a single fixed goal, is then extended to cases with multiple goals, where one goal is assumed to be chosen by the user in real time. To adjust the operational "feel" to individual goals, an algorithm infers the intended goal. Experiments yield the same result as for a single fixed goal, but they also suggest that the design must be improved so that the accuracy of goal inference is taken into account by the learning algorithm for the adjustment.
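To make the adjustment idea concrete, here is a minimal sketch, not the authors' controller: a one-dimensional admittance-controlled power-assist axis whose virtual damping is selected by a simple bandit-style reinforcement learning rule that rewards accurate, fast positioning at the goal. The parameter values and the reward definition are assumptions for illustration.

    import numpy as np

    def admittance_step(v, f_human, m_virt, d_virt, dt):
        # v' = (f_human - d_virt * v) / m_virt : the robot yields to the human
        # force, with the virtual damping d_virt shaping the operational "feel".
        a = (f_human - d_virt * v) / m_virt
        return v + a * dt

    damping_candidates = np.linspace(5.0, 50.0, 10)   # assumed N*s/m settings
    q_values = np.zeros_like(damping_candidates)      # learned value per setting
    counts = np.zeros_like(damping_candidates)

    def choose_damping(eps=0.1):
        # Epsilon-greedy choice of the damping setting for the next trial.
        if np.random.rand() < eps:
            return np.random.randint(len(damping_candidates))
        return int(np.argmax(q_values))

    def update_value(idx, reward):
        # Incremental average of the reward observed for the chosen setting,
        # e.g. reward = -(positioning error + time penalty) for the trial.
        counts[idx] += 1
        q_values[idx] += (reward - q_values[idx]) / counts[idx]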


2015, Vol 53, pp. 375-438
Author(s): Timothy A. Mann, Shie Mannor, Doina Precup

Temporally extended actions have proven useful for reinforcement learning, but their duration also makes them valuable for efficient planning. The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but theoretical analysis formalizing when planning with options is more efficient than planning with primitive actions has been lacking. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm, Fitted Value Iteration (FVI), with options. Our analysis reveals that longer-duration options and a pessimistic estimate of the value function both lead to faster convergence. Furthermore, options can improve convergence even when they are suboptimal and sparsely distributed throughout the state space. Next, we consider the problem of generating useful options for planning based on a subset of landmark states. This suggests a new algorithm, Landmark-based AVI (LAVI), which represents the value function only at the landmark states. We analyze both FVI and LAVI using the proposed landmark-based options and compare the two algorithms. Experimental results in three different domains illustrate the key properties from the analysis. Together, our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence.
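The backup analyzed in this line of work can be sketched as follows: an illustrative Fitted Value Iteration loop over option-level transitions, with an assumed data layout (sampled states and, for each state, one sampled outcome per option) and an arbitrary regressor standing in for the function approximator. None of these choices are taken from the paper's experiments.

    import numpy as np
    from sklearn.linear_model import Ridge   # any regressor can serve as the fitter

    def fvi_with_options(states, samples, phi, gamma=0.99, iters=50):
        # states  : list of sampled states s_i
        # samples : samples[i] = list of (R, tau, s_next) tuples, one per option,
        #           where R is the option's cumulative reward and tau its duration
        # phi     : feature map from a state to a feature vector
        X = np.array([phi(s) for s in states])
        model = None
        for _ in range(iters):
            def value(s):
                return 0.0 if model is None else float(model.predict([phi(s)])[0])
            # Bellman backup over options: rewards are discounted by gamma**tau,
            # so longer options propagate value further in a single backup.
            y = np.array([max(R + gamma**tau * value(sn) for R, tau, sn in samples[i])
                          for i in range(len(states))])
            model = Ridge(alpha=1.0).fit(X, y)   # fit the next value estimate
        return model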


2002, Vol 16, pp. 259-292
Author(s): X. Xu, H. He, D. Hu

The recursive least-squares (RLS) algorithm is one of the best-known algorithms in adaptive filtering, system identification, and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, and two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(lambda) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(lambda) can be viewed as the extension of RLS-TD(0) from lambda=0 to general lambda in the interval [0,1], so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. Convergence with probability one and the limit of convergence of RLS-TD(lambda) are proved for ergodic Markov chains. Compared to the existing LS-TD(lambda) algorithm, RLS-TD(lambda) has computational advantages and is more suitable for online learning. The effectiveness of RLS-TD(lambda) is analyzed and verified by learning-prediction experiments on Markov chains with a wide range of parameter settings. The Fast-AHC algorithm is derived by applying the proposed RLS-TD(lambda) algorithm in the critic network of the adaptive heuristic critic method. Unlike the conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency of the critic. Learning-control experiments on the cart-pole balancing and acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with that of conventional AHC. The experimental results show that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(lambda). Furthermore, the experiments demonstrate that appropriate initial values of the variance matrix in RLS-TD(lambda) are required to obtain good performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting-factor RLS methods.
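For reference, a compact sketch of an RLS-style TD(lambda) update for a linear value function V(s) = theta . phi(s) is shown below. The eligibility-trace recursion and rank-one update of the variance matrix follow the standard form of recursive least squares applied to temporal-difference learning; the default values, including the initial variance-matrix scale delta0 whose importance the abstract highlights, are assumptions for illustration.

    import numpy as np

    class RLSTDLambda:
        def __init__(self, n_features, gamma=0.99, lam=0.7, delta0=100.0):
            self.gamma, self.lam = gamma, lam
            self.theta = np.zeros(n_features)       # linear value-function weights
            self.z = np.zeros(n_features)           # eligibility trace
            self.P = delta0 * np.eye(n_features)    # variance (inverse-correlation) matrix

        def update(self, phi_t, reward, phi_next):
            # One transition (s_t, r_t, s_{t+1}) given as feature vectors.
            self.z = self.gamma * self.lam * self.z + phi_t
            d = phi_t - self.gamma * phi_next        # TD feature difference
            Pz = self.P @ self.z
            k = Pz / (1.0 + d @ Pz)                  # RLS gain vector
            td_error = reward - d @ self.theta       # r + gamma*V(s') - V(s)
            self.theta += k * td_error
            self.P -= np.outer(k, d @ self.P)        # rank-one update of P
            return td_error

        def value(self, phi_s):
            return float(self.theta @ phi_s)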


Symmetry, 2021, Vol 13 (3), pp. 471
Author(s): Jai Hoon Park, Kang Hoon Lee

Designing novel robots that can cope with a specific task is challenging because of the enormous design space spanning both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Compared with evolving the structure and behavior simultaneously, the size of the search space is reduced significantly by evolving only the robotic structure and optimizing the behavior with a separate training algorithm. Mutual dependence between evolution and learning is achieved by treating the mean cumulative reward that a candidate structure attains during reinforcement learning as its fitness in the genetic algorithm. Our method therefore searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in experiments with an actual modular robotics kit.
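The interplay between the outer genetic algorithm and the inner reinforcement-learning evaluation can be sketched as below. This is an illustrative skeleton, not the paper's implementation: random_structure, crossover, mutate, and train_with_rl are hypothetical placeholders, and the population sizes are arbitrary.

    import random

    def evolve(pop_size=20, generations=30, elite=4):
        population = [random_structure() for _ in range(pop_size)]
        best = None
        for _ in range(generations):
            # Inner optimization: train a controller for each structure and use
            # the mean cumulative reward it reaches as that structure's fitness.
            scored = sorted(((train_with_rl(s), s) for s in population),
                            key=lambda pair: pair[0], reverse=True)
            if best is None or scored[0][0] > best[0]:
                best = scored[0]
            # Outer optimization: keep the elite and breed the rest.
            parents = [s for _, s in scored[:elite]]
            children = []
            while len(children) < pop_size - elite:
                a, b = random.sample(parents, 2)
                children.append(mutate(crossover(a, b)))
            population = parents + children
        return best   # (fitness, structure) of the best design found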


2021, Vol 6 (1)
Author(s): Peter Morales, Rajmonda Sulo Caceres, Tina Eliassi-Rad

Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.
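To illustrate the problem formulation (not the NAC implementation itself), the sketch below frames selective harvesting as a sequential decision problem: at each step a policy picks one boundary vertex of the partially observed graph to query, and the reward is 1 when the queried vertex carries the target attribute. The graph-access functions, the embed function standing in for the task-specific network embedding, and the policy interface are all assumed names.

    import random

    def harvest(neighbors, has_attribute, seed, policy, embed, budget):
        # neighbors(v)            -> iterable of vertices adjacent to v (assumed API)
        # has_attribute(v)        -> True if v carries the target attribute (assumed API)
        # embed(observed, frontier) -> compact state for the policy
        # policy(state, frontier) -> boundary vertex to query next
        observed = {seed}
        frontier = set(neighbors(seed)) - observed
        total_reward = 0
        for _ in range(budget):
            if not frontier:
                break
            state = embed(observed, frontier)      # task-specific state embedding
            v = policy(state, frontier)            # sequential decision: vertex to query
            frontier.discard(v)
            observed.add(v)
            total_reward += 1 if has_attribute(v) else 0
            frontier |= set(neighbors(v)) - observed
        return total_reward

    # A trivial baseline policy for comparison: query a random boundary vertex.
    def random_policy(state, frontier):
        return random.choice(tuple(frontier))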

