Hierarchical Reinforcement Learning Based Self-balancing Algorithm for Two-wheeled Robots

2016 ◽ Vol 10 (1) ◽ pp. 69-79
Author(s): Juan Yan, Huibin Yang

Self-balancing control is the basis for applications of two-wheeled robots. To improve the self-balancing of two-wheeled robots, we propose a hierarchical reinforcement learning algorithm for controlling their balance. After describing the subgoals of hierarchical reinforcement learning, we extract features for the subgoals, define a feature value vector and its corresponding weight vector, and propose a reward function augmented with a subgoal reward function. Finally, we give a hierarchical reinforcement learning algorithm for finding the optimal strategy. Simulation experiments show that the proposed algorithm converges faster than the traditional reinforcement learning algorithm, so in our system the robot achieves self-balance very quickly.
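The ingredients named in the abstract — a feature value vector, a corresponding weight vector, and a reward augmented with a subgoal bonus — can be illustrated with a minimal sketch. Everything below (the toy ChainEnv, the one-hot feature map phi, the bonus value, the learning rates) is a hypothetical stand-in, not the paper's actual controller or robot model.

```python
import numpy as np

class ChainEnv:
    """Toy stand-in environment: move along a line until reaching state 9."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):                    # a is -1 (left) or +1 (right)
        self.s = min(max(self.s + a, 0), 9)
        done = self.s == 9
        return self.s, (10.0 if done else -0.1), done

def phi(s, a):
    """Feature value vector: one-hot over (state, action) pairs."""
    v = np.zeros(20)
    v[s * 2 + (1 if a == 1 else 0)] = 1.0
    return v

def subgoal_bonus(s, subgoals, bonus=1.0):
    """Additional subgoal reward: a bonus for reaching any subgoal state."""
    return bonus if s in subgoals else 0.0

def hierarchical_td(env, subgoals, episodes=300, alpha=0.1, gamma=0.99, eps=0.1):
    w = np.zeros(20)                      # weight vector for the linear value
    q = lambda s, a: w @ phi(s, a)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (np.random.choice([-1, 1]) if np.random.rand() < eps
                 else max([-1, 1], key=lambda a: q(s, a)))
            s2, r, done = env.step(a)
            r += subgoal_bonus(s2, subgoals)   # reward plus subgoal reward
            target = r + (0.0 if done else gamma * max(q(s2, -1), q(s2, 1)))
            w += alpha * (target - q(s, a)) * phi(s, a)   # TD weight update
            s = s2
    return w

w = hierarchical_td(ChainEnv(), subgoals={5})   # subgoal halfway along the chain
```

The subgoal bonus gives the learner intermediate feedback before the terminal reward arrives, which is the mechanism the abstract credits for the faster convergence.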

2014 ◽ Vol 2014 ◽ pp. 1-6
Author(s): Yuchen Fu, Quan Liu, Xionghong Ling, Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method. Its main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features, and to improve slow convergence. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and speed up convergence. Applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is evidently enhanced by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameter settings is compared and analyzed as well.
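A short sketch can show how action subrewards combine with an abstracted state to shrink the search problem. The state abstraction (bucketed stack height and hole count), the subreward weights, and the toy dynamics below are all assumptions for illustration, not the paper's exact design or a real Tetris engine.

```python
import random
from collections import defaultdict

def subreward(lines, dheight, dholes):
    """Dense per-action signal: reward line clears, penalize growth and holes."""
    return 1.0 * lines - 0.3 * dheight - 0.7 * dholes

def step(state, action):
    """Toy stand-in dynamics: random effects in place of a real Tetris engine."""
    lines = 1 if random.random() < 0.1 else 0
    dheight = random.choice([0, 1, 2]) - lines
    dholes = random.choice([0, 1])
    h, holes = state
    s2 = (max(0, min(h + dheight, 5)), max(0, min(holes + dholes, 5)))
    return s2, float(lines), subreward(lines, dheight, dholes)

Q = defaultdict(lambda: defaultdict(float))   # action values on abstract states
ACTIONS = range(4)                            # abstract placement choices
alpha, gamma, eps = 0.1, 0.95, 0.1
state = (0, 0)
for t in range(10000):
    a = (random.choice(ACTIONS) if random.random() < eps
         else max(ACTIONS, key=lambda a: Q[state][a]))
    s2, game_score, sub_r = step(state, a)
    # Subreward is added to the sparse game score, giving per-action feedback.
    target = game_score + sub_r + gamma * max(Q[s2][b] for b in ACTIONS)
    Q[state][a] += alpha * (target - Q[state][a])
    state = s2
```

Bucketing the board into a few height/hole levels is one way the state space "reduces greatly"; the dense subreward then steers action choice long before a line clear occurs.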


2017 ◽ Vol 27 (07) ◽ pp. 1750012
Author(s): Vivek Nagaraj, Andrew Lamperski, Theoden I. Netoff

Neuromodulation technologies such as vagus nerve stimulation and deep brain stimulation have shown some efficacy in controlling seizures in medically intractable patients. However, inherent patient-to-patient variability of seizure disorders leads to a wide range of therapeutic efficacy. A patient-specific approach to determining stimulation parameters may increase therapeutic efficacy while minimizing stimulation energy and side effects. This paper presents a reinforcement learning algorithm that optimizes stimulation frequency for controlling seizures with minimum stimulation energy. We apply our method to a computational model called the Epileptor, which simulates inter-ictal and ictal local field potential data. To apply reinforcement learning to the Epileptor, we introduce a specialized reward function and state-space discretization. With the reward function and discretization fixed, we test the effectiveness of the temporal-difference reinforcement learning algorithm TD(0). For periodic pulsatile stimulation, we derive a relation that gives, for any stimulation frequency, the minimal pulse amplitude required to suppress seizures. The TD(0) algorithm quickly identifies parameters that control seizures. Additionally, our results show that TD(0) refines the stimulation frequency to minimize stimulation energy, converging reliably to optimal parameters. An advantage of TD(0) is that it is adaptive, so the parameters necessary to control seizures can change over time. We show that the algorithm converges on the optimal solution in simulations with both slow and fast inter-seizure intervals.
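The trade-off the abstract describes — penalize seizures strongly, stimulation energy mildly — can be sketched as follows. When each stimulation epoch is treated as a one-step episode, the TD(0) update V(s) ← V(s) + α(r − V(s)) reduces to a per-frequency value estimate. The simulate_epoch stand-in, reward weights, and frequency grid below are all assumptions; the paper uses the Epileptor model, not this toy.

```python
import numpy as np

FREQS = np.arange(20, 201, 20)          # candidate stimulation frequencies (Hz)

def reward(seizure, freq, energy_weight=0.002):
    """Penalize seizures strongly and stimulation energy (freq proxy) mildly."""
    return -10.0 * seizure - energy_weight * freq

def simulate_epoch(freq, threshold=120):
    """Toy stand-in for the Epileptor: higher frequency suppresses seizures."""
    p_seizure = max(0.0, 1.0 - freq / threshold)
    return float(np.random.rand() < p_seizure)

V = np.zeros(len(FREQS))                # value of stimulating at each frequency
alpha, eps = 0.1, 0.2
for epoch in range(2000):
    i = (np.random.randint(len(FREQS)) if np.random.rand() < eps
         else int(np.argmax(V)))
    seizure = simulate_epoch(FREQS[i])
    V[i] += alpha * (reward(seizure, FREQS[i]) - V[i])   # TD(0)-style update
print("selected frequency:", FREQS[int(np.argmax(V))], "Hz")
```

Because the values keep updating online, a drift in the simulated seizure threshold would shift the selected frequency over time, which mirrors the adaptivity argument in the abstract.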


Sensors ◽ 2020 ◽ Vol 20 (19) ◽ pp. 5588
Author(s): Cheng-Wu Lin, Shanq-Jang Ruan, Wei-Chun Hsu, Ya-Wen Tu, Shao-Li Han

We study foot plantar sensor placement using a deep reinforcement learning algorithm, without any prior knowledge of the foot's anatomical areas. To apply a reinforcement learning algorithm, we propose a sensor placement environment and reward system that aim to optimize the fit to the center of pressure (COP) trajectory during a self-selected-speed running task. In this environment, the agent places eight sensors within a 7 × 20 grid coordinate system, and the resulting pattern is the final sensor placement. Our results show that this method (1) generates a sensor placement with low mean squared error in fitting the ground-truth COP trajectory, and (2) robustly discovers the optimal placement within a search space of more than 116 quadrillion combinations. The method is also applicable to tasks other than the self-selected-speed running task.
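The environment/reward structure described here can be sketched as a placement game on the 7 × 20 grid: the agent adds one sensor per step, and the terminal reward is the negative mean squared error of the COP fit. The PlacementEnv class, the pressure-weighted-centroid COP estimator, and the random pressure data below are illustrative stand-ins, not the paper's actual environment or dataset.

```python
import numpy as np

GRID_H, GRID_W, N_SENSORS = 7, 20, 8

def cop(pressure_frames, mask):
    """COP trajectory from masked cells: pressure-weighted centroid per frame."""
    ys, xs = np.nonzero(mask)
    p = pressure_frames[:, ys, xs]                     # (T, n_cells)
    w = p / np.clip(p.sum(axis=1, keepdims=True), 1e-9, None)
    return np.stack([w @ xs, w @ ys], axis=1)          # (T, 2)

class PlacementEnv:
    def __init__(self, pressure_frames):
        self.frames = pressure_frames
        full = np.ones((GRID_H, GRID_W), bool)
        self.target = cop(pressure_frames, full)       # ground-truth COP
    def reset(self):
        self.mask = np.zeros((GRID_H, GRID_W), bool)
        return self.mask.copy()
    def step(self, cell):                              # cell index in [0, 140)
        self.mask[divmod(cell, GRID_W)] = True         # repeats are wasted moves
        done = self.mask.sum() == N_SENSORS
        if not done:
            return self.mask.copy(), 0.0, False
        mse = np.mean((cop(self.frames, self.mask) - self.target) ** 2)
        return self.mask.copy(), -mse, True            # reward: COP fit quality

env = PlacementEnv(np.random.rand(50, GRID_H, GRID_W))  # 50 stand-in frames
```

Enumerating all orderings of eight cells among the 140 grid positions is what produces the quadrillion-scale search space the abstract mentions; the learned policy only ever explores a tiny fraction of it.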

