Algorithm of stable state spaces in reinforcement learning

2008 ◽  
Vol 28 (5) ◽  
pp. 1328-1330
Author(s):  
Yu ZHENG
Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4468
Author(s):  
Ao Xi ◽  
Chao Chen

In this work, we introduced a novel hybrid reinforcement learning scheme to balance a biped robot (NAO) on an oscillating platform, where the rotation of the platform was treated as an external disturbance to the robot. The platform had two rotational degrees of freedom: pitch and roll. The state space comprised the position of the center of pressure and the joint angles and joint velocities of the two legs. The action space consisted of the joint angles of the ankles, knees, and hips. By incorporating inverse kinematics, the dimension of the action space was significantly reduced. A model-based system estimator was then employed during offline training to estimate the dynamics model of the system using novel hierarchical Gaussian processes and to provide initial control inputs, after which the reduced action space of each joint was obtained by minimizing the cost of reaching the desired stable state. Finally, a model-free optimizer based on DQN(λ) was introduced to fine-tune the initial control inputs, yielding the optimal control input for each joint at any state. The proposed scheme not only avoided the distribution-mismatch problem but also improved sample efficiency. Simulation results showed that the proposed hybrid reinforcement learning mechanism enabled the NAO robot to balance on a platform oscillating at different frequencies and magnitudes. Both control performance and robustness were maintained throughout the experiments.
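
As a rough illustration of the model-free stage of such a pipeline, the sketch below implements a tabular Watkins-style Q(λ) loop with eligibility traces over an assumed discretization of the state and ankle-command spaces. The environment stub, state/action sizes, and reward are illustrative assumptions, not the authors' implementation; in the paper, the controller would instead be warm-started from the model-based (hierarchical Gaussian process) estimate rather than from zeros.

```python
import numpy as np

# Minimal sketch (assumed, not the authors' code) of a Q(lambda)-style
# fine-tuning loop over a discretized balance task.
N_STATES, N_ACTIONS = 200, 9            # assumed discretization of CoP/joint state and commands
ALPHA, GAMMA, LAM, EPS = 0.1, 0.99, 0.8, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))     # in the paper: initialized from the model-based controller
E = np.zeros_like(Q)                    # eligibility traces

def platform_step(state, action):
    """Placeholder environment: in the paper this is the simulated NAO on the
    oscillating platform; here it returns a random next state and a toy
    'stay near the balanced state' reward."""
    next_state = np.random.randint(N_STATES)
    reward = -abs(next_state - N_STATES // 2) / N_STATES
    return next_state, reward

state = np.random.randint(N_STATES)
for t in range(10_000):
    greedy = int(np.argmax(Q[state]))
    action = np.random.randint(N_ACTIONS) if np.random.rand() < EPS else greedy

    next_state, reward = platform_step(state, action)
    td_error = reward + GAMMA * np.max(Q[next_state]) - Q[state, action]

    E[state, action] += 1.0                               # accumulate trace
    Q += ALPHA * td_error * E                             # update all traced state-action pairs
    E *= (GAMMA * LAM) if action == greedy else 0.0       # Watkins: cut traces after exploratory moves
    state = next_state
```

The design choice illustrated here is the trace cut after exploratory actions, which is what distinguishes a Watkins-style Q(λ) update from plain one-step Q-learning.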


Author(s):  
Carlos Diuk ◽  
Michael Littman

Reinforcement learning (RL) deals with the problem of an agent that has to learn how to behave in order to maximize its utility through its interactions with an environment (Sutton & Barto, 1998; Kaelbling, Littman & Moore, 1996). Reinforcement learning problems are usually formalized as Markov Decision Processes (MDPs), which consist of a finite set of states and a finite set of actions that the agent can perform. At any given point in time, the agent is in a certain state and picks an action. It then observes the new state this action leads to and receives a reward signal. The goal of the agent is to maximize its long-term reward. In this standard formalization, no particular structure or relationship between states is assumed. However, learning in environments with extremely large state spaces is infeasible without some form of generalization. Exploiting the underlying structure of a problem can enable such generalization and has long been recognized as an important aspect of representing sequential decision tasks (Boutilier et al., 1999). Hierarchical Reinforcement Learning is the subfield of RL that deals with the discovery and/or exploitation of this underlying structure. Two main ideas come into play in hierarchical RL. The first is to break a task into a hierarchy of smaller subtasks, each of which can be learned faster and more easily than the whole problem. Subtasks can also be performed multiple times in the course of achieving the larger task, reusing accumulated knowledge and skills. The second is to use state abstraction within subtasks: not every subtask needs to be concerned with every aspect of the state space, so some states can be abstracted away and treated as identical for the purposes of the given subtask.
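
To make the MDP formalism above concrete, here is a minimal sketch: a toy four-state chain with two actions, solved by value iteration to obtain the long-term discounted return and a greedy policy. The specific transitions and rewards are illustrative assumptions, not drawn from the cited works.

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.95

# P[a, s, s'] = transition probability; R[s, a] = expected immediate reward
P = np.zeros((n_actions, n_states, n_states))
R = np.zeros((n_states, n_actions))
for s in range(n_states):
    P[0, s, max(s - 1, 0)] = 1.0                    # action 0: step "left"
    P[1, s, min(s + 1, n_states - 1)] = 1.0         # action 1: step "right"
R[n_states - 1, 1] = 1.0                            # reward for pushing against the right wall

V = np.zeros(n_states)
for _ in range(500):                                # value iteration
    Q = R + gamma * np.einsum('ast,t->sa', P, V)    # one-step lookahead for every (s, a)
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)                           # greedy policy maximizing long-term reward
print("V:", V.round(3), "policy:", policy)
```

Hierarchical RL starts from exactly this kind of flat MDP and decomposes it into subtasks, each with its own smaller, abstracted state space.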


Author(s):  
Masaaki Kanakubo ◽  
Masafumi Hagiwara

We propose a simplified form of reinforcement learning (RL) for game strategy acquisition using a strategy network. RL has been applied to a number of games, such as backgammon and checkers. However, applying RL to Othello or Shogi, which have very large state spaces, is more difficult because these games take a very long time to learn. The proposed strategy network is composed of N lines connecting N nodes on the game board to a single evaluation node, forming a 2-layer perceptron. These nodes denote all possible states of every square on the game board and can easily represent the evaluation function. Moreover, these nodes can also denote imaginary states, such as pieces that may exist in the next step, every positional relation between two arbitrary pieces, or other various board phases. After several thousand games had been played, the strategy network quickly acquired a better evaluation function than one using a normalized Gaussian network. The computer player employing the strategy network beat a heuristic-based player that evaluates the values of pieces or places on the game board. The proposed strategy network was able to acquire good weightings of various features of game states. In addition, after co-evolutionary training, the player employing the strategy network acquired a winning strategy for a 4×4 Othello task.
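
A minimal sketch of what such a single-output evaluation network might look like is given below: every (square, occupancy) pair is a binary input node connected by one weight to a single evaluation output, trained here with a semi-gradient TD(0) update along a placeholder game trajectory. The 4×4 board, the three occupancy states, and the random-move stand-in are assumptions for illustration; the authors' network additionally encodes imaginary states and positional relations, which are omitted here.

```python
import numpy as np

BOARD_SQUARES, PIECE_STATES = 16, 3       # assumed 4x4 Othello: empty / black / white
N_FEATURES = BOARD_SQUARES * PIECE_STATES # one input node per (square, occupancy) pair

rng = np.random.default_rng(0)
w = np.zeros(N_FEATURES)                  # one weight per line into the evaluation node

def features(board):
    """board: length-16 int array with values in {0, 1, 2}; one-hot per square."""
    x = np.zeros(N_FEATURES)
    x[np.arange(BOARD_SQUARES) * PIECE_STATES + board] = 1.0
    return x

def evaluate(board):
    return np.tanh(w @ features(board))   # single evaluation output node

# Semi-gradient TD(0) along a placeholder (random) game trajectory
alpha, gamma = 0.01, 1.0
board = rng.integers(0, PIECE_STATES, BOARD_SQUARES)
for ply in range(30):
    next_board = rng.integers(0, PIECE_STATES, BOARD_SQUARES)   # stand-in for a real move
    reward = 0.0 if ply < 29 else 1.0                           # terminal win/loss signal
    target = reward + gamma * (0.0 if ply == 29 else evaluate(next_board))
    v = evaluate(board)
    w += alpha * (target - v) * (1 - v**2) * features(board)    # tanh gradient w.r.t. weights
    board = next_board
```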


Author(s):  
Jacob Rafati ◽  
David C. Noelle

Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse, delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with learning the corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal that marks subgoal attainment. We present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences of the agent. When combined with an intrinsic motivation learning mechanism, this method learns subgoals and skills together, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of an environment model, making it suitable for large-scale applications. We demonstrate the efficiency of our method on a variant of the rooms environment.
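
One plausible reading of "incremental unsupervised learning over a small memory of recent experiences" is an online clustering rule over visited states, as in the hedged sketch below; the clustering method (incremental k-means), buffer size, and state dimensionality are all illustrative assumptions and may differ from the authors' procedure.

```python
import numpy as np

class SubgoalDiscovery:
    """Sketch: propose subgoals by online clustering of recently visited states."""

    def __init__(self, n_subgoals=4, state_dim=2, lr=0.05, memory_size=512):
        rng = np.random.default_rng(0)
        self.centroids = rng.normal(size=(n_subgoals, state_dim))  # candidate subgoals
        self.lr = lr
        self.memory = []                      # small buffer of recent experiences
        self.memory_size = memory_size

    def observe(self, state):
        """Store a recent state and move the nearest centroid toward it."""
        self.memory.append(state)
        if len(self.memory) > self.memory_size:
            self.memory.pop(0)
        k = np.argmin(np.linalg.norm(self.centroids - state, axis=1))
        self.centroids[k] += self.lr * (state - self.centroids[k])  # incremental k-means step

    def subgoals(self):
        return self.centroids.copy()

# Usage: feed in states visited by the agent; centroids drift toward frequently
# visited regions of the state space.
sd = SubgoalDiscovery()
for _ in range(2000):
    sd.observe(np.random.uniform(0, 10, size=2))   # placeholder for real agent states
print(sd.subgoals())
```

The resulting centroids would then serve as candidate subgoals, each paired with an intrinsic reward for reaching it while the corresponding skill policy is trained.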


2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Yuchen Fu ◽  
Quan Liu ◽  
Xionghong Ling ◽  
Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features and convergence is slow. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and improve the convergence speed. When the method is applied to online learning in the game of Tetris, the experimental results show that the convergence speed of the algorithm is clearly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The "curse of dimensionality" problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameter settings is compared and analyzed as well.
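
As a rough illustration of how action subrewards might shape learning, the sketch below adds a weighted sum of per-action features (hypothetical Tetris-style features such as lines cleared or new holes) to the environment reward inside a tabular Q-learning update. The feature names, weights, and state/action sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical per-action features and weights used to build the subreward.
SUBREWARD_WEIGHTS = {"lines_cleared": 1.0, "new_holes": -0.5, "height_increase": -0.2}

def subreward(action_features):
    """Combine hand-chosen per-action features into an auxiliary reward."""
    return sum(SUBREWARD_WEIGHTS[name] * value for name, value in action_features.items())

def q_update(Q, s, a, env_reward, action_features, s_next, alpha=0.1, gamma=0.95):
    """Tabular Q-learning step using shaped reward = environment reward + action subreward."""
    shaped = env_reward + subreward(action_features)
    Q[s, a] += alpha * (shaped + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q

# Toy usage over an assumed small, abstracted state space (one per subtask level).
Q = np.zeros((50, 10))
Q = q_update(Q, s=3, a=2, env_reward=0.0,
             action_features={"lines_cleared": 1, "new_holes": 0, "height_increase": 2},
             s_next=7)
```

The intended effect is that informative per-action feedback guides action choice early on, while the hierarchical decomposition keeps each subtask's state space small.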

