Learning Representations in Model-Free Hierarchical Reinforcement Learning

Author(s):  
Jacob Rafati ◽  
David C. Noelle

Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse, delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with learning corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal that marks subgoal attainment. We present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the agent's most recent experiences. When combined with an intrinsic motivation learning mechanism, this method learns subgoals and skills together, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment, making it suitable for large-scale applications. We demonstrate the efficiency of our method on a variant of the rooms environment.
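For illustration only (the class and function names below are assumptions, not the authors' implementation), a minimal sketch of the general recipe described above: an incremental clustering step over a small memory of recent states proposes subgoal centroids, and an intrinsic reward marks subgoal attainment for the skill learner.

```python
import numpy as np

class SubgoalDiscovery:
    """Online k-means over a small memory of recent experiences (illustrative)."""
    def __init__(self, n_subgoals, state_dim, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.centroids = rng.standard_normal((n_subgoals, state_dim))
        self.lr = lr

    def update(self, recent_states):
        # Incrementally move the nearest centroid toward each recent state.
        for s in recent_states:
            k = np.argmin(np.linalg.norm(self.centroids - s, axis=1))
            self.centroids[k] += self.lr * (s - self.centroids[k])

    def nearest_subgoal(self, state):
        return int(np.argmin(np.linalg.norm(self.centroids - state, axis=1)))

def intrinsic_reward(state, subgoal_centroid, tol=0.5):
    """+1 when the agent comes close to its current subgoal, else a small step cost."""
    return 1.0 if np.linalg.norm(state - subgoal_centroid) < tol else -0.01
```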

Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 349
Author(s):  
Jiawen Li ◽  
Tao Yu

In the proton exchange membrane fuel cell (PEMFC) system, the flows of air and hydrogen are the main factors influencing the output characteristics of the PEMFC, and coordinating their flow controls is a problem. To solve it, an integrated controller for the PEMFC gas supply system based on distributed deep reinforcement learning (DDRL) is proposed, which merges the original airflow controller and hydrogen flow controller into one. In addition, an edge-cloud collaborative multiple-tricks distributed deep deterministic policy gradient (ECMTD-DDPG) algorithm is presented. In this algorithm, an edge exploration policy is adopted: edge explorers based on DDPG, soft actor-critic (SAC), and a conventional control algorithm perform distributed exploration of the environment, and a classified experience replay mechanism is introduced to improve exploration efficiency. Moreover, various tricks are combined with the cloud-side centralized training policy to address the overestimation of Q-values in DDPG. Ultimately, a model-free integrated controller for the PEMFC gas supply system with better global search ability and training efficiency is obtained. Simulation verifies that the controller enables the flows of air and hydrogen to respond more rapidly to changing loads.
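As a rough illustration of the edge-cloud structure described above (all names here are assumptions, not the paper's code), the sketch below shows several edge explorers writing transitions into a classified experience replay buffer that a cloud-side learner would sample from during centralized training.

```python
import random
from collections import defaultdict, deque

class ClassifiedReplay:
    """Experience replay that keeps transitions separated by their source."""
    def __init__(self, capacity_per_class=10_000):
        self.buffers = defaultdict(lambda: deque(maxlen=capacity_per_class))

    def add(self, source, transition):
        self.buffers[source].append(transition)

    def sample(self, batch_size):
        # Draw roughly equal shares of the batch from each explorer's buffer.
        batch, n_classes = [], max(1, len(self.buffers))
        per_class = max(1, batch_size // n_classes)
        for buf in self.buffers.values():
            batch.extend(random.sample(list(buf), min(per_class, len(buf))))
        return batch

# Usage with a placeholder transition (state, action, reward, next_state):
dummy = (0.0, 0.0, 0.0, 0.0)
replay = ClassifiedReplay()
replay.add("ddpg", dummy)       # from the DDPG edge explorer
replay.add("sac", dummy)        # from the SAC edge explorer
replay.add("conv_ctrl", dummy)  # from the conventional control algorithm
minibatch = replay.sample(256)  # consumed by cloud-side centralized training
```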


Author(s):  
Carlos Diuk ◽  
Michael Littman

Reinforcement learning (RL) deals with the problem of an agent that has to learn how to behave in order to maximize its utility through its interactions with an environment (Sutton & Barto, 1998; Kaelbling, Littman & Moore, 1996). Reinforcement learning problems are usually formalized as Markov Decision Processes (MDPs), which consist of a finite set of states and a finite number of possible actions that the agent can perform. At any given point in time, the agent is in a certain state and picks an action. It can then observe the new state this action leads to, and receives a reward signal. The goal of the agent is to maximize its long-term reward. In this standard formalization, no particular structure or relationship between states is assumed. However, learning in environments with extremely large state spaces is infeasible without some form of generalization. Exploiting the underlying structure of a problem can enable generalization and has long been recognized as an important aspect of representing sequential decision tasks (Boutilier et al., 1999). Hierarchical Reinforcement Learning is the subfield of RL that deals with the discovery and/or exploitation of this underlying structure. Two main ideas come into play in hierarchical RL. The first is to break a task into a hierarchy of smaller subtasks, each of which can be learned more quickly and easily than the whole problem. Subtasks can also be performed multiple times in the course of achieving the larger task, reusing accumulated knowledge and skills. The second is to use state abstraction within subtasks: not every subtask needs to be concerned with every aspect of the state space, so some states can be abstracted away and treated as identical for the purposes of the given subtask.
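A minimal sketch of the standard MDP interaction loop described above, using illustrative tabular Q-learning against a stand-in environment interface (the `env` methods and attributes assumed here are not from any particular library):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning for an env exposing reset(), step(a), and a list of actions."""
    Q = defaultdict(float)                          # Q[(state, action)] -> value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)           # observe the new state and reward
            best_next = max(Q[(s_next, a_)] for a_ in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```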


2019 ◽  
Vol 30 (11) ◽  
pp. 3409-3418 ◽  
Author(s):  
Nat Dilokthanakul ◽  
Christos Kaplanis ◽  
Nick Pawlowski ◽  
Murray Shanahan

2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Yuchen Fu ◽  
Quan Liu ◽  
Xionghong Ling ◽  
Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are “trial and error” and “related reward.” A hierarchical reinforcement learning method based on action subrewards is proposed to address the “curse of dimensionality,” in which the state space grows exponentially with the number of features and convergence is slow. The method can greatly reduce the state space and choose actions purposefully and efficiently, so as to optimize the reward function and improve the convergence speed. Applied to online learning in the Tetris game, the experimental results show that the convergence speed of the algorithm is evidently improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The “curse of dimensionality” problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameters is compared and analyzed as well.
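As a hedged illustration of the idea of action subrewards (the state attributes and weights below are hypothetical, not taken from the paper), one way to implement such shaping is to add a small, immediate per-action reward to the environment's reward:

```python
def action_subreward(state, action, next_state):
    # Hypothetical shaping terms for a Tetris-like task: reward actions that
    # lower the stack and fill holes (attribute names are made up here).
    height_drop = state.stack_height - next_state.stack_height
    holes_filled = state.holes - next_state.holes
    return 0.1 * height_drop + 0.2 * holes_filled

def shaped_reward(extrinsic_r, state, action, next_state, beta=0.5):
    """Total learning signal = environment reward + weighted action subreward."""
    return extrinsic_r + beta * action_subreward(state, action, next_state)
```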

