Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions

Author(s):  
Aijun Bai ◽  
Stuart Russell

In the context of hierarchical reinforcement learning, the idea of hierarchies of abstract machines (HAMs) is to write a partial policy as a set of hierarchical finite state machines with unspecified choice states, and to use reinforcement learning to learn an optimal completion of this partial policy. Given a HAM with a potentially deep hierarchical structure, there often exist many internal transitions, where a machine calls another machine while the environment state remains unchanged. In this paper, we propose a new hierarchical reinforcement learning algorithm that discovers such internal transitions automatically and short-circuits them recursively in the computation of Q values. The resulting HAMQ-INT algorithm significantly outperforms the state of the art on the benchmark Taxi domain and the much more complex RoboCup Keepaway domain.
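For context, HAM-based learners perform an SMDP-style Q update at the choice points of the machine hierarchy. The sketch below is a minimal, hypothetical illustration of that update (it is not the authors' implementation, and variable names are assumptions); the joint state pairs the environment state with the machine/call-stack state.

```python
import random
from collections import defaultdict

# Tabular SMDP Q-learning at HAM choice points. Between two consecutive
# choice points the environment reward is accumulated with discounting
# over the k primitive steps that separate them.
Q = defaultdict(float)

def choose(joint_state, choices, epsilon=0.1):
    """Epsilon-greedy completion of the partial policy at a choice state."""
    if random.random() < epsilon:
        return random.choice(choices)
    return max(choices, key=lambda c: Q[(joint_state, c)])

def smdp_update(prev_joint, choice, cum_reward, k, next_joint, next_choices,
                alpha=0.1, gamma=0.99):
    """SMDP Q-update: cum_reward is the discounted reward accumulated over
    the k primitive steps between the two choice points."""
    target = cum_reward + (gamma ** k) * max(
        (Q[(next_joint, c)] for c in next_choices), default=0.0)
    Q[(prev_joint, choice)] += alpha * (target - Q[(prev_joint, choice)])
```

When one machine calls another without the environment state advancing (an internal transition), parts of this bookkeeping become redundant; the abstract's HAMQ-INT contribution is to detect such chains automatically and short-circuit them recursively when computing Q values.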

Author(s):  
Jing Zhang ◽  
Bowen Hao ◽  
Bo Chen ◽  
Cuiping Li ◽  
Hong Chen ◽  
...  

The proliferation of massive open online courses (MOOCs) demands an effective approach to personalized course recommendation. Recent attention-based recommendation models can distinguish the effects of different historical courses when recommending different target courses. However, when a user is interested in many different courses, the attention mechanism performs poorly, as the effects of the contributing courses are diluted by the diverse historical courses. To address this challenge, we propose a hierarchical reinforcement learning algorithm that revises the user profiles and tunes the course recommendation model on the revised profiles. We systematically evaluate the proposed model on a real dataset consisting of 1,302 courses, 82,535 users, and 458,454 user enrollment records, collected from XuetangX, one of the largest MOOC platforms in China. Experimental results show that the proposed model significantly outperforms the state-of-the-art recommendation models (improving HR@10 by 5.02% to 18.95%).
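For reference, the HR@10 metric cited above (hit ratio at rank 10) can be computed as in this small sketch; the function and variable names are illustrative and independent of the paper's actual evaluation code.

```python
def hit_ratio_at_k(ranked_items, held_out_item, k=10):
    """1.0 if the held-out course appears among the top-k recommendations."""
    return 1.0 if held_out_item in ranked_items[:k] else 0.0

def mean_hr_at_k(recommendations, ground_truth, k=10):
    """Average HR@k over all users; recommendations[u] is a ranked list."""
    hits = [hit_ratio_at_k(recommendations[u], ground_truth[u], k)
            for u in ground_truth]
    return sum(hits) / len(hits)
```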


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1576 ◽  
Author(s):  
Xiaomao Zhou ◽  
Tao Bai ◽  
Yanbin Gao ◽  
Yuntao Han

Extensive studies have shown that many animals' capability of forming spatial representations for self-localization, path planning, and navigation relies on the functionalities of place and head-direction (HD) cells in the hippocampus. Although there are numerous hippocampal modeling approaches, only a few span the wide range of functionalities from processing raw sensory signals to planning and action generation. This paper presents a vision-based navigation system that involves generating place and HD cells through learning from visual images, building topological maps based on the learned cell representations, and performing navigation using hierarchical reinforcement learning. First, place and HD cells are trained from sequences of visual stimuli in an unsupervised fashion. A modified Slow Feature Analysis (SFA) algorithm is proposed to learn the different cell types in a deliberate way by restricting their learning to separate phases of the spatial exploration. Then, to extract the encoded metric information from these unsupervised representations, a self-organizing learning algorithm is adopted to learn over the emerging cell activities and to generate topological maps that reveal the topology of the environment and information about the robot's head direction, respectively. This enables the robot to perform self-localization and orientation detection based on the generated maps. Finally, goal-directed navigation is performed using reinforcement learning in continuous state spaces, which are represented by the population activities of place cells. In particular, considering that the topological map provides a natural hierarchical representation of the environment, hierarchical reinforcement learning (HRL) is used to exploit this hierarchy to accelerate learning. The HRL works on different spatial scales, where a high-level policy learns to select subgoals and a low-level policy learns over primitive actions to specialize on the selected subgoals. Experimental results demonstrate that our system is able to navigate a robot to the desired position effectively, and that HRL shows much better learning performance than standard RL in solving our navigation tasks.
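To illustrate the slowness objective underlying the place/HD cell learning described above, here is a minimal linear SFA step in NumPy. It is a generic sketch, not the paper's modified SFA (which additionally restricts the learning of different cell types to separate exploration phases), and all names are illustrative.

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """X: (T, d) time series of (possibly expanded) sensory inputs.
    Returns projections that vary as slowly as possible over time,
    subject to unit variance and decorrelation."""
    X = X - X.mean(axis=0)
    # Whiten the input so the constraints (unit variance, decorrelation)
    # reduce to an orthonormality condition on the projection.
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10
    W_white = eigvec[:, keep] / np.sqrt(eigval[keep])
    Z = X @ W_white
    # Directions minimizing the variance of the temporal derivative are the
    # eigenvectors of the derivative covariance with the smallest eigenvalues.
    dZ = np.diff(Z, axis=0)
    dval, dvec = np.linalg.eigh(np.cov(dZ, rowvar=False))
    slow_dirs = dvec[:, :n_features]   # eigh sorts eigenvalues ascending
    return Z @ slow_dirs               # (T, n_features) slow features
```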


2015 ◽  
Vol 24 (07) ◽  
pp. 1550101 ◽  
Author(s):  
Raouf Senhadji-Navaro ◽  
Ignacio Garcia-Vargas

This work focuses on the problem of designing efficient reconfigurable multiplexer banks for RAM-based implementations of reconfigurable state machines. We propose a new architecture (called combination-based reconfigurable multiplexer bank, CRMUX) that uses multiplexers simpler than those of the state-of-the-art architecture (called variation-based reconfigurable multiplexer bank, VRMUX). The performance of both architectures is compared in terms of speed, area, and reconfiguration cost. Experimental results on MCNC finite state machine (FSM) benchmarks show that CRMUX is faster and more area-efficient than VRMUX. The reconfiguration cost of both multiplexer banks is studied using a behavioral model of a reconfigurable state machine. The results show that the reconfiguration cost of CRMUX is lower than that of VRMUX in most cases.
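As generic background (not a model of the CRMUX or VRMUX architectures themselves), a RAM-based FSM stores its next-state/output logic in a memory addressed by the current state concatenated with the selected inputs, so reconfiguring the machine amounts to rewriting the RAM contents. A toy behavioral model, with illustrative names:

```python
class RamBasedFSM:
    """Toy behavioral model of a RAM-based FSM: the transition/output table
    lives in a flat memory addressed by {state, selected inputs}, so the
    machine is 'reconfigured' simply by loading new RAM contents."""

    def __init__(self, state_bits, input_bits, ram):
        self.state_bits = state_bits
        self.input_bits = input_bits
        self.ram = ram          # list of (next_state, output) tuples
        self.state = 0

    def step(self, inputs):
        address = (self.state << self.input_bits) | inputs
        self.state, output = self.ram[address]
        return output
```

The multiplexer bank studied in the paper sits in front of such a memory, selecting which machine inputs contribute to the address; the architectures differ in how cheaply that selection can be built and reconfigured.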


2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Yuchen Fu ◽  
Quan Liu ◽  
Xionghong Ling ◽  
Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are “trial and error” and “related reward.” A hierarchical reinforcement learning method based on action subrewards is proposed to address the “curse of dimensionality,” in which the state space grows exponentially with the number of features, and the resulting low convergence speed. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and improve the convergence speed. Applying it to online learning in the Tetris game, the experimental results show that the convergence speed is clearly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The “curse of dimensionality” problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameter settings is compared and analyzed as well.
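A hedged sketch of the kind of update the subreward idea suggests: the immediate environment reward is augmented with an action-specific subreward that gives the low-level learner denser feedback. The names and decomposition below are illustrative and do not reproduce the paper's exact scheme.

```python
def q_update_with_subreward(Q, s, a, env_reward, subreward, s_next, actions,
                            alpha=0.1, gamma=0.95):
    """One tabular Q-learning step where the learning signal is the sum of
    the environment reward and an action-specific subreward."""
    target = env_reward + subreward + gamma * max(
        Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]
```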


2018 ◽  
Vol 21 ◽  
pp. 29-36 ◽  
Author(s):  
Ivars Namatēvs

Due to the increase in computing power and to innovative end-to-end reinforcement learning (RL) approaches that learn directly from high-dimensional sensory inputs, it is now feasible to combine RL and deep learning to build Smart Building Energy Control (SBEC) systems. Deep reinforcement learning (DRL) extends the classical Q-learning algorithm to Deep Q-learning (DQL) by exploiting artificial neural networks: a deep neural network (DNN) is trained to approximate the Q-function. To create a comprehensive SBEC system, it is crucial to choose an appropriate mathematical foundation and to benchmark the best model-based predictive control framework for managing the building heating, ventilation, and air conditioning (HVAC) system. The main contribution of this paper is to explore state-of-the-art DRL methodology for smart building control.
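A minimal sketch of the Deep Q-learning target referred to above, written in PyTorch with an illustrative network (e.g., building-state features in, one Q-value per discrete HVAC setpoint action out). The dimensions, names, and action set are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Illustrative Q-network: 8 building-state features in, 4 discrete actions out.
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_step(s, a, r, s_next, done, gamma=0.99):
    """One Deep Q-learning update on a batch of transitions.
    s, s_next: float tensors (B, 8); a: long tensor (B,); r, done: float (B,)."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```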


Author(s):  
Matteo Hessel ◽  
Hubert Soyer ◽  
Lasse Espeholt ◽  
Wojciech Czarnecki ◽  
Simon Schmitt ◽  
...  

The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained on one task at a time, and each new task requires training a brand-new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
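The following is a simplified sketch in the spirit of the idea described above: keep running statistics of each task's return targets and express TD errors in per-task normalized units, so that tasks with large or dense rewards do not dominate the shared agent's updates. It does not reproduce the paper's exact normalization scheme, and all names are illustrative.

```python
import numpy as np

class TaskNormalizer:
    """Running mean/scale of return targets per task, used to rescale
    TD errors so every task contributes comparably to the updates."""

    def __init__(self, n_tasks, beta=3e-4):
        self.mu = np.zeros(n_tasks)    # running first moment
        self.nu = np.ones(n_tasks)     # running second moment
        self.beta = beta               # statistics step size

    def update(self, task_id, target):
        self.mu[task_id] += self.beta * (target - self.mu[task_id])
        self.nu[task_id] += self.beta * (target ** 2 - self.nu[task_id])

    def sigma(self, task_id):
        return np.sqrt(max(self.nu[task_id] - self.mu[task_id] ** 2, 1e-4))

    def normalized_error(self, task_id, target, prediction):
        """TD error expressed in the task's normalized units."""
        return (target - prediction) / self.sigma(task_id)
```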


2020 ◽  
Author(s):  
Shoeb Shaikh ◽  
Rosa So ◽  
Tafadzwa Sibindi ◽  
Camilo Libedinsky ◽  
Arindam Basu

This paper presents the application of Banditron, an online reinforcement learning (RL) algorithm, in a discrete-state intra-cortical Brain Machine Interface (iBMI) setting. We have analyzed two datasets from non-human primates (NHPs), NHP A and NHP B, each performing a 4-option discrete control task over a total of 8 days. Results show average improvements of ≈ 15% and 6% in NHP A and 15% and 21% in NHP B over the state-of-the-art algorithms Hebbian Reinforcement Learning (HRL) and Attention Gated Reinforcement Learning (AGREL), respectively. Apart from yielding superior decoding performance, Banditron is also the most computationally friendly, as it requires two orders of magnitude fewer multiply-and-accumulate operations than HRL and AGREL. Furthermore, Banditron provides average improvements of at least 40% and 15% in NHPs A and B, respectively, compared to the popularly employed supervised methods LDA and SVM across test days. These results pave the way towards an alternative paradigm of temporally robust, hardware-friendly, reinforcement-learning-based iBMIs.
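For reference, the standard Banditron update (Kakade et al., 2008), which the paper builds on, maintains one weight vector per output class and learns from bandit feedback only, i.e., whether the sampled prediction was correct. A compact NumPy sketch with illustrative names (not the paper's decoder code):

```python
import numpy as np

class Banditron:
    """Linear multiclass learner with bandit feedback: it never observes
    the true label, only whether the sampled prediction was correct."""

    def __init__(self, n_classes, n_features, gamma=0.05, seed=0):
        self.W = np.zeros((n_classes, n_features))
        self.gamma = gamma                        # exploration rate
        self.rng = np.random.default_rng(seed)

    def predict_and_update(self, x, correct):
        """`correct` is a callback returning True iff the sampled class is
        the true one (the only feedback available in the bandit setting)."""
        k = self.W.shape[0]
        y_hat = int(np.argmax(self.W @ x))        # greedy prediction
        probs = np.full(k, self.gamma / k)
        probs[y_hat] += 1.0 - self.gamma          # epsilon-greedy exploration
        y_tilde = int(self.rng.choice(k, p=probs))
        update = np.zeros_like(self.W)
        if correct(y_tilde):
            update[y_tilde] = x / probs[y_tilde]  # unbiased positive term
        update[y_hat] -= x                        # always penalize the greedy guess
        self.W += update
        return y_tilde
```

The update requires only one matrix-vector product plus a rank-one correction per trial, which is consistent with the abstract's point about its low multiply-and-accumulate count.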

