Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions

Author(s):  
Aijun Bai ◽  
Stuart Russell

In the context of hierarchical reinforcement learning, the idea of hierarchies of abstract machines (HAMs) is to write a partial policy as a set of hierarchical finite state machines with unspecified choice states, and to use reinforcement learning to learn an optimal completion of this partial policy. Given a HAM with a potentially deep hierarchical structure, there often exist many internal transitions, where a machine calls another machine while the environment state remains unchanged. In this paper, we propose a new hierarchical reinforcement learning algorithm that discovers such internal transitions automatically and short-circuits them recursively in the computation of Q values. The resulting HAMQ-INT algorithm significantly outperforms the state of the art on the benchmark Taxi domain and the much more complex RoboCup Keepaway domain.
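For context, HAM-based learners perform an SMDP-style Q update at the choice points of the machine hierarchy. The sketch below is a minimal, hypothetical illustration of that update (it is not the authors' implementation, and variable names are assumptions); the joint state pairs the environment state with the machine/call-stack state.

```python
import random
from collections import defaultdict

# Tabular SMDP Q-learning at HAM choice points. Between two consecutive
# choice points the environment reward is accumulated with discounting
# over the k primitive steps that separate them.
Q = defaultdict(float)

def choose(joint_state, choices, epsilon=0.1):
    """Epsilon-greedy completion of the partial policy at a choice state."""
    if random.random() < epsilon:
        return random.choice(choices)
    return max(choices, key=lambda c: Q[(joint_state, c)])

def smdp_update(prev_joint, choice, cum_reward, k, next_joint, next_choices,
                alpha=0.1, gamma=0.99):
    """SMDP Q-update: cum_reward is the discounted reward accumulated over
    the k primitive steps between the two choice points."""
    target = cum_reward + (gamma ** k) * max(
        (Q[(next_joint, c)] for c in next_choices), default=0.0)
    Q[(prev_joint, choice)] += alpha * (target - Q[(prev_joint, choice)])
```

When one machine calls another without the environment state advancing (an internal transition), parts of this bookkeeping become redundant; the abstract's HAMQ-INT contribution is to detect such chains automatically and short-circuit them recursively when computing Q values.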

Author(s):  
Jing Zhang ◽  
Bowen Hao ◽  
Bo Chen ◽  
Cuiping Li ◽  
Hong Chen ◽  
...  

The proliferation of massive open online courses (MOOCs) demands an effective approach to personalized course recommendation. Recent attention-based recommendation models can distinguish the effects of different historical courses when recommending different target courses. However, when a user is interested in many different courses, the attention mechanism performs poorly, as the effects of the contributing courses are diluted by the diverse historical courses. To address this challenge, we propose a hierarchical reinforcement learning algorithm that revises the user profiles and tunes the course recommendation model on the revised profiles. We systematically evaluate the proposed model on a real dataset consisting of 1,302 courses, 82,535 users, and 458,454 user enrollment records, collected from XuetangX, one of the largest MOOC platforms in China. Experimental results show that the proposed model significantly outperforms the state-of-the-art recommendation models (improving HR@10 by 5.02% to 18.95%).
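For reference, the HR@10 metric cited above (hit ratio at rank 10) can be computed as in this small sketch; the function and variable names are illustrative and independent of the paper's actual evaluation code.

```python
def hit_ratio_at_k(ranked_items, held_out_item, k=10):
    """1.0 if the held-out course appears among the top-k recommendations."""
    return 1.0 if held_out_item in ranked_items[:k] else 0.0

def mean_hr_at_k(recommendations, ground_truth, k=10):
    """Average HR@k over all users; recommendations[u] is a ranked list."""
    hits = [hit_ratio_at_k(recommendations[u], ground_truth[u], k)
            for u in ground_truth]
    return sum(hits) / len(hits)
```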


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1576 ◽  
Author(s):  
Xiaomao Zhou ◽  
Tao Bai ◽  
Yanbin Gao ◽  
Yuntao Han

Extensive studies have shown that many animals' capability of forming spatial representations for self-localization, path planning, and navigation relies on the functionalities of place and head-direction (HD) cells in the hippocampus. Although there are numerous hippocampal modeling approaches, only a few span the wide range of functionalities from processing raw sensory signals to planning and action generation. This paper presents a vision-based navigation system that involves generating place and HD cells through learning from visual images, building topological maps based on the learned cell representations, and performing navigation using hierarchical reinforcement learning. First, place and HD cells are trained from sequences of visual stimuli in an unsupervised fashion. A modified Slow Feature Analysis (SFA) algorithm is proposed to learn the different cell types in a deliberate way by restricting their learning to separate phases of the spatial exploration. Then, to extract the encoded metric information from these unsupervised representations, a self-organizing learning algorithm is adopted to learn over the emerging cell activities and to generate topological maps that reveal the topology of the environment and information about the robot's head direction, respectively. This enables the robot to perform self-localization and orientation detection based on the generated maps. Finally, goal-directed navigation is performed using reinforcement learning in continuous state spaces, which are represented by the population activities of place cells. In particular, considering that the topological map provides a natural hierarchical representation of the environment, hierarchical reinforcement learning (HRL) is used to exploit this hierarchy to accelerate learning. The HRL works on different spatial scales, where a high-level policy learns to select subgoals and a low-level policy learns over primitive actions to specialize on the selected subgoals. Experimental results demonstrate that our system is able to navigate a robot to the desired position effectively, and that HRL shows much better learning performance than standard RL in solving our navigation tasks.
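To illustrate the slowness objective underlying the place/HD cell learning described above, here is a minimal linear SFA step in NumPy. It is a generic sketch, not the paper's modified SFA (which additionally restricts the learning of different cell types to separate exploration phases), and all names are illustrative.

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """X: (T, d) time series of (possibly expanded) sensory inputs.
    Returns projections that vary as slowly as possible over time,
    subject to unit variance and decorrelation."""
    X = X - X.mean(axis=0)
    # Whiten the input so the constraints (unit variance, decorrelation)
    # reduce to an orthonormality condition on the projection.
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10
    W_white = eigvec[:, keep] / np.sqrt(eigval[keep])
    Z = X @ W_white
    # Directions minimizing the variance of the temporal derivative are the
    # eigenvectors of the derivative covariance with the smallest eigenvalues.
    dZ = np.diff(Z, axis=0)
    dval, dvec = np.linalg.eigh(np.cov(dZ, rowvar=False))
    slow_dirs = dvec[:, :n_features]   # eigh sorts eigenvalues ascending
    return Z @ slow_dirs               # (T, n_features) slow features
```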


2015 ◽  
Vol 24 (07) ◽  
pp. 1550101 ◽  
Author(s):  
Raouf Senhadji-Navaro ◽  
Ignacio Garcia-Vargas

This work focuses on the problem of designing efficient reconfigurable multiplexer banks for RAM-based implementations of reconfigurable state machines. We propose a new architecture (called combination-based reconfigurable multiplexer bank, CRMUX) that uses multiplexers simpler than those of the state-of-the-art architecture (called variation-based reconfigurable multiplexer bank, VRMUX). The performance of both architectures is compared in terms of speed, area, and reconfiguration cost. Experimental results on MCNC finite state machine (FSM) benchmarks show that CRMUX is faster and more area-efficient than VRMUX. The reconfiguration cost of both multiplexer banks is studied using a behavioral model of a reconfigurable state machine. The results show that the reconfiguration cost of CRMUX is lower than that of VRMUX in most cases.
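As generic background (not a model of the CRMUX or VRMUX architectures themselves), a RAM-based FSM stores its next-state/output logic in a memory addressed by the current state concatenated with the selected inputs, so reconfiguring the machine amounts to rewriting the RAM contents. A toy behavioral model, with illustrative names:

```python
class RamBasedFSM:
    """Toy behavioral model of a RAM-based FSM: the transition/output table
    lives in a flat memory addressed by {state, selected inputs}, so the
    machine is 'reconfigured' simply by loading new RAM contents."""

    def __init__(self, state_bits, input_bits, ram):
        self.state_bits = state_bits
        self.input_bits = input_bits
        self.ram = ram          # list of (next_state, output) tuples
        self.state = 0

    def step(self, inputs):
        address = (self.state << self.input_bits) | inputs
        self.state, output = self.ram[address]
        return output
```

The multiplexer bank studied in the paper sits in front of such a memory, selecting which machine inputs contribute to the address; the architectures differ in how cheaply that selection can be built and reconfigured.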


2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Yuchen Fu ◽  
Quan Liu ◽  
Xionghong Ling ◽  
Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are “trial and error” and “related reward.” A hierarchical reinforcement learning method based on action subrewards is proposed to address the “curse of dimensionality,” in which the state space grows exponentially with the number of features, and the resulting low convergence speed. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and improve the convergence speed. Applying it to online learning in the Tetris game, the experimental results show that the convergence speed is clearly improved by the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards. The “curse of dimensionality” problem is also alleviated to a certain extent by the hierarchical method. Performance under different parameter settings is compared and analyzed as well.
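A hedged sketch of the kind of update the subreward idea suggests: the immediate environment reward is augmented with an action-specific subreward that gives the low-level learner denser feedback. The names and decomposition below are illustrative and do not reproduce the paper's exact scheme.

```python
def q_update_with_subreward(Q, s, a, env_reward, subreward, s_next, actions,
                            alpha=0.1, gamma=0.95):
    """One tabular Q-learning step where the learning signal is the sum of
    the environment reward and an action-specific subreward."""
    target = env_reward + subreward + gamma * max(
        Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]
```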


2018 ◽  
Vol 21 ◽  
pp. 29-36 ◽  
Author(s):  
Ivars Namatēvs

Due to the increase in computing power and to innovative end-to-end reinforcement learning (RL) approaches that learn directly from high-dimensional sensory inputs, it is now feasible to combine RL and deep learning to build Smart Building Energy Control (SBEC) systems. Deep reinforcement learning (DRL) extends the classical Q-learning algorithm to Deep Q-learning (DQL) by exploiting artificial neural networks: a deep neural network (DNN) is trained to approximate the Q-function. To create a comprehensive SBEC system, it is crucial to choose an appropriate mathematical foundation and to benchmark the best model-based predictive control framework for managing the building heating, ventilation, and air conditioning (HVAC) system. The main contribution of this paper is to explore state-of-the-art DRL methodology for smart building control.
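A minimal sketch of the Deep Q-learning target referred to above, written in PyTorch with an illustrative network (e.g., building-state features in, one Q-value per discrete HVAC setpoint action out). The dimensions, names, and action set are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Illustrative Q-network: 8 building-state features in, 4 discrete actions out.
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_step(s, a, r, s_next, done, gamma=0.99):
    """One Deep Q-learning update on a batch of transitions.
    s, s_next: float tensors (B, 8); a: long tensor (B,); r, done: float (B,)."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```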


Author(s):  
Matteo Hessel ◽  
Hubert Soyer ◽  
Lasse Espeholt ◽  
Wojciech Czarnecki ◽  
Simon Schmitt ◽  
...  

The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained on one task at a time, and each new task requires training a brand-new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
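The following is a simplified sketch in the spirit of the idea described above: keep running statistics of each task's return targets and express TD errors in per-task normalized units, so that tasks with large or dense rewards do not dominate the shared agent's updates. It does not reproduce the paper's exact normalization scheme, and all names are illustrative.

```python
import numpy as np

class TaskNormalizer:
    """Running mean/scale of return targets per task, used to rescale
    TD errors so every task contributes comparably to the updates."""

    def __init__(self, n_tasks, beta=3e-4):
        self.mu = np.zeros(n_tasks)    # running first moment
        self.nu = np.ones(n_tasks)     # running second moment
        self.beta = beta               # statistics step size

    def update(self, task_id, target):
        self.mu[task_id] += self.beta * (target - self.mu[task_id])
        self.nu[task_id] += self.beta * (target ** 2 - self.nu[task_id])

    def sigma(self, task_id):
        return np.sqrt(max(self.nu[task_id] - self.mu[task_id] ** 2, 1e-4))

    def normalized_error(self, task_id, target, prediction):
        """TD error expressed in the task's normalized units."""
        return (target - prediction) / self.sigma(task_id)
```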


2020 ◽  
Author(s):  
Shoeb Shaikh ◽  
Rosa So ◽  
Tafadzwa Sibindi ◽  
Camilo Libedinsky ◽  
Arindam Basu

This paper presents the application of Banditron, an online reinforcement learning (RL) algorithm, in a discrete-state intra-cortical Brain Machine Interface (iBMI) setting. We have analyzed two datasets from non-human primates (NHPs), NHP A and NHP B, each performing a 4-option discrete control task over a total of 8 days. Results show average improvements of ≈ 15% and 6% in NHP A and 15% and 21% in NHP B over the state-of-the-art algorithms Hebbian Reinforcement Learning (HRL) and Attention Gated Reinforcement Learning (AGREL), respectively. Apart from yielding superior decoding performance, Banditron is also the most computationally friendly, as it requires two orders of magnitude fewer multiply-and-accumulate operations than HRL and AGREL. Furthermore, Banditron provides average improvements of at least 40% and 15% in NHPs A and B, respectively, compared to the popularly employed supervised methods LDA and SVM across test days. These results pave the way towards an alternative paradigm of temporally robust, hardware-friendly, reinforcement-learning-based iBMIs.
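For reference, the standard Banditron update (Kakade et al., 2008), which the paper builds on, maintains one weight vector per output class and learns from bandit feedback only, i.e., whether the sampled prediction was correct. A compact NumPy sketch with illustrative names (not the paper's decoder code):

```python
import numpy as np

class Banditron:
    """Linear multiclass learner with bandit feedback: it never observes
    the true label, only whether the sampled prediction was correct."""

    def __init__(self, n_classes, n_features, gamma=0.05, seed=0):
        self.W = np.zeros((n_classes, n_features))
        self.gamma = gamma                        # exploration rate
        self.rng = np.random.default_rng(seed)

    def predict_and_update(self, x, correct):
        """`correct` is a callback returning True iff the sampled class is
        the true one (the only feedback available in the bandit setting)."""
        k = self.W.shape[0]
        y_hat = int(np.argmax(self.W @ x))        # greedy prediction
        probs = np.full(k, self.gamma / k)
        probs[y_hat] += 1.0 - self.gamma          # epsilon-greedy exploration
        y_tilde = int(self.rng.choice(k, p=probs))
        update = np.zeros_like(self.W)
        if correct(y_tilde):
            update[y_tilde] = x / probs[y_tilde]  # unbiased positive term
        update[y_hat] -= x                        # always penalize the greedy guess
        self.W += update
        return y_tilde
```

The update requires only one matrix-vector product plus a rank-one correction per trial, which is consistent with the abstract's point about its low multiply-and-accumulate count.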

