Reinforcement Learning based Evolutionary Metric Filtering for High Dimensional Problems

Author(s):  
Bassel Ali ◽  
Koichi Moriyama ◽  
Masayuki Numao ◽  
Ken-ichi Fukui
2017 ◽  
Vol 1 (1) ◽  
pp. 98-103 ◽  
Author(s):  
Junfei Xie ◽  
Yan Wan ◽  
Kevin Mills ◽  
James J. Filliben ◽  
F. L. Lewis

Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Daesub Yoon

Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for the lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making as it increases the transparency of black-box-style DRL approach and helps the RL practitioners to understand the high-level behavior of the system better. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. The task-level interpretability is enabled by relating symbolic actions to options. This framework features a planner – controller – meta-controller architecture, which takes charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the advantages of long-term planning capability with symbolic knowledge and end-to-end reinforcement learning directly from a high-dimensional sensory input. Experimental results validate the interpretability of subtasks, along with improved data efficiency compared with state-of-the-art approaches.


2011 ◽  
Vol 23 (11) ◽  
pp. 2798-2832 ◽  
Author(s):  
Hirotaka Hachiya ◽  
Jan Peters ◽  
Masashi Sugiyama

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R[Formula: see text]), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009 .)


Sign in / Sign up

Export Citation Format

Share Document