Challenges in High-Dimensional Reinforcement Learning with Evolution Strategies

Author(s):  
Nils Müller ◽  
Tobias Glasmachers

2019 ◽  
Vol 27 (4) ◽  
pp. 699-725 ◽  
Author(s):  
Hao Wang ◽  
Michael Emmerich ◽  
Thomas Bäck

The main purpose of the recently proposed mirrored sampling technique for evolution strategies is to generate more evenly distributed samples in high-dimensional search spaces: mirroring enlarges the diversity of the mutation samples and thereby improves the convergence rate. Motivated by this technique, this article introduces a new derandomized sampling technique called mirrored orthogonal sampling. The performance of the new technique is analyzed theoretically and studied empirically on the sphere function. In particular, mirrored orthogonal sampling is applied to the well-known Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and the resulting algorithm is tested experimentally on the well-known Black-Box Optimization Benchmark (BBOB). The benchmark results show that mirrored orthogonal sampling outperforms both the standard CMA-ES and its variant using mirrored sampling.
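As a rough illustration of the sampling scheme described in the abstract, the following minimal sketch draws a batch of Gaussian mutation vectors, orthogonalizes their directions while preserving their lengths, and mirrors each one. The function name and the QR-based orthogonalization are assumptions for illustration, not the paper's reference implementation.

```python
import numpy as np

def mirrored_orthogonal_samples(n_dim, n_pairs, rng=None):
    """Hypothetical sketch: draw n_pairs orthogonal mutation directions
    and mirror each, yielding 2 * n_pairs derandomized samples.
    At most n_dim mutually orthogonal directions exist, so n_pairs <= n_dim."""
    assert n_pairs <= n_dim
    rng = np.random.default_rng() if rng is None else rng
    # Standard Gaussian mutations; keep their (chi-distributed) lengths
    # so the rescaled samples stay Gaussian-like in norm.
    z = rng.standard_normal((n_pairs, n_dim))
    lengths = np.linalg.norm(z, axis=1)
    # Orthonormalize the directions via a QR decomposition
    # (equivalent to Gram-Schmidt), then restore the original lengths.
    q, _ = np.linalg.qr(z.T)          # columns of q are orthonormal
    orth = q.T * lengths[:, None]
    # Mirroring: pair every sample with its negation.
    return np.concatenate([orth, -orth], axis=0)

# Example: 6 samples (3 mirrored pairs) in a 10-dimensional search space.
samples = mirrored_orthogonal_samples(n_dim=10, n_pairs=3)
print(samples.shape)  # (6, 10)
```

In a CMA-ES-style algorithm, such derandomized samples would replace the independent Gaussian mutations before being transformed by the adapted covariance matrix.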


2017 ◽  
Vol 1 (1) ◽  
pp. 98-103 ◽  
Author(s):  
Junfei Xie ◽  
Yan Wan ◽  
Kevin Mills ◽  
James J. Filliben ◽  
F. L. Lewis

Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Daesub Yoon

Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet it is notorious for its lack of interpretability. Interpretability of subtasks is critical in hierarchical decision-making: it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a Symbolic Deep Reinforcement Learning (SDRL) framework that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner-controller-meta-controller architecture whose three components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, combining the long-term planning capability of symbolic knowledge with end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks and show improved data efficiency compared with state-of-the-art approaches.
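To make the architecture concrete, here is a minimal, hypothetical skeleton of the planner-controller-meta-controller loop. Every class and method name is illustrative rather than the paper's API; the symbolic planner and the DRL option learner are stubbed out.

```python
import random

class SDRLAgent:
    """Toy sketch of the SDRL loop; names and logic are assumptions."""

    def __init__(self, symbolic_actions):
        self.symbolic_actions = symbolic_actions
        # Meta-controller's running value estimate for each subtask.
        self.values = {a: 0.0 for a in symbolic_actions}

    def plan(self):
        # Planner: schedule subtasks, here greedily by current value
        # (a stand-in for a full symbolic planner).
        return sorted(self.symbolic_actions, key=lambda a: -self.values[a])

    def run_option(self, action):
        # Controller: a DRL sub-policy (option) would learn the subtask from
        # high-dimensional input; stubbed here as a noisy reward signal.
        return random.random()

    def episode(self):
        for action in self.plan():                 # subtask scheduling
            reward = self.run_option(action)       # data-driven subtask learning
            # Meta-controller: evaluate the subtask and feed the estimate
            # back to the planner for the next scheduling round.
            self.values[action] += 0.1 * (reward - self.values[action])

agent = SDRLAgent(["reach_key", "open_door", "get_treasure"])
for _ in range(20):
    agent.episode()
print(agent.values)
```

The feedback loop in `episode` mirrors the cross-fertilization described above: the planner's schedule depends on the meta-controller's evaluations, which in turn depend on what the controllers have learned.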

