Can Meta-Interpretive Learning outperform Deep Reinforcement Learning of Evaluable Game strategies?

Author(s):  
Céline Hocquette

World-class human players have been outperformed in a number of complex two-person games, such as Go, by Deep Reinforcement Learning systems. However, these systems have several drawbacks: 1) their data efficiency is unclear, as they appear to require far more training games to achieve such performance than any human player could experience in a lifetime; 2) they are not easily interpretable, since they provide little explanation of how decisions are made; 3) they do not transfer learned strategies to other games. In this work we study how an explicit logical representation can overcome these limitations, and we introduce a new logical system called MIGO, designed for learning optimal strategies for two-player games. MIGO benefits from a strong inductive bias, which allows it to learn efficiently from a few examples of played games. Additionally, MIGO's learned rules are relatively easy to comprehend and are demonstrated to achieve significant transfer learning.

2020 ◽  
Vol 42 (15) ◽  
pp. 2919-2928
Author(s):  
He Ren ◽  
Jing Dai ◽  
Huaguang Zhang ◽  
Kun Zhang

Benefitting from integral reinforcement learning (IRL), this paper effectively solves the nonzero-sum (NZS) game for distributed parameter systems when the system dynamics are unavailable. The Karhunen-Loève decomposition (KLD) is employed to convert the partial differential equation (PDE) system into a high-order ordinary differential equation (ODE) system. Moreover, off-policy IRL is introduced to design the optimal strategies for the NZS game. To confirm that the presented algorithm converges to the optimal value functions, the traditional adaptive dynamic programming (ADP) method is first discussed; the equivalence between the traditional ADP method and the presented off-policy method is then proved. To implement the presented off-policy IRL method, actor and critic neural networks are used to approximate the value functions and control strategies, respectively, during the iteration process. Finally, a numerical simulation illustrates the effectiveness of the proposed off-policy algorithm.
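The KLD step described above is, in practice, often computed by the method of snapshots: stack sampled PDE states as columns and take an SVD, so the leading left singular vectors are the energy-ordered KL modes onto which the PDE is projected to obtain a low-order ODE system. A minimal sketch of that step, using random data in place of real PDE snapshots (all names and dimensions here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Hypothetical snapshot matrix: each column is the PDE state sampled on a
# spatial grid at one time instant (random stand-in for real simulation data).
rng = np.random.default_rng(0)
snapshots = rng.standard_normal((200, 50))  # 200 grid points, 50 time samples

# Karhunen-Loeve decomposition via SVD of the snapshot matrix:
# columns of U are the spatial KL modes, ordered by energy (singular values).
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)

# Truncate to the r dominant modes; projecting the PDE dynamics onto these
# modes yields an r-dimensional ODE system for the modal coefficients.
r = 5
modes = U[:, :r]                # 200 x 5 spatial basis
coeffs = modes.T @ snapshots    # 5 x 50 reduced (ODE-state) trajectories

# Rank-r reconstruction of the field; the error shrinks as r grows.
recon = modes @ coeffs
```

The off-policy IRL iteration in the paper would then operate on the reduced ODE state `coeffs` rather than on the full PDE field.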


1977 ◽  
Vol 14 (4) ◽  
pp. 795-805 ◽  
Author(s):  
Ernst-Erich Doberkat

A dynamic programming approach is taken to the investigation of learning systems. Using one-stage decision models and dynamic programs, respectively, two learning models are formulated, and the existence of optimal strategies for learning in each model is proved.
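The core of such a dynamic-programming treatment is backward induction: the value of a state at stage t is the best one-stage reward plus the value of the successor at stage t+1. A minimal sketch on a toy finite-horizon model (the states, actions, and rewards below are invented for illustration, not the paper's models):

```python
# Toy deterministic learning model: states 0..4, action 1 ("advance") moves
# one state forward, action 0 stays; reward 1 for first reaching state 4.
N_STATES, HORIZON = 5, 4

def step(s, a):
    return min(s + a, N_STATES - 1)

def reward(s, a, s2):
    return 1.0 if s2 == N_STATES - 1 and s != N_STATES - 1 else 0.0

# Backward induction: V[t][s] = max_a [ r(s, a, s') + V[t+1][s'] ],
# with terminal values V[HORIZON][s] = 0.
V = [[0.0] * N_STATES for _ in range(HORIZON + 1)]
policy = [[0] * N_STATES for _ in range(HORIZON)]
for t in range(HORIZON - 1, -1, -1):
    for s in range(N_STATES):
        best_a, best_v = 0, float("-inf")
        for a in (0, 1):
            s2 = step(s, a)
            v = reward(s, a, s2) + V[t + 1][s2]
            if v > best_v:
                best_a, best_v = a, v
        V[t][s] = best_v
        policy[t][s] = best_a
```

The resulting `policy` table is an optimal strategy in exactly the sense the abstract refers to: for each stage and state it prescribes an action attaining the maximal value.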


Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Daesub Yoon

Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet it is notorious for its lack of interpretability. Interpretability of subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a Symbolic Deep Reinforcement Learning (SDRL) framework that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner-controller-meta-controller architecture, whose three components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, combining the long-term planning capability of symbolic knowledge with end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.
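The link between symbolic actions and options mentioned above follows the standard options formalism: an option bundles an initiation condition, a low-level policy, and a termination condition. A minimal sketch in a toy one-dimensional world (the `Option` class, the `goto(5)` action, and the world itself are illustrative assumptions, not the authors' implementation):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """A symbolic action realized as an option: (initiation, policy, termination)."""
    name: str                         # symbolic action this option implements
    can_start: Callable[[int], bool]  # initiation set membership test
    policy: Callable[[int], int]      # maps state -> primitive action
    done: Callable[[int], bool]       # termination condition

# Toy 1-D world: state is an integer position; the planner's symbolic action
# "goto(5)" becomes an option that steps right until position 5 is reached.
goto5 = Option(
    name="goto(5)",
    can_start=lambda s: s < 5,
    policy=lambda s: +1,
    done=lambda s: s >= 5,
)

def execute(option, state):
    """Run the option's closed-loop policy until its termination condition holds."""
    assert option.can_start(state)
    steps = 0
    while not option.done(state):
        state += option.policy(state)
        steps += 1
    return state, steps

final, n = execute(goto5, 0)  # ends at state 5 after 5 primitive steps
```

In the SDRL setting, the planner would select such options by name, the controller would learn each option's internal policy from sensory input, and the meta-controller would evaluate whether the executed subtask achieved its symbolic effect.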


2008 ◽  
Vol 74 (739) ◽  
pp. 692-701 ◽  
Author(s):  
Takeshi TATEYAMA ◽  
Seiichi KAWATA ◽  
Yoshiki SHIMOMURA
