Can Meta-Interpretive Learning outperform Deep Reinforcement Learning of Evaluable Game strategies?

Author(s):  
Céline Hocquette

World-class human players have been outperformed in a number of complex two-person games, such as Go, by Deep Reinforcement Learning systems. However, these systems have several drawbacks: 1) their data efficiency is unclear, as they appear to require far more training games to achieve such performance than any human player could experience in a lifetime; 2) they are not easily interpretable, since they provide little explanation of how decisions are made; 3) they do not transfer learned strategies to other games. In this work we study how an explicit logical representation can overcome these limitations, and we introduce a new logical system called MIGO, designed for learning optimal strategies for two-player games. MIGO benefits from a strong inductive bias, which allows it to learn efficiently from a few examples of played games. Additionally, MIGO's learned rules are relatively easy to comprehend and are demonstrated to achieve significant transfer learning.

2020 ◽  
Vol 42 (15) ◽  
pp. 2919-2928
Author(s):  
He Ren ◽  
Jing Dai ◽  
Huaguang Zhang ◽  
Kun Zhang

Benefitting from integral reinforcement learning (IRL), this paper effectively solves the nonzero-sum (NZS) game for distributed parameter systems when the system dynamics are unavailable. The Karhunen-Loève decomposition (KLD) is employed to convert the partial differential equation (PDE) system into a high-order ordinary differential equation (ODE) system. Moreover, off-policy IRL is introduced to design the optimal strategies for the NZS game. To confirm that the presented algorithm converges to the optimal value functions, the traditional adaptive dynamic programming (ADP) method is first discussed; the equivalence between the traditional ADP method and the presented off-policy method is then proved. To implement the presented off-policy IRL method, actor and critic neural networks are used to approximate the value functions and control strategies, respectively, during the iteration process. Finally, a numerical simulation illustrates the effectiveness of the proposed off-policy algorithm.
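The KLD step described above is, in practice, often computed by the method of snapshots: stack sampled PDE states as columns and take an SVD, so the leading left singular vectors are the energy-ordered KL modes onto which the PDE is projected to obtain a low-order ODE system. A minimal sketch of that step, using random data in place of real PDE snapshots (all names and dimensions here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Hypothetical snapshot matrix: each column is the PDE state sampled on a
# spatial grid at one time instant (random stand-in for real simulation data).
rng = np.random.default_rng(0)
snapshots = rng.standard_normal((200, 50))  # 200 grid points, 50 time samples

# Karhunen-Loeve decomposition via SVD of the snapshot matrix:
# columns of U are the spatial KL modes, ordered by energy (singular values).
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)

# Truncate to the r dominant modes; projecting the PDE dynamics onto these
# modes yields an r-dimensional ODE system for the modal coefficients.
r = 5
modes = U[:, :r]                # 200 x 5 spatial basis
coeffs = modes.T @ snapshots    # 5 x 50 reduced (ODE-state) trajectories

# Rank-r reconstruction of the field; the error shrinks as r grows.
recon = modes @ coeffs
```

The off-policy IRL iteration in the paper would then operate on the reduced ODE state `coeffs` rather than on the full PDE field.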


1977 ◽  
Vol 14 (4) ◽  
pp. 795-805 ◽  
Author(s):  
Ernst-Erich Doberkat

A dynamic programming approach is taken to the investigation of learning systems. Using one-stage decision models and dynamic programs, respectively, two learning models are formulated, and the existence of optimal strategies for learning in each model is proved.
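The core of such a dynamic-programming treatment is backward induction: the value of a state at stage t is the best one-stage reward plus the value of the successor at stage t+1. A minimal sketch on a toy finite-horizon model (the states, actions, and rewards below are invented for illustration, not the paper's models):

```python
# Toy deterministic learning model: states 0..4, action 1 ("advance") moves
# one state forward, action 0 stays; reward 1 for first reaching state 4.
N_STATES, HORIZON = 5, 4

def step(s, a):
    return min(s + a, N_STATES - 1)

def reward(s, a, s2):
    return 1.0 if s2 == N_STATES - 1 and s != N_STATES - 1 else 0.0

# Backward induction: V[t][s] = max_a [ r(s, a, s') + V[t+1][s'] ],
# with terminal values V[HORIZON][s] = 0.
V = [[0.0] * N_STATES for _ in range(HORIZON + 1)]
policy = [[0] * N_STATES for _ in range(HORIZON)]
for t in range(HORIZON - 1, -1, -1):
    for s in range(N_STATES):
        best_a, best_v = 0, float("-inf")
        for a in (0, 1):
            s2 = step(s, a)
            v = reward(s, a, s2) + V[t + 1][s2]
            if v > best_v:
                best_a, best_v = a, v
        V[t][s] = best_v
        policy[t][s] = best_a
```

The resulting `policy` table is an optimal strategy in exactly the sense the abstract refers to: for each stage and state it prescribes an action attaining the maximal value.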


Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Daesub Yoon

Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet it is notorious for its lack of interpretability. Interpretability of subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a Symbolic Deep Reinforcement Learning (SDRL) framework that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner-controller-meta-controller architecture, whose three components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, combining the long-term planning capability of symbolic knowledge with end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.
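The link between symbolic actions and options mentioned above follows the standard options formalism: an option bundles an initiation condition, a low-level policy, and a termination condition. A minimal sketch in a toy one-dimensional world (the `Option` class, the `goto(5)` action, and the world itself are illustrative assumptions, not the authors' implementation):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """A symbolic action realized as an option: (initiation, policy, termination)."""
    name: str                         # symbolic action this option implements
    can_start: Callable[[int], bool]  # initiation set membership test
    policy: Callable[[int], int]      # maps state -> primitive action
    done: Callable[[int], bool]       # termination condition

# Toy 1-D world: state is an integer position; the planner's symbolic action
# "goto(5)" becomes an option that steps right until position 5 is reached.
goto5 = Option(
    name="goto(5)",
    can_start=lambda s: s < 5,
    policy=lambda s: +1,
    done=lambda s: s >= 5,
)

def execute(option, state):
    """Run the option's closed-loop policy until its termination condition holds."""
    assert option.can_start(state)
    steps = 0
    while not option.done(state):
        state += option.policy(state)
        steps += 1
    return state, steps

final, n = execute(goto5, 0)  # ends at state 5 after 5 primitive steps
```

In the SDRL setting, the planner would select such options by name, the controller would learn each option's internal policy from sensory input, and the meta-controller would evaluate whether the executed subtask achieved its symbolic effect.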


2008 ◽  
Vol 74 (739) ◽  
pp. 692-701 ◽  
Author(s):  
Takeshi TATEYAMA ◽  
Seiichi KAWATA ◽  
Yoshiki SHIMOMURA
