Adaptive Agents in Minecraft: A Hybrid Paradigm for Combining Domain Knowledge with Reinforcement Learning

Robotic Information Gathering With Reinforcement Learning Assisted by Domain Knowledge: An Application to Gas Source Localization

IEEE Access ◽

10.1109/access.2021.3052024 ◽

2021 ◽

Vol 9 ◽

pp. 13159-13172

Author(s):

Thomas Wiedemann ◽

Cosmin Vlaicu ◽

Josip Josifovski ◽

Alberto Viseras

Keyword(s):

Reinforcement Learning ◽

Source Localization ◽

Domain Knowledge ◽

Information Gathering ◽

Gas Source ◽

Gas Source Localization

Download Full-text

Leveraging Domain Knowledge for Robust Deep Reinforcement Learning in Networking

IEEE INFOCOM 2021 - IEEE Conference on Computer Communications ◽

10.1109/infocom42981.2021.9488863 ◽

2021 ◽

Author(s):

Ying Zheng ◽

Haoyu Chen ◽

Qingyang Duan ◽

Lixiang Lin ◽

Yiyang Shao ◽

...

Keyword(s):

Reinforcement Learning ◽

Domain Knowledge

Download Full-text

A Deep Reinforcement Learning Method for Freight Train Driving Based on Domain Knowledge and Mass Estimation Network

10.1145/3468891.3468898 ◽

2021 ◽

Author(s):

Wenlong Feng ◽

Wei Dong ◽

Shouchao Zhai ◽

Guohua Zhang ◽

Xinya Sun ◽

...

Keyword(s):

Reinforcement Learning ◽

Domain Knowledge ◽

Learning Method ◽

Freight Train ◽

Mass Estimation

Download Full-text

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Science ◽

10.1126/science.aar6404 ◽

2018 ◽

Vol 362 (6419) ◽

pp. 1140-1144 ◽

Cited By ~ 388

Author(s):

David Silver ◽

Thomas Hubert ◽

Julian Schrittwieser ◽

Ioannis Antonoglou ◽

Matthew Lai ◽

...

Keyword(s):

Artificial Intelligence ◽

Reinforcement Learning ◽

Domain Knowledge ◽

Learning Algorithm ◽

Search Techniques ◽

Domain Specific ◽

Evaluation Functions ◽

History Of ◽

World Champion ◽

Reinforcement Learning Algorithm

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

Download Full-text

Verifiable and Interpretable Reinforcement Learning through Program Synthesis

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019902 ◽

2019 ◽

Vol 33 ◽

pp. 9902-9903

Author(s):

Abhinav Verma

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Reinforcement Learning ◽

Programming Languages ◽

Formal Methods ◽

Domain Knowledge ◽

Policy Search ◽

Safety Critical ◽

Symbolic Methods

We study the problem of generating interpretable and verifiable policies for Reinforcement Learning (RL). Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim of this work is to find policies that can be represented in highlevel programming languages. Such programmatic policies have several benefits, including being more easily interpreted than neural networks, and being amenable to verification by scalable symbolic methods. The generation methods for programmatic policies also provide a mechanism for systematically using domain knowledge for guiding the policy search. The interpretability and verifiability of these policies provides the opportunity to deploy RL based solutions in safety critical environments. This thesis draws on, and extends, work from both the machine learning and formal methods communities.

Download Full-text

Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301881 ◽

2019 ◽

Vol 33 ◽

pp. 881-889 ◽

Cited By ~ 1

Author(s):

Michael Dann ◽

Fabio Zambetta ◽

John Thangarajah

Keyword(s):

Reinforcement Learning ◽

Domain Knowledge ◽

State Of The Art ◽

Significant Challenge ◽

Intrinsic Reward ◽

Art Methods ◽

Efficient Exploration

Sparse reward games, such as the infamous Montezuma’s Revenge, pose a significant challenge for Reinforcement Learning (RL) agents. Hierarchical RL, which promotes efficient exploration via subgoals, has shown promise in these games. However, existing agents rely either on human domain knowledge or slow autonomous methods to derive suitable subgoals. In this work, we describe a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods. We propose a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari games with sparse rewards. Our agent’s performance is comparable to that of state-of-the-art methods, demonstrating the usefulness of the subgoals found.

Download Full-text

Navigation in Unknown Dynamic Environments Based on Deep Reinforcement Learning

Sensors ◽

10.3390/s19183837 ◽

2019 ◽

Vol 19 (18) ◽

pp. 3837 ◽

Cited By ~ 7

Author(s):

Junjie Zeng ◽

Rusheng Ju ◽

Long Qin ◽

Yue Hu ◽

Quanjun Yin ◽

...

Keyword(s):

Reinforcement Learning ◽

Domain Knowledge ◽

Moving Objects ◽

Dynamic Environment ◽

Dynamic Environments ◽

Continuous Control ◽

Complex Environments ◽

Reward Function ◽

Knowledge Based ◽

Task Architecture

In this paper, we propose a novel Deep Reinforcement Learning (DRL) algorithm which can navigate non-holonomic robots with continuous control in an unknown dynamic environment with moving obstacles. We call the approach MK-A3C (Memory and Knowledge-based Asynchronous Advantage Actor-Critic) for short. As its first component, MK-A3C builds a GRU-based memory neural network to enhance the robot’s capability for temporal reasoning. Robots without it tend to suffer from a lack of rationality in face of incomplete and noisy estimations for complex environments. Additionally, robots with certain memory ability endowed by MK-A3C can avoid local minima traps by estimating the environmental model. Secondly, MK-A3C combines the domain knowledge-based reward function and the transfer learning-based training task architecture, which can solve the non-convergence policies problems caused by sparse reward. These improvements of MK-A3C can efficiently navigate robots in unknown dynamic environments, and satisfy kinetic constraints while handling moving objects. Simulation experiments show that compared with existing methods, MK-A3C can realize successful robotic navigation in unknown and challenging environments by outputting continuous acceleration commands.

Download Full-text

Batch Reinforcement Learning of Feasible Trajectories in a Ship Maneuvering Simulator

10.5753/eniac.2018.4422 ◽

2018 ◽

Author(s):

José Amendola ◽

Eduardo A. Tannuri ◽

Fabio G. Cozman ◽

Anna H. Reali

Keyword(s):

Reinforcement Learning ◽

Learning Process ◽

Domain Knowledge ◽

State Space Model ◽

Ship Maneuvering ◽

Space Model ◽

Control Signals ◽

Batch Reinforcement Learning ◽

Ship Control ◽

Compact State

Ship control in port channels is a challenging problem that has resisted automated solutions. In this paper we focus on reinforcement learning of control signals so as to steer ships in their maneuvers. The learning process uses fitted Q iteration together with a Ship Maneuvering Simulator. Domain knowledge is used to develop a compact state-space model; we show how this model and the learning process lead to ship maneuvering under difficult conditions.

Download Full-text