Learning Partially Observable Deterministic Action Models

We present exact algorithms for identifying the effects and preconditions of deterministic actions in dynamic, partially observable domains. They apply when one does not know the action model (the way actions affect the world) of a domain and must learn it from partial observations over time. Such scenarios are common in real-world applications. They are challenging for AI tasks because the traditional domain structures that underlie tractability (e.g., conditional independence) fail there (e.g., world features become correlated). Our work departs from traditional assumptions about partial observations and action models. In particular, it focuses on problems in which actions are deterministic and of simple logical structure, and in which observation models have all features observed with some frequency. We obtain tractable algorithms for the modified problem in such domains. Our algorithms take sequences of partial observations over time as input and output deterministic action models that could have led to those observations. The algorithms output all or one of those models (depending on our choice) and are exact in that no model is misclassified given the observations. Our algorithms take polynomial time in the number of time steps and state features for some traditional action classes examined in the AI-planning literature, e.g., STRIPS actions. In contrast, traditional approaches for HMMs and reinforcement learning are inexact and exponentially intractable for such domains. Our experiments verify the theoretical tractability guarantees and show that we identify action models exactly. Several applications in planning, autonomous exploration, and adventure-game playing already use these results. They are also promising for probabilistic settings, partially observable reinforcement learning, and diagnosis.
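
To make the input and output of such a learner concrete, here is a minimal brute-force sketch. It is not the tractable algorithm the abstract describes: it enumerates every candidate model and every initial state, so it is exponential in the number of features. It assumes a single action with no preconditions, and the effect labels (`set_true`, `set_false`, `keep`) and the function name `consistent_models` are hypothetical names chosen for illustration.

```python
from itertools import product

# Hypothetical effect labels: what the action does to one boolean feature.
EFFECTS = ("set_true", "set_false", "keep")


def apply_effect(value, effect):
    """Return the feature value after the action, given its effect label."""
    if effect == "set_true":
        return True
    if effect == "set_false":
        return False
    return value  # "keep": the action leaves the feature unchanged


def consistent_models(features, observations):
    """Return every effect assignment that could have produced the observations.

    observations[t] maps the features observed at time t to their values;
    the (single) action is executed between consecutive time steps.
    """
    models = []
    for effects in product(EFFECTS, repeat=len(features)):
        model = dict(zip(features, effects))
        # A model is kept if at least one initial state explains all observations.
        for init in product((False, True), repeat=len(features)):
            state = dict(zip(features, init))
            explains = True
            for t, obs in enumerate(observations):
                if t > 0:
                    state = {f: apply_effect(v, model[f]) for f, v in state.items()}
                if any(state[f] != v for f, v in obs.items()):
                    explains = False
                    break
            if explains:
                models.append(model)
                break
    return models


if __name__ == "__main__":
    obs = [{"door_open": False}, {"door_open": True}, {"light_on": True}]
    for m in consistent_models(["door_open", "light_on"], obs):
        print(m)
```

The sketch is exact in the same sense as the abstract: a model is returned if and only if some initial state makes it agree with every partial observation.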

ARMS: an automatic knowledge engineering tool for learning action models for AI planning

The Knowledge Engineering Review ◽ 10.1017/s0269888907001087 ◽ 2007 ◽ Vol 22 (2) ◽ pp. 135-152 ◽ Cited By ~ 8

Author(s): Kangheng Wu ◽ Qiang Yang ◽ Yunfei Jiang

Keyword(s): Knowledge Engineering ◽ Learning System ◽ Propositional Satisfiability ◽ AI Planning ◽ Model Learning ◽ Frequent Sets ◽ Action Model ◽ Action Models ◽ Definition Of ◽ Modelling System

We present an action model learning system known as ARMS (Action-Relation Modelling System) for automatically discovering action models from a set of successfully observed plans. Current artificial intelligence (AI) planners show impressive performance in many real-world and artificial domains, but they all require the definition of an action model. ARMS is aimed at automatically learning action models from observed example plans, where each example plan is a sequence of action traces. Human editors can then refine these action models. The expectation is that this system will lessen the burden on human editors of designing action models from scratch. In this paper, we describe ARMS in detail. To learn action models, ARMS gathers knowledge on the statistical distribution of frequent sets of actions in the example plans. It then builds a weighted propositional satisfiability (weighted SAT) problem and solves it using a weighted MAXSAT solver. Furthermore, we show empirical evidence that ARMS can indeed learn a good approximation of the final action models effectively.
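
The weighted MAXSAT step can be illustrated with a tiny brute-force solver. This is a sketch of the general problem only, not ARMS's encoding or solver: the clause set, the weights, and the meaning given to the variables in the comments are invented for illustration, and the function name `weighted_maxsat` is hypothetical. Literals follow the common DIMACS convention (`3` means variable 3 is true, `-3` means it is false).

```python
from itertools import product


def weighted_maxsat(num_vars, weighted_clauses):
    """Return (best_weight, assignment) maximizing the total weight of satisfied clauses.

    weighted_clauses is a list of (weight, clause) pairs, where a clause is a
    list of non-zero integer literals. Exhaustive search: only for tiny inputs.
    """
    best_weight, best_assignment = float("-inf"), None
    for bits in product((False, True), repeat=num_vars):
        weight = sum(
            w
            for w, clause in weighted_clauses
            # A clause is satisfied if any of its literals matches the assignment.
            if any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
        )
        if weight > best_weight:
            best_weight, best_assignment = weight, bits
    return best_weight, best_assignment


if __name__ == "__main__":
    # Hypothetical reading of the variables (not taken from the paper):
    #   x1: "predicate p is a precondition of action a"
    #   x2: "predicate p is in the delete list of action a"
    clauses = [
        (5, [1]),      # strong evidence that p is a precondition
        (3, [-1, 2]),  # if p is a precondition, it tends to be deleted
        (2, [-2]),     # weak evidence against deletion
        (1, [-1]),     # weak evidence against the precondition
    ]
    print(weighted_maxsat(2, clauses))
```

Because the clauses conflict, no assignment satisfies all of them; the solver returns the assignment whose satisfied clauses carry the most total weight, which is how soft, statistically derived constraints can still yield a single model.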

Benefits of combining dimensional attention and working memory for partially observable reinforcement learning problems

Proceedings of the 2021 ACM Southeast Conference ◽ 10.1145/3409334.3452072 ◽ 2021

Author(s): Ngozi Omatu ◽ Joshua L. Phillips

Keyword(s): Working Memory ◽ Reinforcement Learning ◽ Learning Problems ◽ Partially Observable

Partially observable environment estimation with uplift inference for reinforcement learning based recommendation

Machine Learning ◽ 10.1007/s10994-021-05969-w ◽ 2021

Author(s): Wenjie Shang ◽ Qingyang Li ◽ Zhiwei Qin ◽ Yang Yu ◽ Yiping Meng ◽ ...

Keyword(s): Reinforcement Learning ◽ Partially Observable

On Thompson Sampling and Asymptotic Optimality

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2017/688 ◽ 2017 ◽ Cited By ~ 3

Author(s): Jan Leike ◽ Tor Lattimore ◽ Laurent Orseau ◽ Marcus Hutter

Keyword(s): Reinforcement Learning ◽ Asymptotic Optimality ◽ Thompson Sampling ◽ Stochastic Environments ◽ Optimal Value ◽ Partially Observable ◽ General Stochastic

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value and (2) given a recoverability assumption regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
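
For readers unfamiliar with Thompson sampling, the following sketch shows the idea in a much simpler setting than the one in the abstract: a Bernoulli multi-armed bandit with Beta posteriors, rather than a countable class of general stochastic environments. The function name `thompson_bernoulli`, the arm probabilities, and the uniform Beta(1, 1) prior are assumptions made for illustration.

```python
import random


def thompson_bernoulli(true_probs, rounds, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return pull counts per arm.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    unknown success probability, i.e. a uniform prior updated by observed rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Sample one plausible success probability per arm from its posterior,
        # then act optimally with respect to that sample.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_bernoulli([0.2, 0.8], rounds=2000))
```

Sampling from the posterior and then acting greedily on the sample is what lets the agent keep exploring arms it is still uncertain about while concentrating pulls on the arm that looks best.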

Search-Based Planning and Reinforcement Learning for Autonomous Systems and Robotics

10.36227/techrxiv.11607348.v1 ◽ 2020

Author(s): Than Le

Keyword(s): Reinforcement Learning ◽ Autonomous Vehicles ◽ Optimal Trajectory ◽ Autonomous Systems ◽ Autonomous Mobile Robots ◽ Artificial Intelligent ◽ Unstructured Environment ◽ And Robotics ◽ Over Time ◽ Basic Prerequisite

In this chapter, we argue that competent autonomous vehicles should be able to analyze structured and unstructured environments and then localize themselves relative to their surroundings where GPS, RFID, or similar means cannot give enough information about location. Reliable SLAM is the most basic prerequisite for any further artificial intelligence task of an autonomous mobile robot. The goal of this paper is to simulate a SLAM process on an advanced software development platform. The model represents the system itself, whereas the simulation represents the operation of the system over time. The software architecture helps us focus our work and reach our goal with the least trivial effort. It is an open-source meta-operating system that provides extensive tools for robotics-related problems.

Specifically, we argue that advanced vehicles should be able to analyze structured and unstructured environments by solving search-based planning problems, and we then discuss reinforcement learning-based models for finding optimal trajectories to apply to autonomous systems.

A Menu of Designs for Reinforcement Learning Over Time

Neural Networks for Control ◽ 10.7551/mitpress/4939.003.0007 ◽ 1991

Keyword(s): Reinforcement Learning ◽ Over Time

Pseudo Random Number Generation through Reinforcement Learning and Recurrent Neural Networks

Algorithms ◽ 10.3390/a13110307 ◽ 2020 ◽ Vol 13 (11) ◽ pp. 307

Author(s): Luca Pasqualini ◽ Maurizio Parton

Keyword(s): Reinforcement Learning ◽ Random Number ◽ Short Term Memory ◽ Random Number Generator ◽ Random Number Generation ◽ Time Step ◽ Software Applications ◽ Pseudo Random Number ◽ Markov Decision ◽ Partially Observable

A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences. These sequences are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence, and the observation at each time step is the last sequence of bits appended to such a state. We use a Long Short-Term Memory (LSTM) architecture to model the temporal relationship between observations at different time steps by tasking the LSTM memory with the extraction of significant features of the hidden portion of the MDP's states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture largely improves the results of the fully observable feedforward RL approach introduced in previous work.
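
As an example of the kind of statistical check a PRNG test suite performs on a bit sequence, the sketch below implements the frequency (monobit) test in the form given in NIST SP 800-22. Whether the paper uses this particular test is not stated in the abstract; the function name `monobit_p_value` is a hypothetical choice, and the 0.01 threshold in the usage example is a conventional significance level, not a value from the paper.

```python
import math


def monobit_p_value(bits):
    """Return the p-value of the frequency (monobit) test for a bit sequence.

    The test maps each bit to +1 or -1, sums the results, and checks whether
    the normalized sum is plausible for a fair coin. A small p-value suggests
    the sequence has too many ones or too many zeros to look random.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    balanced = [1, 0] * 50   # equal numbers of ones and zeros
    constant = [1] * 100     # all ones
    for name, seq in (("balanced", balanced), ("constant", constant)):
        p = monobit_p_value(seq)
        verdict = "passes" if p >= 0.01 else "fails"
        print(f"{name}: p = {p:.3g}, {verdict} at the 0.01 level")
```

Note that the balanced sequence passes even though it is perfectly periodic: a single test checks a single property, which is why suites combine many tests.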

The Application of Deep Reinforcement Learning to Distributed Spectrum Access in Dynamic Heterogeneous Environments With Partial Observations

IEEE Transactions on Wireless Communications ◽ 10.1109/twc.2020.2984227 ◽ 2020 ◽ Vol 19 (7) ◽ pp. 4494-4506 ◽ Cited By ~ 1

Author(s): Yue Xu ◽ Jianyuan Yu ◽ R. Michael Buehrer

Keyword(s): Reinforcement Learning ◽ Heterogeneous Environments ◽ Spectrum Access ◽ Partial Observations

Adaptive Quantitative Trading: An Imitative Deep Reinforcement Learning Approach

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i02.5587 ◽ 2020 ◽ Vol 34 (02) ◽ pp. 2128-2135

Author(s): Yang Liu ◽ Qi Liu ◽ Hongke Zhao ◽ Zhen Pan ◽ Chuanren Liu

Keyword(s): Reinforcement Learning ◽ Trading Strategies ◽ Financial Data ◽ Imitation Learning ◽ Market Condition ◽ Exploration And Exploitation ◽ Markov Decision ◽ Trading Model ◽ Trading Agent ◽ Partially Observable

In recent years, considerable effort has been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing methods in QT face challenges such as representing noisy high-frequency financial data and balancing the trading agent's exploration and exploitation. To address these challenges, we propose an adaptive trading model, namely iRDPG, to automatically develop QT strategies with an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). We also introduce imitation learning to leverage classical trading strategies, which helps balance exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequency data. Experimental results demonstrate that our model can extract robust market features and adapt to different markets.

Abstraction in Model Based Partially Observable Reinforcement Learning Using Extended Sequence Trees

2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology ◽ 10.1109/wi-iat.2012.161 ◽ 2012 ◽ Cited By ~ 1

Author(s): Erkin Cilden ◽ Faruk Polat

Keyword(s): Reinforcement Learning ◽ Model Based ◽ Extended Sequence ◽ Partially Observable
