Learning Partially Observable Deterministic Action Models

2008 ◽  
Vol 33 ◽  
pp. 349-402 ◽  
Author(s):  
E. Amir ◽  
A. Chang

We present exact algorithms for identifying the effects and preconditions of deterministic actions in dynamic, partially observable domains. They apply when one does not know the action model (the way actions affect the world) of a domain and must learn it from partial observations over time. Such scenarios are common in real-world applications. They are challenging for AI tasks because the traditional domain structures that underlie tractability (e.g., conditional independence) fail there (e.g., world features become correlated). Our work departs from traditional assumptions about partial observations and action models. In particular, it focuses on problems in which actions are deterministic and of simple logical structure, and in which every feature is observed with some frequency. For such domains, we present tractable algorithms for the modified problem. Our algorithms take sequences of partial observations over time as input and output deterministic action models that could have led to those observations. The algorithms output all or one of those models (depending on our choice) and are exact in that no model is misclassified given the observations. Our algorithms take polynomial time in the number of time steps and state features for some traditional action classes examined in the AI-planning literature, e.g., STRIPS actions. In contrast, traditional approaches for HMMs and Reinforcement Learning are inexact and intractable (exponential-time) for such domains. Our experiments verify the theoretical tractability guarantees and show that we identify action models exactly. Several applications in planning, autonomous exploration, and adventure-game playing already use these results. They are also promising for probabilistic settings, partially observable reinforcement learning, and diagnosis.
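
To make "exact" concrete, the following is a brute-force sketch of the learning target, not the authors' algorithm: it enumerates every candidate model and keeps exactly those consistent with the partial observations, so no model is misclassified. It is exponential in the numbers of actions and fluents, which is precisely what the paper's polynomial-time algorithms avoid for STRIPS-like classes. Assumptions made only for illustration: each action has one unconditional effect per fluent ("add", "del", or "none"), preconditions are omitted (actions are always executable), and `observations[t]` is a dict of the fluents observed before step `t`, with one extra entry after the last action. All function names are hypothetical.

```python
from itertools import product

# Hypothetical effect vocabulary: unconditional STRIPS-style effects per (action, fluent).
EFFECTS = ("add", "del", "none")


def apply_action(state, model, action):
    """Deterministically apply an action's effects to a state (dict fluent -> bool)."""
    next_state = dict(state)
    for fluent, effect in model[action].items():
        if effect == "add":
            next_state[fluent] = True
        elif effect == "del":
            next_state[fluent] = False
    return next_state


def is_consistent(model, fluents, plan, observations):
    """True if some initial state, rolled forward under `model`, matches every partial observation."""
    for values in product((False, True), repeat=len(fluents)):
        state = dict(zip(fluents, values))
        for t, obs in enumerate(observations):
            if any(state[f] != v for f, v in obs.items()):
                break  # this initial state contradicts an observation
            if t < len(plan):
                state = apply_action(state, model, plan[t])
        else:
            return True  # no contradiction along the whole trajectory
    return False


def consistent_models(fluents, actions, plan, observations):
    """Enumerate all effect assignments and return exactly the consistent ones."""
    slots = [(a, f) for a in actions for f in fluents]
    models = []
    for choice in product(EFFECTS, repeat=len(slots)):
        model = {a: {} for a in actions}
        for (a, f), effect in zip(slots, choice):
            model[a][f] = effect
        if is_consistent(model, fluents, plan, observations):
            models.append(model)
    return models
```

For example, if `a` is seen false before action `x` and true after, every surviving model must make `x` add `a`, while `x`'s effect on the never-observed fluent `b` stays open: `consistent_models(["a", "b"], ["x"], ["x"], [{"a": False}, {"a": True}])` returns three models, one per possible effect on `b`.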

2007 ◽  
Vol 22 (2) ◽  
pp. 135-152 ◽  
Author(s):  
Kangheng Wu ◽  
Qiang Yang ◽  
Yunfei Jiang

We present an action model learning system known as ARMS (Action-Relation Modelling System) for automatically discovering action models from a set of successfully observed plans. Current artificial intelligence (AI) planners show impressive performance in many real-world and artificial domains, but they all require the definition of an action model. ARMS aims to learn action models automatically from observed example plans, where each example plan is a sequence of action traces. Human editors can then refine the learned action models; the expectation is that this system will lessen their burden compared with designing action models from scratch. In this paper, we describe ARMS in detail. To learn action models, ARMS gathers knowledge on the statistical distribution of frequent sets of actions in the example plans. It then builds a weighted propositional satisfiability (weighted SAT) problem and solves it using a weighted MAX-SAT solver. Furthermore, we show empirical evidence that ARMS can indeed effectively learn a good approximation of the final action models.
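
The weighted MAX-SAT step can be illustrated with a tiny brute-force solver. This is a sketch under stated assumptions, not ARMS itself: the variable names, the clauses, and all weights are hypothetical. The two heavily weighted clauses mirror structural rules of the kind ARMS imposes on action models (a deleted relation must be a precondition; add effects and preconditions should not overlap), and the lighter clauses stand in for evidence mined from example plans. A real system would hand such clauses to a dedicated MAX-SAT solver instead of enumerating every assignment.

```python
from itertools import product


def solve_weighted_maxsat(variables, clauses):
    """Brute-force weighted MAX-SAT.

    clauses: list of (weight, [(variable, polarity), ...]); a clause is satisfied when
    at least one literal agrees with the assignment. Returns (satisfied_weight, assignment).
    """
    best_weight, best_assignment = -1, None
    for values in product((False, True), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = sum(w for w, literals in clauses
                     if any(assignment[v] == polarity for v, polarity in literals))
        if weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_weight, best_assignment


# Hypothetical encoding for one action (pickup) and two relations.
VARIABLES = ["pre(pickup,handempty)", "del(pickup,handempty)",
             "add(pickup,holding)", "pre(pickup,holding)"]
CLAUSES = [
    # Structural rules, weighted heavily so they act as near-hard constraints.
    (100, [("del(pickup,handempty)", False), ("pre(pickup,handempty)", True)]),  # del => pre
    (100, [("add(pickup,holding)", False), ("pre(pickup,holding)", False)]),     # not (add and pre)
    # Made-up evidence weights, standing in for statistics from example plans.
    (5, [("add(pickup,holding)", True)]),
    (3, [("del(pickup,handempty)", True)]),
    (2, [("pre(pickup,holding)", True)]),  # weaker evidence that conflicts with the add effect
]

weight, model = solve_weighted_maxsat(VARIABLES, CLAUSES)
```

Here the solver gives up only the weakest conflicting clause (weight 2), so the learned model for `pickup` deletes `handempty`, requires it as a precondition, and adds `holding`.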


2021 ◽  
Author(s):  
Wenjie Shang ◽  
Qingyang Li ◽  
Zhiwei Qin ◽  
Yang Yu ◽  
Yiping Meng ◽  
...  

Author(s):  
Jan Leike ◽  
Tor Lattimore ◽  
Laurent Orseau ◽  
Marcus Hutter

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically, its value converges in mean to the optimal value, and (2) given a recoverability assumption, its regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.
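
The core loop of Thompson sampling over a class of environments is: sample an environment from the posterior, act optimally for that sample, then update the posterior by Bayes' rule. The sketch below shows only that loop under heavy simplifications that are assumptions, not the setting of the results above: the class is finite rather than countable, each environment is a two-armed Bernoulli bandit rather than a general history-dependent environment, and the policy resamples every step instead of once per effective horizon. The hypothesis list and function name are hypothetical.

```python
import random


def thompson_sampling(hypotheses, true_env, steps, seed=0):
    """Thompson sampling over a finite class of Bernoulli-bandit environments.

    hypotheses: list of tuples of per-arm success probabilities (the environment class).
    true_env: the arm probabilities actually generating rewards (assumed to be in the class).
    Returns (posterior over hypotheses, total reward).
    """
    rng = random.Random(seed)
    posterior = [1.0 / len(hypotheses)] * len(hypotheses)  # uniform prior
    total_reward = 0
    for _ in range(steps):
        # Sample one environment from the posterior and act optimally for it.
        h = rng.choices(range(len(hypotheses)), weights=posterior)[0]
        arm = max(range(len(true_env)), key=lambda a: hypotheses[h][a])
        reward = rng.random() < true_env[arm]
        total_reward += reward
        # Bayes' rule: reweight each hypothesis by the likelihood of the observed reward.
        posterior = [w * (p[arm] if reward else 1.0 - p[arm])
                     for w, p in zip(posterior, hypotheses)]
        z = sum(posterior)
        posterior = [w / z for w in posterior]
    return posterior, total_reward


HYPOTHESES = [(0.8, 0.2), (0.2, 0.8), (0.5, 0.5)]
posterior, total = thompson_sampling(HYPOTHESES, HYPOTHESES[1], steps=500)
```

Because the true environment is in the class and every pull carries evidence against the wrong hypotheses, the posterior concentrates on `HYPOTHESES[1]`, after which the sampled policy is almost always the optimal one.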


2020 ◽  
Author(s):  
Than Le

In this chapter, we argue that a competent autonomous vehicle should be able to analyze structured and unstructured environments and to localize itself relative to surrounding objects in settings where GPS, RFID, or similar means cannot provide enough information about its location. Reliable SLAM (simultaneous localization and mapping) is the most basic prerequisite for any further artificial-intelligence task of an autonomous mobile robot. The goal of this paper is to simulate a SLAM process using advanced software tools. The model represents the system itself, whereas the simulation represents the operation of the system over time. The chosen software architecture lets us focus on the core problem with the least trivial work: it is an open-source meta-operating system that provides a wide range of tools for robotics problems.

Specifically, we address how advanced vehicles can analyze structured and unstructured environments by solving search-based planning problems, and we then discuss reinforcement-learning-based models for computing optimal trajectories for autonomous systems.
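
The chapter does not name the search-based planner it uses. As one common example, here is a minimal A* sketch on a 4-connected occupancy grid. The grid encoding (0 free, 1 obstacle), the unit step cost, and the Manhattan-distance heuristic are assumptions made for illustration; the heuristic is admissible under these assumptions, so the returned path is a shortest one.

```python
import heapq


def astar(grid, start, goal):
    """A* search on a 4-connected occupancy grid (0 = free, 1 = obstacle).

    Returns the list of cells from start to goal, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


GRID = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
route = astar(GRID, (0, 0), (2, 0))
```

On this map the wall in the middle row forces the route along the top row, down the right-hand column, and back along the bottom row.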


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 307
Author(s):  
Luca Pasqualini ◽  
Maurizio Parton

A Pseudo-Random Number Generator (PRNG) is any algorithm that generates a sequence of numbers approximating the properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences, which are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to generating PRNGs from scratch by learning a policy that solves a partially observable Markov Decision Process (POMDP), where the full state is the period of the generated sequence and the observation at each time step is the last sequence of bits appended to that state. We use a Long Short-Term Memory (LSTM) architecture to model the temporal relationship between observations at different time steps, tasking the LSTM memory with extracting significant features of the hidden portion of the MDP’s states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture substantially improves on the results of the fully observable feedforward RL approach introduced in previous work.
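
The partially observable formulation can be sketched as an environment whose hidden state is the whole generated sequence while the agent only sees the block of bits it appended last. This is a hypothetical sketch, not the authors' environment: the class name, block size, and period are assumptions, and the reward is assumed to be a single statistical test, the NIST SP 800-22 frequency (monobit) test, computed once the period is complete. The LSTM policy that would act in this environment is not shown.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test: p-value for the balance of ones and zeros."""
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)  # map bits to +1 / -1 and sum
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


class BitSequenceEnv:
    """Hypothetical POMDP: the hidden state is the full sequence generated so far."""

    def __init__(self, block_size=8, period=64):
        self.block_size = block_size
        self.period = period
        self._bits = []  # hidden state, never returned to the agent directly

    def reset(self):
        self._bits = []
        return (0,) * self.block_size  # initial observation: an all-zero block

    def step(self, block):
        if len(block) != self.block_size:
            raise ValueError("block must contain exactly block_size bits")
        self._bits.extend(block)
        done = len(self._bits) >= self.period
        # Assumed reward: test score of the completed period, zero before that.
        reward = monobit_p_value(self._bits) if done else 0.0
        return tuple(block), reward, done  # the agent only observes the last block
```

Because the observation omits everything but the latest block, a policy needs memory (in the paper, an LSTM) to infer the hidden part of the state that the reward depends on.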


2020 ◽  
Vol 34 (02) ◽  
pp. 2128-2135
Author(s):  
Yang Liu ◽  
Qi Liu ◽  
Hongke Zhao ◽  
Zhen Pan ◽  
Chuanren Liu

In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing QT methods face challenges such as representing noisy high-frequency financial data and balancing exploration and exploitation for a trading agent built with AI techniques. To address these challenges, we propose an adaptive trading model, iRDPG, that automatically develops QT strategies through an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, to handle the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). We also introduce imitation learning to leverage classical trading strategies, which helps balance exploration and exploitation. For more realistic simulation, we train our trading agent on real financial market data at minute frequency. Experimental results demonstrate that our model can extract robust market features and adapt to different markets.
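
One generic way to let classical strategies guide exploration is to add a behavior-cloning penalty to an actor-critic policy loss. The sketch below illustrates that pattern only; the abstract does not give iRDPG's objective, so the loss form, the weight `lam`, and the moving-average crossover "expert" are all assumptions, and both function names are hypothetical. Actions are encoded as +1 (long), -1 (short), and 0 (flat).

```python
def moving_average_signal(prices, short=2, long=4):
    """Hypothetical classical strategy: +1 when the short MA exceeds the long MA, else -1."""
    actions = []
    for t in range(len(prices)):
        if t + 1 < long:
            actions.append(0)  # not enough history yet: stay flat
            continue
        short_ma = sum(prices[t + 1 - short:t + 1]) / short
        long_ma = sum(prices[t + 1 - long:t + 1]) / long
        actions.append(1 if short_ma > long_ma else -1)
    return actions


def actor_loss_with_imitation(q_values, agent_actions, expert_actions, lam=0.5):
    """Generic actor loss: maximize the critic's Q-values plus a squared-error imitation term."""
    n = len(q_values)
    rl_term = -sum(q_values) / n  # minimizing -Q maximizes the estimated return
    bc_term = sum((a - e) ** 2 for a, e in zip(agent_actions, expert_actions)) / n
    return rl_term + lam * bc_term


expert = moving_average_signal([1, 2, 3, 4, 5, 4, 3, 2])
loss = actor_loss_with_imitation([1.0, 2.0], [1, -1], [1, 1])
```

A larger `lam` pulls the agent toward the demonstrator (less exploration), while a smaller one lets the critic's estimates dominate; tuning that weight is one simple way to trade exploration against exploitation.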

