ADAPTIVE LEARNING OF DECISION-THEORETIC SEARCH CONTROL KNOWLEDGE

Author(s):  
Eric H. Wefald ◽  
Stuart J. Russell
2006 ◽  
Vol 25 ◽  
pp. 17-74 ◽  
Author(s):  
S. Thiebaux ◽  
C. Gretton ◽  
J. Slaney ◽  
D. Price ◽  
F. Kabanza

A decision process in which rewards depend on history rather than merely on the current state is called a decision process with non-Markovian rewards (NMRDP). In decision-theoretic planning, where many desirable behaviours are more naturally expressed as properties of execution sequences rather than as properties of states, NMRDPs form a more natural model than the commonly adopted fully Markovian decision process (MDP) model. While the more tractable solution methods developed for MDPs do not directly apply in the presence of non-Markovian rewards, a number of solution methods for NMRDPs have been proposed in the literature. These all exploit a compact specification of the non-Markovian reward function in temporal logic, to automatically translate the NMRDP into an equivalent MDP which is solved using efficient MDP solution methods. This paper presents NMRDPP (Non-Markovian Reward Decision Process Planner), a software platform for the development and experimentation of methods for decision-theoretic planning with non-Markovian rewards. The current version of NMRDPP implements, under a single interface, a family of methods based on existing as well as new approaches which we describe in detail. These include dynamic programming, heuristic search, and structured methods. Using NMRDPP, we compare the methods and identify certain problem features that affect their performance. NMRDPP's treatment of non-Markovian rewards is inspired by the treatment of domain-specific search control knowledge in the TLPlan planner, which it incorporates as a special case. In the First International Probabilistic Planning Competition, NMRDPP was able to compete and perform well in both the domain-independent and hand-coded tracks, using search control knowledge in the latter.
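
The translation idea in the abstract above can be illustrated with a minimal sketch. It is not NMRDPP's algorithm and uses no temporal logic: it hand-builds the expanded state for one assumed reward ("1.0 the first time the goal state is visited") by adding a single history bit, then runs ordinary value iteration on the resulting MDP. The toy domain, the names `step`, `expanded_step` and `value_iteration`, and the discount factor are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy NMRDP: three states on a line, deterministic moves.
# Assumed non-Markovian reward: 1.0 the FIRST time GOAL is visited, 0 afterwards.
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]
GOAL = 2


def step(s, a):
    """Underlying (Markovian) dynamics of the toy domain."""
    return max(s - 1, 0) if a == "left" else min(s + 1, 2)


def expanded_step(es, a):
    """Transition and reward in the expanded MDP, where es = (state, goal_seen).

    The extra bit records the only piece of history the reward depends on,
    so the reward becomes a function of the expanded state and action alone.
    """
    s, seen = es
    s2 = step(s, a)
    reward = 1.0 if (s2 == GOAL and not seen) else 0.0
    return (s2, seen or s2 == GOAL), reward


def value_iteration(gamma=0.9, sweeps=100):
    """Standard value iteration over the expanded state space."""
    values = {es: 0.0 for es in product(STATES, [False, True])}
    for _ in range(sweeps):
        values = {
            es: max(
                r + gamma * values[es2]
                for es2, r in (expanded_step(es, a) for a in ACTIONS)
            )
            for es in values
        }
    return values
```

Calling `value_iteration()` returns a dictionary keyed by `(state, goal_seen)` pairs. Every entry with `goal_seen` set to `True` has value zero in this toy, because the assumed reward can only be collected once.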


Author(s):  
Xu Lu ◽  
Cong Tian ◽  
Zhenhua Duan

Temporal logics are widely adopted in Artificial Intelligence (AI) planning for specifying Search Control Knowledge (SCK). However, traditional temporal logics are limited in expressive power: they cannot express spatial constraints, which are as important as temporal ones in many planning domains. To address this, we propose a two-dimensional (spatial and temporal) logic, PPTL^SL, obtained by temporalising separation logic with Propositional Projection Temporal Logic (PPTL). The new logic is well suited to specifying SCK that contains both spatial and temporal constraints, which are useful in AI planning. We show that PPTL^SL is decidable and present a decision procedure. On this basis, we develop a planner, S-TSolver, that computes plans using spatio-temporal SCK expressed as PPTL^SL formulas. Evaluation on selected benchmark domains shows the effectiveness of S-TSolver.


2000 ◽  
Vol 116 (1-2) ◽  
pp. 123-191 ◽  
Author(s):  
Fahiem Bacchus ◽  
Froduald Kabanza

1991 ◽  
Vol 6 (2) ◽  
pp. 197-204
Author(s):  
Roland J. Zito-Wolf

Author(s):  
TOSHIKAZU TANAKA ◽  
TOM M. MITCHELL

This research seeks to incorporate machine learning capabilities within a general-purpose frame-based architecture. We describe CHUNKER, an explanation-based chunking mechanism built on top of THEO, a software framework to support development of self-modifying problem solving systems. CHUNKER forms rules that improve problem solving efficiency by generalizing and compressing the chains of inference which THEO produces during problem solving. After presenting the learning algorithm used by CHUNKER, we illustrate its application to learning search control knowledge, discuss its relationship to THEO’s three other learning mechanisms, and consider the relationship between architectural features of THEO and the effectiveness of CHUNKER.


Author(s):  
Pascal Bercher ◽  
Gregor Behnke ◽  
Daniel Höller ◽  
Susanne Biundo

Hierarchical task network (HTN) planning is well known for being an efficient planning approach. This is mainly due to the success of the HTN planning system SHOP2. However, its performance depends on hand-designed search control knowledge. At present, there are only very few domain-independent heuristics, and these are designed for differing hierarchical planning formalisms. Here, we propose an admissible heuristic for standard HTN planning, which allows optimal solutions to be found by heuristic search. It is based on the so-called task decomposition graph (TDG), a data structure reflecting reachable parts of the task hierarchy. We show (both in theory and empirically) that rebuilding it during planning can improve heuristic accuracy, thereby decreasing the explored search space. The evaluation further studies the heuristic in terms of both plan quality and coverage.
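
The abstract does not spell out how the heuristic is computed from the TDG, so the following is only a sketch of one plausible reading, not the authors' method. It assumes the task hierarchy is given as a mapping from each abstract task to its methods (each a list of subtasks), and estimates the cheapest decomposition of every task as the minimum over its methods of the summed subtask estimates, ignoring states and preconditions. The domain, the task names and the function `tdg_cost_estimates` are hypothetical.

```python
import math

# Hypothetical hierarchy: abstract task -> list of methods,
# where each method is a list of subtasks (abstract or primitive).
METHODS = {
    "deliver": [["load", "travel", "unload"]],
    "travel": [["drive"], ["drive", "travel"]],  # second method is recursive
}
# Hypothetical action costs for primitive tasks.
PRIMITIVE_COST = {"load": 1, "unload": 1, "drive": 2}


def tdg_cost_estimates(methods, primitive_cost):
    """Estimate the cheapest decomposition cost of each task.

    Abstract tasks start at infinity and are lowered by fixed-point
    iteration, so recursive methods are handled without special cases.
    Assumes non-negative costs and that every subtask is either a key of
    `methods` or a key of `primitive_cost`.
    """
    cost = dict(primitive_cost)
    cost.update({task: math.inf for task in methods})
    changed = True
    while changed:
        changed = False
        for task, task_methods in methods.items():
            # OR over methods, AND (sum) over the subtasks of a method.
            best = min(sum(cost[sub] for sub in m) for m in task_methods)
            if best < cost[task]:
                cost[task] = best
                changed = True
    return cost
```

In this sketch, rebuilding the hierarchy during planning would correspond to calling `tdg_cost_estimates` again with a `methods` mapping restricted to what is still reachable, which can only keep or raise the estimates.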

