scholarly journals Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning

Author(s):  
Dieqiao Feng ◽  
Carla Gomes ◽  
Bart Selman

Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.

2021 ◽  
Vol 01 ◽  
Author(s):  
Ying Li ◽  
Chubing Guo ◽  
Jianshe Wu ◽  
Xin Zhang ◽  
Jian Gao ◽  
...  

Background: Unmanned systems have been widely used in multiple fields. Many algorithms have been proposed to solve path planning problems. Each algorithm has its advantages and defects and cannot adapt to all kinds of requirements. An appropriate path planning method is needed for various applications. Objective: To select an appropriate algorithm fastly in a given application. This could be helpful for improving the efficiency of path planning for Unmanned systems. Methods: This paper proposes to represent and quantify the features of algorithms based on the physical indicators of results. At the same time, an algorithmic collaborative scheme is developed to search the appropriate algorithm according to the requirement of the application. As an illustration of the scheme, four algorithms, including the A-star (A*) algorithm, reinforcement learning, genetic algorithm, and ant colony optimization algorithm, are implemented in the representation of their features. Results: In different simulations, the algorithmic collaborative scheme can select an appropriate algorithm in a given application based on the representation of algorithms. And the algorithm could plan a feasible and effective path. Conclusion: An algorithmic collaborative scheme is proposed, which is based on the representation of algorithms and requirement of the application. The simulation results prove the feasibility of the scheme and the representation of algorithms.


2020 ◽  
Vol 34 (06) ◽  
pp. 9835-9842
Author(s):  
Daniel Fišer

In this paper, we focus on the inference of mutex groups in the lifted (PDDL) representation. We formalize the inference and prove that the most commonly used translator from the Fast Downward (FD) planning system infers a certain subclass of mutex groups, called fact-alternating mutex groups (fam-groups). Based on that, we show that the previously proposed fam-groups-based pruning techniques for the STRIPS representation can be utilized during the grounding process with lifted fam-groups, i.e., before the full STRIPS representation is known. Furthermore, we propose an improved inference algorithm for lifted fam-groups that produces a richer set of fam-groups than the FD translator and we demonstrate a positive impact on the number of pruned operators and overall coverage.


2001 ◽  
Vol 14 ◽  
pp. 1-28 ◽  
Author(s):  
R. I. Brafman

In recent years, there is a growing awareness of the importance of reachability and relevance-based pruning techniques for planning, but little work specifically targets these techniques. In this paper, we compare the ability of two classes of algorithms to propagate and discover reachability and relevance constraints in classical planning problems. The first class of algorithms operates on SAT encoded planning problems obtained using the linear and Graphplan encoding schemes. It applies unit-propagation and more general resolution steps (involving larger clauses) to these plan encodings. The second class operates at the plan level and contains two families of pruning algorithms: Reachable-k and Relevant-k. Reachable-k provides a coherent description of a number of existing forward pruning techniques used in numerous algorithms, while Relevant-k captures different grades of backward pruning. Our results shed light on the ability of different plan-encoding schemes to propagate information forward and backward and on the relative merit of plan-level and SAT-level pruning methods.


2020 ◽  
Vol 34 (06) ◽  
pp. 9883-9891 ◽  
Author(s):  
Daniel Höller ◽  
Gregor Behnke ◽  
Pascal Bercher ◽  
Susanne Biundo ◽  
Humbert Fiorino ◽  
...  

The research in hierarchical planning has made considerable progress in the last few years. Many recent systems do not rely on hand-tailored advice anymore to find solutions, but are supposed to be domain-independent systems that come with sophisticated solving techniques. In principle, this development would make the comparison between systems easier (because the domains are not tailored to a single system anymore) and – much more important – also the integration into other systems, because the modeling process is less tedious (due to the lack of advice) and there is no (or less) commitment to a certain planning system the model is created for. However, these advantages are destroyed by the lack of a common input language and feature set supported by the different systems. In this paper, we propose an extension to PDDL, the description language used in non-hierarchical planning, to the needs of hierarchical planning systems.


2021 ◽  
Vol 14 (11) ◽  
pp. 2563-2575
Author(s):  
Junwen Yang ◽  
Yeye He ◽  
Surajit Chaudhuri

Recent work has made significant progress in helping users to automate single data preparation steps, such as string-transformations and table-manipulation operators (e.g., Join, GroupBy, Pivot, etc.). We in this work propose to automate multiple such steps end-to-end, by synthesizing complex data-pipelines with both string-transformations and table-manipulation operators. We propose a novel by-target paradigm that allows users to easily specify the desired pipeline, which is a significant departure from the traditional by-example paradigm. Using by-target, users would provide input tables (e.g., csv or json files), and point us to a "target table" (e.g., an existing database table or BI dashboard) to demonstrate how the output from the desired pipeline would schematically "look like". While the problem is seemingly under-specified, our unique insight is that implicit table constraints such as FDs and keys can be exploited to significantly constrain the space and make the problem tractable. We develop an AUTO-PIPELINE system that learns to synthesize pipelines using deep reinforcement-learning (DRL) and search. Experiments using a benchmark of 700 real pipelines crawled from GitHub and commercial vendors suggest that AUTO-PIPELINE can successfully synthesize around 70% of complex pipelines with up to 10 steps.


Science ◽  
2018 ◽  
Vol 362 (6419) ◽  
pp. 1140-1144 ◽  
Author(s):  
David Silver ◽  
Thomas Hubert ◽  
Julian Schrittwieser ◽  
Ioannis Antonoglou ◽  
Matthew Lai ◽  
...  

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.


Author(s):  
Sebastian Blank ◽  
Florian Wilhelm ◽  
Hans-Peter Zorn ◽  
Achim Rettinger

Almost all of today’s knowledge is stored in databases and thus can only be accessed with the help of domain specific query languages, strongly limiting the number of people which can access the data. In this work, we demonstrate an end-to-end trainable question answering (QA) system that allows a user to query an external NoSQL database by using natural language. A major challenge of such a system is the non-differentiability of database operations which we overcome by applying policy-based reinforcement learning. We evaluate our approach on Facebook’s bAbI Movie Dialog dataset and achieve a competitive score of 84.2% compared to several benchmark models. We conclude that our approach excels with regard to real-world scenarios where knowledge resides in external databases and intermediate labels are too costly to gather for non-end-to-end trainable QA systems.


1992 ◽  
Vol 01 (03n04) ◽  
pp. 411-449 ◽  
Author(s):  
LEE SPECTOR ◽  
JAMES HENDLER

For intelligent systems to interact with external agents and changing domains, they must be able to perceive and to affect their environments while computing long term projection (planning) of future states. This paper describes and demonstrates the supervenience architecture, a multilevel architecture for integrating planning and reacting in complex, dynamic environments. We briefly review the underlying concept of supervenience, a form of abstraction with affinities both to abstraction in AI planning systems, and to knowledge-partitioning schemes in hierarchical control systems. We show how this concept can be distilled into a strong constraint on the design of dynamic-world planning systems. We then describe the supervenience architecture and an implementation of the architecture called APE (for Abstraction-Partitioned Evaluator). The application of APE to the HomeBot domain is used to demonstrate the capabilities of the architecture.


Author(s):  
Lin Lan ◽  
Zhenguo Li ◽  
Xiaohong Guan ◽  
Pinghui Wang

Despite significant progress, deep reinforcement learning (RL) suffers from data-inefficiency and limited generalization. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks such that a novel but related task could be solved quickly. Though specific in some ways, different tasks in meta-RL are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the specific and shared information among different tasks, which limits their ability to learn training tasks and to generalize to novel tasks. In this paper, we propose to capture the shared information on the one hand and meta-learn how to quickly abstract the specific information about a task on the other hand. Methodologically, we train an SGD meta-learner to quickly optimize a task encoder for each task, which generates a task embedding based on past experience. Meanwhile, we learn a policy which is shared across all tasks and conditioned on task embeddings. Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns compared to baselines.


2003 ◽  
Vol 19 ◽  
pp. 25-71 ◽  
Author(s):  
T. Eiter ◽  
W. Faber ◽  
N. Leone ◽  
G. Pfeifer ◽  
A. Polleres

Recently, planning based on answer set programming has been proposed as an approach towards realizing declarative planning systems. In this paper, we present the language Kc, which extends the declarative planning language K by action costs. Kc provides the notion of admissible and optimal plans, which are plans whose overall action costs are within a given limit resp. minimum over all plans (i.e., cheapest plans). As we demonstrate, this novel language allows for expressing some nontrivial planning tasks in a declarative way. Furthermore, it can be utilized for representing planning problems under other optimality criteria, such as computing ``shortest'' plans (with the least number of steps), and refinement combinations of cheapest and fastest plans. We study complexity aspects of the language Kc and provide a transformation to logic programs, such that planning problems are solved via answer set programming. Furthermore, we report experimental results on selected problems. Our experience is encouraging that answer set planning may be a valuable approach to expressive planning systems in which intricate planning problems can be naturally specified and solved.


Sign in / Sign up

Export Citation Format

Share Document