Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/304 ◽

2020 ◽

Author(s):

Dieqiao Feng ◽

Carla Gomes ◽

Bart Selman

Keyword(s):

Reinforcement Learning ◽

Training Curriculum ◽

AI Planning ◽

Significant Progress ◽

Planning Systems ◽

Domain Specific ◽

Proper Training ◽

Planning Task ◽

Planning Problems ◽

Pruning Techniques

Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.


Collaborative Scheduling of Algorithms for Path Planning of Unmanned Systems

Current Chinese Science ◽

10.2174/2210298101666210211094253 ◽

2021 ◽

Vol 01 ◽

Author(s):

Ying Li ◽

Chubing Guo ◽

Jianshe Wu ◽

Xin Zhang ◽

Jian Gao ◽

...

Keyword(s):

Genetic Algorithm ◽

Reinforcement Learning ◽

Path Planning ◽

Unmanned Systems ◽

Ant Colony Optimization Algorithm ◽

A* Algorithm ◽

Collaborative Scheduling ◽

Simulation Results ◽

Planning Problems ◽

Effective Path

Background: Unmanned systems have been widely used in multiple fields. Many algorithms have been proposed to solve path planning problems. Each algorithm has its advantages and defects, and none can adapt to all kinds of requirements. An appropriate path planning method is therefore needed for each application. Objective: To quickly select an appropriate algorithm for a given application, which could help improve the efficiency of path planning for unmanned systems. Methods: This paper proposes to represent and quantify the features of algorithms based on the physical indicators of their results. In addition, an algorithmic collaborative scheme is developed to search for the appropriate algorithm according to the requirements of the application. As an illustration of the scheme, four algorithms (the A-star (A*) algorithm, reinforcement learning, a genetic algorithm, and an ant colony optimization algorithm) are implemented and described by the representation of their features. Results: In different simulations, the algorithmic collaborative scheme can select an appropriate algorithm for a given application based on the representation of the algorithms, and the selected algorithm can plan a feasible and effective path. Conclusion: An algorithmic collaborative scheme is proposed, which is based on the representation of algorithms and the requirements of the application. The simulation results demonstrate the feasibility of the scheme and of the representation of algorithms.


Lifted Fact-Alternating Mutex Groups and Pruned Grounding of Classical Planning Problems

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i06.6536 ◽

2020 ◽

Vol 34 (06) ◽

pp. 9835-9842

Author(s):

Daniel Fišer

Keyword(s):

Positive Impact ◽

Planning System ◽

Inference Algorithm ◽

Planning Problems ◽

Pruning Techniques

In this paper, we focus on the inference of mutex groups in the lifted (PDDL) representation. We formalize the inference and prove that the most commonly used translator from the Fast Downward (FD) planning system infers a certain subclass of mutex groups, called fact-alternating mutex groups (fam-groups). Based on that, we show that the previously proposed fam-groups-based pruning techniques for the STRIPS representation can be utilized during the grounding process with lifted fam-groups, i.e., before the full STRIPS representation is known. Furthermore, we propose an improved inference algorithm for lifted fam-groups that produces a richer set of fam-groups than the FD translator and we demonstrate a positive impact on the number of pruned operators and overall coverage.
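
As a rough illustration of what mutex-based operator pruning means, the sketch below drops ground operators whose preconditions contain two different facts from the same mutex group, since at most one fact of a mutex group can hold in any reachable state. It is a simplified ground-level sketch, not the lifted inference or the pruning procedure of the paper; the `Operator` type, the function names, and the toy facts are all hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Operator(NamedTuple):
    """Hypothetical ground operator: only the name and preconditions are modelled."""
    name: str
    preconditions: FrozenSet[str]


def violates_mutex(op: Operator, mutex_groups: Iterable[FrozenSet[str]]) -> bool:
    """True if the preconditions need two distinct facts from one mutex group."""
    return any(len(op.preconditions & group) >= 2 for group in mutex_groups)


def prune_operators(
    operators: Iterable[Operator], mutex_groups: List[FrozenSet[str]]
) -> List[Operator]:
    """Keep only operators whose preconditions are consistent with every group."""
    return [op for op in operators if not violates_mutex(op, mutex_groups)]


# Toy example (assumed facts): a truck is at exactly one location at a time.
groups = [frozenset({"truck-at-A", "truck-at-B"})]
ops = [
    Operator("load-at-A", frozenset({"truck-at-A", "package-at-A"})),
    Operator("impossible", frozenset({"truck-at-A", "truck-at-B"})),
]
kept = prune_operators(ops, groups)
print([op.name for op in kept])
```

Here only `load-at-A` survives, because `impossible` requires the truck to be in two places at once.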


On Reachability, Relevance, and Resolution in the Planning as Satisfiability Approach

Journal of Artificial Intelligence Research ◽

10.1613/jair.737 ◽

2001 ◽

Vol 14 ◽

pp. 1-28 ◽

Cited By ~ 2

Author(s):

R. I. Brafman

Keyword(s):

Relative Merit ◽

Pruning Algorithms ◽

Unit Propagation ◽

Planning Problems ◽

Pruning Techniques ◽

Shed Light ◽

Pruning Methods

In recent years, there has been a growing awareness of the importance of reachability and relevance-based pruning techniques for planning, but little work specifically targets these techniques. In this paper, we compare the ability of two classes of algorithms to propagate and discover reachability and relevance constraints in classical planning problems. The first class of algorithms operates on SAT-encoded planning problems obtained using the linear and Graphplan encoding schemes. It applies unit propagation and more general resolution steps (involving larger clauses) to these plan encodings. The second class operates at the plan level and contains two families of pruning algorithms: Reachable-k and Relevant-k. Reachable-k provides a coherent description of a number of existing forward pruning techniques used in numerous algorithms, while Relevant-k captures different grades of backward pruning. Our results shed light on the ability of different plan-encoding schemes to propagate information forward and backward and on the relative merit of plan-level and SAT-level pruning methods.
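
For readers unfamiliar with unit propagation, the following is a minimal sketch of the general technique on a CNF formula, using DIMACS-style integer literals. The tiny clause set is an invented toy, not one of the linear or Graphplan encodings studied in the paper, and the variable meanings in the comments are assumptions made for illustration.

```python
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3 is true, -3 means x3 is false


def unit_propagate(clauses: List[Clause]) -> Optional[Dict[int, bool]]:
    """Return the assignments forced by unit propagation, or None on a conflict."""
    assignment: Dict[int, bool] = {}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var in assignment:
                    if assignment[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: the formula is contradictory
            if len(unassigned) == 1:
                # Unit clause: its single remaining literal must be made true.
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


# Toy encoding (assumed): 1 = at_A@0, 2 = move_A_B@0, 3 = at_B@1.
toy = [
    [1],      # initial state: at_A holds at step 0
    [-2, 1],  # the move action requires at_A at step 0
    [-3, 2],  # at_B at step 1 can only come from the move action
    [3],      # goal: at_B holds at step 1
]
forced = unit_propagate(toy)
print(forced)
```

In this toy, propagation alone forces the move action to be selected, which is the kind of backward (relevance) information flow the abstract refers to.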


HDDL: An Extension to PDDL for Expressing Hierarchical Planning Problems

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i06.6542 ◽

2020 ◽

Vol 34 (06) ◽

pp. 9883-9891 ◽

Cited By ~ 2

Author(s):

Daniel Höller ◽

Gregor Behnke ◽

Pascal Bercher ◽

Susanne Biundo ◽

Humbert Fiorino ◽

...

Keyword(s):

Planning System ◽

Hierarchical Planning ◽

Planning Systems ◽

Description Language ◽

Input Language ◽

Common Input ◽

Modeling Process ◽

Planning Problems ◽

Domain Independent ◽

Tailored Advice

Research in hierarchical planning has made considerable progress in the last few years. Many recent systems no longer rely on hand-tailored advice to find solutions, but are intended to be domain-independent systems that come with sophisticated solving techniques. In principle, this development would make the comparison between systems easier (because the domains are no longer tailored to a single system) and, more importantly, would ease the integration into other systems, because the modeling process is less tedious (due to the lack of advice) and there is no (or less) commitment to the particular planning system the model is created for. However, these advantages are negated by the lack of a common input language and feature set supported by the different systems. In this paper, we propose an extension of PDDL, the description language used in non-hierarchical planning, that addresses the needs of hierarchical planning systems.


Auto-pipeline

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476303 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2563-2575

Author(s):

Junwen Yang ◽

Yeye He ◽

Surajit Chaudhuri

Keyword(s):

Reinforcement Learning ◽

Recent Work ◽

Pipeline System ◽

Complex Data ◽

Data Preparation ◽

Significant Progress ◽

Database Table ◽

End To End

Recent work has made significant progress in helping users to automate single data preparation steps, such as string-transformations and table-manipulation operators (e.g., Join, GroupBy, Pivot, etc.). In this work, we propose to automate multiple such steps end-to-end, by synthesizing complex data-pipelines with both string-transformations and table-manipulation operators. We propose a novel by-target paradigm that allows users to easily specify the desired pipeline, which is a significant departure from the traditional by-example paradigm. Using by-target, users provide input tables (e.g., csv or json files) and point us to a "target table" (e.g., an existing database table or BI dashboard) to demonstrate what the output from the desired pipeline should schematically look like. While the problem is seemingly under-specified, our unique insight is that implicit table constraints such as FDs and keys can be exploited to significantly constrain the space and make the problem tractable. We develop an AUTO-PIPELINE system that learns to synthesize pipelines using deep reinforcement-learning (DRL) and search. Experiments using a benchmark of 700 real pipelines crawled from GitHub and commercial vendors suggest that AUTO-PIPELINE can successfully synthesize around 70% of complex pipelines with up to 10 steps.


A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Science ◽

10.1126/science.aar6404 ◽

2018 ◽

Vol 362 (6419) ◽

pp. 1140-1144 ◽

Cited By ~ 388

Author(s):

David Silver ◽

Thomas Hubert ◽

Julian Schrittwieser ◽

Ioannis Antonoglou ◽

Matthew Lai ◽

...

Keyword(s):

Artificial Intelligence ◽

Reinforcement Learning ◽

Domain Knowledge ◽

Learning Algorithm ◽

Search Techniques ◽

Domain Specific ◽

Evaluation Functions ◽

History Of ◽

World Champion ◽

Reinforcement Learning Algorithm

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.


Querying NoSQL with Deep Learning to Answer Natural Language Questions

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019416 ◽

2019 ◽

Vol 33 ◽

pp. 9416-9421

Author(s):

Sebastian Blank ◽

Florian Wilhelm ◽

Hans-Peter Zorn ◽

Achim Rettinger

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Natural Language ◽

Question Answering ◽

Query Languages ◽

Domain Specific ◽

NoSQL Database ◽

End To End ◽

Database Operations ◽

Almost All

Almost all of today’s knowledge is stored in databases and thus can only be accessed with the help of domain-specific query languages, strongly limiting the number of people who can access the data. In this work, we demonstrate an end-to-end trainable question answering (QA) system that allows a user to query an external NoSQL database by using natural language. A major challenge of such a system is the non-differentiability of database operations, which we overcome by applying policy-based reinforcement learning. We evaluate our approach on Facebook’s bAbI Movie Dialog dataset and achieve a competitive score of 84.2% compared to several benchmark models. We conclude that our approach excels with regard to real-world scenarios where knowledge resides in external databases and intermediate labels are too costly to gather for non-end-to-end trainable QA systems.


PLANNING AND REACTING ACROSS SUPERVENIENT LEVELS OF REPRESENTATION

International Journal of Cooperative Information Systems ◽

10.1142/s0218215792000118 ◽

1992 ◽

Vol 01 (03n04) ◽

pp. 411-449 ◽

Cited By ~ 2

Author(s):

LEE SPECTOR ◽

JAMES HENDLER

Keyword(s):

Control Systems ◽

Intelligent Systems ◽

Hierarchical Control ◽

Dynamic Environments ◽

AI Planning ◽

Planning Systems ◽

Complex Dynamic ◽

Strong Constraint ◽

External Agents

For intelligent systems to interact with external agents and changing domains, they must be able to perceive and to affect their environments while computing long-term projections (planning) of future states. This paper describes and demonstrates the supervenience architecture, a multilevel architecture for integrating planning and reacting in complex, dynamic environments. We briefly review the underlying concept of supervenience, a form of abstraction with affinities both to abstraction in AI planning systems and to knowledge-partitioning schemes in hierarchical control systems. We show how this concept can be distilled into a strong constraint on the design of dynamic-world planning systems. We then describe the supervenience architecture and an implementation of the architecture called APE (for Abstraction-Partitioned Evaluator). The application of APE to the HomeBot domain is used to demonstrate the capabilities of the architecture.


Meta Reinforcement Learning with Task Embedding and Shared Policy

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/387 ◽

2019 ◽

Cited By ~ 2

Author(s):

Lin Lan ◽

Zhenguo Li ◽

Xiaohong Guan ◽

Pinghui Wang

Keyword(s):

Reinforcement Learning ◽

The Other ◽

Specific Information ◽

Significant Progress ◽

Learning To Learn ◽

Learning Capacity ◽

Shared Information ◽

Meta Learning ◽

The One ◽

High Level

Despite significant progress, deep reinforcement learning (RL) suffers from data-inefficiency and limited generalization. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks such that a novel but related task could be solved quickly. Though specific in some ways, different tasks in meta-RL are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the specific and shared information among different tasks, which limits their ability to learn training tasks and to generalize to novel tasks. In this paper, we propose to capture the shared information on the one hand and meta-learn how to quickly abstract the specific information about a task on the other hand. Methodologically, we train an SGD meta-learner to quickly optimize a task encoder for each task, which generates a task embedding based on past experience. Meanwhile, we learn a policy which is shared across all tasks and conditioned on task embeddings. Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns compared to baselines.


Answer Set Planning Under Action Costs

Journal of Artificial Intelligence Research ◽

10.1613/jair.1148 ◽

2003 ◽

Vol 19 ◽

pp. 25-71 ◽

Cited By ~ 18

Author(s):

T. Eiter ◽

W. Faber ◽

N. Leone ◽

G. Pfeifer ◽

A. Polleres

Keyword(s):

Answer Set Programming ◽

Optimality Criteria ◽

Experimental Results ◽

Logic Programs ◽

Planning Systems ◽

Planning Problems ◽

Answer Set

Recently, planning based on answer set programming has been proposed as an approach towards realizing declarative planning systems. In this paper, we present the language Kc, which extends the declarative planning language K by action costs. Kc provides the notions of admissible and optimal plans, which are plans whose overall action costs are within a given limit and minimal over all plans (i.e., cheapest plans), respectively. As we demonstrate, this novel language allows for expressing some nontrivial planning tasks in a declarative way. Furthermore, it can be utilized for representing planning problems under other optimality criteria, such as computing "shortest" plans (with the least number of steps), and refinement combinations of cheapest and fastest plans. We study complexity aspects of the language Kc and provide a transformation to logic programs, such that planning problems are solved via answer set programming. Furthermore, we report experimental results on selected problems. Our experience suggests that answer set planning may be a valuable approach to expressive planning systems in which intricate planning problems can be naturally specified and solved.
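
The distinction between admissible and optimal plans can be made concrete with a small sketch. It assumes, as a simplification, that every action has a fixed non-negative cost and that the candidate plans are already enumerated; it does not model the Kc language or its answer set semantics, and the action names and costs are invented for illustration.

```python
from typing import Dict, List, Sequence

Plan = Sequence[str]  # a plan is modelled here as a sequence of action names


def plan_cost(plan: Plan, action_costs: Dict[str, int]) -> int:
    """Overall cost of a plan: the sum of the costs of its actions."""
    return sum(action_costs[action] for action in plan)


def admissible_plans(
    plans: List[Plan], action_costs: Dict[str, int], limit: int
) -> List[Plan]:
    """Plans whose overall cost stays within the given limit."""
    return [p for p in plans if plan_cost(p, action_costs) <= limit]


def optimal_plans(plans: List[Plan], action_costs: Dict[str, int]) -> List[Plan]:
    """Plans whose overall cost is minimal among all candidate plans."""
    if not plans:
        return []
    best = min(plan_cost(p, action_costs) for p in plans)
    return [p for p in plans if plan_cost(p, action_costs) == best]


# Hypothetical costs and candidate plans for reaching the same goal.
costs = {"walk": 1, "bus": 3, "taxi": 10}
candidates: List[Plan] = [
    ("walk", "walk", "walk"),  # cost 3
    ("bus",),                  # cost 3
    ("taxi",),                 # cost 10
    ("walk", "bus"),           # cost 4
]
within_limit = admissible_plans(candidates, costs, limit=4)
cheapest = optimal_plans(candidates, costs)
print(within_limit)
print(cheapest)
```

Note that the cheapest plans need not be the shortest: `("bus",)` and `("walk", "walk", "walk")` have the same cost but different lengths, which is why the abstract treats "shortest" as a separate optimality criterion.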
