Boundary Extension Features for Width-Based Planning with Simulators on Continuous-State Domains

Width-based planning algorithms have been demonstrated to be competitive with state-of-the-art heuristic search and SAT-based approaches, without requiring access to a model of action effects and preconditions, just access to a black-box simulator. Width-based planners search is guided by a measure of the novelty of states, that requires observations on simulator states to be given as a set of features. This paper proposes agnostic feature mapping mechanisms that define the features online, as exploration progresses and the domain of continuous state variables is revealed. We demonstrate the effectiveness of these features on the OpenAI gym "classical control" suite of benchmarks. We compare our online planners with state-of-the-art deep reinforcement learning algorithms, and show that width-based planners using our features can find policies of the same quality with significantly less computational resources.

Download Full-text

Subgoaling Techniques for Satisficing and Optimal Numeric Planning

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11875 ◽

2020 ◽

Vol 68 ◽

pp. 691-752

Author(s):

Enrico Scala ◽

Patrik Haslum ◽

Sylvie Thiébaux ◽

Miquel Ramirez

Keyword(s):

State Space ◽

Heuristic Search ◽

State Of The Art ◽

Planning Problem ◽

State Variables ◽

The Core ◽

State Space Search ◽

Theoretical Assumptions ◽

Good Trade ◽

Core Idea

This paper studies novel subgoaling relaxations for automated planning with propositional and numeric state variables. Subgoaling relaxations address one source of complexity of the planning problem: the requirement to satisfy conditions simultaneously. The core idea is to relax this requirement by recursively decomposing conditions into atomic subgoals that are considered in isolation. Such relaxations are typically used for pruning, or as the basis for computing admissible or inadmissible heuristic estimates to guide optimal or satis_cing heuristic search planners. In the last decade or so, the subgoaling principle has underpinned the design of an abundance of relaxation-based heuristics whose formulations have greatly extended the reach of classical planning. This paper extends subgoaling relaxations to support numeric state variables and numeric conditions. We provide both theoretical and practical results, with the aim of reaching a good trade-o_ between accuracy and computation costs within a heuristic state-space search planner. Our experimental results validate the theoretical assumptions, and indicate that subgoaling substantially improves on the state of the art in optimal and satisficing numeric planning via forward state-space search.

Download Full-text

Efficient Black-Box Planning Using Macro-Actions with Focused Effects

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/554 ◽

2021 ◽

Author(s):

Cameron Allen ◽

Michael Katz ◽

Tim Klinger ◽

George Konidaris ◽

Matthew Riemer ◽

...

Keyword(s):

State Of The Art ◽

Search Tree ◽

Black Box ◽

Domain Model ◽

State Variables ◽

Explicit Model ◽

Wide Range ◽

Full Domain

The difficulty of deterministic planning increases exponentially with search-tree depth. Black-box planning presents an even greater challenge, since planners must operate without an explicit model of the domain. Heuristics can make search more efficient, but goal-aware heuristics for black-box planning usually rely on goal counting, which is often quite uninformative. In this work, we show how to overcome this limitation by discovering macro-actions that make the goal-count heuristic more accurate. Our approach searches for macro-actions with focused effects (i.e. macros that modify only a small number of state variables), which align well with the assumptions made by the goal-count heuristic. Focused macros dramatically improve black-box planning efficiency across a wide range of planning domains, sometimes beating even state-of-the-art planners with access to a full domain model.

Download Full-text

Model-based deep reinforcement learning with heuristic search for satellite attitude control

Industrial Robot the international journal of robotics research and application ◽

10.1108/ir-05-2018-0086 ◽

2019 ◽

Vol 46 (3) ◽

pp. 415-420

Author(s):

Ke Xu ◽

Fengge Wu ◽

Junsuo Zhao

Keyword(s):

Reinforcement Learning ◽

Heuristic Search ◽

Attitude Control ◽

Local Optima ◽

Content Type ◽

Model Based ◽

Model Free ◽

Satellite Attitude ◽

Satellite Attitude Control ◽

Classical Control

Purpose Recently, deep reinforcement learning is developing rapidly and shows its power to solve difficult problems such as robotics and game of GO. Meanwhile, satellite attitude control systems are still using classical control technics such as proportional – integral – derivative and slide mode control as major solutions, facing problems with adaptability and automation. Design/methodology/approach In this paper, an approach based on deep reinforcement learning is proposed to increase adaptability and autonomy of satellite control system. It is a model-based algorithm which could find solutions with fewer episodes of learning than model-free algorithms. Findings Simulation experiment shows that when classical control crashed, this approach could find solution and reach the target with hundreds times of explorations and learning. Originality/value This approach is a non-gradient method using heuristic search to optimize policy to avoid local optima. Compared with classical control technics, this approach does not need prior knowledge of satellite or its orbit, has the ability to adapt different kinds of situations with data learning and has the ability to adapt different kinds of satellite and different tasks through transfer learning.

Download Full-text

A Heuristic Search Approach to Planning with Continuous Resources in Stochastic Domains

Journal of Artificial Intelligence Research ◽

10.1613/jair.2529 ◽

2009 ◽

Vol 34 ◽

pp. 27-59 ◽

Cited By ~ 11

Author(s):

N. Meuleau ◽

E. Benazera ◽

R. I. Brafman ◽

E. A. Hansen ◽

Mausam

Keyword(s):

State Space ◽

Heuristic Search ◽

Resource Constraints ◽

Computational Effort ◽

The State ◽

Continuous Variables ◽

State Variables ◽

Heuristic Search Algorithms ◽

Continuous State ◽

Stochastic Domains

We consider the problem of optimal planning in stochastic domains with resource constraints, where the resources are continuous and the choice of action at each step depends on resource availability. We introduce the HAO* algorithm, a generalization of the AO* algorithm that performs search in a hybrid state space that is modeled using both discrete and continuous state variables, where the continuous variables represent monotonic resources. Like other heuristic search algorithms, HAO* leverages knowledge of the start state and an admissible heuristic to focus computational effort on those parts of the state space that could be reached from the start state by following an optimal policy. We show that this approach is especially effective when resource constraints limit how much of the state space is reachable. Experimental results demonstrate its effectiveness in the domain that motivates our research: automated planning for planetary exploration rovers.

Download Full-text

Exploring Parameter Space in Reinforcement Learning

Paladyn Journal of Behavioral Robotics ◽

10.2478/s13230-010-0002-4 ◽

2010 ◽

Vol 1 (1) ◽

Cited By ~ 18

Author(s):

Thomas Rückstieß ◽

Frank Sehnke ◽

Tom Schaul ◽

Daan Wierstra ◽

Yi Sun ◽

...

Keyword(s):

Reinforcement Learning ◽

Parameter Space ◽

Robot Control ◽

State Of The Art ◽

Black Box ◽

General Function ◽

Natural Evolution ◽

State Dependent ◽

Learning Parameter ◽

Function Approximator

AbstractThis paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action perturbation. We review two recent parameter-exploring algorithms: Natural Evolution Strategies and Policy Gradients with Parameter-Based Exploration. Both outperform state-of-the-art algorithms in several complex high-dimensional tasks commonly found in robot control. Furthermore, we describe how a novel exploration method, State-Dependent Exploration, can modify existing algorithms to mimic exploration in parameter space.

Download Full-text

Evaluation of a "Black-Box" State-of-the-Art Vision-Based Navigation Algorithm for GPS-Denied Navigation

Proceedings of the 33rd International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2020) ◽

10.33012/2020.17765 ◽

2020 ◽

Author(s):

Simone B. Bortolami ◽

Helen Webb ◽

Michael Richman ◽

Peter Norton

Keyword(s):

State Of The Art ◽

Black Box ◽

Navigation Algorithm ◽

Vision Based Navigation ◽

Gps Denied Navigation

Download Full-text

Review of the Applications of Deep Learning in Bioinformatics

Current Bioinformatics ◽

10.2174/1574893615999200711165743 ◽

2021 ◽

Vol 15 (8) ◽

pp. 898-911

Author(s):

Yongqing Zhang ◽

Jianrong Yan ◽

Siyu Chen ◽

Meiqin Gong ◽

Dongrui Gao ◽

...

Keyword(s):

Deep Learning ◽

Drug Discovery ◽

Biomedical Imaging ◽

State Of The Art ◽

Black Box ◽

Medical Data ◽

Biological Data ◽

High Dimensional ◽

Biological Research ◽

Process Data

Rapid advances in biological research over recent years have significantly enriched biological and medical data resources. Deep learning-based techniques have been successfully utilized to process data in this field, and they have exhibited state-of-the-art performances even on high-dimensional, nonstructural, and black-box biological data. The aim of the current study is to provide an overview of the deep learning-based techniques used in biology and medicine and their state-of-the-art applications. In particular, we introduce the fundamentals of deep learning and then review the success of applying such methods to bioinformatics, biomedical imaging, biomedicine, and drug discovery. We also discuss the challenges and limitations of this field, and outline possible directions for further research.

Download Full-text

Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics

Complex & Intelligent Systems ◽

10.1007/s40747-021-00366-1 ◽

2021 ◽

Author(s):

Jie Zhong ◽

Tao Wang ◽

Lianglun Cheng

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Free Path ◽

Inverse Kinematics ◽

Multiple Dimensions ◽

Continuous State ◽

Planning Algorithm ◽

Convergence Performance ◽

Path Planner ◽

Action Spaces

AbstractIn actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for the welding manipulator with obstacles around. However, as a state-of-the-art method, the sampling-based planner only satisfies the probability completeness and its computational complexity is sensitive with state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and is less sensitive with state dimension. In detail, to improve the learning efficiency, we introduce the inverse kinematics module to provide prior knowledge while a gain module is also designed to avoid the local optimal policy, we integrate them into the training algorithm. To evaluate our proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves the convergence performance but also is superior in terms of optimality and robustness of planning compared with most other planning algorithms.

Download Full-text

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Algorithms ◽

10.3390/a14080226 ◽

2021 ◽

Vol 14 (8) ◽

pp. 226

Author(s):

Wenzel Pilar von Pilchau ◽

Anthony Stein ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Weighted Average ◽

Up States ◽

Experience Replay

State-of-the-art Deep Reinforcement Learning Algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. The default usage contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We could demonstrate a significantly improved overall mean average in comparison to a DQN network with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.

Download Full-text

Drone Deep Reinforcement Learning: A Review

Electronics ◽

10.3390/electronics10090999 ◽

2021 ◽

Vol 10 (9) ◽

pp. 999

Author(s):

Ahmad Taher Azar ◽

Anis Koubaa ◽

Nada Ali Mohamed ◽

Habiba A. Ibrahim ◽

Zahra Fathy Ibrahim ◽

...

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Real Life ◽

Environment Monitoring ◽

Simulated Environments ◽

Infrastructure Inspection ◽

Remote Sensing Mapping ◽

And Control ◽

The Military ◽

Uav Navigation

Unmanned Aerial Vehicles (UAVs) are increasingly being used in many challenging and diversified applications. These applications belong to the civilian and the military fields. To name a few; infrastructure inspection, traffic patrolling, remote sensing, mapping, surveillance, rescuing humans and animals, environment monitoring, and Intelligence, Surveillance, Target Acquisition, and Reconnaissance (ISTAR) operations. However, the use of UAVs in these applications needs a substantial level of autonomy. In other words, UAVs should have the ability to accomplish planned missions in unexpected situations without requiring human intervention. To ensure this level of autonomy, many artificial intelligence algorithms were designed. These algorithms targeted the guidance, navigation, and control (GNC) of UAVs. In this paper, we described the state of the art of one subset of these algorithms: the deep reinforcement learning (DRL) techniques. We made a detailed description of them, and we deduced the current limitations in this area. We noted that most of these DRL methods were designed to ensure stable and smooth UAV navigation by training computer-simulated environments. We realized that further research efforts are needed to address the challenges that restrain their deployment in real-life scenarios.

Download Full-text