Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning

Author(s):  
Thomy Phan ◽  
Thomas Gabor ◽  
Robert Müller ◽  
Christoph Roch ◽  
Claudia Linnhoff-Popien

We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory-bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and is automatically adapted to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL in four large POMDP benchmark problems to demonstrate its effectiveness and robustness w.r.t. the choice of hyperparameters, and we evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.
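To make the stacked-bandit idea concrete, the following is a minimal Python sketch of open-loop planning with one Thompson Sampling bandit per depth. It illustrates the general mechanism only, not the authors' implementation: the Gaussian posterior model, the lazy stack-growth rule, the generative-model interface, and all names are assumptions.

import random
import math

class GaussianTSBandit:
    """One bandit per planning depth; tracks a Gaussian reward posterior per action."""
    def __init__(self, n_actions, prior_mean=0.0, prior_var=1.0):
        self.n = [0] * n_actions
        self.mean = [prior_mean] * n_actions
        self.var = [prior_var] * n_actions

    def sample_action(self):
        # Thompson Sampling: draw one value from each action's posterior,
        # then act greedily with respect to the draws.
        draws = [random.gauss(m, math.sqrt(v)) for m, v in zip(self.mean, self.var)]
        return max(range(len(draws)), key=lambda a: draws[a])

    def update(self, action, ret):
        # Incremental mean update with the observed return; the shrinking
        # posterior variance below is a simplifying assumption.
        self.n[action] += 1
        n = self.n[action]
        self.mean[action] += (ret - self.mean[action]) / n
        self.var[action] = 1.0 / n

def open_loop_plan(generative_model, initial_state, n_actions, horizon, n_sims):
    # generative_model(state, action) is assumed to return (next_state, reward, done).
    stack = []                                    # grows lazily, never beyond `horizon`
    for _ in range(n_sims):
        state, depth = initial_state, 0
        actions, rewards = [], []
        while depth < horizon:
            if depth == len(stack):               # adapt the stack size on demand
                stack.append(GaussianTSBandit(n_actions))
            a = stack[depth].sample_action()
            state, reward, done = generative_model(state, a)
            actions.append(a)
            rewards.append(reward)
            depth += 1
            if done:
                break
        # Update each depth's bandit with the return accumulated from that depth on.
        for d, a in enumerate(actions):
            stack[d].update(a, sum(rewards[d:]))
    # Recommend the greedy action of the root bandit.
    return max(range(n_actions), key=lambda a: stack[0].mean[a])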

Author(s):  
Thomy Phan ◽  
Lenz Belzner ◽  
Marie Kiermeier ◽  
Markus Friedrich ◽  
Kyrill Schmid ◽  
...  

State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit the performance due to restricted memory resources. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.
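The performance-memory tradeoff can be illustrated with a back-of-the-envelope comparison, under the simplifying assumption that a tree-based planner may add one node per simulated step while a stacked-bandit planner only stores a fixed set of per-depth, per-action statistics; the numbers below are illustrative, not taken from the paper.

def tree_nodes_upper_bound(n_simulations, horizon):
    # Worst case for a search tree: one new node per simulated step.
    return n_simulations * horizon

def stacked_bandit_statistics(horizon, n_actions, stats_per_arm=3):
    # Fixed-size stack: per-depth, per-action statistics, independent of simulation count.
    return horizon * n_actions * stats_per_arm

print(tree_nodes_upper_bound(100_000, 50))    # 5,000,000 nodes
print(stacked_bandit_statistics(50, 4))       # 600 numbers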


Author(s):  
Jan Leike ◽  
Tor Lattimore ◽  
Laurent Orseau ◽  
Marcus Hutter

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value and (2) given a recoverability assumption, regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
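A toy sketch of the underlying algorithmic idea, posterior (Thompson) sampling over a countable class of candidate environments: sample an environment from the posterior, act optimally for the sample, and reweight the posterior with Bayes' rule. The two-armed Bernoulli environments below are purely illustrative and far simpler than the general environments analyzed in the paper.

import random

class BernoulliEnv:
    """Two-armed environment; the arm reward probabilities identify the environment."""
    def __init__(self, p):
        self.p = p                          # p[a] = P(reward = 1 | action a)
    def step(self, a):
        return 1 if random.random() < self.p[a] else 0
    def likelihood(self, a, r):
        return self.p[a] if r == 1 else 1.0 - self.p[a]
    def optimal_action(self):
        return max(range(len(self.p)), key=lambda a: self.p[a])

def thompson_sampling(true_env, env_class, steps):
    posterior = [1.0 / len(env_class)] * len(env_class)   # uniform prior
    total = 0
    for _ in range(steps):
        # Sample an environment index from the posterior, act optimally for the sample.
        idx = random.choices(range(len(env_class)), weights=posterior)[0]
        a = env_class[idx].optimal_action()
        r = true_env.step(a)
        total += r
        # Bayesian update of every candidate environment's weight with the observation.
        posterior = [w * env.likelihood(a, r) for w, env in zip(posterior, env_class)]
        z = sum(posterior)
        posterior = [w / z for w in posterior]
    return total

env_class = [BernoulliEnv([0.2, 0.8]), BernoulliEnv([0.8, 0.2])]
print(thompson_sampling(true_env=env_class[0], env_class=env_class, steps=1000))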


Author(s):  
Gayathri Rajendran ◽  
Uma Vijayasundaram

Robotics has become a rapidly emerging branch of science, addressing the needs of humankind by way of advanced techniques such as artificial intelligence (AI). This chapter gives a detailed explanation of the background knowledge required to implement software robots and an in-depth explanation of the different types of software robots across different applications, and it highlights some of the important contributions made in this field. Path planning algorithms are required to perform robot navigation efficiently. This chapter discusses several robot path planning algorithms which help in utilizing domain knowledge, avoiding possible obstacles, and successfully accomplishing tasks in less computational time. It also provides a case study on robot navigation data, explains the significance of machine learning algorithms in decision making, and discusses some of the potential simulators used in implementing software robots.
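As a generic illustration of the kind of path planning algorithm discussed (a textbook sketch, not tied to the chapter's case study), the snippet below runs A* on a small occupancy grid with obstacles; the grid layout, unit step costs, and Manhattan heuristic are illustrative assumptions.

import heapq

def astar(grid, start, goal):
    # grid: 0 = free cell, 1 = obstacle; 4-connected moves with unit cost.
    rows, cols = len(grid), len(grid[0])
    def h(cell):                                   # Manhattan-distance heuristic
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]     # (f, g, cell, path)
    best_g = {start: 0}
    while frontier:
        f, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = cell[0] + dr, cell[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
                ng = g + 1
                if ng < best_g.get((r, c), float("inf")):
                    best_g[(r, c)] = ng
                    heapq.heappush(frontier, (ng + h((r, c)), ng, (r, c), path + [(r, c)]))
    return None                                    # no obstacle-free path exists

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))   # path around the obstacles via the top-right corner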


Author(s):  
Erwan Lecarpentier ◽  
Guillaume Infantes ◽  
Charles Lesire ◽  
Emmanuel Rachelson

In the context of tree-search stochastic planning algorithms where a generative model is available, we consider on-line planning algorithms that build trees in order to recommend an action. We investigate the question of avoiding re-planning in subsequent decision steps by directly using sub-trees as action recommenders. First, we propose a method for open-loop control via a new algorithm that decides at each time step whether or not to re-plan, based on an analysis of the statistics of the sub-tree. Second, we show that the probability of selecting a suboptimal action at any depth of the tree can be upper bounded and converges towards zero; moreover, this upper bound decays logarithmically between subsequent depths. This leads to a distinction between node-wise optimality and state-wise optimality. Finally, we empirically demonstrate that our method achieves a compromise between loss of performance and computational gain.
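The re-planning decision can be sketched as follows. Note that the concrete test used here, visit-count and value-gap thresholds on the reused sub-tree, is an illustrative assumption and not the criterion proposed in the paper.

from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    mean_value: float = 0.0
    children: dict = field(default_factory=dict)    # action -> child Node

def should_replan(subtree, min_visits=200, min_gap=0.05):
    """Re-plan when the reused sub-tree's statistics look unreliable."""
    if subtree is None or subtree.visits < min_visits or len(subtree.children) < 2:
        return True
    values = sorted((c.mean_value for c in subtree.children.values()), reverse=True)
    return (values[0] - values[1]) < min_gap        # best and runner-up too close to call

def recommend(subtree):
    """Greedy recommendation from the sub-tree, plus the sub-tree to descend into."""
    action = max(subtree.children, key=lambda a: subtree.children[a].mean_value)
    return action, subtree.children[action]

# Usage: keep descending into sub-trees and only rebuild the tree when needed.
root = Node(visits=500, children={0: Node(200, 1.2), 1: Node(300, 0.4)})
if not should_replan(root):
    action, next_subtree = recommend(root)
    print(action)                                   # -> 0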


2018 ◽  
Vol 66 (6) ◽  
pp. 1586-1602 ◽  
Author(s):  
Kris Johnson Ferreira ◽  
David Simchi-Levi ◽  
He Wang

Thompson sampling is a randomized Bayesian machine learning method, whose original motivation was to sequentially evaluate treatments in clinical trials. In recent years, this method has drawn wide attention, as Internet companies have successfully implemented it for online ad display. In "Online network revenue management using Thompson sampling," K. Ferreira, D. Simchi-Levi, and H. Wang propose using Thompson sampling for a revenue management problem where the demand function is unknown. A main challenge in adopting Thompson sampling for revenue management is that the original method does not incorporate inventory constraints. However, the authors show that Thompson sampling can be naturally combined with a linear program formulation to include inventory constraints. The result is a dynamic pricing algorithm that incorporates domain knowledge and has strong theoretical performance guarantees as well as promising numerical performance results. Interestingly, the authors demonstrate that Thompson sampling achieves poor performance when it does not take into account domain knowledge. Finally, the proposed dynamic pricing algorithm is highly flexible and is applicable in a range of industries, from airlines and internet advertising all the way to online retailing.
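A minimal single-product sketch of this combination, with illustrative assumptions throughout (the price grid, Beta demand posteriors, simulated demand, and the exact LP are placeholders, not the authors' formulation): sample purchase probabilities from the posterior, solve a small linear program that respects the remaining inventory, then price according to its solution.

import numpy as np
from scipy.optimize import linprog

prices = np.array([29.0, 34.0, 39.0])     # candidate price points (assumption)
alpha = np.ones(len(prices))              # Beta(1, 1) posterior over the purchase
beta = np.ones(len(prices))               # probability at each price
inventory, periods = 100, 500

for t in range(periods):
    if inventory == 0:
        break
    # 1) Thompson step: sample one purchase probability per price from the posterior.
    demand = np.random.beta(alpha, beta)
    # 2) LP step: x_k = fraction of time to offer price k, maximizing sampled revenue
    #    under the average-inventory constraint (linprog minimizes, so negate).
    res = linprog(
        c=-(prices * demand),
        A_ub=np.vstack([demand, np.ones(len(prices))]),
        b_ub=[inventory / (periods - t), 1.0],
        bounds=[(0, 1)] * len(prices),
    )
    x = res.x
    # 3) Offer price k with probability x_k (offer nothing with the leftover mass).
    probs = np.clip(np.append(x, 1.0 - x.sum()), 0.0, None)
    k = np.random.choice(len(prices) + 1, p=probs / probs.sum())
    if k == len(prices):
        continue                                   # no offer this period
    # 4) Observe a (simulated) sale and update the chosen price's Beta posterior.
    sale = np.random.rand() < [0.5, 0.3, 0.15][k]  # stand-in for the unknown demand curve
    alpha[k] += sale
    beta[k] += 1 - sale
    inventory -= int(sale)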


Author(s):  
Majid Khonji ◽  
Ashkan Jasour ◽  
Brian Williams

Partially Observable Markov Decision Process (POMDP) is a fundamental framework for planning and decision making under uncertainty. POMDP is known to be intractable to solve or even approximate when the planning horizon is long (i.e., within a polynomial number of time steps). Constrained POMDP (C-POMDP) allows constraints to be specified on some aspects of the policy in addition to the objective function. When the constraints involve bounding the probability of failure, the problem is called Chance-Constrained POMDP (CC-POMDP). Our first contribution is a reduction from CC-POMDP to C-POMDP and a novel Integer Linear Programming (ILP) formulation. Thus, any algorithm for the latter problem can be utilized to solve any instance of the former. Second, we show that unlike POMDP, when the length of the planning horizon is constant, (C)C-POMDP is NP-Hard. Third, we present the first Fully Polynomial Time Approximation Scheme (FPTAS) that computes (near) optimal deterministic policies for constant-horizon (C)C-POMDP in polynomial time.
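For reference, a generic chance-constrained objective has the following shape, where the policy maximizes expected cumulative reward while keeping the probability of ever reaching a failure set \(\mathcal{F}\) below a risk bound \(\Delta\); the notation is illustrative and not the paper's exact formulation.

\max_{\pi} \; \mathbb{E}\!\left[\sum_{t=0}^{T-1} R(s_t, a_t) \,\middle|\, \pi\right]
\quad \text{subject to} \quad
\Pr\!\left(\exists\, t < T:\; s_t \in \mathcal{F} \,\middle|\, \pi\right) \le \Delta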

