Point-Based Monte Carto Online Planning in POMDPs

The online planning and learning in partially observable Markov decision processes are often intractable because belief states space has two curses: dimensionality and history. In order to address this problem, this paper proposes a point-based Monte Carto online planning approach in POMDPs. This approach involves performing value backup at specific reachable belief points, rather than over the entire belief simplex, to speed up computation processes. Then Monte Carlo tree search algorithm is exploited to share the value of actions across each subtree of the search tree so as to minimise the mean squared error. The experimental results show that the proposed algorithm is effective in real-time system.

Download Full-text

Simple Regret Optimization in Online Planning for Markov Decision Processes

Journal of Artificial Intelligence Research ◽

10.1613/jair.4432 ◽

2014 ◽

Vol 51 ◽

pp. 165-205 ◽

Cited By ~ 5

Author(s):

Z. Feldman ◽

C. Domshlak

Keyword(s):

Markov Decision Processes ◽

State Of The Art ◽

Search Algorithm ◽

Empirical Evaluation ◽

Decision Processes ◽

Monte Carlo Tree Search ◽

Performance Loss ◽

Online Planning ◽

Markov Decision ◽

High Level

We consider online planning in Markov decision processes (MDPs). In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. Formally, the performance of algorithms for online planning is assessed in terms of simple regret, the agent's expected performance loss when the chosen action, rather than an optimal one, is followed. To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. Here we introduce a new Monte-Carlo tree search algorithm, BRUE, that guarantees exponential-rate and smooth reduction of simple regret. At a high level, BRUE is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. We further extend BRUE with a variant of ``learning by forgetting.'' The resulting parametrized algorithm, BRUE(alpha), exhibits even more attractive formal guarantees than BRUE. Our empirical evaluation shows that both BRUE and its generalization, BRUE(alpha), are also very effective in practice and compare favorably to the state-of-the-art.

Download Full-text

Enhanced Reinforcement Learning Method Combining One-Hot Encoding-Based Vectors for CNN-Based Alternative High-Level Decisions

Applied Sciences ◽

10.3390/app11031291 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1291

Author(s):

Bonwoo Gu ◽

Yunsick Sung

Keyword(s):

Reinforcement Learning ◽

Search Algorithm ◽

Classification Criteria ◽

Tree Search ◽

Learning Method ◽

Board Game ◽

Ancient China ◽

Monte Carlo Tree Search ◽

High Level ◽

Tree Search Algorithm

Gomoku is a two-player board game that originated in ancient China. There are various cases of developing Gomoku using artificial intelligence, such as a genetic algorithm and a tree search algorithm. Alpha-Gomoku, Gomoku AI built with Alpha-Go’s algorithm, defines all possible situations in the Gomoku board using Monte-Carlo tree search (MCTS), and minimizes the probability of learning other correct answers in the duplicated Gomoku board situation. However, in the tree search algorithm, the accuracy drops, because the classification criteria are manually set. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm expresses each state as One-Hot Encoding based vectors and determines the state of the Gomoku board by combining the similar state of One-Hot Encoding based vectors. Thus, in a case where a stone that is determined by CNN has already been placed or cannot be placed, we suggest a method for selecting an alternative. We verify the proposed method of Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.

Download Full-text

Deep learning inspired routing in ICN using Monte Carlo Tree Search algorithm

Journal of Parallel and Distributed Computing ◽

10.1016/j.jpdc.2020.12.014 ◽

2021 ◽

Author(s):

Nitul Dutta ◽

Shobhit K. Patel ◽

Vadim Samusenkov ◽

Vigneswaran D.

Keyword(s):

Monte Carlo ◽

Deep Learning ◽

Search Algorithm ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Tree Search Algorithm

Download Full-text

Gesture commands for controlling high-level UAV behavior

SN Applied Sciences ◽

10.1007/s42452-021-04583-8 ◽

2021 ◽

Vol 3 (6) ◽

Author(s):

John Akagi ◽

T. Devon Morris ◽

Brady Moon ◽

Xingguang Chen ◽

Cameron K. Peterson

Keyword(s):

Search Algorithm ◽

Dynamic Environment ◽

List Type ◽

Hardware In The Loop ◽

Monte Carlo Tree Search ◽

Natural Interface ◽

Constrained Environments ◽

Novel Variant ◽

High Level ◽

Tree Search Algorithm

Abstract Directing groups of unmanned air vehicles (UAVs) is a task that typically requires the full attention of several operators. This can be prohibitive in situations where an operator must pay attention to their surroundings. In this paper we present a gesture device that assists operators in commanding UAVs in focus-constrained environments. The operator influences the UAVs’ behavior by using intuitive hand gesture movements. Gestures are captured using an accelerometer and gyroscope and then classified using a logistic regression model. Ten gestures were chosen to provide behaviors for a group of fixed-wing UAVs. These behaviors specified various searching, following, and tracking patterns that could be used in a dynamic environment. A novel variant of the Monte Carlo Tree Search algorithm was developed to autonomously plan the paths of the cooperating UAVs. These autonomy algorithms were executed when their corresponding gesture was recognized by the gesture device. The gesture device was trained to classify the ten gestures and accurately identified them 95% of the time. Each of the behaviors associated with the gestures was tested in hardware-in-the-loop simulations and the ability to dynamically switch between them was demonstrated. The results show that the system can be used as a natural interface to assist an operator in directing a fleet of UAVs. Article highlights A gesture device was created that enables operators to command a group of UAVs in focus-constrained environments. Each gesture triggers high-level commands that direct a UAV group to execute complex behaviors. Software simulations and hardware-in-the-loop testing shows the device is effective in directing UAV groups.

Download Full-text

Development of rehabilitation system (RehabGame) through Monte-Carlo tree search algorithm using kinect and Myo sensor interface

2017 Computing Conference ◽

10.1109/sai.2017.8252217 ◽

2017 ◽

Cited By ~ 3

Author(s):

Shabnam Sadeghi Esfahlani ◽

George Wilson

Keyword(s):

Monte Carlo ◽

Search Algorithm ◽

Tree Search ◽

Sensor Interface ◽

Monte Carlo Tree Search ◽

Rehabilitation System ◽

Tree Search Algorithm

Download Full-text

Adjustment of Difficulty Level on Wobble Board-Based Game Using Monte Carlo Tree Search Algorithm

2018 5th International Conference on Data and Software Engineering (ICoDSE) ◽

10.1109/icodse.2018.8705843 ◽

2018 ◽

Author(s):

Adi Purnama ◽

Saiful Akbar ◽

Dody Dharma

Keyword(s):

Monte Carlo ◽

Search Algorithm ◽

Difficulty Level ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Tree Search Algorithm

Download Full-text

A modified Monte-Carlo Tree Search Algorithm for Two-sided Assembly Line Balancing Problem

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2019.11.483 ◽

2019 ◽

Vol 52 (13) ◽

pp. 1920-1924

Author(s):

Chuanxun Wu ◽

Xiaofeng Hu ◽

Yahui Zhang ◽

Pengfei Wang

Keyword(s):

Monte Carlo ◽

Assembly Line ◽

Search Algorithm ◽

Assembly Line Balancing ◽

Line Balancing ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Assembly Line Balancing Problem ◽

Tree Search Algorithm

Download Full-text

Heuristic Model Checking using a Monte-Carlo Tree Search Algorithm

Proceedings of the 2015 on Genetic and Evolutionary Computation Conference - GECCO '15 ◽

10.1145/2739480.2754767 ◽

2015 ◽

Cited By ~ 5

Author(s):

Simon Poulding ◽

Robert Feldt

Keyword(s):

Monte Carlo ◽

Model Checking ◽

Search Algorithm ◽

Tree Search ◽

Heuristic Model ◽

Monte Carlo Tree Search ◽

Tree Search Algorithm

Download Full-text

Importance sampling for online planning under uncertainty

The International Journal of Robotics Research ◽

10.1177/0278364918780322 ◽

2018 ◽

Vol 38 (2-3) ◽

pp. 162-181 ◽

Cited By ~ 2

Author(s):

Yuanfu Luo ◽

Haoyu Bai ◽

David Hsu ◽

Wee Sun Lee

Keyword(s):

Importance Sampling ◽

Autonomous Vehicles ◽

State Of The Art ◽

Monte Carlo Sampling ◽

Planning Under Uncertainty ◽

Online Planning ◽

Markov Decision ◽

Partially Observable ◽

Robotic Tasks ◽

General Method

The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including, real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT’s performance when there are critical, but rare events, which are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.

Download Full-text

A Framework for Sequential Planning in Multi-Agent Settings

Journal of Artificial Intelligence Research ◽

10.1613/jair.1579 ◽

2005 ◽

Vol 24 ◽

pp. 49-79 ◽

Cited By ~ 93

Author(s):

P. J. Gmytrasiewicz ◽

P. Doshi

Keyword(s):

Traditional Approach ◽

Value Functions ◽

Value Iteration ◽

Markov Decision ◽

Multi Agent ◽

Carry Over ◽

The Cost ◽

Partially Observable ◽

Belief States ◽

Do So

This paper extends the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian updates to maintain their beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to agent types considered in games of incomplete information. We express the agents' autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and piece-wise linearity and convexity of the value functions carry over to our framework. Our approach complements a more traditional approach to interactive settings which uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria which may be non-unique and do not capture off-equilibrium behaviors. We do so at the cost of having to represent, process and continuously revise models of other agents. Since the agent's beliefs may be arbitrarily nested, the optimal solutions to decision making problems are only asymptotically computable. However, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.

Download Full-text