ATSIS: Achieving the Ad hoc Teamwork by Sub-task Inference and Selection

Author(s):  
Shuo Chen ◽  
Ewa Andrejczuk ◽  
Athirai A. Irissappane ◽  
Jie Zhang

In an ad hoc teamwork setting, a team needs to coordinate its activities to perform a task without prior agreement on how to achieve it. The ad hoc agent cannot communicate with its teammates, but it can observe their behaviour and plan accordingly. To do so, existing approaches rely on models of the teammates' behaviour. However, these models may not be accurate, which can compromise teamwork. For this reason, we present the Ad Hoc Teamwork by Sub-task Inference and Selection (ATSIS) algorithm, which uses sub-task inference without relying on teammates' models. First, the ad hoc agent observes its teammates to infer which sub-tasks they are handling. Based on that, it selects its own sub-task using a partially observable Markov decision process that handles the uncertainty of the sub-task inference. Finally, the ad hoc agent uses Monte Carlo tree search to find the set of actions to perform the sub-task. Our experiments show the benefits of ATSIS for robust teamwork.
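The first step of ATSIS, inferring which sub-task each teammate is handling from observed behaviour, amounts to maintaining a belief over sub-tasks that is updated as observations arrive. A minimal sketch of such a Bayesian belief update in Python (the sub-task names and likelihood values below are hypothetical illustrations, not taken from the paper):

```python
def update_belief(belief, likelihoods):
    """Bayes update of a belief over a teammate's possible sub-tasks.

    belief:      dict sub_task -> prior probability
    likelihoods: dict sub_task -> probability of the observed action
                 under that sub-task
    Returns the normalized posterior.
    """
    posterior = {t: belief[t] * likelihoods[t] for t in belief}
    z = sum(posterior.values())
    return {t: p / z for t, p in posterior.items()}

# Hypothetical example: two candidate sub-tasks, and an observed action
# that is far more likely if the teammate is carrying than scouting.
prior = {"carry": 0.5, "scout": 0.5}
posterior = update_belief(prior, {"carry": 0.8, "scout": 0.2})
```

An uncertainty-aware selector (such as the paper's POMDP) would then choose the ad hoc agent's own sub-task against this posterior rather than against a single hard guess.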

Author(s):  
Larkin Liu ◽  
Jun Tao Luo

Flexible implementations of Monte Carlo Tree Search (MCTS), combined with domain-specific knowledge and hybridization with other search algorithms, can be very powerful for solving complex planning problems. We introduce mctreesearch4j, an MCTS implementation written as a standard JVM library following key design principles of object-oriented programming. We define key class abstractions that allow the MCTS library to flexibly adapt to any well-defined Markov Decision Process (MDP) or turn-based adversarial game. Furthermore, our library is designed to be modular and extensible, utilizing class inheritance and generic typing to standardize custom algorithm definitions. We demonstrate that the design of the MCTS implementation provides ease of adaptation for unique heuristics and customization across varying MDP domains, and that the implementation is reasonably performant and accurate for standard MDPs. In addition, via the implementation of mctreesearch4j, the nuances of different types of MCTS algorithms are discussed.
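The class-abstraction idea, a solver written against an abstract MDP interface so that any well-defined domain can plug in, can be sketched in Python as follows. mctreesearch4j itself is a JVM library, so the class and method names here are illustrative stand-ins for the pattern, not its actual API:

```python
from abc import ABC, abstractmethod
import random

class MDP(ABC):
    """Generic MDP interface a tree-search solver can be written against."""
    @abstractmethod
    def initial_state(self): ...
    @abstractmethod
    def actions(self, state): ...
    @abstractmethod
    def transition(self, state, action): ...
    @abstractmethod
    def reward(self, state, action, next_state): ...
    @abstractmethod
    def is_terminal(self, state): ...

class CountdownMDP(MDP):
    """Toy domain: count an integer down to zero; +1 reward on reaching it."""
    def initial_state(self):
        return 3
    def actions(self, state):
        return ["dec"]
    def transition(self, state, action):
        return state - 1
    def reward(self, state, action, next_state):
        return 1.0 if next_state == 0 else 0.0
    def is_terminal(self, state):
        return state == 0

def rollout(mdp):
    """Random playout: the simulation primitive an MCTS solver builds on.

    Works for ANY MDP subclass, which is the point of the abstraction."""
    state, total = mdp.initial_state(), 0.0
    while not mdp.is_terminal(state):
        action = random.choice(mdp.actions(state))
        nxt = mdp.transition(state, action)
        total += mdp.reward(state, action, nxt)
        state = nxt
    return total
```

The library's use of JVM generics plays the same role as the abstract base class here: a custom domain only implements the interface, and the selection, expansion, and backup machinery is inherited unchanged.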


Author(s):  
Tuan Dam ◽  
Pascal Klink ◽  
Carlo D'Eramo ◽  
Jan Peters ◽  
Joni Pajarinen

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that, with enough samples, the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further, proposing a novel backup strategy that uses the power mean operator, which computes a value between the average and the maximum. We call our new approach Power-UCT and argue that the power mean operator helps speed up learning in MCTS. We theoretically analyze our method, providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method on well-known MDP and POMDP benchmarks, showing significant improvements in performance and convergence speed with respect to state-of-the-art algorithms.
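The power mean backup can be illustrated directly. For nonnegative values v_1, ..., v_n, the power mean ((1/n) Σ v_i^p)^(1/p) equals the arithmetic mean at p = 1 and approaches the maximum as p grows, so it interpolates between the two backup operators the abstract contrasts. A minimal sketch (the choice of p here is arbitrary, for illustration only):

```python
def power_mean(values, p):
    """Power mean of nonnegative values.

    p = 1 recovers the average; p -> infinity approaches the maximum."""
    n = len(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)

# Child-value estimates at some node (hypothetical numbers).
vals = [1.0, 2.0, 4.0]
avg_backup = power_mean(vals, 1)    # plain UCT-style average, 7/3
soft_max = power_mean(vals, 10)     # between the average and max(vals) = 4
```

Raising p trades the low variance of the average against the asymptotic accuracy of the maximum, which is the lever Power-UCT exploits.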


2013 ◽  
Vol 756-759 ◽  
pp. 504-508
Author(s):  
De Min Li ◽  
Jian Zou ◽  
Kai Kai Yue ◽  
Hong Yun Guan ◽  
Jia Cun Wang

Evacuating a firefighter from a complex fire scene is a challenging problem. In this paper, we discuss a firefighter evacuation decision-making model in an ad hoc robot network on a fire scene. Because the fire scene is dynamic, the information sensed by the ad hoc robot network also varies dynamically. We therefore adopt a dynamic decision method, the Markov decision process, to model the firefighter's decision-making process for evacuation from the fire scene. In this decision-making process, the critical problems are how to define the action space and how to evaluate the transition law of the Markov decision process. In this paper, we discuss these problems for the triangular-sensor configuration of the ad hoc robot network and, finally, describe a decision-making model for firefighter evacuation.
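Once the action space and transition law are fixed, the evacuation model is a standard MDP and can be solved, for instance, by value iteration. A toy sketch with a hypothetical three-state scene (room → corridor → exit; the states, probabilities, and rewards are invented for illustration and are not the paper's model):

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Standard value iteration over an explicit transition law P
    and reward table R; terminal states have no actions."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if not actions(s):          # terminal: value stays 0
                continue
            best = max(
                sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for s2, p in P[(s, a)].items())
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Hypothetical scene: moving succeeds with probability 0.9,
# and reaching the exit yields the only reward.
states = ["room", "corridor", "exit"]
actions = lambda s: [] if s == "exit" else ["move"]
P = {
    ("room", "move"):     {"corridor": 0.9, "room": 0.1},
    ("corridor", "move"): {"exit": 0.9, "corridor": 0.1},
}
R = {("corridor", "move", "exit"): 1.0}
V = value_iteration(states, actions, P, R)
```

The resulting values rank states by proximity to safety (corridor above room), which is the kind of signal an evacuation policy would follow; the paper's contribution is in deriving P from the dynamically sensed triangular-sensor data rather than assuming it.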


2021 ◽  
Author(s):  
Shirin Akbarinasaji

Background: Bug tracking systems receive many bug reports daily. Although the software quality team aims to identify and resolve these bugs, it is never able to fix all of the reported bugs in the issue tracking system before the release deadline. However, postponing bug fixes has consequences. Prioritization of bug reports helps the software manager decide which bugs to fix now and which to postpone. Typically, bug reports are prioritized based on severity, priority, time and effort to fix, customer pressure, etc. Aim: Previous studies have shown that these factors may not be appropriate for prioritization; relying on them to automate bug prioritization might therefore be misleading. In this dissertation, we aim to prioritize bug reports with respect to the consequences of not fixing them, in terms of their relative importance in the issue tracking system. Method: To measure the relative importance of bugs in the issue tracking system, we propose constructing a dependency graph from the reported dependency-blocking information in the issue tracking system. Two metrics, namely depth and degree, are used to measure the relative importance of the bugs. However, there is uncertainty in the dependency graph structure, as the dependency information is discovered manually and gradually. Owing to this uncertainty, prioritizing bugs in descending order of depth and degree may be misleading. To handle the uncertainty, we propose a novel approach based on a partially observable Markov decision process (POMDP) and partially observable Monte Carlo planning (POMCP). Result: To check the feasibility of the proposed approach, we analyzed seven years of data from an open source project, Firefox, and a commercial project. We compared the proposed policy with the developer policy, maximum policy, and random policy.
Conclusion: The results suggest that software practitioners do not consider the relative importance of bugs in their current practice. The proposed framework can be combined with practitioners’ expertise to prioritize bugs more effectively and take the depth and degree of bugs into account. In practice, the POMDP framework with the POMCP planner can help practitioners sequentially select bugs to minimize the connectivity of the dependency graph.
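On the blocking graph, one plausible reading of the two metrics is that degree counts the bugs a given bug directly blocks and depth is the length of the longest blocking chain reachable from it. The exact definitions are the dissertation's; the sketch below (with a hypothetical four-bug graph) is an illustrative assumption:

```python
# Hypothetical blocking relation: blocking[b] = bugs directly blocked by b.
blocking = {
    "B1": {"B2", "B3"},
    "B2": {"B4"},
    "B3": set(),
    "B4": set(),
}

def degree(bug):
    """Out-degree: number of bugs directly blocked by `bug`."""
    return len(blocking.get(bug, ()))

def depth(bug, seen=frozenset()):
    """Length of the longest chain of blocked bugs reachable from `bug`
    (cycle-safe via the `seen` set)."""
    if bug in seen:
        return 0
    children = blocking.get(bug, ())
    if not children:
        return 0
    return 1 + max(depth(c, seen | {bug}) for c in children)
```

Under these definitions, fixing B1 first (depth 2, degree 2) unblocks the most downstream work, which matches the intuition of minimizing the connectivity of the dependency graph; the POMDP/POMCP machinery then accounts for the graph being only partially known.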

