A pause control approach to the value iteration scheme in average Markov decision processes

1998 ◽  
Vol 33 (4) ◽  
pp. 209-219 ◽  
Author(s):  
Rolando Cavazos-Cadena


2018 ◽
Vol 10 (2) ◽  
pp. 1-22
Author(s):  
Sanaa Chafik ◽  
Abdelhadi Larach ◽  
Cherki Daoui

The standard Value Iteration (VI) algorithm, referred to as the Value Iteration Pre-Jacobi (PJ-VI) algorithm, is the simplest Value Iteration scheme and the best-known algorithm for solving Markov Decision Processes (MDPs). Several versions of the VI algorithm have been developed in the literature to reduce the number of iterations: the VI Jacobi (VI-J) algorithm, the Value Iteration Pre-Gauss-Seidel (VI-PGS) algorithm, and the VI Gauss-Seidel (VI-GS) algorithm. In this article, the authors combine the advantages of the VI Pre-Gauss-Seidel algorithm, a decomposition technique, and parallelism to propose a new Parallel Hierarchical VI Pre-Gauss-Seidel algorithm. Experimental results show that their approach outperforms the traditional VI schemes when the global problem can be decomposed into smaller subproblems.
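These schemes differ in how much of the current sweep's freshly updated values each backup reuses. A minimal sketch contrasting the standard Pre-Jacobi sweep with an in-place Pre-Gauss-Seidel sweep (the toy MDP, `P`, `R`, and `gamma` below are illustrative, not taken from the paper):

```python
import numpy as np

# Toy MDP: n states, m actions; P[a][s, t] = Pr(t | s, a), R[a][s] rewards.
# All numbers here are illustrative only, not from the paper.
n, m, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.random((m, n, n)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((m, n))

def vi_pre_jacobi(V, sweeps=100):
    # Pre-Jacobi VI: every state update uses values from the previous sweep.
    for _ in range(sweeps):
        V = np.max(R + gamma * np.einsum('ast,t->as', P, V), axis=0)
    return V

def vi_pre_gauss_seidel(V, sweeps=100):
    # Pre-Gauss-Seidel VI: each state immediately reuses values already
    # updated in the current sweep, which typically cuts iteration counts.
    V = V.copy()
    for _ in range(sweeps):
        for s in range(n):
            V[s] = max(R[a, s] + gamma * P[a, s] @ V for a in range(m))
    return V

print(vi_pre_jacobi(np.zeros(n)))
print(vi_pre_gauss_seidel(np.zeros(n)))
```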


2015 ◽  
Vol 13 (3) ◽  
pp. 47-57 ◽  
Author(s):  
Sanaa Chafik ◽  
Cherki Daoui

As many real applications involve a large number of states, classical methods are intractable for solving large Markov Decision Processes. A decomposition technique based on the topology of each state in the associated graph, combined with parallelization, is a useful way to cope with this problem. In this paper, the authors propose a Modified Value Iteration algorithm augmented with parallelism. They test their implementation on artificial data using OpenMP, which offers a significant speed-up.
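To illustrate the topology-based decomposition, one natural reading is to split the state graph into strongly connected components and solve the components level by level, components within a level being independent and thus parallelizable. A rough Python sketch under that reading (the adjacency matrix is made up; the paper's actual implementation uses OpenMP):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical adjacency: edge s -> t if some action moves s to t with
# positive probability (values are illustrative, not from the paper).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]])

# Strongly connected components give the hierarchical pieces: states in
# one SCC depend only on their own SCC and on SCCs downstream of it, so
# each piece can be solved by a local VI; pieces with no unresolved
# dependencies can be solved in parallel. A full implementation would
# order the pieces via the condensation DAG (omitted here).
n_comp, labels = connected_components(csr_matrix(adj), connection='strong')
for c in range(n_comp):
    states = np.flatnonzero(labels == c)
    print(f"component {c}: states {states}")  # run local VI per component
```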


Author(s):  
Mahsa Ghasemi ◽  
Ufuk Topcu

In conventional partially observable Markov decision processes, the observations that the agent receives originate from fixed, known distributions. However, in a variety of real-world scenarios the agent plays an active role in its perception by selecting which observations to receive. We avoid the combinatorial expansion of the action space that would result from integrating planning and perception decisions by using a greedy strategy for observation selection that minimizes an information-theoretic measure of the state uncertainty. We develop a novel point-based value iteration algorithm that incorporates this greedy strategy to pick perception actions for each sampled belief point in each iteration. As a result, not only does the solver require fewer belief points to approximate the reachable subspace of the belief simplex, but it also requires less computation per iteration. Further, we prove that the proposed algorithm achieves a near-optimal guarantee on the value function with respect to an optimal perception strategy, and we demonstrate its performance empirically.
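As a concrete instance of such a greedy strategy, one can pick, at each belief point, the observation channel that minimizes the expected posterior entropy of the belief. A minimal sketch under that entropy-based reading (the `obs_models` layout and function names are assumptions, not the paper's API):

```python
import numpy as np

def entropy(b):
    # Shannon entropy of a belief vector (0 log 0 := 0).
    b = b[b > 0]
    return -np.sum(b * np.log(b))

def greedy_observation(belief, obs_models):
    # Pick the observation channel with the smallest expected posterior
    # entropy. obs_models[k][z, s] = Pr(z | s) for channel k; channels
    # and shapes here are illustrative, not from the paper.
    best_k, best_h = None, np.inf
    for k, O in enumerate(obs_models):
        pz = O @ belief                    # Pr(z) under the current belief
        h = 0.0
        for z in np.flatnonzero(pz > 0):
            post = O[z] * belief / pz[z]   # Bayes update for outcome z
            h += pz[z] * entropy(post)     # expected posterior entropy
        if h < best_h:
            best_k, best_h = k, h
    return best_k
```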


2020 ◽  
Vol 34 (04) ◽  
pp. 6778-6785
Author(s):  
Li Zhang ◽  
Xin Li ◽  
Sen Chen ◽  
Hongyu Zang ◽  
Jie Huang ◽  
...  

In this paper, we first formally define the problem set of spatially invariant Markov Decision Processes (MDPs) and show that Value Iteration Networks (VIN) and its extensions are computationally restricted to it due to their use of a shared convolution kernel. To generalize VIN to spatially variant MDPs, we propose Universal Value Iteration Networks (UVIN). In comparison with VIN, UVIN automatically learns a flexible but compact network structure to encode the transition dynamics of the problem and to support the differentiable planning module. We evaluate UVIN on both spatially invariant and spatially variant tasks, including navigation in regular mazes, chessboard mazes, and on Mars, as well as Minecraft item synthesis. Results show that UVIN achieves performance similar to VIN and its extensions on spatially invariant tasks and significantly outperforms other models on more general problems.
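To make the convolution-kernel point concrete: in a spatially invariant grid MDP, the Bellman backup can be written as a bank of shared convolutions, one kernel per action, which is the structure VIN exploits. A small numpy sketch under that assumption (grid size, kernels, and reward map are illustrative, not from the paper):

```python
import numpy as np
from scipy.ndimage import correlate

gamma = 0.95
reward = np.zeros((8, 8)); reward[7, 7] = 1.0   # goal in one corner
V = np.zeros((8, 8))

# One 3x3 kernel per action: each encodes where that action moves the
# agent (deterministic steps to one of the four neighbours).
actions = [np.zeros((3, 3)) for _ in range(4)]
for K, (di, dj) in zip(actions, [(-1, 0), (1, 0), (0, -1), (0, 1)]):
    K[1 + di, 1 + dj] = 1.0

for _ in range(60):
    # Q(s, a) = R(s) + gamma * sum_t P_a(t | s) V(t), then max over actions.
    # The same kernel is applied at every cell: spatial invariance.
    Q = [reward + gamma * correlate(V, K, mode='constant') for K in actions]
    V = np.max(Q, axis=0)

# In a spatially *variant* MDP the kernel would differ per cell, which a
# single shared convolution cannot express -- the gap UVIN addresses.
print(V.round(2))
```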

