Construction of Semi-Markov Decision Process Models of Continuous State Space Environments Using Growing Cell Structures and Multiagent k-Certainty Exploration Method

Author(s): Takeshi Tateyama, Seiichi Kawata, Yoshiki Shimomura, ...

The k-certainty exploration method, an efficient reinforcement learning algorithm, cannot be applied directly to environments with continuous state spaces, because the continuous state space must first be converted into a discrete one. Our purpose is to construct discrete semi-Markov decision process (SMDP) models of such environments by using growing cell structures to autonomously partition the continuous state space and then applying the k-certainty exploration method to construct the SMDP models. The multiagent k-certainty exploration method is then used to improve exploration efficiency. Mobile robot simulations demonstrated our proposal's usefulness and efficiency.
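To make the two ingredients of the abstract concrete, the following Python sketch combines a nearest-centre quantizer standing in for a trained growing cell structures (GCS) network with the core of k-certainty exploration: keep trying actions in a discrete state until each has been observed at least k times. The class and function names, the cell layout, and the action-selection details are illustrative assumptions, not the authors' implementation.

import numpy as np

class CellQuantizer:
    """Maps a continuous observation to the index of the nearest cell centre."""
    def __init__(self, centres):
        self.centres = np.asarray(centres)          # shape (n_cells, state_dim)

    def state(self, x):
        d = np.linalg.norm(self.centres - np.asarray(x), axis=1)
        return int(np.argmin(d))

def k_certainty_action(counts, state, n_actions, k, rng):
    """Prefer an action whose trial count in this state is still below k."""
    uncertain = [a for a in range(n_actions) if counts[state, a] < k]
    if uncertain:                                   # still exploring this state
        return int(rng.choice(uncertain))
    return int(rng.integers(n_actions))             # all actions here are k-certain

rng = np.random.default_rng(0)
quantizer = CellQuantizer(rng.uniform(-1.0, 1.0, size=(20, 2)))   # stand-in for GCS output
counts = np.zeros((20, 4), dtype=int)               # visit counts per (cell, action)

s = quantizer.state([0.3, -0.1])
a = k_certainty_action(counts, s, n_actions=4, k=3, rng=rng)
counts[s, a] += 1                                    # statistics accumulated for the SMDP model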

1985, Vol. 17 (2), pp. 424-442
Author(s): A. Federgruen, P. Zipkin

Special algorithms have been developed to compute an optimal (s, S) policy for an inventory model with discrete demand and under standard assumptions (stationary data, a well-behaved one-period cost function, full backlogging and the average cost criterion). We present here an iterative algorithm for continuous demand distributions which avoids any form of prior discretization. The method can be viewed as a modified form of policy iteration applied to a Markov decision process with continuous state space. For phase-type distributions, the calculations can be done in closed form.
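As a rough illustration of the model being optimized (not the authors' algorithm, which is a modified policy iteration with closed-form calculations for phase-type demand), this sketch estimates the long-run average cost of a fixed (s, S) policy under exponentially distributed demand, the simplest phase-type case. All parameter names and values are assumptions for illustration.

import random

def average_cost_sS(s, S, lam=1.0, K=10.0, h=1.0, b=5.0, periods=200_000, seed=1):
    """Monte Carlo estimate of average cost per period for an (s, S) policy."""
    rng = random.Random(seed)
    x, total = S, 0.0                          # start at the order-up-to level S
    for _ in range(periods):
        if x <= s:                             # review: order up to S when at or below s
            total += K                         # fixed ordering cost
            x = S
        x -= rng.expovariate(lam)              # continuous (exponential) demand draw
        total += h * max(x, 0) + b * max(-x, 0)   # holding cost or backlog penalty
    return total / periods

print(average_cost_sS(s=1.0, S=4.0))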


1974, Vol. 11 (4), pp. 669-677
Author(s): D. R. Grey

Results on the behaviour of Markov branching processes as time goes to infinity, hitherto obtained for models which assume a discrete state-space or discrete time or both, are here generalised to a model with both state-space and time continuous. The results are similar but the methods not always so.


1996, Vol. 6 (12a), pp. 2375-2388
Author(s): Markus Lohmann, Jan Wenzelburger

This paper introduces a statistical method for detecting cycles in discrete time dynamical systems. The continuous state space is replaced by a discrete one consisting of cells. Hashing is used to represent the cells in the computer’s memory. An algorithm for a two-parameter bifurcation analysis is presented which uses the statistical method to detect cycles in the discrete state space. The output of this analysis is a colored cartogram where parameter regions are marked according to the long-term behavior of the system. Moreover, the algorithm allows the computation of basins of attraction of cycles.
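A minimal sketch of the cell-and-hash idea for a one-dimensional map, using a Python dict as the hash table: iterate past the transient, assign each state to a cell, and report a cycle when the trajectory returns to a previously visited cell. The example map, cell size, and step limits are illustrative assumptions, not the paper's algorithm.

def detect_cycle(f, x0, cell_size=1e-4, transient=1000, max_steps=10_000):
    x = x0
    for _ in range(transient):              # discard transient behaviour
        x = f(x)
    seen = {}                               # hash table: cell index -> step of first visit
    for step in range(max_steps):
        cell = round(x / cell_size)         # hashable cell index for the 1-D case
        if cell in seen:
            return step - seen[cell]        # estimated cycle length
        seen[cell] = step
        x = f(x)
    return None                             # no cycle detected within max_steps

# Usage with the logistic map: r = 3.5 has a stable period-4 cycle.
logistic = lambda x, r=3.5: r * x * (1 - x)
print(detect_cycle(logistic, x0=0.4))       # expected output: 4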


Author(s): Hyungseok Song, Hyeryung Jang, Hai H. Tran, Se-eun Yoon, Kyunghwan Son, ...

We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable to solve with typical reinforcement learning (RL) algorithms, especially when the number of items is huge. In this paper, we present a deep RL algorithm to solve this issue by adopting the following key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. IS-MDP decomposes the joint action of selecting K items simultaneously into K iterative selections, reducing the number of actions at the expense of an exponential increase in the number of states. Second, we overcome this state space explosion by exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments with item spaces different from those used in training.
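A toy sketch of the iterative-selection idea: rather than scoring all C(N, K) joint subsets, one scoring function with shared weights is applied to every remaining item, and K items are picked one at a time (here greedily, with a simple linear scorer standing in for the paper's weight-shared Q-networks). The feature layout, context encoding, and weights are illustrative assumptions.

import numpy as np

def iterative_select(features, K, w):
    """Select K item indices; the same weights w score every item at every step."""
    n, d = features.shape
    selected, remaining = [], set(range(n))
    for _ in range(K):                       # K iterative selections instead of one joint action
        context = features[selected].sum(axis=0) if selected else np.zeros(d)
        # Shared weights: w is reused across all items and all K selection steps.
        scores = {i: w @ np.concatenate([features[i], context]) for i in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 3))              # 8 candidate items, 3 features each
print(iterative_select(feats, K=3, w=rng.normal(size=6)))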

