Interactive spoken content retrieval by extended query model and continuous state space Markov Decision Process

Author(s):  
Tsung-Hsien Wen ◽  
Hung-yi Lee ◽  
Pei-hao Su ◽  
Lin-Shan Lee
Author(s):  
Takeshi Tateyama ◽  
Seiichi Kawata ◽  
Yoshiki Shimomura ◽  
...  

The k-certainty exploration method, an efficient reinforcement learning algorithm, cannot be applied to environments whose state space is continuous, because the continuous state space must first be converted into a discrete one. Our purpose is to construct discrete semi-Markov decision process (SMDP) models of such environments by using growing cell structures to autonomously divide the continuous state space and then applying the k-certainty exploration method to construct the SMDP models. The multiagent k-certainty exploration method is then used to improve exploration efficiency. A mobile robot simulation demonstrated our proposal's usefulness and efficiency.
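A minimal sketch of the state-space discretization idea, assuming a simplified growing-cell-structures style quantizer (prototype vectors only, without the topology edges of the full algorithm) and illustrative parameter values; the cell indices it produces would serve as the discrete states of the SMDP model:

```python
import numpy as np

class GrowingCellQuantizer:
    """Incrementally grows a set of cells (prototype vectors) that tile a
    continuous state space; each cell index becomes one discrete SMDP state."""

    def __init__(self, dim, max_cells=50, insert_every=100,
                 eps_winner=0.05, eps_other=0.002, seed=0):
        rng = np.random.default_rng(seed)
        self.cells = rng.uniform(-1.0, 1.0, size=(2, dim))  # start with two cells
        self.error = np.zeros(2)                             # accumulated quantization error
        self.max_cells = max_cells
        self.insert_every = insert_every
        self.eps_winner = eps_winner
        self.eps_other = eps_other
        self.steps = 0

    def discretize(self, state):
        """Map a continuous state to the index of its nearest cell."""
        d = np.linalg.norm(self.cells - state, axis=1)
        return int(np.argmin(d))

    def observe(self, state):
        """Adapt cells toward the observed state and occasionally insert
        a new cell near the region with the largest accumulated error."""
        state = np.asarray(state, dtype=float)
        d = np.linalg.norm(self.cells - state, axis=1)
        w = int(np.argmin(d))
        self.error[w] += d[w] ** 2
        self.cells[w] += self.eps_winner * (state - self.cells[w])
        others = np.arange(len(self.cells)) != w
        self.cells[others] += self.eps_other * (state - self.cells[others])
        self.steps += 1
        if self.steps % self.insert_every == 0 and len(self.cells) < self.max_cells:
            q = int(np.argmax(self.error))                   # worst-quantized cell
            dist_to_q = np.linalg.norm(self.cells - self.cells[q], axis=1)
            dist_to_q[q] = np.inf
            f = int(np.argmin(dist_to_q))                    # its nearest neighbor
            new_cell = 0.5 * (self.cells[q] + self.cells[f])  # midpoint insertion
            self.cells = np.vstack([self.cells, new_cell])
            self.error[q] *= 0.5
            self.error = np.append(self.error, self.error[q])
        return w
```

Each call to observe() adapts the cells toward visited states, so frequently visited regions of the continuous space receive a finer partition; discretize() then maps any continuous state to its discrete state index for the k-certainty exploration step.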


1985 ◽  
Vol 17 (2) ◽  
pp. 424-442 ◽  
Author(s):  
A. Federgruen ◽  
P. Zipkin

Special algorithms have been developed to compute an optimal (s, S) policy for an inventory model with discrete demand and under standard assumptions (stationary data, a well-behaved one-period cost function, full backlogging and the average cost criterion). We present here an iterative algorithm for continuous demand distributions which avoids any form of prior discretization. The method can be viewed as a modified form of policy iteration applied to a Markov decision process with continuous state space. For phase-type distributions, the calculations can be done in closed form.
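For illustration only, the sketch below evaluates (s, S) policies under exponentially distributed demand (the simplest phase-type distribution) by Monte Carlo simulation and a coarse grid search. This is not the authors' modified policy-iteration algorithm, and the cost parameters (K, h, p), demand mean, and grid ranges are assumed values:

```python
import numpy as np

def average_cost(s, S, mean_demand=10.0, K=50.0, h=1.0, p=9.0,
                 periods=20000, seed=0):
    """Estimate the long-run average per-period cost of an (s, S) policy
    under i.i.d. exponential demand with full backlogging."""
    rng = np.random.default_rng(seed)
    demand = rng.exponential(mean_demand, size=periods)
    inv = S                       # start at the order-up-to level
    total = 0.0
    for d in demand:
        if inv <= s:              # review: order up to S when at or below s
            total += K            # fixed ordering cost
            inv = S
        inv -= d                  # demand realized after ordering
        total += h * max(inv, 0.0) + p * max(-inv, 0.0)  # holding / backlog cost
    return total / periods

# Coarse grid search over (s, S) pairs; the paper's policy-iteration method
# finds an optimal policy without this kind of enumeration or discretization.
best = min(((average_cost(s, S), s, S)
            for s in np.arange(0.0, 30.0, 2.0)
            for S in np.arange(10.0, 80.0, 5.0) if S > s),
           key=lambda t: t[0])
print("estimated optimal (s, S):", best[1:], "avg cost per period:", round(best[0], 2))
```

The simulation keeps the demand continuous throughout, which is the point of the paper's approach: no prior discretization of the demand distribution or the inventory level is needed to evaluate a policy.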


Author(s):  
Hyungseok Song ◽  
Hyeryung Jang ◽  
Hai H. Tran ◽  
Se-eun Yoon ◽  
Kyunghwan Son ◽  
...  

We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable to solve with typical reinforcement learning (RL) algorithms, especially when the number of items is huge. In this paper, we present a deep RL algorithm that addresses this issue by adopting the following key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. The IS-MDP decomposes a joint action of selecting K items simultaneously into K iterative selections, reducing the size of the action space at the expense of an exponential increase in the number of states. Second, we overcome this state-space explosion by exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments with item spaces different from those used in training.
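A minimal PyTorch sketch of the weight-sharing idea, under assumed inputs (a per-item feature matrix) and an assumed context encoding (the mean embedding of items selected so far); the paper's actual network architecture, symmetry construction, and training procedure are not reproduced here:

```python
import torch
import torch.nn as nn

class SharedItemQNetwork(nn.Module):
    """Scores every candidate item with one shared MLP, so the parameter
    count does not grow with the number of items."""

    def __init__(self, item_dim, hidden=64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * item_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, items, selected_mask):
        # items: (N, item_dim) features; selected_mask: (N,) bool, True = already picked
        if selected_mask.any():
            context = items[selected_mask].mean(dim=0)   # summary of picks so far
        else:
            context = items.new_zeros(items.shape[1])
        context = context.expand(items.shape[0], -1)
        q = self.scorer(torch.cat([items, context], dim=-1)).squeeze(-1)
        return q.masked_fill(selected_mask, float('-inf'))  # cannot reselect an item

def greedy_select(net, items, k):
    """Pick k items one at a time (one IS-MDP step per pick) by greedy Q-values."""
    mask = torch.zeros(items.shape[0], dtype=torch.bool)
    for _ in range(k):
        with torch.no_grad():
            q = net(items, mask)
        mask[q.argmax()] = True
    return mask.nonzero(as_tuple=True)[0]
```

Because the same scorer is applied to every item, the network is equivariant to item permutations and transfers to item sets of different sizes, which is the symmetry the IS-MDP formulation exploits.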


2010 ◽  
Vol 190 (1) ◽  
pp. 289-309 ◽  
Author(s):  
Lars Relund Nielsen ◽  
Erik Jørgensen ◽  
Søren Højsgaard
