Interactive spoken content retrieval by extended query model and continuous state space Markov Decision Process

Author(s):  
Tsung-Hsien Wen ◽  
Hung-yi Lee ◽  
Pei-hao Su ◽  
Lin-Shan Lee
Author(s):  
Takeshi Tateyama ◽  
Seiichi Kawata ◽  
Yoshiki Shimomura ◽  
...  

The k-certainty exploration method, an efficient reinforcement learning algorithm, cannot be applied to environments whose state space is continuous, because the continuous state space must first be converted into a discrete one. Our purpose is to construct discrete semi-Markov decision process (SMDP) models of such environments by using growing cell structures to autonomously divide the continuous state space and then applying the k-certainty exploration method to construct the SMDP models. The multiagent k-certainty exploration method is then used to improve exploration efficiency. A mobile robot simulation demonstrated our proposal's usefulness and efficiency.
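A minimal sketch of the state-space discretization idea, assuming a simplified growing-cell-structures style quantizer (prototype vectors only, without the topology edges of the full algorithm) and illustrative parameter values; the cell indices it produces would serve as the discrete states of the SMDP model:

```python
import numpy as np

class GrowingCellQuantizer:
    """Incrementally grows a set of cells (prototype vectors) that tile a
    continuous state space; each cell index becomes one discrete SMDP state."""

    def __init__(self, dim, max_cells=50, insert_every=100,
                 eps_winner=0.05, eps_other=0.002, seed=0):
        rng = np.random.default_rng(seed)
        self.cells = rng.uniform(-1.0, 1.0, size=(2, dim))  # start with two cells
        self.error = np.zeros(2)                             # accumulated quantization error
        self.max_cells = max_cells
        self.insert_every = insert_every
        self.eps_winner = eps_winner
        self.eps_other = eps_other
        self.steps = 0

    def discretize(self, state):
        """Map a continuous state to the index of its nearest cell."""
        d = np.linalg.norm(self.cells - state, axis=1)
        return int(np.argmin(d))

    def observe(self, state):
        """Adapt cells toward the observed state and occasionally insert
        a new cell near the region with the largest accumulated error."""
        state = np.asarray(state, dtype=float)
        d = np.linalg.norm(self.cells - state, axis=1)
        w = int(np.argmin(d))
        self.error[w] += d[w] ** 2
        self.cells[w] += self.eps_winner * (state - self.cells[w])
        others = np.arange(len(self.cells)) != w
        self.cells[others] += self.eps_other * (state - self.cells[others])
        self.steps += 1
        if self.steps % self.insert_every == 0 and len(self.cells) < self.max_cells:
            q = int(np.argmax(self.error))                   # worst-quantized cell
            dist_to_q = np.linalg.norm(self.cells - self.cells[q], axis=1)
            dist_to_q[q] = np.inf
            f = int(np.argmin(dist_to_q))                    # its nearest neighbor
            new_cell = 0.5 * (self.cells[q] + self.cells[f])  # midpoint insertion
            self.cells = np.vstack([self.cells, new_cell])
            self.error[q] *= 0.5
            self.error = np.append(self.error, self.error[q])
        return w
```

Each call to observe() adapts the cells toward visited states, so frequently visited regions of the continuous space receive a finer partition; discretize() then maps any continuous state to its discrete state index for the k-certainty exploration step.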


1985 ◽  
Vol 17 (2) ◽  
pp. 424-442 ◽  
Author(s):  
A. Federgruen ◽  
P. Zipkin

Special algorithms have been developed to compute an optimal (s, S) policy for an inventory model with discrete demand and under standard assumptions (stationary data, a well-behaved one-period cost function, full backlogging and the average cost criterion). We present here an iterative algorithm for continuous demand distributions which avoids any form of prior discretization. The method can be viewed as a modified form of policy iteration applied to a Markov decision process with continuous state space. For phase-type distributions, the calculations can be done in closed form.
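For illustration only, the sketch below evaluates (s, S) policies under exponentially distributed demand (the simplest phase-type distribution) by Monte Carlo simulation and a coarse grid search. This is not the authors' modified policy-iteration algorithm, and the cost parameters (K, h, p), demand mean, and grid ranges are assumed values:

```python
import numpy as np

def average_cost(s, S, mean_demand=10.0, K=50.0, h=1.0, p=9.0,
                 periods=20000, seed=0):
    """Estimate the long-run average per-period cost of an (s, S) policy
    under i.i.d. exponential demand with full backlogging."""
    rng = np.random.default_rng(seed)
    demand = rng.exponential(mean_demand, size=periods)
    inv = S                       # start at the order-up-to level
    total = 0.0
    for d in demand:
        if inv <= s:              # review: order up to S when at or below s
            total += K            # fixed ordering cost
            inv = S
        inv -= d                  # demand realized after ordering
        total += h * max(inv, 0.0) + p * max(-inv, 0.0)  # holding / backlog cost
    return total / periods

# Coarse grid search over (s, S) pairs; the paper's policy-iteration method
# finds an optimal policy without this kind of enumeration or discretization.
best = min(((average_cost(s, S), s, S)
            for s in np.arange(0.0, 30.0, 2.0)
            for S in np.arange(10.0, 80.0, 5.0) if S > s),
           key=lambda t: t[0])
print("estimated optimal (s, S):", best[1:], "avg cost per period:", round(best[0], 2))
```

The simulation keeps the demand continuous throughout, which is the point of the paper's approach: no prior discretization of the demand distribution or the inventory level is needed to evaluate a policy.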


Author(s):  
Hyungseok Song ◽  
Hyeryung Jang ◽  
Hai H. Tran ◽  
Se-eun Yoon ◽  
Kyunghwan Son ◽  
...  

We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable to solve with typical reinforcement learning (RL) algorithms, especially when the number of items is huge. In this paper, we present a deep RL algorithm that addresses this issue by adopting the following key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. The IS-MDP decomposes a joint action of selecting K items simultaneously into K iterative selections, reducing the size of the action space at the expense of an exponential increase in the number of states. Second, we overcome this state-space explosion by exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments with item spaces different from those used in training.
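A minimal PyTorch sketch of the weight-sharing idea, under assumed inputs (a per-item feature matrix) and an assumed context encoding (the mean embedding of items selected so far); the paper's actual network architecture, symmetry construction, and training procedure are not reproduced here:

```python
import torch
import torch.nn as nn

class SharedItemQNetwork(nn.Module):
    """Scores every candidate item with one shared MLP, so the parameter
    count does not grow with the number of items."""

    def __init__(self, item_dim, hidden=64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * item_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, items, selected_mask):
        # items: (N, item_dim) features; selected_mask: (N,) bool, True = already picked
        if selected_mask.any():
            context = items[selected_mask].mean(dim=0)   # summary of picks so far
        else:
            context = items.new_zeros(items.shape[1])
        context = context.expand(items.shape[0], -1)
        q = self.scorer(torch.cat([items, context], dim=-1)).squeeze(-1)
        return q.masked_fill(selected_mask, float('-inf'))  # cannot reselect an item

def greedy_select(net, items, k):
    """Pick k items one at a time (one IS-MDP step per pick) by greedy Q-values."""
    mask = torch.zeros(items.shape[0], dtype=torch.bool)
    for _ in range(k):
        with torch.no_grad():
            q = net(items, mask)
        mask[q.argmax()] = True
    return mask.nonzero(as_tuple=True)[0]
```

Because the same scorer is applied to every item, the network is equivariant to item permutations and transfers to item sets of different sizes, which is the symmetry the IS-MDP formulation exploits.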


2010 ◽  
Vol 190 (1) ◽  
pp. 289-309 ◽  
Author(s):  
Lars Relund Nielsen ◽  
Erik Jørgensen ◽  
Søren Højsgaard
