optimality equation
Recently Published Documents


2020 ◽  
Vol 44 (1) ◽  
pp. 53-58
Author(s):  
AAK Majumdar

This paper considers a variant of the Reve’s puzzle with n (≥1) discs and an evildoer, which can be placed directly on top of a smaller disc any number of times. Denoting by E(n) the minimum number of moves required to solve the new variant, we give a scheme to find the optimality equation satisfied by E(n). We then find an explicit form of E(n). Journal of Bangladesh Academy of Sciences, Vol. 44, No. 1, 53-58, 2020


2020 ◽  
Vol 43 (2) ◽  
pp. 205-209
Author(s):  
AAK Majumdar

This paper deals with a variant of the classical Tower of Hanoi problem with n (≥ 1) discs, of which r discs are evildoers, each of which can be placed directly on top of a smaller disc any number of times. Denoting by E(n, r) the minimum number of moves required to solve the new variant, we give a scheme to find the optimality equation satisfied by E(n, r). An explicit form of E(n, r) is then obtained. Journal of Bangladesh Academy of Sciences, Vol. 43, No. 2, 205-209, 2019


2018 ◽  
Vol 42 (2) ◽  
pp. 191-199
Author(s):  
AAK Majumdar

The 4-peg Tower of Hanoi problem, commonly known as the Reve’s puzzle, is well-known. Motivated by the optimality equation satisfied by the optimal value function M(n) in the case of the Reve’s puzzle, Matsuura et al. (2008) posed the following generalized recurrence relation: T(n, a) = min{aT(n − t, a) + S(t, 3) : 1 ≤ t ≤ n}, where n ≥ 1 and a ≥ 2 are integers, and S(t, 3) = 2^t − 1 is the solution of the 3-peg Tower of Hanoi problem with t discs. Some local-value relationships satisfied by T(n, a) were given by Majumdar et al. (2016). This paper studies the properties of T(n + 1, a) − T(n, a) more closely for the case when a is an integer not of the form 2^i for any integer i ≥ 2. Journal of Bangladesh Academy of Sciences, Vol. 42, No. 2, 191-199, 2018
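The recurrence above can be evaluated directly by bottom-up dynamic programming. A minimal sketch (the helper names `S3` and `T` are illustrative, not from the paper; only the recurrence itself and the base case T(0, a) = 0 are assumed):

```python
def S3(t):
    """S(t, 3) = 2^t - 1: moves for the 3-peg Tower of Hanoi with t discs."""
    return 2 ** t - 1

def T(n, a):
    """Evaluate T(n, a) = min over 1 <= t <= n of a*T(n - t, a) + S(t, 3),
    bottom-up, for integers n >= 0 and a >= 2, with T(0, a) = 0."""
    table = [0] * (n + 1)
    for m in range(1, n + 1):
        table[m] = min(a * table[m - t] + S3(t) for t in range(1, m + 1))
    return table[n]
```

For a = 2 this is the Frame–Stewart recurrence for the Reve’s puzzle, so T(1, 2) = 1, T(2, 2) = 3, T(3, 2) = 5, T(4, 2) = 9, T(5, 2) = 13.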


2015 ◽  
Vol 47 (4) ◽  
pp. 1064-1087 ◽  
Author(s):  
Xianping Guo ◽  
Xiangxiang Huang ◽  
Yonghui Huang

In this paper we focus on the finite-horizon optimality for denumerable continuous-time Markov decision processes, in which the transition and reward/cost rates are allowed to be unbounded, and the optimality is over the class of all randomized history-dependent policies. Under mild reasonable conditions, we first establish the existence of a solution to the finite-horizon optimality equation by designing a technique of approximations from the bounded transition rates to unbounded ones. Then we prove the existence of ε (≥ 0)-optimal Markov policies and verify that the value function is the unique solution to the optimality equation by establishing the analog of the Itô-Dynkin formula. Finally, we provide an example in which the transition rates and the value function are all unbounded and, thus, obtain solutions to some of the unsolved problems by Yushkevich (1978).
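The abstract works in continuous time with unbounded rates, where the analysis is delicate. As a loose discrete-time, finite-state analogue, the finite-horizon optimality equation can be solved by backward induction: V_T(x) = 0 and V_k(x) = max_a [ r(x, a) + Σ_y p(y | x, a) V_{k+1}(y) ]. A hypothetical sketch of this simpler setting (the function name and toy data layout are assumptions, not from the paper):

```python
def backward_induction(states, actions, r, p, horizon):
    """Solve the finite-horizon optimality equation by backward induction.
    r[x][a] is the one-step reward; p[x][a][y] the transition probability.
    Returns the stage-0 value function and a Markov policy (one decision
    rule per stage), which is optimal in this finite discrete-time setting."""
    V = {x: 0.0 for x in states}          # terminal values V_T(x) = 0
    policy = []
    for _ in range(horizon):
        newV, stage = {}, {}
        for x in states:
            # maximize the right-hand side of the optimality equation
            best = max(actions,
                       key=lambda a: r[x][a] + sum(p[x][a][y] * V[y]
                                                   for y in states))
            stage[x] = best
            newV[x] = r[x][best] + sum(p[x][best][y] * V[y] for y in states)
        V = newV
        policy.append(stage)
    policy.reverse()                      # policy[k] = decision rule at stage k
    return V, policy
```

The resulting policy is Markov (it depends only on the current state and stage), mirroring the existence of ε-optimal Markov policies in the abstract.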




2015 ◽  
Vol 52 (2) ◽  
pp. 419-440
Author(s):  
Rolando Cavazos-Cadena ◽  
Raúl Montes-De-Oca ◽  
Karel Sladký

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.



