Mirror descent algorithm for a multi-armed bandit governed by a stationary finite state Markov chain

Author(s):  
Alexander Nazin ◽  
Boris Miller
2003 ◽  
Vol 17 (4) ◽  
pp. 487-501 ◽  
Author(s):  
Yang Woo Shin ◽  
Bong Dae Choi

We consider a single-server queue with exponential service times and two types of arrivals: positive and negative. Positive customers are regular customers who join the queue, while a negative arrival removes a positive customer from the system. In many applications it may be more appropriate to allow dependence between positive and negative arrivals. To capture this dependence, we assume that positive and negative arrivals are governed by a finite-state Markov chain with two absorbing states, say 0 and 0′. The epochs of absorption into states 0 and 0′ correspond to arrivals of positive and negative customers, respectively. The Markov chain is then instantly restarted in a transient state, where the choice of the new state may depend on the state from which absorption occurred. The Laplace–Stieltjes transforms (LSTs) of a customer's sojourn time distribution, jointly with the probability that the customer completes service without being removed, are derived under combinations of the FCFS and LCFS service disciplines and the RCE and RCH removal strategies (removal of the customer at the end and at the head of the queue, respectively). Phase-type service time distributions are also considered.
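As a rough illustration of the arrival mechanism, the Python sketch below simulates the queue by discrete-event simulation under FCFS service with RCE removal and counts served and removed customers. It is a Monte Carlo sketch, not the transform-based analysis of the abstract; the rate/restart encoding, the function name `simulate` and the example parameters are hypothetical choices made for illustration.

```python
import random
from collections import deque

def simulate(rates, restart, mu, horizon, seed=0):
    """Discrete-event simulation of the single-server queue (FCFS service, RCE removal).

    rates[i]: list of (target, rate) pairs for transient state i of the arrival
        chain; target is another transient state, "pos" (absorption into 0)
        or "neg" (absorption into 0').
    restart[(i, kind)]: restart probabilities over transient states after an
        absorption of type kind ("pos"/"neg") from transient state i.
    Every transient state must have a positive total outgoing rate.
    Returns (served, removed, mean sojourn time of served customers).
    """
    rng = random.Random(seed)
    n_states = len(rates)
    state, t = 0, 0.0
    queue = deque()  # arrival epochs; queue[0] is the customer in service
    served = removed = 0
    sojourn_total = 0.0
    while t < horizon:
        chain_rate = sum(r for _, r in rates[state])
        total = chain_rate + (mu if queue else 0.0)
        t += rng.expovariate(total)
        u = rng.random() * total
        if u >= chain_rate:  # exponential service completion at the head
            sojourn_total += t - queue.popleft()
            served += 1
            continue
        for target, r in rates[state]:  # pick the arrival-chain transition
            u -= r
            if u < 0:
                break
        if target == "pos":
            queue.append(t)  # positive customer joins the end of the queue
        elif target == "neg" and queue:
            queue.pop()  # RCE: remove the customer at the end of the queue
            removed += 1
        if target in ("pos", "neg"):
            # instantaneous restart; the distribution depends on the state just left
            weights = restart[(state, target)]
            state = rng.choices(range(n_states), weights=weights)[0]
        else:
            state = target
    mean_sojourn = sojourn_total / served if served else float("nan")
    return served, removed, mean_sojourn
```

For example, with `rates = [[(1, 0.5), ("pos", 1.0)], [(0, 0.5), ("neg", 0.8)]]` and `restart = {(0, "pos"): [0.7, 0.3], (1, "neg"): [0.2, 0.8]}`, the ratio `served / (served + removed)` from `simulate(rates, restart, mu=2.0, horizon=1e5)` approximates the probability that a customer completes service without being removed (customers still in the system at the horizon are ignored).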


2014 ◽  
Vol 51 (4) ◽  
pp. 1114-1132 ◽  
Author(s):  
Bernhard C. Geiger ◽  
Christoph Temmel

A lumping of a Markov chain is a coordinatewise projection of the chain. We characterise the entropy rate preservation of a lumping of an aperiodic and irreducible Markov chain on a finite state space by the random growth rate of the cardinality of the realisable preimage of a finite-length trajectory of the lumped chain and by the information needed to reconstruct original trajectories from their lumped images. Both are purely combinatorial criteria, depending only on the transition graph of the Markov chain and the lumping function. A lumping is strongly k-lumpable if and only if the lumped process is a kth-order Markov chain for each starting distribution of the original Markov chain. We characterise strong k-lumpability via tightness of stationary entropic bounds. In the sparse setting, we give sufficient conditions on the lumping to both preserve the entropy rate and be strongly k-lumpable.
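To make "entropy rate preservation" concrete, the sketch below computes the entropy rate of a stationary chain and the block conditional entropies H(Y_{k+1} | Y_1, ..., Y_k) of the lumped process Y_t = g(X_t), obtained exactly with a forward recursion. These conditional entropies decrease to the entropy rate of Y, so comparing them with the chain's entropy rate gives a numerical check of preservation. This is a brute-force illustration only (exponential in k), not the combinatorial criteria of the abstract; the function names and the example matrix are hypothetical.

```python
import itertools
import math

def stationary(P, iters=5000):
    """Stationary distribution of an irreducible, aperiodic stochastic matrix (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_rate(P):
    """Entropy rate (bits/step) of the stationary chain: -sum_i pi_i sum_j P_ij log2 P_ij."""
    pi = stationary(P)
    return -sum(pi[i] * p * math.log2(p) for i, row in enumerate(P) for p in row if p > 0)

def block_entropy(P, g, n):
    """Joint entropy H(Y_1, ..., Y_n) of the stationary lumped process Y_t = g(X_t)."""
    if n == 0:
        return 0.0
    pi = stationary(P)
    states = range(len(P))
    labels = sorted({g(x) for x in states})
    h = 0.0
    for ys in itertools.product(labels, repeat=n):
        # forward recursion: alpha[x] = P(Y_1..Y_t = ys[:t], X_t = x)
        alpha = [pi[x] if g(x) == ys[0] else 0.0 for x in states]
        for y in ys[1:]:
            alpha = [sum(alpha[x] * P[x][z] for x in states) if g(z) == y else 0.0
                     for z in states]
        p = sum(alpha)
        if p > 0:
            h -= p * math.log2(p)
    return h

def conditional_entropy(P, g, k):
    """H(Y_{k+1} | Y_1, ..., Y_k): non-increasing in k, converging to the entropy rate of Y."""
    return block_entropy(P, g, k + 1) - block_entropy(P, g, k)
```

For instance, with `P = [[0.5, 0.25, 0.25], [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]]` and the hypothetical lumping `g = lambda x: min(x, 1)` (merging states 1 and 2), printing `entropy_rate(P)` next to `[conditional_entropy(P, g, k) for k in (1, 2, 3)]` shows how much entropy the lumped trajectory retains.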


2005 ◽  
Vol 37 (4) ◽  
pp. 1015-1034 ◽  
Author(s):  
Saul D. Jacka ◽  
Zorana Lazic ◽  
Jon Warren

Let $(X_t)_{t\ge 0}$ be a continuous-time irreducible Markov chain on a finite state space $E$, let $v$ be a map $v: E\to\mathbb{R}\setminus\{0\}$, and let $(\varphi_t)_{t\ge 0}$ be an additive functional defined by $\varphi_t=\int_0^t v(X_s)\,ds$. We consider the case in which the process $(\varphi_t)_{t\ge 0}$ is oscillating and that in which $(\varphi_t)_{t\ge 0}$ has a negative drift. In each of these cases, we condition the process $(X_t,\varphi_t)_{t\ge 0}$ on the event that $(\varphi_t)_{t\ge 0}$ is nonnegative until time $T$ and prove weak convergence of the conditioned process as $T\to\infty$.
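The two cases can be told apart numerically, assuming (as is standard for additive functionals of a finite irreducible chain) that $\varphi_t/t$ converges to the stationary mean $\sum_i \pi_i v(i)$: a zero mean gives the oscillating case and a negative mean gives negative drift. The sketch below computes $\pi$ from a generator `Q` by uniformisation and power iteration; the function names and tolerance are hypothetical, and the conditioning itself is not simulated.

```python
def stationary_ctmc(Q, iters=20_000):
    """Stationary distribution of an irreducible generator Q via uniformisation + power iteration."""
    n = len(Q)
    lam = 1.0 + max(-Q[i][i] for i in range(n))  # exceeds every exit rate, so P is aperiodic
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def drift(Q, v):
    """Long-run slope lim phi_t / t = sum_i pi_i v(i) of phi_t = int_0^t v(X_s) ds."""
    pi = stationary_ctmc(Q)
    return sum(p * x for p, x in zip(pi, v))

def classify(Q, v, tol=1e-9):
    """Label the additive functional by the sign of its drift."""
    d = drift(Q, v)
    if abs(d) <= tol:
        return "oscillating"
    return "negative drift" if d < 0 else "positive drift"
```

For a symmetric two-state chain `Q = [[-1.0, 1.0], [1.0, -1.0]]`, the map `v = [1.0, -1.0]` is classified as oscillating, while `v = [1.0, -2.0]` has drift -0.5.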


ETRI Journal ◽  
2017 ◽  
Vol 39 (5) ◽  
pp. 718-728 ◽  
Author(s):  
Ahmed Abdul Salam ◽  
Ray Sheriff ◽  
Saleh Al-Araji ◽  
Kahtan Mezher ◽  
Qassim Nasir

2019 ◽  
Vol 22 (08) ◽  
pp. 1950047 ◽  
Author(s):  
TAK KUEN SIU ◽  
ROBERT J. ELLIOTT

The hedging of a European-style contingent claim is studied in a continuous-time doubly Markov-modulated financial market, where the interest rate of a bond is modulated by an observable continuous-time finite-state Markov chain and the appreciation rate of a risky share is modulated by a continuous-time finite-state hidden Markov chain. The first chain describes the evolution of the bond's credit ratings over time, while the second models the evolution of the hidden state of the underlying economy. Stochastic flows of diffeomorphisms are used to derive some hedge quantities, or Greeks, for the claim. A mixed filter-based and regime-switching Black–Scholes partial differential equation governing the price of the claim is obtained. It is shown that the delta hedge ratio process obtained from stochastic flows is a risk-minimizing, admissible mean-self-financing portfolio process. Both first-order and second-order Greeks are considered.
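A much simpler setting can illustrate how a Markov-modulated interest rate enters a price and its delta. The sketch below is not the stochastic-flow method of the abstract. It assumes constant volatility, ignores the hidden chain and filtering, prices under a risk-neutral measure, and takes the rate chain to be independent of the Brownian motion. Under those assumptions, conditioning on the integrated rate $R=\int_0^T r(Y_u)\,du$ reduces the call to a Black–Scholes price with rate $R/T$, so the price and delta are Monte Carlo averages over rate paths. All function names and parameters are hypothetical.

```python
import math
import random

def bs_call(s0, k, t, r, sigma):
    """Black–Scholes price and delta N(d1) of a European call with constant rate r."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2), cdf(d1)

def integrated_rate(q, rates, state, t, rng):
    """Simulate int_0^t rates[Y_u] du for a CTMC with generator q started in `state`."""
    total, clock = 0.0, 0.0
    while True:
        exit_rate = -q[state][state]
        hold = rng.expovariate(exit_rate) if exit_rate > 0 else math.inf
        if clock + hold >= t:
            return total + rates[state] * (t - clock)
        total += rates[state] * hold
        clock += hold
        # jump to j != state with probability q[state][j] / exit_rate
        weights = [q[state][j] if j != state else 0.0 for j in range(len(q))]
        state = rng.choices(range(len(q)), weights=weights)[0]

def regime_switching_call(s0, k, t, sigma, q, rates, state, n_paths=20_000, seed=0):
    """Average the conditional Black–Scholes price and delta over simulated rate paths."""
    rng = random.Random(seed)
    price = delta = 0.0
    for _ in range(n_paths):
        r_bar = integrated_rate(q, rates, state, t, rng) / t
        p, d = bs_call(s0, k, t, r_bar, sigma)
        price += p
        delta += d
    return price / n_paths, delta / n_paths
```

With both regimes sharing one rate, the result collapses to the ordinary Black–Scholes price, which is a quick sanity check; with distinct regime rates, the price lies between the Black–Scholes prices at the smallest and largest rates.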


2019 ◽  
Vol 2019 (1) ◽  
Author(s):  
Ruijuan Deng ◽  
Yong Ren

The paper considers a class of multi-valued backward stochastic differential equations with regime switching that involve the subdifferential of a lower semicontinuous convex function, whose generator is modulated by a continuous-time Markov chain with a finite state space. First, we establish the existence and uniqueness of the solution by the penalization method. Second, we prove weak convergence of the solution of the original system. Finally, we give an application to the homogenization of a class of multi-valued PDEs with Markov switching.
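The penalization idea can be seen in a deterministic toy problem, which is a deliberate simplification and not the paper's stochastic, regime-switching setting. Assume there is no martingale term $Z$ and no Markov chain, and take $\varphi$ to be the indicator of $[0,\infty)$. Then $-dY_t+\partial\varphi(Y_t)\,dt\ni f(t)\,dt$ with $Y_T=\xi$ is penalized by replacing $\partial\varphi$ with its Moreau–Yosida gradient $\nabla\varphi_\varepsilon(y)=\min(y,0)/\varepsilon$. The sketch integrates the penalized equation backward in time with a step that is implicit in the penalty term, so it stays stable when $h/\varepsilon$ is large. The function name and parameters are hypothetical.

```python
def penalized_backward_ode(xi, f, horizon, n_steps, eps):
    """Solve dY/dt = min(Y, 0) / eps - f(t), Y(horizon) = xi, backward in time.

    Step n -> n-1 (implicit in the penalty):
        Y_{n-1} = Y_n + h f(t_{n-1}) - (h / eps) * min(Y_{n-1}, 0),
    solved in closed form by checking the sign of a = Y_n + h f(t_{n-1}).
    Returns the list [Y_0, ..., Y_N] on the uniform grid t_n = n * h.
    """
    h = horizon / n_steps
    y = [0.0] * (n_steps + 1)
    y[n_steps] = xi
    for n in range(n_steps - 1, -1, -1):
        a = y[n + 1] + h * f(n * h)
        # a >= 0: penalty inactive; a < 0: penalty pulls Y back toward [0, inf)
        y[n] = a if a >= 0 else a / (1.0 + h / eps)
    return y
```

With `f = lambda t: -1.0`, `xi = 0.5` and `horizon = 1.0`, a small `eps` keeps the solution within roughly `eps` of $[0,\infty)$, close to the reflected solution $\max(t-0.5,0)$. A huge `eps` effectively disables the penalty, and the path drifts down to about $-0.5$ at $t=0$.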


2004 ◽  
Vol 2004 (3) ◽  
pp. 197-208 ◽  
Author(s):  
Thordur Runolfsson

We study systems that are subject to sudden structural changes due to either changes in the operational mode of the system or failure. We consider linear dynamical systems that depend on a modal variable, which is either modeled as a finite-state Markov chain or generated by an automaton that is subject to an external disturbance. In the Markov chain case, the objective of the control is to minimize a risk-sensitive cost functional, which measures the risk sensitivity of the system to transitions caused by the random modal variable. When the modal variable is described by a disturbed automaton, the objective of the control is to make the system as robust as possible to changes in the external disturbance. Optimality conditions for both problems are derived, and it is shown that the disturbance rejection problem is closely related to a certain risk-sensitive control problem for the hybrid system.
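A risk-sensitive cost of the common exponential form $J_\theta=\theta^{-1}\log E[\exp(\theta C)]$ can be estimated by Monte Carlo for a simple jump linear system. The sketch below assumes a discrete-time scalar system $x_{k+1}=a(m_k)x_k+b(m_k)u_k$ with a Markov modal variable $m_k$ and fixed, hypothetical mode-dependent feedback gains. It does not implement the optimality conditions from the abstract. For $\theta>0$, Jensen's inequality makes $J_\theta$ at least the risk-neutral mean cost, so the gap between them measures sensitivity to mode transitions.

```python
import math
import random

def sample_cost(a, b, gains, P, x0, mode, steps, q, r, rng):
    """Quadratic cost sum_k (q x_k^2 + r u_k^2) along one path with u_k = -gains[m_k] x_k."""
    x, cost = x0, 0.0
    for _ in range(steps):
        u = -gains[mode] * x
        cost += q * x * x + r * u * u
        x = a[mode] * x + b[mode] * u
        # the modal variable jumps according to the row P[mode] of its transition matrix
        mode = rng.choices(range(len(P)), weights=P[mode])[0]
    return cost

def risk_sensitive_cost(costs, theta):
    """(1/theta) * log(mean(exp(theta * C))) over sampled costs, via log-sum-exp."""
    m = max(theta * c for c in costs)
    return (m + math.log(sum(math.exp(theta * c - m) for c in costs) / len(costs))) / theta
```

For example, sampling 500 costs with `a = [0.9, 1.2]`, `b = [1.0, 1.0]`, `gains = [0.2, 0.5]` and mode transition matrix `[[0.9, 0.1], [0.3, 0.7]]`, then comparing `risk_sensitive_cost(costs, 0.5)` with the sample mean, shows how strongly rare switches into the open-loop unstable mode are penalized.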


1982 ◽  
Vol 19 (02) ◽  
pp. 272-288 ◽  
Author(s):  
P. J. Brockwell ◽  
S. I. Resnick ◽  
N. Pacheco-Santiago

A study is made of the maximum, minimum and range on [0,t] of the integral process where S is a finite state-space Markov chain. Approximate results are derived by establishing weak convergence of a sequence of such processes to a Wiener process. For a particular family of two-state stationary Markov chains we show that the corresponding centered integral processes exhibit the Hurst phenomenon to a remarkable degree in their pre-asymptotic behaviour.
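The quantities studied here can be computed exactly along a simulated path. The sketch below assumes a centered integral process of the form $I_t=\int_0^t (f(S_u)-\mu)\,du$, where $f$ is a hypothetical state-dependent integrand and $\mu$ a centring constant such as the stationary mean of $f(S)$. Since $I_t$ is piecewise linear between jumps of $S$, its maximum and minimum on $[0,t]$ occur at jump times or at the horizon. The function name and example parameters are illustrative assumptions, not the paper's construction.

```python
import random

def integral_extremes(q, f, mu, state, horizon, rng):
    """Max, min and range on [0, horizon] of I_t = int_0^t (f[S_u] - mu) du.

    S is a CTMC with generator q started in `state`; I_0 = 0, so max >= 0 >= min.
    """
    level = hi = lo = 0.0
    clock = 0.0
    while True:
        exit_rate = -q[state][state]
        hold = rng.expovariate(exit_rate) if exit_rate > 0 else float("inf")
        last = clock + hold >= horizon
        step = horizon - clock if last else hold
        level += (f[state] - mu) * step  # I_t is linear while S stays in `state`
        hi, lo = max(hi, level), min(lo, level)
        if last:
            return hi, lo, hi - lo
        clock += hold
        weights = [q[state][j] if j != state else 0.0 for j in range(len(q))]
        state = rng.choices(range(len(q)), weights=weights)[0]
```

For a two-state generator `[[-a, a], [b, -b]]` the stationary probability of state 0 is `b / (a + b)`, which gives the centring constant `mu`. Slow switching (small `a` and `b`) produces long excursions of $I_t$ and hence large ranges over moderate horizons.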

