Optimal Occupation in the Complete Graph

We consider N sites (N ≤ ∞), each of which may be either occupied or unoccupied. Time is discrete, and at each time unit a set of occupied sites may attempt to capture a previously unoccupied site. The attempt will be successful with a probability that depends on the number of sites making the attempt, in which case the new site will also be occupied. A benefit is gained when new sites are occupied, but capture attempts are costly. The problem of optimal occupation is formulated as a Markov decision process in which the admissible actions are occupation strategies and the cost is a function of the strategy and the number of occupied sites. A partial order on the state-action pairs is used to obtain a comparison result for stationary policies and qualitative results concerning monotonicity of the value function for the n-stage problem (n ≤ ∞). The optimal policies are partially characterized when the cost depends on the action only through the total number of occupation attempts made.


How Does the Value Function of a Markov Decision Process Depend on the Transition Probabilities?

Mathematics of Operations Research ◽ 10.1287/moor.22.4.872 ◽ 1997 ◽ Vol 22 (4) ◽ pp. 872-885 ◽ Cited By ~ 20

Author(s): Alfred Müller

Keyword(s): Markov Decision Process ◽ Decision Process ◽ Value Function ◽ Transition Probabilities ◽ Markov Decision ◽ The Value Function


On Optimal Terminal Wealth Problems with Random Trading Times and Drawdown Constraints

Advances in Applied Probability ◽ 10.1017/s0001867800006960 ◽ 2014 ◽ Vol 46 (01) ◽ pp. 121-138 ◽ Cited By ~ 1

Author(s): Ulrich Rieder ◽ Marc Wittlinger

Keyword(s): Markov Decision Process ◽ Decision Process ◽ Value Function ◽ Stationary Policy ◽ General Utility ◽ Investment Problem ◽ Markov Decision ◽ Optimal Stationary Policy ◽ Running Maximum ◽ The Value Function

We consider an investment problem where observing and trading are only possible at random times. In addition, we introduce drawdown constraints which require that the investor's wealth does not fall under a prior fixed percentage of its running maximum. The financial market consists of a riskless bond and a stock which is driven by a Lévy process. Moreover, a general utility function is assumed. In this setting we solve the investment problem using a related limsup Markov decision process. We show that the value function can be characterized as the unique fixed point of the Bellman equation and verify the existence of an optimal stationary policy. Under some mild assumptions the value function can be approximated by the value function of a contracting Markov decision process. We are able to use Howard's policy improvement algorithm for computing the value function as well as an optimal policy. These results are illustrated in a numerical example.
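
The abstract above mentions Howard's policy improvement algorithm. As a rough illustration only, the following sketch runs policy iteration on a small finite discounted MDP. The toy model `P`, the discount factor `gamma`, and all function names are assumptions made for this example; they are not taken from the paper, whose setting (a limsup Markov decision process driven by a Lévy process) is far more general. Policy evaluation here uses repeated sweeps to a tolerance rather than an exact linear solve.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP (not from the paper):
# P[state][action] is a list of (probability, next_state, reward) triples.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

P: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},  # action 0 stays, action 1 moves
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def q_value(P: Transitions, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step reward plus discounted value of the successor state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P: Transitions, policy: Dict[int, int], gamma: float,
                    tol: float = 1e-10) -> Dict[int, float]:
    """Iterative policy evaluation: sweep until the largest update is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration(P: Transitions, gamma: float = 0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: min(P[s]) for s in P}  # arbitrary initial policy
    while True:
        V = evaluate_policy(P, policy, gamma)
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            # Switch only on a strict improvement, which avoids cycling on ties.
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration(P, gamma=0.9)
print(policy, V)
```

In this toy model the stable policy moves from state 0 to state 1 and then stays there, giving values of about 19 and 20 for states 0 and 1.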


On Optimal Terminal Wealth Problems with Random Trading Times and Drawdown Constraints

Advances in Applied Probability ◽ 10.1239/aap/1396360106 ◽ 2014 ◽ Vol 46 (1) ◽ pp. 121-138 ◽ Cited By ~ 2

Author(s): Ulrich Rieder ◽ Marc Wittlinger

Keyword(s): Markov Decision Process ◽ Decision Process ◽ Value Function ◽ Stationary Policy ◽ General Utility ◽ Investment Problem ◽ Markov Decision ◽ Optimal Stationary Policy ◽ Running Maximum ◽ The Value Function


An Offloading Algorithm based on Markov Decision Process in Mobile Edge Computing System

International Journal of Circuits, Systems and Signal Processing ◽ 10.46300/9106.2022.16.15 ◽ 2022 ◽ Vol 16 ◽ pp. 115-121

Author(s): Bingxin Yao ◽ Bin Wu ◽ Siyun Wu ◽ Yin Ji ◽ Danggui Chen ◽ ...

Keyword(s): Energy Consumption ◽ Markov Decision Process ◽ Decision Process ◽ Value Function ◽ Wireless Channel ◽ Edge Computing ◽ Iteration Algorithm ◽ Mobile Edge Computing ◽ Markov Decision ◽ The Value Function

In this paper, an offloading algorithm based on a Markov decision process (MDP) is proposed to solve the multi-objective offloading decision problem in a mobile edge computing (MEC) system. The distinguishing feature of the algorithm is that the MDP is used to make the offloading decision. The number of tasks in the task queue, the number of accessible edge clouds, and the signal-to-noise ratio (SNR) of the wireless channel are taken into account in the state space of the MDP model. The offloading delay and energy consumption are used to define the value function of the MDP model, i.e. the objective function. To maximize the value function, the value iteration algorithm is used to obtain the optimal offloading policy. According to the policy, tasks of mobile terminals (MTs) are offloaded to the edge cloud or central cloud, or executed locally. The simulation results show that the proposed algorithm can effectively reduce the offloading delay and energy consumption.
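
To make the value iteration step in the abstract above concrete, here is a minimal sketch on a made-up offloading model. The states (queue length 0 to 2), the actions `"local"`, `"edge"` and `"idle"`, the transition probabilities, and the rewards (negative delay-plus-energy costs) are all assumptions chosen for illustration; the paper's actual state space also includes the number of accessible edge clouds and the SNR, which are omitted here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy offloading MDP (not the paper's model).
# P[state][action] is a list of (probability, next_state, reward) triples.
# State = number of queued tasks; reward = negative cost of handling one task.
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]

P: Transitions = {
    0: {"idle": [(1.0, 0, 0.0)]},
    1: {
        "local": [(1.0, 0, -3.0)],                # always finishes, but costly
        "edge": [(0.8, 0, -1.0), (0.2, 1, -1.0)],  # cheap, fails with prob. 0.2
    },
    2: {
        "local": [(1.0, 1, -3.0)],
        "edge": [(0.8, 1, -1.0), (0.2, 2, -1.0)],
    },
}


def backup(P: Transitions, V: Dict[int, float], s: int, a: str, gamma: float) -> float:
    """One-step Bellman backup for taking action a in state s."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: Transitions, gamma: float = 0.9, tol: float = 1e-10):
    """Repeat Bellman optimality backups until values change by less than tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = max(backup(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    # Read off a greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: backup(P, V, s, a, gamma)) for s in P}
    return policy, V


policy, V = value_iteration(P)
print(policy, V)
```

With these assumed numbers, offloading to the edge is preferred whenever the queue is non-empty, because the expected cost of an occasional failed offload is lower than the cost of local execution.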


Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33017949 ◽ 2019 ◽ Vol 33 ◽ pp. 7949-7956

Author(s): Silviu Pitis

Keyword(s): Reinforcement Learning ◽ Markov Decision Process ◽ Decision Process ◽ Value Function ◽ Discount Factor ◽ Theoretic Approach ◽ Decision Theoretic Approach ◽ Markov Decision ◽ The Value Function

Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuous settings, with fixed discount factor γ


A Markov Decision Process to Determine Optimal Policies in Moving Target

Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security ◽ 10.1145/3243734.3278489 ◽ 2018 ◽ Cited By ~ 5

Author(s): Jianjun Zheng ◽ Akbar Siami Namin

Keyword(s): Markov Decision Process ◽ Decision Process ◽ Moving Target ◽ Optimal Policies ◽ Markov Decision


Blackwell optimal policies in a Markov decision process with a Borel state space

Mathematical Methods of Operations Research ◽ 10.1007/bf01432969 ◽ 1994 ◽ Vol 40 (3) ◽ pp. 253-288 ◽ Cited By ~ 8

Author(s): A. A. Yushkevich

Keyword(s): State Space ◽ Markov Decision Process ◽ Decision Process ◽ Borel State Space ◽ Optimal Policies ◽ Markov Decision


Strong 0-discount optimal policies in a Markov decision process with a Borel state space

Mathematical Methods of Operations Research ◽ 10.1007/bf01415675 ◽ 1995 ◽ Vol 42 (1) ◽ pp. 93-108 ◽ Cited By ~ 3

Author(s): A. A. Yushkevich

Keyword(s): State Space ◽ Markov Decision Process ◽ Decision Process ◽ Borel State Space ◽ Optimal Policies ◽ Markov Decision


Reinforcement Learning for Optimizing Driving Policies on Cruising Taxis Services

Sustainability ◽ 10.3390/su12218883 ◽ 2020 ◽ Vol 12 (21) ◽ pp. 8883

Author(s): Kun Jin ◽ Wei Wang ◽ Xuedong Hua ◽ Wei Zhou

Keyword(s): Reinforcement Learning ◽ Value Function ◽ State Action ◽ Future Reward ◽ Long Run ◽ Markov Decision ◽ Action Value ◽ Data Expansion ◽ Taking Action ◽ The Value Function

As a key element of urban transportation, taxi services provide considerable convenience and comfort for residents’ travel. In practice, however, they have not been very efficient. Previous research mainly aimed to optimize policies through order dispatch on ride-hailing services, an approach that cannot be applied to cruising taxi services. This paper develops a reinforcement learning (RL) framework to optimize driving policies for cruising taxi services. First, we formulate the drivers’ behaviours as a Markov decision process (MDP), considering the long-run influence of each action. An RL framework using dynamic programming and data expansion is employed to calculate the state-action value function. Following the value function, drivers can determine the best choice and quantify the expected future reward at a particular state. Using historical order data from Chengdu, we analyse the spatial distribution of the function values and demonstrate how the model can optimize driving policies. Finally, a realistic simulation of the on-demand platform is built. Compared with other benchmark methods, the results verify that the new model performs better in increasing total revenue and answer rate and in decreasing waiting time, with relative changes of at most 4.8%, 6.2% and −27.27%, respectively.


Optimal policies based on QoS for adaptive communication system with Markov Decision Process

2008 2nd International Conference on Anti-counterfeiting, Security and Identification ◽ 10.1109/iwasid.2008.4688369 ◽ 2008

Author(s): Yongxiang Wu ◽ Shengbo Hu

Keyword(s): Markov Decision Process ◽ Communication System ◽ Decision Process ◽ Adaptive Communication ◽ Optimal Policies ◽ Markov Decision
