An Offloading Algorithm based on Markov Decision Process in Mobile Edge Computing System

Author(s):  
Bingxin Yao ◽  
Bin Wu ◽  
Siyun Wu ◽  
Yin Ji ◽  
Danggui Chen ◽  
...  

In this paper, an offloading algorithm based on the Markov Decision Process (MDP) is proposed to solve the multi-objective offloading decision problem in a Mobile Edge Computing (MEC) system. The distinguishing feature of the algorithm is that an MDP is used to make offloading decisions. The number of tasks in the task queue, the number of accessible edge clouds, and the Signal-to-Noise Ratio (SNR) of the wireless channel are taken into account in the state space of the MDP model. The offloading delay and energy consumption define the value function of the MDP model, i.e., the objective function. To maximize the value function, the Value Iteration Algorithm is used to obtain the optimal offloading policy. According to this policy, tasks of mobile terminals (MTs) are offloaded to the edge cloud or the central cloud, or executed locally. Simulation results show that the proposed algorithm effectively reduces the offloading delay and energy consumption.
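For readers unfamiliar with the technique, the following is a minimal value-iteration sketch for an offloading MDP with a discretized state of (queue length, reachable edge clouds, SNR level) and actions {local, edge, cloud}. The state discretization, delay/energy model, and weights are illustrative assumptions, not the paper's actual model.

```python
# Minimal value-iteration sketch for an offloading MDP (illustrative only).
# State: (queue length, number of reachable edge clouds, SNR level).
# Reward: negative weighted sum of a hypothetical delay and energy cost.
import itertools

ACTIONS = ["local", "edge", "cloud"]
QUEUE, CLOUDS, SNR = range(4), range(3), range(3)   # discretized levels (assumed)
STATES = list(itertools.product(QUEUE, CLOUDS, SNR))
GAMMA, THETA = 0.9, 1e-6

def reward(state, action):
    queue, clouds, snr = state
    # Hypothetical model: offloading helps when SNR is high and an edge cloud
    # is reachable; otherwise local execution is cheaper.
    delay = {"local": 2.0 * queue, "edge": 1.0 * queue + (2 - snr), "cloud": 0.5 * queue + 3}[action]
    energy = {"local": 3.0 * queue, "edge": 1.0 + (2 - snr), "cloud": 1.5}[action]
    if action != "local" and clouds == 0:
        delay += 5.0                                 # no edge cloud reachable
    return -(0.5 * delay + 0.5 * energy)

def transition(state, action):
    queue, clouds, snr = state
    return (max(queue - 1, 0), clouds, snr)          # toy dynamics: one task served per step

V = {s: 0.0 for s in STATES}
while True:                                           # value iteration
    delta = 0.0
    for s in STATES:
        best = max(reward(s, a) + GAMMA * V[transition(s, a)] for a in ACTIONS)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:
        break

policy = {s: max(ACTIONS, key=lambda a: reward(s, a) + GAMMA * V[transition(s, a)]) for s in STATES}
print(policy[(3, 2, 2)])   # e.g. decision for a full queue, two edge clouds, high SNR
```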

Electronics ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 190
Author(s):  
Wu Ouyang ◽  
Zhigang Chen ◽  
Jia Wu ◽  
Genghua Yu ◽  
Heng Zhang

As transportation becomes more convenient and efficient, users move faster and faster. When a user leaves the service range of the original edge server, that server needs to migrate the tasks offloaded by the user to other edge servers. An effective task migration strategy must fully consider the location of users, the load status of edge servers, and energy consumption, which makes designing such a strategy a challenge. In this paper, we propose a mobile edge computing (MEC) system architecture consisting of multiple smart mobile devices (SMDs), multiple unmanned aerial vehicles (UAVs), and a base station (BS). Moreover, we establish a Markov decision process with unknown rewards (MDPUR) model based on the traditional Markov decision process (MDP), which comprehensively considers the migration distance, the residual energy status of the UAVs, and the load status of the UAVs. Based on the MDPUR model, we propose an advantage-based value iteration (ABVI) algorithm to obtain an effective task migration strategy, which helps the UAV group achieve load balancing and reduces the group's total energy consumption while ensuring user service quality. Finally, simulation results show that the ABVI algorithm is effective; in particular, it outperforms the traditional value iteration algorithm and remains robust in dynamic environments.
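The abstract does not detail the ABVI algorithm itself, so the sketch below only illustrates the generic advantage idea, A(s, a) = Q(s, a) - V(s), applied to ranking candidate UAV migration targets. The per-UAV weights for migration distance, residual energy, and load are hypothetical, not taken from the paper.

```python
# Toy illustration of ranking migration targets with an advantage-style score
# A(s, a) = Q(s, a) - V(s).  All weights and numbers are invented.
from dataclasses import dataclass

@dataclass
class UAV:
    name: str
    distance_km: float       # distance the task would have to migrate
    residual_energy: float   # 0..1, fraction of battery left
    load: float              # 0..1, fraction of capacity in use

def q_value(uav: UAV) -> float:
    # Higher is better: prefer close, well-charged, lightly loaded UAVs.
    return -1.0 * uav.distance_km + 2.0 * uav.residual_energy - 1.5 * uav.load

def advantages(candidates):
    v = max(q_value(u) for u in candidates)          # V(s) = best achievable Q
    return {u.name: q_value(u) - v for u in candidates}

uavs = [UAV("uav-1", 0.8, 0.9, 0.7), UAV("uav-2", 1.5, 0.6, 0.2), UAV("uav-3", 0.4, 0.3, 0.9)]
print(advantages(uavs))    # the UAV with advantage 0.0 is the greedy migration target
```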


2014 ◽  
Vol 46 (01) ◽  
pp. 121-138 ◽  
Author(s):  
Ulrich Rieder ◽  
Marc Wittlinger

We consider an investment problem where observing and trading are only possible at random times. In addition, we introduce drawdown constraints, which require that the investor's wealth does not fall below a previously fixed percentage of its running maximum. The financial market consists of a riskless bond and a stock driven by a Lévy process. Moreover, a general utility function is assumed. In this setting we solve the investment problem using a related limsup Markov decision process. We show that the value function can be characterized as the unique fixed point of the Bellman equation and verify the existence of an optimal stationary policy. Under some mild assumptions the value function can be approximated by the value function of a contracting Markov decision process. We are able to use Howard's policy improvement algorithm for computing the value function as well as an optimal policy. These results are illustrated in a numerical example.
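Howard's policy improvement algorithm mentioned above is standard policy iteration. The toy sketch below shows the pattern on a small, randomly generated finite MDP; it is not the paper's limsup MDP for the drawdown-constrained investment problem.

```python
# Generic policy iteration (Howard's policy improvement) on a small finite MDP.
# Transition probabilities and rewards are randomly generated for illustration.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # immediate rewards

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[np.arange(n_states), policy]
    R_pi = R[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to V.
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal policy:", policy, "value function:", V)
```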


2019 ◽  
Vol 27 (3) ◽  
pp. 1272-1288 ◽  
Author(s):  
Shiqiang Wang ◽  
Rahul Urgaonkar ◽  
Murtaza Zafer ◽  
Ting He ◽  
Kevin Chan ◽  
...  


2020 ◽  
Vol 2020 ◽  
pp. 1-6 ◽  
Author(s):  
Bingxin Zhang ◽  
Guopeng Zhang ◽  
Weice Sun ◽  
Kun Yang

This paper proposes an efficient computation task offloading mechanism for mobile edge computing (MEC) systems. The studied MEC system consists of multiple user equipment (UE) devices and multiple radio interfaces. To maximize the number of UEs benefitting from the MEC, the task offloading and power control strategy for each UE is optimized jointly. However, finding the optimal solution is NP-hard. We therefore reformulate the problem as a Markov decision process (MDP) and develop a reinforcement learning (RL)-based algorithm to solve the MDP. Simulation results show that the proposed RL-based algorithm achieves near-optimal performance compared to the exhaustive search algorithm, and it also outperforms the received signal strength (RSS)-based method, both from the standpoint of the system (it supports a larger number of beneficial UEs) and of an individual UE (it incurs a lower computation overhead).
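The abstract does not name the specific RL algorithm, so the following is only a generic tabular Q-learning stand-in for a joint offloading and power-control decision. The state/action discretization and the environment dynamics are invented for illustration.

```python
# Generic tabular Q-learning sketch for a joint offloading / power-control action.
# All dynamics, rewards, and hyperparameters below are illustrative assumptions.
import random

states = range(5)                                    # e.g. discretized channel quality
actions = [(o, p) for o in ("local", "offload") for p in (0, 1, 2)]  # (decision, power level)
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    # Hypothetical environment: offloading at higher power succeeds more often
    # on good channels but costs more energy.
    offload, power = a
    if offload == "offload":
        success = random.random() < 0.3 + 0.1 * s + 0.15 * power
        reward = (1.0 if success else -0.5) - 0.2 * power
    else:
        reward = 0.2                                 # local execution: small, fixed benefit
    return random.choice(list(states)), reward       # channel state evolves randomly

s = random.choice(list(states))
for _ in range(20000):
    a = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda x: Q[(s, x)])
    s2, r = step(s, a)
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in actions) - Q[(s, a)])
    s = s2

print({st: max(actions, key=lambda a: Q[(st, a)]) for st in states})   # learned greedy policy
```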


Author(s):  
Guisong Yang ◽  
Ling Hou ◽  
Xingyu He ◽  
Daojing He ◽  
Sammy Chan ◽  
...  

1993 ◽  
Vol 7 (3) ◽  
pp. 369-385 ◽  
Author(s):  
Kyle Siegrist

We consider N sites (N ≤ ∞), each of which may be either occupied or unoccupied. Time is discrete, and at each time unit a set of occupied sites may attempt to capture a previously unoccupied site. The attempt will be successful with a probability that depends on the number of sites making the attempt, in which case the new site will also be occupied. A benefit is gained when new sites are occupied, but capture attempts are costly. The problem of optimal occupation is formulated as a Markov decision process in which the admissible actions are occupation strategies and the cost is a function of the strategy and the number of occupied sites. A partial order on the state-action pairs is used to obtain a comparison result for stationary policies and qualitative results concerning monotonicity of the value function for the n-stage problem (n ≤ ∞). The optimal policies are partially characterized when the cost depends on the action only through the total number of occupation attempts made.
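A minimal finite-horizon dynamic-programming sketch of a toy version of this occupation problem is given below. The success-probability model, benefit, and cost values are assumptions for illustration, not taken from the paper.

```python
# Finite-horizon DP for a toy occupation problem: the state is the number of
# occupied sites, the action is how many of them join an attempt to capture one
# new site, and the success probability grows with the number of attackers.
from functools import lru_cache

N, HORIZON = 10, 8
BENEFIT, COST = 1.0, 0.2        # benefit per newly occupied site, cost per attempting site

def p_success(k: int) -> float:
    return 1.0 - (1.0 - 0.3) ** k        # hypothetical: each attacker helps independently

@lru_cache(maxsize=None)
def value(occupied: int, stages_left: int) -> float:
    if stages_left == 0 or occupied == N:
        return 0.0
    best = value(occupied, stages_left - 1)          # option: attempt nothing
    for k in range(1, occupied + 1):                 # k occupied sites join the attempt
        p = p_success(k)
        expected = (-COST * k
                    + p * (BENEFIT + value(occupied + 1, stages_left - 1))
                    + (1 - p) * value(occupied, stages_left - 1))
        best = max(best, expected)
    return best

print(value(1, HORIZON))   # optimal expected net benefit starting from one occupied site
```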


Author(s):  
Silviu Pitis

Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuing settings, with fixed discount factor γ ...
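For reference, the standard textbook discounted value function that such agents maximize (not specific to this paper) can be written as:

$$ V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0}=s \right] $$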

