An Offloading Algorithm based on Markov Decision Process in Mobile Edge Computing System

Author(s):  
Bingxin Yao ◽  
Bin Wu ◽  
Siyun Wu ◽  
Yin Ji ◽  
Danggui Chen ◽  
...  

In this paper, an offloading algorithm based on the Markov Decision Process (MDP) is proposed to solve the multi-objective offloading decision problem in a Mobile Edge Computing (MEC) system. The distinguishing feature of the algorithm is that an MDP is used to make offloading decisions. The number of tasks in the task queue, the number of accessible edge clouds, and the Signal-to-Noise Ratio (SNR) of the wireless channel are taken into account in the state space of the MDP model. The offloading delay and energy consumption define the value function of the MDP model, i.e., the objective function. To maximize the value function, the Value Iteration Algorithm is used to obtain the optimal offloading policy. According to this policy, tasks of mobile terminals (MTs) are offloaded to the edge cloud or the central cloud, or executed locally. Simulation results show that the proposed algorithm effectively reduces the offloading delay and energy consumption.
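For readers unfamiliar with the technique, the following is a minimal value-iteration sketch for an offloading MDP with a discretized state of (queue length, reachable edge clouds, SNR level) and actions {local, edge, cloud}. The state discretization, delay/energy model, and weights are illustrative assumptions, not the paper's actual model.

```python
# Minimal value-iteration sketch for an offloading MDP (illustrative only).
# State: (queue length, number of reachable edge clouds, SNR level).
# Reward: negative weighted sum of a hypothetical delay and energy cost.
import itertools

ACTIONS = ["local", "edge", "cloud"]
QUEUE, CLOUDS, SNR = range(4), range(3), range(3)   # discretized levels (assumed)
STATES = list(itertools.product(QUEUE, CLOUDS, SNR))
GAMMA, THETA = 0.9, 1e-6

def reward(state, action):
    queue, clouds, snr = state
    # Hypothetical model: offloading helps when SNR is high and an edge cloud
    # is reachable; otherwise local execution is cheaper.
    delay = {"local": 2.0 * queue, "edge": 1.0 * queue + (2 - snr), "cloud": 0.5 * queue + 3}[action]
    energy = {"local": 3.0 * queue, "edge": 1.0 + (2 - snr), "cloud": 1.5}[action]
    if action != "local" and clouds == 0:
        delay += 5.0                                 # no edge cloud reachable
    return -(0.5 * delay + 0.5 * energy)

def transition(state, action):
    queue, clouds, snr = state
    return (max(queue - 1, 0), clouds, snr)          # toy dynamics: one task served per step

V = {s: 0.0 for s in STATES}
while True:                                           # value iteration
    delta = 0.0
    for s in STATES:
        best = max(reward(s, a) + GAMMA * V[transition(s, a)] for a in ACTIONS)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:
        break

policy = {s: max(ACTIONS, key=lambda a: reward(s, a) + GAMMA * V[transition(s, a)]) for s in STATES}
print(policy[(3, 2, 2)])   # e.g. decision for a full queue, two edge clouds, high SNR
```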

Electronics ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 190
Author(s):  
Wu Ouyang ◽  
Zhigang Chen ◽  
Jia Wu ◽  
Genghua Yu ◽  
Heng Zhang

As transportation becomes more convenient and efficient, users move faster and faster. When a user leaves the service range of the original edge server, that server needs to migrate the tasks offloaded by the user to other edge servers. An effective task migration strategy must fully consider the location of users, the load status of edge servers, and energy consumption, which makes designing such a strategy a challenge. In this paper, we propose a mobile edge computing (MEC) system architecture consisting of multiple smart mobile devices (SMDs), multiple unmanned aerial vehicles (UAVs), and a base station (BS). Moreover, we establish a Markov decision process with unknown rewards (MDPUR) model based on the traditional Markov decision process (MDP), which comprehensively considers the migration distance, the residual energy status of the UAVs, and the load status of the UAVs. Based on the MDPUR model, we propose an advantage-based value iteration (ABVI) algorithm to obtain an effective task migration strategy, which helps the UAV group achieve load balancing and reduces the group's total energy consumption while ensuring user service quality. Finally, simulation results show that the ABVI algorithm is effective; in particular, it outperforms the traditional value iteration algorithm and remains robust in dynamic environments.
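The abstract does not detail the ABVI algorithm itself, so the sketch below only illustrates the generic advantage idea, A(s, a) = Q(s, a) - V(s), applied to ranking candidate UAV migration targets. The per-UAV weights for migration distance, residual energy, and load are hypothetical, not taken from the paper.

```python
# Toy illustration of ranking migration targets with an advantage-style score
# A(s, a) = Q(s, a) - V(s).  All weights and numbers are invented.
from dataclasses import dataclass

@dataclass
class UAV:
    name: str
    distance_km: float       # distance the task would have to migrate
    residual_energy: float   # 0..1, fraction of battery left
    load: float              # 0..1, fraction of capacity in use

def q_value(uav: UAV) -> float:
    # Higher is better: prefer close, well-charged, lightly loaded UAVs.
    return -1.0 * uav.distance_km + 2.0 * uav.residual_energy - 1.5 * uav.load

def advantages(candidates):
    v = max(q_value(u) for u in candidates)          # V(s) = best achievable Q
    return {u.name: q_value(u) - v for u in candidates}

uavs = [UAV("uav-1", 0.8, 0.9, 0.7), UAV("uav-2", 1.5, 0.6, 0.2), UAV("uav-3", 0.4, 0.3, 0.9)]
print(advantages(uavs))    # the UAV with advantage 0.0 is the greedy migration target
```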


2014 ◽  
Vol 46 (01) ◽  
pp. 121-138 ◽  
Author(s):  
Ulrich Rieder ◽  
Marc Wittlinger

We consider an investment problem where observing and trading are only possible at random times. In addition, we introduce drawdown constraints, which require that the investor's wealth does not fall below a previously fixed percentage of its running maximum. The financial market consists of a riskless bond and a stock driven by a Lévy process. Moreover, a general utility function is assumed. In this setting we solve the investment problem using a related limsup Markov decision process. We show that the value function can be characterized as the unique fixed point of the Bellman equation and verify the existence of an optimal stationary policy. Under some mild assumptions the value function can be approximated by the value function of a contracting Markov decision process. We are able to use Howard's policy improvement algorithm for computing the value function as well as an optimal policy. These results are illustrated in a numerical example.
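Howard's policy improvement algorithm mentioned above is standard policy iteration. The toy sketch below shows the pattern on a small, randomly generated finite MDP; it is not the paper's limsup MDP for the drawdown-constrained investment problem.

```python
# Generic policy iteration (Howard's policy improvement) on a small finite MDP.
# Transition probabilities and rewards are randomly generated for illustration.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # immediate rewards

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[np.arange(n_states), policy]
    R_pi = R[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to V.
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal policy:", policy, "value function:", V)
```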


2019 ◽  
Vol 27 (3) ◽  
pp. 1272-1288 ◽  
Author(s):  
Shiqiang Wang ◽  
Rahul Urgaonkar ◽  
Murtaza Zafer ◽  
Ting He ◽  
Kevin Chan ◽  
...  


2020 ◽  
Vol 2020 ◽  
pp. 1-6 ◽  
Author(s):  
Bingxin Zhang ◽  
Guopeng Zhang ◽  
Weice Sun ◽  
Kun Yang

This paper proposes an efficient computation task offloading mechanism for mobile edge computing (MEC) systems. The studied MEC system consists of multiple user equipment (UE) devices and multiple radio interfaces. To maximize the number of UEs benefitting from the MEC, the task offloading and power control strategy for each UE is optimized jointly. However, finding the optimal solution is NP-hard. We therefore reformulate the problem as a Markov decision process (MDP) and develop a reinforcement learning (RL)-based algorithm to solve the MDP. Simulation results show that the proposed RL-based algorithm achieves near-optimal performance compared to the exhaustive search algorithm, and it also outperforms the received signal strength (RSS)-based method, both from the standpoint of the system (it supports a larger number of beneficial UEs) and of an individual UE (it incurs a lower computation overhead).
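The abstract does not name the specific RL algorithm, so the following is only a generic tabular Q-learning stand-in for a joint offloading and power-control decision. The state/action discretization and the environment dynamics are invented for illustration.

```python
# Generic tabular Q-learning sketch for a joint offloading / power-control action.
# All dynamics, rewards, and hyperparameters below are illustrative assumptions.
import random

states = range(5)                                    # e.g. discretized channel quality
actions = [(o, p) for o in ("local", "offload") for p in (0, 1, 2)]  # (decision, power level)
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    # Hypothetical environment: offloading at higher power succeeds more often
    # on good channels but costs more energy.
    offload, power = a
    if offload == "offload":
        success = random.random() < 0.3 + 0.1 * s + 0.15 * power
        reward = (1.0 if success else -0.5) - 0.2 * power
    else:
        reward = 0.2                                 # local execution: small, fixed benefit
    return random.choice(list(states)), reward       # channel state evolves randomly

s = random.choice(list(states))
for _ in range(20000):
    a = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda x: Q[(s, x)])
    s2, r = step(s, a)
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in actions) - Q[(s, a)])
    s = s2

print({st: max(actions, key=lambda a: Q[(st, a)]) for st in states})   # learned greedy policy
```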


Author(s):  
Guisong Yang ◽  
Ling Hou ◽  
Xingyu He ◽  
Daojing He ◽  
Sammy Chan ◽  
...  

1993 ◽  
Vol 7 (3) ◽  
pp. 369-385 ◽  
Author(s):  
Kyle Siegrist

We consider N sites (N ≤ ∞), each of which may be either occupied or unoccupied. Time is discrete, and at each time unit a set of occupied sites may attempt to capture a previously unoccupied site. The attempt will be successful with a probability that depends on the number of sites making the attempt, in which case the new site will also be occupied. A benefit is gained when new sites are occupied, but capture attempts are costly. The problem of optimal occupation is formulated as a Markov decision process in which the admissible actions are occupation strategies and the cost is a function of the strategy and the number of occupied sites. A partial order on the state-action pairs is used to obtain a comparison result for stationary policies and qualitative results concerning monotonicity of the value function for the n-stage problem (n ≤ ∞). The optimal policies are partially characterized when the cost depends on the action only through the total number of occupation attempts made.
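A minimal finite-horizon dynamic-programming sketch of a toy version of this occupation problem is given below. The success-probability model, benefit, and cost values are assumptions for illustration, not taken from the paper.

```python
# Finite-horizon DP for a toy occupation problem: the state is the number of
# occupied sites, the action is how many of them join an attempt to capture one
# new site, and the success probability grows with the number of attackers.
from functools import lru_cache

N, HORIZON = 10, 8
BENEFIT, COST = 1.0, 0.2        # benefit per newly occupied site, cost per attempting site

def p_success(k: int) -> float:
    return 1.0 - (1.0 - 0.3) ** k        # hypothetical: each attacker helps independently

@lru_cache(maxsize=None)
def value(occupied: int, stages_left: int) -> float:
    if stages_left == 0 or occupied == N:
        return 0.0
    best = value(occupied, stages_left - 1)          # option: attempt nothing
    for k in range(1, occupied + 1):                 # k occupied sites join the attempt
        p = p_success(k)
        expected = (-COST * k
                    + p * (BENEFIT + value(occupied + 1, stages_left - 1))
                    + (1 - p) * value(occupied, stages_left - 1))
        best = max(best, expected)
    return best

print(value(1, HORIZON))   # optimal expected net benefit starting from one occupied site
```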


Author(s):  
Silviu Pitis

Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuing settings, with fixed discount factor γ ...
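For reference, the standard textbook discounted value function that such agents maximize (not specific to this paper) can be written as:

$$ V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0}=s \right] $$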

