Horizontal Auto-Scaling for Multi-Access Edge Computing Using Safe Reinforcement Learning

Multi-Access Edge Computing (MEC) has emerged as a promising new paradigm allowing low latency access to services deployed on edge servers to avert network latencies often encountered in accessing cloud services. A key component of the MEC environment is an auto-scaling policy which is used to decide the overall management and scaling of container instances corresponding to individual services deployed on MEC servers to cater to traffic fluctuations. In this work, we propose a Safe Reinforcement Learning (RL)-based auto-scaling policy agent that can efficiently adapt to traffic variations to ensure adherence to service specific latency requirements. We model the MEC environment using a Markov Decision Process (MDP). We demonstrate how latency requirements can be formally expressed in Linear Temporal Logic (LTL). The LTL specification acts as a guide to the policy agent to automatically learn auto-scaling decisions that maximize the probability of satisfying the LTL formula. We introduce a quantitative reward mechanism based on the LTL formula to tailor service specific latency requirements. We prove that our reward mechanism ensures convergence of standard Safe-RL approaches. We present experimental results in practical scenarios on a test-bed setup with real-world benchmark applications to show the effectiveness of our approach in comparison to other state-of-the-art methods in literature. Furthermore, we perform extensive simulated experiments to demonstrate the effectiveness of our approach in large scale scenarios.
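The abstract does not give the LTL formula itself. As an illustration only, the sketch below assumes the simplest safety-style requirement, "latency always stays within a bound" (G(latency <= bound)), and checks it on finite latency traces. The function names, the traces, and the 20 ms bound are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def always_within(latencies_ms: Sequence[float], bound_ms: float) -> bool:
    """Finite-trace check of the assumed property G(latency <= bound)."""
    return all(latency <= bound_ms for latency in latencies_ms)


def satisfaction_rate(traces: Sequence[Sequence[float]], bound_ms: float) -> float:
    """Fraction of traces that satisfy the property (an empirical estimate)."""
    if not traces:
        raise ValueError("at least one trace is required")
    satisfied = sum(always_within(trace, bound_ms) for trace in traces)
    return satisfied / len(traces)


# Hypothetical latency traces in milliseconds, one list per episode.
traces = [
    [12.0, 18.5, 9.1],
    [14.0, 31.2, 11.0],  # violates the bound at the second step
    [8.0, 8.5, 19.9],
    [25.0, 10.0, 10.0],  # violates the bound at the first step
]
rate = satisfaction_rate(traces, bound_ms=20.0)
```

An auto-scaling policy that maximizes the probability of satisfying the formula would, under this simplified reading, push such a satisfaction rate toward 1.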


Decentralized Offloading Strategies Based on Reinforcement Learning for Multi-Access Edge Computing

Information ◽ 10.3390/info12090343 ◽ 2021 ◽ Vol 12 (9) ◽ pp. 343

Author(s): Chunyang Hu ◽ Jingchen Li ◽ Haobin Shi ◽ Bin Ning ◽ Qiong Gu

Keyword(s): Reinforcement Learning ◽ Large Scale ◽ Learning Model ◽ Learning Technologies ◽ Edge Computing ◽ Gradient Algorithm ◽ Computing Systems ◽ Decentralized Execution ◽ Multi Access ◽ Reinforcement Learning Model

Researchers have applied reinforcement learning to learn offloading strategies for multi-access edge computing systems. However, reinforcement learning scales poorly to large systems because of their huge state spaces and large numbers of offloading behaviors. For this reason, this work introduces a centralized training and decentralized execution mechanism, designing a decentralized reinforcement learning model for multi-access edge computing systems. Considering a cloud server and several edge servers, we separate training from execution in the reinforcement learning model. Execution happens on the edge devices of the system, and the edge servers require no communication. Training, by contrast, takes place on the cloud server, which lowers transmission latency. The developed method uses a deep deterministic policy gradient algorithm to optimize offloading strategies. The simulated experiment shows that our method can learn the offloading strategy for each edge device efficiently.


Large-Scale Computation Offloading Using a Multi-Agent Reinforcement Learning in Heterogeneous Multi-access Edge Computing

IEEE Transactions on Mobile Computing ◽ 10.1109/tmc.2022.3141080 ◽ 2022 ◽ pp. 1-1

Author(s): Zhen Gao ◽ Lei Yang ◽ Yu Dai

Keyword(s): Reinforcement Learning ◽ Large Scale ◽ Edge Computing ◽ Computation Offloading ◽ Multi Agent ◽ Multi Access


Deep reinforcement learning-based resource allocation and seamless handover in multi-access edge computing based on SDN

Knowledge and Information Systems ◽ 10.1007/s10115-021-01590-4 ◽ 2021

Author(s): Chunlin Li ◽ Yong Zhang ◽ Youlong Luo

Keyword(s): Resource Allocation ◽ Reinforcement Learning ◽ Edge Computing ◽ Seamless Handover ◽ Multi Access


A Multi-Step Reinforcement Learning Algorithm

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.44-47.3611 ◽ 2010 ◽ Vol 44-47 ◽ pp. 3611-3615 ◽ Cited By ~ 1

Author(s): Zhi Cong Zhang ◽ Kai Shun Hu ◽ Hui Yu Huang ◽ Shuai Li ◽ Shao Yong Zhao

Keyword(s): Reinforcement Learning ◽ Markov Decision Process ◽ Decision Process ◽ Large Scale ◽ Learning Algorithm ◽ Machine Learning Method ◽ Learning Method ◽ K Value ◽ Markov Decision ◽ Action Value

Reinforcement learning (RL) is a state-value or action-value based machine learning method that approximately solves large-scale Markov Decision Processes (MDPs) or Semi-Markov Decision Processes (SMDPs). A multi-step RL algorithm called Sarsa(λ,k) is proposed, which is a compromise between Sarsa and Sarsa(λ). It is equivalent to Sarsa if k is 1 and to Sarsa(λ) if k is infinite. Sarsa(λ,k) adjusts its performance through the choice of k. Two forms of Sarsa(λ,k), forward-view Sarsa(λ,k) and backward-view Sarsa(λ,k), are constructed and proved equivalent under off-line updating.
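The abstract does not state the update target. One common construction that matches the two limiting cases it describes is the λ-return truncated at horizon k; the sketch below assumes that form and is not necessarily the authors' exact definition. The function names and the sample trajectory are hypothetical.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], q_values: Sequence[float],
                  t: int, n: int, gamma: float) -> float:
    """n-step Sarsa return from time t.

    rewards[i] is the reward received after step i, and q_values[i] is
    Q(s_i, a_i) along the sampled trajectory.
    """
    discounted = sum(gamma ** i * rewards[t + i] for i in range(n))
    return discounted + gamma ** n * q_values[t + n]


def truncated_lambda_return(rewards: Sequence[float], q_values: Sequence[float],
                            t: int, k: int, lam: float, gamma: float) -> float:
    """Assumed forward-view target: lambda-return truncated at horizon k.

    The weights (1 - lam) * lam**(n - 1) for n < k, plus lam**(k - 1) on the
    k-step return, sum to 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    total = 0.0
    for n in range(1, k):
        total += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, q_values, t, n, gamma)
    total += lam ** (k - 1) * n_step_return(rewards, q_values, t, k, gamma)
    return total


# Hypothetical three-step trajectory.
rewards = [1.0, 0.0, 2.0]
q_values = [0.5, 0.4, 0.3, 0.0]
one_step = truncated_lambda_return(rewards, q_values, t=0, k=1, lam=0.5, gamma=0.9)
three_step = truncated_lambda_return(rewards, q_values, t=0, k=3, lam=0.5, gamma=0.9)
```

With k = 1 the target reduces to the one-step Sarsa target, and as k grows it approaches the full λ-return used by forward-view Sarsa(λ), which is the trade-off the abstract attributes to the choice of k.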


Reinforcement Learning Applied to a Differential Game

Adaptive Behavior ◽ 10.1177/105971239500400102 ◽ 1995 ◽ Vol 4 (1) ◽ pp. 3-28 ◽ Cited By ~ 15

Author(s): Mance E. Harmon ◽ Leemon C. Baird ◽ A. Harry Klopf

Keyword(s): Reinforcement Learning ◽ Differential Game ◽ Learning Algorithm ◽ Learning System ◽ Test Bed ◽ Linear Quadratic ◽ Time Step ◽ Q Learning ◽ Step Duration ◽ Markov Decision

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of the residual-gradient and non-residual-gradient forms of both advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also shown to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.


An Effiecient Approach for Resource Auto-Scaling in Cloud Environments

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v6i5.pp2415-2424 ◽ 2016 ◽ Vol 6 (5) ◽ pp. 2415

Author(s): Bahar Asgari ◽ Mostafa Ghobaei Arani ◽ Sam Jabbehdari

Keyword(s): Service Level Agreement ◽ Service Level ◽ Resource Provisioning ◽ Cloud Services ◽ Computing Environment ◽ Cloud Computing Environment ◽ Sla Violation ◽ Markov Decision ◽ Cloud Environments ◽ Auto Scaling

Cloud services have become increasingly popular among users. Automatic resource provisioning for cloud services is one of the important challenges in cloud environments. In a cloud computing environment, resource providers should supply the required resources to users automatically and without limitation: whenever a user needs more resources, they should be allocated to that user without any problems. Conversely, if the resources exceed the user's needs, the extra resources should be turned off temporarily and turned back on whenever they are needed. In this paper, we propose an automatic resource provisioning approach based on reinforcement learning for auto-scaling resources according to a Markov Decision Process (MDP). Simulation results show that the proposed approach performs better than similar approaches in terms of Service Level Agreement (SLA) violation rate and stability.
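To make the idea of RL-based auto-scaling over an MDP concrete, here is a minimal tabular Q-learning sketch on a toy environment. Everything in it is an assumption made for illustration and not the authors' model: the state is (demand, running instances), demand stays fixed, actions add or remove one instance, and the reward charges 1 per running instance plus a penalty of 10 whenever capacity falls below demand (standing in for an SLA violation).

```python
import random

ACTIONS = (-1, 0, 1)   # scale in, hold, scale out (hypothetical action set)
MAX_INSTANCES = 4
DEMANDS = (1, 2, 3)    # instances needed to avoid an SLA violation


def step(demand: int, instances: int, action: int) -> tuple[int, float]:
    """Apply a scaling action; return (new instance count, reward)."""
    nxt = min(MAX_INSTANCES, max(1, instances + action))
    # Cost of 1 per running instance, plus a penalty of 10 if under-provisioned.
    reward = -1.0 * nxt - (10.0 if nxt < demand else 0.0)
    return nxt, reward


def train(steps: int = 100_000, alpha: float = 0.5, gamma: float = 0.9,
          seed: int = 0) -> dict:
    """Off-policy Q-learning with uniformly random exploration."""
    rng = random.Random(seed)
    q = {(d, n, a): 0.0
         for d in DEMANDS
         for n in range(1, MAX_INSTANCES + 1)
         for a in ACTIONS}
    for _ in range(steps):
        d = rng.choice(DEMANDS)
        n = rng.randint(1, MAX_INSTANCES)
        a = rng.choice(ACTIONS)
        nxt, r = step(d, n, a)
        best_next = max(q[(d, nxt, b)] for b in ACTIONS)
        q[(d, n, a)] += alpha * (r + gamma * best_next - q[(d, n, a)])
    return q


def greedy_action(q: dict, demand: int, instances: int) -> int:
    """Pick the action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q[(demand, instances, a)])


q_table = train()
scale_out = greedy_action(q_table, demand=3, instances=1)  # under-provisioned
scale_in = greedy_action(q_table, demand=1, instances=4)   # over-provisioned
hold = greedy_action(q_table, demand=2, instances=2)       # exactly provisioned
```

In this toy setting the learned policy scales out when under-provisioned, scales in when over-provisioned, and holds when capacity matches demand. A realistic auto-scaler would need time-varying demand, a richer state, and measured SLA metrics.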
