Decentralized Offloading Strategies Based on Reinforcement Learning for Multi-Access Edge Computing

Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 343
Author(s):  
Chunyang Hu ◽  
Jingchen Li ◽  
Haobin Shi ◽  
Bin Ning ◽  
Qiong Gu

Researchers have applied reinforcement learning to learn offloading strategies for multi-access edge computing systems. However, large-scale systems are poorly suited to conventional reinforcement learning because of their huge state spaces and offloading action spaces. For this reason, this work introduces a centralized-training, decentralized-execution mechanism and designs a decentralized reinforcement learning model for multi-access edge computing systems. Considering a cloud server and several edge servers, we separate training from execution in the reinforcement learning model. Execution happens on the edge devices of the system, so the edge servers need no communication with one another; training, by contrast, takes place at the cloud, which reduces transmission latency. The developed method uses a deep deterministic policy gradient algorithm to optimize offloading strategies. Simulated experiments show that our method learns an offloading strategy for each edge device efficiently.
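The centralized-training, decentralized-execution pattern described in the abstract can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the state dimensions, network sizes, and the assumption that each edge device emits a single continuous offloading ratio are our own simplifications, with PyTorch used for the deterministic-policy-gradient updates.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): per-edge actors
# execute locally; a centralized critic is trained at the cloud in DDPG style.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, N_EDGES = 4, 1, 3   # hypothetical sizes

class Actor(nn.Module):
    """Runs on an edge device: maps the local state to an offloading ratio in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Sigmoid())

    def forward(self, state):
        return self.net(state)

class CentralCritic(nn.Module):
    """Trained at the cloud: scores the joint state-action of all edge devices."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_EDGES * (STATE_DIM + ACTION_DIM), 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, joint_states, joint_actions):
        return self.net(torch.cat([joint_states, joint_actions], dim=-1))

actors = [Actor() for _ in range(N_EDGES)]          # weights pushed to edge devices
critic = CentralCritic()                            # lives only at the cloud
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-4) for a in actors]

def cloud_update(states, actions, targets):
    """One DDPG-style update at the cloud.
    states:  (B, N_EDGES, STATE_DIM), actions: (B, N_EDGES, ACTION_DIM),
    targets: (B, 1) bootstrapped Q-targets computed from replayed transitions."""
    # Critic regression toward the targets.
    q = critic(states.flatten(1), actions.flatten(1))
    critic_opt.zero_grad()
    nn.functional.mse_loss(q, targets).backward()
    critic_opt.step()
    # Deterministic policy gradient: each actor ascends the centralized Q-value.
    for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
        acts = [actions[:, j] for j in range(N_EDGES)]
        acts[i] = actor(states[:, i])                # re-evaluate this edge's own action
        loss = -critic(states.flatten(1), torch.cat(acts, dim=-1)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In this arrangement only the lightweight actors run on the edge devices; the critic, replay data, and gradient computation stay at the cloud, which is what keeps inter-edge communication out of the execution path.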

2021 ◽  
Vol 20 (6) ◽  
pp. 1-33
Author(s):  
Kaustabha Ray ◽  
Ansuman Banerjee

Multi-Access Edge Computing (MEC) has emerged as a promising new paradigm allowing low-latency access to services deployed on edge servers, averting the network latencies often encountered in accessing cloud services. A key component of the MEC environment is the auto-scaling policy, which decides how container instances for individual services deployed on MEC servers are managed and scaled to cope with traffic fluctuations. In this work, we propose a Safe Reinforcement Learning (RL)-based auto-scaling policy agent that can efficiently adapt to traffic variations to ensure adherence to service-specific latency requirements. We model the MEC environment using a Markov Decision Process (MDP). We demonstrate how latency requirements can be formally expressed in Linear Temporal Logic (LTL). The LTL specification acts as a guide to the policy agent, which automatically learns auto-scaling decisions that maximize the probability of satisfying the LTL formula. We introduce a quantitative reward mechanism based on the LTL formula to tailor service-specific latency requirements. We prove that our reward mechanism ensures convergence of standard Safe-RL approaches. We present experimental results in practical scenarios on a test-bed setup with real-world benchmark applications to show the effectiveness of our approach in comparison to other state-of-the-art methods in the literature. Furthermore, we perform extensive simulated experiments to demonstrate the effectiveness of our approach in large-scale scenarios.
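To make the reward idea concrete, the sketch below shows a toy auto-scaling MDP whose per-step reward encodes an "always keep latency under the SLA" style requirement, a simplified stand-in for the paper's LTL-derived quantitative reward. The latency model, thresholds, costs, and tabular Q-learning agent are illustrative assumptions, not the authors' Safe-RL formulation.

```python
# Toy auto-scaler (illustrative assumptions only): reward is positive while the
# latency bound holds, strongly negative when it is violated, minus a running cost.
import random
from collections import defaultdict

MAX_CONTAINERS = 8
ACTIONS = (-1, 0, +1)          # scale in, hold, scale out
LATENCY_SLA_MS = 50.0

def observed_latency(containers, traffic):
    """Toy latency model: latency grows with traffic per container."""
    return 10.0 + 80.0 * traffic / containers

def reward(containers, traffic):
    """+1 while the latency bound holds, a penalty when it is violated,
    minus a small cost per running container."""
    ok = observed_latency(containers, traffic) <= LATENCY_SLA_MS
    return (1.0 if ok else -5.0) - 0.05 * containers

Q = defaultdict(float)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def step_and_learn(containers, traffic):
    """One epsilon-greedy Q-learning step over (container count, traffic level)."""
    state = (containers, round(traffic, 1))
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_containers = min(max(containers + action, 1), MAX_CONTAINERS)
    next_traffic = min(max(traffic + random.uniform(-0.2, 0.2), 0.1), 4.0)
    r = reward(next_containers, next_traffic)
    next_state = (next_containers, round(next_traffic, 1))
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
    return next_containers, next_traffic

containers, traffic = 1, 1.0
for _ in range(10_000):
    containers, traffic = step_and_learn(containers, traffic)
```

The key design point mirrored here is that the scaling agent is never told how many containers to run; it only receives a quantitative signal for staying inside (or falling outside) the latency requirement, from which the scaling behaviour is learned.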


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Chit Wutyee Zaw ◽  
Shashi Raj Pandey ◽  
Kitae Kim ◽  
Choong Seon Hong

2020 ◽  
Author(s):  
Ben Lonnqvist ◽  
Micha Elsner ◽  
Amelia R. Hunt ◽  
Alasdair D F Clarke

Experiments on the efficiency of human search sometimes reveal large differences between individual participants. We argue that reward-driven task-specific learning may account for some of this variation. In a computational reinforcement learning model of this process, a wide variety of strategies emerge, despite all simulated participants having the same visual acuity. We conduct a visual search experiment, and replicate previous findings that participant preferences about where to search are highly varied, with a distribution comparable to the simulated results. Thus, task-specific learning is an under-explored mechanism by which large inter-participant differences can arise.
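The mechanism the abstract describes, identical perceptual parameters but divergent learned strategies, can be illustrated with a small simulation. The two-option first-fixation task, softmax policy, and all numbers below are our own assumptions, not the paper's model.

```python
# Illustrative sketch: all simulated observers share the same "acuity" (success
# probabilities), yet reward-driven learning alone spreads their preferences apart.
import math
import random

def simulate_observer(n_trials=300, lr=0.1, temperature=0.1):
    """One simulated participant learning a preference between two places to fixate first.
    Both options succeed with the same probability, mimicking identical visual acuity."""
    p_success = {"left": 0.6, "right": 0.6}
    value = {"left": 0.0, "right": 0.0}
    for _ in range(n_trials):
        # Softmax policy over the two first-fixation strategies.
        z = sum(math.exp(v / temperature) for v in value.values())
        p_left = math.exp(value["left"] / temperature) / z
        choice = "left" if random.random() < p_left else "right"
        outcome = 1.0 if random.random() < p_success[choice] else 0.0
        value[choice] += lr * (outcome - value[choice])      # delta-rule update
    z = sum(math.exp(v / temperature) for v in value.values())
    return math.exp(value["left"] / temperature) / z         # learned preference for 'left'

# Identical acuity, different reward histories: learned preferences spread out
# across simulated participants instead of clustering at indifference (0.5).
preferences = [simulate_observer() for _ in range(100)]
print(f"min={min(preferences):.2f}  max={max(preferences):.2f}")
```

With a low softmax temperature, early chance successes on one option are amplified into a lasting, idiosyncratic preference, which is the kind of inter-participant variation the abstract attributes to task-specific learning.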

