A Smart Cache Content Update Policy Based on Deep Reinforcement Learning

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Lincan Li ◽  
Chiew Foong Kwong ◽  
Qianyu Liu ◽  
Jing Wang

This paper proposes a DRL-based cache content update policy for cache-enabled networks to improve the cache hit ratio and reduce the average latency. In contrast to existing policies, a more practical cache scenario is considered in this work, in which the content requests vary by both time and location. Given the constraint of limited cache capacity, the dynamic content update problem is modeled as a Markov decision process (MDP). A deep Q-learning network (DQN) algorithm is then utilised to solve the MDP: a neural network is optimised to approximate the Q value, with training data drawn from an experience replay memory, and the DQN agent derives the optimal policy for the cache decision. Simulation results show that, compared with existing policies, the proposed policy improves the cache hit ratio by 56%–64% and reduces the average latency by 56%–59%.
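The MDP formulation described above can be sketched in a few lines. This is a toy cache-update loop, not the paper's system: a Q-table stands in for the DQN, the state is the tuple of cached content IDs, the action is which slot to overwrite on a miss, and the reward is a hit on the next (time-varying) request. All names and numbers are illustrative assumptions, but the experience-replay pattern is the one the abstract describes.

```python
import random
from collections import defaultdict, deque

# Toy cache-update MDP: Q-table in place of the paper's DQN,
# with an experience replay memory feeding the updates.
random.seed(0)
CAPACITY, CONTENTS, STEPS = 3, list(range(6)), 2000
Q = defaultdict(float)                  # Q[(state, action)]
replay = deque(maxlen=500)              # experience replay memory
alpha, gamma, eps = 0.1, 0.9, 0.2

def popular(t):
    # Requests vary over time: the popular set shifts halfway through.
    return [0, 1, 2] if t < STEPS // 2 else [3, 4, 5]

cache, hits = tuple(CONTENTS[:CAPACITY]), 0
for t in range(STEPS):
    req = random.choice(popular(t) + CONTENTS)      # popularity-skewed request
    if req in cache:
        hits += 1
        continue
    state = cache
    a = random.randrange(CAPACITY) if random.random() < eps else \
        max(range(CAPACITY), key=lambda s: Q[(state, s)])
    slots = list(cache); slots[a] = req             # overwrite the chosen slot
    cache = tuple(slots)
    nxt = random.choice(popular(t) + CONTENTS)      # peek at the next request
    replay.append((state, a, 1.0 if nxt in cache else 0.0, cache))
    for s, ai, ri, s2 in random.sample(list(replay), min(8, len(replay))):
        best_next = max(Q[(s2, j)] for j in range(CAPACITY))
        Q[(s, ai)] += alpha * (ri + gamma * best_next - Q[(s, ai)])

hit_ratio = hits / STEPS
print(f"hit ratio: {hit_ratio:.2f}")
```

The same replay-and-update loop carries over when the table is replaced by a neural network trained on the sampled minibatches.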

2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Mohammed El Habib Souidi ◽  
Songhao Piao

Game theory is a promising approach to coalition formation in multiagent systems. This paper focuses on the importance of distributed computation and the dynamic formation and reformation of pursuit groups in pursuit-evasion problems. To address this task, we propose a decentralized coalition formation algorithm based on the Iterated Elimination of Dominated Strategies (IEDS), a game-theoretic process commonly used to solve problems that require the iterative withdrawal of dominated strategies. Furthermore, we use Markov decision process (MDP) principles to control the motion strategy of the agents in the environment. The simulation results demonstrate the feasibility and validity of the proposed approach in comparison with different decentralized methods.
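The IEDS process itself is easy to show on a small example. The sketch below runs iterated elimination of strictly dominated strategies on the prisoner's dilemma payoff matrices (chosen here for illustration; they are not the pursuit-evasion payoffs of the paper), leaving only the mutual-defection profile.

```python
# Iterated Elimination of Dominated Strategies (IEDS) on a toy 2-player game.
def ieds(payoff_a, payoff_b):
    """Iteratively delete strictly dominated strategies.
    payoff_a[r][c] is the row player's payoff, payoff_b[r][c] the column player's.
    Returns the surviving row and column indices."""
    rows = list(range(len(payoff_a)))
    cols = list(range(len(payoff_a[0])))
    changed = True
    while changed:
        changed = False
        # remove a row strictly dominated by another surviving row
        for r in rows:
            if any(all(payoff_a[o][c] > payoff_a[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r); changed = True; break
        # remove a column strictly dominated for the column player
        for c in cols:
            if any(all(payoff_b[r][o] > payoff_b[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c); changed = True; break
    return rows, cols

# Prisoner's dilemma: strategy 0 = cooperate, 1 = defect; defect dominates.
A = [[3, 0], [5, 1]]    # row player's payoffs
B = [[3, 5], [0, 1]]    # column player's payoffs
print(ieds(A, B))       # -> ([1], [1]): only (defect, defect) survives
```

In the decentralized setting each agent runs the same elimination locally, which is what makes the approach suitable for distributed coalition formation.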


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial intelligence makes it possible to create engines that can explore and learn environments and thereby derive policies to control them in real time with no human intervention. Through its reinforcement learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be perceived as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing, in order to dispatch load dynamically to a given cloud system. The authors describe different techniques that can be used to implement a reinforcement-learning-based engine in a cloud system.
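One of the named frameworks can be sketched directly. Below is a minimal tabular Q-learning dispatcher for a hypothetical two-server system: the state is the pair of (capped) queue lengths, the action is which server receives the job, and the reward penalizes the chosen server's backlog. The service rates and all other numbers are illustrative assumptions, not from the chapter.

```python
import random
from collections import defaultdict

# Tabular Q-learning for load dispatch across two hypothetical servers.
random.seed(1)
Q = defaultdict(float)
alpha, gamma, eps = 0.2, 0.9, 0.1
queues = [0, 0]                        # outstanding jobs per server
DRAIN = [0.9, 0.3]                     # per-step completion probability

def state():
    return tuple(min(q, 5) for q in queues)   # cap to keep the table small

for _ in range(5000):
    s = state()
    a = random.randrange(2) if random.random() < eps else \
        max((0, 1), key=lambda i: Q[(s, i)])
    queues[a] += 1                     # dispatch the job to server a
    r = -queues[a]                     # a shorter queue after dispatch is better
    for i in (0, 1):                   # each server completes work at its rate
        if queues[i] and random.random() < DRAIN[i]:
            queues[i] -= 1
    s2 = state()
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])

preferred = max((0, 1), key=lambda i: Q[((0, 0), i)])
print("preferred server from an empty system:", preferred)
```

Swapping the update rule for the on-policy SARSA target (using the action actually taken next instead of the max) yields the other framework mentioned above.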


2021 ◽  
Vol 11 (16) ◽  
pp. 7351
Author(s):  
Hussain Ahmad ◽  
Muhammad Zubair Islam ◽  
Rashid Ali ◽  
Amir Haider ◽  
Hyungseok Kim

The fifth-generation (5G) mobile network services are currently being made available for different use case scenarios like enhanced mobile broadband, ultra-reliable and low latency communication, and massive machine-type communication. The ever-increasing data requests from users have shifted the communication paradigm to be based on the type of the requested data content, the so-called information-centric networking (ICN). The ICN primarily aims to enhance the performance of the network infrastructure in terms of the stretch, in order to opt for the best routing path. Reducing the stretch in turn reduces the end-to-end (E2E) latency needed to meet the requirements of 5G-enabled tactile internet (TI) services. The foremost challenge tackled by an ICN-based system is to minimize the stretch while selecting an optimal routing path. Therefore, in this work, a reinforcement-learning-based intelligent stretch optimization (ISO) strategy is proposed to reduce stretch and obtain an optimal routing path in ICN-based systems for the realization of 5G-enabled TI services. A Q-learning algorithm is utilized to explore and exploit the different routing paths within the ICN infrastructure. The problem is designed as a Markov decision process and solved with the help of the Q-learning algorithm. The simulation results indicate that the proposed strategy finds the optimal routing path for the delay-sensitive haptic-driven services of 5G-enabled TI based upon their stretch profile over ICN, such as augmented reality/virtual reality applications. Moreover, we compare and evaluate the simulation results of the proposed ISO strategy against a random routing strategy and the history aware routing protocol (HARP). The proposed ISO strategy reduces delay by 33.33% and 33.69% compared to random routing and HARP, respectively. Thus, the proposed strategy suggests an optimal routing path with lesser stretch to minimize the E2E latency.
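The explore-and-exploit routing idea can be illustrated with Q-learning on a tiny made-up topology (the graph, delays, and node names below are assumptions for illustration, not the paper's ICN testbed): the state is the current node, the actions are its neighbours, and the reward is the negative per-hop delay, so the greedy policy converges to the minimum-delay path.

```python
import random

# Toy Q-learning router: learn the minimum-delay path to GOAL.
random.seed(0)
delay = {                              # directed links with per-hop delay
    'A': {'B': 2, 'C': 1},
    'B': {'D': 4},
    'C': {'B': 1, 'D': 2},
    'D': {},
}
GOAL = 'D'
Q = {(u, v): 0.0 for u in delay for v in delay[u]}
alpha, gamma, eps = 0.5, 1.0, 0.2

for _ in range(2000):
    u = random.choice(['A', 'B', 'C'])           # random start node
    while u != GOAL:
        nbrs = list(delay[u])
        v = random.choice(nbrs) if random.random() < eps else \
            max(nbrs, key=lambda n: Q[(u, n)])   # explore vs. exploit
        future = max((Q[(v, w)] for w in delay[v]), default=0.0)
        Q[(u, v)] += alpha * (-delay[u][v] + gamma * future - Q[(u, v)])
        u = v

path, u = ['A'], 'A'                   # follow the greedy policy from A
while u != GOAL:
    u = max(delay[u], key=lambda n: Q[(u, n)])
    path.append(u)
print("best path:", path)              # A -> C -> D (total delay 3)
```

With gamma = 1 the learned Q values approach the negative remaining delay to the destination, which is exactly the per-path stretch quantity the strategy minimizes.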


2020 ◽  
Vol 34 (04) ◽  
pp. 3980-3987
Author(s):  
Maor Gaon ◽  
Ronen Brafman

The standard RL world model is that of a Markov decision process (MDP). A basic premise of MDPs is that the rewards depend only on the last state and action. Yet many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if it was requested earlier and not yet served is non-Markovian if the state records only current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and empirically evaluate four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms, obtaining new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit.
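The coffee example can be made concrete with a two-state automaton. The sketch below (event names and transitions are illustrative) shows that the same last event, a delivery, earns different rewards depending on the history, and that tracking a small automaton mode alongside the state, which is the role the learned automata play in these algorithms, makes the reward Markovian again.

```python
# Non-Markovian coffee reward made Markovian via an automaton mode.
AUTOMATON = {                # (mode, event) -> next mode
    ('idle', 'request'): 'wanted',
    ('wanted', 'deliver'): 'served',
}

def step(mode, event):
    """Advance the automaton; reward only a delivery that follows a request."""
    reward = 1 if (mode, event) == ('wanted', 'deliver') else 0
    return AUTOMATON.get((mode, event), mode), reward

# Same last event ('deliver'), different histories, different rewards:
print(step('idle', 'deliver'))         # -> ('idle', 0): never requested
mode, _ = step('idle', 'request')
print(step(mode, 'deliver'))           # -> ('served', 1): requested earlier
```

Running Q-learning or R-max on the product of the environment state and this mode is what restores the Markov property that the convergence proofs rely on.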


2021 ◽  
Vol 10 (2) ◽  
pp. 110
Author(s):  
Ruy Lopez-Rios

The paper deals with a discrete-time consumption-investment problem with an infinite horizon. The problem is formulated as a Markov decision process with expected total discounted utility as the objective function. This paper presents a procedure to approximate the solution via machine learning, specifically a Q-learning technique. Numerical results for the problem are provided.
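The approximation idea can be sketched on a toy version of such a problem. The model below is an illustrative stand-in, not the paper's: wealth is discretized, the action is the fraction of wealth consumed, the remainder grows at a fixed gross return, and Q-learning maximizes expected discounted log utility.

```python
import math
import random

# Q-learning on a toy discrete-time consumption-investment MDP.
random.seed(0)
WEALTH = range(1, 11)                  # discretized wealth levels
FRACS = [0.2, 0.5, 0.8]                # candidate consumption fractions
R, beta = 1.1, 0.95                    # gross return and discount factor
Q = {(w, f): 0.0 for w in WEALTH for f in FRACS}
alpha, eps = 0.1, 0.2

for _ in range(20000):
    w = random.choice(list(WEALTH))
    f = random.choice(FRACS) if random.random() < eps else \
        max(FRACS, key=lambda g: Q[(w, g)])
    consume = f * w
    w2 = max(1, min(10, round(R * (w - consume))))   # invest the remainder
    target = math.log(consume) + beta * max(Q[(w2, g)] for g in FRACS)
    Q[(w, f)] += alpha * (target - Q[(w, f)])        # discounted-utility backup

policy = {w: max(FRACS, key=lambda g: Q[(w, g)]) for w in WEALTH}
print(policy)
```

The greedy policy read off the converged table approximates the optimal consumption rule without ever solving the Bellman equation analytically, which is the point of the machine-learning procedure.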


2021 ◽  
Vol 10 (2) ◽  
pp. 109
Author(s):  
Ruy Lopez-Rios

The paper deals with a discrete-time consumption-investment problem with an infinite horizon. The problem is formulated as a Markov decision process with expected total discounted utility as the objective function. This paper presents a procedure to approximate the solution via machine learning, specifically a Q-learning technique. Numerical results for the problem are provided.


1970 ◽  
Vol 108 (2) ◽  
pp. 39-42
Author(s):  
Z. Velickovic ◽  
M. Jevtovic

To satisfy the QoS demands of wireless multimedia applications, it is necessary to optimize across several ISO-OSI layers of the protocol stack. In this paper, a cross-layer optimization algorithm based on a Markov decision process (MDP) is applied. A single-user wireless communication system is optimized through its transmission policies to maximize throughput while also optimizing the average engaged power and satisfying the required BER and the average number of rejected packets. Simulation results show that the application of cross-layer design based on MDP is justified. Ill. 2, bibl. 9 (in English; abstracts in English and Lithuanian). http://dx.doi.org/10.5755/j01.eee.108.2.141
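The throughput-versus-power trade-off at the heart of such an MDP can be sketched with value iteration on a two-state channel. All states, actions, probabilities, and rewards below are illustrative assumptions, not the paper's model.

```python
# Value iteration for a toy MDP transmission policy:
# channel states, power-level actions, reward = throughput - power cost.
STATES = ['bad', 'good']
ACTIONS = ['low', 'high']                    # transmit power levels
P = {  # P[s][a] = {next_state: prob}; the channel evolves on its own here
    'bad':  {'low': {'bad': 0.7, 'good': 0.3}, 'high': {'bad': 0.7, 'good': 0.3}},
    'good': {'low': {'bad': 0.2, 'good': 0.8}, 'high': {'bad': 0.2, 'good': 0.8}},
}

def reward(s, a):
    throughput = {'bad': {'low': 0.1, 'high': 0.6},
                  'good': {'low': 0.8, 'high': 1.0}}[s][a]
    power_cost = {'low': 0.1, 'high': 0.5}[a]
    return throughput - power_cost

gamma, V = 0.9, {s: 0.0 for s in STATES}
for _ in range(200):                         # iterate the Bellman backup
    V = {s: max(reward(s, a) + gamma * sum(p * V[t] for t, p in P[s][a].items())
                for a in ACTIONS) for s in STATES}

policy = {s: max(ACTIONS, key=lambda a: reward(s, a) +
                 gamma * sum(p * V[t] for t, p in P[s][a].items()))
          for s in STATES}
print(policy)    # spend power on a bad channel, save it on a good one
```

With these numbers the optimal policy boosts power only when the channel is bad, the same kind of power/throughput balancing the cross-layer design performs across the stack.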


2018 ◽  
Vol 10 (8) ◽  
pp. 75
Author(s):  
Jianqiang Liu ◽  
Shuai Huo ◽  
Yi Wang

Overloading of IP address semantics calls for a new network architecture based on Identifier (ID)/Locator separation. The challenge of Identifier (ID)/Locator separation is how to meet the scalability and efficiency requirements of identity-to-location resolution. By analyzing the requirements of the Identifier (ID)/Locator separation protocol, this paper proposes a hierarchical mapping architecture based on active-degree (HMAA). HMAA is divided into three levels: an active local level, a neutral transfer level, and an inert global level. Each mapping item is dynamically allocated to a level according to its activity characteristics, so as to minimize delay. The top-layer Chord is constructed via a Markov decision process, which keeps the physical topology and the logical topology consistent. Simulation results on delay time show that HMAA can satisfy the scalability and efficiency requirements of an Identifier (ID)/Locator separation network.
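For readers unfamiliar with the top-layer substrate, a Chord-style ring maps each identifier to the first node at or after its hash position. The sketch below shows only that generic lookup rule; the node names, key names, and ring size are illustrative, and the MDP-driven construction of the paper is not modeled.

```python
import hashlib

# Minimal Chord-style ring lookup for an identity-to-location mapping.
M = 8                                   # identifier bits -> ring of 2**M slots

def chord_id(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

nodes = sorted(chord_id(n) for n in ['node-a', 'node-b', 'node-c', 'node-d'])

def successor(key_id):
    """The responsible node is the first one at or after the key on the ring."""
    for n in nodes:
        if n >= key_id:
            return n
    return nodes[0]                     # wrap around the ring

mapping = {k: successor(chord_id(k)) for k in ['id-1', 'id-2', 'id-3']}
print(mapping)
```

Because responsibility is determined purely by hash position, lookups stay balanced as nodes join and leave, which is what makes Chord a natural fit for the inert global level.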


2021 ◽  
Vol 11 (19) ◽  
pp. 8823
Author(s):  
Shicheng Zhou ◽  
Jingju Liu ◽  
Dongdong Hou ◽  
Xiaofeng Zhong ◽  
Yue Zhang

Penetration testing is an effective way to test and evaluate cybersecurity by simulating a cyberattack. However, traditional methods rely deeply on domain expert knowledge, which incurs prohibitive labor and time costs. Autonomous penetration testing is a more efficient and intelligent way to solve this problem. In this paper, we model penetration testing as a Markov decision process problem and use reinforcement learning for autonomous penetration testing in large-scale networks. We propose an improved deep Q-network (DQN) named NDSPI-DQN to address the sparse reward and large action space problems of large-scale scenarios. First, we integrate five extensions to DQN, namely noisy nets, soft Q-learning, dueling architectures, prioritized experience replay, and an intrinsic curiosity model, to improve exploration efficiency. Second, we decouple the action and split the estimators of the neural network to calculate the two elements of an action separately, so as to shrink the action space. Finally, the performance of the algorithms is investigated in a range of scenarios. The experimental results demonstrate that our methods have better convergence and scaling performance.
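The action-decoupling step can be illustrated without any neural network. In the sketch below, a composite penetration-testing action (target host, exploit) is scored by two separate estimators whose outputs are combined, so the policy maintains m + n outputs instead of m × n joint actions; the host names, exploit names, and scores are illustrative stand-ins for the split network heads.

```python
# Decoupled action scoring: two small heads instead of one joint table.
hosts = ['h1', 'h2', 'h3']
exploits = ['e1', 'e2', 'e3', 'e4']

host_score = {'h1': 0.2, 'h2': 0.9, 'h3': 0.5}                 # head 1: target
exploit_score = {'e1': 0.1, 'e2': 0.7, 'e3': 0.4, 'e4': 0.3}   # head 2: exploit

# The joint action is recovered by combining the two heads (additively here):
best = max(((h, e) for h in hosts for e in exploits),
           key=lambda he: host_score[he[0]] + exploit_score[he[1]])
print(best)                                        # -> ('h2', 'e2')
print(len(hosts) + len(exploits), "outputs instead of",
      len(hosts) * len(exploits))
```

The saving grows with scale: for hundreds of hosts and exploits, m + n outputs are far cheaper to learn than the m × n joint action space.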

