A Smart Cache Content Update Policy Based on Deep Reinforcement Learning

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Lincan Li ◽  
Chiew Foong Kwong ◽  
Qianyu Liu ◽  
Jing Wang

This paper proposes a DRL-based cache content update policy for cache-enabled networks to improve the cache hit ratio and reduce the average latency. In contrast to existing policies, a more practical cache scenario is considered in this work, in which the content requests vary by both time and location. Given the constraint of limited cache capacity, the dynamic content update problem is modeled as a Markov decision process (MDP). A deep Q-learning network (DQN) algorithm is then utilised to solve the MDP: a neural network is optimised to approximate the Q value, with training data drawn from an experience replay memory, and the DQN agent derives the optimal policy for the cache decision. Simulation results show that, compared with existing policies, the proposed policy improves the cache hit ratio by 56%–64% and reduces the average latency by 56%–59%.
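The MDP formulation described above can be sketched in a few lines. This is a toy cache-update loop, not the paper's system: a Q-table stands in for the DQN, the state is the tuple of cached content IDs, the action is which slot to overwrite on a miss, and the reward is a hit on the next (time-varying) request. All names and numbers are illustrative assumptions, but the experience-replay pattern is the one the abstract describes.

```python
import random
from collections import defaultdict, deque

# Toy cache-update MDP: Q-table in place of the paper's DQN,
# with an experience replay memory feeding the updates.
random.seed(0)
CAPACITY, CONTENTS, STEPS = 3, list(range(6)), 2000
Q = defaultdict(float)                  # Q[(state, action)]
replay = deque(maxlen=500)              # experience replay memory
alpha, gamma, eps = 0.1, 0.9, 0.2

def popular(t):
    # Requests vary over time: the popular set shifts halfway through.
    return [0, 1, 2] if t < STEPS // 2 else [3, 4, 5]

cache, hits = tuple(CONTENTS[:CAPACITY]), 0
for t in range(STEPS):
    req = random.choice(popular(t) + CONTENTS)      # popularity-skewed request
    if req in cache:
        hits += 1
        continue
    state = cache
    a = random.randrange(CAPACITY) if random.random() < eps else \
        max(range(CAPACITY), key=lambda s: Q[(state, s)])
    slots = list(cache); slots[a] = req             # overwrite the chosen slot
    cache = tuple(slots)
    nxt = random.choice(popular(t) + CONTENTS)      # peek at the next request
    replay.append((state, a, 1.0 if nxt in cache else 0.0, cache))
    for s, ai, ri, s2 in random.sample(list(replay), min(8, len(replay))):
        best_next = max(Q[(s2, j)] for j in range(CAPACITY))
        Q[(s, ai)] += alpha * (ri + gamma * best_next - Q[(s, ai)])

hit_ratio = hits / STEPS
print(f"hit ratio: {hit_ratio:.2f}")
```

The same replay-and-update loop carries over when the table is replaced by a neural network trained on the sampled minibatches.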

2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Mohammed El Habib Souidi ◽  
Songhao Piao

Game theory is a promising approach to coalition formation in multiagent systems. This paper focuses on the importance of distributed computation and the dynamic formation and reformation of pursuit groups in pursuit-evasion problems. To address this task, we propose a decentralized coalition formation algorithm based on the Iterated Elimination of Dominated Strategies (IEDS), a game-theoretic process commonly used to solve problems that require the iterative withdrawal of dominated strategies. Furthermore, we use Markov decision process (MDP) principles to control the motion strategy of the agents in the environment. The simulation results demonstrate the feasibility and validity of the proposed approach in comparison with different decentralized methods.
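The IEDS process itself is easy to show on a small example. The sketch below runs iterated elimination of strictly dominated strategies on the prisoner's dilemma payoff matrices (chosen here for illustration; they are not the pursuit-evasion payoffs of the paper), leaving only the mutual-defection profile.

```python
# Iterated Elimination of Dominated Strategies (IEDS) on a toy 2-player game.
def ieds(payoff_a, payoff_b):
    """Iteratively delete strictly dominated strategies.
    payoff_a[r][c] is the row player's payoff, payoff_b[r][c] the column player's.
    Returns the surviving row and column indices."""
    rows = list(range(len(payoff_a)))
    cols = list(range(len(payoff_a[0])))
    changed = True
    while changed:
        changed = False
        # remove a row strictly dominated by another surviving row
        for r in rows:
            if any(all(payoff_a[o][c] > payoff_a[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r); changed = True; break
        # remove a column strictly dominated for the column player
        for c in cols:
            if any(all(payoff_b[r][o] > payoff_b[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c); changed = True; break
    return rows, cols

# Prisoner's dilemma: strategy 0 = cooperate, 1 = defect; defect dominates.
A = [[3, 0], [5, 1]]    # row player's payoffs
B = [[3, 5], [0, 1]]    # column player's payoffs
print(ieds(A, B))       # -> ([1], [1]): only (defect, defect) survives
```

In the decentralized setting each agent runs the same elimination locally, which is what makes the approach suitable for distributed coalition formation.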


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial intelligence makes it possible to create engines that can explore and learn environments and thereby derive policies to control them in real time with no human intervention. Through its reinforcement learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be perceived as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing, in order to dispatch load dynamically to a given cloud system. The authors describe different techniques that can be used to implement a reinforcement-learning-based engine in a cloud system.
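One of the named frameworks can be sketched directly. Below is a minimal tabular Q-learning dispatcher for a hypothetical two-server system: the state is the pair of (capped) queue lengths, the action is which server receives the job, and the reward penalizes the chosen server's backlog. The service rates and all other numbers are illustrative assumptions, not from the chapter.

```python
import random
from collections import defaultdict

# Tabular Q-learning for load dispatch across two hypothetical servers.
random.seed(1)
Q = defaultdict(float)
alpha, gamma, eps = 0.2, 0.9, 0.1
queues = [0, 0]                        # outstanding jobs per server
DRAIN = [0.9, 0.3]                     # per-step completion probability

def state():
    return tuple(min(q, 5) for q in queues)   # cap to keep the table small

for _ in range(5000):
    s = state()
    a = random.randrange(2) if random.random() < eps else \
        max((0, 1), key=lambda i: Q[(s, i)])
    queues[a] += 1                     # dispatch the job to server a
    r = -queues[a]                     # a shorter queue after dispatch is better
    for i in (0, 1):                   # each server completes work at its rate
        if queues[i] and random.random() < DRAIN[i]:
            queues[i] -= 1
    s2 = state()
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])

preferred = max((0, 1), key=lambda i: Q[((0, 0), i)])
print("preferred server from an empty system:", preferred)
```

Swapping the update rule for the on-policy SARSA target (using the action actually taken next instead of the max) yields the other framework mentioned above.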


2021 ◽  
Vol 11 (16) ◽  
pp. 7351
Author(s):  
Hussain Ahmad ◽  
Muhammad Zubair Islam ◽  
Rashid Ali ◽  
Amir Haider ◽  
Hyungseok Kim

The fifth-generation (5G) mobile network services are currently being made available for different use case scenarios like enhanced mobile broadband, ultra-reliable and low latency communication, and massive machine-type communication. The ever-increasing data requests from users have shifted the communication paradigm to be based on the type of the requested data content, the so-called information-centric networking (ICN). The ICN primarily aims to enhance the performance of the network infrastructure in terms of the stretch, in order to opt for the best routing path. Reducing the stretch in turn reduces the end-to-end (E2E) latency needed to meet the requirements of 5G-enabled tactile internet (TI) services. The foremost challenge tackled by an ICN-based system is to minimize the stretch while selecting an optimal routing path. Therefore, in this work, a reinforcement-learning-based intelligent stretch optimization (ISO) strategy is proposed to reduce stretch and obtain an optimal routing path in ICN-based systems for the realization of 5G-enabled TI services. A Q-learning algorithm is utilized to explore and exploit the different routing paths within the ICN infrastructure. The problem is designed as a Markov decision process and solved with the help of the Q-learning algorithm. The simulation results indicate that the proposed strategy finds the optimal routing path for the delay-sensitive haptic-driven services of 5G-enabled TI based upon their stretch profile over ICN, such as augmented reality/virtual reality applications. Moreover, we compare and evaluate the simulation results of the proposed ISO strategy against a random routing strategy and the history aware routing protocol (HARP). The proposed ISO strategy reduces delay by 33.33% and 33.69% compared to random routing and HARP, respectively. Thus, the proposed strategy suggests an optimal routing path with lesser stretch to minimize the E2E latency.
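The explore-and-exploit routing idea can be illustrated with Q-learning on a tiny made-up topology (the graph, delays, and node names below are assumptions for illustration, not the paper's ICN testbed): the state is the current node, the actions are its neighbours, and the reward is the negative per-hop delay, so the greedy policy converges to the minimum-delay path.

```python
import random

# Toy Q-learning router: learn the minimum-delay path to GOAL.
random.seed(0)
delay = {                              # directed links with per-hop delay
    'A': {'B': 2, 'C': 1},
    'B': {'D': 4},
    'C': {'B': 1, 'D': 2},
    'D': {},
}
GOAL = 'D'
Q = {(u, v): 0.0 for u in delay for v in delay[u]}
alpha, gamma, eps = 0.5, 1.0, 0.2

for _ in range(2000):
    u = random.choice(['A', 'B', 'C'])           # random start node
    while u != GOAL:
        nbrs = list(delay[u])
        v = random.choice(nbrs) if random.random() < eps else \
            max(nbrs, key=lambda n: Q[(u, n)])   # explore vs. exploit
        future = max((Q[(v, w)] for w in delay[v]), default=0.0)
        Q[(u, v)] += alpha * (-delay[u][v] + gamma * future - Q[(u, v)])
        u = v

path, u = ['A'], 'A'                   # follow the greedy policy from A
while u != GOAL:
    u = max(delay[u], key=lambda n: Q[(u, n)])
    path.append(u)
print("best path:", path)              # A -> C -> D (total delay 3)
```

With gamma = 1 the learned Q values approach the negative remaining delay to the destination, which is exactly the per-path stretch quantity the strategy minimizes.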


2020 ◽  
Vol 34 (04) ◽  
pp. 3980-3987
Author(s):  
Maor Gaon ◽  
Ronen Brafman

The standard RL world model is that of a Markov decision process (MDP). A basic premise of MDPs is that the rewards depend only on the last state and action. Yet many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if it was requested earlier and not yet served is non-Markovian if the state records only current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and empirically evaluate four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms, obtaining new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit.
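The coffee example can be made concrete with a two-state automaton. The sketch below (event names and transitions are illustrative) shows that the same last event, a delivery, earns different rewards depending on the history, and that tracking a small automaton mode alongside the state, which is the role the learned automata play in these algorithms, makes the reward Markovian again.

```python
# Non-Markovian coffee reward made Markovian via an automaton mode.
AUTOMATON = {                # (mode, event) -> next mode
    ('idle', 'request'): 'wanted',
    ('wanted', 'deliver'): 'served',
}

def step(mode, event):
    """Advance the automaton; reward only a delivery that follows a request."""
    reward = 1 if (mode, event) == ('wanted', 'deliver') else 0
    return AUTOMATON.get((mode, event), mode), reward

# Same last event ('deliver'), different histories, different rewards:
print(step('idle', 'deliver'))         # -> ('idle', 0): never requested
mode, _ = step('idle', 'request')
print(step(mode, 'deliver'))           # -> ('served', 1): requested earlier
```

Running Q-learning or R-max on the product of the environment state and this mode is what restores the Markov property that the convergence proofs rely on.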


2021 ◽  
Vol 10 (2) ◽  
pp. 110
Author(s):  
Ruy Lopez-Rios

The paper deals with a discrete-time consumption-investment problem with an infinite horizon. The problem is formulated as a Markov decision process with expected total discounted utility as the objective function. This paper presents a procedure to approximate the solution via machine learning, specifically a Q-learning technique. Numerical results for the problem are provided.
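The approximation idea can be sketched on a toy version of such a problem. The model below is an illustrative stand-in, not the paper's: wealth is discretized, the action is the fraction of wealth consumed, the remainder grows at a fixed gross return, and Q-learning maximizes expected discounted log utility.

```python
import math
import random

# Q-learning on a toy discrete-time consumption-investment MDP.
random.seed(0)
WEALTH = range(1, 11)                  # discretized wealth levels
FRACS = [0.2, 0.5, 0.8]                # candidate consumption fractions
R, beta = 1.1, 0.95                    # gross return and discount factor
Q = {(w, f): 0.0 for w in WEALTH for f in FRACS}
alpha, eps = 0.1, 0.2

for _ in range(20000):
    w = random.choice(list(WEALTH))
    f = random.choice(FRACS) if random.random() < eps else \
        max(FRACS, key=lambda g: Q[(w, g)])
    consume = f * w
    w2 = max(1, min(10, round(R * (w - consume))))   # invest the remainder
    target = math.log(consume) + beta * max(Q[(w2, g)] for g in FRACS)
    Q[(w, f)] += alpha * (target - Q[(w, f)])        # discounted-utility backup

policy = {w: max(FRACS, key=lambda g: Q[(w, g)]) for w in WEALTH}
print(policy)
```

The greedy policy read off the converged table approximates the optimal consumption rule without ever solving the Bellman equation analytically, which is the point of the machine-learning procedure.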


2021 ◽  
Vol 10 (2) ◽  
pp. 109
Author(s):  
Ruy Lopez-Rios

The paper deals with a discrete-time consumption-investment problem with an infinite horizon. The problem is formulated as a Markov decision process with expected total discounted utility as the objective function. This paper presents a procedure to approximate the solution via machine learning, specifically a Q-learning technique. Numerical results for the problem are provided.


1970 ◽  
Vol 108 (2) ◽  
pp. 39-42
Author(s):  
Z. Velickovic ◽  
M. Jevtovic

To satisfy the QoS demands of wireless multimedia applications, it is necessary to optimize across several ISO-OSI layers of the protocol stack. In this paper, a cross-layer optimization algorithm based on a Markov decision process (MDP) is applied. A single-user wireless communication system is optimized through its transmission policies to maximize throughput while also optimizing the average engaged power and satisfying the required BER and the average number of rejected packets. Simulation results show that the application of cross-layer design based on MDP is justified. Ill. 2, bibl. 9 (in English; abstracts in English and Lithuanian). http://dx.doi.org/10.5755/j01.eee.108.2.141
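The throughput-versus-power trade-off at the heart of such an MDP can be sketched with value iteration on a two-state channel. All states, actions, probabilities, and rewards below are illustrative assumptions, not the paper's model.

```python
# Value iteration for a toy MDP transmission policy:
# channel states, power-level actions, reward = throughput - power cost.
STATES = ['bad', 'good']
ACTIONS = ['low', 'high']                    # transmit power levels
P = {  # P[s][a] = {next_state: prob}; the channel evolves on its own here
    'bad':  {'low': {'bad': 0.7, 'good': 0.3}, 'high': {'bad': 0.7, 'good': 0.3}},
    'good': {'low': {'bad': 0.2, 'good': 0.8}, 'high': {'bad': 0.2, 'good': 0.8}},
}

def reward(s, a):
    throughput = {'bad': {'low': 0.1, 'high': 0.6},
                  'good': {'low': 0.8, 'high': 1.0}}[s][a]
    power_cost = {'low': 0.1, 'high': 0.5}[a]
    return throughput - power_cost

gamma, V = 0.9, {s: 0.0 for s in STATES}
for _ in range(200):                         # iterate the Bellman backup
    V = {s: max(reward(s, a) + gamma * sum(p * V[t] for t, p in P[s][a].items())
                for a in ACTIONS) for s in STATES}

policy = {s: max(ACTIONS, key=lambda a: reward(s, a) +
                 gamma * sum(p * V[t] for t, p in P[s][a].items()))
          for s in STATES}
print(policy)    # spend power on a bad channel, save it on a good one
```

With these numbers the optimal policy boosts power only when the channel is bad, the same kind of power/throughput balancing the cross-layer design performs across the stack.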


2018 ◽  
Vol 10 (8) ◽  
pp. 75
Author(s):  
Jianqiang Liu ◽  
Shuai Huo ◽  
Yi Wang

Overloading of IP address semantics calls for a new network architecture based on Identifier (ID)/Locator separation. The challenge of Identifier (ID)/Locator separation is how to meet the scalability and efficiency requirements of identity-to-location resolution. By analyzing the requirements of the Identifier (ID)/Locator separation protocol, this paper proposes a hierarchical mapping architecture based on active-degree (HMAA). HMAA is divided into three levels: an active local level, a neutral transfer level, and an inert global level. Each mapping item is dynamically allocated to a level according to its activity characteristics, so as to minimize delay. The top-layer Chord is constructed via a Markov decision process, which keeps the physical topology and the logical topology consistent. Simulation results on delay time show that HMAA can satisfy the scalability and efficiency requirements of an Identifier (ID)/Locator separation network.
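For readers unfamiliar with the top-layer substrate, a Chord-style ring maps each identifier to the first node at or after its hash position. The sketch below shows only that generic lookup rule; the node names, key names, and ring size are illustrative, and the MDP-driven construction of the paper is not modeled.

```python
import hashlib

# Minimal Chord-style ring lookup for an identity-to-location mapping.
M = 8                                   # identifier bits -> ring of 2**M slots

def chord_id(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

nodes = sorted(chord_id(n) for n in ['node-a', 'node-b', 'node-c', 'node-d'])

def successor(key_id):
    """The responsible node is the first one at or after the key on the ring."""
    for n in nodes:
        if n >= key_id:
            return n
    return nodes[0]                     # wrap around the ring

mapping = {k: successor(chord_id(k)) for k in ['id-1', 'id-2', 'id-3']}
print(mapping)
```

Because responsibility is determined purely by hash position, lookups stay balanced as nodes join and leave, which is what makes Chord a natural fit for the inert global level.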


2021 ◽  
Vol 11 (19) ◽  
pp. 8823
Author(s):  
Shicheng Zhou ◽  
Jingju Liu ◽  
Dongdong Hou ◽  
Xiaofeng Zhong ◽  
Yue Zhang

Penetration testing is an effective way to test and evaluate cybersecurity by simulating a cyberattack. However, traditional methods rely deeply on domain expert knowledge, which incurs prohibitive labor and time costs. Autonomous penetration testing is a more efficient and intelligent way to solve this problem. In this paper, we model penetration testing as a Markov decision process problem and use reinforcement learning for autonomous penetration testing in large-scale networks. We propose an improved deep Q-network (DQN) named NDSPI-DQN to address the sparse reward and large action space problems of large-scale scenarios. First, we integrate five extensions to DQN, namely noisy nets, soft Q-learning, dueling architectures, prioritized experience replay, and an intrinsic curiosity model, to improve exploration efficiency. Second, we decouple the action and split the estimators of the neural network to calculate the two elements of an action separately, so as to shrink the action space. Finally, the performance of the algorithms is investigated in a range of scenarios. The experimental results demonstrate that our methods have better convergence and scaling performance.
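The action-decoupling step can be illustrated without any neural network. In the sketch below, a composite penetration-testing action (target host, exploit) is scored by two separate estimators whose outputs are combined, so the policy maintains m + n outputs instead of m × n joint actions; the host names, exploit names, and scores are illustrative stand-ins for the split network heads.

```python
# Decoupled action scoring: two small heads instead of one joint table.
hosts = ['h1', 'h2', 'h3']
exploits = ['e1', 'e2', 'e3', 'e4']

host_score = {'h1': 0.2, 'h2': 0.9, 'h3': 0.5}                 # head 1: target
exploit_score = {'e1': 0.1, 'e2': 0.7, 'e3': 0.4, 'e4': 0.3}   # head 2: exploit

# The joint action is recovered by combining the two heads (additively here):
best = max(((h, e) for h in hosts for e in exploits),
           key=lambda he: host_score[he[0]] + exploit_score[he[1]])
print(best)                                        # -> ('h2', 'e2')
print(len(hosts) + len(exploits), "outputs instead of",
      len(hosts) * len(exploits))
```

The saving grows with scale: for hundreds of hosts and exploits, m + n outputs are far cheaper to learn than the m × n joint action space.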

