Deep Reinforcement Learning for Task Offloading and Power Allocation in UAV-assisted MEC System

Mobile edge computing (MEC) can provide computing services for mobile users (MUs) by offloading computing tasks to edge clouds through wireless access networks. Unmanned aerial vehicles (UAVs) are deployed as supplementary edge clouds to provide effective MEC services for MUs with poor wireless communication conditions. In this paper, a joint task offloading and power allocation (TOPA) optimization problem is investigated in a UAV-assisted MEC system. Since the joint TOPA problem is strongly non-convex, a method based on deep reinforcement learning is proposed. Specifically, the joint TOPA problem is modeled as a Markov decision process. Then, considering the large state space and continuous action space, a twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. Simulation results show that the proposed scheme achieves a lower smoothed training cost than other optimization methods.
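The abstract does not reproduce the authors' implementation, but the core TD3 update it refers to can be sketched as follows. This is a minimal illustration assuming a PyTorch setup in which the state encodes channel and task information and the continuous action encodes offloading ratios and transmit powers; all dimensions, network shapes, and hyperparameters below are placeholder assumptions, and the target networks and delayed actor update of the full algorithm are omitted for brevity.

```python
# Minimal sketch of the clipped double-Q target at the heart of TD3; an
# illustration of the algorithm family, not the authors' implementation.
import torch
import torch.nn as nn

state_dim, action_dim, max_action = 8, 4, 1.0   # assumed problem dimensions

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())

def make_critic() -> nn.Module:
    return nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                         nn.Linear(64, 1))

critic1, critic2 = make_critic(), make_critic()  # twin critics

def td3_target(next_state, reward, done, gamma=0.99,
               noise_std=0.2, noise_clip=0.5):
    """Clipped double-Q target with target policy smoothing."""
    with torch.no_grad():
        a = actor(next_state) * max_action
        noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (a + noise).clamp(-max_action, max_action)
        sa = torch.cat([next_state, next_action], dim=-1)
        q = torch.min(critic1(sa), critic2(sa))  # keep the smaller estimate
        return reward + gamma * (1.0 - done) * q
```

Taking the minimum over the twin critics is what counteracts the Q-value overestimation that plagues single-critic actor-critic methods in continuous action spaces such as power allocation.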

2020
Vol 2020
pp. 1-6
Author(s):
Bingxin Zhang
Guopeng Zhang
Weice Sun
Kun Yang

This paper proposes an efficient computation task offloading mechanism for mobile edge computing (MEC) systems. The studied MEC system consists of multiple user equipments (UEs) and multiple radio interfaces. To maximize the number of UEs benefiting from the MEC, the task offloading and power control strategy for a UE is optimized jointly. However, finding the optimal solution is NP-hard. We therefore reformulate the problem as a Markov decision process (MDP) and develop a reinforcement learning (RL) based algorithm to solve the MDP. Simulation results show that the proposed RL-based algorithm achieves near-optimal performance compared to the exhaustive search algorithm, and it also outperforms the received signal strength (RSS) based method both from the standpoint of the system (it supports a larger number of beneficial UEs) and of an individual UE (it incurs a lower computation overhead).
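As a concrete illustration of solving such an MDP with RL, the sketch below implements tabular Q-learning with an epsilon-greedy policy. The state discretization and the idea of encoding an action as an (interface, power level) pair are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged tabular Q-learning sketch for a discrete offloading/power-control MDP.
import numpy as np

n_states, n_actions = 16, 6          # assumed discretization of the MDP
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def epsilon_greedy(s: int, eps: float = 0.1) -> int:
    """Explore with probability eps, otherwise exploit the current Q-table."""
    return int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())

def q_update(s: int, a: int, r: float, s_next: int,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard one-step Q-learning backup toward the bootstrapped target."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```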


Author(s):  
Shuangxia Bai
Shaomei Song
Shiyang Liang
Jianmei Wang
Bo Li
...  

Aiming at intelligent decision-making of UAVs based on situation information in air combat, a novel maneuvering decision method based on deep reinforcement learning is proposed in this paper. The autonomous maneuvering model of the UAV is established as a Markov decision process. The twin delayed deep deterministic policy gradient (TD3) algorithm and the deep deterministic policy gradient (DDPG) algorithm in deep reinforcement learning are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation results show that, compared with the DDPG algorithm, the TD3 algorithm has stronger decision-making performance and faster convergence, and is more suitable for solving air combat problems. The proposed algorithm enables UAVs to autonomously make maneuvering decisions based on situation information such as position, speed, and relative azimuth, adjusting their actions to approach and successfully strike the enemy, and it provides a new method for intelligent UAV maneuvering decisions in air combat.
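Two of the TD3 refinements generally credited with its stability advantage over DDPG, delayed actor updates and Polyak-averaged target networks, can be sketched as below; the variable names and the delay of two critic steps are conventional defaults, not details taken from this paper.

```python
# Sketch of two TD3 stabilizers absent from vanilla DDPG: Polyak target
# updates and delayed actor updates. Conventional defaults, assumed here.
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005) -> None:
    """Polyak-average source parameters into the target network."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)

POLICY_DELAY = 2  # refresh the actor (and targets) every 2 critic updates

def should_update_actor(critic_step: int) -> bool:
    return critic_step % POLICY_DELAY == 0
```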


2021
Author(s):
Stav Belogolovsky
Philip Korsunsky
Shie Mannor
Chen Tessler
Tom Zahavy

We consider the task of inverse reinforcement learning in contextual Markov decision processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.
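The general template behind such a scheme is the projected subgradient method, sketched below on a toy objective. The actual subgradient oracle for the contextual-IRL objective is the paper's contribution and is not reproduced here; the diminishing step size and the unit-ball example are standard illustrative choices.

```python
# Generic projected subgradient method for a convex, possibly
# non-differentiable objective; a sketch, not the paper's algorithm.
import numpy as np

def projected_subgradient(subgrad, project, w0, steps=100):
    """subgrad(w) returns any subgradient at w; project(w) enforces constraints."""
    w = w0.copy()
    for t in range(1, steps + 1):
        w = project(w - (1.0 / np.sqrt(t)) * subgrad(w))  # diminishing step size
    return w

# Toy usage: minimize ||w||_1 over the unit ball (the sign vector is a
# valid subgradient of the L1 norm).
w_star = projected_subgradient(
    subgrad=np.sign,
    project=lambda w: w / max(1.0, np.linalg.norm(w)),
    w0=np.array([0.8, -0.5]),
)
```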


Author(s):  
Jun Long
Yueyi Luo
Xiaoyu Zhu
Entao Luo
Mingfeng Huang

With the development of the Internet of Things (IoT) and mobile edge computing (MEC), more and more sensing devices are widely deployed in smart cities. These sensing devices generate various kinds of tasks, which need to be sent to the cloud for processing. Usually, the sensing devices are not equipped with wireless modules, because doing so is neither economical nor energy-efficient. Thus, finding a way to offload tasks for sensing devices is a challenging problem. However, many vehicles move around the city, and they can communicate with sensing devices in an effective and low-cost way. In this paper, we propose a computation offloading scheme through mobile vehicles in an IoT-edge-cloud network. The sensing devices generate tasks and transmit them to vehicles, and each vehicle then decides whether to compute a task locally, at an MEC server, or at the cloud center. The offloading decision is made based on a utility function of the energy consumption and transmission delay, and a deep reinforcement learning technique is adopted to make the decisions. Our proposed method makes full use of existing infrastructure to implement task offloading for sensing devices, and the experimental results show that the proposed solution achieves the maximum reward and decreases delay.
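The utility-based decision the abstract describes can be illustrated as below: each execution target is scored by a weighted cost of energy and delay, and the vehicle picks the best one. The equal weights and the cost numbers are assumptions for illustration, not values from the paper (which learns the decision with deep RL rather than a fixed rule).

```python
# Hedged sketch of a utility-based offloading decision among local-vehicle,
# MEC-server, and cloud execution; weights and costs are assumed.
def utility(energy_j: float, delay_s: float,
            w_e: float = 0.5, w_d: float = 0.5) -> float:
    return -(w_e * energy_j + w_d * delay_s)  # higher utility = lower cost

def offload_decision(costs: dict[str, tuple[float, float]]) -> str:
    """costs maps each target ('vehicle', 'mec', 'cloud') to (energy, delay)."""
    return max(costs, key=lambda k: utility(*costs[k]))

print(offload_decision({"vehicle": (2.0, 0.8), "mec": (1.2, 0.5), "cloud": (0.9, 1.6)}))
# -> 'mec' under these illustrative costs and equal weights
```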


2021
Vol 11 (2)
pp. 546
Author(s):
Jiajia Xie
Rui Zhou
Yuan Liu
Jun Luo
Shaorong Xie
...  

The high performance and efficiency of multiple unmanned surface vehicles (multi-USV) promote further civilian and military applications of coordinated USVs. As the basis of multiple USVs' cooperative work, considerable attention has been devoted to developing decentralized formation control of the USV swarm. Formation control of multiple USVs is a geometric problem of multi-robot systems, and the main challenge is how to generate and maintain the formation. The rapid development of reinforcement learning provides a new way to address these problems. In this paper, we introduce a decentralized structure of the multi-USV system and employ reinforcement learning for formation control of a multi-USV system in a leader–follower topology. Accordingly, we propose an asynchronous decentralized formation control scheme based on reinforcement learning for multiple USVs. First, a simplified USV model is established, and a formation shape model is built to provide formation parameters and describe the physical relationship between USVs. Second, an advantage deep deterministic policy gradient (ADDPG) algorithm is proposed. Third, formation generation and formation maintenance policies based on the ADDPG are proposed to form and maintain the given geometric structure of the team of USVs during movement. Moreover, three new reward functions are designed and used to promote policy learning. Finally, various experiments are conducted to validate the performance of the proposed formation control scheme. Simulation results and contrast experiments demonstrate the efficiency and stability of the scheme.
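A typical leader–follower formation-keeping reward of the kind the abstract mentions can be sketched as below. The paper designs three reward functions; this single distance-error shaping term, including the gain k and the 2-D positions, is an illustrative assumption, not one of theirs.

```python
# Minimal sketch of a formation-maintenance reward in a leader-follower
# topology: penalize deviation from the follower's assigned offset.
import numpy as np

def formation_reward(follower_pos, leader_pos, desired_offset, k=1.0):
    """Zero when the follower holds its formation slot, negative otherwise."""
    error = np.linalg.norm((follower_pos - leader_pos) - desired_offset)
    return -k * error

r = formation_reward(np.array([4.0, 1.0]), np.array([5.0, 2.0]),
                     desired_offset=np.array([-1.0, -1.0]))  # r == 0.0
```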


Author(s):  
Alessio Sacco
Flavio Esposito
Guido Marchetto
Paolo Montuschi

Author(s):  
Ming-Sheng Ying
Yuan Feng
Sheng-Gang Ying

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
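For reference, classical finite-horizon policy evaluation by backward induction, the baseline that the qMDP algorithms generalize, is sketched below. The quantum case replaces states with density operators and is not modelled here; the toy two-state chain is an assumption for illustration.

```python
# Classical finite-horizon policy evaluation by backward induction;
# the qMDP extension in the paper generalizes this classical recursion.
import numpy as np

def evaluate_policy(P, R, policy, horizon):
    """P[a][s, s'] = transition probs, R[s, a] = rewards, policy[t][s] -> action."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for t in reversed(range(horizon)):          # backward induction
        V_new = np.empty(n_states)
        for s in range(n_states):
            a = policy[t][s]
            V_new[s] = R[s, a] + P[a][s] @ V    # reward plus expected future value
        V = V_new
    return V

# Toy usage: one action, two states, horizon 3.
P = [np.array([[0.9, 0.1], [0.2, 0.8]])]
R = np.array([[1.0], [0.0]])
print(evaluate_policy(P, R, policy=[[0, 0]] * 3, horizon=3))
```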

