Scheduling and Power Control for Wireless Multicast Systems via Deep Reinforcement Learning

Multicasting in wireless systems is a natural way to exploit the redundancy in user requests in a content centric network. Power control and optimal scheduling can significantly improve the wireless multicast network’s performance under fading. However, the model-based approaches for power control and scheduling studied earlier are not scalable to large state spaces or changing system dynamics. In this paper, we use deep reinforcement learning, where we use function approximation of the Q-function via a deep neural network to obtain a power control policy that matches the optimal policy for a small network. We show that power control policy can be learned for reasonably large systems via this approach. Further, we use multi-timescale stochastic optimization to maintain the average power constraint. We demonstrate that a slight modification of the learning algorithm allows tracking of time varying system statistics. Finally, we extend the multi-time scale approach to simultaneously learn the optimal queuing strategy along with power control. We demonstrate the scalability, tracking and cross-layer optimization capabilities of our algorithms via simulations. The proposed multi-time scale approach can be used in general large state-space dynamical systems with multiple objectives and constraints, and may be of independent interest.

Download Full-text

Graphical Minimax Game and Off-Policy Reinforcement Learning for Heterogeneous MASs with Spanning Tree Condition

Guidance, Navigation and Control ◽

10.1142/s2737480721500114 ◽

2021 ◽

pp. 2150011

Author(s):

Wei Dong ◽

Jianan Wang ◽

Chunyan Wang ◽

Zhenqiang Qi ◽

Zhengtao Ding

Keyword(s):

Reinforcement Learning ◽

Spanning Tree ◽

Learning Algorithm ◽

Control Policy ◽

Game Problem ◽

Algebraic Riccati Equation ◽

Multi Agent Systems ◽

Rank Condition ◽

Minimax Game ◽

Tree Condition

In this paper, the optimal consensus control problem is investigated for heterogeneous linear multi-agent systems (MASs) with spanning tree condition based on game theory and reinforcement learning. First, the graphical minimax game algebraic Riccati equation (ARE) is derived by converting the consensus problem into a zero-sum game problem between each agent and its neighbors. The asymptotic stability and minimax validation of the closed-loop systems are proved theoretically. Then, a data-driven off-policy reinforcement learning algorithm is proposed to online learn the optimal control policy without the information of the system dynamics. A certain rank condition is established to guarantee the convergence of the proposed algorithm to the unique solution of the ARE. Finally, the effectiveness of the proposed method is demonstrated through a numerical simulation.

Download Full-text

Resource Allocation and Power Control Policy for Device-to-Device Communication Using Multi-Agent Reinforcement Learning

Computers Materials & Continua ◽

10.32604/cmc.2020.09130 ◽

2020 ◽

Vol 63 (3) ◽

pp. 1515-1532

Author(s):

Yifei Wei ◽

Yinxiang Qu ◽

Min Zhao ◽

Lianping Zhang ◽

F. Richard Yu

Keyword(s):

Resource Allocation ◽

Reinforcement Learning ◽

Power Control ◽

Control Policy ◽

Device To Device ◽

Multi Agent ◽

Device To Device Communication

Download Full-text

Deep Reinforcement Learning Based Power Control for Wireless Multicast Systems

2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton) ◽

10.1109/allerton.2019.8919748 ◽

2019 ◽

Author(s):

Ramkumar Raghu ◽

Pratheek Upadhyaya ◽

Mahadesh Panju ◽

Vaneet Agarwal ◽

Vinod Sharma

Keyword(s):

Reinforcement Learning ◽

Power Control ◽

Wireless Multicast

Download Full-text

A Pre-Trained Fuzzy Reinforcement Learning Method for the Pursuing Satellite in a One-to-One Game in Space

Sensors ◽

10.3390/s20082253 ◽

2020 ◽

Vol 20 (8) ◽

pp. 2253

Author(s):

Xiao Wang ◽

Peng Shi ◽

Yushan Zhao ◽

Yue Sun

Keyword(s):

Reinforcement Learning ◽

Gradient Descent ◽

Fuzzy Inference ◽

Learning Algorithm ◽

Control Policy ◽

Descent Method ◽

Gradient Descent Method ◽

One To One ◽

Inference Systems ◽

First Time

In order to help the pursuer find its advantaged control policy in a one-to-one game in space, this paper proposes an innovative pre-trained fuzzy reinforcement learning algorithm, which is conducted in the x, y, and z channels separately. Compared with the previous algorithms applied in ground games, this is the first time reinforcement learning has been introduced to help the pursuer in space optimize its control policy. The known part of the environment is utilized to help the pursuer pre-train its consequent set before learning. An actor-critic framework is built in each moving channel of the pursuer. The consequent set of the pursuer is updated through the gradient descent method in fuzzy inference systems. The numerical experimental results validate the effectiveness of the proposed algorithm in improving the game ability of the pursuer.

Download Full-text

Developing End-to-End Control Policies for Robotic Swarms Using Deep Q-learning

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2019.p0920 ◽

2019 ◽

Vol 23 (5) ◽

pp. 920-927 ◽

Cited By ~ 3

Author(s):

Yufei Wei ◽

Xiaotong Nie ◽

Motoaki Hiraga ◽

Kazuhiro Ohkura ◽

Zlatan Car ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Evolutionary Robotics ◽

Control Policy ◽

Control Policies ◽

Q Learning ◽

Robotic Swarms ◽

Learning Techniques ◽

End To End ◽

Large Parameter Space

In this study, the use of a popular deep reinforcement learning algorithm – deep Q-learning – in developing end-to-end control policies for robotic swarms is explored. Robots only have limited local sensory capabilities; however, in a swarm, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computation resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as much as possible. Simulation results show that the proposed approach can learn control policies directly from high-dimensional raw camera pixel inputs for robotic swarms.

Download Full-text

Fast reinforcement learning algorithm for mobile power control in cellular communication systems

1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation ◽

10.1109/icsmc.1997.633277 ◽

2002 ◽

Cited By ~ 4

Author(s):

X.Z. Gao ◽

X.M. Gao ◽

S.J. Ovaska

Keyword(s):

Reinforcement Learning ◽

Power Control ◽

Communication Systems ◽

Learning Algorithm ◽

Cellular Communication ◽

Cellular Communication Systems ◽

Reinforcement Learning Algorithm

Download Full-text

Maximum Power Point Tracking of Photovoltaic System Based on Reinforcement Learning

Sensors ◽

10.3390/s19225054 ◽

2019 ◽

Vol 19 (22) ◽

pp. 5054 ◽

Cited By ~ 9

Author(s):

Chou ◽

Yang ◽

Chen

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Maximum Power Point Tracking ◽

Average Power ◽

Maximum Power ◽

Photovoltaic System ◽

Maximum Power Point ◽

Point Tracking ◽

Power Point Tracking ◽

Power Point

The maximum power point tracking (MPPT) technique is often used in photovoltaic (PV) systems to extract the maximum power in various environmental conditions. The perturbation and observation (P&O) method is one of the most well-known MPPT methods; however, it may face problems of large oscillations around maximum power point (MPP) or low-tracking efficiency. In this paper, two reinforcement learning-based maximum power point tracking (RL MPPT) methods are proposed by the use of the Q-learning algorithm. One constructs the Q-table and the other adopts the Q-network. These two proposed methods do not require the information of an actual PV module in advance and can track the MPP through offline training in two phases, the learning phase and the tracking phase. From the experimental results, both the reinforcement learning-based Q-table maximum power point tracking (RL-QT MPPT) and the reinforcement learning-based Q-network maximum power point tracking (RL-QN MPPT) methods have smaller ripples and faster tracking speeds when compared with the P&O method. In addition, for these two proposed methods, the RL-QT MPPT method performs with smaller oscillation and the RL-QN MPPT method achieves higher average power.

Download Full-text

Training Champion-level Race Car Drivers Using Deep Reinforcement Learning

10.21203/rs.3.rs-795954/v1 ◽

2021 ◽

Author(s):

Peter Wurman ◽

Samuel Barrett ◽

Kenta Kawamoto ◽

James MacGlashan ◽

Kaushik Subramanian ◽

...

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Learning Algorithm ◽

Integrated Control ◽

Extreme Case ◽

Control Policy ◽

Reward Function ◽

Race Car ◽

Model Free ◽

Race Car Drivers

Abstract Many potential applications of artificial intelligence involve making real-time decisions in physical systems. Automobile racing represents an extreme case of real-time decision making in close proximity to other highly-skilled drivers while near the limits of vehicular control. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the nonlinear control challenges of real race cars while also encapsulating the complex multi-agent interactions. We attack, and solve for the first time, the simulated racing challenge using model-free deep reinforcement learning. We introduce a novel reinforcement learning algorithm and enhance the learning process with mixed scenario training to encourage the agent to incorporate racing tactics into an integrated control policy. In addition, we construct a reward function that enables the agent to adhere to the sport's under-specified racing etiquette rules. We demonstrate the capabilities of our agent, GT Sophy, by winning two of three races against four of the world's best Gran Turismo drivers and being competitive in the overall team score. By showing that these techniques can be successfully used to train championship-level race car drivers, we open up the possibility of their use in other complex dynamical systems and real-world applications.

Download Full-text

Active Length Control of Shape Memory Alloy Wires Via Reinforcement Learning

Volume 1: Active Materials, Mechanics and Behavior; Modeling, Simulation and Control ◽

10.1115/smasis2009-1430 ◽

2009 ◽

Author(s):

Kenton Kirkpatrick ◽

John Valasek ◽

Dimitris Lagoudas

Keyword(s):

Numerical Simulation ◽

Reinforcement Learning ◽

Shape Memory Alloy ◽

Shape Memory ◽

Learning Algorithm ◽

Control Policy ◽

Active Length ◽

Strain Relationship ◽

Temperature Strain ◽

Reinforcement Learning Algorithm

The ability to actively control the shape of aerospace structures has initiated research regarding the use of Shape Memory Alloy actuators. These actuators can be used for morphing or shape change by controlling their temperature, which is effectively done by applying a voltage difference across their length. The ability to characterize this temperature-strain relationship using Reinforcement Learning has been previously accomplished, but in order to control Shape Memory Alloy wires it is more beneficial to learn the voltage-position relationship. Numerical simulation using Reinforcement Learning has been used for determining the temperature-strain relationship for characterizing the major and minor hysteresis loops, and determining a limited control policy relating applied temperature to desired strain. Since Reinforcement Learning creates a non-parametric control policy, and there is not currently a general parametric model for this control policy, determining the voltage-position relationship for a Shape Memory Alloy is done separately. This paper extends earlier numerical simulation results and experimental results in temperature-strain space by applying a similar Reinforcement Learning algorithm to voltage-position space using an experimental hardware apparatus. Results presented in the paper show the ability to converge on a near-optimal control policy for Shape Memory Alloy length control by means of an improved Reinforcement Learning algorithm. These results demonstrate the power of Reinforcement Learning as a method of constructing a policy capable of controlling Shape Memory Alloy wire length.

Download Full-text

An Artificial Intelligence Approach for Online Optimization of Flexible Manufacturing Systems

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.882.96 ◽

2018 ◽

Vol 882 ◽

pp. 96-108 ◽

Cited By ~ 3

Author(s):

Jupiter Bakakeu ◽

Schirin Tolksdorf ◽

Jochen Bauer ◽

Hans-Henning Klos ◽

Jörn Peschke ◽

...

Keyword(s):

Reinforcement Learning ◽

Flexible Manufacturing ◽

Manufacturing Systems ◽

Learning Algorithm ◽

Electricity Consumption ◽

Control Policy ◽

Online Optimization ◽

Energy Prices ◽

Sequential Decision ◽

Time Step

This paper addresses the problem of efficiently operating a flexible manufacturing machine in an electricity micro-grid featuring a high volatility of electricity prices. The problem of finding the optimal control policy is formulated as a sequential decision making problem under uncertainty where, at every time step the uncertainty comes from the lack of knowledge about fu-ture electricity consumption and future weather dependent energy prices. We propose to address this problem using deep reinforcement learning. To this purpose, we designed a deep learning architecture to forecast the load profile of future manufacturing schedule from past production time series. Combined with the forecast of future energy prices, the reinforcement-learning algorithm is trained to perform an online optimization of the production ma-chine in order to reduce the long-term energy costs. The concept is empirical-ly validated on a flexible production machine, where the machine speed can be optimized during the production.

Download Full-text