Multi-Agent Actor Critic for Channel Allocation in Heterogeneous Networks

Author(s):  
Nan Zhao ◽  
Zehua Liu ◽  
Yiqiang Cheng ◽  
Chao Tian

Heterogeneous networks (HetNets) can equalize traffic loads and cut down the cost of deploying cells, and are therefore regarded as a key technique for next-generation communication networks. Because the channel allocation problem in HetNets is non-convex, it is difficult to design an optimal allocation approach. To ensure user quality of service as well as the long-term total network utility, this article proposes a new channel allocation method based on multi-agent reinforcement learning. Moreover, to address the computational complexity caused by the large action space, deep reinforcement learning is employed to learn the optimal policy. This learning method obtains a near-optimal solution efficiently and converges quickly. Simulation results show that the proposed method outperforms the other methods considered.
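As a rough illustration of the actor-critic structure named in the title, the sketch below shows a single agent (e.g., one base station) that maps a local state to a channel-selection policy and a state value, and updates both from the temporal-difference error. The class and parameter names (ChannelAllocAgent, state_dim, num_channels) are illustrative assumptions, not the paper's architecture.

```python
# Minimal actor-critic sketch for per-step channel selection (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAllocAgent(nn.Module):
    def __init__(self, state_dim, num_channels, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, num_channels)   # policy head (logits over channels)
        self.critic = nn.Linear(hidden, 1)              # state-value head

    def forward(self, state):
        h = self.shared(state)
        return F.softmax(self.actor(h), dim=-1), self.critic(h)

def update(agent, optimizer, state, action, reward, next_state, gamma=0.99):
    """One actor-critic step: the TD error drives both the policy and value losses."""
    probs, value = agent(state)
    with torch.no_grad():
        _, next_value = agent(next_state)
        td_target = reward + gamma * next_value
    td_error = td_target - value
    actor_loss = -torch.log(probs[action]) * td_error.detach()
    critic_loss = td_error.pow(2)
    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()
```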

Author(s):  
Kazuteru Miyazaki ◽  
Koudai Furukawa ◽  
Hiroaki Kobayashi ◽  
...  

When multiple agents learn a task simultaneously in an environment, the learning results often become unstable. This problem is known as the concurrent learning problem, and several methods have been proposed to resolve it. In this paper, we propose a new method that incorporates the expected failure probability (EFP) into the action selection strategy to give agents a kind of mutual adaptability. The effectiveness of the proposed method is confirmed on the Keepaway task.
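A minimal sketch of how an expected failure probability could be folded into action selection is shown below. The empirical counting scheme and the softmax weighting are assumptions made for illustration, not the authors' exact rule.

```python
# Illustrative EFP-biased action selection: actions with a high estimated
# failure probability are down-weighted during softmax selection.
import numpy as np

class EFPSelector:
    def __init__(self, n_actions, temperature=1.0):
        self.fail = np.zeros(n_actions)    # failures observed per action
        self.tries = np.ones(n_actions)    # trials per action (starts at 1)
        self.temperature = temperature

    def efp(self):
        return self.fail / self.tries      # empirical failure probability

    def select(self, q_values):
        scores = q_values * (1.0 - self.efp())
        prefs = np.exp((scores - scores.max()) / self.temperature)
        probs = prefs / prefs.sum()
        return np.random.choice(len(q_values), p=probs)

    def record(self, action, failed):
        self.tries[action] += 1
        self.fail[action] += float(failed)
```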


Author(s):  
Nan Zhao ◽  
Chao Tian ◽  
Menglin Fan ◽  
Minghu Wu ◽  
Xiao He ◽  
...  

Heterogeneous cellular networks can balance mobile video loads and reduce cell deployment costs, making them an important technology for future mobile video communication networks. Because the mobile offloading problem is non-convex, designing an optimal strategy is a challenging issue. To ensure users' quality of service and the long-term overall network utility, this article proposes a distributed optimization method based on multi-agent reinforcement learning for downlink heterogeneous cellular networks. In addition, to handle the computational load caused by the large action space, deep reinforcement learning is introduced to obtain the optimal policy. The learned policy provides a near-optimal solution efficiently with fast convergence. Simulation results show that the proposed approach improves performance more effectively than the Q-learning method.
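To illustrate why deep reinforcement learning is introduced here, the sketch below contrasts a tabular Q-learning update with a deep Q-network update for the offloading decision; the dimensions and names are assumptions for illustration, not the paper's exact design.

```python
# Tabular Q-learning vs. a deep Q-network for the offloading/association decision.
import torch
import torch.nn as nn

# Tabular Q-learning: becomes infeasible when the state/action space is large.
def q_table_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# Deep Q-network: a function approximator replaces the table.
class OffloadDQN(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, state):
        return self.net(state)             # Q(s, a) for every action at once

def dqn_loss(dqn, target_dqn, batch, gamma=0.99):
    s, a, r, s_next = batch
    q_sa = dqn(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_dqn(s_next).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)
```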


2020 ◽  
Vol 17 (2) ◽  
pp. 619-646
Author(s):  
Chao Wang ◽  
Xing Qiu ◽  
Hui Liu ◽  
Dan Li ◽  
Kaiguang Zhao ◽  
...  

Multi-agent systems have broad real-world applications, yet their safety is rarely considered. Reinforcement learning is one of the most important methods for solving multi-agent problems, and progress has been made in applying multi-agent reinforcement learning to robotic systems, human-machine games, automation, and related areas. In these areas, however, an agent may fall into unsafe states in which it cannot bypass obstacles, receive information from other agents, and so on. Ensuring the safety of multi-agent systems is therefore critical, since such dangerous states can be irreversible and cause great damage. To address this safety problem, we introduce a multi-agent cooperative Q-learning algorithm based on a constrained Markov game. In this method, safety constraints are imposed on the action set, and each agent is restricted by the safety rules while interacting with the environment to search for optimal values, so that the resulting optimal policy satisfies the safety requirements. Since traditional multi-agent reinforcement learning algorithms are no longer suitable for the proposed model, a new procedure is introduced for computing the globally optimal state-action value function under the safety constraints. Assuming that the state-action value function and the constraint functions are differentiable, we linearize the constraint functions and use the Lagrange multiplier method to determine the optimal action in the current state; this improves both the efficiency and the accuracy of the algorithm and guarantees a globally optimal solution. Experiments verify the effectiveness of the algorithm.
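A generic constrained-RL sketch of this idea is given below: the agent penalizes its Q-values with a Lagrange multiplier times an estimated safety cost, and raises the multiplier when the cost budget is exceeded. This is a simplified single-agent, tabular illustration of the Lagrangian mechanism, not the paper's linearized multi-agent update.

```python
# Illustrative Lagrangian-constrained Q-learning (generic pattern, not the paper's exact method).
import numpy as np

class ConstrainedQAgent:
    def __init__(self, n_states, n_actions, cost_budget,
                 alpha=0.1, gamma=0.99, lam_lr=0.01):
        self.Q = np.zeros((n_states, n_actions))   # reward value estimates
        self.C = np.zeros((n_states, n_actions))   # safety-cost estimates
        self.lam = 0.0                              # Lagrange multiplier
        self.budget = cost_budget
        self.alpha, self.gamma, self.lam_lr = alpha, gamma, lam_lr

    def act(self, s):
        # Maximize the Lagrangian Q(s, a) - lambda * C(s, a).
        return int(np.argmax(self.Q[s] - self.lam * self.C[s]))

    def update(self, s, a, r, cost, s_next):
        a_next = self.act(s_next)
        self.Q[s, a] += self.alpha * (r + self.gamma * self.Q[s_next, a_next]
                                      - self.Q[s, a])
        self.C[s, a] += self.alpha * (cost + self.gamma * self.C[s_next, a_next]
                                      - self.C[s, a])
        # Dual ascent: increase lambda when the safety cost exceeds the budget.
        self.lam = max(0.0, self.lam + self.lam_lr * (cost - self.budget))
```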


2020 ◽  
Vol 34 (05) ◽  
pp. 7160-7168
Author(s):  
Benjamin Freed ◽  
Guillaume Sartoretti ◽  
Jiaheng Hu ◽  
Howie Choset

This work focuses on multi-agent reinforcement learning (RL) with inter-agent communication, in which communication is differentiable and optimized through backpropagation. Such differentiable approaches tend to converge more quickly to higher-quality policies than techniques that treat communication as actions in a traditional RL framework. However, modern communication networks (e.g., Wi-Fi or Bluetooth) rely on discrete communication channels, to which existing differentiable approaches that assume real-valued messages cannot be directly applied, or for which they require biased gradient estimators. Some works have overcome this problem by treating the message space as an extension of the action space and using standard RL to optimize message selection, but these methods tend to converge more slowly and to inferior policies. In this paper, we propose a stochastic message encoding/decoding procedure that makes a discrete communication channel mathematically equivalent to an analog channel with additive noise, through which gradients can be backpropagated. Additionally, we introduce an encryption step for use in noisy channels that forces channel noise to be message-independent, allowing us to compute unbiased derivative estimates even in the presence of unknown channel noise. To the best of our knowledge, this work presents the first differentiable communication learning approach that can compute unbiased derivatives through channels with unknown noise. We demonstrate the effectiveness of our approach on two example multi-robot tasks: a path-finding problem and a collaborative search problem. We show that our approach achieves learning speed and performance similar to differentiable communication learning with real-valued messages (i.e., unlimited communication bandwidth), while naturally handling more realistic real-world communication constraints. Content Areas: Multi-Agent Communication, Reinforcement Learning.
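One plausible instantiation of the encode/decode idea is subtractive dithered quantization, sketched below: a dither shared by sender and receiver (playing the role of the encryption step) makes the quantization error independent of the message, so the discrete channel behaves like the message plus additive noise and gradients can be passed straight through. The functions and level count are illustrative assumptions, not the authors' exact procedure.

```python
# Illustrative sketch: subtractive dithered quantization as a differentiable discrete channel.
import torch

def encode(message, dither, n_levels=16):
    # message in [0, 1]; scale, add the shared dither, round to a discrete symbol.
    scaled = message * (n_levels - 1)
    return torch.round(scaled + dither).clamp(0, n_levels - 1)   # integers sent over the channel

def decode(symbol, dither, n_levels=16):
    # Subtract the same dither; the residual error is message-independent noise.
    return (symbol - dither) / (n_levels - 1)

class DiscreteChannel(torch.autograd.Function):
    @staticmethod
    def forward(ctx, message, dither):
        return decode(encode(message, dither), dither)

    @staticmethod
    def backward(ctx, grad_output):
        # Treat the channel as identity plus additive noise: pass gradients through.
        return grad_output, None

# Usage: both ends draw the dither from a shared seed, independent of the message.
msg = torch.rand(8, requires_grad=True)
dither = torch.rand(8) - 0.5
received = DiscreteChannel.apply(msg, dither)
received.sum().backward()                            # msg.grad is well defined
```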


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Qianhong Cong ◽  
Wenhui Lang

We consider the problem of dynamic multichannel access for transmission maximization in multiuser wireless communication networks. The objective is to find a multiuser strategy that maximizes global channel utilization with few collisions in a centralized manner and without any prior knowledge. Obtaining an optimal solution for centralized dynamic multichannel access is extremely difficult because of the large state and action spaces. To tackle this problem, we develop a centralized dynamic multichannel access framework based on a double deep recurrent Q-network. The centralized node first maps the current state directly to channel-assignment actions, which avoids the prohibitive computation of conventional reinforcement learning methods. The centralized node can then easily select multiple channels by maximizing the sum of value functions computed by the trained neural network. Finally, the proposed method avoids collisions between secondary users through its centralized allocation policy.
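The sketch below illustrates the centralized selection step under these assumptions: a recurrent Q-network produces one value per channel from the observation history, the centralized node assigns users to the channels with the largest values (maximizing the sum of value functions), and a double-DQN target decouples action selection from evaluation. Names and sizes are illustrative, not the paper's exact network.

```python
# Illustrative centralized multichannel selection with a recurrent Q-network.
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim, n_channels, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_channels)

    def forward(self, obs_seq, hidden_state=None):
        out, hidden_state = self.lstm(obs_seq, hidden_state)
        return self.head(out[:, -1]), hidden_state   # one Q-value per channel

def select_channels(q_net, obs_seq, n_users):
    """Assign users to the n_users channels with the highest Q-values (no collisions)."""
    with torch.no_grad():
        q_values, _ = q_net(obs_seq)
    return torch.topk(q_values.squeeze(0), k=n_users).indices.tolist()

def double_dqn_target(online_net, target_net, next_obs_seq, reward, gamma=0.99):
    """Double DQN: the online net picks the action, the target net evaluates it."""
    with torch.no_grad():
        next_q_online, _ = online_net(next_obs_seq)
        best = next_q_online.argmax(dim=1, keepdim=True)
        next_q_target, _ = target_net(next_obs_seq)
        return reward + gamma * next_q_target.gather(1, best).squeeze(1)
```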

