Decentralized Multiagent Actor-Critic Algorithm Based on Message Diffusion

The exponential explosion of joint actions and massive data collection are two main challenges in multiagent reinforcement learning algorithms with centralized training. To overcome these problems, in this paper, we propose a model-free and fully decentralized actor-critic multiagent reinforcement learning algorithm based on message diffusion. To this end, the agents are assumed to be placed in a time-varying communication network. Each agent makes limited observations regarding the global state and joint actions; therefore, it needs to obtain and share information with others over the network. In the proposed algorithm, agents hold local estimations of the global state and joint actions and update them with local observations and the messages received from neighbors. Under the hypothesis of the global value decomposition, the gradient of the global objective function with respect to an individual agent is derived. The convergence of the proposed algorithm with linear function approximation is guaranteed according to the stochastic approximation theory. In the experiments, the proposed algorithm was applied to a multiagent passive location task environment and achieved superior performance compared to state-of-the-art algorithms.
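The idea of agents keeping local estimates and refreshing them from neighbors' messages can be illustrated with a small sketch. This is not the paper's protocol: the timestamped merge rule, the `merge` and `diffusion_round` helpers, the four-agent setup, and the link schedule are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical message format: agent id -> (timestamp, observed value).
Estimate = Dict[int, Tuple[int, float]]


def merge(own: Estimate, received: Estimate) -> Estimate:
    """Keep, for every agent id, the entry with the newest timestamp."""
    merged = dict(own)
    for agent_id, (stamp, value) in received.items():
        if agent_id not in merged or stamp > merged[agent_id][0]:
            merged[agent_id] = (stamp, value)
    return merged


def diffusion_round(estimates: List[Estimate],
                    edges: List[Tuple[int, int]]) -> List[Estimate]:
    """One synchronous round: each agent merges what its current neighbors held."""
    snapshot = [dict(e) for e in estimates]   # messages sent this round
    updated = [dict(e) for e in estimates]
    for a, b in edges:                        # links are bidirectional
        updated[a] = merge(updated[a], snapshot[b])
        updated[b] = merge(updated[b], snapshot[a])
    return updated


# Four agents; each initially knows only its own observation (illustrative values).
observations = [0.5, -1.0, 2.0, 3.5]
estimates = [{i: (0, obs)} for i, obs in enumerate(observations)]

# Assumed time-varying network: the set of active links changes every round.
schedule = [[(0, 1), (2, 3)], [(1, 2)], [(0, 1), (2, 3)]]
for edges in schedule:
    estimates = diffusion_round(estimates, edges)
```

With this schedule no single round connects all agents, yet after three rounds every agent holds an entry for all four observations, which is the property a decentralized learner would rely on before forming its local update.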


Optimising Performance for NB-IoT UE Devices through Data Driven Models

Journal of Sensor and Actuator Networks ◽

10.3390/jsan10010021 ◽

2021 ◽

Vol 10 (1) ◽

pp. 21

Author(s):

Omar Nassef ◽

Toktam Mahmoodi ◽

Foivos Michelinakis ◽

Kashif Mahmood ◽

Ahmed Elmokashfi

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Gradient Descent ◽

Deep Neural Network ◽

Narrow Band ◽

Learning Algorithm ◽

Base Station ◽

User Equipment ◽

Data Driven ◽

Superior Performance

This paper presents a data-driven framework for performance optimisation of Narrow-Band IoT user equipment. The proposed framework is an edge micro-service that suggests one-time configurations to user equipment communicating with a base station. Suggested configurations are delivered from a Configuration Advocate to improve energy consumption, delay, throughput or a combination of those metrics, depending on the user-end device and the application. Reinforcement learning utilising gradient descent and a genetic algorithm is adopted synchronously with machine and deep learning algorithms to predict the environmental states and suggest an optimal configuration. The results highlight the adaptability of the Deep Neural Network in the prediction of intermediary environmental states. Additionally, the results show the superior performance of the genetic reinforcement learning algorithm in performance optimisation.


Intelligent Ramp Control for Incident Response Using Dyna-Q Architecture

Mathematical Problems in Engineering ◽

10.1155/2015/896943 ◽

2015 ◽

Vol 2015 ◽

pp. 1-16

Author(s):

Chao Lu ◽

Yanan Zhao ◽

Jianwei Gong

Keyword(s):

Reinforcement Learning ◽

Travel Time ◽

Single Agent ◽

Superior Performance ◽

Model Free ◽

Road Users ◽

Total Travel Time ◽

The UK ◽

Traffic Operation ◽

Ramp Control

Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under the congestion caused by incidents. However, existing applications limited to single-agent tasks and based on Q-learning have inherent drawbacks for dealing with coordinated ramp control problems. For solving these problems, a Dyna-Q based multiagent reinforcement learning (MARL) system named Dyna-MARL has been developed in this paper. Dyna-Q is an extension of Q-learning, which combines model-free and model-based methods to obtain benefits from both sides. The performance of Dyna-MARL is tested in a simulated motorway segment in the UK with the real traffic data collected from AM peak hours. The test results compared with Isolated RL and noncontrolled situations show that Dyna-MARL can achieve a superior performance on improving the traffic operation with respect to increasing total throughput, reducing total travel time and CO2 emission. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel time of road users from different on-ramps.
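How Dyna-Q combines model-free and model-based updates can be seen in a minimal single-agent, tabular sketch. It is not the Dyna-MARL system: the five-state chain environment, the `step` and `dyna_q` functions, and all hyperparameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

GOAL = 4            # hypothetical chain of states 0..4; state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic environment: reward 1 only when the goal is reached."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # Q-table, zero-initialised
    model = {}               # learned model: (s, a) -> (reward, next state, done)

    def update(s, a, r, s2, done):
        # Standard one-step Q-learning backup.
        best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    def choose(s):
        # Epsilon-greedy with random tie-breaking.
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = choose(s)
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)            # model-free: learn from real experience
            model[(s, a)] = (r, s2, done)        # model-based: remember the transition
            for _ in range(planning_steps):      # planning: replay simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


q_table = dyna_q()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)]
```

The planning loop is what separates this from plain Q-learning: every real transition is followed by several extra backups drawn from the learned model, so value information spreads through the table with fewer interactions with the environment.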


An Enhanced Model-Free Reinforcement Learning Algorithm to Solve Nash Equilibrium for Multi-Agent Cooperative Game Systems

IEEE Access ◽

10.1109/access.2020.3043806 ◽

2020 ◽

Vol 8 ◽

pp. 223743-223755

Author(s):

Yuannan Jiang ◽

Fuxiao Tan

Keyword(s):

Reinforcement Learning ◽

Nash Equilibrium ◽

Cooperative Game ◽

Learning Algorithm ◽

Model Free ◽

Multi Agent ◽

Reinforcement Learning Algorithm


FMRQ—A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks

IEEE Transactions on Cybernetics ◽

10.1109/tcyb.2016.2544866 ◽

2017 ◽

Vol 47 (6) ◽

pp. 1367-1379 ◽

Cited By ~ 27

Author(s):

Zhen Zhang ◽

Dongbin Zhao ◽

Junwei Gao ◽

Dongqing Wang ◽

Yujie Dai

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Cooperative Tasks ◽

Multiagent Reinforcement Learning ◽

Reinforcement Learning Algorithm


Risk-Sensitive Reinforcement Learning Applied to Control under Constraints

Journal of Artificial Intelligence Research ◽

10.1613/jair.1666 ◽

2005 ◽

Vol 24 ◽

pp. 81-108 ◽

Cited By ~ 65

Author(s):

P. Geibel ◽

F. Wysotzki

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Learning Algorithm ◽

Optimal Solution ◽

Feed Tank ◽

Model Free ◽

Constrained Problem ◽

Risk Sensitive ◽

Markov Decision ◽

The Value Function

In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are those states entering which is undesirable or dangerous. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
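The weighting idea in this abstract can be sketched as follows: score each policy by its value minus a weight times its risk, and raise the weight until the best-scoring policy satisfies the risk threshold. This is only an illustration of that adaptation loop, not the paper's learning algorithm. The candidate policies, their value and risk numbers, the symbol `xi`, and the fixed step size are all hypothetical, and in the actual setting both quantities would be learned rather than given.

```python
def select_policy(candidates, xi):
    """Pick the policy that maximises the weighted criterion value - xi * risk."""
    return max(candidates, key=lambda p: p["value"] - xi * p["risk"])


def adapt_weight(candidates, threshold, xi=0.0, step=1.0, max_iters=100):
    """Increase the risk weight xi until the selected policy is feasible."""
    for _ in range(max_iters):
        best = select_policy(candidates, xi)
        if best["risk"] <= threshold:
            return xi, best
        xi += step   # penalise risk more heavily and try again
    raise ValueError("no feasible policy found within max_iters")


# Made-up candidates for illustration only (not results from the paper).
candidates = [
    {"name": "A", "value": 10.0, "risk": 0.30},
    {"name": "B", "value": 8.1, "risk": 0.10},
    {"name": "C", "value": 5.0, "risk": 0.01},
]

xi, chosen = adapt_weight(candidates, threshold=0.15)
```

With these numbers the high-value but risky policy A is selected until the weight is large enough for B to overtake it; the very safe but low-value policy C is never needed, which is the trade-off the weighted criterion is meant to capture.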


A multiagent reinforcement learning algorithm to solve the maximum independent set problem

Multiagent and Grid Systems ◽

10.3233/mgs-200323 ◽

2020 ◽

Vol 16 (1) ◽

pp. 101-115

Author(s):

Mir Mohammad Alipour ◽

Mohsen Abdolhosseinzadeh

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Independent Set ◽

Maximum Independent Set ◽

Independent Set Problem ◽

Multiagent Reinforcement Learning ◽

Maximum Independent Set Problem ◽

Reinforcement Learning Algorithm


A new multiagent reinforcement learning algorithm to solve the symmetric traveling salesman problem

Multiagent and Grid Systems ◽

10.3233/mgs-150232 ◽

2015 ◽

Vol 11 (2) ◽

pp. 107-119 ◽

Cited By ~ 4

Author(s):

Mir Mohammad Alipour ◽

Seyed Naser Razavi

Keyword(s):

Reinforcement Learning ◽

Traveling Salesman Problem ◽

Learning Algorithm ◽

Traveling Salesman ◽

Multiagent Reinforcement Learning ◽

Symmetric Traveling Salesman Problem ◽

Reinforcement Learning Algorithm


A Novel Multiagent Reinforcement Learning Algorithm Combination with Quantum Computation

2006 6th World Congress on Intelligent Control and Automation ◽

10.1109/wcica.2006.1712835 ◽

2006 ◽

Author(s):

Xiangping Meng ◽

Yu Chen ◽

Yuzhen Pi ◽

Quande Yuan

Keyword(s):

Reinforcement Learning ◽

Quantum Computation ◽

Learning Algorithm ◽

Multiagent Reinforcement Learning ◽

Reinforcement Learning Algorithm


FlexPool: A Distributed Model-Free Deep Reinforcement Learning Algorithm for Joint Passengers and Goods Transportation

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2020.3048361 ◽

2021 ◽

pp. 1-13

Author(s):

Kaushik Manchella ◽

Abhishek K. Umrawal ◽

Vaneet Aggarwal

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Distributed Model ◽

Model Free ◽

Reinforcement Learning Algorithm


Minibatch Recursive Least Squares Q-Learning

Computational Intelligence and Neuroscience ◽

10.1155/2021/5370281 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Chunyuan Zhang ◽

Qi Song ◽

Zeng Meng

Keyword(s):

Reinforcement Learning ◽

Least Squares ◽

Linear Function ◽

Function Approximation ◽

Learning Algorithm ◽

Learning Algorithms ◽

Optimization Technique ◽

Recursive Least Squares ◽

Q Learning ◽

Linear Function Approximation

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has some drawbacks such as slow convergence and instability. In contrast, the traditional reinforcement learning algorithms with linear function approximation usually have faster convergence and better stability, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom make use of the advantage of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called the minibatch recursive least squares Q-learning (MRLS-Q). Different from the traditional Q-learning algorithm with linear function approximation, the learning mechanism and model structure of MRLS-Q are more similar to those of DQNs with only one input layer and one linear output layer. It uses the experience replay and the minibatch training mode and uses the agent’s states rather than the agent’s state-action pairs as the inputs. As a result, it can be used alone for low-dimensional problems and can be seamlessly integrated into DQN as the last layer for high-dimensional problems as well. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it can achieve better convergence performance whether it is used alone or integrated with DQN. At the end of this paper, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and investigate the influences of its hyperparameters experimentally.
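For readers unfamiliar with recursive least squares, the textbook per-sample RLS update for a linear model is sketched below. It is not the minibatch or averaged variant that the abstract describes. The `rls_update` function, the initial covariance scale, and the regression data are assumptions for illustration; in a Q-learning setting the target would be a bootstrapped return instead of a fixed label.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rls_update(theta: Vector, P: Matrix, x: Vector, target: float,
               forgetting: float = 1.0) -> Tuple[Vector, Matrix]:
    """One recursive least squares step for the linear model y = theta . x."""
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]   # P x
    denom = forgetting + sum(x[i] * Px[i] for i in range(n))         # lambda + x^T P x
    gain = [v / denom for v in Px]                                   # gain vector k
    error = target - sum(theta[i] * x[i] for i in range(n))          # prediction error
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    xP = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]   # x^T P
    new_P = [[(P[i][j] - gain[i] * xP[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return new_theta, new_P


# Illustrative data: noise-free targets from y = 1 + 2 * z, features [1, z].
theta = [0.0, 0.0]
P = [[1000.0, 0.0], [0.0, 1000.0]]   # large initial P means a weak prior on theta
for z in range(10):
    features = [1.0, float(z)]
    theta, P = rls_update(theta, P, features, 1.0 + 2.0 * z)
```

Each step costs only matrix-vector products, with no learning rate to tune, which is one reason RLS-style updates tend to converge faster than plain gradient steps on linear models.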
