Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has ever succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces which is very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform existing independent deep parameterized Q-learning method.

Download Full-text

Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6212 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7219-7226

Author(s):

Hangyu Mao ◽

Wulong Liu ◽

Jianye Hao ◽

Jun Luo ◽

Dong Li ◽

...

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Football Player ◽

Superior Performance ◽

Human Society ◽

Packet Routing ◽

Q Learning ◽

Cognitive Consistency ◽

Challenging Tasks ◽

Multi Agent

Social psychology and real experiences show that cognitive consistency plays an important role to keep human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperations. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.

Download Full-text

Improvement on Supporting Machine Learning Algorithm for Solving Problem in Immediate Decision Making

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.566.572 ◽

2012 ◽

Vol 566 ◽

pp. 572-579

Author(s):

Abdolkarim Niazi ◽

Norizah Redzuan ◽

Raja Ishak Raja Hamzah ◽

Sara Esfandiari

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Multi Agent Systems ◽

Combined Model ◽

Q Learning ◽

Agent Systems ◽

Multi Agent ◽

Case Base ◽

Case Base Reasoning ◽

Robotic Tool

In this paper, a new algorithm based on case base reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of the reinforcement learning algorithms. RL algorithms are very useful for solving wide variety decision problems when their models are not available and they must make decision correctly in every state of system, such as multi agent systems, artificial control systems, robotic, tool condition monitoring and etc. In the propose method, we investigate how making improved action selection in reinforcement learning (RL) algorithm. In the proposed method, the new combined model using case base reasoning systems and a new optimized function is proposed to select the action, which led to an increase in algorithms based on Q-learning. The algorithm mentioned was used for solving the problem of cooperative Markov’s games as one of the models of Markov based multi-agent systems. The results of experiments Indicated that the proposed algorithms perform better than the existing algorithms in terms of speed and accuracy of reaching the optimal policy.

Download Full-text

DDPG Agent to Swing Up and Balance Cart- Pole System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-943 ◽

2021 ◽

pp. 102-116

Author(s):

Buvanesh Pandian V

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Learning Algorithm ◽

Current Approach ◽

Control Problems ◽

Mathematical Framework ◽

Test Environment ◽

Continuous Action ◽

Action Spaces

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns with the help of labeled data, reinforcement learning agents learn how to act by trial and error only receiving a reward signal from their environments. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high- dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been successfully used to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems like Atari games and Go. Mnih et al. [1] built a system with fixed hyper parameters able to learn to play 49 different Atari games only from raw pixel inputs. However, in order to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces would scale poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, having a parametrized policy can be advantageous because it can generalize in the action space. Therefore with this thesis we study state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients. We provide a theoretical comparison to other popular methods, an evaluation of its performance, identify its limitations and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention of deep learning and reinforcement learning. We continue by describing in details the two main algorithms, core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementatory details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.

Download Full-text

MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/430 ◽

2020 ◽

Author(s):

Mohammadamin Barekatain ◽

Ryo Yonetani ◽

Masashi Hamaya

Keyword(s):

Reinforcement Learning ◽

Task Performance ◽

Experimental Evaluation ◽

Control Problems ◽

Target Task ◽

Learning Efficiency ◽

Simulated Environments ◽

Discrete Action ◽

Key Techniques ◽

Action Spaces

Transfer reinforcement learning (RL) aims at improving the learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under diverse unknown dynamics is available for learning a target task efficiently. To address this problem, the proposed approach, MULTI-source POLicy AggRegation (MULTIPOLAR), comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly. We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces. The demo videos and code are available on the project webpage: https://omron-sinicx.github.io/multipolar/.

Download Full-text

Binary Black-Box Attacks Against Static Malware Detectors with Reinforcement Learning in Discrete Action Spaces

2021 IEEE Security and Privacy Workshops (SPW) ◽

10.1109/spw53761.2021.00021 ◽

2021 ◽

Author(s):

Mohammadreza Ebrahimi ◽

Jason Pacheco ◽

Weifeng Li ◽

James Lee Hu ◽

Hsinchun Chen

Keyword(s):

Reinforcement Learning ◽

Black Box ◽

Discrete Action ◽

Action Spaces

Download Full-text

Energy Optimization of Solar Micro-Grid Using Multi Agent Reinforcement Learning

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.787.843 ◽

2015 ◽

Vol 787 ◽

pp. 843-847

Author(s):

Leo Raju ◽

R.S. Milton ◽

S. Sakthiyanandan

Keyword(s):

Reinforcement Learning ◽

Energy Savings ◽

Learning Method ◽

Solar Pv ◽

Q Learning ◽

Pv Systems ◽

Model Free ◽

Individual Unit ◽

Multi Agent ◽

Micro Grid

In this paper, two solar Photovoltaic (PV) systems are considered; one in the department with capacity of 100 kW and the other in the hostel with capacity of 200 kW. Each one has battery and load. The capital cost and energy savings by conventional methods are compared and it is proved that the energy dependency from grid is reduced in solar micro-grid element, operating in distributed environment. In the smart grid frame work, the grid energy consumption is further reduced by optimal scheduling of the battery, using Reinforcement Learning. Individual unit optimization is done by a model free reinforcement learning method, called Q-Learning and it is compared with distributed operations of solar micro-grid using a Multi Agent Reinforcement Learning method, called Joint Q-Learning. The energy planning is designed according to the prediction of solar PV energy production and observed load pattern of department and the hostel. A simulation model was developed using Python programming.

Download Full-text

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014213 ◽

2019 ◽

Vol 33 ◽

pp. 4213-4220 ◽

Cited By ~ 12

Author(s):

Shihui Li ◽

Yi Wu ◽

Xinyue Cui ◽

Honghua Dong ◽

Fei Fang ◽

...

Keyword(s):

Reinforcement Learning ◽

Gradient Algorithm ◽

Training Environment ◽

Local Optima ◽

Continuous Action ◽

Agent Learning ◽

Policy Gradient ◽

Multi Agent ◽

Continuous Actions ◽

Computational Intractability

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in the multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optima w.r.t. its training partners – the learned policy may be only locally optimal to other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting so that the trained agents can still generalize when its opponents’ policies alter. To tackle this problem, we proposed a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG) with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG), for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments and the agents trained by our method significantly outperforms existing baselines.

Download Full-text

Online Tuning of a PID Controller with a Fuzzy Reinforcement Learning MAS for Flow Rate Control of a Desalination Unit

Electronics ◽

10.3390/electronics8020231 ◽

2019 ◽

Vol 8 (2) ◽

pp. 231 ◽

Cited By ~ 2

Author(s):

Panagiotis Kofinas ◽

Anastasios I. Dounis

Keyword(s):

Reinforcement Learning ◽

Flow Rate ◽

Pid Controller ◽

Hybrid Control ◽

Q Learning ◽

State Action ◽

Continuous State ◽

Multi Agent ◽

Flow Rate Control ◽

Online Tuning

This paper proposes a hybrid Zeigler-Nichols (Z-N) fuzzy reinforcement learning MAS (Multi-Agent System) approach for online tuning of a Proportional Integral Derivative (PID) controller in order to control the flow rate of a desalination unit. The PID gains are set by the Z-N method and then are adapted online through the fuzzy Q-learning MAS. The fuzzy Q-learning is introduced in each agent in order to confront with the continuous state-action space. The global state of the MAS is defined by the value of the error and the derivative of error. The MAS consists of three agents and the output signal of each agent defines the percentage change of each gain. The increment or the reduction of each gain can be in the range of 0% to 100% of its initial value. The simulation results highlight the performance of the suggested hybrid control strategy through comparison with the conventional PID controller tuned by Z-N.

Download Full-text

Automatic ship collision avoidance using deep reinforcement learning with LSTM in continuous action spaces

Journal of Marine Science and Technology ◽

10.1007/s00773-020-00755-0 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ryohei Sawada ◽

Keiji Sato ◽

Takahiro Majima

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Continuous Action ◽

Ship Collision ◽

Action Spaces

Download Full-text

FUZZY STATE AGGREGATION AND POLICY HILL CLIMBING FOR STOCHASTIC ENVIRONMENTS

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026806001903 ◽

2006 ◽

Vol 06 (03) ◽

pp. 413-428 ◽

Cited By ~ 1

Author(s):

DEAN C. WARDELL ◽

GILBERT L. PETERSON

Keyword(s):

Reinforcement Learning ◽

Function Approximation ◽

Hill Climbing ◽

Individual Agent ◽

Q Learning ◽

Stochastic Environments ◽

State Aggregation ◽

Multi Agent ◽

Fuzzy State ◽

Better Than

Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the operating environment changes. Additionally, by applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation, as the means of function approximation, combined with the fastest policy hill climbing methods of Win or Lose Fast (WoLF) and policy-dynamics based WoLF (PD-WoLF). The combination of fast policy hill climbing and fuzzy state aggregation function approximation is tested in two stochastic environments: Tileworld and the simulated robot soccer domain, RoboCup. The Tileworld results demonstrate that a single agent using the combination of FSA and PHC learns quicker and performs better than combined fuzzy state aggregation and Q-learning reinforcement learning alone. Results from the multi-agent RoboCup domain again illustrate that the policy hill climbing algorithms perform better than Q-learning alone in a multi-agent environment. The learning is further enhanced by allowing the agents to share their experience through a weighted strategy sharing.

Download Full-text