Keeping in Touch with Collaborative UAVs: A Deep Reinforcement Learning Approach

Effective collaborations among autonomous unmanned aerial vehicles (UAVs) rely on timely information sharing. However, the time-varying flight environment and the intermittent link connectivity pose great challenges to message delivery. In this paper, we leverage the deep reinforcement learning (DRL) technique to address the UAVs' optimal links discovery and selection problem in uncertain environments. As the multi-agent learning efficiency is constrained by the high-dimensional and continuous action spaces, we slice the whole action spaces into a number of tractable fractions to achieve efficient convergences of optimal policies in continuous domains. Moreover, for the nonstationarity issue that particularly challenges the multi-agent DRL with local perceptions, we present a multi-agent mutual sampling method that jointly interacts the intra-agent and inter-agent state-action information to stabilize and expedite the training procedure. We evaluate the proposed algorithm on the UAVs' continuous network connection task. Results show that the associated UAVs can quickly select the optimal connected links, which facilitate the UAVs' teamwork significantly.

Download Full-text

Automated Driving Highway Traffic Merging using Deep Multi-Agent Reinforcement Learning in Continuous State-Action Spaces

10.1109/iv48863.2021.9575676 ◽

2021 ◽

Author(s):

Larry Schester ◽

Luis E. Ortiz

Keyword(s):

Reinforcement Learning ◽

Highway Traffic ◽

Automated Driving ◽

State Action ◽

Continuous State ◽

Multi Agent ◽

Action Spaces

Download Full-text

Hierarchical Reinforcement Learning

ACM Computing Surveys ◽

10.1145/3453160 ◽

2021 ◽

Vol 54 (5) ◽

pp. 1-35

Author(s):

Shubham Pateria ◽

Budhitama Subagdja ◽

Ah-hwee Tan ◽

Chai Quek

Keyword(s):

Reinforcement Learning ◽

Future Research ◽

Comprehensive Overview ◽

Open Problems ◽

Practical Applications ◽

Hierarchical Reinforcement Learning ◽

The Past ◽

Agent Learning ◽

Multi Agent ◽

Supplementary Material

Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. During the past years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate the future research in HRL. Furthermore, we outline a few suitable task domains for evaluating the HRL approaches and a few interesting examples of the practical applications of HRL in the Supplementary Material.

Download Full-text

A Two-Layer Approach to Developing Self-Adaptive Multi-Agent Systems in Open Environment

International Journal of Agent Technologies and Systems ◽

10.4018/ijats.2014010104 ◽

2014 ◽

Vol 6 (1) ◽

pp. 65-85 ◽

Cited By ~ 2

Author(s):

Xinjun Mao ◽

Menggao Dong ◽

Haibin Zhu

Keyword(s):

Reinforcement Learning ◽

Adaptive Systems ◽

Multi Agent Systems ◽

Uncertain Environments ◽

Implementation Framework ◽

Fine Grain ◽

Adaptation Mechanisms ◽

Multi Agent ◽

Self Adaptation ◽

Self Adaptive

Development of self-adaptive systems situated in open and uncertain environments is a great challenge in the community of software engineering due to the unpredictability of environment changes and the variety of self-adaptation manners. Explicit specification of expected changes and various self-adaptations at design-time, an approach often adopted by developers, seems ineffective. This paper presents an agent-based approach that combines two-layer self-adaptation mechanisms and reinforcement learning together to support the development and running of self-adaptive systems. The approach takes self-adaptive systems as multi-agent organizations and enables the agent itself to make decisions on self-adaptation by learning at run-time and at different levels. The proposed self-adaptation mechanisms that are based on organization metaphors enable self-adaptation at two layers: fine-grain behavior level and coarse-grain organization level. Corresponding reinforcement learning algorithms on self-adaptation are designed and integrated with the two-layer self-adaptation mechanisms. This paper further details developmental technologies, based on the above approach, in establishing self-adaptive systems, including extended software architecture for self-adaptation, an implementation framework, and a development process. A case study and experiment evaluations are conducted to illustrate the effectiveness of the proposed approach.

Download Full-text

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014213 ◽

2019 ◽

Vol 33 ◽

pp. 4213-4220 ◽

Cited By ~ 12

Author(s):

Shihui Li ◽

Yi Wu ◽

Xinyue Cui ◽

Honghua Dong ◽

Fei Fang ◽

...

Keyword(s):

Reinforcement Learning ◽

Gradient Algorithm ◽

Training Environment ◽

Local Optima ◽

Continuous Action ◽

Agent Learning ◽

Policy Gradient ◽

Multi Agent ◽

Continuous Actions ◽

Computational Intractability

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in the multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optima w.r.t. its training partners – the learned policy may be only locally optimal to other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting so that the trained agents can still generalize when its opponents’ policies alter. To tackle this problem, we proposed a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG) with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG), for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments and the agents trained by our method significantly outperforms existing baselines.

Download Full-text

Online Tuning of a PID Controller with a Fuzzy Reinforcement Learning MAS for Flow Rate Control of a Desalination Unit

Electronics ◽

10.3390/electronics8020231 ◽

2019 ◽

Vol 8 (2) ◽

pp. 231 ◽

Cited By ~ 2

Author(s):

Panagiotis Kofinas ◽

Anastasios I. Dounis

Keyword(s):

Reinforcement Learning ◽

Flow Rate ◽

Pid Controller ◽

Hybrid Control ◽

Q Learning ◽

State Action ◽

Continuous State ◽

Multi Agent ◽

Flow Rate Control ◽

Online Tuning

This paper proposes a hybrid Zeigler-Nichols (Z-N) fuzzy reinforcement learning MAS (Multi-Agent System) approach for online tuning of a Proportional Integral Derivative (PID) controller in order to control the flow rate of a desalination unit. The PID gains are set by the Z-N method and then are adapted online through the fuzzy Q-learning MAS. The fuzzy Q-learning is introduced in each agent in order to confront with the continuous state-action space. The global state of the MAS is defined by the value of the error and the derivative of error. The MAS consists of three agents and the output signal of each agent defines the percentage change of each gain. The increment or the reduction of each gain can be in the range of 0% to 100% of its initial value. The simulation results highlight the performance of the suggested hybrid control strategy through comparison with the conventional PID controller tuned by Z-N.

Download Full-text

Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6216 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7253-7260 ◽

Cited By ~ 2

Author(s):

Yuhang Song ◽

Andrzej Wojcicki ◽

Thomas Lukasiewicz ◽

Jianyi Wang ◽

Abi Aryan ◽

...

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Research Community ◽

Learning Agents ◽

General Evaluation ◽

Agent Learning ◽

Multi Agent ◽

Agent Intelligence ◽

Training Schemes ◽

Evaluation Platform

Learning agents that are not only capable of taking tests, but also innovating is becoming a hot topic in AI. One of the most promising paths towards this vision is multi-agent learning, where agents act as the environment for each other, and improving each agent means proposing new problems for others. However, existing evaluation platforms are either not compatible with multi-agent settings, or limited to a specific game. That is, there is not yet a general evaluation platform for research on multi-agent intelligence. To this end, we introduce Arena, a general evaluation platform for multi-agent intelligence with 35 games of diverse logics and representations. Furthermore, multi-agent intelligence is still at the stage where many problems remain unexplored. Therefore, we provide a building toolkit for researchers to easily invent and build novel multi-agent problems from the provided game set based on a GUI-configurable social tree and five basic multi-agent reward schemes. Finally, we provide Python implementations of five state-of-the-art deep multi-agent reinforcement learning baselines. Along with the baseline implementations, we release a set of 100 best agents/teams that we can train with different training schemes for each game, as the base for evaluating agents with population performance. As such, the research community can perform comparisons under a stable and uniform standard. All the implementations and accompanied tutorials have been open-sourced for the community at https://sites.google.com/view/arena-unity/.

Download Full-text

Combat Robot Strategy Adaptation Using Multiple Learning Agents

Volume 4: Dynamics, Control and Uncertainty, Parts A and B ◽

10.1115/imece2012-87521 ◽

2012 ◽

Author(s):

Thomas Recchia ◽

Jae Chung ◽

Kishore Pochiraju

Keyword(s):

Reinforcement Learning ◽

Robotic Systems ◽

Multi Agent System ◽

Learning Agents ◽

Loosely Coupled ◽

Reward Function ◽

Strategy Adaptation ◽

Agent Learning ◽

Multi Agent ◽

Reward Functions

As robotic systems become more prevalent, it is highly desirable for them to be able to operate in highly dynamic environments. A common approach is to use reinforcement learning to allow an agent controlling the robot to learn and adapt its behavior based on a reward function. This paper presents a novel multi-agent system that cooperates to control a single robot battle tank in a melee battle scenario, with no prior knowledge of its opponents’ strategies. The agents learn through reinforcement learning, and are loosely coupled by their reward functions. Each agent controls a different aspect of the robot’s behavior. In addition, the problem of delayed reward is addressed through a time-averaged reward applied to several sequential actions at once. This system was evaluated in a simulated melee combat scenario and was shown to learn to improve its performance over time. This was accomplished by each agent learning to pick specific battle strategies for each different opponent it faced.

Download Full-text

Multi-agent cooperation Q-learning algorithm based on constrained Markov Game

Computer Science and Information Systems ◽

10.2298/csis191220009g ◽

2020 ◽

Vol 17 (2) ◽

pp. 647-664

Author(s):

Yangyang Ge ◽

Fei Zhu ◽

Wei Huang ◽

Peiyao Zhao ◽

Quan Liu

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Multi Agent System ◽

Agent System ◽

Action Function ◽

Q Learning ◽

State Action ◽

Markov Game ◽

Safety Constraints ◽

Multi Agent

Multi-Agent system has broad application in real world, whose security performance, however, is barely considered. Reinforcement learning is one of the most important methods to resolve Multi-Agent problems. At present, certain progress has been made in applying Multi-Agent reinforcement learning to robot system, man-machine match, and automatic, etc. However, in the above area, an agent may fall into unsafe states where the agent may find it difficult to bypass obstacles, to receive information from other agents and so on. Ensuring the safety of Multi-Agent system is of great importance in the above areas where an agent may fall into dangerous states that are irreversible, causing great damage. To solve the safety problem, in this paper we introduce a Multi-Agent Cooperation Q-Learning Algorithm based on Constrained Markov Game. In this method, safety constraints are added to the set of actions, and each agent, when interacting with the environment to search for optimal values, should be restricted by the safety rules, so as to obtain an optimal policy that satisfies the security requirements. Since traditional Multi-Agent reinforcement learning algorithm is no more suitable for the proposed model in this paper, a new solution is introduced for calculating the global optimum state-action function that satisfies the safety constraints. We take advantage of the Lagrange multiplier method to determine the optimal action that can be performed in the current state based on the premise of linearizing constraint functions, under conditions that the state-action function and the constraint function are both differentiable, which not only improves the efficiency and accuracy of the algorithm, but also guarantees to obtain the global optimal solution. The experiments verify the effectiveness of the algorithm.

Download Full-text

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/323 ◽

2019 ◽

Cited By ~ 5

Author(s):

Haotian Fu ◽

Hongyao Tang ◽

Jianye Hao ◽

Zihan Lei ◽

Yingfeng Chen ◽

...

Keyword(s):

Reinforcement Learning ◽

Continuous Action ◽

Q Learning ◽

Challenging Tasks ◽

Discrete Action ◽

Multi Agent ◽

Decentralized Execution ◽

Novel Algorithms ◽

Action Spaces ◽

Different Levels

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has ever succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces which is very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform existing independent deep parameterized Q-learning method.

Download Full-text

MAPS: Multi-Agent reinforcement learning-based Portfolio management System.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/623 ◽

2020 ◽

Author(s):

Jinho Lee ◽

Raehyun Kim ◽

Seok-Won Yi ◽

Jaewoo Kang

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Portfolio Management ◽

Management System ◽

Investment Strategy ◽

Sharpe Ratio ◽

Training Procedure ◽

Learning Methods ◽

Multi Agent ◽

Diversified Portfolio

Generating an investment strategy using advanced deep learning methods in stock markets has recently been a topic of interest. Most existing deep learning methods focus on proposing an optimal model or network architecture by maximizing return. However, these models often fail to consider and adapt to the continuously changing market conditions. In this paper, we propose the Multi-Agent reinforcement learning-based Portfolio management System (MAPS). MAPS is a cooperative system in which each agent is an independent "investor" creating its own portfolio. In the training procedure, each agent is guided to act as diversely as possible while maximizing its own return with a carefully designed loss function. As a result, MAPS as a system ends up with a diversified portfolio. Experiment results with 12 years of US market data show that MAPS outperforms most of the baselines in terms of Sharpe ratio. Furthermore, our results show that adding more agents to our system would allow us to get a higher Sharpe ratio by lowering risk with a more diversified portfolio.

Download Full-text