Learning to Model Opponent Learning (Student Abstract)

2020 ◽  
Vol 34 (10) ◽  
pp. 13771-13772
Author(s):  
Ian Davies ◽  
Zheng Tian ◽  
Jun Wang

Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induce non-stationarity in the environment dynamics. This poses a great challenge for value-function-based algorithms, whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings, as the partial observability arising from an opponent's actions not being known introduces high variance into policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent employs a stationary policy or switches between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn, so assumptions of opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL). We show that our structured opponent model is more accurate and stable than naive behaviour-cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.
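As a rough illustration of the distinction drawn in this abstract (a sketch under assumed module names and dimensions, not the authors' LeMOL implementation), the snippet below contrasts a naive behaviour-cloning opponent model, which implicitly treats the opponent's policy as stationary, with a recurrent opponent model that conditions on the history of opponent behaviour and can therefore track an opponent that keeps learning.

```python
# Hedged sketch: a stationary behaviour-cloning opponent model vs. a
# recurrent model of an opponent's (non-stationary) learning dynamics.
# Names and dimensions are illustrative, not taken from the LeMOL paper.
import torch
import torch.nn as nn


class BehaviourCloningOpponentModel(nn.Module):
    """Predicts the opponent's action from the current observation only,
    implicitly assuming the opponent's policy is stationary."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # logits over opponent actions


class RecurrentOpponentModel(nn.Module):
    """Conditions on the history of (observation, opponent action) pairs,
    so its predictions can adapt to an opponent that keeps learning."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(obs_dim + n_actions, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq: torch.Tensor, act_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, T, obs_dim); act_seq: one-hot, (batch, T, n_actions)
        h, _ = self.rnn(torch.cat([obs_seq, act_seq], dim=-1))
        return self.head(h)  # per-step logits for the opponent's next action
```

In this sketch, the recurrent model's hidden state acts as a running summary of the opponent's behaviour so far; a structured model in the spirit of LeMOL would replace these placeholder modules.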

2017 ◽  
Vol 9 (1) ◽  
pp. 1-19
Author(s):  
Deepak Annasaheb Vidhate

This article gives a novel approach to cooperative decision-making algorithms based on joint action learning for a retail shop application. The approach considers three retailer stores in the retail marketplace. Retailers can help each other and can profit from cooperative knowledge by learning their own strategies, which represent their individual aims and benefits. The vendors are intelligent agents that employ cooperative learning to train in these circumstances. Under assumptions on the vendors' stock policy, restock period, and the arrival process of consumers, the approach is formulated as a Markov model. The proposed algorithms learn dynamic consumer behaviour. Moreover, the article illustrates the results of cooperative reinforcement learning algorithms based on joint action learning of three shop agents over a one-year sales period. Two approaches are compared: multi-agent Q-learning and joint action learning.
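To make the contrast between the two compared approaches concrete, here is a minimal tabular sketch (illustrative only; the state and action encodings are assumptions, not the article's code): an independent multi-agent Q-learner updates a value for its own action alone, whereas a joint action learner maintains a value for the joint action of all three shop agents, so the benefit of cooperation is represented explicitly.

```python
# Minimal tabular sketch (illustrative only): independent Q-learning vs.
# joint action learning for cooperative retail-shop agents.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95  # assumed learning rate and discount factor


class IndependentQLearner:
    """Each shop agent learns Q(state, own_action), ignoring the others."""

    def __init__(self):
        self.q = defaultdict(float)

    def update(self, s, a, r, s_next, actions):
        best_next = max(self.q[(s_next, a2)] for a2 in actions)
        self.q[(s, a)] += ALPHA * (r + GAMMA * best_next - self.q[(s, a)])


class JointActionLearner:
    """Learns Q(state, (a1, a2, a3)) over the joint action of all shops,
    so coordinated strategies can be valued directly."""

    def __init__(self):
        self.q = defaultdict(float)

    def update(self, s, joint_a, r, s_next, joint_actions):
        best_next = max(self.q[(s_next, ja)] for ja in joint_actions)
        self.q[(s, joint_a)] += ALPHA * (r + GAMMA * best_next - self.q[(s, joint_a)])
```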


2019 ◽  
Vol 1 (2) ◽  
pp. 590-610
Author(s):  
Zohreh Akbari ◽  
Rainer Unland

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems that either consist of agents with individual goals and decision-making capabilities, which are influenced by other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most studies in this area focus on homogeneous swarms, and systems introduced so far as Heterogeneous Swarms (HetSs) merely include very few, i.e., two or three, sub-swarms of homogeneous agents, which either deal with a specific sub-problem of the general problem according to their capabilities, or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents which are originally designed to solve different problems, and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures their compatibility to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.
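As a hedged sketch of the affinity idea described above (the paper's actual affinity measure and HetSRL procedure may differ), the snippet below represents each agent by a capability vector, scores pairwise affinity with cosine similarity, and greedily groups mutually compatible agents so they can address an identical sub-problem as one swarm.

```python
# Hedged sketch of an affinity-based grouping step (illustrative; the
# paper's affinity measure and grouping rule are likely different).
import numpy as np


def affinity(cap_i: np.ndarray, cap_j: np.ndarray) -> float:
    """Cosine similarity used here as an assumed stand-in for compatibility."""
    return float(cap_i @ cap_j / (np.linalg.norm(cap_i) * np.linalg.norm(cap_j)))


def group_for_subproblem(capabilities, threshold=0.8):
    """Greedily group agents whose mutual affinity exceeds the threshold.

    `capabilities` maps agent name -> capability vector for one sub-problem.
    """
    groups = []
    for agent, cap in capabilities.items():
        placed = False
        for group in groups:
            if all(affinity(cap, capabilities[other]) >= threshold for other in group):
                group.append(agent)
                placed = True
                break
        if not placed:
            groups.append([agent])
    return groups
```

Each resulting group would then be treated as a (temporarily homogeneous) swarm for that sub-problem, while agents left alone continue on their original problems.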


2021 ◽  
Author(s):  
Arthur Campbell

An important task for organizations is establishing truthful communication between parties with differing interests. This task is made particularly challenging when the accuracy of the information is poorly observed, or not observed at all. In these settings, incentive contracts based on the accuracy of information will not be very effective. This paper considers an alternative mechanism that provides incentives for truthful communication without requiring any signal of the accuracy of the information communicated. Rather, an expert sacrifices future participation in decision-making to influence the current period's decision in favour of their preferred project. This mechanism captures a notion often described as 'political capital', whereby an individual achieves their own preferred decision in the current period at the expense of being able to exert influence in future decisions ('spending political capital'). When the first-best is not possible in this setting, I show that experts hold more influence than under the first-best and that, in a multi-agent extension, a finite team size is optimal. Together these results suggest that a small number of individuals hold excessive influence in organizations.


2020 ◽  
Vol 26 (6) ◽  
pp. 2927-2955
Author(s):  
Mar Palmeros Parada ◽  
Lotte Asveld ◽  
Patricia Osseweijer ◽  
John Alexander Posada

Biobased production has been promoted as a sustainable alternative to fossil resources. However, controversies over its impact on sustainability highlight societal concerns, value tensions and uncertainties that have not been taken into account during its development. In this work, the consideration of stakeholders' values in a biorefinery design project is investigated. Value sensitive design (VSD) is a promising approach to designing technologies with consideration of stakeholders' values; however, it is not directly applicable to complex systems such as biorefineries. Therefore, some elements of VSD, such as the identification of relevant values and their connection to a technology's features, are brought into biorefinery design practice. Midstream modulation (MM), an approach to promoting the consideration of societal aspects during research and development activities, is applied to promote reflection and value considerations during design decision-making. As a result, it is shown that MM interventions during the design process led to new design alternatives in support of stakeholders' values and made it possible to recognize and respond to emerging value tensions within the scope of the project. In this way, the present work demonstrates a novel approach to the technical investigation of VSD, especially for biorefineries. It is also argued, based on this work, that not only reflection but also flexibility and openness are important for the application of VSD in the context of biorefinery design.


Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 631
Author(s):  
Chunyang Hu

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve multi-agent confrontation decision-making. During training, information about other agents is introduced into the critic network to improve the confrontation strategy. The parameter-sharing mechanism can reduce the cost of experience storage. In the DDPG algorithm, four neural networks are used to generate real-time actions and Q-value estimates, respectively, and a momentum mechanism is used to optimize the training process and accelerate the convergence of the neural networks. Secondly, this paper introduces an auxiliary controller using a policy-based reinforcement learning (RL) method to provide assistant decision-making for the game agent. In addition, an effective reward function is used to help agents balance losses between the enemy and their own side. Furthermore, this paper uses a knowledge transfer method to extend the learning model to more complex scenarios and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight the competitors and achieves a good win rate. For large-scale confrontation scenarios, the knowledge transfer method can gradually improve the decision-making level of the learning agent.
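The following is a minimal sketch of the centralised-critic idea with parameter sharing described above (module names, dimensions and network sizes are assumptions, not the paper's implementation): the actor parameters are shared across agents, while the critic receives the observations and actions of all agents so that the other agents' behaviour is visible during training.

```python
# Hedged sketch: a shared actor plus a critic that conditions on all agents'
# observations and actions (illustrative; not the paper's implementation).
import torch
import torch.nn as nn


class SharedActor(nn.Module):
    """One set of actor parameters shared by every agent."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # continuous action in [-1, 1]


class CentralisedCritic(nn.Module):
    """Q-value conditioned on every agent's observation and action, which is
    what exposes the other agents' behaviour to the training signal."""

    def __init__(self, n_agents: int, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        in_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, all_obs: torch.Tensor, all_acts: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        x = torch.cat([all_obs, all_acts], dim=-1).flatten(start_dim=1)
        return self.net(x)
```

In a full DDPG setup these two modules would each have a target copy (giving the four networks mentioned in the abstract), with the optimizer's momentum settings governing the training dynamics.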

