Learning to Model Opponent Learning (Student Abstract)

2020 ◽  
Vol 34 (10) ◽  
pp. 13771-13772
Author(s):  
Ian Davies ◽  
Zheng Tian ◽  
Jun Wang

Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induce non-stationarity in the environment dynamics. This poses a great challenge for value-function-based algorithms, whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings, as the partial observability arising from an opponent's actions not being known introduces high variance into policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent employs a stationary policy or switches between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn, so assumptions of opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL). We show that our structured opponent model is more accurate and stable than naive behaviour-cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.
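As a rough illustration of the distinction drawn in this abstract (a sketch under assumed module names and dimensions, not the authors' LeMOL implementation), the snippet below contrasts a naive behaviour-cloning opponent model, which implicitly treats the opponent's policy as stationary, with a recurrent opponent model that conditions on the history of opponent behaviour and can therefore track an opponent that keeps learning.

```python
# Hedged sketch: a stationary behaviour-cloning opponent model vs. a
# recurrent model of an opponent's (non-stationary) learning dynamics.
# Names and dimensions are illustrative, not taken from the LeMOL paper.
import torch
import torch.nn as nn


class BehaviourCloningOpponentModel(nn.Module):
    """Predicts the opponent's action from the current observation only,
    implicitly assuming the opponent's policy is stationary."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # logits over opponent actions


class RecurrentOpponentModel(nn.Module):
    """Conditions on the history of (observation, opponent action) pairs,
    so its predictions can adapt to an opponent that keeps learning."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(obs_dim + n_actions, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq: torch.Tensor, act_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, T, obs_dim); act_seq: one-hot, (batch, T, n_actions)
        h, _ = self.rnn(torch.cat([obs_seq, act_seq], dim=-1))
        return self.head(h)  # per-step logits for the opponent's next action
```

In this sketch, the recurrent model's hidden state acts as a running summary of the opponent's behaviour so far; a structured model in the spirit of LeMOL would replace these placeholder modules.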

2017 ◽  
Vol 9 (1) ◽  
pp. 1-19
Author(s):  
Deepak Annasaheb Vidhate

This article gives a novel approach to cooperative decision-making algorithms based on joint action learning for a retail shop application. The approach considers three retailer stores in the retail marketplace. Retailers can help each other and can profit from cooperative knowledge by learning their own strategies, which represent their individual aims and benefits. The vendors are intelligent agents that employ cooperative learning to train in these circumstances. Under assumptions on the vendors' stock policy, restock period, and the arrival process of consumers, the approach is formulated as a Markov model. The proposed algorithms learn dynamic consumer behaviour. Moreover, the article illustrates the results of cooperative reinforcement learning algorithms based on joint action learning of three shop agents over a one-year sales period. Two approaches are compared: multi-agent Q-learning and joint action learning.
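To make the contrast between the two compared approaches concrete, here is a minimal tabular sketch (illustrative only; the state and action encodings are assumptions, not the article's code): an independent multi-agent Q-learner updates a value for its own action alone, whereas a joint action learner maintains a value for the joint action of all three shop agents, so the benefit of cooperation is represented explicitly.

```python
# Minimal tabular sketch (illustrative only): independent Q-learning vs.
# joint action learning for cooperative retail-shop agents.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95  # assumed learning rate and discount factor


class IndependentQLearner:
    """Each shop agent learns Q(state, own_action), ignoring the others."""

    def __init__(self):
        self.q = defaultdict(float)

    def update(self, s, a, r, s_next, actions):
        best_next = max(self.q[(s_next, a2)] for a2 in actions)
        self.q[(s, a)] += ALPHA * (r + GAMMA * best_next - self.q[(s, a)])


class JointActionLearner:
    """Learns Q(state, (a1, a2, a3)) over the joint action of all shops,
    so coordinated strategies can be valued directly."""

    def __init__(self):
        self.q = defaultdict(float)

    def update(self, s, joint_a, r, s_next, joint_actions):
        best_next = max(self.q[(s_next, ja)] for ja in joint_actions)
        self.q[(s, joint_a)] += ALPHA * (r + GAMMA * best_next - self.q[(s, joint_a)])
```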


2019 ◽  
Vol 1 (2) ◽  
pp. 590-610
Author(s):  
Zohreh Akbari ◽  
Rainer Unland

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems that either consist of agents with individual goals and decision-making capabilities, which are influenced by other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most studies in this area focus on homogeneous swarms, and systems introduced so far as Heterogeneous Swarms (HetSs) merely include very few, i.e., two or three, sub-swarms of homogeneous agents, which either deal with a specific sub-problem of the general problem according to their capabilities, or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents which are originally designed to solve different problems, and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures their compatibility to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.
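As a hedged sketch of the affinity idea described above (the paper's actual affinity measure and HetSRL procedure may differ), the snippet below represents each agent by a capability vector, scores pairwise affinity with cosine similarity, and greedily groups mutually compatible agents so they can address an identical sub-problem as one swarm.

```python
# Hedged sketch of an affinity-based grouping step (illustrative; the
# paper's affinity measure and grouping rule are likely different).
import numpy as np


def affinity(cap_i: np.ndarray, cap_j: np.ndarray) -> float:
    """Cosine similarity used here as an assumed stand-in for compatibility."""
    return float(cap_i @ cap_j / (np.linalg.norm(cap_i) * np.linalg.norm(cap_j)))


def group_for_subproblem(capabilities, threshold=0.8):
    """Greedily group agents whose mutual affinity exceeds the threshold.

    `capabilities` maps agent name -> capability vector for one sub-problem.
    """
    groups = []
    for agent, cap in capabilities.items():
        placed = False
        for group in groups:
            if all(affinity(cap, capabilities[other]) >= threshold for other in group):
                group.append(agent)
                placed = True
                break
        if not placed:
            groups.append([agent])
    return groups
```

Each resulting group would then be treated as a (temporarily homogeneous) swarm for that sub-problem, while agents left alone continue on their original problems.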


2021 ◽  
Author(s):  
Arthur Campbell

An important task for organizations is establishing truthful communication between parties with differing interests. This task is made particularly challenging when the accuracy of the information is poorly observed, or not observed at all. In these settings, incentive contracts based on the accuracy of information will not be very effective. This paper considers an alternative mechanism that provides incentives for truthful communication without requiring any signal of the accuracy of the information communicated. Rather, an expert sacrifices future participation in decision-making to influence the current period's decision in favour of their preferred project. This mechanism captures a notion often described as 'political capital', whereby an individual achieves their own preferred decision in the current period at the expense of being able to exert influence in future decisions ('spending political capital'). When the first-best is not possible in this setting, I show that experts hold more influence than under the first-best and that, in a multi-agent extension, a finite team size is optimal. Together these results suggest that a small number of individuals hold excessive influence in organizations.


2020 ◽  
Vol 26 (6) ◽  
pp. 2927-2955
Author(s):  
Mar Palmeros Parada ◽  
Lotte Asveld ◽  
Patricia Osseweijer ◽  
John Alexander Posada

Biobased production has been promoted as a sustainable alternative to fossil resources. However, controversies over its impact on sustainability highlight societal concerns, value tensions and uncertainties that have not been taken into account during its development. In this work, the consideration of stakeholders' values in a biorefinery design project is investigated. Value sensitive design (VSD) is a promising approach to designing technologies with consideration of stakeholders' values; however, it is not directly applicable to complex systems such as biorefineries. Therefore, some elements of VSD, such as the identification of relevant values and their connection to a technology's features, are brought into biorefinery design practice. Midstream modulation (MM), an approach to promoting the consideration of societal aspects during research and development activities, is applied to promote reflection and value considerations during design decision-making. As a result, it is shown that MM interventions during the design process led to new design alternatives in support of stakeholders' values and made it possible to recognize and respond to emerging value tensions within the scope of the project. In this way, the present work demonstrates a novel approach to the technical investigation of VSD, especially for biorefineries. It is also argued, based on this work, that not only reflection but also flexibility and openness are important for the application of VSD in the context of biorefinery design.


Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 631
Author(s):  
Chunyang Hu

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve multi-agent confrontation decision-making. During training, information about other agents is introduced into the critic network to improve the confrontation strategy. The parameter-sharing mechanism can reduce the cost of experience storage. In the DDPG algorithm, four neural networks are used to generate real-time actions and Q-value estimates, respectively, and a momentum mechanism is used to optimize the training process and accelerate the convergence of the neural networks. Secondly, this paper introduces an auxiliary controller using a policy-based reinforcement learning (RL) method to provide assistant decision-making for the game agent. In addition, an effective reward function is used to help agents balance losses between the enemy and their own side. Furthermore, this paper uses a knowledge transfer method to extend the learning model to more complex scenarios and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight the competitors and achieves a good win rate. For large-scale confrontation scenarios, the knowledge transfer method can gradually improve the decision-making level of the learning agent.
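The following is a minimal sketch of the centralised-critic idea with parameter sharing described above (module names, dimensions and network sizes are assumptions, not the paper's implementation): the actor parameters are shared across agents, while the critic receives the observations and actions of all agents so that the other agents' behaviour is visible during training.

```python
# Hedged sketch: a shared actor plus a critic that conditions on all agents'
# observations and actions (illustrative; not the paper's implementation).
import torch
import torch.nn as nn


class SharedActor(nn.Module):
    """One set of actor parameters shared by every agent."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # continuous action in [-1, 1]


class CentralisedCritic(nn.Module):
    """Q-value conditioned on every agent's observation and action, which is
    what exposes the other agents' behaviour to the training signal."""

    def __init__(self, n_agents: int, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        in_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, all_obs: torch.Tensor, all_acts: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        x = torch.cat([all_obs, all_acts], dim=-1).flatten(start_dim=1)
        return self.net(x)
```

In a full DDPG setup these two modules would each have a target copy (giving the four networks mentioned in the abstract), with the optimizer's momentum settings governing the training dynamics.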

